Querying the Gencove database
In addition to submitting jobs to a cluster and storing files on S3, Gencove Explorer lets users interact directly with the Gencove platform. Samples can be:
- Queried using the Query object
- Restored from the Gencove Archive using the Sample.restore_samples() method
Simple queries with get_samples()
In addition to building complex queries with the Query object (see below), users can fetch a collection of samples with the get_samples() convenience function.
The first parameter, input_data, can be either:
- A Gencove project id
- A list of Gencove sample ids
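The remaining parameters are:
- file_types - list of file types to retrieve for each sample (for example "impute-vcf")
- quality_control_types - list of quality control metrics to retrieve (for example "raw_coverage")
- statuses - list of sample statuses to include (for example "succeeded")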
The following call retrieves imputed VCF files, CSI index files, and raw coverage QC metrics for all successful samples in the project with id d3220fbd-75cb-4086-bea1-defd8279d509:
from gencove_explorer import get_samples

s = get_samples(
    input_data="d3220fbd-75cb-4086-bea1-defd8279d509",
    file_types=["impute-vcf", "impute-csi"],
    quality_control_types=["raw_coverage"],
    statuses=["succeeded"],
)
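To make the selection rules above concrete, here is a minimal sketch that mimics them on local, in-memory data. It is an illustration only: MockSample and select_samples are hypothetical names that are not part of gencove_explorer, the returned dict layout is an assumption of this sketch, and the real get_samples() queries the platform rather than a Python list. quality_control_types is omitted for brevity.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a sample record; not a gencove_explorer class.
@dataclass
class MockSample:
    sample_id: str
    project_id: str
    status: str
    files: dict = field(default_factory=dict)  # file type -> file path


def select_samples(samples, input_data, file_types=None, statuses=None):
    """Hypothetical local mimic of the documented get_samples() selection rules.

    input_data is either a project id (str) or a list of sample ids.
    Returns {sample_id: {file_type: path}} for the selected samples.
    """
    if isinstance(input_data, str):
        # A project id selects every sample in that project.
        chosen = [s for s in samples if s.project_id == input_data]
    else:
        # A list of sample ids selects exactly those samples.
        wanted = set(input_data)
        chosen = [s for s in samples if s.sample_id in wanted]

    if statuses is not None:
        # Keep only samples whose status is in the requested list.
        chosen = [s for s in chosen if s.status in statuses]

    result = {}
    for s in chosen:
        if file_types is None:
            result[s.sample_id] = dict(s.files)
        else:
            # Keep only the requested file types.
            result[s.sample_id] = {t: p for t, p in s.files.items() if t in file_types}
    return result


samples = [
    MockSample("s1", "proj-a", "succeeded",
               {"impute-vcf": "s1.vcf.gz", "impute-csi": "s1.vcf.gz.csi", "other": "s1.txt"}),
    MockSample("s2", "proj-a", "failed"),
    MockSample("s3", "proj-b", "succeeded", {"impute-vcf": "s3.vcf.gz"}),
]

print(select_samples(samples, "proj-a",
                     file_types=["impute-vcf", "impute-csi"],
                     statuses=["succeeded"]))
# {'s1': {'impute-vcf': 's1.vcf.gz', 'impute-csi': 's1.vcf.gz.csi'}}
```

Passing a list such as ["s2", "s3"] instead of a project id selects those samples regardless of which project they belong to.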
The Query object
The Query object enables users to query samples available in their account on the Gencove platform.
The samples() method of the Query object accepts four arguments:
- filters - filters for selecting the samples returned by the query
  - Multiple filters can be chained with &; each filter must then be wrapped in parentheses, because & binds more tightly than comparison operators such as == in Python
- include_related - selects which sample-related objects to return along with each sample
  - A list of objects from the models module: Project, SampleMetadata, GencoveFile, QualityControl, Ancestry, TraitScore, Microbiome
- file_types - list of file types to return; this parameter only has an effect when GencoveFile is specified in include_related
- quality_control_types - list of quality control types to return; this parameter only has an effect when QualityControl is specified in include_related
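A minimal demonstration of why chained filters need parentheses. It uses two hypothetical classes, Field and Condition, that stand in for SampleFilter; they are not the real gencove_explorer.filters API, and status is a made-up field name. The behavior shown comes from Python's own operator precedence, so it should apply to any filter objects combined with &.

```python
# Hypothetical stand-ins, NOT the real gencove_explorer.filters API.
class Condition:
    """A filter condition that can be combined with &."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        # Combine two conditions into a conjunction.
        return Condition(f"({self.text} AND {other.text})")

    def __repr__(self):
        return self.text


class Field:
    """A filterable field; comparing it to a value builds a Condition."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Condition(f"{self.name} == {value!r}")


client_id = Field("client_id")
status = Field("status")  # hypothetical second field

# With parentheses, each comparison is built first and then combined.
ok = (client_id == "HG00102-1-0-0-2") & (status == "succeeded")
print(ok)  # (client_id == 'HG00102-1-0-0-2' AND status == 'succeeded')

# Without parentheses, Python evaluates "HG00102-1-0-0-2" & status first,
# which neither str nor Field supports, so a TypeError is raised.
try:
    client_id == "HG00102-1-0-0-2" & status == "succeeded"
except TypeError as exc:
    print("TypeError:", exc)
```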
For example, the following query retrieves the imputed VCF file for every sample whose client_id is HG00102-1-0-0-2:
from gencove_explorer.query import Query
from gencove_explorer.filters import SampleFilter
from gencove_explorer.models import GencoveFile

samples = Query().samples(
    filters=SampleFilter.client_id == "HG00102-1-0-0-2",
    include_related=[GencoveFile],
    file_types=["impute-vcf"],
)
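Because file_types and quality_control_types are only honored when the matching model is in include_related, it can help to check query arguments before running a query. The sketch below assumes you want such a pre-flight check: unused_query_args is a hypothetical helper, and the three classes are local placeholders rather than imports from gencove_explorer.models. Whether the real SDK warns, raises, or simply ignores such arguments is not shown here.

```python
# Local placeholder classes standing in for the models named in the docs;
# they are NOT imported from gencove_explorer.
class Project: ...
class GencoveFile: ...
class QualityControl: ...


def unused_query_args(include_related=(), file_types=None, quality_control_types=None):
    """Hypothetical pre-flight check: list arguments that would have no effect."""
    problems = []
    if file_types and GencoveFile not in include_related:
        problems.append("file_types has no effect without GencoveFile in include_related")
    if quality_control_types and QualityControl not in include_related:
        problems.append(
            "quality_control_types has no effect without QualityControl in include_related"
        )
    return problems


print(unused_query_args(include_related=[GencoveFile], file_types=["impute-vcf"]))
# []
print(unused_query_args(include_related=[Project], file_types=["impute-vcf"]))
# ['file_types has no effect without GencoveFile in include_related']
```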
The full list of query parameters and options can be found in the SDK Reference documentation.
Restoring samples from the Gencove Archive
Samples can be restored from the Gencove Archive by specifying either:
- A project id - restores all samples in the project
- A list of sample ids - restores only the specified samples
For example: