Querying the Gencove database¶
In addition to submitting jobs to a cluster and storing files on S3, Gencove Explorer enables users to easily interact with the Gencove platform. Samples can be:
- Queried using the Query object
- Restored from the Gencove Archive using the Sample.restore_samples() method
Simple queries with get_samples()¶
In addition to complex queries with the Query object (see below), users can use the
get_samples() convenience function for simply fetching a collection of samples.
The first function parameter can be either:
- A Gencove project id
- A list of Gencove sample ids
Remaining function parameters are:
- file_types
- quality_control_types
- statuses
- archive_status
The following example call retrieves VCF files, CSI files, and raw coverage QC for all successful samples in the project with id d3220fbd-75cb-4086-bea1-defd8279d509 that are currently available (not archived).
from gencove_explorer import get_samples

s = get_samples(
    input_data="d3220fbd-75cb-4086-bea1-defd8279d509",
    file_types=["impute-vcf", "impute-csi"],
    quality_control_types=["raw_coverage"],
    statuses=["succeeded"],
    archive_status=["available"],
)
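Because the first parameter accepts either a project id or a list of sample ids, it can help to check the value before making the call. The following is a minimal sketch, not part of gencove_explorer: the helper name classify_input is hypothetical, and it assumes that Gencove ids are UUID-formatted strings, as the project id in the example above appears to be.

```python
import uuid
from typing import List, Union


def classify_input(input_data: Union[str, List[str]]) -> str:
    """Return "project" for a single id string, "samples" for a list of ids.

    Hypothetical helper for illustration only; assumes ids are UUID strings.
    """
    if isinstance(input_data, str):
        uuid.UUID(input_data)  # raises ValueError if the string is not a UUID
        return "project"
    if isinstance(input_data, list) and input_data:
        for sample_id in input_data:
            uuid.UUID(sample_id)  # validate every sample id in the list
        return "samples"
    raise ValueError("expected a project id string or a non-empty list of sample ids")


# A single id string is treated as a project id.
project_kind = classify_input("d3220fbd-75cb-4086-bea1-defd8279d509")

# A list of id strings is treated as a list of sample ids (random ids here).
samples_kind = classify_input([str(uuid.uuid4()), str(uuid.uuid4())])

print(project_kind, samples_kind)
```

A check like this catches malformed ids locally, before any request reaches the platform.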
The Query object¶
The Query object enables users to query samples available in their account on the Gencove platform.
The samples() method of the Query object accepts four arguments:
- filters - filters for selecting samples to be returned from the query
  - Multiple filters can be chained with & and have to be wrapped in parentheses
- include_related - enables users to select which sample-related objects to return along with the sample itself
  - A list of objects from the models module: Project, SampleMetadata, GencoveFile, QualityControl, Ancestry, TraitScore, Microbiome
- file_types - list of file types to return when specifying GencoveFile in include_related
  - For this parameter to have an effect, GencoveFile must be specified in include_related
- quality_control_types - list of quality control types to return when specifying QualityControl in include_related
  - For this parameter to have an effect, QualityControl must be specified in include_related
As an example, this query pulls the imputed VCF file for all samples whose client_id is HG00102-1-0-0-2.
from gencove_explorer.query import Query
from gencove_explorer.filters import SampleFilter
from gencove_explorer.models import GencoveFile

samples = Query().samples(
    filters=SampleFilter.client_id == "HG00102-1-0-0-2",
    include_related=[GencoveFile],
    file_types=["impute-vcf"],
)
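The requirement that chained filters be wrapped in parentheses follows from Python operator precedence: & binds more tightly than ==. The sketch below illustrates this with small stand-in classes. Field and Condition are hypothetical and are not the SDK's SampleFilter implementation, and the status field is used purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Condition:
    """A conjunction of (field, value) equality terms. Stand-in, not the SDK class."""

    terms: Tuple[Tuple[str, Any], ...]

    def __and__(self, other: "Condition") -> "Condition":
        # Chaining with & concatenates the terms of both conditions.
        return Condition(self.terms + other.terms)

    def matches(self, record: Dict[str, Any]) -> bool:
        return all(record.get(name) == value for name, value in self.terms)


class Field:
    """A named field whose == operator builds a Condition instead of a bool."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, value: Any) -> Condition:  # type: ignore[override]
        return Condition(((self.name, value),))


client_id = Field("client_id")
status = Field("status")

# With parentheses, each comparison is evaluated first, then combined with &.
combined = (client_id == "HG00102-1-0-0-2") & (status == "succeeded")

records = [
    {"client_id": "HG00102-1-0-0-2", "status": "succeeded"},
    {"client_id": "HG00102-1-0-0-2", "status": "failed"},
    {"client_id": "other", "status": "succeeded"},
]
matching = [r for r in records if combined.matches(r)]

# Without parentheses, Python evaluates "HG00102-1-0-0-2" & status first,
# which fails in this stand-in because str does not support & with a Field.
try:
    client_id == "HG00102-1-0-0-2" & status == "succeeded"
    unparenthesized_ok = True
except TypeError:
    unparenthesized_ok = False

print(matching, unparenthesized_ok)
```

In this stand-in the unparenthesized form raises a TypeError. The SDK's own behaviour may differ, but the parsing order is the same, so parenthesizing each comparison is the safe habit.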
The full list of query parameters and options can be found in the SDK Reference documentation.
Restoring samples from the Gencove Archive¶
Samples can be restored from the Gencove Archive by specifying:
- Project id - restores all samples in the project
- List of sample ids - restore the specified samples
For example: