Querying the Gencove database

In addition to submitting jobs to a cluster and storing files on S3, Gencove Explorer enables users to easily interact with the Gencove platform. Samples can be:

  1. Queried using the Query object
  2. Restored from the Gencove Archive using the Sample.restore_samples() method

Simple queries with get_samples()

In addition to complex queries with the Query object (see below), the get_samples() convenience function fetches a collection of samples in a single call.

The first function parameter can be either:

  1. A Gencove project id
  2. A list of Gencove sample ids

Remaining function parameters are:

  1. file_types
  2. quality_control_types
  3. statuses

The following example call retrieves VCF files, CSI files, and raw coverage QC for all successful samples from the project with id d3220fbd-75cb-4086-bea1-defd8279d509:

from gencove_explorer import get_samples

s = get_samples(
    input_data="d3220fbd-75cb-4086-bea1-defd8279d509",
    file_types=["impute-vcf", "impute-csi"],
    quality_control_types=["raw_coverage"],
    statuses=["succeeded"],
)
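
The same call shape applies when a list of sample ids is passed instead of a project id. The sketch below only assembles the keyword arguments, so it runs outside the Explorer environment. build_sample_query is a hypothetical helper, not part of gencove_explorer, and the UUID check assumes that ids always have the UUID format seen in the examples on this page.

```python
import uuid


def build_sample_query(sample_ids, file_types, quality_control_types, statuses):
    """Hypothetical helper: validate ids and assemble get_samples() keyword arguments."""
    for sample_id in sample_ids:
        # Assumption: ids are UUID strings; uuid.UUID raises ValueError otherwise.
        uuid.UUID(sample_id)
    return {
        "input_data": list(sample_ids),
        "file_types": file_types,
        "quality_control_types": quality_control_types,
        "statuses": statuses,
    }


kwargs = build_sample_query(
    sample_ids=[
        "5f9e9892-003a-41f7-803e-7e83eb979aef",
        "478f370e-2d99-412a-864b-46b6f6b59ffa",
    ],
    file_types=["impute-vcf"],
    quality_control_types=["raw_coverage"],
    statuses=["succeeded"],
)
# Inside Explorer, the call would then be: get_samples(**kwargs)
```

Validating ids up front means a typo surfaces as a local ValueError rather than as an empty or failed query.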

The Query object

The Query object enables users to query samples available in their account on the Gencove platform.

The samples() method of the Query object accepts four arguments:

  1. filters - filters for selecting samples to be returned from the query
    • Multiple filters can be chained with &; each filter must be wrapped in parentheses
  2. include_related - enables users to select which sample-related objects to return along with the sample itself.
    • A list of objects from the models module:
      • Project
      • SampleMetadata
      • GencoveFile
      • QualityControl
      • Ancestry
      • TraitScore
      • Microbiome
  3. file_types - list of file types to return when specifying GencoveFile in include_related
    • For this parameter to have an effect, GencoveFile must be specified in include_related
  4. quality_control_types - list of quality control types to return when specifying QualityControl in include_related
    • For this parameter to have an effect, QualityControl must be specified in include_related

As an example, this query pulls the imputed VCF file for all samples matching the client_id HG00102-1-0-0-2.

from gencove_explorer.query import Query
from gencove_explorer.filters import SampleFilter
from gencove_explorer.models import GencoveFile

samples = Query().samples(
    filters=SampleFilter.client_id == "HG00102-1-0-0-2",
    include_related=[GencoveFile],
    file_types=["impute-vcf"])
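
The example above uses a single filter. When several filters are chained with &, the parentheses matter because Python's & operator binds more tightly than ==. The sketch below illustrates this with hypothetical stand-in classes (Field and Condition are not the gencove_explorer implementation, and the status field is assumed purely for illustration).

```python
class Condition:
    """Hypothetical stand-in for a filter expression."""

    def __init__(self, expr):
        self.expr = expr

    def __and__(self, other):
        # Combine two conditions into one.
        return Condition(f"({self.expr}) AND ({other.expr})")


class Field:
    """Hypothetical stand-in for an attribute such as SampleFilter.client_id."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        # Comparing a field to a value builds a condition instead of a bool.
        return Condition(f"{self.name} == {value!r}")


client_id = Field("client_id")
status = Field("status")  # assumed field, for illustration only

# Parenthesized: each comparison is evaluated first, then combined with &.
combined = (client_id == "HG00102-1-0-0-2") & (status == "succeeded")
print(combined.expr)

# Unparenthesized: Python evaluates "HG00102-1-0-0-2" & status first,
# which is not a valid operation in this sketch.
try:
    client_id == "HG00102-1-0-0-2" & status == "succeeded"
    unparenthesized_failed = False
except TypeError:
    unparenthesized_failed = True
```

In this sketch the unparenthesized form fails with a TypeError; the exact failure mode in the real library may differ, but the precedence issue is the same.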

The full list of query parameters and options can be found in the SDK Reference documentation.

Restoring samples from the Gencove Archive

Samples can be restored from the Gencove Archive by specifying:

  • Project id - restores all samples in the project
  • List of sample ids - restores the specified samples

For example:

from gencove_explorer.models import Sample

Sample.restore_samples(project_id="7481fcb8-7a86-4599-bcdd-59fded7a2c36")
Sample.restore_samples(sample_ids=[
    "5f9e9892-003a-41f7-803e-7e83eb979aef",
    "478f370e-2d99-412a-864b-46b6f6b59ffa",
    "5ea8f124-fea2-47e2-86d5-fa7401f40151"
])