
Shortcut Library

The Gencove Explorer Library is a Python package that contains a collection of pre-made “shortcuts” that represent commonly used genomic analysis workflows like subsetting and annotating VCF files.

There are two main types of shortcuts:

  • Local: execute locally
  • Remote: execute on a cluster

Local shortcuts

These shortcuts execute locally and typically have modest resource requirements. They commonly provide visualizations and summaries of various statistics.

Local shortcuts provide:

  • run() method for running the shortcut
  • result() method for accessing shortcut results (if applicable)
  • save() and load() methods for saving and reloading shortcut state from local file storage

IGV

This shortcut is a version of the Integrative Genomics Viewer (IGV) integrated into the broader Gencove platform, making it easy to visually inspect aspects of a sample, such as BAM file read coverage relative to the reference genome.

from gencove_explorer_library.shortcuts.igv import IGV
IGV().run(
    sample="c721a787-3550-4f2c-8324-97ba4686ef4c",
    region="chr2:56,804,074-56,811,712",
)
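
The region in the example above is written as contig:start-end with thousands separators. A minimal sketch of building such a string from numeric coordinates follows; format_igv_region is a hypothetical helper and not part of the library, and whether run() also accepts coordinates without separators is not stated here.

```python
def format_igv_region(contig: str, start: int, end: int) -> str:
    """Return a locus string such as 'chr2:56,804,074-56,811,712'.

    Hypothetical helper for illustration; not part of gencove_explorer_library.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # The ',' format specifier inserts thousands separators.
    return f"{contig}:{start:,}-{end:,}"


region = format_igv_region("chr2", 56804074, 56811712)
```

The resulting string can then be passed as the region argument shown above.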

Remote shortcuts

These shortcuts execute remotely on the cluster and commonly represent workloads with large resource requirements that cannot reasonably complete in a local environment.

Remote shortcuts provide:

  • input_helper() method to generate input for the shortcut in a simple and user-friendly manner
    • this is a static method, so it can be called without instantiating the shortcut
  • run() method for scheduling execution of the shortcut onto the cluster
  • status() method for checking shortcut execution status
  • result() method for accessing shortcut results
  • analyses() method for returning the Analysis objects on which downstream shortcuts depend
  • save() and load() methods for saving and reloading shortcut state from local file storage
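
Because run() only schedules work on the cluster, a caller usually polls status() until the run has finished before reading result(). The sketch below is a generic polling loop under stated assumptions: wait_for is a hypothetical helper, and the values returned by status() are not described in this section, so the caller supplies the test for a finished state.

```python
import time
from typing import Callable


def wait_for(
    get_status: Callable[[], object],
    is_terminal: Callable[[object], bool],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> object:
    """Call get_status() repeatedly until is_terminal(status) is true.

    Hypothetical helper for illustration; not part of gencove_explorer_library.
    Returns the first terminal status, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if is_terminal(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("shortcut did not reach a terminal status in time")
        time.sleep(poll_seconds)
```

For a shortcut object, get_status could be its status method. Which values mean that a run has finished is an assumption left to the caller through is_terminal.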

Subset VCFs

This shortcut enables the user to subset a collection of VCF files to a set of genomic regions.

The shortcut's input_helper() method accepts either:

  • Gencove project id
  • List of Gencove sample ids

and returns a dictionary containing a list of Sample objects from the Gencove platform.

Alternatively, the user may provide a list of Sample or VCF objects to the shortcut without using input_helper().

from gencove_explorer_library.shortcuts.subset import SubsetVCFs
from gencove_explorer.helpers import GenomicRegion

input_parameters = SubsetVCFs.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")

subset = SubsetVCFs(
    regions=[GenomicRegion(contig=1, start=860000, stop=880000)],
    **input_parameters,
).run()

Annotate VCFs

This shortcut enables the user to annotate a collection of VCF files with a specific version of ClinVar.

The shortcut's input_helper() method accepts either:

  • Gencove project id
  • List of Gencove sample ids

and returns a dictionary containing a list of Sample objects from the Gencove platform.

Alternatively, the user may provide a list of Sample or VCF objects to the shortcut without using input_helper().

from gencove_explorer_library.shortcuts.annotate import AnnotateVCFs, AnnotationClinVar

input_parameters = AnnotateVCFs.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")

annotated = AnnotateVCFs(
    annotation=AnnotationClinVar(genome="GRCh37"),
    **input_parameters,
).run()

Shard VCFs

This shortcut enables the user to shard a collection of VCF files to a set of genomic regions.

The shortcut's input_helper() method accepts either:

  • Gencove project id
  • List of Gencove sample ids

and returns a dictionary containing a list of Sample objects from the Gencove platform.

Alternatively, the user may provide a list of Sample or VCF objects to the shortcut without using input_helper().

from gencove_explorer_library.shortcuts.shard_vcfs import ShardVCFs
from gencove_explorer.helpers import GenomicRegion

input_parameters = ShardVCFs.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")

sharded = ShardVCFs(
    regions=[GenomicRegion(contig=1), GenomicRegion(contig=2)],
    **input_parameters,
).run()

Merge VCFs

This shortcut enables the user to merge a collection of VCF files from non-overlapping sample sets to create one multi-sample file.

The shortcut's input_helper() method accepts either:

  • Gencove project id
  • List of Gencove sample ids

and returns a dictionary containing a list of lists of Sample objects from the Gencove platform.

Alternatively, the user may provide a list of lists of Sample or VCF objects to the shortcut without using input_helper().

VCFs from each sublist of the list will be merged into a single VCF file.

from gencove_explorer_library.shortcuts.merge_vcfs import MergeVCFs

input_parameters = MergeVCFs.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")

merged = MergeVCFs(
    **input_parameters,
).run()
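
Since the VCFs in each sublist must come from non-overlapping sample sets, it can help to check the sample names before scheduling the merge. The sketch below works on plain lists of sample names, one list per VCF; overlapping_samples is a hypothetical helper and not part of the library.

```python
from collections import Counter
from typing import Iterable, Sequence


def overlapping_samples(sample_sets: Sequence[Iterable[str]]) -> list[str]:
    """Return the sample names that occur in more than one of the given sets.

    Hypothetical helper for illustration; each element of sample_sets holds
    the sample names of one VCF that is about to be merged.
    """
    counts = Counter()
    for names in sample_sets:
        # set() ensures a name is counted once per VCF.
        counts.update(set(names))
    return sorted(name for name, n in counts.items() if n > 1)


conflicts = overlapping_samples([["s1", "s2"], ["s3"], ["s2", "s4"]])
```

An empty result means the sample sets are disjoint.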

Concatenate VCFs

This shortcut enables the user to concatenate a collection of VCF files from the same set of samples. All source files must have the same sample columns appearing in the same order.

The user may provide a list of lists of VCF objects to the shortcut. VCFs from each sublist of the list will be concatenated into a single VCF file.

from gencove_explorer_library.shortcuts.concatenate_vcfs import ConcatenateVCFs

concatenated = ConcatenateVCFs(
    vcfs=[
        [shard1_vcf, shard2_vcf, shard3_vcf],
        [shard4_vcf, shard5_vcf, shard6_vcf]
    ],
).run()
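
The requirement that all source files share the same sample columns in the same order can be checked from the #CHROM header line of each VCF. In the VCF format, that line is tab-delimited, with nine fixed columns (#CHROM through FORMAT) followed by the sample names. The helpers below are a hypothetical sketch operating on header lines that have already been read as text; they are not part of the library.

```python
def sample_columns(chrom_header_line: str) -> list[str]:
    """Return the sample names from a VCF '#CHROM ...' header line."""
    fields = chrom_header_line.rstrip("\r\n").split("\t")
    if fields[0] != "#CHROM":
        raise ValueError("not a #CHROM header line")
    # Fields 0-8 are the fixed columns (#CHROM ... INFO, FORMAT); samples follow.
    return fields[9:]


def same_sample_columns(header_lines: list[str]) -> bool:
    """Return True if every header line lists the same samples in the same order."""
    columns = [sample_columns(line) for line in header_lines]
    return all(c == columns[0] for c in columns[1:])
```

Comparing lists rather than sets is deliberate: two files with the same samples in a different order do not pass the check.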

Shard, merge, and concatenate VCFs

This shortcut enables the user to shard a collection of VCF files to a set of genomic regions, merge the shards, and concatenate the merged shards into the final output VCFs. It is essentially a distributed version of MergeVCFs that works for large numbers of samples and large VCF files by composing the ShardVCFs, MergeVCFs, and ConcatenateVCFs shortcuts.

The shortcut's input_helper() method accepts either:

  • Gencove project id
  • List of Gencove sample ids

and returns a dictionary containing a list of Sample objects from the Gencove platform.

Alternatively, the user may provide a list of Sample or VCF objects to the shortcut without using input_helper().

The regions input is a list of lists, where the elements in each sublist define the shards and each sublist defines which shards get concatenated into final output VCFs.

from gencove_explorer_library.shortcuts.shard_merge_concatenate_vcfs import ShardMergeConcatenateVCFs
from gencove_explorer.helpers import GenomicRegion

step = int(5e6)
regions = [
    # These shards are concatenated into the first output VCF
    [GenomicRegion(contig="chr22", start=s, stop=s+step) for s in range(int(10e6), int(30e6), step)],
    # These shards are concatenated into the second output VCF
    [GenomicRegion(contig="chr22", start=s, stop=s+step) for s in range(int(30e6), int(55e6), step)],
]

input_parameters = ShardMergeConcatenateVCFs.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")

sharded_merged_concatenated = ShardMergeConcatenateVCFs(
    regions=regions,
    **input_parameters,
).run()
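
The list-of-lists layout of regions can be planned with plain tuples before any GenomicRegion objects are built. The sketch below tiles an interval into fixed-size windows and clamps the last window to the requested end; tile_windows is a hypothetical helper, and whether GenomicRegion treats start and stop as inclusive or half-open is not stated here.

```python
def tile_windows(contig: str, start: int, stop: int, step: int) -> list[tuple[str, int, int]]:
    """Split [start, stop) on one contig into windows of at most `step` bases.

    Hypothetical helper for illustration; the last window is clamped to `stop`.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return [(contig, s, min(s + step, stop)) for s in range(start, stop, step)]


# Outer list: one entry per final output VCF.
# Inner list: the shards that are concatenated into that output.
regions_plan = [
    tile_windows("chr22", 10_000_000, 30_000_000, 5_000_000),
    tile_windows("chr22", 30_000_000, 55_000_000, 5_000_000),
]
```

Each tuple (c, s, e) can then be turned into GenomicRegion(contig=c, start=s, stop=e), keeping the same nesting.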

Exporting sample deliverables to S3

This shortcut enables the user to export all or a subset of the sample deliverables created by Gencove's analysis pipeline to AWS S3.

The shortcut's input_helper() method accepts:

  • Gencove project id
  • Optional list of file types
  • Optional list of sample statuses; if omitted, only succeeded samples are used

and returns a dictionary containing a list of Sample objects from the Gencove platform.

Alternatively, the user may provide a list of Sample objects to the shortcut without using input_helper().

If the user is copying the files to a bucket outside the Explorer workspace, standard AWS credentials must be provided.

from gencove_explorer_library.shortcuts.export_sample_deliverables import ExportSampleDeliverablesToS3

input_parameters = ExportSampleDeliverablesToS3.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")

export = ExportSampleDeliverablesToS3(
    s3_path="s3://bucket/prefix/",
    aws_session_configuration={
        "aws_access_key_id": "AKIA...",
        "aws_secret_access_key": "123...",
    },
    **input_parameters,
).run()
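
To keep credentials out of notebooks and scripts, the aws_session_configuration dictionary can be built from environment variables. The sketch below assumes the standard AWS variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; aws_session_configuration_from_env is a hypothetical helper and not part of the library.

```python
import os
from typing import Mapping, Optional


def aws_session_configuration_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the credentials dictionary from environment variables.

    Hypothetical helper for illustration. The dictionary keys match the
    example above; the environment variable names are the standard AWS ones.
    """
    env = os.environ if env is None else env
    try:
        return {
            "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
            "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        }
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable: {missing.args[0]}") from None
```

The returned dictionary can be passed as aws_session_configuration in place of the literal values.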

Exporting sample deliverables to Azure

This shortcut enables the user to export all or a subset of the sample deliverables created by Gencove's analysis pipeline to Microsoft Azure Storage.

The shortcut's input_helper() method accepts:

  • Gencove project id
  • Optional list of file types
  • Optional list of sample statuses; if omitted, only succeeded samples are used

and returns a dictionary containing a list of Sample objects from the Gencove platform.

Alternatively, the user may provide a list of Sample objects to the shortcut without using input_helper().

To upload to Azure Storage, the user must provide a connection string.

from gencove_explorer_library.shortcuts.export_sample_deliverables import ExportSampleDeliverablesToAzureStorage

input_parameters = ExportSampleDeliverablesToAzureStorage.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")

export = ExportSampleDeliverablesToAzureStorage(
    azure_container_name="my-container",
    azure_blob_path="foo/bar/baz",
    azure_connection_string="DefaultEndpointsProtocol=https;AccountName=storagesample;AccountKey=<account-key>",
    **input_parameters,
).run()
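
A connection string is a list of key=value pairs separated by semicolons, so a malformed one can be caught before the export is scheduled. The sketch below assumes an account-key style string like the one in the example above; parse_connection_string and missing_keys are hypothetical helpers and not part of the library.

```python
def parse_connection_string(connection_string: str) -> dict[str, str]:
    """Split 'Key1=Value1;Key2=Value2' into a dictionary."""
    parts = {}
    for segment in connection_string.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: account keys are base64 and may end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"segment without '=': {segment!r}")
        parts[key.strip()] = value
    return parts


def missing_keys(connection_string: str, required=("AccountName", "AccountKey")) -> list[str]:
    """Return the required keys that are absent or empty."""
    parts = parse_connection_string(connection_string)
    return [key for key in required if not parts.get(key)]
```

Connection strings that authenticate in another way, for example with a shared access signature, would need a different set of required keys.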

Exporting sample deliverables to GCP

This shortcut enables the user to export all or a subset of the sample deliverables created by Gencove's analysis pipeline to GCP Cloud Storage.

The shortcut's input_helper() method accepts:

  • Gencove project id
  • Optional list of file types
  • Optional list of sample statuses; if omitted, only succeeded samples are used

and returns a dictionary containing a list of Sample objects from the Gencove platform.

Alternatively, the user may provide a list of Sample objects to the shortcut without using input_helper().

To upload to GCP Storage, the user must provide the path to a GCP service account JSON credentials file.

from gencove_explorer_library.shortcuts.export_sample_deliverables import ExportSampleDeliverablesToGCPStorage

input_parameters = ExportSampleDeliverablesToGCPStorage.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")

export = ExportSampleDeliverablesToGCPStorage(
    storage_bucket="my-bucket",
    storage_path="foo/bar/baz",
    gcp_service_account_json_path="credentials.json",
    **input_parameters,
).run()
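
Before scheduling the export, the credentials file can be given a quick sanity check. The sketch below assumes the usual layout of a service account key file, a JSON object whose type field is "service_account" and which carries project_id, private_key, and client_email; the helper names are hypothetical and not part of the library.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_info(info: dict) -> list[str]:
    """Return a list of problems found in a parsed service account key."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in info]
    if "type" in info and info["type"] != "service_account":
        problems.append('field "type" is not "service_account"')
    return problems


def check_service_account_file(path: str) -> list[str]:
    """Load the JSON file at `path` and check its contents."""
    info = json.loads(Path(path).read_text(encoding="utf-8"))
    return check_service_account_info(info)
```

An empty list means that no problems were found by this check; it does not confirm that the key is valid or has access to the bucket.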

Composing remote shortcuts

One important aspect of these shortcuts is that they can be easily composed, assuming the respective inputs and outputs are compatible.

The example below subsets a collection of VCF files to a genomic region and annotates the resulting VCF files with ClinVar annotations.

from gencove_explorer_library.shortcuts.annotate import AnnotateVCFs, AnnotationClinVar
from gencove_explorer_library.shortcuts.subset import SubsetVCFs
from gencove_explorer.helpers import GenomicRegion

input_parameters = SubsetVCFs.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")

subset = SubsetVCFs(
    regions=[GenomicRegion(contig=1, start=860000, stop=880000)],
    **input_parameters,
)

annotated_subset = AnnotateVCFs(
    vcfs=subset,
    annotation=AnnotationClinVar(genome="GRCh37"),
).run()