Shortcut Library¶
The Gencove Explorer Library is a Python package that contains a collection of pre-made “shortcuts” that represent commonly used genomic analysis workflows like subsetting and annotating VCF files.
There are two main types of shortcuts:
- Local: execute locally
- Remote: execute on a cluster
Local shortcuts¶
These shortcuts execute locally and usually do not have large resource requirements. They typically provide visualizations and summaries of various statistics.
Local shortcuts provide:
- run() method for running the shortcut
- result() method for accessing shortcut results (if applicable)
- save() and load() methods for saving and reloading shortcut state from local file storage
IGV¶
This shortcut is a version of the Integrative Genomics Viewer (IGV) integrated into the broader Gencove platform, making it easy to visually observe various aspects of a sample like BAM file read coverage relative to the reference genome.
from gencove_explorer_library.shortcuts.igv import IGV

IGV().run(
    sample="c721a787-3550-4f2c-8324-97ba4686ef4c",
    region="chr2:56,804,074-56,811,712",
)
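The region argument above uses a locus string with thousands separators. As a minimal sketch of how such a string breaks down into a contig, a start and a stop, the snippet below uses a hypothetical helper, parse_region, and a hypothetical Region tuple. Neither is part of the Gencove Explorer Library; they only illustrate the notation.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical container for a parsed locus; not a library type."""
    contig: str
    start: int
    stop: int


def parse_region(region: str) -> Region:
    """Split a locus string such as 'chr2:56,804,074-56,811,712'."""
    match = re.fullmatch(r"([^:]+):([\d,]+)-([\d,]+)", region.strip())
    if match is None:
        raise ValueError(f"Unrecognized region string: {region!r}")
    contig, start, stop = match.groups()
    # Drop the thousands separators before converting to integers.
    start_i = int(start.replace(",", ""))
    stop_i = int(stop.replace(",", ""))
    if start_i > stop_i:
        raise ValueError("Region start must not exceed region stop")
    return Region(contig, start_i, stop_i)


region = parse_region("chr2:56,804,074-56,811,712")
print(region.contig, region.start, region.stop)
```

Checking a region string this way before calling run() can catch typos such as a missing colon or a reversed interval early.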
Remote shortcuts¶
These shortcuts execute remotely on the cluster and commonly represent workloads with large resource requirements that cannot reasonably complete in a local environment.
Remote shortcuts provide:
- input_helper() method to generate input for the shortcut in a simple and user-friendly manner; this is a static method, so no object needs to be instantiated to call it
- run() method for scheduling execution of the shortcut onto the cluster
- status() method for checking shortcut execution status
- result() method for accessing shortcut results
- analyses() method for returning the Analysis objects on which downstream shortcuts depend
- save() and load() methods for saving and reloading shortcut state from local file storage
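The run(), status() and result() methods listed above suggest a submit-then-poll pattern. The sketch below shows one generic way to poll. This page does not specify what status() returns, so the is_finished predicate, the FakeJob stand-in and its "running"/"done" values are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(
    get_status: Callable[[], T],
    is_finished: Callable[[T], bool],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
) -> T:
    """Call get_status() repeatedly until is_finished(status) is true."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if is_finished(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still not finished; last status: {status!r}")
        time.sleep(interval_s)


class FakeJob:
    """Stand-in for a scheduled shortcut; reports 'done' on the third check."""

    def __init__(self) -> None:
        self.checks = 0

    def status(self) -> str:
        self.checks += 1
        return "done" if self.checks >= 3 else "running"


job = FakeJob()
final = wait_for(job.status, lambda s: s == "done", interval_s=0.01, timeout_s=5)
print(final, job.checks)
```

With a real shortcut, get_status would presumably be the object's status method, and is_finished a predicate matching whatever value that method reports on completion.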
Subset VCFs¶
This shortcut enables the user to subset a collection of VCF files to a set of genomic regions.
The shortcut's input_helper() method accepts either:
- Gencove project ID
- List of Gencove sample IDs
and returns a dictionary containing a list of Sample objects from the Gencove platform.
Alternatively, the user may provide a list of Sample or VCF objects to the shortcut without using input_helper().
from gencove_explorer_library.shortcuts.subset import SubsetVCFs
from gencove_explorer.helpers import GenomicRegion

input_parameters = SubsetVCFs.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")
subset = SubsetVCFs(
    regions=[GenomicRegion(contig=1, start=860000, stop=880000)],
    **input_parameters,
).run()
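The **input_parameters idiom in the example above works because input_helper() is a static method that returns a dictionary, which Python then expands into keyword arguments for the constructor. The toy illustration below shows only that calling convention. ToyShortcut, the key name samples and the string values are stand-ins, not the library's actual classes or keys.

```python
from typing import Any, Dict, List


class ToyShortcut:
    """Toy stand-in that mimics only the shape of the calling convention."""

    def __init__(self, regions: List[str], samples: List[str]) -> None:
        self.regions = regions
        self.samples = samples

    @staticmethod
    def input_helper(project_id: str) -> Dict[str, Any]:
        # A real helper would look up the project's samples; here we fake two.
        return {"samples": [f"{project_id}-sample-1", f"{project_id}-sample-2"]}


# Called on the class itself: no instance is needed for a static method.
input_parameters = ToyShortcut.input_helper("demo-project")

# ** expands the dictionary into keyword arguments (samples=[...]).
shortcut = ToyShortcut(regions=["1:860000-880000"], **input_parameters)
print(shortcut.samples)
```

Because the helper's output is an ordinary dictionary, it can be inspected or filtered before it is passed on.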
Annotate VCFs¶
This shortcut enables the user to annotate a collection of VCF files with a specific version of ClinVar.
The shortcut's input_helper() method accepts either:
- Gencove project ID
- List of Gencove sample IDs
and returns a dictionary containing a list of Sample objects from the Gencove platform.
Alternatively, the user may provide a list of Sample or VCF objects to the shortcut without using input_helper().
from gencove_explorer_library.shortcuts.annotate import AnnotateVCFs, AnnotationClinVar

input_parameters = AnnotateVCFs.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")
annotated = AnnotateVCFs(
    annotation=AnnotationClinVar(genome="GRCh37"),
    **input_parameters,
).run()
Exporting sample deliverables to S3¶
This shortcut enables the user to export all or a subset of the sample deliverables created by Gencove's analysis pipeline to AWS S3.
The shortcut's input_helper() method accepts:
- Gencove project ID
- Optional list of file types
- Optional list of sample statuses; if not specified, only succeeded samples are used
and returns a dictionary containing a list of Sample objects from the Gencove platform.
Alternatively, the user may provide a list of Sample objects to the shortcut without using input_helper().
If the user is copying the files to a bucket outside of the Explorer workspace, standard AWS credentials must be provided.
from gencove_explorer_library.shortcuts.export_sample_deliverables import ExportSampleDeliverablesToS3

input_parameters = ExportSampleDeliverablesToS3.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")
export = ExportSampleDeliverablesToS3(
    s3_path="s3://bucket/prefix/",
    aws_session_configuration={
        "aws_access_key_id": "AKIA...",
        "aws_secret_access_key": "123...",
    },
    **input_parameters,
).run()
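To keep credentials out of notebooks and scripts, the aws_session_configuration dictionary could be assembled from environment variables instead of being typed inline. The sketch below uses a hypothetical helper, aws_session_configuration_from_env, which is not part of the library. It fills in only the two keys shown in the example above and reads them from the standard AWS variable names.

```python
import os
from typing import Dict, Mapping


def aws_session_configuration_from_env(
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Build the credentials dictionary from standard AWS variable names."""
    try:
        return {
            "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
            "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        }
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Demonstration with a fake environment; real code would omit the argument.
config = aws_session_configuration_from_env(
    {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_SECRET_ACCESS_KEY": "example-secret"}
)
print(sorted(config))
```

The resulting dictionary has the same shape as the inline one above and could be passed as aws_session_configuration in its place.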
Exporting sample deliverables to Azure¶
This shortcut enables the user to export all or a subset of the sample deliverables created by Gencove's analysis pipeline to Microsoft Azure Storage.
The shortcut's input_helper() method accepts:
- Gencove project ID
- Optional list of file types
- Optional list of sample statuses; if not specified, only succeeded samples are used
and returns a dictionary containing a list of Sample objects from the Gencove platform.
Alternatively, the user may provide a list of Sample objects to the shortcut without using input_helper().
To upload to Azure Storage, the user must provide a connection string.
from gencove_explorer_library.shortcuts.export_sample_deliverables import ExportSampleDeliverablesToAzureStorage

input_parameters = ExportSampleDeliverablesToAzureStorage.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")
export = ExportSampleDeliverablesToAzureStorage(
    azure_container_name="my-container",
    azure_blob_path="foo/bar/baz",
    azure_connection_string="DefaultEndpointsProtocol=https;AccountName=storagesample;AccountKey=<account-key>",
    **input_parameters,
).run()
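A malformed connection string is an easy mistake to make, so it may be worth checking it locally before scheduling the export. The sketch below uses a hypothetical helper, parse_connection_string, which is not part of the library. It assumes the key=value;key=value layout shown in the example above and treats AccountName and AccountKey as required, which may not hold for other connection string variants.

```python
from typing import Dict

REQUIRED_KEYS = ("AccountName", "AccountKey")


def parse_connection_string(connection_string: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' pairs and check the required keys exist."""
    parts: Dict[str, str] = {}
    for segment in connection_string.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: base64 account keys often end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value
    missing = [k for k in REQUIRED_KEYS if not parts.get(k)]
    if missing:
        raise ValueError(f"Connection string lacks: {', '.join(missing)}")
    return parts


parts = parse_connection_string(
    "DefaultEndpointsProtocol=https;AccountName=storagesample;AccountKey=abc123=="
)
print(parts["AccountName"])
```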
Exporting sample deliverables to GCP¶
This shortcut enables the user to export all or a subset of the sample deliverables created by Gencove's analysis pipeline to GCP Cloud Storage.
The shortcut's input_helper() method accepts:
- Gencove project ID
- Optional list of file types
- Optional list of sample statuses; if not specified, only succeeded samples are used
and returns a dictionary containing a list of Sample objects from the Gencove platform.
Alternatively, the user may provide a list of Sample objects to the shortcut without using input_helper().
To upload to GCP Cloud Storage, the user must provide a path to a GCP service account JSON credentials file.
from gencove_explorer_library.shortcuts.export_sample_deliverables import ExportSampleDeliverablesToGCPStorage

input_parameters = ExportSampleDeliverablesToGCPStorage.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")
export = ExportSampleDeliverablesToGCPStorage(
    storage_bucket="my-bucket",
    storage_path="foo/bar/baz",
    gcp_service_account_json_path="credentials.json",
    **input_parameters,
).run()
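Since the shortcut takes a path to a credentials file, a quick local check that the file exists and looks like a service account key can save a failed run. The sketch below uses a hypothetical helper, check_service_account_file, which is not part of the library. It looks only for a few fields commonly found in service account key files, and the demonstration writes placeholder values to a temporary file.

```python
import json
import tempfile
from pathlib import Path

EXPECTED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_file(path: str) -> dict:
    """Load a service account key file and confirm it has the usual fields."""
    data = json.loads(Path(path).read_text())
    missing = [field for field in EXPECTED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"{path} has type {data['type']!r}, not 'service_account'")
    return data


# Demonstration with a throwaway file containing placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "credentials.json"
    demo_path.write_text(json.dumps({
        "type": "service_account",
        "project_id": "demo-project",
        "private_key": "placeholder",
        "client_email": "demo@demo-project.iam.gserviceaccount.com",
    }))
    info = check_service_account_file(str(demo_path))
    print(info["client_email"])
```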
Composing remote shortcuts¶
One important aspect of these shortcuts is that they can be easily composed, assuming the respective inputs and outputs are compatible.
The example below subsets a collection of VCF files to a genomic region and annotates the resulting VCF files with ClinVar annotations.
from gencove_explorer_library.shortcuts.annotate import AnnotateVCFs, AnnotationClinVar
from gencove_explorer_library.shortcuts.subset import SubsetVCFs
from gencove_explorer.helpers import GenomicRegion

input_parameters = SubsetVCFs.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")
subset = SubsetVCFs(
    regions=[GenomicRegion(contig=1, start=860000, stop=880000)],
    **input_parameters,
)
annotated_subset = AnnotateVCFs(
    vcfs=subset,
    annotation=AnnotationClinVar(genome="GRCh37"),
).run()