Shortcut Library
The Gencove Explorer Library is a Python package that contains a collection of pre-made “shortcuts” that represent commonly used genomic analysis workflows like subsetting and annotating VCF files.
There are two main types of shortcuts:
- Local: execute locally
- Remote: execute on a cluster
Local shortcuts
These shortcuts execute locally and typically do not have large resource requirements. They usually provide visualizations and summaries of various statistics.
Local shortcuts provide:
- run() method for running the shortcut
- result() method for accessing shortcut results (if applicable)
- save() and load() methods for saving and reloading shortcut state from local file storage
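The interface above can be illustrated with a small stand-in class. This is a sketch only: ToySummaryShortcut is hypothetical and is not part of the Gencove Explorer Library, and storing state as JSON is an assumption made for illustration, since the library's actual state format is not described here.

```python
import json
import statistics
import tempfile
from pathlib import Path


class ToySummaryShortcut:
    """Hypothetical stand-in that mimics the local shortcut interface."""

    def __init__(self, values):
        self.values = list(values)
        self._result = None  # populated by run()

    def run(self):
        # Execute in the local process; return self so calls can be chained.
        self._result = {
            "n": len(self.values),
            "mean": statistics.mean(self.values),
        }
        return self

    def result(self):
        if self._result is None:
            raise RuntimeError("call run() before result()")
        return self._result

    def save(self, path):
        # Persist inputs and results to a local file (JSON by assumption).
        state = {"values": self.values, "result": self._result}
        Path(path).write_text(json.dumps(state))

    @classmethod
    def load(cls, path):
        # Rebuild the shortcut, including any stored result, from the file.
        state = json.loads(Path(path).read_text())
        shortcut = cls(state["values"])
        shortcut._result = state["result"]
        return shortcut


state_file = Path(tempfile.mkdtemp()) / "summary_state.json"
shortcut = ToySummaryShortcut([30, 32, 34]).run()
shortcut.save(state_file)
restored = ToySummaryShortcut.load(state_file)
```

Reloading restores the computed result, so run() does not need to be called again on the restored object.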
Remote shortcuts
These shortcuts execute remotely on the cluster and commonly represent workloads with large resource requirements that cannot reasonably complete in a local environment.
Remote shortcuts provide:
- input_helper() method to generate input for the shortcut in a simple and user-friendly manner; this is a static method, so no object needs to be instantiated to call it
- run() method for scheduling execution of the shortcut onto the cluster
- status() method for checking shortcut execution status
- result() method for accessing shortcut results
- analyses() method for returning the Analysis objects on which downstream shortcuts must depend
- save() and load() methods for saving and reloading shortcut state from local file storage
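The lifecycle implied by these methods can be sketched with a self-contained mock. Everything here is hypothetical: ToyRemoteShortcut and ToyAnalysis are not library classes, the "PENDING" and "SUCCEEDED" state names are assumptions, and the cluster is simulated by changing the state by hand.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyAnalysis:
    """Hypothetical handle for one unit of work scheduled on the cluster."""
    name: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    state: str = "PENDING"  # assumed state name


class ToyRemoteShortcut:
    """Hypothetical stand-in that mimics the remote shortcut interface."""

    def __init__(self, vcfs):
        self.vcfs = list(vcfs)
        self._analyses = []

    @staticmethod
    def input_helper(project_id):
        # Static: builds constructor keyword arguments without an instance.
        # A real helper would look up existing files; this makes up paths.
        return {"vcfs": [f"{project_id}/sample_{i}.vcf.gz" for i in range(2)]}

    def run(self):
        # Schedule one analysis per input and return without waiting.
        self._analyses = [ToyAnalysis(name=v) for v in self.vcfs]
        return self

    def status(self):
        return {a.name: a.state for a in self._analyses}

    def analyses(self):
        return list(self._analyses)

    def result(self):
        pending = [a.name for a in self._analyses if a.state != "SUCCEEDED"]
        if not self._analyses or pending:
            raise RuntimeError(f"results not ready; unfinished: {pending}")
        return [f"{a.name}.out" for a in self._analyses]


params = ToyRemoteShortcut.input_helper("my-project")
shortcut = ToyRemoteShortcut(**params).run()
status_after_run = shortcut.status()

# Simulate the cluster finishing the work.
for analysis in shortcut.analyses():
    analysis.state = "SUCCEEDED"
outputs = shortcut.result()
```

The point of the sketch is the ordering: run() returns immediately, status() is polled, and result() only becomes usable once the scheduled work has finished.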
Composing remote shortcuts
One important aspect of these shortcuts is that they can be easily composed, assuming the respective inputs and outputs are compatible.
The example below subsets a collection of VCF files to a genomic region and annotates the resulting VCF files with ClinVar annotations.
from gencove_explorer_library.shortcuts.annotate import AnnotateVCFs, AnnotationClinVar
from gencove_explorer_library.shortcuts.subset import SubsetVCFs
from gencove_explorer.helpers import GenomicRegion

# Build the SubsetVCFs inputs with the static helper
input_parameters = SubsetVCFs.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")

# Subset the VCF files to a region on contig 1
subset = SubsetVCFs(
    regions=[GenomicRegion(contig=1, start=860000, stop=880000)],
    **input_parameters,
)

# Annotate the subset output with ClinVar (GRCh37) and schedule the run
annotated_subset = AnnotateVCFs(
    vcfs=subset,
    annotation=AnnotationClinVar(genome="GRCh37"),
).run()
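One way to picture how a downstream shortcut can depend on an upstream one is the mock below. It is a sketch under stated assumptions, not the library's implementation: ToyStep is hypothetical, jobs are plain dictionaries, and the choice to schedule the upstream step automatically when the downstream run() is called is an assumption.

```python
class ToyStep:
    """Hypothetical shortcut-like step; not part of the Gencove library."""

    def __init__(self, name, inputs):
        self.name = name
        self.inputs = inputs  # a list of file paths, or an upstream ToyStep
        self._analyses = []

    def analyses(self):
        return list(self._analyses)

    def run(self):
        if isinstance(self.inputs, ToyStep):
            upstream = self.inputs
            if not upstream.analyses():
                upstream.run()  # assumption: upstream is scheduled on demand
            # One downstream job per upstream job, each waiting on its parent.
            self._analyses = [
                {"id": f"{self.name}-{i}", "depends_on": [parent["id"]]}
                for i, parent in enumerate(upstream.analyses())
            ]
        else:
            # Plain file inputs: jobs have nothing to wait for.
            self._analyses = [
                {"id": f"{self.name}-{i}", "depends_on": []}
                for i, _ in enumerate(self.inputs)
            ]
        return self


subset = ToyStep("subset", ["a.vcf.gz", "b.vcf.gz"])
annotated = ToyStep("annotate", subset).run()
```

In this mock, each annotate job records the ID of the subset job it waits on, which is the role the Analysis objects returned by analyses() play for downstream shortcuts.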