Shortcut Library
The Gencove Explorer Library is a Python package that contains a collection of pre-made “shortcuts” representing commonly used genomic analysis workflows, such as subsetting and annotating VCF files.
There are two main types of shortcuts:
- Local: execute locally
- Remote: execute on a cluster
Local shortcuts
These shortcuts execute locally and typically have modest resource requirements. They often provide visualizations and summaries of various statistics.
Local shortcuts provide:
- run(): method for running the shortcut
- result(): method for accessing shortcut results (if applicable)
Remote shortcuts
These shortcuts execute remotely on the cluster and typically represent workloads whose resource requirements are too large to complete reasonably in a local environment.
They are automatically serialized and uploaded to EOS after run() is executed, and can later be retrieved using the ShortcutManager.
Remote shortcuts provide:
- input_helper(): method to generate input for the shortcut in a simple, user-friendly manner. This is a static method, so you do not need to instantiate an object to call it.
- run(): method for scheduling execution of the shortcut on the cluster
- status(): method for checking the shortcut's execution status
- result(): method for accessing shortcut results
- analyses(): method for returning the Analysis objects that downstream shortcuts must depend on
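As a rough illustration of the two interfaces listed above, the sketch below expresses them as Python typing.Protocol classes. The names LocalShortcutLike, RemoteShortcutLike and DummyRemote, and all the return types, are hypothetical and not part of the Gencove Explorer Library; the sketch only shows the shape of the methods and how a static input_helper() is called on the class itself, without creating an instance first.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class LocalShortcutLike(Protocol):
    """Hypothetical structural type for a local shortcut (not a library class)."""

    def run(self) -> Any: ...
    def result(self) -> Any: ...


@runtime_checkable
class RemoteShortcutLike(Protocol):
    """Hypothetical structural type for a remote shortcut (not a library class)."""

    @staticmethod
    def input_helper(*args: Any, **kwargs: Any) -> dict: ...
    def run(self) -> Any: ...
    def status(self) -> str: ...
    def result(self) -> Any: ...
    def analyses(self) -> list: ...


class DummyRemote:
    """Toy stand-in used only to exercise the protocol; it submits nothing."""

    @staticmethod
    def input_helper(project_id: str) -> dict:
        # Static method: callable on the class, no instance required.
        return {"project_id": project_id}

    def __init__(self, project_id: str) -> None:
        self.project_id = project_id
        self._state = "NOT_SUBMITTED"

    def run(self) -> "DummyRemote":
        # A real remote shortcut would schedule work on the cluster here.
        self._state = "SUBMITTED"
        return self

    def status(self) -> str:
        return self._state

    def result(self) -> Any:
        return None

    def analyses(self) -> list:
        return []


# Same call pattern as SubsetVCFs.input_helper(...) followed by **input_parameters.
params = DummyRemote.input_helper("project-123")
shortcut = DummyRemote(**params).run()
```

Checking against a Protocol like this only verifies that the methods exist; it says nothing about how the real shortcuts behave.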
Composing remote shortcuts
One important aspect of these shortcuts is that they can be easily composed, assuming the respective inputs and outputs are compatible.
The example below subsets a collection of VCF files to a genomic region and annotates the resulting VCF files with ClinVar annotations.
```python
from gencove_explorer_library.shortcuts.annotate import AnnotateVCFs, AnnotationClinVar
from gencove_explorer_library.shortcuts.subset import SubsetVCFs
from gencove_explorer.helpers import GenomicRegion

# Build the subset shortcut's input from a project ID
input_parameters = SubsetVCFs.input_helper("aa3a46e0-c390-4943-b613-26f9908367d5")

# Subset the VCF files to a genomic region
subset = SubsetVCFs(
    regions=[GenomicRegion(contig=1, start=860000, stop=880000)],
    **input_parameters,
)

# Annotate the subset VCF files with ClinVar and schedule execution
annotated_subset = AnnotateVCFs(
    vcfs=subset,
    annotation=AnnotationClinVar(genome="GRCh37"),
).run()
```
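In the example above, subset is passed directly as the vcfs input of AnnotateVCFs, and run() is called only on the downstream shortcut. One way to picture this kind of composition is as a dependency graph in which every upstream shortcut is scheduled before the shortcut that consumes it. The toy model below (the ToyShortcut class and its upstream field are hypothetical) computes such an order with a depth-first traversal; it is an illustration of the idea, not a description of how the library actually submits jobs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyShortcut:
    """Hypothetical node in a shortcut dependency graph (not the library class)."""

    name: str
    upstream: list[ToyShortcut] = field(default_factory=list)

    def run(self) -> list[str]:
        """Return a submission order with every dependency before its consumer."""
        order: list[ToyShortcut] = []
        self._visit(order)
        return [s.name for s in order]

    def _visit(self, order: list[ToyShortcut]) -> None:
        # Skip nodes already scheduled (shared dependencies run once).
        # Assumes the graph is acyclic.
        if any(s is self for s in order):
            return
        for dep in self.upstream:
            dep._visit(order)
        order.append(self)


subset = ToyShortcut("SubsetVCFs")
annotated = ToyShortcut("AnnotateVCFs", upstream=[subset])
print(annotated.run())  # ['SubsetVCFs', 'AnnotateVCFs']
```

Identity checks (is) rather than name checks keep two distinct shortcuts with the same name from being merged.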
Shortcut Manager
Similarly to how AnalysisManager works, it is also possible to retrieve and "rehydrate" previous Shortcut objects via the ShortcutManager. This lets you revisit previous shortcut executions to retrieve their outputs and logs.
Listing previous shortcuts
The ShortcutManager provides methods to list and search across previously run shortcuts:
- Listing all shortcuts
- Listing shortcuts from relative time points
- Listing shortcuts from an absolute date
- Listing shortcuts matching a name filter
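The four kinds of listing above amount to filtering a collection of shortcut records. The sketch below shows that filtering logic on plain Python objects. It does not reproduce the ShortcutManager methods or their arguments: the ShortcutRecord class, its fields, the helper function names, and the case-insensitive substring matching are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class ShortcutRecord:
    """Hypothetical summary of a previously run shortcut."""

    shortcut_id: str
    name: str
    created: datetime  # timezone-aware UTC timestamp


def created_within(records, window: timedelta, now: Optional[datetime] = None):
    """Relative time point: records created during the last `window`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - window
    return [r for r in records if r.created >= cutoff]


def created_since(records, date: datetime):
    """Absolute date: records created on or after `date`."""
    return [r for r in records if r.created >= date]


def name_matches(records, pattern: str):
    """Name filter: case-insensitive substring match on the shortcut name."""
    needle = pattern.lower()
    return [r for r in records if needle in r.name.lower()]


# Toy records for illustration only.
records = [
    ShortcutRecord("id-1", "SubsetVCFs", datetime(2024, 1, 9, tzinfo=timezone.utc)),
    ShortcutRecord("id-2", "AnnotateVCFs", datetime(2024, 1, 1, tzinfo=timezone.utc)),
]
recent = created_within(records, timedelta(days=2), now=datetime(2024, 1, 10, tzinfo=timezone.utc))
```

Passing now explicitly makes the relative filter deterministic, which is convenient when testing.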
Retrieving by ID
When you submit a shortcut to the AWS Batch cluster with .run(), a Shortcut ID is created and the shortcut information is saved in EOS. You can see these IDs using the listing methods described above.
You can then retrieve your results by passing this Shortcut ID to the ShortcutManager object.
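The lifecycle described above can be sketched as a toy in-memory store: an ID is assigned at submission, the shortcut state is persisted, and the state is later rebuilt ("rehydrated") from that ID. This is not the ShortcutManager API. The ToyShortcutStore and ShortcutState names, the JSON serialization, the uuid4-based IDs, and the dict standing in for EOS are all assumptions made for illustration.

```python
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass
class ShortcutState:
    """Hypothetical serializable state of a submitted shortcut."""

    shortcut_id: str
    name: str
    params: dict


class ToyShortcutStore:
    """In-memory stand-in for persistent storage such as EOS (an assumption)."""

    def __init__(self) -> None:
        self._blobs: dict[str, str] = {}

    def save(self, name: str, params: dict) -> str:
        # Assign a fresh ID at submission time and persist the state as JSON.
        shortcut_id = str(uuid.uuid4())
        state = ShortcutState(shortcut_id, name, params)
        self._blobs[shortcut_id] = json.dumps(asdict(state))
        return shortcut_id

    def ids(self) -> list[str]:
        # IDs of everything saved so far, in insertion order.
        return list(self._blobs)

    def get(self, shortcut_id: str) -> ShortcutState:
        # "Rehydrate": rebuild the object from its serialized form.
        try:
            blob = self._blobs[shortcut_id]
        except KeyError:
            raise KeyError(f"unknown shortcut ID: {shortcut_id}") from None
        return ShortcutState(**json.loads(blob))


store = ToyShortcutStore()
sid = store.save("SubsetVCFs", {"regions": ["1:860000-880000"]})
restored = store.get(sid)
```

Serializing to a plain format such as JSON keeps the stored state independent of the Python session that created it, which is what makes later retrieval by ID possible.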