SDK Reference
This section covers class and function implementations within the Gencove Explorer SDK.
gencove_explorer.analysis.Analysis
dataclass
Primary object for defining, running, and monitoring an analysis with the Explorer SDK.
For additional details, please see the Analysis docs.
analysis_prefix
property
Unique S3 prefix for user analysis job
log_group: str
property
Internal property to get log group
Internal property to get log group
delete(force=False)
Cancels all jobs associated with the analysis and deletes all of their respective outputs. Also deletes the analysis from EOS storage, so it cannot be retrieved afterward via AnalysisManager.
Note that only outputs defined within the analysis_context.output container will be deleted; any other outputs must be deleted manually.
By default, prompts for confirmation before proceeding.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
force | bool | If set to True, will not prompt for confirmation before deleting | False |
get_name()
Returns a string name for the Analysis object.
Returns:
Type | Description |
---|---|
str | Name of the Analysis object |
get_output(job_index=0)
Gets the output for a specific Analysis job. Defaults to job_index=0.
logs(job_index=0, since=None, live=False, export=None)
Prints the latest logs of the job to standard output.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
job_index | int | If this is an array job, pass the job index here. Defaults to 0. | 0 |
since | str | Time from which to begin displaying logs. By default, logs are displayed starting from ten minutes in the past. The value can be an ISO 8601 timestamp or a relative time. For relative times, provide a number and a single unit. Supported units: s (seconds), m (minutes), h (hours), d (days), w (weeks). For example, '5m' displays logs starting five minutes in the past. Multiple units (e.g. '5h30m') are not supported. | None |
live | bool | Set to True to see a live tail of a running job. Defaults to False. | False |
export | Optional[str \| File] | Write the logs to a file instead of printing to stdout. | None |
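The rules for the since value can be illustrated with a small standard-library sketch. This is not the SDK's implementation: parse_since is a hypothetical helper, and it only shows how a single-unit relative time, an ISO 8601 timestamp, and the ten-minute default described above could be resolved to an absolute time.

```python
import re
from datetime import datetime, timedelta, timezone

# Unit letters documented for `since`, mapped to timedelta keyword names.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
RELATIVE = re.compile(r"(\d+)([smhdw])")


def parse_since(value, now=None):
    """Hypothetical helper: resolve a `since` value to an absolute datetime."""
    now = now or datetime.now(timezone.utc)
    if value is None:
        # Documented default: start ten minutes in the past.
        return now - timedelta(minutes=10)
    match = RELATIVE.fullmatch(value)
    if match:
        amount, unit = match.groups()
        return now - timedelta(**{UNITS[unit]: int(amount)})
    try:
        # Anything that is not a single-unit relative time is treated
        # as an ISO 8601 timestamp.
        return datetime.fromisoformat(value)
    except ValueError:
        raise ValueError(f"unsupported `since` value: {value!r}") from None
```

With this sketch, '5m' resolves to five minutes before now, while a multi-unit value such as '5h30m' raises ValueError because it is neither a single-unit relative time nor a timestamp.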
run(sdk_commit=None, sdk_branch=None, library_commit=None, library_branch=None, dry_run=False, debug_serialized_objects=False)
Submits a job to the batch cluster.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sdk_branch | Optional[str] | Optionally supply a git branch of the Explorer SDK to install in the batch job | None |
sdk_commit | Optional[str] | Optionally supply a git commit (first 8 chars) of the Explorer SDK to install in the batch job | None |
library_branch | Optional[str] | Optionally supply a git branch of the Explorer Library to install in the batch job | None |
library_commit | Optional[str] | Optionally supply a git commit (first 8 chars) of the Explorer Library to install in the batch job | None |
dry_run | bool | If set, do not execute any AWS calls | False |
debug_serialized_objects | bool | If set, write serialized objects to the working directory | False |
run_local(job_index=0, env_name=None, debug_serialized_objects=False, dry_run=False, sdk_branch=None, sdk_commit=None, library_branch=None, library_commit=None, container=False)
Runs the analysis on the local machine.
Note
- Can only run on a single input at a time
- By default, only runs against first input item
- Does not support jobs with dependencies
- Does not support logs or job status
- Must have >2 vCPUs to run
Parameters:
Name | Type | Description | Default |
---|---|---|---|
job_index | int | The index from supplied inputs to use for processing. Defaults to 0. | 0 |
env_name | Optional[str] | If supplied, will create a virtual env and use Analysis.pip_packages to install dependencies | None |
debug_serialized_objects | bool | Set to store serialized objects locally | False |
dry_run | bool | Set to prepare job without executing it | False |
sdk_branch | Optional[str] | Optionally supply a git branch of the Explorer SDK to install | None |
sdk_commit | Optional[str] | Optionally supply a git commit (first 8 chars) of the Explorer SDK to install | None |
library_branch | Optional[str] | Optionally supply a git branch of the Explorer Library to install | None |
library_commit | Optional[str] | Optionally supply a git commit (first 8 chars) of the Explorer Library to install | None |
container | bool | Runs the Analysis locally in a Docker container if set to True. | False |
status(job_index=None, include_indices=False, _full=False)
Returns the status of the Job.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
job_index | Optional[int] | If this is an array job, pass the job index here. Pass None for array jobs to get the array job status instead of a child job status. Defaults to None. | None |
include_indices | bool | If True, returns the index of each job across each batch status type (only applicable for array jobs) | False |
_full | Optional[bool] | Set to True to get the full response from AWS. Otherwise, returns a simple dict that contains the status of the job and of its children in "status" and "status_summary" respectively. | False |
Returns:
Type | Description |
---|---|
dict | Status of the Job |
store_analysis_history(run_type)
Internal method used to write analysis history
terminate(job_index=None)
Terminates a specific analysis job.
The job_index parameter is required if the Analysis is an array job (e.g. len(input) > 1).
Note
- Any dependent jobs will automatically fail
- Termination requests take some time to propagate to the job cluster
- Termination requests are idempotent
Parameters:
Name | Type | Description | Default |
---|---|---|---|
job_index | Optional[int] | Index of the analysis job to terminate | None |
terminate_all()
Terminate all jobs for this analysis
Note
- All array jobs will be terminated
- Termination requests take some time to propagate to the job cluster
- Any dependent jobs will automatically fail
wait(job_statuses, job_index=0, spinner_text='Waiting', spinner_complete_text='Complete')
Main method for waiting for the current Analysis to reach one of the AWS Batch statuses listed in job_statuses.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
job_statuses | List[str] | List of AWS Batch job statuses to wait for | required |
job_index | int | Analysis job index to wait for | 0 |
spinner_text | str | Spinner text to display while waiting | 'Waiting' |
spinner_complete_text | str | Spinner text to display when the target status has been detected | 'Complete' |
wait_done(job_index=0)
Waits for the Analysis job with index job_index to reach a terminal state (succeeded or failed).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
job_index | int | Analysis job index, defaults to 0 | 0 |
wait_running_or_done(job_index=0)
Waits for the Analysis job with index job_index to reach a running or terminal state (succeeded or failed).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
job_index | int | Analysis job index, defaults to 0 | 0 |
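The relationship between wait, wait_done, and wait_running_or_done can be sketched as a generic polling loop. The code below is an illustration only, not the SDK's code: wait_for and the get_status callable are hypothetical stand-ins, and it assumes statuses are reported using the uppercase AWS Batch state names (RUNNING, SUCCEEDED, FAILED).

```python
import time

# AWS Batch terminal job states.
TERMINAL = {"SUCCEEDED", "FAILED"}


def wait_for(get_status, job_statuses, poll_seconds=5.0, sleep=time.sleep):
    """Hypothetical helper: poll get_status() until it returns a target status."""
    targets = set(job_statuses)
    while True:
        status = get_status()
        if status in targets:
            return status
        # Not there yet: pause before asking again.
        sleep(poll_seconds)


def wait_done(get_status, **kwargs):
    # Terminal states only (succeeded or failed).
    return wait_for(get_status, TERMINAL, **kwargs)


def wait_running_or_done(get_status, **kwargs):
    # Running counts as a target in addition to the terminal states.
    return wait_for(get_status, TERMINAL | {"RUNNING"}, **kwargs)
```

Passing the sleep function as a parameter keeps the loop easy to exercise without real delays.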
gencove_explorer.analysis.InputShared
gencove_explorer.analysis.JobDefinition
dataclass
Class for defining job resources and configuration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cpu | float | Number of vCPUs to allocate to the job | required |
memory_mb | int | Amount of memory in megabytes to allocate to the job | required |
storage_gb | int | Storage in gigabytes to allocate to the job | 20 |
timeout_seconds | int | Amount of time in seconds before the job times out | 3600 |
attempts | int | Number of times to retry the submitted job if a failure is encountered | 1 |
privileged | bool | Set to True to run in privileged mode | False |
is_custom_image: bool
property
Returns True if the Docker image differs from the default Explorer image, False otherwise.
get_image_name()
Returns the image name.
Returns:
Name | Type | Description |
---|---|---|
str | str | Image name. |
validate(organization_id=None)
Validates that the JobDefinition is correct.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
organization_id | str | Optional organization ID. If passed, checks that the compute environment has enough CPUs. | None |
Raises:
Type | Description |
---|---|
ValueError | Error message if config is not valid. |
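To make the validate-then-raise pattern concrete, here is a self-contained stand-in. JobSpec is a hypothetical class, not the SDK's JobDefinition; it mirrors the documented fields and defaults, but the validation rules (positive values, an optional max_cpu ceiling) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobSpec:
    """Hypothetical stand-in mirroring the documented JobDefinition fields."""

    cpu: float
    memory_mb: int
    storage_gb: int = 20
    timeout_seconds: int = 3600
    attempts: int = 1
    privileged: bool = False

    def validate(self, max_cpu=None):
        # Assumed rule: every resource value must be positive.
        for name in ("cpu", "memory_mb", "storage_gb", "timeout_seconds", "attempts"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        # Assumed rule: cpu must fit the compute environment when its size is known.
        if max_cpu is not None and self.cpu > max_cpu:
            raise ValueError(f"cpu={self.cpu} exceeds available {max_cpu}")
```

Calling validate() before submitting a job surfaces configuration mistakes as a ValueError early, rather than as a failed job later.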
gencove_explorer.analysis.AnalysisContext
dataclass
Object for accessing data related to an Analysis job.
Examples:
child_prefix
property
Returns the prefix under which analysis context results are written. Uses the batch index to determine the prefix within the job's prefix,
e.g. s3://gencove-explorer-1111/users/2222/jobs/3333/outputs/0
outputs_prefix
property
Returns the prefix under which outputs of the group results are written,
e.g. s3://gencove-explorer-1111/users/2222/jobs/3333/outputs
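Judging from the two example paths, the child prefix appears to be the outputs prefix with the batch index appended. A minimal sketch of that relationship, using a hypothetical child_prefix function rather than the SDK property:

```python
def child_prefix(outputs_prefix: str, batch_index: int) -> str:
    """Hypothetical helper: append the batch index to the outputs prefix."""
    return f"{outputs_prefix.rstrip('/')}/{batch_index}"


outputs = "s3://gencove-explorer-1111/users/2222/jobs/3333/outputs"
print(child_prefix(outputs, 0))
# prints s3://gencove-explorer-1111/users/2222/jobs/3333/outputs/0
```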
dependency(dep_name)
Returns the named dependency of the Job.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dep_name | str | Name of the function or Analysis Job. | required |
Returns:
Type | Description |
---|---|
Optional[DependencyContainer] | A single dependency object. If not found, returns None. |
gencove_explorer.analysis_manager.AnalysisManager
dataclass
Class for browsing and retrieving previously run Analyses.
Examples:
```python
from gencove_explorer.analysis_manager import AnalysisManager

# Instantiate manager
analysis_manager = AnalysisManager()

# List Analyses
analysis_manager.list_analyses()

# Get Analysis
a = analysis_manager.get_analysis("analysis_id")
```
get_analysis(analysis_id)
Restores a previously run Analysis object by ID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
analysis_id | str | Analysis ID, which can be retrieved with the list_analyses() method | required |
list_analyses(since=None, date=None, name=None)
Lists user analysis IDs. Can optionally be filtered by relative time, date, or analysis name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
since | Optional[str] | Time from which to begin displaying analyses. By default, all analyses are displayed. The value can be a relative time. To filter, provide a number and a single unit. Supported units: m (minutes), h (hours), d (days), w (weeks). | None |
date | Optional[str] | Filter analyses to this date, e.g. '2023-04-28' or '2023-05' | None |
name | Optional[str] | Filter analyses by analysis name value (case-insensitive) | None |
Returns: List of analysis IDs
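The date and name filters can be pictured with a small stand-alone sketch. The function below is hypothetical and operates on plain tuples rather than SDK objects. It assumes that a date filter matches as a prefix of an ISO date (so '2023-05' selects the whole month) and that names are compared exactly apart from case; the SDK's actual matching rules may differ.

```python
def filter_analyses(records, date=None, name=None):
    """Hypothetical filter over (analysis_id, created_date, name) tuples."""
    selected = []
    for analysis_id, created, analysis_name in records:
        # Assumption: the date filter is a prefix match on the ISO date.
        if date is not None and not created.startswith(date):
            continue
        # Assumption: names must match exactly, ignoring case.
        if name is not None and analysis_name.lower() != name.lower():
            continue
        selected.append(analysis_id)
    return selected


records = [
    ("a1", "2023-04-28", "QC"),
    ("a2", "2023-05-02", "qc"),
    ("a3", "2023-05-17", "Imputation"),
]
```

For the sample records, a date of '2023-05' keeps a2 and a3, and adding name='QC' narrows the result to a2.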
gencove_explorer.models.Sample
dataclass
Object for interacting with samples available on the Gencove platform.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
id | str | Sample ID on the Gencove platform | required |
client_id | Optional[str] | Non-unique name set by the user | None |
project | Optional[Project] | Project that the sample is part of | None |
files | Optional[dict] | Map of sample deliverables | None |
last_status | Optional[SampleStatus] | Current sample status | None |
archive_last_status | Optional[ArchiveSampleStatus] | Current archival status | None |
last_metadata | Optional[SampleMetadata] | User-defined metadata | None |
trait_scores | Optional[dict] | Trait values | None |
samples_ancestries | Optional[Ancestry] | Ancestry details | None |
samples_microbiomes | Optional[Microbiome] | Microbiome details | None |
samples_quality_controls | Optional[dict] | QC values | None |
restore_samples(project_id, sample_ids='')
staticmethod
Restores archived samples.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
project_id | str | Project ID | required |
sample_ids | Optional[str] | Comma-separated list of sample IDs. If empty, all archived samples in the project are restored. | '' |
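Because sample_ids is a single comma-separated string rather than a list, a small conversion step is usually needed when IDs are held in a Python list. The helper below is a hypothetical convenience, not part of the SDK:

```python
def to_sample_ids_arg(sample_ids):
    """Hypothetical helper: join IDs into the comma-separated form."""
    return ",".join(str(sample_id).strip() for sample_id in sample_ids)
```

Note that an empty list produces an empty string, which per the description above restores all archived samples in the project, so callers may want to guard against empty input when that is not the intent.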
gencove_explorer.file.File
dataclass
Represents a file with a local and/or remote location.
This object provides methods to specify and transfer (download and upload) files between local and remote storage.
It can point to the supported sources and destinations via the following parameters:
- local: path to local file
- remote: supports URL, S3 path, EOS path
File objects can optionally be given a unique name via the name parameter.
Note that this parameter dictates the upload destination of the file on EOS storage.
Examples:
```python
from pathlib import Path
from gencove_explorer.models import File, NamedFile
# Uploading a named file
Path("/home/explorer/example_file.txt").write_text("Hello world")
f1 = File(
remote=NamedFile(path="example_file_1"),
local="/home/explorer/example_file.txt",
)
f1.upload()
# Downloading the previously uploaded named file
f2 = File(
remote=NamedFile(path="example_file_1")
)
f2.download(local="/home/explorer/downloaded_named_file.txt")
# Download a file from a remote location to local storage
f3 = File(
remote="https://www.google.com/robots.txt",
)
f3.download(local="/home/explorer/example_robots.txt")
```
Please see the Storage and File docs for more information on usage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | | Unique name for file | required |
remote | | Path to remote source (e.g. URL, S3 Path, EOS Path) | required |
local | | Local path for file | required |
upload(force=False)
Uploads the file to user S3 storage.
Notes
- Requires that self.local is set
- Optionally, self.name can be set
By default, raises an exception if the supplied name is already present under the user prefix. This behavior can be overridden with the force parameter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
force | bool | If True, skips the name check and can overwrite an existing file | False |
Returns:
Type | Description |
---|---|
File | self File object |
download(local=None, force=False)
Either retrieves a file from remote storage and downloads it to local storage, or generates an empty path that can be written to.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local | Optional[Path \| str] | Path to download file to | None |
force | bool | Overwrite the file at local if it already exists | False |
Returns:
Type | Description |
---|---|
File | self File object |
remote_exists()
Check if associated remote file exists
execute(*args, capture_output=False)
Downloads and executes the current file as a script, with args forwarded as command-line parameters.
Note: If the script needs an interpreter other than /bin/sh, add a "shebang" line. For example, to run a Python file, add the following shebang as the first line of the file: #!/usr/bin/env python
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*args | | Forwarded to the script as command-line arguments | () |
capture_output | | If True, return output as part of the CompletedProcess object and do not forward it directly to stdout/stderr | False |
Returns:
Type | Description |
---|---|
CompletedProcess | subprocess.CompletedProcess |
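The effect of forwarding args and of capture_output can be illustrated with the standard library alone. This sketch is not the SDK's execute method: run_script is a hypothetical stand-in that writes a script to a temporary file and runs it. For portability it names the Python interpreter explicitly, whereas the description above relies on the script's shebang line.

```python
import os
import subprocess
import sys
import tempfile


def run_script(script_text, *args, capture_output=False):
    """Hypothetical stand-in: write a script to disk and run it with args."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(script_text)
        path = handle.name
    try:
        # The interpreter is given explicitly so the sketch runs anywhere;
        # a shebang-based launch would execute `path` directly instead.
        return subprocess.run(
            [sys.executable, path, *args],
            capture_output=capture_output,
            text=True,
        )
    finally:
        os.remove(path)


script = "#!/usr/bin/env python\nimport sys\nprint(' '.join(sys.argv[1:]))\n"
result = run_script(script, "chr1", "chr2", capture_output=True)
print(result.returncode, result.stdout.strip())
```

With capture_output=True the script's output is available on the returned CompletedProcess (result.stdout); with the default False it goes straight to the caller's stdout and stderr.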
delete_local()
Deletes file from local storage
delete_remote()
Deletes file from remote S3 storage
delete_all()
Deletes a file from both local and remote storage, if applicable
gencove_explorer.file.URLFile
dataclass
url: str
property
exists()
Return True if remote URL is accessible, False otherwise
download(local, force=False, chunk_size=1024 * 1024)
Downloads the file to local storage.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local | Path \| str | Path to download file to | required |
force | bool | Overwrite file if True | False |
chunk_size | int | Size of chunks to write, in bytes | 1024 * 1024 |
Returns:
Type | Description |
---|---|
Optional[URLFile] | Local path to downloaded file |
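The chunk_size parameter controls how much data is held in memory at once while writing. The idea can be sketched with in-memory streams; copy_in_chunks is a hypothetical helper and does not reflect how the SDK performs the actual HTTP transfer.

```python
import io


def copy_in_chunks(source, destination, chunk_size=1024 * 1024):
    """Hypothetical helper: stream source into destination chunk by chunk."""
    written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            # An empty read means the source is exhausted.
            break
        destination.write(chunk)
        written += len(chunk)
    return written


payload = b"x" * (2 * 1024 * 1024 + 512)
target = io.BytesIO()
total = copy_in_chunks(io.BytesIO(payload), target)
```

Here the payload is slightly over 2 MiB, so with the default chunk size it is copied in three reads, and total equals the payload length.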
get_filename()
Attempt to retrieve file name from URL
gencove_explorer.file.S3File
dataclass
url: str
property
Return generated presigned URL for remote object
exists()
Return True if remote file is accessible, False otherwise
download(local, force=False)
get_filename()
Attempts to retrieve file name from object. First attempts to retrieve filename from tag, then falls back to object key
gencove_explorer.file.EFile
dataclass
gencove_explorer.file.NamedFile
dataclass
gencove_explorer.models.GencoveFile
dataclass
id: Optional[str] = None
class-attribute
instance-attribute
created: Optional[str] = None
class-attribute
instance-attribute
size: Optional[float] = None
class-attribute
instance-attribute
file_type: Optional[FileType] = None
class-attribute
instance-attribute
last_status: Optional[FileStatus] = None
class-attribute
instance-attribute
archive_last_status: Optional[ArchiveFileStatus] = None
class-attribute
instance-attribute
presigned_url: Optional[str] = None
class-attribute
instance-attribute
gencove_explorer.file_formats.FileFormat
Base class for various supported file formats (e.g. VCF, BAM)
This serves as a parent class for file format-specific subclasses. It doesn't implement any behavior, but establishes a common interface.
gencove_explorer.file_formats.BAM
dataclass
Represents a BAM (Binary Alignment Map) file.
Attributes:
Name | Type | Description |
---|---|---|
file | Union[File, Sample] | The BAM file or sample data. |
index | Optional[File] | The index file associated with the BAM file. |
metadata | Optional[dict] | Additional metadata. |
create_index()
Create a BAM index file (.bai) for the BAM file associated with this object
download()
Analogous to the respective File method. Also generates or retrieves associated index.
upload(*args, **kwargs)
Analogous to the respective File method. Also uploads associated index if available.
gencove_explorer.file_formats.VCF
dataclass
Represents a VCF (Variant Call Format) file.
Attributes:
Name | Type | Description |
---|---|---|
file | Union[File, Sample] | The VCF file or sample data. |
index | Optional[File] | The index file associated with the VCF file. |
metadata | Optional[dict] | Additional metadata. |
create_index()
Create a VCF index file (.csi) for the VCF file associated with this object
download()
Analogous to the respective File method. Also generates or retrieves associated index.
upload(*args, **kwargs)
Analogous to the respective File method. Also uploads associated index if available.
gencove_explorer.datasets.ReferenceGenome
dataclass
Represents a ReferenceGenome dataset.
Attributes:
Name | Type | Description |
---|---|---|
version | Literal['g1k_v37', 'g1k_v38'] | Version of the dataset |
species | Literal['human'] | Species of the dataset |
download(include_bwa_indices=False)
Analogous to the respective File method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
include_bwa_indices | | Include amb, ann, bwt, pac, and sa indexes | False |
Returns:
Type | Description |
---|---|
| Path to genome.fasta.gz file |
gencove_explorer.query.Query
dataclass
gencove_explorer.helpers
gencove_explorer.helpers.run_shell_command = run_command
module-attribute
Runs the given command in a shell until completion.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
command | | Shell command to be run. | required |
check | | If check is true and the process exits with a non-zero exit code, a CalledProcessError exception is raised. Defaults to True. | required |
capture_output | | If capture_output is true, stdout and stderr are captured. | required |
Returns:
Type | Description |
---|---|
| An object representing a process that has finished. |
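The check and capture_output semantics described here match those of subprocess.run in the standard library. The following sketch shows the equivalent behavior using only subprocess; run_in_shell is a hypothetical name, and whether the SDK helper decodes output as text is an assumption made here for readability.

```python
import subprocess


def run_in_shell(command, check=True, capture_output=False):
    """Stdlib sketch of a shell runner with the documented parameters."""
    return subprocess.run(
        command,
        shell=True,  # run through the system shell
        check=check,  # raise CalledProcessError on a non-zero exit code
        capture_output=capture_output,
        text=True,  # assumption: decode stdout/stderr as text
    )


completed = run_in_shell("echo hello", capture_output=True)
print(completed.stdout.strip())
```

With check left at True, a failing command raises subprocess.CalledProcessError; passing check=False returns the completed process so the caller can inspect returncode instead.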
gencove_explorer.helpers.get_samples = samples_from_objects_or_ids
module-attribute
Returns a list of samples with the requested file types, given one of:
1. A Project object or id (as string or UUID)
2. A list of Sample objects or ids (as strings or UUIDs)
If providing a list (of Sample objects or ids), all elements must have the same type.
See the build_sample_query_parameters() docstring for a description of file_types and quality_control_types.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_data | | A Project object or id, or list of Sample objects or ids. | required |
file_types | | Types of file to include. | required |
quality_control_types | | Quality control types to include. | required |
statuses | | If given, only samples with these statuses are included. | required |
force_query | | Force the input to be refreshed from the API. | required |
Returns:
Type | Description |
---|---|
| List of Sample objects. |
gencove_explorer.helpers.get_projects = projects_from_objects_or_ids
module-attribute
Returns a list of projects with the requested file types, given ids (as strings or UUIDs).
If providing a list of ids, all elements must have the same type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_data | | Project id or list of project ids. | required |
file_types | | Types of file to include. | required |
Returns:
Type | Description |
---|---|
| List of Project objects. |
gencove_explorer
Gencove Explorer package.
gencove_explorer.s3_path_user()
Returns the user's private S3 path.
Data in this path is not accessible to other members or to anyone outside the organization.
Returns:
Type | Description |
---|---|
str | S3 path of the User |
gencove_explorer.s3_path_shared_org()
Returns the organization's shared S3 path.
Data stored here can be accessed by all members of the organization.
Returns:
Type | Description |
---|---|
str | S3 path of the Organization |