
SDK Reference

This section covers class and function implementations within the Gencove Explorer SDK.

gencove_explorer.analysis.Analysis dataclass

Primary object for defining, running, and monitoring an analysis with the Explorer SDK.

For additional details, please see the Analysis docs.

analysis_prefix property

Unique S3 prefix for user analysis job

log_group: str property

Internal property to get log group

get_name()

Returns a string name for the Analysis object

Returns:

Type Description
str

Name of the analysis object if available, else the function name

get_output(job_index=0)

Get the output for a specific Analysis job. Defaults to job_index=0

logs(job_index=0, since=None, live=False)

Prints the latest logs of the job to standard output.

Parameters:

Name Type Description Default
job_index int

If this is an array job, pass the Job index here. Defaults to 0.

0
since str

From what time to begin displaying logs. By default, logs are displayed starting from ten minutes in the past. The value can be an ISO 8601 timestamp or a relative time. For a relative time, provide a number and a single unit. Supported units are s (seconds), m (minutes), h (hours), d (days), and w (weeks). For example, '5m' displays logs starting five minutes in the past. Combining multiple units is not supported (e.g. '5h30m').

None
live bool

True to see a live tail of a running job. Defaults to False.

False
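The rules for `since` are easy to misread, so below is a minimal sketch of how a value could be interpreted. `parse_since` is a hypothetical helper written for this page, not part of the SDK. It makes two assumptions: relative times are counted back from the current UTC time, and any value that does not match `<number><unit>` is treated as an ISO 8601 timestamp.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper mirroring the documented `since` rules; not the SDK's parser.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
_RELATIVE = re.compile(r"(\d+)([smhdw])")


def parse_since(since, now=None):
    """Return the datetime from which logs would start for a `since` value."""
    now = now or datetime.now(timezone.utc)
    if since is None:
        # Documented default: ten minutes in the past
        return now - timedelta(minutes=10)
    match = _RELATIVE.fullmatch(since)
    if match:
        # Exactly one number followed by exactly one unit
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(**{_UNITS[unit]: amount})
    # Anything else must parse as ISO 8601; '5h30m' ends up here and fails
    try:
        return datetime.fromisoformat(since)
    except ValueError:
        raise ValueError(f"unsupported 'since' value: {since!r}") from None
```

With a fixed `now` of 12:00 UTC, `parse_since("5m", now)` gives 11:55 UTC. `parse_since("5h30m")` raises `ValueError` because it combines two units.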

run(sdk_commit=None, sdk_branch=None, library_commit=None, library_branch=None, dry_run=False, debug_serialized_objects=False)

Submit the analysis as a job to the AWS Batch cluster.

Parameters:

Name Type Description Default
sdk_branch Optional[str]

Optionally supply a git branch of the Explorer SDK to install in batch job

None
sdk_commit Optional[str]

Optionally supply a git commit (first 8 chars) of the Explorer SDK to install in batch job

None
library_branch Optional[str]

Optionally supply a git branch of the Explorer Library to install in batch job

None
library_commit Optional[str]

Optionally supply a git commit (first 8 chars) of the Explorer Library to install in batch job

None
dry_run bool

if flag is set, do not execute any AWS calls

False
debug_serialized_objects bool

if flag is set, write serialized objects to working dir

False

run_local(job_index=0, env_name=None, debug_serialized_objects=False, dry_run=False, sdk_branch=None, sdk_commit=None, library_branch=None, library_commit=None)

Runs analysis on local machine

Note
  • Can only run on a single input at a time
  • By default, only runs against first input item
  • Does not support jobs with dependencies
  • Does not support logs or job status

Parameters:

Name Type Description Default
job_index int

The index from supplied inputs to use for processing. Defaults to 0.

0
env_name Optional[str]

If supplied, will create a virtual env and use Analysis.pip_packages to install dependencies

None
debug_serialized_objects bool

Set to store serialized objects locally

False
dry_run bool

Set to prepare job without executing it

False
sdk_branch Optional[str]

Optionally supply a git branch of the Explorer SDK to install

None
sdk_commit Optional[str]

Optionally supply a git commit (first 8 chars) of the Explorer SDK to install

None
library_branch Optional[str]

Optionally supply a git branch of the Explorer Library to install

None
library_commit Optional[str]

Optionally supply a git commit (first 8 chars) of the Explorer Library to install

None

status(job_index=None, include_indices=False, _full=False)

Returns the status of the Job.

Parameters:

Name Type Description Default
job_index Optional[int]

If this is an array job, pass the Job index here. For array jobs, pass None to get the status of the array job as a whole instead of the status of a child job. Defaults to None.

None
include_indices bool

If this is True, will return the index of each job across each batch status type (only applicable for array jobs)

False
_full Optional[bool]

Set as True if you want the full response from AWS. Otherwise, returns a simple dict that contains the status of the job, and its children in "status" and "status_summary" respectively.

False

Returns:

Type Description
dict

Status of the Job

store_analysis_history(run_type)

Internal method used to write analysis history

terminate(job_index=None)

Terminate a specific analysis job.

The job_index parameter is required if the Analysis is an array job (e.g. len(input) > 1).

Note
  • Any dependent jobs will automatically fail
  • Termination requests take some time to propagate to the job cluster
  • Termination requests are idempotent

Parameters:

Name Type Description Default
job_index Optional[int]

index of analysis to terminate

None

terminate_all()

Terminate all jobs for this analysis

Note
  • All array jobs will be terminated
  • Termination requests take some time to propagate to the job cluster
  • Any dependent jobs will automatically fail

wait(job_statuses, job_index=0, spinner_text='Waiting', spinner_complete_text='Complete')

Main method for waiting until the current Analysis reaches one of the AWS Batch statuses listed in job_statuses.

Parameters:

Name Type Description Default
job_statuses List[str]

list of AWS batch job statuses to wait for

required
job_index int

Analysis job index to wait for

0
spinner_text str

Spinner text to display while waiting

'Waiting'
spinner_complete_text str

Spinner text to display when target status has been detected

'Complete'

wait_done(job_index=0)

Wait for Analysis job with index job_index to reach a terminal state (succeeded or failed).

Parameters:

Name Type Description Default
job_index int

Analysis job index, defaults to 0

0

wait_running_or_done(job_index=0)

Wait for Analysis job with index job_index to reach a running or terminal state (succeeded or failed).

Parameters:

Name Type Description Default
job_index int

Analysis job index, defaults to 0

0
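`wait`, `wait_done`, and `wait_running_or_done` all come down to polling the job status until it reaches a target set. The sketch below shows that pattern using only the standard library. `wait_for_status`, its polling interval, and its timeout are hypothetical, and the real methods also display a spinner. The status names are the standard AWS Batch job states; SUCCEEDED and FAILED are the terminal ones.

```python
import time
from typing import Callable, Iterable

# Standard AWS Batch job states that end the wait
TERMINAL = {"SUCCEEDED", "FAILED"}
RUNNING_OR_TERMINAL = {"RUNNING"} | TERMINAL


def wait_for_status(
    get_status: Callable[[], str],
    job_statuses: Iterable[str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll get_status() until it returns one of job_statuses; return that status."""
    targets = set(job_statuses)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in targets:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"last status {status!r} not in {sorted(targets)}")
        time.sleep(poll_seconds)
```

Passing `TERMINAL` mirrors what `wait_done` is documented to do. Passing `RUNNING_OR_TERMINAL` mirrors `wait_running_or_done`.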

gencove_explorer.analysis.InputShared

Object to store shared inputs as defined by the user.

Examples:

```python
from gencove_explorer.analysis import InputShared

input_shared = InputShared(
    sample_id="ExampleSampleID",
    kmer_length=5
)
```

gencove_explorer.analysis.JobDefinition dataclass

Class for defining job resources and configuration.

Parameters:

Name Type Description Default
cpu float

Number of vCPUs to allocate to job

required
memory_mb int

Amount of memory in Megabytes to allocate to job

required
timeout_seconds int

Amount of time in seconds before job times out

3600
attempts int

Number of times to retry the submitted job if a failure is encountered

1

is_custom_image: bool property

Return True if Docker image != default Explorer image, False otherwise

get_image_name()

Return image name.

Returns:

Name Type Description
str

Image name.

validate(organization_id=None)

Validates that the JobDefinition is correct.

Parameters:

Name Type Description Default
organization_id

Organization ID is optional. If passed, checks that the compute environment has enough CPUs.

None

Raises:

Type Description
ValueError

Error message if config is not valid.
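To make the documented fields and defaults concrete, here is a hypothetical stand-in dataclass. Like the real class, its `validate` method raises `ValueError`. The specific checks are assumptions for illustration, because the SDK's actual validation rules are not documented here. `max_cpus` only stands in for the compute-environment check that runs when `organization_id` is passed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobDefinitionSketch:
    """Hypothetical stand-in mirroring the documented JobDefinition fields."""

    cpu: float
    memory_mb: int
    timeout_seconds: int = 3600  # documented default
    attempts: int = 1  # documented default

    def validate(self, max_cpus: Optional[float] = None) -> None:
        # Assumed rules; the SDK's real checks may differ
        if self.cpu <= 0:
            raise ValueError("cpu must be positive")
        if self.memory_mb <= 0:
            raise ValueError("memory_mb must be positive")
        if self.timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")
        if self.attempts < 1:
            raise ValueError("attempts must be at least 1")
        # Stands in for the organization_id check against the compute environment
        if max_cpus is not None and self.cpu > max_cpus:
            raise ValueError(f"cpu={self.cpu} exceeds compute environment limit {max_cpus}")
```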

gencove_explorer.analysis.AnalysisContext dataclass

Object for accessing data related to an Analysis job.

Examples:

```python
from gencove_explorer.analysis import AnalysisContext

def my_function(ac: AnalysisContext):
    print(ac.input)
```

child_prefix property

Returns the prefix for writing this analysis context's results. Uses the batch index to determine the prefix within the job's outputs prefix.

e.g. s3://gencove-explorer-1111/users/2222/jobs/3333/outputs/0

outputs_prefix property

Returns the prefix for writing the outputs of the group's results.

e.g. s3://gencove-explorer-1111/users/2222/jobs/3333/outputs
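Both prefixes follow the layout shown in the examples. The child prefix adds the batch index as a final path segment. The minimal sketch below assumes the path segments are the bucket, a user ID, and a job ID. The helper names are hypothetical.

```python
def outputs_prefix(bucket: str, user_id: str, job_id: str) -> str:
    """Hypothetical: build the group outputs prefix from its path segments."""
    return f"s3://{bucket}/users/{user_id}/jobs/{job_id}/outputs"


def child_prefix(outputs: str, batch_index: int) -> str:
    """Hypothetical: each array-job child writes under its own index."""
    return f"{outputs}/{batch_index}"


# Placeholder IDs taken from the examples above
print(child_prefix(outputs_prefix("gencove-explorer-1111", "2222", "3333"), 0))
# s3://gencove-explorer-1111/users/2222/jobs/3333/outputs/0
```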

dependency(dep_name)

Returns a single dependency of the Job, looked up by name.

Parameters:

Name Type Description Default
dep_name str

Name of the function or Analysis Job.

required

Returns:

Type Description
Optional[DependencyContainer]

A single dependency object, or None if not found.

gencove_explorer.analysis_manager.AnalysisManager dataclass

Class for browsing and retrieving previously run Analyses

Examples:

```python
from gencove_explorer.analysis_manager import AnalysisManager

analysis_manager = AnalysisManager()
analysis_manager.list_analyses()
```

get_analysis(analysis_id)

Restore a previously run Analysis object by ID

Parameters:

Name Type Description Default
analysis_id str

Analysis ID, which can be retrieved with the list_analyses() method

required

list_analyses(since=None, date=None, name=None)

List user analysis IDs. Can optionally be filtered by relative time, date, or analysis name.

Parameters:

Name Type Description Default
since Optional[str]

From what time to begin displaying analyses. By default, all analyses are displayed. The value can be a relative time. To filter, provide a number and a single unit. Supported units are m (minutes), h (hours), d (days), and w (weeks).

None
date Optional[str]

Filter analyses to this date, e.g. '2023-04-28' or '2023-05'

None
name Optional[str]

Filter analyses by analysis name value (case-insensitive)

None

Returns: List of analysis IDs
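The `date` filter accepts either a full day or a whole month, and the `name` filter is case-insensitive. The sketch below applies both rules to an in-memory list. `AnalysisRecord` and `filter_analyses` are hypothetical. Treating the name filter as an exact match rather than a substring match is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class AnalysisRecord:
    """Hypothetical record; the SDK's stored history format is not documented here."""

    analysis_id: str
    name: str
    created: datetime


def _matches_date(created: datetime, date: str) -> bool:
    # 'YYYY-MM' compares year and month; 'YYYY-MM-DD' compares the full day
    fmt = "%Y-%m" if len(date) == 7 else "%Y-%m-%d"
    return created.strftime(fmt) == date


def filter_analyses(
    records: List[AnalysisRecord],
    date: Optional[str] = None,
    name: Optional[str] = None,
) -> List[str]:
    """Return the IDs of records matching every filter that was supplied."""
    ids = []
    for rec in records:
        if date is not None and not _matches_date(rec.created, date):
            continue
        # Case-insensitive exact match (assumption)
        if name is not None and rec.name.casefold() != name.casefold():
            continue
        ids.append(rec.analysis_id)
    return ids
```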

gencove_explorer.models.Sample dataclass

Object for interacting with samples available on Gencove platform.

Parameters:

Name Type Description Default
id str

sample ID on Gencove platform

required
client_id Optional[str]

non-unique name set by the user

None
project Optional[Project]

Project the sample is part of

None
files Optional[dict]

map of sample deliverables

None
last_status Optional[SampleStatus]

current sample status

None
archive_last_status Optional[ArchiveSampleStatus]

current archival status

None
last_metadata Optional[SampleMetadata]

user-defined metadata

None
trait_scores Optional[dict]

trait values

None
samples_ancestries Optional[Ancestry]

ancestry details

None
samples_microbiomes Optional[Microbiome]

microbiome details

None
samples_quality_controls Optional[dict]

QC values

None

restore_samples(project_id, sample_ids='') staticmethod

Restore archived samples.

Parameters:

Name Type Description Default
project_id str

project id

required
sample_ids Optional[str]

Comma-separated list of sample IDs. If empty, all archived samples in the project will be restored.

''
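An empty `sample_ids` restores every archived sample in the project, so it helps to build the argument deliberately. The minimal sketch below uses a hypothetical helper that refuses an empty list, so restoring everything has to be requested explicitly with `''`. Stripping whitespace around each ID assumes the SDK does not expect spaces after the commas.

```python
from typing import Iterable


def to_sample_ids_arg(sample_ids: Iterable[str]) -> str:
    """Hypothetical: join sample IDs into the comma-separated form restore_samples takes."""
    cleaned = [s.strip() for s in sample_ids if s.strip()]
    if not cleaned:
        # Guard against silently restoring the whole project
        raise ValueError("no sample IDs given; pass '' explicitly to restore all")
    return ",".join(cleaned)


# Hypothetical usage with the documented static method:
# from gencove_explorer.models import Sample
# Sample.restore_samples(project_id, sample_ids=to_sample_ids_arg(["id-1", "id-2"]))
```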

gencove_explorer.file.File dataclass

Represents a file with a local and/or remote location.

This object provides a method to specify and transfer (download and upload) files between local and remote storage.

Can point to the following supported sources/destinations via these parameters:
  • local: path to a local file
  • remote: a URL, S3 path, or EOS path

File objects can optionally be given a unique name via the name parameter. Note that this parameter dictates the upload destination of the file on EOS storage.

Examples:

```python
from pathlib import Path
from gencove_explorer.models import File, NamedFile

# Uploading a named file
Path("/home/explorer/example_file.txt").write_text("Hello world")
f1 = File(
    remote=NamedFile(path="example_file_1"),
    local="/home/explorer/example_file.txt",
)
f1.upload()

# Downloading the previously uploaded named file
f2 = File(
    remote=NamedFile(path="example_file_1")
)
f2.download(local="/home/explorer/downloaded_named_file.txt")

# Download a file from a remote location to local storage
f3 = File(
    remote="https://www.google.com/robots.txt",
)
f3.download(local="/home/explorer/example_robots.txt")
```

Please see the Storage and File docs for more information on usage.

Parameters:

Name Type Description Default
name

Unique name for file

required
remote

Path to remote source (e.g. URL, S3 Path, EOS Path)

required
local

Local path for file

required

upload(force=False)

Method to upload file to user S3 storage.

Notes
  • Requires self.local to be set
  • self.name can optionally be set

By default, will raise an exception if the supplied name is already present under the user prefix. This behavior can be overridden with the force parameter.

Parameters:

Name Type Description Default
force bool

if True, will ignore name check and can overwrite an existing file

False

Returns:

Type Description
File

self File object

download(local=None, force=False)

Either retrieves a file from remote storage and downloads it to local storage, or generates an empty local path that can be written to.

Parameters:

Name Type Description Default
local Optional[Path | str]

Path to download file to

None
force bool

Overwrite file at local if it already exists

False

Returns:

Type Description
File

self File object

remote_exists()

Check if associated remote file exists

execute(*args, capture_output=False)

Download and execute the current file as a script with args forwarded as command-line parameters.

Note: If the script needs an interpreter other than /bin/sh, add a "shebang" line. For example, to run a Python file, add the following shebang as the first line of the file:

#!/usr/bin/env python

Parameters:

Name Type Description Default
*args

forwarded to script as command-line arguments

()
capture_output

If True, return the output as part of the CompletedProcess object instead of forwarding it directly to stdout/stderr

False

Returns:

Type Description
CompletedProcess

subprocess.CompletedProcess
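The sketch below approximates `execute` for a file that is already on local disk, using only `subprocess`. Following the note above, it assumes that scripts without a shebang run under /bin/sh and that scripts with one are executed directly. `run_script` is a hypothetical name, not an SDK function.

```python
import os
import stat
import subprocess
import tempfile


def run_script(path, *args, capture_output=False):
    """Hypothetical: run `path` with `args`; return a subprocess.CompletedProcess."""
    with open(path, "rb") as fh:
        has_shebang = fh.read(2) == b"#!"
    if has_shebang:
        # Make it executable so the kernel picks the interpreter from the shebang
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
        cmd = [os.path.abspath(path)]
    else:
        # Documented default interpreter
        cmd = ["/bin/sh", path]
    return subprocess.run(
        cmd + [str(arg) for arg in args],
        capture_output=capture_output,
        text=True,
    )


# Demo: a shebang-less script runs under /bin/sh and its output is captured
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "hello.sh")
    with open(script, "w") as fh:
        fh.write('echo "hello $1"\n')
    result = run_script(script, "world", capture_output=True)

print(result.stdout, end="")  # hello world
```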

delete_local()

Deletes file from local storage

delete_remote()

Deletes file from remote S3 storage

delete_all()

Deletes a file from both local and remote storage, if applicable

gencove_explorer.file.URLFile dataclass

url: str property

exists()

Return True if remote URL is accessible, False otherwise

download(local, force=False, chunk_size=1024 * 1024)

Download file to local storage

Parameters:

Name Type Description Default
local Path | str

Path to download file to

required
force bool

Overwrite file if True

False
chunk_size int

Size of chunks to write in bytes

1024 * 1024

Returns:

Type Description
Optional[URLFile]

Local path to downloaded file
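`chunk_size` limits how much of the file is held in memory at once. Here is a minimal sketch of a chunked copy with the same `force` semantics, written against any binary file-like source. For a real URL, the source could be the response object returned by `urllib.request.urlopen`, which supports `read(n)`. `write_in_chunks` is hypothetical. Raising `FileExistsError` when `force` is False is an assumption about how the SDK handles an existing file.

```python
from pathlib import Path
from typing import BinaryIO, Union


def write_in_chunks(
    source: BinaryIO,
    local: Union[Path, str],
    force: bool = False,
    chunk_size: int = 1024 * 1024,
) -> Path:
    """Hypothetical: stream `source` to `local`, chunk_size bytes at a time."""
    dest = Path(local)
    if dest.exists() and not force:
        raise FileExistsError(f"{dest} exists; pass force=True to overwrite")
    dest.parent.mkdir(parents=True, exist_ok=True)
    with dest.open("wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            out.write(chunk)
    return dest
```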

get_filename()

Attempt to retrieve file name from URL
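A straightforward way to derive a file name from a URL is to take the last segment of its path and ignore the query string. Below is a minimal sketch with a hypothetical helper. It does not consult response headers such as Content-Disposition, which the SDK may or may not use.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> Optional[str]:
    """Hypothetical: return the last path segment of a URL, or None if there isn't one."""
    # urlparse drops the query string and fragment from .path
    path = unquote(urlparse(url).path)
    return PurePosixPath(path).name or None


print(filename_from_url("https://www.google.com/robots.txt"))  # robots.txt
```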

gencove_explorer.file.S3File dataclass

url: str property

Return generated presigned URL for remote object

exists()

Return True if remote file is accessible, False otherwise

download(local, force=False)

Download remote file to local storage

Parameters:

Name Type Description Default
local Path | str

Local path to download file to

required
force bool

Overwrite file at local path if it already exists

False

Returns:

Type Description
Optional[Union[S3File, EFile]]

Path to downloaded file

get_filename()

Attempts to retrieve the file name from the object. First tries the object's filename tag, then falls back to the object key.
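The tag-then-key fallback takes only a few lines. The minimal sketch below assumes the tag is named `filename` (the actual tag name is not documented here) and that tags have already been fetched into a dict. With boto3, tags can come from `get_object_tagging`, whose response holds a `TagSet` list of `Key`/`Value` pairs.

```python
from typing import Mapping


def s3_filename(key: str, tags: Mapping[str, str], tag_name: str = "filename") -> str:
    """Hypothetical: prefer a filename tag; fall back to the last segment of the key."""
    tagged = tags.get(tag_name)
    if tagged:
        return tagged
    return key.rsplit("/", 1)[-1]


# Converting a boto3 get_object_tagging response into the `tags` mapping:
# resp = s3_client.get_object_tagging(Bucket=bucket, Key=key)
# tags = {t["Key"]: t["Value"] for t in resp["TagSet"]}
```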

gencove_explorer.file.EFile dataclass

gencove_explorer.file.NamedFile dataclass

gencove_explorer.file_formats.FileFormat

Base class for various supported file formats (e.g. VCF, BAM)

This serves as a parent class for file format-specific subclasses. It doesn't implement any behavior, but establishes a common interface.

gencove_explorer.file_formats.BAM dataclass

Represents a BAM (Binary Alignment Map) file.

Attributes:

Name Type Description
file Union[File, Sample]

The BAM file or sample data.

index Optional[File]

The index file associated with the BAM file.

metadata Optional[dict]

Additional metadata.

create_index()

Create a BAM index file (.bai) for the BAM file associated with this object

download()

Analogous to the respective File method. Also generates or retrieves associated index.

upload(*args, **kwargs)

Analogous to the respective File method. Also uploads associated index if available.

gencove_explorer.file_formats.VCF dataclass

Represents a VCF (Variant Call Format) file.

Attributes:

Name Type Description
file Union[File, Sample]

The VCF file or sample data.

index Optional[File]

The index file associated with the VCF file.

metadata Optional[dict]

Additional metadata.

create_index()

Create a VCF index file (.csi) for the VCF file associated with this object

download()

Analogous to the respective File method. Also generates or retrieves associated index.

upload(*args, **kwargs)

Analogous to the respective File method. Also uploads associated index if available.
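`create_index` produces a .bai index for BAM files and a .csi index for VCF files. The sketch below only computes where such an index would conventionally be placed. By default, samtools writes `<file>.bam.bai` and bcftools writes `<file>.csi` for bgzip-compressed VCF/BCF. Whether the SDK follows these naming conventions is an assumption, and `default_index_path` is hypothetical.

```python
from pathlib import Path
from typing import Union


def default_index_path(path: Union[Path, str]) -> Path:
    """Hypothetical: return the conventional index path next to a BAM or VCF file."""
    p = Path(path)
    name = p.name
    if name.endswith(".bam"):
        return p.with_name(name + ".bai")  # samtools index default
    if name.endswith((".vcf.gz", ".bcf")):
        return p.with_name(name + ".csi")  # bcftools index default
    raise ValueError(f"no known index convention for {name}")
```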

gencove_explorer.datasets.ReferenceGenome dataclass

Represents a ReferenceGenome dataset.

Attributes:

Name Type Description
version Literal['g1k_v37', 'g1k_v38']

Version of the dataset

species Literal['human']

species of the dataset

download(include_bwa_indices=False)

Analogous to the respective File method.

Parameters:

Name Type Description Default
include_bwa_indices

Include amb, ann, bwt, pac, and sa indexes

False

Returns:

Type Description

Path to genome.fasta.gz file
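Setting `include_bwa_indices=True` adds five index files. By default, `bwa index` names its outputs by appending these suffixes to the FASTA path. A sketch for checking which of them are present might look like the following. That the SDK stores them next to genome.fasta.gz under these names is an assumption, and both helpers are hypothetical.

```python
from pathlib import Path
from typing import List, Union

# Files written by `bwa index <fasta>` with the default output prefix
BWA_SUFFIXES = ("amb", "ann", "bwt", "pac", "sa")


def bwa_index_paths(fasta: Union[Path, str]) -> List[Path]:
    """Hypothetical: return the expected BWA index paths for a FASTA file."""
    p = Path(fasta)
    return [p.with_name(f"{p.name}.{suffix}") for suffix in BWA_SUFFIXES]


def missing_bwa_indices(fasta: Union[Path, str]) -> List[Path]:
    """Hypothetical: return the expected BWA index files that do not exist on disk."""
    return [path for path in bwa_index_paths(fasta) if not path.exists()]
```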

gencove_explorer

Gencove Explorer package.

gencove_explorer.s3_path_user()

Returns user's private S3 path.

Data in this path is not accessible to other members of the organization or to anyone outside it.

Returns:

Type Description
str

S3 path of the User

gencove_explorer.s3_path_shared_org()

Returns organization's shared S3 path.

Data in this path can be accessed by all members of the organization.

Returns:

Type Description
str

S3 path of the Organization
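Both functions return a single S3 path string, but tools such as boto3 take the bucket and the key prefix separately, so a small splitter helps. Below is a minimal sketch. The example path reuses the placeholder bucket and user ID from the AnalysisContext examples and is not a real location. `split_s3_path` is hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_path(s3_path: str) -> Tuple[str, str]:
    """Hypothetical: split 's3://bucket/key/prefix' into (bucket, key_prefix)."""
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 path: {s3_path!r}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_path("s3://gencove-explorer-1111/users/2222")
# e.g. boto3: s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
```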