SDK Reference

This section covers class and function implementations within the Gencove Explorer SDK.

gencove_explorer.analysis.Analysis dataclass

Primary object for defining, running, and monitoring an analysis with the Explorer SDK.

For additional details, please see the Analysis docs.

analysis_prefix property

Unique S3 prefix for user analysis job

log_group: str property

Internal property to get log group

delete(force=False)

Cancels all jobs associated with the analysis and deletes all of their respective outputs. The analysis is also deleted from EOS storage, so it cannot be retrieved afterward via AnalysisManager.

Note that only outputs defined within the analysis_context.output container will be deleted; any other outputs must be deleted manually.

By default, will prompt for confirmation before proceeding.

Parameters:

Name Type Description Default
force bool

If set to True, will not prompt for confirmation to delete

False

get_name()

Returns a string name for the Analysis object

Returns:

Type Description
str

name of analysis object if available, else the function name

get_output(job_index=0)

Get the output for a specific Analysis job. Defaults to job_index=0

logs(job_index=0, since=None, live=False, export=None)

Prints the latest logs of the job to standard output.

Parameters:

Name Type Description Default
job_index int

If this is an array job, pass the Job index here. Defaults to 0.

0
since str

From what time to begin displaying logs. By default, logs will be displayed starting from ten minutes in the past. The value provided can be an ISO 8601 timestamp or a relative time. For relative times, provide a number and a single unit. Supported units include: s (seconds), m (minutes), h (hours), d (days), w (weeks). For example, a value of '5m' would indicate to display logs starting five minutes in the past. Note that multiple units are not supported (e.g. '5h30m').

None
live bool

True to see a live tail of a running job. Defaults to False.

False
export Optional[str | File]

write the logs to a file instead of printing to stdout.

None
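
The `since` formats described above can be illustrated with a small parser. This is a minimal sketch, not the SDK's implementation: `parse_since` is a hypothetical helper, and it relies on `datetime.fromisoformat`, which only accepts a trailing `Z` on Python 3.11 and later.

```python
import re
from datetime import datetime, timedelta, timezone

# Unit suffixes listed in the `since` description, mapped to timedelta keywords.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
_RELATIVE = re.compile(r"(\d+)([smhdw])")


def parse_since(value, now=None):
    """Hypothetical helper: turn a `since` string into an absolute datetime."""
    now = now or datetime.now(timezone.utc)
    match = _RELATIVE.fullmatch(value)
    if match:
        amount, unit = match.groups()
        return now - timedelta(**{_UNITS[unit]: int(amount)})
    # Anything else is treated as an ISO 8601 timestamp. Values that combine
    # units, such as '5h30m', end up here and raise ValueError.
    return datetime.fromisoformat(value)
```

With this sketch, `parse_since("5m")` resolves to five minutes before the current time, while `parse_since("5h30m")` is rejected.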

run(sdk_commit=None, sdk_branch=None, library_commit=None, library_branch=None, dry_run=False, debug_serialized_objects=False)

Submit a job to batch cluster

Parameters:

Name Type Description Default
sdk_branch Optional[str]

Optionally supply a git branch of the Explorer SDK to install in batch job

None
sdk_commit Optional[str]

Optionally supply a git commit (first 8 chars) of the Explorer SDK to install in batch job

None
library_branch Optional[str]

Optionally supply a git branch of the Explorer Library to install in batch job

None
library_commit Optional[str]

Optionally supply a git commit (first 8 chars) of the Explorer Library to install in batch job

None
dry_run bool

if flag is set, do not execute any AWS calls

False
debug_serialized_objects bool

if flag is set, write serialized objects to working dir

False

run_local(job_index=0, env_name=None, debug_serialized_objects=False, dry_run=False, sdk_branch=None, sdk_commit=None, library_branch=None, library_commit=None, container=False)

Runs analysis on local machine

Note
  • Can only run on a single input at a time
  • By default, only runs against first input item
  • Does not support jobs with dependencies
  • Does not support logs or job status
  • Must have >2 vCPUs to run

Parameters:

Name Type Description Default
job_index int

The index from supplied inputs to use for processing. Defaults to 0.

0
env_name Optional[str]

If supplied, will create a virtual env and use Analysis.pip_packages to install dependencies

None
debug_serialized_objects bool

Set to store serialized objects locally

False
dry_run bool

Set to prepare job without executing it

False
sdk_branch Optional[str]

Optionally supply a git branch of the Explorer SDK to install

None
sdk_commit Optional[str]

Optionally supply a git commit (first 8 chars) of the Explorer SDK to install

None
library_branch Optional[str]

Optionally supply a git branch of the Explorer Library to install

None
library_commit Optional[str]

Optionally supply a git commit (first 8 chars) of the Explorer Library to install

None
container bool

Runs the Analysis locally in a Docker container if set to True.

False

status(job_index=None, include_indices=False, _full=False)

Returns the status of the Job.

Parameters:

Name Type Description Default
job_index Optional[int]

If this is an array job, pass the Job index here. Pass None for array jobs to get the array job status instead of a child job status. Defaults to None.

None
include_indices bool

If this is True, will return the index of each job across each batch status type (only applicable for array jobs)

False
_full Optional[bool]

Set as True if you want the full response from AWS. Otherwise, returns a simple dict that contains the status of the job, and its children in "status" and "status_summary" respectively.

False

Returns:

Type Description
dict

Status of the Job

store_analysis_history(run_type)

Internal method used to write analysis history

terminate(job_index=None)

Terminate a specific analysis job.

The job_index parameter is required if the Analysis is an array job (e.g. len(input) > 1).

Note
  • Any dependent jobs will automatically fail
  • Termination requests take some time to propagate to the job cluster
  • Termination requests are idempotent

Parameters:

Name Type Description Default
job_index Optional[int]

index of analysis to terminate

None

terminate_all()

Terminate all jobs for this analysis

Note
  • All array jobs will be terminated
  • Termination requests take some time to propagate to the job cluster
  • Any dependent jobs will automatically fail

wait(job_statuses, job_index=0, spinner_text='Waiting', spinner_complete_text='Complete')

Main method for waiting for the current Analysis to reach an AWS batch status listed under job_statuses

Parameters:

Name Type Description Default
job_statuses List[str]

list of AWS batch job statuses to wait for

required
job_index int

Analysis job index to wait for

0
spinner_text str

Spinner text to display while waiting

'Waiting'
spinner_complete_text str

Spinner text to display when target status has been detected

'Complete'
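
The waiting behavior can be pictured as a polling loop. The following is a generic sketch rather than the SDK's code: `wait_for_status`, `get_status`, `poll_seconds`, and `max_polls` are hypothetical names, and the status strings are assumed to be the standard AWS Batch job states (for example RUNNING, SUCCEEDED, FAILED).

```python
import time


def wait_for_status(get_status, job_statuses, poll_seconds=5.0, max_polls=None):
    """Poll `get_status()` until it returns a value listed in `job_statuses`."""
    targets = set(job_statuses)
    polls = 0
    while True:
        current = get_status()
        if current in targets:
            return current
        polls += 1
        # Optional safety valve so the loop cannot spin forever.
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"status still {current!r} after {polls} polls")
        time.sleep(poll_seconds)
```

Under these assumptions, waiting for a terminal state corresponds to `job_statuses=["SUCCEEDED", "FAILED"]`, and waiting for a running or terminal state adds `"RUNNING"` to that list.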

wait_done(job_index=0)

Wait for Analysis job with index job_index to reach a terminal state (succeeded or failed).

Parameters:

Name Type Description Default
job_index int

Analysis job index, defaults to 0

0

wait_running_or_done(job_index=0)

Wait for Analysis job with index job_index to reach a running or terminal state (succeeded or failed).

Parameters:

Name Type Description Default
job_index int

Analysis job index, defaults to 0

0

gencove_explorer.analysis.InputShared

Object to store shared inputs as defined by user

Examples:

input_shared = InputShared(
    sample_id="ExampleSampleID",
    kmer_length=5
)

gencove_explorer.analysis.JobDefinition dataclass

Class for defining job resources and configuration

Parameters:

Name Type Description Default
cpu float

Number of vCPUs to allocate to job

required
memory_mb int

Amount of memory in Megabytes to allocate to job

required
storage_gb int

Storage in Gigabytes to allocate to job

20
timeout_seconds int

Amount of time in seconds before job times out

3600
attempts int

Number of times to retry the submitted job if a failure is encountered

1
privileged bool

Set to True to run in privileged mode

False

is_custom_image: bool property

Return True if Docker image != default Explorer image, False otherwise

get_image_name()

Return image name.

Returns:

Name Type Description
str str

Image name.

validate(organization_id=None)

Validates that the JobDefinition is correct.

Parameters:

Name Type Description Default
organization_id str

Organization ID is optional. If passed, checks that the compute environment has enough CPUs.

None

Raises:

Type Description
ValueError

Error message if config is not valid.

gencove_explorer.analysis.AnalysisContext dataclass

Object for accessing data related to an Analysis job.

Examples:

def my_function(ac: AnalysisContext):
    print(ac.input)

child_prefix property

Returns prefix to write analysis context results. Uses the batch index to determine prefix within jobs prefix.

e.g. s3://gencove-explorer-1111/users/2222/jobs/3333/outputs/0

outputs_prefix property

Returns prefix to write outputs of the group results.

e.g. s3://gencove-explorer-1111/users/2222/jobs/3333/outputs
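
Judging from the two example paths, the per-job prefix appears to be the outputs prefix followed by the batch index. A minimal sketch of that relationship, where `child_prefix` is a hypothetical stand-alone function and not the SDK property itself:

```python
def child_prefix(outputs_prefix, job_index):
    """Append the batch index to the shared outputs prefix."""
    return f"{outputs_prefix.rstrip('/')}/{job_index}"


outputs = "s3://gencove-explorer-1111/users/2222/jobs/3333/outputs"
print(child_prefix(outputs, 0))
```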

dependency(dep_name)

Returns the dependency of the Job that matches dep_name.

Parameters:

Name Type Description Default
dep_name str

Name of the function or Analysis Job.

required

Returns:

Type Description
Optional[DependencyContainer]

A single dependency object. If not found, returns None.

gencove_explorer.analysis_manager.AnalysisManager dataclass

Class for browsing and retrieving previously run Analyses.

Examples:

# Instantiate manager
analysis_manager = AnalysisManager()

# List Analyses
analysis_manager.list_analyses()

# Get Analysis
a = analysis_manager.get_analysis("analysis_id")

get_analysis(analysis_id)

Restore a previously run Analysis object by ID

Parameters:

Name Type Description Default
analysis_id str

analysis ID, can be retrieved with list_analyses() method

required

list_analyses(since=None, date=None, name=None)

List user analysis IDs. Can optionally be filtered by relative time, date, or analysis name.

Parameters:

Name Type Description Default
since Optional[str]

From what time to begin displaying analyses. By default, all analyses will be displayed. The value provided can be a relative time. To filter, provide a number and a single unit. Supported units include: m (minutes), h (hours), d (days), w (weeks).

None
date Optional[str]

Filter analyses to this date, e.g. '2023-04-28' or '2023-05'

None
name Optional[str]

Filter analyses by analysis name value (case-insensitive)

None

Returns: List of analysis IDs
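
The `date` examples ('2023-04-28' for a day, '2023-05' for a month) suggest prefix matching against a timestamp. The sketch below illustrates that idea only; how the SDK actually matches dates is not documented here, and `filter_by_date` and its input mapping are hypothetical.

```python
def filter_by_date(created_by_id, date):
    """Keep IDs whose ISO 8601 creation timestamp starts with `date`."""
    return [aid for aid, created in created_by_id.items() if created.startswith(date)]


# Hypothetical analysis IDs and creation timestamps, for illustration only.
history = {
    "analysis-a": "2023-04-28T09:15:00",
    "analysis-b": "2023-05-02T17:40:00",
    "analysis-c": "2023-05-19T08:05:00",
}
print(filter_by_date(history, "2023-05"))
```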

gencove_explorer.models.Sample dataclass

Object for interacting with samples available on Gencove platform.

Parameters:

Name Type Description Default
id str

sample ID on Gencove platform

required
client_id Optional[str]

non-unique name set by the user

None
project Optional[Project]

Project that the sample is part of

None
files Optional[dict]

map of sample deliverables

None
last_status Optional[SampleStatus]

current sample status

None
archive_last_status Optional[ArchiveSampleStatus]

current archival status

None
last_metadata Optional[SampleMetadata]

user-defined metadata

None
trait_scores Optional[dict]

trait values

None
samples_ancestries Optional[Ancestry]

ancestry details

None
samples_microbiomes Optional[Microbiome]

microbiome details

None
samples_quality_controls Optional[dict]

QC values

None

restore_samples(project_id, sample_ids='') staticmethod

Restore archived samples.

Parameters:

Name Type Description Default
project_id str

project id

required
sample_ids Optional[str]

comma separated list of sample IDs, if empty, all archived samples in project will be restored

''
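
Because an empty `sample_ids` string restores every archived sample in the project, it can be worth guarding against an accidentally empty list when building the argument. A small sketch, where `to_sample_ids_arg` is a hypothetical helper and not part of the SDK:

```python
def to_sample_ids_arg(sample_ids):
    """Join sample IDs into the comma-separated string form."""
    cleaned = [str(sample_id).strip() for sample_id in sample_ids]
    if not cleaned:
        # An empty string would mean "restore all archived samples".
        raise ValueError("no sample IDs given; refusing to build an empty filter")
    return ",".join(cleaned)
```

The resulting string can then be passed as `sample_ids` to `restore_samples`.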

gencove_explorer.file.File dataclass

Represents a file with a local and/or remote location.

This object provides a method to specify and transfer (download and upload) files between local and remote storage.

Can point to these supported sources/destinations via the following parameters:
  • local: path to local file
  • remote: supports URL, S3 path, EOS path

File objects can optionally be given a unique name via the name parameter. Note that this parameter dictates the upload destination of the file on EOS storage.

Examples:

```python
from pathlib import Path
from gencove_explorer.models import File, NamedFile

# Uploading a named file
Path("/home/explorer/example_file.txt").write_text("Hello world")
f1 = File(
    remote=NamedFile(path="example_file_1"),
    local="/home/explorer/example_file.txt",
)
f1.upload()

# Downloading the previously uploaded named file
f2 = File(
    remote=NamedFile(path="example_file_1")
)
f2.download(local="/home/explorer/downloaded_named_file.txt")

# Download a file from a remote location to local storage
f3 = File(
    remote="https://www.google.com/robots.txt",
)
f3.download(local="/home/explorer/example_robots.txt")
```

Please see the Storage and File docs for more information on usage.

Parameters:

Name Type Description Default
name

Unique name for file

required
remote

Path to remote source (e.g. URL, S3 Path, EOS Path)

required
local

Local path for file

required

upload(force=False, auto_archive=True, auto_expire=False)

Method to upload file to user S3 storage.

Notes
  • Requires self.local is set
  • Optionally self.name can be set

By default, will raise an exception if the supplied name is already present under the user prefix. This behavior can be overridden with the force parameter.

Parameters:

Name Type Description Default
force bool

if True, will ignore name check and can overwrite an existing file

False
auto_archive bool

If True (the default), sets the file to be archived after 10 days. Set to False to override this behavior.

True
auto_expire bool

If True, tags the File to be automatically deleted after 7 days. Defaults to False.

False

Returns:

Type Description
File

self File object

download(local=None, force=False)

Either retrieves a file from remote storage and downloads to local storage, or generates an empty path that can be written to.

Parameters:

Name Type Description Default
local Optional[Path | str]

Path to download file to

None
force bool

Overwrite file at local if it already exists

False

Returns:

Type Description
File

self File object

remote_exists()

Check if associated remote file exists

execute(*args, capture_output=False)

Download and execute the current file as a script with args forwarded as command-line parameters.

Note: In case the user needs an interpreter other than /bin/sh, simply add a "shebang" line. For example, to run a Python file add the following shebang as the first line in the file: #!/usr/bin/env python

Parameters:

Name Type Description Default
*args

forwarded to script as command-line arguments

()
capture_output

if True return output as part of CompletedProcess object and do not forward it directly to stdout/stderr

False

Returns:

Type Description
CompletedProcess

subprocess.CompletedProcess
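
The shebang mechanism mentioned in the note can be demonstrated with the standard library alone. This sketch does not use the SDK: it assumes a POSIX system, writes a throwaway script whose shebang points at the current Python interpreter, and runs it with `subprocess.run`, which also returns a `CompletedProcess`.

```python
import os
import stat
import subprocess
import sys
import tempfile

# A script whose first line is a shebang selecting the Python interpreter.
script_body = f"#!{sys.executable}\nimport sys\nprint(' '.join(sys.argv[1:]))\n"

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "example_script")
    with open(script, "w") as handle:
        handle.write(script_body)
    # Mark the script as executable so the kernel honors the shebang line.
    os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)
    # Arguments after the script path are forwarded as command-line parameters.
    result = subprocess.run([script, "hello", "world"], capture_output=True, text=True)

print(result.returncode, result.stdout.strip())
```

Because `capture_output=True` is set, the script's output is available on `result.stdout` instead of being written to the terminal.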

delete_local()

Deletes file from local storage

delete_remote()

Deletes file from remote S3 storage

delete_all()

Deletes a file from both local and remote storage, if applicable

gencove_explorer.file.URLFile dataclass

url: str property

exists()

Return True if remote URL is accessible, False otherwise

download(local, force=False, chunk_size=1024 * 1024)

Download file to local storage

Parameters:

Name Type Description Default
local Path | str

Path to download file to

required
force bool

Overwrite file if True

False
chunk_size int

Size of chunks to write in bytes

1024 * 1024

Returns:

Type Description
Optional[URLFile]

Local path to downloaded file
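
The role of `chunk_size` can be shown with a chunked stream copy. This is an illustrative sketch over in-memory streams, with `copy_in_chunks` as a hypothetical helper; it does not reproduce how the SDK performs the HTTP download.

```python
import io


def copy_in_chunks(source, destination, chunk_size=1024 * 1024):
    """Copy a binary stream in fixed-size chunks; return bytes written."""
    written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        destination.write(chunk)
        written += len(chunk)
    return written


payload = b"x" * 2500
target = io.BytesIO()
copied = copy_in_chunks(io.BytesIO(payload), target, chunk_size=1024)
print(copied)
```

Reading in chunks keeps memory use bounded by `chunk_size` regardless of the file size.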

get_filename()

Attempt to retrieve file name from URL
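
One common way to derive a file name from a URL is to take the last segment of its path and ignore the query string. The sketch below shows that approach with the standard library; `filename_from_url` is a hypothetical function, and the SDK may apply different rules.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, or None if there is none."""
    name = posixpath.basename(unquote(urlsplit(url).path))
    return name or None


print(filename_from_url("https://www.google.com/robots.txt"))
```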

gencove_explorer.file.S3File dataclass

url: str property

Return generated presigned URL for remote object

exists()

Return True if remote file is accessible, False otherwise

download(local, force=False)

Download remote file to local storage

Parameters:

Name Type Description Default
local Path | str

Local path to download file to

required
force bool

Overwrite file at local path if it already exists

False

Returns:

Type Description
Optional[Union[S3File, EFile]]

Path to downloaded file

get_filename()

Attempts to retrieve file name from object. First attempts to retrieve filename from tag, then falls back to object key
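
The tag-then-key fallback can be sketched as follows. The tag name `filename` and the function `filename_from_object` are assumptions made for illustration; the actual tag used by the SDK is not stated here.

```python
import posixpath


def filename_from_object(tags, key, tag_name="filename"):
    """Prefer a filename stored in an object tag; fall back to the key."""
    return tags.get(tag_name) or posixpath.basename(key)


print(filename_from_object({}, "users/2222/files/example_file.txt"))
```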

gencove_explorer.file.EFile dataclass

gencove_explorer.file.NamedFile dataclass

gencove_explorer.models.GencoveFile dataclass

id: Optional[str] = None class-attribute instance-attribute

created: Optional[str] = None class-attribute instance-attribute

size: Optional[float] = None class-attribute instance-attribute

file_type: Optional[FileType] = None class-attribute instance-attribute

last_status: Optional[FileStatus] = None class-attribute instance-attribute

archive_last_status: Optional[ArchiveFileStatus] = None class-attribute instance-attribute

presigned_url: Optional[str] = None class-attribute instance-attribute

gencove_explorer.file_formats.FileFormat

Base class for various supported file formats (e.g. VCF, BAM)

This serves as a parent class for file format-specific subclasses. It doesn't implement any behavior, but establishes a common interface.

gencove_explorer.file_formats.BAM dataclass

Represents a BAM (Binary Alignment Map) file.

Attributes:

Name Type Description
file Union[File, Sample]

The BAM file or sample data.

index Optional[File]

The index file associated with the BAM file.

metadata Optional[dict]

Additional metadata.

create_index()

Create a BAM index file (.bai) for the BAM file associated with this object

download()

Analogous to the respective File method. Also generates or retrieves associated index.

upload(*args, **kwargs)

Analogous to the respective File method. Also uploads associated index if available.

gencove_explorer.file_formats.VCF dataclass

Represents a VCF (Variant Call Format) file.

Attributes:

Name Type Description
file Union[File, Sample]

The VCF file or sample data.

index Optional[File]

The index file associated with the VCF file.

metadata Optional[dict]

Additional metadata.

create_index()

Create a VCF index file (.csi) for the VCF file associated with this object

download()

Analogous to the respective File method. Also generates or retrieves associated index.

upload(*args, **kwargs)

Analogous to the respective File method. Also uploads associated index if available.

gencove_explorer.datasets.ReferenceGenome dataclass

Represents a ReferenceGenome dataset.

Attributes:

Name Type Description
version Literal['g1k_v37', 'g1k_v38']

Version of the dataset

species Literal['human']

species of the dataset

download(include_bwa_indices=False)

Analogous to the respective File method.

Parameters:

Name Type Description Default
include_bwa_indices

Include amb, ann, bwt, pac, and sa indexes

False

Returns:

Type Description

Path to genome.fasta.gz file
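
By default, `bwa index` writes its index files next to the FASTA file, using the FASTA name plus one of the five suffixes listed above. Assuming the dataset follows that convention (not confirmed here), the expected paths can be enumerated like this, with `bwa_index_paths` as a hypothetical helper:

```python
BWA_INDEX_SUFFIXES = ("amb", "ann", "bwt", "pac", "sa")


def bwa_index_paths(fasta_path):
    """List the index files that `bwa index` writes next to a FASTA file."""
    return [f"{fasta_path}.{suffix}" for suffix in BWA_INDEX_SUFFIXES]


print(bwa_index_paths("genome.fasta.gz"))
```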

gencove_explorer.query.Query dataclass

samples(filters=None, include_related=None, quality_control_types=None, file_types=None)

projects(filters=None, include_related=None, file_types=None)

gencove_explorer.helpers

gencove_explorer.helpers.run_shell_command = run_command module-attribute

Runs the given command in a shell until completion.

Parameters:

Name Type Description Default
command

Shell command to be run.

required
check

If check is true, and the process exits with a non-zero exit code, a CalledProcessError exception will be raised. Defaults to True.

required
capture_output

If capture_output is true, stdout and stderr will be captured.

required

Returns:

Type Description

An object representing a process that has finished.
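
The `check` and `capture_output` semantics described above mirror those of the standard library's `subprocess.run`. The stand-in below illustrates them without importing the SDK; `run_command_sketch` is a hypothetical name, and the SDK's own implementation may differ.

```python
import subprocess


def run_command_sketch(command, check=True, capture_output=False):
    """Stand-in built on subprocess.run; not the SDK's implementation."""
    return subprocess.run(
        command, shell=True, check=check, capture_output=capture_output, text=True
    )


done = run_command_sketch("echo hello", capture_output=True)
print(done.returncode, done.stdout.strip())

try:
    # With check=True, a non-zero exit code raises CalledProcessError.
    run_command_sketch("exit 3")
except subprocess.CalledProcessError as error:
    print("failed with exit code", error.returncode)
```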

gencove_explorer.helpers.get_samples = samples_from_objects_or_ids module-attribute

Return a list of samples with the requested File types, given:
  1. Project object or id (as strings or UUID)
  2. List of Sample objects or ids (as strings or UUIDs)

If providing a list (of Sample objects or ids), all elements must have the same type.

See build_sample_query_parameters() docstring for description of file_types and quality_control_types

Parameters:

Name Type Description Default
input_data

A Project object or id, or list of Sample objects or ids.

required
file_types

Types of file to include.

required
quality_control_types

Quality control types to include.

required
statuses

If given, only samples with these statuses will be included.

required
force_query

Force that input is refreshed from API.

required

Returns:

Type Description

List of Sample objects.

gencove_explorer.helpers.get_projects = projects_from_objects_or_ids module-attribute

Return a list of projects with the requested File types, given ids (as strings or UUID)

If providing a list of ids, all elements must have the same type.

Parameters:

Name Type Description Default
input_data

Project id or list of project ids.

required
file_types

Types of file to include.

required

Returns:

Type Description

List of Project objects.

gencove_explorer

Gencove Explorer package.

gencove_explorer.s3_path_user()

Returns user's private S3 path.

Data in this path is not accessible to other members of the organization or to anyone outside it.

Returns:

Type Description
str

S3 path of the User

gencove_explorer.s3_path_shared_org()

Returns organization's shared S3 path.

Data that is here can be accessed by all members of the organization.

Returns:

Type Description
str

S3 path of the Organization