
SDK Reference

This section covers class and function implementations within the Gencove Explorer SDK.

gencove_explorer.analysis.Analysis dataclass

Primary object for defining, running, and monitoring an analysis with the Explorer SDK.

For additional details, please see the Analysis docs.

analysis_prefix property

Unique S3 prefix for the user's analysis job.

log_group: str property

Internal property to get log group

get_name()

Returns a string name for the Analysis object

Returns:

Type Description
str

Name of the analysis object if available, else the function name

get_output(job_index=0)

Gets the output of a specific Analysis job. Defaults to job_index=0.

logs(job_index=0, since=None, live=False)

Prints the latest logs of the job to standard output.

Parameters:

Name Type Description Default
job_index int

If this is an array job, pass the Job index here. Defaults to 0.

0
since str

From what time to begin displaying logs. By default, logs are displayed starting from ten minutes in the past. The value can be an ISO 8601 timestamp or a relative time. For a relative time, provide a number and a single unit. Supported units are s (seconds), m (minutes), h (hours), d (days), and w (weeks). For example, '5m' displays logs starting five minutes in the past. Combined units (e.g. '5h30m') are not supported.

None
live bool

True to see a live tail of a running job. Defaults to False.

False
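The rules for the since parameter can be illustrated with a small parser. This is a minimal sketch, not the SDK's implementation: the helper name resolve_since is hypothetical, and treating any non-relative value as an ISO 8601 timestamp via datetime.fromisoformat is an assumption (older Python versions accept only a subset of ISO 8601 there).

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Unit letters documented for the `since` parameter, mapped to timedelta keywords.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
# Exactly one number followed by exactly one unit, so '5h30m' does not match.
_RELATIVE = re.compile(r"^(\d+)([smhdw])$")


def resolve_since(since: Optional[str], now: Optional[datetime] = None) -> datetime:
    """Hypothetical helper: turn a `since` value into an absolute start time."""
    now = now or datetime.now(timezone.utc)
    if since is None:
        # Documented default: start ten minutes in the past.
        return now - timedelta(minutes=10)
    match = _RELATIVE.match(since)
    if match:
        amount, unit = match.groups()
        return now - timedelta(**{_UNITS[unit]: int(amount)})
    try:
        # Assumption: anything else is parsed as an ISO 8601 timestamp.
        return datetime.fromisoformat(since)
    except ValueError:
        raise ValueError(f"unsupported since value: {since!r}") from None


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(resolve_since("5m", now))  # 2024-01-01 11:55:00+00:00
```

A value such as '5h30m' fails the single-unit pattern and is not a valid timestamp either, so the sketch raises ValueError for it.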

run(sdk_version=None, sdk_branch=None, library_version=None, library_branch=None, dry_run=False, debug_serialized_objects=False)

Submits a job to the batch cluster.

Parameters:

Name Type Description Default
sdk_version Optional[str]

SDK version to use for batch job

None
sdk_branch Optional[str]

SDK branch to use for batch job

None
library_version Optional[str]

library version to use for batch job

None
library_branch Optional[str]

library branch to use for batch job

None
dry_run bool

If set, no AWS calls are executed.

False
debug_serialized_objects bool

If set, serialized objects are written to the working directory.

False

run_local(job_index=0, env_name=None, debug_serialized_objects=False, dry_run=False, sdk_branch=None, sdk_version=None, library_branch=None, library_version=None)

Runs the analysis on the local machine.

Note
  • Can only run on a single input at a time
  • By default, only runs against the first input item
  • Does not support jobs with dependencies
  • Does not support logs or job status

Parameters:

Name Type Description Default
job_index int

The index from supplied inputs to use for processing. Defaults to 0.

0
env_name Optional[str]

If supplied, creates a virtual environment and installs dependencies from Analysis.pip_packages.

None
debug_serialized_objects bool

Set to store serialized objects locally

False
dry_run bool

Set to prepare job without executing it

False

status(job_index=None, full=False)

Returns the status of the Job.

Parameters:

Name Type Description Default
job_index Optional[int]

If this is an array job, pass the job index here. Pass None for array jobs to get the array job status instead of a child job status. Defaults to None.

None
full Optional[bool]

Set to True to get the full response from AWS. Otherwise, returns a simple dict containing the status of the job under "status" and the status of its child jobs under "status_summary".

False

Returns:

Type Description
dict

Status of the Job

store_analysis_history(run_type)

Internal method used to write analysis history

terminate(job_index=None)

Terminate a specific analysis job.

The job_index parameter is required if the Analysis is an array job (e.g. len(input) > 1).

Note
  • Any dependent jobs will automatically fail
  • Termination requests take some time to propagate to the job cluster
  • Termination requests are idempotent

Parameters:

Name Type Description Default
job_index Optional[int]

Index of the analysis job to terminate

None
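The job_index rule for terminate() amounts to a simple check. The sketch below is illustrative only: the function name check_terminate_index is hypothetical, and the range check on job_index is an added assumption rather than documented SDK behaviour.

```python
from typing import Optional, Sequence


def check_terminate_index(inputs: Sequence, job_index: Optional[int]) -> Optional[int]:
    """Hypothetical validation mirroring the documented rule for terminate()."""
    is_array_job = len(inputs) > 1
    if is_array_job and job_index is None:
        # Array jobs need an explicit child index; terminate_all() covers every child.
        raise ValueError("job_index is required for array jobs")
    if job_index is not None and not 0 <= job_index < len(inputs):
        # Assumption: indices outside the supplied inputs are rejected.
        raise IndexError(f"job_index {job_index} out of range for {len(inputs)} inputs")
    # None is acceptable for a single (non-array) job.
    return job_index
```

For a single input, check_terminate_index(["sample-1"], None) returns None; for two inputs the same call raises ValueError.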

terminate_all()

Terminate all jobs for this analysis

Note
  • All array jobs will be terminated
  • Termination requests take some time to propagate to the job cluster
  • Any dependent jobs will automatically fail

wait(job_statuses, job_index=0, spinner_text='Waiting', spinner_complete_text='Complete')

Main method for waiting until the current Analysis job reaches one of the AWS Batch statuses listed in job_statuses.

Parameters:

Name Type Description Default
job_statuses List[str]

List of AWS Batch job statuses to wait for

required
job_index int

Analysis job index to wait for

0
spinner_text str

Spinner text to display while waiting

'Waiting'
spinner_complete_text str

Spinner text to display when target status has been detected

'Complete'
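Waiting for a status is essentially a polling loop. The sketch below assumes a caller-supplied get_status callable in place of the SDK's status() call, omits the spinner, and adds a poll interval and timeout that are not documented parameters. SUCCEEDED, FAILED, and RUNNING are standard AWS Batch job states; grouping them into the sets used by wait_done() and wait_running_or_done() follows the descriptions of those methods.

```python
import time
from typing import Callable, Iterable

# AWS Batch job states matching the "terminal" and "running or terminal" descriptions.
DONE = {"SUCCEEDED", "FAILED"}
RUNNING_OR_DONE = {"RUNNING"} | DONE


def wait_for_status(
    get_status: Callable[[], str],
    job_statuses: Iterable[str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll get_status() until it returns one of job_statuses (a sketch, no spinner)."""
    targets = set(job_statuses)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in targets:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Simulated status sequence standing in for repeated status() calls.
statuses = iter(["SUBMITTED", "RUNNABLE", "RUNNING", "SUCCEEDED"])
print(wait_for_status(lambda: next(statuses), RUNNING_OR_DONE, poll_seconds=0))  # RUNNING
```

Passing DONE instead of RUNNING_OR_DONE gives behaviour analogous to wait_done().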

wait_done(job_index=0)

Wait for Analysis job with index job_index to reach a terminal state (succeeded or failed).

Parameters:

Name Type Description Default
job_index int

Analysis job index, defaults to 0

0

wait_running_or_done(job_index=0)

Wait for Analysis job with index job_index to reach a running or terminal state (succeeded or failed).

Parameters:

Name Type Description Default
job_index int

Analysis job index, defaults to 0

0

gencove_explorer.analysis.GlobalConfig

Object for storing arbitrary user-defined attributes.

Examples:

from gencove_explorer.analysis import GlobalConfig

global_config = GlobalConfig(
    sample_id="ExampleSampleID",
    kmer_length=5
)

gencove_explorer.analysis.JobDefinition dataclass

Class for defining a job's resources and configuration.

Parameters:

Name Type Description Default
cpu int

Number of vCPUs to allocate to job

required
memory_mb int

Amount of memory, in megabytes (MB), to allocate to the job

required
timeout_seconds int

Amount of time in seconds before job times out

3600

gencove_explorer.analysis.AnalysisContext dataclass

Object for accessing data related to an Analysis job.

Examples:

from gencove_explorer.analysis import AnalysisContext


def my_function(ac: AnalysisContext):
    print(ac.input)

child_prefix property

Returns the prefix to which this job's results are written. The batch (array) index determines the prefix within the job's outputs prefix.

e.g. s3://gencove-explorer-1111/users/2222/jobs/3333/outputs/0

outputs_prefix property

Returns the prefix under which the outputs of the whole job group are written.

e.g. s3://gencove-explorer-1111/users/2222/jobs/3333/outputs
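Judging from the two example paths, child_prefix is the outputs prefix with the batch index appended. The helper below is a hypothetical illustration of that relationship, not an SDK function; the bucket, user, and job IDs are the example values shown above.

```python
def child_prefix(outputs_prefix: str, batch_index: int) -> str:
    """Hypothetical helper: per-job prefix = outputs prefix + '/' + batch index."""
    # Strip any trailing slash so the result never contains '//'.
    return f"{outputs_prefix.rstrip('/')}/{batch_index}"


outputs = "s3://gencove-explorer-1111/users/2222/jobs/3333/outputs"
print(child_prefix(outputs, 0))
# s3://gencove-explorer-1111/users/2222/jobs/3333/outputs/0
```

Under this layout, each child of an array job writes to its own numbered subfolder of the shared outputs prefix.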

dependency(dep_name)

Returns a single dependency of the Job, looked up by name.

Parameters:

Name Type Description Default
dep_name str

Name of the function or Analysis Job.

required

Returns:

Type Description
Optional[DependencyContainer]

A single dependency object, or None if no dependency with that name is found.

gencove_explorer.job_manager.JobManager dataclass

Class for browsing and retrieving previously run Analysis jobs

Examples:

from gencove_explorer.job_manager import JobManager

job_manager = JobManager()
job_manager.list_jobs()

get_analysis(job_id)

Restore a previously run Analysis object by ID

Parameters:

Name Type Description Default
job_id str

Analysis job ID; can be retrieved with the list_jobs() method

required

list_jobs(since=None, date=None, name=None)

List user analysis job IDs. Can optionally be filtered by relative time, date, or analysis name.

Parameters:

Name Type Description Default
since Optional[str]

From what time to begin displaying jobs. By default, all jobs are displayed. The value must be a relative time: provide a number and a single unit. Supported units are m (minutes), h (hours), d (days), and w (weeks).

None
date Optional[str]

Filter jobs to this date, e.g. '2023-04-28' or '2023-05'

None
name Optional[str]

Filter jobs by analysis name value (case-insensitive)

None

Returns:

Type Description
List[str]

List of analysis job IDs
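The date examples suggest that a partial date such as '2023-05' selects a whole month. The sketch below assumes prefix matching on ISO-formatted run dates; the function name, the job IDs, and the dict of run dates are hypothetical stand-ins for whatever JobManager stores internally.

```python
from datetime import date
from typing import Dict, List


def filter_jobs_by_date(jobs: Dict[str, date], date_filter: str) -> List[str]:
    """Keep job IDs whose run date starts with date_filter ('YYYY-MM-DD' or 'YYYY-MM')."""
    # Assumption: filtering is a prefix match on the ISO date string.
    return [job_id for job_id, run_date in jobs.items()
            if run_date.isoformat().startswith(date_filter)]


jobs = {
    "job-a": date(2023, 4, 28),
    "job-b": date(2023, 5, 2),
    "job-c": date(2023, 5, 30),
}
print(filter_jobs_by_date(jobs, "2023-05"))  # ['job-b', 'job-c']
```

A full date such as '2023-04-28' narrows the result to jobs run on that single day.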

gencove_explorer.models.File dataclass

Object for interacting with local or remote files.

Please see the File docs for more information.

Parameters:

Name Type Description Default
path_s3 InitVar[str]

S3 path to related file

None
name InitVar[str]

Unique name for file

None
url InitVar[str]

Remote URL for file

None
path_local InitVar[Path | str]

Local path for file

None
org_shared InitVar[bool]

Share file across the organization

False

upload(force=False)

Uploads the file to the user's S3 storage.

Notes
  • Requires self.path_local to be set
  • self.name can optionally be set

By default, will raise an exception if the supplied name is already present in the user prefix. This behaviour can be overridden with the force parameter.

Parameters:

Name Type Description Default
force bool

If True, skips the name check and may overwrite an existing file

False

Returns:

Type Description
File

self File object
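The name check performed by upload() can be modelled with an in-memory store. This is a sketch only: a dict stands in for the user's S3 prefix, and both upload_to_store and NameExistsError are hypothetical names; the exception type the SDK actually raises is not documented here.

```python
from typing import Dict


class NameExistsError(Exception):
    """Hypothetical error: the name is already present in the (simulated) user prefix."""


def upload_to_store(store: Dict[str, bytes], name: str, data: bytes, force: bool = False) -> str:
    """Sketch of the documented name check, with a dict standing in for S3."""
    if name in store and not force:
        # Default behaviour: refuse to reuse an existing name.
        raise NameExistsError(f"{name!r} already exists; pass force=True to overwrite")
    # force=True skips the check and overwrites any existing entry.
    store[name] = data
    return name


store: Dict[str, bytes] = {}
upload_to_store(store, "reads.fastq", b"v1")
upload_to_store(store, "reads.fastq", b"v2", force=True)
print(store["reads.fastq"])  # b'v2'
```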

as_local(path_local=None, force=False)

Either retrieves a file from remote storage and downloads to local storage, or generates an empty path that can be written to.

Parameters:

Name Type Description Default
path_local Optional[Path | str]

Path to download file to

None
force bool

Overwrite file at path_local if it already exists

False

Returns:

Type Description
Optional[Path]

Path to file

as_url()

If the file is available remotely, generates a URL for accessing its contents.

Returns:

Type Description
str

URL as string

gencove_explorer

Gencove Explorer package.

gencove_explorer.s3_path_user()

Returns user's private S3 path.

Data in this path is not accessible to other members of the organization or to anyone outside it.

Returns:

Type Description
str

S3 path of the User

gencove_explorer.s3_path_shared_org()

Returns organization's shared S3 path.

Data stored here can be accessed by all members of the organization.

Returns:

Type Description
str

S3 path of the Organization