
Introduction

Welcome to the Gencove API docs!

The Gencove REST API makes it easy to:

  1. upload low-pass sequencing FASTQ files to the Gencove analysis pipeline
  2. download analysis results
  3. track sample status
  4. automate data delivery.

Read on to get started and try out the examples on the side along the way.

Additional documentation is available here:

  - API reference for publicly available endpoints: API reference
  - Command-line interface (CLI) tool reference: CLI reference

Gencove data

Genomic data is organized into “projects”. Each project contains “samples”. Each sample has an id (generated by Gencove) and client_id (provided to Gencove by clients).

In most cases, a user account and project will be created for you by our team.

If you would like to explore the Gencove data delivery dashboard, you can set up an account as follows:

  1. Create a free Gencove account using the dashboard
  2. Create a project by going to My Projects -> Add new project

The Gencove CLI

The Gencove command-line interface (CLI) can be used to easily access the API.

It is mostly used for:

  1. Uploading FASTQ files for analysis
  2. Downloading analysis results

Quickstart

$ pip install gencove
$ gencove upload <local-directory-path>

Install the Gencove CLI using the Python package manager pip and upload files to your Gencove account.

For more detailed installation instructions, please see the Installation section below.


Setup

Installation

$ pip install gencove

The Gencove CLI can be installed using the Python package manager pip. The source code is mirrored to a public repository on GitHub.

Python 3 and pip are commonly available on many operating systems. In case you do need to install Python 3, straightforward instructions are available here.

In production environments, we highly recommend using virtualenv and/or virtualenvwrapper for installing the Gencove CLI in a dedicated Python environment.

Python 2 end of life

Due to Python 2 reaching its end of life, Gencove migrated towards supporting exclusively Python 3 for the CLI.

We understand this may be disruptive to existing user workflows and Gencove will take the following steps to make transitioning as easy as possible:

Mac OS notes

Due to a known issue with Python that ships with Mac OS, the Gencove CLI should be installed in the user’s home directory (not system-wide) as follows: pip install --user gencove. Make sure to have ~/bin present in your $PATH environment variable.

For advanced users, we highly recommend virtualenvwrapper and installing the Gencove CLI within a dedicated virtualenv.

If you absolutely must install the Gencove CLI system-wide using sudo, the following command can be used as a last resort: sudo pip install gencove --ignore-installed six.

Configuration

$ export GENCOVE_EMAIL='<your-email>'
$ export GENCOVE_PASSWORD='<your-password>'
$ export GENCOVE_API_KEY='<your-api-key>'

Your credentials can be provided to the Gencove CLI via environment variables:

Please note that you cannot use $GENCOVE_EMAIL+$GENCOVE_PASSWORD and $GENCOVE_API_KEY at the same time.
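The rule above can be checked before invoking the CLI. The following is a minimal sketch, not part of the Gencove CLI: the function name `credential_mode` and the error messages are hypothetical, and it only inspects the three environment variables named above.

```python
import os


def credential_mode(env=None):
    """Return "api_key" or "email_password", or raise if the setup is invalid.

    Hypothetical helper: it mirrors the documented rule that the API key and
    the email/password pair cannot be used at the same time.
    """
    env = os.environ if env is None else env
    has_key = bool(env.get("GENCOVE_API_KEY"))
    has_email = bool(env.get("GENCOVE_EMAIL"))
    has_password = bool(env.get("GENCOVE_PASSWORD"))

    if has_key and (has_email or has_password):
        raise ValueError(
            "Set either GENCOVE_API_KEY or GENCOVE_EMAIL/GENCOVE_PASSWORD, not both"
        )
    if has_key:
        return "api_key"
    if has_email and has_password:
        return "email_password"
    raise ValueError("No complete set of Gencove credentials found")
```

Calling `credential_mode()` with no arguments inspects the current process environment; passing a dictionary makes the check easy to test.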

$ curl -H "Authorization: Api-Key <your-api-key>" https://api.gencove.com/api/v2/projects/
import requests

r = requests.get(
  "https://api.gencove.com/api/v2/projects/",
  headers={"Authorization": "Api-Key <your-api-key>"}
)

API keys can also be used to authenticate with the API directly by setting the Authorization HTTP header to Api-Key <your-api-key>.

Uploading FASTQ files

To enable FASTQ uploads for your account, log into your account and go to My FASTQs, where instructions will be provided (in case you do not already have access). You can expect a response from Gencove support within 24h.

Once uploads are enabled, users can upload files to the Gencove upload area using the Gencove CLI and assign the files to projects using the Gencove Dashboard. Once files are assigned to a project, they will be processed by the Gencove analysis pipeline. Analysis results will be available via the Gencove API and Dashboard once analysis is complete.

File naming convention

We highly recommend using the standard Illumina naming convention for FASTQ files. If files are named in this manner, Gencove systems will automatically detect:

  1. the sample identifier (and use it as the sample’s client_id)
  2. R1/R2 designations of files

A summary of the naming convention is:

SAMPLE ID + _ + … + _ + (R1 or R2) + _ + … + .fastq.gz

For example, the table below shows file names that follow this convention, together with the detected sample identifiers and read designations:

File name                              Sample ID  Read pair
SAMPLE1_R1.fastq.gz                    SAMPLE1    R1
SAMPLE1_R2.fastq.gz                    SAMPLE1    R2
SAMPLE2_LANE1_SEQUENCER1_R1.fastq.gz   SAMPLE2    R1
SAMPLE3_R1_L001.fastq.gz               SAMPLE3    R1
SAMPLE4_R1.fq.gz                       SAMPLE4    R1
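To preview how file names would be interpreted before uploading, the convention can be approximated locally. This is a simplified sketch and not Gencove's actual detection logic: it assumes the sample identifier is everything before the first underscore, that exactly one underscore-delimited `R1` or `R2` token is present, and that both `.fastq.gz` and `.fq.gz` suffixes are accepted (as the table suggests). The function name `parse_fastq_name` is hypothetical.

```python
import re

# Accept both .fastq.gz and .fq.gz, as in the examples above.
_SUFFIX = re.compile(r"\.(fastq|fq)\.gz$")


def parse_fastq_name(file_name):
    """Return (sample_id, read) or None if the name does not fit the convention."""
    match = _SUFFIX.search(file_name)
    if match is None:
        return None
    # Drop the suffix and split the remainder on underscores.
    tokens = file_name[:match.start()].split("_")
    sample_id, rest = tokens[0], tokens[1:]
    reads = [token for token in rest if token in ("R1", "R2")]
    # Require a non-empty sample ID and an unambiguous read designation.
    if not sample_id or len(reads) != 1:
        return None
    return sample_id, reads[0]
```

Names that return `None` here are candidates for an explicit mapping file instead of automatic detection.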

Custom file names

client_id,r_notation,path
<sample_id_1>,<r_notation_1>,<path_to_fastq_file_1>
<sample_id_2>,<r_notation_2>,<path_to_fastq_file_2>
<sample_id_3>,<r_notation_3>,<path_to_fastq_file_3>
...

To bypass the default convention outlined above and explicitly specify sample identifiers and R1/R2 designations for FASTQ files, a file ending with .fastq-map.csv can be provided as the SOURCE to the gencove upload command. The format of the file is outlined in the code snippet on the right.
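A map file in this format can be generated with a short script. The sketch below uses Python's standard `csv` module; the function name `build_fastq_map` is hypothetical, and restricting `r_notation` to lowercase `r1`/`r2` is an assumption based on the examples in this document rather than a documented rule.

```python
import csv
import io


def build_fastq_map(rows):
    """Build CSV text from (client_id, r_notation, path) tuples."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["client_id", "r_notation", "path"])
    for client_id, r_notation, path in rows:
        # Assumption: only "r1" and "r2" are valid read designations.
        if r_notation not in ("r1", "r2"):
            raise ValueError("unexpected r_notation: {!r}".format(r_notation))
        writer.writerow([client_id, r_notation, path])
    return buffer.getvalue()
```

Write the returned text to a file whose name ends with `.fastq-map.csv` and pass that file as the SOURCE to `gencove upload`.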

The following validation is performed on the .fastq-map.csv file:

Grouping files

By default, Gencove systems expect one pair of FASTQ files per sample.

client_id,r_notation,path
sampleid1,r1,sample1_part1_r1.fastq.gz
sampleid1,r1,sample1_part2_r1.fastq.gz
sampleid1,r1,sample1_part3_r1.fastq.gz
sampleid1,r2,sample1_part1_r2.fastq.gz
sampleid1,r2,sample1_part2_r2.fastq.gz
sampleid1,r2,sample1_part3_r2.fastq.gz

If sequencing reads for a single sample are spread across multiple FASTQ files, they need to be merged into one R1 file and one R2 file. This can be accomplished in several ways:

  1. Listing multiple files for the same client_id and r_notation in the .fastq-map.csv file (outlined in previous section) results in the files being concatenated on the fly during upload with the Gencove CLI - see example in code snippet on the right.
  2. Manually concatenate the files. Since gzip-compressed files can be merged without decompressing, it’s simply a matter of concatenating the compressed files.
  3. By providing the --no-lane-splitting flag to bcl2fastq, splitting reads into multiple FASTQ files can be avoided upstream in the demultiplexing phase.
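For the manual option, gzip files can be merged by appending their raw bytes, because a sequence of gzip members is itself a valid gzip stream. The sketch below illustrates this with the standard library; the function names are hypothetical, and the record count assumes the usual four lines per FASTQ record.

```python
import gzip
import shutil


def concatenate_gzip(parts, merged_path):
    """Append the raw bytes of each gzip file in `parts` to `merged_path`."""
    with open(merged_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as source:
                # No decompression happens here; bytes are copied as-is.
                shutil.copyfileobj(source, merged)


def count_fastq_records(path):
    """Count records in a gzip-compressed FASTQ file (four lines per record)."""
    with gzip.open(path, "rb") as handle:
        return sum(1 for _ in handle) // 4
```

Comparing `count_fastq_records` on the merged file with the sum over the parts is a cheap sanity check. Keep the R1 and R2 parts in the same order so that read pairs stay aligned.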

Uploading using the CLI

gencove upload <source-path> [<destination-path>]

Syncs local directories to directories in your Gencove upload area. Recursively copies new and updated files from the source directory to the destination. Only creates folders in the destination if they contain one or more files.

$ gencove upload my-fastq-files/

The example command will recursively copy all files in the my-fastq-files/ directory on your host system to a directory with an automatically generated name in the Gencove upload area.

$ gencove upload input.fastq-map.csv

If there are multiple input FASTQ files per sample, or the file names do not follow the conventions described above, a manifest describing the relationship between the sample identifiers and the input FASTQ files must be provided in a CSV file in the format described above.

$ gencove upload my-fastq-files/ gncv://my-fastq/batch-1/

In case more control is needed over the upload destination, a destination path prefixed with gncv:// may be provided. This pattern is commonly used for separating upload batches when continuously uploading data to your Gencove account and is useful for easily filtering files in the Gencove Dashboard. A common directory structure for batching uploads is:

gncv://<project-name>/<batch-name>/

Details of upload behavior:

Automatically starting analysis

$ gencove upload my-fastq-files/ gncv://my-fastq/batch-1/ --run-project-id b1edbb20-ee77-4be0-9944-e8e3a593cc83

To automatically assign uploads to a project and run analysis, provide the --run-project-id flag and destination project id to the Gencove CLI.

When this feature is used, the Gencove CLI will check to make sure that contents of SOURCE and DESTINATION are identical in order to avoid analysis of unwanted samples. This will always be the case if DESTINATION is omitted, i.e., autogenerated by the Gencove CLI.

It is also important to ensure uploaded files follow naming conventions outlined above to avoid sample identifier detection issues.
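When uploads are scripted, the command line can be assembled programmatically. The sketch below only uses the arguments shown in this section (`<source-path>`, an optional `gncv://` destination, and `--run-project-id`); the helper names are hypothetical, and it assumes the `gencove` executable is on the `PATH`.

```python
import subprocess


def build_upload_command(source, destination=None, run_project_id=None):
    """Return the argument list for a `gencove upload` invocation."""
    command = ["gencove", "upload", source]
    if destination is not None:
        if not destination.startswith("gncv://"):
            raise ValueError("destination must start with gncv://")
        command.append(destination)
    if run_project_id is not None:
        command.extend(["--run-project-id", run_project_id])
    return command


def run_upload(source, destination=None, run_project_id=None):
    """Run the upload and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(
        build_upload_command(source, destination, run_project_id), check=True
    )
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.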

Downloading deliverables

Gencove provides a number of deliverables for each sample that is processed as part of a project. In case a sample fails processing due to quality control, only the original input files are provided as deliverables.

Downloading using the CLI

$ gencove download . --project-id my-project-id

gencove download <local-destination-path> --project-id <project-id>

Downloads all deliverables for all samples in the specified project, with the following default naming scheme:

<local-destination-path>/<client-id>/<gencove-id>/<gencove-id>_<file-type>.<file-extension>

This naming scheme reflects the fact that uniqueness of client-ids is not enforced, while uniqueness of gencove-id is enforced.

Customizing download naming scheme

$ gencove download . --project-id my-project-id --download-template '{client_id}.{file_extension}'

The default naming scheme outlined above can be customized by providing the --download-template flag and a custom file naming template that may contain {client_id}, {gencove_id}, {file_type}, {file_extension} and {default_filename} tokens.

When using this feature, make sure to specify download templates that result in unique filenames across all samples.
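A template can be checked for collisions before starting a download. The sketch below assumes that token substitution behaves like Python's `str.format`, which is an assumption about the CLI rather than documented behavior; the function name and the sample records are hypothetical.

```python
from collections import Counter


def find_colliding_names(template, deliverables):
    """Return file names that the template would produce more than once.

    `deliverables` is a list of dicts whose keys match the template tokens,
    e.g. client_id, gencove_id, file_type and file_extension.
    """
    names = [template.format(**deliverable) for deliverable in deliverables]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```

For example, two samples that share a `client_id` collide under `'{client_id}.{file_extension}'` but not under a template that includes `{gencove_id}`.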

The {default_filename} token provides access to the API’s default file naming scheme, which takes into account different bioinformatics conventions across a subset of file types. Current exceptions to the default {gencove_id}_{file_type}.{file_extension} scheme are:

Continuing previous downloads

When downloading, a file on the local filesystem is not overwritten if it has the same size in bytes as the file that would be downloaded. This behavior can be disabled with the --no-skip-existing flag.
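The size-based check described above can be expressed as follows. This is an illustration of the documented rule, not the CLI's implementation; the function name and the `skip_existing` parameter are hypothetical.

```python
import os


def should_skip_download(local_path, remote_size_bytes, skip_existing=True):
    """Return True if the local file exists and matches the remote size."""
    if not skip_existing:
        # Corresponds to running the CLI with --no-skip-existing.
        return False
    return (
        os.path.isfile(local_path)
        and os.path.getsize(local_path) == remote_size_bytes
    )
```

Because only sizes are compared, a local file that was modified without changing its length would still be skipped.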

Downloading subsets of deliverables

$ gencove download . --sample-ids sample-id-1,sample-id-2,sample-id-3 --file-types impute-vcf,impute-tbi

Behavior of the download can also be tweaked in the following manner:

  1. Download only a specific set of sample ids by providing the --sample-ids flag instead of the --project-id flag
  2. Download only a specific set of file types by providing the --file-types flag. Currently available file types are listed below (not all file types may be available for every project).

The Gencove Archive

gencove projects restore-samples my-project-id --sample-ids sample-id-1,...,sample-id-N

The Gencove Archive automatically transitions samples older than 30 days from hot storage to the Archive. Once a sample is in the Archive, its deliverables are not immediately available for download. Instead, users need to explicitly restore them from the Archive using the Gencove web dashboard, command-line interface (CLI), or API. Sample restoration can take up to 50 hours. Upon restoration, sample deliverables are available for download for 12 days, after which they return to the Archive.

Note that default views in the Gencove web dashboard and CLI only display samples that are immediately available for download. To view archived samples, set the view filter to either:

Listing projects, samples and uploads

Listing projects

$ gencove projects list

All projects can be listed using the gencove projects list command.

Listing project samples

$ gencove projects list-samples my-project-id

All samples can be listed using the gencove projects list-samples command.

$ gencove projects list-samples my-project-id --status completed
$ gencove projects list-samples my-project-id --search my-client-id

Project samples can also be filtered by status and searched. A metadata substring can also be used as the search query.

Listing uploads

$ gencove uploads list

Uploads can be listed using the gencove uploads list command.

$ gencove uploads list --status assigned
$ gencove uploads list --search gncv://upload/path

Uploads can also be filtered by status and searched.

Sample metadata and files

Gencove supports assigning metadata to a sample in JavaScript Object Notation (JSON) format.

Information commonly stored as sample metadata:

Each sample has many different files assigned to it that can be retrieved using the CLI.

The following CLI commands can be used to set and get metadata:

Assigning sample metadata

$ gencove samples set-metadata my-sample-id --json '{"example-key": "example-value"}'
$ gencove samples set-metadata my-sample-id --json '1234567'

Metadata can be assigned to a sample using the gencove samples set-metadata command. Specifying sample id and the --json flag together with a JSON string is mandatory.
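When metadata is produced by a script, serializing it with a JSON library avoids hand-written quoting mistakes. The sketch below builds the argument list for the command shown above; the function name is hypothetical and it assumes the `gencove` executable is on the `PATH` when the command is eventually run.

```python
import json


def build_set_metadata_command(sample_id, metadata):
    """Return the argument list for `gencove samples set-metadata`.

    `metadata` may be any JSON-serializable value (object, number, ...).
    """
    payload = json.dumps(metadata)  # raises TypeError if not serializable
    return ["gencove", "samples", "set-metadata", sample_id, "--json", payload]
```

The resulting list can be passed to `subprocess.run(..., check=True)` without going through a shell.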

Retrieving sample metadata

$ gencove samples get-metadata my-sample-id

Sample metadata can be retrieved by using the gencove samples get-metadata command. Optionally, --output-filename my-filename can be used to specify the filename where the metadata will be output. If not specified, metadata will be printed to stdout.

Downloading single sample file

Download and save file

$ gencove samples download-file sample-id-1 impute-vcf destination.vcf

A single sample file can be downloaded using the gencove samples download-file command.

Download and stream file to stdout

$ gencove samples download-file sample-id-1 impute-vcf -

A single sample file can be downloaded and streamed to stdout using the gencove samples download-file command.

Merged VCF file

Gencove supports generating a merged VCF file containing variant calls from all successful samples in a project.

Generating a merged VCF file is initiated from the Gencove Dashboard, by opening a project and clicking the “Merge VCFs” button. Once the merge operation is complete, a download button will appear on the project page.

Please keep in mind:

In addition to the web interface, the following CLI commands can be used to access merged VCF functionality:

Creating a merged VCF

$ gencove projects create-merged-vcf my-project-id

A merged VCF file can be created using the gencove projects create-merged-vcf command.

Checking the status of a merged VCF

$ gencove projects status-merged-vcf my-project-id

Status of the merging job can be checked using the gencove projects status-merged-vcf command.

Downloading the merged VCF

$ gencove projects get-merged-vcf my-project-id

The merged VCF file can be downloaded using the gencove projects get-merged-vcf command. Optionally, --output-filename my-filename can be used to override the default filename.

Backwards-compatible array deliverables

Backwards-compatible genotyping array deliverables can be generated for batches of samples in projects that support this functionality. Each project configuration can support multiple batch types that correspond to different array types.

More information about these deliverables is available in this blog post.

Listing batch types

$ gencove projects list-batch-types my-project-id

Available batch types for a project can be listed using the gencove projects list-batch-types command.

Creating a batch

$ gencove projects create-batch --batch-type illuminasnp50 --batch-name batch-001 --sample-ids sample-id-1,...,sample-id-N my-project-id
$ gencove projects create-batch --batch-type illuminasnp50 --batch-name batch-001 my-project-id

A new batch can be created using the gencove projects create-batch command. Omitting --sample-ids results in all samples belonging to the project being used for the batch.

Successful generation of a batch deliverable will also trigger a webhook associated with the project.

Listing project batches

$ gencove projects list-batches my-project-id

Project batches can be listed using the gencove projects list-batches command.

Downloading batch deliverable

$ gencove projects get-batch my-batch-id --output-filename batch.zip

Once the batch deliverable is generated, it is available for download using the gencove projects get-batch command.

Automated data delivery

Data delivery can be automated using subscriptions.

Subscriptions currently support real-time notifications via webhooks.

Users can specify multiple subscriptions for a project for multiple events. Once a new real-time webhook subscription is specified, events relating to that project will be submitted to the target URL in JSON format via HTTP POST requests.

Currently, both the legacy webhook format and an updated format are supported. New event types will only support the new format.

Once a webhook is received, the receiver is responsible for querying the Gencove API for more details on each object that is referenced. For example, upon receiving an analysis_complete_v2 webhook for a project, the receiver should query the Gencove API for additional sample details and fresh download URLs for deliverables related to those samples.

Available event types and webhook format

[
  {
    "event_id": "0b2d7502-86b0-863e-11e2-990d0e134e8a",
    "event_type": "analysis_complete_v2",
    "timestamp": "2021-04-09T12:01:26.346341Z",
    "payload": {
      "project": {
        "id": "1d6daca6-475a-4961-9841-57aac36cbd0f"
      },
      "samples": [
        {
          "id": "45273390-64bd-4a07-a1be-8514d3ba7750",
          "client_id": "HumanS01-001",
          "last_status": {
            "status": "succeeded"
          }
        }
      ]
    }
  }
]
[
  {
    "event_id": "bdc558af-4815-7736-8bab-9be8c5f63fff",
    "event_type": "batch_final_report_complete_v2",
    "timestamp": "2021-04-09T12:01:26.346341Z",
    "payload": {
      "project": {
        "id": "1d6daca6-475a-4961-9841-57aac36cbd0f"
      },
      "batch": {
        "id": "77799c11-aa57-4a06-aa17-978d203f1eb5",
        "name": "Batch 001",
        "last_status": {
          "status": "succeeded"
        }
      }
    }
  }
]

The content of the webhook contains a list of event objects where each event has the following keys:

The following events are available:
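A receiver typically starts by extracting the referenced object ids from the request body. The sketch below is based only on the structure of the example `analysis_complete_v2` payload shown above; the function name is hypothetical, and fields not present in that example are not assumed.

```python
import json


def succeeded_samples(body):
    """Return (project_id, sample_id) pairs for succeeded samples.

    `body` is the raw request body: a JSON list of event objects.
    Events other than analysis_complete_v2 are ignored.
    """
    pairs = []
    for event in json.loads(body):
        if event.get("event_type") != "analysis_complete_v2":
            continue
        payload = event["payload"]
        project_id = payload["project"]["id"]
        for sample in payload.get("samples", []):
            if sample.get("last_status", {}).get("status") == "succeeded":
                pairs.append((project_id, sample["id"]))
    return pairs
```

Each returned pair can then be used to query the Gencove API for sample details and fresh download URLs.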

Available legacy event types and webhook format

Please note that users are highly encouraged to use the updated format. We are planning to sunset the legacy format at a later date. Please follow our blog or our mailing list for more information about this in the future.

{
    "event": "analysis_complete",
    "object_id": "99573a16-98a8-48fc-8caf-e3b4dcdf34e6",
    "timestamp": "2018-11-18T14:09:59.741183",
    "payload": {
        "project_id": "1d6daca6-475a-4961-9841-57aac36cbd0f",
        "sample_ids": [
            "45273390-64bd-4a07-a1be-8514d3ba7750"
        ]
    }
}
{
    "event": "batch_final_report",
    "object_id": "99573a16-98a8-48fc-8caf-e3b4dcdf34e6",
    "timestamp": "2018-11-18T14:09:59.741183",
    "payload": {
        "project_id": "1d6daca6-475a-4961-9841-57aac36cbd0f"
    }
}

The content of the legacy webhook contains the following keys:

Together, object_id and event should be considered unique and duplicates should be handled by the receiver.
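Duplicate handling for legacy events can be sketched as follows. The class name is hypothetical, and the in-memory set is for illustration only; a real receiver would likely keep the seen keys in persistent storage so that duplicates are still detected after a restart.

```python
class LegacyEventDeduplicator:
    """Track (object_id, event) pairs that have already been processed."""

    def __init__(self):
        self._seen = set()

    def is_new(self, event):
        """Return True the first time a given (object_id, event) pair is seen."""
        key = (event["object_id"], event["event"])
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

A receiver would call `is_new` on each parsed webhook body and skip processing when it returns `False`.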

The following legacy events are available:

Webhook signatures

Gencove can optionally sign webhook events it sends to endpoints. This is done by including a signature in each event’s Gencove-Signature header, allowing you to verify that the events were sent by Gencove.

Before verifying signatures, webhooks need to be enabled and the secret needs to be retrieved for each project via the Gencove API (API reference). Note that each subscription has a separate unique secret.

After this setup, Gencove automatically starts signing each webhook notification it sends to the endpoint of the related project.

Verifying webhook signatures

The Gencove-Signature header contains a timestamp and a signature:

Example signature: Gencove-Signature: t=1492774577,v1=5257a869e7ecebeda32affa62cdca3fa51cad7e77a0e56ff536d0ce8e108d8bd

Gencove generates signatures using a hash-based message authentication code (HMAC) with SHA-512. To prevent downgrade attacks, you should ignore all schemes that are not v1.

export SECRET='super-secret'
export HEADER='t=1492774577,v1=5257a869e7ecebeda32affa62cdca3fa51cad7e77a0e56ff536d0ce8e108d8bd'
export PAYLOAD='{"k":"v"}'
gencove webhooks verify "$SECRET" "$HEADER" "$PAYLOAD"
import hmac, hashlib

def calculate_signature(secret, timestamp, payload):
    signature_message = "{}.{}".format(timestamp, payload).encode("utf-8")
    return hmac.new(
        secret.encode("utf-8"),
        signature_message,
        hashlib.sha512
    ).hexdigest()

There are multiple ways to verify a signature: with the command available in the Gencove CLI, directly from the Python console, or with any other tool that can generate an HMAC with SHA-512.

Step 1: Extract the timestamp and signatures from the header

Split the header, using the , character as the separator, to get a list of elements. Then split each element, using the = character as the separator, to get a prefix and value pair.

The value for the prefix t corresponds to the timestamp, and v1 corresponds to the signature.

Step 2: Prepare signature_message

This is achieved by concatenating:

  1. The timestamp (as a string)
  2. The character .
  3. The actual JSON payload (i.e., the request’s body)

Step 3: Determine the expected signature

Compute an HMAC with the SHA512 hash function. Use the endpoint’s signing secret as the key and the signature_message string as the message.

Step 4: Compare signatures

Compare the signature in the header to the expected signature. If a signature matches, compute the difference between the current timestamp and the received timestamp, then decide if the difference is within your tolerance.
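The four steps can be combined into a single check. The sketch below follows the steps as described, using HMAC with SHA-512; the function names, the 300-second default tolerance, and the `now` parameter (useful for testing) are assumptions and not part of the Gencove API. The payload must be the raw request body, exactly as received.

```python
import hashlib
import hmac
import time


def parse_signature_header(header):
    """Step 1: split the header into the timestamp string and the v1 signature."""
    elements = dict(part.split("=", 1) for part in header.split(","))
    # Schemes other than v1 are ignored.
    return elements["t"], elements["v1"]


def verify_webhook(secret, header, payload, tolerance_seconds=300, now=None):
    """Return True if the signature matches and the timestamp is recent enough."""
    timestamp, received = parse_signature_header(header)
    # Step 2: signature_message is "<timestamp>.<payload>".
    message = "{}.{}".format(timestamp, payload).encode("utf-8")
    # Step 3: HMAC-SHA512 with the signing secret as the key.
    expected = hmac.new(secret.encode("utf-8"), message, hashlib.sha512).hexdigest()
    # Step 4: constant-time comparison, then the timestamp tolerance check.
    if not hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8")):
        return False
    now = time.time() if now is None else now
    return abs(now - int(timestamp)) <= tolerance_seconds
```

Using `hmac.compare_digest` instead of `==` avoids leaking timing information about how much of the signature matched.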

Testing environment

Developers may use the Gencove staging environment for development and testing.

The staging developer website URL is: https://web-stage.gencove.com

The staging API URL is: https://api-stage.gencove.com

Data analysis configurations

Each Gencove project is pinned to a ‘configuration’ that specifies the species, reference datasets (e.g. a reference genome and haplotype reference panel), and specific deliverables. These configurations can be private to a specific set of individuals, or public. The datasets underlying the public configurations are as follows:

Permissions

The Gencove permissions system is a role-based access control (RBAC) system. This means that users are assigned roles (e.g., “Owner”, “Manager”, etc.) and each role is assigned a set of permissions.

Organization-level role permissions

Member Uploader Viewer Analyst Manager Owner
Update user profile
Upload data
View all projects
Run sample analysis
Create and edit projects
Invite and manage users

Project-level role permissions

In addition to assigning roles to users at the organization level, a subset of the roles listed above may also be assigned to users at the project level as needed. This enables providing users with basic access to the organization via Member or Uploader roles and escalating privileges for a subset of projects as needed.

To assign a project-level role to a user, the user must have already joined the organization via the standard invitation process.

FAQ

Which browsers does Gencove support?

The Gencove dashboard supports Safari, Chrome, Firefox, and Edge. Generally, any WebKit-based browser should work without issues. In case you encounter compatibility issues, please let us know at support@gencove.com.

How long does it take to get my results back?

Results are returned within 6-8 weeks from the time samples arrive in our lab. Let us know if you have special requirements for turnaround time and we’ll work with you to make sure your samples are done on time!

How do you deliver the resulting data?

Data can be accessed via the Gencove data management website or REST API. For downloading data in bulk, we recommend using the API or our command-line tool.

Do you have an API?

Yes, the API can be used to track sample status and automate data delivery. Check out the docs above on this page.

Which sequencing machines do you use for low-pass sequencing?

We always use the latest Illumina and BGI sequencing machines. Depending on the specifics of your project, we’ll use Illumina (NovaSeq, HiSeq X, NextSeq) or BGI (DNBseq) sequencers.

API Reference

The full API reference for publicly available endpoints is available here: API reference

CLI Reference

The full CLI reference is available here: CLI reference

Terms

We reserve the right to remove your access to our API for any reason at our sole discretion.

Support

Contact us at support@gencove.com