Introduction

Welcome to the Gencove API docs!

The Gencove REST API makes it easy to:

  1. upload low-pass sequencing FASTQ files to the Gencove analysis pipeline
  2. download analysis results
  3. track sample status
  4. automate data delivery.

Read on to get started and try out the examples on the side along the way.

Additional documentation is available here:

  - API reference for publicly available endpoints: API reference
  - Command-line interface (CLI) tool reference: CLI reference

Gencove data

Genomic data is organized into “projects”. Each project contains “samples”. Each sample has an id (generated by Gencove) and a client_id (provided to Gencove by clients).

In most cases, a user account and project will be created for you by our team.

If you would like to explore the Gencove data delivery dashboard yourself, you can set up an account as follows:

  1. Create a free Gencove account using the dashboard
  2. Create a project by going to My Projects -> Add new project

The Gencove CLI

The Gencove command-line interface (CLI) can be used to easily access the API.

It is mostly used for:

  1. Uploading FASTQ files for analysis
  2. Downloading analysis results

Quickstart

$ pip install gencove
$ gencove upload <local-directory-path>

Install the Gencove CLI using the Python package manager pip and upload files to your Gencove account.

Hint: for the newest pre-release versions, check: PyPI

Setup

Installation

$ pip install gencove

The Gencove CLI can be installed using the Python package manager pip. The source code is available on GitLab.

Python and pip come preinstalled on most Mac and Linux systems. If you do need to install Python, commonly used instructions are available here.

In production environments, we highly recommend using virtualenv and/or virtualenvwrapper for installing the Gencove CLI in a dedicated Python environment.

Mac OS notes

Due to a known issue with Python that ships with Mac OS, the Gencove CLI should be installed in the user’s home directory (not system-wide) as follows: pip install --user gencove. Make sure to have ~/bin present in your $PATH environment variable.

For advanced users, we highly recommend virtualenvwrapper and installing the Gencove CLI within a dedicated virtualenv.

If you absolutely must install the Gencove CLI system-wide using sudo, the following command can be used as a last resort: sudo pip install gencove --ignore-installed six.

Configuration

$ export GENCOVE_EMAIL='<your-email>'
$ export GENCOVE_PASSWORD='<your-password>'

Your credentials can be provided to the Gencove CLI via the environment variables $GENCOVE_EMAIL and $GENCOVE_PASSWORD.
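When the CLI is driven from a script, it can help to confirm that both variables are set before starting a long upload or download. The following is a minimal Python sketch of such a check. The helper name missing_credentials is hypothetical and is not part of the Gencove CLI; only the two variable names come from the text above.

```python
import os

# Variable names taken from the configuration section above.
REQUIRED_VARS = ("GENCOVE_EMAIL", "GENCOVE_PASSWORD")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the current process environment; a plain dict can be
    passed instead, which makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Set these variables before running the CLI:", ", ".join(missing))
    else:
        print("Gencove credentials found in the environment.")
```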

Uploading FASTQ files

To enable FASTQ uploads for your account, log in and go to My FASTQs, where instructions are provided if you do not already have access. You can expect a response from Gencove support within 24 hours.

Once uploads are enabled, users can upload files to the Gencove upload area using the Gencove CLI and assign the files to projects using the Gencove Dashboard. Once files are assigned to a project, they will be processed by the Gencove analysis pipeline. Analysis results will be available via the Gencove API and Dashboard once analysis is complete.

Naming files

We highly recommend using the standard Illumina naming convention for FASTQ files. If files are named in this manner, Gencove systems will automatically detect:

  1. the sample identifier (and use it as the sample’s client_id)
  2. R1/R2 designations of files

A summary of the naming convention is:

SAMPLE ID + _ + … + _ + (R1 or R2) + _ + … + .fastq.gz

The table below shows example file names that follow this convention, along with the detected sample identifiers and read designations.

File name                               Sample ID   Read pair
SAMPLE1_R1.fastq.gz                     SAMPLE1     R1
SAMPLE1_R2.fastq.gz                     SAMPLE1     R2
SAMPLE2_LANE1_SEQUENCER1_R1.fastq.gz    SAMPLE2     R1
SAMPLE3_R1_L001.fastq.gz                SAMPLE3     R1
SAMPLE4_R1.fq.gz                        SAMPLE4     R1
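To check file names locally before uploading, the convention can be approximated with a regular expression. This is only a sketch and not Gencove's actual detection logic. It assumes the sample identifier is everything before the first underscore, and that the read designation is an R1 or R2 token bounded by underscores or the file extension. The function name parse_fastq_name is hypothetical.

```python
import re

# Assumed pattern, inferred from the naming summary and the example table:
#   <sample id>_[..._]<R1|R2>[_...].fastq.gz   (".fq.gz" is also accepted)
_FASTQ_NAME = re.compile(
    r"^(?P<sample_id>[^_]+)_(?:.*_)?(?P<read>R[12])(?:_.*)?\.(?:fastq|fq)\.gz$"
)


def parse_fastq_name(file_name):
    """Return (sample_id, read) for a conforming name, or None otherwise."""
    match = _FASTQ_NAME.match(file_name)
    if match is None:
        return None
    return match.group("sample_id"), match.group("read")


if __name__ == "__main__":
    for name in (
        "SAMPLE1_R1.fastq.gz",
        "SAMPLE2_LANE1_SEQUENCER1_R1.fastq.gz",
        "SAMPLE3_R1_L001.fastq.gz",
        "notes.txt",
    ):
        print(name, "->", parse_fastq_name(name))
```

Names for which the function returns None are worth renaming before upload, since automatic detection of the client_id and read pair may not work for them.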

Grouping files

Gencove software currently supports one pair of FASTQ files per sample. If you have reads spread across multiple files (e.g., from multiple sequencing lanes), they need to be merged into one R1 file and one R2 file. Luckily, gzipped files can be easily merged without decompressing, so it’s simply a matter of concatenating the compressed files.
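The concatenation described above can be sketched in Python as follows. It relies on the fact that a gzip file may consist of several compressed members placed back to back, so appending the raw bytes of one .gz file to another yields a valid .gz file. The function name merge_gzip_files and the file names in the comment are hypothetical.

```python
import shutil


def merge_gzip_files(part_paths, merged_path):
    """Concatenate gzipped files byte-for-byte, in the order given.

    No decompression takes place; each input becomes one gzip member
    of the merged output file.
    """
    with open(merged_path, "wb") as merged:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                shutil.copyfileobj(part, merged)


# Example call (hypothetical lane files for one sample):
#   merge_gzip_files(
#       ["SAMPLE1_L001_R1.fastq.gz", "SAMPLE1_L002_R1.fastq.gz"],
#       "SAMPLE1_R1.fastq.gz",
#   )
```

Merge the R1 files and the R2 files separately, and list the lanes in the same order for both, so that read pairs stay in matching positions across the two merged files.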

Splitting files can also be avoided upstream in the demultiplexing phase by providing the --no-lane-splitting flag to bcl2fastq.

Uploading using the CLI

gencove upload <source-path> [<destination-path>]

Syncs local directories to directories in your Gencove upload area. Recursively copies new and updated files from the source directory to the destination. Only creates folders in the destination if they contain one or more files.

$ gencove upload my-fastq-files/

The example command will recursively copy all files in the my-fastq-files/ directory on your host system to a directory with an automatically generated name in the Gencove upload area.

$ gencove upload my-fastq-files/ gncv://my-fastq/batch-1/

In case more control is needed over the upload destination, a destination path prefixed with gncv:// may be provided. This pattern is commonly used for separating upload batches when continuously uploading data to your Gencove account and is useful for easily filtering files in the Gencove Dashboard. A common directory structure for batching uploads is:

gncv://<project-name>/<batch-name>/
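For scripted uploads, the destination can be assembled from a project name and a batch name. The sketch below only builds the argument list for the upload command shown above; it does not run anything. The helper names upload_destination and upload_command are hypothetical, and rejecting a "/" inside a name is an assumption made here to keep the two-level structure intact.

```python
def upload_destination(project_name, batch_name):
    """Build a destination of the form gncv://<project-name>/<batch-name>/."""
    for part in (project_name, batch_name):
        # Assumption: a "/" inside a name would add an unintended folder level.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"gncv://{project_name}/{batch_name}/"


def upload_command(source_path, destination=None):
    """Return the arguments for: gencove upload <source-path> [<destination-path>]."""
    command = ["gencove", "upload", source_path]
    if destination is not None:
        command.append(destination)
    return command


if __name__ == "__main__":
    destination = upload_destination("my-fastq", "batch-1")
    print(" ".join(upload_command("my-fastq-files/", destination)))
```

The resulting list can be passed to subprocess.run(command, check=True) once the Gencove CLI is installed and credentials are configured.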

Details of upload behavior:

Automatically starting analysis

$ gencove upload my-fastq-files/ gncv://my-fastq/batch-1/ --run-project-id b1edbb20-ee77-4be0-9944-e8e3a593cc83

To automatically assign uploads to a project and run analysis, provide the --run-project-id flag and destination project id to the Gencove CLI.

When utilizing this feature, make sure uploaded files are named according to the file naming convention outlined above to avoid sample identifier detection issues.

Downloading deliverables

Gencove provides a number of deliverables for each sample that is processed as part of a project. In case a sample fails processing due to quality control, only the original input files are provided as deliverables.

Downloading using the CLI

$ gencove download . --project-id my-project-id

gencove download <local-destination-path> --project-id <project-id>

Downloads all deliverables for all samples in the specified project, with the following default naming scheme:

<local-destination-path>/<client-id>/<gencove-id>/<gencove-id>_<file-type>.<file-extension>

This naming scheme reflects the fact that uniqueness of client-ids is not enforced, while uniqueness of gencove-id is enforced.
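To illustrate why the default scheme avoids collisions, here is a small Python sketch that builds the path for a single deliverable. The function name is hypothetical, and the sample values, including the vcf.gz extension, are made up for the example.

```python
import posixpath


def default_deliverable_path(destination, client_id, gencove_id, file_type, file_extension):
    """Mirror the default scheme:
    <destination>/<client-id>/<gencove-id>/<gencove-id>_<file-type>.<file-extension>
    """
    file_name = f"{gencove_id}_{file_type}.{file_extension}"
    return posixpath.join(destination, client_id, gencove_id, file_name)


if __name__ == "__main__":
    # Two samples that share a client_id still get distinct paths,
    # because the unique gencove_id appears in both folder and file name.
    for gencove_id in ("id-0001", "id-0002"):
        print(default_deliverable_path(".", "SAMPLE1", gencove_id, "impute-vcf", "vcf.gz"))
```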

Customizing download naming scheme

$ gencove download . --project-id my-project-id --download-template '{client_id}_'

The default naming scheme outlined above can be customized by providing the --download-template flag and a custom file naming template that may contain {client_id} and {gencove_id} tokens.

The Gencove CLI will always append <file-type>.<file-extension> to any template provided by the user.

When using this feature, make sure to specify download templates that result in unique filenames across all samples.
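A template can be checked for uniqueness before a download is started. The sketch below renders the template for each sample with Python's str.format and reports prefixes that occur more than once. It assumes the two tokens behave like str.format fields, which is an assumption about the CLI rather than documented behavior, and the helper names are hypothetical.

```python
from collections import Counter


def render_prefix(template, client_id, gencove_id):
    """Fill the {client_id} and {gencove_id} tokens of a download template."""
    return template.format(client_id=client_id, gencove_id=gencove_id)


def colliding_prefixes(template, samples):
    """Return the sorted prefixes produced by more than one sample.

    `samples` is an iterable of (client_id, gencove_id) pairs. Since the
    file type and extension are appended to every prefix, a repeated
    prefix means two samples would map to the same file names.
    """
    counts = Counter(render_prefix(template, c, g) for c, g in samples)
    return sorted(prefix for prefix, n in counts.items() if n > 1)


if __name__ == "__main__":
    samples = [("A", "g1"), ("A", "g2"), ("B", "g3")]
    print(colliding_prefixes("{client_id}_", samples))
    print(colliding_prefixes("{client_id}_{gencove_id}_", samples))
```

Because uniqueness of client_id is not enforced, a template that uses only {client_id} is safe only when the client ids in the project happen to be distinct. Including {gencove_id} avoids the problem.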

Continuing previous downloads

When downloading, a local file is not overwritten if it already exists and has the same size in bytes as the file that would be downloaded. This behavior can be changed with the --no-skip-existing flag.
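The skip rule amounts to a check on existence and size. The Python sketch below restates that rule for illustration only; it is not the CLI's implementation, and the function name is hypothetical.

```python
import os


def should_skip_download(local_path, remote_size_bytes):
    """Return True when the local file exists and matches the remote size."""
    return (
        os.path.isfile(local_path)
        and os.path.getsize(local_path) == remote_size_bytes
    )
```

Note that a size comparison cannot detect a file whose contents differ but whose length is identical, so it is a resume aid rather than an integrity check.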

Downloading subsets of deliverables

$ gencove download . --sample-ids sample-id-1,sample-id-2,sample-id-3 --file-types impute-vcf,impute-tbi

Behavior of the download can also be tweaked in the following manner:

  1. Download only a specific set of sample ids by providing the --sample-ids flag instead of the --project-id flag
  2. Download only a specific set of file types by providing the --file-types flag. Currently available file types are listed below (not all file types may be available for every project).
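When download commands are generated by a script, the two options can be combined as in the following Python sketch. It only assembles the argument list using the flags described above. The function name is hypothetical, and requiring exactly one of a project id or a list of sample ids is an assumption based on the word "instead" in the first item.

```python
def download_command(destination, project_id=None, sample_ids=None, file_types=None):
    """Return the arguments for a `gencove download` call.

    Exactly one of `project_id` or `sample_ids` must be given (assumption).
    Lists are joined with commas, as in the example command.
    """
    has_project = project_id is not None
    has_samples = bool(sample_ids)
    if has_project == has_samples:
        raise ValueError("provide either project_id or sample_ids, not both")

    command = ["gencove", "download", destination]
    if has_project:
        command += ["--project-id", project_id]
    else:
        command += ["--sample-ids", ",".join(sample_ids)]
    if file_types:
        command += ["--file-types", ",".join(file_types)]
    return command


if __name__ == "__main__":
    print(" ".join(download_command(
        ".",
        sample_ids=["sample-id-1", "sample-id-2", "sample-id-3"],
        file_types=["impute-vcf", "impute-tbi"],
    )))
```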

Testing environment

Developers may use the Gencove staging environment for development and testing.

The staging developer website URL is: https://web-stage.gencove.com

The staging API URL is: https://api-stage.gencove.com

Data analysis configurations

Each Gencove project is pinned to a ‘configuration’ that specifies the species, reference datasets (e.g. a reference genome and haplotype reference panel), and specific deliverables. These configurations can be private to a specific set of individuals, or public. The datasets underlying the public configurations are as follows:

API Reference

The full API reference for publicly available endpoints is available here: API reference

CLI Reference

The full CLI reference is available here: CLI reference

Terms

We reserve the right to remove your access to our API for any reason at our sole discretion.

FAQ

User FAQ

Support

Contact us at support@gencove.com