The Gencove CLI

The Gencove command-line interface (CLI) can be used to easily access the API.

It is mostly used for:

  1. Uploading FASTQ files for analysis
  2. Downloading analysis results

Quickstart

$ pip install gencove
$ gencove upload <local-directory-path>

Install the Gencove CLI using the Python package manager pip and upload files to your Gencove account.

For more detailed installation instructions, please see the Installation section below.

Setup

Installation

$ pip install gencove

Note

Gencove CLI is compatible with Python versions 3.8 and above. Please note that Python 3.7 has reached its end of life, and we highly recommend upgrading to a supported version.

The Gencove CLI can be installed using the Python package manager pip. The source code is mirrored to a public repository on GitHub.

Python 3 and pip are commonly available on many operating systems. In case you do need to install Python 3, straightforward instructions are available here.

In production environments, we highly recommend using virtualenv and/or virtualenvwrapper for installing the Gencove CLI in a dedicated Python environment.

Configuring the CLI host

The Gencove platform is deployed across several geographical locations to accommodate users across the world. When using the CLI, the --host option can be configured to point to an alternative environment. Users can find their respective environment by checking the URL used to access the web UI, which follows the format web.<env>.gencove.com.

For example, if you are using the EU environment, you must configure your CLI commands as follows:

$ gencove <command> <options> --host https://api.eu1.gencove.com <args>

To upload a directory to Gencove on the EU host:

$ gencove upload /home/gencove/reads --host https://api.eu1.gencove.com --api-key GENCOVE_API_KEY
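
The host value can be derived from the web UI URL. The sketch below is a minimal illustration, assuming the API host mirrors the web host with api in place of web (the source only confirms this for the eu1 example); the function name api_host_from_web_url is hypothetical.

```python
from urllib.parse import urlparse


def api_host_from_web_url(web_url):
    """Derive a --host value from a web UI URL of the form web.<env>.gencove.com.

    Assumption: the API host is api.<env>.gencove.com for every environment.
    """
    hostname = urlparse(web_url).hostname or ""
    labels = hostname.split(".")
    # Expect exactly four labels: web, <env>, gencove, com
    if len(labels) != 4 or labels[0] != "web" or labels[2:] != ["gencove", "com"]:
        raise ValueError(f"unrecognized web UI host: {hostname!r}")
    return f"https://api.{labels[1]}.gencove.com"
```

For a URL such as https://web.eu1.gencove.com/, the function returns the host used in the commands above.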

Mac OS notes

Due to a known issue with Python that ships with Mac OS, the Gencove CLI should be installed in the user's home directory (not system-wide) as follows: pip install --user gencove. Make sure to have ~/bin present in your $PATH environment variable.

For advanced users, we highly recommend virtualenvwrapper and installing the Gencove CLI within a dedicated virtualenv.

If you absolutely must install the Gencove CLI system-wide using sudo, the following command can be used as a last resort: sudo pip install gencove --ignore-installed six.

Configuration

Your credentials can be provided to the Gencove CLI via environment variables:

  • $GENCOVE_EMAIL and $GENCOVE_PASSWORD
  • $GENCOVE_API_KEY
    • API keys can be generated and revoked using the Gencove Dashboard under Account Settings -> API Keys
$ export GENCOVE_EMAIL='<your-email>'
$ export GENCOVE_PASSWORD='<your-password>'
$ export GENCOVE_API_KEY='<your-api-key>'

Please note that you cannot use $GENCOVE_EMAIL+$GENCOVE_PASSWORD and $GENCOVE_API_KEY at the same time.

$ curl -H "Authorization: Api-Key <your-api-key>" https://api.gencove.com/api/v2/projects/

import requests

r = requests.get(
  "https://api.gencove.com/api/v2/projects/",
  headers={"Authorization": "Api-Key <your-api-key>"}
)

API keys can also be used to authenticate with the API directly by setting the Authorization HTTP header to Api-Key <your-api-key>.

Note

If MFA (multi-factor authentication) is enabled in the account and you use email and password credentials, an MFA token will be requested in the terminal after the command is submitted.

If an API key is used instead, no other token is necessary.

Uploading FASTQ files

To enable FASTQ uploads for your account, log into your account and go to My FASTQs, where instructions will be provided (in case you do not already have access). You can expect a response from Gencove support within 24 hours.

Once uploads are enabled, users can upload files to the Gencove upload area using the Gencove CLI and assign the files to projects using the Gencove Dashboard. Once files are assigned to a project, they will be processed by the Gencove analysis pipeline. Analysis results will be available via the Gencove API and Dashboard once analysis is complete.

Warning

The Gencove upload area should be considered temporary storage and should not be used as permanent storage space for your files. Once files are assigned to a project, they will be stored according to your data retention agreement with Gencove.

File naming convention

We highly recommend using the standard Illumina naming convention for FASTQ files. If files are named in this manner, Gencove systems will automatically detect:

  1. the sample identifier (and use it as the sample's client_id)
  2. R1/R2 designations of files

A summary of the naming convention is:

SAMPLE ID + _ + ... + _ + (R1 or R2) + _ + ... + .fastq.gz

For example, the table below shows file names that use this convention, along with the corresponding detected sample identifiers and read designations:

File name                             Sample ID  Read pair
SAMPLE1_R1.fastq.gz                   SAMPLE1    R1
SAMPLE1_R2.fastq.gz                   SAMPLE1    R2
SAMPLE2_LANE1_SEQUENCER1_R1.fastq.gz  SAMPLE2    R1
SAMPLE3_R1_L001.fastq.gz              SAMPLE3    R1
SAMPLE4_R1.fq.gz                      SAMPLE4    R1
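
To check file names locally before uploading, the convention summarized above can be approximated in a few lines. This is a sketch based only on the summary and table in this section, not Gencove's actual detection logic; the function name parse_fastq_name is hypothetical.

```python
import re


def parse_fastq_name(filename):
    """Return (sample_id, read) for a conforming file name, else None.

    Assumptions: the sample ID is the text before the first underscore,
    and exactly one later underscore-separated token is R1 or R2.
    """
    match = re.match(r"^(?P<stem>.+?)\.(fastq|fq)\.gz$", filename)
    if match is None:
        return None
    tokens = match.group("stem").split("_")
    reads = [token for token in tokens[1:] if token in ("R1", "R2")]
    if len(reads) != 1:
        return None
    return tokens[0], reads[0]
```

A return value of None indicates a name that would need an explicit map file instead.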

Custom file names

To bypass the default convention outlined above and explicitly specify sample identifiers and R1/R2 designations for FASTQ files, a file ending with .fastq-map.csv can be provided as the SOURCE to the gencove upload command. The format of the file is outlined in the example below.

The following validation is performed on the .fastq-map.csv file:

  • the file header is client_id,r_notation,path
  • values in the client_id column cannot contain _
  • values in the r_notation column can only be "r1" or "r2"
  • files listed in the path column must:
    • exist
    • be gzip-compressed

Example:

client_id,r_notation,path
<sample_id_1>,<r_notation_1>,<path_to_fastq_file_1>
<sample_id_2>,<r_notation_2>,<path_to_fastq_file_2>
<sample_id_3>,<r_notation_3>,<path_to_fastq_file_3>
...
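
The validation rules listed above can be checked locally before running an upload. The following is a minimal sketch under stated assumptions: it handles local paths only (not URLs), resolves relative paths against the current working directory, and detects gzip compression by the two-byte gzip magic number. The function name validate_fastq_map is hypothetical and this is not the CLI's own validator.

```python
import csv
import os

EXPECTED_HEADER = ["client_id", "r_notation", "path"]
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def validate_fastq_map(map_path):
    """Return a list of problems found in a .fastq-map.csv file."""
    with open(map_path, newline="") as handle:
        rows = list(csv.reader(handle))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header must be client_id,r_notation,path"]
    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != 3:
            problems.append(f"line {line_no}: expected 3 columns")
            continue
        client_id, r_notation, path = row
        if "_" in client_id:
            problems.append(f"line {line_no}: client_id contains '_'")
        if r_notation not in ("r1", "r2"):
            problems.append(f"line {line_no}: r_notation must be r1 or r2")
        if not os.path.isfile(path):
            problems.append(f"line {line_no}: file not found: {path}")
            continue
        with open(path, "rb") as fastq:
            if fastq.read(2) != GZIP_MAGIC:
                problems.append(f"line {line_no}: not gzip-compressed: {path}")
    return problems
```

An empty list means the file passed every check in this sketch.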

Grouping files

By default, Gencove systems expect one pair of FASTQ files per sample.

If sequencing reads for a single sample are spread across multiple FASTQ files, they need to be merged into one R1 file and one R2 file. This can be accomplished in several ways:

  1. Listing multiple files for the same client_id and r_notation in the .fastq-map.csv file (outlined in the previous section) results in the files being concatenated on the fly during upload with the Gencove CLI; see the example below. Note that the concatenation order is controlled by the order of rows listed in the .fastq-map.csv file.
  2. Manually concatenate the files. Since gzip-compressed files can be merged without decompressing, it's simply a matter of concatenating the compressed files.
  3. By providing the --no-lane-splitting flag to bcl2fastq, splitting reads into multiple FASTQ files can be avoided upstream in the demultiplexing phase.

Example:

client_id,r_notation,path
sampleid1,r1,sample1_part1_r1.fastq.gz
sampleid1,r1,sample1_part2_r1.fastq.gz
sampleid1,r1,sample1_part3_r1.fastq.gz
sampleid1,r2,sample1_part1_r2.fastq.gz
sampleid1,r2,sample1_part2_r2.fastq.gz
sampleid1,r2,sample1_part3_r2.fastq.gz
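
For the manual concatenation option, compressed parts can be appended byte for byte, because a sequence of gzip members is itself a valid gzip file. A minimal sketch (the function name concatenate_gzip_files is hypothetical):

```python
import shutil


def concatenate_gzip_files(part_paths, merged_path):
    """Append gzip-compressed parts, in the given order, into one gzip file."""
    with open(merged_path, "wb") as merged:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                # Raw byte copy; no decompression is needed.
                shutil.copyfileobj(part, merged)
```

The parts for R1 and R2 should be passed in the same order so that read pairs stay aligned across the two merged files.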

Uploading using the CLI

$ gencove upload <source-path> [<destination-path>]

Syncs local directories to directories in your Gencove upload area. Recursively copies new and updated files from the source directory to the destination.

Alternatively, the command can be used to import FASTQ files from URLs using a map file.

Only creates folders in the destination if they contain one or more files.

$ gencove upload my-fastq-files/

This example command will recursively copy all files in the my-fastq-files/ directory on your host system to a directory with an automatically generated name in the Gencove upload area.

$ gencove upload input.fastq-map.csv

If there are multiple input FASTQ files per sample, or the file names do not follow the conventions described above, a manifest describing the relationship between the sample identifiers and the input FASTQ files must be provided in a CSV file in the format described above.

$ gencove upload my-fastq-files/ gncv://my-fastq/batch-1/

In case more control is needed over the upload destination, a destination path prefixed with gncv:// may be provided. This pattern is commonly used for separating upload batches when continuously uploading data to your Gencove account and is useful for easily filtering files in the Gencove Dashboard. A common directory structure for batching uploads is:

gncv://<project-name>/<batch-name>/

If specifying a destination path, it is recommended to have at least one level of directories to separate batches of uploaded data. In other words, avoid placing all files in the root directory gncv://.

Details of upload behavior:

  • In case a file in the local directory already exists in the destination, it will not be overwritten
  • In case a file exists in the destination, but not the local directory, it will not be deleted

Importing files from URLs with a map file

Using the map file described above, it is also possible to import FASTQ files from URLs. When constructing the map CSV file, include the URL for each file under the path column.

Here is an example of the contents of a CSV map file that uses URLs:

client_id,r_notation,path
sample1,r1,https://example-bucket.storage.googleapis.com/sample_R1.fastq.gz
sample1,r2,https://example-bucket.storage.googleapis.com/sample_R2.fastq.gz

Note

Note that only the following URL domains are supported:

  • amazonaws.com (AWS)

  • blob.core.windows.net (Azure)

  • googleapis.com (Google Cloud)

Once the map file has been built, the upload command can be used:

$ gencove upload input.fastq-map.csv

Warning

When generating URLs from the above cloud providers, we suggest setting a generous expiration time to ensure the URLs do not expire by the time they reach a project and need to be retrieved by the corresponding pipeline.

Automatically starting analysis

To automatically assign uploads to a project and run analysis, provide the --run-project-id flag and destination project id to the Gencove CLI.

$ gencove upload my-fastq-files/ gncv://my-fastq/batch-1/ --run-project-id b1edbb20-ee77-4be0-9944-e8e3a593cc83

When this feature is used, the Gencove CLI will check to make sure that contents of SOURCE and DESTINATION are identical in order to avoid analysis of unwanted samples. This will always be the case if DESTINATION is omitted, i.e., autogenerated by the Gencove CLI.

It is also important to ensure uploaded files follow naming conventions outlined above to avoid sample identifier detection issues.

Downloading deliverables

Gencove provides a number of deliverables for each sample that is processed as part of a project. In case a sample fails processing due to quality control, only the original input files are provided as deliverables.

Downloading using the CLI

$ gencove download <local-destination-path> --project-id <project-id>

Downloads all deliverables for all samples in the specified project, with the following default naming scheme:

<local-destination-path>/<client-id>/<gencove-id>/<gencove-id>_<file-type>.<file-extension>

This naming scheme reflects the fact that uniqueness of client-ids is not enforced, while uniqueness of gencove-id is enforced.

Customizing download naming scheme

The default naming scheme outlined above can be customized by providing the --download-template flag and a custom file naming template that may contain {client_id}, {gencove_id}, {file_type}, {file_extension} and {default_filename} tokens.

$ gencove download . --project-id <project-id> --download-template '{client_id}.{file_extension}'

When using this feature, make sure to specify download templates that result in unique filenames across all samples.

The {default_filename} token provides access to the API's default file naming scheme, which takes into account different bioinformatics conventions across a subset of file types. Current exceptions to the default {gencove_id}_{file_type}.{file_extension} scheme are:

  • fastq-r1: {gencove_id}_R1.fastq.gz
  • fastq-r2: {gencove_id}_R2.fastq.gz
  • alignment-bam: {gencove_id}.bam
  • alignment-bai: {gencove_id}.bam.bai
  • impute-vcf: {gencove_id}.vcf.gz
  • impute-tbi: {gencove_id}.vcf.gz.tbi
  • impute-csi: {gencove_id}.vcf.gz.csi
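
To preview what a download template would produce, and whether it yields unique file names, the token substitution can be imitated locally. This is a sketch, not the CLI's implementation: the exceptions table is copied from the list above, while the function names and the sample values passed in are hypothetical.

```python
from collections import Counter

# Exceptions to {gencove_id}_{file_type}.{file_extension}, copied from the list above.
DEFAULT_FILENAME_EXCEPTIONS = {
    "fastq-r1": "{gencove_id}_R1.fastq.gz",
    "fastq-r2": "{gencove_id}_R2.fastq.gz",
    "alignment-bam": "{gencove_id}.bam",
    "alignment-bai": "{gencove_id}.bam.bai",
    "impute-vcf": "{gencove_id}.vcf.gz",
    "impute-tbi": "{gencove_id}.vcf.gz.tbi",
    "impute-csi": "{gencove_id}.vcf.gz.csi",
}


def render_filename(template, client_id, gencove_id, file_type, file_extension):
    """Fill in a download template for one deliverable."""
    pattern = DEFAULT_FILENAME_EXCEPTIONS.get(
        file_type, "{gencove_id}_{file_type}.{file_extension}"
    )
    default_filename = pattern.format(
        gencove_id=gencove_id, file_type=file_type, file_extension=file_extension
    )
    return template.format(
        client_id=client_id,
        gencove_id=gencove_id,
        file_type=file_type,
        file_extension=file_extension,
        default_filename=default_filename,
    )


def find_collisions(template, deliverables):
    """Return rendered names produced more than once for the given deliverables."""
    counts = Counter(render_filename(template, **item) for item in deliverables)
    return sorted(name for name, count in counts.items() if count > 1)
```

Because client IDs are not guaranteed to be unique, a template such as '{client_id}.{file_extension}' can collide when two samples share a client ID; including {gencove_id} in the template avoids this.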

Continuing previous downloads

When downloading, an existing file on the local filesystem is not overwritten if it has the same size in bytes as the file that would be downloaded. This behavior can be changed with the --no-skip-existing flag.
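
The skip rule described above amounts to a size comparison. A minimal sketch of that check, written from the description in this section rather than from the CLI source (the function name should_skip_download is hypothetical):

```python
import os


def should_skip_download(local_path, remote_size_bytes):
    """Skip when a local file already exists with the same size in bytes."""
    return (
        os.path.isfile(local_path)
        and os.path.getsize(local_path) == remote_size_bytes
    )
```

Note that a size match does not prove the contents are identical, so a checksum comparison remains useful for verifying integrity.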

Downloading subsets of deliverables

$ gencove download . --sample-ids sample-id-1,sample-id-2,sample-id-3 --file-types impute-vcf,impute-tbi

Behavior of the download can also be tweaked in the following manner:

  1. Download only a specific set of sample ids by providing the --sample-ids flag instead of the --project-id flag
  2. Download only a specific set of file types by providing the --file-types flag. Currently available file types are listed below (not all file types may be available for every project).
fastq-r1
original input FASTQ file with raw sequencing reads, containing the first read of a read pair when using paired-end sequencing
fastq-r2
original input FASTQ file with raw sequencing reads, containing the second read of a read pair when using paired-end sequencing
alignment-bam
BAM file with reads aligned to the target genome (includes all reads from original FASTQ files)
alignment-bai
BAI index file accompanying the BAM file
cnv-cnr
CNR file with bin-level log2 ratios for copy-number variation calls
cnv-cns
CNS file with segmented log2 ratios for copy-number variation calls
cnv-pdf
Portable Document Format (PDF) file with copy-number variation plot
cnv-png
Portable Network Graphics (PNG) file with copy-number variation plot (commonly used when PDFs are too large).
impute-vcf
VCF file with imputed variant calls
impute-tbi
Tabix index file accompanying the VCF file
impute-csi
CSI index file accompanying the VCF file
kraken-report
Kraken report for sequencing reads that didn't map to the target genome
ancestry-json

JavaScript Object Notation (JSON) file with ancestry estimates for subpopulations, contains the following keys:

  • ancestry - contains ancestry estimates
  • ancestry_raw - may contain additional entries for ambiguous groupings in situations where specific subgroups cannot be consistently identified
  • ancestry_metadata_id - legacy key (should be disregarded)
traits-json

JSON file with polygenic risk score calculations

  • each key represents a polygenic score outlined in the "Data analysis configurations" section below
  • each polygenic score object contains the following keys:
    • score - calculated value of polygenic score
    • nsnp - number of single-nucleotide polymorphisms (SNPs) taken into account
    • score_percentile - percentile of individual's score relative to scores calculated for individuals in the reference dataset used to generate the score
call_capture-vcf
VCF file with variant calls from target capture regions, corresponding with the deliverable labeled Target capture, VCF file in the web interface.
call_capture-csi
index accompanying the target capture VCF file, corresponding with the deliverable labeled Target capture, CSI file in the web interface.
call_capture-vcf_pathogenic
VCF file with pathogenic variant calls from target capture regions
call_capture-forced_vcf
VCF file with variant calls at a set of pre-determined variants, corresponding with the deliverable labeled Target capture (pre-defined variants), VCF file in the web interface.
call_capture-forced_csi
index accompanying the VCF file with variant calls at a set of predetermined variants; corresponds with the deliverable labeled Target capture (pre-defined variants), CSI file
qc

JSON file with sample quality control metrics, containing the following quality_control_types:

  • format - FASTQ format validity
  • r1_eq_r2 - number of bases in R1 file equal to number of bases in R2 file
  • r1_r2_ids_match - R1 read identifiers match R2 read identifiers
  • bases_min - minimum number of total bases sequenced
  • bases_max - maximum number of total bases sequenced
  • bases_dedup_min - minimum number of deduplicated bases
  • bases_dedup_mapped_min - minimum number of deduplicated bases that have aligned to the target genome
  • fraction_contamination_max - maximum contamination by DNA from another sample of the same species
  • snps_min - number of variants in reference panel that are covered by at least one sequencing read
  • effective_coverage_min - minimum effective coverage
  • hzy_max - maximum heterozygosity
  • cc_min - minimum "call confidence", i.e., imputation algorithm variant calling confidence across all sites
  • nhref_min - minimum number of homozygous reference calls
  • nhet_max - maximum number of heterozygous calls
  • nhalt_min - minimum number of homozygous alt calls
  • pct_target_bases_30x_min - minimum percentage of target capture bases with 30x coverage
  • pathogenic_min - number of pathogenic variants detected
metadata
JSON file with user-specified metadata that has been assigned to a sample

Downloading checksum files

$ gencove download <local-destination-path> --project-id <project-id> --checksums

Include sha256 checksum files to verify that deliverables are valid. For instance, for a file named file.vcf.gz, a checksum file named file.vcf.gz.sha256 will be downloaded as well.

To verify the integrity of a file, you can run:

$ shasum -c file.vcf.gz.sha256
# or
$ sha256sum -c file.vcf.gz.sha256

This will report whether the checksum of the downloaded file matches the one provided by Gencove.

Note

Only projects that were created after July 6, 2022 have checksums available.

Warning

The CLI does NOT validate deliverables against checksum, even when the checksum flag is provided.
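
Since the CLI does not validate deliverables itself, the check can also be scripted. The sketch below assumes the .sha256 file uses the layout accepted by sha256sum -c, that is, the hex digest as the first whitespace-separated token; the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(file_path, checksum_path):
    """Compare a file's digest with the first token of its .sha256 file."""
    with open(checksum_path) as handle:
        expected = handle.read().split()[0].lower()
    return sha256_of_file(file_path) == expected
```

For example, verify_checksum("file.vcf.gz", "file.vcf.gz.sha256") returns True only when the digests match.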

The Gencove Archive

The Gencove Archive automatically transitions samples older than 30 days from hot storage to the Archive. Once a sample is in the Archive, its deliverables are not immediately available for download; users need to explicitly restore them from the Archive using the Gencove web dashboard, command-line interface (CLI), or API. Sample restoration can take up to 50 hours. Upon restoration, sample deliverables are available to download for 12 days, after which they return to the Archive.

$ gencove projects restore-samples <project-id> --sample-ids sample-id-1,...,sample-id-N

Note that default views in the Gencove web dashboard and CLI only display samples that are immediately available for download. To view archived samples, set the view filter to either:

  • all: display available and archived samples
  • archived: display only archived samples

Creating projects, listing pipeline capabilities and listing pipelines

Creating projects

To create a project via the CLI, use the gencove projects create command. It requires a project name and a pipeline-capability-id.

$ gencove projects create my-project-name pipeline-capability-id

The way to acquire pipeline-capability-id is explained in the sections below.

Listing pipeline capabilities

First, to list the pipeline capabilities, use the gencove projects list-pipeline-capabilities command. It requires a pipeline-id as an argument and lists the pipeline capabilities associated with the specified pipeline. The pipeline capability id can be used in the previous command.

$ gencove projects list-pipeline-capabilities pipeline-id

To get the pipeline-id, use the command below.

Listing pipelines

To list pipelines, use the gencove projects list-pipelines command. The command doesn't require any arguments. It lists pipeline ids that can be used in the previous command.

$ gencove projects list-pipelines

Listing projects, samples and uploads

Listing projects

All projects can be listed using the gencove projects list command.

$ gencove projects list

Listing project samples

All samples in a project can be listed using the gencove projects list-samples command.

$ gencove projects list-samples <project-id>

Project samples can also be filtered by status and searched. A metadata substring can also be used as the search query.

$ gencove projects list-samples <project-id> --status completed
$ gencove projects list-samples <project-id> --search my-client-id

Listing uploads

Uploads can be listed using the gencove uploads list command.

$ gencove uploads list

Uploads can also be filtered by status and searched.

$ gencove uploads list --status assigned
$ gencove uploads list --search gncv://upload/path

Importing existing samples to another project

When a sample that has finished analysis needs to be processed with a different pipeline configuration, this can be done in another project. Uploading the FASTQ files again and processing them in another project can be tedious, so the finished sample can instead be imported into the other project. This process uses the deliverables of the finished sample as the source for the new one. The user performing the import must have adequate permissions to manipulate samples in both the source and destination projects, and the samples must not be archived.

Import existing samples

$ gencove projects import-existing-samples destination-project-id --sample-ids source-sample-id

--sample-ids accepts multiple comma-separated values:

$ gencove projects import-existing-samples destination-project-id --sample-ids source-sample-id-1,source-sample-id-2

Import all samples from another project

By using --source-project-id, it is possible to import all available samples in the succeeded or failed_qc state that have files from one project into a different one.

$ gencove projects import-existing-samples destination-project-id --source-project-id source-project-id

Note

We automatically batch large project-to-project imports into groups of 100 samples.

Optionally, universal metadata can be assigned to each new sample by adding --metadata-json:

$ gencove projects import-existing-samples destination-project-id --sample-ids source-sample-id --metadata-json='{"batch": "batch1"}'

Sample metadata and files

Gencove supports assigning metadata to a sample in JavaScript Object Notation (JSON) format.

Information commonly stored as sample metadata:

  • phenotypes (characteristics) of the individual represented by the sample
  • batch identifiers
  • alternative or auxiliary sample identifiers

Each sample has many different files assigned to it that can be retrieved using the CLI.

The following CLI commands can be used to set and get metadata:

Assigning sample metadata

Metadata can be assigned to a sample using the gencove samples set-metadata command. Both the sample id and the --json flag with a JSON string are required.

$ gencove samples set-metadata my-sample-id --json '{"example-key": "example-value"}'
$ gencove samples set-metadata my-sample-id --json '1234567'
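
When metadata is generated by a script, serializing it with a JSON library and passing it as a single argument avoids shell quoting mistakes. A minimal sketch that builds the argument list for the command shown above (the function name set_metadata_command is hypothetical):

```python
import json


def set_metadata_command(sample_id, metadata):
    """Build the argument list for `gencove samples set-metadata`."""
    return [
        "gencove",
        "samples",
        "set-metadata",
        sample_id,
        "--json",
        json.dumps(metadata),  # one argv entry, so no shell quoting is needed
    ]
```

The resulting list can be passed to subprocess.run(..., check=True) on a machine where the Gencove CLI is installed and credentials are configured.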

Retrieving sample metadata

Sample metadata can be retrieved by using the gencove samples get-metadata command. Optionally, --output-filename my-filename can be used to specify the filename where the metadata will be output. If not specified, metadata will be printed to stdout.

$ gencove samples get-metadata my-sample-id

Downloading single sample file

Download and save file

A single sample file can be downloaded using the gencove samples download-file command.

$ gencove samples download-file sample-id-1 impute-vcf destination.vcf

Include checksum file

A single file can be downloaded along with its checksum file using the gencove samples download-file command.

$ gencove samples download-file sample-id-1 impute-vcf destination.vcf --checksum

Download and stream file to stdout

A single sample file can be downloaded and streamed to stdout using the gencove samples download-file command.

$ gencove samples download-file sample-id-1 impute-vcf -
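
Streaming to stdout makes it possible to process a deliverable without saving it first. The sketch below counts variant records in a gzip-compressed VCF read from any binary stream; it assumes the impute-vcf deliverable is gzip-compressed (as its default .vcf.gz name suggests), and the function name count_vcf_records is hypothetical.

```python
import gzip


def count_vcf_records(stream):
    """Count non-header lines in a gzip-compressed VCF read from a binary stream."""
    records = 0
    with gzip.open(stream, "rt") as vcf:
        for line in vcf:
            # Header and meta-information lines start with '#'.
            if line.strip() and not line.startswith("#"):
                records += 1
    return records
```

In a script, the function could be called as count_vcf_records(sys.stdin.buffer) with the download command above piped into it.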

Merged VCF file

Gencove supports generating a merged VCF file containing variant calls from all successful and available (i.e., not archived) samples in a project.

Generating a merged VCF file is initiated from the Gencove Dashboard, by opening a project and clicking the "Merge VCFs" button. Once the merge operation is complete, a download button will appear on the project page.

Please keep in mind:

  • merging is only possible for projects with two or more successful and available (i.e., not archived) samples
  • not all project configurations support merging
    • in case you need a merged VCF and a project configuration you are using does not support it, please let us know at support@gencove.com
  • depending on the number of samples in your project, merging may take anywhere between several minutes and several hours
  • if multiple samples have the same client_id, the merged VCF file will only contain the newest sample
  • if a subset of samples in the project are archived, they will not be added to the merged VCF. To include them in the merged VCF, restore the samples first.

In addition to the web interface, the following CLI commands can be used to access merged VCF functionality:

Creating a merged VCF

A merged VCF file can be created using the gencove projects create-merged-vcf command.

$ gencove projects create-merged-vcf <project-id>

Checking the status of a merged VCF

Status of the merging job can be checked using the gencove projects status-merged-vcf command.

$ gencove projects status-merged-vcf <project-id>

Downloading the merged VCF

The merged VCF file can be downloaded using the gencove projects get-merged-vcf command. Optionally, --output-filename my-filename can be used to override the default filename.

$ gencove projects get-merged-vcf <project-id>

Reference Genome

Gencove offers access to the reference genome files utilized in generating project deliverables.

These files can be downloaded using the gencove projects get-reference-genome command. Alternatively, they are accessible through the web application, where you can download the genome.fasta file from the project detail page.

$ gencove projects get-reference-genome <project-id> <destination-dir>

Downloading subsets of deliverables

$ gencove projects get-reference-genome <project-id> <destination-dir> --file-types genome-fasta,genome-dict

Currently available file types are listed below. Not all file types may be available for every project; run gencove file-types --object reference-genome --project-id <project-id> for an accurate list of file types.

genome-fasta
Reference genome sequence in FASTA format, compressed using gzip
genome-dict
Picard sequence dictionary corresponding to the reference genome sequence
genome-fasta_amb
Auxiliary file used by the BWA alignment tool for genome indexing
genome-fasta_ann
Annotation file used by the BWA alignment tool for genome indexing
genome-fasta_bwt
Burrows-Wheeler Transform (BWT) index file, used for efficient sequence alignment
genome-fasta_fai
FASTA index file, providing quick access to sequences within the compressed FASTA file
genome-fasta_gzi
Index file for the compressed FASTA file, facilitating quick retrieval of specific regions
genome-fasta_pac
Packed alignment data file used by BWA for indexing.
genome-fasta_sa
Suffix array file, a data structure used for pattern matching and genome alignment
genome-fasta_vcf_header
Header file for a Variant Call Format (VCF) file, containing information about the reference genome and other metadata

Setting up Automated imports from BaseSpace

If a BaseSpace social connection is present for a user and there are adequate permissions for importing from BaseSpace, an automated import can be set up. It runs periodically and searches BaseSpace for projects created in the last day whose names contain the identifier. Biosamples from those BaseSpace projects are imported into a Gencove project to be run there.

Note

identifier is used as a search argument for BaseSpace projects. Any project name that contains the identifier is returned. Letter case is ignored.

Identifier "cattle" will match BaseSpace project names like "Our cattle 1" and "Cattle New", but it won't match "Corn".
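
The matching rule described in the note is a case-insensitive substring test. A one-function sketch of that rule, written from the description above rather than from the service's code (the function name matches_identifier is hypothetical):

```python
def matches_identifier(identifier, project_name):
    """True when project_name contains identifier, ignoring letter case."""
    return identifier.casefold() in project_name.casefold()
```

This can help choose an identifier that is specific enough not to match unrelated BaseSpace projects.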

Listing autoimport jobs

Available autoimport jobs for a user can be listed using the gencove basespace autoimports list command.

$ gencove basespace autoimports list

Creating an autoimport job

A new autoimport job can be created using the gencove basespace autoimports create command.

$ gencove basespace autoimports create <project-id> identifier-in-basespace-project-name

To assign metadata to each sample imported this way, --metadata-json can be used.

$ gencove basespace autoimports create <project-id> identifier-in-basespace-project-name --metadata-json '{"example-key": "example-value"}'

Note

When an autoimport job is first created, it runs immediately. Afterwards, it periodically checks whether a new BaseSpace project whose name contains the identifier was added in the last day and imports the Biosamples according to the autoimport instructions.

Backwards-compatible array deliverables

Backwards-compatible genotyping array deliverables can be generated for batches of samples in projects that support this functionality. Each project configuration can support multiple batch types that correspond to different array types.

More information about these deliverables is available in this blog post.

Listing batch types

Available batch types for a project can be listed using the gencove projects list-batch-types command.

$ gencove projects list-batch-types <project-id>

Creating a batch

A new batch can be created using the gencove projects create-batch command.

$ gencove projects create-batch --batch-type illuminasnp50 --batch-name batch-001 --sample-ids sample-id-1,...,sample-id-N <project-id>

Omitting --sample-ids results in all samples belonging to the project being used for the batch.

$ gencove projects create-batch --batch-type illuminasnp50 --batch-name batch-001 <project-id>

Successful generation of a batch deliverable will also trigger a webhook associated with the project.

Listing project batches

Project batches can be listed using the gencove projects list-batches command.

$ gencove projects list-batches <project-id>

Downloading batch deliverable

Once the batch deliverable is generated, it is available for download using the gencove projects get-batch command.

$ gencove projects get-batch my-batch-id --output-filename batch.zip

Retrieving reports

The Gencove CLI can be used to retrieve various reports on your Gencove data.

Project QC Reports

Quality control data for every completed sample in a given project can be retrieved via the gencove reports project-qc command. The data is returned as a CSV file that is saved locally.

$ gencove reports project-qc --output-filename report.csv <project-id>

You can also select which columns to retrieve via the --columns parameter. By default, all the columns are retrieved. The following columns are supported:

  • id - Gencove sample ID
  • client_id - User supplied ID
  • project_id - Project ID for sample
  • year - Year sample entered completed state
  • month - Month sample entered completed state
  • day - Day sample entered completed state
  • status - Final reported status for sample
  • sex_string - Inferred sex karyotype for sample (if available)
  • snps_min - Minimum number of SNPs detected
  • bases_dedup_mapped_min - Minimum number of deduplicated bases mapped to the target genome
  • call_rate_min - Call rate
  • effective_coverage_min - Effective coverage
  • raw_coverage - Raw coverage
  • ancestries - The ancestry breakdown per sample. This value will be broken down into individual ancestry columns, varying depending on the species for your project pipeline configuration.
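
Once the report is saved, it can be summarized with the standard csv module. The sketch below counts samples per value of the status column; the column name comes from the list above, while the function name count_samples_by_status is hypothetical.

```python
import csv
from collections import Counter


def count_samples_by_status(report_path):
    """Count rows in a project QC report CSV by their status column."""
    with open(report_path, newline="") as handle:
        return Counter(row["status"] for row in csv.DictReader(handle))
```

If the report was generated with a restricted --columns selection, the status column must be included for this to work.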

Organization Usage Reports

Monthly usage across your entire organization can be retrieved via the gencove reports monthly-usage command. This command downloads a CSV file which reports the number of succeeded and failed samples across all projects, broken down by month.

By default, the previous 12 months are retrieved for the report. A date range can be optionally specified by passing in the --from and --to parameters, both of which expect a date in YYYY-MM format (e.g. 2023-01). Note that the --to parameter is inclusive.

$ gencove reports monthly-usage --from 2023-05 --to 2023-09 --output-filename monthly_usage.csv
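
To see which months a given range covers, the YYYY-MM values can be expanded locally. This sketch treats both ends as inclusive; the source states this for --to, and inclusivity of --from is an assumption. The function name months_in_range is hypothetical.

```python
def months_in_range(start, end):
    """Return YYYY-MM strings from start to end, both ends inclusive."""
    start_year, start_month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    # Convert each month to a running index so year boundaries are handled.
    first = start_year * 12 + (start_month - 1)
    last = end_year * 12 + (end_month - 1)
    return [
        f"{index // 12:04d}-{index % 12 + 1:02d}"
        for index in range(first, last + 1)
    ]
```

For the command above, the range from 2023-05 to 2023-09 covers five months.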

Filing a bug report for the CLI

Before filing a bug report, please follow these steps:

Update to the latest CLI version

$ pip install -U gencove

You might be using an out-of-date CLI version. We recommend installing the latest one and trying again to see if the problem was fixed.

Additionally, you can download the latest version as a binary executable to avoid problems with pip.

Check network connection

$ ping api.gencove.com

Your organization firewall rules might be blocking traffic to Gencove Servers. If you're not able to reach us, add a new rule to allow traffic to api.gencove.com and then try again.

Gather information for bug report

Note

The CLI writes a debug log file when it encounters an error.

Once you have completed the above steps, rerun the command(s) you're having trouble with.

Save and attach the debug log file to the bug report (the file path will be in the terminal's output).

Optionally include the following information:

  • Gencove CLI version
  • Python version
  • Operating system version
  • Hardware info: CPU, RAM, disks (total/free/used)
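
Most of the optional details can be collected with the Python standard library. The following is a sketch under stated assumptions: it reads the installed version of the gencove package from package metadata, reports disk usage for the current directory's filesystem, and omits RAM because the standard library has no portable call for it. The function name collect_bug_report_info is hypothetical.

```python
import platform
import shutil
from importlib import metadata


def collect_bug_report_info(disk_path="."):
    """Gather version and system details to paste into a bug report."""
    try:
        cli_version = metadata.version("gencove")
    except metadata.PackageNotFoundError:
        cli_version = "not installed in this Python environment"
    disk = shutil.disk_usage(disk_path)
    return {
        "gencove_cli_version": cli_version,
        "python_version": platform.python_version(),
        "operating_system": platform.platform(),
        "cpu": platform.processor() or platform.machine(),
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
        "disk_free_bytes": disk.free,
    }
```

Run it with the same Python interpreter that the Gencove CLI is installed under so the reported versions match.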

Deleting samples

Note

Deleted samples are still counted towards the total count of processed samples on invoices.

It is possible to delete samples belonging to a project using either the CLI or API.

$ gencove projects delete-samples --sample-ids sample1,sample2 project1

Note that when using the above command, all samples supplied must belong to the same project.

Deleting projects

Note

Deleting a project will delete its samples and batches, but these are still counted towards the total count of processed samples on invoices.

It is possible to delete projects using either the CLI or API.

$ gencove projects delete project1,project2

Note that when using the above command, all projects supplied must belong to your organization.