The Gencove CLI¶
The Gencove command-line interface (CLI) can be used to easily access the API.
It is mostly used for:
- Uploading FASTQ files for analysis
- Downloading analysis results
Quickstart¶
$ pip install gencove
$ gencove upload <local-directory-path>
Install the Gencove CLI using the Python package manager pip
and upload files to your Gencove account.
For more detailed installation instructions, please see the Installation section below.
Setup¶
Installation¶
$ pip install gencove
Note
Gencove CLI is compatible with Python versions 3.8 and above. Please note that Python 3.7 has reached its end of life, and we highly recommend upgrading to a supported version.
The Gencove CLI can be installed using the Python package manager pip. The source code is mirrored to a public repository on GitHub.
Python 3 and pip are commonly available on many operating systems. In case you do need to install Python 3, straightforward instructions are available here.
In production environments, we highly recommend using virtualenv and/or virtualenvwrapper for installing the Gencove CLI in a dedicated Python environment.
Configuring the CLI host¶
The Gencove platform is deployed across several geographical locations to accommodate users across the world. When using the CLI, the --host option can be configured to point to an alternative environment. Users can find their respective environment by checking the URL used to access the web UI, which follows the format web.<env>.gencove.com.
For example, if you are using the EU environment, you must configure your CLI commands as follows:
$ gencove <command> <options> --host https://api.eu1.gencove.com <args>
To upload a directory to Gencove on the EU host:
$ gencove upload /home/gencove/reads --host https://api.eu1.gencove.com --api-key GENCOVE_API_KEY
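The examples above suggest that the API host mirrors the web UI host, with web replaced by api. That mapping is inferred from the examples in this section rather than stated as a rule, so treat the following Python sketch as an assumption; the function name api_host_from_web_url is hypothetical.

```python
from urllib.parse import urlparse


def api_host_from_web_url(web_url):
    """Derive a --host value from a web UI URL (assumed web.* -> api.* mapping)."""
    hostname = (urlparse(web_url).hostname or "").lower()
    if not hostname.startswith("web.") or not hostname.endswith("gencove.com"):
        raise ValueError(f"not a Gencove web UI URL: {web_url}")
    # Replace the leading "web." label with "api." and keep the rest as-is.
    return "https://api." + hostname[len("web."):]


print(api_host_from_web_url("https://web.eu1.gencove.com/projects"))
```

The returned string can then be passed to any command through the --host option.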
Mac OS notes¶
Due to a known issue with Python that ships with Mac OS, the Gencove CLI should be installed in the user's home directory (not system-wide) as follows: pip install --user gencove. Make sure to have ~/bin present in your $PATH environment variable.
For advanced users, we highly recommend virtualenvwrapper and installing the Gencove CLI within a dedicated virtualenv.
If you absolutely must install the Gencove CLI system-wide using sudo, the following command can be used as a last resort: sudo pip install gencove --ignore-installed six.
Configuration¶
Your credentials can be provided to the Gencove CLI via environment variables:
- $GENCOVE_EMAIL and $GENCOVE_PASSWORD
- $GENCOVE_API_KEY
  - API keys can be generated and revoked using the Gencove Dashboard under Account Settings -> API Keys
$ export GENCOVE_EMAIL='<your-email>'
$ export GENCOVE_PASSWORD='<your-password>'
$ export GENCOVE_API_KEY='<your-api-key>'
Please note that you cannot use $GENCOVE_EMAIL + $GENCOVE_PASSWORD and $GENCOVE_API_KEY at the same time.
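The mutual-exclusion rule can be checked before invoking the CLI. The Python sketch below is a hypothetical pre-flight helper (resolve_credentials is not part of the Gencove CLI); it only mirrors the rule described above.

```python
import os


def resolve_credentials(env=None):
    """Return ("api_key", key) or ("email", (email, password)).

    Hypothetical helper: rejects environments where an API key and
    email/password credentials are both present.
    """
    env = os.environ if env is None else env
    email = env.get("GENCOVE_EMAIL")
    password = env.get("GENCOVE_PASSWORD")
    api_key = env.get("GENCOVE_API_KEY")

    if api_key and (email or password):
        raise ValueError(
            "Set either GENCOVE_API_KEY or GENCOVE_EMAIL/GENCOVE_PASSWORD, not both"
        )
    if api_key:
        return ("api_key", api_key)
    if email and password:
        return ("email", (email, password))
    raise ValueError("No complete set of Gencove credentials found")
```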
$ curl -H "Authorization: Api-Key <your-api-key>" https://api.gencove.com/api/v2/projects/
import requests
r = requests.get(
"https://api.gencove.com/api/v2/projects/",
headers={"Authorization": "Api-Key <your-api-key>"}
)
API keys can also be used to authenticate with the API directly by setting the Authorization HTTP header to Api-Key <your-api-key>.
Note
If MFA (multi-factor authentication) is enabled in the account and you use email and password credentials, an MFA token will be requested in the terminal after the command is submitted.
If an API key is used instead, no other token is necessary.
Uploading FASTQ files¶
To enable FASTQ uploads for your account, log into your account and go to My FASTQs, where instructions will be provided (in case you do not already have access). You can expect a response from Gencove support within 24 hours.
Once uploads are enabled, users can upload files to the Gencove upload area using the Gencove CLI and assign the files to projects using the Gencove Dashboard. Once files are assigned to a project, they will be processed by the Gencove analysis pipeline. Analysis results will be available via the Gencove API and Dashboard once analysis is complete.
Warning
The Gencove upload area should be considered temporary storage and should not be used as permanent storage space for your files. Once files are assigned to a project, they will be stored according to your data retention agreement with Gencove.
File naming convention¶
We highly recommend using the standard Illumina naming convention for FASTQ files. If files are named in this manner, Gencove systems will automatically detect:
- the sample identifier (and use it as the sample's client_id)
- R1/R2 designations of files
A summary of the naming convention is:
SAMPLE ID + _ + ... + _ + (R1 or R2) + _ + ... + .fastq.gz
The table below shows example file names that follow this convention, together with the detected sample identifiers and read designations:
File name | Sample ID | Read pair |
---|---|---|
SAMPLE1_R1.fastq.gz | SAMPLE1 | R1 |
SAMPLE1_R2.fastq.gz | SAMPLE1 | R2 |
SAMPLE2_LANE1_SEQUENCER1_R1.fastq.gz | SAMPLE2 | R1 |
SAMPLE3_R1_L001.fastq.gz | SAMPLE3 | R1 |
SAMPLE4_R1.fq.gz | SAMPLE4 | R1 |
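The detection rule implied by the table can be sketched in Python. This is an illustration of the convention only, not Gencove's actual detection logic, and the function name parse_fastq_name is hypothetical: the sample identifier is taken as everything before the first underscore, and the read designation as the single R1 or R2 token that follows.

```python
def parse_fastq_name(filename):
    """Return (sample_id, read) for an Illumina-style FASTQ file name."""
    for ext in (".fastq.gz", ".fq.gz"):
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        raise ValueError(f"unsupported extension: {filename}")

    # First underscore-separated token is the sample ID; the rest may
    # contain lane or sequencer fields plus exactly one R1/R2 token.
    sample_id, *rest = stem.split("_")
    reads = [token for token in rest if token in ("R1", "R2")]
    if not sample_id or len(reads) != 1:
        raise ValueError(f"cannot detect sample ID and read pair: {filename}")
    return sample_id, reads[0]
```

A file name for which this sketch raises an error is a good candidate for the explicit map file described in the next section.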
Custom file names¶
To bypass the default convention outlined above and explicitly specify sample identifiers and R1/R2 designations for FASTQ files, a file ending with .fastq-map.csv can be provided as the SOURCE to the gencove upload command. The format of the file is outlined in the example below.
The following validation is performed on the .fastq-map.csv file:
- the file header is client_id,r_notation,path
- values in the client_id column cannot contain _
- values in the r_notation column can only be "r1" or "r2"
- files listed in the path column must:
  - exist
  - be gzip-compressed
Example:
client_id,r_notation,path
<sample_id_1>,<r_notation_1>,<path_to_fastq_file_1>
<sample_id_2>,<r_notation_2>,<path_to_fastq_file_2>
<sample_id_3>,<r_notation_3>,<path_to_fastq_file_3>
...
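The validation rules listed above can be approximated locally before uploading. The following Python sketch is not the CLI's own validator; validate_fastq_map is a hypothetical name, paths are resolved relative to the current working directory, and gzip compression is checked only by looking at the two-byte gzip magic number. It does not handle URL entries.

```python
import csv
import os

EXPECTED_HEADER = ["client_id", "r_notation", "path"]
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def validate_fastq_map(map_path):
    """Return a list of problems found in a .fastq-map.csv file (empty if none)."""
    problems = []
    with open(map_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header != EXPECTED_HEADER:
            return [f"unexpected header: {header}"]
        for line_no, row in enumerate(reader, start=2):
            if len(row) != 3:
                problems.append(f"line {line_no}: expected 3 columns")
                continue
            client_id, r_notation, path = row
            if "_" in client_id:
                problems.append(f"line {line_no}: client_id contains '_'")
            if r_notation not in ("r1", "r2"):
                problems.append(f"line {line_no}: r_notation must be r1 or r2")
            if not os.path.isfile(path):
                problems.append(f"line {line_no}: file does not exist: {path}")
                continue
            with open(path, "rb") as fastq:
                if fastq.read(2) != GZIP_MAGIC:
                    problems.append(f"line {line_no}: not gzip-compressed: {path}")
    return problems
```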
Grouping files¶
By default, Gencove systems expect one pair of FASTQ files per sample.
If sequencing reads for a single sample are spread across multiple FASTQ files, they need to be merged into one R1 file and one R2 file. This can be accomplished in several ways:
- Listing multiple files for the same client_id and r_notation in the .fastq-map.csv file (outlined in the previous section) results in the files being concatenated on the fly during upload with the Gencove CLI (see the example below). Note that the concatenation order is controlled by the order of rows listed in the .fastq-map.csv file.
- Manually concatenating the files. Since gzip-compressed files can be merged without decompressing, it's simply a matter of concatenating the compressed files.
- Providing the --no-lane-splitting flag to bcl2fastq, which avoids splitting reads into multiple FASTQ files upstream in the demultiplexing phase.
Example:
client_id,r_notation,path
sampleid1,r1,sample1_part1_r1.fastq.gz
sampleid1,r1,sample1_part2_r1.fastq.gz
sampleid1,r1,sample1_part3_r1.fastq.gz
sampleid1,r2,sample1_part1_r2.fastq.gz
sampleid1,r2,sample1_part2_r2.fastq.gz
sampleid1,r2,sample1_part3_r2.fastq.gz
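For the manual option, concatenating gzip files byte-for-byte yields a valid multi-member gzip file. A minimal Python sketch (the function name concatenate_gzip is hypothetical) that appends the parts in the order given:

```python
import gzip
import shutil


def concatenate_gzip(parts, merged_path):
    """Append gzip files byte-for-byte, in the order given, into merged_path."""
    with open(merged_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as source:
                shutil.copyfileobj(source, merged)


def read_gzip(path):
    """Read back the full decompressed content (all gzip members)."""
    with gzip.open(path, "rb") as handle:
        return handle.read()
```

The same part order must be used for the R1 and R2 files so that read pairs stay aligned.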
Uploading using the CLI¶
$ gencove upload <source-path> [<destination-path>]
Syncs local directories to directories in your Gencove upload area. Recursively copies new and updated files from the source directory to the destination.
Alternatively, can be used to import FASTQ files from URLs using a map file.
Only creates folders in the destination if they contain one or more files.
$ gencove upload my-fastq-files/
This example command will recursively copy all files in the my-fastq-files/ directory on your host system to a directory with an automatically generated name in the Gencove upload area.
$ gencove upload input.fastq-map.csv
If there are multiple input FASTQ files per sample, or the file names do not follow the conventions described above, a manifest describing the relationship between the sample identifiers and the input FASTQ files must be provided in a CSV file in the format described above.
$ gencove upload my-fastq-files/ gncv://my-fastq/batch-1/
In case more control is needed over the upload destination, a destination path prefixed with gncv:// may be provided. This pattern is commonly used for separating upload batches when continuously uploading data to your Gencove account and is useful for easily filtering files in the Gencove Dashboard. A common directory structure for batching uploads is:
gncv://<project-name>/<batch-name>/
If specifying a destination path, it is recommended to have at least one level of directories to separate batches of uploaded data. In other words, it is recommended to avoid placing all files in the root directory gncv://.
Details of upload behavior:
- In case a file in the local directory already exists in the destination, it will not be overwritten
- In case a file exists in the destination, but not the local directory, it will not be deleted
Importing files from URLs with a map file¶
Using the map file described above, it is also possible to import FASTQ files from URLs. When constructing the map CSV file, include the URL for each file under the path column.
Here is an example of the contents of a CSV map file that uses URLs:
client_id,r_notation,path
sample1,r1,https://example-bucket.storage.googleapis.com/sample_R1.fastq.gz
sample1,r2,https://example-bucket.storage.googleapis.com/sample_R2.fastq.gz
Note
Only the following URL domains are supported:
- amazonaws.com (AWS)
- blob.core.windows.net (Azure)
- googleapis.com (Google Cloud)
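URLs can be screened against this list before the map file is built. The Python sketch below assumes that "supported domain" means the URL's hostname is one of the listed domains or a subdomain of it; that matching rule and the name is_supported_url are assumptions, not documented behavior.

```python
from urllib.parse import urlparse

SUPPORTED_DOMAINS = ("amazonaws.com", "blob.core.windows.net", "googleapis.com")


def is_supported_url(url):
    """Return True if the URL's hostname is, or is a subdomain of, a listed domain."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in SUPPORTED_DOMAINS
    )
```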
Once the map file has been built, the upload command can be used:
$ gencove upload input.fastq-map.csv
Warning
When generating URLs from the above cloud providers, we suggest setting a generous expiration time so that the URLs do not expire before the files are assigned to a project and retrieved by the corresponding pipeline.
Automatically starting analysis¶
To automatically assign uploads to a project and run analysis, provide the --run-project-id flag and destination project id to the Gencove CLI.
$ gencove upload my-fastq-files/ gncv://my-fastq/batch-1/ --run-project-id b1edbb20-ee77-4be0-9944-e8e3a593cc83
When this feature is used, the Gencove CLI will check to make sure that the contents of SOURCE and DESTINATION are identical in order to avoid analysis of unwanted samples. This will always be the case if DESTINATION is omitted, i.e., autogenerated by the Gencove CLI.
It is also important to ensure uploaded files follow naming conventions outlined above to avoid sample identifier detection issues.
Downloading deliverables¶
Gencove provides a number of deliverables for each sample that is processed as part of a project. In case a sample fails processing due to quality control, only the original input files are provided as deliverables.
Downloading using the CLI¶
$ gencove download <local-destination-path> --project-id <project-id>
Downloads all deliverables for all samples in the specified project, with the following default naming scheme:
<local-destination-path>/<client-id>/<gencove-id>/<gencove-id>_<file-type>.<file-extension>
This naming scheme reflects the fact that uniqueness of client-ids is not enforced, while uniqueness of gencove-id is enforced.
Customizing download naming scheme¶
The default naming scheme outlined above can be customized by providing the --download-template flag and a custom file naming template that may contain {client_id}, {gencove_id}, {file_type}, {file_extension} and {default_filename} tokens.
$ gencove download . --project-id <project-id> --download-template '{client_id}.{file_extension}'
When using this feature, make sure to specify download templates that result in unique filenames across all samples.
The {default_filename} token provides access to the API's default file naming scheme, which takes into account different bioinformatics conventions across a subset of file types. Current exceptions to the default {gencove_id}_{file_type}.{file_extension} scheme are:
- fastq-r1: {gencove_id}_R1.fastq.gz
- fastq-r2: {gencove_id}_R2.fastq.gz
- alignment-bam: {gencove_id}.bam
- alignment-bai: {gencove_id}.bam.bai
- impute-vcf: {gencove_id}.vcf.gz
- impute-tbi: {gencove_id}.vcf.gz.tbi
- impute-csi: {gencove_id}.vcf.gz.csi
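To preview what a template would produce, the token substitution can be emulated with Python's str.format. This is a sketch of the documented behavior, not the CLI's implementation; render_filename is a hypothetical name, and the file_extension values passed in are illustrative because the values reported by the API are not listed here.

```python
# Exceptions to the default scheme, copied from the list above.
DEFAULT_FILENAME_EXCEPTIONS = {
    "fastq-r1": "{gencove_id}_R1.fastq.gz",
    "fastq-r2": "{gencove_id}_R2.fastq.gz",
    "alignment-bam": "{gencove_id}.bam",
    "alignment-bai": "{gencove_id}.bam.bai",
    "impute-vcf": "{gencove_id}.vcf.gz",
    "impute-tbi": "{gencove_id}.vcf.gz.tbi",
    "impute-csi": "{gencove_id}.vcf.gz.csi",
}
DEFAULT_SCHEME = "{gencove_id}_{file_type}.{file_extension}"


def render_filename(template, client_id, gencove_id, file_type, file_extension):
    """Fill a download template with the five documented tokens."""
    tokens = {
        "client_id": client_id,
        "gencove_id": gencove_id,
        "file_type": file_type,
        "file_extension": file_extension,
    }
    scheme = DEFAULT_FILENAME_EXCEPTIONS.get(file_type, DEFAULT_SCHEME)
    tokens["default_filename"] = scheme.format(**tokens)
    return template.format(**tokens)
```

Rendering a candidate template for every sample and file type, then checking the results for duplicates, is one way to confirm that the template yields unique filenames.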
Continuing previous downloads¶
When downloading, a file that already exists on the local filesystem is not overwritten if it has the same size in bytes as the file that would be downloaded. This behavior can be changed with the --no-skip-existing flag.
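The skip rule described above amounts to a size comparison. A minimal Python sketch, assuming the remote file size is known in advance (should_skip_download is a hypothetical name, not a CLI function):

```python
import os


def should_skip_download(local_path, remote_size_bytes):
    """Skip only when the local file exists and matches the remote size in bytes."""
    return (
        os.path.isfile(local_path)
        and os.path.getsize(local_path) == remote_size_bytes
    )
```

Because only sizes are compared, a partially downloaded file is fetched again, while a complete file with the expected size is left in place.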
Downloading subsets of deliverables¶
$ gencove download . --sample-ids sample-id-1,sample-id-2,sample-id-3 --file-types impute-vcf,impute-tbi
- Download only a specific set of sample ids by providing the --sample-ids flag instead of the --project-id flag
- Download only a specific set of file types by providing the --file-types flag. Currently available file types are listed below (not all file types may be available for every project).
- fastq-r1 - original input FASTQ file with raw sequencing reads, containing the first read of a read pair when using paired-end sequencing
- fastq-r2 - original input FASTQ file with raw sequencing reads, containing the second read of a read pair when using paired-end sequencing
- alignment-bam - BAM file with reads aligned to the target genome (includes all reads from original FASTQ files)
- alignment-bai - BAI index file accompanying the BAM file
- cnv-cnr - CNR file with bin-level log2 ratios for copy-number variation calls
- cnv-cns - CNS file with segmented log2 ratios for copy-number variation calls
- cnv-pdf - Portable Document Format (PDF) file with copy-number variation plot
- cnv-png - Portable Network Graphics (PNG) file with copy-number variation plot (commonly used when PDFs are too large)
- impute-vcf - VCF file with imputed variant calls
- impute-tbi - Tabix index file accompanying the VCF file
- impute-csi - CSI index file accompanying the VCF file
- kraken-report - Kraken report for sequencing reads that didn't map to the target genome
- ancestry-json - JavaScript Object Notation (JSON) file with ancestry estimates for subpopulations, containing the following keys:
  - ancestry - contains ancestry estimates
  - ancestry_raw - may contain additional entries for ambiguous groupings in situations where specific subgroups cannot be consistently identified
  - ancestry_metadata_id - legacy key (should be disregarded)
- traits-json - JSON file with polygenic risk score calculations
  - each key represents a polygenic score outlined in the "Data analysis configurations" section below
  - each polygenic score object contains the following keys:
    - score - calculated value of polygenic score
    - nsnp - number of single-nucleotide polymorphisms (SNPs) taken into account
    - score_percentile - percentile of individual's score relative to scores calculated for individuals in the reference dataset used to generate the score
- call_capture-vcf - VCF file with variant calls from target capture regions, corresponding with the deliverable labeled "Target capture, VCF file" in the web interface
- call_capture-csi - index accompanying the target capture VCF file, corresponding with the deliverable labeled "Target capture, CSI file" in the web interface
- call_capture-vcf_pathogenic - VCF file with pathogenic variant calls from target capture regions
- call_capture-forced_vcf - VCF file with variant calls at a set of predetermined variants, corresponding with the deliverable labeled "Target capture (pre-defined variants), VCF file" in the web interface
- call_capture-forced_csi - index accompanying the VCF file with variant calls at a set of predetermined variants, corresponding with the deliverable labeled "Target capture (pre-defined variants), CSI file"
- qc - JSON file with sample quality control metrics, containing the following quality_control_types:
  - format - FASTQ format validity
  - r1_eq_r2 - number of bases in R1 file equal to number of bases in R2 file
  - r1_r2_ids_match - R1 read identifiers match R2 read identifiers
  - bases_min - minimum number of total bases sequenced
  - bases_max - maximum number of total bases sequenced
  - bases_dedup_min - minimum number of deduplicated bases
  - bases_dedup_mapped_min - minimum number of deduplicated bases that have aligned to the target genome
  - fraction_contamination_max - maximum contamination by DNA from another sample of the same species
  - snps_min - number of variants in reference panel that are covered by at least one sequencing read
  - effective_coverage_min - minimum effective coverage
  - hzy_max - maximum heterozygosity
  - cc_min - minimum "call confidence", i.e., imputation algorithm variant calling confidence across all sites
  - nhref_min - minimum number of homozygous reference calls
  - nhet_max - maximum number of heterozygous calls
  - nhalt_min - minimum number of homozygous alt calls
  - pct_target_bases_30x_min - minimum percentage of target capture bases with 30x coverage
  - pathogenic_min - number of pathogenic variants detected
- metadata - JSON file with user-specified metadata that has been assigned to a sample
Downloading checksum files¶
$ gencove download <local-destination-path> --project-id <project-id> --checksums
Include sha256 checksum files to verify that deliverables are valid. For instance, for file file.vcf.gz a file named file.vcf.gz.sha256 will be downloaded as well.
To verify the integrity of a file, you can run:
$ shasum -c file.vcf.gz.sha256
# or
$ sha256sum -c file.vcf.gz.sha256
This reports whether the checksum of the downloaded file matches the one provided by Gencove.
Note
Only projects that were created after July 6, 2022 have checksums available.
Warning
The CLI does NOT validate deliverables against checksum, even when the checksum flag is provided.
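Since the CLI does not validate deliverables itself, verification can also be scripted. The Python sketch below assumes the .sha256 file uses the layout accepted by sha256sum -c, with the hexadecimal digest as the first whitespace-separated token; sha256_matches is a hypothetical name.

```python
import hashlib


def sha256_matches(file_path, checksum_path, chunk_size=1024 * 1024):
    """Compare a file's SHA-256 digest with the digest stored in a checksum file."""
    with open(checksum_path) as handle:
        # Assumed layout: "<hex digest>  <file name>"
        expected = handle.read().split()[0].lower()

    digest = hashlib.sha256()
    with open(file_path, "rb") as handle:
        # Read in chunks so large deliverables are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```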
The Gencove Archive¶
The Gencove Archive automatically transitions samples older than 30 days from hot storage to the Archive. Once a sample is in the Archive, its deliverables are not immediately available for download; users need to explicitly restore them from the Archive using the Gencove web dashboard, command-line interface (CLI), or API. Sample restoration can take up to 50 hours. Upon restoration, sample deliverables are available to download for 12 days, after which they return to the Archive.
$ gencove projects restore-samples <project-id> --sample-ids sample-id-1,...,sample-id-N
Note that default views in the Gencove web dashboard and CLI only display samples that are immediately available for download. To view archived samples, set the view filter to either:
- all: display available and archived samples
- archived: display only archived samples
Creating projects, listing pipeline capabilities and listing pipelines¶
Creating projects¶
To create a project via the CLI, use the gencove projects create command. It requires a project name and a pipeline-capability-id.
$ gencove projects create my-project-name pipeline-capability-id
The way to acquire pipeline-capability-id is explained in the sections below.
Listing pipeline capabilities¶
To list the pipeline capabilities, use the gencove projects list-pipeline-capabilities command. It requires a pipeline-id as an argument and lists the pipeline capabilities associated with the specified pipeline. The pipeline capability id can be used in the previous command.
$ gencove projects list-pipeline-capabilities pipeline-id
To get the pipeline-id, use the command below.
Listing pipelines¶
To list pipelines, use gencove projects list-pipelines. The command doesn't require any arguments. It will list pipeline ids that can be used in the previous command.
$ gencove projects list-pipelines
Listing projects, samples and uploads¶
Listing projects¶
All projects can be listed using the gencove projects list command.
$ gencove projects list
Listing project samples¶
All samples in a project can be listed using the gencove projects list-samples command.
$ gencove projects list-samples <project-id>
Project samples can also be filtered by status and searched. A metadata substring can also be used as the search query.
$ gencove projects list-samples <project-id> --status completed
$ gencove projects list-samples <project-id> --search my-client-id
Listing uploads¶
Uploads can be listed using the gencove uploads list command.
$ gencove uploads list
Uploads can also be filtered by status and searched.
$ gencove uploads list --status assigned
$ gencove uploads list --search gncv://upload/path
Importing existing samples to another project¶
When a sample that has finished analysis needs to be processed with a different pipeline configuration, this can be accomplished in another project. Uploading the FASTQ files again and processing them in another project with a different configuration can be tedious, so the finished sample can be imported into the other project instead. This process uses the deliverables of the finished sample as the source for the new one. The user performing the import must have adequate permissions to manipulate samples in both the source and destination projects, and the samples must not be archived.
Import existing samples¶
$ gencove projects import-existing-samples destination-project-id --sample-ids source-sample-id
--sample-ids accepts multiple comma-separated values:
$ gencove projects import-existing-samples destination-project-id --sample-ids source-sample-id-1,source-sample-id-2
Import all samples from another project¶
By using --source-project-id, it is possible to import all available samples in the succeeded or failed_qc state that have files from one project into a different one.
$ gencove projects import-existing-samples destination-project-id --source-project-id source-project-id
Note
We automatically batch large project-to-project imports into groups of 100 samples.
Optionally, universal metadata can be assigned to each new sample by adding --metadata-json:
$ gencove projects import-existing-samples destination-project-id --sample-ids source-sample-id --metadata-json='{"batch": "batch1"}'
Sample metadata and files¶
Gencove supports assigning metadata to a sample in JavaScript Object Notation (JSON) format.
Information commonly stored as sample metadata:
- phenotypes (characteristics) of the individual represented by the sample
- batch identifiers
- alternative or auxiliary sample identifiers
Each sample has many different files assigned to it that can be retrieved using the CLI.
The following CLI commands can be used to set and get metadata:
Assigning sample metadata¶
Metadata can be assigned to a sample using the gencove samples set-metadata command. Specifying the sample id and the --json flag together with a JSON string is mandatory.
$ gencove samples set-metadata my-sample-id --json '{"example-key": "example-value"}'
$ gencove samples set-metadata my-sample-id --json '1234567'
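When metadata is assembled in a script, serializing it with json.dumps and passing the command as an argument list avoids shell-quoting mistakes. The Python sketch below only builds the argument list for the command shown above; the helper name set_metadata_argv is hypothetical.

```python
import json


def set_metadata_argv(sample_id, metadata):
    """Build the argument list for `gencove samples set-metadata`."""
    return [
        "gencove",
        "samples",
        "set-metadata",
        sample_id,
        "--json",
        json.dumps(metadata),  # always yields a valid JSON string
    ]


argv = set_metadata_argv("my-sample-id", {"example-key": "example-value"})
print(argv)
```

The list can be handed to subprocess.run(argv, check=True) on a machine where the Gencove CLI is installed and credentials are configured.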
Retrieving sample metadata¶
Sample metadata can be retrieved by using the gencove samples get-metadata command. Optionally, --output-filename my-filename can be used to specify the filename where the metadata will be output. If not specified, metadata will be printed to stdout.
$ gencove samples get-metadata my-sample-id
Downloading single sample file¶
Download and save file¶
A single sample file can be downloaded using the gencove samples download-file command.
$ gencove samples download-file sample-id-1 impute-vcf destination.vcf
Include checksum file¶
A single file can be downloaded along with its checksum file using the gencove samples download-file command.
$ gencove samples download-file sample-id-1 impute-vcf destination.vcf --checksum
Download and stream file to stdout¶
A single sample file can be downloaded and streamed to stdout using the gencove samples download-file command.
$ gencove samples download-file sample-id-1 impute-vcf -
Merged VCF file¶
Gencove supports generating a merged VCF file containing variant calls from all successful and available (i.e., not archived) samples in a project.
Generating a merged VCF file is initiated from the Gencove Dashboard, by opening a project and clicking the "Merge VCFs" button. Once the merge operation is complete, a download button will appear on the project page.
Please keep in mind:
- merging is only possible for projects with two or more successful and available (i.e., not archived) samples
- not all project configurations support merging
- in case you need a merged VCF and a project configuration you are using does not support it, please let us know at support@gencove.com
- depending on the number of samples in your project, merging may take anywhere between several minutes and several hours
- if multiple samples have the same client_id, the merged VCF file will only contain the newest sample
- if a subset of samples in the project are archived, they will not be added to the merged VCF. To include them in the merged VCF, restore the samples first.
In addition to the web interface, the following CLI commands can be used to access merged VCF functionality:
Creating a merged VCF¶
A merged VCF file can be created using the gencove projects create-merged-vcf command.
$ gencove projects create-merged-vcf <project-id>
Checking the status of a merged VCF¶
The status of the merging job can be checked using the gencove projects status-merged-vcf command.
$ gencove projects status-merged-vcf <project-id>
Downloading the merged VCF¶
The merged VCF file can be downloaded using the gencove projects get-merged-vcf command. Optionally, --output-filename my-filename can be used to override the default filename.
$ gencove projects get-merged-vcf <project-id>
Reference Genome¶
Gencove offers access to the reference genome files utilized in generating project deliverables.
These files can be downloaded using the gencove projects get-reference-genome command. Alternatively, they are accessible through the web application, where you can download the genome.fasta file from the project detail page.
$ gencove projects get-reference-genome <project-id> <destination-dir>
Downloading subsets of deliverables¶
$ gencove projects get-reference-genome <project-id> <destination-dir> --file-types genome-fasta,genome-dict
Currently available file types are listed below. Not all file types may be available for every project; run gencove file-types --object reference-genome --project-id <project-id> for an accurate list of file types.
- genome-fasta - Reference genome sequence in FASTA format, compressed using gzip
- genome-dict - Picard sequence dictionary corresponding to the reference genome sequence
- genome-fasta_amb - Auxiliary file used by the BWA alignment tool for genome indexing
- genome-fasta_ann - Annotation file used by the BWA alignment tool for genome indexing
- genome-fasta_bwt - Burrows-Wheeler Transform (BWT) index file, used for efficient sequence alignment
- genome-fasta_fai - FASTA index file, providing quick access to sequences within the compressed FASTA file
- genome-fasta_gzi - Index file for the compressed FASTA file, facilitating quick retrieval of specific regions
- genome-fasta_pac - Packed alignment data file used by BWA for indexing
- genome-fasta_sa - Suffix array file, a data structure used for pattern matching and genome alignment
- genome-fasta_vcf_header - Header file for a Variant Call Format (VCF) file, containing information about the reference genome and other metadata
Setting up Automated imports from BaseSpace¶
If a BaseSpace social connection is present for a user and there are adequate permissions for importing from BaseSpace, an automated import can be set up. It runs periodically and searches BaseSpace for projects created in the last day whose names contain the identifier. Biosamples from those BaseSpace projects are imported into the Gencove project for analysis.
Note
The identifier is used as a search argument for BaseSpace projects. Any project name that contains the identifier is returned. Letter case is ignored.
Identifier "cattle" will match BaseSpace project names like "Our cattle 1" and "Cattle New", but it won't match "Corn".
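The matching rule in this note is a case-insensitive substring test. A small Python illustration (the function name matches_identifier is hypothetical and this is not BaseSpace's search implementation):

```python
def matches_identifier(identifier, project_name):
    """Case-insensitive substring match of the identifier in a project name."""
    return identifier.lower() in project_name.lower()


for name in ("Our cattle 1", "Cattle New", "Corn"):
    print(name, matches_identifier("cattle", name))
```

Short identifiers match more project names than intended, so a distinctive identifier keeps unrelated BaseSpace projects out of the import.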
Listing autoimport jobs¶
Available autoimport jobs for a user can be listed using the gencove basespace autoimports list command.
$ gencove basespace autoimports list
Creating an autoimport job¶
A new autoimport job can be created using the gencove basespace autoimports create command.
$ gencove basespace autoimports create <project-id> identifier-in-basespace-project-name
To assign sample metadata to each sample imported this way, --metadata-json can be used.
$ gencove basespace autoimports create <project-id> identifier-in-basespace-project-name --metadata-json '{"example-key": "example-value"}'
Note
When an autoimport job is first created, it is run immediately. Afterwards, it periodically checks whether a new BaseSpace project with a name containing the identifier has been added in the last day and imports the Biosamples according to the autoimport instructions.
Backwards-compatible array deliverables¶
Backwards-compatible genotyping array deliverables can be generated for batches of samples in projects that support this functionality. Each project configuration can support multiple batch types that correspond to different array types.
More information about these deliverables is available in this blog post.
Listing batch types¶
Available batch types for a project can be listed using the gencove projects list-batch-types command.
$ gencove projects list-batch-types <project-id>
Creating a batch¶
A new batch can be created using the gencove projects create-batch command.
$ gencove projects create-batch --batch-type illuminasnp50 --batch-name batch-001 --sample-ids sample-id-1,...,sample-id-N <project-id>
Omitting --sample-ids results in all samples belonging to the project being used for the batch.
$ gencove projects create-batch --batch-type illuminasnp50 --batch-name batch-001 <project-id>
Successful generation of a batch deliverable will also trigger a webhook associated with the project.
Listing project batches¶
Project batches can be listed using the gencove projects list-batches command.
$ gencove projects list-batches <project-id>
Downloading batch deliverable¶
Once the batch deliverable is generated, it is available for download using the gencove projects get-batch command.
$ gencove projects get-batch my-batch-id --output-filename batch.zip
Retrieving reports¶
The Gencove CLI can be used to retrieve various reports on your Gencove data.
Project QC Reports¶
Quality control data for every completed sample in a given project can be retrieved via the gencove reports project-qc command. The data is returned as a CSV file that is saved locally.
$ gencove reports project-qc --output-filename report.csv <project-id>
You can also select which columns to retrieve via the --columns parameter. By default, all the columns are retrieved. The following columns are supported:
- id - Gencove sample ID
- client_id - User supplied ID
- project_id - Project ID for sample
- year - Year sample entered completed state
- month - Month sample entered completed state
- day - Day sample entered completed state
- status - Final reported status for sample
- sex_string - Inferred sex karyotype for sample (if available)
- snps_min - Minimum number of SNPs detected
- bases_dedup_mapped_min - Minimum number of deduplicated bases mapped to the target genome
- call_rate_min - Call rate
- effective_coverage_min - Effective coverage
- raw_coverage - Raw coverage
- ancestries - The ancestry breakdown per sample. This value will be broken down into individual ancestry columns, varying depending on the species for your project pipeline configuration.
Organization Usage Reports¶
Monthly usage across your entire organization can be retrieved via the gencove reports monthly-usage command. This command downloads a CSV file which reports the number of succeeded and failed samples across all projects, broken down by month.
By default, the previous 12 months are retrieved for the report. A date range can be optionally specified by passing in the --from and --to parameters, both of which expect a date in YYYY-MM format (e.g. 2023-01). Note that the --to parameter is inclusive.
$ gencove reports monthly-usage --from 2023-05 --to 2023-09 --output-filename monthly_usage.csv
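Because the --to parameter is inclusive, the range in the example above covers five months. The Python sketch below enumerates the months in such a range; it illustrates the date arithmetic only and says nothing about the layout of the CSV file itself. The function name months_in_range is hypothetical.

```python
def months_in_range(start, end):
    """List YYYY-MM strings from start to end, both inclusive."""
    start_year, start_month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    # Convert to a running month index so year boundaries need no special case.
    first = start_year * 12 + (start_month - 1)
    last = end_year * 12 + (end_month - 1)
    return [f"{i // 12:04d}-{i % 12 + 1:02d}" for i in range(first, last + 1)]


print(months_in_range("2023-05", "2023-09"))
```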
Filing a bug report for the CLI¶
Before filing a bug report, please follow these steps:
Update to the latest CLI version¶
$ pip install -U gencove
You might be using an out-of-date CLI version. We recommend installing the latest one and trying again to see if the problem has been fixed.
Additionally, you can download the latest version as a binary executable to avoid problems while using pip.
Check network connection¶
$ ping api.gencove.com
Your organization's firewall rules might be blocking traffic to Gencove servers. If you're not able to reach us, add a new rule to allow traffic to api.gencove.com and then try again.
Gather information for bug report¶
Note
The CLI writes a debug log file when it encounters an error.
Once you have completed the above steps, rerun the command(s) you are having trouble with.
Save and attach the debug log file to the bug report (the file path will be in the terminal's output).
Optionally include the following information:
- Gencove CLI version
- Python version
- Operating system version
- Hardware info - CPU, RAM, disks (total/free/used)
Deleting samples¶
Note
Deleted samples are still counted towards the total count of processed samples on invoices.
It is possible to delete samples belonging to a project using either the CLI or API.
$ gencove projects delete-samples --sample-ids sample1,sample2 project1
Note that when using the above command, all samples supplied must belong to the same project.
Deleting projects¶
Note
Deleting a project will delete its samples and batches, but these are still counted towards the total count of processed samples on invoices.
It is possible to delete projects using either the CLI or API.
$ gencove projects delete project1,project2
Note that when using the above command, all projects supplied must belong to your organization.