Reference Genome

Gencove provides access to the reference genome files used to generate project deliverables.

These files can be downloaded using the gencove projects get-reference-genome command. Alternatively, they are accessible through the web application, where you can download the genome.fasta file from the project detail page.

$ gencove projects get-reference-genome <project-id> <destination-dir>
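
For scripted use, the command above can be wrapped in a small shell function. This is a minimal sketch, not part of the Gencove CLI: the function name fetch_reference is hypothetical, and creating the destination directory beforehand is a precaution that assumes nothing about whether the CLI creates it itself.

```shell
# Hypothetical helper (not part of the Gencove CLI): download all reference
# genome files for one project into a destination directory.
# Usage: fetch_reference <project-id> <destination-dir>
fetch_reference() {
    project_id="${1:-}"
    dest_dir="${2:-}"
    if [ -z "$project_id" ] || [ -z "$dest_dir" ]; then
        echo "usage: fetch_reference <project-id> <destination-dir>" >&2
        return 2
    fi
    # Create the destination up front; harmless if it already exists.
    mkdir -p "$dest_dir" || return 1
    # Same command as shown in the documentation above.
    gencove projects get-reference-genome "$project_id" "$dest_dir"
}
```

Calling fetch_reference <project-id> ./reference would then place the files under ./reference, with the function returning the exit status of the gencove command.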

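Downloading subsets of deliverables

To download only selected file types, pass them as a comma-separated list to the --file-types option: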

$ gencove projects get-reference-genome <project-id> <destination-dir> --file-types genome-fasta,genome-dict
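
When the list of file types comes from a script, it can be convenient to pass the types as separate arguments and join them with commas. The sketch below does this in POSIX shell. The function name fetch_reference_subset is hypothetical; the only CLI behavior it relies on is the --file-types option shown above.

```shell
# Hypothetical helper (not part of the Gencove CLI): download only the
# listed file types.
# Usage: fetch_reference_subset <project-id> <destination-dir> <file-type>...
fetch_reference_subset() {
    if [ "$#" -lt 3 ]; then
        echo "usage: fetch_reference_subset <project-id> <destination-dir> <file-type>..." >&2
        return 2
    fi
    project_id="$1"
    dest_dir="$2"
    shift 2
    # Join the remaining arguments with commas. The IFS change stays inside
    # the command substitution, so the caller's IFS is untouched.
    file_types=$(IFS=,; printf '%s' "$*")
    gencove projects get-reference-genome "$project_id" "$dest_dir" \
        --file-types "$file_types"
}
```

For example, fetch_reference_subset <project-id> <destination-dir> genome-fasta genome-dict runs the same command as the one shown above.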

Currently available file types are listed below. Not all file types may be available for every project; for an accurate list, run gencove file-types --object reference-genome --project-id <project-id>.

genome-fasta: Reference genome sequence in FASTA format, compressed using gzip
genome-dict: Picard sequence dictionary corresponding to the reference genome sequence
genome-fasta_amb: Auxiliary file used by the BWA alignment tool for genome indexing
genome-fasta_ann: Annotation file used by the BWA alignment tool for genome indexing
genome-fasta_bwt: Burrows-Wheeler Transform (BWT) index file, used for efficient sequence alignment
genome-fasta_fai: FASTA index file, providing quick access to sequences within the compressed FASTA file
genome-fasta_gzi: Index file for the compressed FASTA file, facilitating quick retrieval of specific regions
genome-fasta_pac: Packed reference sequence file used by the BWA alignment tool for indexing
genome-fasta_sa: Suffix array file, a data structure used for pattern matching and genome alignment
genome-fasta_vcf_header: Header file for a Variant Call Format (VCF) file, containing information about the reference genome and other metadata
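
BWA expects its five index files (.amb, .ann, .bwt, .pac, .sa) to sit next to each other and share a common prefix. If only a subset of file types was downloaded, a quick completeness check before alignment can save a failed run. The sketch below is an illustration under assumptions: the function name missing_bwa_index_files is hypothetical, and it assumes the downloaded files use the usual BWA naming of prefix plus extension, which should be confirmed against the actual file names in your destination directory.

```shell
# Hypothetical helper: report which BWA index companion files are missing
# for a given prefix (for example, the path of the downloaded FASTA file).
# Usage: missing_bwa_index_files <prefix>
missing_bwa_index_files() {
    prefix="${1:-}"
    missing=""
    # The five extensions written by `bwa index`.
    for ext in amb ann bwt pac sa; do
        if [ ! -f "$prefix.$ext" ]; then
            missing="$missing $prefix.$ext"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    return 0
}
```

The function prints nothing and returns 0 when all five files are present, so it can guard an alignment step in a script.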