
Data analysis configurations

Each Gencove project is pinned to a 'configuration' that specifies the species, reference datasets (e.g. a reference genome and haplotype reference panel), and specific deliverables. These configurations can be private to a specific set of individuals, or public. The datasets underlying the public configurations are as follows:

Chicken

Chicken low-pass v1.0
  • Reference genome: galGal6
  • Imputation reference panel: Sequencing data was downloaded from the European Nucleotide Archive project PRJEB30270. These data are described in detail in Qanbari et al. (2019). Raw sequences were processed using GATK4 and 26M variants were identified and converted into a haplotype reference panel.
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index)

Cattle

Cattle low-pass v1.0
  • Reference genome: ARS-UCD1.2
  • Imputation reference panel: We used sequence data from 484 samples primarily from B. taurus breeds and processed these data using GATK4 into a reference panel of 49M bi-allelic SNPs.
  • Breed analysis reference panel: We report ancestry proportions from 13 breeds: Angus, Brahman, Charolais, Gelbvieh, Hereford, Holstein, Jersey, Limousin, Red Angus, Simmental, Braunvieh, Santa Gertrudis, and Shorthorn.
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index), breed analysis
Cattle low-pass v2.0
  • Reference genome: ARS-UCD1.2
  • Imputation reference panel: We used sequence data from 946 samples from B. taurus and B. indicus-related breeds and processed these data using GATK4 into a reference panel of 59M bi-allelic SNPs.
  • Breed analysis reference panel: We report ancestry proportions from 12 breeds: Angus (including red and black Angus), Brahman, Charolais, Gelbvieh, Hereford, Holstein, Jersey, Limousin, Maine Anjou, Simmental, Braunvieh, and Shorthorn.
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index), breed analysis
Cattle low-pass v2.2
  • Same configuration as Cattle low-pass v2.0, but includes the Y chromosome
Cattle low-pass v2.3
  • Same configuration as Cattle low-pass v2.2, but with performance improvements
Cattle low-pass v2.4
  • Same configuration as Cattle low-pass v2.3, but the reference panel has been updated to include additional polymorphisms in the imputed VCF
Cattle low-pass v3.0
  • Reference genome: ARS-UCD1.2
  • Imputation reference panel: We used sequence data from 1,987 animals with publicly available data and processed these data using GATK into an imputation reference panel. We then subset this reference panel to the variants that are either 1) segregating in annotated Bos taurus samples or 2) present on public genotyping array manifests. This resulted in a reference panel of 59M variants (SNPs and small indels).
  • Breed analysis reference panel: We report ancestry proportions from 13 breeds: Angus, Brahman, Charolais, Gelbvieh, Hereford, Holstein, Jersey, Limousin, Red Angus, Simmental, Braunvieh, Santa Gertrudis, and Shorthorn.
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index), breed analysis
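The variant subsetting described for Cattle low-pass v3.0 can be sketched as a union of two criteria. This is a minimal illustration, not the production pipeline: the function names, the variant tuples, the chromosome naming, and the allele counts are all hypothetical, and "segregating" is assumed here to mean that both the reference and the alternate allele are observed among the annotated Bos taurus samples.

```python
def is_segregating(alt_count, allele_number):
    """Assumed definition: both alleles are observed in the sample set."""
    return 0 < alt_count < allele_number


def subset_panel(panel_variants, taurus_allele_counts, array_variants):
    """Keep variants segregating in Bos taurus OR present on an array manifest."""
    kept = []
    for variant in panel_variants:
        # Variants with no Bos taurus calls are treated as not segregating.
        alt_count, allele_number = taurus_allele_counts.get(variant, (0, 0))
        if is_segregating(alt_count, allele_number) or variant in array_variants:
            kept.append(variant)
    return kept


# Hypothetical variants as (chromosome, position, ref, alt) tuples.
panel_variants = [
    ("1", 1000, "A", "G"),
    ("1", 2000, "C", "T"),
    ("1", 3000, "G", "GA"),
    ("2", 4000, "T", "C"),
]

# Hypothetical (alternate allele count, total called alleles) in Bos taurus samples.
taurus_allele_counts = {
    ("1", 1000, "A", "G"): (12, 200),    # segregating
    ("1", 2000, "C", "T"): (0, 200),     # alternate allele never observed
    ("1", 3000, "G", "GA"): (200, 200),  # alternate allele fixed
    ("2", 4000, "T", "C"): (0, 200),     # alternate allele never observed
}

# Hypothetical set of variants listed on public genotyping array manifests.
array_variants = {("1", 2000, "C", "T")}

subset = subset_panel(panel_variants, taurus_allele_counts, array_variants)
```

In this toy input, the first variant is kept because it segregates and the second because it appears on an array manifest; the other two meet neither criterion.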

Dog

Dog low-pass v1.0
  • Reference genome: canFam3
  • Imputation reference panel: A reference panel of 435 sequenced dogs and 46M sites
  • Breed analysis reference panel: This panel contains data from 91 breeds
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index), breed analysis
Dog low-pass v2.0
  • Reference genome: canFam3
  • Imputation reference panel: A reference panel of 676 sequenced dogs and 53M sites.
  • Breed analysis reference panel: This panel contains data from 91 breeds
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index), breed analysis

Human

Human low-pass v1.0
  • Reference genome: hs37-1kg
  • Imputation reference panel: 1000 Genomes Phase 3, excluding all sites that have a minor allele count less than three, have more than two alleles, or lie on the sex chromosomes.
  • Ancestry reference panel: We provide an ancestry analysis based on 26 reference populations described here
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index), ancestry analysis, polygenic scores
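The three site exclusions listed for the Human low-pass v1.0 imputation reference panel can be sketched as a simple filter. This is an illustration under stated assumptions rather than the actual processing code: the `Site` record, its field names, and the example sites are hypothetical, and chromosomes are assumed to be named without a "chr" prefix.

```python
from dataclasses import dataclass

# Assumed chromosome naming without a "chr" prefix.
SEX_CHROMOSOMES = {"X", "Y"}


@dataclass(frozen=True)
class Site:
    chrom: str
    pos: int
    ref: str
    alts: tuple          # alternate alleles at this site
    alt_counts: tuple    # observed count of each alternate allele
    total_alleles: int   # total number of called alleles at this site


def minor_allele_count(site):
    """Minor allele count of a bi-allelic site."""
    alt_count = site.alt_counts[0]
    return min(alt_count, site.total_alleles - alt_count)


def keep_site(site, min_mac=3):
    """Apply the three exclusions described for the v1.0 panel."""
    if site.chrom in SEX_CHROMOSOMES:
        return False  # on a sex chromosome
    if len(site.alts) != 1:
        return False  # more than two alleles
    return minor_allele_count(site) >= min_mac  # minor allele count below three is dropped


# Hypothetical example sites.
sites = [
    Site("1", 1000, "A", ("G",), (120,), 5008),        # kept
    Site("1", 2000, "C", ("T",), (2,), 5008),          # minor allele count of 2
    Site("2", 3000, "G", ("A", "T"), (50, 10), 5008),  # more than two alleles
    Site("X", 4000, "T", ("C",), (300,), 3775),        # sex chromosome
    Site("3", 5000, "A", ("C",), (5006,), 5008),       # reference allele is the minor allele (count 2)
]

kept = [site for site in sites if keep_site(site)]
```

Note that the minor allele is taken as the less frequent of the two alleles, so a site where the reference allele is rare is also excluded.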
Human low-pass v2.0
  • Reference genome: hs37-1kg
  • Imputation reference panel: 1000 Genomes Phase 3. Relative to v1.0, this includes all sites (including normalized multiallelic sites and the X chromosome).
  • Ancestry reference panel: We provide an ancestry analysis based on 26 reference populations described here
  • Polygenic risk scores:
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index), ancestry analysis, polygenic scores
Human low-pass v2.1
  • Same configuration as Human low-pass v2.0, but with duplicate sites removed (see 1000 Genomes website for details)
Human low-pass GRCh37 v2.2
  • Same configuration as Human low-pass v2.1, but with bugfixes and performance improvements
Human low-pass GRCh37 v2.3
  • Same configuration as Human low-pass v2.2, but imputation now takes into account varying recombination rates across the genome. In particular, a recombination map derived from the HapMap II project is used to interpolate recombination rates across all sites in the haplotype reference panel. This results in increased imputation accuracy compared to Human low-pass v2.2.
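One common way to carry a recombination map over to reference panel sites is linear interpolation of cumulative genetic position between the nearest map markers. The sketch below shows that approach; whether the pipeline interpolates in exactly this way is an assumption, and the function name, map values, and site positions are hypothetical.

```python
from bisect import bisect_right


def interpolate_cm(position, map_positions, map_cm):
    """Linearly interpolate genetic position (cM) at a physical position (bp).

    map_positions must be strictly increasing. Positions outside the map
    are clamped to the first or last map value.
    """
    if position <= map_positions[0]:
        return map_cm[0]
    if position >= map_positions[-1]:
        return map_cm[-1]
    # Index of the first map marker strictly to the right of the position.
    i = bisect_right(map_positions, position)
    x0, x1 = map_positions[i - 1], map_positions[i]
    y0, y1 = map_cm[i - 1], map_cm[i]
    return y0 + (y1 - y0) * (position - x0) / (x1 - x0)


# Hypothetical recombination map for one chromosome: bp positions and cumulative cM.
map_positions = [1000, 2000, 4000]
map_cm = [0.0, 0.5, 0.7]

# Hypothetical reference panel sites on the same chromosome.
panel_sites = [500, 1500, 2000, 3000, 5000]
panel_cm = [interpolate_cm(p, map_positions, map_cm) for p in panel_sites]
```

The difference in interpolated cM between two adjacent panel sites then gives the genetic distance between them, which is larger in regions with a higher recombination rate.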
Human low-pass GRCh37 v2.4
  • Same configuration as Human low-pass v2.3, but the CNV calling part of the pipeline now uses a panel of normals comprising 59 male normal samples. Previously, the CNV calling step did not normalize against any normal human samples.
Human low-pass GRCh37 v2.5
  • Same configuration as Human low-pass v2.4, but with additional imputation QC metrics calculated.
Human low-pass GRCh37 GLIMPSE v0.1 (testing)
Human low-pass GRCh37 GLIMPSE v0.2 (testing)
  • Same configuration as Human low-pass GRCh37 GLIMPSE v0.1, but with the full 1000 Genomes Phase 3 reference panel
Human low-pass GRCh38 (beta):
  • Reference genome: GRCh38 with alternative sequences, plus decoys and HLA (available here).
  • Imputation reference panel: Variant calls from 1000 Genomes Phase 3 samples resequenced at high depth by the New York Genome Center (processing pipeline described here), after removing singletons (variants with a minor allele count of 1 in the sample), for a total of ~62M variants.
  • Ancestry reference panel: We provide an ancestry analysis based on 26 reference populations described here.
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index), ancestry analysis.
Human low-pass GRCh38 v1.0:
  • Same configuration as the Human low-pass GRCh38 (beta) except for the imputation reference panel.
  • Imputation reference panel: Lifted-over panel from the 1000 Genomes Phase 3 GRCh37 release.
Human low-pass GRCh38 v1.1:
  • Same configuration as Human low-pass GRCh38 v1.0 but with bugfixes and performance improvements.
Human low-pass GRCh38 v2.0:
  • Same reference genome and deliverables as Human low-pass GRCh38 v1.1 but with a new imputation reference panel comprising the phased release of genotype calls from the New York Genome Center's resequencing of individuals from the 1000 Genomes Project. The panel comprises 3202 individuals: the original 2504 from Phase 3 and an additional 698 relatives. See the preprint here for more details.
Human low-pass GRCh38 v2.1:
  • Same configuration as Human low-pass GRCh38 v2.0 but annotated with variant IDs deriving from dbSNP build 151.
Human low-pass GRCh38 v2.2:
  • Same configuration as Human low-pass GRCh38 v2.1 but with additional imputation QC metrics calculated.

Maize

Maize low-pass v1.0
  • Reference genome: AGPv4
  • Imputation reference panel: Maize 282 association panel genotypes (7x, AGPv4 coordinates), for a total of ~82M variants.
  • Strain reference panel: Each strain in the imputation reference panel was considered a separate population. Those that appear particularly similar are merged downstream into related groups such as "NC262RELATED".
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index), strain analysis
Maize low-pass v1.1
  • Same as Maize low-pass v1.0 but with multi-allelic SNPs included in the imputation reference panel.

Mouse

Mouse low-pass v1.0
  • Reference genome: GRCm38_68
  • Imputation reference panel: The Mouse Genomes Project contains ~59M SNPs discovered in 36 sequenced lines.
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index)

Rat

Rat low-pass v1.0
  • Reference genome: rn6
  • Imputation reference panel: 42 rat genomes described in Hermsen et al. (2015) and lifted over to rn6, containing 8.7M variants.
  • Strain analysis panel: The same 42 rat genomes used for the imputation reference panel.
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index), strain analysis

Soy

Soy low-pass Wm82.a2 v1.0
  • Reference genome: Wm82.a2.v1
  • Imputation reference panel: GmHapMap contains ~11M bi-allelic SNPs identified in around 1000 sequenced accessions.
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index)
Soy low-pass Wm82.a2 v2.0
  • Same reference genome as v1.0
  • Imputation reference panel: We used 478 samples from the USDA-GRIN germplasm collection, and processed these data using GATK4 into a reference panel of 32M SNPs and short indels.
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index)
Soy low-pass Wm82.a4 v1.0
  • Reference genome: Wm82.a4.v1
  • Imputation reference panel: We used 478 samples from the USDA-GRIN germplasm collection, and processed these data using GATK4 into a reference panel of 25M SNPs and short indels.
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index)

Swine

Swine low-pass v1.0
  • Reference genome: susScr11
  • Imputation reference panel: We used 414 samples from publicly available swine sequence data (PRJEB39374, PRJNA343658, PRJNA414091, PRJNA482384, PRJNA506339, and PRJNA553106) and processed these data using GATK4 into a reference panel of 53M SNPs and short indels.
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index)

Domestic cat

Domestic cat low-pass v1.0
  • Reference genome: felCat9
  • Imputation reference panel: We used 78 WGS samples from Felis catus breeds from the 99 Lives project and processed these data using GATK4 into a reference panel of 49M SNPs and small indels.
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index)
Domestic cat low-pass v2.0
  • Reference genome: same as Domestic cat low-pass v1.0
  • Imputation reference panel: We used 185 WGS samples from Felis catus breeds from the 99 Lives project (a partial list can be found here) and processed these data using GATK4 into a reference panel of 55M SNPs and small indels.
  • Deliverables: original FASTQ, aligned BAM (and index), imputed VCF (and index)