My FASTQs

Enabling uploads

To enable FASTQ uploads, log into your account and go to My FASTQs. If you do not already have access, instructions for requesting it will be provided there. You can expect a response from Gencove support within 24 hours.

Once uploads are enabled, you can upload files to the Gencove upload area using the Gencove CLI and assign them to projects using the Gencove Dashboard. Files assigned to a project are processed by the Gencove analysis pipeline, and the results become available via the Gencove API and Dashboard when analysis is complete.

Warning

The Gencove upload area should be considered temporary storage and should not be used as permanent storage space for your files. Once files are assigned to a project, they will be stored according to your data retention agreement with Gencove.

File naming convention

We highly recommend using the standard Illumina naming convention for FASTQ files. If files are named in this manner, Gencove systems will automatically detect:

the sample identifier (and use it as the sample's client_id)
R1/R2 designations of files

A summary of the naming convention is:

SAMPLE ID + _ + ... + _ + (R1 or R2) + _ + ... + .fastq.gz

The table below shows file names that follow this convention, along with the detected sample identifiers and read designations:

File name	Sample ID	Read pair
SAMPLE1_R1.fastq.gz	SAMPLE1	R1
SAMPLE1_R2.fastq.gz	SAMPLE1	R2
SAMPLE2_LANE1_SEQUENCER1_R1.fastq.gz	SAMPLE2	R1
SAMPLE3_R1_L001.fastq.gz	SAMPLE3	R1
SAMPLE4_R1.fq.gz	SAMPLE4	R1
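
As a rough illustration of the convention summarized above, the following Python sketch splits a file name on underscores, takes the first token as the sample identifier, and looks for an R1 or R2 token among the rest. The function name parse_fastq_name and the list of accepted suffixes are assumptions made for this example (the suffixes are taken from the table); Gencove's actual detection logic may differ.

```python
# Assumed suffixes, taken from the example file names in the table above.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def parse_fastq_name(file_name):
    """Return (sample_id, read) for an Illumina-style FASTQ file name.

    Illustrative only: this is not Gencove's implementation.
    """
    for suffix in FASTQ_SUFFIXES:
        if file_name.endswith(suffix):
            stem = file_name[: -len(suffix)]
            break
    else:
        raise ValueError(f"not a gzip-compressed FASTQ name: {file_name}")

    tokens = stem.split("_")
    sample_id = tokens[0]
    # The read designation may appear anywhere after the sample ID.
    reads = [token for token in tokens[1:] if token in ("R1", "R2")]
    if not sample_id or len(reads) != 1:
        raise ValueError(f"cannot detect sample ID and read pair: {file_name}")
    return sample_id, reads[0]
```

Running a check like this locally before uploading can catch file names that would not be detected as intended, for example a sample identifier that itself contains an underscore.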

Custom file names

To bypass the default convention outlined above and explicitly specify sample identifiers and R1/R2 designations for FASTQ files, provide a file ending with .fastq-map.csv as the SOURCE to the gencove upload command. The format of the file is outlined in the example below.

The following validation is performed on the .fastq-map.csv file:

the file header is client_id,r_notation,path
values in the client_id column cannot contain _
values in the r_notation column can only be "r1" or "r2"
each file listed in the path column must exist and be gzip-compressed

Example:

client_id,r_notation,path
<sample_id_1>,<r_notation_1>,<path_to_fastq_file_1>
<sample_id_2>,<r_notation_2>,<path_to_fastq_file_2>
<sample_id_3>,<r_notation_3>,<path_to_fastq_file_3>
...
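
The validation rules listed above can be approximated locally before uploading. The sketch below is a minimal Python version, not the Gencove CLI's own code: the function name validate_fastq_map is hypothetical, paths are opened exactly as written in the file, and gzip compression is checked only by looking at the two-byte gzip magic number at the start of each file.

```python
import csv
import os

EXPECTED_HEADER = ["client_id", "r_notation", "path"]
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def validate_fastq_map(map_path):
    """Return a list of problems found in a .fastq-map.csv file (empty if none)."""
    problems = []
    with open(map_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header != EXPECTED_HEADER:
            return ["header must be " + ",".join(EXPECTED_HEADER)]
        for line_no, row in enumerate(reader, start=2):
            if not row:
                continue  # ignore blank lines
            if len(row) != 3:
                problems.append(f"line {line_no}: expected 3 columns")
                continue
            client_id, r_notation, path = row
            if "_" in client_id:
                problems.append(f"line {line_no}: client_id contains '_'")
            if r_notation not in ("r1", "r2"):
                problems.append(f"line {line_no}: r_notation must be r1 or r2")
            # Assumption: the path is used as written (relative to the
            # current working directory if it is not absolute).
            if not os.path.isfile(path):
                problems.append(f"line {line_no}: file not found: {path}")
                continue
            with open(path, "rb") as fastq:
                if fastq.read(2) != GZIP_MAGIC:
                    problems.append(f"line {line_no}: not gzip-compressed: {path}")
    return problems
```

An empty list means the file passed these local checks; the upload command may still apply checks that this sketch does not cover.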

Grouping files

By default, Gencove systems expect one pair of FASTQ files per sample.

If sequencing reads for a single sample are spread across multiple FASTQ files, they need to be merged into one R1 file and one R2 file. This can be accomplished in several ways:

List multiple files for the same client_id and r_notation in the .fastq-map.csv file (described in the previous section). The Gencove CLI then concatenates the files on the fly during upload, in the order in which the rows are listed in the .fastq-map.csv file; see the example below.
Manually concatenate the files. Since gzip-compressed files can be merged without decompressing, it's simply a matter of concatenating the compressed files.
Provide the --no-lane-splitting flag to bcl2fastq, which avoids splitting reads into multiple FASTQ files upstream, in the demultiplexing phase.
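
For the manual option, a minimal Python sketch of concatenating compressed files is shown here. It copies the raw bytes of each part into one output file, relying on the fact that a sequence of gzip members is itself a valid gzip file. The function name concatenate_gzip_files and the file names in the usage note are illustrative.

```python
import shutil


def concatenate_gzip_files(part_paths, merged_path):
    """Append gzip-compressed parts, in the given order, into one gzip file."""
    with open(merged_path, "wb") as merged:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                # Raw byte copy: nothing is decompressed or recompressed.
                shutil.copyfileobj(part, merged)
```

For example, concatenate_gzip_files(["sample1_part1_r1.fastq.gz", "sample1_part2_r1.fastq.gz"], "sample1_r1.fastq.gz") would produce a single R1 file. The R1 and R2 parts should be concatenated in the same order so that read pairs stay aligned between the two merged files.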

Example:

client_id,r_notation,path
sampleid1,r1,sample1_part1_r1.fastq.gz
sampleid1,r1,sample1_part2_r1.fastq.gz
sampleid1,r1,sample1_part3_r1.fastq.gz
sampleid1,r2,sample1_part1_r2.fastq.gz
sampleid1,r2,sample1_part2_r2.fastq.gz
sampleid1,r2,sample1_part3_r2.fastq.gz
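
A map file like the one above could also be generated with a short script rather than written by hand. The following Python sketch assumes a hypothetical helper, write_fastq_map, that takes a dictionary from client_id to lists of R1 and R2 paths; the list order becomes the row order, which determines the concatenation order during upload.

```python
import csv


def write_fastq_map(map_path, parts_by_sample):
    """Write a .fastq-map.csv file.

    parts_by_sample maps client_id -> {"r1": [paths], "r2": [paths]}.
    Rows are written in list order, so keep R1 and R2 parts in matching order.
    """
    with open(map_path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["client_id", "r_notation", "path"])
        for client_id, reads in parts_by_sample.items():
            for r_notation in ("r1", "r2"):
                for path in reads.get(r_notation, []):
                    writer.writerow([client_id, r_notation, path])
```

The resulting file can then be provided as the SOURCE to the gencove upload command, as described in the previous section.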