My FASTQs
Enabling uploads
To enable FASTQ uploads for your account, log in and go to My FASTQs, where instructions are provided if you do not already have access. You can expect a response from Gencove support within 24 hours.
Once uploads are enabled, users can upload files to the Gencove upload area using the Gencove CLI and assign the files to projects using the Gencove Dashboard. Once files are assigned to a project, they will be processed by the Gencove analysis pipeline. Analysis results will be available via the Gencove API and Dashboard once analysis is complete.
Warning
The Gencove upload area should be considered temporary storage and should not be used as permanent storage space for your files. Once files are assigned to a project, they will be stored according to your data retention agreement with Gencove.
File naming convention
We highly recommend using the standard Illumina naming convention for FASTQ files. If files are named in this manner, Gencove systems will automatically detect:
- the sample identifier (and use it as the sample's `client_id`)
- R1/R2 designations of files
A summary of the naming convention is:
`SAMPLE ID` + `_` + ... + `_` + (`R1` or `R2`) + `_` + ... + `.fastq.gz`
The table below shows example file names that follow this convention, along with the detected sample identifiers and read designations:
File name | Sample ID | Read pair |
---|---|---|
SAMPLE1_R1.fastq.gz | SAMPLE1 | R1 |
SAMPLE1_R2.fastq.gz | SAMPLE1 | R2 |
SAMPLE2_LANE1_SEQUENCER1_R1.fastq.gz | SAMPLE2 | R1 |
SAMPLE3_R1_L001.fastq.gz | SAMPLE3 | R1 |
SAMPLE4_R1.fq.gz | SAMPLE4 | R1 |
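To check file names locally before uploading, the convention above can be approximated with a regular expression. This is only a minimal sketch and not Gencove's actual detection logic: the function name `parse_fastq_name` is hypothetical, and the pattern assumes that the sample ID is everything before the first underscore and that both `.fastq.gz` and `.fq.gz` extensions are accepted, as the table suggests.

```python
import re
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    sample_id: str
    read: str


# Assumptions (not confirmed by Gencove): the sample ID is the text before the
# first underscore, and the read designation is an R1/R2 token that starts
# right after an underscore and is followed by an underscore or the extension.
_PATTERN = re.compile(
    r"^(?P<sample>[^_]+)_(?:.*_)?(?P<read>R[12])(?:_.*)?\.(?:fastq|fq)\.gz$"
)


def parse_fastq_name(file_name: str) -> Optional[ParsedName]:
    """Return the sample ID and read designation, or None if nothing matches."""
    match = _PATTERN.match(file_name)
    if match is None:
        return None
    return ParsedName(match.group("sample"), match.group("read"))


if __name__ == "__main__":
    for name in ("SAMPLE1_R1.fastq.gz", "SAMPLE3_R1_L001.fastq.gz"):
        print(name, parse_fastq_name(name))
```

A name that returns `None` here is a candidate for the explicit mapping file described in the next section.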
Custom file names
To bypass the default convention outlined above and explicitly specify sample identifiers and R1/R2 designations for FASTQ files, a file ending with `.fastq-map.csv` can be provided as the `SOURCE` to the `gencove upload` command. The format of the file is shown in the example below.
The following validation is performed on the `.fastq-map.csv` file:
- the file header is `client_id,r_notation,path`
- values in the `client_id` column cannot contain `_`
- values in the `r_notation` column can only be "r1" or "r2"
- each file listed in the `path` column must:
    - exist
    - be gzip-compressed
Example:
client_id,r_notation,path
<sample_id_1>,<r_notation_1>,<path_to_fastq_file_1>
<sample_id_2>,<r_notation_2>,<path_to_fastq_file_2>
<sample_id_3>,<r_notation_3>,<path_to_fastq_file_3>
...
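The validation rules listed above can be checked locally before running the upload. The following is a minimal sketch, not the Gencove CLI's own implementation: the function name `validate_fastq_map` is hypothetical, `r_notation` is assumed to be case-sensitive, relative paths are assumed to resolve against the current working directory, and gzip compression is assumed to be detectable from the two-byte gzip magic number.

```python
import csv
from pathlib import Path

EXPECTED_HEADER = ["client_id", "r_notation", "path"]
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def validate_fastq_map(map_path):
    """Return a list of problems found in the map file; empty if none."""
    problems = []
    with open(map_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header != EXPECTED_HEADER:
            problems.append("header must be " + ",".join(EXPECTED_HEADER))
            return problems
        for line_no, row in enumerate(reader, start=2):
            if len(row) != 3:
                problems.append(f"line {line_no}: expected 3 columns")
                continue
            client_id, r_notation, path = row
            if "_" in client_id:
                problems.append(f"line {line_no}: client_id contains '_'")
            # Assumption: only lowercase "r1"/"r2" are accepted.
            if r_notation not in ("r1", "r2"):
                problems.append(f"line {line_no}: r_notation must be r1 or r2")
            fastq = Path(path)
            if not fastq.is_file():
                problems.append(f"line {line_no}: {path} does not exist")
                continue
            with open(fastq, "rb") as fastq_handle:
                if fastq_handle.read(2) != GZIP_MAGIC:
                    problems.append(f"line {line_no}: {path} is not gzip-compressed")
    return problems
```

Calling `validate_fastq_map("samples.fastq-map.csv")` (a hypothetical file name) returns an empty list when none of these checks fail.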
Grouping files
By default, Gencove systems expect one pair of FASTQ files per sample.
If sequencing reads for a single sample are spread across multiple FASTQ files, they need to be merged into one R1 file and one R2 file. This can be accomplished in several ways:
- List multiple files for the same `client_id` and `r_notation` in the `.fastq-map.csv` file (outlined in the previous section). The Gencove CLI then concatenates the files on the fly during upload; see the example at the end of this section. The concatenation order follows the order of rows in the `.fastq-map.csv` file.
- Manually concatenate the files. Gzip-compressed files can be merged without decompressing, so it is simply a matter of concatenating the compressed files.
- Provide the `--no-lane-splitting` flag to `bcl2fastq` to avoid splitting reads into multiple FASTQ files upstream, in the demultiplexing phase.
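The manual concatenation option can be sketched in a few lines. This is an illustrative example rather than a Gencove-provided tool: the function name `concatenate_gzip` and the file names in the usage note are hypothetical. It relies on the fact that a gzip file may consist of several compressed members placed back to back, so the bytes can be appended without decompressing.

```python
import shutil


def concatenate_gzip(parts, destination):
    """Append each gzip file in `parts`, in the given order, to `destination`."""
    with open(destination, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Copy raw compressed bytes; no decompression is needed.
                shutil.copyfileobj(src, out)
```

For instance, `concatenate_gzip(["S1_L001_R1.fastq.gz", "S1_L002_R1.fastq.gz"], "S1_R1.fastq.gz")` would produce a single R1 file. The R2 parts should be passed in the same lane order so that read pairs stay aligned between the two merged files.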
Example: