Uploading using the CLI
Syncs local directories to directories in your Gencove upload area. Recursively copies new and updated files from the source directory to the destination.
Alternatively, can be used to import FASTQ files from URLs using a map file.
Only creates folders in the destination if they contain one or more files.
$ gencove upload my-fastq-files/
This example command recursively copies all files in the my-fastq-files/ directory on your host system to a directory with an automatically generated name in the Gencove upload area.
If there are multiple input FASTQ files per sample, or the file names do not follow the conventions described above, a manifest describing the relationship between the sample identifiers and the input FASTQ files must be provided in a CSV file in the format described above.
If more control is needed over the upload destination, a destination path prefixed with gncv:// may be provided. This pattern is commonly used to separate upload batches when continuously uploading data to your Gencove account, and it makes files easy to filter in the Gencove Dashboard. A common directory structure for batching uploads is:
gncv://<project-name>/<batch-name>/
When specifying a destination path, it is recommended to use at least one level of directories to separate batches of uploaded data. In other words, avoid placing all files directly in the root directory (gncv://).
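The batching layout above can be generated programmatically when uploads are scripted. The following is a minimal Python sketch, not part of the Gencove CLI: the helper names `build_destination` and `has_batch_directory` are hypothetical, and the sketch does not check which characters Gencove accepts in directory names.

```python
def build_destination(project_name, batch_name):
    """Return a path of the form gncv://<project-name>/<batch-name>/."""
    parts = [project_name.strip("/"), batch_name.strip("/")]
    if not all(parts):
        raise ValueError("both a project name and a batch name are required")
    return "gncv://" + "/".join(parts) + "/"


def has_batch_directory(destination):
    """True if the path has at least one directory below the gncv:// root."""
    prefix = "gncv://"
    if not destination.startswith(prefix):
        return False
    # Any non-empty path segment after the prefix counts as a directory level.
    return any(segment for segment in destination[len(prefix):].split("/"))


print(build_destination("my-fastq", "batch-1"))
```

A script could call `has_batch_directory` before uploading to catch a destination that points at the root directory.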
Details of upload behavior:
- If a file in the local directory already exists in the destination, it will not be overwritten.
- If a file exists in the destination but not in the local directory, it will not be deleted.
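The two rules above can be illustrated with a small Python model. This is only a sketch of the stated behavior, not the CLI's implementation: it compares files by relative path alone (an assumption), it does not model how the CLI detects updated files, and the function name `plan_upload` and the file names are hypothetical.

```python
def plan_upload(local_files, destination_files):
    """Model the two rules using relative paths only.

    Returns (to_upload, destination_after).
    """
    local, remote = set(local_files), set(destination_files)
    # Files already present in the destination are skipped, never overwritten.
    to_upload = sorted(local - remote)
    # Files that exist only in the destination are kept, never deleted.
    destination_after = sorted(remote | local)
    return to_upload, destination_after


local = ["sample1_R1.fastq.gz", "sample1_R2.fastq.gz"]
remote = ["sample1_R1.fastq.gz", "old_sample_R1.fastq.gz"]
to_upload, destination_after = plan_upload(local, remote)
print(to_upload)
print(destination_after)
```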
Importing files from URLs with a map file
Using the map file described above, it is also possible to import FASTQ files from URLs.
When constructing the map CSV file, include the URL for each file under the path column.
Here is an example of the contents of a CSV map file that uses URLs:
client_id,r_notation,path
sample1,r1,https://example-bucket.storage.googleapis.com/sample_R1.fastq.gz
sample1,r2,https://example-bucket.storage.googleapis.com/sample_R2.fastq.gz
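When there are many samples, a map file like the one above may be easier to generate than to write by hand. The following Python sketch renders the same three columns with the standard `csv` module. The function name `render_map_csv` is hypothetical, and the rows simply reuse the example URLs shown above.

```python
import csv
import io

# (client_id, r_notation, path) tuples; the values mirror the example above.
rows = [
    ("sample1", "r1", "https://example-bucket.storage.googleapis.com/sample_R1.fastq.gz"),
    ("sample1", "r2", "https://example-bucket.storage.googleapis.com/sample_R2.fastq.gz"),
]


def render_map_csv(rows):
    """Return the map file contents as a string, header row first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["client_id", "r_notation", "path"])
    writer.writerows(rows)
    return buffer.getvalue()


map_csv = render_map_csv(rows)
print(map_csv, end="")
```

The resulting string can then be written to a file on disk before running the upload.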
Note
Only the following URL domains are supported:
- amazonaws.com (AWS)
- blob.core.windows.net (Azure)
- googleapis.com (Google Cloud)
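A map file can be checked against this list before uploading. The Python sketch below assumes that a URL is supported when its hostname equals one of the listed domains or is a subdomain of one. How Gencove actually matches domains is not stated here, so treat the suffix rule and the name `is_supported_url` as assumptions.

```python
from urllib.parse import urlparse

# Domains taken from the list above.
SUPPORTED_DOMAINS = ("amazonaws.com", "blob.core.windows.net", "googleapis.com")


def is_supported_url(url):
    """True if the URL's hostname is a listed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in SUPPORTED_DOMAINS
    )


print(is_supported_url("https://example-bucket.storage.googleapis.com/sample_R1.fastq.gz"))
print(is_supported_url("https://example.com/sample_R1.fastq.gz"))
```

Matching on `"." + domain` rather than on the bare suffix avoids accepting look-alike hosts such as `notamazonaws.com`.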
Once the map file has been built, the upload command can be used to import the files.
Warning
When generating URLs from the above cloud providers, we suggest setting a generous expiration time, so that the URLs are still valid when the files reach a project and the corresponding pipeline needs to retrieve them.
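For AWS specifically, the expiration of a presigned URL can be read back from the URL itself before it is placed in a map file. The sketch below assumes a Signature Version 4 presigned URL, which carries `X-Amz-Date` and `X-Amz-Expires` query parameters. Other providers and signing schemes encode expiry differently and are not handled. The function name `sigv4_expiry` and the example URL are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def sigv4_expiry(url):
    """Return the UTC expiry of an AWS SigV4 presigned URL, or None if absent."""
    query = parse_qs(urlparse(url).query)
    try:
        # X-Amz-Date is the signing time; X-Amz-Expires is the lifetime in seconds.
        signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    except (KeyError, ValueError):
        return None
    return signed_at.replace(tzinfo=timezone.utc) + lifetime


url = (
    "https://example-bucket.s3.amazonaws.com/sample_R1.fastq.gz"
    "?X-Amz-Date=20240101T000000Z&X-Amz-Expires=604800"
)
print(sigv4_expiry(url))
```

Comparing the returned time with the expected start of analysis gives a quick check that the URL will outlive the upload.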
Automatically starting analysis
To automatically assign uploads to a project and run analysis, provide the --run-project-id flag and the destination project ID to the Gencove CLI.
$ gencove upload my-fastq-files/ gncv://my-fastq/batch-1/ --run-project-id b1edbb20-ee77-4be0-9944-e8e3a593cc83
When this feature is used, the Gencove CLI checks that the contents of SOURCE and DESTINATION are identical in order to avoid analysis of unwanted samples. This will always be the case if DESTINATION is omitted, i.e., autogenerated by the Gencove CLI.
It is also important to ensure that uploaded files follow the naming conventions outlined above to avoid sample identifier detection issues.
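When uploads are launched from a script, the command line shown above can be assembled and sanity-checked before it is run. The Python sketch below only builds the argument list and does not invoke the CLI. It assumes that project IDs are UUIDs, as the example ID suggests, and the helper name `build_upload_command` is hypothetical.

```python
import shlex
import uuid


def build_upload_command(source, destination=None, run_project_id=None):
    """Return the argument list for a `gencove upload` invocation."""
    command = ["gencove", "upload", source]
    if destination is not None:
        command.append(destination)
    if run_project_id is not None:
        # Assumption: project IDs are UUIDs; uuid.UUID raises ValueError otherwise.
        command += ["--run-project-id", str(uuid.UUID(run_project_id))]
    return command


command = build_upload_command(
    "my-fastq-files/",
    "gncv://my-fastq/batch-1/",
    "b1edbb20-ee77-4be0-9944-e8e3a593cc83",
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where the Gencove CLI is installed and configured.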