
S3 Imports

S3 Integration Guide

You must grant Gencove permission to access your AWS account so that it can list and read objects from your S3 buckets. Gencove uses a cross-account IAM role with an external ID enabled.

You can perform this integration only once per Gencove user. If you need to use multiple AWS accounts, you can create additional Gencove users and set up the S3 integration for each one.

Step 1: Configure your Gencove account

  1. Log in to your Gencove account and go to the fastqs page.
  2. Click the S3 tab.
  3. Click Connect to Amazon S3 and follow the instructions.
    1. Copy the AWS Account ID; you will need it in the next step.
    2. Generate a new External ID and copy it; you will also need it in the next step.

Connect to Amazon S3

Step 2: Create a cross-account role and an access policy on AWS

  1. In the AWS Console, go to the IAM service.
  2. Click the Roles tab in the sidebar.
  3. Click Create role.

    • In Select type of trusted entity, click the Another AWS account tile.
    • In the Account ID field, enter the Gencove account ID from Step 1.
    • Paste the External ID generated in Step 1 into the corresponding field.

      Create AWS Role

  4. Click Next to set permissions.

    • Create a custom policy using the template below. Click the JSON tab, paste the template, and replace "${BucketName}" with the name of the bucket you want to grant access to.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "s3:ListBucket",
                    "s3:GetBucketLocation"
                ],
                "Effect": "Allow",
                "Resource": "arn:aws:s3:::${BucketName}"
            },
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::${BucketName}/*"
            }
        ]
    }
    
  5. Click Next to set the role name.

  6. Click Create Role. It takes some time for AWS to propagate changes, so the role may not be usable for a couple of minutes; we recommend waiting at least 5 minutes before continuing.
  7. Go to the Roles list, select the role you just created, and copy the role ARN.

    Role detail view

    Copy Role ARN

  8. Go back to your Gencove fastqs S3 configuration page, paste the role ARN in the corresponding field, and click "Connect". (If you prefer to script the role creation instead of using the console, see the AWS CLI sketch below.)

Connect to Amazon S3
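
If you prefer to script Step 2 instead of clicking through the console, the commands below are a minimal sketch using the AWS CLI. The role name gencove-s3-access, the policy name gencove-s3-read, the file names, and the placeholders GENCOVE_ACCOUNT_ID and EXTERNAL_ID are assumptions for illustration; substitute the values you copied in Step 1 and use the bucket policy from step 4 above.

# Trust policy: allows the Gencove account to assume this role,
# but only when it presents the external ID from Step 1.
$ cat > trust-policy.json <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": { "AWS": "arn:aws:iam::GENCOVE_ACCOUNT_ID:root" },
            "Action": "sts:AssumeRole",
            "Condition": { "StringEquals": { "sts:ExternalId": "EXTERNAL_ID" } }
        }
    ]
}
EOF

# Create the cross-account role with the trust policy above.
$ aws iam create-role \
    --role-name gencove-s3-access \
    --assume-role-policy-document file://trust-policy.json

# Attach the bucket access policy from step 4, saved as bucket-policy.json
# with "${BucketName}" replaced by your bucket name.
$ aws iam put-role-policy \
    --role-name gencove-s3-access \
    --policy-name gencove-s3-read \
    --policy-document file://bucket-policy.json

# Print the role ARN to paste into the Gencove S3 configuration page.
$ aws iam get-role --role-name gencove-s3-access --query Role.Arn --output text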

Step 3: Import samples from S3

  1. Log in to your Gencove account and go to the fastqs page.
  2. Click the S3 tab.
  3. Click "Import from S3" and paste the S3 uri that you want to import, then click "Import".

    Import from S3

  4. Choose a project to assign the imported samples to.

    Assign to Gencove Project

  5. Finally, you will see an overview where you can assign metadata to the samples. When you're done, click "Run analysis".

    Run Analysis

  6. You will be redirected to the project page, where you will find the new samples in the importing state.

    Project Samples

  7. After a sample is imported, it is sent to analysis.

Sample details view

Automatic Imports

To enable automatic imports, you must have an active S3 Integration and the Gencove CLI installed.
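
If you do not have the CLI yet, it is typically installed from PyPI (assuming a working Python environment is available):

$ pip install gencove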

How automatic imports work

For automatic imports to work, you need to set up events in your S3 buckets: ObjectCreated events must be sent to an SNS Topic provided by Gencove. You then upload files to a specific folder on S3, and when you're finished you create an empty file called done inside that folder. Gencove systems will receive a notification that you're done uploading files and will automatically import the entire folder into a given Gencove Project. A given folder is only processed once; you can't reuse it after you have uploaded a done file for the first time.

There is one main difference between how BaseSpace and S3 automatic imports work. For BaseSpace, when an autoimport job is created, Gencove systems scan for projects created in the last day. For S3, no initial scan is done; after an S3 autoimport is created, Gencove systems only wait for events indicating that done files have been uploaded.

Step 1: Create import Job

Using the CLI, execute the following command:

$ gencove s3 autoimports create <gencove-project-id> <s3-uri>

Here, <gencove-project-id> is the UUID of the Gencove project you want the samples imported into, and <s3-uri> is the path where the fastq files are going to be uploaded, for example s3://bucket-name/human/project-A/.

That command will give you the autoimport id and the Topic ARN.
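
For example, using the bucket path from this guide and a placeholder project UUID (substitute your own project id):

$ gencove s3 autoimports create 11111111-2222-3333-4444-555555555555 s3://bucket-name/human/project-A/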

Step 2: Setup S3 Events

For security reasons, we only process events that are triggered by the AWS account ID used in Step 2 of the S3 Integration Guide.

  1. Go to your S3 AWS Console.
  2. Select the desired bucket.
  3. Click on the "Properties" tab.
  4. Scroll down to "Event notifications" and click "Create event notification".
  5. Type an "Event name" like Automatic Import to Gencove Project A, a "Prefix" like human/project-A/, and a "Suffix" of done.

    Create event notification

  6. Check the box to react to All object create events.

    All object create events

  7. Finally, paste the Topic ARN that you got from Step 1 and click Save changes. (If you prefer to configure this from the command line, see the AWS CLI sketch below.)

    Configure SNS Topic
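
The same notification can also be configured with the AWS CLI. The sketch below assumes the file name notification.json, the bucket bucket-name, the prefix human/project-A/, and a TOPIC_ARN placeholder for the Topic ARN returned in Step 1. Note that put-bucket-notification-configuration replaces the bucket's entire notification configuration, so include any notifications you already rely on.

$ cat > notification.json <<'EOF'
{
    "TopicConfigurations": [
        {
            "TopicArn": "TOPIC_ARN",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {
                    "FilterRules": [
                        { "Name": "prefix", "Value": "human/project-A/" },
                        { "Name": "suffix", "Value": "done" }
                    ]
                }
            }
        }
    ]
}
EOF

# Apply the configuration to the bucket (this overwrites existing notifications).
$ aws s3api put-bucket-notification-configuration \
    --bucket bucket-name \
    --notification-configuration file://notification.json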

Step 3: Start uploading files to S3

You're done with the configuration. You can start uploading fastq files to S3; when you want a folder to be imported, upload or create an empty file called done inside that folder. Then go to your Gencove Project detail page and you'll see that your samples are being imported.

Let's see an example:

  • Configure an autoimport job for the S3 URI s3://bucket-name/human/project-A/
  • Create a new folder for an assay, e.g. s3://bucket-name/human/project-A/assay-1/
  • Upload fastq files to the new folder
    • s3://bucket-name/human/project-A/assay-1/SAMPLE1_R1.fastq.gz
    • s3://bucket-name/human/project-A/assay-1/SAMPLE1_R2.fastq.gz
    • ...
  • When you finish uploading your samples, create an empty file called done
    • s3://bucket-name/human/project-A/assay-1/done
  • All the samples uploaded to s3://bucket-name/human/project-A/assay-1/ will be imported to the Gencove project and analyzed
  • If you want to import more files, you'll have to use a different folder, because a given path is only processed once (see the example commands below)
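
A minimal sketch of the same flow with the AWS CLI, assuming the fastq files are in your current directory and using the illustrative file names above:

# Upload the fastq files for this assay.
$ aws s3 cp SAMPLE1_R1.fastq.gz s3://bucket-name/human/project-A/assay-1/
$ aws s3 cp SAMPLE1_R2.fastq.gz s3://bucket-name/human/project-A/assay-1/

# When everything is uploaded, create the empty done marker to trigger the import.
$ touch done
$ aws s3 cp done s3://bucket-name/human/project-A/assay-1/done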

Happy analysis!
