Storage

Storage of data on Gencove Explorer relies on two main mechanisms: 1) local storage and 2) cloud storage. Each Gencove Explorer instance contains its own local storage, as you would expect with any virtual machine. However, this local disk space is limited and is intended as a transient or intermediary store for your data.

Larger, persistent data is intended to be stored in your private Gencove Explorer cloud storage space. To that end, the Explorer SDK provides several mechanisms that make it easy to manage your data in the cloud. Read on for further information on local and cloud storage.

Local Instance Storage

The directory /home/explorer is your personal local storage area. Any programs, scripts, or data files you work with in JupyterLab can be saved here. To ensure optimal system performance, keep track of your storage usage and manage your files appropriately, offloading large data files to cloud storage as necessary (described in the following section).

By default, each Explorer instance is allocated 200GB of local disk space.

If you run the command df -h in the Explorer terminal, you will see output similar to the following:

Filesystem      Size  Used Avail Use% Mounted on
overlay         196G  5.0G  181G   3% /
tmpfs            64M     0   64M   0% /dev
/dev/nvme1n1    196G  5.0G  181G   3% /home/explorer
shm              64M     0   64M   0% /dev/shm
/dev/nvme0n1p1   50G  2.6G   48G   6% /opt/ecs/metadata/111
tmpfs           7.7G     0  7.7G   0% /proc/acpi
tmpfs           7.7G     0  7.7G   0% /sys/firmware

While using the Gencove Explorer system, it is important to regularly take note of how much local storage space you have used. This can easily be retrieved by running df -h /home/explorer in the terminal. If you are nearing your storage limit of 200GB, consider offloading larger data files to cloud storage, as described in the following section.
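
You can also check usage from a notebook rather than the terminal. The snippet below uses only the Python standard library (a convenience sketch, not part of the Explorer SDK):

import shutil

# Inspect local disk usage for the Explorer home directory
usage = shutil.disk_usage("/home/explorer")
gib = 1024 ** 3
print(f"Used {usage.used / gib:.1f} GiB of {usage.total / gib:.1f} GiB "
      f"({usage.free / gib:.1f} GiB free)")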

Cloud Storage

Your Gencove Explorer instance comes with private cloud storage which can only be accessed by users in the Gencove Organization you belong to.

There are several mechanisms for uploading data to your cloud storage area. Let's explore these mechanisms through the following examples.

1. Uploading from local to cloud

First, let's create a blank file on our local filesystem through the terminal.

touch /home/explorer/example_file.txt

Next, we can open a new notebook and run the following Python code to import the necessary Explorer SDK File object.

from gencove_explorer.models import File

Now let's create an SDK representation of example_file.txt:

example_file = File(name="example_file", path_local="/home/explorer/example_file.txt")

The File object is the primary interface for working with files through the Explorer SDK. For more information, please see the File section. Here, we are setting two parameters:

  1. name: This represents a unique key for the file that will allow retrieving it later. Ensure that the name is descriptive so it can be unambiguously referenced in the future.
  2. path_local: This represents the path to the file on your local filesystem. In this case, we are referencing the empty file we created via the terminal.

Now, we can easily upload the file to cloud storage. By calling .upload() on our object, the file will be automatically uploaded to your private cloud storage space.

example_file.upload()

To retrieve the file at any time in the future, you can reference the name supplied earlier. For this demonstration, you can open a new notebook and use the following code to download the previously uploaded file.

from gencove_explorer.models import File

# Create new File object, only specifying the `name` parameter this time
example_file_from_cloud = File(name="example_file")

# Use .as_local() to download the file to the specified local path
example_file_from_cloud.as_local(path_local="/home/explorer/example_file_download.txt")

2. Uploading from an analysis job to cloud

When executing jobs via your provisioned Explorer analysis cluster, any outputs generated can be automatically uploaded to your cloud storage. These outputs are then available through the Analysis object used to initially execute the job.
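
The details of defining and running jobs are covered in the Analysis section. As a minimal sketch only (the Analysis/AnalysisContext names, their import path, and the assumption that the File upload mechanism from the previous example also works inside a job are illustrative here, not confirmed API), uploading an output from within a job might look like this:

# Illustrative sketch only -- see the Analysis section for the supported API.
from gencove_explorer.analysis import Analysis, AnalysisContext  # assumed import path
from gencove_explorer.models import File

def work(ac: AnalysisContext):
    # Write an output file to the job's local disk
    with open("job_output.txt", "w") as f:
        f.write("result data")
    # Assumption: the same File upload pattern shown earlier applies inside jobs
    File(name="job_output", path_local="job_output.txt").upload()

an = Analysis(function=work)
an.run()  # executes the work function on the analysis cluster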

Listing S3 prefixes

Listing user prefix

To get the S3 prefix of your user space, you can use the s3_path_user() method:

import gencove_explorer

print(gencove_explorer.s3_path_user())

Listing organization shared prefix

To get the S3 prefix of your organization shared space, you can use the s3_path_shared_org() method:

import gencove_explorer

print(gencove_explorer.s3_path_shared_org())
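
Both methods return an S3 prefix for your storage areas. As a hedged sketch (assuming the returned value is a full s3:// URI, that boto3 is installed, and that your instance's credentials grant read access to the bucket), you could list the objects stored under your user prefix like this:

import boto3
import gencove_explorer
from urllib.parse import urlparse

# Split the s3:// URI returned by the SDK into bucket and key prefix
uri = urlparse(gencove_explorer.s3_path_user())
bucket, prefix = uri.netloc, uri.path.lstrip("/")

# List up to ten objects stored under the user prefix
s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=10)
for obj in response.get("Contents", []):
    print(obj["Key"])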