Storage¶
Storage of data on Gencove Explorer relies on two main mechanisms: 1) local storage and 2) cloud storage. Each Gencove Explorer instance contains its own local storage, as you would expect with any virtual machine. However, this local disk space is limited and is intended to serve as a transient or intermediary store for your data.
Larger, persistent data is intended to be stored in your private Gencove Explorer cloud storage space. To that end, the Explorer SDK provides several mechanisms to make it easy to manage your data in the cloud. Please read on for further information on local and cloud storage.
Local Instance Storage¶
The directory /home/explorer
is your personal local storage area. Any programs, scripts, or data
files you work with in Jupyter Lab can be saved here. To ensure optimal system performance, please
keep track of
your storage usage and manage your files appropriately, offloading any large data files to cloud
storage as necessary (described in the following section).
By default, each Explorer instance is allocated 200GB of local disk space.
If you run df -h in the Explorer terminal, you will see output similar to the following:
Filesystem Size Used Avail Use% Mounted on
overlay 196G 5.0G 181G 3% /
tmpfs 64M 0 64M 0% /dev
/dev/nvme1n1 196G 5.0G 181G 3% /home/explorer
shm 64M 0 64M 0% /dev/shm
/dev/nvme0n1p1 50G 2.6G 48G 6% /opt/ecs/metadata/111
tmpfs 7.7G 0 7.7G 0% /proc/acpi
tmpfs 7.7G 0 7.7G 0% /sys/firmware
While using the Gencove Explorer system, it is important to regularly take note of how much local storage space you have used. This can easily be retrieved by running df -h /home/explorer in the terminal. If you are nearing your storage limit of 200GB, consider offloading larger data files to cloud storage, as described in the following section.
Cloud Storage¶
Your Gencove Explorer instance comes with private cloud storage which can only be accessed by users in the Gencove Organization you belong to.
There are several mechanisms for uploading data to your cloud storage area. Let's explore these mechanisms through the following examples.
1. Uploading from local to cloud¶
First, let's create a blank file on our local filesystem through the terminal.
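For example (the filename matches the one used below; the exact path is illustrative):
touch /home/explorer/example_file.txt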
Next, we can open a new notebook and run the following Python code to import the necessary
Explorer SDK File
object.
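# Import the Explorer SDK File object
from gencove_explorer.models import File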
Now let's create an SDK representation of example_file.txt:
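A minimal sketch (the variable name and local path are illustrative and should point at the file created above):
# Create a File object referencing the local file and giving it a unique name
example_file = File(name="example_file", path_local="/home/explorer/example_file.txt")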
The File
object is the primary interface for working with files through the Explorer SDK. For
more information, please see the File section. Here, we are setting two
parameters:
- name: This represents a unique key for the file that will allow retrieving it later. Ensure that the name is descriptive so it can be unambiguously referenced in the future.
- path_local: This represents the path to the file on your local filesystem. In this case, we are referencing the empty file we created via the terminal.
Now, we can easily upload the file to cloud storage. By calling .upload()
on our object, the file
will be automatically uploaded to your private cloud storage space.
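Continuing the sketch above (the variable name is illustrative):
# Upload the local file to your private cloud storage space
example_file.upload()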
To retrieve the file at any time in the future, you can reference the name
supplied earlier.
For this demonstration, you can open a new notebook and use the following code to download the
previously uploaded file.
from gencove_explorer.models import File
# Create new File object, only specifying the `name` parameter this time
example_file_from_cloud = File(name="example_file")
# Use .as_local() to download the file to the specified local path
example_file_from_cloud.as_local(path_local="/home/explorer/example_file_download.txt")
2. Uploading from an analysis job to cloud¶
When executing jobs via your provisioned Explorer analysis cluster, any outputs generated can be
automatically uploaded to your cloud storage. These outputs are then available through the
Analysis
object used to initially execute the job.
- For information on how to upload data from an analysis job, see the Analysis section
- For information on how to retrieve data from past analyses, see the Job Manager section
Listing S3 prefixes¶
Listing User prefix¶
To get the S3 prefix of your user space, you can use the s3_path_user()
method:
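A minimal sketch, assuming the helper is exposed on the File object (consult the File section for the exact entry point):
from gencove_explorer.models import File
# Print the S3 prefix of your personal user space (assumed to be available on File)
print(File(name="example_file").s3_path_user())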
Listing organization shared prefix¶
To get the S3 prefix of your organization shared space, you can use the s3_path_shared_org()
method:
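Similarly, a hedged sketch for the organization-wide prefix (same assumption about where the helper lives):
# Print the S3 prefix of your organization's shared space (assumed to be available on File)
print(File(name="example_file").s3_path_shared_org())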