Storage and Files¶
Storage of data on Gencove Explorer relies on two main mechanisms:
- Local storage
- Cloud storage, aka EOS (Explorer Object Storage)
Each Gencove Explorer instance contains its own local storage as you would expect with any virtual machine. However, this HDD storage space is limited, and is intended to be a transient or intermediary store for your data.
Larger persistent data should be stored on EOS cloud storage, which is private to your organization. To that end, the Explorer SDK and CLI provide several mechanisms for managing your data in the cloud.
Local Instance Storage¶
/home/explorer is your personal local storage area. Any programs, scripts, or data files you work with in Jupyter Lab can be saved here. To ensure optimal system performance, please keep track of your storage usage and manage your files appropriately, offloading any large data files to cloud storage as necessary (described in the following section).
By default, each Explorer instance is allocated 200 GB of local disk space. Running the command df -h in the Explorer terminal reports disk usage for the instance.
While using the Gencove Explorer system, it is important to regularly check how much local storage space you have used. This can easily be retrieved by running df -h /home/explorer in the terminal. If you are nearing your storage limit of 200 GB, consider offloading larger data files to cloud storage, described in the following section.
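If you prefer to check usage from a notebook rather than the terminal, the same information is available through Python's standard library. The sketch below is not part of the Explorer SDK; the helper name disk_usage_report and the 80% warning threshold are assumptions chosen for illustration.

```python
import os
import shutil


def disk_usage_report(path="/home/explorer", warn_fraction=0.8):
    """Return (used_gb, total_gb, warn) for the filesystem containing `path`.

    `warn_fraction` is an illustrative threshold, not an Explorer setting.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    gb = 1024 ** 3
    used_fraction = usage.used / usage.total if usage.total else 0.0
    return usage.used / gb, usage.total / gb, used_fraction >= warn_fraction


if __name__ == "__main__":
    # Fall back to the current directory when run outside an Explorer instance.
    path = "/home/explorer" if os.path.isdir("/home/explorer") else "."
    used, total, warn = disk_usage_report(path)
    print(f"{used:.1f} GB used of {total:.1f} GB")
    if warn:
        print("Consider offloading large files to cloud storage (EOS).")
```

Running this periodically, for example at the top of a long notebook, gives an early signal before the disk fills up.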
Cloud Storage, EOS¶
Data files commonly used in genomics applications are often very large (tens or even hundreds of GB), and therefore can be unwieldy to work with.
Your Gencove Explorer instance and Analysis jobs are all configured with access to EOS (Explorer Object Storage) - private cloud object storage which can only be accessed by users in the Gencove Organization you belong to.
EOS is essentially a lightweight wrapper around AWS S3 object storage that aims to simplify access to the appropriate user, organization, and Gencove-wide locations on S3.
EOS URIs always start with e://. There are three top-level namespaces for EOS:
- e://users/<user-id>/ (with e://users/me/ as shorthand for the current user): read/write for the current user, read-only for the rest of your organization
- e://org/: read/write for the entire organization
- The Gencove-wide namespace: read-only for the entire organization
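To make the URI layout concrete, the following sketch splits an EOS URI into its top-level namespace and the remaining key. It is plain string handling and does not use the Explorer SDK; the names EosLocation and parse_eos_uri are hypothetical, and the example paths are made up.

```python
from typing import NamedTuple

EOS_PREFIX = "e://"


class EosLocation(NamedTuple):
    namespace: str  # first path component, e.g. "users" or "org"
    key: str        # everything after the namespace


def parse_eos_uri(uri: str) -> EosLocation:
    """Split an EOS URI into (namespace, key); raise ValueError otherwise."""
    if not uri.startswith(EOS_PREFIX):
        raise ValueError(f"not an EOS URI: {uri!r}")
    namespace, _, key = uri[len(EOS_PREFIX):].partition("/")
    if not namespace:
        raise ValueError(f"missing namespace in EOS URI: {uri!r}")
    return EosLocation(namespace, key)


if __name__ == "__main__":
    print(parse_eos_uri("e://users/me/project-1/file.txt"))
    print(parse_eos_uri("e://org/shared/reference.fa"))
```

A helper like this can be handy for validating user input before handing a URI to the CLI or SDK.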
EOS can be accessed using the Gencove Explorer CLI and Explorer SDK.
EOS via CLI¶
For ease of use, the gencove explorer data command is also available via the d (for "data") alias. The equivalent d command to the example above would be:
Listing files can be accomplished with the
Files can be uploaded and downloaded using the
Files can be synced in bulk using the
Remote files can be deleted using the
EOS via SDK File object¶
The Explorer SDK comes with an inbuilt abstraction of a
File object, which represents a file with a local and remote location. It provides a way to specify and transfer (download and upload) files between local and remote storage.
Neither the local nor the remote location needs to exist when the object is created, which allows working with files in a "lazy" manner: the File object can be created and passed around, but the actual upload or download of a potentially very large file does not happen until the corresponding method is explicitly called.
Neither the local nor the remote location needs to be specified for the File object. If left unspecified, the following defaults are assumed:
- Local: a random filename in the
- Remote: a random key in a temporary EOS location
In this section, we describe the
File object at a high level, and describe the various methods available to access and transfer (download and upload) remote files.
💡 Note that “local” in this section refers to your Explorer Instance storage or Explorer Analysis job storage.
In cases where you would like to retrieve a file that already exists on EOS you can use the
path_e parameter to
File. The related
name parameter to
File is shorthand for
e://users/me/<name>. For example, both of the
File objects below point to the same remote file:
Once a File object that refers to a remote file is created, you can make a local copy via the download() method.
For files that exist at an S3 location, you can use the path_s3 parameter to File.
name, path_e, and path_s3 are directly related when referring to EOS locations as follows:
- The name parameter is shorthand for e://users/me/<name>
- EOS paths (e://...) directly correspond to a location in your organization's S3 bucket
results in the following values for
path_s3 in the resulting
For files that exist on the public Internet, you can provide the URL via the
url parameter to
Additionally, you can specify the local destination path and file name for downloading the file:
# Specify local path when creating the File object
f5 = File(name="project-1/file.txt", path_local="~/file.txt")
# Specify local path when calling .download()
f6 = File(name="project-1/file.txt")
# Print path to local copy of files
💡 All parameters of the File object must be provided by keyword; otherwise, a TypeError is raised with the message File.__init__() takes 1 positional argument but 2 were given.
💡 Note that downloading files from FTP links via
File.download() is not currently supported.
Just as files can be downloaded, they can be uploaded from local storage to EOS (or S3 more generally):
As with downloads, name, path_e, and path_s3 are all valid upload destinations.
File objects can be executed as scripts by using the
Command-line parameters can be passed to
execute() as follows:
In case it is preferable to process the output instead of printing it to the terminal,
execute() can be configured to provide the output via its return value by setting
/bin/sh as the default interpreter, but any interpreter can be specified by setting the appropriate shebang line in the file. For example, to use Python as the interpreter add the following line to the top of the Python script:
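As an illustration, a commonly used form of the Python shebang is the env-based one shown in the first line of the sketch below. The exact interpreter path on your instance is an assumption here, and the script itself (including the helper describe_args) is a hypothetical example that simply reports the command-line parameters it receives.

```python
#!/usr/bin/env python3
"""Hypothetical script that echoes the command-line parameters it receives."""
import sys


def describe_args(argv):
    # argv[0] is the script path; the remaining items are the parameters.
    args = argv[1:]
    return f"received {len(args)} argument(s): {args}"


if __name__ == "__main__":
    print(describe_args(sys.argv))
```

With the shebang in place, the operating system launches the script with the Python interpreter found on the PATH instead of the default shell.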
Finally, a real-world example for executing a shell script (
~/script.sh) on an array of inputs utilizing the features described in this section:
It is also possible to use the
File object for creating temporary local and/or remote files.
💡 Temporary files have an automatically generated filename that is guaranteed to be unique. The user should not expect that these files are permanently stored. We recommend using them for intermediate analysis results, while named files should be used for inputs and outputs.
To create a temporary remote file, exclude the
To create a temporary local file, exclude the
To create a file that is temporary both locally and remotely, exclude all parameters:
Generating URLs for remote
It is possible to generate a temporary URL for a file in EOS via the
💡 The URLs generated with this method:
- Provide access to the file over the public Internet by anyone who has the URL
- Expire after 48 hours
Below is an end-to-end example where we copy a local file to EOS, then obtain a URL for it.
Sharing files within your organization¶
Files can be easily shared within your organization by sharing the EOS URI (
e://...) of objects.
💡 Files created in your user namespace (
e://users/me/) need to be made "shareable" by replacing the
me shorthand with your user id. To simplify this process, the Explorer SDK provides a convenient
File object attribute named
EOS URIs of files created in the organization namespace (
e://org/) can be shared within your organization without modification.
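The SDK attribute is the supported way to obtain a shareable URI. For intuition only, the substitution it performs can be sketched as plain string handling; the function name make_shareable and the user id value below are hypothetical placeholders, and the real user id format may differ.

```python
ME_PREFIX = "e://users/me/"


def make_shareable(uri: str, user_id: str) -> str:
    """Replace the `me` shorthand with an explicit user id.

    URIs outside the current user's namespace are returned unchanged.
    """
    if uri.startswith(ME_PREFIX):
        return f"e://users/{user_id}/" + uri[len(ME_PREFIX):]
    return uri


if __name__ == "__main__":
    # "1234" is a made-up user id used purely for illustration.
    print(make_shareable("e://users/me/project-1/file.txt", "1234"))
    print(make_shareable("e://org/shared/file.txt", "1234"))
```

The first call rewrites the URI so that colleagues can resolve it, while the organization URI in the second call passes through untouched.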