Installing packages and software in Gencove Explorer¶
Conceptually, it is useful to think about the environments you have access to via the Explorer platform in two parts:
- Local environment: the environment you interact with via your JupyterLab instance.
- Cluster environment: the environment to which you submit Analysis work functions.
These are separate environments, although both come with the same pre-installed packages by default.
New packages that you install in one environment do not propagate to the other automatically.
This is an important distinction: installing a Python module in your working environment via JupyterLab does not make it available inside an Analysis work function.
The following sections use an example analysis that requires the command line tool vcftools and the Python module cyvcf2, neither of which comes pre-installed on Explorer instances or on the cluster, and show how to install both on your Explorer instance and for your cluster analysis.
Explorer uses Conda as its primary package manager. It is used to install packages in both the local and cluster environments. The default Conda environment provided by Gencove is explorer, which is available in both the terminal and Jupyter notebooks.
💡 Advanced users may use Conda for managing separate environments for maintaining different versions of Python, R, and operating system tooling.
Updating Gencove packages¶
The gencove explorer update command (shorthand alias: u) can be used to update the Explorer SDK, Library, and Gencove CLI as follows:
$ u sdk # Update Explorer SDK
$ u library # Update Explorer Library
$ u cli # Update Gencove CLI
$ u all # Update all of the above
Installing packages in your Gencove Explorer working environment¶
Your Gencove Explorer instance is a Linux virtual machine running Ubuntu 22.04, and the JupyterLab instance runs on this machine.
In our example, to install the vcftools Conda package from the bioconda channel, use the mamba command as follows:
$ mamba install -y -c bioconda vcftools
💡 We do not recommend using the system package manager (e.g., apt-get) to install packages, since packages installed this way will not be available after the instance is restarted.
💡 We recommend using mamba because it resolves dependencies more effectively, but conda can be used as a drop-in replacement.
Similarly, you can install Python modules using pip. In our example, to install the cyvcf2 module, which also does not come pre-installed on Explorer instances, run:
$ pip install cyvcf2
After this command runs successfully (and after you have restarted the IPython kernel), cyvcf2 will be installed in your local environment, and you will be able to use it in your Python notebooks on your Explorer instance.
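To confirm that an install took effect in the environment you are currently in, a quick availability check can help. The following is a minimal sketch using only the Python standard library; the function name check_environment is hypothetical and is not part of the Explorer SDK.

```python
import importlib.util
import shutil


def check_environment(modules, tools):
    """Report which Python modules and command line tools are available here.

    Hypothetical helper for illustration; not part of the Explorer SDK.
    """
    report = {"modules": {}, "tools": {}}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        report["modules"][name] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the executable is not on PATH
        report["tools"][name] = shutil.which(name) is not None
    return report


# Check the two packages used as the running example in this document
print(check_environment(modules=["cyvcf2"], tools=["vcftools"]))
```

Because the local and cluster environments are separate, the same check run inside a work function may give a different result than it does in your notebook.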
Installing packages for use in Analysis work function¶
The cluster environment in which your submitted jobs run is entirely separate from your Explorer working environment, though it is also based on Ubuntu 22.04. As such, you must explicitly install any software or modules you need when submitting jobs for analysis on the cluster (see documentation in Analysis).
The Analysis object accepts pip_packages, conda_packages, and image parameters to further assist with installing and using custom software.
Anaconda and PyPI packages can be installed either at the beginning of your work function:
from gencove_explorer.analysis import Analysis, AnalysisContext

def work(ac: AnalysisContext):
    # install necessary packages
    ! pip install cyvcf2
    ! mamba install -y -c bioconda vcftools
    # do actual analysis
    ...
or as parameters to the Analysis object:
from gencove_explorer.analysis import Analysis, AnalysisContext

an = Analysis(
    function=work,
    pip_packages=["cyvcf2"],
    conda_packages=["bioconda::vcftools"],
)
an.run()
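The ! prefix used inside the work function above is IPython shell syntax. As a minimal sketch, the same installs can be written in plain Python with subprocess, assuming pip and mamba are on the PATH of the environment where the code runs. The helper names build_install_commands and run_installs are hypothetical and are not part of the Explorer SDK.

```python
import subprocess
import sys


def build_install_commands(pip_packages=(), conda_packages=()):
    """Return argv lists for the installs without running them.

    Hypothetical helper for illustration; not part of the Explorer SDK.
    """
    commands = []
    if pip_packages:
        # Use the current interpreter's pip so packages land in this environment
        commands.append([sys.executable, "-m", "pip", "install", *pip_packages])
    for spec in conda_packages:
        # "channel::package" names the channel that provides the package
        channel, sep, package = spec.partition("::")
        if sep:
            commands.append(["mamba", "install", "-y", "-c", channel, package])
        else:
            commands.append(["mamba", "install", "-y", spec])
    return commands


def run_installs(pip_packages=(), conda_packages=()):
    """Run each install command, raising CalledProcessError on failure."""
    for command in build_install_commands(pip_packages, conda_packages):
        subprocess.run(command, check=True)
```

Calling run_installs(["cyvcf2"], ["bioconda::vcftools"]) at the start of a work function would mirror the two shell commands shown earlier. Passing pip_packages and conda_packages to the Analysis object is the route the documentation itself describes.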
Reset Explorer Environment¶
When the Explorer instance is not working as intended, you can reset it to a clean environment.
A script is provided for this purpose. Run it from a terminal:
The script asks for confirmation, creates a backup, and then deletes the following data under /home/explorer/:
- miniconda3
- .conda*
- .bashrc
- .explorer/.system
- .explorer/rstudio
- R
- .jupyter
- .local/share/jupyter
- .ipython
- .docker
The backup is created in a timestamped folder inside /home/explorer/explorer_backups.
After that, the instance restarts into a clean environment.
Inside the backup, the relative locations of the backed-up folders are preserved.
To list all contents of the backup, including hidden files, run:
(explorer) explorer@gncv-238d93ad87784f0c81a3e3695484bcc2:~$ ls -la explorer_backups/20240418T192141Z/
drwxr-xr-x. 7 explorer explorer 4096 Apr 18 19:21 .
drwxr-xr-x. 3 explorer explorer 4096 Apr 18 19:21 ..
-rw-r--r--. 1 explorer explorer 4376 Apr 18 19:21 .bashrc
drwxr-xr-x. 2 explorer explorer 4096 Apr 18 19:21 .explorer
drwxr-xr-x. 3 explorer explorer 4096 Apr 18 19:21 .ipython
drwxr-xr-x. 5 explorer explorer 4096 Apr 18 19:21 .jupyter
drwxr-xr-x. 3 explorer explorer 4096 Apr 18 19:21 .local
drwxr-xr-x. 2 explorer explorer 4096 Apr 18 19:21 conda_envs
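The same inspection can be done from a notebook. The following is a minimal sketch, assuming the backup root is /home/explorer/explorer_backups as described above and that folder names follow the timestamp pattern shown in the listing; the function names latest_backup and list_backup are hypothetical.

```python
from pathlib import Path


def latest_backup(root):
    """Return the newest timestamped backup directory under root, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    # Names such as 20240418T192141Z sort chronologically as plain strings
    backups = sorted((p for p in root.iterdir() if p.is_dir()), key=lambda p: p.name)
    return backups[-1] if backups else None


def list_backup(backup_dir):
    """List top-level entries of a backup, including dotfiles such as .bashrc."""
    return sorted(p.name for p in Path(backup_dir).iterdir())


newest = latest_backup("/home/explorer/explorer_backups")
if newest is not None:
    print(newest.name, list_backup(newest))
```

Path.iterdir returns hidden entries as well, so no equivalent of the -a flag is needed.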