Manage Packages in HISE IDEs

A HISE IDE comes with a set of packages and libraries that are already installed when the IDE is provisioned. The installed packages depend on the image chosen; Docker builds the image by reading the instructions in a Dockerfile. These instructions specify where packages are installed on the virtual machine (VM). The intent is that each image already includes most, if not all, of the scientific libraries needed.

For example, by default R packages are installed in '/usr/local/lib/R/site-library'. Whenever a VM is rebooted, the instructions are read from the Dockerfile again, so the default install location does not retain changes between reboots: any libraries you install in this directory are removed when the VM is rebooted.

New tools are constantly being developed, and users will want to use libraries and packages that are not included in the image. The workflows outlined below help users manage R or Python packages in ways that improve stability and code reproducibility and save time. Eventually, these workflows will be standardized with Conda Enterprise.

If you have suggestions or additions for a new image, please contact immunology-support@alleninstitute.org.


Custom Directory

To have additional packages persist upon reboot, users can create a directory within '/home/jupyter' and install packages there. Users who choose to do so will be responsible for managing those libraries and handling version control. 

Custom Directory Method 1:

Note that the following procedure is only for R:

1. CREATE a directory within '/home/jupyter' to install new packages to, for example "/home/jupyter/local.lib".

2. ADD directory to search path

a. Method 1: Manually attach the folder to the search path at the top of your code. You must do this in every R session. This lets you load libraries installed in any location listed in .libPaths():

.libPaths("/home/jupyter/local.lib")  # adds your library to the top of the search path

.libPaths(c(.libPaths(), "/home/jupyter/local.lib"))  # adds your library to the end of the search path

3. When you INSTALL a new R package, specify the install location (Note: if you added your install directory at the top of the search path, you technically do not need to specify the directory)

Examples:

CRAN

install.packages("tictoc", lib = "/home/jupyter/local.lib")

Bioconductor

BiocManager::install("mygene", lib = "/home/jupyter/local.lib")

GitHub

remotes::install_github("aifimmunology/H5weaver", lib = "/home/jupyter/local.lib")
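
The manual approach in Method 1 can be made less repetitive by keeping the .libPaths() call in one small file that each R script sources. The shell sketch below is one way to set that up; it is not a HISE-provided tool. The function name make_lib_snippet and the file name set_libpaths.R are hypothetical, and the demonstration runs in a scratch directory so nothing under '/home/jupyter' is touched. On a HISE IDE you would pass "/home/jupyter/local.lib" instead.

```shell
#!/bin/bash
set -eu

# Hypothetical helper (not part of HISE): create a persistent library
# directory and write a one-line R file that puts it first on the search path.
make_lib_snippet() {
    local lib_dir="$1"   # library directory, e.g. /home/jupyter/local.lib
    local snippet="$2"   # R file to write, e.g. /home/jupyter/set_libpaths.R
    mkdir -p "$lib_dir"
    printf '.libPaths(c("%s", .libPaths()))\n' "$lib_dir" > "$snippet"
}

# Demonstration in a scratch directory instead of /home/jupyter
demo_root="$(mktemp -d)"
make_lib_snippet "$demo_root/local.lib" "$demo_root/set_libpaths.R"

# Show the generated R line
cat "$demo_root/set_libpaths.R"
```

Each R script could then begin with source("/home/jupyter/set_libpaths.R") (again, a hypothetical file name) before any library() calls.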

Custom Directory Method 2:

1. CREATE a directory within '/home/jupyter' to install new packages to, for example "/home/jupyter/local.lib".

2. ADD directory to search path

a. Automatically update your search path across all instances of R by modifying either the Renviron.site or Rprofile file (easier method long term):

i. Renviron.site (sets environment variables)

1. Locate the existing file: /usr/lib/R/etc/Renviron.site

2. Add this line containing all library folders in the search path as a “:”-delimited string, in order of your preference:

R_LIBS_SITE="/home/jupyter/local.lib:/usr/lib/R/site-library:/usr/lib/R/library:/usr/local/lib/R/site-library"

Make sure you input your own library location and list locations in the order desired. 

3. Copy this modified file into '/home/jupyter' to ensure it persists. 

cp /usr/lib/R/etc/Renviron.site  /home/jupyter/Renviron.site

4. Add the following line to your startup.sh script so the updated config file will be copied back upon reboot.

cp /home/jupyter/Renviron.site /usr/lib/R/etc/Renviron.site

ii. Rprofile file (sourced as R code)

Modify .libPaths() via the Rprofile. The code can be added via terminal using the command:

echo ".libPaths( c('/usr/local/lib/R/site-library','/usr/lib/R/site-library','/usr/lib/R/library','/home/jupyter/local.lib') )" >> /usr/lib/R/library/base/R/Rprofile

Make sure you input your own library location and list locations in the order desired. 

Important Note: For the Renviron.site/Rprofile method, if you invoke R from the command line (using `Rscript options filename`), do NOT use the '--vanilla' option if your script relies on a package in a non-default install directory. '--vanilla' implies '--no-environ' and '--no-init-file', which cause R to ignore Renviron.site and Rprofile, respectively.

3. When you INSTALL a new R package, specify the install location (Note: if you added your install directory at the top of the search path, you technically do not need to specify the directory)

Examples:

CRAN

install.packages("tictoc", lib = "/home/jupyter/local.lib")

Bioconductor

BiocManager::install("mygene", lib = "/home/jupyter/local.lib")

GitHub

remotes::install_github("aifimmunology/H5weaver", lib = "/home/jupyter/local.lib")
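
The Renviron.site edit described in Method 2 is done by hand. The sketch below shows one way to script that edit so it can be run repeatedly without piling up duplicate R_LIBS_SITE lines. It is an illustration under stated assumptions, not the official HISE procedure: the function name set_r_libs_site is hypothetical, and the demonstration writes to a scratch file rather than '/usr/lib/R/etc/Renviron.site'.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: make sure a file contains exactly one R_LIBS_SITE line.
set_r_libs_site() {
    local renviron="$1"   # path to an Renviron.site file
    local value="$2"      # ":"-delimited library folders, in order of preference
    touch "$renviron"
    # Keep every line except an old R_LIBS_SITE entry (grep exits 1 when
    # nothing is left, which is fine here), then append the new entry.
    grep -v '^R_LIBS_SITE=' "$renviron" > "$renviron.tmp" || true
    printf 'R_LIBS_SITE="%s"\n' "$value" >> "$renviron.tmp"
    mv "$renviron.tmp" "$renviron"
}

# Demonstration on a scratch file; the real file is /usr/lib/R/etc/Renviron.site
demo_dir="$(mktemp -d)"
demo_file="$demo_dir/Renviron.site"
libs="/home/jupyter/local.lib:/usr/lib/R/site-library:/usr/lib/R/library:/usr/local/lib/R/site-library"

set_r_libs_site "$demo_file" "$libs"
set_r_libs_site "$demo_file" "$libs"   # a second run replaces the line, it does not duplicate it

cat "$demo_file"
```

On a HISE IDE the same function could be pointed at '/home/jupyter/Renviron.site', with the cp line in startup.sh copying the result back to '/usr/lib/R/etc/Renviron.site' after a reboot.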


Startup Script

A startup script is a file that performs tasks during the startup process of a virtual machine instance. For Linux startup scripts, you can use bash or non-bash files. To use a non-bash file, designate the interpreter by adding a "#!" line to the top of the file. For example, to use a Python 3 startup script, add #! /usr/bin/python3 to the top of the file.

The advantage of using a startup script is that the packages are reinstalled on every reboot. However, this can increase startup times and introduce version issues unless users pin the specific versions installed.

Startup Script Example

The following is an example startup script that installs some Python packages:

#!/bin/bash

pip install numba

pip install numpy

pip install pandas
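
The version issues mentioned above can be reduced by pinning each package to an exact version with pip's == syntax. The sketch below is one way to do that in a startup script. The version numbers are placeholders chosen for illustration, not versions recommended by HISE, and the DRY_RUN switch is a hypothetical addition that makes the script print the commands instead of running them. Set DRY_RUN=0 to perform the installs.

```shell
#!/bin/bash
set -eu

# Illustrative pins only; replace them with the versions your analysis was tested against.
PACKAGES="numba==0.59.1 numpy==1.26.4 pandas==2.2.2"

# Hypothetical helper: install (or, in dry-run mode, just print) each pinned package.
install_pinned() {
    # $PACKAGES is deliberately unquoted so it splits into one word per package
    for spec in $PACKAGES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "pip install $spec"
        else
            pip install "$spec"
        fi
    done
}

install_pinned
```

Keeping the pins in one variable means a version bump is a one-line change, and the same list can be reviewed alongside the analysis code.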


Packrat 

Packrat enhances your project directory by storing your package dependencies inside it, rather than relying on your personal R library that is shared across all of your other R sessions. You can consider this your private package library. When you start an R session in a Packrat project directory, R will only look for packages in your private library; any time you install or remove a package, those changes will be made to your private library. 

Note that Packrat is for use from the terminal only; it does not work with Jupyter Notebooks for R. It is a great option for users who prefer writing R scripts to using notebooks.

  1. Choose the main folder you want to operate out of, in which you don't plan to keep Jupyter notebooks. Every time you launch R from this folder, Packrat will launch automatically.

  2. Run packrat::init()

  3. Install the packages you plan to use. 

  4. Run packrat::snapshot(). This saves all the packages you've installed.

  5. Moving forward, Packrat works like using R normally. 

  6. If you need to load a package that's already installed in the default image, run packrat::extlib("packageName") to load that package.
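
Steps 2 through 4 can also be run non-interactively from the terminal. The sketch below writes them into an R script inside the project folder. It assumes the packrat package is available in the image, uses tictoc purely as an example package, and names the file packrat_setup.R, which is a hypothetical choice. The demonstration uses a scratch folder; on a HISE IDE the project folder would live under '/home/jupyter'.

```shell
#!/bin/bash
set -eu

# Scratch project folder for the demonstration
project_dir="$(mktemp -d)"

# Write the Packrat steps to an R script (hypothetical file name)
cat > "$project_dir/packrat_setup.R" <<'EOF'
# Step 2: turn this folder into a Packrat project
packrat::init()
# Step 3: install the packages you plan to use (tictoc is only an example)
install.packages("tictoc", repos = "https://cloud.r-project.org")
# Step 4: record the installed packages in the project's lockfile
packrat::snapshot()
EOF

echo "To run the steps: cd $project_dir && Rscript packrat_setup.R"
```

The script only generates the file; running it with Rscript from inside the project folder is left as a separate, deliberate step.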


Conda 

Conda is an open source package management system and environment management system. It is widely used with Python, and its R support has improved dramatically in recent years. The tool allows users to easily create, save, load and switch between environments on a virtual machine.

More information about Conda can be found here: https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html

Example Usage

Conda comes installed in HISE as part of JupyterLab. To use Conda to manage packages in HISE:

  1. Create a Conda R environment (e.g. r_scrna) in the '/home/jupyter/libs' directory. Open a terminal in '/home/jupyter' and type conda create --prefix ./libs/r_scrna r-essentials r-base

  2. Run conda activate /home/jupyter/libs/r_scrna

  3. Now you can install any packages from R in the terminal. Just make sure the '/home/jupyter/libs/r_scrna' environment is activated.

  4. Reminder: Anaconda (https://anaconda.org/) hosts a large collection of R packages. If a package is available there, prefer installing it with conda install to avoid conflicts.

  5. Register the kernel in JupyterLab by typing R -e "IRkernel::installspec(name = 'r_scrna', displayname = 'r_scrna')" 

  6. Open a notebook with r_scrna as the kernel. You will be able to use all the packages you installed.

  7. After a reboot (stop/start) of the instance, you must register the kernel again: run conda activate /home/jupyter/libs/r_scrna, then R -e "IRkernel::installspec(name = 'r_scrna', displayname = 'r_scrna')"

  8. To use HISE functions in the Conda kernel, you will need to install the HISE hydration SDK from https://github.com/aifimmunology/r-hydration-sdk

    a. For python SDK: https://github.com/aifimmunology/hisepy
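
Because the kernel registration in step 7 has to be repeated after every reboot, it is a natural candidate for a startup script. The sketch below is one possible way to automate it, assuming conda is on the PATH when the script runs and the environment lives at '/home/jupyter/libs/r_scrna'. Whether HISE startup scripts run with conda available is an assumption you should check. The function name register_kernel is hypothetical, and the script skips the registration when conda or the environment is missing.

```shell
#!/bin/bash

# Environment prefix from step 1; the kernel is named after its last path component
ENV_PREFIX="/home/jupyter/libs/r_scrna"
KERNEL_NAME="$(basename "$ENV_PREFIX")"

# Hypothetical helper: activate the environment and register its R kernel
register_kernel() {
    if ! command -v conda >/dev/null 2>&1 || [ ! -d "$ENV_PREFIX" ]; then
        echo "Skipping kernel registration: conda or $ENV_PREFIX not found"
        return 0
    fi
    # Load conda's shell functions so that "conda activate" works inside a script
    . "$(conda info --base)/etc/profile.d/conda.sh"
    conda activate "$ENV_PREFIX"
    R -e "IRkernel::installspec(name = '$KERNEL_NAME', displayname = '$KERNEL_NAME')"
}

register_kernel
```

Sourcing conda.sh is needed because conda activate is a shell function that is normally set up only in interactive shells.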