Setting up WSL Bash on Windows 10
Anil Chalisey
- Installing the Linux subsystem for Windows
- Build environment
- Installing Anaconda
- Installing HISAT2
- Installing R and RStudio
- Installing key R packages
- Installing HOMER
- Setting paths
- Using X-windows
With the introduction of the Windows Subsystem for Linux (WSL) in Windows 10, Windows is now a viable option for bioinformatic analysis, with no need for virtual machines, Docker or Cygwin. It's early days, but I have found it possible to switch entirely from a Linux computer to a Windows 10 computer for my bioinformatics analyses.
This guide explains how to set up a Windows 10 computer for this purpose. My general workflow is to work within the Windows 10 environment and, when I require Linux-based command-line tools, to call them either directly from the WSL terminal or from within the Windows command prompt using the following syntax: "bash -c '<command here>'". As most of my packages are written for R, this guide describes the set-up of WSL in a way that allows system commands to be called from within R.
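As a small illustration of the "bash -c" pattern, the sketch below runs a short pipeline through a non-interactive bash and captures its output. It is written to be run from a bash prompt; the pipeline itself (counting three made-up lines) and the variable name n_lines are placeholders chosen for this example.

```shell
# Run a short pipeline through a non-interactive bash, the same way a call
# from another program would. The outer single quotes keep the whole
# pipeline together as one argument to bash -c.
n_lines=$(bash -c 'printf "chr1\nchr2\nchr3\n" | wc -l | tr -d " "')
echo "lines counted: $n_lines"
```

Quoting rules differ between bash and the Windows command prompt, so a command containing nested quotes may need adjusting when it is launched from cmd or from R on Windows.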
- Go to 'Settings > Update & Security > For developers' and turn on 'Developer mode'
- Go to 'Control panel > Programs and features > Turn Windows features on and off', tick the 'Windows Subsystem for Linux' box and allow the machine to restart.
- Once restarted, open a command prompt and type 'bash'. The Linux subsystem will download and then guide you through setting up a username and password.
On the latest version of Windows 10, the Linux subsystem installed will be Ubuntu 16.04 (Xenial).
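To confirm which release was actually installed, something like the following can be run inside the bash terminal. This is a minimal sketch that assumes the distribution ships the standard /etc/os-release file; the variable name release_info is just for illustration.

```shell
# /etc/os-release is present on Ubuntu and most modern Linux distributions.
# Sourcing it in a subshell avoids leaking its variables into the session.
if [ -r /etc/os-release ]; then
    release_info=$(. /etc/os-release && echo "${NAME:-unknown} ${VERSION_ID:-unknown}")
else
    release_info="unknown (no /etc/os-release found)"
fi
echo "Running: $release_info"
```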
Once installed, the bash terminal may be started by opening up a command prompt (press Win + R on the keyboard and then type cmd and press Enter or click OK) and typing bash at the command prompt followed by pressing Enter. Once the terminal is open, the system should be updated using the following commands (remembering to provide your password when asked and to type y when it asks if you wish to continue):
sudo apt-get update
sudo apt-get upgrade
The following steps should all be performed within the bash terminal. This means that unless specified otherwise, all the steps here will also work for a native Linux/Unix-based operating system. As a side note, my usual preference is to avoid amending the executable path, and instead I tend to make symbolic links to binaries within a directory already in the path. My preferred directory for this purpose is /usr/local/bin, but if you do not have root access in your Linux system, then you should make a directory within the home directory called bin and make the links within that directory. To make this directory, use the following commands:
cd ~
mkdir bin
Ensure there is a working build environment using the following command:
sudo apt-get install gcc make build-essential gfortran
Anaconda is a Python (and R) distribution developed specifically for data science. While we could simply use the default Python distribution from the Ubuntu repositories, Anaconda comes with Intel's MKL and thus provides a substantial performance boost (not to mention its conda package manager). It may be installed as follows:
wget https://repo.continuum.io/archive/Anaconda2-4.4.0-Linux-x86_64.sh
bash Anaconda2-4.4.0-Linux-x86_64.sh
During installation, accept the license agreement and allow the install location to be prepended to the PATH in your .bashrc. Once installed, update conda and anaconda:
conda update conda
conda update anaconda
Finally, add the bioconda channel and install software. This is much easier than installing the tools separately, as conda also installs all the dependencies (for example, the latest version of Java). The tools I install here are those necessary for my bioinformatics pathways and the packages I have developed:
conda config --add channels bioconda
conda install -c bioconda samtools bedtools fastqc sambamba MACS2 subread
# these need to be added to the executable path. Anaconda asks you whether
# this should be done during its installation. However, the path created
# by anaconda is only accessible from within R by specifying the entire
# path. To make it easier to access the programs from R I create symbolic
# links as described later below.
Pre-built HISAT2 binaries are available for Linux distributions:
wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.0.5-Linux_x86_64.zip
unzip hisat2-2.0.5-Linux_x86_64.zip
# this only works if you have root access
sudo mv hisat2-2.0.5/* /usr/local/bin
# if you do not have root access, then do the following
mv hisat2-2.0.5/* ~/bin
rm -rf hisat2-2.0.5
If using the WSL-based approach, there is no absolute requirement to install R or RStudio within WSL, as the workflow is to use R within Windows 10 and then call Linux programs as needed. Installation in Windows 10 is straightforward: simply download the installers, double-click and follow the instructions. I recommend using the Microsoft R Open (MRO) version of R, which is linked against the Intel Math Kernel Library for multi-threaded performance.
For those using Linux, instructions are below. R and RStudio can usually only be installed if the user has root privileges. If you do not have root privileges, speak to your administrator.
The r-base package available via apt-get is usually out of date, so R is best installed directly from CRAN or MRAN.
To install R directly from CRAN, first add the repository to the sources list, then add R to the Ubuntu keyring, and then install R-base:
echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" | sudo tee -a /etc/apt/sources.list
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
sudo apt-get update
sudo apt-get install r-base r-base-dev
To install the MRO (Microsoft R Open, multithreaded) version of R, use the following commands, replacing x.x.x with whichever version is the latest:
wget https://mran.microsoft.com/install/mro/x.x.x/microsoft-r-open-x.x.x.tar.gz
tar -xvzf microsoft-r-open-x.x.x.tar.gz
cd microsoft-r-open
sudo ./install.sh
cd ..
# -r is needed to remove the extracted directory as well as the tarball
rm -rf microsoft-r-open*
It is also possible to install R using Anaconda. This may be a solution if one does not have root privileges. The caveat, however, is that R packages must be installed in a non-standard way via conda channels, as the normal install.packages() route (see below) throws an error. Also, on some systems, I have found that this results in errors in rendering fonts in plots.
To install R via Anaconda:
conda install -c r r-essentials
To install RStudio:
sudo apt-get update
sudo apt-get install gdebi-core gfortran libgdal-dev libgeos-dev libpng-dev
sudo apt-get install libjpeg62-dev libjpeg8-dev libcairo-dev libssl-dev
wget https://download1.rstudio.org/rstudio-1.0.143-amd64.deb
sudo gdebi -n rstudio-1.0.143-amd64.deb
rm rstudio-1.0.143-amd64.deb
To install RStudio via Anaconda:
conda install -c r rstudio
This first step is only required on Linux and installs some key system packages on which the subsequent R packages depend. On most systems, these packages will already be installed. If they are not, root privileges are required.
sudo apt-get update
sudo apt-get install build-essential libx11-dev
sudo apt-get install libcurl4-openssl-dev libxml2 libxml2-dev libncurses5-dev zlib1g-dev curl
In Windows 10, the Rtools package must also be installed; it can be downloaded from CRAN.
To perform the subsequent steps in Linux, open up an R terminal by typing R at the command line. In Windows 10, open up R or Microsoft R Open from the start menu. Then type the following commands. After these processes have finished running, the R terminal may be closed and we return to the bash terminal.
install.packages(c("tidyverse", "devtools", "rmarkdown", "knitr", "data.table",
"ggthemes"))
source("http://bioconductor.org/biocLite.R")
biocLite("BiocUpgrade")
biocLite(c("rtracklayer", "limma", "DESeq2", "edgeR", "ComplexHeatmap", "goseq",
"Rsamtools"))
# once complete, close the R terminal
q()
If R has been installed via Anaconda, then the steps for installing these packages are as follows:
conda install -c r r-tidyverse r-devtools r-rmarkdown r-knitr r-data.table
conda install -c ncil r-ggthemes=3.3.0
conda install -c bioconda bioconductor-limma bioconductor-deseq2 bioconductor-edger
conda install -c bioconda bioconductor-complexheatmap bioconductor-rsamtools
conda install -c bioconda bioconductor-goseq bioconductor-rtracklayer
HOMER is Perl-based software for motif discovery and next-generation sequencing analysis, which may be found at http://homer.ucsd.edu/homer/index.html. HOMER has a number of dependencies, all of which we have installed, so we can proceed directly to installation.
mkdir ~/homer
cd ~/homer
wget http://homer.ucsd.edu/homer/configureHomer.pl
perl configureHomer.pl -install
perl configureHomer.pl -install hg19
perl configureHomer.pl -install human-p
Some of the HOMER functions depend on R being available to Linux, so if you do not have R or did not install R as described above, then the key items may be installed using Anaconda, particularly if you do not have root access:
conda install r-essentials bioconductor-deseq2 bioconductor-edger
To ensure all the installed programs can be called directly from the command line, we create symbolic links in /usr/local/bin or ~/bin:
cd /usr/local/bin
sudo ln -s ~/anaconda2/bin/* .
sudo ln -s ~/homer/.//bin/* .
# alternatively, if you do not have root access
cd ~/bin
ln -s ~/anaconda2/bin/* .
ln -s ~/homer/.//bin/* .
WSL does not natively support graphical user interfaces (GUIs) such as RStudio, but there is a work-around so that these programs can be used. The simplest way I have found is to install MobaXterm, which has built-in X-forwarding and does not require any additional configuration. Other terminal emulators with X-forwarding also exist (e.g. ConEmu). If using such a terminal, RStudio can be started directly from its command line.
If you wish to stick with the standard WSL terminal, you first need to download and install an X server such as Xming or VcXsrv on Windows. Once launched, it runs in the background and provides a fully functioning X Window System. You just need to tell programs that launch from the bash shell where to send their display by setting the DISPLAY variable:
nano ~/.bashrc
The above command will open .bashrc in nano. Scroll to the end of the file and add:
export DISPLAY=:0.0
Save the modified file by pressing CTRL+X and answering Y when asked if you want to save the file. Close and restart the console window or source the modified file using the command:
source ~/.bashrc
Now, open the display server by launching XLaunch from the Windows start menu. Choose "One large window" or "One large window without titlebar" and set the "display number" to 0. Leave the other settings as default and finish the configuration. Once set up, running a GUI-based program is as simple as starting XLaunch alongside WSL and then executing the program.
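The .bashrc edit described above can also be scripted rather than made by hand in nano. The following is a minimal sketch, not part of the original workflow: add_line_once is a hypothetical helper defined here, and it is demonstrated on a scratch file so that nothing in the home directory is touched.

```shell
# add_line_once FILE LINE: append LINE to FILE unless an identical line
# is already present (hypothetical helper, defined for this example only).
add_line_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Demonstrate on a scratch file; for real use, pass "$HOME/.bashrc" instead.
scratch=$(mktemp)
add_line_once "$scratch" 'export DISPLAY=:0.0'
add_line_once "$scratch" 'export DISPLAY=:0.0'   # second call is a no-op
display_lines=$(grep -c 'DISPLAY' "$scratch")
echo "DISPLAY lines in scratch file: $display_lines"
rm -f "$scratch"
```

Checking for the line before appending means the snippet can be re-run safely, for example as part of a set-up script, without accumulating duplicate export lines.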