User:MPopov (WMF)/Notes/Setup

From Meta, a Wikimedia project coordination wiki

Hello, dear reader. These are setup notes written to myself. Substitute the username for your own where appropriate.

SSH

Locally, I use zsh (the default shell since macOS 10.15 Catalina).

~/.aliases:

alias swap-s4="ssh -N stat4 -L 8000:127.0.0.1:8000"
alias swap-s5="ssh -N stat5 -L 8000:127.0.0.1:8000"
alias swap-s6="ssh -N stat6 -L 8000:127.0.0.1:8000"
alias swap-s7="ssh -N stat7 -L 8000:127.0.0.1:8000"
alias swap-s8="ssh -N stat8 -L 8000:127.0.0.1:8000"
alias clean-pip="rm -rf ~/Library/Caches/pip/*"

~/.zprofile:

source $HOME/.aliases

Alternatively, those aliases can be put into ~/.bash_profile if bash is still configured as the default shell.

In order for those SSH tunnels to work, I use an ~/.ssh/config based on the config in the Product Analytics onboarding document. Once a tunnel is open in Terminal, I can go to http://localhost:8000 in my browser to use SWAP.
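The five swap-sN aliases differ only in the host number, so they could be collapsed into one function. The following is a minimal sketch, not what I actually have in ~/.aliases: the names swap and swap_cmd and the optional local-port argument are hypothetical, and it assumes the same stat4 to stat8 host aliases are defined in ~/.ssh/config.

```shell
# Hypothetical replacement for the swap-sN aliases (names are assumptions).

swap_cmd() {
  # Print the ssh command for host alias stat<N>.
  # $1 = host number, $2 = local port (defaults to 8000).
  printf 'ssh -N stat%s -L %s:127.0.0.1:8000\n' "$1" "${2:-8000}"
}

swap() {
  # Refuse to run without a host number, so we never call "ssh -N stat".
  if [ -z "${1:-}" ]; then
    echo "usage: swap <host-number> [local-port]" >&2
    return 1
  fi
  # Open the tunnel; this is the same command that swap_cmd prints.
  ssh -N "stat$1" -L "${2:-8000}:127.0.0.1:8000"
}
```

With this in place, swap 5 behaves like swap-s5, and swap 7 8001 opens a second tunnel on local port 8001 (reachable at http://localhost:8001) while another one is already using 8000.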

Python environment

On stat100X hosts, I make a ~/.bashrc with the following:

export PATH=$HOME/venv/bin:${PATH}
export http_proxy=http://webproxy.eqiad.wmnet:8080
export https_proxy=http://webproxy.eqiad.wmnet:8080
alias clean-pip="rm -rf ~/.cache/pip/http/*"
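To confirm that a session actually picked up these settings, a small check function can help. This is only a sketch: check_setup is a hypothetical name, it is not part of my ~/.bashrc, and it assumes the virtual environment lives in ~/venv as in the PATH line above.

```shell
# Hypothetical helper: report which of the ~/.bashrc settings are missing.
check_setup() {
  missing=0
  # Both proxy variables should be non-empty.
  [ -n "${http_proxy:-}" ]  || { echo "http_proxy is not set";  missing=1; }
  [ -n "${https_proxy:-}" ] || { echo "https_proxy is not set"; missing=1; }
  # The venv's bin directory should appear as an entry in PATH.
  case ":${PATH}:" in
    *":${HOME}/venv/bin:"*) ;;
    *) echo "~/venv/bin is not on PATH"; missing=1 ;;
  esac
  # Exit status 0 when everything is in place, 1 otherwise.
  return "$missing"
}
```

Running check_setup right after connecting prints nothing when everything is set; any output means ~/.bashrc was probably not sourced.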

I also have a ~/.bash_profile containing [[ -r ~/.bashrc ]] && . ~/.bashrc because sometimes ~/.bashrc does not get sourced upon connection. Either exit and re-SSH or run source ~/.bashrc to have those environment variables set up.

Note: if a host does not have Jupyter, a virtual environment needs to be created manually (one is created automatically when logging in to JupyterHub for the first time):

python3 -m venv venv
source venv/bin/activate

Note: to reset the virtual environment, follow these instructions.
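The manual creation step can be wrapped so that it is safe to re-run on any host. The sketch below assumes the environment lives in ~/venv unless a different directory is given; ensure_venv is a hypothetical name, not something provided by the hosts.

```shell
# Hypothetical helper: create the virtual environment only if it is missing,
# then activate it in the current shell.
ensure_venv() {
  venv_dir="${1:-$HOME/venv}"
  # An existing environment (e.g. one created by JupyterHub) is left untouched.
  if [ ! -x "$venv_dir/bin/python" ]; then
    python3 -m venv "$venv_dir" || return 1
  fi
  # Activate by sourcing, same as "source venv/bin/activate".
  . "$venv_dir/bin/activate"
}
```

Calling ensure_venv with no arguments from any directory covers both cases: hosts where JupyterHub already created ~/venv and hosts where it has to be made by hand.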

Then on all hosts I update the pre-loaded libraries:

python -m pip install -U pip
pip install -U setuptools wheel virtualenv pyarrow \
  pandas scikit-learn scipy numpy statsmodels \
  mwtext mwtypes mwapi mwreverts
pip install -U git+https://github.com/neilpquinn/wmfdata.git@release

On hosts with Jupyter (which should be all stat100X hosts at this point):

pip install -U ipython ipykernel nbconvert \
  jupyter jupyterlab jupyter-client jupyter-core

The Jupyter server needs to be restarted at that point. Go to the classic interface (/user/bearloga/tree), click on "control panel" in the top right, click "stop my server", and then click "my server" on the resulting page.

R environment

On all hosts I make a ~/.Rprofile containing the following:

Sys.setenv("http_proxy" = "http://webproxy.eqiad.wmnet:8080")
Sys.setenv("https_proxy" = "http://webproxy.eqiad.wmnet:8080")
options(repos = c(CRAN = "https://cran.rstudio.com"), mc.cores = 4)
Sys.setenv(MAKEFLAGS = "-j4")
Sys.setenv(RETICULATE_PYTHON = "/home/bearloga/venv/bin/python")

Note: for the mc.cores option, I don't use parallel::detectCores() because I don't want to use all 40 cores on stat1005, for example.

Then, in an R session:

# Setup library in homedir:
lib_path <- Sys.getenv("R_LIBS_USER")
if (!dir.exists(lib_path)) dir.create(lib_path, recursive = TRUE)
.libPaths(lib_path)

# Install essential packages:
install.packages(c(
  "zeallot", "glue", "here", "remotes", "reticulate",
  "tidyverse", "data.table", "furrr",
  "IRkernel", "hrbrthemes", "patchwork", "knitr"
))

IRkernel::installspec() # registers the R kernel with Jupyter
remotes::install_github("wikimedia/wikimedia-discovery-wmf")
remotes::install_github("wikimedia/wikimedia-discovery-polloi")

The last two lines install wmf (the R counterpart to wmfdata) and polloi (which includes useful functions such as compress() for turning numbers like 1000 into "1K").

hrbrthemes

To use the {hrbrthemes} R package for nice theming of ggplot2 charts in reports, the installation steps depend on which host you're using.

On hosts with Debian Buster like stat1005 and stat1008 (which have R 3.5.3), running install.packages("hrbrthemes") will suffice.

On hosts with Debian Stretch like stat1004, stat1006, and stat1007 (which have R 3.3.3), the last installable version of the package is 0.6.0, which must be downloaded manually from the CRAN archive:

mkdir -p ~/downloads && cd ~/downloads
wget https://cran.r-project.org/src/contrib/Archive/hrbrthemes/hrbrthemes_0.6.0.tar.gz
R -e "install.packages(c('gdtools', 'extrafont'), repos = c(CRAN = 'https://cran.rstudio.com/'))"
R -e "install.packages('hrbrthemes_0.6.0.tar.gz', repos = NULL)"
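The choice between the two routes can be written down as a small lookup keyed on the Debian release. This is a sketch only: hrbrthemes_source is a hypothetical name, the return labels are made up for illustration, and it covers just the two releases mentioned above.

```shell
# Hypothetical helper: map a Debian codename to the hrbrthemes install route.
hrbrthemes_source() {
  case "$1" in
    buster)  echo "cran" ;;           # R 3.5.3: plain install.packages() works
    stretch) echo "archive-0.6.0" ;;  # R 3.3.3: use the 0.6.0 tarball from the CRAN archive
    *)       echo "unknown" ;;        # any other release: not covered by these notes
  esac
}
```

For example, hrbrthemes_source "$(lsb_release -sc)" would pick the route on a given host, assuming the lsb_release command is available there.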