As a Data Scientist, I want to work on an RStudio Server image managed by Kubernetes with GPU capability. The motivation is that I can then access a machine learning rig that is 1) scale invariant (the same from desktop to cloud), 2) leverages Kubernetes for scheduling, and 3) taps me into the container ecosystem for CI/CD. Let’s call this a start to the problem of operationalising machine learning – MLOps. If that’s youR use case, Read on. HeRe’s HobaRt.
Objective
To realise our Kubernetes-managed, GPU-aware RStudio Server environment, follow the steps below. It is assumed you have built a machine learning rig similar to the one described at NUC with eGPU – a Big Little ML Rig. Installing Minikube is covered at Kubernetes Minikube #1 – Configmaps, Storage, Ingress.
- Build the tensorflow/rstudio server image (tf-rstudio) and confirm basic nvidia-docker use cases
- Start Minikube and then configure and deploy your tf-rstudio Pod payload
- Verify the setup with some GPU-enabled R libraries
- Now use the technology to create something awesome
Setup
1. Build tf-rstudio image
A GPU-enabled RStudio Server image has been built by mashing up the gcr.io/tensorflow/tensorflow:latest-gpu image with rocker/rstudio. Inspect, then pull down and build the image from the Dockerfile as per below.
$ wget https://bitbucket.org/emergile/MLOps/src/master/tensorflow/Dockerfile
$ docker build -t stefanopicozzi/tf-rstudio .
$ docker images | grep tf-rstudio
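For orientation, a Dockerfile of this kind tends to look roughly like the sketch below. This is not the published recipe (inspect the Bitbucket URL above for that); the RStudio Server package version and install steps here are illustrative assumptions only.

```dockerfile
# Start from the GPU-enabled TensorFlow image (tag is an assumption)
FROM gcr.io/tensorflow/tensorflow:latest-gpu

# Layer in R and RStudio Server, roughly what rocker/rstudio provides
# (the .deb version below is illustrative, not the one the real image uses)
RUN apt-get update && apt-get install -y --no-install-recommends \
        r-base r-base-dev gdebi-core wget && \
    wget -q https://download2.rstudio.org/rstudio-server-1.1.456-amd64.deb && \
    gdebi -n rstudio-server-1.1.456-amd64.deb && \
    rm rstudio-server-1.1.456-amd64.deb

# RStudio Server listens on 8787
EXPOSE 8787
CMD ["/usr/lib/rstudio-server/bin/rserver", "--server-daemonize=0"]
```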
2. NVIDIA Docker GPU Test
Now verify docker/nvidia/GPU integration with the tf-rstudio image using the nvidia-docker driver. If successful, the eGPU should be visible as per below.
$ minikube stop
$ systemctl restart nvidia-docker
$ systemctl restart docker.service
$ nvidia-smi
...
$ cd ~/MLOps/NVIDIA_CUDA-9.0_Samples/bin/x86_64/linux/release
$ ./deviceQuery
...
PASS
$ nvidia-docker run -it stefanopicozzi/tf-rstudio /bin/bash
root@123567890:/notebooks# python
>>> import tensorflow as tf
>>> tf.test.gpu_device_name()
...
u'/gpu:0'
Ctrl-D
3. Basic NVIDIA Docker R Test
Now check that the container is correctly configured for R/GPU integration by performing this basic smoke test.
$ nvidia-docker run -it -p 8787:8787 stefanopicozzi/tf-rstudio /bin/bash
R
> install.packages("gpuR")
...
> library("gpuR")
> options(encoding = "UTF-8")
> detectGPUs()
> str(gpuInfo(device_idx = 1L))
List of 13
 $ deviceName       : chr "GeForce GTX 1070"
 $ deviceVendor     : chr "NVIDIA Corporation"
 $ numberOfCores    : num 15
 $ maxWorkGroupSize : num 1024
...
4. Launch Minikube
Good so far! Launch Minikube with the feature gates. It may be worthwhile restarting Minikube from a clean, reset environment as described at Kubernetes Minikube #1 – Configmaps, Storage, Ingress.
$ minikube stop
$ sudo minikube addons enable kube-dns
$ sudo minikube start \
    --vm-driver=none \
    --feature-gates=Accelerators=true
$ sudo minikube addons list
Validate
tf-rstudio Deployment
The deployment artifacts for an RStudio Server based on the tensorflow image can be inspected at https://bitbucket.org/emergile/MLOps/src/master/tensorflow/tf-rstudio.yaml. Note the volumeMounts entry for /home/rstudio, which lets us persist any packages we install. In the localhost volumeMount location, create a file named .Rprofile with the contents .libPaths(c("~/lib")), and then create the subdirectory lib.
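As a sketch, that host-side setup can be scripted as below. RSTUDIO_HOME here is a stand-in for whatever localhost path your volumeMount actually points at; the default is an assumption for illustration.

```shell
# Hypothetical host path backing the /home/rstudio volumeMount
RSTUDIO_HOME="${RSTUDIO_HOME:-$HOME/rstudio-home}"

# Create the lib subdirectory that .libPaths() will point at
mkdir -p "${RSTUDIO_HOME}/lib"

# Tell R sessions to install and load packages from ~/lib
printf '.libPaths(c("~/lib"))\n' > "${RSTUDIO_HOME}/.Rprofile"

cat "${RSTUDIO_HOME}/.Rprofile"
```

Because ~/lib lives on the mounted volume, packages installed from the RStudio session survive Pod restarts.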
The remaining items are much as per Kubernetes Minikube #3 – Jupyter, Tensorflow with (external) GPU. Console and script operations from the RStudio Server browser client derive their LD_LIBRARY_PATH from /etc/rstudio/rserver.conf. Let’s use a Kubernetes ConfigMap to set up this file.
# Create the namespace
$ kubectl create -f - << EOF!
apiVersion: v1
kind: Namespace
metadata:
  name: mlops
EOF!
$ export YAML=https://bitbucket.org/emergile/MLOps/src/master/tensorflow/tf-rstudio.yaml
$ kubectl delete -f ${YAML} -n=mlops
$ kubectl create -f ${YAML} -n=mlops
$ kubectl describe pod tf-rstudio -n=mlops
$ kubectl describe service tf-rstudio -n=mlops
$ kubectl describe configmap rserver-config -n=mlops
Name:         rserver-config
Namespace:    mlops
Labels:
Annotations:

Data
====
rserver-conf:
----
# Server Configuration File
rsession-ld-library-path=/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:usr/local/cuda/lib64/stubs:/usr/lib/R/lib

Events:
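For reference, the rserver-config ConfigMap could equally be declared as a standalone manifest along these lines. This is a sketch reconstructed from the kubectl describe output above, not the file shipped in the repository (note it also restores the leading slash on the cuda stubs path):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rserver-config
  namespace: mlops
data:
  rserver-conf: |
    # Server Configuration File
    rsession-ld-library-path=/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib64/stubs:/usr/lib/R/lib
```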
Now point your browser at the RStudio client, e.g. http://127.0.0.1:30787, log in as rstudio/rstudio and reproduce the following R GPU-related package tests. These were also covered outside Kubernetes here.
gpuR Test
> install.packages("gpuR")
> source("https://bitbucket.org/emergile/MLOps/src/master/gpu/gpuR.R", echo=TRUE)
...
> result
nrow time_classic time_gpu time_vcl
1 2 2.360344e-05 0.02209187 0.02325058
2 4 9.536743e-06 0.02113152 0.02141833
3 8 9.298325e-06 0.02233958 0.02137542
4 16 1.215935e-05 0.02142715 0.01937914
5 32 2.646446e-05 0.02070546 0.02394533
6 64 1.516342e-04 0.01922011 0.02080178
7 128 1.146793e-03 0.02226043 0.02678847
8 256 1.774359e-02 0.02863193 0.02996230
9 512 7.287693e-02 0.03742671 0.03395653
10 1024 6.393669e-01 0.08641076 0.05084944
11 2048 5.397557e+00 0.38526559 0.19899654
12 4096 4.979891e+01 1.93692446 0.81673932
...
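To put the table in perspective, the GPU payoff at the largest matrix size can be read straight off the timings above; a quick shell calculation, nothing more:

```shell
# nrow = 4096: classic CPU multiply vs gpuR's vclMatrix path, from the table above
awk 'BEGIN { printf "speedup at n=4096: %.0fx\n", 4.979891e+01 / 0.81673932 }'
# → speedup at n=4096: 61x
```

Notice from the table that below roughly nrow = 256 the CPU wins; transfer overhead dominates small matrices.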
gputools Test
> install.packages("~/gputools_1.1_new.tar.gz", repos = NULL, type = "source")
> source("https://bitbucket.org/emergile/MLOps/src/master/gpu/gputools.R", echo=TRUE)
...
> system.time(matA%*%matB);
user system elapsed
14.794 0.040 14.826
> system.time(gpuMatMult(matA, matB))
user system elapsed
0.357 0.080 0.437
...
Keras using TensorFlow Test – Nietzsche Example
> Sys.getlocale()
[1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8; LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8; LC_IDENTIFICATION=C"
> install.packages('devtools', dependencies=TRUE)
> devtools::install_github("rstudio/reticulate")
> devtools::install_github("rstudio/keras")
> library("keras")
> install_keras(tensorflow = "gpu")
> install.packages(c('readr', 'tokenizers'), dependencies=TRUE)
> source("https://raw.githubusercontent.com/rstudio/keras/master/vignettes/examples/lstm_text_generation.R", echo=TRUE)
...
iteration: 19
---------------
Epoch 1/1
200284/200284 [==============================] - 63s 317us/step - loss: 1.3469
diversity: 0.200000
---------------
he sense of the same state of the strength of the sense of the sense and a still same sight of the sense, which are not and all the sense and and the sense and account the sense is all the strength of the strength to the sense of all the sense and the standard the sense of the sense and and all the sense of the standard the strength of a sensible the sense of the end the sense and the sense and a
...
Keras using TensorFlow Test – MNIST Example
> install_keras(tensorflow = "default")
> source("https://bitbucket.org/emergile/MLOps/src/master/gpu/MNIST.R", echo=TRUE)
...
   user  system elapsed
194.706   8.667  62.109
...
> install_keras(tensorflow = "gpu")
> source("https://bitbucket.org/emergile/MLOps/src/master/gpu/MNIST.R", echo=TRUE)
...
  user  system elapsed
60.438   5.484  46.522
...
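Comparing the two runs above, the GPU build gives a modest wall-clock win on this small MNIST model (the user-time drop is much larger, since the CPU build keeps all cores busy). A quick shell calculation on the elapsed figures:

```shell
# Elapsed time: CPU-only TensorFlow vs the GPU build, both from the runs above
awk 'BEGIN { printf "elapsed speedup: %.2fx\n", 62.109 / 46.522 }'
# → elapsed speedup: 1.34x
```

Small dense models rarely show dramatic GPU gains; the LSTM text-generation example above is a better showcase.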
Trivia
Go to https://www.kaggle.com/datasets and use the (GE) Force!