This lesson is being piloted (Beta version)

Parallel Computing in R

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How can R programs make use of multiple cores?

Objectives
  • Learn the doParallel package

  • Learn the parallel package

Example:

We use a computationally intensive function as the running example. It generates a square matrix of normally distributed random numbers, finds the corresponding (complex) eigenvalues, and then returns the largest modulus among them. The dimension of the matrix and the standard deviation of the random numbers are given as input parameters.

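max.eig <- function(N, sigma) {
  # N x N matrix of normal random numbers with standard deviation sigma
  d <- matrix(rnorm(N^2, sd = sigma), nrow = N)
  # eigen() sorts eigenvalues by decreasing modulus
  E <- eigen(d)$values
  # modulus of the first (largest) eigenvalue
  abs(E)[[1]]
}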
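As a quick check of the description above, the following sketch compares the first modulus returned by eigen() with an explicit maximum over all moduli, and times the function for two matrix sizes. The seed and the sizes 50 and 400 are arbitrary choices for illustration, not part of the lesson.

```r
# Same function as in the lesson, repeated so this block runs on its own
max.eig <- function(N, sigma) {
  d <- matrix(rnorm(N^2, sd = sigma), nrow = N)
  E <- eigen(d)$values
  abs(E)[[1]]
}

set.seed(42)                          # arbitrary seed, for reproducibility only
d <- matrix(rnorm(25, sd = 1), nrow = 5)
E <- eigen(d)$values

largest <- max(Mod(E))                # largest modulus, computed explicitly
first   <- abs(E)[[1]]                # what max.eig() returns for this matrix
print(all.equal(largest, first))

# The cost grows quickly with N, which is why the function is a useful benchmark
print(system.time(max.eig(50, 1)))
print(system.time(max.eig(400, 1)))
```

Timings depend on the machine and the linear algebra library R is linked against, so no expected numbers are given here.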

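Using the foreach package

library(foreach)
# %do% runs the iterations sequentially; .combine='c' returns a vector instead of a list
foreach(i=1:4, .combine='c') %do% max.eig(i,1)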
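To see what the sequential %do% loop is doing, it can be compared with an ordinary sapply() call. The sketch below assumes the foreach package is installed; the seed is an arbitrary choice so that both runs draw the same random numbers.

```r
library(foreach)

# Same function as in the lesson, repeated so this block runs on its own
max.eig <- function(N, sigma) {
  d <- matrix(rnorm(N^2, sd = sigma), nrow = N)
  E <- eigen(d)$values
  abs(E)[[1]]
}

set.seed(1)
a <- foreach(i = 1:4, .combine = 'c') %do% max.eig(i, 1)

set.seed(1)
b <- sapply(1:4, function(i) max.eig(i, 1))

# Both loops run in order in the same R session, so they should agree
print(all.equal(a, b))

# Without .combine, foreach returns a list with one element per iteration
l <- foreach(i = 1:4) %do% max.eig(i, 1)
print(class(l))
```

The point of writing the loop with foreach is that the same expression can later be run in parallel by changing %do% to %dopar%.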

Nested foreach

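# Compute the matrix size from the loop indices instead of updating a shared
# counter: a counter would not be shared across workers under %dopar%, and an
# assignment on the last line would be returned in place of the eigenvalue.
foreach(i=1:4) %:%
   foreach(j=1:4) %do% {
      k <- (i-1)*4 + j    # k runs from 1 to 16
      max.eig(k,1)
    }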

Using doParallel

Check the number of available CPU cores, then create a cluster that leaves one core free and register it with foreach:

library(doParallel)
co <- detectCores()-1
cl <- makeCluster(co)
registerDoParallel(cl)

Apply doParallel to foreach

system.time(foreach(i=1:200, .combine='c') %do% max.eig(i,1))
system.time(foreach(i=1:200, .combine='c') %dopar% max.eig(i,1))
stopCluster(cl)
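The pattern above can be made a little more defensive. The following is a minimal sketch, assuming the doParallel package is installed: it guards against detectCores() returning NA or 1, caps the cluster at two workers (an arbitrary limit for this example), and makes sure the cluster is stopped even if the loop fails.

```r
library(doParallel)   # also attaches foreach and parallel

# Same function as in the lesson, repeated so this block runs on its own
max.eig <- function(N, sigma) {
  d <- matrix(rnorm(N^2, sd = sigma), nrow = N)
  E <- eigen(d)$values
  abs(E)[[1]]
}

n_cores <- detectCores()              # may be NA on some platforms
workers <- if (is.na(n_cores)) 1 else max(1, min(2, n_cores - 1))

cl <- makeCluster(workers)
registerDoParallel(cl)
print(getDoParWorkers())              # number of workers foreach will use

res <- tryCatch(
  foreach(i = 1:20, .combine = 'c') %dopar% max.eig(i, 1),
  finally = {
    stopCluster(cl)                   # always release the workers
    registerDoSEQ()                   # fall back to the sequential backend
  }
)
print(length(res))
```

For small tasks like these, starting the workers and shipping data to them can cost more than the computation itself, so the parallel version is not guaranteed to be faster.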

Using Parallel and parLapply

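(Note: this section is written mainly for running on Palmetto. makeCluster() creates a socket cluster by default, which also runs on Windows; fork-based functions such as mclapply() do not.) Check the number of available CPU cores and create a cluster: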

library(parallel)
co <- detectCores()-1
cl <- makeCluster(co)

Apply parLapply

# Export the max.eig function to the cluster workers
clusterExport(cl, c('max.eig'))
# Serial baseline for comparison
system.time(lapply(1:200, function(z) max.eig(z,1)))
system.time(parLapply(cl, 1:200, function(z) max.eig(z,1)))
stopCluster(cl)
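Because max.eig() draws random numbers on each worker, results from a cluster are not reproducible unless the workers' random number streams are seeded. The sketch below uses only the parallel package; the worker count of 2 and the seed 123 are arbitrary choices for illustration.

```r
library(parallel)

# Same function as in the lesson, repeated so this block runs on its own
max.eig <- function(N, sigma) {
  d <- matrix(rnorm(N^2, sd = sigma), nrow = N)
  E <- eigen(d)$values
  abs(E)[[1]]
}

cl <- makeCluster(2)

out <- tryCatch({
  # Send max.eig to the workers; envir is set explicitly so that this also
  # works when the block is run inside a function
  clusterExport(cl, "max.eig", envir = environment())

  clusterSetRNGStream(cl, iseed = 123)   # seed independent streams per worker
  r1 <- parSapply(cl, 1:8, function(z) max.eig(z, 1))

  clusterSetRNGStream(cl, iseed = 123)   # reset to the same streams
  r2 <- parSapply(cl, 1:8, function(z) max.eig(z, 1))

  list(r1 = r1, r2 = r2)
}, finally = stopCluster(cl))

# Same seed, same cluster size, same task split: the two runs should match
print(identical(out$r1, out$r2))
```

parSapply() is the vector-returning counterpart of parLapply(), in the same way that sapply() relates to lapply().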

Using built-in Parallel inside packages

Many packages have built-in parallel functionality. Here we use a bootstrapping package, boot:

library(boot)
# function to obtain regression weights
bs <- function(formula, data, indices) {
  d <- data[indices,] # allows boot to select sample
  fit <- lm(formula, d)
  return(coef(fit))
}
# bootstrapping with 10000 replications (serial)
system.time(results <- boot(data=mtcars, statistic=bs,
                R=10000, formula=mpg~wt+disp))

system.time(results <- boot(data=mtcars, statistic=bs,
                R=10000, formula=mpg~wt+disp,
                parallel = "snow",ncpus=2))
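Timing alone does not show what boot() returns. The following sketch inspects the result object using a smaller run; R = 500 and the seed are arbitrary choices to keep the example fast, and the percentile interval for the wt coefficient is just one of the interval types boot.ci() offers.

```r
library(boot)

# Statistic from the lesson: regression coefficients on a resampled data set
bs <- function(formula, data, indices) {
  d <- data[indices, ]                 # rows selected by boot for this replicate
  fit <- lm(formula, d)
  coef(fit)
}

set.seed(2024)                         # arbitrary seed, for reproducibility only
results <- boot(data = mtcars, statistic = bs,
                R = 500, formula = mpg ~ wt + disp)

print(results$t0)                      # coefficients on the original data
print(dim(results$t))                  # one row per replicate, one column per coefficient

# Percentile confidence interval for the second coefficient (wt)
print(boot.ci(results, type = "perc", index = 2))
```

The same inspection applies to a run with parallel = "snow"; only the way the replicates are computed changes, not the structure of the result.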

Key Points

  • doParallel()

  • foreach()