R Data Science Library Package

R packages are modules that contain R functions and data sets. LightDB-A Database provides a collection of data science-related R libraries that can be used with the LightDB-A Database PL/R language. You can download these libraries in .gppkg format from VMware Tanzu Network.

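This chapter contains the following information:

R Data Science Libraries

Installing the R Data Science Library Package

Uninstalling the R Data Science Library Package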

For information about the LightDB-A Database PL/R Language, see LightDB-A PL/R Language Extension.

R Data Science Libraries

Libraries provided in the R Data Science package include:

abind

adabag

arm

assertthat

backports

BH

bitops

car

caret

caTools

cli

clipr

coda

colorspace

compHclust

crayon

curl

data.table

DBI

Deriv

dichromat

digest

doParallel

dplyr

e1071

ellipsis

fansi

fastICA

fBasics

fGarch

flashClust

foreach

forecast

foreign

fracdiff

gdata

generics

ggplot2

glmnet

glue

gower

gplots

gss

gtable

gtools

hms

hybridHclust

igraph

ipred

iterators

labeling

lattice

lava

lazyeval

lme4

lmtest

lubridate

magrittr

MASS

Matrix

MatrixModels

mcmc

MCMCpack

minqa

ModelMetrics

MTS

munsell

mvtnorm

neuralnet

nloptr

nnet

numDeriv

pbkrtest

pillar

pkgconfig

plogr

plyr

prodlim

purrr

quadprog

quantmod

quantreg

R2jags

R2WinBUGS

R6

randomForest

RColorBrewer

Rcpp

RcppArmadillo

RcppEigen

readr

recipes

reshape2

rjags

rlang

RobustRankAggreg

ROCR

rpart

RPostgreSQL

sandwich

scales

SparseM

SQUAREM

stabledist

stringi

stringr

survival

tibble

tidyr

tidyselect

timeDate

timeSeries

tseries

TTR

urca

utf8

vctrs

viridisLite

withr

xts

zeallot

zoo

Installing the R Data Science Library Package

Before you install the R Data Science Library package, make sure that your LightDB-A Database is running, that you have sourced lightdb_a_path.sh, and that the $MASTER_DATA_DIRECTORY and $GPHOME environment variables are set.
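The environment-variable part of these prerequisites can be checked with a small shell function. This is a minimal sketch: check_gp_env is a hypothetical helper name, not a LightDB-A utility, and it does not verify that the database is running.

```shell
#!/bin/sh
# Sketch only: check_gp_env is a hypothetical helper, not part of LightDB-A.
# It reports any of the two required variables that is unset or empty.
check_gp_env() {
    missing=0
    for name in MASTER_DATA_DIRECTORY GPHOME; do
        # Read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            missing=1
        fi
    done
    # Returns 0 when both variables are set, 1 otherwise.
    return "$missing"
}
```

After sourcing lightdb_a_path.sh, running check_gp_env should return success; a non-zero status names the missing variable on standard error.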

  1. Locate the R Data Science library package that you built or downloaded.

    The file name format of the package is DataScienceR-<version>-relhel<N>_x86_64.gppkg.

  2. Copy the package to the LightDB-A Database coordinator host.

  3. Follow the instructions in Verifying the LightDB-A Database Software Download to verify the integrity of the LightDB-A Procedural Languages R Data Science Package software.

  4. Use the gppkg command to install the package. For example:

    $ gppkg -i DataScienceR-<version>-relhel<N>_x86_64.gppkg
    

    gppkg installs the R Data Science libraries on all nodes in your LightDB-A Database cluster. The command also sets the R_LIBS_USER environment variable and updates the PATH and LD_LIBRARY_PATH environment variables in your lightdb_a_path.sh file.

  5. Restart LightDB-A Database. You must re-source lightdb_a_path.sh before restarting your LightDB-A cluster:

    $ source /usr/local/greenplum-db/lightdb_a_path.sh
    $ gpstop -r
    

The LightDB-A Database R Data Science Modules are installed in the following directory:

$GPHOME/ext/DataScienceR/library
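To confirm that a particular library landed in this directory, you can look for its DESCRIPTION file, which every installed R package carries in its own subdirectory. The function below is a sketch under that assumption; has_r_library is a hypothetical helper name, and it inspects only the host on which it runs.

```shell
#!/bin/sh
# Sketch only: has_r_library is a hypothetical helper, not part of LightDB-A.
# Succeeds when the named R package has a DESCRIPTION file under the
# DataScienceR library directory on the local host.
has_r_library() {
    if [ -z "${GPHOME:-}" ]; then
        echo "GPHOME is not set" >&2
        return 2
    fi
    [ -f "$GPHOME/ext/DataScienceR/library/$1/DESCRIPTION" ]
}
```

For example, has_r_library zoo returns success if the zoo library is present on the local host. To check every host, the same test could be run through gpssh.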

Note: The rjags libraries are installed in the $GPHOME/ext/DataScienceR/extlib/lib directory. If you want to use rjags and your $GPHOME is not /usr/local/greenplum-db, you must also create a symbolic link at /usr/local/greenplum-db that points to $GPHOME on each node in your LightDB-A Database cluster. For example:

$ gpssh -f all_hosts -e 'ln -s $GPHOME /usr/local/greenplum-db'
$ gpssh -f all_hosts -e 'chown -h gpadmin /usr/local/greenplum-db'
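Before creating the link, it can help to check whether a host needs it at all. The following sketch assumes the link is needed only when $GPHOME differs from /usr/local/greenplum-db and nothing exists at that path yet; needs_rjags_link is a hypothetical helper name, and the comparison is a plain string match that does not resolve existing symbolic links or trailing slashes.

```shell
#!/bin/sh
# Sketch only: needs_rjags_link is a hypothetical helper, not part of LightDB-A.
# $1 is the installation directory (normally $GPHOME).
# $2 is the path rjags expects; it defaults to /usr/local/greenplum-db.
needs_rjags_link() {
    target="${2:-/usr/local/greenplum-db}"
    # A link is needed only if the paths differ and the target is absent.
    [ "$1" != "$target" ] && [ ! -e "$target" ]
}
```

Calling needs_rjags_link "$GPHOME" returns success when the link should be created on that host.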

Uninstalling the R Data Science Library Package

Use the gppkg utility to uninstall the R Data Science Library package. You must include the version number in the package name you provide to gppkg.

To determine your R Data Science Library package version number and remove this package:

$ gppkg -q --all | grep DataScienceR
DataScienceR-<version>
$ gppkg -r DataScienceR-<version>
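The query and removal steps can be combined so that the version number does not have to be copied by hand. This is a sketch that assumes the query output lists each package name on its own line, as shown above; find_datasciencer_pkg is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: find_datasciencer_pkg is a hypothetical helper, not part of gppkg.
# Reads package names on standard input, one per line, and prints the
# first one that starts with "DataScienceR-".
find_datasciencer_pkg() {
    grep '^DataScienceR-' | head -n 1
}

# Possible usage on the coordinator host (commented out so that this
# sketch has no side effects):
#   pkg=$(gppkg -q --all | find_datasciencer_pkg)
#   [ -n "$pkg" ] && gppkg -r "$pkg"
```

The helper prints nothing when no DataScienceR package is installed, so the guarded gppkg -r call in the usage comment is skipped in that case.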

The command removes the R Data Science libraries from your LightDB-A Database cluster. It also removes the R_LIBS_USER environment variable and updates the PATH and LD_LIBRARY_PATH environment variables in your lightdb_a_path.sh file to their pre-installation values.

Re-source lightdb_a_path.sh and restart LightDB-A Database after you remove the R Data Science Library package:

$ . /usr/local/greenplum-db/lightdb_a_path.sh
$ gpstop -r 

Note: After you uninstall the R Data Science Library package from your LightDB-A Database cluster, any UDFs that you created that use R libraries installed with this package return an error.