Introduction
[ UPDATE ] A new and improved approach to using Drools, including its CEP features, with R is to issue REST calls to a Drools Decision Server. This approach is described here.
R is a functional programming language (FPL) specialising in statistical analysis. Drools is an excellent community project that aims to build a “universal behavioural platform”. The project includes Drools Fusion, its complex event processing (CEP) sub-project. By combining R and Drools CEP in RStudio, you have the makings of a very powerful visual IDE for simulating data sets and testing rules written in the Drools CEP syntax. Developers get an agile environment for learning and applying the Drools CEP MVEL syntax, immediately accessible from a browser without any local configuration. No Java code is needed to set up a Drools test client, and the Drools rules engine is accessed through a few simple API calls. R is rich in functions for extracting, transforming and loading data from sources such as REST, spreadsheets, CSV files, databases, cURL and SOAP. This takes the tedium out of generating and preparing sample data sets as inputs to rules services.
Setup
The setup assumes a working RStudio environment, either the desktop (client) or the server version. To set one up, visit http://www.rstudio.com/ and follow the installation instructions there, or read RStudio Server on Fedora.
To integrate Drools with R, a Drools package has been written for R. This package lets you process timestamped event streams driven by a pseudo clock. On first use of the Drools package you may need to add further R packages; use the RStudio package installer to add them as missing packages are reported. Note that the package is already pre-loaded into RStudio Server if you follow the Fedora RStudio Server set-up instructions referenced above. To install the packages independently, do:
# Install whatever R packages you need ...
$ wget http://cran.rstudio.com/src/contrib/rJava_0.9-6.tar.gz
$ sudo R CMD INSTALL rJava_0.9-6.tar.gz
$ wget https://bitbucket.org/emergile/Rdrools6/blob/master/Rdrools6jars_0.0.1.tar.gz
$ sudo R CMD INSTALL Rdrools6jars_0.0.1.tar.gz
$ wget https://bitbucket.org/emergile/Rdrools6/blob/master/Rdrools6_0.0.1.tar.gz
$ sudo R CMD INSTALL Rdrools6_0.0.1.tar.gz
Usage Pattern
Usage follows a very simple pattern in which an input event stream is processed by an MVEL rules file. All facts are created within the rules file. A fact known as “output” is then used to return content to the client. R makes pre- and post-processing of the data much easier, leaving you more time for cycles of authoring and testing your rules syntax. To use the Drools package, the steps are as follows:
- Create an input dataframe in which you hold the input data, e.g. inputdata
- Assign column names to the input dataframe, e.g. input.columns
- Assign column names for the dataframe to hold the output of the rules execution, e.g. output.columns
- Tell R where your MVEL rule file is located, e.g. rules
- Set the rules engine to use STREAM, e.g. mode <- "STREAM"
- Set up the rules session, e.g. rules.session
- Run the rules and capture the output dataframe generated, e.g. outputdata
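The steps above can be sketched in R as follows. This is a minimal sketch, assuming the Rdrools6 package exports `rulesSession()` and `runRules()` as they are used in the sample script; the `rules.file` variable and the guard around the Drools calls are illustrative additions, so the data preparation can be tried even where the package or the rules file is not yet in place.

```r
# Step 1: build the input data frame in a single call
# (same observations as the sample script).
inputdata <- data.frame(
  obsid    = c("1", "2", "2"),
  obsdate  = c("2014-05-22 00:00:00", "2014-05-21 00:00:00", "2014-05-20 00:00:00"),
  obsvalue = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Steps 2 and 3: column names for the input and output data frames.
input.columns  <- colnames(inputdata)
output.columns <- c("rulename", "rulevalue")

# Step 5: run the rules engine in stream mode.
mode <- "STREAM"

# Steps 4, 6 and 7: read the rules, open a session and run it.
# Assumption: rules.txt sits in the working directory. The calls are skipped
# when Rdrools6 or the rules file is missing, leaving outputdata as NULL.
rules.file <- "rules.txt"
outputdata <- NULL
Sys.setenv(NOAWT = "true")
if (requireNamespace("Rdrools6", quietly = TRUE) && file.exists(rules.file)) {
  rules         <- readLines(rules.file)
  rules.session <- Rdrools6::rulesSession(mode, rules, input.columns, output.columns)
  outputdata    <- Rdrools6::runRules(rules.session, inputdata)
}
```

Building the data frame in one `data.frame()` call avoids the repeated `rbind()` and keeps the columns as plain character and numeric vectors.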
Simple Sample
To reproduce the example, create an R script file named sample.R and a rules file named rules.txt with the following contents.
sample.R
# Script to author and test rules files using Drools6 packages inside R
# Pull down observations and apply all rules
Sys.setenv(NOAWT = "true")
library("httr")
library("rjson")
library("Rdrools6")
setwd("~/Sample")

# Set up some sample input data
row <- data.frame(obsid = "1", obsdate = "2014-05-22 00:00:00", obsvalue = 10)
inputdata <- row
row <- data.frame(obsid = "2", obsdate = "2014-05-21 00:00:00", obsvalue = 20)
inputdata <- rbind(inputdata, row)
row <- data.frame(obsid = "2", obsdate = "2014-05-20 00:00:00", obsvalue = 30)
inputdata <- rbind(inputdata, row)
input.columns <- colnames(inputdata)

# Set up some sample output data
output.columns <- c("rulename", "rulevalue")

# Set up rules file
rules <- readLines("rules.txt")
mode <- "STREAM"

# Apply rules
rules.session <- rulesSession(mode, rules, input.columns, output.columns)
outputdata <- runRules(rules.session, inputdata)
rules.txt
import java.util.HashMap;
import org.json.JSONObject;
import java.util.Date;
import java.text.SimpleDateFormat;
import com.satimetry.nudge.Output;

global java.util.HashMap output;
global SimpleDateFormat inSDF;
global SimpleDateFormat outSDF;

function void print(String txt) {
  System.out.println(txt);
}

declare Observation
  @role( event )
  @timestamp( obsdate )
  obsid : String @key
  obsdate: Date @key
  obsvalue: Integer
end

rule "ruleInsertObservation"
  salience 1000
  when
    $input : JSONObject() from entry-point DEFAULT
  then
    inSDF = new SimpleDateFormat("yyyy-M-d h:m:s");
    Date obsdate = inSDF.parse( $input.get("obsdate").toString() );
    Observation $observation = new Observation( $input.get("obsid").toString(), obsdate );
    $observation.setObsvalue( Integer.parseInt($input.get("obsvalue").toString()) );
    insert( $observation );
    print(drools.getRule().getName() + "->" + $observation.getObsid() + "-" + $observation.getObsdate() );
end

rule "ruleTotalValue"
  salience -1000
  no-loop true
  when
    $total : Number( intValue > 0) from accumulate(
      Observation( $obsvalue: obsvalue ) over window:time( 30d ),
      sum ( $obsvalue ) )
  then
    JSONObject joutput = new JSONObject();
    joutput.put("rulename", drools.getRule().getName());
    joutput.put("rulevalue", $total);
    Output $output = new Output(joutput.toString());
    insert($output);
    print(drools.getRule().getName() + "->" + $total);
end
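As a sanity check on ruleTotalValue, the same 30-day sum can be computed directly in R and compared with the rulevalue column that comes back from the rules. The sketch below uses base R only. `expected_total()` is a hypothetical helper, not part of Rdrools6, and it makes two assumptions: the window is measured back from the latest observation date, and an observation exactly 30 days old falls outside it. How the pseudo clock is actually positioned when the rule fires may differ, so treat the result as a rough cross-check.

```r
# Hypothetical helper: sum obsvalue over observations that fall within
# window.days of the latest observation date.
expected_total <- function(inputdata, window.days = 30) {
  obsdate  <- as.POSIXct(as.character(inputdata$obsdate),
                         format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  latest   <- max(obsdate)
  age.days <- as.numeric(difftime(latest, obsdate, units = "days"))
  sum(inputdata$obsvalue[age.days < window.days])
}

# Same observations as in sample.R.
inputdata <- data.frame(
  obsid    = c("1", "2", "2"),
  obsdate  = c("2014-05-22 00:00:00", "2014-05-21 00:00:00", "2014-05-20 00:00:00"),
  obsvalue = c(10, 20, 30),
  stringsAsFactors = FALSE
)

total <- expected_total(inputdata)
print(total)  # [1] 60
```

All three sample observations lie within two days of each other, so the whole stream falls inside the window and the check reduces to the plain sum of obsvalue.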