Exploring Learning Analytics Software: RapidMiner

I first heard about RapidMiner from the “Data Analytics & Learning” MOOC (Siemens, Gasevic, Baker, and Rosé, 2014).

For this evaluation, I installed a trial version of RapidMiner Studio (the desktop application) and worked through the introductory tutorials.

Like Data Science Studio (DSS), RapidMiner is used for generating predictive models.  However, unlike DSS, which records scripts in a macro-like fashion, RapidMiner takes a visual approach to building and executing scripts.
Once a data set is selected from the bottom left-hand menu bar, dragging it onto the “process” workspace turns it into an icon that becomes the first step in the “process.”  For example, in Figure 1, I dragged a data set called “Deals” onto the workspace.  Next, “operators” (built into the program) are chosen from the pane at the top-left.  In this example, I chose a “modelling operator” (“decision tree”) and dragged it onto the process workspace.  The steps in the process are then connected to one another using the input/output ‘bubbles’ on either side of the process icons.  When the model is ready, the final ‘output’ bubble is connected to the “result” bubble at the top-right hand side of the “process” workspace (see Figure 1).

Figure 1 – A simple decision tree process using the data set “Deals”

Hitting the “play” button at the top runs the current process.  Results are shown under the “results” tab at the top right.  The results for the (very simple) model that I ran are shown in Figure 2.

Figure 2 – Output of the decision tree process created in Figure 1

Next, I trained a decision tree model on the first data set and applied it to a second data set (called “Deals-Testset”).  When this process was run, the second data set (table) gained a new column containing the model’s predictions (see Figure 3).

Figure 3 – Predictions from the first decision tree model are applied to the second data set (i.e. added as new columns).
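
For readers who think more easily in code than in icons, the train-then-apply step can be sketched roughly in Python. This is only an illustration of the general idea, not RapidMiner's implementation: the rows, the column names ("age", "spend", "label") and the helper functions are all hypothetical, the data is made up rather than taken from the “Deals” tutorial, and the use of Gini impurity to choose splits is my own assumption.

```python
from collections import Counter

# Made-up toy rows for illustration only (NOT the tutorial's "Deals" data).
train_rows = [
    {"age": 20, "spend": 10, "label": "no"},
    {"age": 25, "spend": 15, "label": "no"},
    {"age": 30, "spend": 60, "label": "yes"},
    {"age": 35, "spend": 70, "label": "yes"},
    {"age": 40, "spend": 20, "label": "no"},
    {"age": 45, "spend": 80, "label": "yes"},
]
# Unlabelled rows, standing in for a separate test set.
test_rows = [
    {"age": 28, "spend": 65},
    {"age": 50, "spend": 12},
]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all one class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, features, target):
    """Return (score, feature, threshold) with the lowest weighted impurity."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, features, target, max_depth=3):
    """Recursively build a small decision tree as nested dictionaries."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or gini(labels) == 0.0:
        return {"leaf": majority}
    split = best_split(rows, features, target)
    if split is None:  # no feature has two distinct values left
        return {"leaf": majority}
    _, f, t = split
    left = [r for r in rows if r[f] <= t]
    right = [r for r in rows if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree(left, features, target, max_depth - 1),
        "right": build_tree(right, features, target, max_depth - 1),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return that leaf's label."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


def apply_model(tree, rows, column="prediction"):
    """Return copies of the rows with the model's prediction as a new column."""
    return [{**r, column: predict(tree, r)} for r in rows]


# Step 1: learn the tree from the first table.
tree = build_tree(train_rows, ["age", "spend"], "label")
# Step 2: apply it to the second table, which gains a "prediction" column.
scored = apply_model(tree, test_rows)
for row in scored:
    print(row)
```

The two steps at the bottom correspond loosely to the two stages of the process: one step learns the tree from the first table, and the other copies the second table and adds the prediction column to it.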

I also tried running a model that contained sub-processes (see Figure 4 and Figure 5).

Figure 4 – A model constructed using sub-processes
Figure 5 – Output of the above model with sub-processes

Closing Thoughts

As was the case with DSS, my familiarity with predictive modelling is relatively weak, so I found myself following the tutorial’s procedures without fully understanding the underlying statistical processes I was performing.  However, as a ‘visual learner’ (if there is such a thing…), I found RapidMiner’s approach to visualizing the components of a predictive model quite helpful.  It seems similar to the block-based approaches used to teach programming to children (e.g. Scratch): turning complex statistical processes into simple graphical blocks is useful for novices, but knowledge of the underlying processes is still required.

This post is part of a series in which I reflect on my experiences as a first-time explorer of various learning analytics and data mining software applications.  The purpose of these explorations is to gain a better understanding of the current palette of tools and visualizations that may support my own research in learning analytics within the context of a face-to-face/blended collaborative learning environment in secondary science.