Introduction to Data Mining with RapidMiner

Analyzing complex datasets with many examples and a large number of attributes requires data mining techniques. Machine learning algorithms such as Random Forests, Support Vector Machines or Boosting usually form the cornerstone of such an analysis. Data mining, however, goes beyond simply training and applying a learning algorithm. It also involves finding a good lower-dimensional representation of the data without losing relevant information, as well as thoroughly validating the results.

The machine learning environment RapidMiner is one of the most widely used data mining platforms on the market. It is written in Java and provides a drag-and-drop environment in which data analysis processes are designed from graphical building blocks (operators) instead of code. This graphical approach lets the user follow the data flow directly through the various stages of the analysis, which makes RapidMiner well suited for understanding the concepts of data mining at a general level. Furthermore, RapidMiner offers various plug-ins that integrate other programming languages and data mining platforms, such as R and Weka. Beyond that, it has a large community of developers who contribute useful extensions to the built-in operators.

The tutorial will introduce the basic concepts of data mining, using RapidMiner for illustration in hands-on exercises. It will cover the basic functionality of the platform as well as the design of a complete analysis chain. This includes the following topics:

data pre-processing
feature selection (including validation of its stability)
dimensionality reduction
training and testing of a learning algorithm
validation and verification of the results

The focus will be mainly on practical aspects rather than on a theoretical analysis of the algorithms.

Participants will need to bring their own laptop with RapidMiner pre-installed. Datasets will be provided before the tutorial and should be downloaded in advance. Please check this page again shortly before the tutorial.

Organizer

Dr. Tim Ruhe

Collaborative Research Center SFB 876
Experimental Physics Vb
TU Dortmund