Questions tagged [rapidminer]

RapidMiner is a software platform developed by the company of the same name that provides an integrated environment for machine learning, data mining, text mining, predictive analytics and business analytics. It is used for business and industrial applications as well as for research, education, training, rapid prototyping, and application development and supports all steps of the data mining process including results visualization, validation and optimization. RapidMiner is developed on a business source model which means the core and earlier versions of the software are available under an OSI-certified open source license. [Wikipedia]

24 questions
7
votes
1 answer

How can I find out if a subset of Stack Exchange users increase/decrease their post rate based on badges earned?

I'm trying to mine the Stack Exchange data dump to find out whether there is a cluster of users that may be positively or negatively affected by the number of badges they've been awarded. The theory I'm working on is that for some people who were…
cflewis
  • 73
  • 5
5
votes
1 answer

Sliding window validation for time series

I have a broad question about sliding window validation. Specifically, I am looking at using RapidMiner to predict future values of a financial series using "lagged" values of that series and other covariates. I have been experimenting with the…
B_Miner
  • 7,560
  • 20
  • 81
  • 144
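One way to picture the sliding window validation this question asks about is as a series of train/test index splits that move forward in time, so each model is tested only on observations that come after its training window. The sketch below is plain Python, not RapidMiner's own operator; the function name `sliding_window_splits` and the parameters `train_size`, `test_size`, and `step` are illustrative assumptions.

```python
from typing import Iterator, List, Tuple


def sliding_window_splits(
    n_samples: int, train_size: int, test_size: int, step: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs in time order.

    Hypothetical helper for illustration: the training window covers
    `train_size` consecutive rows, the test window covers the next
    `test_size` rows, and both windows advance by `step` rows per split.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train = list(range(start, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test
        start += step


# Ten time-ordered observations, 4 for training, 2 for testing, slide by 2.
splits = list(sliding_window_splits(n_samples=10, train_size=4, test_size=2, step=2))
for train, test in splits:
    print(train, "->", test)
```

Because the test rows always follow the training rows, lagged features built from the series never leak future values into training, which ordinary shuffled cross-validation cannot guarantee.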
4
votes
1 answer

How to choose a data subset in RapidMiner?

I'm working with a CSV which contains approximately 220,000 entries. My aim is to predict one of the attributes (ATT1) using the other 3 (ATT2, ATT3, ATT4). I've been able to do this using NaiveBayes, but now I feel unsatisfied with the result. The…
Gurzo
  • 143
  • 1
  • 4
4
votes
1 answer

Prediction of soccer matches / process setup and optimization

I'm currently working on my master's thesis about the application of data mining in football. I'm trying to predict matches based on some stats of the two teams involved (using RapidMiner). My use case is the German Bundesliga and I will predict the…
3
votes
1 answer

Visualizing large file-based or Redis-in-memory stored large datasets (millions of data points)

I am very active on Stack Exchange's Quantitative Finance site but thought this question would be more suitable here. I am generating large time series data and storing it in memory in Redis (alternatively, I could also save it to disk in any format) and…
Matt
  • 113
  • 5
3
votes
2 answers

How can I detect when a key was pressed with accelerometer or gyroscope data?

I have a dataset (~20k samples) of sensor data gathered from a smartphone. What I want to do with it is to detect those spikes you can see in the graphs below. They occur when the user presses a button. I want to label the data that refers to those…
keinabel
  • 189
  • 2
  • 8
2
votes
0 answers

Generating (forcing) confidence percentages in RapidMiner

I have a dependent variable (my 'label' in RapidMiner terms), that is a binary classification expressed as 'WIN' or 'NOTWIN'. I know 'NOTWIN' is exhibited in about 90% of all observations. When I try to run a K Nearest Neighbors approach, the…
Brett
  • 31
  • 4
2
votes
1 answer

Information Gain vs Gain Ratio

When building a decision tree, when is it better to prefer the information gain criterion over the gain ratio criterion, and why?
Qwerto
  • 383
  • 3
  • 9
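To make the two criteria concrete: information gain is the drop in label entropy after splitting on an attribute, and gain ratio (as defined for C4.5) divides that gain by the split information, which is the entropy of the attribute's own value distribution. The following is a minimal sketch in plain Python, not RapidMiner's implementation; the toy data and the names `row_id` and `coarse` are made up for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Label entropy minus the weighted label entropy within each feature value."""
    total = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain divided by split information (entropy of the feature)."""
    split_info = entropy(feature)
    if split_info == 0:  # the feature takes a single value, so it cannot split
        return 0.0
    return information_gain(feature, labels) / split_info


# Hypothetical toy data: an ID-like attribute versus a two-valued attribute.
labels = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
row_id = list(range(8))                           # unique value per row
coarse = ["a", "a", "b", "b", "a", "b", "a", "b"]  # two values

for name, feature in [("row_id", row_id), ("coarse", coarse)]:
    print(name, information_gain(feature, labels), gain_ratio(feature, labels))
```

In this toy case both attributes separate the labels perfectly, so their information gain is identical, but the ID-like attribute has a much larger split information and therefore a lower gain ratio. That penalty on many-valued attributes is the usual reason for preferring gain ratio; information gain remains a reasonable choice when the attributes have similar numbers of values.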
2
votes
1 answer

Low recall and high precision in text summarization

We are trying to build a model to summarize Persian news. About 14,000 news articles were summarized with the help of humans (supervised), and then we extracted all sentences (about 180,000) and labeled them (true if they were selected for the summary, false if not).…
Oli
  • 155
  • 1
  • 7
2
votes
1 answer

Tool for hierarchical clustering

I'm trying to perform a hierarchical clustering analysis on a dataset of 40 attributes and over 70,000 records, which is mostly composed of categorical variables. I've used Matlab and RapidMiner to execute the analysis but among their poor performance…
1
vote
1 answer

How to re-cluster a new instance in centroid-based clustering?

I have applied clustering algorithms like k-means, k-medoids and DBSCAN to my patients dataset. For each algorithm RapidMiner generated a cluster model (centroid table, graphs, etc.) and a clustered set (shows which examples are part of which…
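For centroid-based models such as k-means or k-medoids, a common way to place a new instance is to assign it to the cluster whose centroid is nearest. The sketch below shows that idea in plain Python under the assumption of Euclidean distance; it is not RapidMiner's own mechanism, and the centroid values, cluster names, and the helper `assign_to_nearest_centroid` are hypothetical. DBSCAN has no centroids, so this approach does not carry over to it directly.

```python
import math
from typing import Dict, Sequence


def assign_to_nearest_centroid(
    instance: Sequence[float], centroids: Dict[str, Sequence[float]]
) -> str:
    """Return the name of the cluster whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda name: math.dist(instance, centroids[name]))


# Hypothetical centroid table, e.g. copied from a fitted k-means model.
centroids = {
    "cluster_0": (1.0, 1.0),
    "cluster_1": (8.0, 9.0),
}

new_patient = (7.5, 8.0)  # made-up feature values for a new example
assigned = assign_to_nearest_centroid(new_patient, centroids)
print(assigned)
```

The new instance must be preprocessed (for example normalized) exactly as the training data was, otherwise the distances to the stored centroids are not comparable.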
1
vote
1 answer

Loop over Tokens in RapidMiner's Text Processing Plugin

Is there any way to iterate over the tokens of a text document within RapidMiner? My first try was to window the document after tokenisation, but this seems very complicated. I'm doing this to simulate the creation of a language model like…
Andreas
  • 431
  • 6
  • 13
1
vote
0 answers

Problem with unequal distribution of classes in sentiment classification

I am performing a binary sentiment classification (positive/negative) with RapidMiner. My problem is that I have about 400 positive and 1350 negative documents. I get pretty good accuracy but therefore my precision and recall for the positive class…
user18075
  • 617
  • 1
  • 6
  • 14
1
vote
2 answers

How does RapidMiner handle equal distances in the KNN algorithm?

I already asked this in the RapidMiner forum, but no one has given an answer yet: https://community.rapidminer.com/discussion/55963/how-k-nn-algorithms-work-with-same-distance-in-rapidminer#latest I can't find a satisfying answer for KNN-algorithm…
AdeMuchlis
  • 11
  • 1
1
vote
2 answers

Different prediction score for two SVM-based classifiers

As a validation study, I ran two libsvm-based SVM classifiers against the same data set. One classifier is the libsvm implementation in RapidMiner. The other is libsvm itself. Both use the same parameter settings. However, the…
user785099
  • 1,105
  • 3
  • 14
  • 24