Questions tagged [rhadoop]

A collection of four `R` packages to manage and analyze data with Hadoop, an open-source framework for reliable, scalable, distributed computing.

RHadoop is a collection of four R packages that allow users to manage and analyze data with Hadoop, which is an open-source software framework for reliable, scalable, distributed computing.

The four R packages are:

  • plyrmr - higher-level, plyr-like data processing for structured data, powered by rmr
  • rmr - functions providing Hadoop MapReduce functionality in R
  • rhdfs - functions providing HDFS file management from within R
  • rhbase - functions providing database management for the HBase distributed database from within R
3 questions
1 vote · 0 answers

Distributed K Nearest Neighbor using RHadoop

I thought Revolution Analytics had published a presentation on using RHadoop for distributed KNN. Has anyone actually done this, or does anyone know of blogs or other references that explain how? I'd appreciate any guidance!
Chris Simokat
1 vote · 0 answers

Web Access Logs using RHadoop

Can anyone point me to an example of mining web access logs with RHadoop? I've been reading 'Big Data Analytics with R and Hadoop' and certain aspects are confusing. It's the final year and the last few weeks of my studies and…
-1 votes · 1 answer

Model selection method with k-fold cross validation

Can you please provide a method for getting the final best model from cross-validation? In k-fold cross-validation we have k models, and the accuracy estimate is the average of the k models' accuracies. I need to know how we select a model from k-fold cross-validation…
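
One common reading of this question, sketched below in Python, is that cross-validation scores a modelling choice (here, a hyperparameter) and does not pick one of the k fitted models. The averaged accuracy is computed per candidate, the best candidate is kept, and the final model is refit on all of the data. This is a minimal illustration under stated assumptions, not the answer given in the thread: the toy 1-D data, the tiny k-NN classifier, and every function name are hypothetical.

```python
import random
from statistics import mean


def make_toy_data(n, seed=0, noise=0.1):
    """Made-up 1-D data: label is 1 when x > 0.5, with some labels flipped."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, n_neighbors):
    """Majority vote among the n_neighbors training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def cv_accuracy(data, n_neighbors, k=5, seed=0):
    """Average held-out accuracy over the k folds for one candidate setting."""
    scores = []
    for held_out in k_fold_indices(len(data), k, seed):
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        test = [data[i] for i in held_out]
        correct = sum(knn_predict(train, x, n_neighbors) == y for x, y in test)
        scores.append(correct / len(test))
    return mean(scores)


def select_model(data, candidates, k=5, seed=0):
    """Score every candidate with the same folds and return the best one."""
    scores = {c: cv_accuracy(data, c, k, seed) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores


def refit_on_all_data(data, n_neighbors):
    """Final model: the chosen setting trained on every available example."""
    return lambda x: knn_predict(data, x, n_neighbors)


data = make_toy_data(100, seed=1)
best, scores = select_model(data, candidates=[1, 3, 5, 7], k=5)
final_model = refit_on_all_data(data, best)
print("mean CV accuracy per candidate:", scores)
print("chosen n_neighbors:", best)
```

In this sketch the k models fitted inside the loop are discarded. They exist only to estimate how well each candidate setting generalises, and the same folds are reused across candidates so that the averaged scores are comparable.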