A collection of four `R` packages to manage and analyze data with Hadoop, an open-source framework for reliable, scalable, distributed computing.
RHadoop is a collection of four R packages that allow users to manage and analyze data with Hadoop, an open-source software framework for reliable, scalable, distributed computing.
The four R packages are:

- plyrmr: higher-level, plyr-like data processing for structured data, powered by rmr
- rmr: functions providing Hadoop MapReduce functionality in R
- rhdfs: functions providing file management of HDFS from within R
- rhbase: functions providing database management for the HBase distributed database from within R