
How does Spark (or something similar) estimate a logistic regression model, or any statistical model that is fit by an iterative optimization algorithm, when the data are stored in a distributed environment such as HDFS?

I have read that each iteration of the optimizer runs as a MapReduce job. How exactly does this work?

Are the solutions approximations? Would I get the same result if I estimated the model on one machine using all the data?
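My rough mental model is sketched below as a toy, single-process simulation. It assumes plain full-batch gradient descent (Spark's actual implementation may use a more sophisticated optimizer such as L-BFGS); the partition layout and data are made up for illustration. Each loop iteration would correspond to one MapReduce job: workers compute partial gradients over their local partitions (map), the partial gradients are summed (reduce), and the driver updates the weights.

```python
# Toy sketch of per-iteration map/reduce gradient descent for logistic
# regression. Assumption: plain full-batch gradient descent; the data and
# partitioning are invented for illustration.
import math
from functools import reduce

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def partial_gradient(partition, w):
    """'Map' step: each worker computes the gradient over its local rows."""
    g = [0.0] * len(w)
    for x, y in partition:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for j in range(len(w)):
            g[j] += (p - y) * x[j]
    return g

def add(g1, g2):
    """'Reduce' step: partial gradients are summed across partitions."""
    return [a + b for a, b in zip(g1, g2)]

# Two made-up "partitions", standing in for HDFS blocks on separate workers.
# Each row is (features, label); the first feature is an intercept term.
partitions = [
    [([1.0, 0.5], 1), ([1.0, -1.0], 0)],
    [([1.0, 2.0], 1), ([1.0, -0.5], 0)],
]

w = [0.0, 0.0]
lr = 0.5
for _ in range(100):  # each loop body corresponds to one MapReduce job
    grad = reduce(add, (partial_gradient(p, w) for p in partitions))
    w = [wi - lr * gi for wi, gi in zip(w, grad)]

print(w)
```

If this picture is right, summing the partial gradients gives exactly the gradient over the full dataset, so each iteration takes the same step a single machine holding all the data would take (up to floating-point summation order), which would answer my third question. But I would like confirmation that this is actually what happens.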

I could not find any useful resources online that address these questions.

Glen
  • Scala implementation is here: https://github.com/apache/spark/blob/v2.2.0/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala – Analyst Jun 18 '18 at 20:19

0 Answers