Just wondering - if my organisation's data never grows to sizes that are bigger than my instance's memory, why do I need something like Spark?
I can scale the memory up using cloud instances; these days it seems you can really push the max memory on cloud instances. https://aws.amazon.com/ec2/instance-types/
So am I missing something here? What does a Spark-based machine learning solution offer over a single high-memory instance? Parallel processing?
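For concreteness, the single-node alternative I have in mind is roughly the sketch below. The file name, columns, and instance type are just placeholders, but this is the kind of workflow I'd run on one big box:

```python
# Everything in RAM on a single high-memory instance: load with pandas,
# train with scikit-learn. (Toy file name and "label" column for illustration.)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_parquet("events.parquet")  # assume this fits in memory
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# n_jobs=-1 already parallelises across all cores of the one machine,
# so I get within-box parallelism without any cluster.
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```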
Thanks.