-
#rstats twitter: Does anyone have a good figure that shows how Spark data is distributed across a cluster and how distributing the data enables fast querying? Realizing there's a real disconnect between dev-level diagrams and beginner diagrams.
-
If you are a DB engineer, the Spark diagrams make sense, but explaining to beginners why Spark was chosen needs a different level of abstraction.
-
(And apologies if I'm asking the wrong question, Spark people.)
-
Realizing that I need to learn more about RDDs (Resilient Distributed Datasets).
-
(Ok, just goes to show that I need to brush up on the terminology before I ask these questions.)
-
Btw, this is very good: people.duke.edu/~ccc14/bios-823-2020/notebooks/C02_Sprak_Low_Level_API.html
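-
For the beginner-level picture asked about above, here is a toy sketch in plain Python (not Spark, and not Spark's API). It assumes the simplest possible model: the data is split into partitions, each "worker" answers the query on its own partition only, and the small partial answers are combined at the end. The function names (`partition`, `count_matches`, `distributed_count`) are made up for illustration, and threads stand in for cluster nodes. Real Spark adds a lot on top of this (scheduling, shuffles, recomputing lost partitions), so treat it as an analogy rather than how Spark is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Split records into roughly equal chunks (round-robin).

    Hypothetical helper: stands in for data being spread across nodes.
    """
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_matches(chunk, predicate):
    """The work one 'worker' does, looking only at its own partition."""
    return sum(1 for record in chunk if predicate(record))


def distributed_count(records, predicate, num_partitions=4):
    """Run the same query on every partition, then combine the partial counts."""
    chunks = partition(records, num_partitions)
    # Threads are an assumption standing in for separate machines.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(lambda c: count_matches(c, predicate), chunks))
    return sum(partials)


records = list(range(1_000))
total = distributed_count(records, lambda x: x % 7 == 0)
print(total)
```

The point for a beginner diagram: no worker ever scans the whole dataset, and only the tiny per-partition counts travel back to be summed.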