tladeras’s Twitter Archive—№ 5,790

#rstats Twitter: Does anyone have a good figure that shows how Spark distributes data across a cluster and how that distribution enables fast querying? I'm realizing there's a real disconnect between developer-level diagrams and beginner-level diagrams.
♻️ 3 Retweets ❤️ 3 Favorites 2021 Jun 11 Mood +4 🙂

…in reply to @tladeras
If you are a DB engineer, the Spark diagrams make sense, but explaining to beginners why Spark was chosen needs a different level of abstraction.
2021 Jun 11 Mood +2 🙂

…in reply to @tladeras
(And apologies if I'm asking the wrong question, Spark people.)
2021 Jun 11 Mood -1 🙁

…in reply to @tladeras
Realizing that I need to learn more about RDDs (Resilient Distributed Datasets).
2021 Jun 11 Mood 0

…in reply to @tladeras
(OK, this just goes to show that I need to brush up on the terminology before I ask these questions.)
2021 Jun 11 Mood 0

…in reply to @tladeras
Btw, this is very good: people.duke.edu/~ccc14/bios-823-2020/notebooks/C02_Sprak_Low_Level_API.html
❤️ 1 Favorite 2021 Jun 11 Mood +3 🙂