tladeras’s Twitter Archive—№ 2,747

1. @javierluraschi talking about cluster computing made easy with Spark and R. #Cascadiarconf
2. …in reply to @tladeras
   @javierluraschi How much information exists in the world? Digital information has overtaken analog information. #Cascadiarconf
3. …in reply to @tladeras
   @javierluraschi How did Google process so much information? MapReduce. Split data up and process it in parallel (map), then summarize (reduce). #Cascadiarconf
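A toy sketch of the map/reduce idea in plain R (not how Spark itself is implemented): summarize chunks independently, then combine the partial summaries. The chunking and the sum are made up for illustration.

```r
# Toy illustration of map/reduce: each "machine" gets a chunk,
# summarizes it independently (map), and the summaries are combined (reduce).
values <- 1:1e6
chunks <- split(values, rep(1:4, length.out = length(values)))  # pretend 4 workers
partial_sums <- lapply(chunks, sum)                             # map step
total <- Reduce(`+`, partial_sums)                              # reduce step
total  # same answer as sum(values), but computed piecewise
```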
4. …in reply to @tladeras
   @javierluraschi Hadoop was the initial implementation. Everything was disk-based and slow. Apache Spark works in memory, and is faster. #Cascadiarconf
5. …in reply to @tladeras
   @javierluraschi What can you do with cluster computing? Deep learning algorithms need distributed computing now. #Cascadiarconf
6. …in reply to @tladeras
   @javierluraschi What to do with your slow code? Different approaches: 1) you can usually sample the data; 2) use profvis to profile bottlenecks in your code; 3) get a bigger computer; 4) use sparklyr and scale the problem out to a cluster. #Cascadiarconf
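For the profiling step, a minimal profvis sketch; the expression being profiled is just a stand-in for your own slow code.

```r
# Profile a slow block with profvis to see where the time goes
library(profvis)

profvis({
  x <- rnorm(1e6)
  y <- cumsum(x)
  fit <- lm(y ~ x)   # the flame graph shows which calls dominate the runtime
})
```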
7. …in reply to @tladeras
   @javierluraschi The sparklyr package: use spark_connect() to connect to a Spark cluster, spark_read_*() functions to read data in, then dplyr syntax or SQL statements, plus machine learning and modeling such as linear regression. #Cascadiarconf
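A minimal sparklyr sketch of that workflow, assuming a local Spark installation; the mtcars data and the model formula are only for illustration.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")   # connect to a (local) Spark cluster

# Read data into Spark: spark_read_csv() etc. read from files;
# copy_to() ships a local data frame up for a quick demo.
cars_tbl <- copy_to(sc, mtcars, "cars")

# dplyr verbs are translated to Spark SQL and run on the cluster
cars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))

# Spark MLlib models, e.g. linear regression
fit <- ml_linear_regression(cars_tbl, mpg ~ wt + cyl)
summary(fit)

spark_disconnect(sc)
```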
8. …in reply to @tladeras
   @javierluraschi Multiple packages build on sparklyr for more sophisticated analysis, such as graph analysis. Real-time data can be represented as streams. Structured streams allow for parallel processing. #Cascadiarconf
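A hedged sketch of a structured stream with sparklyr (the graph-analysis side is handled by packages such as graphframes): read a folder of CSVs as a stream, transform it with dplyr, and write it back out as a stream. The folder names and the column are hypothetical.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Each new CSV dropped into input-folder/ is picked up and processed in parallel
stream_read_csv(sc, "input-folder/") %>%
  filter(!is.na(value)) %>%            # 'value' is a hypothetical column name
  stream_write_csv("output-folder/")
```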
9. …in reply to @tladeras
   @javierluraschi Can use SQL and dplyr with streams. Can't train on real-time data, but can train on static data and get scores from data streams. Can use streams as Shiny inputs. Ooooohhhh. #Cascadiarconf
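A hedged sketch of that pattern: fit a model on static historical data, then use it to score records as they arrive on a stream; inside a Shiny server, sparklyr's reactiveSpark() can expose a stream as a reactive input. Paths, columns, and the model choice are hypothetical.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Train once on static, historical data
historical <- spark_read_csv(sc, "historical", "historical.csv")
model <- ml_logistic_regression(historical, label ~ .)

# Score new records as they stream in, and write the scores out as a stream
scored <- ml_transform(model, stream_read_csv(sc, "incoming/"))
stream_write_csv(scored, "scored/")

# In a Shiny server function, a stream can back a reactive input, roughly:
#   live_scores <- reactiveSpark(scored)   # then use live_scores() like any reactive
```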
10. …in reply to @tladeras
    @javierluraschi Spark, Kafka (handles temporary real-time data), and Shiny can work together. Oooooohhhh, neat. #Cascadiarconf
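A hedged sketch of the Spark + Kafka + Shiny pairing: subscribe to a Kafka topic as a structured stream, decode the payload, and land the results in an in-memory sink that a Shiny app can query. The broker address and topic name are hypothetical, and the Kafka connector must be available to Spark (here requested via the packages argument).

```r
library(sparklyr)
library(dplyr)

# Ask Spark to load its Kafka connector when it starts
sc <- spark_connect(master = "local", packages = "kafka")

stream_read_kafka(
  sc,
  options = list(
    kafka.bootstrap.servers = "localhost:9092",  # hypothetical broker
    subscribe = "cascadiarconf-tweets"           # hypothetical topic
  )
) %>%
  mutate(value = as.character(value)) %>%        # Kafka payloads arrive as bytes
  stream_write_memory("tweets_live")             # in-memory table a Shiny app can poll
```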
11. …in reply to @tladeras
    @javierluraschi I am laughing really loud that @javierluraschi admitted that he has volumes of Knuth's "The Art of Computer Programming" but hasn't read them. #Cascadiarconf
12. …in reply to @tladeras
    @javierluraschi @javierluraschi now demonstrating how streams work in Kafka by pulling #Cascadiarconf tweets from Twitter and recomputing in an R Notebook.
13. …in reply to @tladeras
    @javierluraschi He's now pulling images from these tweets using googleAuthR, and classifying them with the Google Vision API. In real time. #Cascadiarconf
14. …in reply to @tladeras
    For more info:
    [image attached to the tweet; alt text not available from the Twitter API]