tladeras’s Twitter Archive—№ 2,747

1. @javierluraschi talking about cluster computing made easy with Spark and R. #Cascadiarconf
2. …in reply to @tladeras
   @javierluraschi How much information exists in the world? Digital information has overtaken analog information. #Cascadiarconf
3. …in reply to @tladeras
   @javierluraschi How did Google process so much information? MapReduce. Split data up and process it in parallel (map), then summarize (reduce). #Cascadiarconf
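A toy sketch of the map/reduce idea in plain R (not how Spark itself is implemented): summarize chunks independently, then combine the partial summaries. The chunking and the sum are made up for illustration.

```r
# Toy illustration of map/reduce: each "machine" gets a chunk,
# summarizes it independently (map), and the summaries are combined (reduce).
values <- 1:1e6
chunks <- split(values, rep(1:4, length.out = length(values)))  # pretend 4 workers
partial_sums <- lapply(chunks, sum)                             # map step
total <- Reduce(`+`, partial_sums)                              # reduce step
total  # same answer as sum(values), but computed piecewise
```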
4. …in reply to @tladeras
   @javierluraschi Hadoop was the initial implementation. Everything was disk-based and slow. Apache Spark works in memory, and is faster. #Cascadiarconf
5. …in reply to @tladeras
   @javierluraschi What can you do with cluster computing? Deep learning algorithms need distributed computing now. #Cascadiarconf
6. …in reply to @tladeras
   @javierluraschi What to do with your slow code? Different approaches: 1) you can usually sample the data; 2) use profvis to profile bottlenecks in your code; 3) get a bigger computer; 4) use sparklyr and scale the problem out to a cluster. #Cascadiarconf
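For the profiling step, a minimal profvis sketch; the expression being profiled is just a stand-in for your own slow code.

```r
# Profile a slow block with profvis to see where the time goes
library(profvis)

profvis({
  x <- rnorm(1e6)
  y <- cumsum(x)
  fit <- lm(y ~ x)   # the flame graph shows which calls dominate the runtime
})
```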
7. …in reply to @tladeras
   @javierluraschi The sparklyr package: use spark_connect() to connect to a Spark cluster, spark_read_*() functions to read data in, then dplyr syntax or SQL statements, plus machine learning and modeling such as linear regression. #Cascadiarconf
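A minimal sparklyr sketch of that workflow, assuming a local Spark installation; the mtcars data and the model formula are only for illustration.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")   # connect to a (local) Spark cluster

# Read data into Spark: spark_read_csv() etc. read from files;
# copy_to() ships a local data frame up for a quick demo.
cars_tbl <- copy_to(sc, mtcars, "cars")

# dplyr verbs are translated to Spark SQL and run on the cluster
cars_tbl %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))

# Spark MLlib models, e.g. linear regression
fit <- ml_linear_regression(cars_tbl, mpg ~ wt + cyl)
summary(fit)

spark_disconnect(sc)
```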
8. …in reply to @tladeras
   @javierluraschi Multiple packages build on sparklyr for more sophisticated analysis, such as graph analysis. Real-time data can be represented as streams. Structured streams allow for parallel processing. #Cascadiarconf
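A hedged sketch of a structured stream with sparklyr (the graph-analysis side is handled by packages such as graphframes): read a folder of CSVs as a stream, transform it with dplyr, and write it back out as a stream. The folder names and the column are hypothetical.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Each new CSV dropped into input-folder/ is picked up and processed in parallel
stream_read_csv(sc, "input-folder/") %>%
  filter(!is.na(value)) %>%            # 'value' is a hypothetical column name
  stream_write_csv("output-folder/")
```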
9. …in reply to @tladeras
   @javierluraschi Can use SQL and dplyr with streams. Can't train on real-time data, but can train on static data and get scores from data streams. Can use streams as Shiny inputs. Ooooohhhh. #Cascadiarconf
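A hedged sketch of that pattern: fit a model on static historical data, then use it to score records as they arrive on a stream; inside a Shiny server, sparklyr's reactiveSpark() can expose a stream as a reactive input. Paths, columns, and the model choice are hypothetical.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Train once on static, historical data
historical <- spark_read_csv(sc, "historical", "historical.csv")
model <- ml_logistic_regression(historical, label ~ .)

# Score new records as they stream in, and write the scores out as a stream
scored <- ml_transform(model, stream_read_csv(sc, "incoming/"))
stream_write_csv(scored, "scored/")

# In a Shiny server function, a stream can back a reactive input, roughly:
#   live_scores <- reactiveSpark(scored)   # then use live_scores() like any reactive
```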
10. …in reply to @tladeras
    @javierluraschi Spark, Kafka (handles temporary real-time data), and Shiny can work together. Oooooohhhh, neat. #Cascadiarconf
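A hedged sketch of the Spark + Kafka + Shiny pairing: subscribe to a Kafka topic as a structured stream, decode the payload, and land the results in an in-memory sink that a Shiny app can query. The broker address and topic name are hypothetical, and the Kafka connector must be available to Spark (here requested via the packages argument).

```r
library(sparklyr)
library(dplyr)

# Ask Spark to load its Kafka connector when it starts
sc <- spark_connect(master = "local", packages = "kafka")

stream_read_kafka(
  sc,
  options = list(
    kafka.bootstrap.servers = "localhost:9092",  # hypothetical broker
    subscribe = "cascadiarconf-tweets"           # hypothetical topic
  )
) %>%
  mutate(value = as.character(value)) %>%        # Kafka payloads arrive as bytes
  stream_write_memory("tweets_live")             # in-memory table a Shiny app can poll
```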
11. …in reply to @tladeras
    @javierluraschi I am laughing really loud that @javierluraschi admitted that he has volumes of Knuth's "The Art of Computer Programming" but hasn't read them. #Cascadiarconf
12. …in reply to @tladeras
    @javierluraschi @javierluraschi now demonstrating how streams work in Kafka by pulling #Cascadiarconf tweets from Twitter and recomputing in an R Notebook.
13. …in reply to @tladeras
    @javierluraschi He's now pulling images from these tweets using googleAuthR, and classifying them with the Google Vision API. In real time. #Cascadiarconf
14. …in reply to @tladeras
    For more info:
    [image attached to the tweet; alt text not available from the Twitter API]