Big Data with R, Tidyverse and Spark

1.1. Intro slides

1.1.1. R & Big Data

https://github.com/rstudio/webinars/blob/master/14-Work-with-big-data/14-Work-with-big-data.pdf

1.1.2. Tidyverse

  • https://github.com/rstudio/webinars/blob/master/46-tidyverse-visualisation-and-manipulation-basics/00-Tidyverse-webinar.pdf
  • https://github.com/rstudio/webinars/blob/master/55-ciencia-de-datos-R/ciencia-de-datos-R.pdf

1.1.3. Spark (sparklyr)

https://github.com/rstudio/webinars/blob/master/42-Introduction%20to%20sparklyr/Introducing%20sparklyr%20-%20Webinar.pdf

1.1.4. Speeding up Spark from R with Arrow

https://arrow.apache.org/blog/2019/01/25/r-spark-improvements/

1.2. Intro Exercise

See:
https://gitlab.com/radup/curs-r-introduccio/blob/master/codi/extra.tips.bigdata.R

1.3. References

  • "Mastering Spark with R" (The R in Spark) book
    https://therinspark.com/

  • RStudio Webinar: Introducing an R interface for Apache Spark
    https://www.rstudio.com/resources/webinars/introducing-an-r-interface-for-apache-spark/
    https://github.com/rstudio/webinars/blob/master/42-Introduction%20to%20sparklyr/sparklyr-webinar1.Rmd

  • Some online tutorials:
    • 30 GB dataset
    • Text mining using sparklyr

  • Some cheatsheets:
    https://www.rstudio.com/resources/cheatsheets/
    • And some in Spanish, under the "Spanish Translations – Traducciones en español" section of the same page:
      https://www.rstudio.com/resources/cheatsheets/



The original document is available at https://seeds4c.org/Big%2BData%2Bcon%2BR%2BTidyverse%2By%2BSpark