Big Data with R, Tidyverse and Spark
1.1. Intro slides
1.1.1. R & Big Data
https://github.com/rstudio/webinars/blob/master/14-Work-with-big-data/14-Work-with-big-data.pdf
1.1.2. Tidyverse
- https://github.com/rstudio/webinars/blob/master/46-tidyverse-visualisation-and-manipulation-basics/00-Tidyverse-webinar.pdf
- https://github.com/rstudio/webinars/blob/master/55-ciencia-de-datos-R/ciencia-de-datos-R.pdf
1.1.3. Spark (sparklyr):
1.1.4. Speeding up R and Spark with Apache Arrow
https://arrow.apache.org/blog/2019/01/25/r-spark-improvements/
1.2. Intro Exercise
See:
https://gitlab.com/radup/curs-r-introduccio/blob/master/codi/extra.tips.bigdata.R
1.3. References:
- Mastering Spark with R (book)
https://therinspark.com/
- RStudio Webinar: Introducing an R interface for Apache Spark
https://www.rstudio.com/resources/webinars/introducing-an-r-interface-for-apache-spark/
https://github.com/rstudio/webinars/blob/master/42-Introduction%20to%20sparklyr/sparklyr-webinar1.Rmd
- Some online tutorials
- 30 GB dataset
- Text mining using sparklyr
- Some cheatsheets:
https://www.rstudio.com/resources/cheatsheets/
- And some in Spanish:
https://www.rstudio.com/resources/cheatsheets/ > Spanish Translations – Traducciones en español