1807 BigDataWithR UPC

1.1. GitHub repos from the professor


  1. https://github.com/sbartek/big_data_with_R
  2. https://github.com/sbartek/big_data_with_R_2
  3. https://github.com/sbartek/sparklyr_exercise
  4. https://github.com/sbartek/sql_dplyr_exercise
  5. https://github.com/sbartek/sql_sparkR_exercise

1.2. Data sets

Data sets from:

1.3. Interesting scripts

From his GitHub repo, in big_data_with_R_2/notebooks:

  • 005_plan.Rmd
  • 010sql_vs_dplyr.Rmd
  • 020sql_vs_sparkR.Rmd
  • 030_files_formats.Rmd

1.4.1. Databricks

Cloud storage and computing service: Databricks

Spark 2.3, 6 GB of RAM. Easy to use with sparklyr, Hadoop, etc.

Data from the competition: Instacart Market Basket Analysis

Instacart is a service where you ask someone to do your grocery shopping for you.

You can create a free account on the Community Edition:

Community Edition
For students and educational institutions just getting started with Spark

Single cluster limited to 6GB and no worker nodes
Basic notebook without collaboration
Limited to 3 max users
Public environment to share your work