
1807 BigDataWithR UPC



1.1. GitHub repositories from the professor

https://github.com/sbartek?tab=repositories

  1. https://github.com/sbartek/big_data_with_R
  2. https://github.com/sbartek/big_data_with_R_2
  3. https://github.com/sbartek/sparklyr_exercise
  4. https://github.com/sbartek/sql_dplyr_exercise
  5. https://github.com/sbartek/sql_sparkR_exercise

1.2. Data sets

Data sets from:

1.3. Interesting scripts


From his GitHub repository: big_data_with_R_2/notebooks

  • 005_plan.Rmd
  • 010sql_vs_dplyr.Rmd
  • 020sql_vs_sparkR.Rmd
  • 030_files_formats.Rmd


1.4.1. Databricks

Cloud storage and computing service: Databricks
https://community.cloud.databricks.com

Spark 2.3, 6 GB of RAM. Easy to use with sparklyr, Hadoop, etc.

Data from the Kaggle competition: Instacart Market Basket Analysis
https://www.kaggle.com/c/instacart-market-basket-analysis

Instacart: a service where you ask someone to do the grocery shopping for you.

You can create a free account at the Community Edition:
https://databricks.com/try-databricks-v2

Community Edition
For students and educational institutions just getting started with Spark

Single cluster limited to 6 GB, with no worker nodes
Basic notebook without collaboration
Limited to a maximum of 3 users
Public environment to share your work



