Data sets from:
- instacart_data: https://www.kaggle.com/c/instacart-market-basket-analysis
- football_data: https://www.kaggle.com/hugomathien/soccer
  - Download `database.sqlite.zip` from https://www.kaggle.com/hugomathien/soccer into the directory `football_data` and unzip it.
- future_sales_data: https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data
  - Download all datasets from https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data into the directory `future_sales_data` and unzip them.
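The unzip steps above can be sketched in Python with the standard library only. This is a minimal sketch that assumes the Kaggle archives are ordinary `.zip` files placed directly in `football_data` or `future_sales_data`. The helper names `unzip_all` and `list_sqlite_tables` are hypothetical, not part of any Kaggle tool. The second helper is a quick check that the extracted football file is a readable SQLite database.

```python
import sqlite3
import zipfile
from contextlib import closing
from pathlib import Path


def unzip_all(directory):
    """Extract every .zip archive found in `directory` into that same directory.

    Hypothetical helper: assumes the archives were downloaded into `directory`.
    Returns the sorted names of all extracted archive members.
    """
    directory = Path(directory)
    extracted = []
    for archive in sorted(directory.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)
            extracted.extend(zf.namelist())
    return sorted(extracted)


def list_sqlite_tables(db_path):
    """Return the table names in a SQLite file, sorted alphabetically."""
    # closing() makes sure the connection is closed, not just committed.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, `unzip_all("football_data")` followed by `list_sqlite_tables("football_data/database.sqlite")` should print the tables of the soccer database, assuming the archive extracts to a file named `database.sqlite`. The same `unzip_all("future_sales_data")` call covers the future sales files.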
From his GitHub repo:
Cloud storage and computing service: Databricks
Spark 2.3, 6 GB RAM. Makes it easy to use sparklyr, Hadoop, etc.
Data from the competition: Instacart Market Basket Analysis
Instacart: a grocery delivery service where you ask someone to do the shopping for you.
You can create a free account on the Community Edition:
- For students and educational institutions just getting started with Spark
- Single cluster limited to 6 GB and no worker nodes
- Basic notebook without collaboration
- Limited to a maximum of 3 users
- Public environment to share your work