GNU/Linux: Introduction and Administration
4h Session for the course on "Data Science. Applications to Biology and Medicine with Python and R", at IL3 - University of Barcelona. Feb 20th, 2023. 16:00h-19:00h.
Hands-on Exercise
Source data derived from the data available here:
https://analisi.transparenciacatalunya.cat/en/Medi-Ambient/Dades-meteorol-giques-de-la-XEMA/nzvn-apee
Steps:
- PART A: Enter the GNU/Linux machine.
Choose one of the following two options:
- Import the provided .ova file (explained in the session notes) into VirtualBox on your own computer. Keep in mind that this takes some time, both to download the .ova file (7.6 GB) and to import it into VirtualBox (10 minutes or more).
- OVA file:
http://cloud.seeds4c.org/lubuntu_1804_64bit_v03.ova
OR
- Connect to the server through an ssh terminal (using PuTTY on Windows, for instance). You can use the usernames user01, user02, user03, ..., user20:
ssh userNN@datascience.seeds4c.org
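Usernames follow a zero-padded, two-digit pattern (user01 to user20). A minimal sketch, assuming a POSIX shell on your own machine, that builds the login command from a seat number; the value 7 is only an example, not an assigned seat:

```shell
#!/bin/sh
# Example seat number (hypothetical): replace 7 with the number you were given (1-20).
SEAT=7
# printf pads the number to two digits, so 7 becomes user07 and 20 stays user20.
USERNAME=$(printf 'user%02d' "$SEAT")
# Print the command to run; copy it, or drop the echo to connect directly.
echo "ssh ${USERNAME}@datascience.seeds4c.org"
```

On Windows, PuTTY takes the same host name and asks for the username at the "login as:" prompt.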
- PART B: Fetch and subset data
Obtain a subset of columns and rows from a dataset using simple Linux commands in a terminal (shell commands, not R or Python in this case).
- Copy the source data file (data_smc.csv.bz2, from the USB disk provided by the course professor), or download it from here, for instance:
http://cloud.seeds4c.org/data_smc.csv.bz2 (50 MB file: a bz2-compressed CSV with 10,000,000 rows)
cd /home/userNN/ # just in case, change directory to your home folder
wget http://cloud.seeds4c.org/data_smc.csv.bz2 # fetch the file from the internet
- Uncompress it (bunzip2 -k file.bz2, where -k keeps the compressed file) and show it (with cat file), or use bzcat file.bz2 to send the decompressed content to standard output (stdout) on the fly. bzcat never deletes the source compressed file, so -k is optional there.
bunzip2 -k data_smc.csv.bz2
- Keep the first 100 rows (with head -n100 file) and save them as a new file, file_all.csv:
bzcat data_smc.csv.bz2 | head -n100 > file_all.csv
- Filter out one column with cut (for instance, remove column 7, variable _) and save the result as file.csv:
cut --complement -d',' -f7 file_all.csv > file.csv
- Save it in a zip file:
zip file.csv.zip file.csv
- Change permissions so that only your user can read and write it:
chmod 600 file.csv.zip
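Before working on the 10,000,000-row file, the same pipeline can be rehearsed on a tiny CSV. A minimal sketch, assuming GNU coreutils (`cut --complement` is a GNU extension); the file names sample*.csv and their contents are hypothetical stand-ins for data_smc.csv, and the bzip2 step falls back to the plain file if bzip2 is not installed. Note that `head -n100` counts lines, so when a file starts with a header line the result holds the header plus 99 data rows:

```shell
#!/bin/sh
# Rehearse PART B on a tiny, made-up CSV (hypothetical data, not data_smc.csv).
set -e
cd "$(mktemp -d)"   # work in an empty scratch folder

# 1. Build sample.csv: one header line plus 200 data rows, 8 columns each.
{
  echo "c1,c2,c3,c4,c5,c6,c7,c8"
  i=1
  while [ "$i" -le 200 ]; do
    echo "$i,a,b,c,d,e,DROP,h"
    i=$((i + 1))
  done
} > sample.csv

# 2. Compress while keeping the original (-k), then stream the first 100 lines
#    out of the .bz2 with bzcat; fall back to the plain file if bzip2 is missing.
if command -v bzip2 >/dev/null 2>&1; then
  bzip2 -k sample.csv                          # creates sample.csv.bz2
  bzcat sample.csv.bz2 | head -n100 > sample_all.csv
else
  head -n100 sample.csv > sample_all.csv
fi

# 3. Drop column 7 (the DROP column here), as in the exercise.
cut --complement -d',' -f7 sample_all.csv > sample_cut.csv

# 4. Check the result.
wc -l < sample_all.csv                             # 100 lines: header + 99 data rows
head -n1 sample_all.csv | awk -F',' '{print NF}'   # 8 columns before cut
head -n1 sample_cut.csv | awk -F',' '{print NF}'   # 7 columns after cut
```

Keep in mind that `cut -d','` splits on every comma, so it is only safe when no field contains a quoted comma; `awk -F','` has the same limitation.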
- PART C: Expose dataset freely through webserver
- Install the Apache web server:
sudo apt update
sudo apt install apache2
- Check that it is installed by visiting, with your browser inside the virtual machine:
http://localhost/
- Copy the produced file.csv.zip to /var/www/html/, appending the number NN from the username you took for the connection to the server:
sudo cp /home/userNN/file.csv.zip /var/www/html/fileNN.csv.zip
- Check whether you can already download it by fetching the URL http://localhost/fileNN.csv.zip (at this point it should fail with "403 Forbidden": the copy belongs to root and, with mode 600, only its owner can read it):
wget http://localhost/fileNN.csv.zip
- Change the owner of that file to www-data:www-data, so that the web server can read it and it can be viewed (and downloaded) online through your browser:
sudo chown www-data:www-data /var/www/html/fileNN.csv.zip
- Check again whether you can download it (fetch the URL http://localhost/fileNN.csv.zip again):
wget http://localhost/fileNN.csv.zip
That should be it: the file should now be downloaded in the terminal window from the web server at the local address.
From the internet, you should also be able to fetch it at the URL:
Done!
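If a download fails with "403 Forbidden", the usual cause in this exercise is ownership and mode: with mode 600 only the file's owner can read it, and on Ubuntu Apache serves files as the www-data user. A minimal sketch of how to inspect both with GNU stat; a scratch file stands in for /var/www/html/fileNN.csv.zip (hypothetical example) so that it runs without sudo:

```shell
#!/bin/sh
# Inspect owner and octal mode of a file, as you could do for
# /var/www/html/fileNN.csv.zip before and after the chown step.
# A scratch file stands in for it here (hypothetical example).
set -e
f="$(mktemp)"
chmod 600 "$f"                     # same mode as set in PART B
stat -c '%U:%G %a %n' "$f"         # owner:group, octal mode, file name
OWNER=$(stat -c '%U' "$f")
MODE=$(stat -c '%a' "$f")
# With mode 600 only the owner can read, so Apache (user www-data)
# can serve the file only once www-data owns it.
if [ "$MODE" = "600" ] && [ "$OWNER" != "www-data" ]; then
  echo "Only $OWNER can read it: Apache would answer 403 Forbidden."
else
  echo "Owner $OWNER, mode $MODE: check whether www-data can read it."
fi
```

On the real file, `stat -c '%U:%G %a %n' /var/www/html/fileNN.csv.zip` should report root:root 600 right after the cp step, and www-data:www-data 600 after the chown step, since chown changes the owner but keeps the mode.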
Additional info
If you want to keep practising and learning beyond this course session, you can do so, for instance, here: