GNU/Linux: Introduction and Administration
4h Session for the course on "Data Science. Applications to Biology and Medicine with Python and R", at IL3 - University of Barcelona. April 2nd, 2025. 16:00h-19:00h.
https://seeds4c.org/LinuxDataScience25
Hands-on Exercise
The source data is derived from data obtained here:
https://analisi.transparenciacatalunya.cat/en/Medi-Ambient/Dades-meteorol-giques-de-la-XEMA/nzvn-apee
Steps:
- PART A: Enter the GNU/Linux machine.
Choose one of the following 3 options:
- Sign up at https://posit.cloud/plans/free to get a free account. Connect to posit.cloud and use the terminal window from the RStudio server there.
OR
- Import a recent ISO file from the Lubuntu GNU/Linux distribution (latest Long Term Support version, 24.04 LTS as of this writing) into the VirtualBox program on your own computer.
- ISO File:
OR
- Import an older but customized version of Lubuntu GNU/Linux, through the .ova file provided below for VirtualBox (explained within the session notes).
- OVA file: http://cloud.seeds4c.org/lubuntu_1804_64bit_v03.ova
Keep in mind that it will take some time to download the ISO (3.1 GB) or OVA file (7.6 GB), and also to import it into VirtualBox (5-10 minutes or more).
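Once you are inside the terminal of whichever option you chose, a quick check like the following can confirm that the tools used in the rest of the exercise are available. This is only a sketch: the function name check_tools is made up for illustration, and the list of tools is an assumption based on the commands named in PART B.

```shell
# Hypothetical helper: report whether each tool used in PART B is installed.
check_tools() {
  for tool in bunzip2 bzcat head cut zip chmod; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: MISSING"
    fi
  done
}

check_tools
```

If a tool is reported as missing on a Lubuntu machine, it can usually be installed with the system package manager; on posit.cloud you may need to skip the corresponding step instead.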
- PART B: Fetch and subset data
Obtain a subset of columns and rows from a dataset, using simple Linux commands in a terminal (shell commands, not R or Python in this case).
- Copy the source data file (data_smc.csv.bz2, from the USB disk provided by the course professor), or download it from here for instance:
http://cloud.seeds4c.org/data_smc.csv.bz2 (50 MB file, bz2-compressed CSV with 10,000,000 rows)
- Open a Linux terminal in your home folder /home/datascience/
- Uncompress (bunzip2 file.bz2 -k) and show (with cat file), or use bzcat file.bz2 to send the content to standard output (stdout) on the fly; the compressed source file is kept either way (-k does this for bunzip2)
- Filter (keep) the first 100 rows (with head -n100 file)
- Save as a new file: file.csv
- Write a one-liner with the previous commands piped one after the other in the same line
- Filter out one column, for instance, remove column 7 (variable _), with cut
- Save in zip
- Change permissions so that only your user can read and write it
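The PART B steps can be sketched end to end as follows. To keep the sketch runnable without the 50 MB download, it builds a tiny stand-in file (sample.csv.bz2) instead of using data_smc.csv.bz2. The stand-in's 8 columns, its column names c1 to c8, and the comma separator are all assumptions for illustration; check the real file's separator and column count before adapting the cut options.

```shell
# Build a small stand-in for data_smc.csv.bz2 (hypothetical content):
# 1 header line + 500 data rows, 8 comma-separated columns.
{
  echo "c1,c2,c3,c4,c5,c6,c7,c8"
  i=1
  while [ "$i" -le 500 ]; do
    echo "$i,a,b,c,d,e,DROP,z"
    i=$((i + 1))
  done
} > sample.csv
bzip2 -f sample.csv   # creates sample.csv.bz2 and removes sample.csv

# One-liner: uncompress to stdout, keep the first 100 lines
# (header + 99 data rows), drop column 7, save as file.csv.
bzcat sample.csv.bz2 | head -n100 | cut -d, -f1-6,8- > file.csv

# Save in zip (skipped with a message if the zip tool is not installed).
if command -v zip >/dev/null 2>&1; then
  zip -q file.zip file.csv
else
  echo "zip not found; skipping the zip step" >&2
fi

# Owner can read and write; group and others get no access.
chmod 600 file.csv
if [ -f file.zip ]; then
  chmod 600 file.zip
fi

ls -l file.csv
```

With the real data, replace sample.csv.bz2 with data_smc.csv.bz2 in the one-liner. The field list -f1-6,8- means "columns 1 to 6, then 8 to the end", which is how cut removes column 7 without relying on GNU-only options.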
- PART C: Your turn
- Creativity, Exploration...
- Doubts?
That should be it. Done!
Feel free to test more Linux commands in the terminal window of your posit.cloud space, or in the Lubuntu system you imported into VirtualBox.
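As one possible starting point for that exploration, the sketch below counts lines and tallies how often each value appears in a column. The file demo.csv, its column names, and its values are invented for illustration and are not taken from the course dataset.

```shell
# Hypothetical mini dataset, created only for this demonstration.
printf 'station,variable,value\nX1,32,10.5\nX2,32,11.0\nX1,33,80\n' > demo.csv

# Count the lines in the file (header included).
wc -l < demo.csv

# Skip the header, take column 1, and count occurrences of each value.
tail -n +2 demo.csv | cut -d, -f1 | sort | uniq -c
```

The sort step is needed before uniq -c because uniq only merges identical lines that are adjacent.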
Additional info
If you want to keep practising and learning, beyond this course session, you can do so for instance here: