Homework 4
Fundamentals of Data Science
Please submit your solution to this problem on HuskyCT by Monday, October 30th at 8:00 AM.
This zip file contains the first few chapters of “The Hitchhiker’s Guide to the Galaxy” by Douglas Adams in text format in a file called hhg.txt.
- What shell command will tell you if the word “adorable” occurs in this file? (does it?)
- What shell command will tell you how many lines are in the file? (how many are there?)
Next, using whatever tools you prefer, create 26 text files called hhgX.txt where X runs from A to Z. Each file should contain all of the words from hhg.txt that begin with the corresponding letter, one word per line, in alphabetical order, in lower case. Each word should occur only once in hhgX.txt regardless of how many times it occurs in the original text.
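Whichever tools you use, it may help to check your output files against the requirements above before packaging them. The following is a minimal sketch of such a self-check in Python, not part of the assignment itself; the function name check_word_file is hypothetical, and it assumes the files are UTF-8 text and that "alphabetical order" means plain character-by-character ordering (a locale-aware sort may order words containing apostrophes or hyphens differently).

```python
from pathlib import Path


def check_word_file(path, letter):
    """Return a list of problems found in one hhgX.txt file (empty if none).

    `letter` is the X in hhgX.txt; the comparison ignores its case.
    """
    words = Path(path).read_text(encoding="utf-8").splitlines()
    first = letter.lower()
    problems = []
    # Every word should already be in lower case.
    if any(w != w.lower() for w in words):
        problems.append("not all words are lower case")
    # Every word should begin with the file's letter.
    if any(not w.lower().startswith(first) for w in words):
        problems.append(f"not all words begin with {first!r}")
    # Each word should occur only once.
    if len(set(words)) != len(words):
        problems.append("some words occur more than once")
    # Assumption: alphabetical order means Python's default string ordering.
    if words != sorted(words):
        problems.append("words are not in alphabetical order")
    return problems
```

Calling check_word_file("hhgA.txt", "A") returns an empty list when the file passes all four checks, and a list of short messages otherwise.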
Next, answer the following questions:
- What shell command would combine all of the hhgX.txt files into a single file called hhgwords.txt?
- What shell commands would carry out the following:
  - create a directory called orig
  - move the original files hhg.zip and hhg.txt into this directory.
Create a file called shell_answers.sh that contains your answers to the four shell-command questions above (a, b, c, d). This file should contain only four lines.
Create a single zip file called hhg-exploded.zip which, when uncompressed, yields:
- a directory called first_last where first and last are your first and last names. Inside that directory, there should be:
  - a file report.txt that explains your method for creating the hhgX.txt files (briefly)
  - the file shell_answers.sh
  - a subdirectory whose name is hhg-exploded, and whose contents are the 26 files described above.
  - a subdirectory called bkup which contains the original text file hhg.txt as well as the original zip file hhg.zip.
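One way to confirm that the archive unpacks to a single first_last directory, without actually extracting it, is to list its entries. This is a small sketch using Python's standard zipfile module; the function name top_level_names is hypothetical.

```python
import zipfile


def top_level_names(zip_path):
    """Return the set of top-level entries stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        # Entry names inside a zip use "/" as the separator, so the
        # part before the first "/" is the top-level file or directory.
        return {name.split("/", 1)[0] for name in zf.namelist()}
```

For a correctly built archive, top_level_names("hhg-exploded.zip") should contain exactly one name: your first_last directory.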
To illustrate (although I’ve only put the ABC files in hhg-exploded) your zip file should unpack to this:
jeremy_teitelbaum
├── bkup
│   ├── hhg.txt
│   └── hhg.zip
├── hhg-exploded
│   ├── hhgA.txt
│   ├── hhgB.txt
│   ├── hhgC.txt
│
├── report.txt
└── shell_answers.sh
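After unpacking your own zip file into a scratch directory, you could compare the result with the layout above. The sketch below lists any expected files that are absent; the function name missing_entries is hypothetical, and the expected paths are taken directly from the description in this assignment.

```python
import string
from pathlib import Path


def missing_entries(root):
    """Return the expected files (relative to root) that do not exist.

    `root` is the unpacked first_last directory.
    """
    root = Path(root)
    expected = [
        root / "report.txt",
        root / "shell_answers.sh",
        root / "bkup" / "hhg.txt",
        root / "bkup" / "hhg.zip",
    ]
    # The 26 word files hhgA.txt through hhgZ.txt.
    expected += [
        root / "hhg-exploded" / f"hhg{c}.txt" for c in string.ascii_uppercase
    ]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]
```

An empty list from missing_entries("jeremy_teitelbaum") means every file named in the assignment is present; it does not check the contents of those files.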