Searching and Sorting
Why Use Data Files?
There are many cases where the input to a program may come from a data file. Using data files in your programs offers the following advantages:
Users do not have to input repetitive information.
Several users, and even several programs, can share common information.
Programs are able to run on their own without waiting for a user to input anything.
Disadvantages of Data Files
The disadvantage of using data files is that it is usually a little more involved to take input from a file than directly from the user.
Searching and Sorting
The reason for this is that a data file may contain hundreds or thousands of units of data, so you must be able to search for the information you wish to find. In addition, when writing to data files it is preferable to keep the file in some sort of order.
Searching/Sorting Commands
Unix comes with several commands to facilitate searching and sorting, including the following:
grep, (g)awk, sed, sort, cut, uniq, diff
grep
grep is a command that searches a file for a certain “string” of information. When it finds a match, it prints the whole line containing it.
Example: grep bigelow /etc/passwd
bigelow:x:1711:100:,,,:/home_staff/bigelow:/bin/bash
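Because /etc/passwd differs on every machine, this minimal sketch assumes a small hypothetical passwd-style sample file (the root line and the temporary file name are made up for illustration) and runs the same grep against it.

```shell
#!/bin/sh
# Hypothetical passwd-style sample; the real /etc/passwd differs per system.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bigelow:x:1711:100:,,,:/home_staff/bigelow:/bin/bash' > "$tmp/passwd"

# Print every whole line that contains the string "bigelow".
grep bigelow "$tmp/passwd"
```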
The Password File
The password file in Unix contains information about the users on the system. Every user on the system has an entry in the password file. This file is consulted during login and to calculate file permissions. In most modern versions of Unix the password file doesn't contain user passwords.
/etc/passwd file syntax
USER:PASSWORD:UID:GID:COMMENT:HOME DIR:SHELL
USER - The user's login name
PASSWORD - Where the password used to be (the password now lives in the shadow file, /etc/shadow, and this field usually holds just an x)
UID/GID - The user's ID number and group ID number
COMMENT - Stores an address or other general information about the user
HOME DIR - Specifies the user's home directory
SHELL - Specifies the user's login shell
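To see how the seven colon-separated fields line up, this sketch splits the sample entry from the grep example using the shell's built-in read with IFS set to a colon. The variable names are hypothetical labels chosen to match the field list above.

```shell
#!/bin/sh
# One passwd entry (the same line grep printed earlier).
entry='bigelow:x:1711:100:,,,:/home_staff/bigelow:/bin/bash'

# Split on ':' into the seven named fields; IFS=: applies only to this read.
IFS=: read -r user pass uid gid comment home shell <<EOF
$entry
EOF

echo "user=$user uid=$uid gid=$gid home=$home shell=$shell"
```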
grep switches
grep -i BiGeLoW /etc/passwd  Case-insensitive search
grep -v bigelow  Print every line except those containing the string (used to remove lines from files)
Example: grep -v -i bobo ~/data.txt > data.temp
This writes every line of data.txt that does not contain the string bobo (in any mix of upper and lower case) to a temporary file, data.temp. data.txt itself is not changed.
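A minimal sketch of the remove-lines workflow, assuming a hypothetical data.txt in a temporary directory. The final mv step, which replaces the original file with the filtered copy, is an added illustration rather than part of the original example.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical data file; note the mixed-case "Bobo".
printf '%s\n' 'bobo 12' 'fred 7' 'Bobo 3' 'john 9' > "$tmp/data.txt"

# -v keeps lines that do NOT match; -i ignores case, so both bobo lines go.
grep -v -i bobo "$tmp/data.txt" > "$tmp/data.temp"

# Replace the original with the filtered copy only when you really mean to.
mv "$tmp/data.temp" "$tmp/data.txt"
cat "$tmp/data.txt"
```

The temporary file matters: writing straight back with `> data.txt` would not work, because the shell truncates data.txt before grep gets a chance to read it.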
More grep switches
There may be times when you need to see what occurs above or below the line being searched for.
Example: grep -C 2 Jim ~/data.txt  Shows each line containing Jim, plus the 2 lines above it and the 2 lines below it. (GNU grep also accepts the older short form grep -2 Jim.)
grep -c Jim ~/data.txt  Shows how many lines contain the string Jim (a line containing Jimmy counts too)
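This sketch runs both switches on a hypothetical one-name-per-line file. The -w switch (match whole words only) is an addition not covered in the slide, included to show how to avoid counting Jimmy.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical data: one name per line.
printf '%s\n' ann bob cat Jim dan eve fay gus hal ian Jimmy > "$tmp/data.txt"

# Each match with 2 lines of context above and below.
# Non-adjacent groups are separated by a "--" line.
grep -C 2 Jim "$tmp/data.txt"

# Count matching lines: prints 2, because "Jimmy" contains "Jim" too.
grep -c Jim "$tmp/data.txt"

# Add -w to match Jim only as a whole word: prints 1.
grep -c -w Jim "$tmp/data.txt"
```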
diff
The diff command shows what is different between any two files. diff marks each line with < or > to indicate which file contains that line.
Example: diff file1 file2
< line in file1 but not file2
> line in file2 but not file1
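A small sketch with two hypothetical files that differ in one line. It also relies on diff's exit status (0 when the files are identical, 1 when they differ), which lets a script act on the comparison.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Two small hypothetical files that differ in their second line.
printf '%s\n' apple banana cherry > "$tmp/file1"
printf '%s\n' apple blueberry cherry > "$tmp/file2"

# diff prints the differences, then exits 0 (same) or 1 (different).
if diff "$tmp/file1" "$tmp/file2"; then
  echo "files are identical"
else
  echo "files differ"
fi
```

The output starts with a line such as 2c2, meaning line 2 of file1 was changed into line 2 of file2, followed by the < and > lines and a --- separator between them.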
uniq
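The uniq command is used to remove duplicate lines from a file, but it only removes duplicates that are next to each other. If the duplicates may be scattered through the file, sort it first.
Example: data.txt contains the lines bobo, fred, bobo, john, frank
uniq data.txt  prints the file unchanged, because the two bobo lines are not adjacent
sort data.txt | uniq  prints bobo, frank, fred, john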
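This sketch reproduces the uniq example with the same five names in a temporary file. sort -u, which sorts and drops duplicates in one step, is an addition not mentioned in the slide.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# The two "bobo" lines are not adjacent.
printf '%s\n' bobo fred bobo john frank > "$tmp/data.txt"

# uniq only collapses *adjacent* duplicates, so nothing is removed here.
uniq "$tmp/data.txt"

# Sorting first brings duplicates together: bobo, frank, fred, john.
sort "$tmp/data.txt" | uniq

# sort -u does the same in one step.
sort -u "$tmp/data.txt"
```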
The sort command
The sort command can be used to sort the lines of any file.
Example: sort /etc/passwd puts the file in alphabetical order. sort compares whole lines, so in practice this orders the entries by the first field (the login name).
Sorting by other fields
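sort -t<field separator> +<field number> file
Example: sort -t: +2 /etc/passwd
sorts the passwd file based on UID (the third field, which this older syntax numbers as 2).
The +N form is obsolete, and current POSIX-conforming versions of sort may not accept it. The modern equivalent is sort -t: -k3,3n /etc/passwd, where -k counts fields from 1 and the n makes sort compare the UIDs as numbers rather than as text.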
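A minimal sketch of sorting by UID with the modern -k syntax, assuming a hypothetical three-line passwd-style sample so the result does not depend on the local /etc/passwd.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical passwd-style sample (not a real system file).
printf '%s\n' \
  'carol:x:1002:100::/home/carol:/bin/sh' \
  'root:x:0:0:root:/root:/bin/bash' \
  'alice:x:200:100::/home/alice:/bin/bash' > "$tmp/passwd"

# Default: compare whole lines, which orders by login name.
sort "$tmp/passwd"

# -t: splits on colons, -k3,3 uses only field 3 (UID, counted from 1),
# and n compares it as a number, so 200 sorts before 1002.
sort -t: -k3,3n "$tmp/passwd"
```

Leaving out the n would compare the UIDs as text, giving the order 0, 1002, 200.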
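How sort numbers the fields
In NAME:PASS:UID:GID:, the older +N syntax numbers the fields from 0: NAME is field 0, PASS is field 1, UID is field 2, GID is field 3.
sort <file>  sorts by the first field
sort +0 <file>  sorts by the first field
sort +1 <file>  sorts by the second field
With the modern -k option the same fields are numbered from 1, so -k1 is the first field and -k2 the second.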
(g)awk
awk (or gawk, the GNU version) is more than just a simple command: awk is a powerful programming language. Awk is a great prototyping language: start with a few lines and keep adding until the program does what you want.
AWK
A programming language for handling common data manipulation tasks with only a few lines of program
Awk is a pattern-action language
The language looks a little like C, but automatically handles input, field splitting, initialization, and memory management
Built-in string and number data types
No variable type declarations
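A short sketch of those features: the counter needs no declaration and starts at 0, and awk reads and splits the input itself. The sample file is hypothetical, and the example uses plain awk; gawk runs the same program.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'bigelow:x:1711:100:,,,:/home_staff/bigelow:/bin/bash' > "$tmp/passwd"

# "count" is never declared; it starts out as 0 the first time count++ runs.
# awk reads each line and splits it on ':' so that $7 is the shell.
awk -F: '$7 == "/bin/bash" { count++ } END { print count, "bash users" }' "$tmp/passwd"
```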
gawk general syntax
gawk '/pattern/ {output}' file
pattern - what is being searched for
output - what the command outputs (the action) when the pattern is matched
file - the file being searched
The quotes are straight single quotes ('), found next to the Enter key on a US keyboard, not curly quotes.
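This sketch shows the pattern-action form next to the pattern-only form, where awk's default action prints the matching line. It assumes a hypothetical two-line sample and uses plain awk; gawk accepts the same programs.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' > "$tmp/passwd"

# pattern + action: for lines matching /bash/, run the action.
awk '/bash/ { print "match:", $0 }' "$tmp/passwd"

# pattern only: the default action prints the whole matching line.
awk '/bash/' "$tmp/passwd"
```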
Simple Output From AWK
If an action has no pattern, the action is performed for all input lines.
gawk '{ print }' filename  prints all input lines on stdout
gawk '{ print $0 }' filename  does the same thing ($0 is the whole current line)
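A quick check, assuming a hypothetical two-line input file, that both programs copy their input unchanged (plain awk is used; gawk behaves the same).

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'first line' 'second line' > "$tmp/input.txt"

# No pattern: the action runs for every input line.
awk '{ print }' "$tmp/input.txt" > "$tmp/out1"
awk '{ print $0 }' "$tmp/input.txt" > "$tmp/out2"

# cmp prints nothing when files match; "same" is echoed only if both match.
cmp "$tmp/input.txt" "$tmp/out1" && cmp "$tmp/out1" "$tmp/out2" && echo same
```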
Printing specific fields
Multiple items can be printed on the same output line with a single print statement.
gawk '{ print $1, $3 }' file
This prints the first and third fields of each line in the file. The comma separates the items; in the output they are joined by the output field separator (OFS), which is a single space by default.
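This sketch contrasts a comma, no comma, and a changed OFS on a hypothetical two-line file. Setting OFS in a BEGIN block goes beyond the slide and is included only to show where the space comes from.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'alpha beta gamma' 'one two three' > "$tmp/file"

# Comma: items are joined with OFS, a single space by default.
awk '{ print $1, $3 }' "$tmp/file"        # alpha gamma / one three

# No comma: items are concatenated with nothing between them.
awk '{ print $1 $3 }' "$tmp/file"         # alphagamma / onethree

# Set OFS to put a different separator between items.
awk 'BEGIN { OFS = "-" } { print $1, $3 }' "$tmp/file"   # alpha-gamma / one-three
```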
Changing the field Separator
By default, gawk separates fields by runs of blanks (spaces and tabs).
To specify a different field separator, use the field separator switch (-F):
gawk -F: '{print $1,$7}' /etc/passwd
This prints the first and seventh fields (login name and shell) of each line in the password file.
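This sketch shows why -F: matters for the password file, assuming a hypothetical two-line sample: with the default blank separator each passwd line is a single field, while -F: yields all seven.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
  'bigelow:x:1711:100:,,,:/home_staff/bigelow:/bin/bash' > "$tmp/passwd"

# Default splitting on blanks: these lines contain none, so NF is 1 for each.
awk '{ print NF }' "$tmp/passwd"

# -F: splits on colons, giving the 7 passwd fields.
awk -F: '{ print $1, $7 }' "$tmp/passwd"
```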
Using gawk to search
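Including a pattern in the gawk statement lets gawk search, acting only on the lines that match:
gawk -F: '/root/ {print $1,$7}' /etc/passwd
This prints only the login name and shell from lines that contain the string root. The -F: is needed; without it the fields would be split on blanks rather than colons. The string can match anywhere in the line, so an account whose home directory is /root also matches.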
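A sketch of the search on a hypothetical sample that includes an operator account with /root as its home directory. The second command, which compares field 1 exactly, is an added alternative for selecting only the root account.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical sample: "operator" also has /root as its home directory.
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'operator:x:11:0:operator:/root:/sbin/nologin' \
  'bigelow:x:1711:100:,,,:/home_staff/bigelow:/bin/bash' > "$tmp/passwd"

# /root/ matches anywhere in the line: prints "root /bin/bash"
# and "operator /sbin/nologin".
awk -F: '/root/ { print $1, $7 }' "$tmp/passwd"

# Compare field 1 instead to select only the root account itself.
awk -F: '$1 == "root" { print $1, $7 }' "$tmp/passwd"
```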
Consider the following text file
Joe,Smith,1234567
fred,Sam,7654321
Hank,Joe,9876543
Suppose you wanted only the people whose last name is Joe. How would you structure a gawk command to accomplish that?
Solution
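gawk -F, '$2 ~ /Joe/ {print}' datafile
This gawk statement reads as follows: using a comma as the field separator, search the second field for the string Joe and print the whole line, using datafile as input. The match is case-sensitive, so /joe/ would find nothing in this file. To match the name exactly (and not, say, Joey), use $2 == "Joe" instead.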
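A sketch that runs the solution against the three sample lines, assuming they are saved in a file called datafile in a temporary directory. It also shows that the lowercase /joe/ pattern finds nothing.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 'Joe,Smith,1234567' 'fred,Sam,7654321' 'Hank,Joe,9876543' > "$tmp/datafile"

# Regex match on field 2 (case-sensitive): prints Hank,Joe,9876543
awk -F, '$2 ~ /Joe/ { print }' "$tmp/datafile"

# Exact string comparison: same result, but would not also match "Joey".
awk -F, '$2 == "Joe"' "$tmp/datafile"

# Lowercase /joe/ matches nothing in this data.
awk -F, '$2 ~ /joe/' "$tmp/datafile"
```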
Interactive exercise
Determine the commands that will accomplish the following: sort the password file based on UID and save a copy of the sorted file in your home directory called passwd.sorted.