08 text processing_tools

Text Processing Tools

grep

• ‘grep’ searches for strings and/or regular expressions (regex), either in the output of other commands or as a search tool on its own.

• To search for a string within a file, we can use:

# grep -i 'user1' /etc/passwd

user1:x:500:500::/home/user1:/bin/bash

• ‘grep’ will output the entire line in which the string we searched for was found, as seen in the example above.

grep

• grep has many options; some of the common ones are:

-i : case-insensitive; ignore differences between upper and lower case.
-v : invert the match; return only the lines that do NOT contain the string we searched for.
-r / -R : recursive; search through sub-directories as well.
-q : quiet; suppress all normal output and report the result through the exit status only. Useful for tests and conditions in scripts.

• For the full list of options, run: “grep --help” or “man grep”.
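The four options above can be tried safely on throwaway files. This is a minimal sketch: the directory, file names, and log lines are invented for the demonstration and are not part of the original slides.

```shell
# Work in a temporary directory so no real files are touched.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
printf 'User1 logged in\nuser2 logged out\n' > "$demo_dir/app.log"
printf 'user1 failed login\n' > "$demo_dir/sub/auth.log"

# -i: ignore case, so the pattern "user1" also matches "User1".
insensitive=$(grep -i 'user1' "$demo_dir/app.log")

# -v: print only the lines that do NOT match.
inverted=$(grep -v -i 'user1' "$demo_dir/app.log")

# -r: search sub-directories too; each hit is prefixed with its file name.
recursive=$(grep -r 'failed' "$demo_dir")

# -q: print nothing; the exit status alone says whether a match exists.
if grep -q 'user2' "$demo_dir/app.log"; then
    found="yes"
else
    found="no"
fi

echo "$insensitive"   # User1 logged in
echo "$inverted"      # user2 logged out
echo "$recursive"     # <temp dir>/sub/auth.log:user1 failed login
echo "$found"         # yes

rm -r "$demo_dir"
```

The -q form is the one usually seen in scripts, because the decision is taken from the exit status rather than from parsing output.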

grep

• There are two more variants of grep:

fgrep – suited for fixed-string searches only; the pattern is not interpreted as a regex, so the searches are performed faster.
egrep – suited for extended regex searches.

• grep can also be used as a filter, on the right side of a pipe, in order to display only specific output:

# ls -l | grep "kf"
-rw-rw-r-- 1 nir test 0 Jul 19 15:11 kfile9
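The difference between the variants is easiest to see on a pattern that contains regex metacharacters. The sketch below uses the option spellings grep -F and grep -E, which behave like fgrep and egrep respectively; GNU grep documents the two standalone commands as deprecated in favour of these options. The sample lines are made up for illustration.

```shell
# Three sample lines; the first contains the literal characters ".*".
sample=$(printf 'a.*b\naxxb\nkfile9\n')

# -F (fgrep behaviour): the pattern is a fixed string, so ".*" is literal
# and the line "axxb" does not match.
fixed=$(printf '%s\n' "$sample" | grep -F 'a.*b')

# -E (egrep behaviour): extended regex, here using grouping and "|".
extended=$(printf '%s\n' "$sample" | grep -E '^(ax|kf)')

# grep as a filter on the right side of a pipe.
piped=$(printf '%s\n' "$sample" | grep 'kf')

echo "$fixed"      # a.*b
echo "$extended"   # axxb, then kfile9
echo "$piped"      # kfile9
```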

cut

• The “cut” command extracts selected fields or columns from each line of text.

• Syntax:

cut [options] [filename(s)]

• Options:

-f[n] : [n] refers to field number(s); the fields must be separated by a delimiter.
-d'[delimiter]' : defines which character in our string is the delimiter; if this option is not supplied by the user, the default (TAB) is used.

# cut -f6,7 -d':' /etc/passwd | grep user1
/home/user1:/bin/bash
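To experiment with field selection without reading /etc/passwd, the same record can be fed to cut through a pipe. A small sketch; the record mirrors the user1 line shown earlier, and the TAB-separated words are invented.

```shell
# A passwd-style record with ":" as the field separator.
record='user1:x:500:500::/home/user1:/bin/bash'

# -d':' sets the delimiter; -f6,7 keeps the 6th and 7th fields.
# Selected fields are printed joined by the same delimiter.
home_and_shell=$(printf '%s\n' "$record" | cut -d':' -f6,7)

# Without -d, cut splits on TAB characters.
second_column=$(printf 'alpha\tbeta\tgamma\n' | cut -f2)

echo "$home_and_shell"   # /home/user1:/bin/bash
echo "$second_column"    # beta
```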

sort

• The “sort” command sorts lines of data in numerical or alphabetical order.

• Syntax:

sort [options] [filename(s)]

• Options:

-m – merge already sorted files
-r – reverse the sort order
-M – month name sort
-n – numeric sort
-u – unique sort; when several lines are identical, display only one of them.
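The options are easier to tell apart on a small data set where text order and numeric order differ. A minimal sketch with invented input; LC_ALL=C is set only to make the ordering independent of the local language settings.

```shell
numbers=$(printf '10\n9\n100\n9\n')

# Default: lines are compared as text, so "10" and "100" sort before "9".
alphabetical=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# -n: compare by numeric value.
numeric=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)

# -r: reverse the order (combined here with -n).
reversed=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n -r)

# -u: identical lines are printed only once.
unique=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n -u)

# -M: order by month name abbreviation.
months=$(printf 'Mar\nJan\nFeb\n' | LC_ALL=C sort -M)

# -m: merge two files that are each already sorted.
first=$(mktemp)
second=$(mktemp)
printf '1\n3\n5\n' > "$first"
printf '2\n4\n6\n' > "$second"
merged=$(LC_ALL=C sort -n -m "$first" "$second")
rm "$first" "$second"

echo "$alphabetical"   # 10 100 9 9 (one per line)
echo "$numeric"        # 9 9 10 100
echo "$reversed"       # 100 10 9 9
echo "$unique"         # 9 10 100
echo "$months"         # Jan Feb Mar
echo "$merged"         # 1 2 3 4 5 6
```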

uniq

• The “uniq” command detects duplicate lines of data. It compares adjacent lines only, so the input is usually sorted first.

• Syntax:

uniq [options] [filename(s)]

• Options:

-u – show only lines that are not repeated
-d – show only one copy of each duplicated line
-c – output each line with the count of occurrences
-i – case-insensitive
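A short sketch of these options on invented fruit names. Two points it tries to make visible: uniq only compares neighbouring lines, which is why it is usually placed after sort, and the case-insensitive option is spelled with a lowercase -i.

```shell
lines=$(printf 'apple\napple\nbanana\ncherry\ncherry\ncherry\n')

# -c: prefix every distinct line with its number of occurrences.
counts=$(printf '%s\n' "$lines" | uniq -c)

# -u: only the lines that occur exactly once.
only_unique=$(printf '%s\n' "$lines" | uniq -u)

# -d: one copy of each line that occurs more than once.
only_duplicates=$(printf '%s\n' "$lines" | uniq -d)

# -i: ignore case when comparing; the first spelling is the one kept.
ignore_case=$(printf 'Apple\napple\nAPPLE\n' | uniq -i)

# Duplicates that are not adjacent are missed unless the input is sorted.
unsorted=$(printf 'b\na\nb\n' | uniq)
sorted_first=$(printf 'b\na\nb\n' | LC_ALL=C sort | uniq)

echo "$counts"            # 2 apple, 1 banana, 3 cherry (counts are padded)
echo "$only_unique"       # banana
echo "$only_duplicates"   # apple, then cherry
echo "$ignore_case"       # Apple
echo "$unsorted"          # b, a, b
echo "$sorted_first"      # a, b
```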

tr

• The “tr” command is used to translate characters. It takes two sets of characters as command arguments and converts the first set to the second on a char-to-char basis. It also:

Converts letter case; upper to lower and vice-versa.
Recognizes special characters, such as \n (newline).
Cannot open files by itself; it reads only standard input, so data must come from a pipe or from a redirection from a file.

• Syntax:

tr [options] char-list1 char-list2 < [file]

• Options:

-d – delete all characters appearing in “char-list1”
-s – replace instances of repeated characters with a single character.
-cd – delete all characters that are NOT in “char-list1”
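The following sketch runs each tr form on a short invented string. It uses the POSIX character classes [:lower:], [:upper:] and [:digit:] instead of ranges such as a-z, an assumption made here so the result does not depend on the locale.

```shell
# Case conversion, char-to-char: each lowercase letter maps to its uppercase.
upper=$(printf 'hello world\n' | tr '[:lower:]' '[:upper:]')

# \n is recognized as newline: turn spaces into line breaks.
one_per_line=$(printf 'a b c\n' | tr ' ' '\n')

# -d: delete every character listed in the first set.
no_digits=$(printf 'a1b2c3\n' | tr -d '[:digit:]')

# -s: squeeze runs of a repeated character into a single one.
squeezed=$(printf 'a    b  c\n' | tr -s ' ')

# -cd: delete everything that is NOT in the set (the newline goes as well).
only_digits=$(printf 'a1b2c3\n' | tr -cd '[:digit:]')

# tr takes no file argument; a file is supplied through redirection.
input_file=$(mktemp)
printf 'from a file\n' > "$input_file"
from_file=$(tr '[:lower:]' '[:upper:]' < "$input_file")
rm "$input_file"

echo "$upper"          # HELLO WORLD
echo "$one_per_line"   # a, b, c on separate lines
echo "$no_digits"      # abc
echo "$squeezed"       # a b c
echo "$only_digits"    # 123
echo "$from_file"      # FROM A FILE
```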

tail

• The “tail” command prints the end of a file.

• Syntax:

tail [options] [filename(s)]

• Options:

-n N : print the last N lines (default is 10)
-n +N : print the entire file starting from line N
-f : follow mode; tail will stay active and print each new line appended to the file.
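A numbered test file makes it easy to check which lines tail selects: tail -n N counts from the end, while tail -n +N starts at line N. This is a sketch on a temporary 20-line file created with seq; the file is invented for the demonstration.

```shell
# Build a file whose line N contains the number N.
log_file=$(mktemp)
seq 1 20 > "$log_file"

# No option: the last 10 lines (11 to 20).
default_tail=$(tail "$log_file")

# -n 3: the last three lines.
last_three=$(tail -n 3 "$log_file")

# -n +18: everything from line 18 to the end of the file.
from_line_18=$(tail -n +18 "$log_file")

echo "$default_tail" | wc -l   # 10 lines
echo "$last_three"             # 18 19 20 (one per line)
echo "$from_line_18"           # 18 19 20 (one per line)

# Follow mode keeps running until interrupted with Ctrl-C, so it is
# shown only as a comment:
#   tail -f "$log_file"

rm "$log_file"
```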

head

• The “head” command prints the start of a file.

• Syntax:

head [options] [filename(s)]

• Options:

-n N : print the first N lines (default is 10)
-n -N : print the entire file except its last N lines (GNU head)
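A sketch of head on numbered input generated with seq. Note that head -n N counts from the start of the input, and that the negative form head -n -N (everything except the last N lines) is assumed here to be a GNU coreutils feature that other implementations may reject.

```shell
# No option: the first 10 lines.
default_head=$(seq 1 20 | head)

# -n 3: the first three lines.
first_three=$(seq 1 20 | head -n 3)

# head and tail combined select a range: lines 10 to 12.
middle=$(seq 1 20 | head -n 12 | tail -n 3)

# GNU head only (assumption): drop the last 17 lines, leaving lines 1 to 3.
all_but_last_17=$(seq 1 20 | head -n -17 2>/dev/null)

echo "$default_head" | wc -l   # 10 lines
echo "$first_three"            # 1 2 3 (one per line)
echo "$middle"                 # 10 11 12 (one per line)
echo "$all_but_last_17"
```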
