NotesInterpreter · 2019. 10. 3. · Created Date: 10/30/2018 5:06:56 PM
© Dr.S.Gowrishankar, Dept. of CSE, Dr.AIT
Unit 3 Review
Lists
Dictionaries
Tuples
Regular Expressions
Python Lists
A List is a kind of Collection
• A collection allows us to put many values in a single “variable”
• A collection is nice because we can carry many values around in one convenient package
friends = [ 'Joseph', 'Glenn', 'Sally' ]
carryon = [ 'socks', 'shirt', 'perfume' ]
What is not a “Collection”
• Most of our variables have one value in them - when we put a new value in the variable, the old value is overwritten
List Constants
• List constants are surrounded by square brackets and the elements in the list are separated by commas
• A list element can be any Python object - even another list
• A list can be empty
>>> print([1, 24, 76])
[1, 24, 76]
>>> print(['red', 'yellow', 'blue'])
['red', 'yellow', 'blue']
>>> print(['red', 24, 98.6])
['red', 24, 98.6]
>>> print([ 1, [5, 6], 7])
[1, [5, 6], 7]
>>> print([])
[]
We already use lists!
for i in [5, 4, 3, 2, 1] :
    print(i)
print('Blastoff!')
5
4
3
2
1
Blastoff!
Lists and definite loops - best pals
friends = ['Joseph', 'Glenn', 'Sally']
for friend in friends :
    print('Happy New Year:', friend)
print('Done!')
Happy New Year: Joseph
Happy New Year: Glenn
Happy New Year: Sally
Done!
Looking Inside Lists
• Just like strings, we can get at any single element in a list using an index specified in square brackets
[0] Joseph
[1] Glenn
[2] Sally
>>> friends = [ 'Joseph', 'Glenn', 'Sally' ]
>>> print(friends[1])
Glenn
>>>
Lists are Mutable
• Strings are "immutable" - we cannot change the contents of a string - we must make a new string to make any change
• Lists are "mutable" - we can change an element of a list using the index operator
>>> fruit = 'Banana'
>>> fruit[0] = 'b'
Traceback
TypeError: 'str' object does not support item assignment
>>> x = fruit.lower()
>>> print(x)
banana
>>> lotto = [2, 14, 26, 41, 63]
>>> print(lotto)
[2, 14, 26, 41, 63]
>>> lotto[2] = 28
>>> print(lotto)
[2, 14, 28, 41, 63]
How Long is a List?
• The len() function takes a list as a parameter and returns the number of elements in the list
• Actually len() tells us the number of elements of any set or sequence (such as a string)
>>> greet = 'Hello Bob'
>>> print(len(greet))
9
>>> x = [ 1, 2, 'joe', 99]
>>> print(len(x))
4
>>>
Using the range function
• In Python 3.x, range() returns an object of its own type, range, rather than a list.
• If you want to use range() in a for loop, you're good to go.
• However, a range is not a list: it has no list methods such as append(), and slicing a range gives another range, not a list.
• A range does not store its numbers; each pass of the for statement produces the next number on the fly.
• In Python 3.x you can still produce a list by passing the range to the list() function.
>>> print(range(4))
range(0, 4)
>>> friends = ['Joseph', 'Glenn', 'Sally']
>>> print(len(friends))
3
>>> print(range(len(friends)))
range(0, 3)
>>> print(type(range(4)))
<class 'range'>
>>> print(list(range(len(friends))))
[0, 1, 2]
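A minimal sketch of how a range behaves as a sequence, assuming Python 3. The variable names are illustrative only. Note that slicing a range is allowed and yields another range rather than a list.

```python
friends = ['Joseph', 'Glenn', 'Sally']
indexes = range(len(friends))   # a range object, not a list

first = indexes[0]              # indexing works as with any sequence
tail = indexes[1:]              # slicing gives another range
as_list = list(indexes)         # build a real list when one is needed

print(indexes, first, tail, as_list)
```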
A tale of two loops...
friends = ['Joseph', 'Glenn', 'Sally']
for friend in friends :
    print('Happy New Year:', friend)
for i in range(len(friends)) :
    friend = friends[i]
    print('Happy New Year:', friend)
Each loop prints the same three lines:
Happy New Year: Joseph
Happy New Year: Glenn
Happy New Year: Sally
>>> friends = ['Joseph', 'Glenn', 'Sally']
>>> print(len(friends))
3
>>> print(range(len(friends)))
range(0, 3)
>>> print(list(range(len(friends))))
[0, 1, 2]
Concatenating lists using +
• We can create a new list by adding two existing lists together
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = a + b
>>> print(c)
[1, 2, 3, 4, 5, 6]
>>> print(a)
[1, 2, 3]
Lists can be sliced using :
>>> t = [9, 41, 12, 3, 74, 15]
>>> t[1:3]
[41, 12]
>>> t[:4]
[9, 41, 12, 3]
>>> t[3:]
[3, 74, 15]
>>> t[:]
[9, 41, 12, 3, 74, 15]
Remember: Just like in strings, the second number is "up to but not including"
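One consequence worth seeing in code: a slice is a new list, so t[:] can serve as a copy. This is a small sketch with illustrative variable names, not part of the original slides.

```python
t = [9, 41, 12, 3, 74, 15]

middle = t[1:3]       # elements at index 1 and 2; index 3 is excluded
duplicate = t[:]      # a new list holding the same elements
duplicate[0] = 0      # changing the copy leaves t alone

print(middle, t[0], duplicate[0])
```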
List Methods
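>>> x = list()
>>> type(x)
<class 'list'>
>>> dir(x)
['append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
>>>
(output abbreviated: the real listing also includes names such as 'clear' and 'copy' and the special methods that begin with underscores)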
http://docs.python.org/tutorial/datastructures.html
Building a list from scratch
• We can create an empty list and then add elements using the append method
• The list stays in order and new elements are added at the end of the list
>>> stuff = list()
>>> stuff.append('book')
>>> stuff.append(99)
>>> print(stuff)
['book', 99]
>>> stuff.append('cookie')
>>> print(stuff)
['book', 99, 'cookie']
Is Something in a List?
• Python provides two operators, in and not in, that let you check if an item is in a list
• These are logical operators that return True or False
• They do not modify the list
>>> some = [1, 9, 21, 10, 16]
>>> 9 in some
True
>>> 15 in some
False
>>> 20 not in some
True
>>>
A List is an Ordered Sequence
• A list can hold many items and keeps those items in order until we do something to change the order
• A list can be sorted (i.e. change its order)
• The sort method changes the list in place (unlike string methods, which return a new string) - it means "sort yourself"
>>> friends = [ 'Joseph', 'Glenn', 'Sally' ]
>>> friends.sort()
>>> print(friends)
['Glenn', 'Joseph', 'Sally']
>>> print(friends[1])
Joseph
>>>
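A short sketch of what "sort yourself" implies in practice: sort() rearranges the list and returns None, while the built-in sorted() leaves the list alone and returns a new one. Variable names are illustrative.

```python
friends = ['Joseph', 'Glenn', 'Sally']

ordered = sorted(friends)   # new sorted list; friends is not changed yet
result = friends.sort()     # sorts friends in place and returns None

print(ordered, friends, result)
```

A common slip is writing friends = friends.sort(), which replaces the list with None.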
Built in Functions and Lists
• There are a number of functions built into Python that take lists as parameters
• Remember the loops we built? These are much simpler
>>> nums = [3, 41, 12, 9, 74, 15]
>>> print(len(nums))
6
>>> print(max(nums))
74
>>> print(min(nums))
3
>>> print(sum(nums))
154
>>> print(sum(nums)/len(nums))
25.666666666666668
Best Friends: Strings and Lists
>>> abc = 'With three words'
>>> stuff = abc.split()
>>> print(stuff)
['With', 'three', 'words']
>>> print(len(stuff))
3
>>> print(stuff[0])
With
>>> print(stuff)
['With', 'three', 'words']
>>> for w in stuff :
...     print(w)
...
With
three
words
>>>
Split breaks a string into parts and produces a list of strings. We think of these as words. We can access a particular word or loop through all the words.
>>> line = 'A lot               of spaces'
>>> etc = line.split()
>>> print(etc)
['A', 'lot', 'of', 'spaces']
>>>
>>> line = 'first;second;third'
>>> thing = line.split()
>>> print(thing)
['first;second;third']
>>> print(len(thing))
1
>>> thing = line.split(';')
>>> print(thing)
['first', 'second', 'third']
>>> print(len(thing))
3
>>>
When you do not specify a delimiter, multiple spaces are treated like “one” delimiter. You can specify what delimiter character to use in the splitting.
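To make the difference concrete, here is a small sketch comparing split() with no argument against split(' ') with an explicit single-space delimiter. The sample string is made up for illustration.

```python
line = 'A lot   of spaces'             # three spaces between 'lot' and 'of'

by_whitespace = line.split()           # runs of whitespace count as one delimiter
by_one_space = line.split(' ')         # every single space is a delimiter

print(by_whitespace)
print(by_one_space)
```

With an explicit delimiter, consecutive delimiters produce empty strings in the result.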
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not line.startswith('From ') :
        continue
    words = line.split()
    print(words[2])
Sat
Fri
Fri
Fri
...
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
>>> line = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> words = line.split()
>>> print(words)
['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008']
>>>
The Double Split Pattern
• Sometimes we split a line one way and then grab one of the pieces of the line and split that piece again
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
words = line.split()
email = words[1]
pieces = email.split('@')
print(pieces[1])
Here pieces is ['stephen.marquard', 'uct.ac.za'], so pieces[1] is 'uct.ac.za'
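The double split can be wrapped in a function, as sketched below. The name domain_of is hypothetical, and the sample address is reconstructed from the two pieces shown on the slide.

```python
def domain_of(line):
    """Return the host part of the address in a 'From ' line."""
    words = line.split()            # first split: on whitespace
    email = words[1]                # the address is the second word
    pieces = email.split('@')       # second split: on the @ sign
    return pieces[1]

line = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
print(domain_of(line))
```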
List Summary
• Concept of a collection
• Lists and definite loops
• Indexing and lookup
• List mutability
• Functions: len, min, max, sum
• Slicing lists
• List methods: append, remove
• Sorting lists
• Splitting strings into lists of words
• Using split to parse strings
Python Dictionaries
What is a Collection?
• A collection is nice because we can put more than one value in it and carry them all around in one convenient package
• We have a bunch of values in a single “variable”
• We do this by having more than one place “in” the variable
• We have ways of finding the different places in the variable
A Story of Two Collections..
List
A linear collection of values that stay in order
Dictionary
A “bag” of values, each with its own label
Dictionaries
[Figure: a bag of labeled items - money, tissue, calculator, perfume, candy]
http://en.wikipedia.org/wiki/Associative_array
Dictionaries
• Dictionaries are Python's most powerful data collection
• Dictionaries allow us to do fast database-like operations in Python
• Dictionaries have different names in different languages:
    Associative Arrays - Perl / PHP
    Properties or Map or HashMap - Java
    Property Bag - C# / .NET
Dictionaries
• Lists index their entries based on the position in the list
• Dictionaries are like bags - we do not look entries up by position (since Python 3.7 a dictionary does remember the order in which keys were added)
• So we index the things we put in the dictionary with a “lookup tag”
>>> purse = dict()
>>> purse['money'] = 12
>>> purse['candy'] = 3
>>> purse['tissues'] = 75
>>> print(purse)
{'money': 12, 'candy': 3, 'tissues': 75}
>>> print(purse['candy'])
3
>>> purse['candy'] = purse['candy'] + 2
>>> print(purse)
{'money': 12, 'candy': 5, 'tissues': 75}
Comparing Lists and Dictionaries
• Dictionaries are like lists except that they use keys instead of numbers to look up values
>>> lst = list()
>>> lst.append(21)
>>> lst.append(183)
>>> print(lst)
[21, 183]
>>> lst[0] = 23
>>> print(lst)
[23, 183]
>>> ddd = dict()
>>> ddd['age'] = 21
>>> ddd['course'] = 182
>>> print(ddd)
{'age': 21, 'course': 182}
>>> ddd['age'] = 23
>>> print(ddd)
{'age': 23, 'course': 182}
Before the final assignment in each example, the two collections hold:
List
index Value
[0] 21
[1] 183
Dictionary
Key Value
['course'] 182
['age'] 21
Dictionary Literals (Constants)
• Dictionary literals use curly braces and have a list of key : value pairs
• You can make an empty dictionary using empty curly braces
>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> print(jjj)
{'chuck': 1, 'fred': 42, 'jan': 100}
>>> ooo = { }
>>> print(ooo)
{}
>>>
Many Counters with a Dictionary
One common use of dictionaries is counting how often we “see” something
>>> ccc = dict()
>>> ccc['csev'] = 1
>>> ccc['cwen'] = 1
>>> print(ccc)
{'csev': 1, 'cwen': 1}
>>> ccc['cwen'] = ccc['cwen'] + 1
>>> print(ccc)
{'csev': 1, 'cwen': 2}
Dictionary Tracebacks
• It is an error to reference a key which is not in the dictionary
• We can use the in operator to see if a key is in the dictionary
>>> ccc = dict()
>>> print(ccc['csev'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'csev'
>>> print('csev' in ccc)
False
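Three ways to avoid the traceback are sketched here: testing with in, asking get() for a default, and catching KeyError. The dictionary contents and variable names are illustrative.

```python
ccc = {'csev': 1}

present = 'cwen' in ccc           # membership test, no error
fallback = ccc.get('cwen', 0)     # default value when the key is missing

try:
    value = ccc['cwen']           # raises KeyError for a missing key
except KeyError:
    value = None

print(present, fallback, value)
```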
When we see a new name
• When we encounter a new name, we need to add a new entry in the dictionary. If this is the second or later time we have seen the name, we simply add one to the count in the dictionary under that name
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
for name in names :
    if name not in counts:
        counts[name] = 1
    else :
        counts[name] = counts[name] + 1
print(counts)
{'csev': 2, 'cwen': 2, 'zqian': 1}
Simplified counting with get()
• This pattern of checking to see if a key is already in a dictionary and assuming a default value if the key is not there is so common that there is a method called get() that does this for us
• We can use get() and provide a default value of zero when the key is not yet in the dictionary - and then just add one
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
for name in names :
    # the second argument, 0, is the default returned when name is not yet a key
    counts[name] = counts.get(name, 0) + 1
print(counts)
{'csev': 2, 'cwen': 2, 'zqian': 1}
Counting Pattern
counts = dict()
print('Enter a line of text:')
line = input('')
words = line.split()
print('Words:', words)
print('Counting...')
for word in words:
    counts[word] = counts.get(word,0) + 1
print('Counts', counts)
The general pattern to count the words in a line of text is to split the line into words, then loop through the words and use a dictionary to track the count of each word independently.
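The same pattern can be packaged as a function so it can be tried without typing input. This is a sketch; the name count_words and the sample sentence are illustrative.

```python
def count_words(line):
    """Return a dictionary mapping each word in line to its count."""
    counts = dict()
    for word in line.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

sample = 'the clown ran after the car'
counts = count_words(sample)
print(counts)
```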
Counting Words
python wordcount.py
Enter a line of text:
the clown ran after the car and the car ran into the tent and the tent fell down on the clown and the car
Words: ['the', 'clown', 'ran', 'after', 'the', 'car', 'and', 'the', 'car', 'ran', 'into', 'the', 'tent', 'and', 'the', 'tent', 'fell', 'down', 'on', 'the', 'clown', 'and', 'the', 'car']
Counting...
Counts {'the': 7, 'clown': 2, 'ran': 2, 'after': 1, 'car': 3, 'and': 3, 'into': 1, 'tent': 2, 'fell': 1, 'down': 1, 'on': 1}
Definite Loops and Dictionaries
• A dictionary is not indexed by position, but we can still write a for loop that goes through all the entries in a dictionary - actually it goes through all of the keys in the dictionary (in insertion order since Python 3.7) and looks up the values
>>> counts = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> for key in counts:
...     print(key, counts[key])
...
chuck 1
fred 42
jan 100
>>>
Retrieving lists of Keys and Values
• You can get a list of keys, values or items (both) from a dictionary
>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> print(list(jjj))
['chuck', 'fred', 'jan']
>>> print(jjj.keys())
dict_keys(['chuck', 'fred', 'jan'])
>>> print(jjj.values())
dict_values([1, 42, 100])
>>> print(jjj.items())
dict_items([('chuck', 1), ('fred', 42), ('jan', 100)])
>>>
What is a 'tuple'? - coming soon...
Bonus: Two Iteration Variables!
• We loop through the key-value pairs in a dictionary using *two* iteration variables
• Each iteration, the first variable is the key and the second variable is the corresponding value for the key
>>> jjj = { 'chuck' : 1 , 'fred' : 42, 'jan': 100}
>>> for aaa, bbb in jjj.items() :
...     print(aaa, bbb)
...
chuck 1
fred 42
jan 100
>>>
aaa bbb
[chuck] 1
[fred] 42
[jan] 100
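Two iteration variables make it easy to find the entry with the largest count, as in this sketch. The names bigword and bigcount are illustrative, and the dictionary reuses the values from the example above.

```python
counts = {'chuck': 1, 'fred': 42, 'jan': 100}

bigword = None
bigcount = None
for word, count in counts.items():
    # keep the pair with the largest count seen so far
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print(bigword, bigcount)
```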
Summary
• What is a collection?
• Lists versus Dictionaries
• Dictionary constants
• The most common word
• Using the get() method
• Hashing, and lack of order
• Writing dictionary loops
• Sneak peek: tuples
• Sorting dictionaries
Tuples
Tuples are like lists
• Tuples are another kind of sequence that functions much like a list - they have elements which are indexed starting at 0
>>> x = ('Glenn', 'Sally', 'Joseph')
>>> print(x[2])
Joseph
>>> y = ( 1, 9, 2 )
>>> print(y)
(1, 9, 2)
>>> print(max(y))
9
>>> for item in y:
...     print(item)
...
1
9
2
>>>
..but.. Tuples are "immutable"
• Unlike a list, once you create a tuple, you cannot alter its contents - similar to a string
>>> x = [9, 8, 7]
>>> x[2] = 6
>>> print(x)
[9, 8, 6]
>>>
>>> y = 'ABC'
>>> y[2] = 'D'
Traceback: TypeError: 'str' object does not support item assignment
>>>
>>> z = (5, 4, 3)
>>> z[2] = 0
Traceback: TypeError: 'tuple' object does not support item assignment
>>>
Things not to do with tuples
>>> x = (3, 2, 1)
>>> x.sort()
Traceback: AttributeError: 'tuple' object has no attribute 'sort'
>>> x.append(5)
Traceback: AttributeError: 'tuple' object has no attribute 'append'
>>> x.reverse()
Traceback: AttributeError: 'tuple' object has no attribute 'reverse'
>>>
A Tale of Two Sequences
>>> l = list()
>>> dir(l)
['append', 'count', 'extend', 'index', 'insert',
'pop', 'remove', 'reverse', 'sort']
>>> t = tuple()
>>> dir(t)
['count', 'index']
Tuples are more efficient
• Since Python does not have to build tuple structures to be modifiable, they are simpler and more efficient in terms of memory use and performance than lists
• So in our program when we are making "temporary variables" we prefer tuples over lists.
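One rough way to see the memory difference is sys.getsizeof, sketched below. This assumes CPython; the exact byte counts vary by version and platform, so only the comparison is meaningful.

```python
import sys

as_list = [1, 2, 3]
as_tuple = (1, 2, 3)

# getsizeof reports the size of the container itself, not of its elements
list_bytes = sys.getsizeof(as_list)
tuple_bytes = sys.getsizeof(as_tuple)

print(list_bytes, tuple_bytes)
```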
Tuples and Assignment
• We can also put a tuple on the left hand side of an assignment statement
• We can even omit the parentheses
>>> (x, y) = (4, 'fred')
>>> print(y)
fred
>>> (a, b) = (99, 98)
>>> print(a)
99
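A short sketch of tuple assignment without parentheses, including the common swap idiom. The variable names and the sample string are illustrative.

```python
x, y = 4, 'fred'                          # parentheses omitted on both sides
x, y = y, x                               # swap two variables in one statement

last, first = 'Jones Sally'.split()       # unpack a two-element list the same way

print(x, y, last, first)
```

The number of names on the left must match the number of values on the right, otherwise Python raises a ValueError.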
Tuples and Dictionaries
• The items() method in dictionaries returns a view of (key, value) tuples; in Python 3 it prints as dict_items([...]) and can be passed to list() to get an actual list
>>> d = dict()
>>> d['csev'] = 2
>>> d['cwen'] = 4
>>> for (k,v) in d.items():
...     print(k, v)
...
csev 2
cwen 4
>>> tups = d.items()
>>> print(tups)
dict_items([('csev', 2), ('cwen', 4)])
Tuples are Comparable
• The comparison operators work with tuples and other sequences. If the first items are equal, Python goes on to the next element, and so on, until it finds elements that differ.
>>> (0, 1, 2) < (5, 1, 2)
True
>>> (0, 1, 2000000) < (0, 3, 4)
True
>>> ( 'Jones', 'Sally' ) < ('Jones', 'Sam')
True
>>> ( 'Jones', 'Sally') > ('Adams', 'Sam')
True
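Because tuples compare element by element, sorting a list of tuples orders it by the first element and breaks ties with the second. A small sketch with made-up names:

```python
people = [('Jones', 'Sam'), ('Adams', 'Sam'), ('Jones', 'Sally')]

people.sort()    # compares the first elements, then the second on a tie

print(people)
```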
Sorting Lists of Tuples - Using sorted()
• We can take advantage of the ability to sort a list of tuples to get a sorted version of a dictionary
• We can do this even more directly using the built-in function sorted, which takes any iterable (such as d.items()) as a parameter and returns a new sorted list
>>> d = {'a':10, 'b':1, 'c':22}
>>> d.items()
dict_items([('a', 10), ('b', 1), ('c', 22)])
>>> t = sorted(d.items())
>>> t
[('a', 10), ('b', 1), ('c', 22)]
>>> for k, v in sorted(d.items()):
...     print(k, v)
...
a 10
b 1
c 22
Sort by values instead of key
• If we could construct a list of tuples of the form (value, key) we could sort by value
• We do this with a for loop that creates a list of tuples
>>> c = {'a':10, 'b':1, 'c':22}
>>> tmp = list()
>>> for k, v in c.items() :
...     tmp.append( (v, k) )
...
>>> print(tmp)
[(10, 'a'), (1, 'b'), (22, 'c')]
>>> tmp.sort(reverse=True)
>>> print(tmp)
[(22, 'c'), (10, 'a'), (1, 'b')]
>>> tmp.sort()
>>> print(tmp)
[(1, 'b'), (10, 'a'), (22, 'c')]
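The loop and the sort can be combined into one expression. The sketch below uses a list comprehension, which these slides do not introduce, so treat it as an optional shorthand rather than the method taught here.

```python
c = {'a': 10, 'b': 1, 'c': 22}

# build (value, key) tuples and sort them from largest value to smallest
by_value = sorted([(v, k) for k, v in c.items()], reverse=True)

print(by_value)
```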
Program to count the occurrences of each word and print the ten most frequent words:
fhand = open('romeo.txt')
counts = dict()
for line in fhand:
    words = line.split()
    for word in words:
        counts[word] = counts.get(word, 0 ) + 1
# build (count, word) tuples so that sorting orders by count
lst = list()
for key, val in counts.items():
    lst.append( (val, key) )
lst.sort(reverse=True)
# print the ten words with the highest counts
for val, key in lst[:10] :
    print(key, val)
Summary
• Tuple syntax
• Mutability (not)
• Comparability
• Sortable
• Tuples in assignment statements
• Using sorted()
• Sorting dictionaries by either key or value
Regular Expressions
Regular Expressions
http://en.wikipedia.org/wiki/Regular_expression
In computing, a regular expression, also referred to as "regex" or "regexp", provides a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor.
Regular Expressions
Really clever "wild card" expressions for matching and parsing strings.
Really smart "Find" or "Search"
Understanding Regular Expressions
• Very powerful and quite cryptic
• Fun once you understand them
• Regular expressions are a language unto themselves
• A language of "marker characters" - programming with characters
• It is kind of an "old school" language - compact
Regular Expression Quick Guide
^         Matches the beginning of a line
$         Matches the end of a line
.         Matches any single character (except a newline, by default)
\s        Matches a whitespace character
\S        Matches any single non-whitespace character
*         Repeats the preceding expression zero or more times
*?        Repeats the preceding expression zero or more times (non-greedy)
+         Repeats the preceding expression one or more times
+?        Repeats the preceding expression one or more times (non-greedy)
[aeiou]   Matches any single character in the listed set
[^XYZ]    Matches any single character not in the listed set
[a-z0-9]  The set of characters can include a range, denoted by a hyphen
(         Indicates where string extraction is to start
)         Indicates where string extraction is to end
r         Placed before the opening quote of the pattern string, it designates a Python raw string
\w        Short-hand character class for word characters, which include [a-zA-Z0-9_]
The Regular Expression Module
• Before you can use regular expressions in your program, you must import the library using "import re"
• You can use re.search() to see if a string matches a regular expression, similar to using the find() method for strings
• You can use re.findall() to extract portions of a string that match your regular expression, similar to a combination of find() and slicing: var[5:10]
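A minimal sketch of the two calls side by side with find(). The sample line, the addresses in it, and the variable names are made up for illustration and do not come from the slides.

```python
import re

# Hypothetical sample line, not from mbox-short.txt
line = 'From: alice@example.com, bob@example.org'

# re.search() answers "does the pattern occur anywhere in the string?"
found = re.search('From:', line) is not None

# find() answers the same question by returning an index (-1 if absent)
position = line.find('From:')

# re.findall() returns every non-overlapping piece that matches the pattern
addresses = re.findall(r'\S+@[a-z.]+', line)

print(found, position, addresses)
```

find() can only look for one fixed string, while the findall() pattern describes a whole family of strings, which is why no slicing arithmetic is needed here.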
Using re.search() like find()
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('From:', line) :
        print(line)

# The same loop written with find()
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if line.find('From:') >= 0:
        print(line)
Using re.search() like startswith()
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^From:', line) :
        print(line)

# The same loop written with startswith()
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if line.startswith('From:') :
        print(line)
We fine-tune what is matched by adding special characters to the string
Wild-Card Characters
• The dot character matches any single character
• If you add the asterisk character, the preceding character may repeat any number of times (zero or more)
X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-DSPAM-Confidence: 0.8475
X-Content-Type-Message-Body: text/plain
^X.*:
^         Match the start of the line
.         Match any character
*         Many times (zero or more)
Fine-Tuning Your Match
• Depending on how "clean" your data is and the purpose of your application, you may want to narrow your match down a bit
X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-Plane is behind schedule: two weeks
^X.*:
^         Match the start of the line
.         Match any character
*         Many times (zero or more)
This pattern matches all three lines, including the X-Plane line, which is not a header.
X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-Plane is behind schedule: two weeks
^X-\S+:
^         Match the start of the line
\S        Match any non-whitespace character
+         One or more times
This narrower pattern skips the X-Plane line, because a space comes before its colon.
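The difference between the loose and the narrowed pattern can be checked directly. This is a small sketch using the three sample lines from the slide; the variable names loose and strict are only illustrative.

```python
import re

lines = [
    'X-Sieve: CMU Sieve 2.3',
    'X-DSPAM-Result: Innocent',
    'X-Plane is behind schedule: two weeks',
]

# ^X.*:   -> X at the start, then anything, then a colon somewhere later
loose = [s for s in lines if re.search(r'^X.*:', s)]

# ^X-\S+: -> X- at the start, then non-whitespace only, then a colon
strict = [s for s in lines if re.search(r'^X-\S+:', s)]

print(len(loose))    # 3
print(len(strict))   # 2
```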
Matching and Extracting Data
• re.search() returns a match object if the string matches the regular expression and None if it does not, so it can be used as a True/False test
• If we actually want the matching strings to be extracted, we use re.findall()
>>> import re
>>> x = 'My 2 favorite numbers are 19 and 42'
>>> y = re.findall('[0-9]+',x)
>>> print(y)
['2', '19', '42']
[0-9]+
[0-9]+    One or more digits
• When we use re.findall() it returns a list of zero or more sub-strings that match the regular expression
>>> import re
>>> x = 'My 2 favorite numbers are 19 and 42'
>>> y = re.findall('[0-9]+',x)
>>> print(y)
['2', '19', '42']
>>> y = re.findall('[AEIOU]+',x)
>>> print(y)
[]
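A short sketch contrasting what the two functions hand back, using the same string as the slide. Converting the strings to integers at the end is an added illustration, not something the slides do.

```python
import re

x = 'My 2 favorite numbers are 19 and 42'

# re.search() gives a match object for the first match, or None
first = re.search('[0-9]+', x)
missing = re.search('[AEIOU]+', x)
print(first.group())   # 2
print(missing)         # None

# re.findall() always gives a list; the items are strings, so convert
# them before doing arithmetic
numbers = [int(s) for s in re.findall('[0-9]+', x)]
print(sum(numbers))    # 63
```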
Warning: Greedy Matching
• The repeat characters (* and +) are greedy: they extend the match as far as possible, to match the largest possible string
>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+:', x)
>>> print(y)
['From: Using the :']
^F.+:
^F        First character in the match is an F
.+        One or more characters
:         Last character in the match is a :
Why not 'From:'? Because .+ is greedy, the match runs to the last colon on the line rather than the first.
Non-Greedy Matching
• Not all regular expression repeat codes are greedy! If you add a ? character, the + and * chill out a bit and match as few characters as possible
>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+?:', x)
>>> print(y)
['From:']
^F.+?:
^F        First character in the match is an F
.+?       One or more characters but not greedily
:         Last character in the match is a :
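Putting the greedy and non-greedy patterns next to each other makes the difference easy to see. The third pattern, which excludes colons with a character set, is an alternative added here for comparison and is not from the slides.

```python
import re

x = 'From: Using the : character'

greedy = re.findall('^F.+:', x)       # runs to the LAST colon
lazy = re.findall('^F.+?:', x)        # stops at the FIRST colon
no_colon = re.findall('^F[^:]+:', x)  # never crosses a colon at all

print(greedy)     # ['From: Using the :']
print(lazy)       # ['From:']
print(no_colon)   # ['From:']
```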
Fine Tuning String Extraction
• You can refine the match for re.findall() and separately determine which portion of the match is to be extracted by using parentheses
>>> import re
>>> x = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> y = re.findall(r'\S+@\S+',x)
>>> print(y)
['stephen.marquard@uct.ac.za']
\S+@\S+
\S+       At least one non-whitespace character, on each side of the at-sign
• Parentheses are not part of the match - but they tell where to start and stop the string to extract
>>> import re
>>> x = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> y = re.findall(r'\S+@\S+',x)
>>> print(y)
['stephen.marquard@uct.ac.za']
>>> y = re.findall(r'^From (\S+@\S+)',x)
>>> print(y)
['stephen.marquard@uct.ac.za']
^From (\S+@\S+)
Note the space after From in the pattern.
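A small sketch of how parentheses change what re.findall() returns. The two-group pattern at the end goes one step beyond the slide and is included only as an illustration.

```python
import re

x = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'

# No parentheses: the whole match comes back, including 'From '
whole = re.findall(r'^From \S+@\S+', x)

# One pair of parentheses: only the part inside them comes back
address = re.findall(r'^From (\S+@\S+)', x)

# Two pairs: each result is a tuple with one item per pair
parts = re.findall(r'^From (\S+)@(\S+)', x)

print(whole)     # ['From stephen.marquard@uct.ac.za']
print(address)   # ['stephen.marquard@uct.ac.za']
print(parts)     # [('stephen.marquard', 'uct.ac.za')]
```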
>>> data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> atpos = data.find('@')
>>> print(atpos)
21
>>> sppos = data.find(' ',atpos)
>>> print(sppos)
31
>>> host = data[atpos+1 : sppos]
>>> print(host)
uct.ac.za
The at-sign is at index 21 and the first space after it is at index 31.
Extracting a host name - using find and string slicing.
The Double Split Version
• Sometimes we split a line one way and then grab one of the pieces of the line and split that piece again
line = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
words = line.split()
email = words[1]              # 'stephen.marquard@uct.ac.za'
pieces = email.split('@')     # ['stephen.marquard', 'uct.ac.za']
print(pieces[1])              # uct.ac.za
The Regex Version
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
import re
lin = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
y = re.findall('@([^ ]*)',lin)
print(y)
['uct.ac.za']
'@([^ ]*)'
@         Look through the string until you find an at-sign
[^ ]*     Match a non-blank character, and match many of them
( )       Extract the non-blank characters
Even Cooler Regex Version
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
import re
lin = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
y = re.findall('^From .*@([^ ]*)',lin)
print(y)
['uct.ac.za']
'^From .*@([^ ]*)'
^From     Starting at the beginning of the line, look for the string 'From '
.*@       Skip a bunch of characters, looking for an at-sign
(         Start 'extracting'
[^ ]*     Match a non-blank character, and match many of them
)         Stop 'extracting'
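What the '^From ' prefix buys can be seen by running both patterns over more than one line. In this sketch the second line is a made-up example (hypothetical address), added only to show a line that contains an at-sign but does not start with 'From '.

```python
import re

lines = [
    'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008',
    'Reply-To: someone@example.com',   # hypothetical line
]

# '@([^ ]*)' picks up a host name from any line containing an at-sign
anywhere = [re.findall('@([^ ]*)', lin) for lin in lines]

# '^From .*@([^ ]*)' only does so on lines that begin with 'From '
anchored = [re.findall('^From .*@([^ ]*)', lin) for lin in lines]

print(anywhere)   # [['uct.ac.za'], ['example.com']]
print(anchored)   # [['uct.ac.za'], []]
```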
Escape Character
• If you want a special regular expression character to behave as an ordinary character, you prefix it with '\'
>>> import re
>>> x = 'We just received $10.00 for cookies.'
>>> y = re.findall(r'\$[0-9.]+',x)
>>> print(y)
['$10.00']
\$[0-9.]+
\$        A real dollar sign
[0-9.]    A digit or period
+         One or more times
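A brief sketch of what happens with and without the backslash. The last line uses re.escape(), a standard-library helper that the slides do not mention, to escape a literal string automatically.

```python
import re

x = 'We just received $10.00 for cookies.'

# Unescaped, $ still means "end of the line", so nothing can follow it
unescaped = re.findall('$[0-9.]+', x)

# Escaped, \$ is an ordinary dollar sign
escaped = re.findall(r'\$[0-9.]+', x)

# re.escape() adds the backslashes for you when searching for literal text
literal = re.findall(re.escape('$10.00'), x)

print(unescaped)   # []
print(escaped)     # ['$10.00']
print(literal)     # ['$10.00']
```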
Spam Confidence
import re
hand = open('mbox-short.txt')
numlist = list()
for line in hand:
    line = line.rstrip()
    stuff = re.findall('^X-DSPAM-Confidence: ([0-9.]+)', line)
    if len(stuff) != 1 : continue
    num = float(stuff[0])
    numlist.append(num)
print('Maximum:', max(numlist))
python ds.py
Maximum: 0.9907
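The same logic can be tried without mbox-short.txt by feeding it a list of lines. This is only a sketch: the function name max_confidence and the four sample lines are made up for illustration, so the printed maximum differs from the 0.9907 obtained from the real file.

```python
import re

def max_confidence(lines):
    """Return the largest X-DSPAM-Confidence value in lines, or None."""
    numlist = []
    for line in lines:
        stuff = re.findall('^X-DSPAM-Confidence: ([0-9.]+)', line.rstrip())
        if len(stuff) != 1:
            continue                      # not a confidence line
        numlist.append(float(stuff[0]))
    # max() raises ValueError on an empty list, so guard against it
    return max(numlist) if numlist else None

# Hypothetical sample data standing in for the file
sample = [
    'X-DSPAM-Confidence: 0.8475\n',
    'X-DSPAM-Probability: 0.0000\n',
    'X-DSPAM-Confidence: 0.6178\n',
    'Subject: hello\n',
]

print('Maximum:', max_confidence(sample))   # Maximum: 0.8475
```

Because open() returns an object that yields lines, max_confidence(open('mbox-short.txt')) would work on the real file as well.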
group
Parts of a pattern enclosed in parentheses form groups. These groups can be fetched using the match object's group() method. The groups are addressable numerically in the order that they appear, from left to right, in the regular expression, starting with group 1.
Group numbering starts at 1 because group 0 is reserved to hold the entire match.
Inside a pattern or a replacement string, \1 refers to the text matched by group 1, the same text that re.search(...).group(1) returns.
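A minimal sketch of group numbering and of \1 used inside a pattern. The two-group pattern and the word 'cookies' are chosen here for illustration and are not taken from the slides.

```python
import re

x = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
m = re.search(r'^From (\S+)@(\S+)', x)

print(m.group(0))   # entire match: From stephen.marquard@uct.ac.za
print(m.group(1))   # first group:  stephen.marquard
print(m.group(2))   # second group: uct.ac.za

# Inside a pattern, \1 means "the same text group 1 just matched",
# so (\w)\1 finds a doubled letter
doubled = re.search(r'(\w)\1', 'cookies')
print(doubled.group(0))   # oo
print(doubled.group(1))   # o
```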
r prefix
The r means that the string is to be treated as a raw string, which means backslashes are kept as literal characters rather than starting escape codes.
For example: '\n' will be treated as a newline character, while r'\n' will be treated as the characters \ followed by n.
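Why this matters for regular expressions can be sketched with \b, which Python reads as a backspace character in an ordinary string but which means "word boundary" to the re module. Note that \b is not covered in these slides; it is used here only because it shows the difference clearly.

```python
import re

# An ordinary string turns \n into one character; a raw string keeps two
print(len('\n'), len(r'\n'))   # 1 2

text = 'cat concat cats'

# Without r, the pattern contains backspace characters and finds nothing
plain = re.findall('\bcat\b', text)

# With r, the backslashes reach the re module, which sees word boundaries
raw = re.findall(r'\bcat\b', text)

print(plain)   # []
print(raw)     # ['cat']
```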
Summary
• Regular expressions are a cryptic but powerful language for matching strings and extracting elements from those strings
• Regular expressions have special characters that indicate intent