Computation with strings 1
Day 2 - 8/27/14
LING 3820 & 6820
Natural Language Processing
Harry Howard
Tulane University
Course organization
27-Aug-2014, NLP, Prof. Howard, Tulane University
http://www.tulane.edu/~howard/LING3820/
The syllabus is coming.
http://www.tulane.edu/~howard/CompCultEN/
Is there anyone here who wasn't here on Monday?
Installation of Python
Can anyone NOT get Spyder to do this?
Test
>>> 237 + 9075
9312
Be sure to try the other arithmetic operators: subtraction (-), multiplication (*), and division (/). Does division work the way you expect?
After you have tired of playing with math, play with some text:
>>> word = 'msinairatnemhsilbatsesiditna'
>>> 'anti' in word
False
>>> 'itna' in word
True
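The answer to the division question depends on the Python version. The following is a minimal sketch that assumes Python 3 (the slides do not say which version is installed): there, `/` always gives a float and `//` gives the rounded-down quotient, whereas in Python 2 `/` between two integers rounds down by itself. All variable names except `word` are made up for illustration.

```python
# Assumes Python 3; under Python 2, 7 / 2 would evaluate to 3.
true_quotient = 7 / 2     # 3.5
floor_quotient = 7 // 2   # 3
remainder = 7 % 2         # 1

# The 'in' operator tests whether one string occurs inside another.
word = 'msinairatnemhsilbatsesiditna'
has_anti = 'anti' in word   # False: the letters run backwards
has_itna = 'itna' in word   # True

print(true_quotient, floor_quotient, remainder)
print(has_anti, has_itna)
```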
§3. Computation with strings
A string is a sequence of characters delimited by single or double quotes.
Examples
>>> monty = 'Monty Python'
>>> monty
'Monty Python'
>>> doublemonty = "Monty Python"
>>> doublemonty
'Monty Python'
>>> circus = 'Monty Python's Flying Circus'
  File "<stdin>", line 1
    circus = 'Monty Python's Flying Circus'
                           ^
SyntaxError: invalid syntax
>>> circus = "Monty Python's Flying Circus"
>>> circus
"Monty Python's Flying Circus"
>>> circus = 'Monty Python\'s Flying Circus'
>>> circus
"Monty Python's Flying Circus"
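The session above suggests that the choice of quotes affects only how a string is typed, not what is stored. A small sketch to check this, assuming Python 3; the names `circus_double`, `circus_escaped` and the result variables are made up for illustration:

```python
monty = 'Monty Python'
doublemonty = "Monty Python"
same_string = monty == doublemonty   # True: the quote style is not stored

# Two ways of getting an apostrophe into a string.
circus_double = "Monty Python's Flying Circus"
circus_escaped = 'Monty Python\'s Flying Circus'
same_circus = circus_double == circus_escaped   # True

# The backslash only marks the apostrophe as literal; it is not kept.
backslash_kept = '\\' in circus_escaped   # False

print(same_string, same_circus, backslash_kept)
```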
The + and * operators
A new string can be formed by concatenating two strings with + or by repeating a string a number of times with *. Unfortunately, a character cannot be deleted with -:
>>> S = 'balloon'
>>> S+'!'
>>> S+!
>>> 'M'+S
>>> S*2
>>> S+'!'*2
>>> (S+'!')*2
>>> S-'n'
>>> S+2
>>> S+'2'
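For checking your results, here is a sketch of what the working lines produce and how the failing ones fail, assuming Python 3. The variable names are made up for illustration. `S+!` is left out because it is a syntax error and stops the whole file from running; the `replace` call at the end is one possible workaround for deletion, not something the slide prescribes.

```python
S = 'balloon'

exclaimed = S + '!'              # 'balloon!'
prefixed = 'M' + S               # 'Mballoon'
doubled = S * 2                  # 'balloonballoon'
tail_repeated = S + '!' * 2      # 'balloon!!'  (* applies before +)
whole_repeated = (S + '!') * 2   # 'balloon!balloon!'
with_digit = S + '2'             # 'balloon2'

# Subtraction is not defined for strings.
try:
    S - 'n'
    minus_works = True
except TypeError:
    minus_works = False

# A string and a number cannot be added together.
try:
    S + 2
    plus_number_works = True
except TypeError:
    plus_number_works = False

# One possible way to drop a character: replace it with nothing.
without_n = S.replace('n', '')   # 'balloo'
```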
Some string methods
Python supplies several methods that can be applied to strings to perform tasks. Some of them are illustrated below. The input code is given without the corresponding output; it is up to you to type each line in to see what it does:
>>> len(S)
>>> len(S+'!')
>>> len(S*2)
>>> sorted(S)
>>> len(sorted(S))
>>> set(S)
>>> sorted(set(S))
>>> len(set(S))
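Once you have tried these yourself, the sketch below can be used to compare notes. It assumes Python 3 and `S = 'balloon'` from the previous slide; the result names are made up for illustration. In Python's own documentation, `len`, `sorted` and `set` are listed as built-in functions.

```python
S = 'balloon'

length = len(S)                 # 7
longer = len(S + '!')           # 8
doubled_length = len(S * 2)     # 14

letters = sorted(S)             # a list: ['a', 'b', 'l', 'l', 'n', 'o', 'o']
unique_letters = set(S)         # a set; its display order is not guaranteed
unique_sorted = sorted(set(S))  # a list again: ['a', 'b', 'l', 'n', 'o']

print(length, longer, doubled_length)
print(letters)
print(unique_sorted)
```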
Tokens vs. types
set(S) produces the set of characters in the string. One useful property of sets is that they do not contain duplicate elements.
The process of removing repetitions performed by set() touches on a fundamental concept in language computation, that of the distinction between a token and a type.
A representation in which repetitions are allowed is said to consist of tokens, while one in which there are no repetitions is said to consist of types.
Thus set() converts the tokens of a string into types. There is one type of 'o' in 'balloon', but two tokens of 'o'.
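The token/type distinction can be put into numbers. This is a minimal sketch assuming Python 3; `Counter` from the standard `collections` module is not mentioned in the slides and is used here only to show how many tokens each type has.

```python
from collections import Counter

S = 'balloon'

token_count = len(S)        # 7 character tokens
type_count = len(set(S))    # 5 character types: a, b, l, n, o

# Counter maps each type to its number of tokens.
tokens_per_type = Counter(S)
o_tokens = tokens_per_type['o']   # 2 tokens of the single type 'o'

print(token_count, type_count, o_tokens)
```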
Method notation
The material attached to a method in parentheses is called its argument(s).
In the examples above, the argument S can be thought of linguistically as the object of a noun: the length of S, the alphabetical sorting of S, the set of S. But what if two pieces of information are needed for a method to work, for instance, to count the number of o’s in otolaryngologist?
To do so, Python allows for information to be prefixed to a method with a dot:
>>> S.count('o')
The example can be read as “in S, count the o’s”, with the argument being the substring to be counted, 'o', and the attribute being the string over which the count proceeds, or more generally:
attribute.method(argument)
What can serve as attribute and argument varies from method to method and so has to be memorized.
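A short sketch of the dot notation at work, assuming Python 3. The name `T` and the result names are made up for illustration; `count` itself is a standard string method.

```python
S = 'balloon'
T = 'otolaryngologist'

o_in_S = S.count('o')     # 2
o_in_T = T.count('o')     # 4
ol_in_T = T.count('ol')   # 2: the argument may be longer than one character
z_in_T = T.count('z')     # 0: no error when nothing is found

print(o_in_S, o_in_T, ol_in_T, z_in_T)
```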
Cleaning up a string
There is a group of methods for modifying the properties of a string, illustrated below. You can guess what they do from their names:
>>> S = 'i lOvE yOu'
>>> S
>>> S.lower()
>>> S.upper()
>>> S.swapcase()
>>> S.capitalize()
>>> S.title()
>>> S.replace('O','o')
>>> S.strip('i')
>>> S2 = ' '+S+' '
>>> S2
>>> S2.strip()
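After guessing, you can compare your guesses with the sketch below, which assumes Python 3. The result names are made up for illustration. Note that each method returns a new string and leaves S itself untouched.

```python
S = 'i lOvE yOu'

lowered = S.lower()             # 'i love you'
uppered = S.upper()             # 'I LOVE YOU'
swapped = S.swapcase()          # 'I LoVe YoU'
capitalized = S.capitalize()    # 'I love you'
titled = S.title()              # 'I Love You'
replaced = S.replace('O', 'o')  # 'i lovE you'
stripped_i = S.strip('i')       # ' lOvE yOu': only the ends are stripped

S2 = ' ' + S + ' '
stripped_spaces = S2.strip()    # 'i lOvE yOu': with no argument, whitespace goes

unchanged = S == 'i lOvE yOu'   # True: S was never modified
```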
Next time
3.3. Finding your way around a string
I will try to send you some practice for what we have done today.