Strings
Copyright D. Bockus
Strings
• Def: A string is a sequence (possibly empty) of symbols from some alphabet.
• What do we use strings for?
  1) Text processing and word processing.
  2) Grammatical structure of languages.
  3) Searching and string sequences.
String Example 1
• E.g. Java's "for" statement (simplistic view):
    for (initialization; condition; increment)
    with u = initialization, v = condition, w = increment
  – A "for" statement breaks down into 'for (u; v; w)'.
  – We can then define each part:
    • u » identifier = constant
    • v » identifier relational_operator value
    • w » identifier++
• In this context we can also define a while loop as:
  – while (v)
• Deterministic context-free languages (programming languages) are defined by breaking rules down into sub-rules, and so on.
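The sub-rules u, v and w can be sketched as regular expressions, which is enough here only because none of these simplified rules is recursive. This is a minimal sketch, not a parser for real Java. The class name `ForGrammarSketch`, the method names, and the reading of "constant" as an unsigned integer and "value" as an identifier or integer are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class ForGrammarSketch {
    // Assumed token shapes (not specified on the slide).
    static final String ID = "[A-Za-z_][A-Za-z0-9_]*";
    static final String CONSTANT = "[0-9]+";
    static final String REL_OP = "(?:<=|>=|==|!=|<|>)";

    // u » identifier = constant
    static final String U = ID + "\\s*=\\s*" + CONSTANT;
    // v » identifier relational_operator value
    static final String V = ID + "\\s*" + REL_OP + "\\s*(?:" + ID + "|" + CONSTANT + ")";
    // w » identifier++
    static final String W = ID + "\\+\\+";

    // for (u; v; w)
    static final Pattern FOR = Pattern.compile(
        "for\\s*\\(\\s*" + U + "\\s*;\\s*" + V + "\\s*;\\s*" + W + "\\s*\\)");
    // while (v) reuses the same sub-rule v
    static final Pattern WHILE = Pattern.compile("while\\s*\\(\\s*" + V + "\\s*\\)");

    static boolean isFor(String s) {
        return FOR.matcher(s).matches();
    }

    static boolean isWhile(String s) {
        return WHILE.matcher(s).matches();
    }
}
```

Note how the while rule is built from the same sub-rule v as the for rule, which is the point of defining rules in terms of sub-rules.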
Strings Example 2
– Genetic coding:
  – aab cd aab d
– Searching for, and matching, codes leads into graph theory.
[Figure: the symbol sequence a b b b a a a b b a a c c c d d d a b f g a b f g b bd d, with segments labelled s1, s2, s3, s4]
String Example 3
• Compression
• Converting a large volume of symbols into a smaller format.
  – Huffman coding
  – LZW compression
Basics
Given a string v, the length of v can be expressed as:
  1) |v| = magnitude of v
  2) length(v)
– Empty string: v = '' or v = ε
• There are 5 common operations that may be performed on strings:
  – Insertion, Deletion, Substitution, Concatenation, Comparison.
Insertion & Deletion
• Insertion
    k = ac where
      a = (a1, a2, …, am)
      c = (c1, c2, …, cn)
    insert b = (b1, b2, …, bp) between a and c
    k = abc
      = a1, a2, …, am, b1, b2, …, bp, c1, c2, …, cn
    |k| = m + p + n
• Deletion
    k = abc
    delete c
    k = ab
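A minimal Java sketch of the two operations, using the standard `StringBuilder.insert` and `StringBuilder.delete` methods. The class and method names are hypothetical, and deletion is shown only for a trailing substring c, as in the example.

```java
class InsertDeleteSketch {
    // Insert b between a and c: k = ac becomes k = abc, so |k| = m + p + n.
    static String insertBetween(String a, String b, String c) {
        StringBuilder k = new StringBuilder(a).append(c); // k = ac
        k.insert(a.length(), b);                          // k = abc
        return k.toString();
    }

    // Delete the trailing substring c from k = abc, leaving k = ab.
    // If k does not end with c, k is returned unchanged.
    static String deleteSuffix(String k, String c) {
        if (!k.endsWith(c)) {
            return k;
        }
        StringBuilder sb = new StringBuilder(k);
        sb.delete(k.length() - c.length(), k.length());
        return sb.toString();
    }
}
```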
Substitution
k = αuβ, where α and β may be null, i.e. |α| = 0 or |β| = 0.
Search for u and replace it with v:
  k = αvβ
Notice that the same operation can be accomplished with a deletion and an insertion:
  k = αuβ
  Delete u: k = αβ
  Insert v: k = αvβ
Note: |u| does not have to equal |v|, so |k| before does not have to equal |k| after.
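The deletion-plus-insertion view can be sketched in Java as follows. This is an illustration under assumptions: the class and method names are hypothetical, and only the first occurrence of u is replaced.

```java
class SubstitutionSketch {
    // Replace the first occurrence of u in k with v, done as a deletion
    // followed by an insertion. Returns k unchanged if u does not occur.
    static String substitute(String k, String u, String v) {
        int at = k.indexOf(u);            // search for u
        if (at < 0) {
            return k;
        }
        StringBuilder sb = new StringBuilder(k);
        sb.delete(at, at + u.length());   // delete u
        sb.insert(at, v);                 // insert v where u was
        return sb.toString();
    }
}
```

Because |u| and |v| may differ, the result can be shorter or longer than the input.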
Concatenation
This is the joining of 2 strings a & b.
  c = a + b
So if a = (a1, a2, …, am) & b = (b1, b2, …, bn)
Then c = (a1, a2, …, am, b1, b2, …, bn)
– Note: concatenation may be performed with insertion, i.e. insert b at the end of a, or with substitution:
  • Write a as aε, where ε is null.
  • Replace ε with b.
|c| = m + n
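A short Java sketch showing concatenation expressed as an insertion at the end of a. The class and method names are hypothetical; the result should agree with Java's built-in `+` and `String.concat`.

```java
class ConcatSketch {
    // c = a + b, performed as "insert b at position |a|".
    static String concatByInsertion(String a, String b) {
        StringBuilder c = new StringBuilder(a);
        c.insert(a.length(), b); // inserting at the end is concatenation
        return c.toString();     // |c| = m + n
    }
}
```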
Comparison
– Compare a & b to see which one of the following is true.
  1) a < b
  2) a = b
  3) a > b
• 1) a is less than b if, in an element-by-element lexicographical comparison of a & b, the first position k where they differ has ak < bk.
    a    b
    a1   b1   a1 = b1
    a2   b2   a2 = b2
    a3   b3   a3 < b3
    a4   b4   a4 = b4
         b5   (a5 = ε) < b5
    a3 < b3
    So, a < b
Comparison Cont...
– Note: a3 < b3 is the first instance where an element in a differs from b, so a < b.
– If a3 = b3 then a is still less than b because |a| < |b|. We can think of ε as having a value of −∞ for comparison purposes.
• 2) For a = b the following must be true:
  • |a| = |b|
  – and
  • ak = bk ∀k
• 3) a > b is the opposite of (1).
  – Or b < a
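The three cases can be sketched as one Java function that returns a negative, zero, or positive value, the same convention as Java's own `String.compareTo`. The class and method names are hypothetical.

```java
class CompareSketch {
    // Returns < 0 if a < b, 0 if a = b, > 0 if a > b.
    static int compare(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int k = 0; k < n; k++) {
            char ak = a.charAt(k);
            char bk = b.charAt(k);
            if (ak != bk) {
                // First position where the strings differ decides the result.
                return ak < bk ? -1 : 1;
            }
        }
        // All shared positions are equal: the shorter string is smaller,
        // as if its missing elements had the value minus infinity.
        return Integer.compare(a.length(), b.length());
    }
}
```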
String Representations
• Consider the string "L1 CMPR BANANAS WATERMELONS 12".
• There are 6 ways to represent strings in storage; 3 criteria must be kept in mind:
  – Storage efficiency (1:1 packing ratio)
  – Ease of lookup (searching)
  – Ease of modification
    • Insertion
    • Deletion
Fixed Length Strings
    L 1
    C M P R
    B A N A N A S
    W A T E R M E L O N S
    1 2
– Adv: Ease of modification.
– Dis: Storage efficiency, due to wasted space at the end of short strings.
Var Strings
– Adv: Easier to look up strings, we already have the length.
– Dis: Still wastes space.
    2   L 1
    4   C M P R
    7   B A N A N A S
    11  W A T E R M E L O N S
    2   1 2
Count Delimited
– Adv: Very efficient in space usage, Lookup is not bad.
– Dis: Modification is hard: a replacement string must be the same length, or the array must be readjusted.
    02 L1 04 CMPR 07 BANANAS 11 WATERMELONS 02 12
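A minimal Java sketch of the count-delimited layout. It assumes a two-digit decimal count in front of each string and no separators in storage (the spaces in the example are taken to be for readability only); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CountDelimitedSketch {
    // Pack each word as a two-digit count followed by its characters.
    static String pack(List<String> words) {
        StringBuilder out = new StringBuilder();
        for (String w : words) {
            int len = w.length();
            if (len > 99) {
                throw new IllegalArgumentException("too long for a two-digit count");
            }
            if (len < 10) {
                out.append('0');
            }
            out.append(len).append(w);
        }
        return out.toString();
    }

    // Walk the packed storage, reading a count and then that many characters.
    // Assumes the input was produced by pack().
    static List<String> unpack(String packed) {
        List<String> words = new ArrayList<>();
        int pos = 0;
        while (pos < packed.length()) {
            int len = Integer.parseInt(packed.substring(pos, pos + 2));
            pos += 2;
            words.add(packed.substring(pos, pos + len));
            pos += len;
        }
        return words;
    }
}
```

Lookup only has to hop from count to count, but replacing a string with one of a different length means shifting everything stored after it.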
Indexed List
– Adv: Good Storage and Search capabilities
– Dis: Modification is poor
Strategies include: always adding new strings and never reclaiming space except during a repack.
    Index:   1   2   3   4    5   ...
    Length:  2   4   7   11   2
    Start:   1   3   7   14   25
    L1 CMPR BANANAS WATERMELONS 12
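A minimal Java sketch of an indexed list: one packed character store plus a table of start positions and lengths. It follows the append-only strategy mentioned above (new strings are always added at the end). The class and method names are hypothetical, and positions are 1-based to match the table.

```java
import java.util.ArrayList;
import java.util.List;

class IndexedListSketch {
    private final StringBuilder storage = new StringBuilder(); // packed characters
    private final List<Integer> start = new ArrayList<>();     // 1-based start positions
    private final List<Integer> length = new ArrayList<>();    // string lengths

    // Append a string to storage and record it in the index.
    // Returns the 1-based index of the new entry.
    int add(String s) {
        start.add(storage.length() + 1);
        length.add(s.length());
        storage.append(s);
        return start.size();
    }

    // Look up entry i (1-based) directly from its start and length.
    String get(int i) {
        int from = start.get(i - 1) - 1;
        return storage.substring(from, from + length.get(i - 1));
    }

    int startOf(int i) {
        return start.get(i - 1);
    }

    int lengthOf(int i) {
        return length.get(i - 1);
    }
}
```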
Linked List
– Adv: Modification is simple pointer manipulation.
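– Dis: Storage overhead.
  • One address per character.
– Note: Lookup can be improved by adding an additional length field to the table or by employing a hash function.
[Figure: a table with one entry per string, holding its length (2, 4, 7, 11, 2) and index (1 to 5); each entry points to a linked list of single characters, e.g. L → 1 and C → M → P → R]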
Blocked Linked List
– Adv: Better storage than a linked list.
  • More characters per node.
– Note: A trade off between dealing with single characters and blocks of characters during modification.
• Note: If modification is not required then methods such as indexed lists are quite useful. Applications include symbol tables in compilers.
Implementation
– In most cases a variable length string structure is desirable, i.e. it is the most versatile.
– Consider a string type as:
    String {
        int size;
        char data[];
    }
– Java declares string objects with methods to determine length and other attributes.
– Declaring variables:
    String S1, S2;
Basic Functions
    S1.length();  -- Returns the length of S1
• Other useful functions
  – String s.concat(String t);
  – String new String(s);
  – String s.substring(int i);
  – int s.indexOf(String t, int index);
See the Java API.
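A short usage sketch of the functions listed above, all of which are standard `java.lang.String` methods. The class name and the sample values are made up for illustration.

```java
class StringApiSketch {
    static String demo() {
        String s1 = "BANANAS";
        String s2 = new String(s1);       // copy of s1
        String joined = s1.concat("12");  // concatenation
        String tail = s1.substring(2);    // from index 2 to the end
        int at = s1.indexOf("AN", 2);     // first "AN" at or after index 2
        return s1.length() + " " + joined + " " + tail + " " + at + " " + s2.equals(s1);
    }
}
```

Calling `StringApiSketch.demo()` returns `"7 BANANAS12 NANAS 3 true"`.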
Variable Length Coding
Old Tree
    Char  Prob.  Bits (MRC)  Bits (Fixed)  (pi)lg(pi)  pi(bits)
    a     0.15   3           3             -0.41       0.45
    e     0.25   2           3             -0.50       0.5
    i     0.13   3           3             -0.38       0.39
    m     0.09   4           3             -0.31       0.36
    s     0.15   2           3             -0.41       0.3
    z     0.02   4           3             -0.11       0.08
    t     0.09   4           3             -0.31       0.36
    r     0.12   4           3             -0.37       0.48
    H(U) = 2.81

           Ave.   Redundancy
    Fixed  3      0.06
    MRC    2.92   0.04
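The table's summary figures can be recomputed with a few lines of Java. This sketch assumes that redundancy is defined as 1 − H(U) / (average bits per symbol); the slide does not state the formula, but this definition reproduces its values of 0.06 and 0.04. The class and method names are hypothetical.

```java
class CodingStatsSketch {
    // H(U) = -sum(pi * lg(pi)), in bits per symbol.
    static double entropy(double[] p) {
        double h = 0.0;
        for (double pi : p) {
            h -= pi * (Math.log(pi) / Math.log(2.0));
        }
        return h;
    }

    // Average code length = sum(pi * bits_i).
    static double averageBits(double[] p, int[] bits) {
        double avg = 0.0;
        for (int i = 0; i < p.length; i++) {
            avg += p[i] * bits[i];
        }
        return avg;
    }

    // Assumed definition: the fraction of each coded bit that carries no information.
    static double redundancy(double entropy, double averageBits) {
        return 1.0 - entropy / averageBits;
    }
}
```

With the probabilities a, e, i, m, s, z, t, r from the table and the MRC bit counts 3, 2, 3, 4, 2, 4, 4, 4, the average comes out at 2.92 bits and the entropy at roughly 2.81 bits.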
Huffman Coding Algorithm
1) Collect a history of the frequencies of the characters, i.e. determine the probabilities.
2) Arrange the characters in an ordered list (priority queue) based on increasing probabilities (frequency).
3) While (more than 1 node in the list) do
   i) Remove the first 2 nodes.
   ii) Combine them into a tree whose root represents the sum of the frequencies of the children.
   iii) Insert the tree into the list, maintaining proper list order.
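The steps above can be sketched in Java with `java.util.PriorityQueue` as the ordered list. This is an illustration, not a full encoder: it only derives the code length of each symbol. Integer frequencies are assumed so that ties compare exactly, and all class and method names are hypothetical.

```java
import java.util.PriorityQueue;

class HuffmanSketch {
    // A tree node; leaves carry a symbol index, internal nodes carry -1.
    static final class Node {
        final int weight;
        final int symbol;
        final Node left;
        final Node right;

        Node(int weight, int symbol, Node left, Node right) {
            this.weight = weight;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the code length in bits for each symbol.
    static int[] codeLengths(int[] freq) {
        // Step 2: ordered list keyed on increasing frequency.
        PriorityQueue<Node> list =
            new PriorityQueue<>((x, y) -> Integer.compare(x.weight, y.weight));
        for (int s = 0; s < freq.length; s++) {
            list.add(new Node(freq[s], s, null, null));
        }
        // Step 3: merge the two smallest nodes until one tree remains.
        while (list.size() > 1) {
            Node first = list.poll();
            Node second = list.poll();
            list.add(new Node(first.weight + second.weight, -1, first, second));
        }
        int[] bits = new int[freq.length];
        assignDepths(list.poll(), 0, bits);
        return bits;
    }

    // A leaf's depth in the tree is its code length
    // (a lone symbol ends up at depth 0 in this sketch).
    private static void assignDepths(Node node, int depth, int[] bits) {
        if (node == null) {
            return;
        }
        if (node.symbol >= 0) {
            bits[node.symbol] = depth;
            return;
        }
        assignDepths(node.left, depth + 1, bits);
        assignDepths(node.right, depth + 1, bits);
    }

    // Sum of frequency * code length over all symbols.
    static int weightedLength(int[] freq, int[] bits) {
        int total = 0;
        for (int s = 0; s < freq.length; s++) {
            total += freq[s] * bits[s];
        }
        return total;
    }
}
```

For the frequencies 15, 25, 13, 9, 15, 2, 9, 12 (the probabilities for a, e, i, m, s, z, t, r scaled by 100) the weighted length is 286, i.e. an average of 2.86 bits per symbol. Tie-breaking between equal weights can change individual code lengths but not this total.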
Variable Length Coding
New Tree
Based on new tree
    Char  Prob.  Bits (MRC)  Bits (Fixed)  (pi)lg(pi)  pi(bits)
    a     0.15   3           3             -0.41       0.45
    e     0.25   2           3             -0.50       0.5
    i     0.13   3           3             -0.38       0.39
    m     0.09   3           3             -0.31       0.27
    s     0.15   3           3             -0.41       0.45
    z     0.02   4           3             -0.11       0.08
    t     0.09   4           3             -0.31       0.36
    r     0.12   3           3             -0.37       0.36
    H(U) = 2.81

           Ave.   Redundancy
    Fixed  3      0.06
    MRC    2.86   0.02
LZW Compression
• Lempel-Ziv-Welch (LZW)
• Uses a method of finding the largest known prefix in a character string.
• Typical uses.
  – Lossless
  – Compressed file can be reconstructed without data loss
    • GIF, TIFF
    • zip & unzip
LZW Compression
• Idea is to build a code table, where codes are added as they are discovered.
• Look at the prefix for a given character.
Compressor Pseudo Code
http://marknelson.us/1989/10/01/lzw-data-compression/
    STRING = get input character
    WHILE there are still input characters DO
        CHARACTER = get input character
        IF STRING+CHARACTER is in the string table THEN
            STRING = STRING+CHARACTER
        ELSE
            output the code for STRING
            add STRING+CHARACTER to the string table
            STRING = CHARACTER
        END of IF
    END of WHILE
    output the code for STRING
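A minimal Java sketch of the compressor loop. It assumes the code book is seeded from an explicit alphabet string (codes 0, 1, … in alphabet order), that the input only uses alphabet symbols, and that the code for the last pending STRING is output once the input runs out. The class name, method name and `alphabet` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LzwCompressSketch {
    static List<Integer> compress(String input, String alphabet) {
        // Initialize the code book with every single character of the alphabet.
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i < alphabet.length(); i++) {
            table.put(String.valueOf(alphabet.charAt(i)), i);
        }
        List<Integer> out = new ArrayList<>();
        if (input.isEmpty()) {
            return out;
        }
        String string = String.valueOf(input.charAt(0));
        for (int i = 1; i < input.length(); i++) {
            char character = input.charAt(i);
            if (table.containsKey(string + character)) {
                // Keep extending the largest known prefix.
                string = string + character;
            } else {
                out.add(table.get(string));
                // The next free code is the current size of the code book.
                table.put(string + character, table.size());
                string = String.valueOf(character);
            }
        }
        out.add(table.get(string)); // flush the last pending prefix
        return out;
    }
}
```

For the alphabet "ab" and the input "aaabbbbbbaabaaba" this produces the codes 0 2 1 4 5 3 7.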
DeCompressor Pseudo Code
http://marknelson.us/1989/10/01/lzw-data-compression/
    Read OLD_CODE
    output OLD_CODE
    CHARACTER = OLD_CODE
    WHILE there are still input characters DO
        Read NEW_CODE
        IF NEW_CODE is not in the translation table THEN
            STRING = get translation of OLD_CODE
            STRING = STRING+CHARACTER
        ELSE
            STRING = get translation of NEW_CODE
        END of IF
        output STRING
        CHARACTER = first character in STRING
        add OLD_CODE + CHARACTER to the translation table
        OLD_CODE = NEW_CODE
    END of WHILE
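A minimal Java sketch of the de-compressor loop. It assumes the same alphabet-seeded code book as the compressor (codes 0, 1, … in alphabet order) and a well-formed code sequence; the class name, method name and `alphabet` parameter are hypothetical. The translation table is a list, so a code is "not in the table" when it is at least the table's size.

```java
import java.util.ArrayList;
import java.util.List;

class LzwDecompressSketch {
    static String decompress(List<Integer> codes, String alphabet) {
        // Rebuild the initial code book from the alphabet.
        List<String> table = new ArrayList<>();
        for (int i = 0; i < alphabet.length(); i++) {
            table.add(String.valueOf(alphabet.charAt(i)));
        }
        if (codes.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        String previous = table.get(codes.get(0)); // translation of OLD_CODE
        out.append(previous);
        for (int i = 1; i < codes.size(); i++) {
            int newCode = codes.get(i);
            String string;
            if (newCode >= table.size()) {
                // Not found: previous string + first character of previous string.
                string = previous + previous.charAt(0);
            } else {
                string = table.get(newCode);
            }
            out.append(string);
            // New entry: previous string + first character of the current string.
            table.add(previous + string.charAt(0));
            previous = string;
        }
        return out.toString();
    }
}
```

For the alphabet "ab" and the codes 0 2 1 4 5 3 7 this returns "aaabbbbbbaabaaba".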
Compressor Example
• Assume we have an alphabet of a and b.
• We start by building a code book initialized to all characters in the alphabet, in this case a and b.
• We can now compress the string: a a a b b b b b b a a b a a b a
    Code  String
    0     a
    1     b
    2
    3
    4
    5
    6
    7
    8
    9
Compressor Example …
Input: a a a b b b b b b a a b a a b a
At each step: find the largest prefix in the code book, output its code, then add code + next char to the code book.
    Step  Largest prefix  Output code  Added to code book
    1     a               0            2 = a a
    2     a a             2            3 = a a b
    3     b               1            4 = b b
    4     b b             4            5 = b b b
    5     b b b           5            6 = b b b a
    6     a a b           3            7 = a a b a
    7     a a b a         7            (none)
No more input to compress, so stop.
Output codes: 0 2 1 4 5 3 7
De-compressor Example
• We have an encoded string: 0 2 1 4 5 3 7
• To decode we need two things:
  – Knowledge of the alphabet.
  – An initialized code book based on the alphabet.
• Headers on, say, GIF files contain the alphabet information.
• The code book is rebuilt during de-compression.
    Code  String
    0     a
    1     b
    2
    3
    4
    5
    6
    7
    8
    9
De-compressor Example..
• During De-compression a code is read and an attempt is made to find this code in the code book.
• There are two cases:
  – The code is found in the code book.
  – The code is not found in the code book.
• Code found:
  – Output the string from the found code.
  – Make an entry based on: previous string + firstChar of current string.
• Not found:
  – Make an entry into the code book based on: previous string + firstChar of previous string.
  – Output the string of the new entry.
De-compressor Example...
• Notice that a code which is not found is a special case:
• E.g. during compression of a a a b b b ….
  – a is coded to 0, but the compressor now enters aa into the code book.
  – aa is the next code to be used.
  – During de-compression, we can guess at this code.
  – Text(previous) + FC(previous).
De-compressor Example….
• More formally:
  – We encounter a string P[…]P[…]PQ.
  – If P[…] is in the code book and P[…]P is not, then the compressor outputs P[…] and adds P[…]P to the code book.
  – When the de-compressor sees P[…]P it will not have added this code yet.
  – We know from the pattern that P[…] is already in the code book and was the last code encountered, and that P[…]P would normally be added next (during compression).
  – So we can accurately guess and enter P[…]P into the code book.
• Taken from: http://www.danbbs.dk/~dino/whirlgif/lzw.html
De-compressor Example….
Codes to de-compress: 0 2 1 4 5 3 7
    Step  Code  Case       Entry made in code book                         Output text
    1     0     Found      (no entry is made for the first code)           a
    2     2     Not found  2 = a a      Text(previous) + FC(previous)      a a
    3     1     Found      3 = a a b    Text(previous) + FC(current)       b
    4     4     Not found  4 = b b      Text(previous) + FC(previous)      b b
    5     5     Not found  5 = b b b    Text(previous) + FC(previous)      b b b
    6     3     Found      6 = b b b a  Text(previous) + FC(current)       a a b
    7     7     Not found  7 = a a b a  Text(previous) + FC(previous)      a a b a
In each "Not found" case the output is the entry just made in the code book.
No more code to de-compress - STOP
Output text: a a a b b b b b b a a b a a b a
Links
http://www.cs.sfu.ca/cs/CC/365/li/squeeze/
Squeeze Page - Applets dealing with compression Algorithms
http://www.geocities.com/yccheok/lzw/lzw.html
Finite State Machine for KMP Pattern 1010110
[Figure: state diagram with states 0 through 7 and transitions labelled 0 and 1]
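A finite state machine of this kind can be built mechanically from the pattern. The sketch below uses the textbook DFA construction for KMP over the binary alphabet {0, 1}; it is an assumption that this matches the construction behind the figure. The class and method names are hypothetical, and the pattern must be non-empty.

```java
class KmpDfaSketch {
    // dfa[c][j] is the next state when bit c is read in state j.
    // State j means "the first j pattern characters have been matched".
    static int[][] build(String pattern) {
        int m = pattern.length();
        int[][] dfa = new int[2][m];
        dfa[pattern.charAt(0) - '0'][0] = 1;
        // x is the restart state: where the machine would be after reading
        // pattern[1..j-1] from state 0.
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < 2; c++) {
                dfa[c][j] = dfa[c][x];            // mismatch: behave like the restart state
            }
            int bit = pattern.charAt(j) - '0';
            dfa[bit][j] = j + 1;                  // match: advance
            x = dfa[bit][x];                      // update the restart state
        }
        return dfa;
    }

    // Returns the index of the first occurrence of pattern in text, or -1.
    static int search(String text, String pattern) {
        int[][] dfa = build(pattern);
        int m = pattern.length();
        int state = 0;
        for (int i = 0; i < text.length(); i++) {
            state = dfa[text.charAt(i) - '0'][state];
            if (state == m) {
                return i - m + 1;                 // reached the accepting state
            }
        }
        return -1;
    }
}
```

The text is never re-read: each input bit causes exactly one state transition, and reaching state 7 for the pattern 1010110 signals a match.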