1
Aaron Stevens
16 February 2011
CS101 Lecture 12: Text Representation
2
What You’ll Learn Today
– How do computers store text information?
– Why do some characters show up as �s on my browser?
3
Binary Representations
Recall: a single bit can be either a 0 or a 1
What if you need to represent more than 2 choices?
n bits can represent 2^n possible combinations.
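As a quick check of the 2^n rule, the sketch below lists every bit pattern of a given length and counts them. The function name `bit_patterns` is made up for this illustration and is not part of the lecture.

```python
from itertools import product

def bit_patterns(n):
    """Return every n-bit pattern as a string, e.g. '010'."""
    return ["".join(bits) for bits in product("01", repeat=n)]

for n in (1, 2, 3, 7, 8):
    patterns = bit_patterns(n)
    # The number of patterns always equals 2**n.
    print(n, "bits ->", len(patterns), "patterns")
```

With 7 bits this gives 128 patterns, and with 8 bits it gives 256.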
4
Representing Text
There are a finite number of characters to represent, so list them all and assign each a binary pattern.
Character set
A list of characters and the binary codes used to represent each one.
Computer manufacturers agreed to standardize in the early 1960s.
5
The ASCII Character Set
ASCII stands for American Standard Code for Information Interchange.
ASCII originally used seven bits to represent each character, allowing for 128 unique characters.
Later, extended ASCII evolved so that all eight bits were used.
6
The ASCII Character Set (7 bits)
7
ASCII Encoding
Example: Hello, world!
H -> 72 -> 01001000
e -> 101 -> 01100101
l -> 108 -> 01101100
l -> 108 -> 01101100
o -> 111 -> 01101111
, -> 44 -> 00101100
(space) -> 32 -> 00100000
w -> 119 -> 01110111
o -> 111 -> 01101111
r -> 114 -> 01110010
l -> 108 -> 01101100
d -> 100 -> 01100100
! -> 33 -> 00100001
Encoding Algorithm:
For each character:
  Find its ASCII code.
  Convert to binary.
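The encoding algorithm can be sketched in a few lines of Python. This is only an illustration: the function name `ascii_encode` is invented here, and the built-in `ord` stands in for looking up a character in the ASCII table.

```python
def ascii_encode(text):
    """Encode each character as its 8-bit ASCII pattern."""
    codes = []
    for ch in text:
        code = ord(ch)  # find the character's numeric code
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        codes.append(format(code, "08b"))  # convert to 8 binary digits
    return codes

message = "Hello, world!"
for ch, bits in zip(message, ascii_encode(message)):
    print(repr(ch), "->", ord(ch), "->", bits)
```

The leading 0 in each pattern appears because 7-bit ASCII codes are stored here in a full 8-bit byte.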
8
ASCII Decoding
01000010 01100101 00100000 01110100 01110010 01110101
01100101 00100000 01110100 01101111 00100000 01111001
01101111 01110101 01110010 00100000 01110011 01100011
01101000 01101111 01101111 01101100

01000010 -> 0x42 -> B
01100101 -> 0x65 -> e
Decoding Algorithm:
For each 8 bits:
  Convert to a hex/decimal value.
  Look up the ASCII symbol.
9
ASCII Decoding
01000010 -> 0x42 -> B
01100101 -> 0x65 -> e
00100000 -> 0x20 -> (space)
01110100 -> 0x74 -> t
01110010 -> 0x72 -> r
01110101 -> 0x75 -> u
01100101 -> 0x65 -> e
00100000 -> 0x20 -> (space)
01110100 -> 0x74 -> t
01101111 -> 0x6f -> o
00100000 -> 0x20 -> (space)
01111001 -> 0x79 -> y
01101111 -> 0x6f -> o
01110101 -> 0x75 -> u
01110010 -> 0x72 -> r
00100000 -> 0x20 -> (space)
01110011 -> 0x73 -> s
01100011 -> 0x63 -> c
01101000 -> 0x68 -> h
01101111 -> 0x6f -> o
01101111 -> 0x6f -> o
01101100 -> 0x6c -> l
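The decoding steps (take 8 bits at a time, convert to a number, look up the symbol) can be sketched in Python as follows. The function name `ascii_decode` is invented for this illustration, and the built-in `chr` stands in for the table lookup.

```python
def ascii_decode(bits):
    """Decode a string of 0s and 1s, 8 bits per character."""
    bits = "".join(bits.split())  # ignore spaces between bytes
    if len(bits) % 8 != 0:
        raise ValueError("expected a multiple of 8 bits")
    chars = []
    for i in range(0, len(bits), 8):
        value = int(bits[i:i + 8], 2)  # binary -> number
        chars.append(chr(value))       # number -> symbol
    return "".join(chars)

print(ascii_decode("01000010 01100101"))  # prints "Be"
```

Passing the full bit sequence from the slide to this function yields the whole decoded message.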
10
The Extended ASCII Character Set
11
ASCII Art
12
Can't You Take a Joke? :-)
Carnegie Mellon professor Scott E. Fahlman proposed ASCII emoticons, Sept. 19, 1982.
Source: http://www.wired.com/science/discoveries/news/2008/09/dayintech_0919
13
The Unicode Character Set
Extended ASCII is not enough for international use.
Unicode uses 16 bits per character (as originally designed; the current standard defines code points beyond 16 bits).
How many characters can Unicode represent?
Unicode is a superset of ASCII.
The first 256 characters correspond exactly to the Latin-1 (ISO 8859-1) extended ASCII character set.
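The counting question can be checked with one line of arithmetic, and Python's built-in `ord` and `chr`, which work with Unicode code values, make the superset claim easy to see. This is a small sketch under the 16-bit model described on this slide; the helper name `to_16_bits` is invented for the example.

```python
# 16 bits give 2**16 distinct patterns.
print(2 ** 16)  # 65536

def to_16_bits(ch):
    """Return a character's code as a 16-digit binary string."""
    code = ord(ch)
    if code >= 2 ** 16:
        raise ValueError("does not fit in 16 bits")
    return format(code, "016b")

# ASCII characters keep the same code values, padded with leading zeros.
print("H", ord("H"), to_16_bits("H"))

# A character in the 128-255 range (Latin small letter e with acute).
print(chr(233), 233, to_16_bits(chr(233)))

# A character beyond the first 256 (Greek capital omega).
print(chr(0x03A9), 0x03A9, to_16_bits(chr(0x03A9)))
```

The last 8 digits of the pattern for "H" match its ASCII encoding, 01001000.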
14
The Unicode Character Set
15
Unicode Character Distribution
16
17
What You Learned Today
More encoding!
– Character Sets
– ASCII
– Unicode
18
Announcements and To Do List
– HW04 due Wednesday 2/16
– Readings:
  • Reed ch 5, pp 83-87, 89-90 (today)
– Quiz 2 on Friday 2/18
  • Covers lectures 6, 7, 9, 10, 11