http://proglit.com/
bits and text
BY SA
byte (the size of a cell of addressable memory)
8 bits on...
terabyte (10^12 bytes or 2^40 bytes)
petabyte (10^15 bytes or 2^50 bytes)
exabyte (10^18 bytes or 2^60 bytes)
zettabyte (10^21 bytes or 2^70 bytes)
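The decimal and binary figures for each unit are close but not equal. A minimal sketch that computes both, assuming a hypothetical `UNITS` table and `unit_sizes` helper that are not part of the slides:

```python
# Exponents for each unit: (power of ten, power of two).
# UNITS and unit_sizes are illustrative names, not from the slides.
UNITS = {
    "terabyte": (12, 40),
    "petabyte": (15, 50),
    "exabyte": (18, 60),
    "zettabyte": (21, 70),
}

def unit_sizes(name):
    """Return (decimal_bytes, binary_bytes) for a unit name."""
    dec_exp, bin_exp = UNITS[name]
    return 10 ** dec_exp, 2 ** bin_exp

for name in UNITS:
    decimal, binary = unit_sizes(name)
    # The binary size is always the larger of the two.
    print(f"{name}: {decimal} vs {binary} (ratio {binary / decimal:.4f})")
```

The gap between the two readings grows with each step up the list, which is why the distinction matters more for large units.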
character set (a mapping of characters to numbers)
ASCII (American Standard Code for Information Interchange): 128 characters
control character (signals an action response to the reader)
• LF (line feed)
• CR (carriage return)
• FF (form feed)
• BEL (bell)
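Each of these control characters is just a small number in the ASCII character set. A short sketch that looks the numbers up, using Python's standard escape sequences (the `CONTROL_CHARS` table and `control_codes` helper are illustrative names):

```python
# Map each control character's abbreviation to its Python escape sequence.
CONTROL_CHARS = {
    "BEL": "\a",  # bell
    "LF": "\n",   # line feed
    "FF": "\f",   # form feed
    "CR": "\r",   # carriage return
}

def control_codes():
    """Return a dict of abbreviation -> ASCII code number."""
    return {name: ord(ch) for name, ch in CONTROL_CHARS.items()}

for name, code in control_codes().items():
    print(f"{name}: {code} (0x{code:02X})")
```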
plain text (no formatting, only characters)
• no italics, underline, or bold
• no fonts, font sizes, or colors
• no margins, columns, or page breaks, etc.
character (a unit of written language and notation)
glyph (an actual visual representation of a character)
j j
character encoding (scheme for representing characters as bits)
ASCII = 1 byte per character
c    a    t
99   97   116
0x63 0x61 0x74
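The byte values for a piece of ASCII text can be checked directly. A minimal sketch using Python's built-in `ascii` codec (the helper name `ascii_codes` is an assumption for illustration):

```python
def ascii_codes(text):
    """Return the ASCII byte values of text, one per character."""
    # str.encode raises UnicodeEncodeError for characters outside ASCII.
    return list(text.encode("ascii"))

codes = ascii_codes("cat")
print(codes)                               # decimal values
print(" ".join(f"0x{b:02X}" for b in codes))  # hexadecimal values
```

Because ASCII uses one byte per character, the list always has the same length as the input string.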
U+0000 – U+FFFF: plane 0, BMP (Basic Multilingual Plane)
U+10000 – U+1FFFF: plane 1, SMP (Supplementary Multilingual Plane)
U+20000 – U+2FFFF: plane 2, SIP (Supplementary Ideographic Plane)
U+30000 – U+DFFFF: planes 3 to 13, currently unassigned
U+E0000 – U+EFFFF: plane 14, SSP (Supplementary Special-purpose Plane)
U+F0000 – U+FFFFF: plane 15, PUA (Private Use Area)
U+100000 – U+10FFFF: plane 16, PUA (Private Use Area)
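Since each plane spans 0x10000 code points, the plane number is simply the code point with its low 16 bits dropped. A small sketch (`plane_of` is a hypothetical helper name):

```python
def plane_of(code_point):
    """Return the Unicode plane (0 to 16) containing code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range U+0000 to U+10FFFF")
    # Each plane holds 2**16 code points, so shift off the low 16 bits.
    return code_point >> 16

for cp in (0x0065, 0x17711, 0x3F010, 0x10FF00):
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")
```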
UTF-32 (4 bytes per character)
U+3FF01   0000_0000 0000_0011 1111_1111 0000_0001   00 03 FF 01
U+40077   0000_0000 0000_0100 0000_0000 0111_0111   00 04 00 77
U+0065    0000_0000 0000_0000 0000_0000 0110_0101   00 00 00 65
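UTF-32 stores the code point itself as a 4-byte integer. A minimal sketch, assuming big-endian byte order as the byte listings above appear to use (`utf32_be` is an illustrative name), cross-checked against Python's built-in `utf-32-be` codec:

```python
def utf32_be(code_point):
    """Encode one code point as 4 big-endian bytes (UTF-32BE)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range")
    return code_point.to_bytes(4, "big")

for cp in (0x3FF01, 0x40077, 0x0065):
    encoded = utf32_be(cp)
    # The built-in codec should agree with the manual encoding.
    assert encoded == chr(cp).encode("utf-32-be")
    print(f"U+{cp:04X}:", " ".join(f"{b:02X}" for b in encoded))
```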
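UTF-16 (2 or 4 bytes per character)
4-byte form: 1101_10pp ppxx_xxxx 1101_11xx xxxx_xxxx
(1101_10 and 1101_11 are fixed; pppp is the plane number minus 1; the x bits are the low 16 bits of the code point)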
U+3F010    1101_1000 1011_1100 1101_1100 0001_0000   D8 BC DC 10
U+10FF00   1101_1011 1111_1111 1101_1111 0000_0000   DB FF DF 00
U+17711    1101_1000 0001_1101 1101_1111 0001_0001   D8 1D DF 11
surrogates: U+D800 to U+DFFF
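For code points beyond plane 0, the 4-byte form can be derived by subtracting 0x10000 and splitting the remaining 20 bits across two surrogates. A sketch of that calculation (`utf16_surrogates` is a hypothetical helper name):

```python
def utf16_surrogates(code_point):
    """Return the (high, low) surrogate pair for a supplementary code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only planes 1 to 16 use surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 | (offset >> 10)         # 1101_10 + top 10 bits
    low = 0xDC00 | (offset & 0x3FF)        # 1101_11 + bottom 10 bits
    return high, low

for cp in (0x3F010, 0x10FF00, 0x17711):
    high, low = utf16_surrogates(cp)
    # Cross-check against Python's built-in big-endian UTF-16 codec.
    assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    print(f"U+{cp:04X}: {high:04X} {low:04X}")
```

Both results always fall in the surrogate range, which is why those code points are reserved and never assigned to characters.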
UTF-8 (1 to 4 bytes per character)
U+0000 – U+007F:     0xxx_xxxx
U+0080 – U+07FF:     110x_xxxx 10xx_xxxx
U+0800 – U+FFFF:     1110_xxxx 10xx_xxxx 10xx_xxxx
U+10000 – U+10FFFF:  1111_0xxx 10xx_xxxx 10xx_xxxx 10xx_xxxx
U+0031:   0011_0001
U+0700:   1101_1100 1000_0000
U+86FF:   1110_1000 1001_1011 1011_1111
U+50000:  1111_0001 1001_0000 1000_0000 1000_0000
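The four bit patterns translate fairly directly into code: pick the row by range, then fill the x bits from the code point, six at a time from the right. A minimal sketch (`utf8_encode` is an illustrative name, not a library function):

```python
def utf8_encode(cp):
    """Encode one code point as UTF-8 bytes using the four bit patterns."""
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogates are not encodable in UTF-8")
    if cp < 0:
        raise ValueError("negative code point")
    if cp <= 0x7F:        # 0xxx_xxxx
        return bytes([cp])
    if cp <= 0x7FF:       # 110x_xxxx 10xx_xxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 1110_xxxx 10xx_xxxx 10xx_xxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:    # 1111_0xxx 10xx_xxxx 10xx_xxxx 10xx_xxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("outside the Unicode range")

for cp in (0x0031, 0x0700, 0x86FF, 0x50000):
    encoded = utf8_encode(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with the built-in codec
    print(f"U+{cp:04X}:", " ".join(f"{b:08b}" for b in encoded))
```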
A code point must use the shortest form that fits it; a longer ("overlong") encoding of the same value is invalid:
U+0031: (valid)    0011_0001
U+0031: (invalid)  1111_0000 1000_0000 1000_0000 1011_0001
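A conforming decoder rejects the 4-byte form. A short sketch that checks both byte sequences with Python's built-in strict `utf-8` codec (`is_valid_utf8` is a hypothetical helper name):

```python
def is_valid_utf8(data):
    """Return True if data decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

shortest = bytes([0b0011_0001])
overlong = bytes([0b1111_0000, 0b1000_0000, 0b1000_0000, 0b1011_0001])

print(is_valid_utf8(shortest))  # the 1-byte form is accepted
print(is_valid_utf8(overlong))  # the padded 4-byte form is rejected
```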