Kodar-lan - Text and editors Character encoding and UTF-8


Transcript of Kodar-lan - Text and editors Character encoding and UTF-8

Page 1: Kodar-lan - Text and editors Character encoding and UTF-8

Lecture - Character encoding UTF-8 (Tim Gremalm)

Tim Gremalm

Developer at Conmel Data AB

Programming, Web, Databases, Windows/Linux, Servers, Network, TCP/IP, Robotics, Electronics

[email protected]

http://tim.gremalm.se

åä

Page 2: Kodar-lan - Text and editors Character encoding and UTF-8

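Character encoding: ASCII

● 7-bit, code values 0-127

● Too small: 128 values cover unaccented English letters, digits, punctuation and control codes, but not letters such as å and ä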
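
The 7-bit limit can be checked with a minimal Python sketch. It uses only the built-in `str.encode`; the helper name `fits_in_ascii` and the sample strings are illustrative assumptions, not part of the slides.

```python
# ASCII defines code values 0-127 only, so every ASCII byte has its top bit clear.
plain = "Kodar-lan"
ascii_bytes = plain.encode("ascii")
print(max(ascii_bytes) < 128)  # True


def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text has an ASCII code value (0-127)."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_ascii("Kodar-lan"))  # True
print(fits_in_ascii("åä"))         # False: å and ä lie outside 0-127
```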


Page 3: Kodar-lan - Text and editors Character encoding and UTF-8

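Character encoding: ISO 8859-1

● 8-bit, code values 0-255

● Still too small: 256 values add Western European letters such as å and ä, but leave out most other scripts and symbols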
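
As a rough illustration of the 8-bit range, the sketch below encodes a few strings with Python's built-in `iso-8859-1` codec. The helper `fits_in_latin1` and the euro-sign example are assumptions added for illustration.

```python
# ISO 8859-1 stores each character in one byte; å and ä fit in the 0-255 range.
latin1_bytes = "åä".encode("iso-8859-1")
print(latin1_bytes.hex())  # e5e4


def fits_in_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_latin1("åä"))  # True
print(fits_in_latin1("€"))   # False: U+20AC is beyond the 256 available values
```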


Page 4: Kodar-lan - Text and editors Character encoding and UTF-8

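Character encoding: UTF-8

● Built from 8-bit units, with a variable number of bytes per code point

● Backward compatible with ASCII: code points U+0000 to U+007F are encoded as the same single byte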
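
Both properties can be observed with Python's built-in `utf-8` codec. This is a small sketch; the sample characters and the name `byte_lengths` are assumptions chosen for illustration.

```python
# Pure ASCII text yields exactly the same bytes in ASCII and in UTF-8.
text = "Kodar-lan"
same_bytes = text.encode("ascii") == text.encode("utf-8")
print(same_bytes)  # True

# Characters outside ASCII need more than one byte each.
samples = ["a", "å", "€", "😀"]
byte_lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    byte_lengths[ch] = len(encoded)
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {hex_bytes}")
```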


Bits of code point  First code point  Last code point  Bytes in sequence  Byte 1    Byte 2    Byte 3    Byte 4    Byte 5    Byte 6
7                   U+0000            U+007F           1                  0xxxxxxx
11                  U+0080            U+07FF           2                  110xxxxx  10xxxxxx
16                  U+0800            U+FFFF           3                  1110xxxx  10xxxxxx  10xxxxxx
21                  U+10000           U+1FFFFF         4                  11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
26                  U+200000          U+3FFFFFF        5                  111110xx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
31                  U+4000000         U+7FFFFFFF       6                  1111110x  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx

The 5- and 6-byte rows belong to the original UTF-8 design. Since RFC 3629, UTF-8 is limited to code points up to U+10FFFF, so valid sequences are at most 4 bytes long.
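
The bit patterns in the table can be turned into a small hand-written encoder. The sketch below is illustrative only: the function name `utf8_encode_code_point` is hypothetical, and it covers just the 1- to 4-byte rows, up to U+10FFFF, which is the range Python's own codec accepts and therefore the range that can be cross-checked.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one code point with the 1- to 4-byte patterns from the table."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    if cp <= 0x7F:
        # 0xxxxxxx: 7 payload bits, identical to ASCII
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx: 5 + 6 = 11 payload bits
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0b111111)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx: 4 + 6 + 6 = 16 payload bits
        return bytes([0b11100000 | (cp >> 12),
                      0b10000000 | ((cp >> 6) & 0b111111),
                      0b10000000 | (cp & 0b111111)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 3 + 6 + 6 + 6 = 21 payload bits
    return bytes([0b11110000 | (cp >> 18),
                  0b10000000 | ((cp >> 12) & 0b111111),
                  0b10000000 | ((cp >> 6) & 0b111111),
                  0b10000000 | (cp & 0b111111)])


# Cross-check against Python's built-in encoder.
for ch in "aåä€😀":
    assert utf8_encode_code_point(ord(ch)) == ch.encode("utf-8")

print(utf8_encode_code_point(ord("å")).hex())  # c3a5
```

For å (U+00E5, binary 00011 100101) the two-byte row gives 110 00011 and 10 100101, that is 0xC3 0xA5.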

Page 5: Kodar-lan - Text and editors Character encoding and UTF-8

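Switching between character encodings

● If “åä” is written in UTF-8 and then read as ISO 8859-1, it will look like “Ã¥Ã¤”: each letter is two bytes in UTF-8, and each byte is shown as a separate ISO 8859-1 character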
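
This mix-up can be reproduced in a few lines of Python using the built-in codecs. The variable names are illustrative, and the repair step assumes the misread text has not been altered in between.

```python
original = "åä"

# Written as UTF-8: two bytes per letter (c3 a5 c3 a4).
utf8_bytes = original.encode("utf-8")

# Read back with the wrong encoding: every byte becomes its own character.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)  # Ã¥Ã¤

# Undo the mistake by reversing the wrong step, then decoding correctly.
repaired = misread.encode("iso-8859-1").decode("utf-8")
print(repaired)  # åä
```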


Page 6: Kodar-lan - Text and editors Character encoding and UTF-8

libiconv

● Converts text between different character encodings
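
The kind of conversion libiconv performs can be sketched in Python as a decode followed by an encode. This is not a binding to libiconv and does not show its API: the helper `convert_encoding` is hypothetical and relies on Python's own built-in codecs.

```python
def convert_encoding(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from the source encoding to the target encoding.

    Hypothetical helper: decodes the bytes to text, then encodes the text again.
    Raises UnicodeDecodeError or UnicodeEncodeError if a step is impossible.
    """
    return data.decode(source).encode(target)


latin1_data = b"\xe5\xe4"  # "åä" stored as ISO 8859-1
utf8_data = convert_encoding(latin1_data, "iso-8859-1", "utf-8")
print(utf8_data.hex())  # c3a5c3a4

# Converting back restores the original bytes.
print(convert_encoding(utf8_data, "utf-8", "iso-8859-1") == latin1_data)  # True
```

Converting towards a smaller encoding can fail: a character such as € has no ISO 8859-1 code value, so the encode step raises an error instead of silently losing data.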
