Kodar-lan - Text and editors: Character encoding and UTF-8
Lecture - Character encoding UTF-8 (Tim Gremalm)
Tim Gremalm, Developer at Conmel Data AB
Programming, Web, Databases, Windows/Linux, Servers, Network, TCP/IP, Robotics, Electronics
tim.gremalm.se
åä
Character encoding: ASCII
● 7-bit: values 0-127 (128 characters)
● Too small: no room for letters such as å, ä and ö
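A quick way to see the 7-bit limit is to let Python's built-in ASCII codec try. This is a minimal sketch assuming Python 3; the helper names `ascii_codes` and `fits_in_ascii` are hypothetical, made up for illustration. Plain English text becomes one byte per character, all below 128, while å and ä are rejected.

```python
# Minimal sketch (Python 3). The helper names below are hypothetical.

def ascii_codes(text: str) -> list:
    """Encode `text` as ASCII and return the byte values (each 0-127)."""
    return list(text.encode("ascii"))


def fits_in_ascii(text: str) -> bool:
    """Return True if every character in `text` has an ASCII code."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(ascii_codes("Kodar-lan"))   # [75, 111, 100, 97, 114, 45, 108, 97, 110]
print(fits_in_ascii("Kodar-lan")) # True
print(fits_in_ascii("åä"))        # False: å and ä have no ASCII code
```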
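Character encoding: ISO 8859-1
● 8-bit: values 0-255 (256 characters), where 0-127 are the same as ASCII
● Still too small: adds Western European letters such as å, ä and ö, but no Greek, Cyrillic or Chinese characters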
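The same experiment with ISO 8859-1 (Python accepts the codec names `iso-8859-1` and `latin-1`) shows that å and ä now fit in one byte each, with values above 127, while Greek letters still do not. Again a minimal sketch; `latin1_codes` and `fits_in_latin1` are hypothetical helper names.

```python
# Minimal sketch (Python 3). The helper names below are hypothetical.

def latin1_codes(text: str) -> list:
    """Encode `text` as ISO 8859-1: exactly one byte (0-255) per character."""
    return list(text.encode("iso-8859-1"))


def fits_in_latin1(text: str) -> bool:
    """Return True if every character in `text` exists in ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


print(latin1_codes("Kod"))    # [75, 111, 100]: same values as ASCII
print(latin1_codes("åä"))     # [229, 228], i.e. 0xE5 and 0xE4, above 127
print(fits_in_latin1("αβ"))   # False: Greek letters are not included
```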
Character encoding: UTF-8
● 8-bit code units, with a variable number of bytes per character
● Backward compatible with ASCII: characters 0-127 are encoded as the same single byte
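Both properties can be checked with Python's `str.encode`. A minimal sketch assuming Python 3; `utf8_lengths` is a hypothetical helper name. ASCII text produces identical bytes in UTF-8, and other characters take 2, 3 or 4 bytes.

```python
# Minimal sketch (Python 3). `utf8_lengths` is a hypothetical helper name.

def utf8_lengths(text: str) -> list:
    """Return how many UTF-8 bytes each character of `text` needs."""
    return [len(ch.encode("utf-8")) for ch in text]


# Backward compatibility: pure ASCII text gives byte-for-byte identical output.
print("Kodar-lan".encode("utf-8") == "Kodar-lan".encode("ascii"))  # True

# Variable length: "a", "å", "€" (U+20AC) and an emoji (U+1F600).
print(utf8_lengths("a\u00e5\u20ac\U0001F600"))  # [1, 2, 3, 4]
```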
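
| Bits of code point | First code point | Last code point | Bytes in sequence | Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 |
|---|---|---|---|---|---|---|---|---|---|
| 7 | U+0000 | U+007F | 1 | 0xxxxxxx | | | | | |
| 11 | U+0080 | U+07FF | 2 | 110xxxxx | 10xxxxxx | | | | |
| 16 | U+0800 | U+FFFF | 3 | 1110xxxx | 10xxxxxx | 10xxxxxx | | | |
| 21 | U+10000 | U+1FFFFF | 4 | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx | | |
| 26 | U+200000 | U+3FFFFFF | 5 | 111110xx | 10xxxxxx | 10xxxxxx | 10xxxxxx | 10xxxxxx | |
| 31 | U+4000000 | U+7FFFFFFF | 6 | 1111110x | 10xxxxxx | 10xxxxxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |

Note: since RFC 3629 (2003), UTF-8 is limited to code points U+0000-U+10FFFF, so valid UTF-8 uses at most 4 bytes per character. The 5- and 6-byte rows show the original, wider design.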
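The bit patterns in the table can be turned into an encoder by shifting the code point's bits into the `x` positions. This is a hand-written teaching sketch, not how you would encode text in practice (use the language's built-in UTF-8 support). It follows the RFC 3629 limits, so it only produces the 1- to 4-byte forms. `utf8_encode_code_point` and `utf8_encode` are hypothetical names.

```python
# Hand-written UTF-8 encoder (Python 3). Function names are hypothetical.

def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one code point using the bit layout from the table.

    Only the 1- to 4-byte forms are produced: modern UTF-8 (RFC 3629)
    stops at U+10FFFF and forbids the surrogates U+D800-U+DFFF.
    """
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: U+{cp:04X}")
    if cp <= 0x7F:                      # 7 bits:  0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                     # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                    # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_encode(text: str) -> bytes:
    """Encode a whole string by concatenating the per-character sequences."""
    return b"".join(utf8_encode_code_point(ord(ch)) for ch in text)


print(utf8_encode("åä").hex(" "))                 # c3 a5 c3 a4
print(utf8_encode("åä") == "åä".encode("utf-8"))  # True
```

Reading the table in the other direction gives a decoder: the number of leading 1 bits in byte 1 tells how many bytes the sequence has, and every continuation byte starts with 10.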
Jumping between character encodings
● If you take “åä” encoded in UTF-8 and read the bytes as ISO 8859-1, it shows up as “Ã¥Ã¤”
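The effect can be reproduced by encoding with UTF-8 and decoding the resulting bytes with ISO 8859-1: each of the two bytes per letter becomes its own character. A minimal Python sketch with hypothetical function names; it also shows that the damage can be undone as long as the garbled characters have not been changed further.

```python
# Minimal sketch (Python 3). The function names are hypothetical.

def misread_as_latin1(text: str) -> str:
    """Encode `text` as UTF-8, then (wrongly) decode the bytes as ISO 8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair_mojibake(garbled: str) -> str:
    """Undo misread_as_latin1: recover the original bytes, decode as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


garbled = misread_as_latin1("åä")
print(garbled)                    # Ã¥Ã¤  (bytes C3 A5 C3 A4 read one by one)
print(repair_mojibake(garbled))   # åä
```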
libiconv
● Converts text between different character encodings; the iconv command-line tool does the same for files
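In C, the library is used through `iconv_open()`, `iconv()` and `iconv_close()`, and the `iconv` command converts files, for example `iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt`. The sketch below does not call libiconv; it uses Python's built-in codecs to show the same idea: decode the bytes from the source encoding, then encode the text into the target encoding. `convert` is a hypothetical name.

```python
# Minimal sketch (Python 3). This does NOT call libiconv; `convert` is a
# hypothetical helper that mimics what an iconv conversion does.

def convert(data: bytes, from_code: str, to_code: str) -> bytes:
    """Re-encode `data` from encoding `from_code` to encoding `to_code`.

    Decodes the input bytes to Unicode text, then encodes that text again.
    Raises UnicodeDecodeError if `data` is not valid in `from_code`, and
    UnicodeEncodeError if a character has no representation in `to_code`.
    """
    return data.decode(from_code).encode(to_code)


latin1_bytes = b"\xe5\xe4"                             # "åä" in ISO 8859-1
utf8_bytes = convert(latin1_bytes, "iso-8859-1", "utf-8")
print(utf8_bytes)                                      # b'\xc3\xa5\xc3\xa4'
print(convert(utf8_bytes, "utf-8", "iso-8859-1") == latin1_bytes)  # True
```

Going through decoded Unicode text as a pivot means any two supported encodings can be combined, and characters missing from the target encoding are reported as errors instead of silently turning into garbage.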