26 April 2001 Surrogate Support in Microsoft Products, IUC 18 (Hong Kong)
Surrogate Support in Microsoft Products
Michael S. Kaplan
Software Design Engineer
Trigeminal Software, Inc.
What are surrogates?
"a coded character representation for a single abstract character that consists of a sequence of two code units, where the first unit of the pair is a high surrogate and the second is a low surrogate"
High/low surrogate?
High: U+D800 - U+DBFF
Low: U+DC00 - U+DFFF
Terminology:
– "surrogate pair" preferred over "surrogate character"
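The two ranges above translate directly into range checks on 16-bit code units. The following is a minimal sketch; the function names are illustrative and do not come from any Microsoft API.

```python
# Surrogate ranges as listed on the slide.
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def is_high_surrogate(unit: int) -> bool:
    """True if the 16-bit code unit is in the high-surrogate range."""
    return HIGH_START <= unit <= HIGH_END


def is_low_surrogate(unit: int) -> bool:
    """True if the 16-bit code unit is in the low-surrogate range."""
    return LOW_START <= unit <= LOW_END


def is_surrogate_pair(first: int, second: int) -> bool:
    """A pair is well formed only as high surrogate followed by low surrogate."""
    return is_high_surrogate(first) and is_low_surrogate(second)
```

Note that order matters: a low surrogate followed by a high surrogate is not a pair.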
Conversion example #1
Example #1:
– The first character in the Surrogate range (D800, DC00) as UTF-32:
1. D800: binary 1101100000000000 (lower ten bits: 0000000000)
2. DC00: binary 1101110000000000 (lower ten bits: 0000000000)
3. Concatenate 0000000000+0000000000 = x00000 (20 bits)
4. Add x10000
Result: U+10000. This makes sense, since the first character in the Surrogate range follows immediately after the last character in the 16-bit Unicode range (U+FFFF).
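The four steps above can be sketched as a small function. This is an illustration of the arithmetic only; the name `pair_to_code_point` is an assumption, not an existing API.

```python
def pair_to_code_point(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    high_bits = high & 0x3FF   # lower ten bits of the high surrogate
    low_bits = low & 0x3FF     # lower ten bits of the low surrogate
    # Concatenate the two ten-bit pieces, then add x10000.
    return ((high_bits << 10) | low_bits) + 0x10000


print(hex(pair_to_code_point(0xD800, 0xDC00)))  # 0x10000
```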
Conversion example #2
Example #2:
– You have a Unicode character such as U+2040A (a CJK character in Plane 2) and wish to encode it in UTF-16
1. Subtract x10000 - Result: x1040A
2. Split into two ten-bit pieces: 0001000001 0000001010
3. Add 1101100000000000 (D800) to the high ten-bit piece (0001000001) - Result: 1101100001000001 (D841)
4. Add 1101110000000000 (DC00) to the low ten-bit piece (0000001010) - Result: 1101110000001010 (DC0A)
Your surrogate pair: D841, DC0A
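The reverse direction follows the same steps in code. This is a minimal sketch under the same assumptions as the slide; `code_point_to_pair` is an illustrative name, and the final line cross-checks the result against Python's built-in UTF-16 encoder.

```python
from typing import Tuple


def code_point_to_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    offset = code_point - 0x10000          # step 1: subtract x10000
    high = 0xD800 + (offset >> 10)         # steps 2-3: high ten bits + D800
    low = 0xDC00 + (offset & 0x3FF)        # steps 2, 4: low ten bits + DC00
    return high, low


high, low = code_point_to_pair(0x2040A)
print(f"{high:04X}, {low:04X}")  # D841, DC0A

# Cross-check against the built-in big-endian UTF-16 encoder.
assert "\U0002040A".encode("utf-16-be") == bytes([0xD8, 0x41, 0xDC, 0x0A])
```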
UTF-8 conversions
Illegal conversions: six-byte UTF-8 (two surrogate code points of UTF-16, converted separately)
Legal conversions: four-byte UTF-8 (one UTF-32 code point)
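As a quick illustration of the difference, a strict UTF-8 decoder is expected to reject the six-byte form. The sketch below uses Python's built-in decoder and U+2040A (surrogate pair D841, DC0A) as sample data; `is_legal_utf8` is a hypothetical helper name.

```python
# U+2040A encoded two ways.
six_byte_form = bytes([0xED, 0xA1, 0x81, 0xED, 0xB0, 0x8A])  # each surrogate converted separately
four_byte_form = bytes([0xF0, 0xA0, 0x90, 0x8A])             # one code point, four bytes


def is_legal_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_legal_utf8(six_byte_form))   # False
print(is_legal_utf8(four_byte_form))  # True
```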
UTF-8 example
A Unicode surrogate pair:
aaaabbbbbbcccccc, zzzzyyyyyyxxxxxx
becomes incorrect UTF-8 totaling 6 bytes when each code unit is converted separately:
1110aaaa 10bbbbbb 10cccccc 1110zzzz 10yyyyyy 10xxxxxx
Instead, you should take a Unicode surrogate pair:
110110wwwwzzzzyy, 110111yyyyxxxxxx
and convert it to UTF-8 totaling 4 bytes (below, uuuuu is defined as wwww+1):
11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
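The four-byte bit layout can be followed field by field in code. This is a sketch only: `pair_to_utf8` is an illustrative name, the variable names mirror the letters in the bit patterns, and the function assumes its input is already a valid high/low pair.

```python
def pair_to_utf8(high: int, low: int) -> bytes:
    """Convert a surrogate pair to four-byte UTF-8 using the bit layout above."""
    wwww = (high >> 6) & 0xF        # 110110wwwwzzzzyy
    zzzz = (high >> 2) & 0xF
    yy = high & 0x3                 # top two bits of yyyyyy
    yyyy = (low >> 6) & 0xF         # 110111yyyyxxxxxx: bottom four bits of yyyyyy
    xxxxxx = low & 0x3F
    uuuuu = wwww + 1                # as defined on the slide
    yyyyyy = (yy << 4) | yyyy
    return bytes([
        0xF0 | (uuuuu >> 2),                  # 11110uuu
        0x80 | ((uuuuu & 0x3) << 4) | zzzz,   # 10uuzzzz
        0x80 | yyyyyy,                        # 10yyyyyy
        0x80 | xxxxxx,                        # 10xxxxxx
    ])


print(pair_to_utf8(0xD841, 0xDC0A).hex())  # f0a0908a
```

The result for D841, DC0A matches what Python's own encoder produces for U+2040A.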
Encoding choices for MS
UTF-16, mostly
Occasionally UTF-8
Even more occasionally, UTF-32
REASONS:
There was obviously an existing, well-tested set of APIs that support UCS-2, which is a total subset of UTF-16.
A completely new API set was not required.
A move to UTF-32 would require twice as much space for all characters.
A move to UTF-8 would require even more than twice as much space in many cases.
The products...
Mostly the new generation of products:
– Windows 2000/XP
– Office XP (some support in Office 2000)
Most of these products supported Unicode already
– a little bit of extra work needed for surrogate pairs
– usually just UTF-8 support needed
Windows 2000/XP
Uniscribe/GDI+ support for rendering
Each surrogate pair is a single grapheme
APIs like CharPrev/CharNext not changed
Extensions to fallback fonts in XP
Font CMAP extensions in XP
Lots of UTF-8 issues fixed in XP
No specific surrogate font/IME (yet)
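Since each surrogate pair is treated as a single grapheme while CharPrev/CharNext are unchanged, code that walks a UTF-16 buffer itself may need to step over pairs as a unit. The sketch below shows one way that could look; `next_index` and `prev_index` are hypothetical helpers operating on a list of 16-bit code units and are not Windows APIs.

```python
from typing import Sequence


def next_index(units: Sequence[int], i: int) -> int:
    """Index of the next character boundary after position i."""
    if (i + 1 < len(units)
            and 0xD800 <= units[i] <= 0xDBFF
            and 0xDC00 <= units[i + 1] <= 0xDFFF):
        return i + 2          # skip both halves of the pair
    return i + 1


def prev_index(units: Sequence[int], i: int) -> int:
    """Index of the character boundary before position i."""
    if (i >= 2
            and 0xDC00 <= units[i - 1] <= 0xDFFF
            and 0xD800 <= units[i - 2] <= 0xDBFF):
        return i - 2          # step back over the whole pair
    return max(i - 1, 0)


# "A", U+2040A (as D841, DC0A), "B"
units = [0x0041, 0xD841, 0xDC0A, 0x0042]
print(next_index(units, 1))  # 3
print(prev_index(units, 3))  # 1
```

Stepping by one code unit from index 1 would land between the two halves of the pair, which is the situation these helpers avoid.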
Collation for supplementary characters
All Plane 1 (non-ideographic) characters sort after all the other non-ideographic scripts but before the ideographs.
All Plane 2 (ideographic) characters will be sorted after all the ideographs on the BMP.
All Plane 3-14 characters (currently not assigned) will be treated like any other unassigned characters. This includes the Plane 14 language tags.
All characters encoded in Plane 15-16 (private use) will be sorted after all other characters.
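These rules are keyed on the plane number, which is the code point shifted right by 16 bits. The sketch below only restates the slide's rules as a lookup; it is not the actual Windows collation logic, and both function names are assumptions.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane number (0-16) of a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    return code_point >> 16


def collation_rule(code_point: int) -> str:
    """Describe where a code point sorts, per the rules listed above."""
    plane = plane_of(code_point)
    if plane == 0:
        return "BMP: existing ordering"
    if plane == 1:
        return "after other non-ideographic scripts, before ideographs"
    if plane == 2:
        return "after all BMP ideographs"
    if plane <= 14:
        return "treated as unassigned"
    return "private use: after all other characters"


print(plane_of(0x2040A))        # 2
print(collation_rule(0x2040A))  # after all BMP ideographs
```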
Other system components
MLang
Internet Explorer
IIS 5.0/6.0
The downlevel story
No good support for Unicode, let alone supplementary characters
Uniscribe/RichEdit does improve the downlevel story for display purposes, at least
Officially, no surrogate support on Win9x
The Office suite
Word
FrontPage
Excel/Access
Outlook
RichEdit 4.0
Specific Features
Insertion/Deletion of text - All
Cursor movement - All
Font linking/fallback - All (Word's is best)
UTF-8 issues fixed - All
Enhanced word breaking - All (Word/RichEdit)
Vertical text - Word/PowerPoint/Publisher/RichEdit
Direct entry (Alt+nnnnnn, hhhhh + Alt+x) - Word/RichEdit
CHS/CHT/CHP Office
The product and the langpacks support an extended Unicode IME that handles supplementary characters
An Extension B font is also included
Visual Studio[.NET]
String class and globalization namespace
StringInfo
GetTextElementEnumerator
– Handles supplementary characters
– Also handles composite characters
GDI+
IDE support
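To illustrate what enumerating text elements means, the sketch below groups each base character with the combining marks that follow it. It is a rough Python analogue for illustration only, not the .NET implementation: the `text_elements` name is an assumption, the grouping rule (non-zero canonical combining class) is a simplification, and Python strings already store supplementary characters as single code points.

```python
import unicodedata
from typing import Iterator


def text_elements(text: str) -> Iterator[str]:
    """Yield each base character together with any combining marks after it."""
    element = ""
    for ch in text:
        if element and unicodedata.combining(ch) == 0:
            # A new base character starts; emit the element collected so far.
            yield element
            element = ch
        else:
            element += ch
    if element:
        yield element


# "e" + combining acute accent, a supplementary CJK character, and "x"
sample = "e\u0301\U0002040Ax"
print(len(list(text_elements(sample))))  # 3
```

Here the sample string has four code points (and five UTF-16 code units) but three text elements.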
SQL Server
Past - no support
Present - surrogate "safe" (neutral)
Future - surrogate aware
Items not supported
Character Map
Graph 10
Outlook 10 mail headers
Collations for supplementary characters
Fonts/IMEs
Questions?