Introduction to Universal Acceptance - Teleinfo · Introduction to Universal Acceptance (UASG 007)...
Universal Acceptance Steering Group
Introduction to Universal Acceptance
Mark Svancarek and Luisa Villa
Introduction to Universal Acceptance (UASG 007)
Version 8 - 5 May 2016

About This Document

Purpose

The Internet’s technologies, including its naming components, are under continual evolution and change. In recent years, a great number of new TLDs with ASCII characters and IDN top-level domains have been released by ICANN. Examples include .nyc, .hsbc, and .eco. However, the response to the change in the naming landscape has not been fast enough. Many applications and services are not being updated to manage new TLDs. This affects the user experience. For example:

• Valid email addresses are not being accepted.
• Domain names are mistakenly treated as search terms in the address bar of the browser.

Unless software recognizes and can process the new domains, a state known as Universal Acceptance, it will not be possible to provide a consistent and positive experience for Internet users. This document, therefore, provides a broad introduction to Universal Acceptance to assist in the development of Universal Acceptance-ready software.

Target Audience

• Software Developers
• Chief Technical Officers (CTOs)
• The technical community in general

Document Structure

Part 1: Baseline concepts of Universal Acceptance such as what is a Domain Name and the Domain Name System, ASCII and Unicode, Punycode, Email Address Internationalization, and other basic concepts.
Part 2: The five criteria of Universal Acceptance as well as the good practices for each of these criteria. Also contains user scenarios and nonconformance practices to Universal Acceptance, technical requirements and current challenges.
Part 3: Advanced topics such as right-to-left scripts, the Bidi algorithm, Normalization and Case Folding.
Part 4: Contains the glossary and useful online resources.

Need more information?

The UASG and the community are available to provide advice to software developers and implementers on what is needed.

• Contact us to share your ideas and suggestions on the topic at info@uasg.tech
• Join the Universal Acceptance discussion at http://tinyurl.com/ua-discuss
• To learn more about the effort, visit http://www.icann.org/universalacceptance

Contents

Introduction
  A Brief History of Domain Name Internationalization
  The Need for Universal Acceptance
Part 1: Baseline Concepts of Universal Acceptance
  Domain Name
  Domain Name System (DNS)
  Top Level Domains (TLDs)
  Generic Top Level Domains (gTLDs)
  Character Sets and Scripts
  ASCII and Unicode
  Internationalized Domain Names (IDNs) and Punycode
  Email Addresses and Email Address Internationalization (EAI)
  Dynamic Link Generation (Linkification)
Part 2: Universal Acceptance in Action
  Five Criteria of Universal Acceptance
  User Scenarios
  Nonconformance to Universal Acceptance Practices
Technical Requirements for UA Readiness
  High level Requirements
  Developer Considerations
    A Guiding Principle for Achieving Universal Acceptance: Postel’s Law
    Good Practices for Developing and Updating Software to Achieve UA-Readiness
  Authoritative Sources for Domain Names
    DNS Root Zone
    Public Suffix List
Other Challenges
  General
  IDN-Style Email and Why It Is Not the Same as EAI
  Linkification and Its Challenges
Part 3: Advanced Topics
  Complex Scripts
    Right to Left Languages and Unicode Conformance
    The Bidi Algorithm
    The Bidi Rule for Domain Names
    Joiners
    Homoglyph and Confusingly Similar Characters
  Normalization and Case Folding
    Normalization
    Case Folding
Part 4: Glossary and Other Resources
  Glossary
  RFCs
  Key Standards
  Online Resources
Acknowledgements

Introduction

A Brief History of Domain Name Internationalization

In the 1970s, the characters available for registering domain names were limited to a subset of ASCII characters (letters a-z, digits 0-9 and the hyphen “-”). Since the earliest .com registration, symbolics.com, in 1985, the number and characteristics of domain names have expanded to reflect the needs of the ever-increasing global use of the Internet as a communal resource. Today, the majority of Internet users are non-English speakers. However, the dominant language used on the Internet is English. To help with the internationalization of the Internet, in 2003, the Internet Engineering Task Force (IETF) started releasing standards providing technical guidelines for the deployment of Internationalized Domain Names (IDN) through a translation mechanism to support non-ASCII representations of domain names in geographically diverse local scripts.

The Board of Directors of the Internet Corporation for Assigned Names and Numbers (ICANN) approved the process to introduce new IDN Country Code Top Level Domains (ccTLDs) in October 2009, with the first IDN ccTLDs added to the root zone in May 2010. In June 2011, the Board approved and authorized the launch of the new Generic Top Level Domain (gTLD) program, which included new ASCII as well as IDN TLDs. The first batch of TLDs from this program was added to the root zone in 2013. The addition of IDN ccTLDs and new TLDs has dramatically increased the pace at which TLDs are added to the root zone.

A decade after the IETF released its IDN-related guidelines, and thanks to the ICANN New TLD Program, more than one thousand new TLDs have now been released. In spite of all these efforts, however, much software and many applications are still not Universal Acceptance-ready. This causes problems for Internet users, including those whose languages are written in scripts that include non-ASCII characters.

The Need for Universal Acceptance

To keep pace with this new TLD world, new software must be built and old software and applications must be updated. The state of successfully complying with this new world of TLDs is called Universal Acceptance.

Universal Acceptance is the state where all valid domain names and email addresses are accepted, validated, stored, processed and displayed correctly and consistently by all Internet-enabled applications, devices and systems. In other words, every valid web address resolves to the expected website and every valid email address delivers mail to the expected destination. Due to the rapidly changing domain name landscape, many systems do not recognize or appropriately process new domain names, primarily because they may be in a non-ASCII format, because the software is not aware of the newly released TLD, or because the TLD is longer or shorter than the software expects. The same is true for email addresses that incorporate these new extensions.

The Universal Acceptance Steering Group (UASG), a community-led, industry-wide initiative that is supported by ICANN, is working on creating awareness of, and identifying and resolving, problems associated with Universal Acceptance of domain names, to help ensure a consistent and positive experience for Internet users globally.

Part 1: Baseline Concepts of Universal Acceptance

This section contains an overview of the basic terms and concepts necessary to understand before reading the more advanced sections of this document.

Domain Name

A domain name is a dotted text string used as a human-friendly technical identifier for computers and networks on the Internet. For example:

www.domain.tld

How to read a domain name:

• Each dot represents a level in the Domain Name System (DNS) hierarchy.
• A Top-Level Domain (TLD) is often called the suffix at the end of a domain name.
• The individual words or characters between the dots are called labels. For those languages or scripts that are written from left to right (LTR),[1] the label furthest right represents the top-level domain.
• The second label from the end represents the second-level domain.
• Any labels that come before the second-level domain are considered subdomains of the second-level domain (sometimes called third-level domains).
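
The reading rules above can be sketched in a few lines of Python. This is only an illustration for left-to-right names; the function name `describe_domain` and the dictionary keys are hypothetical, not taken from any standard or library.

```python
def describe_domain(name: str) -> dict:
    """Split a left-to-right domain name into the parts described above."""
    labels = name.rstrip(".").split(".")  # labels are the strings between dots
    return {
        "labels": labels,
        "tld": labels[-1],  # label furthest right
        "second_level": labels[-2] if len(labels) >= 2 else None,
        "subdomains": labels[:-2],  # everything before the second level
    }

info = describe_domain("www.domain.tld")
print(info["tld"], info["second_level"], info["subdomains"])
```

Note that the split makes no assumption about how long a label is or which characters it contains.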

Domain Name System (DNS)

Each resource on the Internet is assigned an address to be used by the Internet Protocol (IP). Since IP addresses are difficult to remember, the DNS provides a mapping between IP addresses and human-readable domain names. Servers collectively providing a public DNS exist at well-known addresses on the Internet.

Top Level Domains (TLDs)

Human-readable domain names are managed by organizations known as registries. When a domain name is registered, it consists of multiple text strings representing multiple domain levels, each separated by a “.” character. In LTR scripts, the right-most domain level is the top-level domain (TLD). Some TLDs are delegated to specific countries or territories. These are called Country Code TLDs (ccTLDs).

[1] Languages or scripts written from right to left (RTL) will be discussed later in this document.

Generic Top Level Domains (gTLDs)

Starting in 2013, ICANN (the organization responsible for the creation and maintenance of TLD assignments) has approved the creation of a large number of new TLDs. These new TLDs can represent brands, communities of interest, geographic communities (cities, regions) and more generic concepts. Collectively, all of these new TLDs are known as Generic Top Level Domains (gTLDs).

Character Sets and Scripts

Languages are written using writing systems. Most writing systems use one script, which is a set of graphic characters used for the written form of one or more languages. A small number of writing systems employ more than one script at the same time. These characters or scripts can be recognized by humans. However, they are not useful to computers. Instead, a computer needs a script to be encoded in a way that it can process (for example, to resolve a web address). The mechanism for this is called a character mapping or coded character set (CCS), or a code page.[2] A character mapping associates characters with specific numbers. Many different code pages have been created over time for different purposes, but for this topic we will focus on only two: ASCII and Unicode.

ASCII and Unicode

In the examples of TLDs shown in this section, all of the text strings are represented using the Latin character set. This character set is included in the American Standard Code for Information Interchange (ASCII, or US-ASCII) character-encoding scheme. ASCII is an older encoding scheme and was based on the English language. For historical reasons, it became the standard character encoding scheme on the Internet. ASCII uses only 7 bits per character, which limits the set to 128 characters, not all of which can be used in domain names. Domain names are limited to the characters A-Z, the numbers 0-9, and hyphen “-”.
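
As a minimal sketch of the restriction just described, the following Python check tests whether a label uses only ASCII letters, digits and the hyphen. The names `LDH` and `is_ldh_label` are hypothetical, and the check deliberately ignores other rules (such as length limits).

```python
import string

# Letters, digits and hyphen: the ASCII subset usable in traditional labels.
LDH = set(string.ascii_letters + string.digits + "-")

def is_ldh_label(label: str) -> bool:
    """True if the label is non-empty and uses only letters, digits, hyphen."""
    return bool(label) and all(ch in LDH for ch in label)

# Every allowed character fits in 7 bits (code value below 128).
assert max(ord(ch) for ch in LDH) < 128
```

A check like this is appropriate only for the ASCII form of a label. Used as a general validity test it would reject every internationalized name, which is exactly the kind of outdated heuristic this document warns against.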

[2] There are subtleties to the terms that are not directly relevant to the topic of Universal Acceptance. If you are interested in more information about the terminology, a useful starting point is: https://tools.ietf.org/html/rfc6365

Examples of common TLDs: .com, .gov, .info, .org
Examples of ccTLDs: China = .cn, Germany = .de, United States = .us
Examples of new gTLDs: .app, .lawyer, .shopping, .panasonic, .osaka

ASCII - ISO 8859-1 (Latin-1) Table[3]

Because most writing systems do not use the Latin character set, alternate encodings have also been adopted. Unicode, also known as the Universal Coded Character Set (UCS), is capable of encoding more than 1 million characters. Each Unicode character is identified by a number called a code point. The most common encoding form of Unicode is called Universal Coded Character Set Transformation Format 8-bit (UTF-8).

To see all Unicode character code charts, go to: http://unicode.org/charts
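
To make the distinction between a code point and its UTF-8 encoding concrete, here is a small Python sketch. The helper names `code_points` and `utf8_bytes` are hypothetical; only built-in string and bytes methods are used.

```python
def code_points(text: str) -> list[str]:
    """List the Unicode code point of each character in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

def utf8_bytes(text: str) -> str:
    """Show the UTF-8 encoding of the text as hexadecimal bytes."""
    return text.encode("utf-8").hex(" ")

print(code_points("a大"))  # ['U+0061', 'U+5927']
print(utf8_bytes("a大"))   # 61 e5 a4 a7
```

The ASCII letter occupies one byte in UTF-8 while the non-ASCII character occupies three, so character counts and byte counts must not be confused when sizing fields for domain names and email addresses.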

Internationalized Domain Names (IDNs) and Punycode

The use of Unicode enables domain names to contain non-ASCII characters. As noted earlier in this document, domain names that use non-ASCII characters are called Internationalized Domain Names (IDNs).[4] The internationalized portion of a domain name can be in any level, not just the TLD but also the other labels.

Since the DNS itself previously only used ASCII,[5] it was necessary to create an additional encoding to allow non-ASCII Unicode code points to be converted into ASCII strings, and vice versa. The algorithm that implements this Unicode-to-ASCII encoding is called Punycode; the output strings are called A-Labels. A-Labels can be distinguished from an ordinary ASCII label because they always start with the following four characters:

• xn--

These characters are called the ACE prefix.[6]

The Punycode transformation is reversible: it can transform from Unicode to an A-Label and also from an A-label back to Unicode (known as a U-Label).

The only RFC-defined[7] use of the Punycode algorithm is for expressing internationalized domains. However, rather than implement Unicode, some developers choose to apply Punycode to other scenarios.
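
The round trip between U-labels and A-labels can be tried with Python's built-in `punycode` and `idna` codecs. This is a sketch rather than a recommended implementation: the standard-library `idna` codec follows the older IDNA 2003 rules (RFC 3490), and the helper names `to_a_label` and `to_u_label` are hypothetical. The label `xn--q9jyb4c` is one of the Punycode strings used in this document's examples.

```python
ACE_PREFIX = "xn--"

def to_a_label(u_label: str) -> str:
    """Convert a single label to its ASCII-compatible (A-label) form."""
    return u_label.encode("idna").decode("ascii")

def to_u_label(a_label: str) -> str:
    """Convert an A-label back to its Unicode (U-label) form."""
    return a_label.encode("ascii").decode("idna")

a_label = to_a_label("みんな")
print(a_label)                          # xn--q9jyb4c
print(a_label.startswith(ACE_PREFIX))   # True
print(to_u_label(a_label))              # みんな

# The raw Punycode algorithm produces the part after the ACE prefix.
print("みんな".encode("punycode"))      # b'q9jyb4c'
```

Labels that are already plain ASCII pass through unchanged, so the same helper can be applied to every label of a name.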

[3] Source: California State University. 1997. ASCII - ISO 8859-1 (Latin-1) Table with HTML Entity Names. http://web.calstatela.edu/faculty/jchen13/Docs/CS120/Lectures/ASCIITable_with_HTML_Entity_Names.htm
[4] Note that not every non-ASCII character is permitted in an IDN.
[5] For current status, see http://tools.ietf.org/html/rfc6055#section-3
[6] ASCII Compatible Encoding (ACE) prefix is used to distinguish Punycode-encoded labels from ordinary ASCII labels.
[7] RFC: Request for Comments. See the Glossary of terms in Part 4 of this document for more information.

Email Addresses and Email Address Internationalization (EAI)

Email addresses contain two parts:

1. A local part (the user name, before the “@” character)
2. A domain (after the “@” character)

The domain part can contain any TLD, including a new TLD. Both parts may contain Unicode characters (the domain part as U-labels).

NOTE: An additional format, IDN-Style Email Addresses, will be discussed below.

Email Address Internationalization (EAI) requires the use of Unicode in all parts of the email address. Each of the examples in this section could be expressed as EAI, and this is the preferred format.
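
A minimal Python sketch of the two-part structure follows. The function names and dictionary keys are hypothetical, and the split is deliberately simple: it divides at the last “@” and does not attempt full address validation.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an email address into (local part, domain) at the last '@'."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return local, domain

def describe_address(address: str) -> dict:
    """Report which parts of the address contain non-ASCII characters."""
    local, domain = split_address(address)
    return {
        "local": local,
        "domain": domain,
        "local_is_ascii": local.isascii(),
        "domain_is_ascii": domain.isascii(),
    }

print(describe_address("user@大坂.info"))
```

Either part may be non-ASCII, so code that handles addresses should carry both parts as Unicode strings rather than assuming ASCII.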

Examples of (imaginary) IDNs

example.みんな (Punycode encoding = example.xn--q9jyb4c)
大坂.info (Punycode encoding = xn--uesx7b.info)
みんな.大坂 (Punycode encoding = xn--q9jyb4c.xn--uesx7b)

To learn more, see the IDN FAQ: http://unicode.org/faq/idn.html

Examples of (imaginary) Email Addresses including IDNs

user@example. (Uses internationalized TLD)
user@ .info (Uses internationalized 2nd level domain)
@example.lawyer (Uses internationalized username and new gTLD)

Dynamic Link Generation (Linkification)

Modern software, such as popular word processing or spreadsheet applications, sometimes allows a user to create a hyperlink simply by typing in a string that looks like a web address, email address or network path. For example, typing “www.icann.org” into an email message may result in a clickable link to http://www.icann.org being automatically created if the app treats “www.” as a special prefix or “.org” as a special suffix.

Linkification should work consistently for all well-formed web addresses, email addresses or network paths.
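
One way to avoid hardcoded prefixes and suffixes is to find domain-like strings with a script-neutral pattern and then ask a TLD lookup whether the last label is known. The Python sketch below illustrates that idea only; the pattern, the `linkify` function, the `is_known_tld` callback and the `http://` scheme are all assumptions made for this example, not behavior of any particular product.

```python
import re
from typing import Callable

# Hypothetical candidate pattern: dot-separated labels made of word
# characters or hyphens. In Python 3, \w also matches non-ASCII letters,
# so labels such as "みんな" are candidates too.
CANDIDATE = re.compile(r"(?<![\w@.-])(?:[\w-]+\.)+[\w-]+")

def linkify(text: str, is_known_tld: Callable[[str], bool]) -> str:
    """Wrap domain-like strings in <a> tags when their last label is a TLD."""
    def replace(match: re.Match) -> str:
        host = match.group(0)
        tld = host.rsplit(".", 1)[-1].lower()
        if not is_known_tld(tld):
            return host  # leave text such as "1.2" untouched
        return f'<a href="http://{host}">{host}</a>'
    return CANDIDATE.sub(replace, text)
```

For example, `linkify("see www.icann.org now", {"org"}.__contains__)` wraps the address in a link, and the same rule applies unchanged to a name ending in a new or internationalized TLD as long as the lookup knows it.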

Part 2: Universal Acceptance in Action

Five Criteria of Universal Acceptance

As described in the Introduction, Universal Acceptance is the state where all valid domain names and email addresses are accepted, validated, stored, processed and displayed correctly and consistently by all Internet-enabled applications, devices and systems. These five criteria are described below.

1. Accept[8]

Accepting occurs whenever an email address or a domain name is received as a string of characters from a user interface, from a file, or from an API used by a software application or online service.

Applications and services allow domain names and email addresses to be:

• Entered into user interfaces, AND/OR
• Received from other applications and services via APIs

2. Validate[9]

Validation may occur in many places whenever an email address or a domain name is either received or emitted as a string of characters by an application or online service.

Validation is intended to ensure that the entered information is either valid or at least not definitely invalid. In other words, validation checks the syntactic correctness of the given information.

For domain names and email addresses, many programmers have been using heuristics (for example, checking that a TLD has the “correct” number of characters, or that the characters are from the ASCII character set). However, these heuristics are no longer applicable because the Internet is changing:

• Domain names and email addresses can now include Unicode (non-ASCII) characters
• The list of TLDs is growing
• A TLD can be up to 63 characters long
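
The points above suggest checking only what the DNS itself limits, rather than guessing at TLD length or character set. The Python sketch below checks the 63-octet label limit mentioned above, plus an overall limit of 253 characters on the ASCII form (a commonly cited DNS limit that this document does not state). The function name is hypothetical, the standard-library `idna` codec follows IDNA 2003, and the check does not verify that the TLD actually exists.

```python
def passes_length_checks(name: str) -> bool:
    """Check DNS length limits on the ASCII (A-label) form of a name."""
    if name.endswith("."):
        name = name[:-1]  # a single trailing dot denotes the root
    try:
        ascii_name = name.encode("idna").decode("ascii")
    except UnicodeError:
        return False  # not convertible, or a label is empty or too long
    labels = ascii_name.split(".")
    return len(ascii_name) <= 253 and all(1 <= len(label) <= 63 for label in labels)
```

A name with a long new gTLD or a non-ASCII TLD passes this check, whereas a rule such as "the TLD must be two to four ASCII letters" would wrongly reject both.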

3. Store

The Storage process occurs whenever an email address or a domain name is stored as a string of characters in a database or file used by a software application or online service.

Applications and services might require long-term and/or transient storage of domain names and email addresses. Regardless of the lifetime of the data, it must be stored in:

• RFC-defined formats, OR
• Alternate formats that can be easily translated to and from RFC-defined formats (this is much less desirable)

Although RFCs require the use of UTF-8, other formats may be encountered in legacy code. See the “Good Practices” section below.

[8] Accepting is treated as distinct from Validating in this document. In practice, the abilities may overlap.
[9] Accepting and Processing are treated as distinct from Validating in this document. In practice, the abilities may overlap.
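
As a small illustration of storing addresses in a Unicode-capable format, the following Python sketch uses the standard-library `sqlite3` module with an in-memory database (SQLite stores text as UTF-8 by default). The table name, column name and helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (email TEXT NOT NULL)")

def store(address: str) -> None:
    """Store the address as a Unicode string, using a parameterized query."""
    conn.execute("INSERT INTO contacts (email) VALUES (?)", (address,))

def load_all() -> list[str]:
    """Return stored addresses in insertion order."""
    return [row[0] for row in conn.execute("SELECT email FROM contacts ORDER BY rowid")]
```

Calling `store("user@大坂.info")` followed by `load_all()` returns the same string unchanged. The column carries no ASCII-only or fixed-length assumption, which is the property that matters for this criterion.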

4. Process[10]

Processing occurs whenever an email address or a domain name is used by an application or service to perform an activity (for example, searching or sorting a list), or transformed into an alternate format (such as storing ASCII as Unicode).

Processing means using domain names and email strings in a feature. Additional validation may occur during processing. There is no limit to the number of ways that domain names and email addresses could be processed. Examples:

• “Identify all the people associated with New Zealand because they have a name with a .nz ccTLD”
• “Identify all the pharmacists because they have a [email protected] email address”
• “Identify firewalls that might filter DNS requests that don’t apply to their policies”
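
The first example above (selecting addresses by TLD) can be sketched in Python as follows. The helper names and sample addresses are hypothetical. TLDs are compared in their A-label form so that the Unicode and Punycoded spellings of the same TLD match; the standard-library `idna` codec used here follows IDNA 2003.

```python
def tld_of(address_or_domain: str) -> str:
    """Return the TLD of an email address or domain as a lower-case A-label."""
    domain = address_or_domain.rpartition("@")[2]
    tld = domain.rstrip(".").rpartition(".")[2]
    return tld.encode("idna").decode("ascii").lower()

def with_tld(addresses: list[str], tld: str) -> list[str]:
    """Select the addresses whose domain ends in the given TLD."""
    wanted = tld_of(tld)
    return [a for a in addresses if tld_of(a) == wanted]

people = ["kiri@example.nz", "user@example.みんな", "user@example.xn--q9jyb4c"]
print(with_tld(people, ".nz"))
print(with_tld(people, "みんな"))
```

The second call selects both the U-label and the A-label spelling, since they name the same TLD.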

5. Display

The Display process occurs whenever an email address or a domain name is rendered within a user interface.

Displaying domain names and email addresses is usually straightforward when the scripts used are supported in the underlying operating system and when the strings are stored in Unicode. If these conditions are not met, application-specific transformations may be required.
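
When a name happens to be held in its Punycoded form, one such transformation is converting it to Unicode just before rendering. A minimal Python sketch, assuming the standard-library `idna` codec (IDNA 2003) and a hypothetical function name:

```python
def display_form(domain: str) -> str:
    """Return the Unicode form of a domain for display.

    Falls back to the input unchanged if it is already Unicode or if an
    xn-- label cannot be decoded.
    """
    try:
        return domain.encode("ascii").decode("idna")
    except UnicodeError:
        return domain

print(display_form("xn--uesx7b.xn--q9jyb4c"))  # 大坂.みんな
print(display_form("example.org"))             # example.org
```

Falling back to the original string keeps the interface usable even for malformed input, which can then be handled by validation rather than by the display code.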

User Scenarios

The examples and definitions above may give the impression that Universal Acceptance is only about computer systems and online services. The reality, however, is that it’s also about the people using those systems and services.

Below are some examples of activities that require Universal Acceptance:

Registering a new TLD

An organization adopts a “brand” TLD to offer its customers a differentiated customer experience by providing email addresses in the format, [email protected].

Universal Acceptance means:

• Web apps accept these new “@example.brand” email addresses as valid, just as they would with TLDs such as .com, .net, .org.

Accessing a gTLD

A user accesses a website, whose domain name contains a new TLD, by typing an address into a browser or clicking a link in a document.

Universal Acceptance means:

• Even though the TLD is new, any browser the user wishes to use displays the web address in its native form and accesses the site as the user expects. The browser does not display Punycoded text to the user unless it benefits the user in some way.

[10] Processing is treated as distinct from Validating in this document. In practice, the abilities may overlap.

Using an email address containing a new gTLD as an online identity

A user acquires an email address with the domain portion using a new gTLD, and uses this email address as their identity for accessing their bank and airline loyalty accounts.

Universal Acceptance means:

• Even though the domain used in the email address is new, the bank or airline site accepts the address exactly as if it used an established TLD such as .biz or .eu.

Accessing an IDN

A user accesses an IDN URL by typing an address into a browser or clicking a link in a document.

Universal Acceptance means:

• Even if the domain name contains characters that differ from the language settings on the user’s computer, any browser the user wishes to use will display the web address as expected and access the site successfully.

Using an internationalized email address for email

A user has acquired multiple email addresses, some of which are internationalized.

Universal Acceptance means:

• The user can send to and receive from any email address, using any email client.

Using an internationalized email address as an online identity

A user acquires an EAI email address, and uses this email address as their identity for accessing their bank and airline loyalty accounts.

Universal Acceptance means:

• The bank or airline site accepts the EAI identity exactly as if it were any other email identity.

Dynamically creating a Hyperlink in an Application

A user types a web address into a document or email message.

Universal Acceptance means:

• The rules used by the application to automatically generate a hyperlink are the same even if the address is an EAI or contains a new TLD.

Developing an Application

A developer writes an app that accesses web resources.

Universal Acceptance means:

• The tools used by the developers include libraries that enable Universal Acceptance by supporting Unicode, IDNs and EAI.

Nonconformance to Universal Acceptance Practices

The following are considered to be poor practice:

✖ Displaying Punycoded text to the user without a corresponding user benefit. (An example of a real benefit is showing the mapping between a U-label and an A-label.)
✖ Requiring a user to enter Punycoded text when signing up for a new email address or requiring a user to enter Punycoded text when signing up for a new hosted domain.
✖ Validating the syntax of a domain name or email address using out-of-date criteria or non-authoritative online domain name resources.
✖ Using an outdated list of TLDs even though new TLDs are regularly being added.
✖ Exposing internal use of Punycoded text to users. For example, converting from EAI to an IDN-style email address when replying to an EAI user.
✖ Treating some domain names as search terms rather than as domain names because the application does not recognize them as such.
✖ Setting spam blockers to automatically block entire TLDs.

Technical Requirements for UA Readiness

High level Requirements

An application or service that supports Universal Acceptance (UA):

1. Supports all domain names regardless of length or character set.
   See RFC 5892.
2. Allows multiple character sets that are valid for domain names and email addresses.
   That is, permits Unicode code points.
3. Can correctly render all code points in Unicode strings.
   See RFC 3490.
4. Can correctly render right-to-left (RTL) strings such as those in Arabic and Hebrew.
   For information about RTL scripts, see RFC 5893.
5. Can communicate data between applications and services in formats that support Unicode and are convertible to/from UTF-8.
   For information about UTF-8, see RFC 3629.
6. Offers public APIs that support Unicode & UTF-8.
7. Offers private APIs that support Unicode & UTF-8.
   Private APIs apply only to inter-service calls by the same vendor.
8. Stores user data in formats that support Unicode and are convertible to/from UTF-8.
   Such conversions would be visible only to the product/service owner.
9. Supports all domain name strings in the authoritative ICANN TLD list and the community-served Public Suffix List regardless of length or character set.
   See https://newgtlds.icann.org/en/program-status/delegated-strings.
10. Can send email to and receive from recipients regardless of domain name or character set.
    See RFC 6530.
11. Treats EAI addresses the same way as their Punycoded equivalents (IDN email format).
DeveloperConsiderationsSincemanyexistingsoftwaresystemscontainhardcodedassumptionsaboutdomainsandemailaddresses,
codechangesmayberequiredtorecognizeIDNsandnewTLDs.Thissectiondiscusseshowdeveloperscan
incorporatecodechangesthatwillenableUniversalAcceptanceofallnewTLDs.
A Guiding Principle for Achieving Universal Acceptance: Postel’s Law
In RFC 793, Jon Postel formulated the Robustness Principle, now known as Postel's Law, as an implementation guideline for the then-new TCP. In computing, the Robustness Principle is a general design guideline for software:
"Be conservative in what you do, be liberal in what you accept from others."
In other words, be conservative in what you send and be liberal in what you accept. This is also a good approach when dealing with the vagaries of Universal Acceptance currently implemented in the ecosystem.
Good Practices for Developing and Updating Software to Achieve UA-Readiness
Accept
✔ Always offer Unicode equivalents.
Users should be allowed, but not required, to enter ASCII Compatible Encoded (or “Punycoded”) text in place of its Unicode equivalent. However, Unicode should be shown by default, with Punycoded text shown to the user only when it provides a benefit.
! Don’t generate IDN-Style email addresses, but do be able to handle them if presented by someone else’s software.
✔ Any user interface element requiring a user to type a domain name or email address must support Unicode, labels up to 63 characters, and domain name strings up to 253 characters.
• See RFC 1035.
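The length limits above can be illustrated with a minimal Python sketch. It assumes the limits are measured on the ASCII (A-label) form of each label, and it uses the standard library's `punycode` codec only to obtain that form; it does not perform full IDNA2008 validation. The function names are hypothetical.

```python
MAX_LABEL_LENGTH = 63   # per-label limit named above
MAX_NAME_LENGTH = 253   # whole-name limit named above


def to_ascii_label(label):
    """Return the ASCII form of one label (assumption: bare Punycode, no IDNA checks)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def within_length_limits(domain):
    """Check the 63-character label limit and the 253-character name limit."""
    if domain.endswith("."):
        domain = domain[:-1]  # tolerate a single trailing root dot
    labels = [to_ascii_label(part) for part in domain.split(".")]
    if any(len(label) == 0 or len(label) > MAX_LABEL_LENGTH for label in labels):
        return False
    return not len(".".join(labels)) > MAX_NAME_LENGTH
```

An input field that enforces a shorter limit than these would reject valid names, so a check like this is better used as an upper bound than as a reason to truncate input.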
Validate
✔ Validate only to the minimum extent necessary.
Validate only if it is required for the operation of the application or service. This is the most reliable way to ensure that all valid domain names are accepted into your systems.
✔ Recognize that syntactically correct inputs may not represent domain names or email addresses currently in use on the Internet.
! If you must validate, consider the following:
• Verify the TLD portion of a domain name against an authoritative table. Examples of some authoritative tables that you can use are:
o http://www.internic.net/domain/root.zone
o http://www.dns.icann.org/services/authoritative-dns/index.html
o http://data.iana.org/TLD/tlds-alpha-by-domain.txt
See also: https://www.icann.org/en/system/files/files/sac-070-en.pdf
• Query the domain name against the DNS
o Consider using the GETDNS API (http://getdnsapi.net/)
• Require repeated entry of an email address to preclude typing errors
• Validate the characters in labels only to the extent of determining that the U-label does not contain "DISALLOWED" code points or code points that are unassigned in its version of Unicode
o See RFC 5892
• Limit validation of the labels themselves to a small number of whole-label rules defined in the RFCs
o See RFC 5894
• If a string resembling a domain name contains the Arabic full stop character “۔” (U+06D4), or the ideographic full stop character “。” (U+3002), convert it to the full stop “.” (U+002E).
• Do ensure that the product or feature handles numbers correctly
o For example: ASCII numerals and Asian ideographic number representations should all be treated as numbers
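As an illustration of verifying the TLD against an authoritative table, the sketch below parses text in the one-TLD-per-line layout used by the IANA `tlds-alpha-by-domain.txt` file (comment lines start with `#`). The sample table is a made-up excerpt, the function names are hypothetical, and downloading and refreshing the real file is left out.

```python
def parse_tld_table(text):
    """Collect TLDs from a one-per-line table, skipping '#' comments and blank lines."""
    tlds = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tlds.add(line.lower())
    return tlds


def tld_is_listed(domain, tlds):
    """Return True if the last label of the domain appears in the table."""
    if domain.endswith("."):
        domain = domain[:-1]
    tld = domain.rsplit(".", 1)[-1].lower()
    if not tld.isascii():
        # The table lists IDN TLDs in A-label form (assumption: bare Punycode is enough here).
        tld = "xn--" + tld.encode("punycode").decode("ascii")
    return tld in tlds


# Illustrative excerpt only; a real deployment would load the current file.
SAMPLE_TABLE = """# Version 0, sample data for illustration
COM
NYC
XN--Q9JYB4C
"""

known_tlds = parse_tld_table(SAMPLE_TABLE)
```

Note that the check makes no assumption about TLD length or script; membership in the table is the only criterion.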
Store
✔ Applications and services should support the appropriate Unicode standards.
✔ Information should be stored in UTF-8 (Unicode Transformation Format) whenever possible.
Some systems may require support for UTF-16 as well, but generally UTF-8 is preferred. UTF-7 and UTF-32 should be avoided.
! Consider all end-to-end scenarios before converting A-Labels (Punycode) to U-Labels and vice versa when storing.
It may be desirable to maintain only U-Labels in a file or database, because it simplifies searching and sorting. However, conversion may have implications when interoperating with older, non-Unicode-enabled applications and services. Consider storing and indexing both formats.
✔ Clearly mark email addresses and domain names during storage for easier access.
Instances where email addresses and domain names have been filed under the “author” field of a document or “contact info” in a log file have led to the loss of the original address.
✔ If you don’t store in Unicode, you must be able to match strings in multiple formats.
For example, a search for example.みんな should also find example.xn--q9jyb4c.
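One way to match a name stored in either format is to compare both sides in a single canonical form. The sketch below converts every label to its ASCII form with the standard library's `punycode` codec before comparing. It is an assumption-laden illustration (hypothetical function names, no IDNA2008 mapping or validation); a production system would normally rely on a maintained IDNA2008 library.

```python
ACE_PREFIX = "xn--"


def label_to_ascii(label):
    """Lowercase a label and return its A-label form (bare Punycode, no IDNA checks)."""
    label = label.lower()
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def domain_to_ascii(domain):
    """Convert each label of a domain name to its ASCII form."""
    return ".".join(label_to_ascii(label) for label in domain.split("."))


def same_domain(first, second):
    """Compare two domain names regardless of U-label or A-label spelling."""
    return domain_to_ascii(first) == domain_to_ascii(second)
```

With this comparison, `same_domain("example.みんな", "example.xn--q9jyb4c")` holds, which is the search behavior described above.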
Process
✔ Ensure all server responses have Unicode specified in the content type.
✔ Specify Unicode in the web server HTTP header and directly in a web file.
• Every web file should include the UTF-8 charset
• It is important to ensure that the encoding is specified on every response
! Consider all end-to-end scenarios before converting A-Labels (Punycode) to U-Labels and vice versa during processing.
It may be desirable to maintain only U-Labels in a file or database as it simplifies searching and sorting. However, conversion may have implications when interoperating with older, non-Unicode-enabled applications and services. Consider storing in both formats.
✔ Ensure that the product or feature handles sort order, searches, and collation according to locale/language specifications, and that it addresses multilanguage searching and sorting.
✖ Don’t use URL-encoding for domain names:
• example.みんな is correct
• example.%E3%81%BF%E3%82%93%E3%81%AA is not correct
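If percent-encoded host names arrive from elsewhere, they can be decoded back to Unicode before further processing. The following sketch uses `urllib.parse.unquote` from the Python standard library; the wrapper function name is hypothetical, and strict error handling is chosen here so that malformed UTF-8 is reported rather than silently replaced.

```python
from urllib.parse import unquote


def decode_percent_encoded_host(host):
    """Turn a percent-encoded host name back into its Unicode form."""
    return unquote(host, encoding="utf-8", errors="strict")


decoded = decode_percent_encoded_host("example.%E3%81%BF%E3%82%93%E3%81%AA")
```

Here `decoded` is the Unicode name `example.みんな`, the form described above as correct.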
✔ Since the Unicode standard is continually expanding, code points not defined when the application or service was created should be checked to ensure they will not “break” the user experience.
Missing fonts in the underlying operating system may result in non-displayable characters (frequently the “□” character is used to represent these), but this situation should not result in a fatal crash.
✔ Use supported Unicode-enabled APIs.
✔ Use the latest Internationalized Domain Names in Applications (IDNA) Protocol and Tables documents for IDNs:
• RFC 5891
• RFC 5892
✔ Process in UTF-8 format wherever possible.
✔ Upgrade applications and servers/services together.
If the server is Unicode and the client is non-Unicode, or vice versa, the data will need to be converted to each code page every time the data travels between server and client.
✔ Perform code reviews to avoid buffer overflow attacks.
When doing character transformation, text strings may grow or shrink substantially.
Display
✔ Display all Unicode code points that are supported by the underlying operating system.
If an application maintains its own font sets, comprehensive Unicode support should be offered to the collection of fonts available from the operating system.
✔ When developing an app or a service, consider the languages supported and make sure operating systems and applications cover those languages.
✔ Convert non-Unicode data to Unicode before display.
For example, the end user should see “example.みんな” as opposed to “example.xn--q9jyb4c”. (This conversion is an example of UA-ready processing.)
✔ Display Unicode by default.
Display Punycoded text to the user only when it provides a benefit.
! Be aware that mixed-script addresses will become more common.
• Some Unicode characters may look the same to the human eye, but different to computers
• Don’t assume that mixed-script strings are intended for malicious purposes, such as phishing
• If the user interface calls the strings to the user’s attention, be sure that it does so in a way which is not prejudicial to users of non-Latin scripts
Learn more about Unicode Security Considerations at: http://unicode.org/reports/tr36
✔ Use Unicode IDNA Compatibility Processing in order to match user expectations.
To learn more, go to: http://unicode.org/reports/tr46
✔ Be aware of unassigned and disallowed characters for domain names.
• See RFC 5892
Unicode
✔ Use supported Unicode-enabled APIs.
✖ Don’t build your own APIs for:
• String format conversions
• Determining which script comprises a string
• Determining if a string contains a mix of scripts
• Unicode normalization/decomposition
✖ Don’t use UTF-7 or UTF-32.
• UTF-7 is generally not used as a native representation within applications as it is very awkward to process. Despite its size advantage over the combination of UTF-8 with either quoted-printable or base64, the Internet Mail Consortium recommends against its use.
• The main disadvantage of UTF-32 is that it is space inefficient, using four bytes per code point. Non-BMP characters are so rare in most texts that they may as well be considered non-existent for sizing issues, making UTF-32 up to twice the size of UTF-16 and up to four times the size of UTF-8.
✔ Use Unicode in cookies so they can be read correctly by applications.
✔ Use IDNA2008 Protocol and Tables documents:
• RFC 5891
• RFC 5892
✖ Don’t use IDNA2003; in nearly all cases it has been superseded by IDNA2008.
✖ Do not automatically assume that external APIs can consume data that has been NFKC [11] converted.
! Maintain IDNA and Unicode tables that are consistent with regard to versions.
For example, unless the application actually executes the classification rules in the Tables document (RFC 5892), its IDNA tables must be derived from the version of Unicode that is supported more generally on the system. As with registration, the tables do not need to reflect the latest version of Unicode, but they must be consistent.
! Validate the characters in labels only to the extent of determining that the U-label does not contain “DISALLOWED” [12] code points or code points that are unassigned in its version of Unicode.
✔ Limit validation of the labels themselves to a small number of whole-label rules:
• No leading combining marks
• Bidirectional conditions are met if right-to-left characters appear
• Any contextual rules that are associated with joiner characters (and CONTEXTJ [13] characters more generally) are tested
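The first whole-label rule can be checked with the Unicode character database that ships with Python. The sketch below covers only the "no leading combining marks" rule, under the assumption that a combining mark is any code point whose general category starts with `M` (Mn, Mc or Me); the Bidi and contextual rules are not attempted, and the function name is hypothetical.

```python
import unicodedata


def has_leading_combining_mark(label):
    """Return True if the first code point of the label is a combining mark."""
    if not label:
        return False
    return unicodedata.category(label[0]).startswith("M")
```

The result depends on the Unicode version bundled with the interpreter, which ties in with the advice above to keep IDNA and Unicode tables consistent.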
! Don’t use UTF-16 except where it is explicitly required (as in certain Windows APIs).
When using UTF-16, note that a single 16-bit code unit can only represent code points from 0x0 to 0xFFFF, and additional complexity is needed to store values above this range (0x10000 to 0x10FFFF). This is done using pairs of code units known as surrogates. If handling of surrogate pairs is not thoroughly tested, it may lead to tricky bugs and potential security holes.
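To make the surrogate mechanism concrete, here is a short Python sketch of the standard UTF-16 arithmetic for code points above 0xFFFF, together with a helper that counts 16-bit code units. The function names are hypothetical; the example code point U+1F600 is chosen only because it lies outside the 16-bit range.

```python
def to_surrogate_pair(code_point):
    """Split a code point in 0x10000..0x10FFFF into its UTF-16 surrogate pair."""
    if code_point not in range(0x10000, 0x110000):
        raise ValueError("surrogate pairs only encode 0x10000 to 0x10FFFF")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def utf16_code_units(text):
    """Count the 16-bit code units needed to store the text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


pair = to_surrogate_pair(0x1F600)
```

A string that looks like one character can therefore occupy two code units, which is where length calculations and buffer sizes tend to go wrong.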
Linkification
✔ If a string resembling a domain name contains the ideographic full stop character “。” (U+3002), do accept it and transform it to “.”.
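A minimal sketch of this transformation in Python follows. It maps the ideographic full stop, plus the Arabic full stop mentioned in the validation guidance, to the ASCII full stop. The function name is hypothetical, and whether other dot-like characters should be included is left as a design decision.

```python
# Characters this document says to treat as label separators.
DOT_EQUIVALENTS = {
    "\u3002": ".",  # ideographic full stop
    "\u06D4": ".",  # Arabic full stop
}
_DOT_TABLE = str.maketrans(DOT_EQUIVALENTS)


def normalize_label_separators(text):
    """Replace full-stop look-alikes with U+002E so the name splits into labels."""
    return text.translate(_DOT_TABLE)
```

Applying the mapping before any other parsing keeps later steps, such as splitting on dots or checking the TLD, unaware of which full stop the user typed.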
General
✔ Use authoritative resources to validate domain names.
Do not make heuristic assumptions, such as “all TLDs are 2, 3, 4, or 6 characters in length”.
[11] NFKC (Normalization Form Compatibility Composition): Characters are decomposed by compatibility, then recomposed by canonical equivalence. See: http://unicode.org/reports/tr15
[12] DISALLOWED: Code points that should not be included in IDNs. See: https://tools.ietf.org/html/rfc5892
[13] CONTEXTJ: Contextual Rule for Join controls. See: https://tools.ietf.org/html/rfc5892
✔ Ensure that the product or feature handles numbers correctly.
For example, ASCII numerals and Asian ideographic number representations should all be treated as numbers.
! Look for mail addresses in unexpected places:
• Artist/author/photographer/copyright metadata
• Font metadata
• DNS contact records
• Binary version information
• Support information
• OEM contact information
• Registration, feedback, and other forms
! Look for potential IRI [14] paths in unexpected places:
• Single-label machine names regardless of loaded system codepage
• Fully-qualified machine names regardless of loaded system codepage
✔ Use GB18030 (China) for Chinese language support [15] in addition to UTF-8.
! Restrict the code points allowed when generating new domain names and email addresses:
All products that use email addresses must accept internationalized email addresses, allowing characters > U+007F. That is, no characters > U+007F are disallowed. However, an app or service need not allow all of these characters when a user creates a new IDN or email address. Use only this list of allowed characters for IDNs: http://unicode.org/reports/tr36/idn-chars.txt
Preventing certain IDNs or email addresses from being created in the first place can mitigate some likely security and accessibility concerns. (NOTE: Postel’s Law would still require software to accept such strings if presented.)
! Be aware that Universal Acceptance cannot always be measured through automated test cases alone.
For example, testing how an app or protocol handles a network resource may not always be possible, and sometimes it is best to verify compliance through functional spec review and design review.
! Don’t assume that a component is unaffected by domain names or email addresses just because it does not directly call name-resolution APIs or directly use email addresses.
Understand how network names are obtained by the component; it is not always through user interaction. The following are some examples of how the component can get a network name:
• Group policy
• LDAP query
• Configuration files
[14] IRI: Internationalized Resource Identifiers. See: https://www.ietf.org/rfc/rfc3987.txt
[15] GB18030-2000 is a Chinese government standard that specifies an extended codepage for use in the Chinese market. See: http://icu-project.org/docs/papers/unicode-gb18030-faq.html
• Windows Registry
• Transferred to/from another component/feature
✔ Perform code reviews to avoid buffer overflow attacks.
• In Unicode, strings may expand in casing: Fluß → FLUSS → fluss
• When doing character conversion, text may grow or shrink substantially
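The casing example can be reproduced directly in Python, whose string methods apply the full Unicode case mappings. The helper below is a hypothetical illustration that reports how many characters and bytes a string needs, so that buffers are sized from measured lengths rather than from the length of the input.

```python
def size_report(text):
    """Report the length of a string in characters and in encoded bytes."""
    return {
        "characters": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),
    }


original = "Fluß"
upper = original.upper()   # the single letter ß expands to two letters
lower = upper.lower()      # lowercasing does not restore the original spelling

before = size_report(original)
after = size_report(upper)
```

The round trip is lossy as well as length-changing, so code that uppercases "in place" into a buffer sized for the original string is a candidate for review.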
Authoritative Sources for Domain Names
DNS Root Zone
There are a few options for the authoritative list of TLDs. The first option is the DNS root zone itself. It is DNSSEC-signed, so the list is properly authenticated. You can obtain the root zone from any of the following links:
• http://www.internic.net/domain/root.zone
• http://www.dns.icann.org/services/authoritative-dns/index.html
• http://data.iana.org/TLD/tlds-alpha-by-domain.txt
Public Suffix List
The Public Suffix List (PSL), managed by volunteers of the Mozilla Foundation, provides an accurate list of domain name suffixes. This list is a set of DNS names or wildcards concatenated with dots and encoded using UTF-8. If you need to use the PSL as an authoritative source for domain names, your software must regularly receive PSL updates. Do not bake static copies of the PSL into your software with no update mechanism. You can use the link below to make your app download an updated list periodically. The list gets updated once per day from GitHub:
• https://publicsuffix.org/list/public_suffix_list.dat
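As a rough illustration of consuming the list, the sketch below parses text in the PSL file layout: one rule per line, `//` for comments, `*.` for wildcards and `!` for exceptions. The sample rules and the function name are illustrative assumptions. Fetching the file on a schedule and applying the full matching algorithm are not shown.

```python
def parse_public_suffix_rules(text):
    """Extract the rules from PSL-formatted text, ignoring comments and blank lines."""
    rules = set()
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("//"):
            continue
        # A rule ends at the first whitespace; anything after it is ignored.
        rules.add(line.split()[0])
    return rules


# Illustrative excerpt only; real software should download the current list.
SAMPLE_PSL = """// sample data for illustration
com

*.ck
!www.ck
みんな
"""

rules = parse_public_suffix_rules(SAMPLE_PSL)
```

Because the file is UTF-8 and contains Unicode rules, it should be read and stored as UTF-8 rather than converted to a legacy code page.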
Other Challenges
General
Variable encoding of IDNs
In some applications, IDNs are encoded:
• In Punycode, as per IDNA, if the name is identified as an Internet name,
BUT
• In UTF-8, if the name is identified as a name on the local area network (“intranet”)
Mechanism to detect and convert charsets
Some older email applications were encoded in a local code page and did not have a set mechanism for detecting and converting charset as needed. This was especially true for the email header (TO, CC, BCC, Subject).
Failure to handle non-DNS protocols
Some applications that do IDNA (for example, IE7+) break for non-DNS protocols. This could affect accessing resources using non-DNS protocols.
Mechanism to manage multiple email addresses into a single user identity
When a user is aliasing multiple email addresses it may be tricky to manage these addresses as a single user identity.
Email programs can direct traffic to such aliases to the same mailbox, but the application may still perceive these emails to pertain to different identities.
Tip for software developers
✔ When allowing a user to generate a domain name or email address, consider avoiding the use of visually confusing characters to prevent homograph attacks. Use only this list of allowed characters for IDN: http://unicode.org/reports/tr36/idn-chars.txt
IDN-Style Email and Why It Is Not the Same as EAI
EAI is defined as using Unicode only; A-Labels (Punycode) are not allowed. Nevertheless, developers have sometimes adapted email software and services to handle IDN-Style email addresses rather than make a full conversion to Unicode.
Because IDNs can be Punycode encoded, some existing software allows the IDN portion of an email address to be represented in ASCII or Unicode. For example, some software will treat these two “IDN-Style email” addresses equivalently for all purposes (sending, receiving, and searching):
user@example.みんな = [email protected]
Not all software will treat these two IDN-Style emails as functionally equivalent
However, some software will not robustly treat these addresses as equivalent, even though both are valid, because there is no requirement for software to process an A-label (i.e. “xn--q9jyb4c”) into its U-label equivalent (i.e. “みんな”) before comparing. This can result in unpredictable user experience. The user experience may become especially confusing if some software converts U-labels into A-labels for “compatibility”; as messages are replied-to or forwarded, the number of addresses which are visibly different to a user, or which fail to search and sort as expected, may increase.
In the example below, some software may attempt to convert even the local part of the email address using Punycode, creating something that looks like an A-label in the local part of the address. This is not allowed under the existing RFCs, and is very likely to result in failures to receive email by certain systems and to generate searching and sorting difficulties as explained above.
Robust UA-ready software and services may be able to handle and treat all these formats identically, even those which are not RFC-compliant. Nevertheless, UA-ready software should generate true EAI email addresses only.
Linkification and Its Challenges
Modern software sometimes allows a user to automatically create a hyperlink simply by typing in a string that looks like a web address, email name or network path. For example, typing “www.icann.org” into an email message may result in a clickable link to http://www.icann.org being automatically created if the application treats “www.” as a special prefix or “.org” as a special suffix.
Linkification should work consistently for all well-formed web addresses, email names or network paths.
Linkification is the action where an application accepts a string and dynamically determines whether it should create a hyperlink to an Internet location (URL) or an email address (mailto:).
Linkification uses algorithms and rules created by software developers to determine whether a string should be deemed a link or not. Related to this is how people can identify a string as a domain name. While browsers, email clients and word processors are obvious places, there are many more applications that make these decisions.
Good Practice Recommendations
1. Attempt to linkify based on explicit protocol prefixes (e.g. “http://”, “ftp://”, “mailto:”) but only complete the action if the rest of the string is well formed
Example String          Expected Behavior/Result
example.com             No linkification because protocol is absent and not inferred.
http://example.com      Create hyperlink because protocol is explicit
http:example.com        No linkification because of bad syntax (missing //)
Never convert the local part of an email address using Punycode
✔ 戶@example.
Example String          Expected Behavior/Result
http://example.a        No linkification because ICANN policies require a TLD to be at least two characters. NB: This syntax could be supported within an internal network.
http://example..ab      No linkification because of bad syntax (consecutive dots)
http:// - .             Create hyperlink because protocol is explicit.
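The rows of the table for explicit prefixes can be expressed as a small decision function. The Python sketch below is a simplified illustration: the name `should_linkify` and the prefix tuple are assumptions, `mailto:` is not handled, and ports, user information and IPv6 literals are ignored. It accepts Unicode host names as readily as ASCII ones.

```python
EXPLICIT_PREFIXES = ("http://", "https://", "ftp://")


def should_linkify(text):
    """Decide whether a string with an explicit protocol prefix is well formed."""
    lowered = text.lower()
    prefix = next((p for p in EXPLICIT_PREFIXES if lowered.startswith(p)), None)
    if prefix is None:
        return False  # protocol absent, or bad syntax such as "http:example.com"
    host = text[len(prefix):].split("/", 1)[0]
    labels = host.split(".")
    if not len(labels) >= 2:
        return False  # a single label has no TLD
    if any(label == "" for label in labels):
        return False  # consecutive, leading or trailing dots
    return len(labels[-1]) >= 2  # a TLD is at least two characters
```

The function deliberately does not inspect which script the labels use, so a well-formed name in any script is linkified in the same way as an ASCII one.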
2. Attempt to linkify based on implicit protocol prefixes (e.g. “www” infers “http://www”)
Example String          Expected Behavior/Result
www.example.com         Create hyperlink because protocol is implied [16]
[email protected]       Create mailto: [email protected].
3. Map the Ideographic Full Stop “。” (U+3002) to Full Stop “.” (U+002E) (e.g. http:// com → http:// .com) if string is otherwise well formed.
4. If TLDs are used as a ‘special suffix’ to determine linkability, then all TLDs must be included. A list of valid TLDs should be updated dynamically on a frequent basis.
[16] Note: it might be the case that the actual website requires that end users type https:// instead of http://. If this is the case, then the hyperlink may not resolve or may return an error page.
Part 3: Advanced Topics
Complex Scripts
The details of complex scripts may not be of interest to those who are not developers creating their own string parsing libraries. Nevertheless, a summary is included here to ensure that all readers have sufficient awareness to recognize code bugs related to these scripts when encountered in user experiences.
Right to Left Languages and Unicode Conformance
Most scripts display characters from left to right when text is presented in horizontal lines. However, there are also several scripts, such as Arabic or Hebrew, where the ordering of horizontal text in display is from right to left. The text can also be bidirectional (left to right and right to left) when a right-to-left script uses digits that are written from left to right or when it uses embedded words from English or other scripts.
Challenges and ambiguities can occur when the horizontal direction of the text is not uniform. To solve this issue, the Unicode Bidirectional Algorithm defines a set of rules that the application should apply to determine the directionality of bidirectional Unicode text and to produce the correct order at the time of display. We generally refer to this as the “Bidi algorithm”.
The Bidi Algorithm
The Bidi algorithm describes how software should process text that contains both left-to-right (LTR) and right-to-left (RTL) sequences of characters. The base direction [17] assigned to the phrase will determine the order in which text is displayed.
To know if a sequence is left-to-right or right-to-left, each character in Unicode has an associated directional property. Most letters are strongly typed (strong characters) as LTR (left-to-right). Letters from right-to-left scripts are strongly typed as RTL (right-to-left). A sequence of strongly-typed RTL characters will be displayed from right to left. This is independent of the surrounding base direction.
For example:
(LTR)example-مثال(RTL).
Text with different directionality can be mixed inline. In such cases, the Bidi algorithm produces a separate directional run out of each sequence of contiguous characters with the same directionality.
Spaces and punctuation are not strongly typed as either LTR or RTL in Unicode because they may be used in either type of script. They are therefore classified as neutral or weak characters. Weak characters are those with vague directionality. Examples of this type of character include:
• European digits
• Eastern Arabic-Indic digits
• Arithmetic symbols, and currency symbols
• Punctuation symbols that are common to many scripts, such as the colon, comma, full-stop, and the no-break-space
The directionality of neutral characters is indeterminate without context. Some examples include:
• Tabs
• Paragraph separators
• Most other whitespace characters
[17] In HTML the base direction is either inherited from the default direction of the document, which is left-to-right, or explicitly set by the nearest parent element that uses the dir attribute.
When a neutral character is between two strongly typed characters that have the same directional type, it will also assume that directionality. For example, a neutral character between two RTL characters will be treated as a RTL character itself, and will have the effect of extending the directional run:
نطاق.مثال •
Even if there are several neutral characters between the two strongly typed characters, they will all be treated in the same way.
When a space or punctuation falls between two strongly typed characters that have different directionality, the neutral character (or characters) will be treated as if they have the same directionality as the prevailing base direction. For example:
• example. مثال
Unless a directional override is present, numbers are always encoded (and entered) big-endian [18], and the numerals rendered LTR. The weak directionality only applies to the placement of the number in its entirety.
To see the Bidi algorithm in detail, go to: http://unicode.org/reports/tr9/tr9-11.html
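The directional property of each character can be inspected with Python's `unicodedata.bidirectional`, which returns the Bidi class name from the Unicode character database. The sketch below groups those classes into strong, weak and neutral following the categories of the Unicode Bidirectional Algorithm. The function name is hypothetical, and classes outside the strong and weak sets (including explicit formatting characters) are simply reported as neutral here. It only classifies characters and does not implement the reordering algorithm itself.

```python
import unicodedata

STRONG_CLASSES = {"L", "R", "AL"}
WEAK_CLASSES = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}


def bidi_strength(character):
    """Classify one character as 'strong', 'weak' or 'neutral' by its Bidi class."""
    bidi_class = unicodedata.bidirectional(character)
    if bidi_class in STRONG_CLASSES:
        return "strong"
    if bidi_class in WEAK_CLASSES:
        return "weak"
    return "neutral"


# Bidi classes for each character of a mixed-direction domain name.
classes = [unicodedata.bidirectional(ch) for ch in "example.مثال"]
```

For actual display, the recommendation elsewhere in this document applies: rely on the platform's Unicode-enabled text APIs rather than reimplementing the algorithm.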
The Bidi Rule for Domain Names
A Bidi domain name is one that contains at least one RTL label. There is a rule that determines the conditions to be met for the labels in Bidi domain names. This rule can be found in Section 2 of RFC 5893: https://tools.ietf.org/html/rfc5893
Joiners
Some languages use alphabetic scripts in which single phonemes are written using two characters called a digraph. In other words, a digraph is a group of two successive letters that represent a single sound (or phoneme).
Examples of digraphs in English
ch (as in church)   ph (phone)
th (then)           th (think)
sh (shoe)
Some digraphs are fully joined as ligatures. In writing and typography, a ligature happens where two or more graphemes or letters are joined as a single glyph. An example is the ampersand character (&), which evolved from the adjoined Latin letters e and t (“et” means “and”).
[18] “Big-endian and little-endian are terms that describe the order in which a sequence of bytes are stored in computer memory. Big-endian is an order in which the ‘big end’ (most significant value in the sequence) is stored first (at the lowest storage address). Little-endian is an order in which the ‘little end’ (least significant value in the sequence) is stored first.”
Source: http://searchnetworking.techtarget.com/definition/big-endian-and-little-endian
If ligatures and digraphs have the same interpretation in all languages that use a given script, Unicode normalization generally resolves the differences and makes them match. When they have different interpretations, matching must use alternative methods, likely chosen at the registry level, or users must be educated to understand that matching will not occur. An example of different interpretation can be found in Section 4.3 of RFC 5894: https://tools.ietf.org/html/rfc5894
The Unicode Consortium lists two main strategies to determine the joining behavior of a particular character after applying the Bidi algorithm:
• “When shaping, an implementation can refer back to the original backing store to see if there were adjacent ZWNJ or ZWJ [19] characters.
• Alternatively, the implementation can replace ZWJ and ZWNJ by an out-of-band character property associated with those adjacent characters, so that the information does not interfere with the Bidi algorithm and the information is preserved across rearrangement of those characters. Once the Bidi algorithm has been applied, that out-of-band information can then be used for proper shaping.” [20]
In the absence of care by registries about how strings that could have different interpretations under IDNA2003 and the current specification are handled, it is possible that the differences could be used as a component of name-matching or name-confusion attacks. Such care is therefore appropriate.
To learn more about joiners, see Section 4.3 of RFC 5894: https://tools.ietf.org/html/rfc5894
Homoglyph and Confusingly Similar Characters
Homoglyphs are characters that, due to similarities in size and shape, might appear identical at first glance.
Examples of homoglyphs
Cyrillic character а = Unicode number 0430
Latin character a = Unicode number 0061
To prevent confusingly similar domain names from being registered, registries can use the “homoglyph bundling” procedure. [21]
Homoglyph bundling is when you register an IDN and the registration system automatically bundles all the homoglyphs of that name (if there are any). This means that several domain names are bundled at one time, and none of the other domain names in that bundle can be registered.
Homoglyph bundling is a good practice for registries to avoid possible phishing practices that intend to trick the user with visually confusing characters.
To learn more about Unicode security mechanisms for confusable detection, go to:
• http://www.unicode.org/reports/tr39/#Confusable_Detection
To see a list of homoglyphs, go to:
• http://homoglyphs.net
To learn more about confusingly similar characters and good practice, see:
• M3AAWG Unicode Abuse Overview and Tutorial
https://www.m3aawg.org/sites/default/files/m3aawg-unicode-tutorial-2016-02.pdf
• M3AAWG Best Practices for Unicode Abuse Prevention
https://www.m3aawg.org/sites/default/files/m3aawg-unicode-best-practices-2016-02.pdf
[19] To learn more about ZWNJ/ZWJ, go to: http://www.unicode.org/L2/L2005/05307-zwj-zwnj.pdf
[20] Source: Mark Davis, Aharon Lanin, Andrew Glass. 2015. Unicode. http://unicode.org/reports/tr9
[21] https://www.icann.org/resources/pages/idn-guidelines-2011-09-02-en
Normalization and Case Folding
Normalization
Unicode Normalization helps to determine whether any two Unicode strings are equivalent to each other. Some characters can be represented in Unicode by several code sequences. This is called Unicode equivalence. Unicode provides two types of equivalences:
• Canonical (NFD)
• Compatibility (NFKD)
Sequences representing the same character are called canonically equivalent. These sequences have the same appearance and meaning when printed or displayed. For example:
Examples of canonically equivalent characters
U+006E (Latin lowercase “n”) followed by U+0303 (the combining tilde “◌̃”) = ñ
U+00F1 (lowercase letter “ñ” of the Spanish alphabet) = ñ
Compatibility equivalents are sequences which can have different appearances, but in some contexts the same meaning. It is a weaker type of equivalence between characters or sequences of characters.
Examples of compatibility equivalent characters
U+FB00 (the typographic ligature “ﬀ”) = ﬀ
U+0066 U+0066 (two Latin “f” letters) = ff
In the example above, the code point U+FB00 is defined to be compatible, but not canonically
equivalent, to the sequence U+0066 U+0066. Sequences that are canonically equivalent are also
compatible, but the opposite is not necessarily true.
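The two kinds of equivalence can be checked with Python's standard `unicodedata` module. This is a minimal sketch using the code points from the examples in this section; the variable names are illustrative.

```python
import unicodedata

decomposed = "n\u0303"   # "n" followed by the combining tilde
precomposed = "\u00f1"   # the single code point for the letter n with tilde
ligature = "\ufb00"      # the typographic ligature ff

# Canonically equivalent sequences differ as raw strings...
print(decomposed == precomposed)
# ...but become identical under a canonical normalization form.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)

# The ligature is only compatibility equivalent to "ff":
# NFC leaves it alone, NFKC replaces it with two Latin "f" letters.
print(unicodedata.normalize("NFC", ligature) == ligature)
print(unicodedata.normalize("NFKC", ligature) == "ff")
```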
To avoid interoperability problems arising from the use of canonically equivalent, yet different,
character sequences, the W3C recommends using Normalization Form C[22] for all content.
To see a list of all characters that may change in any of the Normalization Forms, go to:
http://www.unicode.org/charts/normalization
Some other points to note:
• Only strings NOT transformed by NFKC[23] are valid.
• When two applications share Unicode data, but normalize them differently, errors and data
loss can occur.
• Normalization Forms must remain stable over time. In other words, a string must remain
normalized under all future versions of Unicode (backward compatibility).
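The first point above can be sketched as a simple stability check: a string passes only if applying NFKC leaves it unchanged. The function name `is_unchanged_by_nfkc` is an assumption made for illustration, and this check covers only that one condition, not the full set of IDNA validity rules.

```python
import unicodedata

def is_unchanged_by_nfkc(text: str) -> bool:
    """Return True when NFKC normalization would not alter the string."""
    return unicodedata.normalize("NFKC", text) == text

print(is_unchanged_by_nfkc("example"))   # plain ASCII is already stable
print(is_unchanged_by_nfkc("\u00f1"))    # precomposed letter is stable
print(is_unchanged_by_nfkc("n\u0303"))   # decomposed form would be recomposed
print(is_unchanged_by_nfkc("\ufb00"))    # ligature would be replaced by "ff"
```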
Tip for software developers
✖ Don’t normalize by converting to uppercase, or by ignoring non-spacing characters, because this may
make sorting, data copy, data import and export, and data retrieval by client applications rather
difficult, and may result in data loss or corruption.
To learn more about Normalization Forms, go to: http://www.unicode.org/reports/tr15
Case Folding
Case folding is the process of making two texts, which differ in case but are otherwise “the same”,
identical. Mapping [a-z] to [A-Z] works for most simple ASCII-only text documents. However, it begins
to break down with languages that use additional characters.
Unicode defines the default case fold mapping for each Unicode code point. There are common and full case fold mappings:
• Common fold mappings are those that have a simple, straightforward mapping to a single
matching (mainly lowercase) code point
• Full fold mappings are those that would normally require more than one Unicode character
One important consideration, according to the W3C,[24] is whether the values are restricted to the
ASCII subset of Unicode or if the vocabulary permits the use of characters (such as accents on Latin
letters or a broad range of Unicode including non-Latin scripts) that potentially have more complex
case folding requirements.[25]
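A full fold mapping can be seen with Python's built-in `str.casefold()`. In this sketch the German sharp s, which folds to two characters, is used as the example; the sample words are illustrative and not taken from the source.

```python
upper = "STRASSE"
mixed = "Stra\u00dfe"  # "Straße", written with the sharp s (U+00DF)

# Simple lowercasing keeps the sharp s, so the two spellings do not match.
print(mixed.lower() == upper.lower())

# Case folding maps the sharp s to "ss", so the two spellings match.
print(mixed.casefold() == upper.casefold())
print(mixed.casefold())
```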
[22] NFC: Canonical Decomposition, followed by Canonical Composition.
[23] NFKC: Compatibility Decomposition, followed by Canonical Composition.
[24] W3C: The World Wide Web Consortium (W3C) is an international community where Member organizations,
a full-time staff and the public work together to develop Web standards. See: https://www.w3.org
[25] Source: A. Phillips. 2015. Character Model for the World Wide Web: String Matching and Searching.
https://www.w3.org/TR/charmod-norm
Tip for software developers
✔ Consider Unicode Normalization in addition to case folding.
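One way to combine the two steps is sketched below: normalize, case fold, then normalize again before comparing. The helper names `nfd` and `caseless_equal` are assumptions for illustration, and NFD is one reasonable choice of form here rather than a requirement stated in this document.

```python
import unicodedata

def nfd(text: str) -> str:
    """Apply canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", text)

def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings ignoring case and canonical-equivalence differences."""
    return nfd(nfd(a).casefold()) == nfd(nfd(b).casefold())

print(caseless_equal("N\u0303", "\u00f1"))       # decomposed capital vs precomposed small
print(caseless_equal("Stra\u00dfe", "STRASSE"))  # sharp s vs "SS"
print(caseless_equal("a", "b"))
```

Case folding alone would treat the first pair as different, because one string uses a combining tilde and the other a precomposed letter.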
To learn more about Unicode normalization, see:
• http://www.w3.org/TR/charmod-norm
• http://unicode.org/reports/tr15
For recommendations about case folding, go to:
• https://www.w3.org/International/wiki/Case_folding
Part 4: Glossary and Other Resources
Glossary
A-label
The ASCII-compatible encoded (ACE) representation of an internationalized domain name, i.e. how it is transmitted internally within the DNS protocol. A-labels always commence with the prefix “xn--”. Contrast with U-label.
ACE prefix
ASCII Compatible Encoding Prefix.
ASCII Characters
American Standard Code for Information Interchange. These are characters from the basic Latin alphabet together with the European-Arabic digits. These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.
API
An Application Programming Interface (API) is a set of routines, protocols, and tools for building software and applications. An API may be for a web-based system, operating system, or database system, and it provides facilities to develop applications for that system using a given programming language.
Code space
The range that defines the lower and upper bounds for an encoding.
Code Points
A code point or code position is any of the numerical values that make up the code space. They are used to distinguish both the number from an encoding as a sequence of bits, and the abstract character from a particular graphical representation (glyph).
DNS Root Zone
The root zone is the central directory for the DNS, which is a key component in translating readable host names into numeric IP addresses.
EAI
Email Address Internationalization. An internationalized email address is an email address that uses Unicode in the local part, the domain part, or both.
IANA
Internet Assigned Numbers Authority. Its functions include:
• Maintenance of the registry of technical Internet protocol parameters
• Administration of certain responsibilities associated with the Internet DNS root zone
• Allocation of Internet numbering resources
ICANN
The Internet Corporation for Assigned Names and Numbers (ICANN) is an internationally organized, non-profit corporation that has responsibility for Internet Protocol (IP) address space allocation, protocol identifier assignment, generic (gTLD) and country code (ccTLD) Top-Level Domain name system management, and root server system management functions.
IDN
Internationalized Domain Names. IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet “a-z”, the numbers 0-9, and the hyphen “-”.
IDNA
Internationalized Domain Names in Applications.
IDN ccTLD
Country Code Top-level Domain that includes characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet “a-z”. Examples:
• .рф (Russia)
• .مصر (Egypt)
• .السعودیة (Saudi Arabia)
IETF
The Internet Engineering Task Force (IETF) is a large open international community of network designers, operators, vendors, and researchers concerned with the evolution of the Internet architecture and the smooth operation of the Internet. It is open to any interested individual. The IETF develops Internet Standards and in particular the standards related to the Internet Protocol Suite (TCP/IP).
Language
The method of human communication, either spoken or written, consisting of the use of words in a structured and conventional way.
Punycode
An algorithm to represent Unicode with the limited character subset of ASCII supported by the Domain Name System. Punycode is intended for the encoding of labels in the Internationalized Domain Names in Applications (IDNA) framework.
Registrar
An organization where domain names are registered by users. The registrar keeps records of the contact information and submits the technical information to a central directory known as the “registry”.
Registry
The authoritative, master database of all domain names registered in each Top Level Domain.
RFC
A Request for Comments (RFC) is a formal document from the Internet Engineering Task Force (IETF) that is the result of committee drafting and subsequent review by interested parties.
Script
The collection of letters or characters used in writing, representing the sounds of a language.
Second-level domain name
In the Domain Name System (DNS) hierarchy, a second-level domain (SLD or 2LD) is a domain that is directly below a top-level domain (TLD). For example, in example.com, example is the second-level domain of the .com TLD.
U-label
A "U-label" is an IDNA-valid string of Unicode characters including at least one non-ASCII character. Conversions between U-labels and A-labels are performed according to the Punycode specification [RFC3492].
UA-ready Software or UA-Readiness
Universal Acceptance Ready Software. It is software that has the ability to Accept, Store, Process, Validate and Display all Top Level Domains equally and all IDNs, hyperlinks and email addresses equally.
Unicode
A universal character encoding standard. It defines the way individual characters are represented in text files, web pages, and other types of documents. Unicode was designed to support characters from all languages around the world. It can support more than 1,000,000 characters (1,114,112 code points), and its encoding forms use up to 4 bytes for each character. See: http://unicode.org
UTF
Unicode Transformation Format. It is a way of transforming Unicode code points into a stream of bytes. UTF-8 is the preferred UTF for handling IDN and EAI. UTF-8 encodes each Unicode code point as one to four 8-bit bytes.
M3AAWG
The Messaging, Malware and Mobile Anti-Abuse Working Group (M3AAWG) is where the industry comes together to work against botnets, malware, spam, viruses, DoS attacks and other online exploitation. See: https://www.m3aawg.org/
W3C
The World Wide Web Consortium (W3C) is an international community where Member organizations, a full-time staff, and the public work together to develop Web standards. See: https://www.w3.org/
ZWJ
Zero-Width Joiner is a non-printing character used in the computerized typesetting of some complex scripts such as the Arabic script or any Indic script. When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms.
ZWNJ
Zero-Width Non-Joiner is a non-printing character used in the computerization of writing systems that make use of ligatures. When placed between two characters that would otherwise be connected into a ligature, a ZWNJ causes them to be printed in their final and initial forms, respectively. This is also an effect of a space character, but a ZWNJ is used when it is desirable to keep the words closer together or to connect a word with its morpheme.
For a complete ICANN glossary, go to: https://www.icann.org/resources/pages/glossary-2014-02-03-en
RFCs
Punycode RFCs
RFC 3492 Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
RFC 3492 describes Punycode as:
"a simple and efficient transfer encoding syntax designed for use with Internationalized Domain Names in Applications (IDNA)"
Punycode uniquely and reversibly transforms a Unicode string into an ASCII string. This RFC
defines a general algorithm called Bootstring. This algorithm allows a string of basic code
points to uniquely represent any string of code points drawn from a larger set.
https://tools.ietf.org/html/rfc3492
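The reversible transformation can be tried with the `punycode` codec in Python's standard library. This is a minimal sketch: the label "bücher" is an illustrative example not taken from this document, and the code applies only the Punycode step plus the "xn--" prefix, without the validation and mapping that a full IDNA implementation performs.

```python
u_label = "b\u00fccher"  # "bücher", a label with one non-ASCII character

# Punycode step: Unicode string to ASCII string.
encoded = u_label.encode("punycode").decode("ascii")

# An A-label is the ACE prefix followed by the Punycode output.
a_label = "xn--" + encoded
print(a_label)

# The transformation is reversible: strip the prefix and decode.
restored = a_label[len("xn--"):].encode("ascii").decode("punycode")
print(restored == u_label)
```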
IDN RFCs
RFC 5890 Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework
This RFC describes the usage context and protocol for a revision of Internationalized Domain
Names for Applications (IDNA).
https://tools.ietf.org/html/rfc5890
RFC 5891 Internationalized Domain Names in Applications (IDNA): Protocol
This RFC specifies the protocol mechanism, called Internationalized Domain Names in
Applications (IDNA), for registering and looking up IDNs in a way that does not require
changes to the DNS itself.
https://tools.ietf.org/html/rfc5891
RFC 5892 The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)
RFC 5892 specifies rules for deciding whether a code point, considered in isolation or in
context, is a candidate for inclusion in an Internationalized Domain Name (IDN).
https://tools.ietf.org/html/rfc5892
RFC 5893 Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)
This RFC provides a new Bidi rule for Internationalized Domain Names for Applications
(IDNA) labels, for the use of right-to-left scripts in Internationalized Domain Names.
https://tools.ietf.org/html/rfc5893
RFC 5894 Internationalized Domain Names for Applications (IDNA): Background, Explanation and Rationale
This informational document provides an overview of a revised system to deal with newer
versions of Unicode and provides explanatory material for its components.
https://tools.ietf.org/html/rfc5894
RFC 5895 Mapping Characters for Internationalized Domain Names in Applications (IDNA) 2008
This RFC describes the actions that can be taken by an implementation between receiving
user input and passing permitted code points to the new IDNA protocol (2008). It describes
an operation that is to be applied to user input in order to prepare that user input for use in
an “on the network” protocol. It also includes a general implementation procedure for
mapping.
https://tools.ietf.org/html/rfc5895
EAI RFCs
RFC 6530 Overview and Framework for Internationalized Email
This standard introduces a series of specifications that define mechanisms and protocol
extensions needed to fully support internationalized email addresses. This document
describes how the various elements of email internationalization fit together and the
relationships among the primary specifications associated with message transport, header
formats, and handling.
https://tools.ietf.org/html/rfc6530
RFC 6531 SMTP Extension for Internationalized Email
This document defines a Simple Mail Transfer Protocol extension so servers can advertise
the ability to accept and process internationalized email addresses and internationalized
email headers.
https://tools.ietf.org/html/rfc6531
RFC 6532 Internationalized Email Headers
This document specifies an enhancement to the Internet Message Format (RFC 5322) and to MIME that
permits the direct use of UTF-8, rather than only ASCII, in header field values, including mail
addresses. A new media type, message/global, is defined for messages that use this
extended format. This specification also lifts the MIME restriction on having non-identity
content-transfer-encodings on any subtype of the message top-level type so that
message/global parts can be safely transmitted across existing mail infrastructure.
https://tools.ietf.org/html/rfc6532
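The direct use of UTF-8 in header fields can be illustrated with Python's standard `email` package, whose `SMTPUTF8` policy serializes headers as raw UTF-8 instead of ASCII-only encoded words. This is a sketch under stated assumptions: the addresses and subject are invented for illustration, and the code only builds the message bytes; it does not send mail or negotiate the SMTP extension from RFC 6531.

```python
from email import policy
from email.message import EmailMessage

# Hypothetical addresses with non-ASCII characters in local part and domain.
sender = "d\u00f6rte@s\u00f6rensen.example"
recipient = "user@example.com"

msg = EmailMessage(policy=policy.SMTPUTF8)
msg["From"] = sender
msg["To"] = recipient
msg["Subject"] = "Gr\u00fc\u00dfe"   # a non-ASCII subject line
msg.set_content("Hello")

raw = msg.as_bytes()

# The address appears in the serialized headers as plain UTF-8 bytes.
print(sender.encode("utf-8") in raw)
```

Software that stores or forwards such a message has to treat the header block as UTF-8 rather than assume ASCII.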
RFC 6533 Internationalized Delivery Status and Disposition Notifications
This specification adds a new address type for international email addresses so an original
recipient address with non-ASCII characters can be correctly preserved even after
downgrading. This also provides updated content return media types for delivery status
notifications and message disposition notifications to support use of the new address type.
https://tools.ietf.org/html/rfc6533
Key Standards
ISO 10646 (Unicode)
To provide a common technical basis for the processing of electronic information in
various languages, the International Organization for Standardization (ISO) has
developed an international coding standard called ISO 10646. ISO 10646 provides
a unified standard for the coding of characters in all major languages in the world
including traditional and simplified Chinese characters. This large character set is called
the Universal Character Set (UCS). The same set of characters is defined by the Unicode
standard, which further defines additional character properties and other application
details of great interest to implementers.
Unicode is a character coding system designed by the Unicode Consortium to support
the interchange, processing and display of the written texts of all major languages in
the world. ISO 10646 and Unicode define several encoding forms of their common
repertoire: UTF-8, UCS-2, UTF-16, UCS-4 and UTF-32.
http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=63182
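The difference between the encoding forms listed above can be seen by encoding the same characters with Python's built-in codecs. This sketch uses the big-endian variants of UTF-16 and UTF-32 so that no byte order mark is added; the sample characters and the helper name `encoded_hex` are illustrative.

```python
def encoded_hex(text: str, codec: str) -> str:
    """Return the bytes of `text` in the given encoding, as a hex string."""
    return text.encode(codec).hex()

samples = ["\u00f1", "\U0001F600"]  # a Latin letter and a character beyond U+FFFF

for ch in samples:
    for codec in ("utf-8", "utf-16-be", "utf-32-be"):
        data = ch.encode(codec)
        print(f"U+{ord(ch):04X} {codec}: {data.hex()} ({len(data)} bytes)")
```

The same code point occupies a different number of bytes in each form, which is why software has to know which form it is reading before it can interpret the data.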
GB18030 (China)
GB18030-2000 is a Chinese government standard that specifies an extended code page
for use in the Chinese market in addition to UTF-8. The internal processing code for the
character repertoire can and should be Unicode; however, the standard stipulates that
software providers must guarantee a successful round-trip between GB18030 and the
internal processing code. All products currently sold or to be sold in China must plan
the code page migration to support GB18030 without exception. GB18030 is a
“mandatory standard” and the Chinese government regulates the certification process
to reinforce GB18030 deployment.
http://icu-project.org/docs/papers/unicode-gb18030-faq.html
Unicode Technical Standard #46: Unicode IDNA Compatibility Processing
This specification defines a mapping consistent with the normative requirements of the
IDNA2008 protocol, and which is as compatible as possible with IDNA2003. For client
software, this provides behavior that is the most consistent with user expectations
about the handling of domain names with existing data.
http://unicode.org/reports/tr46/
Online Resources
APIs
Windows APIs
https://www.msdn.microsoft.com/enus/library/windows/desktop/ff818516%28v=vs.85%29.aspx
SharePoint APIs
https://msdn.microsoft.com/en-us/library/office/jj860569.aspx
Public Suffix List
https://publicsuffix.org/list/public_suffix_list.dat
ICANN Authoritative TLD list
http://data.iana.org/TLD/tlds-alpha-by-domain.txt
Android APIs
http://developer.android.com/guide/index.html
Mac and iOS APIs
https://developer.apple.com/library/mac/navigation
.NET Framework
https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
Unicode Security
Unicode security considerations
http://www.unicode.org/reports/tr36
Unicode security mechanisms
http://www.unicode.org/reports/tr39
Unicode character groupings
Unicode code planes
http://en.wikipedia.org/wiki/Mapping_of_Unicode_character_planes
Overview of GB18030
http://en.wikipedia.org/wiki/GB_18030
Authoritative mapping table between GB18030-2000 and Unicode
http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/gb-18030-2000.xml
Unicode normalization
https://en.wikipedia.org/wiki/Unicode_equivalence
Unicode exploits
Section 3.1, “UTF-8 Exploits” in Unicode Technical Report #36
http://unicode.org/reports/tr36/#UTF-8_Exploit
M3AAWG Best Practices for Unicode Abuse Prevention
https://www.m3aawg.org/sites/default/files/m3aawg-unicode-best-practices-2016-02.pdf
M3AAWG Unicode Abuse Overview and Tutorial
https://www.m3aawg.org/sites/default/files/m3aawg-unicode-tutorial-2016-02.pdf
See also:
http://www.unicode.org
Miscellaneous
URIs
http://tools.ietf.org/html/rfc3986
The Domain Name System: A Non-Technical Explanation – Why Universal Resolvability Is Important
http://www.internic.net/faqs/authoritative-dns.html
ICANN glossary
https://www.icann.org/resources/pages/glossary-2014-02-03-en
Acknowledgements
The authors gratefully acknowledge the following people for their contributions and collaboration on this
document:
Eleeza Agopian
Gwen Carlson
Edmon Chung
Samantha Dickinson
Don Hollander
Chantal Lebrument
Antonietta Mangiacotti
Richard Merdinger
Ram Mohan
David Morrison
Carolyn Nguyen
Michael D. Palage
Kurt Pritz
André Schappo
Zheng Song
Lars Steffen
Andrew Sullivan
Dennis Tan
Winnie Yu