
Unicode - Transformation Issues

Transcript of Unicode

Page 1: Unicode

Unicode -

Transformation Issues

Page 2: Unicode
Page 3: Unicode

What’s this talk about?

• What is Unicode?

• How Apps deal with Unicode

• Unicode Transformation Attack

• Real World Examples

• How To Manipulate Applications

• Remediation

Page 4: Unicode

<scrİpt> <script>

< <g g

Can you tell the difference?

Page 5: Unicode

• Unicode lets computer systems support more languages, allowing for worldwide use

• Encodings such as UTF-8 and UTF-16 store each character in one or more bytes

• It provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language

What is Unicode?

Page 6: Unicode

• Every character has a unique number

• A = U+0041

• < = U+003C

Page 7: Unicode

• Classic example: c0rn ;)

o=U+006f, ο=U+03bf, о=U+043e

• Latin Small Letter O, Greek Small Letter Omicron, Cyrillic Small Letter O

• Searches for the above can turn up different results in Google

Search For Filtered Words
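
The three letters on this slide look almost identical but are distinct code points, which is why a search or keyword filter treats them as different strings. A minimal Python sketch using the standard `unicodedata` module; the word `corn` is only an illustrative filtered word, not something taken from a real filter:

```python
import unicodedata

# Three visually similar letters from the slide: Latin, Greek, Cyrillic.
lookalikes = ["\u006f", "\u03bf", "\u043e"]

for ch in lookalikes:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Swapping the Latin o for the Cyrillic one defeats a plain string comparison.
filtered_word = "corn"
disguised = "c\u043ern"
print(filtered_word == disguised)
```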

Page 8: Unicode

• Attackers can submit Unicode input that disguises malicious code and exploits Unicode transformation issues such as best-fit mapping

Unicode Transformation Issue

Page 9: Unicode

• Occurs when a character X gets transformed to an entirely different character Y.

• Character X in the source encoding doesn't exist in the destination encoding, so the App attempts to find a best match.

• This happens when characters are transcoded between Unicode and another encoding.

Best-Fit Mapping
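
Python's own codecs do not perform best-fit mapping; a strict conversion simply fails. The sketch below therefore uses NFKC compatibility normalization as a stand-in for a best-fit table, to show how a character missing from the target encoding can quietly become a different, dangerous one. The input string is an assumption chosen for illustration.

```python
import unicodedata

# "<script>" written with FULLWIDTH LESS-THAN / GREATER-THAN signs.
fullwidth = "\uff1cscript\uff1e"

# A strict conversion refuses characters the target encoding lacks.
try:
    fullwidth.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)

# Stand-in for best-fit: compatibility normalization folds the fullwidth
# signs to their ASCII counterparts before encoding.
best_fit = unicodedata.normalize("NFKC", fullwidth).encode("ascii")
print(best_fit)
```

A filter that inspected `fullwidth` would see no ASCII angle brackets, yet the converted bytes contain them.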

Page 10: Unicode

“So What” you say

Bypass filters:

• Cross-site scripting (XSS)

• SQL Injection

• WAFs

• IDS devices

Page 11: Unicode

• The application lowercases the input after filtering.

• The filter blocks the string "script" but allows the string "scrİpt".

• Possibility of using many lookalikes: AΑА ᴀᴬᐱᗅᗋᗩ ⍲A

Filter Bypass
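
The ordering problem (filter first, case-convert afterwards) can be sketched in Python. Note the swap: Python lowercases U+0130 (İ) to "i" plus a combining dot rather than to a bare "i", so this sketch uses U+017F LATIN SMALL LETTER LONG S with an uppercase conversion instead, which Python maps to a plain "S". The `naive_filter` and `render` functions are hypothetical.

```python
def naive_filter(value: str) -> bool:
    """Return True if the input is allowed (no blocked tag found)."""
    return "<script" not in value.lower()


def render(value: str) -> str:
    # The case conversion happens after the filter has already run.
    return value.upper()


payload = "<\u017fcript>"  # starts with LATIN SMALL LETTER LONG S
allowed = naive_filter(payload)
output = render(payload)
print(allowed, output)

# The slide's character in Python: U+0130 lowercases to "i" followed by
# U+0307 COMBINING DOT ABOVE, not to a bare "i".
print([f"U+{ord(c):04X}" for c in "\u0130".lower()])
```

Whether the slide's exact "scrİpt" payload works depends on how the target platform implements its case mapping.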

Page 12: Unicode

• The Unicode character U+FF1C FULLWIDTH LESS-THAN SIGN (＜) is transformed into U+003C LESS-THAN SIGN (<) by best-fit mapping.

• Unicode transformations usable for cross-site scripting or SQL injection include overlong UTF-8 sequences:

• %C0%BE = >

• %C0%BC = <
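
Why %C0%BC turns into "<" can be worked through with a little bit arithmetic. The decoder below is a hypothetical lenient one written for illustration: it applies the two-byte UTF-8 rule without the overlong check that a conforming decoder performs. Python's built-in decoder is shown next to it rejecting the same bytes.

```python
from urllib.parse import unquote_to_bytes


def lenient_decode_2byte(raw: bytes) -> str:
    """Hypothetical decoder that skips the overlong check a strict one makes."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if 0xC0 <= b <= 0xDF and i + 1 < len(raw):
            # Two-byte form 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy.
            out.append(chr(((b & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)


raw = unquote_to_bytes("%C0%BCscript%C0%BE")
decoded = lenient_decode_2byte(raw)
print(decoded)

# Python's built-in decoder is strict and rejects the same bytes.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)
```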

Page 13: Unicode

• The URL-encoded GET parameter locale is set to acux5291%C0%BEz1%C0%BCz2a%90bcxuca5291

• Here is part of the HTTP request:

https://vendors-unit.prudential.com/OA_HTML/help?locale=acux5291%C0%BEz1%C0%BCz2a%90bcxuca5291&group=FND:LIBRARY:US&topic=US/FND/@ICX_FWK_LABS_HOME_PAGE

How To Test

Page 14: Unicode

• In the HTTP response, the overlong sequence %C0%BC was converted to its short form (<) and returned unescaped, while %C0%BE came back escaped as &gt;

<input type="hidden" value="acux5291&gt;z1<z2a&#65533;bcxuca5291" name="group">

• The input acux5291%C0%BEz1%C0%BCz2a%90bcxuca5291 is transformed into acux5291&gt;z1<z2a&#65533;bcxuca5291
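
The check on the response can be automated along these lines. This is only a sketch: it works on a response body that has already been fetched, reuses the marker strings from the probe above, and the function name `reflected_raw_brackets` is made up for the example.

```python
import re

# Markers taken from the probe on the slide.
PREFIX, SUFFIX = "acux5291", "bcxuca5291"


def reflected_raw_brackets(body: str) -> set:
    """Return which of '<' and '>' came back unescaped between the markers."""
    pattern = re.escape(PREFIX) + r"(.*?)" + re.escape(SUFFIX)
    match = re.search(pattern, body, re.S)
    if match is None:
        return set()
    return {ch for ch in "<>" if ch in match.group(1)}


response = (
    '<input type="hidden" '
    'value="acux5291&gt;z1<z2a&#65533;bcxuca5291" name="group">'
)
print(reflected_raw_brackets(response))
```

Any character in the returned set reached the page without HTML escaping and is a candidate for building a payload.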

Page 15: Unicode

• ?locale=%c0%bcscript%3E&group=FND:LIBRARY:US&topic=US/FND/@ICX_FWK_LABS_HOME_PAGE


• ?locale=%c0%bcscript/%3E&group=FND:LIBRARY:US&topic=US/FND/@ICX_FWK_LABS_HOME_PAGE

Page 16: Unicode

• Spotify supported Unicode usernames.

• The existing user account bigbird was hijacked as follows.

• The attacker creates a new Spotify account with the username ᴮᴵᴳᴮᴵᴿᴰ (the string u'\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30').

• The attacker requests a password reset for the new account.

• A password reset link is sent to the email address of the new account, and the attacker uses it to change the password.

• Instead of logging in with the username ᴮᴵᴳᴮᴵᴿᴰ, the attacker logs in with the username bigbird and the new password.

• The account is compromised.

Spotify Unicode Account Hijacking

Page 17: Unicode

• The canonical_username function, which works like “toLower”, was only applied once, and applying it again gave a different result.

• A user signs up with the username BigBird, which is normalized to bigbird.

• Another user signs up as ᴮᴵᴳᴮᴵᴿᴰ, which is normalized to BIGBIRD the first time but to bigbird the next time.

• ᴮᴵᴳᴮᴵᴿᴰ requests a password reset email, and the link in it resets bigbird’s password.
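
The "BIGBIRD the first time, bigbird the next time" behavior can be reproduced with a deliberately flawed stand-in. The function below is not Spotify's canonical_username; it is a hypothetical canonicalizer that lowercases before applying NFKC normalization, which is enough to make it non-idempotent for this username.

```python
import unicodedata


def buggy_canonical(name: str) -> str:
    # Lowercases first, then applies compatibility normalization.
    # The modifier letters have no lowercase form, so lower() leaves them
    # alone and NFKC then turns them into plain capitals.
    return unicodedata.normalize("NFKC", name.lower())


# The attacker's username from the slide, built from modifier capital letters.
attacker = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"

first = buggy_canonical(attacker)
second = buggy_canonical(first)
print(first, second)
```

If signup stores the result of one pass while password reset applies another pass, the two code paths disagree about which account the name refers to.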


Page 18: Unicode

• Use canonicalization

– It is an important aspect of input sanitization

– It converts data with several possible representations into a standard "canonical" representation that the application accepts, for example by mapping all characters to lower case

– Canonicalization treats “BigBird”, “ᴮᴵᴳᴮᴵᴿᴰ” and “bigbird” as the same name, because all three map to ‘bigbird’

What to do
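
One possible canonicalizer that maps all three spellings to the same name is sketched below. The choice of NFKC plus `str.casefold()` is an assumption made for this example, not the fix Spotify deployed, and it has only been checked against the usernames shown here.

```python
import unicodedata


def canonical(name: str) -> str:
    # Normalize, case-fold, then normalize again so that folding cannot
    # reintroduce a non-normalized form.
    folded = unicodedata.normalize("NFKC", name).casefold()
    return unicodedata.normalize("NFKC", folded)


names = ["BigBird", "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30", "bigbird"]
print({canonical(n) for n in names})

# A second pass changes nothing for these inputs.
print(all(canonical(canonical(n)) == canonical(n) for n in names))
```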

Page 19: Unicode

• The vulnerability was noticed when compromised accounts started retweeting a tweet containing a "♥" symbol followed by a string of code.

• Users didn’t even have to click on the tweet sent out by the Twitter account @derGeruhn. Just viewing the tweet caused the user to retweet it automatically.

• Affected accounts thereby involuntarily retweeted cross-site scripting (XSS) code.

• The tweet was retweeted more than 84,000 times.


Page 20: Unicode

• TweetDeck didn’t escape HTML characters if a Unicode character was in the tweet text.

• The Unicode heart (which TweetDeck replaces with an image) somehow prevented the tweet from being HTML-escaped.

• TweetDeck was not supposed to display this as an image, because it is plain text that should be escaped to "&amp;hearts;".
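
The safe ordering is to escape unconditionally and only then substitute the image. A small Python sketch of that ordering; the `render_tweet` function and the `<img>` markup are hypothetical and do not reflect TweetDeck's actual code.

```python
import html

HEART = "\u2665"
# Hypothetical image markup; TweetDeck's real markup is not shown in the talk.
HEART_IMG = '<img class="emoji" alt="heart">'


def render_tweet(text: str) -> str:
    # Escape first, for every tweet, then swap the heart for an image.
    return html.escape(text).replace(HEART, HEART_IMG)


rendered = render_tweet("<script>alert(1)</script>" + HEART)
print(rendered)

# Literal entity text is escaped too, so it shows up as text, not as a heart.
print(html.escape("&hearts;"))
```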

Page 21: Unicode

1. When converting strings used in security-sensitive operations, use documented options which prevent the use of best-fit mappings.

2. A suitable canonical form should be chosen and all user input canonicalized into that form before any authorization decisions are performed.

3. Security checks should be carried out after UTF-8 decoding is completed.

X is only allowed if X==canonical(X)

Defined Remediation
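
Points 2 and 3 and the rule X == canonical(X) can be combined into a single input check. The sketch below makes several assumptions: input arrives percent-encoded, the canonical form is NFKC plus case folding, and the function name `accept` is invented for the example.

```python
import unicodedata
from urllib.parse import unquote_to_bytes


def canonical(value: str) -> str:
    folded = unicodedata.normalize("NFKC", value).casefold()
    return unicodedata.normalize("NFKC", folded)


def accept(percent_encoded: str):
    """Return the decoded value if it passes both checks, else None."""
    try:
        # Decode first, strictly: overlong forms such as C0 BC raise here.
        raw = unquote_to_bytes(percent_encoded)
        value = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
    # X is only allowed if X == canonical(X).
    return value if value == canonical(value) else None


print(accept("bigbird"))
print(accept("BigBird"))
print(accept("%C0%BCscript%3E"))
print(accept("%EF%BC%9Cscript%3E"))  # fullwidth less-than sign
```

Rejecting non-canonical input outright, rather than silently rewriting it, means later security checks always see the same string the application will use.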

Page 22: Unicode

• Here’s a chart with all the new emoji in yellow, including my favorite, “1F595”, which will be a hit on Twitter.

• http://www.unicode.org/charts/PDF/Unicode-7.0/U70-1F300.pdf

Page 24: Unicode

Any Questions?

Unicode is just too complex to ever be secure. — Bruce Schneier