Taint Tracking Through UTF Extension
![Page 1: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/1.jpg)
Taint Tracking Through UTF Extension
by Bože Zekan
supervised by Dr. Mark Shtern, Dr. Vassilios Tzerpos
Computer Science and Engineering Faculty, York University
funded by NSERC USRA Grant
![Page 2: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/2.jpg)
Topics To Be Covered
• Some threats from user input
• Taint tracking
• Previous work
• Our work
![Page 3: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/3.jpg)
Topics To Be Covered
Our work
• Unicode
• Implementations
• Results
![Page 4: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/4.jpg)
The Problem We Are Addressing
• More than 80% of web services are estimated to contain security vulnerabilities [1]
• Many of these (50 to 82%) are user command injection vulnerabilities [1]
[1] Chin, Erika, and Wagner, David. Efficient Character-level Taint Tracking for Java. In Proceedings of SWS’09, November 13, 2009, Chicago, Illinois, USA. ACM 978-1-60558-789-9/09/11
![Page 5: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/5.jpg)
Our Goal
Reduce security vulnerabilities that may occur when dealing with user input
User input:
- input from an actual physical person
- input from another program, file, database, etc.
OR
- any data that is neither a literal constant in our program nor generated by the manipulation of literal constants in our program
![Page 6: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/6.jpg)
Some User Command Injection Threats:
• SQL injection
• Cross-site scripting (XSS)
• Path traversal
• Shell injection attacks, HTTP response splitting, ...
![Page 7: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/7.jpg)
SQL Injection
query = "SELECT * FROM students WHERE name = '" + studentName + "'";
SELECT * FROM students WHERE name = 'bobby'
![Page 8: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/8.jpg)
SQL Injection
From: Exploits of a Mom webcomic at http://xkcd.com/327/
![Page 9: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/9.jpg)
SQL Injection
SELECT * FROM students WHERE name = 'bobby'; DROP TABLE students; --'
query = "SELECT * FROM students WHERE name = '" + studentName + "'";
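The concatenation on this slide can be reproduced in a few lines. The sketch below is illustrative only: the class and method names are hypothetical, and it simply builds the query text to show how the attacker's quote character ends the string literal early.

```java
// Illustrative sketch; SqlConcatDemo and buildQuery are hypothetical names.
class SqlConcatDemo {
    // Builds the query the same way the slide does: by plain string concatenation.
    static String buildQuery(String studentName) {
        return "SELECT * FROM students WHERE name = '" + studentName + "'";
    }

    public static void main(String[] args) {
        // Benign input stays inside the quoted literal.
        System.out.println(buildQuery("bobby"));
        // Malicious input closes the literal and appends a second statement.
        System.out.println(buildQuery("bobby'; DROP TABLE students; --"));
    }
}
```

The application code cannot tell from the finished string which characters it wrote itself and which came from the user, which is the gap taint tracking is meant to close.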
![Page 10: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/10.jpg)
Cross-Site Scripting (XSS)
<p>Anonymous </br>0 Hours Ago </br> Have you noticed that Soros spelled backwards is still Soros? Coincidence, I think not!</p>
html="<p>" + name + " </br>" + when + " </br>" + comment + "</p>";
![Page 11: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/11.jpg)
Cross-Site Scripting (XSS)
<p>Anonymous </br>0 Hours Ago </br> <script> window.location="http://www.mybadsite.com/"</script></p>
html="<p>" + name + " </br>" + when + " </br>" + comment + "</p>";
![Page 12: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/12.jpg)
Path Traversal
filename: /srv/www/users/bobby/myhomework1.doc
filename = "/srv/www/users/bobby/" + filename;
![Page 13: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/13.jpg)
Path Traversal
filename: /srv/www/users/bobby/../cse3000/tentativetestquestions.doc
which resolves to: /srv/www/users/cse3000/tentativetestquestions.doc
filename = "/srv/www/users/bobby/" + filename;
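To see why the `..` segment escapes the user's directory, the path can be normalized with the standard `java.nio.file` API. This is a minimal sketch under the assumption of Unix-style paths; the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch; PathTraversalDemo and its methods are hypothetical names.
class PathTraversalDemo {
    static final String BASE = "/srv/www/users/bobby/";

    // Mirrors the slide's concatenation, then collapses "." and ".." segments.
    static Path resolve(String filename) {
        return Paths.get(BASE + filename).normalize();
    }

    // True only when the normalized path is still under the user's directory.
    static boolean staysInsideBase(String filename) {
        return resolve(filename).startsWith(Paths.get(BASE));
    }

    public static void main(String[] args) {
        System.out.println(resolve("myhomework1.doc"));
        System.out.println(resolve("../cse3000/tentativetestquestions.doc"));
        System.out.println(staysInsideBase("../cse3000/tentativetestquestions.doc"));
    }
}
```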
![Page 14: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/14.jpg)
To Prevent the Propagation of Malicious Data
Possible solution #1: Carefully parse/sanitize/analyze all data being sent to a sensitive data sink
SELECT * FROM students WHERE name = 'bobby'
SELECT * FROM students WHERE name = 'bobby'; DROP TABLE students; --'
<p>Anonymous </br>0 Hours Ago </br> Have you noticed that Soros spelled backwards is still Soros? Coincidence, I think not!</p>
<p>Anonymous </br>0 Hours Ago </br> <script>window.location = "http://www.mybadsite.com/"</script></p>
/srv/www/users/bobby/myhomework1.doc
/srv/www/users/bobby/../cse3000/tentativetestquestions.doc
... and hope that you catch everything from among all the possible combinations, and don't discard any valid requests
![Page 15: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/15.jpg)
To Prevent the Propagation of Malicious Data
Possible solution #2: Carefully parse/sanitize/analyze all user supplied data being sent to a sensitive data sink
SELECT * FROM students WHERE name = 'bobby'
SELECT * FROM students WHERE name = 'bobby'; DROP TABLE students; --'
<p>Anonymous </br>0 Hours Ago </br> Have you noticed that Soros spelled backwards is still Soros? Coincidence, I think not!</p>
<p>Anonymous </br>0 Hours Ago </br> <script>window.location = "http://www.mybadsite.com/"</script></p>
/srv/www/users/bobby/myhomework1.doc
/srv/www/users/bobby/../cse3000/tentativetestquestions.doc
... and hope that you catch everything from among all the possible combinations, and don't discard any valid requests
![Page 16: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/16.jpg)
Taint Tracking Makes Solution #2 Possible
Taint tracking consists of three main steps:
1. Identifying untrusted input at the point where it enters the program and marking it as untrusted (i.e., tainted).
2. Propagating the taint information. At each subsequent computation, mark as tainted all data that is derived from an untrusted source.
3. Checking all data going into sensitive data sinks (e.g., a database, an output response, or a file). Use the taint information to identify potential attacks.
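The three steps can be sketched with a single taint flag per value. This is a generic, string-level illustration, not the mechanism proposed in this talk; every name in it is hypothetical.

```java
// Illustrative sketch of the three steps; all names are hypothetical.
class TaintFlowSketch {
    // A value paired with one taint flag for the whole string.
    static final class Labeled {
        final String text;
        final boolean tainted;

        Labeled(String text, boolean tainted) {
            this.text = text;
            this.tainted = tainted;
        }

        // Step 2: anything derived from tainted data is itself tainted.
        Labeled concat(Labeled other) {
            return new Labeled(text + other.text, tainted || other.tainted);
        }
    }

    // Step 1: mark data as tainted at the point where it enters the program.
    static Labeled fromUser(String raw) {
        return new Labeled(raw, true);
    }

    // Literal constants written by the programmer are trusted.
    static Labeled literal(String s) {
        return new Labeled(s, false);
    }

    // Step 3: the sink consults the taint information before acting.
    static boolean sinkAccepts(Labeled data) {
        return !data.tainted;
    }
}
```

With only one flag per string, the sink must either reject or fully re-examine any query that contains user data at all.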
![Page 17: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/17.jpg)
Taint Tracking
Taint tracking comes in two possible flavours:
1. String level – mark the entire string as tainted
2. Character level
- mark individual characters as tainted
- allows for finer granularity
![Page 18: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/18.jpg)
How Can Character Level Tainting Be Achieved?
One method, by Chin and Wagner of UC Berkeley [1]
Expand the structure of the Java String class to include a boolean array which stores the taint status for each character in the string.
[1] Chin, Erika, and Wagner, David. Efficient Character-level Taint Tracking for Java. In Proceedings of SWS’09, November 13, 2009, Chicago, Illinois, USA. ACM 978-1-60558-789-9/09/11
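The idea of a per-character boolean array can be pictured with a small wrapper class. This is only an illustration of the data structure described above, not Chin and Wagner's implementation (which modifies the String class itself); the class and its members are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch of a string that carries one taint bit per character.
class BoolArrayTaintSketch {
    final String text;
    final boolean[] taint; // taint[i] is true when text.charAt(i) is untrusted

    // Creates a string whose characters are all tainted or all untainted.
    BoolArrayTaintSketch(String text, boolean tainted) {
        this.text = text;
        this.taint = new boolean[text.length()];
        Arrays.fill(this.taint, tainted);
    }

    private BoolArrayTaintSketch(String text, boolean[] taint) {
        this.text = text;
        this.taint = taint;
    }

    // Concatenation joins the two taint arrays alongside the characters.
    BoolArrayTaintSketch concat(BoolArrayTaintSketch other) {
        boolean[] merged = Arrays.copyOf(taint, taint.length + other.taint.length);
        System.arraycopy(other.taint, 0, merged, taint.length, other.taint.length);
        return new BoolArrayTaintSketch(text + other.text, merged);
    }

    boolean isTaintedAt(int index) {
        return taint[index];
    }
}
```

The extra array is the memory cost referred to later in the talk: every string carries a second array as long as its text.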
![Page 19: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/19.jpg)
The Chin and Wagner method
Their achievement: Implementing a solution which minimizes the need to rewrite existing application code while transparently decreasing the vulnerability of applications to threats
Their shortcomings:
• Specific to Java
• Increases the memory required to store a string in Java
• The taint status of the Java char primitive cannot be determined
• Not readily adapted to other programming languages
• Their taint information cannot propagate onwards to a database, or an application, script, or procedure running in another programming language.
![Page 20: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/20.jpg)
How can character level tainting be achieved?
Our method: Expand Unicode to include tainted characters
Our achievements:
• Implement a solution which minimizes the need to rewrite existing application source code while transparently decreasing the vulnerability of applications to threats.
• Is not specific to Java
• Does not increase the memory required to store a string in Java
• The taint status of the Java char primitive can be determined
• Is readily adapted to other programming languages
• The taint information can propagate onwards to a database, or an application, script, or procedure running in another programming language
![Page 21: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/21.jpg)
What is Unicode?
• A scheme that assigns a codepoint to each character in current use throughout the world
• Has been implemented in XML, Java, Microsoft .NET, web browsers, databases, and modern operating systems.
![Page 22: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/22.jpg)
Unicode
• Can accommodate 1,114,112 codepoints in 17 “planes” of 65,536 codepoints each
• Most of the codespace is still unassigned
• Mechanisms (e.g. UTF-8, UTF-16 ...) exist that already allow software to manipulate and store all these codepoints even if no characters have been assigned to them
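Both claims on this slide can be checked with the standard Java library. The sketch below is illustrative (class and method names are hypothetical). It confirms the codespace size and round-trips a codepoint through UTF-8 whether or not a character has been assigned to it.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch; CodespaceDemo and its methods are hypothetical names.
class CodespaceDemo {
    // 17 planes of 65,536 codepoints each.
    static int codespaceSize() {
        return 17 * 65_536;
    }

    // Encodes one codepoint as UTF-8 bytes.
    static byte[] toUtf8(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
    }

    // Encodes to UTF-8 and decodes again, returning the recovered codepoint.
    static int roundTripUtf8(int codePoint) {
        return new String(toUtf8(codePoint), StandardCharsets.UTF_8).codePointAt(0);
    }

    public static void main(String[] args) {
        System.out.println(codespaceSize() == Character.MAX_CODE_POINT + 1);
        System.out.println(Integer.toHexString(roundTripUtf8(0xE041)));
    }
}
```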
![Page 23: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/23.jpg)
Our Design, Part 1
Tainting & Propagating Taint
• We create a “tainted” character for every character and assign it an unused codepoint
Ex.
Untainted: A (ASCII: 41 hex, Unicode: U+0041) → Tainted: A (Unicode: U+E041)
Untainted: z (ASCII: 7A hex, Unicode: U+007A) → Tainted: z (Unicode: U+E07A)
• Now wherever a character’s codepoint goes, its tainted or untainted status goes with it
![Page 24: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/24.jpg)
Tainting Algorithms
• To taint a user input character x:
    codepoint(tainted x) = codepoint(x) + OFFSET
• To check if character x is tainted or not:
    if (codepoint(x) is in tainted codepoint range)
        character x is tainted // is user supplied
    else
        character x is untainted
• To remove taint from tainted character x:
    codepoint(x) = codepoint(tainted x) - OFFSET
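The three operations translate almost directly into Java. In this minimal sketch the value `OFFSET = 0xE000` is inferred from the U+0041 to U+E041 example, and limiting the tainted range to the 128 ASCII characters is an assumption made for illustration; the class and method names are hypothetical.

```java
// Illustrative sketch of the tainting algorithms; names are hypothetical.
class UtfTaint {
    // Assumption: inferred from the slide's example A (U+0041) -> U+E041.
    static final int OFFSET = 0xE000;
    // Assumption: only ASCII characters receive tainted counterparts here.
    static final int RANGE_END = OFFSET + 0x7F;

    // To taint: codepoint(tainted x) = codepoint(x) + OFFSET
    static int taintCodePoint(int cp) {
        return cp + OFFSET;
    }

    // To check: is the codepoint inside the tainted range?
    static boolean isTainted(int cp) {
        return cp >= OFFSET && cp <= RANGE_END;
    }

    // To remove taint: codepoint(x) = codepoint(tainted x) - OFFSET
    static int untaintCodePoint(int cp) {
        return isTainted(cp) ? cp - OFFSET : cp;
    }

    // Taints every ASCII character of a string; other characters pass through.
    static String taint(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> out.appendCodePoint(cp < 0x80 ? taintCodePoint(cp) : cp));
        return out.toString();
    }

    // Removes taint from every tainted character of a string.
    static String untaint(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> out.appendCodePoint(untaintCodePoint(cp)));
        return out.toString();
    }
}
```

U+E000 is the start of Unicode's Private Use Area in the Basic Multilingual Plane, so each tainted character in this sketch still occupies a single Java `char`.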
![Page 25: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/25.jpg)
Our Design, Part 2
The Transparent Protection Framework
Consider a typical vulnerable web application:
![Page 26: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/26.jpg)
Designing The Added Transparent Protection Framework
Consider a less vulnerable web application:
• User’s OS has fonts which incorporate tainted characters
• Request Intercept Wrapper uses custom taint aware classes/functions and is generic for a given technology
• Application is on a server with taint awareness built into its library functions
• Database Driver Intercept Wrapper uses custom taint aware classes/functions specific to the database to check for SQL injection, and drop malicious queries
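The slides do not spell out how the database wrapper decides that a query is malicious. One plausible policy, sketched below purely as an assumption, is to drop a query when a tainted character appears outside a quoted literal or when a tainted quote appears anywhere. The offset `0xE000` is inferred from the earlier example and all names are hypothetical.

```java
// Illustrative sketch of one possible check; the policy itself is an assumption.
class QueryGuardSketch {
    static final int OFFSET = 0xE000; // assumption, from the U+0041 -> U+E041 example

    static boolean isTainted(int cp) {
        return cp >= OFFSET && cp <= OFFSET + 0x7F;
    }

    // Helper for the demo: taints every ASCII character of user input.
    static String taint(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> out.appendCodePoint(cp < 0x80 ? cp + OFFSET : cp));
        return out.toString();
    }

    // Flags queries where user data escapes a literal or carries a quote.
    static boolean looksMalicious(String query) {
        boolean insideLiteral = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\'') {
                // Only untainted quotes, written by the application, delimit literals.
                insideLiteral = !insideLiteral;
                continue;
            }
            if (isTainted(c)) {
                boolean taintedQuote = (c - OFFSET) == '\'';
                if (!insideLiteral || taintedQuote) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Because a user-supplied quote arrives as U+E027 rather than U+0027, the wrapper can distinguish it from the quotes the application wrote, which is what makes a check like this possible without parsing the full SQL grammar.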
![Page 27: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/27.jpg)
Implementation Details: The Font
For a final, universally adopted application:
• System fonts would be expanded to include tainted characters, which would look identical to their untainted counterparts
Ex. untainted ABCDE ... vs tainted ABCDE ...
For our proof of concept:
• Tainted and untainted characters appear different
– to easily distinguish them on computer screens and in documents
Ex. untainted ABCDE ... vs tainted ...
![Page 28: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/28.jpg)
Implementation Details: The Font
• We used Type-Light freeware to modify Windows' Courier New font
- installed it by dragging out the original ttf file from the Fonts directory, and dragging in our new ttf file
![Page 29: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/29.jpg)
![Page 30: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/30.jpg)
Implementation Details: The Application
• Has no knowledge of taint
• Counts the number of visits of this user
• 1st query to db checks if user’s name is in the db.
If no, then insert name into db and set visits count to 1
If yes, then increment visits count by 1 in the db
• 2nd query to db outputs the number of visits for the user's name from the db’s record
![Page 31: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/31.jpg)
Implementation Details: The Transparent Protection Framework
We implemented our framework on our typical web application in four different technologies:
1. PHP/MySQL on Apache (under Windows XP)
2. PHP/DB2 on Apache (under Linux)
3. Java Servlet/DB2 on Tomcat7 (under Linux)
4. PHP on Apache (under Linux) calling Java Servlet/DB2 on Tomcat7 (under Linux)
To do this we set the UTF-8 or Unicode encoding option everywhere it was available, and Courier New as the selected font wherever possible.
![Page 32: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/32.jpg)
Implementation Details: The Transparent Protection Framework
![Page 33: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/33.jpg)
Implementation Details: The Form Page
![Page 34: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/34.jpg)
Implementation Details: The Transparent Protection Framework
![Page 35: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/35.jpg)
Implementation Details: The Request Intercept Wrapper
• Two versions were used:
1. PHP version which uses cURL to interact with the application
2. Java Servlet version which uses a connection to interact with the application
• Both versions handled both POST and GET requests.
• Browser only sees the wrapper's URL, never the application page's URL
• Both will work with any form, no matter the combination of controls
![Page 36: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/36.jpg)
Implementation Details: The Transparent Protection Framework
![Page 37: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/37.jpg)
Implementation Details: PHP Application & Db Driver Intercept
• Four applications exist
- essentially the same code with minor variations
• Two Database Driver Intercept Wrappers exist
- essentially the same code with minor variations
- they are PHP include files
- each file has taint aware functions that wrap the query and fetch array functions of their respective databases
![Page 38: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/38.jpg)
Implementation Results: PHP Application & Db Driver Intercept
• Was not totally transparent
- application needed modification to specify the include files, and rename two functions
• But we did successfully:
- propagate taint from user input all the way back to the user output
- transparently detect and stop SQL injection
- show that our method works on different databases and different operating systems
- produce an easy to implement solution to increase the security of legacy programs
![Page 39: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/39.jpg)
Implementation Results: PHP Application & Db Driver Intercept
![Page 40: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/40.jpg)
Implementation Results: PHP Application & Db Driver Intercept
![Page 41: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/41.jpg)
Implementation Results: PHP Application & Db Driver Intercept
![Page 42: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/42.jpg)
Implementation Details: Java Application
• One application, reachable in two ways
• Has modified String & Character classes that will not break application at ("A").equals("[tainted A]") or ('A').equals('[tainted A]')
![Page 43: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/43.jpg)
Implementation Details: Java DB2 Database Intercept Wrapper
• Is a collection of custom taint aware classes
• The original ibm.db2.jdbc.app.DB2Driver class is wrapped with our taint aware Db2DriverIntercept class
• We then drill down and also wrap the Connection, PreparedStatement, and ResultSet interfaces and augment their existing methods to provide transparent SQL injection protection
![Page 44: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/44.jpg)
Implementation Results: Java Application & Db Driver Intercept
• Was not totally transparent
- application needs to call our driver instead of IBM’s database driver
• But we additionally showed that our character level taint method could:
- work on different programming languages (PHP and Java) and paradigms (procedural and OOP)
- propagate between different languages and different servers
- be handled transparently by modifying Java’s String and Character class operations
![Page 45: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/45.jpg)
Application Breaks & Work Arounds
• Java: the char is a primitive
if ('A'=='[tainted A]') … is as far as we can keep taint information accurate
Thereafter, taint information is lost; no further propagation
- if allowed to alter source code, then replace ('A'=='[tainted A]') with a taint aware custom method ('A'.equals('[tainted A]')) to allow taint to propagate even further within an application.
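A taint aware comparison of this kind might look like the sketch below. It is a hedged illustration, not the authors' modified Character class: the offset `0xE000` is inferred from the U+0041 to U+E041 example, and the class and method names are hypothetical.

```java
// Illustrative sketch of taint aware comparison; names are hypothetical.
class TaintAwareCompare {
    static final int OFFSET = 0xE000; // assumption, from the U+0041 -> U+E041 example

    static boolean isTainted(char c) {
        return c >= OFFSET && c <= OFFSET + 0x7F;
    }

    // Maps a tainted character back to its untainted counterpart.
    static char stripTaint(char c) {
        return isTainted(c) ? (char) (c - OFFSET) : c;
    }

    // Replacement for (a == b): compares the underlying characters only,
    // leaving the operands, and therefore their taint, unchanged.
    static boolean sameCharacter(char a, char b) {
        return stripTaint(a) == stripTaint(b);
    }

    // The same idea applied to whole strings.
    static boolean sameText(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (!sameCharacter(a.charAt(i), b.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

A plain `==` on the primitive sees two different codepoints and returns false, which is why the comparison has to be routed through a method when the source code may be altered.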
![Page 46: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/46.jpg)
Application Breaks & Work Arounds
• PHP: strings are considered primitive
if ("AB"=="[tainted AB]") … is as far as we can keep taint information accurate
Thereafter, taint information is lost; no further propagation
- if allowed to alter source code, then replace ("AB"=="[tainted AB]") with a taint aware custom method ("AB".equals("[tainted AB]")) to allow taint to propagate even further within an application.
NB! If our method were to be adopted universally, the above could be overcome by modifying the JVM or PHP engine
![Page 47: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/47.jpg)
Other Possible Uses of Our Character Level Tainting Method
• Tainting and tracking of multiple input sources
– there are a lot of unassigned codepoints
– many tainted character sets could be created to indicate different data sources (e.g. keyboard, file, database, remote login, ...)
• Storing tainted characters in log files to make user input immediately recognizable
• Tainted characters can be stored in a database & retrieved by using taint in queries
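The multiple-source idea can be sketched by giving each data source its own block of codepoints. The base values below are hypothetical choices inside the Private Use Area, picked only for illustration, as are all the names.

```java
import java.util.Optional;

// Illustrative sketch of per-source taint ranges; bases and names are hypothetical.
class MultiSourceTaint {
    enum Source {
        KEYBOARD(0xE000), FILE(0xE080), DATABASE(0xE100);

        final int base; // first codepoint of this source's 128-character block

        Source(int base) {
            this.base = base;
        }
    }

    // Taints an ASCII codepoint with the block belonging to the given source.
    static int taint(int asciiCodePoint, Source source) {
        return source.base + asciiCodePoint;
    }

    // Recovers which source, if any, a codepoint was tainted by.
    static Optional<Source> sourceOf(int cp) {
        for (Source s : Source.values()) {
            if (cp >= s.base && cp < s.base + 0x80) {
                return Optional.of(s);
            }
        }
        return Optional.empty();
    }

    // Removes taint regardless of which source applied it.
    static int untaint(int cp) {
        return sourceOf(cp).map(s -> cp - s.base).orElse(cp);
    }
}
```

A log viewer or database query could then filter on `sourceOf` to pick out, for example, only the characters that originated from a file.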
![Page 48: Taint Tracking Through UTF Extension](https://reader033.fdocuments.us/reader033/viewer/2022051117/5681587e550346895dc5e166/html5/thumbnails/48.jpg)
Other Possible Uses of Our Character Level Tainting Method