HENGHA: DATA HARVESTING DETECTION ON HIDDEN DATABASES Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi...
HENGHA: DATA HARVESTING DETECTION ON HIDDEN DATABASES
Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi
University of California, Santa Barbara
CCSW 2010
Data Security Concern: Back-End Databases of Web-based Applications
• Form-based query interfaces provide an entrance for both users and attackers.
• Traditional attacks
  • Submit malicious requests to break into the hidden database through vulnerabilities in the application, e.g. SQL injection [Vale05].
  • Many can be detected by prior work.
10/8/2010 2
Data Security Concern: Back-End Databases of Web-based Applications
• Data harvesting attacks
  • Iteratively submit legitimate queries to extract the data inventory or infer sensitive aggregate information.
  • E.g. 1: A competitor of a car rental company A harvested A’s inventory of a popular car.
  • E.g. 2: Terrorists inferred that a flight was relatively empty and could be a hijacking target.
Anatomy of Data Harvesting Attacks
• General strategy
  • Iteratively submit legitimate queries with valid fields, analyze the results, and then design new queries with the goal of maximizing information gain within a limited number of queries.
• Two types of harvesting attacks to consider
  • Crawling attack
    • Performed by deep web crawling [Madh08]
  • Sampling attack
    • Performed by uniform random sampling on results of size no more than K [Dasg09]
How To Defend Against Data Harvesting Attacks
• Database inference control [Denn83]?
  • Query set restriction is not effective, especially against sampling attacks.
  • Query set restriction and data perturbation [Dasg09] hurt usability.
• Web robot detection [Tan02]?
  • Data harvesters can mimic normal users’ HTTP traffic patterns.
Our Approach
• Detection based on search behaviors within sessions
• Attackers’ search behaviors
  • Diversity
    • Queries are not concentrated and localized, and they reflect very distinct intents.
  • Broadness
    • The results of the queries cover a broad scope of the underlying data.
HengHa: Detecting Data Harvesting Attacks at Single Session Level
• Identify data harvesting attackers by examining if their search behaviors in a session show relatively significant diversity and broadness.
  • Diversity -> query correlation
  • Broadness -> result coverage
[Figure: the HengHa detector, made up of Heng (query correlation observer) and Ha (result coverage monitor), sits alongside the web application and the DB, observes queries and results, and flags suspicious sessions.]
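As a rough illustration of how the two signals might be combined, the Python sketch below flags a session only when it looks both diverse and broad. The `SessionSignals` type, the `is_suspicious` function, the threshold values, and the choice to require both conditions at once are assumptions made for illustration; the slides do not state the actual decision rule.

```python
from dataclasses import dataclass


@dataclass
class SessionSignals:
    """Per-session measurements (hypothetical container, not from the slides)."""
    correlation: float  # from Heng: higher means the queries are more related
    coverage: float     # from Ha: fraction of the data touched by the results


def is_suspicious(signals, max_correlation=0.3, min_coverage=0.5):
    """Flag a session that is both diverse (low correlation) and broad (high coverage).

    Both thresholds are placeholder values chosen for this example.
    """
    return (signals.correlation <= max_correlation
            and signals.coverage >= min_coverage)


# A scattered, wide-ranging session versus a focused one.
print(is_suspicious(SessionSignals(correlation=0.1, coverage=0.8)))
print(is_suspicious(SessionSignals(correlation=0.7, coverage=0.1)))
```

In practice the thresholds would presumably be learned from training sessions rather than fixed by hand.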
Heng: Query Correlation Observer
• Key idea
  • Frequent predicate value sets as indications of correlations among queries
  • Intuitively, if a session has more frequent predicate value sets with higher supports, and those predicate value sets are more similar to the queries, the queries in this session are more correlated.
[Figure: Queries in a Session That Plans Trip to Chicago]
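The intuition above can be sketched in Python. This is a minimal illustration, not the scoring used by Heng: queries are assumed to be dictionaries of attribute -> value predicates, the `min_support` and `max_size` parameters are hypothetical, and the score (support times Jaccard similarity, best match per query, averaged over the session) is one plausible way to reward frequent sets that have high support and closely match the queries. The example sessions are made up.

```python
from collections import Counter
from itertools import combinations


def frequent_value_sets(session, min_support=0.5, max_size=2):
    """Map each frequent set of (attribute, value) predicates to its support.

    Support is the fraction of queries in the session that contain the set.
    """
    counts = Counter()
    for query in session:
        items = sorted(query.items())
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    total = len(session)
    return {s: c / total for s, c in counts.items() if c / total >= min_support}


def correlation_score(session, min_support=0.5, max_size=2):
    """Hypothetical score: mean over queries of the best support * Jaccard match."""
    frequent = frequent_value_sets(session, min_support, max_size)
    if not session or not frequent:
        return 0.0
    total = 0.0
    for query in session:
        items = set(query.items())
        total += max(
            support * len(items & s) / len(items | s)
            for s, support in frequent.items()
        )
    return total / len(session)


# Made-up sessions: a focused trip-planning session and a scattered one.
trip = [
    {"city": "Chicago", "type": "hotel"},
    {"city": "Chicago", "type": "flight"},
    {"city": "Chicago", "type": "hotel"},
]
harvest = [{"city": "Boston"}, {"city": "Denver"}, {"city": "Miami"}]

print(correlation_score(trip))     # focused session scores higher
print(correlation_score(harvest))  # no frequent sets, so the score is 0.0
```

Enumerating all subsets up to `max_size` is only practical for queries with a handful of predicates; a real implementation would likely use a frequent itemset mining algorithm instead.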
Ha: Result Coverage Monitor
• Key idea
  • Sort multi-attribute data D in a total order that preserves locality, e.g. a z-curve.
  • Create a coverage bit vector (CBV), where the bits correspond to the data in the total order.
  • Accessing a data item sets the corresponding bit.
• Training
  • Cluster CBVs to model different data access patterns.
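The key idea can be illustrated with a short Python sketch. It assumes records are distinct two-dimensional integer points, uses bit interleaving (Morton order) as the z-curve, and puts the x bits in the lower position of each pair; the `z_order` function and `CoverageBitVector` class are hypothetical names, not part of HengHa.

```python
def z_order(x, y, bits=2):
    """Interleave the low `bits` bits of x and y into a z-curve (Morton) index."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)      # x bit goes to even position
        index |= ((y >> i) & 1) << (2 * i + 1)  # y bit goes to odd position
    return index


class CoverageBitVector:
    """One bit per record, with records ranked by their z-curve position."""

    def __init__(self, records, bits=2):
        ordered = sorted(records, key=lambda r: z_order(r[0], r[1], bits))
        self.position = {rec: i for i, rec in enumerate(ordered)}
        self.bits = [0] * len(ordered)

    def mark(self, accessed):
        """Set the bit of every record returned by a query."""
        for rec in accessed:
            self.bits[self.position[rec]] = 1

    def coverage(self):
        """Fraction of records accessed so far in the session."""
        return sum(self.bits) / len(self.bits)

    def __str__(self):
        return "".join(map(str, self.bits))


# A 4x4 grid of records; one query returns the four records nearest the origin.
grid = [(x, y) for x in range(4) for y in range(4)]
cbv = CoverageBitVector(grid)
cbv.mark([(0, 0), (1, 0), (0, 1), (1, 1)])
print(cbv, cbv.coverage())  # 1111000000000000 0.25
```

Because the z-curve keeps nearby points close together in the ordering, a localized query sets a contiguous run of bits, while a session that roams over the data sets bits scattered across the whole vector.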
[Figure: data points on x and y axes (ticks 0 to 3) ordered along a z-curve, with example coverage bit vector 1110110001000000]
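The training step could look something like the Python sketch below. The slides do not name a clustering algorithm, so this uses a tiny k-modes style loop (Hamming distance, per-bit majority-vote centroids, naive initialization from the first `k` vectors) purely as a stand-in. The `hamming` and `cluster_cbvs` names, the toy vectors, and the idea of comparing a new session against the nearest centroid are all assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_cbvs(cbvs, k, rounds=10):
    """Group bit vectors into k clusters; centroids are per-bit majority votes."""
    centroids = [list(v) for v in cbvs[:k]]  # naive init: first k vectors
    assignment = []
    for _ in range(rounds):
        # Assign each vector to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: hamming(v, centroids[j])) for v in cbvs
        ]
        # Recompute each centroid as the majority bit of its members.
        for j in range(k):
            members = [v for v, a in zip(cbvs, assignment) if a == j]
            if members:
                centroids[j] = [
                    int(2 * sum(col) >= len(members)) for col in zip(*members)
                ]
    return centroids, assignment


# Toy training CBVs: two sessions touch the start of the order, two the end.
training = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
]
centroids, assignment = cluster_cbvs(training, k=2)
print(assignment)

# Distance of a new session's CBV to its nearest learned access pattern.
broad_session = [1, 1, 1, 1, 1, 1, 1, 1]
print(min(hamming(broad_session, c) for c in centroids))
```

A session whose CBV is far from every learned pattern covers the data in a way the training sessions did not, which is one way broadness might show up.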
Experiment
• Extracted 98,564 real user query sessions and a data table of 387 records from the KDD Cup 2000 clickstream dataset
• Synthesized 1000 attack sessions [Madh08, Dasg09]
• Ran on a server with an Intel 2.4 GHz CPU, 3 GB RAM, and FC 8 OS
• Performed four-fold cross-validation
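For readers unfamiliar with the evaluation setup, four-fold cross-validation can be sketched in Python as follows. How the sessions were actually shuffled or stratified is not stated in the slides; the `four_fold_splits` function, the fixed seed, and the round-robin partitioning are assumptions for illustration.

```python
import random


def four_fold_splits(sessions, folds=4, seed=0):
    """Yield (train, test) pairs; each session lands in exactly one test fold."""
    shuffled = sessions[:]
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled sessions round-robin into `folds` parts.
    parts = [shuffled[i::folds] for i in range(folds)]
    for i in range(folds):
        test = parts[i]
        train = [s for j, part in enumerate(parts) if j != i for s in part]
        yield train, test


# Ten placeholder session ids stand in for real query sessions.
sessions = list(range(10))
for train, test in four_fold_splits(sessions):
    print(len(train), len(test))
```

Each round trains on three quarters of the sessions and tests on the remaining quarter, so every session is used for testing exactly once.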
[Figures: Effectiveness of Detection in Four Validations; Efficiency of Detection in Four Validations]
Conclusion & Future Work
• Identified non-traditional data harvesting attacks on the back-end databases of web-based applications, i.e. crawling attacks and sampling attacks.
• Detection is based on identifying attackers’ distinctive search behaviors at the single-session level: diversity -> query correlation observer, broadness -> result coverage monitor.
• Detecting cross-session data harvesting attacks will be considered in future work.
References
• [Vale05] F. Valeur et al. A learning-based approach to the detection of SQL attacks. In DIMVA, pages 123–140, 2005.
• [Dasg09] A. Dasgupta et al. Privacy preservation of aggregates in hidden databases: why and how? In SIGMOD, pages 153–164, 2009.
• [Madh08] J. Madhavan et al. Google’s deep web crawl. PVLDB, 1(2):1241–1252, 2008.
• [Tan02] P.-N. Tan et al. Discovery of web robot sessions based on their navigational patterns. Data Min. Knowl. Discov., 6(1):9–35, 2002.
• [Denn83] D. E. Denning et al. Inference controls for statistical databases. Computer, 16(7):69–82, 1983.
Thanks for Listening