A Data Masking Technique for Data Warehouses
Ricardo Jorge Santos & Marco Vieira
CISUC – DEI – FCTUC, University of Coimbra - Portugal
Jorge Bernardino
CISUC – DEIS – ISEC, Polytechnic Institute of Coimbra - Portugal
ISEL, Lisbon – September/2011
INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM
Ricardo J. Santos – A Data Masking Technique for Data Warehouses – IDEAS 2011 – ISEL, Lisbon – September/2011
Agenda
Background
Motivation
MOBAT: A MOD Based Data Masking Technique
Optimization Features
Experimental Results
Conclusions and Future Work
Security Concerns in Data Warehousing
A Data Warehouse (DW) is a critical asset for many enterprises
Stores all relevant historical and current business information needed for supporting decision making (sensitive data)
DWs are main targets for stealing or compromising sensitive data
Attack rate and complexity have increased in the recent past
Data Security Domains
Data Confidentiality: Only the right users should access the right data
Data Integrity: Data should always be correct, authentic and consistent
Data Availability: Users should always be able to access data whenever needed
Data Privacy Issues in Today’s DWs (Our Focus)
Masking solutions are not considered acceptable
Encryption techniques introduce too much overhead in storage space, data loading time and query response time
Important feature: Facts in DWs are mainly numerical columns!
MOBAT – MOd BAsed data masking Technique for DWs
[Figure: MOBAT System Architecture]
Suppose a table T with a set of N numerical columns Ci = {C1, C2, C3, …, CN} to mask and a total set of M rows Rj = {R1, R2, R3, …, RM}.
Each value to mask in the table is identified as a pair (Rj, Ci), where Rj and Ci respectively represent the row and column to which the value refers.
The keys follow the same indexing: K1 is a single key for the table, K2,i is the key of column i and K3,j is the key of row j.
Each new masked value (Rj, Ci)' is obtained by applying the following formula (1) for row j and column i of table T:
(Rj, Ci)' = (Rj, Ci) - ((K3,j MOD K1) MOD K2,i) + K2,i
The inverse formula (2) for retrieving the original value is:
(Rj, Ci) = (Rj, Ci)' + ((K3,j MOD K1) MOD K2,i) - K2,i
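Formulas (1) and (2) can be sketched in a few lines of Python to check that they are inverses of each other. This is only an illustration, not the authors' implementation: the function names and the row key value are hypothetical, the keys K1 = 7432 and K2 = 34 are borrowed from the first example dataset slide, and all keys are assumed to be positive integers so that Python's % behaves like MOD.

```python
def row_offset(k1: int, k2_i: int, k3_j: int) -> int:
    """Term ((K3,j MOD K1) MOD K2,i) shared by formulas (1) and (2)."""
    return (k3_j % k1) % k2_i


def mask(value, k1: int, k2_i: int, k3_j: int):
    """Formula (1): masked value stored in the DW."""
    return value - row_offset(k1, k2_i, k3_j) + k2_i


def unmask(masked, k1: int, k2_i: int, k3_j: int):
    """Formula (2): original value recovered from the masked one."""
    return masked + row_offset(k1, k2_i, k3_j) - k2_i


K1 = 7432        # table key, taken from the first example dataset slide
K2_COL = 34      # column key K2,1 from the same slide
K3_ROW = 20000   # hypothetical row key, chosen only for illustration

masked_value = mask(17, K1, K2_COL, K3_ROW)
restored_value = unmask(masked_value, K1, K2_COL, K3_ROW)
assert restored_value == 17  # (2) undoes (1) for the same keys
```

Because both formulas add and subtract the same two terms, the round trip holds for any value as long as the same three keys are used on both sides.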
MOBAT – Example Dataset
Supposing K1 = 7432, K2,1 = 34 and K2,2 = 17252
MOBAT – Example Dataset
Supposing K1 = 9264, K2,1 = 12 and K2,2 = 78254
MOBAT – Querying
Using the TPC-H benchmark with four numerical fact columns (N = 4) masked by MOBAT: L_Quantity, L_ExtendedPrice, L_Tax and L_Discount
A new column L_KeyK3 in the LineItem table stores the K3,j key of each row j
K1 = 9342; K2,L_Quantity = 12; K2,L_ExtendedPrice = 51234; K2,L_Tax = 6; K2,L_Discount = 4
Original query:
SELECT SUM(L_ExtendedPrice * L_Discount) AS Total_Revenue
FROM LineItem
WHERE L_ShipDate >= TO_DATE('1994-01-01','YYYY-MM-DD')
  AND L_ShipDate < TO_DATE('1995-01-01','YYYY-MM-DD')
  AND L_Discount BETWEEN 0.05 AND 0.07
  AND L_Quantity < 24
Rewritten query over the masked columns:
SELECT SUM((L_ExtendedPrice+MOD(MOD(L_KeyK3,9342),51234)-51234) * (L_Discount+MOD(MOD(L_KeyK3,9342),4)-4)) AS Total_Revenue
FROM LineItem
WHERE L_ShipDate >= TO_DATE('1994-01-01','YYYY-MM-DD')
  AND L_ShipDate < TO_DATE('1995-01-01','YYYY-MM-DD')
  AND (L_Discount+MOD(MOD(L_KeyK3,9342),4)-4) BETWEEN 0.05 AND 0.07
  AND (L_Quantity+MOD(MOD(L_KeyK3,9342),12)-12) < 24
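The rewriting step above replaces every masked column with its inverse-formula expression. A minimal Python sketch of that substitution follows. The helper names unmask_expr and rewrite_query are hypothetical, and the regex-based text replacement is a simplification assumed for illustration (it is case-sensitive and would also touch column names inside string literals or aliases); the slides do not describe how the actual rewriter is built.

```python
import re

# Keys from the querying slide.
K1 = 9342
K2 = {"L_Quantity": 12, "L_ExtendedPrice": 51234, "L_Tax": 6, "L_Discount": 4}
KEY_COLUMN = "L_KeyK3"  # column holding the per-row key K3,j

# Matches any masked column name as a whole word.
MASKED_COLUMN = re.compile(r"\b(" + "|".join(map(re.escape, K2)) + r")\b")


def unmask_expr(column: str) -> str:
    """SQL expression applying inverse formula (2) to one masked column."""
    k2 = K2[column]
    return f"({column}+MOD(MOD({KEY_COLUMN},{K1}),{k2})-{k2})"


def rewrite_query(sql: str) -> str:
    """Naively replace each masked column reference with its unmask expression."""
    return MASKED_COLUMN.sub(lambda m: unmask_expr(m.group(1)), sql)


user_query = (
    "SELECT SUM(L_ExtendedPrice * L_Discount) AS Total_Revenue "
    "FROM LineItem WHERE L_Quantity<24"
)
rewritten_query = rewrite_query(user_query)
```

A production rewriter would more likely work on a parsed query than on raw text, but the substitution it performs per column is the same expression shown in the rewritten query.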
MOBAT – Optimizing Features & Performance
The inclusion of K3,j requires additional storage space
K3,j can be created in several ways, each with a different impact on performance:
Simply adding a new column to the existing fact table
Recreating the fact table including K3,j from the start
Using a 128-bit integer column already existing in the fact table (typically the primary key column)
Experimental Evaluation
2.8GHz CPU, 2GB RAM (512MB for Oracle SGA), 1.5TB SATA HD
Oracle 11g DBMS
One standard benchmark and one real-world DW:
TPC-H Decision Support Benchmark with 1GB and 10GB scale
Real-world Sales DW (2GB storage size)
Conclusions
Our technique decreases data storage space and processing overheads, while still providing a significant level of security
Transparent method with minimal network bandwidth overhead, since only the queries are rewritten
Extremely easy and simple to implement in any DBMS / DW, with low costs
Querying the database directly will produce only realistic results (stored data is masked at all times)
Future Work
Extending the technique to also mask alphanumeric values
Assessing its security strength in comparison with other solutions
Extending the technique to increase its security strength:
Using larger keys
Enabling data integrity checks
Implementing false data injection
THANK YOU!
Questions and Comments?
Ricardo Jorge Santos