Data Transformation for Analysis Purposes
Presented By: Gregg Ravenscroft
Khulisa Management Services
Tel: (011) 447 6464
Scenario
The organisation is faced with disparate data sets that:
- come from decommissioned systems
- are spread across multiple systems
- are in a format not conducive to analysis
But…
- the information in these data sets is needed for analysis
- the data structure does not allow analysis
- the information is available but inaccessible
Data Transformation Goals
Overcome the challenge of varying underlying data set structures by:
- creating a uniform, integrated data set that allows for timely and easily accessible reports
- integrating data needs according to a central schema
Typical Ways Data Sets Vary
- Non-standardised table and field names where the information or content is similar
- Differences in the way data is stored within data sets (some fields are entered as text, while others are numeric or code-designated)
- Similar names used in different data sets for different information
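The three kinds of variation listed above can be illustrated with a small Python sketch. It assumes a hand-maintained lookup keyed by (source system, source field), so that different names for the same information converge on one canonical field, while the same name used for different information (here `Date`) is kept apart. The system names, field names and the `FIELD_MAP` table are hypothetical, not taken from the presentation. Values are left untouched, so coded versus text values (`"2"` versus `"F"`) still need converting separately.

```python
# Hypothetical mapping keyed by (source system, source field name).
# Keying on the source as well as the field lets the same field name
# mean different things in different systems.
FIELD_MAP = {
    ("system_a", "LearnerID"): "learner_id",
    ("system_a", "Sex"): "gender",
    ("system_a", "Date"): "enrolment_date",
    ("system_b", "student_no"): "learner_id",
    ("system_b", "gndr"): "gender",
    ("system_b", "Date"): "exam_date",  # same name, different information
}


def standardise_field_names(source: str, record: dict) -> dict:
    """Return a copy of `record` with its field names replaced by canonical
    names. An unmapped field raises KeyError so it is reviewed rather than
    silently dropped."""
    return {FIELD_MAP[(source, name)]: value for name, value in record.items()}


print(standardise_field_names("system_a", {"LearnerID": "A17", "Sex": "F", "Date": "2010-01-15"}))
# {'learner_id': 'A17', 'gender': 'F', 'enrolment_date': '2010-01-15'}
print(standardise_field_names("system_b", {"student_no": "A17", "gndr": "2", "Date": "2010-11-03"}))
# {'learner_id': 'A17', 'gender': '2', 'exam_date': '2010-11-03'}
```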
Step One - Solution
- Evaluate the different data sets
- Focus on a database structure that responds to the organisation’s reporting requirements
- Collaborate on an ideal data structure designed for ease of analysis
- Involve stakeholders and ensure buy-in
- Design new business processes
Step One - Solution (cont.)
- Acquire the data
- Maintain the integrity of the data sets
- Ensure the transfer process maintains the reliability and validity of the data
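The slides do not say how the integrity of an acquired data set is verified. One common approach, shown here only as an assumption, is to compare a SHA-256 checksum of the file before and after it is transferred. The helper names `sha256_of` and `transfer_with_check` and the file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large extracts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_check(source: Path, destination: Path) -> str:
    """Copy `source` to `destination` and confirm the bytes are identical.

    Returns the checksum so it can be recorded alongside the acquired data.
    """
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise ValueError(f"checksum mismatch for {destination}")
    return actual


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "extract.csv"
        src.write_text("learner_id,gender\nA17,F\n")
        print(transfer_with_check(src, Path(tmp) / "copy.csv"))
```

Recording the returned checksum with each acquired file also makes it possible to detect later, accidental changes to the source copy.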
Step Two - Data Extraction
Uncomplicated extraction:
- importing a spreadsheet from an MS Excel file
- converting Word documents to Excel and then exporting the spreadsheet
- importing a CSV file
Complicated extraction:
- setting up relationships (connections) to external data systems such as Oracle, MS SQL and PostgreSQL
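The simplest case above, importing a CSV file, can be sketched with Python's standard `csv` module. The sample data and the `read_csv_rows` helper are hypothetical; for a real file you would pass `open(path, newline="")` to `csv.DictReader` instead of a `StringIO`. The database sources named on the slide would need a driver for each system (for example psycopg2 for PostgreSQL), which is not shown here.

```python
import csv
import io

# Hypothetical CSV export from one of the source systems.
RAW_CSV = """LearnerID,Sex,Score
A17,F,64
B02,M,
"""


def read_csv_rows(text: str) -> list[dict]:
    """Parse CSV text into one dict per row, keyed by the header line.

    Empty cells become None so that missing values are explicit
    rather than empty strings.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [
        {field: (value if value != "" else None) for field, value in row.items()}
        for row in reader
    ]


rows = read_csv_rows(RAW_CSV)
print(rows)
# [{'LearnerID': 'A17', 'Sex': 'F', 'Score': '64'}, {'LearnerID': 'B02', 'Sex': 'M', 'Score': None}]
```

Note that every value is still text at this stage; type conversion belongs to the transformation step.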
Step Three - Transformation
- Use a third-party system
- Follow the schematic outline agreed with stakeholders
- Investigate the process of converting data formats through the use of a data dictionary
- Use a dimension and mapping system
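The presentation does not describe the data dictionary itself. As a minimal sketch, assuming a dictionary that records for each output field how raw source values are converted, it might look like this in Python. The field names, codes and the `convert_value` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical data dictionary: for each output field, the rule that
# converts a raw source value into the agreed output format.
DATA_DICTIONARY = {
    # Coded field: several source codes map to one agreed label.
    "gender": {"codes": {"1": "Male", "2": "Female", "M": "Male", "F": "Female"}},
    # Numeric field stored as text in some sources.
    "score": {"convert": int},
    # Date stored as day/month/year text in some sources.
    "enrolment_date": {"convert": lambda s: datetime.strptime(s, "%d/%m/%Y").date()},
}


def convert_value(field: str, raw):
    """Convert one raw value using the data dictionary entry for `field`.

    Missing values stay None; an unknown code or unparseable text raises
    (KeyError or ValueError) so it can be flagged rather than guessed.
    """
    if raw is None:
        return None
    entry = DATA_DICTIONARY[field]
    text = str(raw).strip()
    if "codes" in entry:
        return entry["codes"][text]
    return entry["convert"](text)


print(convert_value("gender", "2"))                   # Female
print(convert_value("score", " 64 "))                 # 64
print(convert_value("enrolment_date", "15/01/2010"))  # 2010-01-15
```

Keeping the rules in data rather than in code means new source codes can be added to the dictionary without changing the conversion logic.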
Step Three - Transformation (cont.)
- Map the information in the data sets
- Take account of inherent dimensions
- Specify how the data will fit into the refined output data set(s)
- Design internal checks to:
  - minimise mapping errors
  - reject incorrect mappings
The transformation system does not store data; it “translates” data from source to destination.
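The mapping checks and the "translate, don't store" behaviour described above can be sketched with a Python generator: each record is checked and passed on one at a time, and failing records are set aside with a reason. The mapping, the required fields and the three checks (unmapped field, invalid code, missing required target) are hypothetical examples, not the presenter's actual rules.

```python
from collections.abc import Iterable, Iterator

# Hypothetical mapping for one source: source field -> (target field, allowed codes or None).
MAPPING = {
    "LearnerID": ("learner_id", None),
    "Sex": ("gender", {"M": "Male", "F": "Female"}),
}
REQUIRED_TARGETS = {"learner_id", "gender"}


def translate(records: Iterable[dict], rejected: list) -> Iterator[dict]:
    """Yield mapped records one at a time; nothing is kept in the translator.

    Records that fail a check are appended to `rejected` with a reason
    instead of being passed on to the destination.
    """
    for record in records:
        unmapped = set(record) - set(MAPPING)
        if unmapped:
            rejected.append((record, f"unmapped fields: {sorted(unmapped)}"))
            continue
        out, error = {}, None
        for field, value in record.items():
            target, codes = MAPPING[field]
            if codes is not None:
                if value not in codes:
                    error = f"invalid code {value!r} for {field}"
                    break
                value = codes[value]
            out[target] = value
        missing = REQUIRED_TARGETS - set(out)
        if error is None and missing:
            error = f"missing targets: {sorted(missing)}"
        if error:
            rejected.append((record, error))
        else:
            yield out


rejected = []
source = [
    {"LearnerID": "A17", "Sex": "F"},
    {"LearnerID": "B02", "Sex": "X"},
    {"LearnerID": "C33"},
    {"LearnerID": "D44", "Sex": "M", "Notes": "n/a"},
]
accepted = list(translate(source, rejected))
print(accepted)  # [{'learner_id': 'A17', 'gender': 'Female'}]
```

Because `translate` is a generator, `rejected` fills up only as the output is consumed, which suits a pipeline that streams records straight into the load step.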
Step Four - Loading
- Loading is the process by which data is ‘deposited’ into a data warehouse (PostgreSQL allows for efficient storage)
- The load is done through a series of SQL scripts
- The loading process includes a series of checks to ensure that all data from the source can be accounted for at the destination
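The slides name PostgreSQL and SQL scripts but do not show them. The sketch below uses Python's built-in `sqlite3` module purely as a runnable stand-in for the PostgreSQL warehouse, so the SQL dialect differs slightly (`INSERT OR IGNORE` is SQLite syntax; PostgreSQL would use `INSERT ... ON CONFLICT DO NOTHING`). The table, columns and the `load_and_reconcile` helper are hypothetical. It shows one way to account for all source data: compare the row count and a column total at the destination with the source, and roll the load back if they differ.

```python
import sqlite3

# Stand-in for the PostgreSQL warehouse: an in-memory SQLite database.
# The table and column names are hypothetical.
CREATE_SCRIPT = """
CREATE TABLE learner_fact (
    learner_id TEXT PRIMARY KEY,
    gender     TEXT NOT NULL,
    score      INTEGER
);
"""


def load_and_reconcile(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Insert `rows` in one transaction, then check that every source row
    and the score total are accounted for. Assumes the table was empty
    before this load."""
    with conn:  # commits on success, rolls back if an exception escapes
        # OR IGNORE silently skips rows that violate a constraint (e.g. a
        # duplicate learner_id); the reconciliation below catches such losses.
        conn.executemany(
            "INSERT OR IGNORE INTO learner_fact (learner_id, gender, score) VALUES (?, ?, ?)",
            rows,
        )
        loaded, score_total = conn.execute(
            "SELECT COUNT(*), COALESCE(SUM(score), 0) FROM learner_fact"
        ).fetchone()
        expected_total = sum(r[2] for r in rows if r[2] is not None)
        if loaded != len(rows) or score_total != expected_total:
            raise RuntimeError(
                f"reconciliation failed: {loaded}/{len(rows)} rows, "
                f"score total {score_total} vs {expected_total}"
            )


conn = sqlite3.connect(":memory:")
conn.executescript(CREATE_SCRIPT)
load_and_reconcile(conn, [("A17", "Female", 64), ("B02", "Male", None)])
print(conn.execute("SELECT COUNT(*) FROM learner_fact").fetchone())  # (2,)
```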
Conclusion & Questions
Data transformation is:
- vital to data analysis across programmes
- essential to optimise the use of current (multiple) data sets
But…
- it requires a high level of database expertise and scripting ability
Questions