Data Transformation for Analysis Purposes

Presented By: Gregg Ravenscroft

Khulisa Management Services

E-mail: gravenscroft@khulisa.com

Tel: (011) 447 6464

Scenario

Organisation faced with disparate data sets that are:

  From decommissioned systems

  Spread across multiple systems

  In a format not conducive to analysis

But…

  Information in the data sets is needed for analysis

  Data structure does not allow analysis

  Information is available but inaccessible

Data Transformation Goals

Overcome challenge of variable underlying data set structures through:

  Creating a uniform, integrated data set that allows for timely and easily accessible reports

  Integrating data needs according to a central schema

Typical Ways Data Sets Vary

Non-standardised table and field names where information or content is similar

Differences in the way data is stored within data sets (some fields are entered as text, while others are numeric or code designated)

Similar naming conventions in data sets for different information
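
The first two kinds of variation can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the presentation: the field aliases, the gender code table and the two sample records are hypothetical.

```python
# Hypothetical field-name aliases and code table; none of these names come from the slides.
FIELD_ALIASES = {
    "sex": "gender",
    "gender_code": "gender",
    "dob": "date_of_birth",
    "birth_date": "date_of_birth",
}
GENDER_CODES = {
    "1": "male", "2": "female",
    "m": "male", "f": "female",
    "male": "male", "female": "female",
}


def harmonise(record):
    """Rename fields to one agreed name and bring gender to one text form."""
    out = {}
    for field, value in record.items():
        key = field.strip().lower()
        out[FIELD_ALIASES.get(key, key)] = value
    if "gender" in out:
        code = str(out["gender"]).strip().lower()
        # Unknown codes become None rather than being guessed.
        out["gender"] = GENDER_CODES.get(code)
    return out


# Two systems holding the same information under different names and encodings.
system_a = {"Sex": 1, "DOB": "1990-04-02"}
system_b = {"gender_code": "F", "birth_date": "1985-11-30"}
print(harmonise(system_a))
print(harmonise(system_b))
```

The third kind of variation, where the same name carries different information, cannot be resolved by a lookup table alone and would need a per-source rule agreed with the data owners.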

Step One Solution

Evaluate different data sets

Focus on database structure to respond to organisation’s reporting requirements

Collaborate on an ideal data structure designed for ease of analysis

Involve stakeholders and ensure buy-in

Design new business processes

Step One Solution (cont)

Acquire the data

Maintain the integrity of the data sets

Ensure transfer process maintains reliability and validity of data
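
One common way to check that a file was not altered during transfer is to compare checksums before and after the copy. The sketch below assumes the data arrives as files; the function names and the file name `legacy_export.csv` are hypothetical, and the check covers byte-level integrity only, not the validity of the content.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def acquire(source_path, working_dir):
    """Copy a source file into working_dir and confirm the copy is identical."""
    target_path = os.path.join(working_dir, os.path.basename(source_path))
    shutil.copy2(source_path, target_path)
    if sha256_of(source_path) != sha256_of(target_path):
        raise IOError(f"checksum mismatch after copying {source_path}")
    return target_path


# Demonstration with throwaway files in temporary directories.
with tempfile.TemporaryDirectory() as source_dir, \
        tempfile.TemporaryDirectory() as work_dir:
    original = os.path.join(source_dir, "legacy_export.csv")
    with open(original, "w", encoding="utf-8") as handle:
        handle.write("site,learners\nAlpha,120\n")
    copy = acquire(original, work_dir)
    copies_match = sha256_of(original) == sha256_of(copy)
    print(copies_match)
```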

Step Two Data Extraction

Uncomplicated extraction:

  Importing a spreadsheet from an MS Excel file

  Converting Word documents to Excel and then exporting the spreadsheet

  Importing a CSV file

Complicated extraction:

  Setting up relationships to external data systems such as Oracle, MS SQL and PostgreSQL
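
The two kinds of extraction can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the presenter's tooling: an in-memory CSV stands in for an exported spreadsheet, an in-memory SQLite database stands in for an external server such as Oracle, MS SQL or PostgreSQL, and the table and column names are hypothetical.

```python
import csv
import io
import sqlite3


def extract_csv(text_stream):
    """Read a CSV stream into a list of dicts keyed by the header row."""
    return list(csv.DictReader(text_stream))


def extract_table(conn, table):
    """Read every row of a table into a list of dicts keyed by column name."""
    # The table name is interpolated, so it must come from trusted configuration.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Simple source: an in-memory CSV standing in for an exported spreadsheet.
csv_rows = extract_csv(io.StringIO("site,learners\nAlpha,120\nBeta,95\n"))

# External source: in-memory SQLite standing in for a remote database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sites (site TEXT, learners INTEGER)")
conn.executemany("INSERT INTO sites VALUES (?, ?)", [("Gamma", 80), ("Delta", 60)])
db_rows = extract_table(conn, "sites")

print(csv_rows)
print(db_rows)
```

Both functions return the same shape (a list of dicts), but the CSV values arrive as text while the database values keep their column types, so type differences still have to be handled in the transformation step.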

Step Three Transformation

Utilise a third-party system

Follow schematic outline agreed with stakeholders

Investigate the process of converting data formats through use of a data dictionary

Use dimension and mapping system

Step Three Transformation (cont)

Map information in data sets

Take account of inherent dimensions

Specify how the data will fit into the refined output data set/s

Design internal checks to:

  Minimise mapping errors

  Reject incorrect mappings

Transformation system does not store data; it “translates” data from source to destination
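
A mapping step with internal checks might look like the following Python sketch. It is not the third-party system referred to in the slides; the data dictionary, the destination field names and the `MappingError` class are all hypothetical. The translator is written as a generator so that records pass from source to destination without being stored in between.

```python
# Hypothetical data dictionary: source field -> (destination field, converter).
DATA_DICTIONARY = {
    "site": ("site_name", str),
    "learners": ("learner_count", int),
}
DESTINATION_FIELDS = {"site_name", "learner_count"}


class MappingError(ValueError):
    """Raised when a record cannot be mapped onto the destination fields."""


def check_dictionary(dictionary, destination_fields):
    """Reject a dictionary that targets an unknown field or one field twice."""
    targets = [dest for dest, _ in dictionary.values()]
    unknown = set(targets) - destination_fields
    if unknown:
        raise MappingError(f"unknown destination fields: {sorted(unknown)}")
    if len(targets) != len(set(targets)):
        raise MappingError("two source fields map to the same destination field")


def translate(records, dictionary):
    """Yield mapped records one at a time; nothing is stored in between."""
    for number, record in enumerate(records, start=1):
        unmapped = set(record) - set(dictionary)
        if unmapped:
            raise MappingError(f"record {number}: unmapped fields {sorted(unmapped)}")
        try:
            yield {dest: convert(record[src])
                   for src, (dest, convert) in dictionary.items()}
        except (KeyError, ValueError) as exc:
            # A missing source field or a value that fails conversion is rejected.
            raise MappingError(f"record {number}: {exc!r}") from exc


check_dictionary(DATA_DICTIONARY, DESTINATION_FIELDS)
source = [{"site": "Alpha", "learners": "120"}, {"site": "Beta", "learners": "95"}]
mapped = list(translate(source, DATA_DICTIONARY))
print(mapped)
```

Raising on the first bad record is one design choice; collecting rejected records into a separate report for review would be an equally reasonable alternative.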

Step Four - Loading

Process where data is ‘deposited’ into a data warehouse (PostgreSQL allows for efficient storage)

Load process is done through a series of SQL scripts

Loading process has a series of checks to ensure all data from the source can be accounted for at the destination
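
A load with reconciliation checks can be sketched as follows. SQLite from the Python standard library stands in for the PostgreSQL warehouse so the example is self-contained; the table `site_fact`, its columns and the two checks (row count and a column total) are assumptions chosen for illustration, not the checks used in the presenter's scripts.

```python
import sqlite3

# SQLite stands in for the warehouse; the table and columns are hypothetical.
LOAD_SCRIPT = """
CREATE TABLE IF NOT EXISTS site_fact (
    site_name TEXT PRIMARY KEY,
    learner_count INTEGER NOT NULL
);
"""
TOTALS_QUERY = "SELECT COUNT(*), COALESCE(SUM(learner_count), 0) FROM site_fact"


def load(conn, records):
    """Load records in one transaction and reconcile them against the source."""
    records = list(records)
    conn.executescript(LOAD_SCRIPT)
    rows_before, total_before = conn.execute(TOTALS_QUERY).fetchone()
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO site_fact (site_name, learner_count) VALUES (?, ?)",
            [(r["site_name"], r["learner_count"]) for r in records],
        )
        rows_after, total_after = conn.execute(TOTALS_QUERY).fetchone()
        # Check 1: every source record arrived at the destination.
        if rows_after - rows_before != len(records):
            raise RuntimeError("row count at destination does not match source")
        # Check 2: a control total over a numeric column matches the source.
        if total_after - total_before != sum(r["learner_count"] for r in records):
            raise RuntimeError("learner_count total does not match source")
    return rows_after - rows_before


conn = sqlite3.connect(":memory:")
loaded = load(conn, [
    {"site_name": "Alpha", "learner_count": 120},
    {"site_name": "Beta", "learner_count": 95},
])
print(loaded)
```

Because the checks run inside the transaction, a failed reconciliation rolls the load back and leaves the destination table as it was.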

Conclusion & Questions

Data Transformation is:

  Vital to data analysis across programmes

  Essential to optimise use of current (multiple) data sets

But…

  Requires a high level of database expertise and scripting ability

Questions