Database 2
Diego Cervellini, Riccardo Pancotti
General Index
● Introduction to Data Warehousing
● Initial goals
● Data Warehousing phases
● Obtained reports
● Required indexes
● Conclusions
First of all - What is a Database?
● A database is an organized collection of data
● Data are organized in models so that they can be queried easily
● The most important aspects are accuracy, availability, usability and resilience
● It is not well suited to the detailed analysis needed for planning and decision making
● Possible solution?
Data Warehousing
What is Data Warehousing?
● Data Warehousing consists of a set of methods, tools and technologies that assist the knowledge worker in carrying out data analysis.
● It can start from:
○ an existing corporate database
○ the company information systems
○ data coming from outside the company
Data Warehouse
A data warehouse works as a repository used for reporting and analysis.
It has the following characteristics:
● oriented to the subject of interest
● integrated and consistent
● representative of temporal evolution
● non-volatile
Benefits of Data Warehouse
● Maintain data history
● Integrate data from multiple source systems
● Provide a single data model for all data
● Improve data quality
● Restructure the data so that it delivers excellent query performance
OLAP vs OLTP
OLTP (On-line Transactional Processing)
● Transactions that read/write a small number of tuples from/to many tables connected by simple relations
● The workload core is "frozen", no interactivity.
OLAP (On-line Analytical Processing)
● Dynamic and multidimensional analysis.
● Works better with huge amounts of data, summing up the performance of an enterprise.
● Interactivity is essential
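The difference between the two workloads can be illustrated with a minimal sketch. The `exams` table, its columns and its rows are hypothetical and are not taken from ESSE3; an in-memory SQLite database is used only to keep the example self-contained.

```python
import sqlite3

# Hypothetical table and rows, for illustration only (not ESSE3 data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exams (student_id INTEGER, course TEXT, year INTEGER, mark INTEGER)"
)
conn.executemany(
    "INSERT INTO exams VALUES (?, ?, ?, ?)",
    [
        (1, "Databases", 2008, 28),
        (1, "Algorithms", 2009, 25),
        (2, "Databases", 2008, 30),
        (3, "Databases", 2009, 24),
    ],
)

# OLTP-style access: read a few tuples that belong to one entity.
oltp_rows = conn.execute(
    "SELECT course, mark FROM exams WHERE student_id = ? ORDER BY course", (1,)
).fetchall()

# OLAP-style access: scan many tuples and summarise them along a dimension (year).
olap_rows = conn.execute(
    "SELECT year, COUNT(*), AVG(mark) FROM exams GROUP BY year ORDER BY year"
).fetchall()
```

The first query touches only the rows of a single student, while the second aggregates over the whole table, which is the kind of question a data warehouse is restructured to answer quickly.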
Initial goals
The initial goals of our course were:
● Creation of a data warehouse from the ESSE3 database
● Data extraction to obtain indexes
● Report creation
Phases in Data Warehousing
Major phases in Data Warehousing:
● Extraction
● Cleaning
● Transformation
● Loading
Tools used
● SQuirreL SQL & DBeaver: SQL clients used to analyze ESSE3
● Pentaho Suite: open source BI suite with ETL and reporting capabilities
● MySQL Workbench: database design and administration tool, used to manage our local repository
Extraction
In this phase, relevant data are extracted from the data sources.
The choice of which data to extract is based mainly on their quality.
Our Extraction
What have we done?
We downloaded some useful tables from the ESSE3 database, according to our goals and the suggestions of the ESSE3 developers.
Tools used:
● SQuirreL SQL to obtain the SQL structure of the DB
● Pentaho Suite to download the tables from ESSE3 to our local database.
● MySQL Workbench to create and manage our local database.
Cleaning
Cleaning is used to improve the quality of the data sources.
It is about deleting and/or leaving out:
● duplicate data
● missing data
● inconsistencies between logically associated values
● ...
Our Cleaning
What have we done?
We removed all data inserted before 2008, because they were not useful for our purposes.
Tools used:
● MySQL Workbench to delete all unnecessary data.
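The cleaning rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real work was done in MySQL Workbench, and the row layout and the `inserted_on` field are hypothetical names, not ESSE3 columns.

```python
from datetime import date


def clean(rows, cutoff=date(2008, 1, 1)):
    """Drop rows with missing values, rows older than the cutoff, and exact duplicates.

    Each row is a dict with a hypothetical 'inserted_on' date field.
    """
    seen = set()
    kept = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # missing data
        if row["inserted_on"] < cutoff:
            continue  # inserted before the cutoff year
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        kept.append(row)
    return kept


# Illustrative input only.
sample = [
    {"student_id": 1, "inserted_on": date(2007, 5, 1)},
    {"student_id": 2, "inserted_on": date(2009, 3, 1)},
    {"student_id": 2, "inserted_on": date(2009, 3, 1)},
    {"student_id": None, "inserted_on": date(2010, 1, 1)},
]
cleaned = clean(sample)
```

Only the 2009 row survives, and only once: the 2007 row is too old, the repeated row is a duplicate, and the last row has a missing value.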
Transformation
Transformation converts data from the operational source format to that of the DW. The correspondence with the source level is complicated by the presence of distinct, heterogeneous sources, which requires a complex integration phase.
Our Transformation
What have we done?
We changed the engine of the tables (from the Oracle one to InnoDB).
We created indexes for each table.
We linked the tables by creating foreign keys.
Tools used:
● MySQL Workbench to manage the table changes
Loading
Data can be loaded into the DW in two ways:
● Refresh: DW data are written in full, replacing the previous ones (technique used to originally populate the DW)
● Update: only the changes that occurred in the source data are added to the DW (technique used for the periodic update of the DW)
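The two loading techniques can be contrasted with a minimal sketch. It assumes, purely for illustration, that both the source and the DW are dictionaries keyed by a primary key; the function names are hypothetical and do not correspond to any Pentaho feature.

```python
def refresh(warehouse: dict, source: dict) -> int:
    """Refresh: rewrite the warehouse in full, replacing the previous content."""
    warehouse.clear()
    warehouse.update(source)
    return len(source)  # every row is written


def incremental_update(warehouse: dict, source: dict) -> int:
    """Update: write only the rows that are new or changed in the source.

    Rows that disappeared from the source are kept, in line with the
    non-volatile nature of a DW.
    """
    written = 0
    for key, row in source.items():
        if warehouse.get(key) != row:
            warehouse[key] = row
            written += 1
    return written


# Illustrative data only.
dw = {}
src = {1: ("Databases", 28), 2: ("Algorithms", 25)}
initial_rows = refresh(dw, src)            # initial population of the DW

src[1] = ("Databases", 30)                 # one changed row
src[3] = ("Networks", 27)                  # one new row
changed_rows = incremental_update(dw, src)  # periodic update
```

The refresh writes every row, while the periodic update only writes the two rows that differ from what the DW already holds.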
Our Loading
What have we done?
We created a database on a server (survey.cs.unicam.it) and uploaded our "clean" and modified tables there.
Tools used:
● MySQL Workbench to re-create indexes and foreign keys
● Pentaho Suite to upload the tables to the server
Obtained Reports
We worked on and analyzed our cleaned tables to retrieve useful data that can inform the decision making process.
In this way we could provide useful information about Unicam, making decision planning easier and faster.
Obtained Reports
1. Situation of first year exams in some faculties
2. Percentage of foreign students out of total students
3. Exam situation of Italian versus foreign students
4. Average marks of Italian versus foreign students
Charts (one per slide):
● First year exams: Pharmacy
● First year exams: Computer Science
● First year exams: Law faculty
● Passed exams by Italian students
● Passed exams by foreign students
● Average marks of Italian students
● Average marks of foreign students
● Percentage of foreign students out of the total since 2008
● Percentage of foreign students out of the total, by year
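A report such as the percentage of foreign students by year boils down to a grouped count. The sketch below shows one possible way to compute it; the `(year, citizenship)` record layout, the "IT" code and the sample rows are assumptions made for illustration and are not the ESSE3 schema or real Unicam figures.

```python
from collections import defaultdict


def foreign_percentage_by_year(students):
    """Return {year: percentage of non-Italian students} from (year, citizenship) pairs."""
    totals = defaultdict(int)
    foreign = defaultdict(int)
    for year, citizenship in students:
        totals[year] += 1
        if citizenship != "IT":  # hypothetical citizenship code for Italian students
            foreign[year] += 1
    return {year: 100.0 * foreign[year] / totals[year] for year in sorted(totals)}


# Made-up sample, for illustration only.
sample = [
    (2008, "IT"), (2008, "IT"), (2008, "CN"), (2008, "IT"),
    (2009, "IT"), (2009, "AL"),
]
report = foreign_percentage_by_year(sample)
```

With this made-up sample the report contains one percentage per enrolment year, which is the shape of data the by-year chart needs.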
Calculating Indexes
One of the goals of our course was to calculate two different indexes for the FFO (Fondo di finanziamento ordinario).
● A1: Atot = RAP * ( KA + KT )
○ RAP: active students
○ KT: region wealth function (0,98)
○ KA: number of teachers / courses (0,85)
● A2: University's weighted CFU / National weighted CFU
A1-Index
RAP = 5.092
KT = 0,98
KA = 0,85
National Atot = ?
Atot = RAP * (KA + KT) = 9318,36
A1 = Local Atot / National Atot = ?
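The A1 arithmetic can be checked with a short sketch. It reads the slide values in Italian notation (5.092 as 5092, 0,98 as 0.98); the function names are hypothetical, and the national Atot is left as a parameter because the slides do not give it.

```python
def a_tot(rap: float, ka: float, kt: float) -> float:
    """Atot = RAP * (KA + KT), as defined on the slide."""
    return rap * (ka + kt)


def a1_index(local_a_tot: float, national_a_tot: float) -> float:
    """A1 = local Atot / national Atot. The national value is not given in the slides."""
    return local_a_tot / national_a_tot


# Values from the slide, converted from Italian number notation.
local_a_tot = a_tot(rap=5092, ka=0.85, kt=0.98)
```

The result matches the 9318,36 reported on the slide, up to floating-point rounding.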
A2-Index
Acquired CFU = 171.058
Expected CFU = 294.178
MNG = 0,43
National Weighted CFU = ?
PCFU = Expected CFU / Acquired CFU = 1,719755872
Weight = PCFU / MNG = 3,999432261
Weighted CFU = Weight * Acquired CFU = 684134,88372093
A2 = Local Weighted CFU / National Weighted CFU = ?
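The chain of A2 calculations can be reproduced in the same way. This is a sketch that follows the three formulas exactly as written on the slide, with the values converted from Italian notation; the function name is hypothetical and the national weighted CFU remains unknown.

```python
def weighted_cfu(acquired: float, expected: float, mng: float):
    """Return (PCFU, Weight, Weighted CFU) following the slide's formulas."""
    pcfu = expected / acquired       # PCFU = Expected CFU / Acquired CFU
    weight = pcfu / mng              # Weight = PCFU / MNG
    weighted = weight * acquired     # Weighted CFU = Weight * Acquired CFU
    return pcfu, weight, weighted


# Values from the slide, converted from Italian number notation.
pcfu, weight, local_weighted_cfu = weighted_cfu(
    acquired=171058, expected=294178, mng=0.43
)
```

Since the acquired CFU cancel out, the weighted CFU equals Expected CFU / MNG, which is a quick way to cross-check the 684134,88 figure.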
Conclusions
● We did not manage to build a proper data warehouse, only a collection of data marts and some reports based on them.
● We faced many problems due to the inconsistency of the ESSE3 database and its documentation, which was sometimes neither clear nor helpful.
● On the other hand, we obtained useful reports and we learned how to work as a team on such a "problematic" task.
THANKS FOR
YOUR ATTENTION!