Enterprise Data Warehouse
ABC University
Final Course Project
MIS 563 – Business Intelligence Systems
Professor Miriam Masullo
Ting Yin
February 22, 2015
Table of Contents
Introduction
Business Intelligence System Procedure
Project Requirements
Assumptions
Technical Infrastructure Enhancements
Project Requirements Definition Activities
Project Plan
Database Design
Snowflake Schema
Data Model
Extract/Transform/Load
Data Mining Tool
Conclusion
Introduction
ABC University has asked for a data warehouse that can provide a unified view of
information about its students, staff, and instructors. The school’s data is currently stored in
multiple databases. The objective is to create an effective and efficient way to store, maintain,
and retrieve the data. This paper will describe the proposed database.
Data modeling, the database, and modeling tools can be examined in terms of Bill
Inmon’s theory. This paper will discuss Informatica’s Extract, Transform, Load (ETL) tool,
which produces clean data, and Oracle Data Mining (ODM), which is the selected data mining tool.
The goal is to transition to a larger, unified system. It is hoped that a business intelligence
(BI) system can bring about the changes that will allow the school to stay competitive in the
market.
BI Procedure
This section explains the BI procedures that will be followed during implementation
and to facilitate later improvement. The BI process begins with the business opportunity that
must be addressed, and the discussion of the system continues throughout this paper.
Project Requirement
The enterprise data warehouse for ABC University will use the BI Application Release
Concept. This model is used for software development and proceeds systematically
from one phase to the next in a downward-flowing fashion. It follows ten steps in order:
1) Business Opportunity, 2) Decision-Support Strategy, 3)
Project Planning, 4) Strategic Information Requirements, 5) Business Analysis, 6) Design, 7)
Development, 8) Testing, 9) Implementation, and 10) Release Evaluation (Atre, 8).
Educational business opportunities are the primary drivers for this academic BI
application. BI applications are implemented according to organization-wide design and
development plans that incorporate and analyze data across similar organizations and
departments. BI decision-support requirements are mostly strategic information requirements
rather than operational functional requirements. Analysis in BI projects emphasizes educational
business analysis. Each BI application release is assessed and evaluated to promote iterative
development.
Assumptions
It is assumed that all the computers involved in this project have access to the Internet.
The databases and warehouses will not be accessible from computers without online access. As a
second assumption, the participants have at least some basic training in business intelligence or
related studies. They can follow directions and catch up with plans on their own without further
training in BI.
Technical Infrastructure Enhancement (Atre, 120)
1) New database management system (DBMS) or upgrades to the existing DBMS
2) New development tools
3) New data access or reporting tools
4) New data mining tool
5) New metadata repository or enhancements to it
6) New network requirements
Project Requirements Definition
The BI activities will follow the path as described in the diagram below. BI project scope
will be addressed continuously to ensure that the objective remains achievable within the defined
timeframe. Items 1 and 2 can define the technical and non-technical enhancements. The
requirement announcement will inform the participants in the BI project about the types of
software and hardware that are needed. Items 3 and 4 will address reporting requirements and
data sources as requested by the business analysts. The data model and service level agreement
will be developed after the scope is reviewed. Together, these items will be compiled into a
detailed requirements document that will be referred to throughout the initial release (Atre, 120).
Project Plan
A project plan has been prepared to show the timeframe for carrying out this project.
The specific business intelligence tasks are listed in the following table.
Database Design
I will apply Bill Inmon’s approach while designing the database. Inmon’s techniques and
requirements are able to accommodate the needs of ABC University’s BI project. Inmon uses a
“top-down” approach to a data warehouse schema architecture. The dimensional data within the
data warehouse will contain information about specific business processes. If data marts are used
to rapidly retrieve reports, this can work well with the university’s requirements.
Data marts that gather information from a centralized data repository will allow the
school to effectively and efficiently use the warehouse. There will be specific data marts for
students, faculty, and non-instructional staff. Each student record must contain a unique student
identification number that allows information about that student to be accessed. Each student
should have only one student identification number.
Unique student, faculty, and employee identification numbers will be used to connect the
databases for reporting purposes. Since each number is unique, the Oracle database can identify
and track the individuals who access data through it. Additionally, the system can set
up specific keys for connection. Oracle is the database of choice for the school system, as it is a
well-established database and features many modern tools.
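The uniqueness rule described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the Oracle database, and the table name, column names, and sample values are assumptions made for illustration only. The primary key constraint is what guarantees that each student has exactly one identification number.

```python
import sqlite3


def build_student_table(conn):
    # Hypothetical table layout; the paper does not specify column names.
    conn.execute(
        """
        CREATE TABLE student (
            student_id TEXT NOT NULL PRIMARY KEY,  -- one ID per student
            full_name  TEXT NOT NULL
        )
        """
    )


def add_student(conn, student_id, full_name):
    """Insert a student; return False if the ID is already in use."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO student (student_id, full_name) VALUES (?, ?)",
                (student_id, full_name),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised by the database when the primary key would be duplicated.
        return False


conn = sqlite3.connect(":memory:")
build_student_table(conn)
first = add_student(conn, "S0001", "Ada Example")
duplicate = add_student(conn, "S0001", "Someone Else")
count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(first, duplicate, count)
```

The second insert is rejected by the database itself, so the rule holds regardless of which application writes the data.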
Snowflake Schema
A snowflake schema can be used to track internal data. The structure consists of a
centralized database, which contains all information about students, staff, and faculty and points
to other related structures to access specific information by means of:
1. Centralized DB: a link that uses a primary identification key to access general information:
a. Name, address, phone number, and student, staff, and faculty ID numbers can be
accessed in this way
2. A link that can utilize a primary identification key to access financial information:
a. A secondary key can be created to allow access to departmental information
b. The database will store information related to each department’s employees
3. A link that will use a primary instructor identification key to access faculty information:
a. Employee title, grade level, salary, start and end dates, office, and courses taught can be
accessed in this way
b. A second key can be created to connect with a database that will store lists of students,
classroom locations, and assigned textbooks
4. A link that will use the USI to access employee information:
a. Employee position, start date, and salary can be accessed in this way
b. A second key can be created to connect to departmental databases
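The linked structure outlined in the list above can be sketched as a small set of tables. This is an illustration under stated assumptions, not the proposed Oracle design: it uses Python's sqlite3 module, and every table name, column name, and sample row is hypothetical. A central person table holds general information, a faculty table branches off it through the instructor key, and department and course tables branch off faculty through secondary keys.

```python
import sqlite3

# Hypothetical schema: names and columns are assumptions for illustration.
SCHEMA = """
CREATE TABLE person (            -- central table: general information
    person_id TEXT NOT NULL PRIMARY KEY,
    name      TEXT NOT NULL,
    phone     TEXT
);
CREATE TABLE department (
    department_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE faculty (           -- branch reached through the instructor key
    person_id     TEXT NOT NULL PRIMARY KEY REFERENCES person(person_id),
    title         TEXT,
    department_id INTEGER REFERENCES department(department_id)
);
CREATE TABLE course (            -- second-level branch off faculty
    course_id     TEXT NOT NULL PRIMARY KEY,
    instructor_id TEXT REFERENCES faculty(person_id),
    classroom     TEXT
);
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce links
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO person VALUES ('F100', 'Pat Example', '555-0100')")
    conn.execute("INSERT INTO department VALUES (1, 'Information Systems')")
    conn.execute("INSERT INTO faculty VALUES ('F100', 'Professor', 1)")
    conn.executemany(
        "INSERT INTO course VALUES (?, ?, ?)",
        [("MIS101", "F100", "Room 12"), ("MIS202", "F100", "Room 14")],
    )
    conn.commit()
    return conn


def courses_taught(conn, person_id):
    """Follow the keys from the central table out to the course branch."""
    return conn.execute(
        """
        SELECT p.name, d.name, c.course_id, c.classroom
        FROM person p
        JOIN faculty f    ON f.person_id = p.person_id
        JOIN department d ON d.department_id = f.department_id
        JOIN course c     ON c.instructor_id = f.person_id
        WHERE p.person_id = ?
        ORDER BY c.course_id
        """,
        (person_id,),
    ).fetchall()


conn = build_demo_db()
rows = courses_taught(conn, "F100")
for row in rows:
    print(row)
```

Each join follows one of the keys in the outline, so a single identification number is enough to reach general, departmental, and course information.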
Data marts will be created based on the data about students, staff, and faculty that will be
needed for reports. SQL will be the tool of choice for generating reports for upper management.
These reports will help the school grow and develop by identifying areas that could be improved
and changes that would reduce redundancy. A significant percentage of the potential
improvements can be generated from the improved database structure, or by physical changes.
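As a rough sketch of how a data mart could feed an SQL report, the example below defines a faculty-only view over a central employee table and summarizes it by department. It is a simplification: a production data mart is usually a separate physical store rather than a view, sqlite3 stands in for Oracle, and the table, view, and sample figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical central table; all values are made-up sample data.
    CREATE TABLE employee (
        employee_id TEXT NOT NULL PRIMARY KEY,
        department  TEXT NOT NULL,
        role        TEXT NOT NULL,      -- 'faculty' or 'staff'
        salary      REAL NOT NULL
    );
    INSERT INTO employee VALUES
        ('E1', 'Business', 'faculty', 90000),
        ('E2', 'Business', 'faculty', 70000),
        ('E3', 'Business', 'staff',   40000),
        ('E4', 'Science',  'faculty', 80000);

    -- The "data mart": a faculty-only slice of the central table.
    CREATE VIEW faculty_mart AS
        SELECT employee_id, department, salary
        FROM employee
        WHERE role = 'faculty';
    """
)


def headcount_and_average_salary(conn):
    """Management report: faculty count and average salary per department."""
    return conn.execute(
        """
        SELECT department, COUNT(*), AVG(salary)
        FROM faculty_mart
        GROUP BY department
        ORDER BY department
        """
    ).fetchall()


report = headcount_and_average_salary(conn)
for department, headcount, avg_salary in report:
    print(f"{department}: {headcount} faculty, average salary {avg_salary:.0f}")
```

Because the report reads only from the mart, staff records never need to be filtered out by the person writing the query.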
Data Model
The following diagram shows a sample of the Entity Relationship Diagram (ERD) data models
that can be used. The model shows the relationship among the data entities.
http://www.assignmenthelp.net/assignment_help/ER-diagram-for-institute
Extract/Transform/Load
Extract, transform, and load (ETL) is a process of extracting data from one database,
manipulating that data, and then placing the resultant dataset into another database. After the data
have been arranged, sorted, and analyzed, they can become an important tool for helping ABC
University make better decisions. This makes the BI process an integral part of any decision
support system.
Many large organizations have accumulated many years of data. The data may have
been derived from customer information that was originally gathered by old COBOL
applications. These legacy systems can now be upgraded, and their data can be combined into a
series of data marts. The data need to be reconciled and organized so that new systems can accept
the information.
The ETL process follows somewhat unique procedures to achieve a uniform format.
Reformatting during the ETL process requires taking all the data sources and arranging the data
in a format that can be used later for analyses. The process can combine the data and minimize
the vast number of duplications that organizations have accumulated. Data cleansing is needed to
eliminate incomplete data, orphan records, and any other dirty data. Using an ETL tool can provide
a structured design, data cleansing, and support for operational resilience. Before any automated
tool can be used, it is important to have a map of the target database to be created.
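The cleansing steps named above (removing incomplete data, orphan records, and duplicates) can be sketched in a few lines. This is plain Python with sqlite3, not Informatica PowerCenter or any other ETL product, and the source rows, department codes, and target table are hypothetical. The target table definition plays the role of the map of the target database.

```python
import sqlite3

# Hypothetical rows as they might arrive from several source systems.
SOURCE_ROWS = [
    {"student_id": "S1", "name": " ada example ", "dept_code": "MIS"},
    {"student_id": "S1", "name": "Ada Example", "dept_code": "MIS"},   # duplicate
    {"student_id": "S2", "name": "", "dept_code": "MIS"},              # incomplete
    {"student_id": "S3", "name": "Lin Sample", "dept_code": "XXX"},    # orphan
    {"student_id": "S4", "name": "Sam Placeholder", "dept_code": "CS"},
]
KNOWN_DEPARTMENTS = {"MIS", "CS"}  # assumed reference data


def extract():
    """Extract: read the raw rows from the source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: normalize names and drop dirty records."""
    clean, seen = [], set()
    for row in rows:
        name = " ".join(row["name"].split()).title()  # uniform format
        if not row["student_id"] or not name:
            continue  # incomplete record
        if row["dept_code"] not in KNOWN_DEPARTMENTS:
            continue  # orphan record: points at a department that does not exist
        if row["student_id"] in seen:
            continue  # duplicate record: keep the first occurrence
        seen.add(row["student_id"])
        clean.append((row["student_id"], name, row["dept_code"]))
    return clean


def load(conn, rows):
    """Load: write the cleansed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS student ("
        "student_id TEXT NOT NULL PRIMARY KEY, "
        "name TEXT NOT NULL, dept_code TEXT NOT NULL)"
    )
    with conn:
        conn.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT student_id, name, dept_code FROM student ORDER BY student_id"
).fetchall()
print(loaded)
```

Of the five source rows, only two survive cleansing, which is the kind of reduction in duplication and dirty data the paragraph above describes.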
ETL Tool Selection
The ETL process is both demanding and intensive. An organization performing an
extraction may need a tool that cleanses the data and then completes the ETL
process. However, there is no single best product to accomplish everything. The database
environment presents a vast spectrum of challenges, and those challenges may require some ETL
tools to be more focused in specific areas. Cost, experience, support, user interface, and
scalability are just some of the factors that the University must address when selecting an ETL tool.
Informatica
For as long as data have existed, there has been a need for data integration. As organizations
transitioned from traditional mainframes to a client/server set-up and now to cloud computing,
the BI process really began to grow. Informatica answered the call to provide solutions for
organizations’ data integration needs. Informatica’s core business model is focused on ETL, data
masking, data quality, data replication, and information lifecycle management. The obvious
point here is that such a vendor needs to understand data. Informatica covers different aspects of
data integration and has been involved in the development of cloud computing.
Informatica’s PowerCenter Express Enterprise
Given Informatica’s versatility and scalability, my selection is Informatica’s PowerCenter
Express Enterprise. PowerCenter Express is offered in two editions, standard and enterprise.
PowerCenter offers an end-to-end solution that can help organizations transfer data
from older databases, such as old COBOL-era databases on the mainframe, to current SQL
databases. It then consolidates them into one target data warehouse or data mart. PowerCenter’s
GUI is simple enough for the novice, yet experienced users have deemed it extremely capable.
Informatica can read and clean data from multiple platforms. When Informatica
is compared to Ascential, the performance results are impressive.
Summary: ETL Tools’ Effectiveness
For the majority of companies that integrate databases, it is often necessary to use ETL
tools. Even when a developer has acquired in-depth ETL experience, using a tool
can provide consistency. Developing a custom tool may also incur custom development costs. When
the developer and users work together, the custom setup may have issues that could in turn
increase the cost. With a standard tool, the developer can follow industry standards that the next
developer can build upon. Fortunately, as an organization grows and includes more data in its data
warehouse, savings can accumulate rather quickly.
Data Mining Tool
Once all the data are in one place, the next step is to find the useful information that can
be extracted from the database. It is very important to understand how these data can be
manipulated and used for the benefit of the University’s mission. To achieve this goal, we should
choose the right tool to help the user find patterns, hidden knowledge, and useful information
within the data. We have chosen an Oracle tool to mine the data delivered by the ETL (extraction,
transformation, and loading) process.
Choosing Oracle Data Mining (ODM) over other tools is a more cost-effective decision
for both software and labor. ODM offers data mining functionality as native SQL functions
within the Oracle database. This tool of choice for data mining and reporting offers many
algorithms that can be used to address a wide range of business problems. ABC University’s
database uses Oracle and ODM.
ODM also offers a graphical user interface (GUI) to show users various data patterns and
relationships resulting from data mining. The GUI enables Oracle data analysts to work with the
data already stored in the university’s database, and it can assist with the university’s community
initiatives by offering predictions and recommendations. The data mining GUI offers user-friendly
tools that can help people explore the data graphically and create and evaluate multiple data
mining models. The GUI applies ODM models to new data and reveals insights and predictions
throughout the enterprise.
Oracle’s database SQL Application Program Interface mines Oracle data and returns
results in real time. The data, models, and results remain in the Oracle database so that data
movement is minimized, data security is maximized, and latency of information is shortened.
Oracle serves over 400,000 customers in more than 100 countries and also
provides cloud computing solutions. ODM comes with many other
applications to handle the high technology demands of ABC University. ODM’s GUI is free of
charge and is included with Oracle SQL Developer. The tool gives visibility into stored
data, provides graphical visualization of data, and gives access to multiple data models. Oracle’s
Live Virtual Class online training helps users learn the software faster.
Conclusion
The BI system proposed for an enterprise-wide data warehouse has now been
described. This structure will offer a unified view of the school’s information, combining
departmental information into a seamless data retrieval system. Newly developed tools can
provide users with quick and reliable information in half the time of the current system. The
proposed BI system is targeted to meet the school’s needs. The prices of the additions are also
affordable for the University.
The proposal discussed here will position ABC University to gain a competitive
advantage in business-related and academic processing. The BI system can provide the
reporting, prediction, and analytics needed for the school to stay competitive in today’s market.
The BI system is intended to meet the school’s objectives and goals for incoming students,
instructors, and employees. It will give users the tools they need to handle the situations that
arise at the university, such as course registration, tuition, and billing. The new BI system can
help the school provide the useful information it needs to remain competitive in today’s market.
Reference
Atre, S., & Moss, Larissa T. (2003). Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications. Pearson Learning Solutions. VitalBook file.
Berger, Charles. (2012). “Oracle Data Mining Blog.” Retrieved from https://blogs.oracle.com/datamining/tags/virtual
George, S. (2012). “Inmon vs. Kimball: Which approach is suitable for your data warehouse?” Retrieved from http://searchbusinessintelligence.techtarget.in/tip/Inmon-vs-Kimball-Which-approach-is-suitable-for-your-data-warehouse
Informatica. (2014). “Why Informatica? Why now?” Retrieved from http://www.informatica.com/Images/03045_6485_why-informatica.pdf
Oracle. (2014). “Oracle Data Mining.” Retrieved from http://www.oracle.com/
Oracle. (2014). “Oracle Database Developer Data Modeler.” Retrieved from http://www.oracle.com/technetwork/developer-tools/datamodeler/overview/index.html
TechTarget. (2014). “Informatica PowerCenter Real Time Edition (PowerCenter RTE).” Retrieved from http://searchdatamanagement.techtarget.com/review/Informatica-PowerCenter-Real-Time-Edition-PowerCenter-RTE