Mining Gnome Data

MaxQDPro Team Anjan.K Harish.R II Sem M.Tech CSE 06/22/22 MCSE 202 : Topics in DB Systems 1


Concentrates on the mining issues in the GNOME database and the access methodology.


Page 1: Mining Gnome Data

MaxQDPro Team: Anjan.K, Harish.R

II Sem M.Tech CSE

04/12/23 MCSE 202 : Topics in DB Systems 1

Page 2: Mining Gnome Data

Introduction to GNOME
Need for Mining
Mining Challenges
GNOME Data Access
◦ Database usage grid
◦ Components
◦ Features
GNOME Data miner
Summary
References


Page 3: Mining Gnome Data

GNOME is an acronym for GNU Network Object Model Environment.

An international project that provides software development frameworks, initially developed for the desktop environment.

A GNU project, compatible with Unix-like operating systems, that sits on top of the kernel.

GNOME-DB
◦ Aims to provide a free, unified data access architecture to GNOME projects.
◦ Known for its good data management APIs.


Page 4: Mining Gnome Data

The explosive growth of data: from terabytes to petabytes
◦ Data collection and data availability
  Automated data collection tools, database systems, Web, computerized society
◦ Major sources of abundant data
  Business: Web, e-commerce, transactions, stocks, …
  Science: remote sensing, bioinformatics, scientific simulation, …
  Society and everyone: news, digital cameras, YouTube

We are drowning in data, but starving for knowledge!

Data mining: automated analysis of massive data sets, also known as Knowledge Discovery in Databases (KDD).


Page 5: Mining Gnome Data


Copyright © Data Mining and warehousing by Han et al.,

Page 6: Mining Gnome Data

Data volume exceeds the designers' expectations
Data warehouses typically grow asynchronously.
Establishing the scalability of a system across its lifetime is difficult.
Data is everywhere
Data is inconsistent
◦ Records are different in each system
◦ Noisy data
Performance issues
◦ Running queries that summarize data over a long stipulated period can push the operating system to maximum load.


Page 7: Mining Gnome Data

GNOME has its own tool for data access, similar to Microsoft's proprietary OLE.

The key issue in data access is the heterogeneity of the data sources and the variety of access methods that each of them requires.

Access methods and SQL are not de facto standards.

It is middleware for accessing various data sources. Libgda is the actual tool used for this purpose.
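The idea of one access layer over heterogeneous sources can be sketched in a few lines of Python. This is only an illustration of the middleware pattern, not Libgda's API: the `DataSource` interface, the `SqliteSource` and `CsvSource` classes, the `collect` helper, and the sample data are all hypothetical names introduced here, and only the standard library is used.

```python
import csv
import io
import sqlite3
from typing import Iterator, Protocol


class DataSource(Protocol):
    """Hypothetical common interface; not part of Libgda."""

    def rows(self) -> Iterator[dict]: ...


class SqliteSource:
    """Reads rows from one table of an SQLite database."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table

    def rows(self) -> Iterator[dict]:
        cursor = self.conn.execute(f'SELECT * FROM "{self.table}"')
        names = [col[0] for col in cursor.description]
        for record in cursor:
            yield dict(zip(names, record))


class CsvSource:
    """Reads rows from CSV text; every value arrives as a string."""

    def __init__(self, text: str) -> None:
        self.text = text

    def rows(self) -> Iterator[dict]:
        for record in csv.DictReader(io.StringIO(self.text)):
            yield dict(record)


def collect(sources: list[DataSource]) -> list[dict]:
    """Pull rows from every source through the same interface."""
    return [row for source in sources for row in source.rows()]


# Hypothetical sample data: one relational source, one flat-file source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'Pune')")

all_rows = collect([
    SqliteSource(conn, "users"),
    CsvSource("name,city\nbob,Delhi\n"),
])
print(all_rows)
```

The caller of `collect` never needs to know which engine or file format sits behind a source, which is the property the slide attributes to the middleware.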


Page 8: Mining Gnome Data


Page 9: Mining Gnome Data

Consists of three major components
◦ Libgda (Library GNOME Data Access)
  Data abstraction layer
  Manages data stored in databases
  Interfaces with GLib and LibXML
  Can be used for non-GNOME applications
◦ Libgnomedb
  DB widget library
  Depends on GTK+
◦ Mergeant
  Front end for DB administration and application developers.


Page 10: Mining Gnome Data

Easier access to several database engines
Metadata extractor
Easy-to-use APIs
Comes with console and graphical UI
Open source (General Public License)
Direct editing of DB data
Compatible with most programming languages
Distributed transactions are supported.
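To make the "metadata extractor" feature concrete, here is a minimal sketch of what extracting schema metadata looks like. It uses Python's standard `sqlite3` module rather than Libgda, so the calls shown are SQLite's, not GNOME-DB's; the `describe_tables` helper and the two sample tables are assumptions made for the example.

```python
import sqlite3


def describe_tables(conn: sqlite3.Connection) -> dict:
    """Return {table name: [(column name, declared type), ...]}."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    schema = {}
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Hypothetical schema used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")

schema = describe_tables(conn)
for table, columns in schema.items():
    print(table, columns)
```

A data access layer typically wraps engine-specific catalog queries like these behind one call, so that tools built on top can list tables and columns without knowing the engine.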


Page 11: Mining Gnome Data

Open source data mining tools: a collection of experimental GUI-based tools written in Python and GTK by Togaware.

Uses GDA to access the heterogeneous data sources.

Builds the warehouse after the essential processing and transformation steps, with the help of the flexible GNOME APIs.
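The "processing and transformation" step before loading a warehouse can be illustrated roughly as follows. This is a sketch under stated assumptions, not the tool's actual pipeline: the raw records, the `transform` and `load` helpers, and the `sales` table are hypothetical, and SQLite from the standard library stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from two different systems.
raw_records = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "BOB", "amount": "80"},
    {"customer": "alice", "amount": "n/a"},  # noisy value, cannot be parsed
]


def transform(record):
    """Normalize one record; return None if it cannot be cleaned."""
    try:
        amount = float(record["amount"])
    except ValueError:
        return None
    return (record["customer"].strip().lower(), amount)


def load(conn, rows):
    """Create the (hypothetical) sales table and insert the cleaned rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
cleaned = [row for row in map(transform, raw_records) if row is not None]
load(warehouse, cleaned)

# A summarizing query of the kind a mining tool would run on the warehouse.
totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Dropping unparseable rows is only one possible policy for noisy data; a real pipeline might instead log them or impute a value.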


Page 12: Mining Gnome Data

The GUI can be used for visual checks.
Runs on Unix-like systems such as Debian, Red Hat, and Ubuntu.
The mining system is generic, so it can be used for most routine work.
A newer data mining tool for GNOME is Rattle.
Greening is a decision tree builder with stochastic boosting and random forests.


Page 13: Mining Gnome Data

Some of the applications associated with GDM
◦ Decision trees
◦ Apriori association rules for identifying frequent itemsets
◦ Bayes classification: building a classifier from training data and classifying with it
◦ Bar chart and binning chart
◦ GDM plot utility for Q-Q plot, histogram analysis, and correlation plot
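As an illustration of the Apriori item in the list above, the following is a compact textbook-style sketch of frequent itemset mining. It is not GDM's implementation; the `apriori` function, the use of an absolute count for `min_support`, and the sample baskets are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    min_support is an absolute number of transactions (an assumption here;
    many tools take a fraction instead).
    """
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        size += 1
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]
result = apriori(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With these baskets every single item and every pair is frequent, while the three-item set appears in only one basket and is discarded. The prune step is what keeps the candidate set small on larger data.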


Page 14: Mining Gnome Data

Introduction to GNOME
Need for mining
◦ KDD
◦ Challenges
GNOME Data Access
◦ Components
◦ Features
GNOME Data Mining


Page 15: Mining Gnome Data

[1] An article at http://www.gnome.org
[2] Han et al., “Data Mining and Warehousing”, 2nd Edition
[3] An article at http://www.gnomedb.org
[4] An article on wikipedia.org
