![Page 1: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/1.jpg)
THE NEW POWER OF DATA: Collection, Integration, and Analytics
Wenny Rahayu
Professor in Computer Science
Head, School of Engineering and Mathematical Sciences
La Trobe University, Melbourne Australia
![Page 2: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/2.jpg)
Where is La Trobe?
35,000 students
3,200 staff
![Page 3: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/3.jpg)
Moving from Databases to Data Container
“Every day, 2.5 quintillion bytes of data are created and 90% of the data in the world today was created within the past two years.”
IBM Corporation
…
10^15 = quadrillion (petabytes)
10^18 = quintillion (exabytes)
10^21 = sextillion (zettabytes)
“Worldwide information is more than doubling every two years, with 1.8 zettabytes or 1.8 trillion gigabytes projected to be created and replicated this year alone.”
ZDNet news
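As a quick consistency check on the figures quoted on this slide, the conversions can be written out. This is only a worked example and assumes decimal (SI) prefixes, so that 1 gigabyte = 10^9 bytes:

```latex
\begin{align*}
2.5\ \text{quintillion bytes} &= 2.5 \times 10^{18}\ \text{bytes} = 2.5\ \text{exabytes},\\
1.8\ \text{zettabytes} &= 1.8 \times 10^{21}\ \text{bytes}
  = \frac{1.8 \times 10^{21}}{10^{9}}\ \text{gigabytes}
  = 1.8 \times 10^{12}\ \text{gigabytes}.
\end{align*}
```

Since 10^12 is one trillion, the second line agrees with the "1.8 trillion gigabytes" in the quote.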
VOLUME
![Page 4: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/4.jpg)
Source: http://archive.tiecon.org/content/big-data-landscape-why-should-you-care
Means for Data Collection
![Page 5: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/5.jpg)
We are not quite sure what the exact definition of a Data Scientist is, but if you deal with something generally related to converting data into useful insight then you will hopefully benefit from joining the group.
Whether you’re in business, academia, or government, and whether you’re an analyst, data miner, programmer, student, electrical engineer, computer scientist, physicist, etc, and you work with data to generate insights, build predictive models, build optimisation models, build reports/dashboards/visualisations, automate analyses, etc, using python, R, SQL, C/C++, Java, Tableau, Excel, Hadoop, etc, and you care about doing it right, efficiently, repetitively, optimally, visually, etc, then join us!
Source: http://www.meetup.com/Data-Science-Melbourne/
Multi-Disciplinary
![Page 6: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/6.jpg)
New ways of developing drugs – Novartis New Drug Research
Novartis Institute for Biomedical Research (NIBR) in Cambridge, Mass.
• A new breed of “data scientist” is working to re-invent the traditional drug research team. Instead of biologists, chemists and clinicians working in silos, pharmaceutical companies such as Novartis are assembling collaborative, cross-disciplinary teams.
• These teams include data scientists, drawing on their expertise in computer science and statistics to sift through information and attempt to extract answers to pressing questions. They collaborate with biologists and clinicians to develop a clear hypothesis and then put it to the test.
• https://www.novartis.com/stories/discovery/surfing-wave-big-data-analytics
Data Inspires New Scientific Innovation
![Page 7: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/7.jpg)
Smart Sensor Solution and Real Time Data Analytics
[Diagram: smart sensor system]
• Pasture: sensors on lambs and sensors on ewes, recording behaviour, activity and relationship of animals
• Smart sensor: RF communication; activity sensors (accelerometers, gyroscope, magnetometer, temperature); low power processing and storage; battery powered, with a power management unit
• Options for sensor data download (labels shown: base station, proximity approach, handheld reader, computer)
• Database: sensor data will be saved securely in a database system for post-analysis
• User interface: administration/configuration, data visualization, reporting system, alert system; analysed sensor data reports will be accessible through a web-based user interface
Multidisciplinary work between IT, Engineering, Centre of Technology Infusion, and the Agricultural Department.
The project will produce low-cost, long-life sensors for use with farm animals to monitor motion, proximity and true location.
Sensor data and real-time data analytics will provide actionable information to farmers on parentage, health, oestrus, grazing, etc.
![Page 8: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/8.jpg)
Big Data - the bottleneck issue
Homogeneous, standard enterprise data:
• Gathering & preparing data (70~80%)
• Analyzing data (20~30%)
Heterogeneous, Big Data:
• Gathering & preparing data (95%)
• Analyzing data (5%)
* Reference from Prof. Timos Sellis – Data Ecosystem - From Very Large Databases to Big Data Infrastructure, La Trobe, November 2015
![Page 9: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/9.jpg)
Also known as data fusion, data blending, data mapping, data acquisition, etc…
Informal description by Roderick et al. (http://www.odbms.org/2015/11/what-is-data-blending/):
“… the answer is not always written on the same book as the question. Thus, we must learn to decipher it from multiple books. Some of them are in a foreign language, some are hundreds of times thicker than others, and most of them are by different authors who have never agreed on a literary style. And there is no catalogue.”
Data Integration
![Page 10: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/10.jpg)
Data Integration
The need to deal with large data size and different complexity of data formats/structures
Integration can be achieved through:
• Standardization of data representations
• Global semantic representation: ontology or schema mediator
• “Loose coupling” integration: data virtualization, data container
![Page 11: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/11.jpg)
Standardization of data representations
![Page 12: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/12.jpg)
XML as the common ‘language’ of representation
• XBRL (eXtensible Business Reporting Language)
• BSML (Bioinformatics Sequence Markup Language)
• HL7 (Health Level Seven)
• FIX (Financial Information eXchange)
• AIXM (Aeronautical Information eXchange Model)
• GML (Geography Markup Language)
• MathML (Mathematical Markup Language)
• gbXML (Green Building eXtensible Markup Language)
• And so on…
Example of Standardization
![Page 13: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/13.jpg)
Snapshot of a standard XML representation in Aviation - AIXM
<AirportHeliport ..
  <timeSlice>
    <AirportHeliportTimeSlice gml:id="AHa1">
      <gml:validTime>
        <gml:TimePeriod gml:id="AHb1">
          <gml:beginPosition>7/8/2004 0:0:0</gml:beginPosition>
          <gml:endPosition>12/31/8888 0:0:0</gml:endPosition>
        </gml:TimePeriod>
      </gml:validTime>
      <interpretation>BASELINE</interpretation>
      <designator>NFFN</designator>
      <name>NADI</name>
      <type>AD</type>
      <magneticVariation>12.24</magneticVariation>
      <ARP>
        <ElevatedPoint gml:id="AHc1">
          <gml:coordinates decimal="." cs="," ts=" "> 177.443333333333,-17.7563888888889 </gml:coordinates>
          <elevation uom="FT">59</elevation>
        </ElevatedPoint>
      </ARP>
      ………
    </AirportHeliportTimeSlice>
  </timeSlice>
</AirportHeliport>
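To illustrate why a standard representation helps consumers of the data, here is a minimal Python sketch that reads an AIXM-like fragment with the standard library. The sample is a simplified, well-formed stand-in for the snapshot above: the namespace URI, the reduced element set, and the reading of the coordinate pair as longitude then latitude are assumptions made for illustration, not part of the original slide.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the slide's snapshot. The gml namespace URI and the
# reduced set of elements are assumptions so that the sample is well-formed.
SAMPLE = """
<AirportHeliport xmlns:gml="http://www.opengis.net/gml">
  <timeSlice>
    <AirportHeliportTimeSlice gml:id="AHa1">
      <interpretation>BASELINE</interpretation>
      <designator>NFFN</designator>
      <name>NADI</name>
      <ARP>
        <ElevatedPoint gml:id="AHc1">
          <gml:coordinates decimal="." cs="," ts=" ">
            177.443333333333,-17.7563888888889
          </gml:coordinates>
          <elevation uom="FT">59</elevation>
        </ElevatedPoint>
      </ARP>
    </AirportHeliportTimeSlice>
  </timeSlice>
</AirportHeliport>
"""

NS = {"gml": "http://www.opengis.net/gml"}


def read_airport(xml_text):
    """Extract a few fields from the first time slice of the document."""
    root = ET.fromstring(xml_text)
    time_slice = root.find("./timeSlice/AirportHeliportTimeSlice")
    coords = time_slice.find("./ARP/ElevatedPoint/gml:coordinates", NS).text.strip()
    # Assumption: the pair is (longitude, latitude), separated by the cs="," character.
    lon, lat = (float(value) for value in coords.split(","))
    elevation = time_slice.find("./ARP/ElevatedPoint/elevation")
    return {
        "designator": time_slice.findtext("designator"),
        "name": time_slice.findtext("name"),
        "longitude": lon,
        "latitude": lat,
        "elevation": float(elevation.text),
        "elevation_uom": elevation.get("uom"),
    }


airport = read_airport(SAMPLE)
print(airport["designator"], airport["name"], airport["elevation"], airport["elevation_uom"])
```

Because every provider that follows the standard uses the same element names, the same reader can be pointed at any conforming document without per-source parsing logic.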
![Page 14: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/14.jpg)
Integration of standard XML representation in Aviation : AIXM, WXXM, FIXM, etc.
[Diagram: layered architecture]
• Layer 1: ADMS (Oracle) and AIXM 5.0 (Oracle), linked by an automated mapping specification; ADMS mapping and migration to the new AIXM5 database gives the ADMS AIXM-based database
• Transformation to produce flat XML documents (AIXM document)
• Layer 3: external service providers (EFB, charting, publication, …, visualisation tool)
• Other XML data: WXXM weather data; FIXM and NOTAM XML data; ??? future XML data
![Page 15: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/15.jpg)
The International Standard Body OGC - XML Standard in Aviation Domain
AIXM = Aeronautical Information Exchange Model
WXXM = Weather Information Exchange Model
FIXM = Flight Information Exchange Model
Source: OGC – www.opengeospatial.org
![Page 16: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/16.jpg)
The layering design approach enables the integration of AIXM data with other Aeronautical XML based data
Data sources: Aeronautical Reference Data, NOTAM, Airport Spatial Data, Dynamic-Temporal Messaging, Weather Data

Sample NOTAM:
A0312/08 NOTAMN
Q) LKAA/QFAXX/IV/BO/A/000/999/5006N01415E005
A) LKPR B) 0803230000 C) PERM
E) NEW POSTAL ADDRESS OF LKPR AD: KE KRALOVSKEMU LETISTI 6/1019 160 08 PRAHA 6 RUYZNE.

Sample weather data:

| yyyy | mm | tmax (degC) | tmin (degC) | af (days) | rain (mm) | sun (hours) |
|------|----|-------------|-------------|-----------|-----------|-------------|
| 2008 | 1  | 5.0         | -1.4        | 21        | ---       | 29.7        |
| 2008 | 2  | 7.3         | 1.9         | 8         | ---       | 71.9        |
| 2008 | 3  | 6.2         | 0.3         | 13        | ---       | 101.4       |
| 2008 | 4  | 8.6         | 2.1         | 5         | ---       | 128.6       |
| 2008 | 5  | 15.8        | 7.7         | 0         | ---       | 180.4       |

Sample NOTAM message:
NBA5683
GG YSCBNOCX YUZZNCLX
012322 YBBBZEZX
C0120/10 NOTAMR C0119/10
Q) YBBB/QXXXX/IV/NBO/A/000/999/1653S14545E
A) YBCS B) 1003012322 C) 1003050930 EST
D) DAILY 0800/0930 1800/2100
E) INCREASED FLYING FOX ACTIVITY
Data Integration
![Page 17: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/17.jpg)
The layering design approach enables the integration of AIXM data with other Aeronautical XML based data
Data Integration
| Data types | Spatial | Temporal |
|------------|---------|----------|
| NOTAM message | location coordinates, area coordinates | valid start and end dates, duration |
| Aviation reference data | location coordinates, area coordinates, shape | valid start and end dates, permanent or temporary |
| Weather data | location coordinates, area coordinates, temperature, pressure | valid start and end dates, duration |

Table 1. Aviation data to be integrated with temporal and spatial integration points

X1234/09 NOTAM
Q) YMML/QMRXX/IV/NBO/A/00/999/3767S14484E002/
A) YMML B) 07068:0:0 C) 070610:0:0 EST
E) RWY 16/34 CONDITIONAL DUE TO RESURFACING

<AIRPORT_HELIPORT num="2">
  <AH_UUID>16468</AH_UUID>
  <NAME>MELBOURNE</NAME>
  <DESIGNATOR>YMML</DESIGNATOR>
  <RUNWAY_FULL_CODE>16/34</RUNWAY_FULL_CODE>
  <RUN_DIR_VID>11781</RUN_DIR_VID>
  <AH_USG_LIM_CODE>CONDITIONAL</AH_USG_LIM_CODE>
  <AH_WARN_DESCR>Resurfacing</AH_WARN_DESCR>
</AIRPORT_HELIPORT>
</ALL_AIRPORT_HELIPORT>
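The step from a free-text NOTAM to an XML record with spatial and temporal integration points might be sketched as follows. This is an illustration only: it uses a cleaned-up copy of the slide's NOTAM, the element names (`DESIGNATOR`, `COORDINATES`, `VALID_FROM`, `VALID_TO`, `DESCRIPTION`) are hypothetical, and the parser handles just the lettered items visible in the example rather than the full NOTAM format.

```python
import re
import xml.etree.ElementTree as ET

# Cleaned-up copy of the NOTAM shown on the slide.
NOTAM = (
    "X1234/09 NOTAM Q) YMML/QMRXX/IV/NBO/A/00/999/3767S14484E002/ "
    "A) YMML B) 07068:0:0 C) 070610:0:0 EST "
    "E) RWY 16/34 CONDITIONAL DUE TO RESURFACING"
)


def parse_notam(text):
    """Split a NOTAM into its lettered items (Q, A, B, C, ...)."""
    # re.split with a capturing group returns [prefix, key, value, key, value, ...].
    parts = re.split(r"\s*\b([A-GQ])\)\s*", text)
    fields = {"id": parts[0].split()[0]}
    for key, value in zip(parts[1::2], parts[2::2]):
        fields[key] = value.strip()
    return fields


def to_xml(fields):
    """Build an XML element; the tag names are hypothetical, not a standard."""
    q_items = [item for item in fields["Q"].split("/") if item]
    root = ET.Element("NOTAM", id=fields["id"])
    ET.SubElement(root, "DESIGNATOR").text = fields["A"]       # link to reference data
    ET.SubElement(root, "COORDINATES").text = q_items[-1]      # spatial integration point
    ET.SubElement(root, "VALID_FROM").text = fields["B"]       # temporal integration point
    ET.SubElement(root, "VALID_TO").text = fields["C"]
    ET.SubElement(root, "DESCRIPTION").text = fields["E"]
    return root


fields = parse_notam(NOTAM)
notam_xml = to_xml(fields)
print(ET.tostring(notam_xml, encoding="unicode"))
```

Once the designator, coordinates and validity period are explicit elements, the NOTAM can be joined to reference data for the same airport on those shared fields.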
![Page 18: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/18.jpg)
• Global semantic representation: ontology or schema mediator
![Page 19: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/19.jpg)
The Ontology
Ontology Definition
• O = (C, H, R, P, I, A), where
• C = a set of entities in the ontology (classes and instances).
• H = a set of taxonomic relationships between concepts.
• R = a set of non-taxonomic relationships between ontology concepts.
• P = a set of properties of ontology concepts, each connecting a class property to a datatype.
• I = a set of ontology instance declarations (the relationships of an instance with its class, its properties and values, and other instances).
• A = a set of axioms and rules that allow consistency checking of an ontology and the inference of new knowledge through some inferencing mechanism.
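One possible way to hold the tuple O = (C, H, R, P, I, A) in code is sketched below. The representation (plain sets of tuples, axioms as Python callables) and the aviation concept names are assumptions chosen for illustration; the slides do not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """O = (C, H, R, P, I, A) held as plain Python collections."""
    C: set = field(default_factory=set)    # entities (classes and instances)
    H: set = field(default_factory=set)    # taxonomic pairs: (child, parent)
    R: set = field(default_factory=set)    # non-taxonomic triples: (subject, relation, object)
    P: set = field(default_factory=set)    # properties: (class, property, datatype)
    I: set = field(default_factory=set)    # instance declarations: (instance, class)
    A: list = field(default_factory=list)  # axioms/rules as callables taking the ontology

    def ancestors(self, concept):
        """Follow H upward: a very small stand-in for taxonomic inference."""
        found, frontier = set(), {concept}
        while frontier:
            parents = {p for (c, p) in self.H if c in frontier} - found
            found |= parents
            frontier = parents
        return found

    def is_consistent(self):
        """Consistency check: every axiom must hold."""
        return all(rule(self) for rule in self.A)


def instances_have_known_class(onto):
    """Example axiom: each declared instance refers to a class present in C."""
    return all(cls in onto.C for (_, cls) in onto.I)


onto = Ontology(
    C={"AeronauticalFeature", "AirportHeliport", "Runway", "YMML"},
    H={("AirportHeliport", "AeronauticalFeature"), ("Runway", "AeronauticalFeature")},
    R={("AirportHeliport", "hasRunway", "Runway")},
    P={("AirportHeliport", "designator", "string")},
    I={("YMML", "AirportHeliport")},
    A=[instances_have_known_class],
)
print(onto.ancestors("AirportHeliport"), onto.is_consistent())
```

Keeping each component as a separate collection mirrors the definition directly, which makes it easy to see which part of the ontology a later operation touches.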
![Page 20: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/20.jpg)
The Ontology
Ontology Development
• Via domain expert
  o From scratch
  o Mostly manual and time consuming
  o Valid and rich knowledge within ontology
• Via data transformation
  o Existing data required
  o Based on specific data format transformation (e.g. RDB and XML)
  o Automatic
  o Knowledge richness limited to database content
![Page 21: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/21.jpg)
What we need…
• Global common knowledge
• Local ontology development may not be shared globally
• The value of local knowledge for global development
• Rich and valid knowledge
• Automatic development process
• Does not rely on the availability of domain expert
• Domain experts are not always present
• Immediate development
• Maintainable knowledge
![Page 22: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/22.jpg)
A Data-Driven Dynamic Common Ontology
The Concept
• (i) create common ontologies automatically from community knowledge representations and
• (ii) maintain their content by capturing dynamic knowledge changes and updates specific to the community, and capturing recent world updates (e.g. through social media and news).
• Content updates are done through propagation and enrichment.
![Page 23: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/23.jpg)
![Page 24: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/24.jpg)
A Data-Driven Dynamic Common Ontology
The Creation
• CO = (C, H, R, P, I, A, S, CV), where
• S = a set of similarity values between ontology knowledge components (classes, instances, non-taxonomic relationships, and properties) and their respective similar external ontology knowledge components.
• CV = a set of confidence values Cv attached to ontology instance knowledge; each Cv is the ratio of the number of knowledge sources that mention a piece of knowledge to the total number of knowledge sources.
• Why Confidence Value (CV)?
• Knowledge stability assurance. Newly extracted knowledge is not always the best knowledge, and a particular piece of knowledge from one community may not necessarily become part of the global community knowledge representation.
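The confidence value described above is a simple ratio, which a short sketch can make concrete. The source names and the triples below are made-up illustrations, and representing each knowledge source as a set of triples is an assumption.

```python
def confidence_value(knowledge, sources):
    """Share of knowledge sources that mention `knowledge` (between 0.0 and 1.0)."""
    if not sources:
        raise ValueError("at least one knowledge source is required")
    mentions = sum(1 for facts in sources.values() if knowledge in facts)
    return mentions / len(sources)


# Hypothetical knowledge sources, each a set of (subject, relation, object) triples.
sources = {
    "community_a": {("YMML", "isA", "Airport"), ("YMML", "locatedIn", "Melbourne")},
    "community_b": {("YMML", "isA", "Airport")},
    "community_c": {("YMML", "isA", "Airport"), ("YMML", "hasRunway", "16/34")},
    "community_d": {("YMML", "locatedIn", "Melbourne")},
}

cv_airport = confidence_value(("YMML", "isA", "Airport"), sources)    # mentioned by 3 of 4
cv_runway = confidence_value(("YMML", "hasRunway", "16/34"), sources)  # mentioned by 1 of 4
print(cv_airport, cv_runway)
```

A piece of knowledge that only one community mentions ends up with a low value, which is how the common ontology can avoid treating it as settled global knowledge.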
![Page 25: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/25.jpg)
A Data-Driven Dynamic Common Ontology
The Propagation
• Why?
  o Frequent change in community
  o Validity assurance from the knowledge source
• How?
  o Using a delta script
  o A delta script is very useful when the original file is located elsewhere or in a distributed environment, since sending the whole updated file would consume resources and increase the chance of information loss.
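The idea of shipping only the changes can be sketched as below. The slides do not give the delta script format, so the add/remove dictionary used here is an assumption, as are the example triples.

```python
def make_delta(old, new):
    """Describe how to turn `old` into `new` without shipping `new` itself."""
    return {"add": sorted(new - old), "remove": sorted(old - new)}


def apply_delta(target, delta):
    """Replay a delta on a (possibly remote) copy of the old version."""
    return (set(target) - set(delta["remove"])) | set(delta["add"])


# Hypothetical ontology content before and after a community update.
old = {
    ("Airport", "subClassOf", "Facility"),
    ("YMML", "isA", "Airport"),
    ("RWY16/34", "status", "OPEN"),
}
new = {
    ("Airport", "subClassOf", "Facility"),
    ("YMML", "isA", "Airport"),
    ("RWY16/34", "status", "CONDITIONAL"),
}

delta = make_delta(old, new)          # only the changed triples travel
updated_copy = apply_delta(old, delta)
print(delta)
```

Here two triples cross the network instead of the whole ontology, and the gap grows as the ontology gets larger relative to each update.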
![Page 26: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/26.jpg)
A Data-Driven Dynamic Common Ontology
The Enrichment
• Why?
  o Global knowledge update
  o Validity assurance from the global understanding
• How?
  o Take recent related documents (e.g. recent news articles)
  o Ontology + linguistic pattern-based extraction
  o Self-enrichment: find related recent documents by exploiting keywords from the common ontology
  o Domain independent
  o Confidence value is considered
![Page 27: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/27.jpg)
References
Fudholi, D.H., Rahayu, W., Pardede, E. (2015). A data-driven dynamic ontology. J. Information Science 41(3): 383-398.
Fudholi, D.H., Rahayu, W., Pardede, E. (2014). CODE (Common Ontology DEvelopment): A Knowledge Integration Approach from Multiple Ontologies. IEEE AINA 2014: 751-758, Victoria, Canada.
Fudholi, D.H., Rahayu, W., Pardede, E., Hendrik. (2013). A Data-Driven Approach toward Building Dynamic Ontology. ICT-EurAsia 2013: 223-232, Indonesia.
![Page 28: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/28.jpg)
• Global semantic representation: ontology or schema mediator
![Page 29: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/29.jpg)
o Data can arrive from various heterogeneous data sources.
o Data from different sources have different structures.
o In most cases the underlying data is quite similar. But as the structures are different, conflicts arise.
Consolidating Data Sources
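A schema mediator can be pictured as a set of per-source mappings into one global schema. The sketch below is a minimal illustration under stated assumptions: the two source layouts, the global field names and the sample values are all hypothetical, and it is not the double-layered approach of the papers cited for this section.

```python
FT_PER_M = 1 / 0.3048  # one international foot is exactly 0.3048 m

# Two hypothetical sources describing the same airport with different structures.
source_a = [{"icao": "NFFN", "airport_name": "NADI", "elev_ft": 59}]
source_b = [{"designator": "NFFN", "name": "Nadi", "elevation": {"value": 18.0, "uom": "M"}}]


def from_source_a(rec):
    """Flat record with its own field names; elevation already in feet."""
    return {
        "designator": rec["icao"],
        "name": rec["airport_name"].title(),
        "elevation_ft": float(rec["elev_ft"]),
    }


def from_source_b(rec):
    """Nested record; elevation may need a unit conversion."""
    elev = rec["elevation"]
    feet = elev["value"] * FT_PER_M if elev["uom"] == "M" else elev["value"]
    return {
        "designator": rec["designator"],
        "name": rec["name"].title(),
        "elevation_ft": round(feet, 1),
    }


MAPPINGS = {"source_a": from_source_a, "source_b": from_source_b}


def mediate(sources):
    """Translate every record to the global schema and group by designator."""
    merged = {}
    for source_name, records in sources.items():
        to_global = MAPPINGS[source_name]
        for record in records:
            unified = to_global(record)
            merged.setdefault(unified["designator"], []).append((source_name, unified))
    return merged


consolidated = mediate({"source_a": source_a, "source_b": source_b})
print(consolidated["NFFN"])
```

Structural conflicts (field names, nesting, units) are resolved in the mappings, so what remains after mediation are genuine value differences between sources, which can then be compared like for like.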
![Page 30: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/30.jpg)
Consolidating Data Sources
![Page 31: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/31.jpg)
References
Nguyen, H.Q., Taniar, D., Rahayu, J.W., Nguyen, K. (2011). Double-layered schema integration of heterogeneous XML sources. Journal of Systems and Software 84(1): 63-76.
Nguyen, H., Rahayu, J.W., Taniar, D., Nguyen, K. (2008). Mediation-based XML query answerability. Proceedings of the OTM 2008 Confederated International Conferences: On the Move to Meaningful Internet Systems (OTM 2008), 9-14 November 2008, Springer, Berlin, Germany, pp. 1550-1558.
![Page 32: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/32.jpg)
• “Loose coupling” integration: data virtualization, data container
![Page 33: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/33.jpg)
Data Container *
• The era of large heterogeneous data collection – moving from Databases to Data container
• Data container – contains a collection of resources, each of which has a unique reference/identifier
• The resources in a Data container can be: databases, database relations, database tuples, files, records in files, data streams, social media documents, parts of texts, maps, trajectories, etc.
* Reference from Prof. Timos Sellis – Data Ecosystem - From Very Large Databases to Big Data Infrastructure, La Trobe, November 2015
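The description of a data container as a collection of resources, each with a unique reference, could be modelled along these lines. The class and method names are hypothetical and the use of UUID-based references is an assumption; the slide only requires that each resource has a unique identifier.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Dict, Iterator, Optional


@dataclass
class Resource:
    ref: str      # unique reference/identifier
    kind: str     # e.g. "database", "tuple", "file", "stream", "map"
    payload: Any  # the resource itself, or a handle to it


class DataContainer:
    """A loose collection of heterogeneous resources, addressed by reference."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    def add(self, kind: str, payload: Any, ref: Optional[str] = None) -> str:
        ref = ref or f"urn:uuid:{uuid.uuid4()}"
        if ref in self._resources:
            raise KeyError(f"reference already in use: {ref}")
        self._resources[ref] = Resource(ref, kind, payload)
        return ref

    def get(self, ref: str) -> Resource:
        return self._resources[ref]

    def of_kind(self, kind: str) -> Iterator[Resource]:
        return (r for r in self._resources.values() if r.kind == kind)


container = DataContainer()
tuple_ref = container.add("tuple", ("NFFN", "NADI", 59))
file_ref = container.add("file", "weather_2008.txt")
print(container.get(tuple_ref))
```

Nothing here forces the resources into a shared schema, which is the contrast with a database: the container only guarantees that each item can be found again by its reference.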
![Page 34: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/34.jpg)
Data Container *
User Query
Result
![Page 35: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/35.jpg)
o The data may arrive from various data sources from different locations.
o Data from multiple data sources are integrated and aggregated on the fly.
o The user sees what appears to be a real data warehouse. Where the data comes from is hidden from the user, but the data is available.
o Further benefits:
o Real-time availability of information for decision support.
o Data is less stale.
o Able to access data instantly.
Data Warehouse Virtualization
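On-the-fly integration can be sketched with two small sources that are joined only when a query is made. Everything in this example is an assumption made for illustration: an in-memory SQLite database stands in for a reference data source, a CSV string stands in for a NOTAM feed, and `VirtualWarehouse` is a hypothetical name.

```python
import csv
import io
import sqlite3


def make_reference_db():
    """Source 1: a relational database of airport reference data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE airport (designator TEXT, name TEXT)")
    db.executemany(
        "INSERT INTO airport VALUES (?, ?)",
        [("YMML", "MELBOURNE"), ("NFFN", "NADI")],
    )
    return db


# Source 2: a CSV feed of current NOTAM texts.
NOTAM_CSV = "designator,text\nYMML,RWY 16/34 CONDITIONAL DUE TO RESURFACING\n"


class VirtualWarehouse:
    """Presents one view over both sources without storing a merged copy."""

    def __init__(self, reference_db, notam_csv):
        self._db = reference_db
        self._notam_csv = notam_csv

    def airport_status(self):
        """Join both sources at query time; nothing is materialised."""
        notams = {}
        for row in csv.DictReader(io.StringIO(self._notam_csv)):
            notams.setdefault(row["designator"], []).append(row["text"])
        query = "SELECT designator, name FROM airport ORDER BY designator"
        for designator, name in self._db.execute(query):
            yield {"designator": designator, "name": name,
                   "notams": notams.get(designator, [])}


warehouse = VirtualWarehouse(make_reference_db(), NOTAM_CSV)
rows = list(warehouse.airport_status())
print(rows)
```

Because the join runs at query time, a change in either source shows up in the next query, which is the "less stale" property listed above; the cost is that every query pays for the integration work.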
![Page 36: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/36.jpg)
Current Data Warehousing Trends in Industry
According to the latest Gartner Study (2015), Data Warehouses can be classified into four main categories:
1. Traditional Data Warehouse
o Consolidates and stores historical data that arrive from various data sources
2. Operational Data Warehouse
o Data is structured and continuously loaded to support operational queries
3. Logical Data Warehouse
o Structured data and other content data types
o Utilizes Data Virtualization
4. Data Lake
o Uses flat architecture to store data in its original format
o Supports ‘Schema on Read’ capabilities
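"Schema on read" means the raw records are stored untouched and a schema is applied only when someone reads them. A minimal sketch, assuming JSON lines as the raw format and a schema expressed as a mapping from field name to a casting function (both are illustrative choices, not something the Gartner classification specifies):

```python
import json

# Raw records are stored exactly as they arrived (hypothetical samples).
RAW_RECORDS = [
    '{"designator": "NFFN", "elevation": "59", "uom": "FT"}',
    '{"designator": "YMML", "name": "MELBOURNE"}',
]


def read_with_schema(raw_records, schema):
    """Apply `schema` (field name -> cast function) while reading.

    Fields missing from a record come back as None instead of failing,
    because nothing was validated when the data was written.
    """
    for line in raw_records:
        record = json.loads(line)
        yield {
            name: (cast(record[name]) if name in record else None)
            for name, cast in schema.items()
        }


# Two readers impose two different schemas on the same stored data.
elevation_view = list(read_with_schema(RAW_RECORDS, {"designator": str, "elevation": float}))
name_view = list(read_with_schema(RAW_RECORDS, {"designator": str, "name": str}))
print(elevation_view)
print(name_view)
```

The same stored bytes serve both views, which is the flexibility a data lake offers; the trade-off is that type and completeness problems surface at read time rather than at load time.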
![Page 37: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/37.jpg)
Traditional Data Warehouse
![Page 38: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/38.jpg)
Dynamic Data Warehouse
E. Chang, W. Rahayu, M. Diallo, M. Machizaud: Dynamic Data Mart for Business Intelligence. IFIP AI 2015: 50-63.
![Page 39: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/39.jpg)
o 3M: Data Mining, Data Marshalling, and Data Meshing
o 3R: Recommendation, Reconciliation, Representation
Dynamic Data Warehouse
E. Chang, W. Rahayu, M. Diallo, M. Machizaud: Dynamic Data Mart for Business Intelligence. IFIP AI 2015: 50-63.
![Page 40: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/40.jpg)
New Trends in Data Warehousing
Microsoft | IBM | Oracle | Cisco | SAP
• Real-time Data Warehousing
• Support new data types
• Support for cloud data
• Data Lake
• Real-time Data Warehousing
• Logical Big Data Warehousing
• Support for complex, structured and unstructured data
• Support for Big Data
• Logical Data Warehousing
• Data Lake
![Page 41: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/41.jpg)
Finally…
Integration can be achieved through:
• Standardization: suitable for domain-specific data sources, since it relies on the availability of a standard
• Global semantic representation: suitable for data sources with an inherent common knowledge
• “Loose coupling” integration: suitable for large heterogeneous data sources or data containers with a dynamic nature (frequent changes)
![Page 42: THE NEW POWER OF DATA: Collection, Integration, and Analytics Wenny Rahayu Professor in Computer Science Head, School of Engineering and Mathematical Sciences.](https://reader035.fdocuments.us/reader035/viewer/2022062804/5697bf911a28abf838c8e3b2/html5/thumbnails/42.jpg)
Thank you
CRICOS Provider 00115M