Mega Modeling for Scientific “Big Data” Processing
Stefano Ceri, Emanuele Della Valle (Politecnico di Milano)
Dino Pedreschi, Roberto Trasarti (ISTI-CNR and University of Pisa)
ER 2012 - Stefano Ceri
The context
Scenario
• BIG DATA: a new data revolution.
• Data is reshaping every individual and collective activity of people’s lives.
– Sensors and people produce huge amounts of data
– Data is becoming accessible everywhere via the Web
• Scientific big data is changing our attitude towards science, from specialized to massive experiments and from focused to broad questions.
• A data-centric vision goes towards Horizon 2020’s objectives.
Examples of Big Data
A. London Traffic
Challenges of Scientific Big Data Processing: Smart Cities
• Cities are becoming smarter, as governments, businesses, and communities increasingly rely on technology to overcome the challenges of rapid urbanization.
• Typical questions for smart cities:
– Where in the city are people converging during a typical weekday? Or during weekends?
– Is public transportation dynamically adapting to people’s density?
– Is a traffic jam going to happen on this road? And is it then convenient to reallocate travellers based upon the forecast?
– Where are all my friends meeting? Can I reach them? Should I use public transport or go by car?
B. Pulse of the Nation inferred from Twitter
[source: http://www.ccs.neu.edu/home/amislove/twittermood/]
The social network behind Facebook!
C. Facebook World’s Geography
Challenges of Scientific Big Data Processing: Social Mining
• Using user-generated content for discovering and analyzing emergent social behaviors, by combining sensing of personal micro-data (tweets, web logs, mobile phone traces) and participatory sensing (via crowdsourcing, GWAP, …).
• Typical questions for social mining:
– Who will win the US elections? What is the electors’ current voting intention? How reliable is it?
– Which are the indicators of social well-being (beyond GDP), and how can they be computed and monitored?
– How is the aging population effectively helped by social participation in digital community services?
– What is the link between media ownership and media content? Is there bias in news reporting? And in content reviews?
– Is an infectious disease emerging? What is its diffusion model?
D. Genomic Data
Challenges of Scientific Big Data Processing: Genomic Computing
• The context: thanks to fast DNA sequencing, “personalized genomic medicine” will become possible:
– after a blood sample, at a cost below $100 and within hours or minutes of computing time, the entire genome of each individual is available in a genome browser
• New questions and scenarios:
– Am I a carrier of genetic mutations? Will I develop cancer?
– How does obesity correlate with breast cancer?
– Which computational approach can discriminate between “driver” and “passenger” cancer DNA mutations?
– How can specific target genes be assigned to epigenetically defined regulatory regions?
– How do epigenetic modifications affect DNA synthesis during the replication of genomes?
All the scenarios require… MODELS
MODEL
• Representation of the problem space in the ICT vocabulary (concepts, data, processes, systems).
• Computational abstractions extracting relevant data from input data
• Models can be:
– Based upon analytical/statistical laws
– Based upon simulations, extracting general behaviors from many observations of the behavior of individuals
– Based upon inductive methods applied to data
• Challenge: convergence of the three types of models
Motivating Context: FuturICT Flagship
• SCIENCE: The ultimate goal of the FuturICT flagship project is to understand and manage complex, global, socially interactive systems, with a focus on sustainability and resilience.
• POLICY: FuturICT will build a Living Earth Platform, a simulation, visualization and participation platform to support the decision-making of policy-makers, business people and citizens.
• TECHNOLOGY: Integrating ICT, Complexity Science and the Social Sciences will create a paradigm shift, facilitating a symbiotic co-evolution of ICT and society.
FuturICT Vision
A stimulus from the FuturICT vision: World-of-Modeling Platform
THEORY
• Classify models by type and describe each type’s properties.
– Define (type-aware) strong interoperability within the elements of the same class
– Define model interoperability among models of different classes
PRACTICE
• Build language abstractions and software platforms supporting them
Mega-Modeling Concept
Mega-Modeling for Scientific Data
• General goal: building a model of models, which describes each model’s properties and interactions, for supporting operations upon models, such as selection, inspection, composition, substitution, reduction, extension, and search.
• Keywords: big data, data patterns, management of complexity, uncertainty, dynamic composition, adaptation.
• Chris Welty (Jeopardy): “Increasingly computational tasks require inexact solutions that combine multiple methods in unpredictable ways” (WWW 2012, Lyon)
Which scientific computations?
• Mathematical model: uses mathematical concepts and language.
– Analytical model: mathematical models that have a closed-form solution
– Numerical model: mathematical models that are solved by numerical approximation
• Statistical model: uses statistical concepts and language, e.g. probability distribution functions.
– Data mining model: extracts patterns from large data sets.
• Simulation model: predicts the expected behavior of a system.
– Agent-based model: simulates the actions and interactions of autonomous agents (representing individuals, groups or organizations)
How should they be modeled?
• By embedding scientific computations within a conceptual/ontological model of reality that defines how computational models share and exchange data, with clear semantics
The root: Mega-Programming
• Wiederhold-Wegner-Ceri, CACM, Nov. 1992
• Mega-module:
– Internally homogeneous, independently maintained software system.
– Each mega-module describes its externally accessible data structures and operations.
• Megaprogramming language MPL
– A form of programming in the large
• It developed into:
– “mediators”, “web services”, “workflow / business process languages”, “semantic web services”, “web 3.0”
Useful ideas of mega-programming
• Every mega-module exposes a data model and certain operations to a mega-program:
– SUPPLY: provides data in a model-compatible format
– INVOKE: activates computation through entry points
– EXTRACT: provides mega-module results
– EXAMINE: gives access to internal state variables
– ESTIMATE: gets information about execution completion
– LIMIT: constrains execution time & cost
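The six operations above can be read as an interface that every mega-module implements. The following is only a sketch in Python, not the MPL language itself: the class names `MegaModule` and `SummingModule`, the method signatures, and the toy "sum a list" computation are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class MegaModule(ABC):
    """Hypothetical interface mirroring the six mega-programming operations."""

    @abstractmethod
    def supply(self, data): ...          # SUPPLY: hand over model-compatible data

    @abstractmethod
    def invoke(self, entry_point): ...   # INVOKE: start a computation

    @abstractmethod
    def extract(self): ...               # EXTRACT: read the results

    @abstractmethod
    def examine(self, variable): ...     # EXAMINE: read an internal state variable

    @abstractmethod
    def estimate(self): ...              # ESTIMATE: fraction of work completed

    @abstractmethod
    def limit(self, max_steps): ...      # LIMIT: bound the execution cost


class SummingModule(MegaModule):
    """Toy mega-module (an assumption): sums its inputs within a step budget."""

    def __init__(self):
        self._inputs, self._done = [], 0
        self._result, self._max_steps = None, None

    def supply(self, data):
        self._inputs, self._done, self._result = list(data), 0, None

    def limit(self, max_steps):
        self._max_steps = max_steps

    def invoke(self, entry_point="sum"):
        if entry_point != "sum":
            raise ValueError(f"unknown entry point: {entry_point}")
        budget = len(self._inputs)
        if self._max_steps is not None:
            budget = min(budget, self._max_steps)
        total = 0
        for value in self._inputs[:budget]:
            total += value
            self._done += 1
        self._result = total

    def extract(self):
        return self._result

    def examine(self, variable):
        return getattr(self, "_" + variable)

    def estimate(self):
        return self._done / len(self._inputs) if self._inputs else 1.0
```

A mega-program would then drive the module only through these calls, for example `supply([1, 2, 3, 4])`, `limit(3)`, `invoke("sum")`, and finally `extract()` and `estimate()` to see the partial result and how much of the input was processed.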
Previous Uses of Mega-Modeling Term
• BEZIVIN-VALDURIEZ: “On the need for megamodels” (2004), emphasis on meta-models and model registry.
• BEZIVIN: “Model of models” (2004), a model of relationships between models.
• FAVRE: “Meta-model of model transformations” (2005), models linked by relationships such as representationOf, conformsTo, isTransformedIn.
• SEIBEL et al. (2010): “dynamic hierarchical data models for traceability”, emphasis on dependencies between model artifacts.
• SEIBEL et al. (2011): mega-models for “modeling runtime behavior”
Data-driven computation paradigms
• Data analysis:
– process of extracting useful information from input data by using any kind of model (including data mining).
• Data mining:
– automatic or semi-automatic analysis of large data sets to extract previously unknown interesting patterns (emphasis on induction).
On the meaning of pattern
• Pattern type = context-independent data format for expressing the results of data analysis and data mining activities, e.g. trajectories
• Pattern instance = context-specific data item compliant with the pattern type, e.g. my trajectory from office to home today
• Pattern = context-specific population of pattern instances, featuring an intensional description (name, pattern type, qualifying parameters, including quality parameters) and an extension (set of pattern instances), e.g. the cluster of trajectories leading to Linate airport through the highway
• Pattern extraction = computing patterns in a given context, by first evaluating pattern instances and then abstracting the common properties that collectively describe a population
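The distinction between pattern type, pattern instance and pattern can be made concrete with three small record types. This is a minimal sketch: the class names, the field names and the `extract_pattern` helper are hypothetical, and the "abstraction of common properties" is reduced here to a caller-supplied predicate.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PatternType:
    """Context-independent format, e.g. 'trajectory' and its field names."""
    name: str
    fields: tuple


@dataclass
class PatternInstance:
    """Context-specific data item that must comply with its pattern type."""
    pattern_type: PatternType
    values: dict

    def __post_init__(self):
        missing = set(self.pattern_type.fields) - set(self.values)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")


@dataclass
class Pattern:
    """Intensional description (name, type, parameters) plus an extension."""
    name: str
    pattern_type: PatternType
    parameters: dict
    extension: list = field(default_factory=list)


def extract_pattern(name, pattern_type, instances, qualifies, parameters=None):
    """Keep the instances of the given type that satisfy the predicate."""
    extension = [
        inst for inst in instances
        if inst.pattern_type == pattern_type and qualifies(inst)
    ]
    return Pattern(name, pattern_type, dict(parameters or {}), extension)


# Illustrative data only: two trajectories, one of them leading to Linate.
trajectory = PatternType("trajectory", ("item", "origin", "destination"))
instances = [
    PatternInstance(trajectory, {"item": "me", "origin": "office", "destination": "home"}),
    PatternInstance(trajectory, {"item": "taxi7", "origin": "center", "destination": "Linate"}),
]
to_linate = extract_pattern(
    "ToLinate", trajectory, instances,
    qualifies=lambda inst: inst.values["destination"] == "Linate",
    parameters={"destination": "Linate"},
)
```

Here `to_linate.extension` holds the qualifying instances, while `name`, `pattern_type` and `parameters` play the role of the intensional description.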
The authors’ history of patterns
MineRule Operator (association rules)
• Data type
– Tabular representation of association rules (HEAD, BODY, SUPPORT, CONFIDENCE)
• Pattern type
– Association rule HEAD -> BODY, featuring statistical properties of confidence, support
• Paradigm
– Mine Rule Operator: SQL-based language for extracting association rules and putting them into a tabular format, with built-in variables HEAD, BODY, SUPPORT, CONFIDENCE
Mine Rule Pattern
MINE RULE PurchaseBasket AS
SELECT DISTINCT 1..n item AS BODY, 1..1 item AS HEAD, SUPPORT, CONFIDENCE
FROM Purchase
WHERE DATE BETWEEN 1-1-2011 AND 1-1-2012
GROUP BY Transaction
HAVING COUNT(*) >= 3
EXTRACTING RULES WITH SUPPORT: 0.2, CONFIDENCE: 0.2

body                      head    support  confidence
ski_pants                 jacket  0.2      0.25
hiking_boots              jacket  0.25     0.3
ski_pants, hiking_boots   jacket  0.5      0.3
col_shirt                 jacket  0.3      0.2
col_shirt, hiking_boots   jacket  0.5      0.2
Associations
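The SUPPORT and CONFIDENCE columns follow the usual definitions for association rules: support is the fraction of groups (here, transactions) containing both body and head, and confidence is that count divided by the number of groups containing the body. The sketch below computes them in plain Python; the function name `rule_stats` and the sample baskets are assumptions for illustration and are not the data behind the table above.

```python
def rule_stats(transactions, body, head):
    """Return (support, confidence) of the rule body => head.

    transactions: list of sets of items, one set per transaction.
    """
    body, head = frozenset(body), frozenset(head)
    total = len(transactions)
    with_body = sum(1 for t in transactions if body <= t)
    with_both = sum(1 for t in transactions if (body | head) <= t)
    support = with_both / total if total else 0.0
    confidence = with_both / with_body if with_body else 0.0
    return support, confidence


# Illustrative baskets only.
baskets = [
    {"ski_pants", "jacket"},
    {"ski_pants", "hiking_boots", "jacket"},
    {"ski_pants"},
    {"col_shirt"},
]
support, confidence = rule_stats(baskets, {"ski_pants"}, {"jacket"})
```

A rule would be reported only when both values reach the thresholds given in the EXTRACTING RULES clause.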
Stream Reasoning
• Data types
– RDF stream: unbounded sequence of timestamped RDF triples
– Window (sliding or tumbling): top (most recent) portion of the RDF stream
– Timestamp function: associated with triples
• Pattern type
– Computation of a new stream from data and streams
• Paradigm
– Addition to standard SPARQL of new data types and of continuous semantics (i.e., streams and registered queries over streams)
An Example of C-SPARQL Stream
Who are the opinion makers? i.e., the users who are likely to influence the behaviour of other users who follow them
REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
CONSTRUCT { ?opinionMaker sd:about ?resource }
FROM STREAM <http://streamingsocialdata.org/interactions> [RANGE 30m STEP 5m]
WHERE {
  ?opinionMaker ?opinion ?resource .
  ?follower sioc:follows ?opinionMaker .
  ?follower ?opinion ?resource .
  FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker) && ?opinion != sd:accesses )
}
HAVING ( COUNT(DISTINCT ?follower) > 3 )
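The `[RANGE 30m STEP 5m]` clause describes a sliding window: every 5 minutes the query is evaluated over the last 30 minutes of the stream. The sketch below illustrates only that windowing behavior over an in-memory list, not a C-SPARQL engine; the function name, the `(end - range, end]` boundary convention and the choice to start counting from the first timestamp are assumptions.

```python
def sliding_windows(events, range_s, step_s):
    """Yield (window_end, items) every step_s time units.

    events: list of (timestamp, item) pairs.
    Each window covers the half-open interval (window_end - range_s, window_end].
    """
    if not events:
        return
    ordered = sorted(events, key=lambda e: e[0])
    first, last = ordered[0][0], ordered[-1][0]
    end = first + step_s
    while True:
        items = [item for ts, item in ordered if end - range_s < ts <= end]
        yield end, items
        if end >= last:          # the latest event has been covered
            break
        end += step_s


# Illustrative stream with abstract time units: RANGE 10, STEP 5.
stream = [(0, "a"), (4, "b"), (9, "c"), (12, "d")]
windows = list(sliding_windows(stream, range_s=10, step_s=5))
```

Because the range is larger than the step, consecutive windows overlap, so the same triple can contribute to several evaluations of the registered query.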
M-Atlas: Interoperability for trajectories
• Data types
– Points, lines, polygons, trajectories (moving points)
• Patterns
– Clusters: trajectories of points with the same label
– Flows: trajectories moving between regions
– Flocks: spatio-temporal coincidence of flows
• Paradigm
– SQL-like language for building patterns and for querying, transforming, composing and visualizing them.
M-Atlas queries for social mining
How do people leave Milan’s city center toward suburban areas?
CREATE MODEL MilanODMatrix AS MINE ODMATRIX
FROM (SELECT t.id, t.trajectory FROM TrajectoryTable t),
     (SELECT orig.id, orig.area FROM MunicipalityTable orig),
     (SELECT dest.id, dest.area FROM MunicipalityTable dest)
CREATE RELATION CenterToNESuburbTrajectories USING ENTAIL
FROM (SELECT t.id, t.trajectory FROM TrajectoryTable t, MilanODMatrix m
      WHERE m.origin = Milan AND m.destination IN (Monza, ..., Brugherio))
CREATE MODEL ClusteringTable AS MINE T-CLUSTERING
FROM (SELECT t.id, t.trajectory FROM CenterToNESuburbTrajectories t)
SET T-CLUSTERING.FUNCTION = ROUTE_SIMILARITY AND
    T-CLUSTERING.EPS = 400 AND
    T-CLUSTERING.MIN_PTS = 5
Search Computing
• Data type:
– Ranked data services with input/output parameters
• Pattern type:
– Service combinations obtained by computing top-k join queries
• Paradigm:
– SeCoQL, a query language and protocol supporting ranked queries on services and exploratory search
Search Computing Queries
DEFINE QUERY NightPlan($X: String, $Y: String, $Z: Integer, $U: String, $V: String) AS
SELECT M.*, T.*, R.*, TotalPrice = T.Price + R.AvgPrice
FROM ((Movie (iGenre: $X, iCountry: $Y, iYear: $Z) AS M USING IMDB_MOVIES
JOIN Theatre (iAddress: $U, iCity: $V, iCountry: $Y) AS T USING GOOGLE_DISPLAYING
ON M.Title = T.Title)
JOIN Restaurant (iCountry: $Y, iCategory: "Italian Restaurant") AS R USING YQL_LOCAL
ON T.address = R.Address AND T.city = R.City)
WHERE R.Rating > 3
RANK BY (R = 0.4, T = 0.3, M = 0.3)
LIMIT 20 TUPLES AND 50 CALLS
CrowdSearcher
• Data type:
– List of search items with a regular schema (possibly produced by a conventional search system)
• Pattern types:
– Annotations on search items (like, dislike, recommend, tag, score, order, group, top, insert, delete, correct, connect)
• Paradigm:
– Use of the crowd for adding patterns to search items
CrowdSearcher Model
• Data type: collection of tuples
• Query type: Like, Add, Sort / Rank, Comment, Modify
Example of crowdsourcing
Crowdsearching results
Common aspects of five patterns
• High-level data representation through “tables”
• High-level data manipulation language as an extension of major relational languages, one of: SQL, SPARQL, Datalog+/-
• Recipe:
– Expose a tabular representation
– Use a relational language extension for computation & composition
(just a bit more) Systematic view
Patterns for classification & clustering
• CLASSIFICATION. The computation extracts classes from a population; each class has a name and statistics, from simple frequencies up.
Data: Population(Item)
Pattern: Class(Name, AggrStats)
• CLUSTERING. The computation extracts clusters from a collection; each cluster has a name, an extent (consisting of its elements), a centroid element, and statistics, from cardinalities up.
Data: Collection(Item)
Pattern: Cluster(Name, Extent: [Item], CentroidItem, AggrStats)
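The two pattern schemas can be sketched as records filled in by small helper functions. Everything below is illustrative: the class and function names are assumptions, the labels are supplied by the caller rather than learned, and the "centroid element" is taken to be the member closest to the mean of one-dimensional numeric items.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ClassPattern:        # Class(Name, AggrStats)
    name: str
    aggr_stats: dict


@dataclass
class ClusterPattern:      # Cluster(Name, Extent: [Item], CentroidItem, AggrStats)
    name: str
    extent: list
    centroid_item: float
    aggr_stats: dict


def classify(population, label_of):
    """Assign each item to a class and report simple frequencies."""
    counts = Counter(label_of(item) for item in population)
    total = sum(counts.values())
    return [
        ClassPattern(name, {"count": n, "frequency": n / total})
        for name, n in sorted(counts.items())
    ]


def describe_clusters(labelled_items):
    """Build cluster patterns from (label, numeric item) pairs."""
    groups = {}
    for label, item in labelled_items:
        groups.setdefault(label, []).append(item)
    clusters = []
    for label in sorted(groups):
        extent = groups[label]
        mean = sum(extent) / len(extent)
        # Centroid element: the actual member nearest to the mean.
        centroid = min(extent, key=lambda x: abs(x - mean))
        clusters.append(
            ClusterPattern(label, extent, centroid, {"cardinality": len(extent)})
        )
    return clusters


classes = classify([1, 2, 3, 4, 5], lambda x: "even" if x % 2 == 0 else "odd")
clusters = describe_clusters([("a", 1.0), ("a", 2.0), ("a", 6.0), ("b", 10.0)])
```

Both results are plain lists of records, which is what makes them easy to expose as tables.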
Patterns for Streams
• STREAMING. Stream computing aggregates data of a given type from a stream; it associates each type with a valid time interval, typically the most recent, and aggregate properties.
Data: Stream(TimeStamp, Item)
Pattern: StreamStats(ItemType, TimeInterval, AggrStats)
• STREAMING WITH WINDOWS. The stream is subdivided into windows; stream computing associates a given type and window with aggregate properties.
Data: Stream(Window, StartTimeStamp, EndTimeStamp, Content: [Item])
Pattern: WindowedStats(Window, ItemType, AggrStats)
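As a rough illustration of the windowed variant, the sketch below splits a stream into fixed-size, non-overlapping (tumbling) windows and counts items per type in each window. The names `WindowedStats` and `windowed_stats`, the use of a count as the only aggregate, and the integer timestamps are assumptions made for the example.

```python
from collections import Counter
from typing import NamedTuple


class WindowedStats(NamedTuple):
    window: int        # index of the tumbling window
    item_type: str
    count: int         # the only aggregate property computed in this sketch


def windowed_stats(stream, window_size):
    """Count items per (window, item type).

    stream: iterable of (timestamp, item_type) with non-negative integer timestamps.
    Window k covers timestamps in [k * window_size, (k + 1) * window_size).
    """
    counts = Counter()
    for timestamp, item_type in stream:
        counts[(timestamp // window_size, item_type)] += 1
    return [
        WindowedStats(window, item_type, n)
        for (window, item_type), n in sorted(counts.items())
    ]


# Illustrative stream only.
observations = [(0, "car"), (3, "bus"), (4, "car"), (10, "car"), (11, "car")]
stats = windowed_stats(observations, window_size=10)
```

Each resulting row matches the WindowedStats(Window, ItemType, AggrStats) schema, with the count standing in for the aggregate statistics.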
Patterns for Association Rules
• ASSOCIATION RULES. They solve the basket analysis problem; each association rule has a head and a body describing item sets, and statistical properties of support and confidence defining the rule’s interest.
Data: Basket(Tid, Item)
Pattern: Rule(Head: [Item], Body: [Item], Support, Confidence)
Patterns for Trees
• TREE. Classical computations provide the descendants or ancestors of a given node, or classify a new node relative to a taxonomy by returning the path from the root to the most similar node.
Data: Tree(Item, Children: [Item])
Pattern: Descendants(Item, To: [Item])
Ancestors(Item, From: [Item])
Classify(Item, Path: [Item])
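The Descendants and Ancestors patterns can be computed directly from the Tree(Item, Children) representation. The sketch below assumes the tree is held as a dictionary from item to list of children; the function names and the sample taxonomy are illustrative, and the Classify pattern is left out because it depends on a similarity measure the slides do not specify.

```python
def descendants(tree, item):
    """All nodes below `item`, in depth-first pre-order."""
    result = []
    stack = list(reversed(tree.get(item, [])))
    while stack:
        node = stack.pop()
        result.append(node)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(tree.get(node, [])))
    return result


def ancestors(tree, item):
    """All nodes above `item`, nearest ancestor first."""
    parent = {
        child: node for node, children in tree.items() for child in children
    }
    path = []
    while item in parent:
        item = parent[item]
        path.append(item)
    return path


# Illustrative taxonomy only.
taxonomy = {
    "animal": ["mammal", "bird"],
    "mammal": ["dog", "cat"],
    "bird": [],
}
```

For this taxonomy, `descendants(taxonomy, "animal")` visits the mammal subtree before the bird node, and `ancestors(taxonomy, "dog")` walks up through "mammal" to "animal".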
Patterns for Graphs
• GRAPH. Classical computations provide a decomposition of a graph into components or find the “friend” nodes which are at a given “nearness” from a given node.
Data: Graph(FromItem, ToItem)
Pattern: Components(Name, Components: [Node])
Friends(FromItem, NearnessLevel, To: [Item])
• DISTANCE-GRAPH. Shortest path between any two items, expressed as a sequence of nodes connecting them and a total distance.
Data: D-Graph(FromItem, ToItem, Distance)
Pattern: ShortestPath(OriginItem, DestinationItem, Path: [Item], TotalDistance)
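Two of these patterns map onto textbook algorithms: Friends can be computed with a breadth-first search bounded by the nearness level, and ShortestPath with Dijkstra's algorithm. The sketch below is one possible reading, with several assumptions: the function names are invented, "nearness" is interpreted as a maximum hop count, Graph edges are treated as undirected, D-Graph edges as directed, and distances as non-negative.

```python
import heapq
from collections import deque


def friends(edges, source, nearness):
    """Nodes within `nearness` hops of `source` (edges treated as undirected)."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == nearness:
            continue                      # do not expand beyond the limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return sorted(n for n in hops if n != source)


def shortest_path(d_edges, origin, destination):
    """Dijkstra over (from, to, distance) edges; returns (path, total) or None."""
    adjacency = {}
    for a, b, distance in d_edges:
        adjacency.setdefault(a, []).append((b, distance))
    best = {origin: 0}
    previous = {}
    heap = [(0, origin)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue                      # stale heap entry
        if node == destination:
            break
        for neighbour, distance in adjacency.get(node, []):
            candidate = dist + distance
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if destination not in best:
        return None
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[destination]
```

The return values line up with the pattern schemas: `friends` yields the To list for a given FromItem and NearnessLevel, and `shortest_path` yields the Path and TotalDistance for an origin and destination.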
Patterns for Moving Points
• MOVING POINTS. Reconstruction of trajectories as sequences of locations traversed by the same item.
Data: Point(Item, Time, Location)
Pattern: Trajectory(Item, FromLocation, ToLocation, Steps: [Location], StepCount: Number)
• FLOCKS. Combination of trajectories to recognize flocks, i.e. simultaneous movements of groups of individuals across regions.
Data: Trajectory(Item, FromLocation, ToLocation, Steps: [Location], StepCount: Number)
Pattern: Flock(FlockName, FromRegion, ToRegion, TimeInterval, Objects: [Item], ObjectCount: Number)
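The MOVING POINTS pattern amounts to grouping point observations by item and ordering them by time. The following sketch shows that step only; the `Trajectory` record mirrors the schema above, while the function name, the string locations and the sample points are assumptions. Flock detection is not attempted here because it additionally needs a notion of spatial and temporal closeness.

```python
from typing import NamedTuple


class Trajectory(NamedTuple):
    item: str
    from_location: str
    to_location: str
    steps: list
    step_count: int


def build_trajectories(points):
    """Group (item, time, location) points into one trajectory per item."""
    steps_by_item = {}
    for item, _time, location in sorted(points, key=lambda p: (p[0], p[1])):
        steps_by_item.setdefault(item, []).append(location)
    return {
        item: Trajectory(item, steps[0], steps[-1], steps, len(steps))
        for item, steps in steps_by_item.items()
    }


# Illustrative observations only; they may arrive out of time order.
points = [("u1", 2, "B"), ("u1", 1, "A"), ("u2", 1, "A"), ("u1", 3, "C")]
trajectories = build_trajectories(points)
```

Sorting by item and then by time means the input does not need to be ordered, which is convenient when observations come from several sensors.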
(eventually) Mega-modules
Mega-modules
Format
• Data preparation
– Purpose: assembling input objects (typically application-specific)
– Techniques: abstraction, semantic enrichment, noise reduction
– Computation complexity: low (a data scan or sort)
• Data analysis
– Purpose: performing the core scientific processing, computing output objects (application-independent)
– Techniques: computational models
– Computation complexity: as required (partitioning and streaming recommended)
• Data evaluation
– Purpose: extracting & presenting results (typically application-specific)
– Techniques: quality assessment, filtering, significance measuring, diversification, ranking
– Computation complexity: as required (object transformations to fit needs)
Inspections and controls
• Megamodule inspection
– After preparation: view of input objects
– After execution: view of output objects
• Megamodule controls
– Based upon inspection
– May alter behavior, suspend, resume, terminate
Rationale
• Data analysis: reusable transformation of input objects into output objects
– Classical mathematical/statistical algorithms compute output data
– Simulation algorithms predict output data
– Data mining methods induce output data
• Application-independent input and output objects compliant with pattern types
![Page 50: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/50.jpg)
Relational View of Mega-Modules
• Input/output objects for data analysis in object-relational format?
– Potential for high-level declarative data analysis description using extended relational query language
– Easing inspection and control
– Easing data analysis reuse
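To make the relational view concrete, the sketch below stores output objects in a table and uses plain SQL both for inspection and for a declarative filter-and-rank step. It relies on Python's standard `sqlite3` module with ordinary SQL rather than the extended query language the slide envisions; the table `output_object`, its columns, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE output_object ("
    "object_id INTEGER PRIMARY KEY, kind TEXT, size INTEGER, score REAL)"
)
# Hypothetical output objects produced by some data analysis step.
conn.executemany(
    "INSERT INTO output_object VALUES (?, ?, ?, ?)",
    [(1, "cluster", 12, 0.91), (2, "cluster", 3, 0.40), (3, "cluster", 25, 0.77)],
)

# Inspection: an aggregate query summarizes the current output objects.
summary = conn.execute("SELECT COUNT(*), MAX(size) FROM output_object").fetchone()

# Declarative evaluation: filtering and ranking expressed as a query.
relevant = conn.execute(
    "SELECT object_id FROM output_object WHERE score >= ? ORDER BY score DESC",
    (0.5,),
).fetchall()
conn.close()
```

Because inputs and outputs are ordinary tables, the same queries work on any analysis that emits objects with this shape, which is one way to read the reuse argument.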
![Page 51: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/51.jpg)
Example: M-Atlas
![Page 52: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/52.jpg)
Running Example
• Data preparation
– GPS observations of the same individual are assembled into a trajectory
• Data analysis
– Trajectories are assembled and reported as simultaneous movements of groups of people (flocks)
• Data evaluation
– Flocks which are most relevant (above threshold) are reported upon a map
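The running example can be sketched end to end in a few lines of Python. This is a deliberately simplified stand-in and not the M-Atlas flock algorithm: a "flock" here is just a pair of users found in the same grid cell at the same timestamp, and relevance is the number of shared timestamps. The tuple layout, the `cell` size, and the sample data are assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Observation = Tuple[str, int, float, float]  # (user_id, timestamp, x, y)


def prepare(observations: Iterable[Observation]) -> Dict[str, List[Tuple[int, float, float]]]:
    """Assemble the observations of each individual into a time-ordered trajectory."""
    trajectories = defaultdict(list)
    for user, t, x, y in observations:
        trajectories[user].append((t, x, y))
    for points in trajectories.values():
        points.sort()
    return dict(trajectories)


def analyze(trajectories, cell: float = 1.0) -> Dict[frozenset, int]:
    """Count, per pair of users, the timestamps at which both occupy the same grid cell."""
    occupancy = defaultdict(set)  # (timestamp, cell_x, cell_y) -> users present
    for user, points in trajectories.items():
        for t, x, y in points:
            occupancy[(t, int(x // cell), int(y // cell))].add(user)
    together = defaultdict(int)
    for users in occupancy.values():
        for pair in combinations(sorted(users), 2):
            together[frozenset(pair)] += 1
    return dict(together)


def evaluate(flocks: Dict[frozenset, int], threshold: int) -> List[Tuple[List[str], int]]:
    """Keep only groups that moved together for at least `threshold` timestamps."""
    return sorted((sorted(members), n) for members, n in flocks.items() if n >= threshold)


observations = [
    ("a", 2, 1.2, 0.1), ("a", 1, 0.2, 0.1),
    ("b", 1, 0.4, 0.3), ("b", 2, 1.5, 0.6),
    ("c", 1, 5.0, 5.0), ("c", 2, 1.1, 0.9),
]
trajectories = prepare(observations)
reported = evaluate(analyze(trajectories), threshold=2)
```

Plotting the reported groups on a map, as the slide describes, is left out of the sketch.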
![Page 53: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/53.jpg)
Composition Abstractions
• Used for assembling mega-modules into higher order computations
• If appropriately chosen, are key to mega-module reuse
• Ideal design process = top-down, recursive application of (de)composition abstractions up to finding the appropriate mega-modules within a repository
![Page 54: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/54.jpg)
Composition Abstractions (so far)
• General-purpose
– Pipeline
– Parallel/Iterative
• Recurrent
– What-if control
– Drift control
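The general-purpose abstractions can be read as higher-order functions over modules. The sketch below is an interpretation, since the slide diagrams are not part of this transcript: `pipeline`, `parallel`, and `iterate` are hypothetical helper names, and a module is modeled as a plain callable.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Any, Callable, List

Module = Callable[[Any], Any]


def pipeline(*modules: Module) -> Module:
    """Feed the output of each module into the next one."""
    return lambda data: reduce(lambda acc, module: module(acc), modules, data)


def parallel(module: Module, merge: Callable[[List[Any]], Any]) -> Callable[[List[Any]], Any]:
    """Apply the same module to every partition, then merge the partial results."""
    def run(partitions: List[Any]) -> Any:
        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(module, partitions))  # map preserves partition order
        return merge(partials)
    return run


def iterate(module: Module, converged: Callable[[Any, Any], bool], max_rounds: int = 100) -> Module:
    """Re-apply a module until two successive results satisfy `converged`."""
    def run(data: Any) -> Any:
        for _ in range(max_rounds):
            new = module(data)
            if converged(data, new):
                return new
            data = new
        return data
    return run


count_clean = pipeline(lambda xs: [x for x in xs if x >= 0], len)
n_clean = count_clean([1, -2, 3])

partitioned_sum = parallel(sum, sum)
total = partitioned_sum([[1, 2], [3, 4], [5]])

halve_until_stable = iterate(lambda x: x // 2, lambda old, new: old == new)
fixed_point = halve_until_stable(40)
```

Because each helper returns another callable, compositions nest freely, for example a pipeline whose middle stage is a parallel module.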
![Page 55: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/55.jpg)
Pipeline
![Page 56: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/56.jpg)
Parallel/Iterative
![Page 57: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/57.jpg)
Map-Reduce
![Page 58: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/58.jpg)
What-If
![Page 59: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/59.jpg)
Drift Control
![Page 60: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/60.jpg)
Graph Decomposition
![Page 61: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/61.jpg)
Summary of ICT Requirements for Scientific Big Data Management
• In the “small” (modules, each processing terabytes of data)
– Identify reusable data formats as pattern types
– Identify reusable computations as data analysis models
– Identify appropriate data transformations for data preparation
– Identify appropriate quality assessments for data evaluation
• In the “large” (composing mega-modules)
– Foster composition through appropriate composition abstractions + infrastructures
– Allow for assessing properties of the mega-module composition
  • Correctness, reliability, etc.
– Allow for inspection of mega-modules during processing
  • Assessing current state, intermediate results, etc.
– Allow for dynamic reconfiguration of each mega-module
  • Scale up and down in response to the load, recover a computation after a fault, etc.
![Page 62: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/62.jpg)
Examples of applications through compositions of MegaModules
![Page 63: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/63.jpg)
BOTTARI: restaurant recommender based on geo-aware social media analytics
![Page 64: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/64.jpg)
BOTTARI as a Mega-Model Composition
• Explicit module structure with input-output relationships
[Figure labels: Inputs; BOTTARI; Temporal Model; Geo-Spatial Model; Predictive Model; Social Media Crawler and Miner; Outputs]
![Page 65: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/65.jpg)
BOTTARI Models
• Geo-spatial model
– Input: user position, semantic + geo-spatial description of restaurants
– Output: a list of matching restaurants ranked by distance from the user
• Temporal model
– Input: stream of liked restaurants
– Output: ranking of restaurants in “like” order in the last week/month/quarter
• Predictive model
– Input: materialized stream of liked restaurants
– Output: prediction of the restaurant which will be chosen by the user as best-fit
• Social Media Crawler and Miner
– Input: stream of tweets of people about restaurants
– Output: stream of most liked restaurants after named entity recognition and sentiment mining
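Two of these models have simple enough input/output contracts to sketch in Python. The code below is not BOTTARI's implementation: the haversine distance, the tag-based matching, the `(timestamp, name)` like stream, and all sample data are assumptions chosen to mirror the inputs and outputs listed above.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

Restaurant = Tuple[str, float, float, Set[str]]  # (name, lat, lon, semantic tags)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def geo_spatial_model(user_pos: Tuple[float, float],
                      restaurants: Iterable[Restaurant],
                      wanted_tag: str) -> List[str]:
    """Names of restaurants carrying `wanted_tag`, nearest to the user first."""
    matches = [(haversine_km(user_pos[0], user_pos[1], lat, lon), name)
               for name, lat, lon, tags in restaurants if wanted_tag in tags]
    return [name for _, name in sorted(matches)]


def temporal_model(likes: Iterable[Tuple[int, str]], now: int, window: int) -> List[str]:
    """Rank restaurants by number of likes with now - window < timestamp <= now."""
    counts = Counter(name for t, name in likes if now - window < t <= now)
    return [name for name, _ in counts.most_common()]


# Hypothetical data.
restaurants = [
    ("A", 37.51, 127.00, {"korean"}),
    ("B", 37.60, 127.00, {"korean"}),
    ("C", 37.501, 127.00, {"italian"}),
]
nearby = geo_spatial_model((37.50, 127.00), restaurants, "korean")

likes = [(1, "A"), (5, "B"), (6, "B"), (7, "A"), (8, "B")]
trending = temporal_model(likes, now=8, window=4)
```

The window length plays the role of the week/month/quarter choice; the predictive model and the crawler are omitted because the slide gives no detail on their techniques beyond inputs and outputs.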
![Page 66: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/66.jpg)
Mega-modularization of BOTTARI
![Page 67: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/67.jpg)
Mobility analysis system
![Page 68: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/68.jpg)
Mobility Manager Service
How do drivers get to Linate?
GPS Tracks
Trajectories that entail the clusters whose destination is Linate
Two alternative routes to Linate Airport
![Page 69: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/69.jpg)
End-User Service
User’s Mobility Profiling for Car Pooling
Home = most frequent location
Work = second most frequent location
User’s GPS Tracks
Trajectories that entail the cluster “Home-Work”
Trajectories that entail the cluster “Work-Home”
Spatio-Temporal User’s mobility profile
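The Home/Work rule on this slide amounts to a frequency count over the places a user stops at. A minimal sketch, assuming the stops have already been discretized into hashable location identifiers (the function name and the `cell_*` labels are hypothetical):

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def home_and_work(locations: Iterable[Hashable]) -> Tuple[Hashable, Hashable]:
    """Return (home, work): the most and second most frequent locations."""
    ranked = [loc for loc, _ in Counter(locations).most_common(2)]
    if len(ranked) < 2:
        raise ValueError("need at least two distinct locations")
    return ranked[0], ranked[1]


# Hypothetical sequence of stop locations for one user.
stops = ["cell_17"] * 5 + ["cell_42"] * 3 + ["cell_08"]
home, work = home_and_work(stops)
```

Trajectories starting at `home` and ending at `work` (or the reverse) could then be selected as candidates for the "Home-Work" and "Work-Home" clusters.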
![Page 70: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/70.jpg)
Mega-modularization of Trajectory Clustering
[Figure labels: Input GPS data; Clustered Trajectories; Cluster Statistics; Geography, Zoning and Road Network; TRAJECTORY RECONSTRUCTION & SELECTION; CLUSTER EVALUATION; TRAJECTORY CLUSTERING]
![Page 71: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/71.jpg)
Trajectory Clustering Megamodule Usages
[Figure labels: Mobility Mng. Service; End-user Service]
![Page 72: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/72.jpg)
Mega-modularization for Mobility Manager Service
[Figure labels: Trajectory Clusters; All Users’ Trajectories; Spatio-temporal Distance function; TRAJECTORY CLUSTERING; Routes to Linate; ROUTES IDENTIFICATION; Destination, e.g., Linate; Spatio-Temporal Observations; Semantic of a Stop; DATA CLEANING; TRAJECTORIES FILTERING; TRAJECTORIES RECONSTRUCTION]
![Page 73: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/73.jpg)
Mega-modularization of Trajectory Clustering for Car Pooling
[Figure labels: User’s Mobility Profile; Car Pooling Suggestions; Spatio-Temporal Thresholds; CLUSTERING DECOMPOSITION; PROFILE AGGREGATION; USER MOBILITY PROFILE COMPUTATION; Spatio-temporal Distance function; TRAJECTORY CLUSTERING; Semantic of a Stop; DATA CLEANING; TRAJECTORIES FILTERING; TRAJECTORIES RECONSTRUCTION; Spatio-Temporal Observations; Single User’s Trajectories; Single User’s Trajectory Clusters]
![Page 74: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/74.jpg)
Research questions & agenda
• Express a large collection of patterns through suitable (relational) language extensions
• Build an ontology of mega-models, support reasoning upon the ontology for deriving properties of mega-models
• Define/classify composition abstractions and define the mega-modeling composition language
• Consider research problems related to:
– Optimization (inter vs intra)
– Orchestration
– Inspection
– Adaptation
• Build the software engineering tools and environment for building and composing mega-models
![Page 75: MegaModeling%% for%Scien/fic%“Big%Data”%Processing% · MegaModeling%% for%Scien/fic%“Big%Data”%Processing% Stefano%Ceri, Emanuele%DellaValle% (Politecnico%di%Milano)% Dino](https://reader033.fdocuments.us/reader033/viewer/2022042922/5f6c746df87dff60762feb78/html5/thumbnails/75.jpg)
Summary of the talk
• Motivations
– Examples of big scientific data, FuturICT
– Typical research questions
• Why MegaModelling?
– History of the term
– What should be solved
• What is a pattern
– Application-independent, tabular, composable
• What is a mega-module
– Ingredients: Preparation / Analysis / Evaluation
– Composition abstractions
• Examples of mega-modularizations
• To-do list