BIG DATA STATE OF THE ART
Sr. Josep Lluís Larriba-Pey
CEO, Sparsity (www.sparsity-technologies.com)
Director, DAMA-UPC (www.dama.upc.edu)
The Linked Data Benchmark Council: Big Data Benchmarking and Generation
Josep Lluís Larriba-Pey
Sparsity & DAMA-UPC
Big Data Congress, Barcelona, 5/10/2016
The Linked Data Benchmark Council
Linked Data Benchmark Council:
• Benchmarking organization
• Specializes in Graph and RDF technologies
• Formed by
The Linked Data Benchmark Council
Four main elements in a benchmark:
• Data & Schema. Describe a business.
• Workloads. Show the type of interaction between users and the system: transactional, BI, analytics.
• Performance metrics. How we compare systems fairly.
• Auditing rules. We all must proceed in the same way to be comparable.
Big Data Generation
• Data is becoming the cornerstone of many business models and applications:
  • It contains sensitive and critical information
  • Holders are reluctant to share data:
    • Privacy at risk
    • Competitive advantage
• This can:
  • Limit the growth of the data economy
  • Truncate business opportunities
Need for anonymized Big Data generation
Use cases:
• Medical data: necessary to analyse, difficult to release.
• City data: understanding how people move for the benefit of society.
• Migrant data: how to ease the absorption of migrants.
BigData-EnGen platform
BigData-EnGen is a joint effort
DataSynth
Generic data generator
• Accepts schema specifications and data properties via a DSL
• Extensible via plugins
• Designed to scale on clusters
• Completely Open Source (https://github.com/DAMA-UPC/DataSynth)
Still in a pre-alpha state and subject to change.
We are open to contributions!
DataSynth Domain Specific Language
• JSON
• We can express entities, edges and attributes
• Example of code and the property graph generated
• The compiler traverses the property graph and generates IR code
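The flow on this slide (JSON schema, then property graph, then IR) can be sketched roughly as follows. This is only an illustration under stated assumptions: the JSON field names, the helper functions `build_property_graph` and `compile_to_ir`, and the instruction names are all hypothetical and are not DataSynth's actual DSL or IR, which this transcript does not show.

```python
import json

# Hypothetical schema: the field names ("entities", "edges", "attributes")
# are assumptions for illustration, not DataSynth's real DSL keywords.
SCHEMA_JSON = """
{
  "entities": [
    {"name": "Person", "attributes": [
      {"name": "country", "type": "String"},
      {"name": "age", "type": "Int"}
    ]},
    {"name": "Message", "attributes": [
      {"name": "length", "type": "Int"}
    ]}
  ],
  "edges": [
    {"name": "knows", "source": "Person", "target": "Person"},
    {"name": "creates", "source": "Person", "target": "Message"}
  ]
}
"""


def build_property_graph(schema):
    """Turn the parsed schema into a tiny in-memory property graph."""
    nodes = {}
    for entity in schema["entities"]:
        nodes[entity["name"]] = {a["name"]: a["type"] for a in entity["attributes"]}
    edges = []
    for edge in schema["edges"]:
        # Reject edges that point at undeclared entities.
        for endpoint in (edge["source"], edge["target"]):
            if endpoint not in nodes:
                raise ValueError(f"unknown entity: {endpoint}")
        edges.append((edge["name"], edge["source"], edge["target"]))
    return {"nodes": nodes, "edges": edges}


def compile_to_ir(graph):
    """Traverse the graph and emit a flat list of IR-like instructions."""
    ir = []
    # Entities and their attributes come first so that edges can refer to them.
    for entity, attributes in graph["nodes"].items():
        ir.append(("CREATE_ENTITY", entity))
        for attr, attr_type in attributes.items():
            ir.append(("GENERATE_ATTRIBUTE", entity, attr, attr_type))
    for name, source, target in graph["edges"]:
        ir.append(("GENERATE_EDGES", name, source, target))
    return ir


graph = build_property_graph(json.loads(SCHEMA_JSON))
ir = compile_to_ir(graph)
for instruction in ir:
    print(instruction)
```

Keeping the IR as a flat list of tuples is a simplification chosen for this sketch; it makes the output of the traversal easy to inspect and leaves execution to a separate backend.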
DataSynth Backend
Backend implemented in Apache Spark 2.x.x
• Currently uses the Dataset API (in Scala), taking advantage of the built-in query optimizer
• Can incorporate other programming models and APIs, such as GraphX or Streaming
Backends in other technologies can be implemented
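One way to picture "backends in other technologies" is a small interface that every backend implements. The sketch below is an assumption-laden illustration, not DataSynth's design: the `Backend` class, its `generate` method, the `LocalBackend` stand-in, and the `run` helper are all hypothetical names, and a single-process Python generator is used instead of Spark so the example stays self-contained.

```python
import random
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical backend interface: executes data generation steps."""

    @abstractmethod
    def generate(self, entity, attributes, count):
        """Return `count` records for `entity`.

        `attributes` maps an attribute name to a function that takes a
        random number generator and returns one value.
        """


class LocalBackend(Backend):
    """Single-process stand-in; a cluster backend would implement the same method."""

    def __init__(self, seed):
        # A seeded generator keeps the output reproducible across runs.
        self._rng = random.Random(seed)

    def generate(self, entity, attributes, count):
        rows = []
        for i in range(count):
            row = {"id": i, "entity": entity}
            for name, generator in attributes.items():
                row[name] = generator(self._rng)
            rows.append(row)
        return rows


def run(backend, count):
    """Generate `count` Person records on whichever backend is supplied."""
    attributes = {
        "age": lambda rng: rng.randint(18, 90),
        "country": lambda rng: rng.choice(["ES", "FR", "DE"]),
    }
    return backend.generate("Person", attributes, count)


people = run(LocalBackend(seed=42), count=5)
print(people)
```

Because `run` depends only on the interface, swapping in a different backend would not require changes to the calling code, which is the property the slide points at.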
Conclusions
• We strongly believe in the future of Big Data generation
• There are many business models that can grow out of it:
  • SMEs providing services
  • Companies growing their analytics businesses
• We are open to collaborate with third parties