Avro Apache Course: Distributed class Student ID: AM20144203 Name: Azzaya Galbazar 2014.12.17.
Overview: What is Avro?
Avro is an Apache open-source project that provides data serialization and data exchange services for Apache Hadoop.
Avro is a relatively recent serialization system.
Interoperability:
Can serialize into Avro binary or Avro JSON encoding.
Supports reading and writing Protocol Buffers and Thrift data.
Overview: What does Avro provide?
Rich data structures, with schemas defined in JSON.
A compact, fast binary format.
A container file, to store persistent data.
Remote procedure call (RPC).
Simple integration with dynamic languages.
Code generation is not required to read or write data files, nor to use or implement RPC protocols.
Code generation is an optional optimization, only worth implementing for statically typed languages.
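The "compact, fast binary format" is easiest to see at the level of single numbers. According to the Avro specification, int and long values are written with zig-zag encoding followed by a variable-length encoding, so values of small magnitude take a single byte. The sketch below hand-rolls that scheme in plain Java for illustration only; the class ZigZagVarint is hypothetical and is not part of the Avro library, whose encoders do this internally.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hand-rolled sketch (not the Avro library API) of how the Avro specification
 * binary-encodes int/long values: zig-zag mapping, then 7 bits per byte.
 */
final class ZigZagVarint {

    /** Encodes a long as zig-zag varint bytes; small magnitudes use fewer bytes. */
    static byte[] encodeLong(long n) {
        long z = (n << 1) ^ (n >> 63);               // zig-zag: 0->0, -1->1, 1->2, -2->3, ...
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));    // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);                          // last byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes bytes produced by encodeLong back into the original long. */
    static long decodeLong(byte[] bytes) {
        long z = 0;
        int shift = 0;
        for (byte b : bytes) {
            z |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;                               // no continuation bit: value complete
            }
            shift += 7;
        }
        return (z >>> 1) ^ -(z & 1);                 // undo zig-zag
    }

    public static void main(String[] args) {
        for (long n : new long[] {0, -1, 1, -64, 64, 1_000_000L}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encodeLong(n)) {
                hex.append(String.format("%02x ", b & 0xFF));
            }
            System.out.println(n + " -> " + hex.toString().trim());
        }
    }
}
```

Following the specification's examples, encodeLong maps 0, -1, 1, -64 and 64 to the bytes 00, 01, 02, 7f and 80 01. Zig-zag is used so that small negative numbers also stay short: a plain two's-complement varint would spend ten bytes on -1.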
Overview: Avro uses JSON to describe interfaces
To specify data types (schemas).
To specify protocols.
Review: JavaScript Object Notation (JSON) is a lightweight, text-based standard for data interchange.
Overview: Why the need for Avro?
Its primary use is in Hadoop, where it provides a standard:
Serialization format for persistent data.
Wire format for communication:
Among Hadoop nodes.
From client programs to Hadoop services.
Overview: Avro relies on schemas.
The schema is stored with the data, so each datum is written with no per-value overhead.
Thus serialization is fast and the serialized data is small.
Avro in RPC:
Client and server exchange schemas during the handshake, so the correspondence between fields can be easily resolved.
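To make "no per-value overhead" concrete, the sketch below hand-encodes one record with the binary rules of the Avro specification: fields are written in schema order, a string is its UTF-8 length (as a zig-zag varint) followed by its bytes, and an int is a zig-zag varint. The User schema and the class name are hypothetical, chosen for illustration; this is not the Avro library API. Because the output carries no field names or type tags, a reader can decode it only with the writer's schema, which is why Avro stores the schema with the data or exchanges it in the RPC handshake.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hand-rolled Avro binary encoding of a record with a hypothetical schema:
 * {"type": "record", "name": "User",
 *  "fields": [{"name": "name", "type": "string"}, {"name": "age", "type": "int"}]}
 * Fields are written in schema order, with no names, tags or type markers.
 */
final class RecordEncodingSketch {

    private static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);               // zig-zag
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));    // 7 bits + continuation bit
            z >>>= 7;
        }
        out.write((int) z);
    }

    private static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);                 // length prefix, as a long
        out.write(utf8, 0, utf8.length);             // then the raw UTF-8 bytes
    }

    /** Encodes (name, age) as the hypothetical User record above. */
    static byte[] encodeUser(String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);                      // field 1: name
        writeLong(out, age);                         // field 2: age (int shares the varint form)
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] binary = encodeUser("Ann", 30);
        String json = "{\"name\":\"Ann\",\"age\":30}";
        System.out.println("binary bytes: " + binary.length);
        System.out.println("JSON bytes:   " + json.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

For ("Ann", 30) the encoding is the five bytes 06 41 6e 6e 3c, so main prints 5 for the binary form and 23 for the equivalent compact JSON text.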
Overview: APIs
Supported APIs:
Java, C, C++, C#, Python, Ruby
Specification
A schema is represented in JSON by one of:
A JSON string, naming a defined type.
A JSON object, of the form {"type": "typeName", ...attributes...}
A JSON array, representing a union of embedded types.
Primitive types: null, boolean, int, long, float, double, bytes, string
Complex types: records, enums, arrays, maps, unions, fixed
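The union form (a JSON array such as ["null", "string"]) has a simple binary representation in the Avro specification: the zero-based index of the chosen branch, written as a zig-zag varint, followed by the value encoded with that branch's schema. A minimal hand-rolled sketch in plain Java; the class and method names are hypothetical and this is not the Avro library API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hand-rolled binary encoding of a value of the union schema ["null", "string"]:
 * the zero-based branch index as a zig-zag varint, then the value per that branch.
 */
final class UnionEncodingSketch {

    private static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);               // zig-zag
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));    // 7 bits + continuation bit
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Encodes null or a string against the union ["null", "string"]. */
    static byte[] encodeNullableString(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            writeLong(out, 0);                       // branch 0 is "null": nothing follows
        } else {
            writeLong(out, 1);                       // branch 1 is "string"
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            writeLong(out, utf8.length);             // string = length, then UTF-8 bytes
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (String v : new String[] {null, "a"}) {
            StringBuilder hex = new StringBuilder();
            for (byte b : encodeNullableString(v)) {
                hex.append(String.format("%02x ", b & 0xFF));
            }
            System.out.println(v + " -> " + hex.toString().trim());
        }
    }
}
```

These bytes match the specification's own union example: null encodes as 00 and the string "a" as 02 02 61.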
Apache Avro with Maven (Java)
Apache Maven is a software project management and comprehension tool.
Based on the concept of a project object model (POM), Maven can manage a project's build, reporting, and documentation from a central piece of information.
Apache Avro with Maven (Java)
1. Add two entries to pom.xml: the Apache Avro library as a dependency, and the Avro Maven plugin, which generates Java classes from schema files.
Apache Avro with Maven (Java)
2. Define a schema.
Note: a schema file can contain only a single schema definition.
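As an illustration of step 2, the sketch below keeps one schema definition, a hypothetical Book record whose namespace and fields are invented for this example, as a JSON string and writes it to its own Book.avsc file. In a real Maven project the file would sit in whatever directory the Avro Maven plugin is configured to read (assumed here to be something like src/main/avro); the sketch simply writes to the directory it is given, defaulting to the system temp directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Holds exactly one schema definition (a hypothetical Book record) and writes it
 * to its own .avsc file: one schema file, one top-level schema.
 */
final class BookSchemaFile {

    // Hypothetical schema: namespace, record name and fields are illustrative only.
    static final String BOOK_SCHEMA = String.join("\n",
        "{",
        "  \"namespace\": \"example.avro\",",
        "  \"type\": \"record\",",
        "  \"name\": \"Book\",",
        "  \"fields\": [",
        "    {\"name\": \"title\", \"type\": \"string\"},",
        "    {\"name\": \"author\", \"type\": \"string\"},",
        "    {\"name\": \"year\", \"type\": \"int\"}",
        "  ]",
        "}");

    /** Writes BOOK_SCHEMA to dir/Book.avsc (creating dir if needed) and returns the path. */
    static Path writeSchema(Path dir) {
        try {
            Files.createDirectories(dir);
            Path file = dir.resolve("Book.avsc");
            Files.write(file, BOOK_SCHEMA.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Target directory is an assumption: pass your project's schema directory as args[0].
        Path dir = args.length > 0
            ? Paths.get(args[0])
            : Paths.get(System.getProperty("java.io.tmpdir"), "avro-schemas");
        System.out.println("Wrote " + writeSchema(dir));
    }
}
```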
Apache Avro with Maven (Java)
3. Serialize to and deserialize from a file.
The example serializes a book to a file, deserializes it, and prints it to the output.
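The book round trip in step 3 can be mimicked without the Avro library, which makes the underlying bytes visible. This sketch hand-encodes a hypothetical Book (title, author, year; fields invented for illustration) with the specification's binary rules, writes the bytes to a file, reads them back, and prints the result. It writes bare record bytes rather than an Avro container file, so it is a picture of the encoding, not a substitute for the DataFileWriter-based example on the slide.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Round trip for a hypothetical Book record, hand-encoded with Avro's binary
 * rules (zig-zag varints, length-prefixed UTF-8 strings). Illustration only.
 */
final class BookRoundTrip {

    /** Plain holder standing in for a class the Avro Maven plugin might generate. */
    static final class Book {
        final String title;
        final String author;
        final int year;

        Book(String title, String author, int year) {
            this.title = title;
            this.author = author;
            this.year = year;
        }

        @Override
        public String toString() {
            return "Book{title=" + title + ", author=" + author + ", year=" + year + "}";
        }
    }

    private static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);               // zig-zag
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    private static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    private static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        byte b;
        do {
            b = in.get();
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);                   // continuation bit set: keep reading
        return (z >>> 1) ^ -(z & 1);                 // undo zig-zag
    }

    private static String readString(ByteBuffer in) {
        byte[] utf8 = new byte[(int) readLong(in)];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Writes the fields in schema order: title, author, year. */
    static void serialize(Book book, Path file) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, book.title);
        writeString(out, book.author);
        writeLong(out, book.year);
        try {
            Files.write(file, out.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the fields back in the same order the writer used. */
    static Book deserialize(Path file) {
        try {
            ByteBuffer in = ByteBuffer.wrap(Files.readAllBytes(file));
            return new Book(readString(in), readString(in), (int) readLong(in));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("book", ".bin");
        serialize(new Book("Example Title", "Example Author", 2014), file);
        System.out.println(deserialize(file));
    }
}
```

Running main prints Book{title=Example Title, author=Example Author, year=2014}. Java evaluates constructor arguments left to right, so the three reads happen in schema order.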
Apache Avro with Maven (Java)
3. (continued) Describing the classes used:
DatumWriter converts Java objects into an in-memory serialized format.
SpecificDatumWriter is used with generated classes and extracts the schema from the specified generated type.
DataFileWriter writes the serialized records, as well as the schema, to the file.
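DataFileWriter stores the schema together with the records; the object container file layout in the Avro specification shows where. A file starts with a header holding the magic bytes "Obj" plus version 1, a metadata map whose avro.schema entry contains the schema JSON, and a 16-byte sync marker, followed by data blocks that each carry an object count, a byte size, the serialized objects, and the sync marker again. The plain-Java sketch below assembles that layout by hand with the "null" (uncompressed) codec; ContainerFileSketch and build are hypothetical names, and this illustrates the format rather than DataFileWriter's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Collections;
import java.util.List;

/**
 * Hand-rolled sketch of the Avro object container file layout:
 * header = magic "Obj" + 1, metadata map (holding the schema), 16-byte sync marker;
 * data block = object count, byte size, serialized objects, sync marker.
 * Uses the "null" codec (no compression).
 */
final class ContainerFileSketch {

    static final byte[] MAGIC = {'O', 'b', 'j', 1};

    private static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);               // zig-zag
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Length-prefixed bytes; Avro strings use the same layout with UTF-8 content. */
    private static void writeBytes(ByteArrayOutputStream out, byte[] b) {
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Builds a container with one data block of records already encoded in Avro binary. */
    static byte[] build(String schemaJson, List<byte[]> records, byte[] sync) {
        if (sync.length != 16) {
            throw new IllegalArgumentException("sync marker must be 16 bytes");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC, 0, MAGIC.length);

        // Metadata is a map<bytes>: one block of 2 entries, then a 0 count ends the map.
        writeLong(out, 2);
        writeBytes(out, utf8("avro.schema"));
        writeBytes(out, utf8(schemaJson));
        writeBytes(out, utf8("avro.codec"));
        writeBytes(out, utf8("null"));
        writeLong(out, 0);
        out.write(sync, 0, sync.length);

        // Data block: object count, size in bytes, the objects, then the sync marker.
        ByteArrayOutputStream objects = new ByteArrayOutputStream();
        for (byte[] r : records) {
            objects.write(r, 0, r.length);
        }
        writeLong(out, records.size());
        writeLong(out, objects.size());
        out.write(objects.toByteArray(), 0, objects.size());
        out.write(sync, 0, sync.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] sync = new byte[16];
        new SecureRandom().nextBytes(sync);          // the spec calls for a random marker
        // One record of the schema "string": the value "a" (length 1 -> 0x02, then 'a').
        byte[] file = build("\"string\"", Collections.singletonList(new byte[] {0x02, 'a'}), sync);
        System.out.println("Container size: " + file.length + " bytes");
    }
}
```

With the single one-record block in main, the container is 79 bytes, almost all of it header and sync markers; in real files the header is written once and each block can hold many records.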
Apache Avro with Maven (Java)
4. Run the example code.
5. Check the resulting output.
Thank you for your attention