Apache Avro
Uploaded by megrhi-haikel
Category: Data & Analytics
Apache Avro
Overview 1
Data serialization and data exchange system
Addresses a limitation of Hadoop Writables: lack of language portability
Sharing data between programs written in different languages
Language-independent schema (written in JSON)
No code generation required (it is optional)
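To illustrate what a language-independent schema looks like, here is a minimal sketch that keeps an Avro record schema as JSON text and loads it with Python's standard json module. The record name User, its namespace and its fields are made up for this example and do not come from the slides.

```python
import json

# Hypothetical schema: the record name, namespace and fields are illustrative only.
USER_SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
"""

# The schema is plain JSON, so any language with a JSON parser can read it.
user_schema = json.loads(USER_SCHEMA_JSON)
field_names = [field["name"] for field in user_schema["fields"]]
print(field_names)
```

Since the schema is data rather than generated source code, a program can load it at run time, which is why code generation can remain optional.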
Overview 2
Supports schema evolution
Data files support compression and are splittable
Rich data types and schema
Avro Data types and Schemas 1
null
boolean
int
long
float
double
bytes
string
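As a sketch of how two of these primitive types look on the wire, the code below follows the binary encoding described in the Avro specification: int and long values are zig-zag encoded and then written as variable-length integers, and a string is written as its UTF-8 byte length (a long) followed by the bytes. The function names are invented for this example and are not part of any Avro library.

```python
def encode_long(value: int) -> bytes:
    """Encode an int or long: zig-zag first, then a variable-length integer."""
    # Zig-zag maps small magnitudes to small unsigned numbers:
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    zigzag = (value << 1) ^ (value >> 63)
    out = bytearray()
    # Emit 7 bits per byte, lowest bits first; a set high bit means "more bytes follow".
    while zigzag & ~0x7F:
        out.append((zigzag & 0x7F) | 0x80)
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def decode_long(data: bytes) -> int:
    """Inverse of encode_long for a single encoded value."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    # Undo the zig-zag mapping.
    return (result >> 1) ^ -(result & 1)


def encode_string(text: str) -> bytes:
    """Encode a string: UTF-8 length as a long, then the UTF-8 bytes."""
    raw = text.encode("utf-8")
    return encode_long(len(raw)) + raw


print(encode_long(64).hex(), encode_string("foo").hex())
```

Small numbers of either sign take a single byte, which keeps files compact when most values are small.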
Avro Data types and Schemas 2
array
map
record
enum
fixed
union
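The complex types listed above can be seen together in one schema. The sketch below is a hypothetical Order record (all names are invented for illustration) that uses each of them, plus a small helper that reports which kind of type each field has.

```python
import json

# Hypothetical schema: every name below is illustrative only.
ORDER_SCHEMA_JSON = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "items", "type": {"type": "array", "items": "string"}},
    {"name": "attributes", "type": {"type": "map", "values": "long"}},
    {"name": "status", "type": {"type": "enum", "name": "Status",
                                "symbols": ["NEW", "PAID", "SHIPPED"]}},
    {"name": "checksum", "type": {"type": "fixed", "name": "MD5", "size": 16}},
    {"name": "note", "type": ["null", "string"]}
  ]
}
"""

order_schema = json.loads(ORDER_SCHEMA_JSON)


def kind(field_type):
    """Return the Avro type kind of a field's type declaration."""
    if isinstance(field_type, list):   # a JSON array declares a union
        return "union"
    if isinstance(field_type, dict):   # a JSON object declares a complex type
        return field_type["type"]
    return field_type                  # a bare string names a primitive type


kinds = {field["name"]: kind(field["type"]) for field in order_schema["fields"]}
print(kinds)
```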
Avro Data types and Schemas 3
Generic Java mapping
Specific Java mapping
Reflect Java mapping
In-memory Serialization and Deserialization
Specific API (classes generated with avro-tools)
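The idea of in-memory serialization can be sketched without the Avro library: a record is written as the concatenation of its field values in schema order, with no field names or tags, so the same schema is needed to read it back. The code below is a standard-library illustration that handles only int, long, string and boolean fields; the helper names and the User schema are assumptions made for this example, not the Avro API.

```python
import io


def write_long(out, value):
    # Zig-zag, then variable-length encoding, as Avro does for int and long.
    zigzag = (value << 1) ^ (value >> 63)
    while zigzag & ~0x7F:
        out.write(bytes([(zigzag & 0x7F) | 0x80]))
        zigzag >>= 7
    out.write(bytes([zigzag]))


def read_long(stream):
    result = shift = 0
    while True:
        byte = stream.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (result >> 1) ^ -(result & 1)
        shift += 7


def write_datum(out, schema_type, value):
    if schema_type in ("int", "long"):
        write_long(out, value)
    elif schema_type == "string":
        raw = value.encode("utf-8")
        write_long(out, len(raw))      # length prefix
        out.write(raw)
    elif schema_type == "boolean":
        out.write(b"\x01" if value else b"\x00")
    else:
        raise ValueError(f"type not covered by this sketch: {schema_type}")


def read_datum(stream, schema_type):
    if schema_type in ("int", "long"):
        return read_long(stream)
    if schema_type == "string":
        return stream.read(read_long(stream)).decode("utf-8")
    if schema_type == "boolean":
        return stream.read(1) == b"\x01"
    raise ValueError(f"type not covered by this sketch: {schema_type}")


def serialize(schema, record):
    """Write the fields in schema order into an in-memory buffer."""
    out = io.BytesIO()
    for field in schema["fields"]:
        write_datum(out, field["type"], record[field["name"]])
    return out.getvalue()


def deserialize(schema, data):
    """Read the fields back in the same schema order."""
    stream = io.BytesIO(data)
    return {f["name"]: read_datum(stream, f["type"]) for f in schema["fields"]}


# Hypothetical schema and record, for illustration only.
schema = {"type": "record", "name": "User",
          "fields": [{"name": "name", "type": "string"},
                     {"name": "age", "type": "int"}]}
payload = serialize(schema, {"name": "Ada", "age": 36})
print(payload, deserialize(schema, payload))
```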
Datafiles
Schema
Avro object
Sync marker
In binary format
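The layout of a data file can be sketched from the object container file format in the Avro specification: a 4-byte magic, a metadata map that carries the schema, a 16-byte sync marker, and then one or more blocks that each end with the same sync marker. The code below builds such a byte string with one uncompressed block using only the standard library. It is an illustration of the layout under those assumptions, not a replacement for the Avro library, and the function names are invented here.

```python
import io
import json
import os

MAGIC = b"Obj\x01"  # the ASCII letters 'O', 'b', 'j' followed by the byte 1


def encode_long(value):
    zigzag = (value << 1) ^ (value >> 63)
    out = bytearray()
    while zigzag & ~0x7F:
        out.append((zigzag & 0x7F) | 0x80)
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def encode_bytes(raw):
    # Both bytes and strings are length-prefixed with a long.
    return encode_long(len(raw)) + raw


def build_container(schema, encoded_objects, sync=None):
    """Return header plus one data block holding the already-encoded objects."""
    sync = sync or os.urandom(16)          # sync marker: 16 random bytes
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"),
            "avro.codec": b"null"}         # "null" means no compression
    out = io.BytesIO()
    out.write(MAGIC)
    out.write(encode_long(len(meta)))      # metadata map: one block of entries
    for key, value in meta.items():
        out.write(encode_bytes(key.encode("utf-8")))
        out.write(encode_bytes(value))
    out.write(encode_long(0))              # a zero count ends the map
    out.write(sync)
    payload = b"".join(encoded_objects)
    out.write(encode_long(len(encoded_objects)))  # number of objects in the block
    out.write(encode_long(len(payload)))          # size of the block in bytes
    out.write(payload)
    out.write(sync)                        # the sync marker closes every block
    return out.getvalue()


# Two objects of the primitive schema "string", encoded by hand.
objects = [encode_bytes(b"hello"), encode_bytes(b"avro")]
fixed_sync = bytes(range(16))              # fixed marker so the output is repeatable
data = build_container("string", objects, sync=fixed_sync)
print(len(data), data[:4])
```

Because the sync marker recurs at every block boundary, a reader that starts in the middle of a file can scan forward to the next marker, which is what makes the files splittable.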
Portability
Schema resolution (Projection)
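Projection can be illustrated with a simplified sketch. In Avro, data written with one schema can be read with a different reader's schema: fields are matched by name, writer fields that the reader does not declare are skipped, and reader fields missing from the data take the reader's default. Avro applies these rules while decoding the binary data; the function below applies the same rules to an already-decoded dictionary, and its name and the sample schema are assumptions made for this example.

```python
def project(record, reader_schema):
    """Apply record-level resolution rules to an already-decoded record."""
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            result[name] = record[name]        # field present in the written data
        elif "default" in field:
            result[name] = field["default"]    # new field: use the reader's default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result                              # writer-only fields are dropped


# Hypothetical record written with fields name, age and email.
written = {"name": "Ada", "age": 36, "email": "ada@example.org"}

# The reader asks only for name, plus a field the writer never knew about.
reader_schema = {"type": "record", "name": "User",
                 "fields": [{"name": "name", "type": "string"},
                            {"name": "country", "type": "string", "default": "unknown"}]}

projected = project(written, reader_schema)
print(projected)
```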
Sort Order
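Every Avro type except record has a predefined sort order; records are compared field by field in schema order, and each field's order attribute (ascending, descending or ignore) controls how it is compared
Comparison works directly on the binary-encoded data, without deserializing it into objects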
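The record ordering rule can be sketched as follows: fields are compared in the order the schema declares them, a field marked descending reverses its result, and a field marked ignore is skipped. Avro itself can perform this comparison on the encoded bytes; for readability this sketch compares already-decoded dictionaries instead. The function name and the sample schema are invented for illustration.

```python
from functools import cmp_to_key


def compare_records(schema, a, b):
    """Return -1, 0 or 1, honoring each field's optional "order" attribute."""
    for field in schema["fields"]:
        order = field.get("order", "ascending")   # ascending is the default
        if order == "ignore":
            continue                               # field does not affect ordering
        x, y = a[field["name"]], b[field["name"]]
        if x != y:
            result = -1 if x < y else 1
            return -result if order == "descending" else result
    return 0


# Hypothetical schema: sort by "left" descending and ignore "right".
pair_schema = {"type": "record", "name": "StringPair",
               "fields": [{"name": "left", "type": "string", "order": "descending"},
                          {"name": "right", "type": "string", "order": "ignore"}]}

pairs = [{"left": "a", "right": "2"},
         {"left": "c", "right": "1"},
         {"left": "b", "right": "3"}]

ordered = sorted(pairs, key=cmp_to_key(lambda a, b: compare_records(pair_schema, a, b)))
print([p["left"] for p in ordered])
```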
Avro MapReduce
Avro offers several APIs for running MapReduce jobs on Avro data
Avro Sorting MapReduce