&+ ø Ä R X
5
1.
2. -Flink
3.
4.Butterfly-Sql
[Figure: architecture of the CIF data platform. Labels recoverable from the diagram: CIF; MySQL; MongoDB; Canal; Oplog (cif-oplog-sync); Canal-analyzer; oplog-analyzer; Kafka; Flink; Cif-trickle; HDFS; HDFSBase; ODS2CIF; HBase; Elasticsearch; SparkSql; Butterfly; cif.schema; cif-rest-server; DBV; RESTful; Azkaban; UTCS WEB.]
7
1.
2. -Flink
3.
4.Butterfly-Sql
8
1.oplog
2.consumer
3.
Flink
9
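2008: the Stratosphere research project, Flink's predecessor, starts. From version 0.6 the project is named Flink.
April 16, 2014: the project is donated to the Apache Software Foundation (ASF).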
Flink Ecosystem
10
Deploy: Local (Single JVM); Cluster (Standalone, YARN); Cloud (GCE, EC2)
Core: Runtime (Distributed Streaming Dataflow)
API: DataStream API (Stream Processing); DataSet API (Batch Processing)
Libraries: CEP (Event Processing); Streaming Tables (Relational); Batch Tables (Relational); FlinkML (Machine Learning); Gelly (Graph Processing)
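Flink
11
When a Flink cluster starts, it launches one JobManager and one or more TaskManagers. The Client submits a Job to the JobManager, the JobManager schedules the Job's Tasks onto the TaskManagers, and the TaskManagers report heartbeats and statistics back to the JobManager. TaskManagers exchange data with each other as streams. The JobManager and every TaskManager run as separate JVM processes.
Client: submits the Job. It can run on any machine that can reach the JobManager. After submitting the Job, the Client may exit (typical for Streaming jobs) or stay connected and wait for the result.
JobManager: schedules the Job and coordinates the Tasks' checkpoints, a role similar to Nimbus in Storm. It receives the Job and its JAR from the Client, builds the execution plan, and dispatches it to the TaskManagers as Tasks.
TaskManager: is configured with a number of Slots at startup, and each slot runs one Task. It receives the Tasks to deploy from the JobManager, then opens Netty connections to its upstream Tasks to receive and process data.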
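Flink DAG
12
StreamGraph: the initial graph, generated from the code the user writes with the Stream API.
JobGraph: produced by optimizing the StreamGraph; this is the data structure submitted to the JobManager. The main optimization chains eligible operators into a single node, which reduces serialization, deserialization, and transfer between operators.
ExecutionGraph: generated by the JobManager from the JobGraph. The ExecutionGraph is the parallelized version of the JobGraph.
Physical execution graph: the "graph" that forms once the JobManager has scheduled the Job from the ExecutionGraph and deployed its Tasks on the TaskManagers.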
Flink Watermark
13
1.
2.
Flink distinguishes three notions of time:
1. Event Time: when the event was created at its source.
2. Ingestion Time: when the event enters Flink.
3. Processing Time: when an Operator processes the event.
Flink follows Google MillWheel in using a Watermark mechanism to handle Event Time.
Flink Watermark
14
Event Time
WaterMark Flink WaterMark FlinkFlink WaterMark
Flink Flink WaterMarkWaterMark WaterMark
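The watermark idea can be illustrated with a minimal sketch. It assumes the common "bounded out-of-orderness" strategy: the watermark trails the largest Event Time seen so far by a fixed bound, and an event whose timestamp is below the current watermark counts as late. The class name, the method names, and the 5-second bound are hypothetical and are not taken from the slides or from Flink's API.

```java
/** Sketch: a watermark that trails the largest event timestamp by a fixed bound. */
final class BoundedOutOfOrdernessWatermark {
    private final long maxOutOfOrdernessMs; // assumed tolerance for out-of-order events
    private long maxTimestampMs;
    private boolean seenAny = false;

    BoundedOutOfOrdernessWatermark(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Records an event timestamp and reports whether the event arrived late. */
    boolean onEvent(long eventTimestampMs) {
        // Late means: the watermark has already passed this timestamp.
        boolean late = eventTimestampMs < currentWatermark();
        if (!seenAny || eventTimestampMs > maxTimestampMs) {
            maxTimestampMs = eventTimestampMs;
            seenAny = true;
        }
        return late;
    }

    /** The watermark never moves backwards because maxTimestampMs never decreases. */
    long currentWatermark() {
        return seenAny ? maxTimestampMs - maxOutOfOrdernessMs : Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessWatermark wm = new BoundedOutOfOrdernessWatermark(5_000L);
        long[] timestamps = {10_000L, 7_000L, 12_000L, 6_000L};
        for (long ts : timestamps) {
            boolean late = wm.onEvent(ts);
            System.out.println(ts + " late=" + late + " watermark=" + wm.currentWatermark());
        }
    }
}
```

A larger bound tolerates more disorder but delays every result that waits for the watermark, so the bound is a latency versus completeness trade-off.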
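Flink
15
Most big data frameworks run on the JVM, including Hadoop, Spark, HBase, and Kafka. A JVM-based data analysis engine has to keep large amounts of data in memory, so it runs into several JVM problems:
1. JVM OOM errors, which hurt stability.
2. Full GC pauses, which hurt performance.
3. The low storage density of Java objects.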
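Flink
16
Network Buffers: a set of 32KB buffers used for network data transfer, allocated when the TaskManager starts. The default is 2048 buffers, configurable through taskmanager.network.numberOfBuffers.
Memory Manager Pool: a large pool of MemorySegments managed by the MemoryManager. Flink's algorithms (sort/shuffle/join) request MemorySegments from this pool, store serialized data in them, and release them when done. By default the pool takes 70% of the heap.
Remaining (Free) Heap: left for user code and for the TaskManager's own data structures. In GC terms, this region mostly holds short-lived objects created by user code.
MemorySegment: Flink's memory is divided into MemorySegments, fixed-size cells of 32KB by default, and the MemorySegment is Flink's smallest unit of memory allocation. Flink serializes objects into MemorySegments instead of keeping them as Java objects.
A MemorySegment can be thought of as a java.nio.ByteBuffer tailored to Flink. It is backed either by a regular Java byte[] or by an off-heap ByteBuffer.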
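To make the MemorySegment idea concrete, here is a minimal sketch of a fixed-size page of memory with typed accessors, backed either by a byte[] on the heap or by a direct ByteBuffer off the heap. The class SegmentSketch and its methods are hypothetical; this is not Flink's MemorySegment class, only an illustration of the same idea on top of java.nio.ByteBuffer.

```java
import java.nio.ByteBuffer;

/** Sketch: a fixed-size block of memory with typed, index-based accessors. */
final class SegmentSketch {
    static final int DEFAULT_SIZE = 32 * 1024; // 32KB, the default size named in the slides

    private final ByteBuffer buffer;

    private SegmentSketch(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    /** Backed by a regular Java byte[] on the heap. */
    static SegmentSketch onHeap(int size) {
        return new SegmentSketch(ByteBuffer.wrap(new byte[size]));
    }

    /** Backed by off-heap memory, which the garbage collector does not scan. */
    static SegmentSketch offHeap(int size) {
        return new SegmentSketch(ByteBuffer.allocateDirect(size));
    }

    int size() { return buffer.capacity(); }

    // Absolute reads and writes: the caller chooses the byte offset.
    void putInt(int index, int value) { buffer.putInt(index, value); }
    int getInt(int index) { return buffer.getInt(index); }
    void putDouble(int index, double value) { buffer.putDouble(index, value); }
    double getDouble(int index) { return buffer.getDouble(index); }

    public static void main(String[] args) {
        SegmentSketch segment = SegmentSketch.offHeap(DEFAULT_SIZE);
        segment.putInt(0, 25);
        segment.putDouble(4, 175.5);
        System.out.println(segment.getInt(0) + " " + segment.getDouble(4));
    }
}
```

Because records live in a few large segments rather than in many small objects, the number of objects the GC has to track stays small even when the data volume is large.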
Flink
17
    // 1. Person
    public class Person {
        public int id;
        public String name;
    }
    // Tuple3<age: Integer, height: Double, Person>
    // (25, 175.5, Person(1, "zhangsan"))
In the serialized form the int takes 4 bytes and the double takes 8 bytes. The POJO carries one extra header byte: PojoSerializer writes only the header itself and delegates each field to that field's serializer.
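The byte layout described above can be sketched by hand. The sizes of the int (4 bytes), the double (8 bytes), and the POJO header (1 byte) come from the slide; the header value and the string encoding (a 4-byte length followed by UTF-8 bytes) are assumptions made only for this illustration and are not Flink's actual wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch: writes Tuple3<Integer, Double, Person> as one compact byte sequence. */
final class CompactTupleSketch {
    static final class Person {
        final int id;
        final String name;
        Person(int id, String name) { this.id = id; this.name = name; }
    }

    static final byte POJO_HEADER = 0; // hypothetical header value

    static byte[] serialize(int age, double height, Person person) {
        byte[] name = person.name.getBytes(StandardCharsets.UTF_8);
        // 4 (int) + 8 (double) + 1 (POJO header) + 4 (id) + 4 (name length) + name bytes
        ByteBuffer out = ByteBuffer.allocate(4 + 8 + 1 + 4 + 4 + name.length);
        out.putInt(age);
        out.putDouble(height);
        out.put(POJO_HEADER);
        out.putInt(person.id);
        out.putInt(name.length); // assumed string encoding: length prefix + UTF-8
        out.put(name);
        return out.array();
    }

    /** Reads the Person back by skipping the two leading fields and the header. */
    static Person readPerson(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        in.position(4 + 8 + 1);
        int id = in.getInt();
        byte[] name = new byte[in.getInt()];
        in.get(name);
        return new Person(id, new String(name, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(25, 175.5, new Person(1, "zhangsan"));
        Person back = readPerson(bytes);
        System.out.println(bytes.length + " bytes, id=" + back.id + ", name=" + back.name);
    }
}
```

No class names or field names are written per record, which is where the gain in storage density over ordinary Java serialization comes from.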
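Because a Flink data set holds objects of one fixed type, Flink can keep a single copy of the Schema per data set instead of storing it with every object. Flink supports arbitrary Java and Scala types and detects them automatically: for Java programs it uses Java Reflection to analyze the return types of UDFs (User Defined Functions), and for Scala programs it uses the Scala Compiler.
For both Java and Scala, the type information of a Flink UDF is represented by TypeInformation. TypeInformation covers the following kinds of types:
BasicTypeInfo: any Java primitive type (boxed) or String.
BasicArrayTypeInfo: any array of Java primitive types (boxed) or of String.
WritableTypeInfo: any implementation of the Hadoop Writable interface.
TupleTypeInfo: any Flink Tuple (Tuple1 to Tuple25). Flink tuples are fixed-length, fixed-type Java Tuple implementations.
CaseClassTypeInfo: any Scala CaseClass (including Scala tuples).
PojoTypeInfo: any POJO (Java or Scala), that is, a Java object whose fields are all either public or accessible through getter/setter methods.
GenericTypeInfo: any class that matches none of the kinds above.
1. For the first six kinds, Flink generates the matching TypeSerializer automatically.
2. For the last kind, Flink falls back to Kryo.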
Backpressure
18
>=
<
Spark Backpressure (Spark Streaming)
19
Before Spark 1.5, the only control was a static cap on each Receiver's ingestion rate, set with spark.streaming.receiver.maxRate. Spark Streaming v1.5 introduced a back-pressure mechanism that adjusts the ingestion rate dynamically. It is enabled with spark.streaming.backpressure.enabled.
The mechanism adds a RateController, which listens for OnBatchCompleted events and reads processingDelay and schedulingDelay from them. An Estimator computes a new rate from these values, and the Receiver-based Input Stream forwards that rate through ReceiverTracker and ReceiverSupervisorImpl to the BlockGenerator, which limits ingestion.
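Flink Backpressure
20
If task1 and task2 run in the same TaskManager, a buffer filled by task1 is handed directly to task2 and recycled once task2 has consumed it.
If task2 is slower than task1, buffers are recycled more slowly than task1 can fill them, so task1 slows down.
If task1 and task2 run on different nodes, the buffer is sent from task1 to task2 over the network (TCP Channel).
If the receiver has no free buffer, it stops reading from the TCP connection, which in turn slows the sender down.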
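The interaction between task1 and task2 can be sketched with a bounded buffer pool. This is a simplified, single-process illustration that assumes a fixed number of buffers recycled between a producer (task1) and a consumer (task2); the class and method names are hypothetical and do not correspond to Flink's network stack.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Sketch: a fixed pool of buffers shared by a producer (task1) and a consumer (task2). */
final class BufferPoolBackpressureSketch {
    private final BlockingQueue<byte[]> freeBuffers;
    private final BlockingQueue<byte[]> inFlight;

    /** bufferSize must be at least 1 because each produce call writes one byte. */
    BufferPoolBackpressureSketch(int numBuffers, int bufferSize) {
        freeBuffers = new ArrayBlockingQueue<>(numBuffers);
        inFlight = new ArrayBlockingQueue<>(numBuffers);
        for (int i = 0; i < numBuffers; i++) {
            freeBuffers.add(new byte[bufferSize]);
        }
    }

    /** task1: returns false when no buffer is free, so the producer has to slow down. */
    boolean tryProduce(byte value) {
        byte[] buffer = freeBuffers.poll();
        if (buffer == null) {
            return false; // pool exhausted: this is the backpressure signal
        }
        buffer[0] = value;
        inFlight.add(buffer);
        return true;
    }

    /** task2: consumes one buffer and recycles it, or returns false if nothing is pending. */
    boolean tryConsume() {
        byte[] buffer = inFlight.poll();
        if (buffer == null) {
            return false;
        }
        freeBuffers.add(buffer); // recycling frees capacity for task1 again
        return true;
    }

    public static void main(String[] args) {
        BufferPoolBackpressureSketch pool = new BufferPoolBackpressureSketch(2, 32);
        System.out.println(pool.tryProduce((byte) 1));
        System.out.println(pool.tryProduce((byte) 2));
        System.out.println(pool.tryProduce((byte) 3)); // no free buffer until task2 consumes
        pool.tryConsume();
        System.out.println(pool.tryProduce((byte) 3));
    }
}
```

The producer is never told to slow down explicitly. It is throttled only by the rate at which the consumer returns buffers, which is why the slowdown propagates upstream without a separate control channel.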
Flink Backpressure
21
The Flink Web UI has a Backpressure tab for each Job. Sampling is triggered from the JobManager: by default it takes a stack trace of each Task of the Job every 50ms, 100 times.
The ratio is the share of samples in which the Task was blocked: ratio=0.01 means 1 of the 100 samples.
Flink reports the Backpressure status as:
OK: 0 <= Ratio <= 0.10
LOW: 0.10 < Ratio <= 0.5
HIGH: 0.5 < Ratio <= 1
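The thresholds above translate directly into code. The following sketch computes the ratio from the sample counts and maps it to OK, LOW, or HIGH; the class and method names are hypothetical and are not part of Flink.

```java
/** Sketch: maps a backpressure sampling ratio to the status shown in the Web UI. */
final class BackpressureStatusSketch {
    enum Status { OK, LOW, HIGH }

    /** Share of stack trace samples in which the task was blocked. */
    static double ratio(int blockedSamples, int totalSamples) {
        if (totalSamples <= 0 || blockedSamples < 0 || blockedSamples > totalSamples) {
            throw new IllegalArgumentException("invalid sample counts");
        }
        return (double) blockedSamples / totalSamples;
    }

    static Status classify(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1]: " + ratio);
        }
        if (ratio <= 0.10) {
            return Status.OK;   // 0 <= ratio <= 0.10
        }
        if (ratio <= 0.5) {
            return Status.LOW;  // 0.10 < ratio <= 0.5
        }
        return Status.HIGH;     // 0.5 < ratio <= 1
    }

    public static void main(String[] args) {
        // 1 blocked sample out of 100 gives ratio 0.01.
        System.out.println(classify(ratio(1, 100)));
        System.out.println(classify(ratio(60, 100)));
    }
}
```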
Flink
22
1. At most once
2. At least once
3. Exactly once
Flink checkpoint
Flink's checkpointing is described in the paper "Lightweight Asynchronous Snapshots for Distributed Dataflows". It is a variant of the Chandy-Lamport algorithm adapted to Flink.
Flink snapshot:
1.
2.
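Flink checkpoint
23
Barrier: the core element of Flink's distributed snapshots. Barriers are injected into the data stream and flow along with the records. A Barrier separates the records that belong to the current snapshot from the records that belong to the next one.
Each barrier carries the ID of its snapshot. Once an operator has received the barrier from all of its inputs, it snapshots its state and forwards the barrier downstream.
This mechanism is called Asynchronous Barrier Snapshots.
Flink supports both Exactly once and at least once.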
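Barrier handling for Exactly once can be sketched for an operator with several input channels. The sketch assumes one checkpoint in flight at a time and a trivial operator state (a running sum); once a channel has delivered its barrier, its later records are held back until the barriers of all channels have arrived. All names are hypothetical, and the code illustrates the alignment idea only, not Flink's implementation.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Sketch: barrier alignment in an operator whose state is a running sum. */
final class BarrierAlignmentSketch {
    private final boolean[] blocked;            // channels that already delivered the barrier
    private int barriersSeen = 0;
    private long sum = 0;                       // the operator state
    private final Queue<Long> buffered = new ArrayDeque<>();
    private final Map<Long, Long> snapshots = new HashMap<>();

    BarrierAlignmentSketch(int numChannels) {
        blocked = new boolean[numChannels];
    }

    void onRecord(int channel, long value) {
        if (blocked[channel]) {
            buffered.add(value); // belongs to the next snapshot: hold it back
        } else {
            sum += value;
        }
    }

    void onBarrier(int channel, long checkpointId) {
        if (blocked[channel]) {
            throw new IllegalStateException("sketch assumes one checkpoint in flight");
        }
        blocked[channel] = true;
        barriersSeen++;
        if (barriersSeen == blocked.length) {
            snapshots.put(checkpointId, sum);   // state reflects exactly the pre-barrier records
            Arrays.fill(blocked, false);
            barriersSeen = 0;
            Long held;
            while ((held = buffered.poll()) != null) {
                sum += held;                    // now process the records that were held back
            }
        }
    }

    long sum() { return sum; }

    /** Returns the state stored for a checkpoint, or null if it has not completed. */
    Long snapshot(long checkpointId) { return snapshots.get(checkpointId); }

    public static void main(String[] args) {
        BarrierAlignmentSketch op = new BarrierAlignmentSketch(2);
        op.onRecord(0, 1);
        op.onRecord(1, 2);
        op.onBarrier(0, 1L);
        op.onRecord(0, 10); // arrives after channel 0's barrier, so it is buffered
        op.onRecord(1, 3);
        op.onBarrier(1, 1L);
        System.out.println("snapshot=" + op.snapshot(1L) + " sum=" + op.sum());
    }
}
```

If the operator processed the record 10 immediately instead of buffering it, the snapshot would include a record that is replayed after recovery, which gives at least once rather than Exactly once.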
Flink savepoint
24
A savepoint is a special kind of checkpoint. Flink triggers checkpoints automatically to recover from failures, while a savepoint is triggered by the user, for example before stopping and restarting a job.
Working with a savepoint:
flink list
flink savepoint job_id
flink cancel job_id
flink run -d -s hdfs://savepoint/1 ###.jar
26
1.
2. -Flink
3.
4.Butterfly-Sql
27
[Figure: data flow from the sources into ODS and CIF. Labels recoverable from the diagram: Canal; Kafka Topic; Flink; Trickle; HDFS; hdfs record; Json -> version1, Json -> version2, Json -> version3; Encoder; Spark (twice); ODS; CIF; MongoDB position-to-column mapping 0 -> Column1, 1 -> Column2, 2 -> Column3, ..., N-1 -> ColumnN; Parquet with StructField("0"), StructField("1"), StructField("2"), ..., StructField("N-1"); Parquet with StructField("C1"), StructField("C2"), StructField("C3"), ..., StructField("Cn-1").]
28
[Figure: schema handling. Labels recoverable from the diagram: Schema; cif; mysql/mongo; schema; parquet.]
29
1.
2. -Flink
3.
4.Butterfly-Sql
30
Scala provides a parser combinator library, Scala parser combinators, that can be used to write parsers for formats such as json and sql.
Scala parser combinators offers RegexParsers and StandardTokenParsers. The Spark sql parser was written with StandardTokenParsers, and the sql parser here uses StandardTokenParsers as well.
Scala Parsers
The parser turns the DSL text into an Abstract Syntax Tree (AST). Later stages work on the AST.
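BNF
31
The grammar of the DSL is described in BNF.
John Backus introduced the notation to describe the syntax of Algol60. It became known as Backus-Naur Form (BNF); EBNF is an extended form of BNF.
BNF notation:
"word": text in double quotes stands for the characters themselves; double_quote stands for a double quote character.
< >: required item.
[ ]: optional item.
{ }: item repeated 0 or more times.
|: choose one of the two sides, i.e. "OR".
::=: "is defined as".
EBNF notation:
"...": terminal string.
[...]: optional.
{...}: repeated 0 or more times.
(...): grouping.
|: alternation.
BNF example:
expr ::= term {'+' term | '-' term}
term ::= factor {'*' factor | '/' factor}
factor ::= floatingPointNumber | '(' expr ')'
Here {} means repetition 0 or more times and | means choice. A factor is either a floatingPointNumber or an expr in parentheses.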
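Each rule of the expr/term/factor grammar maps to one method of a recursive descent parser. The sketch below is a hand-written Java illustration of that correspondence, not the deck's Scala implementation; it evaluates the expression directly instead of building an AST, and it simplifies floatingPointNumber to unsigned decimal numbers. The class name is hypothetical.

```java
/** Sketch: recursive descent evaluator with one method per grammar rule. */
final class ArithmeticParserSketch {
    private final String input;
    private int pos = 0;

    private ArithmeticParserSketch(String input) {
        this.input = input;
    }

    static double evaluate(String text) {
        ArithmeticParserSketch parser = new ArithmeticParserSketch(text);
        double value = parser.expr();
        parser.skipSpaces();
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("unexpected input at position " + parser.pos);
        }
        return value;
    }

    // expr ::= term {'+' term | '-' term}
    private double expr() {
        double value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // term ::= factor {'*' factor | '/' factor}
    private double term() {
        double value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // factor ::= floatingPointNumber | '(' expr ')'
    private double factor() {
        if (accept('(')) {
            double value = expr();
            if (!accept(')')) {
                throw new IllegalArgumentException("expected ')' at position " + pos);
            }
            return value;
        }
        return number();
    }

    // Simplification: unsigned digits with an optional decimal point.
    private double number() {
        skipSpaces();
        int start = pos;
        while (pos < input.length()
                && (Character.isDigit(input.charAt(pos)) || input.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a number at position " + pos);
        }
        return Double.parseDouble(input.substring(start, pos));
    }

    private boolean accept(char expected) {
        skipSpaces();
        if (pos < input.length() && input.charAt(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("1 + 2 * 3"));
        System.out.println(evaluate("(1 + 2) * 3"));
    }
}
```

Operator precedence falls out of the rule nesting: term is parsed inside expr, so multiplication and division bind tighter than addition and subtraction without any precedence table.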
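StandardTokenParsers
32
Scala parser combinators map closely onto EBNF. The parser here is built on Scala's StandardTokenParsers.
lexical.delimiters: the delimiters of the DSL, e.g. "+", "-", "*", "/", "(", ")".
lexical.reserved: the reserved words of the DSL, e.g. "if", "then", "SUM", "COUNT".
Main classes:
* Parser: a function from the input to a ParseResult.
* Parsers: the trait that defines Parser and the combinators.
* Reader: the input that a Parser reads from.
* ParseResult: the outcome of a parse, either a success or a failure.
* Positional: a trait for values that record where in the input they came from.
* Position: a location in the input.
Combinators (each combines Parsers into a new parser):
* |: alternation between two Parsers.
* ~: sequence of two Parsers.
* ~>: like ~, but keeps only the right result.
* <~: like ~, but keeps only the left result.
* ^^: transforms the result.
* opt(p): optional; yields Some(x) or None.
* rep: repeat.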
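How the combinators compose can be sketched in a few dozen lines. The sketch below is written in Java for consistency with the other examples; it is a hypothetical miniature, not the Scala library. The method names or, skipThen, thenSkip, and map stand in for |, ~>, <~, and ^^, and a failed parse is represented by null instead of a ParseResult.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Sketch: a miniature parser combinator library over strings. */
final class CombinatorSketch {
    /** A successful parse: the value and the position where parsing continues. */
    static final class Result<T> {
        final T value;
        final int next;
        Result(T value, int next) { this.value = value; this.next = next; }
    }

    /** A parser returns a Result on success and null on failure. */
    interface Parser<T> {
        Result<T> parse(String in, int pos);

        /** Stand-in for p | q: try this parser, then the other one. */
        default Parser<T> or(Parser<T> other) {
            return (in, pos) -> {
                Result<T> r = parse(in, pos);
                return r != null ? r : other.parse(in, pos);
            };
        }

        /** Stand-in for p ~> q: run both in sequence, keep the right result. */
        default <U> Parser<U> skipThen(Parser<U> next) {
            return (in, pos) -> {
                Result<T> r = parse(in, pos);
                if (r == null) return null;
                return next.parse(in, r.next);
            };
        }

        /** Stand-in for p <~ q: run both in sequence, keep the left result. */
        default <U> Parser<T> thenSkip(Parser<U> next) {
            return (in, pos) -> {
                Result<T> r = parse(in, pos);
                if (r == null) return null;
                Result<U> s = next.parse(in, r.next);
                if (s == null) return null;
                return new Result<T>(r.value, s.next);
            };
        }

        /** Stand-in for p ^^ f: transform the result. */
        default <U> Parser<U> map(Function<T, U> f) {
            return (in, pos) -> {
                Result<T> r = parse(in, pos);
                if (r == null) return null;
                return new Result<U>(f.apply(r.value), r.next);
            };
        }
    }

    static Parser<String> literal(String text) {
        return (in, pos) -> {
            if (!in.startsWith(text, pos)) return null;
            return new Result<String>(text, pos + text.length());
        };
    }

    /** opt(p): never fails; yields an empty Optional when p does not match. */
    static <T> Parser<Optional<T>> opt(Parser<T> p) {
        return (in, pos) -> {
            Result<T> r = p.parse(in, pos);
            if (r == null) return new Result<Optional<T>>(Optional.empty(), pos);
            return new Result<Optional<T>>(Optional.of(r.value), r.next);
        };
    }

    /** rep(p): applies p zero or more times and collects the results. */
    static <T> Parser<List<T>> rep(Parser<T> p) {
        return (in, pos) -> {
            List<T> values = new ArrayList<>();
            int current = pos;
            Result<T> r;
            // The r.next > current check stops parsers that match without consuming input.
            while ((r = p.parse(in, current)) != null && r.next > current) {
                values.add(r.value);
                current = r.next;
            }
            return new Result<List<T>>(values, current);
        };
    }

    public static void main(String[] args) {
        Parser<String> keyword = literal("SUM").or(literal("COUNT"));
        Parser<String> call = keyword.thenSkip(literal("(")).thenSkip(literal(")"));
        System.out.println(call.parse("COUNT()", 0).value);
    }
}
```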
33
Butterfly
34
Butterfly consists of three parts: Sql Parser & AST, Table, and Engine.
Sql Parser & AST: parses the Sql text into an AST.
Engine: runs the AST against Tables in 7 steps.
Table: the data together with its Schema (MetaData).
A Sql Query goes through the Sql Parser and becomes an AST, a SelectStmt with these fields:
projections: Seq[SqlProj]
relations: Option[Seq[SqlRelation]]
filter: Option[SqlExpr]
groupBy: Option[SqlGroupBy]
orderBy: Option[SqlOrderBy]
limit: Option[Int]
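The Engine applies its steps in this order: fromFunc, where, groupBy, aggre, select, orderBy, limit.
Each step takes a Table and returns a Table.
Schema: describes a Table. Table: a collection of Rows. Row: Map[String, MetaData[_]]. LoadTableData and loadSchema load the data and the schema.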
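The evaluation order of a SelectStmt can be illustrated with a small in-memory pipeline. This sketch is in Java rather than Butterfly's Scala, covers only the from, where, select, orderBy, and limit steps (groupBy and aggregation are left out to keep it short), and represents a row as a Map from column name to value. All names are hypothetical, and nothing here is Butterfly's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Sketch: runs from -> where -> select -> orderBy -> limit over in-memory rows. */
final class SelectPipelineSketch {
    /** Builds a row from alternating column names and values. */
    static Map<String, Object> row(Object... keyValues) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            result.put((String) keyValues[i], keyValues[i + 1]);
        }
        return result;
    }

    @SuppressWarnings("unchecked")
    private static int compare(Object a, Object b) {
        // Assumes both values have the same Comparable type, e.g. Integer or String.
        return ((Comparable<Object>) a).compareTo(b);
    }

    static List<Map<String, Object>> run(
            List<Map<String, Object>> table,        // from
            Predicate<Map<String, Object>> filter,  // where (null means no filter)
            List<String> projections,               // select
            String orderBy,                         // ascending sort column, or null
            Integer limit) {                        // non-negative row limit, or null
        // from + where: keep the rows that pass the filter.
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Map<String, Object> r : table) {
            if (filter == null || filter.test(r)) {
                rows.add(r);
            }
        }
        // select: keep only the projected columns, in projection order.
        List<Map<String, Object>> projected = new ArrayList<>();
        for (Map<String, Object> r : rows) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (String column : projections) {
                out.put(column, r.get(column));
            }
            projected.add(out);
        }
        // orderBy runs after select, so the sort column must be among the projections.
        if (orderBy != null) {
            projected.sort((a, b) -> compare(a.get(orderBy), b.get(orderBy)));
        }
        // limit: truncate the result.
        if (limit != null && limit < projected.size()) {
            return new ArrayList<>(projected.subList(0, limit));
        }
        return projected;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> table = new ArrayList<>();
        table.add(row("name", "a", "age", 30));
        table.add(row("name", "b", "age", 20));
        table.add(row("name", "c", "age", 10));
        List<String> columns = new ArrayList<>();
        columns.add("name");
        columns.add("age");
        System.out.println(run(table, r -> (Integer) r.get("age") >= 20, columns, "age", 1));
    }
}
```

Since every step consumes a list of rows and produces a list of rows, steps can be skipped or reordered independently, which matches the Table-in, Table-out shape of the Engine's steps.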