mcubed london - data science at the edge
Data science at the Edge
With NiFi, TensorFlow and a proper
cluster for good measure
Simon Elliston Ball
@sireb
Simon Elliston Ball
• Product Manager
• Data Scientist
• Elephant herder
• @sireb
Data gravity
588,000,000 km
• Size
• Distance
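Both bullets can be given rough numbers with a back-of-the-envelope calculation. The sketch below assumes the 588,000,000 km figure is a link distance and that signals travel at the speed of light in vacuum. The 1 TB payload and 1 Gbit/s link are illustrative values that do not come from the talk.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s


def one_way_latency_s(distance_km: float) -> float:
    """Lower bound on one-way signal delay over a given distance."""
    return distance_km / SPEED_OF_LIGHT_KM_S


def transfer_time_s(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Time to push a payload through a link, ignoring protocol overhead."""
    return size_bytes * 8 / bandwidth_bits_per_s


# Distance: the figure from the slide.
latency = one_way_latency_s(588_000_000)

# Size: hypothetical 1 TB of data over a hypothetical 1 Gbit/s link.
transfer = transfer_time_s(1e12, 1e9)

print(f"one-way latency: {latency / 60:.1f} minutes")
print(f"1 TB over 1 Gbit/s: {transfer / 3600:.1f} hours")
```

Either number on its own can be enough to make moving the compute to the data cheaper than moving the data to the compute.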
Other types of data gravity
• Compliance
• Legislation
• Political
• Paranoia
Photo: https://flic.kr/p/JvW7qh
Sampling vs Big Data: a quick history
• Before we had cloud, clusters and GPUs…
• MPP
• Supercomputers
• Grids
• Cut down data size to fit in memory
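The talk does not say which sampling method was used to cut data down to memory size. As one illustration, the sketch below uses reservoir sampling (Algorithm R), which keeps a uniform random sample of fixed size from a stream without holding the whole stream in memory. The function name and parameters are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items drawn uniformly at random from a stream."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Only 10 items are held in memory, however long the stream is.
sample = reservoir_sample(range(1_000_000), k=10, seed=42)
print(sample)
```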
A quick intro to NiFi
• Guaranteed Delivery
• Prioritized queuing and buffering
• Data provenance
• Bi-directional communication
• Security – Authentication and multi-role authorization
• Visual command and control
• Templating
• Robust API
and lots of adapters
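To make "prioritized queuing and buffering" concrete, here is a minimal sketch of the idea in plain Python. It is not NiFi's API or implementation; in NiFi this behaviour is configured on the connections between processors. The class name, method names and example items are hypothetical.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple


class PrioritizedBuffer:
    """Bounded buffer that releases the lowest priority number first."""

    def __init__(self, max_items: int) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        # Tie-breaker keeps first-in-first-out order within one priority.
        self._counter = itertools.count()
        self._max_items = max_items

    def offer(self, item: Any, priority: int) -> bool:
        """Queue an item; return False when full so the sender can back off."""
        if len(self._heap) >= self._max_items:
            return False
        heapq.heappush(self._heap, (priority, next(self._counter), item))
        return True

    def poll(self) -> Optional[Any]:
        """Remove and return the most urgent item, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


buffer = PrioritizedBuffer(max_items=3)
buffer.offer("frame-1", priority=5)
buffer.offer("alert", priority=0)
buffer.offer("frame-2", priority=5)
accepted = buffer.offer("frame-3", priority=5)  # buffer is full
drained = [buffer.poll(), buffer.poll(), buffer.poll()]
print(accepted, drained)
```

Refusing new items when the buffer is full is what lets a slow link at the edge push back on the producer instead of silently dropping data.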
Demo: sending stuff around
• Pushing camera frames to the cloud
Face detection
Key point locations
Lightweight models
Low contextual data
Face detection
• Simple Haar cascade in OpenCV: https://github.com/simonellistonball/nifi-OpenCV
Dlib Face Detection
• 68 Facial Point Model
• c. 100MB
TensorFlow in NiFi
• Our Haar cascade… face detection didn’t do a great job
• Neural Networks
• Relatively large models
• Haar cascade: 677KB of XML
• FaceNet trained model on LFW: 168 MB (and that’s zipped protobufs)
• TensorFlow: https://github.com/tspannhw/nifi-tensorflow-processor.git
Face recognition
• Huge databases of face hashes and feature measures
• Extra information and context around the person
• Computationally expensive and heavy network use
• Apple Face ID demo… too many people had tried the device beforehand, which blew the database. One or two faces is easy; millions is another matter
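The scaling problem in the bullets above can be sketched as a nearest-neighbour lookup over stored feature vectors. This is an assumption about how such a lookup might work, not the method used in the talk. The three-dimensional vectors, names and threshold are toy values; real models produce much longer vectors.

```python
import math
from typing import Dict, Optional, Sequence


def nearest_face(
    query: Sequence[float],
    known: Dict[str, Sequence[float]],
    threshold: float,
) -> Optional[str]:
    """Return the closest known identity, or None if nothing is close enough."""
    best_name: Optional[str] = None
    best_distance = float("inf")
    # Linear scan: cost grows with the number of stored faces.
    for name, embedding in known.items():
        distance = math.dist(query, embedding)  # Euclidean distance
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= threshold else None


# Hypothetical database of two people.
known_faces = {
    "alice": (0.10, 0.90, 0.20),
    "bob": (0.80, 0.10, 0.50),
}

match = nearest_face((0.12, 0.88, 0.25), known_faces, threshold=0.3)
stranger = nearest_face((0.50, 0.50, 0.90), known_faces, threshold=0.3)
print(match, stranger)
```

With two entries the scan is trivial. With millions, every query touches every stored vector unless an index is added, which is where the compute and network costs come from.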
Rocket ship to the cloud
https://www.nasa.gov/sites/default/files/thumbnails/image/s83-35620-3k.jpg
Cloud: ML all packaged up… for a price
TensorFlow on Spark
• Why?
• Doesn’t TensorFlow already have a distributed compute model?
• Existing clusters, multi-purpose clusters:
• TensorFrames, TensorFlowOnSpark, CaffeOnSpark, Spark ML, SQL
• When?
• Training, batch scoring
Broadening the example
• Where is your context?
• Why do you need context?
• Detection
• Explanation
Body worn video
• Record everything
• Record when you remember to press the button
• Record when it matters
What about?
• Live assist
• Evidence and accountability
Netflow
Cybersecurity: progressive context
• Record everything: PCAP
• Send up the (maybe) interesting bits
• Fetch detail on demand
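The three bullets above can be sketched as a small edge-side recorder: keep recent raw packets in a bounded local buffer, forward only a summary of the ones that look interesting, and serve the full record when asked. The class, the field names and the size-based "interesting" rule are hypothetical stand-ins; a real deployment would use a first-pass model in place of the threshold.

```python
from collections import deque
from typing import Deque, Dict, Optional


class EdgeRecorder:
    """Records everything locally, sends summaries up, serves detail on demand."""

    def __init__(self, capacity: int, size_threshold: int) -> None:
        # Bounded buffer: the oldest packets fall off when capacity is reached.
        self._packets: Deque[Dict] = deque(maxlen=capacity)
        self._size_threshold = size_threshold

    def observe(self, packet: Dict) -> Optional[Dict]:
        """Store the packet; return a small summary if it looks interesting."""
        self._packets.append(packet)
        if packet["bytes"] >= self._size_threshold:
            # Placeholder rule standing in for a first-pass model.
            return {"id": packet["id"], "src": packet["src"], "bytes": packet["bytes"]}
        return None

    def fetch_detail(self, packet_id: int) -> Optional[Dict]:
        """Return the full packet if it is still in the local buffer."""
        for packet in self._packets:
            if packet["id"] == packet_id:
                return packet
        return None


recorder = EdgeRecorder(capacity=2, size_threshold=10_000)
quiet = recorder.observe({"id": 1, "src": "10.0.0.5", "bytes": 120, "payload": b"ping"})
noisy = recorder.observe({"id": 2, "src": "10.0.0.9", "bytes": 50_000, "payload": b"..."})
detail = recorder.fetch_detail(2)

recorder.observe({"id": 3, "src": "10.0.0.5", "bytes": 80, "payload": b"pong"})
expired = recorder.fetch_detail(1)  # pushed out of the bounded buffer
print(quiet, noisy, detail is not None, expired)
```

Only the summary crosses the network by default, so the upstream flow stays small while the full capture remains available at the edge for as long as the buffer holds it.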
PCAP at Edge
1st Pass Model
Security Data Analytics Platform
adds context, more compute intensive modelling etc
Hmmm… That’s interesting
Let me tell you more…
“small” data flow
ANPR: or why you can’t hide from parking fines
Summary: progressive enhancement of context
• Is it worth processing? 677KB of local model
• Rough-cut and hashing: O(100MB) models
• Expensive deep analysis: Cloud scale models and data
name: Simon Elliston Ball
cognitive.face.emotion: surprise
cognitive.face.exposure: overExposure
cognitive.face.noise: high
Thank you!
@sireb