Post on 14-Jul-2015
Our research: Cognitive Computing
FOUNDATIONAL BUILDING BLOCKS
For us, Cognitive Computing refers to the continuous development of supercomputing systems that enable the convergence of advanced analytic algorithms and big data technologies, driving new insights from the massive amounts of available data.
[Diagram labels: DATA; Supercomputer Systems; Big Data Technologies; Advanced Analytic Algorithms]
Today's focus: Multimedia & Spark
[Diagram labels: DATA; Supercomputer Systems (Marenostrum Supercomputer); Big Data Technologies; Advanced Analytic Algorithms (ML & OpenCV)]
[Figure: Marenostrum InfiniBand network. Labels: Mellanox 648-port IB core switches (InfiniBand 648-port FDR core switch); 36-port FDR10 leaf switches in groups of 18, 18, 18, 18 and 12, with 3 (or 2) FDR10 links to each core; latency 0.7 μs; bandwidth 40 Gb/s; Storage Network; Storage Racks; Computer Nodes; Computer Network.]
Our Supercomputer in Barcelona: Marenostrum
new module: spark4mn
• Framework to run a Spark cluster efficiently over an LSF-based environment and the hardware particularities of MareNostrum
• The framework provides functionalities to evaluate different configurations (HDFS vs GPFS, different networks, different affinities, cluster geometries, etc.)
Spark over Marenostrum: Shuffling 1TB
Shuffle 1 TB of data in Sort Benchmark format: 10^10 records of 100 bytes each
Three different partitionings: 100, 1000 and 10000 partitions
101.47 TB/h max speed (128 nodes)
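The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes decimal terabytes (1 TB = 10^12 bytes) and reads the 101.47 TB/h figure as aggregate throughput over the 128 nodes; both are assumptions, as is the object name `ShuffleArithmetic`.

```scala
object ShuffleArithmetic {
  // Dataset size as stated on the slide: 10^10 records of 100 bytes each.
  val records: Long = 10000000000L
  val bytesPerRecord: Long = 100L
  val totalBytes: Long = records * bytesPerRecord

  // Assumption: decimal terabytes (1 TB = 10^12 bytes).
  val bytesPerTB: Double = 1e12
  val totalTB: Double = totalBytes / bytesPerTB

  // Peak speed and node count as stated on the slide.
  val peakTBPerHour: Double = 101.47
  val nodes: Int = 128

  // Time to move the whole dataset if the peak rate were sustained.
  val secondsForDataset: Double = totalTB / peakTBPerHour * 3600.0

  // Assumption: the peak rate is spread evenly over all nodes.
  val perNodeMBPerSecond: Double =
    peakTBPerHour * bytesPerTB / nodes / 3600.0 / 1e6

  // Records per partition for each partitioning tested (100, 1000, 10000).
  def recordsPerPartition(partitions: Int): Long = records / partitions
}
```

Under these assumptions the dataset is exactly 1 TB, the peak rate corresponds to roughly 35 seconds per terabyte, and each node moves on the order of 220 MB/s.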
Research line in our group: Multimedia Big Data Computing
Crossover of 4 main aspects:
• real-time
• analysis
• multimodal
• social
Multimedia Big Data Computing
The challenge is to work with three kinds of data at the same time:
• social network relationships
• audiovisual content
• metadata
Case Study:
Multimodal Data Analytics systems can aid Desigual in better understanding their customers and potential customers through the analysis of social media data sources
Source: demotix.com
Case Study: (Autumn-Winter 2015-2016)
Dataset 1: #desigual #lavidaeschula #mydesigual (30,000 photos)
Dataset 2: Followers (100 photos x 2K followers = 200K photos, 100 GB)
Case Study:
E.g. latent user attribute inference to predict Desigual followers:
AGE, GENDER, HOME LOCATION, TRAVEL PATTERNS, LIFESTYLE/CONSUMPTION PATTERNS, …
CATWALK: Social Media Image Analysis for Fashion Industry Market Research
Multimedia Big Data Computing platform that operates over freely available online images from sources such as Instagram or Twitter
CATWALK: Small files problem
[Figure: many small json files are grouped into a few larger SEQUENCE FILEs.]
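The idea in the figure, many small JSON files packed into a few large containers of key/value records, can be sketched with the standard library alone. This is an illustration of the packing principle under stated assumptions, not Hadoop's actual SequenceFile format and not CATWALK's code: the length-prefixed layout and the name `SmallFilePacker` are hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.nio.charset.StandardCharsets.UTF_8

object SmallFilePacker {
  // Pack (file name, file content) pairs into one container:
  // a record count followed by length-prefixed key/value records.
  def pack(files: Seq[(String, String)]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    out.writeInt(files.size)
    files.foreach { case (name, content) =>
      val key = name.getBytes(UTF_8)
      val value = content.getBytes(UTF_8)
      out.writeInt(key.length); out.write(key)
      out.writeInt(value.length); out.write(value)
    }
    out.flush()
    buffer.toByteArray
  }

  // Read the records back in the order they were written.
  def unpack(container: Array[Byte]): Seq[(String, String)] = {
    val in = new DataInputStream(new ByteArrayInputStream(container))
    val count = in.readInt()
    (0 until count).map { _ =>
      val key = new Array[Byte](in.readInt()); in.readFully(key)
      val value = new Array[Byte](in.readInt()); in.readFully(value)
      (new String(key, UTF_8), new String(value, UTF_8))
    }
  }
}
```

The point of the design is that the file system sees one large object instead of thousands of tiny ones, while each original file remains addressable by its key. In Spark itself, one common route is `sc.wholeTextFiles(...)` followed by `saveAsSequenceFile(...)`; whether CATWALK does it this way is not stated on the slide.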
CATWALK: Vectorization
[Figure: feature detection yields keypoints (KP1 to KP4); feature description yields one patch descriptor per keypoint (PATCH 1 to PATCH 4); kmeans groups the descriptors into a CODEWORDS DICTIONARY (CW1, CW2, CW3); each image is then represented as a vector over the codewords, e.g. 0.4, 0.2, 0.8.]
Necessary for visual similarity search, visual clustering, classification, etc.
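The vectorization step can be sketched as follows, assuming the codewords dictionary has already been learned (for example by k-means) and that descriptors are plain arrays of doubles. The names `BagOfVisualWords`, `nearestCodeword` and `bag` are hypothetical, and normalising counts to frequencies is an assumption of this sketch; the slide does not say how its example weights (0.4, 0.2, 0.8) are computed.

```scala
object BagOfVisualWords {
  type Vec = Array[Double]

  // Squared Euclidean distance between two descriptors of equal length.
  def squaredDistance(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the codeword closest to the descriptor (dictionary must be non-empty).
  def nearestCodeword(descriptor: Vec, dictionary: Seq[Vec]): Int =
    dictionary.indices.minBy(i => squaredDistance(descriptor, dictionary(i)))

  // Histogram of codeword assignments, normalised to frequencies.
  def bag(descriptors: Seq[Vec], dictionary: Seq[Vec]): Array[Double] = {
    val counts = Array.fill(dictionary.size)(0.0)
    descriptors.foreach(d => counts(nearestCodeword(d, dictionary)) += 1.0)
    val total = counts.sum
    if (total == 0.0) counts else counts.map(_ / total)
  }
}
```

Every image, whatever its number of keypoints, ends up as a vector of the same length as the dictionary, which is what makes similarity search, clustering and classification possible downstream.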
CATWALK: bsc.spark.image
scala> import bsc.spark.image.ImageUtils
scala> import org.apache.spark.mllib.classification.NaiveBayes
…
scala> val images = ImageUtils.seqFile("hdfs://...", sc)
scala> val dictionary = ImageUtils.BoWDictionary(images)
scala> val vectors = dictionary.getBags(images)
…
scala> val splits = vectors.randomSplit(Array(0.6, 0.4), seed = 11L)
scala> val training = splits(0)
scala> val test = splits(1)
scala> val model = NaiveBayes.train(training, lambda = 1.0)
…
CATWALK: Locality Sensitive Hashing
E.g. near-replica detection (visual spam detection, copyright infringement)
[Figure: feature detection (keypoints KP1 to KP4) and feature description (PATCH 1 to PATCH 4; SIFT, SURF, ORB, etc.) produce features that are sketched into bit strings such as 0 1 1 0 and stored in a hash table with buckets 0000, 0100, 1100, 0010, 0110, 1110, 0011, 0111, 1111.]
Features are sketched and embedded into a Hamming space.
Similar features are hashed into similar buckets in a hash table.
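A minimal sketch of sketching features into a Hamming space follows. The slide does not say which hash family CATWALK uses; random hyperplane (sign) hashing is swapped in here as one common choice, and the names `HammingSketch`, `hyperplanes`, `sketch`, `hamming` and `buckets` are hypothetical.

```scala
import scala.util.Random

object HammingSketch {
  type Vec = Array[Double]

  // Draw `bits` random hyperplanes of dimension `dim` from a seeded Gaussian.
  def hyperplanes(bits: Int, dim: Int, seed: Long): Seq[Vec] = {
    val rng = new Random(seed)
    Seq.fill(bits)(Array.fill(dim)(rng.nextGaussian()))
  }

  // One bit per hyperplane: '1' if the feature lies on its non-negative side.
  def sketch(feature: Vec, planes: Seq[Vec]): String =
    planes.map { p =>
      val dot = p.zip(feature).map { case (a, b) => a * b }.sum
      if (dot >= 0) '1' else '0'
    }.mkString

  // Hamming distance between two sketches of equal length.
  def hamming(a: String, b: String): Int =
    a.zip(b).count { case (x, y) => x != y }

  // Hash table: sketch code -> ids of the features that fall into that bucket.
  def buckets(features: Seq[(String, Vec)], planes: Seq[Vec]): Map[String, Seq[String]] =
    features.groupBy { case (_, v) => sketch(v, planes) }
      .map { case (code, members) => code -> members.map(_._1) }
}
```

With this scheme, features separated by a small angle agree on most bits, so near-replicas tend to share a bucket or land in buckets at small Hamming distance, and candidate matches can be found without comparing every pair of features.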
CATWALK: more …
• High performance visual recognition
• High performance near-replica detection
• Image style recognition
• Unified representation of inferred knowledge through ISO/IEC 24800-2 (JPEG's JPSearch)
• Launch of CATWALK platform (autumn 2015)