DeviantART Analysis using Image Features


1

DeviantART Analysis using Image Features

Bart Buter, Davide Modolo, Sander van Noort, Nick Dijkshoorn, Quang Nguyen, Bart van de Poel


2

Profile Project

• Our project was an exploratory study of artists and their images in deviantART, a very large online art community
• The research touched several fields:
  – Visualization (implementation of a toolkit)
  – Data collection
  – Feature extraction (statistical and cognitively inspired)
  – Classification
  – Network analysis


3

Overview

• Introduction
• Toolkit
• Experiments & Results
• Future work
• Conclusion


4

Introduction - deviantART

• deviantART (dA) is the largest online community showcasing various forms of user-made artwork
• 13 million registered members (called Deviants)
• Allows emerging and established artists to exhibit, promote, and share their works
• All artwork is well organized (comprehensive category structure)
  – From traditional media (painting and sculpture) to digital art, pixel art, films and anime


11

Research questions

• Can we visualize important aspects of deviantART?
• Can artists and/or styles be distinguished?
• Are artists influencing each other?
• Do art styles change over time?
• Are there non-artists who are interesting for deviantART?


12

Toolkit

• General tool to answer research questions about social art communities (deviantART)

• 4 Components

Online


13

Data collection from deviantART

• Network of “professional” artists
  – Download artists’ names and their watchers
  – Output for Pajek and the Matlab graph toolbox
• Artists’ images and information about these images
  – Download galleries from users as dataset
  – No web API; instead follow backend links
  – Parse RSS XML files and download images
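The Pajek output mentioned above could look roughly like this. It is a minimal sketch, not the project's exporter: the function name, the input dictionary, and the arc direction (watcher to watched artist) are assumptions made for illustration.

```python
def to_pajek(watchers):
    """watchers: {artist: [users watching that artist]} -> Pajek .net text."""
    names = sorted(set(watchers) | {w for ws in watchers.values() for w in ws})
    index = {name: i for i, name in enumerate(names, start=1)}  # Pajek ids start at 1
    lines = [f"*Vertices {len(names)}"]
    lines += [f'{i} "{name}"' for name, i in index.items()]
    lines.append("*Arcs")
    for artist, ws in sorted(watchers.items()):
        for w in sorted(ws):
            # Assumed convention: one arc from the watcher to the watched artist.
            lines.append(f"{index[w]} {index[artist]}")
    return "\n".join(lines)


# Made-up example network with three users.
network = {"artistA": ["artistB", "artistC"], "artistB": ["artistA"]}
print(to_pajek(network))
```

The resulting text can be saved with a .net extension and loaded into Pajek.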

Data collection


14

Data collection

• For each image, store an XML file

Example:
<?xml version="1.0"?>
<root xml_tb_version="3.1">
  <guid>http://catluvr2.deviantart.com/art/42-Journals-73664427</guid>
  <title>-42 Journals</title>
  <category>customization/screenshots/other</category>
  <filename>_42_Journals_by_catluvr2.jpg</filename>
</root>
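Reading such a per-image file back could be sketched as follows, using Python's standard library. The function name and the derived "top_category" field are assumptions; only the element names come from the example record.

```python
import xml.etree.ElementTree as ET

# Sample record copied from the slide; real files are assumed to share this layout.
SAMPLE = """<?xml version="1.0"?>
<root xml_tb_version="3.1">
  <guid>http://catluvr2.deviantart.com/art/42-Journals-73664427</guid>
  <title>-42 Journals</title>
  <category>customization/screenshots/other</category>
  <filename>_42_Journals_by_catluvr2.jpg</filename>
</root>"""


def read_image_record(xml_text):
    """Return the children of <root> as a dict, plus the top-level category."""
    root = ET.fromstring(xml_text)
    record = {child.tag: (child.text or "").strip() for child in root}
    # The first path component of <category> is the top category (e.g. photography).
    record["top_category"] = record.get("category", "").split("/")[0]
    return record


record = read_image_record(SAMPLE)
print(record["title"], "->", record["top_category"])
```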


15

Dataset information

• Downloaded 31 users
• About 5000 images
• Daily Deviations of a random day
• Top categories:
  – photography: 2244
  – customization: 906
  – traditional: 842
  – digitalart: 587
  – fanart: 239


16

Feature extraction

• Why we need features
  – Can’t visualize sets of images in high-dimensional space
  – Features can be intuitive for toolkit users
  – Easier to work with than raw data (classification)
• Kinds of features:
  – Statistical features
  – Cognitively-inspired features


17

Feature format

• Store features in XML files
  – One XML file per image describing all features
  – Easy to add new features of existing images
  – Easy to add images
• Only calculate features that are not already present in the XML file
• Add those features to the XML file of the image
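The "only calculate what is missing" rule can be sketched like this. The helper name, the extractor dictionary, and the feature names are hypothetical stand-ins, not the toolkit's actual code.

```python
import xml.etree.ElementTree as ET


def add_missing_features(root, extractors, image):
    """Compute only the features that have no element under <root> yet.

    extractors maps a feature name to a function of the image (hypothetical).
    Returns the names of the newly computed features.
    """
    added = []
    for name, func in extractors.items():
        if root.find(name) is None:  # feature not stored yet
            ET.SubElement(root, name).text = repr(func(image))
            added.append(name)
    return added


# Toy record that already holds one feature, and a 2x2 toy "image".
root = ET.fromstring("<root><avgIntensity>2.5</avgIntensity></root>")
image = [[1, 2], [3, 4]]
extractors = {
    "avgIntensity": lambda img: sum(map(sum, img)) / 4,
    "maxIntensity": lambda img: max(map(max, img)),
}
print(add_missing_features(root, extractors, image))  # only maxIntensity is new
```

A second call on the same record computes nothing, which is what makes adding features to an existing dataset cheap.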


18

Statistical features

• Low-level & understandable features
  – RGB values (average, median)
  – Hue, Saturation & Intensity values (average, median)
  – Edge-pixel ratio
  – Corner-pixel ratio
  – Entropy of the intensity
  – Variance of the intensity
  – Compositional features
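A few of these intensity statistics can be sketched in plain Python. It is assumed here that intensities are integers in 0..255 and that the entropy is the Shannon entropy (in bits) of the intensity histogram; the feature names in the returned dictionary are illustrative.

```python
import math
from collections import Counter
from statistics import mean, median, pvariance


def intensity_features(gray):
    """gray: 2D list of integer intensities in 0..255."""
    pixels = [p for row in gray for p in row]
    counts = Counter(pixels)
    n = len(pixels)
    # Shannon entropy of the intensity histogram, in bits.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return {
        "avgIntensity": mean(pixels),
        "medIntensity": median(pixels),
        "intVariance": pvariance(pixels),
        "intEntropy": entropy,
    }


flat = [[128] * 4 for _ in range(4)]            # uniform grey image
checker = [[0, 255] * 2, [255, 0] * 2] * 2      # black/white checkerboard
print(intensity_features(flat))
print(intensity_features(checker))
```

The uniform image has zero variance and zero entropy, while the checkerboard has high variance and exactly one bit of entropy.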

Feature extraction – Statistic part


19

Edge-pixel ratio

[Two example images] Ratio: 0.0094 and Ratio: 0.0998
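The edge-pixel ratio is the fraction of pixels marked as edges. The slides do not say which edge detector was used, so the sketch below swaps in a simple gradient-magnitude threshold; both the detector and the threshold value are assumptions.

```python
def edge_pixel_ratio(gray, threshold=50):
    """Fraction of pixels whose central-difference gradient magnitude exceeds threshold.

    gray: 2D list of intensities. Detector and threshold are illustrative choices.
    """
    h, w = len(gray), len(gray[0])
    edges = 0
    for y in range(h):
        for x in range(w):
            # Clamp neighbours at the border so every pixel has a gradient.
            gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges += 1
    return edges / (h * w)


# Dark 10x10 image with one bright vertical stripe: only the pixels
# flanking the stripe have a large horizontal gradient.
img = [[255 if x == 4 else 0 for x in range(10)] for _ in range(10)]
print(edge_pixel_ratio(img))
```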


20

Average of the intensity

[Three example images] AvgIntensity: 21.90, AvgIntensity: 243.67, AvgIntensity: 123.96



21

Entropy of the intensity

Intensity entropy: 1.5408 Intensity entropy: 7.8799



22

Variance of the intensity

Intensity variance: 506 Intensity variance: 14676



23

Compositional edge-pixel ratio



24

Hue and Saturation



25

Weibull-Distribution Image Contrast

• Why

Feature extraction – Statistical part


26

Cognitively-inspired features

Model of Saliency-Based Visual Attention

• Attention appears to influence visual processing even in the earliest areas of the primate visual cortex

• This influence seems to shape an integrated saliency map

• This map is a representation of the environment that weighs every input by its local feature contrast and its current behavioral relevance

• It enables the visual system to integrate a large amount of information

Feature extraction - Cognitive part


27

Itti, Koch and Niebur’s Model



28

Example of saliency map

[Figure: original image; its color, orientation, and intensity maps plus the extra skin map; and the resulting saliency map]



29

What do we have

• Important visual features about the style of the photo in this image:
  – The subject of the portrait is not exactly in the middle
  – The subject is a human
  – The subject is standing still
  – Colors are quite uniform, and there are not many of them
• But how can all the different maps be used to represent this information?

[Figure: saliency map, skin map, color map, orientation map, intensity map]



30

Cognitively-inspired features (1)

• Shannon entropy of the 5 different maps (the saliency and the conspicuity ones)

• Standard deviation of the saliency distribution in the saliency map

• Location of the three most salient points

• Skin intensity



31

Cognitively-inspired features (2)

• The locations are computed using the inhibition-of-return (IOR) procedure:

[Figure: original saliency map, the map after the first inhibition, the map after the second inhibition, and the 3 most salient locations]
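Inhibition of return can be sketched as "pick the maximum, suppress its neighbourhood, repeat". In the sketch below the square suppression window and its radius are assumptions; the slides do not state the shape or size actually used.

```python
def most_salient_points(saliency, count=3, radius=1):
    """Return count (row, col) maxima, suppressing a square window after each pick."""
    sal = [row[:] for row in saliency]  # work on a copy, leave the input intact
    h, w = len(sal), len(sal[0])
    points = []
    for _ in range(count):
        r, c = max(((y, x) for y in range(h) for x in range(w)),
                   key=lambda p: sal[p[0]][p[1]])
        points.append((r, c))
        # Inhibition of return: zero out the neighbourhood of the winner.
        for y in range(max(r - radius, 0), min(r + radius + 1, h)):
            for x in range(max(c - radius, 0), min(c + radius + 1, w)):
                sal[y][x] = 0.0
    return points


# Made-up 5x5 saliency map.
saliency = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.8, 0.0, 0.3],
    [0.1, 0.2, 0.1, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0],
]
print(most_salient_points(saliency))
```

Note that the second-highest raw value (0.8) is not selected, because it lies inside the inhibited window of the first winner.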



32

Cognitively-inspired features (3)

• Skin is an extra channel (not standard in Itti’s model), but it proved to be very informative
• It can easily be used to detect nude images (which are quite popular among deviantART’s professional photographers)

[Figure: two original images, each shown next to its skin map]



33

OpenCV face detector



34

Classification

• Given a set of features, classification is used to:
  – Determine whether two artists/categories are distinguishable
  – Determine which features are useful for doing so
• Different classifiers are available in the toolkit:
  – k-Nearest Neighbour (kNN)
  – Naive Bayes (NB)
  – Nearest Mean (NM)
  – Support Vector Machine (libSVM)
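Of the classifiers listed, Nearest Mean is the simplest to sketch: each class is represented by the mean of its feature vectors, and a new image goes to the closest mean. The function names and the toy data are illustrative, not the toolkit's implementation.

```python
def train_nearest_mean(samples, labels):
    """Return {label: mean feature vector}."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_nearest_mean(means, x):
    """Assign x to the class whose mean is closest (squared Euclidean distance)."""
    return min(means, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, means[y])))


# Made-up, already normalised two-feature vectors for two artists.
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
y = ["artistA", "artistA", "artistB", "artistB"]
means = train_nearest_mean(X, y)
print(predict_nearest_mean(means, [0.3, 0.3]))
```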


35

Classification

• Pre-processing functions:
  – Reading in XML files and creating a dataset
  – Normalization
  – Dataset filtering on classes and features
  – Parameter optimization using cross-validation
• Current classification capabilities:
  – 1 class against another class
  – 1 class against all other classes


36

Classification

• Feature selection is needed when dealing with a lot of features
  – Reduces the dimensions of the data representation
  – Gives the feature combination that best separates a class
• Sequential forward feature selection
  – First select the most informative feature, then iteratively add the next most informative feature to it
  – Criterion is based on the inter-intra distance
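Sequential forward selection can be sketched as a greedy loop. The exact inter-intra criterion used in the project is not given on the slides; the version below (squared distance of class means to the overall mean, divided by the summed within-class variance) is one plausible form and is an assumption.

```python
from statistics import mean, pvariance


def inter_intra(X, y, feats):
    """Between-class spread of the class means divided by within-class scatter."""
    classes = sorted(set(y))
    inter = intra = 0.0
    for f in feats:
        cols = {c: [x[f] for x, lab in zip(X, y) if lab == c] for c in classes}
        overall = mean(x[f] for x in X)
        inter += sum((mean(v) - overall) ** 2 for v in cols.values())
        intra += sum(pvariance(v) for v in cols.values())
    return inter / intra if intra else float("inf")


def forward_select(X, y, k):
    """Greedily add the feature that maximises the criterion of the selected set."""
    selected = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: inter_intra(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 0 is noise, feature 1 separates strongly, feature 2 weakly.
X = [[0.5, 0.1, 0.4], [0.4, 0.2, 0.5], [0.5, 0.9, 0.6], [0.4, 0.8, 0.7]]
y = ["a", "a", "b", "b"]
print(forward_select(X, y, 2))
```

The feature indices are returned in the order in which they were selected, so the first entry is the single most informative feature under this criterion.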


37

Classification

• Evaluation measures:
  – Precision
    • The percentage of images classified as positive that were indeed positive
  – Recall
    • The percentage of all positive images that were classified as positive
  – F1-measure
    • The harmonic mean of precision and recall
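The three measures can be computed from the counts of true positives, false positives, and false negatives. This is a small sketch with made-up labels; F1 is computed as the harmonic mean of precision and recall, which is its standard definition.

```python
def precision_recall_f1(true_labels, predicted, positive):
    """Return (precision, recall, F1) for the class named by positive."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up 1-vs-1 result for two artists, A being the positive class.
truth = ["A", "A", "A", "A", "B", "B", "B", "B"]
guess = ["A", "A", "A", "B", "A", "B", "B", "B"]
print(precision_recall_f1(truth, guess, "A"))
```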


38

Visualization

• Purpose of the visualization:
  1. Visualize the dataset
     • Find patterns
     • Analyse classification results
     • Filtering (relevant information)
     • Input: dataset images (thumbnails + full size) & XML feature files
       – Converted to a single tab-separated file
  2. Express the classification performance
     • Capture the performance in one graph
     • Input: performance output of the classifier


39

Visualization

• Use an existing visualization application?
  – Mondrian, a general-purpose statistical data-visualization system
    http://rosuda.org/mondrian/


40

Visualization

• Use an existing visualization application?
  – XmdvTool, interactive visual exploration of multivariate data sets
    http://davis.wpi.edu/~xmdv/
  – Flat version of the data set


41

Visualization

• Use an existing visualization application?
• Tools with generic uses produce only generic displays
• Data can take many interesting forms
  – These require unique types of display and interaction
  – Not captured by general applications
• UI not intuitive (lacks an easy way to filter data)
• (These tools also look outdated)


42

Visualization

• What language/framework for our visualization?
• There are many…
• Prefuse visualization toolkit (generic displays)
• Adobe Flash/Flex (expensive, slow for large datasets)


43

Visualization

• (Partially) implemented in “Processing”
  – Open-source programming language for creating images, animations, and interactions
  – Built on top of Java (collection of Java classes)
  – Consists of:
    • Processing Development Environment (PDE) (very minimalistic)
    • A collection of commands (API)
    • Several libraries that support more advanced features (OpenGL, XML)
  – Easy to integrate into Java (Eclipse)


44

Visualization: Processing

• Provides functions that make life easier
  – image(img, x, y, [width, height])
  – line(x1, y1, x2, y2)
  – stroke(color)
  – No functions to draw complete graphs/plots
• Right combination of cost, ease of use and speed
• Export the application as a Java applet
  – Run it on a website
  – Use URLs instead of images to avoid legal issues


46

Experiments & Results


47

Experiment #1 – Classification

• Goal:
  – Use the toolkit to find which features best separate two artists
• Details of the experiment:
  – The experiment was performed for all artists in the dataset
  – The feature selection algorithm was used to output the 1 to 5 most informative features
  – Evaluation was done using the F-measure


48

Selecting the classifier

• Select the classifier for the experiment
  – Train all the classifiers on a subset of the training data, using cross-validation to optimize parameters
  – Selection criterion: F-measure
  – SVM gives the highest F-measure

Average F-measure, 1-vs-1 classification over all artists:

Classifier      Average F-measure
kNN             0.7644
Naive Bayes     0.8157
Nearest Mean    0.7383
Linear SVM      0.8278


49

Result Matrix using the top 1 feature


50

Result Matrix using top 2 features


51

Result Matrix using top 3 features


52

Result Matrix using the top 4 features


53

Result Matrix using the top 5 features


54

Result Matrix using all features


55

Visualization Case (1)

• Artist pair: Kitsunebaka91 and LALAax
  – F-measure pair: 0.952941 and 0.884615
  – Features: medIntCells_2, gridEdgeRatio_4
• Artist pair: fediaFedia and gsphoto
  – F-measure pair: 0.867347 and 0.938095
  – Features: avgHue, intVariance


56

Visualization Case (2)

• Artist pair: K1lgore and sekcyjny
  – F-measure pair: 0.692308 and 0.640000
  – Features: avgBCells_3, salMapCEntropy
• Artist pair: stereoflow and zihnisinir
  – F-measure pair: 0.649007 and 0.683871
  – Features: avgHueCells_4, avgR


57

Results

Number of occurrences in the top 5 features (F-measure > 0.9):

Feature                                 Occurrences
EdgeRatio in the center                 13
EdgeRatio over the entire image         8
Average Hue in the center               5
Saliency Map Skin Entropy               7
Entropy of the Intensity                5
Average R in the lower right corner     4
EdgeRatio in the center right           3


58

Results

Number of occurrences in the top 5 features (F-measure > 0.9):

Name             Occurrences
Kitsunebaka91    12
Pierrebfoto      10
One_Vox          2
gsphoto          2
sekcyjny         2


59

Experiment #2a - Global network results

• Goal:
  – Describe the professional network of watcher connections
• Results:
  – 103’663 unique artists
  – 4’483’023 connections
  – Average degree: 43.25
  – Fraction of reciprocal links: 17.65%
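The two summary numbers can be sketched on a toy network. The average degree is taken here as connections divided by artists, which reproduces the reported 43.25; the reciprocity definition (share of arcs whose reverse arc also exists) is an assumption, since the slides do not spell it out.

```python
def network_stats(arcs):
    """arcs: iterable of directed (watcher, watched) pairs.

    Returns (average degree, fraction of reciprocal links).
    """
    arcs = set(arcs)
    nodes = {n for arc in arcs for n in arc}
    reciprocal = sum((b, a) in arcs for a, b in arcs)
    return len(arcs) / len(nodes), reciprocal / len(arcs)


# Made-up network: a and b watch each other, a watches c, c watches d.
arcs = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
avg_degree, reciprocity = network_stats(arcs)
print(avg_degree, reciprocity)

# With the totals from the slide: connections per artist.
print(round(4483023 / 103663, 2))
```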


60

Experiment #2b - Core network results

• Goal:
  – Find a core of highly connected users
• Algorithm:
  – Recursively remove all nodes with degree < N
• Results:
  – Out-degree, N = 44: 1471 nodes
  – In-degree, N = 43: 1701 nodes
  – In+Out-degree, N = 185: 1099 nodes
  – Triple occurrence: 14
  – Co-occurrence matrix (diagonal entries are the core sizes):

              In      Out     In+Out
    In        1701    541     54
    Out       541     1471    286
    In+Out    54      286     1099
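The pruning step can be sketched as a loop that recomputes degrees after every removal round, since dropping one node can push its neighbours below the threshold. The function name and the mode argument are illustrative, not the project's code.

```python
def prune_core(arcs, n, mode="out"):
    """Repeatedly drop nodes whose degree is below n until none remain.

    mode: "out", "in", or "both" (in-degree plus out-degree).
    Returns the set of surviving nodes.
    """
    arcs = set(arcs)
    nodes = {v for arc in arcs for v in arc}
    while True:
        degree = dict.fromkeys(nodes, 0)
        for a, b in arcs:
            if mode in ("out", "both"):
                degree[a] += 1
            if mode in ("in", "both"):
                degree[b] += 1
        weak = {v for v in nodes if degree[v] < n}
        if not weak:
            return nodes
        nodes -= weak
        # Keep only arcs between surviving nodes, then re-check degrees.
        arcs = {(a, b) for a, b in arcs if a in nodes and b in nodes}


# Made-up network: a, b, c watch each other; d and e hang off the triangle.
arcs = [("a", "b"), ("a", "c"), ("b", "a"), ("b", "c"), ("c", "a"), ("c", "b"),
        ("d", "a"), ("d", "e"), ("e", "a")]
print(sorted(prune_core(arcs, 2, "out")))
```

In this example e is removed first (out-degree 1), which lowers the out-degree of d to 1, so d is removed in the next round.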


61

Core network


62

Future work

• More features
  – Including emotional features (color and texture)
• More network information
  – Using the core network as a basis for a new dataset (ongoing)
  – More links, not only watchers (hierarchy)
• Incorporating time
• Using classifiers to make recommendations


63

Questions
