
A Learned Dictionary Model for Texture Classification

Clara Fannjiang [email protected]

1. Abstract

2. Introduction

Biological visual systems process incessant streams of natural images, and have done so since organisms first developed vision. To capture and interpret visual signals with such high throughput, it makes sense that visual systems have unearthed clever schemes to encode images cheaply. By recognizing statistical patterns that are inherent to natural scenery, visual systems can represent complex images by triggering just a handful of underlying patterns ([2], [8]) from a dictionary of possible patterns. Inspired by Nature's sparse coding strategy, we develop a novel method for texture classification by learning dictionaries that provide sparse models for each texture class. Unknown textures are then classified based on which dictionary provides the most accurate sparse representation.

More formally, let D ∈ R^{n×k} be a dictionary whose columns are k prototypical patterns or atoms. We can represent a signal x ∈ R^n as a sparse linear combination of atoms, such that x = Ds for an exact representation, or x ≈ Ds, in the sense that ||x − Ds||_2 ≤ ε, for an approximation. The dictionary can be overcomplete, where k > n and there are more explanatory patterns than signal components, or underdetermined, where k < n. Overcomplete models are more robust to noise and can capture more elaborate structures of a signal; on the other hand, the compactness of underdetermined models may distinguish between different textures more clearly.
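To make the notation concrete, the following is a minimal numpy sketch (our illustration, not from the paper; the dimensions, random dictionary, and chosen support are assumptions) of a signal that admits an exact sparse representation in an overcomplete dictionary:

```python
import numpy as np

# Toy setting: n = 8 signal components, k = 16 atoms (overcomplete, k > n).
rng = np.random.default_rng(0)
n, k = 8, 16
D = rng.standard_normal((n, k))
D /= np.linalg.norm(D, axis=0)           # normalize atoms to unit length

s = np.zeros(k)
s[[2, 7, 11]] = rng.standard_normal(3)   # only 3 nonzero coefficients
x = D @ s                                # x = Ds: an exact sparse representation

print(np.count_nonzero(s))               # 3
print(np.linalg.norm(x - D @ s))         # 0.0 (exact)
```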

For an overcomplete dictionary, note that there are infinitely many representations s. The principle of sparse coding is to constrain s to be the sparsest representation, i.e., the one with the fewest nonzero components. Such a representation captures the signal most cheaply for us, as we need only store the nonzero components. It is given by

$\arg\min_s \|s\|_0 \;\;\text{s.t.}\;\; x = Ds \qquad (1)$

for an exact representation, or

$\arg\min_s \|s\|_0 \;\;\text{s.t.}\;\; \|x - Ds\|_2 \le \varepsilon \qquad (2)$

for an approximation, where the so-called "ℓ_0 norm" ||·||_0 gives the number of nonzero components of its argument. A related formulation, which caps the number of nonzero components of s at c, is given by

$\arg\min_s \|x - Ds\|_2 \;\;\text{s.t.}\;\; \|s\|_0 \le c \qquad (3)$

These are combinatorial optimization problems and are NP-hard. Approximation schemes include the greedy matching pursuit and orthogonal matching pursuit (OMP; [13]) algorithms, as well as convex relaxations that replace the ℓ_0 norm with an ℓ_1 norm.
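As a concrete illustration of the greedy approach, here is a minimal numpy sketch of OMP for problem (3). It is a simplification for exposition, not the exact algorithm of [13] that we use later (reproduced in Fig. 2), and it assumes the atoms of D have unit norm:

```python
import numpy as np

def omp(D, x, c):
    """Greedy sketch of OMP for problem (3): repeatedly pick the atom most
    correlated with the residual, then least-squares fit on the chosen support."""
    k = D.shape[1]
    support = []
    residual = x.astype(float).copy()
    coeffs = np.zeros(0)
    for _ in range(c):
        j = int(np.argmax(np.abs(D.T @ residual)))     # best-matching atom
        if j not in support:
            support.append(j)
        # Orthogonally project x onto the span of the selected atoms.
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    s = np.zeros(k)
    s[support] = coeffs
    return s
```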

The other variable in problems (1) and (2), which our texture classification model focuses on, is the choice of the dictionary D. If we know our signals x belong to a particular class of textures, how can we design D to adapt to the distinguishing patterns of x? In other words, what set of atoms enables the sparsest representations of a class of textures? Contrary to fixed bases like wavelets, we can indeed adapt dictionaries to particular classes of signals. Algorithms for learning dictionaries include the probabilistic approach in [10], MOD [7], and K-SVD [1]; we focus on implementing K-SVD, as it is the most general framework and the computationally cheapest. If we have a set of training signals {x_i}_{i=1}^m that form the columns of X ∈ R^{n×m}, the learning task is described by the optimization problem

$\min_{D,S} \|X - DS\|_F \;\;\text{s.t.}\;\; \|s_i\|_0 \le c \qquad (4)$

where the sparse representations s_i are the columns of S ∈ R^{k×m} and c is the maximum number of atoms we allow in each sparse representation. The optimization problem is both non-smooth and non-convex in (D, S) due to the sparsity constraint, but the algorithm in [1] has been shown to find a stationary point.
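For orientation, the following is a correspondingly simplified numpy sketch of the K-SVD alternation for problem (4), built on the omp() sketch above. The initialization, iteration count, and handling of unused atoms are our assumptions; the exact procedure appears in [1] (and is reproduced later in Fig. 1).

```python
import numpy as np

def ksvd(X, k, c, n_iter=15, seed=0):
    """Sketch of K-SVD: alternate a sparse coding stage with a dictionary
    update stage that revises one atom at a time via a rank-1 SVD."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    D = rng.standard_normal((n, k))
    D /= np.linalg.norm(D, axis=0)                   # random unit-norm initialization
    for _ in range(n_iter):
        # Sparse coding stage: c-sparse code for every training signal.
        S = np.column_stack([omp(D, X[:, i], c) for i in range(m)])
        # Dictionary update stage.
        for j in range(k):
            users = np.nonzero(S[j, :])[0]           # signals that use atom j
            if users.size == 0:
                continue
            # Residual with atom j's contribution removed, restricted to its users.
            E = X[:, users] - D @ S[:, users] + np.outer(D[:, j], S[j, users])
            U, sig, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]                        # new atom: leading left singular vector
            S[j, users] = sig[0] * Vt[0, :]          # updated coefficients for atom j
    return D, S
```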


Learned dictionaries have proven to match or exceed the state of the art in problems such as image compression, de-noising, and in-painting, which have traditionally been the domains of fixed bases such as wavelets, edgelets, noiselets, etc. To demonstrate the power of learned dictionaries for texture classification, we implement K-SVD to learn dictionaries for different texture classes, then run a simple classification method on test sets of 10 and 30 textures from the Normalized Brodatz Texture Dataset ([12]), a widely used benchmark for texture processing algorithms.

3. Texture Classification

3.1. Past Work

The state of the art in texture classification involves a trade-off between complex but much slower neural networks ([4], [6]), which provide higher classification rates for more texture classes, and simpler SVM-based methods ([9]), which provide somewhat lower classification rates but are much faster. dARTEX, the elaborate neural architecture developed in [4], models how the visual cortex discriminates textures and involves layers for surface-based texture classification, contour-based edge grouping, surface filling-in, spatial attention, and object attention. [6], another neural architecture, uses Gabor filtering over multiple scales for its feature extraction and improves on dARTEX's classification rates slightly. On the other end of the spectrum, [9] presents a multi-class SVM that cannot match the performance of [4] and [6], but still offers classification rates in the mid-90s on the same Brodatz test sets and is far simpler and faster.

We will compare our classification results against [4], [5], [6], and [9] on the same sets of 10 and 30 Brodatz textures. Though our learned dictionary model cannot beat these methods, we highlight that its novelty lies in its simplicity and intuitiveness. Our model is simpler than a multi-class SVM, which must tackle the problem of selecting the right kernel and feature representation: we can train it (i.e., run K-SVD) within about 10 seconds per texture class and run a fast greedy method to classify each new texture within about a second. A learned dictionary is also an intuitive generalization of k-means clustering, where each observation belongs to multiple clusters (atoms) with varying degrees of attachment (given by the sparse coefficients), rather than being assigned to just one cluster. Along with this intuitiveness, our model reaches classification rates in the mid-90s. Unlike [6] and [9], which use fixed dictionaries (multi-scale Gabor filter banks and discrete wavelet transforms, respectively) to extract features, our learned dictionary model performs adequately despite its simplicity because it is explicitly designed to adapt to each texture class.

3.2. Learned Dictionary Models

4. Methods

4.1. K-SVD: Dictionary Learning

To learn a dictionary for a texture class, we train K-SVD as described in [1] (reproduced in Fig. 1) on 1000 16 × 16-pixel patches extracted from an image of the texture. For the "Sparse Coding Stage" of K-SVD, we use OMP as described in [13] (reproduced in Fig. 2).

The two key K-SVD parameters to choose are c in Eq. 4, the number of atoms permitted in the sparse representation, and k, the number of atoms in the dictionary D. When c = 1, K-SVD reduces to the special case of k-means clustering; when c > 1, a texture patch can be expressed as a combination of multiple underlying patterns. Similarly, the redundancy ratio k/n tunes the descriptive power of the model: the more redundant the dictionary, the more complex the patterns that can be captured in a sparse representation. A small k/n, however, creates highly compact models that can discriminate between different textures more easily. We will show classification results across a range of values for both k and c.
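As a sketch of how a per-class dictionary is obtained under these choices, the following extracts random 16 × 16 patches and trains the ksvd() sketch above on them; the patch sampling and vectorization details are assumptions on our part, since the paper specifies only the patch size and count:

```python
import numpy as np

def random_patches(img, n_patches=1000, p=16, seed=0):
    """Sample random p-by-p patches from a grayscale texture image and stack
    them as vectorized columns of a training matrix."""
    rng = np.random.default_rng(seed)
    H, W = img.shape
    cols = []
    for _ in range(n_patches):
        r0 = rng.integers(0, H - p + 1)
        c0 = rng.integers(0, W - p + 1)
        cols.append(img[r0:r0 + p, c0:c0 + p].astype(float).ravel())
    return np.column_stack(cols)          # shape (p*p, n_patches), e.g. (256, 1000)

# Hypothetical usage: one learned dictionary per texture class.
# D_class, _ = ksvd(random_patches(texture_image), k=40, c=4)
```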

Figure 1. The K-SVD algorithm (Fig. 2 in [1]).


Figure 2. The OMP algorithm (Alg. 3 in [13]).

4.2. Classification on the Normalized Brodatz Texture Dataset

We train and test our model on the same set of 10 Brodatz textures used in [4] and [6] (listed on the left-hand side of Fig. 11), and the same set of 30 Brodatz textures used in [5], [6], and [9] (listed on the left-hand side of Fig. 13).

To create disjoint training and testing instances per texture, we follow a similar protocol to [6]: each texture is represented in the Brodatz database as a single 640 × 640-pixel image, which we split into a 3 × 3 grid of 9 non-overlapping 213 × 213-pixel sub-images. For each texture, we train K-SVD on 1000 randomly selected 16 × 16-pixel patches from the central sub-image. We then use the remaining 8 sub-images as test images, so that there are 8 test images per texture. The training and testing pipeline proceeds as follows:

• Inputs: training patches for each texture class, K-SVD parameters k and c, and a test image.

• Training: For each texture class, train K-SVD on representative patches of the texture to obtain a learned dictionary D.

• Testing: Extract patches from the test image. For each patch x, run OMP to obtain a c-sparse representation s from each learned dictionary. Classify the test image as the texture whose dictionary provides the smallest average reprojection error ||x − Ds||_2 across all patches (sketched below).
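A minimal numpy sketch of this classification rule, reusing the omp() sketch from Section 2 (how the test patches are extracted, e.g. densely or at random, is an assumption we leave open):

```python
import numpy as np

def classify_image(test_patches, class_dicts, c):
    """Assign a test image, given as a matrix of vectorized patches (one per
    column), to the class whose dictionary yields the smallest mean
    reprojection error over all patches."""
    mean_errors = []
    for D in class_dicts:                            # one learned dictionary per class
        errs = [np.linalg.norm(test_patches[:, i] - D @ omp(D, test_patches[:, i], c))
                for i in range(test_patches.shape[1])]
        mean_errors.append(np.mean(errs))
    return int(np.argmin(mean_errors))               # index of the predicted texture
```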

Fig. 3 gives an example of the random dictionary used to initialize K-SVD. As a demonstration, the algorithm can iteratively adapt these random atoms to learn the distinctive patterns of the Brodatz textures D74 and D106 in Figs. 4 and 7, as seen in Figs. 5, 6, 8, and 9. Note how, without any prior knowledge, the smooth, curved edges of the coffee bean texture emerge from noise-like atoms in Figs. 5 and 6, and the distinctive stripes of the mesh texture develop in Figs. 8 and 9.


Figure 3. Random dictionary we use to initialize K-SVD.

5. Classification Results

5.1. 10 Brodatz Textures

Fig. 10 demonstrates how varying the K-SVD parameters k, giving the redundancy of the dictionary, and c, the number of atoms chosen in the sparse representations, affects the classification rate on the 10 Brodatz textures.


Figure 4. Original Brodatz texture D74 (coffee beans).


Figure 5. Dictionary for sparse representation of 16 × 16-pixel patches of Brodatz texture D74 (k = 40, c = 4) after 5 iterations of K-SVD.

The highest classification rate of 96.25% is obtained with k = 64, c = 4. We observe that k needs to be large enough that a dictionary can capture sufficiently elaborate patterns of the texture, but if k is too large, a dictionary starts incorporating patterns induced by noise rather than by the texture itself. These spurious patterns provide no discriminative power between different textures, as shown by the 128-atom dictionary curve, which is outperformed by the 64-atom dictionary.

Similarly, we observe that c needs to be large enough that complex textures can be expressed as combinations of multiple patterns. However, if c is too large, OMP starts selecting atoms that are less representative of a texture patch. Again, these irrelevant atoms weaken the discriminative power of the model.


Figure 6. Final learned dictionary for sparse representation of 16 × 16-pixel patches of Brodatz texture D74 (k = 40, c = 4) after 15 iterations of K-SVD.

Figure 7. Original Brodatz texture D106 (mesh).

Almost all curves decline in performance once c grows too large.

Fig. 11 shows the confusion matrix of the best learned dictionary model on the 10 Brodatz textures, where the row label gives the true texture and the column label gives the predicted texture. Our model misclassifies only 3 test images, one each of herringbone, canvas, and jeans, as raffia, and perfectly classifies the rest.

Tab. 1 compares the learned dictionary model's classification rate to those of other methods. Though it cannot match their performance, it still reaches the mid-90s, which is satisfying given the model's simplicity.



Figure 8. Dictionary for sparse representation of 16 × 16-pixel patches of Brodatz texture D106 (k = 40, c = 4) after 5 iterations of K-SVD.

Classification Method        Classification Rate
Dictionary Learning Model    0.9625
Bhatt et al. [4]             0.9810
Diaz-Pernas et al. [6]       0.9956

Table 1. Comparison of the texture classification methods in [4] ("1 Texture/Image with attention"), [6], and our learned dictionary model (k = 64, c = 4) on 10 Brodatz textures.

5.2. 30 Brodatz Textures

Fig. 12, Fig. 13, and Tab. 2 provide information analogous to Fig. 10, Fig. 11, and Tab. 1 on the larger set of 30 Brodatz textures.

We observe similar trends: Fig. 12 reveals that it is important for k and c to be neither too large nor too small, and Tab. 2 shows that though we cannot compete with more complex methods, we can still reach classification rates in the 90s. Fig. 13 reveals that much of our misclassification occurs among 3 different sand textures, which is reasonable given their inherent similarity.

6. Conclusion

We have developed a simple, novel model for texture classification based on learned dictionaries for sparse representations of textures. By tuning the parameters k and c of the dictionary learning algorithm K-SVD, which control the redundancy of the dictionary and the sparsity of the representations, we can control the discriminative power of our model.


Figure 9. Final learned dictionary for sparse representation of 16 × 16-pixel patches of Brodatz texture D106 (k = 40, c = 4) after 15 iterations of K-SVD.

Classification Method        Classification Rate
Dictionary Learning Model    0.9250
Chen et al. [5]              0.9572
Li et al. [9]                0.9634
Diaz-Pernas et al. [6]       0.9890

Table 2. Comparison of the texture classification methods in [5], [6], [9], and our learned dictionary model (k = 32, c = 2) on 30 Brodatz textures.

We have evaluated the model on sets of 10 and 30 textures from the Brodatz database, a widely used benchmark for texture processing algorithms. Though it cannot match the performance of much more complex methods, it can still reach classification rates in the mid-90s. Given the simplicity and intuitiveness of the model and classification method, we consider this a success.

There are many extensions that can be explored for the learned dictionary model. Learning dictionaries at different scales of the textures, for example, could simulate the multi-scale feature extraction highlighted in [6]. We also intend to replace K-SVD with a hierarchical learning algorithm such as that of [3], which would provide dictionary models invariant to rotation and scaling.

References

[1] Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Sig. Proc., 54:4311-4322.


Figure 10. Classification rate of the learned dictionary model on 10 Brodatz textures for k ∈ {4, 8, 16, 32, 64, 128} and c ∈ {1, 2, 4, 8, 16}. (Plot: classification rate vs. number of chosen atoms c, one curve per dictionary size k.)

[2] Atick, J. J. & Redlich, A. N. (1992). What does the retina know about natural scenes? Neural Computation, 2:308-320.

[3] Bar, L. & Sapiro, G. (2010). Hierarchical dictionary learning for invariant classification. ICASSP 2010.

[4] Bhatt, R., Carpenter, G., & Grossberg, S. (2007). Texture segregation by visual cortex: Perceptual grouping, attention, and learning. Vision Research, 47(25): 3173-3211.

[5] Chen, X.W., Zeng, X., & van Alphen, D. (2006). Multi-class feature selection for texture classification. Pattern Recognition Letters, 27(14): 1685-1691.

[6] Diaz-Pernas, F. J. et al. (2009). Texture classification of the entire Brodatz database through an orientational-invariant neural architecture. IWINAC 2009, 294-303.

[7] Engan, K., Rao, B. D., & Kreutz-Delgado, K. (1999). Method of optimal directions for frame design. IEEE Int. Conf. Acoust., Speech, Signal Process., 5:2443-2446.

[8] Field, D.J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4:2379-2394.

[9] Li, S., Kwok, J.T., Zhu, H., & Wang, Y. (2003). Texture classification using the support vector machines. Pattern Recognition, 36(12): 2883-2893.

Figure 11. Confusion matrix of the best learned dictionary model (k = 64, c = 4) on 10 Brodatz textures, where there are 8 test images per texture class. (Classes, in order: D9 (grass), D17 (herringbone), D19 (weave), D21 (wool), D52 (canvas), D57 (paper), D68 (wood), D77 (oriental cloth), D82 (jeans), D84 (raffia).)

[10] Olshausen, B.A. & Field, D.J. (1997). Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311-3325.

[11] Peyre, G. (2009). Sparse modeling of textures. Journal of Mathematical Imaging and Vision, 34:17-31.

[12] Safia, A. & He, D. (2013). New Brodatz-based Image Databases for Grayscale Color and Multi-band Texture Analysis. ISRN Machine Vision, 2013: Article ID 876386.

[13] Tropp, J.A. & Gilbert, A.C. (2007). Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory, 53:4655-4666.


Figure 12. Classification rate of the learned dictionary model on 30 Brodatz textures for k ∈ {4, 8, 16, 32} and c ∈ {1, 2, 4, 8}. (Plot: classification rate vs. number of chosen atoms c, one curve per dictionary size k.)


Figure 13. Confusion matrix of the best learned dictionary model (k = 32, c = 2) on 30 Brodatz textures, where there are 8 test images per texture class. (Classes: D101 (cane), D104 (netting B), D11 (wool cloth), D16 (herringbone A), D17 (herringbone B), D1 (woven wire A), D20 (canvas A), D21 (canvas B), D24 (leather), D28 (sand A), D29 (sand B), D32 (sand C), D34 (netting A), D35 (lizard skin), D3 (reptile skin), D46 (patterned cloth A), D47 (patterned cloth B), D49 (bamboo mat), D51 (raffia A), D52 (straw cloth A), D53 (straw cloth B), D55 (straw matting A), D56 (rattan A), D57 (paper), D65 (rattan B), D6 (woven wire B), D78 (mesh), D82 (straw cloth C), D84 (raffia B), D85 (straw matting B).)