
Grayscale Image Colorization Using Machine Learning Techniques

Zachary Frenette [email protected]

University of Waterloo, 200 University Ave W, N2L 3G1

Abstract

From modern medical imaging to antique photography, there exist vast amounts of illustrations and photographs that lack color information. Adding color to these images could help improve both their visual appeal and their expressiveness. Given that the majority of colorization methods rely heavily on user interaction, we explore how machine learning techniques can be applied to the colorization of grayscale images, and then analyze the limitations of each of these techniques.

1. Introduction

Image colorization can be described as the process of assigning colors to the pixels of a grayscale image. This problem is ill-posed in the sense that, without prior information regarding the image, there is often more than one possible colorization. In other words, the colors of an object cannot usually be distinguished from one another by simply looking at the grayscale component of the object. For example, all other things being equal, a red balloon would most likely look the same as a green balloon in a grayscale image. Because of this property, automatic image colorization is a very challenging task.

Many of the current methods used for grayscale image colorization rely heavily on user interaction. Normally, this is achieved by professional artists who use software to manually adjust the colors, brightness, contrast and exposure of the image (Pitié et al., 2008). Not only is this an expensive procedure, but it is also very time-consuming. On the other hand, the semi-automatic algorithms that exist for image colorization all suffer from a variety of limitations. These limitations range from a lack of robustness to a requirement for some of the data to be manually processed. For example, there has been some work done for the case where the user provides the colors of a few regions before having an algorithm propagate this color information to the rest of the image (Levin et al., 2004). Similar work has been done in which the user plays a more interactive role by manually coloring some of these regions in between propagation steps (Charpiat et al., 2010). There has also been some work done in the context of fully-automatic algorithms, though the proposed algorithms only seem to work well when the image has few colors (Ashikhmin et al., 2002). In addition, several discrepancies can typically be observed in the resulting images.

Project report for CS886: Applied Machine Learning. University of Waterloo, Fall 2014.

Recently, machine learning techniques have been employed in the colorization process of grayscale images (Charpiat et al., 2010; Liu and Zhang, 2012). In this project, we compare the performance of some of these machine learning methods, and then analyze their respective limitations. Section 2 discusses some of the models that are required in order to present the colorization methods that we consider. In sections 3 and 4, we formulate image colorization as a machine learning problem and describe the dataset that will be used throughout this project. In sections 5 and 6, we provide a discussion of the methods employed as well as the results obtained, while in section 7, we finish with some concluding remarks and potential directions for future work.

2. Preliminaries

In this section, we define several of the models that will be used in the development of our image colorization algorithms.

2.1. The Spatial Image Model

There are several different ways of thinking about the representation of an image. Intuitively, we can think of an image as a function f : U → C, where U ⊂ ℝ² is a subset of the plane and C is a color space. However, from a computational perspective, this may not be the ideal representation of an image. Instead of trying to encode a continuous subset of ℝ², we will encode a discrete subset U′ of U. More specifically, we will consider a natural discretization in the spatial image model (Velho et al., 2008). In this model, we assume that the domain of f is a rectangle U = [a, b] × [c, d], and for some fixed values of δ_x and δ_y, we apply the following discretization:

$$U' = \{(x_j, y_k) \in U : x_j = j\delta_x,\ y_k = k\delta_y \text{ where } j, k \in \mathbb{Z}\}.$$

Here, U′ is an orthogonal lattice of points, where each point (x_j, y_k) is called a pixel. Using this definition of U′ yields a natural matrix representation of the image. More specifically, we can define a matrix I ∈ C^{m×n} where each element I_{j,k} = f(x_j, y_k). In other words, the value of each element in our matrix is simply equal to the color of the corresponding pixel. Given that images on a computer are typically encoded using this type of representation, we will adopt it throughout the remainder of this project.
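
To make the matrix representation concrete, the following minimal NumPy sketch loads an image into exactly this form; the file name is a hypothetical placeholder.

```python
import numpy as np
from skimage import io

# An image as a matrix: entry I[j, k] holds the color f(x_j, y_k) of the
# pixel in row j and column k. For an RGB image this is an m x n x 3 array.
I = io.imread("example.jpg")  # hypothetical file path
m, n = I.shape[:2]
print(f"Image discretized as an {m} x {n} lattice of pixels")
print("Color of pixel (0, 0):", I[0, 0])
```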

2.2. The LAB Color Model

Images on a computer are typically represented in the RGB color model, in which the color of each pixel is determined by a 3-tuple ρ = (r, g, b) denoting the red, green, and blue components of that color respectively. However, the RGB model is a device-dependent color model, and therefore some of the color points that are represented in this space may not be absolute. Hence, the same point may produce different colors on different devices. Furthermore, this color model was optimized for performing operations on devices and therefore is not representative of human perception (Velho et al., 2008). Because of these properties, we will instead choose to represent images in the LAB color model (Charpiat et al., 2010). The colors in this model consist of 3 different components, the first of which represents luminance while the other two orthogonal components store explicit color information.

Not only is the LAB color model device independent, but it was also designed in a way that approximates the human perception of brightness and color. In particular, the Euclidean distance between two colors in this model approximates the difference in perceived color, which provides a natural distance metric for measuring the similarity between two colors. It is worth noting that the color gamut of this model is larger than the color gamut of human vision. This means that there are points in this space that do not correspond to any of the colors that are perceivable by the human eye. For additional details regarding the different characteristics of these two color models, we refer the interested reader to the work of Hunt (Hunt, 2005).
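
As a brief illustration of this property, the sketch below converts an image to LAB with scikit-image and uses the Euclidean distance between two pixels as a perceptual similarity measure; the file path and pixel coordinates are hypothetical.

```python
import numpy as np
from skimage import io
from skimage.color import rgb2lab

# Convert an RGB image to the LAB color model. Channel 0 (L) is luminance;
# channels 1 and 2 (a, b) carry the explicit color information.
rgb = io.imread("example.jpg") / 255.0  # hypothetical path; floats in [0, 1]
lab = rgb2lab(rgb)

# The Euclidean distance between two LAB colors approximates the
# perceived difference between them.
c1, c2 = lab[0, 0], lab[10, 10]
perceptual_distance = np.linalg.norm(c1 - c2)
print(perceptual_distance)
```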

2.3. Markov Random Fields

A Markov random field is a graphical model which has the structure of an undirected graph. The vertices of this graph correspond to random variables while the edges model the conditional independencies between them. More formally, a Markov random field is a tuple M = (V, F, Λ, N), where V = {v_1, . . . , v_n} is a set of vertices, F = {F_i : i ∈ V} is a set of random variables, Λ is a set of labels, and N : V → 2^V is a neighborhood function (Bishop, 2006). The set Λ contains a label for each possible outcome of the random variables contained in F. In particular, we will use the notation λ_i ∈ Λ to denote the label assigned to the random variable F_i. Furthermore, we will use the notation F_S = {F_i : i ∈ S} and Λ_S = {λ_i : i ∈ S} to denote specific sets of random variables and labels respectively.

A Markov random field must also satisfy the Markov property. That is, for any particular realization of the random variables, we have that:

$$P(F_i = \lambda_i \mid F_{V \setminus \{i\}} = \Lambda_{V \setminus \{i\}}) = P(F_i = \lambda_i \mid F_{N(i)} = \Lambda_{N(i)}).$$

In other words, F_i is independent of every other random variable given its neighbors. Markov random fields are often used to model labeling problems, where some particular labeling of the vertices is desired. In our case, we will be interested in labeling the pixels of a grayscale image with colors from our color space C. From an algorithmic point of view, this desired labeling is obtained by trying to find an assignment of colors that minimizes an appropriate energy function E(Λ_V). Markov random fields have many other useful properties, though not all of them will be needed for this project (Wang et al., 2013). The properties that we do need will be discussed in section 5, when the details of the machine learning algorithms are presented.

3. Machine Learning Formulation

There are several differences between the traditional supervised learning paradigm and the one we will adopt for this project. Under typical circumstances, we are given a fixed training set where the training examples are sampled independently from some underlying distribution D. This set is then processed by some learning algorithm before it can be used to classify new data points, which are also sampled independently from this same distribution. In our case, we are given a collection of colored images I = {I_1, . . . , I_k} as well as a grayscale image I′, and the goal is to assign a color from C to each of the pixels of I′. It is worth observing that I plays the role of our training set while the pixels of I′ represent the new data points that we wish to label and classify.

Unlike in the traditional supervised learning paradigm, our training set I is of small size and its content varies depending on the grayscale image I′. More specifically, our training set is chosen to be a small set of colored images which are all similar to the grayscale image that we are trying to colorize. For our project, the training set will consist of a single colored image, though it is possible to consider cases where the training set contains several images. As a result, for each of the grayscale images that we want to colorize, a new training set needs to be chosen and the learning algorithm needs to be re-executed.

4. Dataset Used

Our dataset will primarily consist of a subset of the colored images made available by Jegou, Douze and Schmid from their work on image retrieval (Jegou et al., 2008). More specifically, this dataset comprises a wide variety of outdoor images, some of which include natural scenery, man-made objects, animals, lakes and waterfalls. Furthermore, this collection contains images that have been rotated, and that contain changes in perspective or illumination. These changes will help us test the robustness and limitations of the machine learning algorithms we apply. Lastly, it is worth noting that, although primarily outdoor scenes, this dataset contains a handful of indoor images as well.

For testing purposes, we will select a small subset of the images and transform them into their grayscale counterparts. Although in practice we will not know the true colorings of these images, doing so facilitates the task of measuring the error and performance of each machine learning algorithm. Reasonable error metrics for the case in which the true colorization of an image is not known still require additional research.

5. Description of Methods Used

In this section, we describe each of the methods and steps used in our image colorization algorithms.

5.1. Preprocessing Step

In order to extract a meaningful set of features and facilitate training, each image in our training set will be preprocessed in two ways. As described in section 2.2, we begin by converting the representation of each image from the RGB color model to the LAB color model. This conversion is accomplished through a series of non-linear transformations, although the details of said transformations are not important for this project (Hunt, 2005).

Next, we reduce the size of the color space of each image to a manageable subset of colors through a process called color space quantization (Charpiat et al., 2010; Velho et al., 2008). A typical image in our dataset has tens of thousands of different colors, many of which only appear in a small handful of pixels. Hence, reducing the size of our color space will not only help eliminate outliers, but also allow us to work with significantly fewer prediction classes. Using an algorithm like k-means, we can cluster groups of contiguous pixels having similar colors into k different bins (Bishop, 2006). By assigning a color to each of the bins, we can then recolor the image using only k different colors. It is worth noting that we only perform quantization on the (a, b)-components of each pixel, since the luminance component does not store explicit color information. For this project, we will consider k = 16 different color classes.
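
A minimal sketch of this quantization step, assuming a LAB image as produced above and scikit-learn's k-means implementation; the function name and defaults are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_ab(lab_image, k=16, seed=0):
    """Quantize the (a, b) components of a LAB image into k color bins,
    leaving the luminance channel untouched (illustrative sketch)."""
    h, w, _ = lab_image.shape
    ab = lab_image[..., 1:].reshape(-1, 2)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(ab)
    # Recolor: each pixel's (a, b) pair is replaced by its bin's centroid.
    quantized = lab_image.copy()
    quantized[..., 1:] = km.cluster_centers_[km.labels_].reshape(h, w, 2)
    return quantized, km.labels_.reshape(h, w)  # per-pixel color class
```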

5.2. Feature Extraction

In essence, we are interested in labeling pixels with their appropriate color. Hence, the features that we select should reflect the properties of the pixels instead of the entire image. However, features on individual pixels do not convey much information. Instead, for a given pixel ρ of the image, we will extract features from a δ × δ window centered at ρ. This will allow us to obtain information about the local neighborhood of ρ. As a result, we will be interested in 4 classes of features over this δ × δ window: SURF descriptors (Bay et al., 2006), the magnitude of the 2D Discrete Fourier Transform (DFT), the grayscale histogram, and the localized mean and standard deviation of the intensity. In particular, we will calculate SURF descriptors over three different scales, while the other features are extracted over an 11 × 11 window centered at ρ. This gives us a 763-dimensional feature vector.
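
The sketch below illustrates the window-based portion of this feature vector (DFT magnitude, grayscale histogram, and local intensity statistics); the SURF descriptors, which in practice would come from a separate library, are omitted, and the window and histogram bin sizes are assumptions.

```python
import numpy as np

def window_features(gray, row, col, delta=11, n_bins=16):
    """Features from a delta x delta window centered at pixel (row, col).
    Assumes the window lies fully inside the image; SURF descriptors,
    computed separately over three scales, are omitted from this sketch."""
    r = delta // 2
    patch = gray[row - r:row + r + 1, col - r:col + r + 1].astype(float)
    dft_mag = np.abs(np.fft.fft2(patch)).ravel()                # 2D DFT magnitude
    hist, _ = np.histogram(patch, bins=n_bins, range=(0, 255))  # grayscale histogram
    stats = np.array([patch.mean(), patch.std()])               # local mean / std
    return np.concatenate([dft_mag, hist, stats])
```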

As the dimensionality of our feature vector is quite large, we will apply the Principal Component Analysis (PCA) algorithm in order to reduce it (Bishop, 2006). In particular, we will reduce the dimensionality in such a way that 90% of the variance in our data is retained. For each image, feature vectors will be extracted from a random sample of N pixels, which we choose to be approximately 4% of the pixels in the image.
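
A brief sketch of this reduction step; scikit-learn's PCA accepts a float between 0 and 1 as the fraction of variance to retain, and the feature matrix here is random stand-in data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Reduce the feature dimensionality so that 90% of the variance is
# retained. scikit-learn interprets a float n_components in (0, 1) as
# the fraction of variance to keep.
rng = np.random.default_rng(0)
features = rng.normal(size=(4000, 763))  # stand-in for sampled window features
pca = PCA(n_components=0.90)
reduced = pca.fit_transform(features)
print(reduced.shape)  # (4000, d), with d chosen to reach 90% variance
```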

5.3. Initial Color Prediction Phase

Predicting the colors of the pixels in our grayscale image is done in two phases. In the first phase, we obtain initial estimates for the probabilities of the colors of each pixel. After obtaining these estimates, the image is modeled as a Markov random field, where graph cuts are used in order to obtain a globally spatially coherent labeling of the pixels. This section addresses the first phase of the colorization process, while section 5.4 discusses the application of graph cuts in obtaining a final coloring of the image.

In order to estimate the desired probabilities, we consider two different machine learning models. The first model that we consider is linear logistic regression. Let C′ = {c_1, c_2, . . . , c_k} denote the quantized color space after applying k-means clustering, and let φ_ρ denote the feature vector for pixel ρ. Under this model, we are interested in learning linear decision boundaries that model P(C = c_i | φ_ρ), where c_i is a color in C′. Although simple, linear logistic regression can be parameterized to control regularization and thus should provide a good baseline. It is worth noting that the implementation of linear logistic regression selected uses a one-versus-all approach rather than a true multinomial regression technique. In particular, k different logistic regression models are trained as follows. For each model, the data is divided in such a way that all feature vectors with output class c_i are grouped together as positive instances, while the other feature vectors are treated as negative instances. However, this approach creates a skew between positive and negative instances. Therefore, in order to help minimize bias, examples are sampled inversely proportionally to their class frequencies in the training set.
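
A sketch of this one-versus-all scheme using scikit-learn, assuming X holds the reduced feature vectors and y the quantized color classes; here the inverse-frequency sampling is approximated with scikit-learn's balanced class weighting, which reweights rather than resamples.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_one_vs_all_lr(X, y, k, C=1.0):
    """Train k binary logistic regression models, one per color class.
    class_weight='balanced' counters the positive/negative skew, standing
    in for the inverse-frequency sampling described above."""
    models = []
    for c in range(k):
        clf = LogisticRegression(C=C, class_weight="balanced", max_iter=1000)
        clf.fit(X, (y == c).astype(int))
        models.append(clf)
    return models

def color_probabilities(models, X):
    # Per-class positive probabilities, normalized across the k models
    # to estimate P(C = c_i | phi_rho).
    scores = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
    return scores / scores.sum(axis=1, keepdims=True)
```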

The second model we consider is the support vector machine, which is one of the most popular classifiers used in the literature for image colorization (Charpiat et al., 2010). In this model, we are interested in learning the decision boundaries directly rather than first learning the conditional probability distribution P(C = c_i | φ_ρ). Once again, a one-versus-all approach is taken for classification. That is, we train k support vector machines that perform binary classification, and the training data is divided in a manner that is analogous to what is described for linear logistic regression. To allow for more flexibility, a Gaussian kernel is used to create non-linear boundaries. In addition, as images are noisy by nature, soft-margin classifiers are used in order to allow a small degree of misclassification. Since we are not learning the distribution P(C = c_i | φ_ρ) directly, we will use the distance between a new data point and the margin as a proxy for confidence. These values are then used in the post-processing phase by the graph cut algorithm.
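
The analogous sketch for the SVM model; decision_function returns the signed distance to the separating boundary, which serves as the confidence proxy mentioned above. The hyperparameter values are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

def train_one_vs_all_svm(X, y, k, C=1.0, gamma="scale"):
    """One-versus-all soft-margin SVMs with a Gaussian (RBF) kernel."""
    return [SVC(kernel="rbf", C=C, gamma=gamma).fit(X, (y == c).astype(int))
            for c in range(k)]

def confidence_scores(svms, X):
    # Signed distance to each class's margin; larger values indicate
    # higher-confidence membership in that color class.
    return np.column_stack([m.decision_function(X) for m in svms])
```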

It is worth mentioning that there are other simple machine learning models that can be used to perform multiclass classification, such as nearest neighbors and decision trees (Bishop, 2006). However, the main reason they were not considered as models for image colorization is that there is no straightforward way to obtain probability estimates for the different color classes. We will return to this idea when we discuss possible avenues for future work.

5.4. Post Processing Phase Using Graph Cuts

The colorization process of a grayscale image I′ starts by computing the feature vectors φ_ρ for every pixel in I′. These feature vectors are then passed to one of the models described in the previous section, where probability estimates are derived for each color class. In order to achieve a globally spatially coherent coloring, the grayscale image is then modeled as a Markov random field M = (V, F, Λ, N). In particular, we begin by introducing a vertex and a random variable for every pixel in our grayscale image. That is, we define the sets V = {v_ρ : ρ ∈ I′} and F = {F_{v_ρ} : v_ρ ∈ V}. The labels of our Markov random field correspond to the possible colors that a pixel can have. In our case, we have that Λ = {λ_c : c ∈ C′}. The last thing to define is our neighborhood function, which is what models the edges in our graph. When building a Markov random field, we assume that our graph satisfies the Markov property. As such, there are several reasonable choices for a neighborhood function when modeling an image (Boykov and Veksler, 2006). For our application, we choose a neighborhood function such that every vertex is connected to the 8 vertices surrounding it.

In essence, a graph cut algorithm works by trying to find an assignment of labels that minimizes some energy function defined on our Markov random field (Boykov and Veksler, 2006). In our case, this energy function should accomplish two things. First, we would like the coloring between adjacent pixels to be smooth, while still allowing color discontinuities at edge boundaries. Second, we would like to encourage a labeling of pixels by colors that were initially predicted with high confidence. Let Λ_V = {λ_ρ : v_ρ ∈ V} denote some labeling of the vertices and let g(ρ) denote the magnitude of the gradient at pixel ρ. In addition, we define s(ρ, λ) to be the estimated probability that ρ has the color labeled by λ. With these auxiliary terms, we can define our energy function E(Λ_V) as follows:

$$E_\rho(\Lambda_V) = \sum_{\rho'} g(\rho') \cdot \|\lambda_\rho - \lambda_{\rho'}\|^2 \;-\; \sum_{\lambda \in \Lambda} s(\rho, \lambda)\,\delta(\lambda_\rho, \lambda)$$

$$E(\Lambda_V) = \sum_{\rho} E_\rho(\Lambda_V).$$

Here, ρ′ is used to denote the neighbors of ρ according to our neighborhood function N, and δ is a function that equals 1 if and only if its two arguments are equal, and 0 otherwise. The first term in our summation is used to penalize color variation where it is not expected, while the second term encourages the use of high-confidence colors.
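
For concreteness, the sketch below evaluates this energy for a candidate labeling on the 8-connected grid in NumPy; the actual minimization is left to the graph cut library, and the wrap-around behavior of np.roll at image borders is a simplification.

```python
import numpy as np

def energy(labels, centers, grad_mag, probs):
    """Evaluate E(Lambda_V) for a candidate labeling (illustrative only;
    the minimization itself is performed by the graph cut algorithm).
    labels:   (h, w) color class index per pixel
    centers:  (k, 2) quantized (a, b) color of each class
    grad_mag: (h, w) gradient magnitude g(rho)
    probs:    (h, w, k) estimated probabilities s(rho, lambda)"""
    colors = centers[labels]  # (h, w, 2): the color each label denotes
    smooth = 0.0
    # Sum g(rho') * ||lambda_rho - lambda_rho'||^2 over the 8 neighbors.
    # Note: np.roll wraps at borders; a faithful version would mask them.
    for dr, dc in [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]:
        nb_colors = np.roll(colors, (dr, dc), axis=(0, 1))
        nb_grad = np.roll(grad_mag, (dr, dc), axis=(0, 1))
        smooth += (nb_grad * ((colors - nb_colors) ** 2).sum(axis=-1)).sum()
    # Data term: reward labels that were predicted with high confidence.
    rows, cols = np.indices(labels.shape)
    data = probs[rows, cols, labels].sum()
    return smooth - data
```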

In our project, we have made use of the graph cut implementation provided by Delong, Osokin, Isack, and Boykov (Delong et al., 2012). Despite the fact that finding an optimal solution is NP-hard, their algorithm can approximate the solution within a reasonable amount of time.

6. Error Analysis and Discussion

Our experiments have produced mixed results. In particular, the quality of the coloring obtained varies greatly from image to image and depends heavily on the choice of parameters. For images containing generic outdoor scenery, the algorithms tend to do reasonably well. Although several patches are colored incorrectly, the general colors are all present. However, for most indoor images with many different colors, the two algorithms perform quite badly. In particular, Figure 1 shows the results of a colorization that both models struggled with. We can observe that several important colors are missing in the results, as well as heavy patches of noise and discoloration. One possible explanation for these difficulties is that the image has many colors that are only present in small contiguous areas. Therefore, since we are sampling our training examples randomly, it could very well be the case that we are not extracting enough information from those regions. This is made evident by noticing that, especially for the support vector machine model, most of the image is colored yellow, the dominant color found in our training image. These observations suggest that a better sampling method may be required in order to achieve better results.

Figure 1. Sample coloring of a school classroom: (a) training image, (b) test image, (c) colored using LR, (d) colored using SVM.

Another common source of error that we observe is the inability to differentiate between regions that have similar textures but different colors. Since our features are all extracted within a small window surrounding ρ, we do not capture differences between two regions that are locally similar, but that are part of differently colored objects. For example, we can see that in Figure 2, both our models had difficulties differentiating between the ragged green leaves and some of the mountainous regions. In fact, these difficulties are much more apparent for the logistic regression model than they are for the support vector machine model. This suggests that simply incorporating localized features for each pixel may not be enough to achieve good colorizations. For example, one might try explicitly incorporating an object recognition step or a region segmentation step. This would allow us to add features that keep track of which region or which object each pixel belongs to.

Figure 2. Sample coloring of a mountainous region: (a) training image, (b) test image, (c) colored using LR, (d) colored using SVM.

Another important observation is that our algorithms appear to be robust against certain transformations. In particular, we have observed that both models seem to perform well when the test image is more or less a rotation of the training image. This is somewhat expected because a large portion of our features are obtained from SURF descriptors, which were originally designed to be unaffected by rotations (Bay et al., 2006). One such example can be seen in Figure 3. Aside from a few discolored patches, both models deliver promising results. On the other hand, our algorithms perform quite poorly when there is a significant difference between the brightness of the training image and the test image. A change in brightness affects the luminance component of every pixel, and hence changes a large portion of our feature vector. The Fourier transform, the grayscale histogram, and the mean of the intensity are all features that depend directly on luminance values. Therefore, since the same pixel in both images would produce very different feature vectors, it is not unreasonable to expect that our machine learning models would struggle with this task.¹

Figure 3. Sample coloring of a rotated village: (a) training image, (b) test image, (c) colored using LR, (d) colored using SVM.

¹ Additional examples can be found in Appendix A.

Due to the ill-posed nature of the problem, it is difficult to design reasonable error metrics. For example, whether an image is aesthetically pleasing or not is entirely subjective, and thus difficult to quantify. Consequently, this makes parameter optimization a challenging task. Under normal circumstances, model parameters are learned through a process called cross-validation (Bishop, 2006). During this process, the training set is partitioned into a smaller training set and a validation set. When values for the parameters are selected, the performance of the model can then be measured against the validation set. This procedure allows us to learn near-optimal values for our parameters since it provides an unbiased estimate of the generalization error. In our case, since we have no concrete measure of error, we cannot automatically assess the selection of our parameters. Therefore, parameter tuning was done manually on a per-image basis, which we believe played a significant role in all of the errors described above.
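
Had a concrete error metric existed, this tuning could have been automated; the hypothetical sketch below grid-searches the SVM hyperparameters by cross-validation, with plain accuracy standing in for a principled colorization metric and random stand-in data in place of real pixel features.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))    # stand-in for reduced pixel features
y = rng.integers(0, 4, size=200)  # stand-in for quantized color classes

# 5-fold cross-validation over a small hyperparameter grid; each fold
# plays the role of the validation set described above.
grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
                    scoring="accuracy",  # stand-in for a colorization metric
                    cv=5)
grid.fit(X, y)
print(grid.best_params_)
```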

7. Conclusions

In this project, we have explored how machine learning techniques can be applied to the colorization process of grayscale images. In particular, we have looked at two types of models: linear logistic regression and support vector machines. Moreover, we discussed various types of errors that occurred during the colorization process of these images. We argued that these errors were caused by localized features and unstructured sampling. Furthermore, we explained that these errors were amplified by the fact that our parameters were all chosen manually. However, despite these difficulties, our methods show promise in accurately coloring various types of images, particularly outdoor scenery. In regards to image transformations, we discussed why our two methods were robust against image rotations but not changes in brightness. More specifically, our models are robust against image rotations because a large portion of our features are derived from SURF descriptors. On the other hand, our models struggle with changes in brightness because this transformation produces significantly different feature vectors.

In terms of future work, there are many avenues that one could explore. For example, it would be worth investigating whether different classifiers can produce better results. Algorithms such as nearest neighbors and decision trees are simple classifiers that tend to do well in practice. However, in order to make effective use of these classifiers for image colorization, probability estimates for each color class are required. One potential way to obtain these estimates for nearest neighbors would be to use the Voronoi diagram as the decision boundary. The distance between a new data point and that boundary could then serve as a proxy for probability. There has also been some work done on obtaining accurate probability estimates in the case of decision trees (Zadrozny and Elkan, 2001). Improvements could also be made with regards to the types of features that we extract. Obtaining global features could help reduce coloring errors that are caused by regions sharing many local similarities. For example, one could perform a region segmentation step in order to extract global information regarding the different areas of an image. Not only would this provide global features, but it would also help with the sampling step. Instead of randomly selecting our training pixels, we could sample a subset of the pixels from every region of the image. This would likely provide a more accurate characterization of the different colors and textures within the image.

On a different note, it might be worth investigating how the performance of our algorithms could be improved if multiple training images are used. This framework would also generalize nicely to film colorization, since contiguous movie frames are all similar to one another. In such a setting, the goal would be to color the scenes of a black and white movie. The training set would consist of the first few frames of a particular scene, and these colored frames would then serve as a basis for automatically coloring the remaining frames of that scene. Finally, we believe that it would be worthwhile to develop concrete error metrics for the image colorization problem. This would facilitate the task of automatically learning model parameters, which would likely reduce many of the errors that arise during the colorization process of grayscale images.

Acknowledgments

We would like to thank Professor Dan Lizotte for his helpful suggestions during the course of this project. His feedback is greatly appreciated.

References

M. Ashikhmin, K. Mueller, and T. Welsh. Transferring Color to Greyscale Images. ACM Trans. Graph., 21(3):277–280, 2002.

H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded Up Robust Features. In ECCV, pages 404–417, 2006.

C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag New York, Inc., 2006. ISBN 0387310738.

Y. Boykov and O. Veksler. Graph Cuts in Vision and Graphics: Theories and Applications, 2006.

G. Charpiat, I. Bezrukov, Y. Altun, M. Hofmann, and B. Schölkopf. Machine Learning Methods for Automatic Image Colorization. In Computational Photography: Methods and Applications, pages 395–418. CRC Press, 2010.

A. Delong, A. Osokin, H. N. Isack, and Y. Boykov. Fast Approximate Energy Minimization With Label Costs. International Journal of Computer Vision, 96(1):1–27, 2012.

R. W. G. Hunt. The Reproduction of Colour. Wiley, 2005. ISBN 9780470024263.

H. Jegou, M. Douze, and C. Schmid. Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search. In Proceedings of the 10th European Conference on Computer Vision: Part I, pages 304–317, 2008. ISBN 978-3-540-88681-5.

A. Levin, D. Lischinski, and Y. Weiss. Colorization Using Optimization. In ACM SIGGRAPH 2004 Papers, pages 689–694. ACM, 2004.

S. Liu and X. Zhang. Automatic Grayscale Image Colorization Using Histogram Regression. Pattern Recognition Letters, 33(13):1673–1681, 2012.

F. Pitié, A. Kokaram, and R. Dahyot. Enhancement of Digital Photographs Using Color Transfer Techniques. In Single-Sensor Imaging: Methods and Applications for Digital Cameras, pages 295–321. CRC Press, 2008.

L. Velho, A. C. Frery, and J. Gomes. Image Processing for Computer Graphics and Vision. Springer Publishing Company, Incorporated, 2nd edition, 2008. ISBN 1848001924, 9781848001923.

C. Wang, N. Komodakis, and N. Paragios. Markov Random Field Modeling, Inference and Learning in Computer Vision and Image Understanding: A Survey. Computer Vision and Image Understanding, 117(11):1610–1627, 2013.

B. Zadrozny and C. Elkan. Obtaining Calibrated Probability Estimates from Decision Trees and Naive Bayesian Classifiers. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 609–616. Morgan Kaufmann, 2001.


A. Colorization Examples

Figure 4. Sample coloring of a fish under water: (a) training image, (b) test image, (c) colored using LR, (d) colored using SVM.

Figure 5. Sample coloring of small icebergs: (a) training image, (b) test image, (c) colored using LR, (d) colored using SVM.

Figure 6. Sample coloring of a small food tray: (a) training image, (b) test image, (c) colored using LR, (d) colored using SVM.

Figure 7. Sample coloring of desert rocks: (a) training image, (b) test image, (c) colored using LR, (d) colored using SVM.