Download - CS87 Project Report: Parallelized Computation of Superpixelshenryshan.com/cs87-project-report.pdf · parallel computation API in C/C++) to paral-lelize th e kernels, whereas gSLIC

Transcript
Page 1: CS87 Project Report: Parallelized Computation of Superpixelshenryshan.com/cs87-project-report.pdf · parallel computation API in C/C++) to paral-lelize th e kernels, whereas gSLIC

CS87 Project Report: Parallelized Computation of Superpixels

Rachel Diamond, Tai Warner, and Henry HanComputer Science Department, Swarthmore College, Swarthmore, PA 19081

May 11, 2018

Abstract

Superpixelation involves grouping pixels in away that captures some of the perceptual andintuitive meaning of an image. Superpixels area useful tool for many computer vision problemsincluding image segmentation and object recog-nition because superpixelation can be used asa preprocessing step that both reduces dimen-sionality and identifies more meaningful features.A good superpixel algorithm efficiently producessuperpixels that respect object boundaries, areof approximately equal size, and are compact –this means that superpixel edges fall along theedges of objects in the image, there is a roughlyconstant number of pixels per superpixel, andthat each superpixel is relatively round. We im-plement a parallelized version of the simple lineariterative clustering (SLIC) algorithm for super-pixelation with the goal of improving time effi-ciency and possibly scaling to larger image sizes.SLIC uses a k-means clustering approach whichiteratively updates the assignment of pixels tosuperpixels based on the distance between thepixel and the superpixel centers. It is a goodcandidate algorithm for GPU parallelization be-cause it has subparts that can computed inde-pendently by pixel or by superpixel. Althoughour results show that our parallelized implemen-tation is 4-5 times slower than the sequentialSLIC, we achieve nearly the same accuracy usingmetrics calculated using UC Berkeley’s Segmen-tation Benchmarks - especially as the number ofsuperpixels increase.

1 Introduction

Superpixels are an approach to image segmen-tation in which the pixels are grouped togetherin a way that graphically approximates the in-formation contained in the image. For example,regions of the same color should be grouped to-gether, and edges between two objects in a pic-ture should correspond to a boundary betweentwo superpixels.

Figure 1. An example of SLIC segmentingan image into superpixels and superimposingthe boundaries (shown in yellow) over the image.

1

Page 2: CS87 Project Report: Parallelized Computation of Superpixelshenryshan.com/cs87-project-report.pdf · parallel computation API in C/C++) to paral-lelize th e kernels, whereas gSLIC

Superpixels can be useful in computer visionas a step toward object recognition; seeing ob-jects as collections of constituents is an intuitivegoal for machines. A superpixel could also pro-vide a viewer with a better idea of what an imageis about since it contains information about theshape of the region and the variation in colorwithin the region, as opposed to a single coloredpixel, which would give a viewer no idea whatthe image contains. Superpixelation can even beused to create compressed representations of im-ages if the color variation within a superpixel isdiscarded in favor of just the superpixel’s shapeand average color.

One example application where superpixela-tion served as an important preprocessing step isin Lucchi et. al’s (2010) work [3] to identify mito-chondria in an electron microscope (EM) imageof neural tissue. This is a particularly difficultobject localization and recognition problem be-cause, as can be seen in Figure 2(a), the edges ofmitochondria look very similar to other featuresin the EM image, mitochondria are irregularlyshaped, there are many structures in a single im-age, and textures clutter the background. Lucchiet. al’s were able to improve on previous resultsby introducing a new algorithm that starts bysegmenting the image into superpixels. Then avector of shape and color features were calcu-lated for each superpixel. These features andthe training data shown in Figure 2(c) are thenused to train an Support Vector Machine (SVM),which is predictive model that classifies whethereach superpixel is mitochondial boundary. Thefinal steps map back from superpixels to pixelsand apply smoothing to get a final segmentation.

Figure 2. 1 Basic steps of an object localizationalgorithm that uses SLIC to identify mitochon-dria in an image of neural tissue. (a) Originalimage. (b) Superpixel segmentation. (c) Graphdefined over superpixels. White edges indicatepairs of superpixels used to train an SVM thatpredicts mitochondrial boundaries. (d) SVMprediction where blue indicates a probablemitochondrion. (e) Graph cut segmentation. (f)Final results after automated post-processing.

We use Achanta et al’s (2010) simple lineariterative clustering algorithm, SLIC, as a base-line to implement our own version parallelized onthe GPU. SLIC is based on the classic machinelearning algorithm k-means clustering. Section3.1 has a detailed explanation of k-means, butat a high level, we begin with k pixels at evenintervals designated as the centroids. We thenrun (usually) ten iterations where pixels are as-signed to centroids and then centroids are ad-justed to have the average color and location oftheir assigned pixels. This causes centroids tosettle into k local optima, generally representingthe centers of k representative constituents of theimage. The simplicity of the algorithm allows itto perform with surprising speed given its accu-racy. However, substantially sized images — forexample, landscapes shot in high resolution —can take up to a minute due to the repetitivenature of the algorithm multiplied over millionsof data points.

1Figure from Lucchi et. al (2010) [3].

2

Page 3: CS87 Project Report: Parallelized Computation of Superpixelshenryshan.com/cs87-project-report.pdf · parallel computation API in C/C++) to paral-lelize th e kernels, whereas gSLIC

Our extension to SLIC is parallelism on theGPU. For this, our system consists of eight In-tel i7 cores and a NVIDIA Quadro P5000. Weparallelize over the pixels in the image; while theGPU can assign each thread its own pixel up toabout ten trillion pixels, we hope to quickly reacha speedup of a few thousand, when all the coresof the GPU are exploited, which should happeneven with an image as small as 50x50 pixels.

At the end of this paper, we discuss our exper-imental design and results. We present the re-sults of tests that access quality of performanceof parallel and sequential implementations andthe runtime efficiency of each. We use a metricof boundary recall to assess quality, where the in-tuition is that high boundary recall means thatour superpixelation captures most of the contrastbetween objects in the image. We also measuretotal time taken for the program to run and com-pare these results between sequential and paral-lel runs of SLIC.

2 Related Work

Achanta et. al (2010, 2012) [1][2] introducedthe SLIC algorithm, a new technique for solv-ing the problem of image superpixelation. Whileolder algorithms existed that use graph-based orgradient ascent methods, SLIC is built on a k-means clustering approach. This is conceptuallymuch simpler, because it intuitively clusters interms of the color and position of pixels – twopixels that are a similar color but far away arenot likely to be part of the same object, just astwo nearby pixels of starkly different color areprobably part of different objects. It is also asimpler algorithm computationally, and takes or-ders of magnitude less time than some other su-perpixel algorithms.

Some SLIC extensions have already been ex-plored and implemented. For example, gSLICis a parallel implementation of SLIC that usesCUDA to run on the GPU [4]. gSLIC achievedan impressive 19 times speedup when comparedto SLIC for the largest image size that they

tested (1280 x 960 pixels). ASLIC, or Adap-tive SLIC, is an implementation of SLIC thatdoesn’t require the programmer to tune any hy-perparameters [2].

Our parallelized version of SLIC differs fromgSLIC in that we primarily used Python and Py-Cuda (A Python wrapper to use Nvidia’s CUDAparallel computation API in C/C++) to paral-lelize th e kernels, whereas gSLIC is written en-tirely in C++.

Prior to SLIC, there were other approachesto image superpixelation that used graph-based(GS04, NC05) or gradient-ascent-based (WS91,MS02) algorithms. In graph based algorithms,each pixel becomes a node in a graph, andthe edge weights are set to be proportionalto the similarity between pixels. In gradientascent based algorithms, the algorithm usesgradient ascent methods to refine clusters (afteran initial rough clustering) and obtain bettersegmentation [1]. As shown in Figure 3, interms of metrics for image superpixelation, i.e.speed, boundary recall, etc., SLIC outperformsall of the other algorithms [1].

Figure 3.2This figure shows that neither NC05,a graph-based algorithm, nor QS09, a gradient-ascent-based algorithm, can outperform SLICin all four of the following metrics regardingthe superpixels created by the algorithms:object boundaries, equal size, compactness, andruntime.

2Segmented images by Achanta et. al (2012) [2].

3

Page 4: CS87 Project Report: Parallelized Computation of Superpixelshenryshan.com/cs87-project-report.pdf · parallel computation API in C/C++) to paral-lelize th e kernels, whereas gSLIC

3 Our Solution

The SLIC algorithm implements local k-meansclustering to group pixels together based on theircolor similarity and proximity. Instead of com-paring pixels in the canonical RGB color space,our implementation of SLIC assumes an imagein CIELAB color space which represents colorin terms of dimensions referred to as l, a, and brespectively. The CIELAB color space is gener-ally regarded as perceptually uniform for smallcolor distances and is intended to more closelymatch how humans perceive color [1]. Our im-plementation combines color information and lo-cation on the image for each pixel, effectivelymapping each pixel in a 2-dimensional imageinto 5-dimensional labxy space. Our implemen-tation extends to higher dimensional space tosupport segmentation of 3-dimensional imagesin 6-dimensional labxyz which allows superpix-elation of video. Because we didn’t get to ex-perimenting in 6-dimensional labxyz space withvideos, z is 1 in all our experiments.

3.1 k-Means

The k-means clustering algorithm is a machinelearning algorithm that aims to iteratively clas-sify n observations into k clusters by assigningeach observation to the cluster that is nearest toit by some given distance measure. As shown inFigure 4, cluster centers can be initialized ran-domly or based on a preset arrangement. Thealgorithm will then run iteratively for a fixednumber of repetitions or until the number of ob-servations reassigned to a new cluster from onestep to the next falls below a set threshold. Eachiteration has two steps: expectation and maxi-mization.

During the expectation step, the distance ofeach observation to each cluster center is cal-culated and the observation is assigned to thecluster that is the closest to it based on a givendistance measure (usually Eucleadian distance).Next, during the maximization step, the clustercenters are re-calculated based on the average of

the new member observations.

Figure 4. 3 In Iteration 1, cluster centers areinitialized and represented as colorful crosses.The observations are associated with the closestclusters in iteration 2. By iteration 6, thecluster centers have clearly shifted around asthey are recalculated in the maximization step.In the last plot, we can see that the algorithmhas converged and every observation is nowassociated with its closest cluster.

In our implementation of SLIC, each pixel inthe image is an observation that the algorithmclusters into one of k superpixels. We initial-ize cluster centers, which we call centroids, asthe centers of an evenly spaced rectangular gridover the image. Then we run ten iterations ofthe expectation-maximization steps. Achanta et.al found that ten iterations is usually sufficientto minimize the residual error after reassigningcentroids [1], so we default to ten iterations aswell. Also in accordance with Achanta et. al’smethod, we implement a local (as opposed toglobal) k-means algorithm. This means that dur-ing the expectation step, instead of calculatingthe distance from each pixel to each centroid,we only calculate the distance from each pixel toeach centroid within a 2S × 2S window, whereS =

√nk , n is the number of pixels, and k is the

3Taken from http://www.learnbymarketing.com/

methods/k-means-clustering/

4

Page 5: CS87 Project Report: Parallelized Computation of Superpixelshenryshan.com/cs87-project-report.pdf · parallel computation API in C/C++) to paral-lelize th e kernels, whereas gSLIC

number of centroids, or superpixels. We can re-strict the search area for each pixel because weexpect similar pixels that should be in the samesuperpixel to be nearby one another.

Implementing a local k-means instead ofsearching globally and limiting the algorithm toten iterations improves the runtime from O(kni)to O(n), where k is the number of centroids, n isthe number of pixels, and i is the number of it-erations for the k-means algorithm. This modifi-cation drastically reduces the number of calcula-tions in the expectation step - instead of calculat-ing the distance from each pixel to all centroids,which is O(kn), local k-means only requires aconstant number of calculations from each pixelto each centroid within a 2S × 2S window. Thismakes the expectation step just O(n).

3.2 Distance Measure

Because we are working in 6-dimensionallabxyz space, we use a special distance measureDs, defined as follows:

dlab =√

(lk − li)2 + (ak − ai)2 + (bk − bi)2 (1)

dxyz =√

(xk − xi)2 + (yk − yi)2 + (zk − zi)2

(2)

Ds = dlab +m

Sdxyz (3)

Here, Ds is the sum of the lab distance and thexyz plane distance normalized by the grid inter-val S, and m is the compactness factor. Thishyperparameter m allows us to control the com-pactness of a superpixel by emphasizing spacialproximity during the normalization of the spa-tial distances. [1]. Figure 5 shows the effect ofvarying m on an image.

Figure 5. This figure shows the impact of thecompactness factor m. As m increases, spatialproximity is emphasized and the superpixels be-come more compact. This results in smootherand rounder superpixels.

3.3 CUDA Parallelism

For a more modular, powerful, and concep-tually easier coding environment, we break up

5

Page 6: CS87 Project Report: Parallelized Computation of Superpixelshenryshan.com/cs87-project-report.pdf · parallel computation API in C/C++) to paral-lelize th e kernels, whereas gSLIC

the pipeline of SLIC. We designate three mainparts which are parallelized in separate CUDAkernels: the initial assignment of pixels through-out the image to centroids, the assignment ofpixels to the closest centroid, and the recalcula-tion of the centroid locations. We also draw fromspecific knowledge about CUDA architecture insome of the choices we make about the design ofthe parallelism. In order to quickly reach max-imum parallelism, we parallelize over pixels andassign one thread per pixel. With this designchoice, it makes the most sense to store the datafor the algorithm in three large arrays represent-ing the color values of each of n pixels, the kaverage coordinate values of each centroid, andthe mapping of each of n pixels to its assignedcentroid. This allows us a parallelism factor upto n for the first assignments (since it just con-sists of populating the assignments array), andupdate assignments (since it involves each pixelassessing candidate centroids for the new clos-est). The recomputing of the centroids reachesa parallelism factor of k, since there are k cen-troids.

The first assignments function simply uses asmany threads as there are pixels in the image,and uses the thread ID to map each thread toa unique pixel. Then, it calculates the xyz co-ordinates of the pixel and uses this to calculatethe identity of the nearest centroid. (This desig-nation is easy to figure out beforehand, but thefunction to update assignments which will be de-scribed later involves searching for the nearestcentroid.) This function does not need to re-turn anything to the CPU before calling the nextfunction.

Once the first assignments are made, we en-ter a loop which is the expectation-maximizationpart of this algorithm. Importantly, this loopcannot be parallelized because each round re-lies on the state resulting from the last round.This loop consists of recomputing centroids (theexpectation step), and adjusting pixels’ assign-ments to them (the maximization step). Recom-puting centroids is parallelized over k threads

on the GPU, and each thread takes the averagevalue in 6-dimensional labxyz space of its (onaverage) k

n member pixels. These measurementsare embarrassingly parallel, but there is not aneasy way to further parallelize them on the GPU,so each thread loops through every member. Bythe end of this function, the array of centroidshas been updated.

Finally, we update the assignments. Again,this is parallelized over all n pixels in the im-age. However, now each thread must do a cer-tain number of checks to make sure that a newcentroid has not become closer than the old one.For this, we make the assumption that the finalcentroid that a pixel ends up associated with willbe one of its closest initially. The “close” regionis defined by three centroids in each dimension.For 2-dimensional images, this means we look ata local region that includes nine centroids, andcalculate the distance to each. Most of the time,when this function finishes, we go back to recom-puting the centroids, but after the final iteration,we copy the memory that has been manipulatedby the GPU back to the CPU in order to post-process and visualize it.

4 Results

Our experiments are run on the computeranise at Swarthmore College which has eight In-tel i7 cores and a NVIDIA Quadro P5000 graph-ics card.

4.1 Timing Metric

A central goal of implementing a parallelizedversion of SLIC is to decrease the time thatit takes to produce a superpixel segmentation.This is especially desirable if SLIC is being usedas a preprocessing step for a computer vision tasklike object recognition, because the overall algo-rithm can only be as efficient as its componentparts.

To test the efficiency of our parallelized imple-mentation of SLIC we used timing code to deter-

6

Page 7: CS87 Project Report: Parallelized Computation of Superpixelshenryshan.com/cs87-project-report.pdf · parallel computation API in C/C++) to paral-lelize th e kernels, whereas gSLIC

mine runtime on a single square image scaled tofive different edge sizes: 128, 256, 512, 1024, and2048px. We used the same image scaled so thatthe complexity of the image itself would haveminimal effect on the timing. We ran the sequen-tial SLIC implementation on the same set of fiveimages. We also varied the number of superpix-els k in order to see how this affected runtime.We expect increasing image size to lead to longerruntime because many more computations mustbe performed. We also expect that increasingthe number of superpixels will have little effecton sequential runtime, but it will improve per-formance of the parallel algorithm because therecompute centroids function is parallelized interms of the number of superpixels.

We ran five repetitions of each set of hyperpa-rameters and averaged the results to account forpotential variation in background programs thatcould affect the timing results.

4.2 Timing Results

Figure 6. Timing results for the parallel (solidlines) and sequential (dotted lines) implemen-tations of SLIC as image size and number ofsuperpixels is varied.

As shown in Figure 6, both the parallel andsequential versions of SLIC had runtimes thatscaled linearly in the number of pixels, regard-less of the number of superpixels. The sequential

version of SLIC was 3.7 times faster for k = 1000and 4.9 times faster for k = 100 when averagedacross image sizes. This was unexpected giventhat the purpose of parallelism is to reduce run-time. We did a more detailed breakdown of howtime is spent within the algorithm to determineif the relatively long execution time for the paral-lel implementation was due to copying memorybetween the CPU and the GPU. However, wedetermined that the majority of the time was infact spent within the kernel calls for recomput-ing superpixel centroid locations and updatingthe assignment of pixels to superpixels.

As expected, the number of superpixels didnot significantly affect the runtime of the sequen-tial algorithm. With 1000 rather than 100 super-pixels, the parallel algorithm was 11.1% fasterwhen averaged across image sizes.

4.3 The Boundary Recall Metric

Boundary recall is a metric that tells us thepercentage of the actual object boundaries in theimage that our algorithm correctly labels as asegmentation boundary. Using the metrics truenegative (TN), false negative (FN), false positive(FP), and true positive (TP), boundary recall isdefined as

boundary recall =TP

FN + TP(4)

This means that if our algorithm has a bound-ary recall of 0.60 for a given image, then 60%of the boundaries in the image are also locatedwhere there is a superpixel boundary identifiedby our algorithm.

To test our algorithm’s boundary recall, weused David Stutz’s benchmark code writtenas an extension to the Berkeley SegmentationBenchmark that is meant to provide performancemetrics for contour detection and image segmen-tation [5].

Figure 7 shows both the inputs to the bench-mark code and the output from our SLIC algo-rithm. To calculate boundary recall, we provided

7

Page 8: CS87 Project Report: Parallelized Computation of Superpixelshenryshan.com/cs87-project-report.pdf · parallel computation API in C/C++) to paral-lelize th e kernels, whereas gSLIC

the benchmark code with three images: the orig-inal image, the “correct” or ground truth seg-mentation of the image, and a contour map ofthe superpixel segmentation that our algorithmgenerated. The superpixel segmentation and theimage with superpixel boundaries superimposedonto it are both outputs of our SLIC algorithm.

Figure 7. Clockwise from upper left: the originalpicture, the original picture with superpixelboundaries from our algorithm superimposedonto it (yellow lines), superpixel boundaries ouralgorithm determined, and the “correct” seg-mentation from by the Berkeley SegmentationBenchmark [5] dataset.

4.4 Boundary Recall Results

We ran experiments with David Stutz’s exten-sion on Berkeley Segmentation Benchmark byvarying the number of superpixels k, the com-pactness factor m, for the sequential and parallelimplementations of SLIC.

The boundary recall results presented in Fig-ure 8 largely make sense. Boundary recall wasexpected to increase as the number of superpixelsincrease simply because an increase in the num-ber of superpixels will increase the total numberof boundaries. This increases the likelihood thatthe “correct” ground truth boundaries will alsobe segmentation boundaries. This was shown tobe true for both the parallel and sequential im-plementations.

Figure 8. As shown in the chart, boundaryrecall to increases significantly as the number ofsuperpixels increases. The sequential version ofSLIC has slightly higher recall than our parallelimplementation. A low compactness factorappeared to do better for lower values of k, anda high compactness factor appeared to do betterfor higher values of k.

When m is small, SLIC appears to do bet-ter when k is small as well for both the par-allel and sequential implementations. This isalso expected because with less superpixels andless enforced spatial compactness, the algorithmsare more likely to capture the segmentationboundaries in the ground truth with jagged androugher edges in each superpixel. As m gets big-ger, the algorithms do better when k is bigger.This is because as the images become heavilysegmented it is naturally able to capture the seg-mentation boundaries in the ground truth. Witha larger value of m, spatial proximity is em-phasized and the superpixel boundaries will besmoother and more naturally capture boundariesin the image.

In theory, we expect the sequential and par-allel algorithms to produce the same superpixelsand thus produce the same boundary recall mea-sures. However, because our parallelization ofthe code differs slightly from the sequential ver-sion in order to effectively parallelize the code,the superpixels differ and thus the boundary re-call measures do as well. This is most notablewhen both m and k are small.

All in all, our parallel version of SLIC performsnearly as well as the sequential version of SLIC,

8

Page 9: CS87 Project Report: Parallelized Computation of Superpixelshenryshan.com/cs87-project-report.pdf · parallel computation API in C/C++) to paral-lelize th e kernels, whereas gSLIC

especially as the number of superpixels and thevalue of the compactness factor increases.

5 Conclusions and FutureWork

5.1 Conclusions

We expected to achieve a significant reductionin runtime by parallelizing the SLIC algorithmfor superpixels. However, our parallelized ver-sion of SLIC was about 4-5 times slower than thesequential version. It is not clear why our par-allel implementation was drastically slower, butwe suspect that our parallel kernels had synchro-nization and concurrency problems as we triedto hack them together. We think this prob-lem was compounded by the fact that we im-plemented the algorithm in Python and paral-lelized with PyCuda. We chose to implementSLIC in Python because we thought it would beeasier, but the high level coding style actually be-came an obstacle when we were parallelizing thecode. Now that we really understand the con-cepts behind the algorithm, it would be smootherto implement SLIC in C or C++. We also origi-nally suspected that the slower runtime was dueto copying memory back and forth between theCPU and the GPU. However, we determinedthat the majority of the time was actually spentwithin the kernel calls for recomputing the su-perpixel centroid locations and updating the as-signment of pixels to superpixels. On the otherhand, even though our algorithm didn’t performbetter in terms of speed, it achieved nearly thesame boundary recall accuracy, especially as thenumber of superpixels and the value of the com-pactness factor increased.

5.2 Future Work

SLIC has been implemented with a numberof extensions since its conception. GPU paral-lelism is one, dubbed gSLIC, as well as ASLIC,a zero-parameter version of SLIC which adap-tively assesses the visual activity in a region and

lowers compactness in highly textured areas inorder to more accurately represent the irregu-lar constituents. This provides a simple nextstep building off of our work presented here — aGPU parallelism of SLICO would likely requirelittle more than a change to the objective func-tion within the kernel to update assignments.

There are also a few potentially interestingniches of research to be filled with SLIC-basedapplications. While our work presented in thispaper focuses on 2-dimensional images which re-sults in 5-dimensional k-means, a 6-dimensionalversion could allow for object recognition andtracking through time. An example would bean application of SLIC over a movie. For this,all frames in the movie would be consolidatedinto a 3-dimensional block of images, like stack-ing thousands of sheets of paper together into ablock. Now there is a sixth axis, t, for time. Inthis representation, a ball being thrown acrossthe screen would be represented by a diagonalworm shape — in the early frames, the ball ison one side but as its position in t increases, itsxy position changes, too. Ideally for this versionof the problem, the algorithm would try to findthe 3-dimensional supervoxels that make up theball’s path. The distance function would have tobe changed, and we suspect that there should beless compactness along t as along x and y, sincet isn’t spatial in the same way as the latter two.

SLIC could be used for general purpose datacompression. Like the boom of GPGPU com-puting, which conceived of regular problems asmatrices in order to take advantage of the GPU’spropensity for projecting planes, regular datacould be conceived of as an image, and com-pressed using SLIC. SLIC is a convenient al-gorithm for compression because the user canchoose k; the closer k is to n, the less com-pressed the output is. By this token, a datasetcould be compressed, while still hopefully pre-serving the general distribution and outline ofthe original data. We suspect that a SLIC-basedcompression would be less accurate than someother well-known machine learning techniques

9

Page 10: CS87 Project Report: Parallelized Computation of Superpixelshenryshan.com/cs87-project-report.pdf · parallel computation API in C/C++) to paral-lelize th e kernels, whereas gSLIC

for dimensionality reduction given the scale ofthe compression, although Lucchi et al (2010)have shown that more meaningful new dimen-sions can be extrapolated from the output givenby SLIC.[3]

6 Meta-discussion

We originally intended to parallelize SLIC inthe first week, implement SLICO (adaptive SLICthat self-tunes m) by week 2, experiment with 6-dimensional supervoxels, and extend gSLIC andSLICO to create gSLICO - a parallelized versionof SLIC that adaptively tunes m. We think wefell short of our goals for a number of reasons,but bad documentation and starting code waslikely the biggest.

Initially, we intended to write our implemen-tation in CUDA C++. This quickly became con-fusing and hard to read, and we decided it wouldbe simpler and more modular in Python. Weused PyCUDA, an appropriation of CUDA to aPython environment, to help connect with theGPU. However, while the documentation is notbad, it leaves something to be desired in termsof thorough explanation, return values, and espe-cially error messages. We experienced a hangupfor over a week due to an error that was impossi-ble to find a helpful discussion of on the internet.In addition, there is no GDB for a Python envi-ronment, so this type of low-level error tracingwas especially difficult. While this turbulencebetween low- and high-level created some ten-sion in our work flow, there were a number ofbenefits that the high-level aspect of this projectprovided: namely, image interpretability. In pre-senting our project, we time and again were ableto show listeners our generated images while wewere talking to them about the particular al-gorithm, and this multimedia approach allowedpeople who were less interested in the math tograsp the function of the algorithm intuitively,while also being able to connect the math tovarying behavior between different runs of theprogram.

References

[1] Radhakrishna Achanta, Appu Shaji, KevinSmith, Aurelien Lucchi, Pascal Fua, andSabine Susstrunk. SLIC superpixels. Techni-cal report, 2010.

[2] Radhakrishna Achanta, Appu Shaji, KevinSmith, Aurelien Lucchi, Pascal Fua, andSabine Susstrunk. SLIC superpixels com-pared to state-of-the-art superpixel meth-ods. IEEE transactions on pattern analysisand machine intelligence, 34(11):2274–2282,2012.

[3] Aurelien Lucchi, Kevin Smith, RadhakrishnaAchanta, Vincent Lepetit, and Pascal Fua. Afully automated approach to segmentation ofirregularly shaped cellular structures in emimages. International Conference on Medi-cal Image Computing and Computer-AssistedIntervention, pages 463–471, 2010.

[4] Carl Yuheng Ren and Ian Reid. gSLIC: areal-time implementation of SLIC superpixelsegmentation. University of Oxford, Depart-ment of Engineering Science, 6, 2011.

[5] David Stutz. Extended berke-ley segmentation benchmark.https://github.com/davidstutz/

extended-berkeley-segmentation-benchmark,2014.

10