Jessica Fridrich, Jan Kodovský, Miroslav Goljan, Vojtěch Holub


Description

Breaking HUGO – the Process Discovery, presented jointly with Steganalysis of Content-Adaptive Steganography in Spatial Domain. Jessica Fridrich, Jan Kodovský, Miroslav Goljan, Vojtěch Holub. Are there “issues” with adaptive stego?

Transcript


Jessica Fridrich, Jan Kodovský, Miroslav Goljan, Vojtěch Holub

Breaking HUGO – the Process Discovery

presented jointly with

Steganalysis of Content-Adaptive Steganography in Spatial Domain

Are there “issues” with adaptive stego?

Content-adaptive embedding leaks information about the placement of embedding changes. Is HUGO's probabilistically known selection channel a problem? Why should it be a problem? It is all about how well we can model the content. Honestly, fellow BOSS competitors, you all started here, haven't you?

The probability of embedding change can be estimated from the stego image fairly well:

[Figure: cover image, actual embedding changes, and the true vs. estimated probability of embedding change]
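The exact estimator is not reproduced in this transcript. Below is a minimal sketch of the standard payload-limited-sender simulation that such an estimate can be based on, assuming per-pixel embedding costs `rho` obtained from some additive approximation of HUGO's distortion (the cost computation itself is not shown, and the function name is ours):

```python
import numpy as np

def estimate_change_probabilities(rho, payload_bpp):
    """Given per-pixel embedding costs rho (assumed: an additive approximation
    of HUGO's distortion, computable from the stego image almost as well as
    from the cover), find the Lagrange multiplier lam such that the expected
    payload equals payload_bpp bits per pixel, and return the per-pixel
    probability that the pixel was changed by +1 or -1."""
    rho = np.asarray(rho, dtype=np.float64).ravel()
    target = payload_bpp * rho.size

    def p_change_one_way(lam):
        e = np.exp(-lam * rho)
        return e / (1.0 + 2.0 * e)               # P(+1) = P(-1)

    def expected_payload(lam):                    # ternary entropy in bits
        p = np.clip(p_change_one_way(lam), 1e-15, 1.0 / 3.0)
        return np.sum(-2.0 * p * np.log2(p)
                      - (1.0 - 2.0 * p) * np.log2(1.0 - 2.0 * p))

    lo, hi = 0.0, 1e3                             # payload is decreasing in lam
    for _ in range(60):                           # binary search for lam
        mid = 0.5 * (lo + hi)
        if expected_payload(mid) > target:
            lo = mid
        else:
            hi = mid
    return 2.0 * p_change_one_way(hi)             # total change probability per pixel
```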

Complex texture of 512×512 images

[Figure: the same scene as a 4 MP image and scaled to 512×512]

Look at what HUGO did

Seven images from BOSSrank can be detected visually as stego images: BOSSrank image No. 235.

Close-up of its LSB plane

Weighted-Stego attack for HUGO?

Assume that we can estimate the WS statistic (denote its value on a cover image by c and on a stego image by s).

Problem: E[c] varies a lot with content and cannot be easily thresholded or calibrated, despite the fact that E[c] < E[s] in general (sometimes by as much as 60%, but on average only by 1.74%).

Pixel domain is not useful, right?

HUGO approximately preserves ~10^7 statistics computed from neighboring pixels. Intimidating, isn't it? Forget the pixel domain, go to a different domain. Wavelet, perhaps?

We brushed the dust off WAM, put it on steroids, and whacked HUGO with it.

What we tried:
- add moments from the LL band to inform the steganalyzer about content (makes sense for content-adaptive stego)
- add the same feature vector computed from the re-embedded image (relying on the saturation effect of re-embedding)
- replace the Wiener filter in WAM with an adaptive filter based on the estimated probability of change

BOSSrank score: 59%

Go back to the pixel domain!

Your best chances for detection are in the embedding domain.

Compute the residual r_ij = x_ij − x̂_ij, where x̂_ij is an estimator of x_ij computed from its local neighborhood.

Advantages of computing detection statistics from r_ij:
- narrower dynamic range
- image content suppressed
- higher SNR between the stego signal and noise

Undoubtedly, the best estimator of x_ij is x_ij itself. However, x̂_ij should not depend on x_ij, to avoid a biased estimate (this is why denoising filters do not work well).
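As an illustration only (the predictor below is a plain four-neighbor average, not necessarily any of the estimators used in the talk), a sketch of the residual computation:

```python
import numpy as np

def residual(x):
    """r = x - x_hat, where x_hat predicts each pixel from its 4-neighborhood
    WITHOUT using the pixel itself (avoiding the bias discussed above).
    The neighbor-average predictor is an illustrative choice; edges wrap
    around here, a real implementation would crop or pad the boundary."""
    x = np.asarray(x, dtype=np.float64)
    x_hat = (np.roll(x, 1, axis=0) + np.roll(x, -1, axis=0) +
             np.roll(x, 1, axis=1) + np.roll(x, -1, axis=1)) / 4.0
    return x - x_hat
```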


Higher-order local models (HOLMES)

The SPAM feature set uses a locally constant model:

- constant model → 1st-order differences
- linear model → 2nd-order differences
- quadratic model → 3rd-order differences
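The exact forms shown on the slide are not preserved in this transcript; the sketch below uses the standard finite differences that annihilate a constant, linear, and quadratic local fit, respectively:

```python
import numpy as np

def difference_residuals(x):
    """Horizontal 1st-, 2nd-, and 3rd-order difference residuals; a k-th order
    local polynomial model is annihilated by the (k+1)-st difference.
    Vertical and diagonal versions are obtained the same way along other axes."""
    x = np.asarray(x, dtype=np.float64)
    first = x[:, 1:] - x[:, :-1]                                          # constant model
    second = x[:, :-2] - 2.0 * x[:, 1:-1] + x[:, 2:]                      # linear model
    third = -x[:, :-3] + 3.0 * x[:, 1:-2] - 3.0 * x[:, 2:-1] + x[:, 3:]   # quadratic model
    return first, second, third
```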

HUGO approximately preserves the joint distribution of three 1st-order differences among four neighboring pixels. We need to get out of HUGO's model:

- Use four or more differences: the cooc dimension grows too fast, and bins in the coocs become empty or underpopulated.
- Use higher-order differences: they see beyond 4 pixels.

[Figure: an image with many edges and an edge close-up. HUGO is likely to embed here even though the content is modelable in the vertical direction.]

However, pixel differences will mostly fall into the marginal. Linear or quadratic models bring the residual back inside the cooc matrix.

Quantize and truncate

Note that we marginalize instead of cutting: the marginals (bins at the boundary) are very important!

Before computing the coocs, the residual is first quantized and then truncated.

First successful features

Features are two 3D cooc matrices:

MINMAX: T = 4, q = 1, dim = 2(2T+1)^3 = 1458
QUANT:  T = 4, q = 2, dim = 1458

Take the min/max of 2nd-order residuals in 4 directions.
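A compressed sketch of how such a feature vector can be assembled (quantization, truncation, and a 3D co-occurrence of min/max residuals). Scan directions, normalization, and boundary handling are our simplifications, not the authors' exact implementation:

```python
import numpy as np

def minmax_features(x, T=4, q=1):
    """Sketch of the MINMAX feature: 2nd-order residuals in 4 directions,
    per-pixel min and max across directions, quantization by q, truncation
    to [-T, T], and a 3D co-occurrence of horizontally adjacent triples.
    Dimensionality: 2 * (2T+1)**3 = 1458 for T = 4."""
    x = np.asarray(x, dtype=np.float64)
    c = x[1:-1, 1:-1]
    residuals = [x[1:-1, :-2] - 2 * c + x[1:-1, 2:],   # horizontal
                 x[:-2, 1:-1] - 2 * c + x[2:, 1:-1],   # vertical
                 x[:-2, :-2] - 2 * c + x[2:, 2:],      # main diagonal
                 x[:-2, 2:] - 2 * c + x[2:, :-2]]      # anti-diagonal
    rmin = np.minimum.reduce(residuals)
    rmax = np.maximum.reduce(residuals)

    def cooc3d(r):
        rq = np.clip(np.round(r / q), -T, T).astype(int) + T   # quantize & truncate
        r1, r2, r3 = rq[:, :-2], rq[:, 1:-1], rq[:, 2:]        # horizontal triples
        idx = (r1 * (2 * T + 1) + r2) * (2 * T + 1) + r3
        h = np.bincount(idx.ravel(), minlength=(2 * T + 1) ** 3).astype(np.float64)
        return h / h.sum()

    return np.concatenate([cooc3d(rmin), cooc3d(rmax)])        # dim 1458 for T = 4
```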

Encouraging results (early October)

Features: MINMAX, dim 1458
Training database: 2 × 9,074 BOSSbase 0.91
Classifier: FLD
BOSSrank: 71%

Features: MINMAX+QUANT, dim 2916
Training database: 2 × 9,074 BOSSbase 0.91
Classifier: G-SVM
BOSSrank: 73%

Unexpected stego-source mismatch

BOSSbase 0.91 was prepared with HUGO parameter settings 4 and 10; BOSSrank with 1.

BOSSbase 0.92 embedded with 1.

Retraining our classifier on the correct stego database gave:

October 14

Features: MINMAX+QUANT, dim 2916
Training database: 2 × 9,074 BOSSbase 0.92
Classifier: G-SVM
BOSSrank: 75%

This is when BOSS became GOSS:

Guess Our Steganographic Source. Do not say hop before you jump.

[Figure: BOSSrank score between Oct 14 and Nov 13 (the Hugobreakers' frustration)]

The dreaded cover-source mismatch

The tell-tale symptom of the mismatch: adding more features improved the score on BOSSbase but worsened the BOSSrank score.

The problem: we trained on one source but tested on another (different) source. Our detector lacked robustness.

Note that this is an issue of robustness rather than overtraining. It is well recognized in detection and estimation, and it is a very difficult problem because the mismatch can take so many different forms.

Trying to resolve the CSM

a) Train on a more diverse source (adding 6,000 images to BOSSbase lowered the BOSSrank score, making the mismatch worse?)

b) Use classifiers with a simpler decision boundary (L-SVM) (the same problem, and lower accuracy)

c) Contaminate the training set with BOSSrank images:
- put in denoised BOSSrank covers (adaptive denoising based on the estimated change probabilities)
- put in re-embedded BOSSrank stego
(we were unable to obtain consistent results with contamination when experimenting with BOSSbase, so we decided to toss it)

d) Find out more about the cover source:
- estimate resampling artifacts, from which we could obtain info about the original image size (no artifacts detected by Farid's code)
- extract the sensor fingerprint from the BOSSbase cameras, detect it in images from BOSSrank, and train on images from the right source

Forensic analysis of BOSSrank

Fingerprint extracted from all 7 BOSSbase cameras and detected in BOSSrank: ~500 images tested positive for Leica M9; no other camera tested positive.

[Figure: PCE values of the BOSSrank images against the Leica and Rebel fingerprints]
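For reference, a generic textbook-style peak-to-correlation-energy (PCE) detector between an image's noise residual and a camera fingerprint. This is a simplified stand-in (no intensity modulation of the fingerprint, circular correlation via the FFT), not the exact detector used on BOSSrank:

```python
import numpy as np

def pce(residual, fingerprint, exclude=5):
    """Peak-to-Correlation Energy between a test image's noise residual and a
    camera (PRNU) fingerprint, both 2-D arrays of the same size. Large PCE
    values indicate that the image was taken by the fingerprinted camera."""
    r = residual - residual.mean()
    k = fingerprint - fingerprint.mean()
    # circular cross-correlation computed via the FFT
    xc = np.fft.ifft2(np.fft.fft2(r) * np.conj(np.fft.fft2(k))).real
    py, px = np.unravel_index(np.argmax(np.abs(xc)), xc.shape)
    peak = xc[py, px]
    mask = np.ones_like(xc, dtype=bool)                 # exclude a small window
    mask[max(0, py - exclude):py + exclude + 1,         # around the peak from
         max(0, px - exclude):px + exclude + 1] = False # the energy estimate
    return peak ** 2 / np.mean(xc[mask] ** 2)
```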

Forensic analysis of BOSSrank, cont'd

Most images were taken in the Pacific Northwest.

Fingerprint extracted from 25 JPEG images from Tomáš Filler's camera (Panasonic Lumix DMC-FZ50) taken previously at SPIE conferences.

Resized to 512×512 using the same script. Positively identified in ~77 BOSSrank images.

We could not use this for BOSS, as other competitors did not have this opportunity.

We closed our investigation with ~50% of the images attributed to the Leica, the rest declared unknown.

[Figure: PCE values of the BOSSrank images against the Panasonic fingerprint]

Forensic-aided steganalysis

Option #1: Buy a Leica M9 and generate our own database. Oops, the price is $7,000!!
Option #2: LensRentals.com, rent it for a week. We took 7,301 images with the Leica M9.

Experiment #1: Train two classifiers, one trained only on Leica to analyze only the Leica images, and one trained on all to analyze the rest. Merge the prediction files.

Experiment #2: Add the Leica images to the BOSSbase database and train on all.

Result: the BOSSrank score was either the same or slightly worse. Bummer.

Can a cover source be replicated?

A cover source is a very complex entity shaped by:

Camera and its settings:
- short exposure → lower dark current
- high ISO → increased level of noise
- stopping the lens down to f/5.6 → sharper images than when stopped at f/2.0

Lens:
- short focus → low depth of field → easier for analysis

Content:
- Binghamton in the fall is a poor replacement for the French Riviera
- average amount of edges and smooth regions

We rented the wrong lens (50 mm); Patrick used 35 mm.

Model diversity is the key

QUANT, go 4D, use 3rd-order differences (quadratic model), merge:

Difference order | Cooc. | T | q | dim
2nd              | 3D    | 3 | 2 | 686
3rd              | 3D    | 3 | 2 | 686
2nd              | 4D    | 2 | 2 | 1250
3rd              | 4D    | 2 | 2 | 1250

November 13

Features: dim 3872
Training database: 2 × 9,074 BOSSbase 0.92
Classifier: G-SVM
BOSSrank: 76%

With increased dimensionality, machine learning became a serious bottleneck.

Ensemble classifier (instead of SVMs)

To facilitate further development, we started using ensemble classifiers instead of SVMs.

1. Set l ← 1.
2. Randomly select k features out of d, k << d.
3. Train an FLD on this random subspace on all BOSSbase images, set the threshold to obtain minimum P_E, and store the eigenvector e_l.
4. Make decisions on BOSSrank (f_j is the j-th feature vector):
   f_j · e_l > 0 ⇒ Dec(l, j) = 1 (stego)
   f_j · e_l < 0 ⇒ Dec(l, j) = 0 (cover)
5. Repeat steps 2-4 L times, obtaining L decisions Dec(1..L, 1..1000) for each test image.
6. For each image, fuse the decisions by voting.

Advantages:
- Low complexity (training a 9288-dim feature set on 2 × 17,000 images with L = 31 and k = 1600 takes only 8 minutes on a PC).
- Performance comparable to SVM.
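A minimal sketch of this random-subspace FLD ensemble. The minimum-P_E threshold search of step 3 is replaced here by a simple class-mean midpoint, and the regularization constant is arbitrary; both are our simplifications:

```python
import numpy as np

def train_ensemble(X_cover, X_stego, L=31, k=1600, seed=0):
    """Train L Fisher Linear Discriminants, each on k randomly chosen features
    out of d. Returns a list of (feature subset, FLD direction, bias)."""
    rng = np.random.default_rng(seed)
    d = X_cover.shape[1]
    learners = []
    for _ in range(L):
        idx = rng.choice(d, size=k, replace=False)
        Xc, Xs = X_cover[:, idx], X_stego[:, idx]
        mc, ms = Xc.mean(axis=0), Xs.mean(axis=0)
        Sw = np.cov(Xc, rowvar=False) + np.cov(Xs, rowvar=False)
        w = np.linalg.solve(Sw + 1e-6 * np.eye(k), ms - mc)   # FLD direction e_l
        b = -0.5 * float(w @ (mc + ms))                       # midpoint threshold
        learners.append((idx, w, b))
    return learners

def predict_ensemble(learners, X):
    """Majority vote over the base learners; 1 = stego, 0 = cover."""
    votes = np.zeros(len(X))
    for idx, w, b in learners:
        votes += (X[:, idx] @ w + b > 0)
    return (votes > len(learners) / 2.0).astype(int)
```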

Scaling up the feature dim seemed to work (mid November)

Feature set: previous 3872 + 1458 (MINMAX) = 5330
Training database: 2 × 9,074 BOSSbase v. 0.92
Classifier: ensemble, L = 31, k = 1600
BOSSrank: 77%

However, adding more features computed from various residuals did not improve BOSSrank, despite steady improvement on BOSSbase.

A little more empirical magic

Train on 2N images, where N is about 20-50% larger than the feature dimension.

November 29

Feature set: 5330 + QUANT4 + SQUARE + KB = 9288
Training database: 2 × 9,074 + 2 × 6,500 = 2 × 15,574
Classifier: ensemble, L = 31, k = 1600
BOSSrank: 78%

QUANT4; SQUARE (+ square cooc); KB (Ker-Böhme) kernel, with cooc = H + V.

KB kernel:
[ -1/4  1/2  -1/4 ]
[  1/2   0    1/2 ]
[ -1/4  1/2  -1/4 ]

dims: QUANT4 2500, SQUARE 2500, KB 1458
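A short sketch of the KB residual; the boundary handling ('same' output with symmetric padding) is our guess:

```python
import numpy as np
from scipy.signal import convolve2d

# Ker-Bohme predictor: each pixel is estimated from its 8 neighbors.
KB = np.array([[-0.25, 0.5, -0.25],
               [ 0.50, 0.0,  0.50],
               [-0.25, 0.5, -0.25]])

def kb_residual(x):
    """Residual of the KB predictor. The co-occurrences would then be formed
    from horizontal and vertical neighbors and summed (cooc = H + V)."""
    x = np.asarray(x, dtype=np.float64)
    return x - convolve2d(x, KB, mode='same', boundary='symm')
```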

The final behemoth of dim 24,933 (December 31)

Feature set: 24,933
Training database: 2 × 34,719
Classifier: ensemble, L = 71, k = 2400
BOSSrank: 81%
Accuracy on Leica: 82.3%
Accuracy on Panasonic: 70.0%

Combination of 32 feature subsets containing:
- 1st-6th order differences
- multiple versions with different values of q (quantization)
- EDGE residuals (effective around edges)
- calibrated features (from a low-pass filtered image)
- 5D coocs with T = 1

Score progress

Detecting HUGO without cover-source mismatch

alias

Steganalysis of Content-Adaptive Steganography in Spatial Domain

Effect of quantization

Quantization allows the features to sense changes in textured areas and around edges.

3D coocs are best quantized with q = c = the central coefficient in the residual computation.

[Figure: panels for c = 1, 2, 3, 6, 10, 20]

Best quantization value for 3D and 4D coocs

Feature set MINMAX, 4th-order differences, 3D, T = 4:
q     2     4     6     8     10    12
P_E   30.5  26.8  26.1  26.8  27.7  28.2

Feature set MINMAX, 4th-order differences, 4D, T = 2:
q     2     4     6     8     10    12
P_E   34.2  30.7  28.2  26.8  27.5  28.4

For 3D coocs, the best q is c; for 4D coocs, the best q is 1.5c.

Testing higher-order residuals

Fea. type    (diff, q, T)     d     Avg   Best  Worst  L   k
SPAM(3D)*    (2nd, 1, 4)      1458  71.4  74.5  69.0   31  1000
MINMAX(3D)   (2nd, 1, 4)      1458  72.7  74.9  68.7   31  1000
QUANT(3D)    (2nd, 2, 4)      1458  73.8  76.8  71.6   31  1000
QUANT(3D)+   (2nd-6th, c, 4)  7290  80.0  82.2  77.4   81  1600
QUANT(4D)+   (2nd-6th, c, 2)  6250  79.1  81.0  76.5   81  1600

* SPAM(3D) is a direct equivalent of the SPAM vector with 1st-order differences replaced with 2nd-order.
+ 2nd-6th is a merger of QUANT features from 2nd-6th order differences, quantized with q = c = the central coefficient in the residual.

Average accuracy (%) when training on 8074 and testing on 1000 images from BOSSbase, repeated 100 times (all results with the ensemble).

Accuracy on BOSSbase across cameras

[Figure: accuracy per image of BOSSbase on 1000 splits 8074/1000 (trn/tst); lines = averages for each camera]

6627 cover images were always classified as cover.
6647 stego images were always classified as stego.
4836 images were always classified correctly as both cover AND stego.

Pentax K20D is the easiest

[Figure: ROC and scatter plot with QUANT (dim 1458)]

Canon Rebel is the hardest

[Figure: scatter plot with QUANT (dim 1458)]

Accuracy correlates with texture

[Figure: FLD projection scatter plot with QUANT (dim 1458) vs. the average absolute 2nd difference]

Leica images

Typical Leica image histogram (possibly caused by the resizing script): the decreased dynamic range makes detection of embedding easier.

[Figure: scatter plot for LSB matching (QUANT 1458)]

Dependence on content is much weaker!

Comparison to ±1 embedding and CDF

ensemble with 33,963-dim behemoth

HUGO with the BOSS payload: accuracy 84.2%

Implications for steganalysis
- As steganography becomes more sophisticated, steganalysis needs to use more complex models to capture more subtle dependencies among pixels.
- The key is diversity! The model should be rich: a union of smaller submodels.
- Feature dimensionality will inevitably increase. Automatic handling of the dimensionality problem is preferable to hand-tweaking; ensemble classifiers scale well w.r.t. feature dimension and training-set size and are suitable for this task.
- Detectability of HUGO embedding in larger images will increase faster than what the Square Root Law dictates, because neighboring pixels will be more correlated.
- Cover-source mismatch is an extremely difficult problem that will hamper deployment of steganalysis in practice. Robust machine learning is badly needed.

Implications for steganography
- Adaptive stego implemented to minimize distortion in a model space is the way to go.
- Critical: the choice of model and distortion function. HUGO's model is high-dimensional but too narrow.
- By making the model more diverse (rich), better steganography can likely be built.
- Despite the progress made during BOSS, HUGO remains the most secure stego algorithm we have ever tested.

BOSS jump-started new directions
- Optimal choice of residual and its quantization? Perhaps learning both from a given source and for a given stego algorithm?
- Alternative to coocs as statistical descriptors of the random field of residuals?
- Helped us develop ensemble classification as an alternative to SVMs.
- Drew attention to CSM, training-set contamination, and training only on (processed) test images.

Our current results on detection of HUGO, and much more, in the Rump Session.

Some more interesting stats

Images  Avg. gray  Satur. pixels  Texture
BEST    74.1       2046           1.73
FAs     101.3      4415           4.66
MDs     102.0      5952           3.95

1000 splits of BOSSbase into 8074/1000.
BEST ... images always classified correctly as both cover AND stego
FAs .... images always classified as stego when cover
MDs .... images always classified as cover when stego
Texture: scaled average |x_ij − x_i,j+1|

Effect of quantization

[Figure: original vs. quantized distribution of the residual; the cooc covers only a limited range, leaving a thick or thin marginal. Changes to elements in the marginal are undetected.]

Another example

[Figure: a 4 MP image and the same image after scaling to 512×512]