
Chapter 8: On the Use of Hash Functions in Computer Forensics

Harald Baier

Hochschule Darmstadt, CASED

WS 2011/2012

Harald Baier Hash Functions in Forensics / WS 2011/2012 1/41

Motivation

Foundations of Hash Functions

Use Cases of Hash Functions

Piecewise Hashing

Fuzzy Hashing


Motivation

Use Case: Prosecution

1. Police and prosecutors are confronted with many different kinds of storage media:
   - Hard disk drives, solid-state drives, USB sticks.
   - Mobile phones, SIM cards.
   - Digital cameras, digital camcorders, SD cards.
   - CDs, DVDs.
   - RAM (dumps).
   - ...

2. The amount of seized data often exceeds 1 terabyte.


Different views of 1 terabyte

1 terabyte of digital text is approximately equal to:

1. 1 trillion characters (1 character = 1 byte).
2. 220 million pages (1 page = 5000 characters).
3. 21 years of printing time (20 sheets per minute).
4. 1 million kg of paper (printed single-sided).
5. A paper stack 22 km high (sheet thickness of 0.1 mm).
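The figures above can be reproduced with a few lines of arithmetic. The sketch below assumes that 1 terabyte means 2^40 bytes (which matches the 220 million pages on the slide) and that one sheet weighs about 5 g; neither assumption is stated on the slide.

```python
# Back-of-the-envelope check of the "1 terabyte of text" figures.
TERABYTE = 2**40             # assumption: binary terabyte, 1 character = 1 byte
CHARS_PER_PAGE = 5000
SHEETS_PER_MINUTE = 20
SHEET_THICKNESS_MM = 0.1
SHEET_MASS_G = 5.0           # assumption: roughly one A4 sheet of 80 g/m^2 paper

pages = TERABYTE / CHARS_PER_PAGE
printing_years = pages / SHEETS_PER_MINUTE / (60 * 24 * 365)
stack_km = pages * SHEET_THICKNESS_MM / 1_000_000   # mm -> km
paper_kg = pages * SHEET_MASS_G / 1000              # g -> kg

print(f"{pages / 1e6:.0f} million pages")
print(f"{printing_years:.0f} years of printing")
print(f"{stack_km:.0f} km paper stack")
print(f"{paper_kg / 1e6:.1f} million kg of paper")
```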


Finding relevant files resembles ...

[Two images omitted. Sources: tu-harburg.de, beepworld.de]


... or is it solved for suspect files?


Foundations of Hash Functions

Definition and Security Applications

1. A hash function $h$ is a function with two properties:
   - Compression: $h : \{0,1\}^* \to \{0,1\}^n$.
   - Ease of computation: computing $h(m)$ is 'fast' in practice.

2. Notation:
   - $m$ is a 'document' (e.g. a file, a device).
   - $h(m)$ is its hash value or digest.

3. Sample security applications:
   - Storage of passwords.
   - Electronic signatures (MAC, asymmetric signatures).
   - Computer forensics.
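The compression property can be observed directly. The following minimal sketch uses SHA-256 from Python's standard `hashlib` as a stand-in for $h$ (the choice of SHA-256 and the sample inputs are assumptions for illustration): inputs of very different lengths all map to a digest of $n = 256$ bits.

```python
import hashlib

def h(m: bytes) -> bytes:
    """Stand-in for the hash function h: SHA-256, so n = 256."""
    return hashlib.sha256(m).digest()

# Documents of length 0, 1 and 1,000,000 bytes.
inputs = [b"", b"a", b"a" * 1_000_000]

# The digest length in bits is the same for every input.
digest_bits = [len(h(m)) * 8 for m in inputs]
print(digest_bits)
```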


Hash Functions in Cryptography

For use in cryptography, we have to impose further conditions:

1. Preimage resistance:
   - Let a hash value $H \in \{0,1\}^n$ be given.
   - It is infeasible in practice to find an input (i.e. a document $m$) with $H = h(m)$.

2. Second preimage resistance:
   - Let a document $m \in \{0,1\}^*$ be given.
   - It is infeasible in practice to find a second document $m'$ with $m \neq m'$ and $h(m) = h(m')$.

3. Collision resistance:
   - It is infeasible in practice to find any two documents $m, m'$ with $m \neq m'$ and $h(m) = h(m')$.
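The gap between second preimage resistance and collision resistance can be felt with a deliberately weakened hash. The sketch below truncates SHA-256 to $n = 16$ bits (a toy construction assumed here for illustration only, with hypothetical helper names). Finding any colliding pair typically takes only a few hundred tries because of the birthday effect, whereas hitting one fixed digest would take about $2^{16}$ tries on average.

```python
import hashlib
from itertools import count

def toy_hash(m: bytes) -> bytes:
    """First 2 bytes (n = 16 bits) of SHA-256. Deliberately weak, illustration only."""
    return hashlib.sha256(m).digest()[:2]

def find_collision():
    """Hash the documents b"0", b"1", b"2", ... until two share a digest."""
    seen = {}                      # digest -> first document with that digest
    for i in count():
        m = str(i).encode()
        d = toy_hash(m)
        if d in seen:
            return seen[d], m, i + 1
        seen[d] = m

m1, m2, tries = find_collision()
print(f"collision after {tries} documents: {m1!r} and {m2!r}")
```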


Avalanche Effect

1. Let $m$ and $h(m)$ be given. If $m$ is replaced by $m'$, $h(m')$ behaves pseudo-randomly. ⟹ There is no control over the output if the input is changed.

2. Consequences:
   - If only one bit in $m$ is changed to get $m'$, the two outputs $h(m)$ and $h(m')$ look 'very' different.
   - Every bit in $h(m')$ changes with probability 50%, independently of the number of bits in which $m$ and $m'$ differ.
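A short experiment makes the avalanche effect visible. This sketch flips a single input bit and counts how many of the 256 output bits of SHA-256 change; the sample message is arbitrary, and the expectation is a count near 128.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

m = bytearray(b"The quick brown fox jumps over the lazy dog")
m_prime = bytearray(m)
m_prime[0] ^= 0x01          # flip exactly one bit of the input

changed = bit_difference(hashlib.sha256(m).digest(),
                         hashlib.sha256(m_prime).digest())
print(f"{changed} of 256 digest bits changed")
```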


Sample Cryptographic Hash Functions

Name    MD5    SHA-1    SHA-256    SHA-512    RIPEMD-160
n       128    160      256        512        160

Demo:

1. Computation of hash values using openssl.
2. Avalanche effect.
3. Performance.


Use Cases of Hash Functions

Use Cases

1. Ensure authenticity and integrity during data acquisition.
   - Relevant for both dead and live analysis.
   - Hash values must be protected:
     - Written down by hand in the investigation notebook.
     - Protected by a digital signature computed over them.

2. Identify known files:
   - Whitelisting: known-to-be-good files.
   - Blacklisting: known-to-be-bad files.

3. Bloom filters to efficiently identify a large set of files.

Relevant security property of the hash function: second-preimage resistance.
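The Bloom filter idea from item 3 can be sketched in a few lines. The class below is a minimal illustration, not the implementation of any particular forensic tool; the filter size, the number of hash positions, and the way positions are derived from one SHA-256 digest are all assumptions. A Bloom filter never misses an element that was added, but it may report a false positive.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a bit array plus several hash positions per item."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits          # assumed to be a multiple of 8
        self.num_hashes = num_hashes      # at most 8 with this derivation
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: bytes):
        # Derive the positions from disjoint 4-byte slices of one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Hypothetical known files, represented by their SHA-1 fingerprints.
known_files = [b"contents of a known system file", b"contents of a known editor"]
bloom = BloomFilter()
for content in known_files:
    bloom.add(hashlib.sha1(content).digest())

print(hashlib.sha1(known_files[0]).digest() in bloom)
print(hashlib.sha1(b"some unknown file").digest() in bloom)
```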


Data Acquisition: Dead analysis

1. The investigation is performed on a copy.

2. Main goal of hashing: preserve the chain of custody.

3. Common approach for a storage medium:
   - Compute hash value $h_1$ over the whole original volume.
   - Write hash value $h_1$ down in a physical logbook.
   - Make a 1-to-1 copy of the volume using dd.
   - Compute hash value $h_2$ over the copy.
   - Write hash value $h_2$ down in the physical logbook.
   - Work read-only on the copy.
   - To prove the chain of custody, compute $h_2$ again and check that $h_1 = h_2$ holds.
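The same procedure can be sketched in Python. This is only an illustration under stated assumptions: a small temporary file stands in for the original volume, `shutil.copyfile` stands in for `dd`, and the helper name `hash_file` is hypothetical. The file is hashed in chunks so that large images do not have to fit in memory.

```python
import hashlib
import os
import shutil
import tempfile

def hash_file(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-1 of a file, read in chunks (SHA-1 mirrors the slides' example)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "volume.bin")   # stand-in for the original volume
    image = os.path.join(workdir, "image.dd")
    with open(original, "wb") as f:
        f.write(os.urandom(4096))

    h1 = hash_file(original)           # hash of the original, goes into the logbook
    shutil.copyfile(original, image)   # stand-in for the 1-to-1 copy made with dd
    h2 = hash_file(image)              # hash of the copy, goes into the logbook

    # ... read-only investigation of the image happens here ...

    h2_again = hash_file(image)        # recomputed after the investigation
    custody_intact = (h1 == h2 == h2_again)

print("chain of custody intact:", custody_intact)
```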


Example: Save first partition of a HDD

Chain of custody preserved

$ openssl dgst -sha1 /dev/hda1
SHA1(/dev/hda1)= c1e4449bfd504b60e85447afaa3cd100f7e59dad

$ dd if=/dev/hda1 of=image-hda1.dd bs=512

$ openssl dgst -sha1 image-hda1.dd
SHA1(image-hda1.dd)= c1e4449bfd504b60e85447afaa3cd100f7e59dad

[INVESTIGATE read-only image-hda1.dd]

$ openssl dgst -sha1 image-hda1.dd
SHA1(image-hda1.dd)= c1e4449bfd504b60e85447afaa3cd100f7e59dad


Example: Save first partition of a HDD

Chain of custody destroyed

$ openssl dgst -sha1 /dev/hda1
SHA1(/dev/hda1)= c1e4449bfd504b60e85447afaa3cd100f7e59dad

$ dd if=/dev/hda1 of=image-hda1.dd bs=512

$ openssl dgst -sha1 image-hda1.dd
SHA1(image-hda1.dd)= c1e4449bfd504b60e85447afaa3cd100f7e59dad

[INVESTIGATE read-only image-hda1.dd]

$ openssl dgst -sha1 image-hda1.dd
SHA1(image-hda1.dd)= 568c68d97baf93642962f509343b8b02d324e3d8


Hash Functions in Data Acquisition: Assessment

1. Key problem: if original and copy differ by even a single bit, the chain of custody is broken.

2. Solutions:
   - Compute hashes over smaller pieces of the input (e.g. file system blocks). This leads to segment hashes.
   - Perform a binary search over original and copy to find the differences.
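The binary search mentioned above can be sketched as follows. This is one possible reading of the slide, not a documented tool: the function name is hypothetical, both inputs are assumed to have equal length, and only the first differing byte is located. At each step the hashes of the left halves are compared to decide in which half the first difference lies.

```python
import hashlib
from typing import Optional

def _digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def first_difference(original: bytes, copy: bytes) -> Optional[int]:
    """Offset of the first differing byte of two equal-length inputs, or None."""
    if len(original) != len(copy):
        raise ValueError("inputs must have equal length")
    if _digest(original) == _digest(copy):
        return None
    # Invariant: everything before lo is equal, and [lo, hi) contains a difference.
    lo, hi = 0, len(original)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if _digest(original[lo:mid]) != _digest(copy[lo:mid]):
            hi = mid        # first difference is in the left half
        else:
            lo = mid        # left half is identical, so look in the right half
    return lo

original = bytes(range(256)) * 64            # 16384 bytes of sample data
tampered = bytearray(original)
tampered[12345] ^= 0xFF                      # corrupt one byte
print(first_difference(original, bytes(tampered)))
```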


Whitelisting

1. Underlying idea:
   - Generate a database of known-to-be-good files.
   - Let $G$ denote this set.
   - Automatically identify a file as unsuspicious if its hash value matches the fingerprint of a file in $G$.
   - Exclude known-to-be-good files from further investigation.
   - Result: a significant reduction of irrelevant data.

2. Examples of unsuspicious files:
   - System files of operating systems.
   - Well-known benign applications like browsers, editors, ...

3. Widespread database:
   - Reference Data Set (RDS) of the National Software Reference Library (NSRL), maintained by NIST.

NSRL-RDS

1. Website: www.nsrl.nist.gov.

2. Current download version: RDS 2.34 from October 2011.

3. The original software is kept to ensure admissibility in court.

4. Distributed via:
   - CD.
   - Download of ISO images from the website via HTTP.

5. RDS 2.34 comprises four ISO images of about 1.2 GB in size.

6. Contains 69,887,164 files and 21,082,054 unique SHA-1 values (due to duplicates across different software distributions).

7. Entries are sorted by their SHA-1 values.


Sample entries

$ less NSRLFile.txt

"SHA-1","MD5","CRC32","FileName","FileSize","ProductCode","OpSystemCode","SpecialCode"
"000000206738748EDD92C4E3D2E823896700F849","392126E756571EBF112CB1C1CDEDF926","EBD105A0","I05002T2.PFB",98865,3095,"WIN",""
"0000004DA6391F7F5D2F7FCCF36CEBDA60C6EA02","0E53C14A3E48D94FF596A2824307B492","AA6A7B16","00br2026.gif",2226,228,"WIN",""
"000000A9E47BD385A0A3685AA12C2DB6FD727A20","176308F27DD52890F013A3FD80F92E51","D749B562","femvo523.wav",42748,4887,"MacOSX",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,18266,"358",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,2322,"WIN",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,2575,"WIN",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,2583,"WIN",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,3271,"WIN",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,3282,"UNK",""
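Entries in the NSRLFile.txt format shown in the NSRL-RDS sample listing can be read with Python's standard `csv` module. The sketch below embeds three of the sample lines as a string (a real run would open NSRLFile.txt instead), collects the unique SHA-1 values, and uses them as a whitelist; the helper name `is_known_good` is hypothetical.

```python
import csv
import io

# Three lines in NSRLFile.txt format, taken from the sample entries.
SAMPLE = '''"SHA-1","MD5","CRC32","FileName","FileSize","ProductCode","OpSystemCode","SpecialCode"
"000000206738748EDD92C4E3D2E823896700F849","392126E756571EBF112CB1C1CDEDF926","EBD105A0","I05002T2.PFB",98865,3095,"WIN",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,18266,"358",""
"00000142988AFA836117B1B572FAE4713F200567","9B3702B0E788C6D62996392FE3C9786A","05E566DF","J0180794.JPG",32768,2322,"WIN",""
'''

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# The same file appears once per product, so the set of SHA-1 values is smaller
# than the number of rows.
known_good = {row["SHA-1"] for row in rows}

def is_known_good(sha1_hex: str) -> bool:
    """Whitelist lookup; the RDS stores SHA-1 values in upper-case hex."""
    return sha1_hex.upper() in known_good

print(len(rows), "rows,", len(known_good), "unique SHA-1 values")
```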

Whitelisting: Anti-detection

1. Task: find a suspicious file $s$ with $h(s) = h(g)$:
   - The attacker has to solve the second preimage problem for some $g \in G$.
   - As of today, this is hardly possible (even for MD5).
   - If the RDS is used, preimages for both SHA-1 and MD5 have to be found.

2. Attack the whitelist:
   - Breach the integrity and authenticity of the whitelist.
   - Example:
     - Attack the distribution channel.
     - If the RDS is downloaded via HTTP, spoofing is possible.


Whitelisting: Assessment

1. General assessment:
   - Well-known and established process in computer forensics.
   - If the database is trusted, there are no false positives (positive = benign).

2. Possible bottleneck: size of the database.
   - The size of the database is increasing.
   - Currently the RDS is about 6 gigabytes.


Blacklisting

1. Underlying idea:
   - Generate a database of known-to-be-bad files.
   - Let $B$ denote this set.
   - Automatically find a suspicious file on the basis of its fingerprint, which matches the fingerprint of a file in $B$.

2. Sample suspect files:
   - Malware.
   - Encryption or steganographic software.
   - Corporate secrets.
   - IPR-protected files.
   - Child pornography.


Blacklisting: Evaluation

1. Anti-detection approach:
   - Let a suspicious file $b \in B$ be given.
   - Change some (irrelevant) bit of $b$ to get $b'$.
   - Consequence:
     - $h(b')$ is very different from $h(b)$.
     - $b'$ is not detected automatically.

2. Core problem:
   - The cryptographic requirements of a hash function and the forensic goals are in conflict.
   - A suspicious file similar to an element of $B$ is not detected.

3. Fragments of elements of $B$ are not identified either.


Piecewise Hashing

Goals

1. Overcome the drawbacks of cryptographic hash functions in the context of computer forensics.

2. The main drawbacks are:
   - Data acquisition: the integrity of the copy is destroyed if some bits change.
   - White-/blacklisting:
     - Suspect files similar to known-to-be-bad files are not detected.
     - Fragments are not detected (due to deletion, fragmentation).

3. Currently known approaches:
   - Segment hashes (also called block hashes): tool dcfldd.
   - Context-triggered piecewise hashes: tool ssdeep.
   - Similarity digests: tool sdhash.


Segment Hashes

1. Underlying idea:
   - Split the input data (volume, file) into blocks of fixed length.
   - Compute the cryptographic hash of each segment.
   - Look up the segment hashes in a hash database for matches.

2. Original aim: improve integrity of storage media.
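Segment hashing is straightforward to sketch. The code below splits the input into fixed 4096-byte windows, hashes each with SHA-256, and reports which windows differ between an original and a copy. The function names and the sample data are assumptions for illustration; the window size mirrors the `hashwindow=4096` setting used with dcfldd in this chapter.

```python
import hashlib

def segment_hashes(data: bytes, window: int = 4096):
    """List of (start, end, sha256 hex digest) for fixed-length segments."""
    return [(off, min(off + window, len(data)),
             hashlib.sha256(data[off:off + window]).hexdigest())
            for off in range(0, len(data), window)]

def changed_segments(original: bytes, copy: bytes, window: int = 4096):
    """Byte ranges whose segment hashes differ (inputs assumed to have equal length)."""
    return [(start, end)
            for (start, end, h1), (_, _, h2)
            in zip(segment_hashes(original, window), segment_hashes(copy, window))
            if h1 != h2]

original = bytes(20480)                 # five all-zero segments as sample data
copy = bytearray(original)
copy[13000] = 0x01                      # a single modified byte
print(changed_segments(original, bytes(copy)))
```

Only the segment containing the modified byte is reported, so the rest of the copy can still be shown to be unchanged.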


Segment Hashes: Example from NIST

1. Sample tool by Nicholas Harbour (since 2002):
   - dcfldd: an extension of dd.
   - Department of Defense Computer Forensics Laboratory (DCFL).
   - Provides MD5, SHA-1, SHA-2 family.

2. Evaluation by NIST (Douglas White, 2008):
   - Hashing of File Blocks: When Exact Matches are not Useful.
   - NIST worked on Windows 2000 and XP OS files.
   - Main result:
     - File-based data reduction leaves an average of 30% of disk space for human investigation.
     - Incorporating block hashes reduces this to an average of 15%.
   - Block hashes assist in recognising wiped media.


Example: Save first partition of a HDD

Chain of custody preserved

$ dcfldd if=/dev/hda1 of=image-hda1.dd bs=512 hashwindow=4096 hash=sha256

0 - 4096: da0bd2b16c7cd5acb5695e9d81fb6d832cba85312d87e08d0c675e41b608de50

4096 - 8192: 281f4b8ac2dcda0f3fd9a0642a694fedf829d7567a531b1cfc8925f94eebe7a3

8192 - 12288: 1c05a3c7251b666c1ec4a2b689e25f95a92a311613ce685fdf7cbf41552290e5

12288 - 16384: 6c9c17f271f18587bcccf8f9c6154b4bff764664a3eb8ddf06881168c5c4698b

16384 - 20480: c61cd658e73450dfb0dfc9a1d83cdbddd162d9194d81f27f0516bb107280e841

[REMOVED]

$ dcfldd if=image-hda1.dd of=/dev/null bs=512 hashwindow=4096 hash=sha256

0 - 4096: da0bd2b16c7cd5acb5695e9d81fb6d832cba85312d87e08d0c675e41b608de50

4096 - 8192: 281f4b8ac2dcda0f3fd9a0642a694fedf829d7567a531b1cfc8925f94eebe7a3

8192 - 12288: 1c05a3c7251b666c1ec4a2b689e25f95a92a311613ce685fdf7cbf41552290e5

12288 - 16384: 6c9c17f271f18587bcccf8f9c6154b4bff764664a3eb8ddf06881168c5c4698b

16384 - 20480: c61cd658e73450dfb0dfc9a1d83cdbddd162d9194d81f27f0516bb107280e841

[REMOVED]


Example: Save first partition of a HDD

Chain of custody partly destroyed

$ dcfldd if=/dev/hda1 of=image-hda1.dd bs=512 hashwindow=4096 hash=sha256

0 - 4096: da0bd2b16c7cd5acb5695e9d81fb6d832cba85312d87e08d0c675e41b608de50

4096 - 8192: 281f4b8ac2dcda0f3fd9a0642a694fedf829d7567a531b1cfc8925f94eebe7a3

8192 - 12288: 1c05a3c7251b666c1ec4a2b689e25f95a92a311613ce685fdf7cbf41552290e5

12288 - 16384: 6c9c17f271f18587bcccf8f9c6154b4bff764664a3eb8ddf06881168c5c4698b

16384 - 20480: c61cd658e73450dfb0dfc9a1d83cdbddd162d9194d81f27f0516bb107280e841

[REMOVED]

$ dcfldd if=image-hda1.dd of=/dev/null bs=512 hashwindow=4096 hash=sha256

0 - 4096: da0bd2b16c7cd5acb5695e9d81fb6d832cba85312d87e08d0c675e41b608de50

4096 - 8192: 281f4b8ac2dcda0f3fd9a0642a694fedf829d7567a531b1cfc8925f94eebe7a3

8192 - 12288: 1c05a3c7251b666c1ec4a2b689e25f95a92a311613ce685fdf7cbf41552290e5

12288 - 16384: f361bc439d0fc4215eba5523cba663c5371d47c358d68b2192602a29e972e143

16384 - 20480: c61cd658e73450dfb0dfc9a1d83cdbddd162d9194d81f27f0516bb107280e841

[REMOVED]


Segment Hashes: Evaluation

1. Anti-blacklisting is very easy:
   - Insert an irrelevant byte in the first sector.
   - All segment hashes then differ from the stored segment hashes.
   - The modified suspect file is not detected.
   - Demo.

2. A good technique for whitelisting (see NIST results).

3. The size of the segment hash database is large:
   - 4096-byte block size, SHA-1 (20-byte digest).
   - $\frac{\text{size of hash database}}{\text{size of raw data}} = \frac{20}{4096} \approx 0.00488$ ⟹ 1 terabyte of raw data yields a hash database of about 5 gigabytes.

4. The hash database depends on the hashwindow size.
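Both the anti-blacklisting trick and the size estimate can be checked with a short sketch. Random bytes stand in for a blacklisted file (an assumption for illustration), and 1 terabyte is taken as 2^40 bytes. Inserting one byte at the front shifts every fixed-length segment, so none of the stored segment hashes is found again.

```python
import hashlib
import os

WINDOW = 4096
DIGEST_BYTES = hashlib.sha1().digest_size      # 20 bytes for SHA-1

def segment_hashes(data: bytes, window: int = WINDOW):
    """SHA-1 hex digests of consecutive fixed-length segments."""
    return [hashlib.sha1(data[off:off + window]).hexdigest()
            for off in range(0, len(data), window)]

suspect = os.urandom(8 * WINDOW)               # stand-in for a blacklisted file
stored = set(segment_hashes(suspect))          # what the blacklist would contain

modified = b"\x00" + suspect                   # one irrelevant byte at the front
surviving = sum(h in stored for h in segment_hashes(modified))
print("segments still recognised:", surviving)

# Size of the hash database for 1 terabyte (assumed 2**40 bytes) of raw data.
database_bytes = 2**40 * DIGEST_BYTES // WINDOW
print("hash database size in GiB:", database_bytes / 2**30)
```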


Context Triggered Piecewise Hashes

1. Underlying idea:
   - Split the input data (volume, file) into blocks of variable length.
   - The end points of the blocks are determined by a rolling hash:
     - Its value depends only on the current context.
     - A window (e.g. of size 7 bytes) slides over the input.
     - Context = bytes of input data in the current window.
     - If the rolling hash matches a trigger value, an end point is set.
     - The rolling hash function is assumed to be pseudo-random: PF.
   - Compute the cryptographic hash of every segment.
   - The sequence of these segment hashes is the context-triggered piecewise hash (CTPH) of the input.
   - Look up in a hash database for matches.
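The steps above can be sketched as follows. This is a simplified illustration, not the ssdeep or spamsum algorithm: the rolling function is a stand-in that is recomputed for every window instead of being updated incrementally, and the trigger modulus and the truncation of the segment digests to 8 hex characters are assumptions. The point is that after inserting one byte at the front, only the first segment changes, because the end points depend on the local context and not on absolute offsets.

```python
import hashlib
import random

WINDOW = 7               # window size in bytes, as on the slide
TRIGGER_MODULUS = 64     # hypothetical: on average one end point per 64 bytes

def rolling_value(window: bytes) -> int:
    """Stand-in rolling hash: depends only on the bytes in the current window."""
    return sum((i + 1) * b for i, b in enumerate(window))

def end_points(data: bytes):
    """Offsets at which the rolling value hits the trigger value."""
    return [end for end in range(WINDOW, len(data) + 1)
            if rolling_value(data[end - WINDOW:end]) % TRIGGER_MODULUS
            == TRIGGER_MODULUS - 1]

def ctph(data: bytes):
    """Sequence of (truncated) cryptographic hashes of the variable-length segments."""
    digests, start = [], 0
    for end in end_points(data) + [len(data)]:
        if end > start:
            digests.append(hashlib.sha256(data[start:end]).hexdigest()[:8])
            start = end
    return digests

rng = random.Random(1)
data = bytes(rng.getrandbits(8) for _ in range(4096))
modified = b"X" + data                 # insert one byte at the very beginning

a, b = ctph(data), ctph(modified)
common_tail = len(a) - 1               # every segment of data except the first
print(len(a), "segments;", "tail unchanged:", b[-common_tail:] == a[1:])
```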


Context Triggered Piecewise Hashes

1. Originally proposed for spam detection (spamsum by Andrew Tridgell, 2002).

2. Ported to forensics by Jesse Kornblum, 2006: ssdeep.


Piecewise Hashes: The algorithm


CTPH: A sample tool

1. ssdeep (based on spamsum).

2. The window size is 7.

3. The CTPH is a sequence of printable characters:
   - Only the least significant 6 bits (LS6B) of a segment hash are considered.
   - The LS6B are encoded in base64.

4. ssdeep decides about a match:
   - On the basis of the edit distance (changing, inserting, ... characters) of two CTPHs.
   - The edit distance is rescaled to a percentage match score.

5. Demo.
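The matching step can be illustrated with a plain Levenshtein distance and a linear rescaling to a percentage. This is a sketch under assumptions: ssdeep uses its own weighting and scoring formula, which is not reproduced here, and the function names and sample strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimal number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                # delete ca
                               current[j - 1] + 1,             # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute or keep
        previous = current
    return previous[-1]

def match_score(a: str, b: str) -> int:
    """Simplified score: 100 for identical strings, 0 for maximally different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - edit_distance(a, b) / longest))

# Two hypothetical CTPH strings that differ in one character.
print(match_score("ABCDEFGHIJ", "ABCDEFGHIX"))
```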


CTPH: Sample Research Questions

1. Rolling hash:
   - It should be efficient and pseudo-random.
   - The current implementation is fast, but not pseudo-random.
   - Task: find a slightly slower, but pseudo-random rolling hash.

2. Fragment detection:
   - Kornblum's approach fails if fragments are much smaller than the original file.
   - Task: find a different approach.

3. Edit distance:
   - Kornblum's approach addresses text files (due to its origin in spam detection).
   - Task: find a more general approach, which also addresses images, videos, ...


Fuzzy Hashing

Vision

1. Find a similarity-preserving hash function.
   - Fuzzy hash function, denoted by $f$.
   - $m$ and $m'$ are 'similar' $\Longrightarrow$ $f(m)$ and $f(m')$ are 'similar', too.

2. Find a general (forensic) metric to measure similarity:
   - Let $d : \{0,1\}^* \times \{0,1\}^* \to \mathbb{R}^+_0$ be such a metric.
   - $d$ shall be applicable to txt, doc, odt, jpg, bmp, devices, ...

3. $f$ is a function with $d(m_1, m_2) \approx d(f(m_1), f(m_2))$.


Fuzzy Hashing: Applications

1. Forensics (on the file level): detect similar files.
   - Blacklisting:
     - Detect manipulated suspicious files.
     - Find fragments of suspicious data.
   - Whitelisting: find changed unsuspicious files.

2. Biometrics: template protection.

3. Malware: detect obfuscated malware (e.g. metamorphic malware).

4. Junk mail detection.


Questions?
