Merkle Tree Traversal in Log Space & Time

Michael Szydlo, RSA

Eurocrypt 2004

May 6, 2004

Presentation overview

1. Review of Merkle Authentication Trees

2. Define the Traversal Problem

3. Describe classic traversal technique

4. Present new, space-efficient algorithm

5. Concluding comments

Merkle trees

Introduced by Ralph Merkle, 1979
• “Classic” cryptographic construction
• Combines hash functions on a binary tree structure

A public-key authentication scheme
• Uses only one-way hash functions as building blocks
• No number theory or trapdoor permutations
• Also gives public-key signatures (via Lamport’s one-time signatures)

Theoretical and practical contexts
• Receives less practical attention today because of number-theoretic schemes (e.g., RSA, DSA)
• Not terribly inefficient
• No number theory: an advantage?

Our contribution
• Re-examine efficiency aspects of the construction
• New algorithm that answers an “old question” about Merkle trees

Merkle tree data structure

[Figure: binary Merkle tree. Leaves: v_i = Hash(s_i), where s_i is the secret for leaf i. Interior nodes: v = Hash(v_left || v_right).]

• Binary tree; each node is assigned a value (e.g., 160 bits)

• An extra, secret value s_i is associated with each leaf

A Public / Private key pair

How to generate a key pair

1. Select a random (e.g., 160-bit) secret S

2. Derive leaf secrets s_i = PRF(S || i)

3. Use the hash function to get leaf and interior node values

4. Publish the root value as P

Key generation has a cost
• A tree of height H has N = 2^H leaves
• A node at height h depends on 2^h leaf values
• Obtaining P requires calculating all N leaf values plus 2^H - 1 more hash function evaluations
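The four key-generation steps and their cost can be sketched as follows. This is a minimal illustration, not the presenter's implementation: HMAC-SHA256 stands in for the PRF and SHA-256 for the hash function (the slides only say "e.g. 160 bit"), and the function names are hypothetical.

```python
import hashlib
import hmac

def leaf_secret(master: bytes, i: int) -> bytes:
    # Assumption: HMAC-SHA256 plays the role of PRF(S || i).
    return hmac.new(master, i.to_bytes(8, "big"), hashlib.sha256).digest()

def generate_public_key(master: bytes, height: int):
    """Return (root, leaf_calcs, hash_calcs) for a tree with 2**height leaves."""
    # Step 3a: leaf values v_i = Hash(s_i).
    level = [hashlib.sha256(leaf_secret(master, i)).digest()
             for i in range(2 ** height)]
    leaf_calcs = len(level)
    hash_calcs = 0
    # Step 3b: interior values v = Hash(v_left || v_right), one level at a time.
    while len(level) > 1:
        level = [hashlib.sha256(level[j] + level[j + 1]).digest()
                 for j in range(0, len(level), 2)]
        hash_calcs += len(level)
    # Step 4: the root is the public key P.
    return level[0], leaf_calcs, hash_calcs
```

For height 4 this performs 16 leaf calculations and 15 interior hashes, matching N and 2^H - 1 above. Note that this sketch keeps a whole level in memory, which is exactly what a weak prover cannot afford for large trees.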

Authenticating a secret

Prover wishes to reveal s_i to identify herself
• Prover sends i, s_i (each secret is used just once)
• Additional data required: “sibling node” values

Verifier checks s_i against the public key P
• First hash s_i
• Hash the result together with its sibling in the tree
• Repeat, moving up the tree
• Compare the result with the root
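The verifier's check can be sketched as below. SHA-256 and the helper names (`build_levels`, `auth_path`, `verify`) are assumptions for illustration; here the prover side simply stores every level, which is the "tacit assumption" that the rest of the talk removes.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaf_secrets):
    """All node values, level by level (leaves first, root last)."""
    levels = [[H(s) for s in leaf_secrets]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([H(prev[j] + prev[j + 1]) for j in range(0, len(prev), 2)])
    return levels

def auth_path(levels, i):
    """Sibling node values for leaf i, from the bottom up."""
    path = []
    for level in levels[:-1]:
        path.append(level[i ^ 1])   # the sibling index differs in the lowest bit
        i >>= 1
    return path

def verify(root, i, secret, siblings):
    """Hash the secret, then hash upward with each sibling; compare to the root."""
    value = H(secret)
    for sib in siblings:
        value = H(sib + value) if i & 1 else H(value + sib)
        i >>= 1
    return value == root
```

The bit `i & 1` tells the verifier whether the running value is a right or a left child, so the two inputs are concatenated in the same order as when the tree was built.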

Sibling node values required

[Figure: Merkle tree showing the sibling nodes required to authenticate the secret s_0; three steps are labeled H. The root value is public.]

1. Verify the secret value by hashing it, then hashing together with the sibling, etc.

2. Accept if the result matches the root value

Digital signatures, too

Use up 1 leaf per authentication

Digital signature: use multiple leaves
• Extends Lamport’s one-time signature scheme
• Want to sign m = (m_0, m_1, …, m_159)
• Requires 160 pairs of secrets {s_i, t_i}
• s_i is included in the signature if m_i = 0; otherwise t_i is
• Verification requires sibling nodes, as above

Merkle construction provides signatures
• Security is intuitive; how about efficiency?
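A minimal sketch of the Lamport one-time step described above, under stated assumptions: the 160 message bits are taken from a truncated SHA-256 digest, the secrets are random 20-byte strings, and the public values are plain hashes of the secrets. In the Merkle construction those public values would additionally be tied to tree leaves and checked with sibling nodes; that part is not shown.

```python
import hashlib
import secrets

BITS = 160  # number of message bits m_0 .. m_159

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def lamport_keygen():
    # One pair of secrets (s_i, t_i) per message bit.
    sk = [(secrets.token_bytes(20), secrets.token_bytes(20)) for _ in range(BITS)]
    pk = [(H(s), H(t)) for s, t in sk]
    return sk, pk

def message_bits(message: bytes):
    # Assumption: sign the first 160 bits of SHA-256(message).
    digest = H(message)[:BITS // 8]
    return [(digest[k // 8] >> (7 - k % 8)) & 1 for k in range(BITS)]

def lamport_sign(sk, message: bytes):
    # Reveal s_i when m_i = 0, otherwise t_i.
    return [sk[k][bit] for k, bit in enumerate(message_bits(message))]

def lamport_verify(pk, message: bytes, signature) -> bool:
    bits = message_bits(message)
    return len(signature) == BITS and all(
        H(sig) == pk[k][bit] for k, (bit, sig) in enumerate(zip(bits, signature)))
```

Each key pair must sign only one message, which is why a signature consumes leaves of the tree rather than reusing them.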

Efficiency questions

Tacit assumption: all node values are saved.

A useful Merkle tree has many leaves!
• E.g., N = 2^30 allows many authentications / signatures
• Not practical for a weak prover!

Store all node values? Too much space!
• N = 2^H leaves, N - 1 interior nodes

Recalculate from scratch? Too much time!
• An interior node near the top requires 2^H - 1 Hash operations
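To put numbers on the two extremes, here is a small back-of-the-envelope calculation. The 20-byte node size corresponds to the 160-bit example value used earlier; the function names are illustrative only.

```python
def full_tree_storage_bytes(height: int, node_bytes: int = 20) -> int:
    """Bytes needed to store every node of a tree with 2**height leaves."""
    leaves = 2 ** height
    interior = leaves - 1
    return (leaves + interior) * node_bytes

def recompute_units(node_height: int) -> int:
    """Work units (leaf calcs plus hashes) to rebuild one node from scratch."""
    return 2 ** node_height + (2 ** node_height - 1)

storage = full_tree_storage_bytes(30)   # 42_949_672_940 bytes, roughly 40 GiB
top_cost = recompute_units(29)          # 2**30 - 1 units for a child of the root
```

Neither extreme suits a constrained prover, which motivates the traversal problem that follows.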

The traversal problem

Formulate an efficient Prover algorithm.
• Must output authentication data for each leaf, in sequence (on round i: s_i with its associated sibling nodes)
• Prover has limited memory
• Prover should compute few Hash values per round

Metrics
• Space: 1 unit = 1 stored node value
• Time: 1 unit = 1 leaf calculation or 1 interior node calculation
• Note: this analysis fixes the security parameter

Traversal challenge

Higher node: used for 2^20 rounds, costs ~2^21

Lower node: used for 2^5 rounds, costs ~2^6

(Note: the ‘per round’ cost is < 2)
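The per-round figure follows from a short calculation. It assumes, as in the two examples above, that a sibling node at height h stays in use for 2^h consecutive rounds, and it counts work in the units defined under Metrics.

```latex
\mathrm{cost}(h) \;=\; \underbrace{2^{h}}_{\text{leaf calculations}}
          \;+\; \underbrace{2^{h}-1}_{\text{interior hashes}}
          \;=\; 2^{h+1}-1,
\qquad
\frac{\mathrm{cost}(h)}{2^{h}} \;=\; 2 - 2^{-h} \;<\; 2 .
```

For h = 20 this gives a cost of 2^21 - 1 spread over 2^20 rounds, so the challenge is scheduling the work, not its total amount.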

Merkle’s amortization technique

Used space-efficient node computation

Costly nodes computed over many rounds

Form of the algorithm, on each round:
• Output s_i with sibling values
• Discard “expired” sibling values
• For each height, work on preparing the “upcoming” sibling

Upcoming values should be ready on time

Merkle’s result for a tree with N = 2^H leaves
• O(log(N)) = O(H) time per round
• Space bounded by O(log^2(N)) = O(H^2)

TREEHASH
• Calculates a height-h node using space h + 1
• Simply erase values no longer required
• Adding a leaf or internal node is 1 “unit” of work
• The evolving set of stored nodes is called the tail nodes

Example with h=3
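A stack-based sketch of TREEHASH, instrumented to report the work done and the largest number of tail nodes held at once. The leaf function is a hypothetical stand-in (hash of the leaf index) and SHA-256 is an assumed hash; only the stack discipline is the point.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_calc(i: int) -> bytes:
    # Hypothetical leaf value (hash of the index), standing in for Hash(s_i).
    return H(i.to_bytes(8, "big"))

def treehash(start: int, height: int):
    """Compute the height-`height` node whose leftmost leaf is `start`.

    Returns (node_value, units_of_work, max_tail_nodes_stored).
    """
    stack = []            # tail nodes as (height, value); heights decrease upward
    leaf = start
    units = 0
    max_stored = 0
    while True:
        if len(stack) >= 2 and stack[-1][0] == stack[-2][0]:
            # Two tail nodes of equal height: replace them with their parent.
            (h, right), (_, left) = stack.pop(), stack.pop()
            stack.append((h + 1, H(left + right)))
        else:
            # Otherwise add the next leaf.
            stack.append((0, leaf_calc(leaf)))
            leaf += 1
        units += 1
        max_stored = max(max_stored, len(stack))
        if stack[0][0] == height:
            return stack[0][1], units, max_stored
```

For h = 3 this takes 15 units of work (8 leaves plus 7 interior nodes) and never holds more than 4 = h + 1 values.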

Merkle’s amortization (2)

Prover’s initial internal state
• Contains Current and Next sibling value for each height h < H

Prover’s internal state (later points)
• Contains Current sibling value for each height h < H
• For each height, contains the Next sibling, OR a partial TREEHASH computation for Next

Per-round update procedure
• Output leaf secret and Current sibling nodes
• Discard “expired” sibling nodes, promote Next to Current
• Spend a maximum of 2 units of work towards the TREEHASH procedure for each height

Merkle’s amortization (3)

Nodes are ready on time
• 2 units per round is enough
• The cost of 2^(h+1) is spread over 2^h rounds

Time per round linear in tree height
• O(log(N)) = O(H) time per round

Total space quadratic in tree height
• A TREEHASH may be in progress at each height
• Space for TREEHASH < 1 + 2 + 3 + … + H
• Space bound: O(log^2(N)) = O(H^2)
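Written out, the two bounds for the classic technique are as follows. This takes the per-height budget of 2 units and the TREEHASH space of h + 1 from the earlier slides.

```latex
\text{time per round} \;\le\; 2H,
\qquad
\text{tail space} \;\le\; \sum_{h=0}^{H-1} (h+1) \;=\; \frac{H(H+1)}{2} \;=\; O(H^{2}) .
```

For example, with H = 20 the tails alone may hold up to 210 node values in the worst case.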

Recap of classic traversal

Merkle’s solution is indeed satisfactory
• Medium / large Merkle trees are practical
• Less efficient than number theory approaches

Security properties are transparent
• No random oracles, etc.

Conjecture: classic traversal is “optimal”?

Related work

Time-space trade-off: Jakobsson, Micali, Leighton, Szydlo, RSA’03
• Idea: use “sub trees” of height T
• Speeds up the Prover by a factor of T!
• Increases space by a factor of 2^T

This work

New traversal algorithm
• Still O(log(N)) time
• Space required reduced to O(log(N))

This is optimal in the following sense
• Space of at least O(log(N)) is needed: easy to see
• No traversal algorithm has both time < O(log(N)) and space < O(log(N))
• Proof in paper

Motivation for improvement: tails of concurrent TREEHASH computations

Graphic reminder of why space is O(log^2(N))

[Figure: tails of the concurrent TREEHASH computations. The tail at height h holds up to h+1 values; the tails below it hold up to h tail pebbles, up to h-1 tail pebbles, and so on.]

Many tails contain pebbles at the same height.

Can this be avoided?

Wasteful concurrent computation

Example: two TREEHASH instances
• Each must compute a node value at height 3 as a sub-goal
• Assume they start at the same time

Classic traversal: 2 units of work to each per round
• Maximum space 4 + 4 = 8

Re-allocate the 4 units per round
• Complete the first, then do the second
• Maximum space 1 + 4 = 5

Rescheduling saves space and still completes nodes on time.

Look for a scheduling algorithm that avoids such concurrent node computations.
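The 8-versus-5 comparison can be reproduced with a small simulation that tracks only node heights, not hash values. The function names are illustrative, and the model assumes both instances use the stack-based TREEHASH with one unit of work per push or merge.

```python
def treehash_sizes(height: int):
    """Yield the number of stored tail nodes after each unit of TREEHASH work."""
    stack = []  # heights of the tail nodes
    while not (len(stack) == 1 and stack[0] == height):
        if len(stack) >= 2 and stack[-1] == stack[-2]:
            top = stack.pop()
            stack.pop()
            stack.append(top + 1)      # merge two equal-height nodes
        else:
            stack.append(0)            # add a leaf
        yield len(stack)

def max_space_concurrent(height: int) -> int:
    """Both instances advance in lockstep (classic: 2 units each per round)."""
    a = list(treehash_sizes(height))
    b = list(treehash_sizes(height))
    return max(x + y for x, y in zip(a, b))

def max_space_sequential(height: int) -> int:
    """All 4 units go to the first instance until it finishes, then the second."""
    sizes = list(treehash_sizes(height))
    while_first_runs = max(sizes)          # second instance has not started
    while_second_runs = 1 + max(sizes)     # finished first node is kept (1 value)
    return max(while_first_runs, while_second_runs)
```

For height 3, `max_space_concurrent(3)` is 8 and `max_space_sequential(3)` is 5, while the total work (2 × 15 units) is the same in both schedules.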

New algorithm: “Zipping” up the tails

Apply the budget to meet two kinds of requirements

1. Avoid working on height h nodes from different tails

2. Ensure completion of nodes with short deadlines

Solution: this compromise algorithm satisfies both
• Focus computational attention on nodes with the shortest deadline
• Delay beginning a new height h node until the other TREEHASH instances are partially completed, with no tail nodes below height h
• So we zip up the tails before diverting attention
• Essentially rigging it to have fewer tail nodes

What is the effect of this rescheduling?
• Question 1: Are the nodes completed on time?
• Question 2: How much space do you need now?
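One way to realize this scheduling rule is sketched below; it is a reconstruction for illustration, not the presenter's code. Several details are assumptions not stated on the slides: each round has a single shared budget of 2H - 1 units, every unit goes to the TREEHASH instance whose lowest tail node is lowest (ties go to the smaller target height), the start-leaf formula used when a sibling is replaced, SHA-256 as the hash, and a hypothetical leaf function.

```python
import hashlib
import math

def H(data):
    return hashlib.sha256(data).digest()

def leaf_calc(i):
    # Hypothetical leaf value (hash of the index), standing in for Hash(s_i).
    return H(i.to_bytes(8, "big"))

def node(h, j):
    # Direct recursive computation, used only for the one-time setup.
    return leaf_calc(j) if h == 0 else H(node(h - 1, 2 * j) + node(h - 1, 2 * j + 1))

class Tail:
    """Incremental TREEHASH for one height; `nodes` holds (height, value) pairs."""
    def __init__(self, target, start=None, done=None):
        self.target, self.leaf = target, start
        self.nodes = [] if done is None else [(target, done)]
        self.active = start is not None

    def low(self):
        # Height of the lowest tail node; infinity once nothing is left to do.
        if not self.active:
            return math.inf
        return self.nodes[-1][0] if self.nodes else self.target

    def update(self):
        # One unit of work: merge two equal-height nodes, or add one leaf.
        n = self.nodes
        if len(n) >= 2 and n[-1][0] == n[-2][0]:
            (h, right), (_, left) = n.pop(), n.pop()
            n.append((h + 1, H(left + right)))
        else:
            n.append((0, leaf_calc(self.leaf)))
            self.leaf += 1
        if n[0][0] == self.target:
            self.active = False

def traverse(height):
    """Yield (leaf, sibling_values, max_nodes_stored_so_far) for each leaf in order."""
    n_leaves = 2 ** height
    auth = [node(h, 1) for h in range(height)]                 # Current siblings
    tails = [Tail(h, done=node(h, 0)) for h in range(height)]  # Next siblings
    peak = 0
    for leaf in range(n_leaves):
        peak = max(peak, len(auth) + sum(len(t.nodes) for t in tails))
        yield leaf, list(auth), peak
        if leaf + 1 == n_leaves:
            break
        for h in range(height):
            if (leaf + 1) % 2 ** h == 0:       # sibling at height h has expired
                assert not tails[h].active and tails[h].nodes, "sibling not ready"
                auth[h] = tails[h].nodes[0][1]
                start = (leaf + 1 + 2 ** h) ^ 2 ** h
                tails[h] = Tail(h, start if start < n_leaves else None)
        for _ in range(2 * height - 1):        # shared per-round budget
            focus = min(tails, key=Tail.low)   # lowest tail node first
            if focus.low() == math.inf:
                break
            focus.update()
            peak = max(peak, len(auth) + sum(len(t.nodes) for t in tails))

def root_from_path(leaf, siblings):
    value = leaf_calc(leaf)
    for sib in siblings:
        value = H(sib + value) if leaf & 1 else H(value + sib)
        leaf >>= 1
    return value
```

Running `traverse` for a small height and feeding each output to `root_from_path` gives a quick experimental check of both questions: the internal assertion fires if any sibling is late, and the third element of each tuple tracks how many node values were held.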

Nodes completed on time

Informal justification
• For a node at height h, the delay is < 2^(h+1)
• This is only 2 per round over a period of 2^h rounds
• Long time to recover from delay

Formal proof involves computation
• Fix any period of 2^h rounds
• Identify all “deadlines” and the maximum delay
• Tabulate the total required computation units
• This is less than the total budget over the period

Experimental verification (via implementation)

Algorithm works in time 2 log(N) per round

Less space is used

Easy to see why space is O(log(N))

At each height at most 4 values are stored.
• Exactly one current sibling value
• At most 1 completed next sibling value
• At most 2 tail values

Total space required: 3 log(N)
• Tail pebbles occur only while a sibling is incomplete

Result of new algorithm

Traversal of a Merkle tree with N leaves

Space bounded by 3 log(N) [ node storage units ]

Time is 2 log(N) [ leaf calc units, hash evaluation units ]

Answers the classic Merkle traversal problem. Asymptotically optimal.

Improved constants?

The constants are not optimal
• Example: retain left nodes to halve the time
• Manuscript on webpage: rsasecurity.com / szydlo.com

Can the technique be combined with JMLS’03?
• The main focus there was to increase speed, at a cost in space
• The zipping technique still always saves some space

Practical ramifications

Merkle authentication & signatures become more feasible on space-constrained devices

Easy relationship between tree size and speed

Speed-up if a smaller tree size is acceptable

Possible bonus for longer-term assurance: a hedge against a number theory breakthrough

Conclusions

Merkle Trees - interesting after 25 years.

Viable for practical applications?
• Need not be only a theoretical construction
• More efficient than widely believed

Further directions
• Use as a tool in larger crypto protocols
• Improve constants, produce good implementations, compare speed to RSA
• What else can we do without number-theory-based cryptography?
