Merkle Tree Traversal in Log Space & Time
Michael Szydlo, RSA
Eurocrypt 2004
May 6, 2004
Presentation overview
1. Review of Merkle Authentication Trees
2. Define the Traversal Problem
3. Describe classic traversal technique
4. Present new, space-efficient algorithm
5. Concluding comments
Merkle trees
Introduced by Ralph Merkle, 1979
• “Classic” cryptographic construction
• Involves combining hash functions on a binary tree structure
A public-key authentication scheme
• Uses only one-way hash functions as building blocks
• No number theory or trapdoor permutations
• Also yields public-key signatures (Lamport’s one-time signatures)
Theoretical and practical contexts
• Receives less practical attention today due to number-theoretic schemes (e.g., RSA, DSA)
• Not terribly inefficient
• No number theory: an advantage?
Our contribution
• Re-examine efficiency aspects of the construction
• New algorithm: answers an “old question” about Merkle trees
Merkle tree data structure
[Figure: binary tree. Each leaf i carries a secret s_i and the leaf value v_i = Hash(s_i); each interior node carries v = Hash(v_left || v_right).]
• Binary tree; nodes are assigned (e.g., 160-bit) values
• An extra, secret value is associated with each leaf
A Public / Private key pair
How to generate a public / private key pair
1. Select a random (e.g., 160-bit) secret S
2. Derive leaf secrets s_i = PRF(S || i)
3. Use the hash function to get leaf / interior node values
4. Publish the root value as P
Key generation has a cost
• A tree of height H has N = 2^H leaves
• Nodes at height h depend on 2^h leaf values
• Obtaining P requires calculating all N leaf values plus 2^H - 1 more hash function evaluations
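The key-generation steps can be sketched in a few lines of Python. This is a minimal illustration, not the presentation's implementation: SHA-256 stands in for the hash function, HMAC-SHA256 keyed with S stands in for the PRF, and the names `prf`, `leaf_value` and `public_key` are hypothetical.

```python
import hashlib
import hmac

def prf(seed, i):
    # Assumption: HMAC-SHA256 keyed with the master secret S plays the role of PRF(S || i).
    return hmac.new(seed, i.to_bytes(8, "big"), hashlib.sha256).digest()

def leaf_value(seed, i):
    # v_i = Hash(s_i), where s_i is the i-th derived leaf secret.
    return hashlib.sha256(prf(seed, i)).digest()

def public_key(seed, height):
    """Return (root value P, number of leaf and interior node calculations)."""
    level = [leaf_value(seed, i) for i in range(2 ** height)]
    work = len(level)                       # N leaf calculations
    while len(level) > 1:
        # Each parent is Hash(v_left || v_right).
        level = [hashlib.sha256(level[j] + level[j + 1]).digest()
                 for j in range(0, len(level), 2)]
        work += len(level)                  # one hash per interior node
    return level[0], work

P, work = public_key(b"\x01" * 20, 4)
print(P.hex(), work)
```

The second return value counts N leaf calculations plus 2^H - 1 interior hashes, which is the key-generation cost stated on the slide.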
Authenticating a secret
Prover wishes to reveal s_i to identify herself
• Prover sends i, s_i (each secret is used just once)
• Additional data required: “sibling node” values
Verifier checks s_i against the public key P
• First hash s_i
• Hash the result together with its sibling in the tree
• Repeat, moving up the tree
• Check the result against the root
Sibling node values required
[Figure: tree diagram marking the sibling nodes required to authenticate a secret. Each step up the tree is one Hash; the root value is public.]
1. Verify the secret value by hashing, then hashing together with the sibling, etc.
2. Accept if the result matches the root value
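The verification procedure can be sketched as follows. This is an illustrative sketch under assumptions: SHA-256 is the hash, the prover here simply stores the whole tree, and `build_levels`, `auth_path` and `verify` are hypothetical helper names.

```python
import hashlib

def hash_node(data):
    return hashlib.sha256(data).digest()

def build_levels(secrets):
    """levels[0] holds the leaf values Hash(s_i); levels[-1] holds only the root."""
    levels = [[hash_node(s) for s in secrets]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([hash_node(prev[j] + prev[j + 1])
                       for j in range(0, len(prev), 2)])
    return levels

def auth_path(levels, i):
    """Sibling node values for leaf i, from the leaf level upwards."""
    path = []
    for level in levels[:-1]:
        path.append(level[i ^ 1])   # the sibling index differs in the lowest bit
        i >>= 1
    return path

def verify(root, i, secret, path):
    value = hash_node(secret)
    for sibling in path:
        # The left child always comes first in the concatenation.
        value = hash_node(value + sibling) if i % 2 == 0 else hash_node(sibling + value)
        i >>= 1
    return value == root

secrets = [bytes([k]) * 20 for k in range(8)]
levels = build_levels(secrets)
root = levels[-1][0]
print(verify(root, 5, secrets[5], auth_path(levels, 5)))
```

Storing every level, as this prover does, is exactly the cost that the traversal problem later in the talk tries to avoid.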
Digital signatures, too
Use up 1 leaf per authentication
Digital signature: use multiple leaves
• Extends Lamport’s one-time signature scheme
• Want to sign m = (m_0, m_1, …, m_159)
• Requires 160 pairs of secrets {s_i, t_i}
• s_i is included in the signature if m_i = 0; otherwise t_i is
• Verification requires sibling nodes, as above
Merkle construction provides signatures
• Security is intuitive; how about efficiency?
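The bit-by-bit selection rule can be illustrated with a plain Lamport one-time signature. This sketch makes several assumptions: the 160 message bits come from a SHA-1 digest, SHA-256 hashes the secrets, and the hashed pairs are published directly, whereas in the Merkle construction they would be authenticated through leaves and sibling nodes. All function names are hypothetical.

```python
import hashlib
import os

BITS = 160  # number of message bits m_0 ... m_159

def hash_value(data):
    return hashlib.sha256(data).digest()

def message_bits(message):
    # Assumption: the 160 bits to sign are the SHA-1 digest of the message.
    digest = hashlib.sha1(message).digest()
    return [(digest[k // 8] >> (7 - k % 8)) & 1 for k in range(BITS)]

def keygen():
    secret = [(os.urandom(20), os.urandom(20)) for _ in range(BITS)]  # pairs (s_i, t_i)
    public = [(hash_value(s), hash_value(t)) for s, t in secret]
    return secret, public

def sign(secret, message):
    # Reveal s_i when m_i = 0 and t_i when m_i = 1.
    return [pair[bit] for pair, bit in zip(secret, message_bits(message))]

def verify(public, message, signature):
    bits = message_bits(message)
    return len(signature) == BITS and all(
        hash_value(sig) == pair[bit]
        for pair, bit, sig in zip(public, bits, signature))

secret, public = keygen()
signature = sign(secret, b"hello")
print(verify(public, b"hello", signature), verify(public, b"other", signature))
```

Each key pair must sign only one message, since a signature reveals half of the secrets.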
Efficiency questions
Tacit assumption - all node values saved.
A useful Merkle tree has many leaves!
• E.g., N = 2^30 allows many authentications / signatures
• Not practical for a weak prover!
Store all node values? Too much space!
• N = 2^H leaves, N - 1 interior nodes
Recalculate from scratch? Too much time!
• An interior node near the top requires 2^H - 1 Hash operations
The traversal problem
Formulate an efficient Prover algorithm
• Must output authentication data for each leaf, in sequence (on round i, s_i with its associated sibling nodes)
• Prover has limited memory
• Prover should compute few Hash values per round
Metrics
• Space: 1 unit = 1 stored node value
• Time: 1 unit = 1 leaf calculation or 1 interior node calculation
• Note: this analysis fixes the security parameter
Traversal challenge
Higher node: used for 2^20 rounds, costs ~2^21
Lower node: used for 2^5 rounds, costs ~2^6
(Note: the ‘per round’ cost is < 2)
Merkle’s amortization technique
Used space-efficient node computation
Costly nodes computed over many rounds
Form of the algorithm: on each round
• Output s_i with sibling values
• Discard “expired” sibling values
• For each height, work on preparing the “upcoming” sibling
Upcoming values should be ready on time
Merkle’s result for a tree with N = 2^H leaves
• O(log(N)) = O(H) time per round
• Space bounded by O(log(N)^2) = O(H^2)
TREEHASH
• Calculates a height h node using space h+1
• Simply erase values no longer required
• Adding a leaf or an internal node is 1 “unit” of work
• The evolving set of stored nodes: call these tail nodes
Example with h=3
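A stack-based TREEHASH can be sketched as below. It is a sketch under assumptions: SHA-256 is the hash, `leaf_value` is a hypothetical stand-in for the leaf calculation Hash(s_i), and the function also reports the units of work and the peak number of stored tail nodes so the h+1 space claim can be checked.

```python
import hashlib

def hash_node(data):
    return hashlib.sha256(data).digest()

def leaf_value(i):
    # Hypothetical leaf calculation standing in for Hash(s_i).
    return hash_node(b"leaf" + i.to_bytes(8, "big"))

def treehash(start, h):
    """Compute the height-h node whose leftmost leaf is `start`.

    Returns (node value, units of work, peak number of stored tail nodes).
    """
    stack = []                 # tail nodes as (height, value); top of stack is last
    next_leaf = start
    units = peak = 0
    while not (len(stack) == 1 and stack[0][0] == h):
        if len(stack) >= 2 and stack[-1][0] == stack[-2][0]:
            # Two tail nodes of equal height: replace them by their parent.
            (height, right), (_, left) = stack.pop(), stack.pop()
            stack.append((height + 1, hash_node(left + right)))
        else:
            stack.append((0, leaf_value(next_leaf)))
            next_leaf += 1
        units += 1             # a leaf or an interior node costs one unit
        peak = max(peak, len(stack))
    return stack[0][1], units, peak

value, units, peak = treehash(0, 3)
print(units, peak)
```

For h = 3 the loop performs 8 leaf calculations and 7 merges, and the stack never holds more than 4 = h+1 values.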
Merkle’s amortization (2)
Prover’s initial internal state
• Contains the Current and Next sibling value for each height h < H
Prover’s internal state (later points)
• Contains the Current sibling value for each height h < H
• For each height, contains the Next sibling, OR a partial TREEHASH computation for Next
Per-round update procedure
• Output the leaf secret and Current sibling nodes
• Discard “expired” sibling nodes, promote Next to Current
• Spend a maximum of 2 units of work towards the TREEHASH procedure for each height
Merkle’s amortization (3)
Nodes are ready on time
• 2 units per round is enough
• The cost of 2^(h+1) is spread over 2^h rounds
Time per round is linear in tree height
• O(log(N)) = O(H) time per round
Total space is quadratic in tree height
• A TREEHASH at each height may be in progress
• Space for TREEHASH < 1 + 2 + 3 + … + H
• Space bound: O(log(N)^2) = O(H^2)
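The three bounds for the classic traversal can be written out as short sums. This is a worked restatement of the slide's figures, using the exact TREEHASH cost 2^(h+1) - 1 for a height h node and assuming at most h+1 stored values for the TREEHASH at height h.

```latex
\begin{align*}
\text{work per round at height } h &= \frac{2^{h+1}-1}{2^{h}} < 2, \\
\text{time per round} &\le \sum_{h=0}^{H-1} 2 = 2H = O(\log N), \\
\text{TREEHASH space} &\le \sum_{h=0}^{H-1} (h+1) = \frac{H(H+1)}{2} = O(\log^{2} N).
\end{align*}
```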
Recap of classic traversal
Merkle’s solution is indeed satisfactory
• Medium / large Merkle trees are practical
• Less efficient than number theory approaches
Security properties are transparent
• No random oracles, etc.
Conjecture: classic traversal is “optimal”?
Related work
Time-space trade-off, RSA’03: Jakobsson, Micali, Leighton, Szydlo
• Idea: use “sub trees” of height T
• Speeds up the Prover by a factor of T!
• Increases space by a factor of 2^T
This work
New traversal algorithm
• Still O(log(N)) time
• Space required reduced to O(log(N))
This is optimal in a sense
• Space at least O(log(N)): easy to see
• No traversal algorithm has both time and space below O(log(N))
• Proof in paper
Motivation for improvement
Tails of concurrent TREEHASH computations: a graphic reminder of why space is O(log(N)^2)
[Figure: tails of concurrent TREEHASH computations. The tail at height h holds up to h+1 values; the tails below it hold up to h and up to h-1 tail pebbles.]
Many tails contain pebbles at the same height.
Can this be avoided?
Wasteful concurrent computation
Example: two TREEHASH instances
• Each must compute a node value at height 3 as a sub-goal
• Assume they start at the same time
Classic traversal: 2 units of work to each
• Maximum space 4 + 4 = 8
Re-allocate 4 units per round
• Complete the first, then do the second
• Maximum space 1 + 4 = 5
Rescheduling saves space and still completes nodes on time.
Look for a scheduling algorithm that avoids such concurrent node computations.
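The two space figures in this example can be reproduced with a small simulation. It is only a sketch: it tracks node heights rather than hash values, counts space after every unit of work, and the names `treehash_sizes`, `peak_space` and `interleaved` are hypothetical.

```python
def treehash_sizes(h):
    """Yield the number of stored tail nodes after each unit of TREEHASH work."""
    stack = []                          # heights only; node values do not matter here
    while stack != [h]:
        if len(stack) >= 2 and stack[-1] == stack[-2]:
            top = stack.pop()
            stack.pop()
            stack.append(top + 1)       # merge two equal-height nodes into their parent
        else:
            stack.append(0)             # add a new leaf
        yield len(stack)

def peak_space(order, h):
    """Run instances A and B in the given unit-by-unit order; return peak total space."""
    runs = {"A": treehash_sizes(h), "B": treehash_sizes(h)}
    size = {"A": 0, "B": 0}
    peak = 0
    for name in order:
        size[name] = next(runs[name])
        peak = max(peak, size["A"] + size["B"])
    return peak

def interleaved(units, per_round):
    """Each round gives `per_round` units to A, then the same number to B."""
    order = []
    for start in range(0, units, per_round):
        n = min(per_round, units - start)
        order += ["A"] * n + ["B"] * n
    return order

h = 3
units = 2 ** (h + 1) - 1                # 15 units per height-3 node
concurrent = peak_space(interleaved(units, 2), h)
sequential = peak_space(["A"] * units + ["B"] * units, h)
print(concurrent, sequential)           # prints "8 5"
```

In the sequential schedule the first instance has shrunk to a single completed node before the second one starts growing its tail, which is where the saving comes from.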
New algorithm: “Zipping” up the tails
Apply the budget to meet two kinds of requirements
1. Avoid working on height h nodes from different tails
2. Ensure completion of nodes with short deadlines
Solution: this compromise algorithm satisfies both
• Focus computational attention on nodes with the shortest deadline
• Delay beginning a new height h node until other TREEHASH instances are partially completed, with no tail nodes below height h
• So we zip up the tails before diverting attention
• Essentially rigging it to have fewer tail nodes
What is the effect of this rescheduling?
• Question 1: Are the nodes completed on time?
• Question 2: How much space do you need now?
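One way to realize this scheduling rule is sketched below. It is an interpretation, not the presentation's code: every unit of work goes to the TREEHASH instance whose lowest tail node is lowest (ties go to the lower height), the per-round budget of 2H - 1 units and the expression for the first leaf under the next sibling are assumptions of this sketch, SHA-256 is the hash, and `Tail`, `subtree` and `traverse` are hypothetical names. It yields authentication paths only and does not measure storage.

```python
import hashlib

INF = float("inf")

def hash_node(data):
    return hashlib.sha256(data).digest()

def leaf_value(i):
    # Hypothetical leaf calculation standing in for Hash(s_i).
    return hash_node(b"leaf" + i.to_bytes(8, "big"))

def subtree(first_leaf, h):
    """Key-generation helper: value of the height-h node above first_leaf."""
    if h == 0:
        return leaf_value(first_leaf)
    half = 2 ** (h - 1)
    return hash_node(subtree(first_leaf, h - 1) + subtree(first_leaf + half, h - 1))

class Tail:
    """TREEHASH instance preparing the Next sibling at height h."""
    def __init__(self, h, finished_value):
        self.h = h
        self.stack = [(h, finished_value)]   # a completed node, ready for promotion
        self.next_leaf = None                # None: no computation in progress

    def start(self, first_leaf):
        self.stack, self.next_leaf = [], first_leaf

    def low(self):
        if self.next_leaf is None:
            return INF                       # finished or idle: needs no work
        return self.stack[-1][0] if self.stack else self.h

    def update(self):
        """Spend one unit: merge two equal-height top nodes, else add a leaf."""
        s = self.stack
        if len(s) >= 2 and s[-1][0] == s[-2][0]:
            (height, right), (_, left) = s.pop(), s.pop()
            s.append((height + 1, hash_node(left + right)))
        else:
            s.append((0, leaf_value(self.next_leaf)))
            self.next_leaf += 1
        if len(s) == 1 and s[0][0] == self.h:
            self.next_leaf = None            # the height-h node is complete

def traverse(H):
    """Yield (leaf index, sibling values from height 0 to H-1) for every leaf."""
    n = 2 ** H
    auth = [subtree(2 ** h, h) for h in range(H)]        # Current siblings for leaf 0
    tails = [Tail(h, subtree(0, h)) for h in range(H)]   # Next siblings, complete
    for leaf in range(n):
        yield leaf, list(auth)
        if leaf + 1 == n:
            break
        for h in range(H):
            if (leaf + 1) % 2 ** h == 0:
                assert tails[h].next_leaf is None and tails[h].stack, "sibling not ready"
                auth[h] = tails[h].stack[0][1]           # promote Next to Current
                tails[h].stack = []
                first = (leaf + 1 + 2 ** h) ^ 2 ** h     # first leaf under the next sibling
                if first < n:
                    tails[h].start(first)
        for _ in range(2 * H - 1):                       # assumed per-round budget
            focus = min(tails, key=lambda t: t.low())    # ties resolve to the lowest height
            if focus.low() == INF:
                break
            focus.update()

for leaf, path in traverse(3):
    print(leaf, [sibling.hex()[:8] for sibling in path])
```

Because a started instance keeps the lowest tail node until its stack has been merged up, higher instances only receive work once the lower tails are "zipped up".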
Nodes completed on time
Informal justification
• For a node at height h, the delay is < 2^(h+1)
• This is only 2 per round over a period of 2^h rounds
• Long time to recover from delay
Formal proof involves computation
• Fix any period of 2^h rounds
• Identify all “deadlines” and the maximum delay
• Tabulate the total required computation units
• This is less than the total budget over the period
Experimental verification (via implementation)
Algorithm works in time 2 log(N) per round
Less space is used
Easy to see why space is O(log(N))
At each height at most 4 values are stored
• Exactly one current sibling value
• At most 1 completed next sibling value
• At most 2 tail values
Total space required: 3 log(N)
• Tail pebbles happen when a sibling is incomplete
Result of new algorithm
Traversal of a Merkle tree with N leaves
Space bounded by 3 log(N) [ node storage units ]
Time is 2 log(N) [ leaf calc units, hash evaluation units ]
Answers the classic Merkle traversal problem
Asymptotically optimal
Improved constants?
The constants are not optimal
• Example: retain left nodes to halve the time
• Manuscript on webpage: rsasecurity.com / szydlo.com
Can the technique be combined with JMLS’03?
• The main focus there was to increase speed, at a space cost
• The zipping technique still always saves some space
Practical ramifications
Merkle authentication & signatures more feasible on space constrained devices
Easy relationship between tree size and speed
Speed up if smaller tree size acceptable
Possible bonus for longer-term assurance: a hedge against a number theory breakthrough
Conclusions
Merkle Trees - interesting after 25 years.
Viable for practical applications?
• Need not be only a theoretical construction
• More efficient than widely believed
Further directions
• Use as a tool in larger crypto protocols
• Improve constants
• Good implementations; compare speed to RSA
• What else can we do without number-theory-based cryptography?