Bitcoin Internals
Who am I?
James Turner, polyglot programmer
Worked for eBay, BBC and BSkyB
CTO @ magnr.com
Topics
• Binary protocols
• Hashing and probability
• Bloom Filters
• Merkle Trees
• P2P networks and CAP theorem
What is a binary protocol?
Wikipedia says…
"A binary protocol is a protocol which is intended or expected to be read by a machine rather than a human being, as opposed to a plain text protocol such as IRC, SMTP, or HTTP. Binary protocols have the advantage of terseness, which translates into speed of transmission and interpretation."
https://en.wikipedia.org/wiki/Binary_protocol
NOT a binary protocol
Our own binary protocol?
We can define our own "Sandwich" protocol as: 1) a 32-bit integer for the number of cheese slices, followed by 2) a 32-bit integer for the number of ham slices.
So, assuming big-endian byte order, our encoding of a 1,1 sandwich (one cheese slice, one ham slice) would be: 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000001
This is a fixed format: there are no variable-sized parts.
https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats
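A minimal sketch of the "Sandwich" protocol using Python's standard `struct` module. It assumes the two counts are unsigned (`I`), since slice counts are never negative; the slide only says "32 bit Integer". The helper names are hypothetical.

```python
import struct

# ">" = big-endian byte order, "I" = unsigned 32-bit integer (4 bytes each)
SANDWICH_FORMAT = ">II"

def encode_sandwich(cheese: int, ham: int) -> bytes:
    """Pack the cheese and ham counts into exactly 8 bytes."""
    return struct.pack(SANDWICH_FORMAT, cheese, ham)

def decode_sandwich(data: bytes) -> tuple[int, int]:
    """Unpack exactly 8 bytes back into (cheese, ham)."""
    return struct.unpack(SANDWICH_FORMAT, data)

def to_bits(data: bytes) -> str:
    """Render bytes as space-separated 8-bit groups, like the slide."""
    return " ".join(f"{b:08b}" for b in data)

wire = encode_sandwich(1, 1)
print(to_bits(wire))
# 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000001
print(decode_sandwich(wire))  # (1, 1)
```

Because the format is fixed, the decoder needs no length information: it always reads exactly 8 bytes.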
Binary Protocol Efficiency
Our “Sandwich” protocol uses 8 bytes to transmit the cheese and ham information.
Compare this to JSON, where we might have {"cheese":1,"ham":1}
This is 20 bytes.
In this example, the binary encoding is 60% smaller (8 bytes instead of 20).
However, a binary protocol isn't human-readable: printed to a terminal, our output would be “”
There are 8 bytes here, honestly!
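The byte counts above can be checked with a short sketch. It assumes the compact JSON form with no whitespace, which is what `separators=(",", ":")` produces.

```python
import json
import struct

# The same 1,1 sandwich, encoded both ways.
# separators=(",", ":") drops the spaces json.dumps adds by default.
as_json = json.dumps({"cheese": 1, "ham": 1}, separators=(",", ":")).encode("utf-8")
as_binary = struct.pack(">II", 1, 1)

print(as_json, len(as_json))      # b'{"cheese":1,"ham":1}' 20
print(as_binary, len(as_binary))  # b'\x00\x00\x00\x01\x00\x00\x00\x01' 8

saving = 1 - len(as_binary) / len(as_json)
print(f"{saving:.0%} fewer bytes")  # 60% fewer bytes
```

The escaped `\x00` bytes in the second line explain why the terminal output looks empty: most of those bytes are unprintable control characters.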
Variable-length binary protocol
The “Message” protocol:
1) a 32-bit integer giving the number of bytes that follow, followed by 2) that variable number of bytes (chars)
So our binary output for the 5-character string “hello” would be:
00000000 00000000 00000000 00000101 01101000 01100101 01101100 01101100 01101111
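A sketch of the length-prefixed "Message" protocol, assuming the prefix is an unsigned big-endian 32-bit integer (as in the Sandwich example) and the text is ASCII, so one char is one byte. The function names are hypothetical.

```python
import struct

def encode_message(text: str) -> bytes:
    """4-byte big-endian length prefix, then the raw bytes."""
    body = text.encode("ascii")  # assumption: ASCII, so 1 char == 1 byte
    return struct.pack(">I", len(body)) + body

def decode_message(data: bytes) -> tuple[str, bytes]:
    """Read one message from the front of data; return (text, remaining bytes)."""
    (length,) = struct.unpack_from(">I", data, 0)
    body = data[4:4 + length]
    if len(body) != length:
        raise ValueError("truncated message")
    return body.decode("ascii"), data[4 + length:]

wire = encode_message("hello")
print(" ".join(f"{b:08b}" for b in wire))
# 00000000 00000000 00000000 00000101 01101000 01100101 01101100 01101100 01101111
```

Because the length comes first, a reader always knows where one message ends. That lets several messages be sent back to back on the same stream.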
Bitcoin protocol
https://en.bitcoin.it/wiki/Protocol_documentation#Message_structure
Diagrams: Block, Message Header
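The Bitcoin message header from the protocol documentation linked above is a fixed 24-byte structure with four fields:
- a 4-byte network magic value;
- a 12-byte ASCII command, null-padded;
- a 4-byte payload length;
- a 4-byte checksum, taken from the first 4 bytes of SHA-256 applied twice to the payload.

The sketch below builds and parses that header. Unlike our big-endian Sandwich protocol, Bitcoin encodes integers little-endian. The helper names are hypothetical, and only the main-network magic is shown.

```python
import hashlib
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # main network magic, as sent on the wire
HEADER_FORMAT = "<4s12sI4s"  # little-endian: magic, command, length, checksum

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def message_header(command: str, payload: bytes, magic: bytes = MAINNET_MAGIC) -> bytes:
    """Build the 24-byte header for a payload."""
    cmd = command.encode("ascii")
    if len(cmd) > 12:
        raise ValueError("command longer than 12 bytes")
    checksum = double_sha256(payload)[:4]
    # struct pads the "12s" field with null bytes for us
    return struct.pack(HEADER_FORMAT, magic, cmd, len(payload), checksum)

def parse_header(header: bytes) -> dict:
    magic, cmd, length, checksum = struct.unpack(HEADER_FORMAT, header)
    return {"magic": magic.hex(), "command": cmd.rstrip(b"\x00").decode("ascii"),
            "length": length, "checksum": checksum.hex()}

hdr = message_header("verack", b"")  # verack carries an empty payload
print(len(hdr), parse_header(hdr))
```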
What is hashing?
A hash function is a computation that takes an input of arbitrary size and produces a fixed-size output.
e.g. sha256(“hello”) produces “2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824”
Hashing has certain properties which we find useful:
• It’s extremely hard to reverse, i.e. to calculate the original data from the hash.
• If the input data changes even slightly, the hash output is completely different.
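A quick sketch with Python's `hashlib` shows these properties. The output is always 64 hex characters (256 bits) whatever the input size, and changing one character (the hypothetical input "hellp") changes roughly half of the output bits.

```python
import hashlib

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

print(sha256_hex("hello"))
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
print(sha256_hex("hellp"))  # one character changed: an unrelated-looking hash

# Fixed-size output regardless of input length
print(len(sha256_hex("")), len(sha256_hex("x" * 10_000)))  # 64 64

# Count how many of the 256 output bits differ between the two hashes
diff = int(sha256_hex("hello"), 16) ^ int(sha256_hex("hellp"), 16)
print(bin(diff).count("1"), "of 256 bits differ")
```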
Hashing collision probability
If 2 pieces of input data produce the same output hash, we have a “collision”.
Because a good hash function's output behaves like a random value, what is the probability that 2 given pieces of input data produce identical hashes?
If the output of the hash function is a single byte, e.g. 01101011, 01111111 or 01110001,
then there are 256 (2^8) possible outputs, so there is a 1 in 256 chance of a collision.
This generalises to 1/(2^n), where n is the number of bits in the hash output.
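The 1/(2^n) figure can be checked empirically with a sketch. It assumes a hypothetical n-bit hash built by keeping only the top n bits of SHA-256. It then counts how often pairs of distinct inputs collide.

```python
import hashlib

def tiny_hash(data: bytes, bits: int = 8) -> int:
    """Hypothetical n-bit hash: the top `bits` bits of SHA-256."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def collision_rate(bits: int, trials: int = 100_000) -> float:
    """Fraction of (a_i, b_i) input pairs whose n-bit hashes are equal."""
    hits = 0
    for i in range(trials):
        if tiny_hash(b"a%d" % i, bits) == tiny_hash(b"b%d" % i, bits):
            hits += 1
    return hits / trials

for n in (4, 8):
    print(n, "bits: measured", collision_rate(n), "expected", 1 / 2 ** n)
```

The measured rates land close to 1/16 and 1/256. Each extra output bit halves the chance that two given inputs collide.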
Merkle Trees
Diagram: a Merkle tree over four transactions
Root Hash 1234 = hash(Hash 01, Hash 23)
Hash 01 = hash(Hash0, Hash1)    Hash 23 = hash(Hash2, Hash3)
Hash0 = hash(TX1)    Hash1 = hash(TX2)    Hash2 = hash(TX3)    Hash3 = hash(TX4)
Data: TX1, TX2, TX3, TX4
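A minimal sketch of computing a Merkle root, under three assumptions:
- "hash" is SHA-256 applied twice, as Bitcoin does;
- hash(A, B) means hashing the concatenation A + B;
- on an odd-sized level, the last hash is paired with itself (Bitcoin's rule).

The transactions are stand-in byte strings, not real serialized transactions. The sketch also ignores Bitcoin's convention of displaying hashes byte-reversed.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(transactions: list[bytes]) -> bytes:
    if not transactions:
        raise ValueError("need at least one transaction")
    level = [sha256d(tx) for tx in transactions]  # Hash0, Hash1, ...
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        # hash(left, right) over each adjacent pair: Hash 01, Hash 23, ...
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"TX1", b"TX2", b"TX3", b"TX4"])
print(root.hex())  # Root Hash 1234
```

Changing any transaction changes its leaf hash and every hash above it. The single root hash therefore commits to the whole set of transactions.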
Verify a TX using Merkle Trees
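Diagram: a Merkle tree over eight transactions
Root: H 12345678
H1234    H5678
H12    H34    H56    H78
H1    H2    H3    H4    H5    H6    H7    H8
Data: TX1 TX2 TX3 TX4 TX5 TX6 TX7 TX8
Verify TX8 exists in block: we only need TX8 plus the Merkle branch H7, H56 and H1234. Hash TX8 to get H8, then compute H78 = hash(H7, H8), H5678 = hash(H56, H78) and H12345678 = hash(H1234, H5678), and check that the result matches the block's Merkle root.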
https://en.bitcoin.it/wiki/Merged_mining_specification#Merkle_Branch
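A sketch of that verification. It uses the same assumptions as a plain Merkle tree: double SHA-256, and hash(A, B) = hash of A + B. For brevity it also assumes a power-of-two number of transactions. The names are hypothetical. The verifier needs only the transaction, its position and three sibling hashes, not the other seven transactions.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def build_levels(leaves):
    """All tree levels, leaf hashes first (assumes a power-of-two leaf count)."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sha256d(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def merkle_branch(levels, index):
    """Sibling hashes from leaf to root, e.g. H7, H56, H1234 for TX8."""
    branch = []
    for level in levels[:-1]:
        branch.append(level[index ^ 1])  # sibling: flip the lowest bit of the index
        index //= 2
    return branch

def verify(tx, index, branch, root):
    h = sha256d(tx)  # H8
    for sibling in branch:
        # even index: we are the left child; odd index: we are the right child
        h = sha256d(h + sibling) if index % 2 == 0 else sha256d(sibling + h)
        index //= 2
    return h == root

txs = [b"TX%d" % i for i in range(1, 9)]
levels = build_levels([sha256d(tx) for tx in txs])
root = levels[-1][0]                # H 12345678
branch = merkle_branch(levels, 7)   # TX8 sits at index 7
print(len(branch), verify(b"TX8", 7, branch, root))  # 3 True
```

The branch grows with the logarithm of the number of transactions. Eight transactions need 3 sibling hashes, and a block with thousands needs only a dozen or so.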
Merkle Blocks
Merkle Block Message
https://github.com/bitcoin/bips/blob/master/bip-0037.mediawiki
Bloom Filters
Let’s assume we have 2 hash functions “f” and “g”
f(x) and g(x) each produce a random-looking output
e.g. f(“hello”) => 123 and g(“hello”) => 192
https://en.wikipedia.org/wiki/Bloom_filter
We have an array of bits, let’s say 8 (which fits in a single byte), so the empty bitset looks like this: 00000000
Taking each hash output modulo (%) 8, the size of the bitset, gives:
123 % 8 = 3 and 192 % 8 = 0
We now mark positions 0 and 3 as “1” bits.
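The same steps as a few lines of Python, using the example hash outputs 123 and 192 from above. The sketch treats position p as the bit with value 2**p, so position 0 is the rightmost bit when printed. The slides' diagram may number positions from the left instead.

```python
M = 8                      # size of the bitset in bits
hash_outputs = [123, 192]  # f("hello") and g("hello") from the example

bits = 0                   # empty bitset: 00000000
for h in hash_outputs:
    position = h % M       # 123 % 8 = 3, 192 % 8 = 0
    bits |= 1 << position  # assumption: position p is the bit with value 2**p

print(f"{bits:08b}")       # 00001001 (positions 3 and 0 set)
```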
Diagram: f(“hello”), g(“hello”), f(“world”) and g(“world”) each set a bit in the bitset
Bloom Filters (exists)
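Diagram: looking up f(“world”), g(“world”) and f(“bar”), g(“bar”) in the bitset. An item may be in the set only if all of its bits are set; if any of its bits is 0, it is definitely not in the set.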
Bloom Filters (false positives)
Diagram: f(“foo”) and g(“foo”) land on bits already set by other items, so “foo” looks present even though it was never added.
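A minimal Bloom filter sketch covering adding items, checking for existence and false positives. The real f and g are not specified in the slides. As a hypothetical stand-in, hash function i is SHA-256 of the item with a one-byte prefix i, reduced modulo m. So the bit positions will not match the 123/192 example.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit bitset."""

    def __init__(self, m: int = 8, k: int = 2):
        self.m = m
        self.k = k
        self.bits = 0  # a Python int used as an m-bit bitset

    def _positions(self, item: str):
        # Hypothetical stand-ins for f, g, ...: SHA-256 with a per-function
        # one-byte prefix, reduced modulo the bitset size.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item.encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        # True  -> possibly present (all k bits set; may be a false positive)
        # False -> definitely absent (at least one of its bits is 0)
        return all((self.bits >> p) & 1 for p in self._positions(item))

bf = BloomFilter(m=8, k=2)
bf.add("hello")
bf.add("world")
print(f"{bf.bits:08b}")
for word in ("hello", "world", "bar", "foo"):
    print(word, bf.might_contain(word))
```

With only 8 bits, unseen words such as "bar" or "foo" may or may not come back True. Added words always come back True, because a Bloom filter never gives false negatives.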
Bloom Filters (error rate)
Diagram: f(“hello”) and g(“hello”) each set one of the m bits
k is the number of hash functions: k = 2 (hash functions f & g)
m is the number of bits in our bitset: m = 8
n is the number of items represented: n = 1 (“hello”)
The probability of a single bit NOT being set is (1 - 1/8)^2; more generally, (1 - 1/m)^k
As n grows, this becomes (1 - 1/m)^(kn)
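A worked version of these formulas, taken one step further to the error rate. A lookup for an item that was never added is a false positive when all k of its bits happen to be set. That happens with probability roughly (1 - (1 - 1/m)^(kn))^k, the standard approximation given on the Wikipedia page cited above.

```python
def prob_bit_unset(m: int, k: int, n: int) -> float:
    """Probability a given bit is still 0 after n insertions: (1 - 1/m)^(kn)."""
    return (1 - 1 / m) ** (k * n)

def false_positive_rate(m: int, k: int, n: int) -> float:
    """Approximate chance that all k bits of a never-added item are set."""
    return (1 - prob_bit_unset(m, k, n)) ** k

print(prob_bit_unset(8, 2, 1))  # 0.765625, i.e. (7/8)^2
for n in (1, 2, 4, 8):
    print(n, "items:", round(false_positive_rate(8, 2, n), 3))
```

The error rate climbs quickly as n grows relative to m. That is why real filters size m to the number of items they expect to hold.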
Bloom Filter properties
• Memory compaction (lots of items in a small space)
• Possible existence only (false positives are possible, false negatives are not)
• Collision probability determined by the number of items relative to the number of bits
Bloom Filters in Bitcoin
filterload, filteradd, filterclear
CAP theorem
• Consistency
• Availability
• Partition Tolerance
https://en.wikipedia.org/wiki/CAP_theorem
The CAP theorem is a negative result that says you cannot simultaneously achieve all three goals in the presence of errors. Hence, you must pick one objective to give up.
http://cacm.acm.org/blogs/blog-cacm/83396-errors-in-database-systems-eventual-consistency-and-the-cap-theorem/fulltext
CAP theorem in P2P networks
A node in the Bitcoin network
Availability
Diagram: two nodes (marked X) are Unavailable while all other nodes remain Available. Network: Available
Partitioning
Diagram: two broken links (XX) leave part of the network Partitioned, with TX messages travelling between nodes
Consistency
Diagram: some nodes hold Block 222A while others hold Block 222B. Network: Inconsistent
Consistency
Diagram: every node now holds Block 223. Network: Consistent
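A toy sketch of how the network in the last two diagrams might move from inconsistent to consistent. The sketch makes two simplifying assumptions:
- nodes switch to a longer chain when they see one (real Bitcoin nodes follow the chain with the most cumulative proof-of-work);
- Block 223 was built on top of 222A.

The node names and the earlier block labels (220, 221) are hypothetical.

```python
def adopt_longest(current: list[str], received: list[str]) -> list[str]:
    """Switch to the received chain only if it is strictly longer."""
    return received if len(received) > len(current) else current

def is_consistent(nodes: dict[str, list[str]]) -> bool:
    """The network is consistent when every node holds the same chain."""
    return len({tuple(chain) for chain in nodes.values()}) == 1

base = ["220", "221"]
nodes = {
    "n1": base + ["222A"], "n2": base + ["222A"], "n3": base + ["222A"],
    "n4": base + ["222B"], "n5": base + ["222B"],
}
print("consistent:", is_consistent(nodes))  # False

# Assumption: block 223 extends 222A and reaches every node
winning = base + ["222A", "223"]
for name in nodes:
    nodes[name] = adopt_longest(nodes[name], winning)

print("consistent:", is_consistent(nodes))  # True
print(nodes["n4"][-2:])  # ['222A', '223']: n4 abandoned its 222B block
```

Consistency here is only eventual: for a while, nodes disagree on the latest block. That trade-off lets the network stay available and tolerate partitions.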
Other P2P protocols
• BitTorrent (Distributed Hash Tables)
• Gnutella (Query Routing Tables)
https://en.wikipedia.org/wiki/List_of_P2P_protocols
Questions?