Datafying Bitcoin

Post on 16-Feb-2017


Datafying Bitcoin
Tariq B. Ahmad

https://github.com/tariq786/datafying_bitcoin

Motivation

● Bitcoin is a virtual peer-to-peer cryptocurrency.

● All Bitcoin transactions are publicly visible (who sent, who received, and how much) but pseudonymous.

● This public data is the blockchain, a distributed ledger. Its current size is around 70 GB of binary data, and it has been growing every day since 2009.


Blockchain Size


Bitcoin Transaction Types


One-to-one transaction

Many-to-many transaction

Block

A block contains Bitcoin transactions. There are almost 400,000 blocks today. The blockchain contains all of these blocks, linked together like a doubly linked list.
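That linked structure is visible directly in the block JSON a full node returns: each block carries previousblockhash (and, once extended, nextblockhash). A minimal sketch of walking the chain backwards over JSON-RPC, assuming a local bitcoind node; the credentials are hypothetical:

```python
import json
import requests

# Hypothetical RPC credentials for a local bitcoind full node.
RPC_URL = "http://rpcuser:rpcpassword@127.0.0.1:8332"

def rpc(method, *params):
    """Issue one JSON-RPC call against the local node."""
    payload = {"jsonrpc": "1.0", "id": "walk",
               "method": method, "params": list(params)}
    return requests.post(RPC_URL, data=json.dumps(payload)).json()["result"]

# Walk three blocks backwards from the tip. Each verbose block JSON carries
# previousblockhash (and nextblockhash), hence the "doubly linked" view.
block_hash = rpc("getbestblockhash")
for _ in range(3):
    block = rpc("getblock", block_hash)
    print(block["height"], "-", len(block["tx"]), "transactions")
    block_hash = block["previousblockhash"]
```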

Data

● Historical data
  ○ Almost 400,000 blocks (new bitcoins)
  ○ More than 104 million transactions so far
  ○ 69 GB in total (2009-2016)
● Live data
  ○ About 2 transactions per second
  ○ Propagated through the peer-to-peer network

Query

The evolution of the Bitcoin transaction fee per block.
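The fee is never stored explicitly in a transaction: it is the sum of the input values minus the sum of the output values, and each input's value has to be looked up in the transaction that created it. A minimal sketch of that arithmetic over bitcoind-style transaction JSON; prev_out_value is a hypothetical lookup table from (txid, output index) to BTC value:

```python
def tx_fee(tx, prev_out_value):
    """Fee = sum of input values - sum of output values.

    tx:             verbose transaction JSON (getrawtransaction style)
    prev_out_value: hypothetical lookup, (txid, output index) -> BTC value
    """
    if any("coinbase" in vin for vin in tx["vin"]):
        return 0.0  # coinbase transactions mint new bitcoins and pay no fee
    inputs = sum(prev_out_value[(vin["txid"], vin["vout"])] for vin in tx["vin"])
    outputs = sum(out["value"] for out in tx["vout"])
    return inputs - outputs

def block_fee(block_txs, prev_out_value):
    """Fee per block = sum of the fees of all transactions in the block."""
    return sum(tx_fee(tx, prev_out_value) for tx in block_txs)
```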


Working with Data

● Run a full node on AWS => store the entire blockchain ledger on AWS.
● Query the blockchain via JSON-RPC in Python (see the sketch below).
● Two RPC calls per block (~200,000 relevant blocks, 6.5 GB of text storage).
  ○ Average time per RPC call = 1.45 sec, a huge performance bottleneck. The workaround is to reduce this to one RPC call by storing all blocks in JSON format on disk/HDFS.
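A minimal sketch of that ingestion loop, assuming a local bitcoind node (with txindex=1 so arbitrary transactions can be fetched); the credentials and file names are hypothetical. Each block and transaction JSON is appended one document per line, so later runs can read from disk/HDFS instead of repeating the RPC round trips:

```python
import json
import requests

# Hypothetical RPC credentials for the local bitcoind full node.
RPC_URL = "http://rpcuser:rpcpassword@127.0.0.1:8332"

def rpc(method, *params):
    payload = {"jsonrpc": "1.0", "id": "ingest",
               "method": method, "params": list(params)}
    return requests.post(RPC_URL, data=json.dumps(payload)).json()["result"]

with open("blocks.json", "a") as blocks_out, open("txs.json", "a") as txs_out:
    for height in range(1, 200000):               # ~200,000 relevant blocks
        block_hash = rpc("getblockhash", height)
        block = rpc("getblock", block_hash)       # RPC call 1: block JSON
        blocks_out.write(json.dumps(block) + "\n")
        for txid in block["tx"]:
            tx = rpc("getrawtransaction", txid, 1)  # RPC call 2: verbose tx JSON
            txs_out.write(json.dumps(tx) + "\n")
```

At ~1.45 sec per call this loop is exactly the bottleneck noted above; once the JSON lines are on disk or HDFS, reprocessing needs no RPC calls at all.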


[Diagram: the app issues two RPC calls against the Bitcoin node]
1. get block RPC call → returns block JSON
2. get transaction RPC call → returns transaction JSON

Data Pipeline

[Pipeline diagram] Bitcoin node → ingestion (netcat relay) → file system (local disk / HDFS) → batch and stream processing → database → visualization
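The stream-processing leg can be fed just as the diagram suggests: live transactions relayed over a TCP socket via netcat and read by Spark Streaming. A sketch assuming one transaction JSON per line on a hypothetical local port 9999:

```python
import json
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="BitcoinLiveTxs")
ssc = StreamingContext(sc, batchDuration=10)      # 10-second micro-batches

# One transaction JSON per line, relayed through netcat (port is an assumption).
txs = ssc.socketTextStream("localhost", 9999).map(json.loads)

# At ~2 transactions per second, expect roughly 20 per batch.
txs.count().pprint()

ssc.start()
ssc.awaitTermination()
```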

Accomplishments and Challenges

● The complex query (Bitcoin transaction fee evolution) works end to end (see the sketch below).
● Working with a sea of JSONs (two JSONs per block) in Apache Spark is complex; it takes time to scale the results.
● Ideally, all three modes (batch, streaming, and API) are compared on throughput, latency, and cost.
● Public APIs have rate limits. After a lot of searching, found the Toshi API (https://toshi.io), which has no rate limits.
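A sketch of that end-to-end batch query over the transaction JSON lines on HDFS: join every input to the output it spends, compute per-transaction fees, and sum them per block. The HDFS paths and the blockheight field are assumptions about how the JSON was laid out at ingestion:

```python
import json
from pyspark import SparkContext

sc = SparkContext(appName="FeePerBlock")

# One verbose transaction JSON per line; blockheight assumed added at ingestion.
txs = sc.textFile("hdfs:///bitcoin/txs.json").map(json.loads).cache()

# (txid, output index) -> output value, so inputs can find what they spend.
out_values = txs.flatMap(
    lambda tx: [((tx["txid"], o["n"]), o["value"]) for o in tx["vout"]])

# (spent txid, spent index) -> spending txid, for every non-coinbase input.
spends = txs.flatMap(
    lambda tx: [((vin["txid"], vin["vout"]), tx["txid"])
                for vin in tx["vin"] if "coinbase" not in vin])

# Total input value per spending transaction.
in_total = (spends.join(out_values)
            .map(lambda kv: (kv[1][0], kv[1][1]))
            .reduceByKey(lambda a, b: a + b))

# Total output value per transaction.
out_total = txs.map(lambda tx: (tx["txid"], sum(o["value"] for o in tx["vout"])))

# Fee per transaction, then summed per block height.
fee_per_tx = in_total.join(out_total).mapValues(lambda v: v[0] - v[1])
heights = txs.map(lambda tx: (tx["txid"], tx["blockheight"]))
fee_per_block = (heights.join(fee_per_tx)
                 .map(lambda kv: kv[1])           # (height, fee)
                 .reduceByKey(lambda a, b: a + b))

fee_per_block.saveAsTextFile("hdfs:///bitcoin/fee_per_block")
```

Coinbase transactions spend no outputs, so they never appear in in_total and the join drops them, which is correct: they pay no fee.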


Comparison

Mode            # of processed blocks   Time (minutes)   Storage
RPC Batch       186,846                 162              Local file system
RPC Batch       186,846                 69               HDFS
RPC Streaming   187,990                 177              -
API Streaming   187,990                 222              -
API Batch       187,990                 3.1              HDFS

Storing data on HDFS pays off, with Spark processing taking only 3.1 minutes in API mode and 69 minutes in RPC mode (62 of those 69 minutes are overhead from the get transaction RPC call).

Visualization


Zooming in to check discontinuity


About Me

PhD in Computer Engineering: Parallel Computing & Computer Security

In love with Linux

Likes disruptive technology


Thank you + Q&A