Riffle: An Efficient Communication System with Strong ...
Transcript of Riffle: An Efficient Communication System with Strong ...
Riffle: An Efficient Communication System with Strong
Anonymity
by
Young Hyun Kwon
B.S., University of Pennsylvania (2013)
ARCHNESMASSACHUSETTS tNSTITJTE
OF TECHNOLOLGY
JUL 0 7 2015
LIBRARIES
Submitted to the Department of Electrical Engineering and Computer Sciencein partial fulfillment of the requirements for the degree of
Master of Science in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2015
Massachusetts Institute of Technology 2015. All rights reserved.
Author ........Signature redacted
................Department of Electrical Engineering and Computer Science
May 20, 2015
0((Signature redacted
Certified by.............................................................Srini Devadas
Edwin Sibley Webster Professor of Electrical Engineering and Computer ScienceThesis Supervisor
Accepted by .................. Signature redactedI e L6t(i. Kolodziejski
Chair, Department Committee on Graduate Students
I
...
Riffle: An Efficient Communication System with Strong Anonymityby
Young Hyun Kwon
Submitted to the Department of Electrical Engineering and Computer Scienceon May 20, 2015, in partial fulfillment of the
requirements for the degree ofMaster of Science in Electrical Engineering and Computer Science
Abstract
Anonymous communication is an important part of democratic societies and freedom ofspeech. Whistleblowers, protest organizers, and, more broadly, anyone with controversialviewpoints have been using the limited form of anonymity the Internet provides to protecttheir privacy. Unfortunately, the basic anonymity the Internet guarantees is too weak toprotect their identities from even the weakest adversaries. As a result, more and more usershave adopted privacy enhancing technologies to protect themselves.
All existing anonymity systems, however, sacrifice anonymity for efficient communicationor vice-versa. Onion-routing achieves low latency, high bandwidth, and scalable anonymouscommunication, but is susceptible to traffic analysis attacks. Designs based on DC-Nets, onthe other hand, protect the users against traffic analysis attacks, but sacrifice bandwidth.Verifiable mixnets maintain strong anonymity with low bandwidth overhead, but suffer fromhigh computation overhead instead.
In this thesis, we present Riffle, a bandwidth and computation efficient communicationsystem with strong anonymity. Riffle consists of a small set of anonymity servers and alarge number of users, and guarantees anonymity as long as there exists at least one honestserver. Riffle uses a new hybrid verifiable shuffle technique and private information retrievalfor bandwidth- and computation-efficient anonymous communication. We have evaluatedRiffle in two different applications: file sharing and microblogging. Our evaluation showsthat Riffle can achieve a bandwidth of over 100KB/s per user in an anonymity set of 200users in the case of file sharing, and handle over 100,000 users with less than 10 secondlatency in the case of microblogging.
Thesis Supervisor: Srini DevadasTitle: Edwin Sibley Webster Professor of Electrical Engineering and Computer Science
3
Acknowledgments
First and foremost, I would like to thank my advisor, Srini Devadas, for his endless support
and guidance. Your energy and advice have guided me through the maze that is graduate
school thus far, and helped me grow immensely both as a person and a researcher. I especially
thank you for the amount of research freedom you have granted me, allowing me to explore
different areas of computer science without any restrictions. I really cannot imagine a better
advisor, and I look forward to our road ahead.I would like to thank my collaborators, Bryan Ford and David Lazar, for the long dis-
cussions we had, both technical and personal. This project was very much a rollercoasterride: almost daily, we hit highs when we found new ideas and directions, and, just hours
later, we hit lows when we realized new flaws in the system that broke our ideas entirely.If it were not for Bryan and David, I would have given up on this project a long time ago;
thank you for making the highs higher and lows not so low. As a result of this collaboration,I also found some new research directions I would like to pursue in the future; for that, Iam extremely grateful.
I have also been fortunate enough to find a research home with people of infinite academicinterests and capabilities. I would like to thank everyone in our group, especially ChrisFletcher, Ling Ren, Charles Herder, Mashael Al-Sabah, and Marten Van Dijk (who is now
a professor at University of Connecticut) for everything they have done for and with mesince I started at MIT. When I say something crazy, perhaps even stupid, they are the
people who find the silver-lining, and develop it into an interesting research idea. I love the
open-minded attitude of our group towards research, and I am excited to see what more wecan accomplish.
I also want to thank my undergraduate advisor, Andr6 DeHon. He took me under
his wing when I was just a lost sophomore in college, and helped me discover my passion
for computer security. Without him, not only would I not be pursuing a Ph.D., but Iwould probably not even be working on computer security. I would like to also thank
professors Jonathan Smith, Boon Thau Loo, and Sanjeev Khanna who have provided me
with invaluable advice on both personal and professional matters. Without their help, I am
honestly not sure where I would be.Thank you to all my MIT and non-MIT friends. I cannot mention everyone as the
list would be longer than the actual thesis, but I would like to say special thanks to (in
alphabetical order): Misha Borochin, Daniel Grier, Michal Grzadkowski, Nick Howarth, "G"Kamath, Twan Koolen, Jerry Li, Daniel Ramos, Po-An Tsai, Yang Yang Zhao, and Alex
Zheng. I need not say more than "You guys are the best."
Finally, the biggest thanks to my grandmother, my parents, my brother Fred, and my
better half Fannie Liu. They believe in me even when I find it hard to, and constantly
remind me why my life is so fantastic. Fannie, you are now starting graduate school, and Ihope I can repay some of the things you have done for me by doing the same for you!
5
Contents
1 Introduction 131.1 Motivation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131.2 Thesis Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2 Related Work 172.1 Proxy-Based Anonymity Systems . . . . . . . . . . . . . . . . . . . . . . . . . 172.2 Anonymity versus Bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.3 Anonymity versus Computation . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3 Background 213.1 Verifiable Shuffle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Authenticated Encryption . . . . . ........................ 21
3.3 Riffle Deployment Model and Assumptions . . . . . . . . . . . . . . . . . . . . 22
4 Riffle Architecture 254.1 Riffle Protocol Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.2 Hybrid Verifiable Shuffle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.3 Private Information Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . 274.4 Accusation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284.5 Bandwidth Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294.6 Security Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5 Applications 335.1 File Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.1.1 Non-Bijective Requests and Uploads . . . . . . . . . . . . . . . . . . . 34
5.1.2 Bandwidth Overhead in File Sharing . . . . . . . . . . . . . . . . . . . 355.2 Microblogging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6 Prototype Evaluation 376.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.2.1 File sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376.2.2 Microblogging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
7 Discussion, Future Work, and Conclusion 417.1 D icussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
7.1.1 Alternate usage model . . . . . ... . . . . . . . . . . . . . . . . . . . . 41
7
7.1.2 M alicious Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417.1.3 Server to server bandwidth . . . . . . . . . . . . . . . . . . . . . . . . 427.1.4 Network churns and group changes . . . . . . . ... . . . . . . . . . . . 427.1.5 Intersection attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
7.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
8
List of Figures
3-1 Deployment model of Riffle . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4-1 Anonymous communication in Riffle . . . . . . . . . . . . . . . . . . . . . . . 26
5-1 Anonymous File Sharing Protocol . . . . . . . . . . . . . . . . . . . . . . . . . 34
6-1 Testbed topology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386-2 Average time taken to share a 300MB file and effective bandwidth with dif-
ferent group configurations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396-3 Breakdown of the total time taken to share two different size files for varying
numbers of clients with 3 servers. . . . . . . . . . . . . . . . . . . . . . . . . . 396-4 Average latency to share a microblogging post for different message sizes with
3 servers. ....... .............. .............. ...... ... 40
9
List of Tables
2.1 Trade-offs made by existing systems. . . . . . . . . . . . . . . . . . . . . . . . 17
3.1 Term inology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
11
Chapter 1
Introduction
1.1 Motivation
The right to remain anonymous is a fundamental right in a democratic society and is crucialfor freedom of speech [49]. Many whistleblowers, protest organizers, and, more broadly,anyone with controversial viewpoints have long been relying on the limited form of anonymitythe Internet provides to voice their ideas. Recently, anonymizing networks based on relayssuch as Tor [251 have been gaining popularity as a practical privacy enhancing technologyamong users seeking higher levels of privacy. However, such systems are susceptible to trafficanalysis attacks [44, 36] by powerful adversaries such as an authoritarian government or astate controlled ISP, and have recently been attacked by even weaker adversaries monitoringonly the users' traffic [32, 40, 15, 51].
There are two major branches of work that offer traffic analysis resistance even in thepresence of a powerful adversary. The first is the Dining-Cryptographer Networks (DC-Nets) [161, which offers information theoretically secure anonymous communication for itsusers as long as one other participant is honest. Dissent [53] improved upon DC-Nets bymoving to the anytrust model, where the network is organized as servers and clients, andguarantee anonymity as long as there exists one honest server. The second is verifiablemixnets, based on mix networks [18]. In this design, the mixes use verifiable shuffle [14, 28,37, 8] to permute the ciphertexts, and produce a third-party verifiable proof of the correctnessof the shuffle without revealing the actual permutation. Similar to DC-Net based systems,verifiable mixnets guarantee anonymity as long as one mix in the network is honest.
Both designs, however, suffer from serious drawbacks. DC-Nets and DC-Net based sys-tems, by design, implement a broadcast channel. That is, they were primarily designed forthe case where one client messages everyone in the network. Thus, when multiple users wishto communicate simultaneously, every user must transfer a message of size proportional tothe number of clients who wish to communicate. As a result, DC-Net based designs sufferfrom a large bandwidth overhead, and only scale to a few thousand clients [53, 23]. Verifiablemixnets, on the other hand, allow the clients to send messages proportional only to theirown messages, and thus can be bandwidth efficient. However, the high computation over-head of verifiable shuffles has prevented verifiable mixnets from supportig high bandwidthcommunication.
13
1.2 Thesis Contributions
In this thesis, we present Riffle, a system for bandwidth- and computation-efficient anony-
mous communication. Riffle addresses the problems of DC-Nets and verifiable mixnets,while offering the same level of anonymity. At a high level, Riffle is organized as servers and
clients, similar to previous works [53, 21, 23]. Riffle focuses on minimizing the bandwidthconsumption of the clients, who may be connecting from bandwidth-constrained environ-
ments such as their mobile phones, and reduce the computation overhead of the servers, who
may have to compute on large amounts of data. Specifically, the clients in Riffle consume
upstream bandwidth proportional only to the size of their messages rather than the number
of clients, and the server computation only involves fast symmetric key cryptography in thecommon case. This allows the users to efficiently exchange messages, making it suitable
for applications like file sharing that no current strong anonymity system can support well.
Moreover, Riffle provides strong anonymity for all clients as long as one of the servers is
honest.Riffle achieves bandwidth and computation efficiency by employing two privacy primi-
tives: a new hybrid verifiable shuffle for upstream communication, and private informationretrieval (PIR) [201 for downstream communication. Our novel hybrid verifiable shuffle
scheme avoids expensive traditional verifiable shuffle in the critical path, and improves both
bandwidth and computation overhead of the shuffle without losing verifiability by using au-
thenticated encryption. We also propose a novel application of private information retrieval
in the anytrust setting. Previous strong anonymity systems made a trade-off between com-
putation and bandwidth by either broadcasting all messages to all users (low computation,high bandwidth) [53, 23, 21], or using computationally expensive PIR (high computation,low bandwidth) [47]. In the anytrust model, we show that PIR can minimize the download
bandwidth with minimal computation overhead.
We develop a Riffle prototype, and two applications. The first is an anonymous file
sharing application, where each message is large and is potentially of interest to only a small
number of users. Sharing large files is a scenario that has not been considered carefully byprevious strong anonymity systems [8, 53, 21], and we propose a new file sharing protocol
using Riffle in this thesis. The second is an anonymous microblogging application, similar
to the applications studied by previous works [53, 21]. In this setting, each user posts small
messages to the servers, and each message is of interest to many users.
Our prototype demonstrates effectiveness for both applications. In file sharing, the pro-
totype achieves high bandwidth (over 100KB/s) for more than 200 clients. In microblogging,the prototype can support more than 10,000 clients with latency less than a second, or han-
dle more than 100,000 clients with latency of less than 10 seconds. We show that our results
are orders of magnitude better in terms of both scalability and bandwidth compared to
previous systems [48, 53, 81 in comparable scenarios.
To summarize, this thesis makes the following contributions:
1. A hybrid verifiable shuffle that uses symmetric encryption, and avoids expensive public
key verifiable shuffling [14, 28, 37, 8] in the common case.
2. A novel application of private information retrieval [20] in anytrust settings.
3. A bandwidth- and computation-efficient anonymous communication system that is
resilient against traffic analysis attacks and malicious clients.
4. Evaluation of Riffle that demonstrates efficiency in different applications-'
14
1.3 Thesis Organization
In Chapter 2, we describe related work. In Chapter 3, we describe cryptographic primitives
relevant to Riffle, and explain the threat model and deployment model of Riffle. In Chapter 4,
we present the Riffle architecture in detail. We then describe the example of applications
Riffle in Chapter 5, and evaluate our prototype in Chapter 6. Finally, we discuss future
work and conclude in Chapter 7.
15
Chapter 2
Related Work
In this chapter, we describe related work, focusing on the anonymity, bandwidth, and com-putation trade-offs made by existing anonymity systems. Table 2.1 summarizes the trade-offsdescribed in this chapter, along with Riffle's characteristics.
Table 2.1: Trade-offs made by existing systems.
2.1 Proxy-Based Anonymity Systems
Tor [25] is a popular anonymity system that focuses on scalability and low-latency by rout-ing the users' messages through their decentralized volunteer relays without any :delays orcover traffic. While this design allows Tor to scale to millions of users [7], it also allows apowerful adversary who can observe traffic going in and out of the relay network (e.g., astate controlled ISP) to deanonymize the users [44, 36]. Moreover, a recent class of machinelearning based attacks enables even a local adversary only observing traffic to entry relaysto deanonymize the users with high probability [32, 40, 15, 51]. Other systems that alsouse anonymizing proxies [33, 35] focus on different aspects of communication while makingsimilar sacrifices of anonymity.
Mix networks (mixnets) [18] and systems that build on them [24, 45, 27, 46] aim toprevent traffic analysis attacks via routing each user's message through a set of anonymityservers called mixes. In mixnets, mixes collect many users' inputs and shuffle them beforesending any out, making it difficult to correlate the inputs and the outputs even for aglobal adversary. Because the mixes can be distributed and decentralized, mixnets havebeen shown to scale to a large number of clients [46], and provide reasonable latency and
Offers good bandwidth only if the database is small.
17
System Anonymity Bandwidth ComputationTor [25] X
Mixnets [18] XDC-Nets [16, 22, 30] /
Dissent [53] XRiposte 121] Al
Verifiable Mixnets [37, 8] VRiffle
bandwidth when the mixes are well-provisioned. With malicious mixes in the network,however, mixnets fail to provide the level of anonymity required for sensitive activities likewhistleblowing. Specifically, several proposed attacks [42, 41, 38, 52] allow a malicious mixto deanonymize the users through dropping, modifying, or duplicating the input messagesbefore sending them out.
2.2 Anonymity versus Bandwidth
Unlike systems using anonymizing proxies, Dining Cryptographer Networks (DC-Nets) [16]provide information theoretic anonymity even in the face of global adversaries as long as thereexists at least one other honest participant. However, they require communication betweenevery user to broadcast one message from a single user, and thus incur high bandwidthoverhead. As a result, systems that build on DC-Nets could not scale to more than a fewtens of clients [30, 22]. Recent DC-Net based systems 153, 23] use the anytrust model, wherethe network is organized as servers and clients, and guarantee anonymity as long as one ofthe servers is honest. The new model allowed the designs to scale to a few thousand clientswith strong anonymity, but they still suffer from a bandwidth penalty proportional to thenumber of clients and the message size.
Riposte [21] is a recent system that is optimized for anonymous microblogging. Eachclient in Riposte uses a novel "private information storage" technique to write into a shareddatabase maintained by multiple servers. It follows a similar threat model as Dissent, whereanonymity is guaranteed as long as one server is honest, and offers good throughput forsmall messages. However, each Riposte client must submit a message proportional to thesquare root of the size of the whole database (i.e., collection of all clients' data), making itunsuitable for sharing large messages among many clients.
2.3 Anonymity versus Computation
Another design that provides strong anonymity is a verifiable mixnet. Verifiable mixnetsaim to solve the security problems of plain mixnets by using verifiable shuffles [14, 28, 37, 8].In this design, when a mix shuffles the inputs, it also generates a zero-knowledge proof thatproves that the outputs form a valid permutation of the input ciphertexts. 2 Using the proofand the ciphertexts, other mixes in the network can verify that the mix did not tamper withany message, while learning nothing about the actual permutation. Assuming that at leastone of the mixes is honest, a verifiable mixnet is secure even with compromised mixes inthe network: The honest mix will shuffle inputs sufficiently, which prevents traffic analysisattacks, and the malicious mixes cannot tamper with the inputs without generating a badproof. However, generation and verification of such proofs is computationally expensive,resulting in high latency and low bandwidth. The state-of-the-art verifiable shuffle by Bayerand Groth [8], for instance, takes 2 minutes to prove and verify a shuffle of 100,000 shortmessages.
Another privacy primitive that trades off computation for anonymity is private infor-mation retrieval (PIR) [20]. While most anonymity systems mentioned previously focus onprotecting the senders' anonymity, PIR protects the privacy of the downloader. In the set-ting of PIR, a client accesses some data in a database managed by a server or a collection of
2The encryption scheme used must be probabilistic, and the mix re-randomizes the ciphertexts beforeoutputting. Otherwise, other mixes can trivially learn the permutation.
18
servers, and the goal is to hide which data was accessed. There are variants of PIR for differ-
ent settings [20, 19, 29], but many schemes have complex formulation, and incur significant
overheads in computation. On the other hand, the original PIR scheme proposed by Chor
et al. [20] is efficient in terms of both computation and bandwidth, but its threat model is
limiting: it requires multiple servers each with a copy of the database, and at least one of
the servers needs to be honest. However, this is precisely the setting of anytrust, and we
show that the efficient PIR can be used in a practical system to minimize the downstream
bandwidth overhead.
19
Chapter 3
Background
This chapter summarizes two important cryptographic primitives used in Riffle, and de-scribes the system and threat model.
3.1 Verifiable Shuffle
As mentioned in Chapter 2, verifiable shuffles [37, 8J can be used to construct mixnets withstrong anonymity guarantees. In general, verifiable shuffles consist of two parties: a proverP and a verifier V. Most known verifiable shuffles operate in groups Z, for some prime p,and is used to shuffle ElGamal ciphertexts. In this setting, the problem is parameterized bythe following: prime p, generator g E Z, with prime multiplicative order q. There are twoinput sequences provided to the two parties: X1, . . . , X" and Y, . .. , Y" where Xi, Yi E ZpVi.Furthermore, there are constants c, d E Zq known only to P, but the commitments C = g'and D = gd are known to both parties. Given these inputs, verifiable shuffle requires P toprove to V that there exists a permutation 7r such that yjd = X without revealing anyinformation about 7r, c, and d. In mixnets, the mix that performs the shuffle is the proverand all other mixes in the network are the verifiers. Each mix uses random c and d tore-randomize the ciphertexts, and generates a zero-knowledge proof that the re-randomizedciphertexts form a valid permutation of the original inputs.
3.2 Authenticated Encryption
Authenticated encryption aims to provide confidentiality, integrity, and authenticity of theciphertexts for private key encryption schemes. In practice, an authenticated encryptionscheme AEnc consists of (1) symmetric encryption for privacy, and (2) message authen-tication code (MAC) for authenticity. There are three common ways to combine the twoprimitives:
" Encrypt-and-MAC: AEnck(M) = EnCk (M) MACk(m). Append the MAC of theplaintext to the ciphertext of the encryption. The receiver decrypts the ciphertextfirst, and verifies the MAC after.
" MAC-then-Encrypt: AEnck(M) = Enck(MIMACk(m)). Append the MAC of theplaintext before encryption, and encrypt them together. The receiver decrypts theciphertext first, and verifies the MAC afterwards.
21
* Encrypt-then-MAC: AEnck(m) = Enck(m)IIMACk(Enck(m)). Append the MAC ofthe ciphertext to the ciphertext. The receiver verifies MAC first, and decrypts theciphertext after.
Bellare and Namprempre 110] showed that if the symmetric encryption scheme is semanticallysecure and the MAC is unforgeable under chosen message attack, then Encrypt-then-MACprovides both privacy and authenticity. To meet the semantic security requirement, manysecure authenticated encryption schemes implement randomized encryption with a nonce.That is, AEnc uses a key k and a nonce r to encrypt a message m (i.e., AEnck,,(m)),and the guarantee is that if r is never repeated, the ciphertexts will not be distinguishable.Bellare and Namprempre also presented constructions of the other two modes of operation(albeit only theoretical, not realistic, constructions) that show that privacy is not preserved.
3.3 Riffle Deployment Model and Assumptions
Riffle Group XServer B
Server A Server C
Figure 3-1: Deployment model of Riffle
Riffle consists of a group of clients and servers, as illustrated in Figure 3-1. In each group,the clients form the anonymity set for the individuals in the group who wish to communicate
anonymously, and the servers collectively form an anonymity provider. For deployment,we assume that each server is run by separately administered entities, such as different
commercial or non-profit anonymity providers. Each client in a Riffle group is connected
to his or her preferred server called primary server (e.g., by location, hosting organization,
etc). This thesis focuses on a single group where the clients wish to communicate with the
other clients in the same group, not with other groups or the general Internet. Chapter 7
discusses other possible usage models of Riffle.
In this setting, we assume that the precious resource is the bandwidth between the clients
and the servers. Having a high bandwidth network between a small number of servers is
feasible, and already common (e.g., between data centers, large companies, etc). However,we cannot expect all clients to have high bandwidth connections to the servers (e.g., clients
connecting from locations with bad infrastructure, their mobile devices, 104.', Therefore,Riffle focuses on minimizing the client-server bandwidth requirements.
22
For the threat model, Riffle assumes an anytrust model, first presented in [531. Riffle
does not depend on a fraction of the servers being honest, or even a particular server, to
guarantee anonymity. We rely only on the assumption that there exists an honest server. In
particular, despite the clients being connected to only one server, we guarantee anonymity
of all clients in a group as long as there exists an honest server in the group. Apart from
having one honest server, we do not limit the adversaries' power in any way, and allow any
number of servers to collude. We do not, however, consider denial-of-service attacks from
the servers. We note that such an attack would be trivial for malicious servers since they
can simply drop all messages sent to them.
In this thesis, we assume that a group is already established, and focus on the operation
of a single group. That is, we do not consider how the clients select the servers to ensure
presence of an honest server, or how each client determines the appropriate group (anonymity
set) size. Previous works 130, 48] have explored these problems in detail, and the solutions
are applicable to Riffle as well.
We note that any communication over the network must be done through authenticated
and encrypted channels. Moreover, each message needs to be padded to a fixed length to
prevent privacy leakage through the size of the message. We assume this is done for all
communication, and omit this detail in the rest of the thesis to simplify presentation.
Table 3.1 summarizes the terminologies used throughout this thesis.
Table 3.1: Terminology
Terminology Description
C Set of Riffle clients
n The number of clients
S Set of Riffle servers
m The number of servers
b Size of a message
A Security parameter
23
Chapter 4
Riffle Architecture
In this chapter, we describe the Riffle architecture, which achieves near optimal bandwidthwith strong anonymity and minimal computation overhead. We first describe the systemand threat model, and our assumptions. We then give an overview of the Riffle protocol,and describe our hybrid shuffle and our application of private information retrieval. We alsodescribe how to hold a malicious client accountable, without any privacy leakage and per-formance overhead during regular operation. Finally, we analyze the asymptotic bandwidthrequirement, and provide a security argument for Riffle.
4.1 Riffle Protocol Overview
Riffle uses verifiable shuffle at its core for anonymous communication. At a high level, itconsists of two phases: setup and communication. The setup phase is used to share keys andhappens only once in the beginning. The communication phase consists of multiple rounds,similar to previous designs [53, 21], and clients upload and download messages to and fromthe servers in each round.
During the setup phase, every server Si generates a public key pi, and the clients down-load all pi's. To upload a message, a client Ci onion encrypts his or her message with allkeys pi's, and uploads the ciphertext to his or her primary server. To be fully traffic analysisresistant, all users are required to upload a message, even if they do not wish to communi-cate that round. Once all ciphertexts are uploaded, the first server Si collects the messages,and performs verifiable decryption and verifiable shuffle. It sends the proof of decryptionand shuffle along with the decrypted ciphertexts to all other servers, who will then verifythe proofs. Decryption, shuffling, and verification of proofs are repeated until the last serverfinally reveals all plaintexts to the servers. Once the plaintext messages are available, theservers broadcast all messages to their clients.
We note that this design already achieves a significant client-server bandwidth savingsfor uploads compared to DC-Net based designs [53, 23], both in practice and asymptotically;the ciphertext each client uploads is proportional only to the actual message.
4.2 Hybrid Verifiable Shuffle
Despite the significant bandwidth savings, the computational overhead of verifiable shufflesmakes it unsuitable for high bandwidth communication. As a concrete example, the state-of-the-art verifiable shuffle proposed by Bayer and Groth [8] takes more than 2 minutes
25
(a) Setup: Clients share keys with ev-ery server through verifiable shuffle, andshare secrets used for PIR.
(c) Shuffle: Servers authenticate, de-crypt, and shuffle the ciphertexts.
(b) Upload: Clients upload ciphertextlayer encrypted using authenticated en-cryption.
(d) Download: Clients download themessages either through PIR or broad-cast.
Figure 4-1: Anonymous communication in Riffle
to shuffle 100,000 ElGamal 1261 ciphertexts, each of which corresponds to a small message
(e.g., a symmetric encryption key). Furthermore, randomized public key encryption schemes
commonly used for verifiable shuffle, such as ElGamal, often result in ciphertext expansion
of at least 2, which halves the effective bandwidth.
To address the issues of traditional verifiable shuffles, we propose a new hybrid verifiable
shuffle. In hybrid shuffle, verifiable shuffle is performed only once to share encryption keys
that are used throughout an epoch, which consists of many rounds. Riffle uses authenticated
encryption 1101 with the shared keys for the communication phase.
During the setup phase, the clients share pairwise secrets (keys) with the servers using
verifiable shuffle, and each server Si retains permutation ?rj used during the verifiable shuffle.
To upload, C, now onion encrypts his or her message Mj for round r using authenticated
encryption with the keys shared during the setup phase, and sends the resulting ciphertext.
When Si collects the ciphertexts, it authenticates and decrypts the ciphertexts using the
shared keys, and shuffles them using the 7ri saved from the setup phase. It then sends the
shuffled ciphertexts (with one less layer of encryption) to the next server, and the servers
repeat the authentication, decryption, and shuffling until the last server finally reveals the
plaintext messages to all servers. The details of the hybrid verifiable shuffle are presented
in Algorithm 1.
It is important to note that the authenticated encryption in practice is symmetric 110],which lowers computational overhead by orders of magnitude compared to public key en-
cryption schemes. Furthermore, this eliminates expensive generation and verification of
26
Algorithm 1 Hybrid Shuffle
1. Sharing keys: Cj submits m onion-encrypted keys (secrets) to its primary server.
Each key kij for Si is encrypted with keys p1, ... , pi. Starting with Si, Si's decrypt a
layer on all submitted encrypted keys, and perform verifiable shuffle on all keys using
7ir, except for kij's . Si then retains the permutation xi and the keys kij for 1 < j 5 n,and sends the other (still encrypted) keys to the next server. Once this step finishes,kij will be at position ri_.1(... (iri(j)) .. .) in Si. This step happens only once per
epoch.
2. Upload: In round r, Cj onion-encrypts, using authenticated encryption, the message
MJ using keys kij's and the round number r as a nonce:
AEnc1,...,m(MJ) = AEnCkij,r(. . . (AEnckm,r (MJ)) ... ).
Each client uploads AEnc1 ,...,m(Mr) to its primary server.
3. Shuffle: Si collects all uploaded ciphertexts for round r. Starting with Si, all the
Si's verify authenticity of the ciphertexts and decrypt a layer using the shared secrets
and r. Then, using the permutation 7ri, Si shuffles the messages and sends them to
Si+1. Note that this means in Si, ciphertext AEnci,...,m(Mj) of Cj is at position
7ri(... (7r1(j)) .. .), which is where the matching key kij is. This is repeated until the
last server learns the permuted plaintext messages M,, and broadcasts M, to all other
servers. The final permutation of the messages is ir = 7rmnm-1(... (72(71)) ... ))-
proofs, and the servers simply need to authenticate the ciphertext to verify the shuffle.
4.3 Private Information Retrieval
There are situations where a client is not interested in the majority of the messages. For
example, consider two clients chatting through Riffle. Here, the clients can learn the expected
location of the messages after one round of communication (since our hybrid shuffle uses
the same 7r for all rounds in an epoch), but the clients are forced to download all available
messages due to broadcast. In these scenarios, we can use multi-server private information
retrieval (PIR) proposed by Chor et al. 1201 to improve the download efficiency. Let Ij be
the index (location) of the message client C, wants to download. To download the message,
C, first generates m - 1 random bit masks each of length n, and computes a mask such that
the XOR of all m masks results in a bit mask with 1 only at position Ij. Each mask is sent
to a server, and each server Si XORs the messages at positions with 1 in the bit mask to
generate its response rij for Cj. Finally, C, downloads all rij's and XORs them together to
learn the plaintext message.
Although this scheme is fairly efficient, we can further reduce the bandwidth overhead
using pseudo-random number generators (PRNGs). To avoid sending masks to all servers
every round, C, shares an initial mask with every server during the setup phase. Each server
then updates the mask internally using a PRNG every round, and C, only sends a mask to
its primary server Sp, to ensure the XOR of all masks has 1 only in position Ij.
To avoid downloading a message from every server, C, can ask Sp, to collect all responses
27
Algorithm 2 Private Information Retrieval
1. Sharing Secrets: Each client Cj shares two secrets mij and sij with each Si exceptfor the primary server Spj it is connected to. This step happens only once per epoch.
2. Mask Generation: Let index Ij be the index of the message C, wants to download.
Ci generates mpj, such that EQi mi = el, where ei, is a bit mask with 1 only in slotIj. C, then sends mjj to Spj.
3. Response Generation: Each server Si computes the response rij for C, by com-puting the XOR sum of the messages at the positions of 1's in the mij, and XORingthe secret sij. Specifically, response rij = (@e fm [V] - M) E sij, where M is the ithplaintext message. Then, the servers send rij to Sp , and S, computes rj:
ri = (@ rij) = ( mz ij[f]M) si
= M, 8 ( s
4. Message Download: C, downloads rj from Sp,,, and XORs all sij's to compute themessage of interest, MI, = rj E ( sij).
5. Update Secrets: Both C and S apply PRNG to their masks and secrets to get freshmasks and secrets.
and XOR them together. However, doing so naively results in Sp, learning the message C,downloaded. To prevent this privacy leakage, C, shares another set of secrets with everyserver during the setup, and each server XORs its secret into the response. Sp, can nowcollect the responses and XOR them, while learning nothing about which message C, isinterested in. Finally, C, can download the response from Spj (i.e., the message XORedwith the shared secrets), and remove the secrets to recover the message. Similar to themasks, the servers and the clients can internally update the secrets using a PRNG everyround. Since PIR hides which data was accessed, we note that sharing of masks and secretsneed not be through verifiable shuffle; we do not need to hide which secret is associatedwith which client. However, each client must perform PIR every round to remain resistantto traffic analysis attacks even if the client is not interested in any message. Algorithm 2demonstrates the exact details of the optimized PIR, and Figure 4-1 illustrates the finalRiffle protocol.
4.4 Accusation
In DC-nets [16] or DC-net based designs [53], it is easy for a malicious client to denial-of-service attack the whole network. Namely, any client can XOR arbitrary bits at any timeto corrupt a message from any user. This problem has led to complex, and often costly,solutions to hold a client accountable, such as trap protocols [50], trap bits [53], and verifiableDC-nets [23], or limited the size of the group to minimize the attack surface 130].
28
Without any precautions, a similar attack is possible in Riffle as well: a malicious clientcould send an unauthenticated ciphertext, and the system will come to a halt. However,unlike DC-nets, shuffling does not allow one client's input to corrupt any others' inputs.Leveraging this fact, Riffle provides an efficient way to hold a client accountable whileintroducing no overhead during regular operation and leaking no privacy of honest clients:when a server Si detects that a ciphertext at its position j is not authenticated correctly,it begins the accusation process by revealing j to Si-1. Si-1 then reveals j' = r- 1(j), andthis continues until Si reveals the client who sent the problem ciphertext.
This naive approach, however, could be abused by a malicious server: any server couldflag an error for any client, and that client will be deanonymized since other servers haveno way to verify the claim. To prevent this problem, Riffle requires all servers to commitall their shared secrets to all other servers during the setup phase. When a server Si wishesto accuse a client corresponding to position j, it reveals j, the associated secret kij, and theproblem ciphertext Eij to all upstream servers. The servers then run the following steps,where each step executes only if the previous step finishes successfully:
1. Server Si-1 confirms the Eij is indeed the ciphertext at location j.
2. All upstream servers (Sj for f < i) verify the secret kij with the commitment, andcheck that Eij is not authenticated correctly.
3. Si-1 reveals j' = r- (j), the corresponding secret ki_,y, and ciphertext Ei_1,j.
4. Upstream servers verify that decryption of Eji_,' using ki_1,' is Eij.
This is repeated until Si finally announces the real client.We note that a malicious client cannot perform a similar attack during a download
phase. When the messages are broadcast, there is nothing that a client can do to disruptother clients. When the clients use PIR, a client can only corrupt its own response sinceeach client generates its own masks and secrets.
4.5 Bandwidth Overhead
Riffle achieves near optimal bandwidth between a client and a server when sending a message.A client only uploads a ciphertext of layered authenticated encryption of size b + mA, whereb is the size of the message, m is the number of servers, and A is the size of the messageauthentication codes (MACs) [9] used with the authenticated encryption [101. If the client isinterested in only one message and the index is known, then the only overhead is sending themask of size n, the number of clients, to the primary server. The total upstream bandwidthis then b + mA + n, and the total downstream bandwidth is b per client per round. We notethat even though upstream bandwidth grows linearly with n, it only requires 1 bit per client.In the general case where the index of the message is not known, the download bandwidthis nb per client due to the broadcast, but the upload bandwidth decreases by n, the size ofthe mask.
The-bandwidth requirement between the servers, on the other hand, grows linearly withthe number of users. Every server must download n ciphertexts, and upload n ciphertexts(with one removed layer) to the next downstream server. The last server also needs to
,send the plaintexts to all other servers as well. Furthermore, though PIR reduces thedownload bandwidth overhead of the clients, the server to server bahdwidth increases: with
29
our optimizations, each server needs to send ri to the clients' primary servers. Therefore, theservers also need (m - 1)nb additional bandwidth for the PIR responses. In total, the serverto server bandwidth requirement per round is approximately 3(m - 1)nb. Even though thetotal grows linearly with the number of clients, we note that this is asymptotically betterthan previous anytrust systems. For example, Dissent [53] requires m(m - 1)nb to performDC-Net among the servers with all clients' data.
4.6 Security Analysis
We present the security argument for Riffle. The security of the setup phase of the Riffleprotocol follows immediately from the security of the verifiable shuffle. Any server down-stream from the honest server cannot correlate a secret to a particular client if the proof ofthe shuffle is zero-knowledge.
Verifiability and Zero-Knowledge of Hybrid Shuffle. Following the proposalof [101, let us assume that the authenticated encryption used by Riffle is semantically se-cure [31] and is secure against forgery. For Si to tamper with the inputs and not be detectedby Si+1, Si needs to generate outputs such that some of the outputs are not decryptions ofthe inputs, but still authenticate properly under the keys of Si+1. However, if ciphertextsare unforgeable and the keys of Si+1 are unknown, then Si cannot generate such outputs.We also use the round number, which is provided internally by Si+1, as the nonce to au-thenticated encryption to guarantee the freshness of the data, and stop replay attacks by Si.Therefore, shuffle and decryption of Si is valid if and only if Si+1 can authenticate all outputciphertexts. Moreover, if the encryption is semantically secure, then the outputs cannot becorrelated to the inputs, and thus the shuffle is zero-knowledge.
Anonymity of Uploads. Anonymity of the uploads relies on the fact that the shuffleis verifiable and zero-knowledge. The verifiability ensures that a malicious server cannotprovide the downstream servers with invalid inputs designed to deanonymize a client (e.g.,inputs "marked" to reveal the final position of a client). The zero-knowledge property ofthe shuffle guarantees that the inputs and the outputs cannot be correlated. Moreover,in the anytrust setting, it guarantees that the permutation irH of the honest server SH
is unknown to the adversary. Since 7rH is generated randomly independently of all otherpermutations, the final permutation 7r = 7rm(7m-1(... (r 1 ) .. .)) is also random and unknownto the adversary.
Anonymity of Downloads. The anonymity of the downloads depends on the securityof PIR and PRNGs. The security of PIR used in Riffle was proven by Chor et al. [20].Intuitively, if the masks are generated at random, then the mth mask cannot be inferredfrom the other m - 1 masks. The optimization to reduce mask sharing is secure if the PRNGis cryptographically secure, which says that the masks cannot be distinguished from trulyrandom masks. Finally, collecting the responses at one server is also secure as long as notall secrets are known to the malicious servers.
Anonymity of Accusation. Assume that a malicious server Si starts a false accu-sation by claiming that a ciphertext is mis-authenticated when it was actually correctlyauthenticated. Let SH be an honest server. There are two cases: (1) i < H (maliciousupstream server) or (2) i > H (malicious downstream server). In the first case, because theaccusation does not involve the honest server at all, the honest server will not reveal any ofits permutation. As noted before, one unknown random permutation is sufficient to keep allclients anonymous, and thus malicious upstream servers cannot deanonymize a client.
30
In the second case, eventually the honest server SH will see all ciphertexts EH+1, ... , Ei
flagged by the downstream servers along with the associated secrets. Apart from checking
the secrets against the commitments and the authenticity of Ej, the honest server also
checks the validity of the ciphertexts by (1) confirming EH+1 is the ciphertext given to
SH+1 and (2) checking that the decryptions of EH+1 matches EH+2, .. . , Ei. If SH finds any
mismatching ciphertexts, it will not reveal the permutation, thus preserving anonymity. If
the commitment scheme is binding and a ciphertext cannot decrypt to two different values
using one key, then the malicious servers cannot generate a correct decryption of EH+1 that
results in a mis-authenticated ciphertext. Therefore, no server can falsely accuse a client,and reveal the client's true identity.
31
Chapter 5
Applications
The efficiency of Riffle makes it suitable for high bandwidth applications like anonymous file
sharing, and latency sensitive applications like anonymous microblogging. In this chapter,
we present the details of the two applications in Riffle.
5.1 File Sharing
File sharing within a Riffle group is similar to that of BitTorrent [21, despite the differences
in the system model (server-client of Riffle vs. peer-to-peer of BitTorrent). When a client
wants to share a file, he or she generates a torrent file, which contains the hashes of all blocks
(the smallest unit used for file sharing) of the file. Then, using Riffle, the client uploads
the torrent file to the servers. The servers play the role of torrent trackers in BitTorrent,
and manage all available files in that group. In the simplest design, the file descriptors are
broadcast to all connected clients, and clients can locally choose a file to download. Since the
torrent files are fairly small even for large files (only a few 100KB in practice) and sharing
them is a one-time cost, we assume broadcasting them is inexpensive and focus on sharing
blocks.With the torrent files distributed, the clients can now share files anonymously using
Riffle. There are three major steps:
1. Requesting Blocks: Each C, identifies a file F of interest, and the hashes of the
blocks of the file HF via its torrent file. C, then requests a block of F by uploading
the hash of the block Hj (E HF to Sp using Riffle. When a client has no blocks to
request, he or she sends a random value as a (non-)request to remain traffic analysis
resistant. All requests H, are broadcast to the clients at the end of this step.
2. Uploading Blocks: Each C, checks if it possesses any requested block by checking
the hashes of the blocks it owns with H. If a matching block Mj is found, then C,
uploads Mj using Riffle. Once the plaintext blocks are available to the servers, each
server broadcasts the hashes of the available blocks H' .
3. Downloading Blocks: From Hj and H', C, learns the index Ij of the block he or
she requested. Using PIR, C, downloads the block.
The rounds can (and should) be pipelined for performance since each request is independent.
That is, we allow for many outtinding requests. Figure 5-1 summarizes our file sharing
protocol.
33
(a) Setup: Clients share the torrent filesanonymously.
(c) Upload: Each client uploads an en-crypted file based on the requests usingRiffle.
(b) Request: Each client requests a fileby uploading the hash of the file usingRiffle.
(d) Download: Each client downloadsthe file he or she requested using PIR.
Figure 5-1: Anonymous File Sharing Protocol
5.1.1 Non-Bijective Requests and Uploads
In a given round, it is possible that the mapping from requests to uploaders is not bijective.
It could be that
1. Multiple C2's request blocks from the same Ck,
2. Cj has no matching blocks for a round, and
3. Multiple C,'s have the same block
There are two possible sub-cases for Case 1. When the Cj's request the same block,the protocol can be carried out normally from Ck's perspective. When Ck needs to service
multiple blocks, this becomes a problem. One possible solution is to have multiple rounds of
uploading. All participants can detect the problem by checking the request hashes H, and
the hashes of available uploaded blocks. If any are missing, then more rounds of uploads
happen until all blocks are available. The client who does not possess any of the missing
blocks will upload a dummy block (e.g., identically 0 block) for anonymity. Another possible
solution is simply ignoring the missing content, and have the clients request the block again.
The first solution wastes bandwidth for those uploading dummy blocks, while the second
one does not guarantee fairness.
Case 2 happens when (1) some clients did not request a block, or (2) Case 1 happened.
In either situation, the clients with jio matching blocks must upload a dummy block to
remain resistant to traffic analysis
34
Case 3 is only a problem when several Cj's upload the same block when they actually
could have uploaded different blocks, resulting in wasted bandwidth. To mitigate this issue,
the servers could keep track of the number of clients who possess a block (i.e., the rarity of
the block). When the servers broadcast the request hashes, they can annotate the download
requests with their rarities, and the clients will upload the rarest blocks first.
5.1.2 Bandwidth Overhead in File Sharing
To share a block among users in peer-to-peer file sharing such as BitTorrent 12], each client
only needs to request and download a block. The client also may need to upload a file if it
receives a request. If each request is of size h, each client consumes h +b of upload bandwidth
(assuming it has a block to upload to another client), and b of download bandwidth. In Riffle
with PIR, there are three sources of bandwidth overhead for a client: (1) downloading the
requests, (2) uploading MACs of authenticated encryption, and (3) the mask uploaded to the
primary server. Thus, the total bandwidth between a client and a server is h + b + n + 2mA
of upload, and b + 2hn of download. We note that even though both bandwidths grow with
the number of clients, n and 2nh are much smaller than b in file sharing scenarios for a
reasonable number of clients and block size.
5.2 Microblogging
Microblogging in Riffle is similar to previous work 153, 211: each client submits a small
message per round that will be published, and all plaintext messages are broadcast to all
clients every round. In microblogging scenarios, Riffle uses broadcast instead of PIR because
the users are interested in many messages posted by different users; users would have to
perform p PIRs to download p messages, and therefore the benefits of PIR decreases as p
increases.
35
Chapter 6
Prototype Evaluation
6.1 Implementation
We have implemented a Riffle prototype in Go [4] using Go's native crypto library alongwith the DeDis Advanced Crypto library [1]. We used ElGamal encryption [26] usingCurve25519 [11] and Neff's shuffle [37] with Chaum-Pederson proofs [17] for verifiable shuffleand decryption. For authenticated encryption, we use Go's Secretbox implementation [5, 6],which internally uses Salsa20 [12] for encryption and Poly1305 [13] for authentication. Forthe pseudo-random number generator used to update the masks and the secrets neededfor PIR, we used keyed AES [39]. Our prototype supports two different modes of opera-tion: (1) file sharing, and (2) microblogging. We implemented the applications described inChapter 5.
6.2 Evaluation
To evaluate our system, we used the Emulab [3] testbed. Emulab provided a stable en-vironment to test our prototype, while allowing us to easily vary the group topology andserver configurations. The servers were connected to each other via a shared LAN, and theclients were distributed evenly among all servers. The clients connected to their primaryserver using one shared 100 Mbps link with 20ms delay, and the servers were connected toeach other through a 1 Gbps link with 10ms delay. Due to resource and time constraints,we used 50 physical nodes to simulate all clients. Each server was equipped with an 8-coreIntel Xeon E5530 Nehalem processor1 , and the majority of the client nodes were using adual core Intel Pentium 4 processor. As shown in the next sections, we have found thatthe "outdated" processors did not impact our results much. Figure 6-1 shows the testbedtopology used for the majority of our experiments.
6.2.1 File sharing
We simulated users sharing large files by first creating a pool of files, each of which was300MB. From the pool, each client chose one file to request and one file to share, and eachfile was divided into 256KB blocks, similar to BitTorrent's typical block size [43]. We have
'The most powerful machines with Sandy Bridge Xeon with 10 Gigabit Ethernet were not readily avail-able.
37
El
em/ 1d
Figure 6-1: Testbed topology.
experimented with different block sizes, and found that the block size changed the effectivebandwidth by less than 5% in all experiments as we varied it from 128KB to 1MB.
Figure 6-2a shows the total time spent and the effective bandwidth (the size of the sharedfile over the total time spent) when sharing a 300MB file.2 We also plotted the "ideal" Riffle,where we computed the expected time to share a 300MB file based on our analytic bandwidthmodel (Chapter 5.1.2), assuming computation is free and network bandwidth is perfect. Wehave created a similar model for Dissent 1531 as well for comparison.
In this experiments, Riffle performed competitively up to 200 clients, supporting 100KB/sof effective bandwidth per client. Our prototype matches the analytical model fairly closely,showing that the amount of computation is minimal, and the primary limitation is the serverto server bandwidth. If the servers were connected through 10 Gbps connections, we expectthe effective bandwidth to improve by an order of magnitude. The discrepancy between theidealized model and the prototype for larger numbers of clients is due to two factors: First,the ideal model ignores cost of computation, which increases linearly with the number ofclients. Though symmetric decryption is inexpensive, the cost becomes non-negligible whenthe number of clients is large. Second, the effective bandwidth per client decreases since weare sharing a 100 Mbps link among a few hundred clients.
When comparing to the ideal model of Dissent 1531, we see an order of magnitude speedup. This is expected since the client-server bandwidth overhead in Dissent grows linearlywith the number of clients, and each client only has access to a small amount of bandwidth.
We also tested the impact of different server configurations on performance. Figure 6-2bshows the effective bandwidth of 200 clients sharing 300MB files as we varied the number ofservers. As observed in our analytical model (Chapter 5.1.2), the server to server bandwidth
requirement grows with the number of servers, so the average bandwidth of the clients dropsas we increase the number of servers.
Finally, we evaluated the full system performance of Riffle, including the setup phase of
~Note that the y-axis of the time graph is in log scale to properly display the time taken for Dissent.
38
- Riffle .. Ideal Riffle I-deal Dissent
--- -- -
.... .. . . EP
1 1 2 2 0 3 - 450- --
'90 100 150 200 250 300 35 400 450 -500Number of clients
(a) Varying numbers of clients with 3 servers.
VC
10'
le
l0'
120
100
s0so40
20
- Riffle -.- dea1 Riffle -- Ideal Dissent
--- - - - - - - - - - - - - - -
.-.-----.
4 7
Number of servers
(b) 200 clients with varying numbers of servers.
Figure 6-2: Average time taken to share a 300MB file and effective bandwidth with different
group configurations.
EI-
10000
:o00
6000
4000
2000
- Setup - Upload- Request - Download
Numuer or ciinis
(a) 10MB file.
Figure 6-3: Breakdown of the total time
numbers of clients with 3 servers.
Number of clients
(b) 300MB file.
taken to share two different size files for varying
an epoch. Figure 6-3a and Figure 6-3b present the breakdown of the time spent to share a
10MB file and a 300MB file for different numbers of clients, which takes 40 and 1200 rounds
respectively with 256KB blocks. Specifically, the graph shows the time spent in the setup
phase (verifiable shuffle of keys, and sharing secrets for PIR), and the three steps of file
sharing rounds (request, upload, and download). When sharing a 10MB file, the verifiable
shuffle took more than half of the total time, proving to be quite costly. However, as
demonstrated by Figure 6-3b, the verifiable shuffle is a one-time operation in the beginning
of an epoch, and the cost becomes less significant for longer epochs. Moreover, the verifiable
shuffle used by the Riffle prototype 1371 is not the most efficient shuffle, and we can reduce
the setup time if we implement faster verifiable shuffles 181. We also note that with more
powerful machines, the verifiable shuffle should be much faster, as computation dominates
the overhead in this case.
39
10
EFloll
10'
4001
300
200
100
4
700
600
500
S 400
EF 300
200
100
ImNDU
I Infk
I-34 7
6.2.2 Microblogging
We simulated a microblogging scenario by creating tens to hundreds of thousands of clientsall of whom submitted a small message (160 byte or 320 byte) per round, and broadcastingthe messages at the end of each round. Due to physical resource limitation, we createdhundreds to thousands of "super" clients, each of which submitted hundreds of messages tosimulate a large number of users. For our microblogging experiment, we fixed the numberof servers at 3 as we varied the number of clients.
Figure 6-4 shows the average latency of posting one message. For latency sensitivemicroblogging, we can support up to 10,000 users with less than one second latency with160 byte messages. If the messages can tolerate some delay, we can support more than100,000 users with less than 10 seconds of latency. This figure also demonstrates the abilityof Riffle to exchange latency for message size: if we double the message size, the latency alsodoubles. Because Riffle is bandwidth and computation efficient, the latency is determinedsolely by the total number of bits of the messages in a round. This makes it easy to makea conscientious trade-off between the latency, the message size, and the number of clients.
Riffle can support an order of magnitude more clients compared to Dissent [531 forlatency sensitive microblogging (10,000 in Riffle vs. 1,000 in Dissent). For a large numberof clients, Riffle outperforms previous works [53, 8] by orders of magnitude. For instance,it takes 2 minutes just to verifiably shuffle messages of 100,000 users [8], ignoring networkcommunication. In Riffle, it takes a fraction of a second to shuffle and verify, and takes lessthan 10 seconds in total for 100,000 users, including the time spent in the network. Finally,though it is hard to compare Riffle to Riposte [21] directly due to lack of a database inRiffle, we expect Riffle to perform better as the database of microblog posts grows larger:the bandwidth requirement of Riposte clients grows with the size of the database, while thebandwidth requirement of Riffle clients remains the same.
25
- 160 byte message - - 320 byte message
0 - . ................ ...................... --.-...............
0-20000 40000 60000 80000 100000 120000
Number of clients
Figure 6-4: Average latency to share a microblogging post for different message sizes with 3servers.
40
Chapter 7
Discussion, Future Work, andConclusion
7.1 Dicussion
In this section, we discuss a few limitations and some aspects of Riffle we did not consider
in this thesis, and how they could be addressed in future work.
7.1.1 Alternate usage model
We have assumed that the clients want to communicate with others in the same group.
However, the clients may also want to interact with (1) clients in other Riffle groups, or (2)
the general Internet. In the first setting, we can use ideas from Herbivore [30], and connect
the servers of different groups to each other. The clients can then communicate with any
client in any group through the network of servers. In the second setting, we could use
the Riffle servers as "exit" nodes: each client can submit his or her message for someone
outside through Riffle, and the servers interact with the Internet on the client's behalf. For
example, each client uploads a BitTorrent request for a file to the servers. The servers then
participate in a regular BitTorrent protocol in the open Internet, and the clients can use PIR
to securely download their files. Moreover, the servers could use a highly scalable anonymity
system with a large anonymity set (e.g., Tor [25]) to expand the anonymity set of the clients
beyond just one Riffle group, while providing strong anonymity within the group.
7.1.2 Malicious Servers
Verifiability and zero-knowledge property of the hybrid shuffle (Chapter 4.6) prevent any
malicious server from deanonymizing a client, and make it possible for an honest server
to reveal a malicious server. However, since the identity of the honest server is unknown,the clients cannot rely on the claims of the servers. Concretely, there is no mechanism
to prevent m - 1 malicious servers from claiming that the one honest server is malicious.
Though this scenario should be rare with independently administered servers, we hope to
provide a mechanism to prevent such an attack in the future.
41
7.1.3 Server to server bandwidth
As noted in our bandwidth analysis and evaluation, the server to server bandwidth require-ment grows linearly with the number of clients. This quickly became the bottleneck ofRiffle, and limited the effective throughput (Chapter 6). One potential solution is for eachprovider to manage a "super" server that consists of smaller servers, and each smaller serverhandles a fraction of the traffic. Essentially, we can increase the server to server bandwidthby replicating the connections.
Though we hope to lower the actual overhead in future work, we believe that some of thecost is fundamental to the anytrust model proposed by 153] and assumed by Riffle. Namely,if there is only one honest server and the identity of the honest server is unknown, it seemsnecessary for all data to pass through all servers.
7.1.4 Network churns and group changes
In a realistic network, different clients can have drastically different connection speeds, andclients can leave or join the group dynamically. The default Riffle protocol requires thateveryone in a group submit a message every round to maintain anonymity, which makesthe overall latency as bad as the worst individual latency. Worse, any changes in the grouprequire Riffle to perform the expensive verifiable shuffle to create a new permutation for thenew set of active clients. To handle the case of clients dropping out, we currently ask eachclient to submit some cover traffic to the first server. When a client disconnects, we use thecover traffic to carry on the rounds with the active clients, and perform the verifiable shufflein the background to create a new permutation. However, this scheme wastes bandwidth,and does not allow for dynamic growth of the group. Tolerating network churns and changesin the group more effectively is deferred to future work.
7.1.5 Intersection attacks
A powerful adversary monitoring clients and network over longer periods of time can corre-late the presence of some messages with the online status of the clients [34]. For instance,if messages related to a protest are only posted when a particular client is online, then theadversary can link the messages to the client. Though Riffle does not protect against theseclass of attacks, it could benefit from prior work on mitigating these attacks [54].
7.2 Conclusion
Riffle is an anonymous communication system that provides traffic analysis resistance andstrong anonymity while keeping the bandwidth and computation overhead to minimum. Weachieved this by developing a new hybrid shuffle technique which avoids expensive verifiableshuffles in the critical path for uploading messages, and using private information retrieval fordownloading messages. We have also demonstrated through a prototype the bandwidth andcomputation efficiency of Riffle via anonymous file sharing and microblogging applications,and that strong anonymity can indeed scale to a large number of users.
42
Bibliography
[1] Advanced crypto library for the go language. https://github.com/DeDiS/crypto.
[2] Bittorrent. https://bittorrent.com.
[3] Emulab network emulation testbed. http://www.emulab.net/.
[4] The go programming language. https://golang.org/.
[51 Secret-key authenticated encryption. http://nacl.cr.yp.to/secretbox.html.
[61 Secretbox - godoc. https://godoc.org/golang.org/x/crypto/nacl/secretbox.
[7] Tor metrics portal. https://metrics.torproject.org.
[81 Stephanie Bayer and Jens Groth. Efficient zero-knowledge argument for correctness of
a shuffle. In Proceedings of the 31st Annual International Conference on Theory and
Applications of Cryptographic Techniques, EUROCRYPT'12, pages 263-280, Berlin,
Heidelberg, 2012. Springer-Verlag.
[9] Mihir Bellare, Ran Canetti, and Hugo Krawczyk. Keying hash functions for message
authentication. pages 1-15. Springer-Verlag, 1996.
[101 Mihir Bellare and Chanathip Namprempre. Authenticated encryption: Relations
among notions and&#xOOaO;analysis of the generic composition paradigm. J. Cryptol.,
21(4):469-491, September 2008.
[11] Daniel J. Bernstein. Curve25519: new diffie-hellman speed records. In In Public Key
Cryptography (PKC), Springer- Verlag LNCS 3958, page 2006, 2006.
[121 Daniel J. Bernstein. New stream cipher designs. chapter The Salsa20 Family of Stream
Ciphers, pages 84-97. Springer-Verlag, Berlin, Heidelberg, 2008.
[13] DanielJ. Bernstein. The poly1305-aes message-authentication code. In Henri Gilbert
and Helena Handschuh, editors, Fast Software Encryption, volume 3557 of Lecture Notes
in Computer Science, pages 32-49. Springer Berlin Heidelberg, 2005.
[14] Justin Brickell and Vitaly Shmatikov. Efficient anonymity-preserving data collection.
In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Dis-
covery and Data Mining, KDD '06, pages 76-85, New York, NY, USA, 2006. ACM.
[15] Xiang Cai, Xincheng Zhang, Brijesh Joshi, and Rob Johnson. Touching from a distance:
Website fingerprinting attacks and defenses. In Proceedings of the 19th A CM conference
on Computer and Communications Security (CCS 2012), October 2012.
43
[16] D. Chaum. The dining cryptographers problem: Unconditional sender and recipientuntraceability. J. Cryptol., 1(1):65-75, March 1988.
[17] David Chaum and Torben P. Pedersen. Wallet databases with observers. In Proceedingsof the 12th Annual International Cryptology Conference on Advances in Cryptology,CRYPTO '92, pages 89-105, London, UK, UK, 1993. Springer-Verlag.
[18] David L. Chaum. Untraceable electronic mail, return addresses, and digitalpseudonyms. Commun. ACM, 24(2):84-90, February 1981.
[19] Benny Chor and Niv Gilboa. Computationally private information retrieval (extendedabstract). In Proceedings of the Twenty-ninth Annual ACM Symposium on Theory ofComputing, STOC '97, pages 304-313, New York, NY, USA, 1997. ACM.
[20] Benny Chor, Eyal Kushilevitz, Oded Goldreich, and Madhu Sudan. Private informationretrieval. J. ACM, 45(6):965-981, November 1998.
[211 H. Corrigan-Gibbs, D. Boneh, and D. Mazieres. Riposte: An Anonymous MessagingSystem Handling Millions of Users. ArXiv e-prints, March 2015.
[22] Henry Corrigan-Gibbs and Bryan Ford. Dissent: Accountable anonymous group mes-saging. In Proceedings of the 17th ACM Conference on Computer and CommunicationsSecurity, CCS '10, pages 340-350, New York, NY, USA, 2010. ACM.
[23] Henry Corrigan-Gibbs, David Isaac Wolinsky, and Bryan Ford. Proactively accountableanonymous messaging in verdict. In Presented as part of the 22nd USENIX SecuritySymposium (USENIX Security 13), pages 147-162, Washington, D.C., 2013. USENIX.
[24] George Danezis, Roger Dingledine, David Hopwood, and Nick Mathewson. Mixminion:Design of a type iii anonymous remailer protocol. In In Proceedings of the 2003 IEEESymposium on Security and Privacy, pages 2-15, 2003.
[25] Roger Dingledine, Nick Mathewson, and Paul Syverson. Tor: The Second-GenerationOnion Router. In Proceedings of the 13th USENIX Security Symposium, pages 303-320,August 2004.
[26] T. Elgamal. A public key cryptosystem and a signature scheme based on discretelogarithms. Information Theory, IEEE Transactions on, 31(4):469-472, Jul 1985.
[27] Michael J. Freedman and Robert Morris. Tarzan: A peer-to-peer anonymizing networklayer. In Proceedings of the 9th ACM Conference on Computer and CommunicationsSecurity, CCS '02, pages 193-206, New York, NY, USA, 2002. ACM.
[28] Jun Furukawa and Kazue Sako. An efficient scheme for proving a shuffle. In In Proc.of CRYPTO dAZ01, pages 368-387. Springer-Verlag, 2001.
[29] Niv Gilboa and Yuval Ishai. Distributed point functions and their applications. InPhongQ. Nguyen and Elisabeth Oswald, editors, Advances in Cryptology - EURO-CRYPT 2014, volume 8441 of Lecture Notes in Computer Science, pages 640-658.Springer Berlin Heidelberg, 2014.
[30] Sharad Goel, Mark Robson, Milo Polte, and Emin Gun Sirer. Herbivore: A Scalableand Efficient Protocol for Anonymous Communication. Technical Report 2003-1890,Cornell University, Ithaca, NY, February 2003.
44
RRONOMMMMIN MOP
[311 Shafi Goldwasser and Silvio Micali. Probabilistic encryption & how to play mental
poker keeping secret all partial information. In Proceedings of the Fourteenth Annual
A CM Symposium on Theory of Computing, STOC '82, pages 365-377, New York, NY,USA, 1982. ACM.
[32] Dominik Herrmann, Rolf Wendolsky, and Hannes Federrath. Website fingerprinting:Attacking popular privacy enhancing technologies with the multinomial naïve-bayes classifier. In Proceedings of the 2009 ACM Workshop on Cloud Computing Secu-rity, CCSW '09, pages 31-42, New York, NY, USA, 2009. ACM.
[331 Stevens Le Blond, David Choffnes, Wenxuan Zhou, Peter Druschel, Hitesh Ballani,and Paul Francis. Towards efficient traffic-analysis resistant anonymity networks. InProceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, SIGCOMM '13,pages 303-314, New York, NY, USA, 2013. ACM.
134] Nick Mathewson and Roger Dingledine. Practical traffic analysis: extending and resist-ing statistical disclosure. In 4th International Workshop on Privacy Enhancing Tech-nologies, May 2004.
[35] Prateek Mittal and Nikita Borisov. Shadowwalker: Peer-to-peer anonymous communi-cation using redundant structured topologies. In Proceedings of the 16th ACM Confer-
ence on Computer and Communications Security, CCS '09, pages 161-172, New York,NY, USA, 2009. ACM.
[36] Steven J. Murdoch and George Danezis. Low-cost traffic analysis of tor. In Proceed-
ings of the 2005 IEEE Symposium on Security and Privacy, SP '05, pages 183-195,Washington, DC, USA, 2005. IEEE Computer Society.
[37] C. Andrew Neff. A verifiable secret shuffle and its application to e-voting. In Proceedings
of the 8th ACM Conference on Computer and Communications Security, CCS '01, pages
116-125, New York, NY, USA, 2001. ACM.
[38] Lan Nguyen and Rei Safavi-naini. Breaking and mending resilient mix-nets. In Proc.
PETaAZ03, Springer-Verlag, LNCS 2760, pages 66-80. Springer-Verlag, LNCS, 2003.
[39] National Institute of Standards and Technology. Specification for the advanced en-
cryption standard (aes). In Federal Information Processing Standards Publication 197.2001.
[40] Andriy Panchenko, Lukas Niessen, Andreas Zinnen, and Thomas Engel. Website Fin-gerprinting in Onion Routing Based Anonymization Networks. In Proceedings of the
ACM Workshop on Privacy in the Electronic Society (WPES), pages 103-114, October
2011.
[41] Birgit Pfitzmann. Breaking an efficient anonymous channel. In In EUROCRYPT, pages
332-340. Springer-Verlag, 1995.
[42] Birgit Pfitzmann and Andreas Pfitzmann. How to break the direct rsa-implementation
of mixes. In Advances in Cryptology-EUROCRYPT '89 Proceedings, pages 373-381.
Springer-Verlag, 1990.
45
[43] Johan Pouwelse, Pawe Garbacki, Dick Epema, and Henk Sips. The bittorrent p2pfile-sharing system: Measurements and analysis. In Miguel Castro and Robbert vanRenesse, editors, Peer-to-Peer Systems IV, volume 3640 of Lecture Notes in ComputerScience, pages 205-216. Springer Berlin Heidelberg, 2005.
[44] Jean-Frangois Raymond. Traffic Analysis: Protocols, Attacks, Design Issues, and OpenProblems. In H. Federrath, editor, Proceedings of Designing Privacy Enhancing Tech-nologies: Workshop on Design Issues in Anonymity and Unobservability, pages 10-29.Springer-Verlag, LNCS 2009, July 2000.
[45] Michael K. Reiter and Aviel D. Rubin. Anonymous web transactions with crowds.Commun. ACM, 42(2):32-48, February 1999.
[46] Marc Rennhard and Bernhard Plattner. Introducing morphmix: Peer-to-peer basedanonymous internet usage with collusion detection. In Proceedings of the 2002 ACMWorkshop on Privacy in the Electronic Society, WPES '02, pages 91-102, New York,NY, USA, 2002. ACM.
[47] Len Sassaman, Bram Cohen, and Nick Mathewson. The pynchon gate: A secure methodof pseudonymous mail retrieval. In Proceedings of the 2005 ACM Workshop on Privacyin the Electronic Society, WPES '05, pages 1-9, New York, NY, USA, 2005. ACM.
[481 Emin Gun Sirer, Sharad Goel, and Mark Robson. Eluding carnivores: File sharing withstrong anonymity. In In Proc. of ACM SIGOPS European Workshop, 2004.
[49] Al Teich, Mark S. Frankel, Rob Kling, and Yaching Lee. Anonymous communicationpolicies for the internet: Results and recommendations of the aaas conference. Infor-mation Society, 15(2), 1999.
[50] Michael Waidner and Birgit Pfitzmann. The dining cryptographers in the disco: Uncon-ditional sender and recipient untraceability with computationally secure serviceability.In Proceedings of the Workshop on the Theory and Application of Cryptographic Tech-niques on Advances in Cryptology, EUROCRYPT '89, pages 690-, New York, NY, USA,1990. Springer-Verlag New York, Inc.
[51] Tao Wang and Ian Goldberg. Improved website fingerprinting on tor. In Proceedingsof the Workshop on Privacy in the Electronic Society (WPES 2013). ACM, November2013.
[52] Douglas Wikstrom. Four practical attacks for "optimistic mixing for exit-polls", 2003.
[53] David Isaac Wolinsky, Henry Corrigan-Gibbs, Bryan Ford, and Aaron Johnson. Dissentin numbers: Making strong anonymity scale. In Presented as part of the 10th USENIXSymposium on Operating Systems Design and Implementation (OSDI 12), pages 179-182, Hollywood, CA, 2012. USENIX.
[54] David Isaac Wolinsky, Ewa Syta, and Bryan Ford. Hang with your buddies to resistintersection attacks. In Proceedings of the 2013 ACM SIGSAC conference on Computerand Communications Security, CCS '13, pages 1153-1166, New York, NY, USA, 2013.ACM.
46