A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection

Qinbin Li, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Bingsheng He

Abstract—Federated learning has been a hot research area in enabling the collaborative training of machine learning models among different organizations under privacy restrictions. As researchers try to support more machine learning models with different privacy-preserving approaches, there is a need for systems and infrastructures that ease the development of various federated learning algorithms. Just as deep learning systems such as Caffe, PyTorch, and TensorFlow boost the development of deep learning algorithms, federated learning systems are equally important, and face challenges from various issues such as impractical system assumptions, scalability, and efficiency. Inspired by federated systems in other fields such as databases and cloud computing, we investigate the characteristics of existing federated learning systems. We find that two important features of federated systems in other fields, i.e., heterogeneity and autonomy, are rarely considered in existing federated learning systems. Moreover, we provide a thorough categorization of federated learning systems according to six different aspects, including data partition, machine learning model, privacy mechanism, communication architecture, scale of federation, and motivation of federation. The categorization can help the design of federated learning systems, as shown in our case studies. Lastly, we make a systematic comparison among the existing federated learning systems and present future research opportunities and directions.

Index Terms—Federated Learning, Survey, Machine Learning, Privacy


1 INTRODUCTION

Many machine learning algorithms are data hungry, and in reality, data are dispersed over different organizations under the protection of privacy restrictions. Due to these factors, federated learning (FL) [1] has become a hot research topic in machine learning and data mining. For example, data of different hospitals are isolated and become “data islands”. Since the size or the characteristics of data in each data island are limited, a single hospital may not be able to train a high-quality model with good predictive accuracy for a specific task. Ideally, hospitals could benefit more if they collaboratively trained a machine learning model on the union of their data. However, the data cannot simply be shared among the hospitals due to various policies and regulations. Such “data island” phenomena are commonly seen in many areas such as finance, government, and supply chains. Policies such as the General Data Protection Regulation (GDPR) [2] stipulate rules on data sharing among different organizations. Thus, it is challenging to develop a federated learning system with good predictive accuracy while protecting data privacy.

• Last updated on December 4, 2019. We try to keep this survey as an updated and complete handbook for federated learning systems. We will periodically update this survey. Please check out our latest version at this URL: https://arxiv.org/abs/1907.09693. Also, if you have any reference that you want to add into this survey, kindly drop Dr. Bingsheng He an email ([email protected]).

• Q. Li, Z. Wu, S. Hu, N. Wang, and B. He are with the National University of Singapore, Singapore (e-mail: {qinbin, zhaomin, sixuhu}@comp.nus.edu.sg; [email protected]; [email protected]).

• Z. Wen is with The University of Western Australia, Crawley, WA 6009, Australia (e-mail: [email protected]).

Many efforts have been devoted to implementing federated learning algorithms to support effective machine learning models under the context of federated learning. Specifically, researchers try to support more machine learning models with different privacy-preserving approaches, including deep neural networks [3]–[7], gradient boosting decision trees (GBDTs) [8], [9], logistic regression [10], [11], and support vector machines (SVMs) [12]. For instance, Nikolaenko et al. [10] and Chen et al. [11] proposed approaches to conduct FL based on linear regression. Hardy et al. [13] implemented an FL framework to train a logistic regression model. Since GBDTs have become very successful in recent years [14], [15], corresponding FLSs have also been proposed by Zhao et al. [8] and Cheng et al. [9]. Moreover, there are also many neural network based FLSs. Google has proposed a scalable production system which enables tens of millions of devices to train a deep neural network [5]. Yurochkin et al. [4] developed a probabilistic FL framework for neural networks by applying Bayesian nonparametric machinery. Several methods try to combine FL with machine learning techniques such as multi-task learning and transfer learning. Smith et al. [12] combined FL with multi-task learning to allow multiple parties to complete separate tasks. To address the scenario where the label information only exists in one party, Yang et al. [16] adopted transfer learning to collaboratively learn a model.

Among the studies on customizing machine learning algorithms under the federated context, we have identified a few commonly used methods to provide privacy guarantees. One common method is to use cryptographic techniques [17] such as secure multi-party computation [18] and homomorphic encryption. The other popular method is differential privacy [8], which adds noise to the model parameters to protect individual records. For example, Google's federated learning system [17] adopts both secure aggregation and differential privacy to enhance the privacy guarantee.

As there are common methods and building blocks for building federated learning algorithms, it is possible to develop systems and infrastructures to ease the development of various federated learning algorithms. Systems and infrastructures allow algorithm developers to reuse the common building blocks and avoid building algorithms from scratch every time. Just like deep learning systems such as Caffe [19], PyTorch [20], and TensorFlow [21] that boost the development of deep learning algorithms, federated learning systems (FLSs) are equally important for the success of federated learning. However, building federated learning systems faces challenges from various issues such as impractical system assumptions, scalability, and efficiency.

In this paper, we survey the existing FLSs with a focus on drawing the analogy and differences to traditional federated systems in other fields such as databases [22] and cloud computing [23]. First, we consider heterogeneity and autonomy as two important characteristics of FLSs, which are often ignored in the existing designs in federated learning. Second, we categorize FLSs based on six different aspects: data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation, and motivation of federation. These aspects can direct the design of a federated learning system. Furthermore, based on these aspects, we compare the existing FLSs and identify their key limitations. Last, to make FL more practical and powerful, we present future research directions to work on. We believe that systems and infrastructures are essential for the success of federated machine learning. More work has to be carried out to address the system research issues in security and privacy, efficiency, and scalability.

There have been several surveys and white papers on federated learning. A seminal survey written by Yang et al. [16] introduced the basics and concepts in federated learning, and further proposed a comprehensive secure federated learning framework. Later, WeBank [24] published a white paper introducing the background and related work in federated learning and, most importantly, presenting a development roadmap including establishing local and global standards, building use cases, and forming an industrial data alliance. The survey and the white paper mainly target a relatively small number of parties which are typically enterprise data owners. Lim et al. [25] conducted a survey of federated learning specific to mobile edge computing. Li et al. [26] summarized challenges and future directions of federated learning in massive networks of mobile and edge devices.

In comparison with the previous surveys, the main contributions of this paper are as follows. (1) By analogy with previous federated systems, we analyze two dimensions, heterogeneity and autonomy, of federated learning systems. These dimensions can play an important role in the design of FLSs. (2) We provide a comprehensive taxonomy of federated learning systems along six different aspects, including data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation, and motivation of federation, which can be used to direct the design of FLSs. (3) We summarize the characteristics of existing FLSs, and present the limitations of existing FLSs and a vision for future generations of FLSs.

2 FEDERATED SYSTEMS

In this section, we review key conventional federated systems and then present federated learning systems.

2.1 Conventional Federated Systems

The concept of federation has counterparts in the real world such as business and sports. The main characteristic of federation is cooperation. Federation not only commonly appears in society, but also plays an important role in computing. In computer science, federated computing systems have been an attractive area of research.

Around 1990, there were many studies on federated database systems (FDBSs) [22]. An FDBS is a collection of autonomous databases cooperating for mutual benefit. As pointed out in a previous study [22], three important components of an FDBS are autonomy, heterogeneity, and distribution. First, a database system (DBS) that participates in an FDBS is autonomous, which means it is under separate and independent control. The parties can still manage the data without the FDBS. Second, differences in hardware, system software, and communication systems are allowed in an FDBS. A powerful FDBS can run in heterogeneous hardware or software environments. Last, due to the existence of multiple DBSs before an FDBS is built, the data distribution may differ in different DBSs. An FDBS can benefit from the data distribution if designed properly. Generally, FDBSs focus on the management of distributed data.

More recently, with the development of cloud computing, many studies have been done for federated cloud computing [23]. A federated cloud (FC) is the deployment and management of multiple external and internal cloud computing services. The concept of cloud federation enables further reduction of costs due to partial outsourcing to more cost-efficient regions. Resource migration and resource redundancy are two basic features of federated clouds [23]. First, resources may be transferred from one cloud provider to another. Migration enables the relocation of resources. Second, redundancy allows concurrent usage of similar service features in different domains. For example, the data can be broken down and processed at different providers following the same computation logic. Overall, the scheduling of different resources is a key factor in the design of a federated cloud system.

2.2 Federated Learning Systems

While machine learning, especially deep learning, has attracted much attention again recently, the combination of federation and machine learning is emerging as a new and hot research topic. When it comes to federated learning, the goal is to conduct collaborative machine learning among different organizations under restrictions on user privacy. Here we give a formal definition of federated learning systems.

We assume that there are N different parties, and each party is denoted by Ti, where i ∈ [1, N]. We use Di to denote the data of Ti. For the non-federated setting, each party Ti uses only its local data Di to train a machine learning model Mi. The performance of Mi is denoted as Pi. For the federated setting, all the parties jointly train a model Mf while each party Ti protects its data Di according to its specific privacy restrictions. The performance of Mf is denoted as Pf. Then, for a valid federated learning system, there exists i ∈ [1, N] such that Pf > Pi.
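Stated compactly in the notation just introduced, the condition reads as follows (the symbols for the local and federated training procedures are added here only for illustration and are not part of the original definition):

```latex
M_i = \mathcal{A}(D_i), \qquad
M_f = \mathcal{A}_{\mathrm{fed}}(D_1, \ldots, D_N), \qquad
\exists\, i \in [1, N] \ \text{such that} \ P_f > P_i .
```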

Note that, in the above definition, we only require that there exists some party that can achieve a higher model utility from the FLS. Even though some parties may not get a better model from an FLS, they may make an agreement with the other parties to ask for other kinds of benefits (e.g., money).


Fig. 1. Federated database (data management), federated cloud (resource scheduling), and federated learning (model training)

2.3 Analogy among Federated Systems

Figure 1 shows the frameworks of federated database systems, federated clouds, and federated learning systems. There are some similarities and differences between federated learning systems and conventional federated systems. On the one hand, the concept of federation still applies. The common and basic idea is the cooperation of multiple independent parties. Therefore, the perspective of considering heterogeneity and autonomy among the parties can still be applied in FLSs. Furthermore, some factors in the design of distributed systems are still important for FLSs. For example, how the data are shared between the parties can influence the efficiency of the systems. On the other hand, these federated systems have differences. While FDBSs focus on the management of distributed data and FCs focus on the scheduling of resources, FLSs care more about secure computation among multiple parties. FLSs introduce new challenges such as the algorithm design of distributed training and data protection under privacy restrictions. With these findings, we analyze the existing FLSs and figure out the potential future directions of FLSs in the following sections.

3 SYSTEM CHARACTERISTICS

While existing FLSs pay much attention to user privacy and machine learning models, two important characteristics of previous federated systems (i.e., heterogeneity and autonomy) are rarely addressed.

3.1 Heterogeneity

We consider heterogeneities between different parties in three aspects: data, privacy requirements, and tasks.

3.1.1 Differences in data

The parties always have different data distributions. For example, due to the ozone hole, the countries in the Southern Hemisphere may have more skin cancer patients than those in the Northern Hemisphere. Thus, hospitals in different countries tend to have very different distributions of patient records. The difference in data distributions may be a very important factor in the design of FLSs. The parties can potentially gain a lot from FL if they have various and partially representative distributions towards a specific task. Furthermore, if party Alice has fully representative data for task A and party Bob has fully representative data for task B, Alice and Bob can make a deal to conduct FL for both tasks A and B to improve the performance on task B and task A, respectively.

Besides data distributions, the size of data may also differ across parties. FL should enable collaboration among parties with different data scales. Furthermore, for fairness, the parties that provide more data should benefit more from FL.

3.1.2 Differences in privacy restrictions

Different parties are often subject to different policies and regulations that restrict data sharing. For example, the companies in the EU have to comply with the General Data Protection Regulation (GDPR) [2], while China recently issued a new regulation, namely the Personal Information Security Specification (PISS). Furthermore, even in the same region, different parties still have different detailed privacy rules. The privacy restrictions play an important role in the design of FLSs. Generally, the parties are able to gain more from FL if the privacy restrictions are looser. Many studies assume that the parties have the same privacy level [9], [11]. The scenario where the parties have different privacy restrictions is more complicated and meaningful. It is very challenging to design an FLS which can maximize the utilization of each party's data while not violating their respective privacy restrictions.

3.1.3 Differences in tasks

The tasks of different parties may also vary. A bank may want to know whether a person can repay a loan, while an insurance company may want to know whether the person will buy its products. The bank and the insurance company can still adopt FL although they want to perform different tasks. Multiple machine learning models may be learned during the FL process. Techniques like multi-task learning can also be adopted in FL [12].

3.2 Autonomy

The parties are often autonomous and under independent control. These parties are willing to share the information with the others only if they retain control. It is important to address the autonomy property when designing an FLS.

3.2.1 Association autonomy

A party can decide whether to associate or disassociate itself from FL and can participate in one or more FLSs. Ideally, an FLS should be robust enough to tolerate the entry or departure of any party. Thus, the FLS should not fully depend on any single party. Since this goal is hard to achieve, in practice, the parties can make an agreement to regularize the entry or departure to ensure that the FLS works properly.

3.2.2 Communication autonomy

A party should have the ability to decide how much information to communicate with others. The party can also choose the size of data with which to participate in the FL. An FLS should have the ability to handle different communication scales during the learning process. As we have mentioned in Section 3.1.1, the benefit of a party should be relevant to its contribution. A party may gain more if it is willing to share more information, while the risk of exposing user privacy may also be higher.


4 TAXONOMY

Through analysis of many application scenarios in building FLSs, we can classify FLSs by six aspects: data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation, and motivation of federation. These aspects include common factors (e.g., data distribution, communication architecture) in previous federated systems and unique considerations (e.g., machine learning model and privacy mechanism) for FLSs. Furthermore, these aspects can be used to direct the design of FLSs. Figure 2 shows the summary of the taxonomy of FLSs.

Let us explain the six aspects with an intuitive example. Hospitals in different regions want to conduct federated learning to improve the performance of a lung cancer prediction task. Then, the six aspects have to be considered to design such a federated learning system. First, we should consider how the patient records are distributed among hospitals. While the hospitals may have different patients, they may also have different knowledge about a common patient. Thus, we have to utilize both the non-overlapping instances and features in federated learning. Second, we should figure out which machine learning model should be adopted for such a task. For example, we may adopt gradient boosting decision trees, which show good performance on many classification problems. Third, we have to decide what techniques to use for privacy protection. Since the patient records cannot be exposed to the public, differential privacy is an option to achieve the privacy guarantee. Fourth, the communication architecture also matters. We may need a centralized server to control the updates of the model. Moreover, the number of hospitals and the computation power in each hospital should also be studied. Unlike federated learning on mobile devices, the federation in this scenario is relatively small in scale and stable. Last, we should consider the incentive for each party. A clear and straightforward motivation for the hospitals is to increase the accuracy of lung cancer prediction. Then, it is important to achieve a highly accurate machine learning model by federated learning. Next, we discuss these aspects in detail.

4.1 Data Distribution

Based on how data are distributed over the sample and feature spaces, FLSs can typically be categorized into horizontal, vertical, and hybrid FLSs [16].

In horizontal FL, the datasets of different organizations have the same feature space but little intersection on the sample space. The system uses a server to aggregate the information from the devices and adopts differential privacy [27] and secure aggregation to enhance the privacy guarantee. Wake-word recognition [28], such as ‘Hey Siri’ and ‘OK Google’, is a typical application of horizontal partition because each user speaks the same sentence with a different voice.

In vertical FL, the datasets of different organizations have the same sample space but differ in the feature space. Vaidya et al. have proposed multiple secure models on vertically partitioned data, including association rule mining [29], k-means [30], naive Bayes classifiers [31], and decision trees [32]. A vertical FLS usually adopts entity alignment techniques to collect the overlapping samples of the organizations. Then the overlapping data are used to train the machine learning model using encryption methods. Cheng et al. [9] proposed a lossless vertical FLS to enable parties to collaboratively train gradient boosting decision trees. They use privacy-preserving entity alignment to find common users among two parties, whose gradients are used to jointly train the decision trees. Cooperation among government agencies can be treated as a situation of vertical partition. Suppose the department of taxation requires the housing data of residents, which is stored in the department of housing, to formulate tax policies. Meanwhile, the department of housing also needs the tax information of residents, which is kept by the department of taxation, to adapt its housing policies. These two departments share the same sample space (i.e., all the residents) but each of them only has one part of the features (e.g., housing data and tax data).
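To make the distinction concrete, the toy sketch below (purely illustrative; the arrays and party splits are our own assumptions, not taken from any cited system) shows that a horizontal partition splits a dataset by samples while a vertical partition splits it by features after the common sample IDs have been aligned.

```python
import numpy as np

# Toy dataset: rows are (aligned) samples, columns are features.
X = np.arange(20).reshape(5, 4)

# Horizontal partition: both parties hold the same feature space
# but disjoint sets of samples (rows).
party_a_horizontal = X[:3, :]
party_b_horizontal = X[3:, :]

# Vertical partition: both parties hold the same samples (rows)
# but disjoint subsets of features (columns).
party_a_vertical = X[:, :2]
party_b_vertical = X[:, 2:]
```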

In many other applications, while existing FLSs mostly focus on one kind of partition, the partition of data among the parties may be a hybrid of horizontal partition and vertical partition. Let us take a cancer diagnosis system as an example. A group of hospitals wants to build a federated system for cancer diagnosis, but each hospital has different patients as well as different kinds of medical examination results. Transfer learning [33] is a possible solution for such scenarios. Liu et al. [3] proposed a secure federated transfer learning system which can learn a representation among the features of parties using common instances.

4.2 Machine Learning Models

While there are many kinds of machine learning models, here we consider three different kinds of models that current FLSs mainly support: linear models, decision trees, and neural networks.

Linear models include some basic statistical methods such as linear regression and logistic regression [11]. There are many well-developed systems for linear regression and logistic regression [10], [13]. These linear models are relatively easy to learn compared with other, more complex models (e.g., neural networks).

A tree-based FLS is designed for training a single decision tree or multiple decision trees (e.g., gradient boosting decision trees and random forests). GBDTs have become especially popular recently and achieve very good performance in many classification and regression tasks. Zhao et al. [8] and Cheng et al. [9] proposed FLSs for GBDTs on horizontally and vertically partitioned data, respectively.

A neural network based system aims to train neural networks, which are an extremely hot topic in the current machine learning area. There are many studies on federated stochastic gradient descent [5], [7], which can be used to learn the parameters of neural networks.

Generally, for different machine learning models, the designs of the FLSs usually differ. It is still challenging to propose a practical tree-based or neural network based FLS. Moreover, due to the fast development of machine learning, there is a gap for FLSs to well support the state-of-the-art models.

4.3 Privacy Mechanisms

Privacy has been shown to be an important issue in machine learning, and there have been many attacks against machine learning models [34]–[39]. Also, there are many privacy mechanisms nowadays, such as differential privacy [40] and k-anonymity [41], which provide different privacy guarantees. The characteristics of existing privacy mechanisms are summarized in the survey [42]. Here we introduce three popular approaches that are adopted in current FLSs for data protection: model aggregation, cryptographic methods, and differential privacy.

Model aggregation (or model averaging) [7], [43] is a widely used framework to avoid the communication of raw data in federated learning. Specifically, a global model is trained by aggregating the model parameters from local parties.


Fig. 2. Taxonomy of federated learning systems (FLSs)

A typical algorithm is federated averaging [7] based on stochastic gradient descent (SGD), which aggregates the locally-computed models and then updates the global model in each round. PATE [44] combines multiple black-box local models to learn a global model, which predicts an output chosen by noisy voting among all of the local models. Yurochkin et al. [4] developed a probabilistic FL framework by applying Bayesian nonparametric machinery. They use a Beta-Bernoulli process informed matching procedure to generate a global model by matching the neurons in the local models. Smith et al. [12] combined FL with multi-task learning to allow multiple parties to locally learn models for different tasks. A challenge for model aggregation methods is to ensure that the global model achieves better utility than the local models.
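As an illustration of the aggregation step, the sketch below follows the spirit of federated averaging [7]; the function name local_sgd, the uniform client sampling, and the sample-size weighting are our own simplifying assumptions rather than the exact algorithm of any particular system.

```python
import numpy as np

def federated_averaging(global_weights, parties, local_sgd, rounds=10, fraction=0.5):
    """FedAvg-style sketch: each round, a subset of parties trains locally from
    the current global model, and the server replaces the global model with the
    sample-size-weighted average of the returned local models."""
    for _ in range(rounds):
        m = max(1, int(fraction * len(parties)))
        selected = np.random.choice(len(parties), size=m, replace=False)
        local_models, sizes = [], []
        for k in selected:
            weights_k, n_k = local_sgd(parties[k], global_weights)  # local training
            local_models.append(weights_k)
            sizes.append(n_k)
        total = float(sum(sizes))
        global_weights = sum((n / total) * w for w, n in zip(local_models, sizes))
    return global_weights
```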

Cryptographic methods such as homomorphic encryption [13], [45]–[53] and secure multi-party computation [17], [54]–[64] are widely used in privacy-preserving machine learning algorithms. Basically, the parties have to encrypt their messages before sending them, operate on the encrypted messages, and decrypt the encrypted output to get the result. Applying the above methods, the user privacy of federated learning systems can usually be well protected [65]–[69]. For example, secure multi-party computation [70] guarantees that the parties cannot learn anything except the output. However, such systems are usually not efficient and have a large computation and communication overhead.
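The core idea behind secure aggregation can be illustrated with pairwise additive masks that cancel in the sum, so a server sees only masked updates yet recovers the exact aggregate; this is only a sketch of the idea and omits the key agreement, dropout handling, and other machinery of the full protocol in [17].

```python
import numpy as np

def pairwise_masks(num_parties, dim, seed=0):
    """Build masks that sum to zero: party i adds r_ij for each j > i and
    subtracts r_ji for each j < i, so all pairwise terms cancel in the sum."""
    rng = np.random.default_rng(seed)
    r = {(i, j): rng.normal(size=dim)
         for i in range(num_parties) for j in range(i + 1, num_parties)}
    masks = []
    for i in range(num_parties):
        m = np.zeros(dim)
        for j in range(num_parties):
            if i < j:
                m += r[(i, j)]
            elif j < i:
                m -= r[(j, i)]
        masks.append(m)
    return masks

# Each party uploads only its masked update; the server's sum equals the true total.
updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
masked = [u + m for u, m in zip(updates, pairwise_masks(len(updates), dim=2))]
assert np.allclose(sum(masked), sum(updates))
```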

Differential privacy [40], [71] guarantees that a single record does not have much influence on the output of a function. Many systems adopt differential privacy [8], [72]–[78] for data privacy protection, where the parties cannot know whether an individual record participates in the learning or not. By adding random noise to the data or the model parameters [74], [77], [79], differential privacy provides statistical privacy for individual records and protection against inference attacks on the model. Due to the noise injected in the learning process, such systems tend to produce less accurate models.
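A minimal sketch of this kind of perturbation is shown below: clip each update to bound its sensitivity and then add Gaussian noise. The clipping norm and noise multiplier are illustrative assumptions; turning them into a formal (epsilon, delta) guarantee requires careful calibration and accounting, e.g., with the moments accountant [74].

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip an update to bound its L2 sensitivity, then add Gaussian noise,
    the basic Gaussian-mechanism step used in differentially private FL."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise
```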

Note that the above methods are independent of each other, and a federated learning system can adopt multiple methods to enhance the privacy guarantees [80].

While most of the existing FLSs adopt cryptographic techniques or differential privacy to achieve a good privacy guarantee, the limitations of these approaches seem hard to overcome currently. While trying to minimize the side effects brought by these methods, it may also be a good choice to look for novel approaches to protect data privacy and support flexible privacy requirements. For example, Liu et al. [3] adopt a weaker security model [81], which can make the system more practical.

Related to the privacy level, the threat models also vary in FLSs. A common assumption is that all parties are honest-but-curious [11], [16], [82], meaning that they follow the protocol but try to find out as much as possible about the data of the other parties. In such a scenario, inference attacks may be conducted to extract user information. For example, the membership inference attack [37]–[39] is an interesting kind of attack, where the attacker can infer whether a given record was used as part of the training dataset given access to the machine learning model. Also, there may be malicious parties [83] in federated learning. One threat model is that the parties may conduct poisoning and adversarial attacks [84]–[86] to backdoor federated learning. Another threat model is that the parties may suffer Byzantine faults [87]–[90], where the parties behave arbitrarily badly against the system.

4.4 Communication Architecture

There are two major ways of communication in FLSs: the centralized design and the distributed design. In the centralized design, the data flow is often asymmetric, which means one server or a specific party is required to aggregate the information (e.g., gradients) from the other parties and send back the training results [5]. The parameter updates on the global model are always done on this server. The communication between the server and the local parties can be synchronous [7] or asynchronous [91], [92]. In a distributed design, the communications are performed among the parties [8] and every party is able to update the global parameters directly. Google Keyboard [93] is a case of the centralized architecture. Google collects data from users' Android phones, collaboratively trains models using the collected data, and returns the prediction results to users. How to reduce the communication cost is a vital problem in this kind of architecture. Some algorithms like deep gradient compression [94] have been proposed to solve this problem.


While the centralized design is widely used in existing FLSs, the distributed design is preferred in some aspects, since concentrating information on one server may bring potential risks or unfairness. Recently, blockchain [95] has become a popular distributed platform for consideration. It is still challenging to design a distributed system for FL in which each party is treated nearly equally in terms of communication during the learning process and no trusted server is needed. The distributed cancer diagnosis system among hospitals is an example of the distributed architecture. Each hospital shares the model trained with data from its patients and gets the global model for diagnosis. In the distributed design, the major challenge is that it is hard to design a protocol that treats every member fairly.

4.5 Scale of Federation

FLSs can be categorized into two typical types by the scale of federation: private FLSs and public FLSs. The differences between them lie in the number of parties and the amount of data stored in each party.

In a private FLS, there are usually a relatively small number of parties and each of them has a relatively large amount of data as well as computational power. For example, Amazon wants to recommend items for users by training on the shopping data collected from hundreds of data centers around the world. Each data center possesses a huge amount of data as well as sufficient computational resources. One challenge that a private FLS faces is how to efficiently distribute computation to data centers under the constraint of privacy models [96].

In a public FLS, on the contrary, there are a relatively large number of parties and each party has a relatively small amount of data as well as computational power [97]. Google Keyboard [93] is a good example of a public FLS. Google tries to improve the query suggestions of Google Keyboard with the help of federated learning. There are millions of Android devices and each device only has its user's data. Meanwhile, due to energy consumption concerns, the devices cannot be asked to conduct complex training tasks. In this situation, the system should be powerful enough to manage a huge number of parties and deal with possible issues such as the unstable connection between a device and the server.

4.6 Motivation of Federation

In real-world applications of federated learning, individual parties need the motivation to get involved in the federated system. The motivation can be regulations or incentives. Federated learning inside a company or an organization is usually motivated by regulations. In some kinds of cooperation, however, parties cannot be forced by regulations to provide their data. Taking Google Keyboard [93] as an example, Google cannot prevent users who do not provide data from using their app, but those who agree to upload input data may enjoy a higher accuracy of word prediction. This kind of incentive can encourage every user to provide their data to improve the performance of the overall model. However, how to design such a reasonable protocol remains a challenge.

Incentive mechanism design can be very important for the success of a federated learning system. There have been some successful cases of incentive design in blockchain [98], [99]. The parties inside the system can be collaborators and competitors at the same time. Other incentive designs like [100], [101] are proposed to attract participants with high-quality data for federated learning. We expect that different game theory models [102]–[104] and their equilibrium designs should be revisited under federated learning systems. Even in the case of Google Keyboard, the users need to be motivated to participate in this collaborative learning process.

5 SUMMARY OF EXISTING STUDIES

In this section, we compare the existing studies on FLSs according to the aspects considered in Section 4.

5.1 Methodology

To discover the existing studies, we search the keyword “federated learning” in Google Scholar1 and arXiv2. Here we only consider the published studies in the computer science community.

Since the scale of federation and the motivation of federation are problem dependent, we do not compare the studies by these two aspects. For ease of presentation, we use “NN”, “DT”, and “LM” to denote neural networks, decision trees, and linear models, respectively. Also, we use “CM”, “DP”, and “MA” to denote cryptographic methods, differential privacy, and model aggregation, respectively. Note that the algorithms (e.g., federated stochastic gradient descent) in some studies can be used to learn many machine learning models (e.g., logistic regression, neural networks). Thus, in the “model implementation” column, we present the models that are already implemented in the corresponding papers. Moreover, in the “main focus” column, we indicate the major area that the papers study.

5.2 Individual Studies

Table 1 shows a summary of the comparison among existing published studies on federated learning, which mainly focus on individual algorithms.

From Table 1, we have the following findings.

First, most of the existing studies consider a horizontal data partitioning. We suspect that part of the reason is that experimental studies and benchmarking for horizontal data partitioning are more mature than for vertical data partitioning. Also, in vertical data partitioning, how to align data sets with different features is problem dependent and thus can be very challenging. The scenarios with vertical data partitioning need to be further investigated.

Second, most approaches of the existing studies can only be applied to one kind of machine learning model. While a specially designed algorithm for one specific model may achieve higher model utility, a general federated learning framework may be more practical and easy to use.

Third, due to the properties of stochastic gradient descent, the model aggregation method can be easily applied to SGD and is currently the most popular approach to implement federated learning without directly exposing the user data. We look forward to more novel FL frameworks.

Lastly, the centralized design is the mainstream of current implementations. A trusted server is needed in their assumptions. It is more challenging to propose a framework without the need for a centralized server.

In the following, we review these studies categorized by their main focus: algorithm design, efficiency improvement, application, and benchmark.

1. https://scholar.google.com/
2. https://arxiv.org/


TABLE 1
Comparison among existing published studies, listing each study's main focus, data partition, model implementation, privacy mechanism, and communication architecture. LM denotes Linear Models. DT denotes Decision Trees. NN denotes Neural Networks. CM denotes Cryptographic Methods. DP denotes Differential Privacy. MA denotes Model Aggregation. The compared studies, grouped by main focus, are: federated algorithm design (Linear Regression FL [105], Logistic Regression FL [13], Federated Transfer Learning [3], SecureBoost [9], FedXGB [106], Tree-based FL [8], Federated GBDTs [107], Ridge Regression FL [10], PPRR [11], BlockFL [108], Federated Collaborative Filtering [109], Federated SVRG [110], Byzantine Gradient Descent [89], Federated MTL [12], Variational Federated MTL [111], Krum [88], Federated Averaging [7], Federated Meta-Learning [112], LFRL [113], Bayesian Nonparametric FL [4], Federated Generative Privacy [114], Secure Aggregation FL [17], DSSGD [115], Client-Level DP FL [116], FL-LSTM [27], Protection Against Reconstruction [117], Agnostic FL [118], Fair FL [119], PATE [44], Hybrid FL [120]); efficiency improvement (Communication Efficient FL [121], Multi-Objective Evolutionary FL [122], On-Device ML [123], Sparse Ternary Compression [124]); application (FedCS [125], DRL-MEC [126], FL for Keyboard Prediction [127], FFL-ERL [128], FL for Vehicular Communication [129], Resource-Constrained MEC [130]); and benchmark (FLs Performance Evaluation [131], LEAF [132]).

5.2.1 Algorithm Design

Sanil et al. [105] presented a secure regression model on vertically partitioned data. They focus on the linear regression model, and secret sharing is applied to ensure privacy in their solution.

Hardy et al. [13] presented a solution for two-party vertical federated logistic regression. They used entity resolution and additively homomorphic encryption. They also study the impact of entity resolution errors on learning.

Liu et al. [3] introduced an FL framework combined with transfer learning for neural networks. They addressed a specific scenario where two parties have a part of common samples and all the label information is in one party. They used additively homomorphic encryption to encrypt the model parameters to protect data privacy.

Cheng et al. [9] implemented a vertical tree-based FLS called SecureBoost. In their assumptions, only one party has the label information. They used the entity alignment technique to get the common data and then build the decision trees. Additively homomorphic encryption is used to protect the gradients.

Liu et al. [106] proposed a federated extreme boosting learning framework for mobile crowdsensing. They adopted secret sharing to achieve privacy-preserving learning of GBDTs.

Zhao et al. [8] proposed a horizontal tree-based FLS. Each decision tree is trained locally without communication between parties. The trees trained in one party are sent to the next party to continue training a number of trees. Differential privacy is used to protect the decision trees.

Li et al. [107] exploited similarity information in the building of federated GBDTs by using locality-sensitive hashing [133]. They utilize the data distribution of local parties by aggregating gradients of similar instances. Under a weaker privacy model compared with secure multi-party computation, their approach is effective and efficient.

Nikolaenko et al. [10] proposed a system for privacy-preserving ridge regression. Their approach combines both homomorphic encryption and Yao's garbled circuits to achieve the privacy requirements. An extra evaluator is needed to run the algorithm.

Chen et al. [11] proposed a system for privacy-preserving ridge regression. Their approach combines both secure summation and homomorphic encryption to achieve the privacy requirements. They provided a complete communication and computation overhead comparison among their approach and the previous state-of-the-art approaches.


Kim et al. [108] combined a blockchain architecture with federated learning. On the basis of federated averaging, they used a blockchain network to exchange the devices' local model updates.

Ammad et al. [109] formulated the first federated collaborative filtering method. Based on a stochastic gradient approach, the item-factor matrix is trained on a global server by aggregating the local updates.

Konečný et al. [110] proposed the federated SVRG algorithm, which is based on stochastic variance reduced gradient [134]. They compared their algorithm with other baselines like CoCoA+ [135] and simple distributed gradient descent. Their method can achieve better accuracy with the same number of communication rounds for the logistic regression model.

Smith et al. [12] combined federated learning with multi-task learning (MTL) [136], [137]. Their method considers the issues of high communication cost, stragglers, and fault tolerance for MTL in the federated environment.

Corinzia et al. [111] proposed a federated MTL method with non-convex models. They treated the central server and the local parties as a Bayesian network, and the inference is performed using variational methods.

Chen et al. [89] and Blanchard et al. [88] studied the scenario where the parties may be Byzantine and try to compromise the FLS. The former proposed an aggregation rule based on the geometric median of means of the gradients. The latter proposed Krum, which selects the gradient vector closest to the barycenter among the proposed parameter vectors.
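The selection rule of Krum can be sketched roughly as follows (a simplified reading of [88]: the number of neighbours used in the score and the plain NumPy distances are our own assumptions, and the real algorithm includes further conditions on the number of Byzantine parties).

```python
import numpy as np

def krum_select(gradients, num_byzantine):
    """Pick the gradient whose summed squared distance to its closest
    n - f - 2 neighbours is smallest, in the spirit of the Krum rule."""
    grads = np.asarray(gradients)
    n = len(grads)
    k = max(1, n - num_byzantine - 2)
    scores = []
    for i in range(n):
        dists = np.sort(np.sum((grads - grads[i]) ** 2, axis=1))
        scores.append(dists[1:k + 1].sum())  # skip the zero distance to itself
    return grads[int(np.argmin(scores))]
```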

McMahan et al. [7] implemented federated averaging on TensorFlow, focusing on improving communication efficiency. The methods they use on deep networks are effective in reducing communication costs compared to the synchronized stochastic gradient descent method.

Chen et al. [112] designed a federated meta-learning framework. Specifically, the users' information is shared at the algorithm level, where a server updates the algorithm with feedback from the users.

Liu et al. [113] proposed a lifelong federated reinforcement learning framework. Adopting transfer learning techniques, a global model is trained to effectively remember what the robots have learned.

Yurochkin et al. [4] developed a probabilistic FL framework by applying Bayesian nonparametric machinery. They used a Beta-Bernoulli process informed matching procedure to combine the local models into a federated global model.

Triastcyn et al. [114] used generative adversarial networks (GANs) [138] to generate artificial data, which are then directly used to train machine learning models. Their privacy guarantee is weaker than differential privacy.

Bonawitz et al. [17] applied secure aggregation to protect the local parameters on the basis of federated averaging.

Shokri et al. [115] proposed a distributed selective SGD algorithm, where a fraction of local parameters are used to update the global parameters each round. Differential privacy is applicable to protect the uploaded parameters.

Geyer et al. [116] applied differential privacy in federated averaging from a client-level perspective. They used the Gaussian mechanism to distort the sum of the updates of gradients to protect a whole client's dataset instead of a single data point.

McMahan et al. [27] deployed federated averaging in the training of Long Short-Term Memory (LSTM) recurrent neural networks (RNNs). In addition, they used user-level differential privacy to protect the parameters.

Bhowmick et al. [117] applied local differential privacy to protect the parameters in federated learning. To increase the model utility, they considered a practical threat model that wishes to decode individuals' data but has little prior information on them. Within this assumption, they could get a much larger privacy budget.

Mohri et al. [118] proposed a new framework named agnostic federated learning. Instead of minimizing the loss with respect to the uniform distribution, which is an average distribution among the data distributions from local clients, they tried to train a centralized model optimized for any possible target distribution formed by a mixture of the client distributions.

Li et al. [119] proposed a new objective taking fairness into consideration. Specifically, if the variance of the performance of the model across the devices is smaller, then the model is more fair. Based on their objective, they proposed an extension of federated SGD, which uses a dynamic step-size instead of a fixed step-size.

Papernot et al. [44] designed a general framework for federated learning, which can be applied to any model. In a black-box setting, they considered the locally trained models as teacher models. Then, they learned a student model by noisy voting among all of the teachers.

Truex et al. [120] combined both secure multi-party computation and differential privacy for privacy-preserving federated learning. They used differential privacy to inject noise into the local updates. The noisy updates are then encrypted using the Paillier cryptosystem [139] before being sent to the central server.

5.2.2 Efficiency Improvement

Konečný et al. [121] proposed two ways, structured updates and sketched updates, to reduce the communication costs in federated averaging. Their methods can reduce the communication cost by two orders of magnitude with a slight degradation in convergence speed.

Zhu and Jin [122] designed a multi-objective evolutionary algorithm to minimize the communication costs and the global model test errors simultaneously. Considering the minimization of the communication cost and the maximization of the global learning accuracy as two objectives, they formulated federated learning as a bi-objective optimization problem and solved it with the multi-objective evolutionary algorithm.

Jeong et al. [123] proposed a federated learning framework for devices with non-i.i.d. local data. They designed federated distillation, whose communication size depends on the output dimension but not on the model size. Also, they proposed a data augmentation scheme using a generative adversarial network (GAN) to make the training dataset i.i.d. Many other studies also design specialized approaches for non-i.i.d. data [140]–[143].

Sattler et al. [124] proposed a new compression framework named sparse ternary compression (STC). Specifically, STC compresses the communication using sparsification, ternarization, error accumulation, and optimal Golomb encoding. Their method is robust to non-i.i.d. data and large numbers of parties.
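A rough sketch of the sparsify-then-ternarize idea follows; it keeps only the largest-magnitude entries and sends their common mean magnitude plus signs, and it deliberately omits the error accumulation and Golomb encoding steps of the full STC method [124] (the sparsity fraction is an illustrative assumption).

```python
import numpy as np

def sparse_ternary_compress(update, sparsity=0.01):
    """Keep the top-k entries by magnitude and replace each with a shared
    magnitude times its sign, so a kept entry needs roughly one trit plus its index."""
    flat = update.ravel().copy()
    k = max(1, int(sparsity * flat.size))
    idx = np.argsort(np.abs(flat))[-k:]   # indices of the largest-magnitude entries
    mu = np.abs(flat[idx]).mean()         # shared magnitude for the kept entries
    compressed = np.zeros_like(flat)
    compressed[idx] = mu * np.sign(flat[idx])
    return compressed.reshape(update.shape)
```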

There are also other studies working on reducing the communication cost of federated learning. Yao et al. [144] adopted a two-stream model with an MMD (Maximum Mean Discrepancy) constraint, instead of the single model to be trained on devices in standard federated learning settings, to reduce the communication cost.


Zhu et al. [145] proposed a multi-access Broadband Analog Aggregation (BAA) design for communication-latency reduction in federated learning based on the concept of over-the-air computation [146]. Later, Amiri et al. [147] added error accumulation and gradient sparsification to over-the-air computation to get a faster convergence speed. Caldas et al. [148] proposed lossy compression and federated dropout to reduce server-to-participant communication costs.

5.2.3 Application Studies

Nishio et al. [125] implemented federated averaging in practical mobile edge computing (MEC) frameworks. They used an operator of MEC frameworks to manage the resources of heterogeneous clients. Federated learning is promising in edge computing [149]–[152].

Wang et al. [126] adopted federated averaging to implement distributed deep reinforcement learning (DRL) in a mobile edge computing system. The usage of DRL and FL can effectively optimize mobile edge computing, caching, and communication.

Hard et al. [127] applied federated learning in mobile keyboard prediction. They adopted the federated averaging method to learn a variant of LSTM.

Ulm et al. [128] implemented federated learning in Erlang (FFL-ERL), which is a functional programming language. Based on federated averaging, they created a functional implementation of an artificial neural network in Erlang.

Samarakoon et al. [129] first adopted federated learning in the context of ultra-reliable low-latency communication. To model reliability in terms of probabilistic queue lengths, they used model averaging to learn a generalized Pareto distribution (GPD).

Wang et al. [130] performed federated learning on resource-constrained MEC systems. They address the problem of how to efficiently utilize the limited computation and communication resources at the edge. Using federated averaging, they implemented many machine learning algorithms including linear regression, SVM, and CNN.

While edge computing is an appropriate scenario to apply federated learning, federated learning has also been applied in many other areas. Brisimi et al. [153] developed models to predict hospitalizations for cardiac events using a federated distributed algorithm. They developed a general decentralized optimization framework enabling multiple data holders to collaborate and converge to a common predictive model without explicitly exchanging raw data. Other applications in health AI are shown in [154]. Verma et al. [155] help to foster collaboration across multiple government agencies with federated learning algorithms.

5.2.4 Benchmarking Studies

Nilsson et al. [131] conducted a performance comparison of three federated learning algorithms: federated averaging [7], federated stochastic variance reduced gradient [121], and CO-OP [156]. They ran experiments with a multi-layer perceptron on the MNIST dataset under both i.i.d. and non-i.i.d. partitions of the data. Their experiments showed that federated averaging achieves better performance on MNIST than the other two algorithms.
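For reference, a common way of building the non-i.i.d. partitions used in such MNIST experiments is the "pathological" shard split sketched below; the shard counts are illustrative and may differ from the exact protocol of [131].

```python
import numpy as np

def partition_non_iid(labels, num_parties=100, shards_per_party=2, seed=0):
    """Sort examples by label, cut them into num_parties * shards_per_party
    contiguous shards, and hand each party a few random shards, so every
    party only sees a couple of classes. Returns one index array per party."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels, kind="stable")
    shards = np.array_split(order, num_parties * shards_per_party)
    shard_ids = rng.permutation(len(shards))
    return [
        np.concatenate([shards[s] for s in
                        shard_ids[p * shards_per_party:(p + 1) * shards_per_party]])
        for p in range(num_parties)
    ]
```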

Caldas et al. [132] proposed LEAF, a modular benchmark for federated learning. LEAF includes public federated datasets, an array of statistical and systems metrics, and a set of reference implementations.

5.3 Open Source Systems

Table 2 shows a summary of the comparison among some representative FLSs. Here we only consider open source systems that support data protection.

From Table 2, we have the following findings. First, most of the current open-source systems only implement one kind of partition method and one or two kinds of machine learning models. We think that many systems are still at an early stage, and we expect more development efforts from the community. A general and complete FLS that can support multiple kinds of data partitioning or machine learning models is still on the way.

Second, despite their costly processing, cryptographic methods seem to be the most popular technique for providing privacy guarantees. However, there is no final conclusion on which approach is better in terms of system performance and model utility; the choice should depend on the privacy restrictions.

Last, many systems still adopt a centralized communication design, since they need a server to aggregate model parameters. From the system perspective, this introduces a single point of failure/control, and we believe a more advanced mechanism that avoids this centralized design is needed.

Now we give more details on these FLSs. Google proposed a scalable FLS which enables tens of millions of Android devices to learn a deep neural network based on TensorFlow [5]. In their design, a server aggregates the model updates with federated averaging [27]; the updates are computed by the devices locally in synchronous rounds. Differential privacy and secure aggregation are used to enhance the privacy guarantees.
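The aggregation rule at the heart of this design is plain federated averaging; a minimal server-side sketch, assuming each client returns its full flattened weight vector and local example count, looks as follows.

```python
import numpy as np

def federated_averaging(client_weights, client_sizes):
    """Server-side federated averaging (sketch): the new global model is the
    average of the clients' locally trained weight vectors, weighted by the
    number of local examples each client holds."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)             # (num_clients, num_params)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)
```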

The OpenMined community introduced an FL system named PySyft [6] built on PyTorch, which applies both differential privacy and multi-party computation (MPC) to provide privacy guarantees. It adopts the SPDZ protocol [158] for MPC and the moments accountant [74] for DP in a federated learning context.
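To give an intuition for the MPC side, the toy sketch below aggregates updates with plain additive secret sharing over the reals: each party splits its update into random shares sent to different servers, and only the sum is ever reconstructed. Real SPDZ works over a finite field with MACs and an offline preprocessing phase, so this illustrates the principle rather than PySyft's implementation.

```python
import numpy as np

def share_update(update, num_servers, rng):
    """Split one party's update into additive shares; a single server's
    share is just random noise and reveals nothing by itself."""
    shares = [rng.standard_normal(update.shape) for _ in range(num_servers - 1)]
    shares.append(update - sum(shares))
    return shares

def secure_sum(per_party_shares):
    """Each server sums the shares it received from all parties; adding the
    servers' partial sums reconstructs only the aggregate update."""
    partials = [sum(server_shares) for server_shares in zip(*per_party_shares)]
    return sum(partials)

rng = np.random.default_rng(0)
updates = [rng.standard_normal(4) for _ in range(3)]          # three parties
shares = [share_update(u, num_servers=2, rng=rng) for u in updates]
assert np.allclose(secure_sum(shares), sum(updates))          # only the sum is recovered
```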

Corbacho implemented PhotoLabeller, which gives a practical use case of an FLS. It uses Android phones to train models locally, and uses federated averaging on the server to aggregate the models. Finally, the trained model is shared across every client for photo labeling.

The WeBank FinTech company implemented the FL platform Federated AI Technology Enabler (FATE), which supports multiple kinds of data partitioning and algorithms. Its secure computation protocols are based on homomorphic encryption and multi-party computation. It supports many machine learning algorithms, including logistic regression, gradient boosting decision trees, etc.

6 CASE STUDY

There are many interesting applications for federated learning systems. We review two case studies to examine the system aspects surveyed above.

TABLE 2
Comparison among some existing FLSs. The notations used in this table are the same as Table 1.

Federated Learning System                            | data partition | model implementation | privacy mechanism | communication architecture
Google TensorFlow Federated (TFF) [5], [157]         | horizontal     | LM, NN               | CM, DP, MA        | centralized
PySyft [6]                                           | horizontal     | NN                   | CM, DP            | centralized
PhotoLabeller (footnote 3)                           | horizontal     | NN                   | MA                | centralized
Federated AI Technology Enabler (FATE) (footnote 4)  | hybrid         | LM, DT, NN           | CM                | distributed

6.1 Keyboard Word Suggestion

Here we analyze the keyboard word suggestion application and identify which type of FLS is suitable for such applications. Keyboard word suggestion aims to predict which word users will input next according to their previous input words [93]. Keyboard word suggestion models can be trained in the federated learning manner, such that more user input data can be exploited to improve the model quality, while obeying user data regulations such as GDPR [159], [160] in Europe, PDPA [161] in Singapore, and CCPA [162] in the US. For training such word suggestion models, the training data (i.e., users' input records) is "partitioned" horizontally, where each user has her input data on her own device (e.g., a mobile phone). Furthermore, many of the word suggestion models are based on neural networks [93], [163], so the desired FLS should support neural networks. Privacy protection mechanisms need to be enforced in the word suggestion model training as well, because the user data or the trained model is synchronized with those of other users. In keyboard word suggestion applications, the keyboard service providers cannot prevent users from using their services even if the users refuse to share data. So, incentive mechanisms need to be adopted in the FLS, such that more users are willing to participate and improve the word suggestion model. Finally, the training of the word suggestion model is performed in a public FLS, since the user data is distributed globally. Table 3 summarizes these requirements.

TABLE 3
GBoard requirements

Horizontal Partitioning
Neural Networks
Public FL
Centralized Communication
Privacy Protection
Incentive Motivated

Next, we take Google Keyboard (GBoard), a concrete example of word suggestion applications, and identify which FLS is well suited to it. According to the analysis above and the existing federated learning systems shown in Table 2, Google TensorFlow Federated (TFF) may be the underlying FLS for GBoard. The communication architecture of the federated learning for GBoard is centralized, since the Google cloud collects data from Android phones all around the world and then provides the prediction service to users. TFF can handle horizontally partitioned data and can support neural networks for training models. Meanwhile, TFF provides differential privacy to ensure privacy guarantees. TFF is centralized and can sustain millions of users in the public FL setting. Finally, Google can encourage users to share their data by granting them higher predictive accuracy as an incentive. In summary, based on our analysis, TFF tends to be the underlying FLS for GBoard.

6.2 Nationwide Learning Health System

In this case study, we discuss the application of FL in building health care systems. The vision of a nationwide learning health system has been introduced since 2010 [164]. This system aims to exploit data from research institutes, hospitals, federal agencies and many other parties to improve the health care of the nation. In such a scenario, the health care data is partitioned both horizontally and vertically: each party contains health care data of residents for a specific purpose (e.g., patient treatment), but the features used in each party are different. Moreover, the health care data is strictly protected by laws. In health care systems, besides neural networks, a wide range of machine learning algorithms are commonly used [165], [166]. The learning process should be distributed, because each party shares data, trains the model on the data cooperatively, and gets the final learned model. Due to the strict privacy protection and the nature of health care systems, it is a private federated learning system where a few parties possess a huge amount of data, while the others may hold only a small amount. Finally, once the system is established, the data sharing can be guaranteed by policies of the national government (i.e., nationwide learning health systems can be governmental policy motivated). Table 4 summarizes these requirements.

TABLE 4
Nationwide Learning Health System Requirements

Hybrid Partitioning
No Specific Models
Private FL
Distributed Communication
Privacy Protection
Policy Motivated

Based on the analysis above, Federated AI Technology Enabler (FATE) is the potential choice for this application. It can support both horizontal and vertical data partitioning. FATE provides multiple machine learning algorithms such as decision trees and linear models. Meanwhile, FATE has cryptographic techniques to guarantee the protection of user data, and supports a distributed communication architecture as well. These properties of FATE match the requirements of this application well.

7 VISION AND CONCLUSION

In this section, we show interesting directions to work on in the future, and conclude this paper.

7.1 Future Research Directions

7.1.1 (Re)-Invent Federated Learning Models

When designing an FLS, many existing studies have tried to support more machine learning models and to come up with new efficient approaches that protect privacy without sacrificing much of the accuracy of the learned model.

7.1.2 Dynamic scheduling

As we discussed in Section 3.2, the number of parties may not be fixed during the learning process. However, the number of parties is fixed in many existing systems, and they do not consider situations where new parties join or current parties leave. The system should support dynamic scheduling and have the ability to adjust its strategy when the number of parties changes. There are some studies addressing this issue. For example, Google TensorFlow Federated [5] can tolerate drop-outs of the devices. Also, the emergence of blockchain [95] can provide an ideal and transparent platform for multi-party learning. More efforts are needed in this direction.
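A minimal sketch of such dropout-tolerant scheduling is shown below: the server simply aggregates whichever selected clients report back within the round. The local_train callback and its signature are hypothetical, not TFF's API.

```python
import numpy as np

def run_round(global_model, selected_clients, local_train, timeout_s=60.0):
    """One round with dynamic participation (sketch): clients that drop out
    or time out are skipped, so joins and departures do not block training.
    local_train(client, model, timeout) is assumed to return
    (flat_weights, num_examples) or None on failure."""
    results = [local_train(c, global_model, timeout_s) for c in selected_clients]
    results = [r for r in results if r is not None]   # tolerate drop-outs
    if not results:
        return global_model                           # nobody reported; skip the round
    weights, sizes = zip(*results)
    sizes = np.asarray(sizes, dtype=float)
    return (np.stack(weights) * (sizes / sizes.sum())[:, None]).sum(axis=0)
```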

3. https://github.com/mccorby/PhotoLabeller
4. https://github.com/FederatedAI/FATE


7.1.3 Diverse privacy restrictions

Little work has considered the privacy heterogeneity of FLSs, as shown in Section 3.1.2. Existing systems adopt techniques that protect the model parameters or gradients of all parties at the same level. However, the privacy restrictions of the parties usually differ in reality. It would be interesting to design an FLS which treats the parties differently according to their privacy restrictions. The learned model should have a better performance if we can maximize the utilization of each party's data while not violating its privacy restrictions. Heterogeneous differential privacy [167] may be useful in such settings.
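A minimal per-party sketch of this idea, assuming a simple Gaussian mechanism, is shown below: each party clips its update and adds noise whose scale is derived from its own privacy budget, so stricter parties simply use a larger sigma. The clip_norm and sigma parameters are illustrative; mapping a party's epsilon to sigma would follow a standard Gaussian-mechanism analysis, which is omitted.

```python
import numpy as np

def locally_private_update(update, clip_norm, sigma, rng):
    """Heterogeneous-DP sketch: clip the party's update to clip_norm and add
    Gaussian noise calibrated to that party's own budget (larger sigma for
    stricter privacy requirements)."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, sigma * clip_norm, size=update.shape)
```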

7.1.4 Intelligent benefits

Intuitively, a party can gain more from the FLS if it contributes more information. A simple solution is to make agreements among the parties such that some parties pay the other parties which contribute more information. Representative incentive mechanisms need to be developed.

7.1.5 Benchmark

As more FLSs are being developed, a benchmark with representative data sets and workloads is quite important to evaluate the existing systems and direct future development. Caldas et al. [132] proposed LEAF, a benchmark that includes federated datasets, an evaluation framework, and reference implementations. Hao et al. [168] presented a computing testbed named Edge AIBench with federated learning support, and discussed four typical scenarios and six components for measurement included in the benchmark suite. Still, more applications and scenarios are the key to the success of FLSs.

7.1.6 System architecture

Like the parameter server in deep learning, which controls the parameter synchronization, some common system architectures need to be investigated for FL. Although Yang et al. [16] presented three architectures for different data partition methods, we need more complete architectures in terms of learning models or privacy levels.

Communication costs can be a significant issue in the performance of training a federated learning model [169], [170].

7.1.7 Data life cycles

Learning is only one aspect of a federated system. A data life cycle consists of multiple stages, including data creation, storage, use, sharing, archiving and destruction. For the data security and privacy of the entire application, we need to invent new data life cycles in the federated learning context. Although data sharing is clearly one of the focal stages, the design of a federated learning system also affects other stages. For example, data creation may help to prepare the data and features that are suitable for federated learning.

7.1.8 Data labels

Most existing studies have focused on labelled data sets. However, in practice, training data sets may not have labels, or may have poisoned or mistaken labels, which can lead to runtime mispredictions. Poisoned and mistaken labels can come from unreliable data collection processes, such as in mobile and edge environments, or from malicious parties. There are still many challenges in addressing data poisoning and backdoor attacks. Along this line, CalTrain [171] presents a multi-party collaborative learning system to fulfill model accountability in Trusted Execution Environments (TEEs). Ghosh et al. [172] consider model robustness against Byzantine parties (i.e., abnormal or adversarial parties). Another potential approach is blockchain [173], [174]. Preuveneers et al. [173] proposed a permissioned blockchain-based federated learning method to monitor the incremental updates to an anomaly detection machine learning model.
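As an illustration of the kind of robust aggregation these works motivate, the sketch below replaces plain averaging with a coordinate-wise trimmed mean, which tolerates a bounded fraction of Byzantine or mislabeled parties; the trim ratio is an assumed hyper-parameter, and the cited works use their own, more refined estimators.

```python
import numpy as np

def robust_aggregate(client_updates, trim_ratio=0.1):
    """Coordinate-wise trimmed mean (sketch): sort each coordinate across
    clients, discard the largest and smallest trim_ratio fraction, and
    average the rest, limiting the influence of outlier updates."""
    stacked = np.sort(np.stack(client_updates), axis=0)
    n = stacked.shape[0]
    k = int(trim_ratio * n)
    kept = stacked[k:n - k] if k > 0 else stacked
    return kept.mean(axis=0)
```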

7.1.9 Federated learning in domains

Internet-of-Things: Security and privacy issues have been a hot research area in fog computing and edge computing, due to the increasing deployment of Internet-of-Things applications. For more details, readers can refer to some recent surveys [175]–[177]. Federated learning can be one potential approach to addressing the data privacy issues, while still offering reasonably good machine learning models [25], [178].

Additional key challenges come from computation and energy constraints. The mechanisms for privacy and security introduce runtime overhead. For example, Jiang et al. [179] apply independent Gaussian random projection to improve data privacy, but then the training of a deep network can become too costly; the authors have to develop a new resource scheduling algorithm that moves the workload to the nodes with more computation power. Similar issues arise in other environments such as vehicle-to-vehicle networks [180], [181].

7.2 Conclusion

Many efforts have been devoted to developing federated learning systems (FLSs). A complete overview and summary of existing FLSs is important and meaningful. Inspired by previous federated systems, we have shown that heterogeneity and autonomy are two important factors in the design of practical FLSs. Moreover, we have provided a comprehensive categorization of FLSs along six different aspects. Based on these aspects, we have also compared the features and designs of existing FLSs. More importantly, we have pointed out a number of opportunities, ranging from more benchmarks to the integration of emerging platforms such as blockchain. Federated learning systems will be an exciting research journey, which calls for effort from the machine learning, system, and data privacy communities.

ACKNOWLEDGMENTS

This work is supported by a MoE AcRF Tier 1 grant (T1 251RES1824), a SenseTime Young Scholars Research Fund, and a MOE Tier 2 grant (MOE2017-T2-1-122) in Singapore.

REFERENCES

[1] E. Shi, T.-H. H. Chan, E. Rieffel, and D. Song, “Distributed private dataanalysis: Lower bounds and practical constructions,” ACM Transactionson Algorithms (TALG), vol. 13, no. 4, p. 50, 2017.

[2] J. P. Albrecht, “How the gdpr will change the world,” Eur. Data Prot. L.Rev., vol. 2, p. 287, 2016.

[3] Y. Liu, T. Chen, and Q. Yang, “Secure federated transfer learning,” arXivpreprint arXiv:1812.03337, 2018.

[4] M. Yurochkin, M. Agarwal, S. Ghosh, K. Greenewald, T. N. Hoang,and Y. Khazaeni, “Bayesian nonparametric federated learning of neuralnetworks,” arXiv preprint arXiv:1905.12022, 2019.

[5] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman,V. Ivanov, C. Kiddon, J. Konecny, S. Mazzocchi, H. B. McMahan et al.,“Towards federated learning at scale: System design,” arXiv preprintarXiv:1902.01046, 2019.


[6] T. Ryffel, A. Trask, M. Dahl, B. Wagner, J. Mancuso, D. Rueckert, andJ. Passerat-Palmbach, “A generic framework for privacy preserving deeplearning,” arXiv preprint arXiv:1811.04017, 2018.

[7] H. B. McMahan, E. Moore, D. Ramage, S. Hampson et al.,“Communication-efficient learning of deep networks from decentralizeddata,” arXiv preprint arXiv:1602.05629, 2016.

[8] L. Zhao, L. Ni, S. Hu, Y. Chen, P. Zhou, F. Xiao, and L. Wu, “Inprivatedigging: Enabling tree-based distributed data mining with differentialprivacy,” in INFOCOM. IEEE, 2018, pp. 2087–2095.

[9] K. Cheng, T. Fan, Y. Jin, Y. Liu, T. Chen, and Q. Yang, “Secureboost: Alossless federated learning framework,” arXiv preprint arXiv:1901.08755,2019.

[10] V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, andN. Taft, “Privacy-preserving ridge regression on hundreds of millionsof records,” in 2013 IEEE Symposium on Security and Privacy. IEEE,2013, pp. 334–348.

[11] Y.-R. Chen, A. Rezapour, and W.-G. Tzeng, “Privacy-preserving ridgeregression on distributed data,” Information Sciences, vol. 451, pp. 34–49,2018.

[12] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, “Federatedmulti-task learning,” in Advances in Neural Information ProcessingSystems, 2017, pp. 4424–4434.

[13] S. Hardy, W. Henecka, H. Ivey-Law, R. Nock, G. Patrini, G. Smith,and B. Thorne, “Private federated learning on vertically partitioned datavia entity resolution and additively homomorphic encryption,” arXivpreprint arXiv:1711.10677, 2017.

[14] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” inKDD. ACM, 2016, pp. 785–794.

[15] Z. Wen, J. Shi, B. He, J. Chen, K. Ramamohanarao, and Q. Li,“Exploiting gpus for efficient gradient boosting decision tree training,”IEEE Transactions on Parallel and Distributed Systems, 2019.

[16] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning:Concept and applications,” ACM Transactions on Intelligent Systems andTechnology (TIST), vol. 10, no. 2, p. 12, 2019.

[17] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan,S. Patel, D. Ramage, A. Segal, and K. Seth, “Practical secure aggregationfor privacy-preserving machine learning,” in Proceedings of the 2017ACM SIGSAC Conference on Computer and Communications Security.ACM, 2017, pp. 1175–1191.

[18] P. Mohassel and P. Rindal, “Aby 3: a mixed protocol framework formachine learning,” in Proceedings of the 2018 ACM SIGSAC Conferenceon Computer and Communications Security. ACM, 2018, pp. 35–52.

[19] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick,S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture forfast feature embedding,” in Proceedings of the 22nd ACM internationalconference on Multimedia. ACM, 2014, pp. 675–678.

[20] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin,A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation inpytorch,” 2017.

[21] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin,S. Ghemawat, G. Irving, M. Isard et al., “Tensorflow: A system for large-scale machine learning,” in 12th {USENIX} Symposium on OperatingSystems Design and Implementation ({OSDI} 16), 2016, pp. 265–283.

[22] A. P. Sheth and J. A. Larson, “Federated database systems for managingdistributed, heterogeneous, and autonomous databases,” ACM ComputingSurveys (CSUR), vol. 22, no. 3, pp. 183–236, 1990.

[23] T. Kurze, M. Klems, D. Bermbach, A. Lenk, S. Tai, and M. Kunze,“Cloud federation,” Cloud Computing, vol. 2011, pp. 32–38, 2011.

[24] C. ”WeBank, Shenzhen, “Federated learning white paper v1.0,” inhttps://www.fedai.org/static/flwp-en.pdf, 2018.

[25] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang,D. Niyato, and C. Miao, “Federated learning in mobile edge networks:A comprehensive survey,” 2019.

[26] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Federated learning:Challenges, methods, and future directions,” 2019.

[27] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang, “Learn-ing differentially private recurrent language models,” arXiv preprintarXiv:1710.06963, 2017.

[28] D. Leroy, A. Coucke, T. Lavril, T. Gisselbrecht, and J. Dureau, “Federatedlearning for keyword spotting,” in ICASSP 2019-2019 IEEE InternationalConference on Acoustics, Speech and Signal Processing (ICASSP).IEEE, 2019, pp. 6341–6345.

[29] J. Vaidya and C. Clifton, “Privacy preserving association rule mining invertically partitioned data,” in Proceedings of the eighth ACM SIGKDDinternational conference on Knowledge discovery and data mining.ACM, 2002, pp. 639–644.

[30] ——, “Privacy-preserving k-means clustering over vertically partitioneddata,” in Proceedings of the ninth ACM SIGKDD international conferenceon Knowledge discovery and data mining. ACM, 2003, pp. 206–215.

[31] ——, “Privacy preserving naive bayes classifier for vertically partitioneddata,” in Proceedings of the 2004 SIAM International Conference onData Mining. SIAM, 2004, pp. 522–526.

[32] ——, “Privacy-preserving decision trees over vertically partitioned data,”in IFIP Annual Conference on Data and Applications Security andPrivacy. Springer, 2005, pp. 139–152.

[33] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactionson knowledge and data engineering, vol. 22, no. 10, pp. 1345–1359,2010.

[34] M. J. Wainwright, M. I. Jordan, and J. C. Duchi, “Privacy aware learning,”in Advances in Neural Information Processing Systems, 2012, pp. 1430–1438.

[35] M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacksthat exploit confidence information and basic countermeasures,” inProceedings of the 22nd ACM SIGSAC Conference on Computer andCommunications Security. ACM, 2015, pp. 1322–1333.

[36] N. Carlini, C. Liu, J. Kos, U. Erlingsson, and D. Song, “The secretsharer: Measuring unintended neural network memorization & extractingsecrets,” arXiv preprint arXiv:1802.08232, 2018.

[37] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membershipinference attacks against machine learning models,” in 2017 IEEESymposium on Security and Privacy (SP). IEEE, 2017, pp. 3–18.

[38] M. Nasr, R. Shokri, and A. Houmansadr, “Comprehensive privacyanalysis of deep learning: Passive and active white-box inference attacksagainst centralized and federated learning,” in Comprehensive PrivacyAnalysis of Deep Learning: Passive and Active White-box InferenceAttacks against Centralized and Federated Learning. IEEE, 2019, p. 0.

[39] L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov, “Exploitingunintended feature leakage in collaborative learning,” in 2019 IEEESymposium on Security and Privacy (SP). IEEE, 2019, pp. 691–706.

[40] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise tosensitivity in private data analysis,” in Theory of cryptography conference.Springer, 2006, pp. 265–284.

[41] K. El Emam and F. K. Dankar, “Protecting privacy using k-anonymity,”Journal of the American Medical Informatics Association, vol. 15, no. 5,pp. 627–637, 2008.

[42] I. Wagner and D. Eckhoff, “Technical privacy metrics: a systematicsurvey,” ACM Computing Surveys (CSUR), vol. 51, no. 3, p. 57, 2018.

[43] H. B. McMahan, E. Moore, D. Ramage, and B. A. y Arcas, “Feder-ated learning of deep networks using model averaging,” CoRR, vol.abs/1602.05629, 2016.

[44] N. Papernot, M. Abadi, U. Erlingsson, I. Goodfellow, and K. Talwar,“Semi-supervised knowledge transfer for deep learning from privatetraining data,” arXiv preprint arXiv:1610.05755, 2016.

[45] Y. Aono, T. Hayashi, L. Wang, S. Moriai et al., “Privacy-preserving deeplearning via additively homomorphic encryption,” IEEE Transactions onInformation Forensics and Security, vol. 13, no. 5, pp. 1333–1345, 2018.

[46] F. Bourse, M. Minelli, M. Minihold, and P. Paillier, “Fast homomorphicevaluation of deep discretized neural networks,” in Annual InternationalCryptology Conference. Springer, 2018, pp. 483–512.

[47] H. Chabanne, A. de Wargny, J. Milgram, C. Morel, and E. Prouff,“Privacy-preserving classification on deep neural network.” IACR Cryp-tology ePrint Archive, vol. 2017, p. 35, 2017.

[48] R. Hall, S. E. Fienberg, and Y. Nardi, “Secure multiple linear regressionbased on homomorphic encryption,” Journal of Official Statistics, vol. 27,no. 4, p. 669, 2011.

[49] M. S. Riazi, C. Weinert, O. Tkachenko, E. M. Songhori, T. Schneider,and F. Koushanfar, “Chameleon: A hybrid secure computation frameworkfor machine learning applications,” in Proceedings of the 2018 on AsiaConference on Computer and Communications Security. ACM, 2018,pp. 707–721.

[50] B. D. Rouhani, M. S. Riazi, and F. Koushanfar, “Deepsecure: Scalableprovably-secure deep learning,” in Proceedings of the 55th AnnualDesign Automation Conference. ACM, 2018, p. 2.

[51] J. Yuan and S. Yu, “Privacy preserving back-propagation neural networklearning made practical with cloud computing,” IEEE Transactions onParallel and Distributed Systems, vol. 25, no. 1, pp. 212–221, 2013.

[52] Q. Zhang, L. T. Yang, and Z. Chen, “Privacy preserving deep computationmodel on cloud for big data feature learning,” IEEE Transactions onComputers, vol. 65, no. 5, pp. 1351–1362, 2015.

[53] J. Liu, M. Juuti, Y. Lu, and N. Asokan, “Oblivious neural networkpredictions via minionn transformations,” in Proceedings of the 2017ACM SIGSAC Conference on Computer and Communications Security.ACM, 2017, pp. 619–631.


[54] A. Shamir, “How to share a secret,” Communications of the ACM, vol. 22,no. 11, pp. 612–613, 1979.

[55] D. Chaum, “The dining cryptographers problem: Unconditional senderand recipient untraceability,” Journal of cryptology, vol. 1, no. 1, pp.65–75, 1988.

[56] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMa-han, S. Patel, D. Ramage, A. Segal, and K. Seth, “Practical secureaggregation for federated learning on user-held data,” arXiv preprintarXiv:1611.04482, 2016.

[57] W. Du and Z. Zhan, “Building decision tree classifier on private data,” inProceedings of the IEEE international conference on Privacy, securityand data mining-Volume 14. Australian Computer Society, Inc., 2002,pp. 1–8.

[58] P. Li, J. Li, Z. Huang, T. Li, C.-Z. Gao, S.-M. Yiu, and K. Chen,“Multi-key privacy-preserving deep learning in cloud computing,” FutureGeneration Computer Systems, vol. 74, pp. 76–85, 2017.

[59] R. Bahmani, M. Barbosa, F. Brasser, B. Portela, A.-R. Sadeghi, G. Scerri,and B. Warinschi, “Secure multiparty computation from sgx,” inInternational Conference on Financial Cryptography and Data Security.Springer, 2017, pp. 477–497.

[60] A. Gascon, P. Schoppmann, B. Balle, M. Raykova, J. Doerner, S. Zahur,and D. Evans, “Secure linear regression on vertically partitioned datasets.”IACR Cryptology ePrint Archive, vol. 2016, p. 892, 2016.

[61] N. Kilbertus, A. Gascon, M. J. Kusner, M. Veale, K. P. Gummadi, andA. Weller, “Blind justice: Fairness with encrypted sensitive attributes,”arXiv preprint arXiv:1806.03281, 2018.

[62] L. Wan, W. K. Ng, S. Han, and V. Lee, “Privacy-preservation forgradient descent methods,” in Proceedings of the 13th ACM SIGKDDinternational conference on Knowledge discovery and data mining.ACM, 2007, pp. 775–783.

[63] V. Chen, V. Pastro, and M. Raykova, “Secure computation for machinelearning with spdz,” arXiv preprint arXiv:1901.00329, 2019.

[64] B. Ghazi, R. Pagh, and A. Velingker, “Scalable and differentiallyprivate distributed aggregation in the shuffled model,” arXiv preprintarXiv:1906.08320, 2019.

[65] M. Kantarcioglu and C. Clifton, “Privacy-preserving distributed miningof association rules on horizontally partitioned data,” IEEE Transactionson Knowledge & Data Engineering, no. 9, pp. 1026–1037, 2004.

[66] H. Yu, X. Jiang, and J. Vaidya, “Privacy-preserving svm using nonlinearkernels on horizontally partitioned data,” in Proceedings of the 2006ACM symposium on Applied computing. ACM, 2006, pp. 603–610.

[67] A. F. Karr, X. Lin, A. P. Sanil, and J. P. Reiter, “Privacy-preservinganalysis of vertically partitioned data using secure matrix products,”Journal of Official Statistics, vol. 25, no. 1, p. 125, 2009.

[68] R. Nock, S. Hardy, W. Henecka, H. Ivey-Law, G. Patrini, G. Smith,and B. Thorne, “Entity resolution and federated learning get a federatedresolution,” arXiv preprint arXiv:1803.04035, 2018.

[69] H. Yu, J. Vaidya, and X. Jiang, “Privacy-preserving svm classificationon vertically partitioned data,” in Pacific-Asia Conference on KnowledgeDiscovery and Data Mining. Springer, 2006, pp. 647–656.

[70] O. Goldreich, “Secure multi-party computation,” Manuscript. Prelimi-nary version, vol. 78, 1998.

[71] C. Dwork, A. Roth et al., “The algorithmic foundations of differentialprivacy,” Foundations and Trends R© in Theoretical Computer Science,vol. 9, no. 3–4, pp. 211–407, 2014.

[72] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate, “Differentially privateempirical risk minimization,” Journal of Machine Learning Research,vol. 12, no. Mar, pp. 1069–1109, 2011.

[73] R. Bassily, A. Smith, and A. Thakurta, “Private empirical risk minimiza-tion: Efficient algorithms and tight error bounds,” in 2014 IEEE 55thAnnual Symposium on Foundations of Computer Science. IEEE, 2014,pp. 464–473.

[74] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar,and L. Zhang, “Deep learning with differential privacy,” in Proceedingsof the 2016 ACM SIGSAC Conference on Computer and CommunicationsSecurity. ACM, 2016, pp. 308–318.

[75] X. Wu, F. Li, A. Kumar, K. Chaudhuri, S. Jha, and J. Naughton, “Bolt-on differential privacy for scalable stochastic gradient descent-basedanalytics,” in Proceedings of the 2017 ACM International Conference onManagement of Data. ACM, 2017, pp. 1307–1322.

[76] R. Iyengar, J. P. Near, D. Song, O. Thakkar, A. Thakurta, and L. Wang,“Towards practical differentially private convex optimization,” in TowardsPractical Differentially Private Convex Optimization. IEEE, 2019, p. 0.

[77] Q. Li, Z. Wu, Z. Wen, and B. He, “Privacy-preserving gradient boostingdecision trees,” arXiv preprint arXiv:1911.04209, 2019.

[78] O. Thakkar, G. Andrew, and H. B. McMahan, “Differentially privatelearning with adaptive clipping,” arXiv preprint arXiv:1905.03871, 2019.

[79] S. Song, K. Chaudhuri, and A. D. Sarwate, “Stochastic gradient descentwith differentially private updates,” in 2013 IEEE Global Conference onSignal and Information Processing. IEEE, 2013, pp. 245–248.

[80] S. Goryczka and L. Xiong, “A comprehensive comparison of multi-party secure additions with differential privacy,” IEEE transactions ondependable and secure computing, vol. 14, no. 5, pp. 463–477, 2015.

[81] W. Du, Y. S. Han, and S. Chen, “Privacy-preserving multivariatestatistical analysis: Linear regression and classification,” in SDM. SIAM,2004, pp. 222–233.

[82] W. Du and M. J. Atallah, “Privacy-preserving cooperative statistical anal-ysis,” in Seventeenth Annual Computer Security Applications Conference.IEEE, 2001, pp. 102–110.

[83] B. Hitaj, G. Ateniese, and F. Perez-Cruz, “Deep models under the gan:information leakage from collaborative deep learning,” in Proceedings ofthe 2017 ACM SIGSAC Conference on Computer and CommunicationsSecurity. ACM, 2017, pp. 603–618.

[84] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, “How tobackdoor federated learning,” arXiv preprint arXiv:1807.00459, 2018.

[85] A. N. Bhagoji, S. Chakraborty, P. Mittal, and S. Calo, “Analyzingfederated learning through an adversarial lens,” 2018.

[86] C. Fung, C. J. M. Yoon, and I. Beschastnikh, “Mitigating sybils infederated learning poisoning,” 2018.

[87] M. Castro, B. Liskov et al., “Practical byzantine fault tolerance,” inOSDI, vol. 99, no. 1999, 1999, pp. 173–186.

[88] P. Blanchard, R. Guerraoui, J. Stainer et al., “Machine learning withadversaries: Byzantine tolerant gradient descent,” in Advances in NeuralInformation Processing Systems, 2017, pp. 119–129.

[89] Y. Chen, L. Su, and J. Xu, “Distributed statistical machine learning inadversarial settings: Byzantine gradient descent,” Proceedings of theACM on Measurement and Analysis of Computing Systems, vol. 1, no. 2,p. 44, 2017.

[90] L. Su and J. Xu, “Securing distributed machine learning in highdimensions,” arXiv preprint arXiv:1804.10140, 2018.

[91] C. Xie, S. Koyejo, and I. Gupta, “Asynchronous federated optimization,”arXiv preprint arXiv:1903.03934, 2019.

[92] M. R. Sprague, A. Jalalirad, M. Scavuzzo, C. Capota, M. Neun,L. Do, and M. Kopp, “Asynchronous federated learning for geospatialapplications,” in Joint European Conference on Machine Learning andKnowledge Discovery in Databases. Springer, 2018, pp. 21–28.

[93] T. Yang, G. Andrew, H. Eichner, H. Sun, W. Li, N. Kong, D. Ramage, andF. Beaufays, “Applied federated learning: Improving google keyboardquery suggestions,” arXiv preprint arXiv:1812.02903, 2018.

[94] Y. Lin, S. Han, H. Mao, Y. Wang, and W. J. Dally, “Deep gradientcompression: Reducing the communication bandwidth for distributedtraining,” arXiv preprint arXiv:1712.01887, 2017.

[95] Z. Zheng, S. Xie, H.-N. Dai, X. Chen, and H. Wang, “Blockchainchallenges and opportunities: A survey,” International Journal of Weband Grid Services, vol. 14, no. 4, pp. 352–375, 2018.

[96] A. C. Zhou, Y. Xiao, B. He, J. Zhai, R. Mao et al., “Privacy regulationaware process mapping in geo-distributed cloud data centers,” IEEETransactions on Parallel and Distributed Systems, 2019.

[97] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, andK. Chan, “When edge meets learning: Adaptive control for resource-constrained distributed machine learning,” in IEEE INFOCOM 2018-IEEE Conference on Computer Communications. IEEE, 2018, pp.63–71.

[98] G. Zyskind, O. Nathan, and A. . Pentland, “Decentralizing privacy: Usingblockchain to protect personal data,” in 2015 IEEE Security and PrivacyWorkshops, May 2015, pp. 180–184.

[99] I. Eyal, A. E. Gencer, E. G. Sirer, and R. V. Renesse, “Bitcoin-ng:A scalable blockchain protocol,” in 13th USENIX Symposium onNetworked Systems Design and Implementation (NSDI 16). SantaClara, CA: USENIX Association, Mar. 2016, pp. 45–59. [Online].Available: https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/eyal

[100] J. Kang, Z. Xiong, D. Niyato, H. Yu, Y.-C. Liang, and D. I. Kim,“Incentive design for efficient federated learning in mobile networks: Acontract theory approach,” arXiv preprint arXiv:1905.07479, 2019.

[101] J. Kang, Z. Xiong, D. Niyato, S. Xie, and J. Zhang, “Incentivemechanism for reliable federated learning: A joint optimization approachto combining reputation and contract theory,” IEEE Internet of ThingsJournal, 2019.

[102] Y. Sarikaya and O. Ercetin, “Motivating workers in federated learning:A stackelberg game perspective,” 2019.

[103] R. Jurca and B. Faltings, “An incentive compatible reputation mecha-nism,” in EEE International Conference on E-Commerce, 2003. CEC2003., June 2003, pp. 285–292.


[104] M. Naor, B. Pinkas, and R. Sumner, “Privacy preservingauctions and mechanism design,” in Proceedings of the 1stACM Conference on Electronic Commerce, ser. EC ’99. NewYork, NY, USA: ACM, 1999, pp. 129–139. [Online]. Available:http://doi.acm.org/10.1145/336992.337028

[105] A. P. Sanil, A. F. Karr, X. Lin, and J. P. Reiter, “Privacy preservingregression modelling via distributed computation,” in Proceedings of thetenth ACM SIGKDD international conference on Knowledge discoveryand data mining. ACM, 2004, pp. 677–682.

[106] Y. Liu, Z. Ma, X. Liu, S. Ma, S. Nepal, and R. Deng, “Boosting privately:Privacy-preserving federated extreme boosting for mobile crowdsensing,”arXiv preprint arXiv:1907.10218, 2019.

[107] Q. Li, Z. Wen, and B. He, “Practical federated gradient boosting decisiontrees,” arXiv preprint arXiv:1911.04206, 2019.

[108] H. Kim, J. Park, M. Bennis, and S.-L. Kim, “On-device federated learningvia blockchain and its latency analysis,” arXiv preprint arXiv:1808.03949,2018.

[109] M. Ammad-ud din, E. Ivannikova, S. A. Khan, W. Oyomno, Q. Fu,K. E. Tan, and A. Flanagan, “Federated collaborative filtering forprivacy-preserving personalized recommendation system,” arXiv preprintarXiv:1901.09888, 2019.

[110] J. Konecny, H. B. McMahan, D. Ramage, and P. Richtarik, “Federatedoptimization: Distributed machine learning for on-device intelligence,”arXiv preprint arXiv:1610.02527, 2016.

[111] L. Corinzia and J. M. Buhmann, “Variational federated multi-tasklearning,” arXiv preprint arXiv:1906.06268, 2019.

[112] F. Chen, Z. Dong, Z. Li, and X. He, “Federated meta-learning forrecommendation,” arXiv preprint arXiv:1802.07876, 2018.

[113] B. Liu, L. Wang, M. Liu, and C. Xu, “Lifelong federated reinforcementlearning: a learning architecture for navigation in cloud robotic systems,”arXiv preprint arXiv:1901.06455, 2019.

[114] A. Triastcyn and B. Faltings, "Federated generative privacy," 2019.

[115] R. Shokri and V. Shmatikov, "Privacy-preserving deep learning," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 2015, pp. 1310–1321.

[116] R. C. Geyer, T. Klein, and M. Nabi, “Differentially private federatedlearning: A client level perspective,” arXiv preprint arXiv:1712.07557,2017.

[117] A. Bhowmick, J. Duchi, J. Freudiger, G. Kapoor, and R. Rogers,“Protection against reconstruction and its applications in private federatedlearning,” arXiv preprint arXiv:1812.00984, 2018.

[118] M. Mohri, G. Sivek, and A. T. Suresh, “Agnostic federated learning,”arXiv preprint arXiv:1902.00146, 2019.

[119] T. Li, M. Sanjabi, and V. Smith, “Fair resource allocation in federatedlearning,” arXiv preprint arXiv:1905.10497, 2019.

[120] S. Truex, N. Baracaldo, A. Anwar, T. Steinke, H. Ludwig, and R. Zhang,“A hybrid approach to privacy-preserving federated learning,” arXivpreprint arXiv:1812.03224, 2018.

[121] J. Konecny, H. B. McMahan, F. X. Yu, P. Richtarik, A. T. Suresh, andD. Bacon, “Federated learning: Strategies for improving communicationefficiency,” arXiv preprint arXiv:1610.05492, 2016.

[122] H. Zhu and Y. Jin, “Multi-objective evolutionary federated learning,”IEEE transactions on neural networks and learning systems, 2019.

[123] E. Jeong, S. Oh, H. Kim, J. Park, M. Bennis, and S.-L. Kim,“Communication-efficient on-device machine learning: Federated dis-tillation and augmentation under non-iid private data,” arXiv preprintarXiv:1811.11479, 2018.

[124] F. Sattler, S. Wiedemann, K.-R. Muller, and W. Samek, “Robust andcommunication-efficient federated learning from non-iid data,” arXivpreprint arXiv:1903.02891, 2019.

[125] T. Nishio and R. Yonetani, “Client selection for federated learningwith heterogeneous resources in mobile edge,” in ICC 2019-2019 IEEEInternational Conference on Communications (ICC). IEEE, 2019, pp.1–7.

[126] X. Wang, Y. Han, C. Wang, Q. Zhao, X. Chen, and M. Chen, “In-edgeai: Intelligentizing mobile edge computing, caching and communicationby federated learning,” IEEE Network, 2019.

[127] A. Hard, K. Rao, R. Mathews, F. Beaufays, S. Augenstein, H. Eichner,C. Kiddon, and D. Ramage, “Federated learning for mobile keyboardprediction,” arXiv preprint arXiv:1811.03604, 2018.

[128] G. Ulm, E. Gustavsson, and M. Jirstrand, “Functional federated learningin erlang (ffl-erl),” in International Workshop on Functional andConstraint Logic Programming. Springer, 2018, pp. 162–178.

[129] S. Samarakoon, M. Bennis, W. Saady, and M. Debbah, “Distributed fed-erated learning for ultra-reliable low-latency vehicular communications,”arXiv preprint arXiv:1807.08127, 2018.

[130] S. Wang, T. Tuor, T. Salonidis, K. K. Leung, C. Makaya, T. He, andK. Chan, “Adaptive federated learning in resource constrained edge com-puting systems,” IEEE Journal on Selected Areas in Communications,vol. 37, no. 6, pp. 1205–1221, 2019.

[131] A. Nilsson, S. Smith, G. Ulm, E. Gustavsson, and M. Jirstrand, “Aperformance evaluation of federated learning algorithms,” in Proceedingsof the Second Workshop on Distributed Infrastructures for Deep Learning.ACM, 2018, pp. 1–8.

[132] S. Caldas, P. Wu, T. Li, J. Konecny, H. B. McMahan, V. Smith, andA. Talwalkar, “Leaf: A benchmark for federated settings,” arXiv preprintarXiv:1812.01097, 2018.

[133] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, “Locality-sensitivehashing scheme based on p-stable distributions,” in Proceedings of thetwentieth annual symposium on Computational geometry. ACM, 2004,pp. 253–262.

[134] R. Johnson and T. Zhang, “Accelerating stochastic gradient descentusing predictive variance reduction,” in Advances in neural informationprocessing systems, 2013, pp. 315–323.

[135] C. Ma, J. Konecny, M. Jaggi, V. Smith, M. I. Jordan, P. Richtarik,and M. Takac, “Distributed optimization with arbitrary local solvers,”optimization Methods and Software, vol. 32, no. 4, pp. 813–848, 2017.

[136] R. Caruana, “Multitask learning,” Machine learning, vol. 28, no. 1, pp.41–75, 1997.

[137] Y. Zhang and Q. Yang, “A survey on multi-task learning,” arXiv preprintarXiv:1707.08114, 2017.

[138] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,”2014.

[139] P. Paillier, “Public-key cryptosystems based on composite degreeresiduosity classes,” in International Conference on the Theory andApplications of Cryptographic Techniques. Springer, 1999, pp. 223–238.

[140] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra, “Federatedlearning with non-iid data,” arXiv preprint arXiv:1806.00582, 2018.

[141] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang, “On the convergenceof fedavg on non-iid data,” arXiv preprint arXiv:1907.02189, 2019.

[142] L. Liu, J. Zhang, S. Song, and K. B. Letaief, “Edge-assisted hierarchicalfederated learning with non-iid data,” arXiv preprint arXiv:1905.06641,2019.

[143] N. Yoshida, T. Nishio, M. Morikura, K. Yamamoto, and R. Yonetani,“Hybrid-fl: Cooperative learning mechanism using non-iid data inwireless networks,” arXiv preprint arXiv:1905.07210, 2019.

[144] X. Yao, C. Huang, and L. Sun, “Two-stream federated learning: Reducethe communication costs,” in 2018 IEEE Visual Communications andImage Processing (VCIP). IEEE, 2018, pp. 1–4.

[145] G. Zhu, Y. Wang, and K. Huang, “Low-latency broadband analog aggre-gation for federated edge learning,” arXiv preprint arXiv:1812.11494,2018.

[146] K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning via over-the-air computation,” arXiv preprint arXiv:1812.11750, 2018.

[147] M. M. Amiri and D. Gunduz, “Federated learning over wireless fadingchannels,” arXiv preprint arXiv:1907.09769, 2019.

[148] S. Caldas, J. Konecny, H. B. McMahan, and A. Talwalkar, “Expandingthe reach of federated learning by reducing client resource requirements,”arXiv preprint arXiv:1812.07210, 2018.

[149] S. Niknam, H. S. Dhillon, and J. H. Reed, “Federated learning forwireless communications: Motivation, opportunities and challenges,”arXiv preprint arXiv:1908.06847, 2019.

[150] Z. Yu, J. Hu, G. Min, H. Lu, Z. Zhao, H. Wang, and N. Georgalas,“Federated learning based proactive content caching in edge computing,”in 2018 IEEE Global Communications Conference (GLOBECOM).IEEE, 2018, pp. 1–6.

[151] Y. Qian, L. Hu, J. Chen, X. Guan, M. M. Hassan, and A. Alelaiwi,“Privacy-aware service placement for mobile edge computing viafederated learning,” Information Sciences, vol. 505, pp. 562–570, 2019.

[152] M. Duan, “Astraea: Self-balancing federated learning for improvingclassification accuracy of mobile deep learning applications,” arXivpreprint arXiv:1907.01132, 2019.

[153] T. S. Brisimi, R. Chen, T. Mela, A. Olshevsky, I. C. Paschalidis,and W. Shi, “Federated learning of predictive models from federatedelectronic health records,” International journal of medical informatics,vol. 112, pp. 59–67, 2018.

[154] L. Huang, Y. Yin, Z. Fu, S. Zhang, H. Deng, and D. Liu, “Loadaboost:Loss-based adaboost federated machine learning on medical data,” arXivpreprint arXiv:1811.12629, 2018.


[155] D. Verma, S. Julier, and G. Cirincione, “Federated ai for building aisolutions across multiple agencies,” arXiv preprint arXiv:1809.10036,2018.

[156] Y. Wang, “Co-op: Cooperative machine learning from mobile devices,”2017.

[157] “Tensorflow federated: Machine learning on decentralized data,” https://www.tensorflow.org/federated.

[158] I. Damgard, V. Pastro, N. Smart, and S. Zakarias, “Multiparty computa-tion from somewhat homomorphic encryption,” in Annual CryptologyConference. Springer, 2012, pp. 643–662.

[159] P. Voigt and A. Von dem Bussche, “The eu general data protection regu-lation (gdpr),” A Practical Guide, 1st Ed., Cham: Springer InternationalPublishing, 2017.

[160] B. Custers, A. Sears, F. Dechesne, I. Georgieva, T. Tani, and S. van derHof, EU Personal Data Protection in Policy and Practice. Springer,2019.

[161] W. B. Chik, “The singapore personal data protection act and anassessment of future trends in data privacy reform,” Computer Law& Security Review, vol. 29, no. 5, pp. 554–575, 2013.

[162] “California Consumer Privacy Act Home Page,” https://www.caprivacy.org/.

[163] M. Ranzato, S. Chopra, M. Auli, and W. Zaremba, “Sequence level train-ing with recurrent neural networks,” arXiv preprint arXiv:1511.06732,2015.

[164] C. P. Friedman, A. K. Wong, and D. Blumenthal, “Achieving anationwide learning health system,” Science translational medicine,vol. 2, no. 57, pp. 57cm29–57cm29, 2010.

[165] H. Asri, H. Mousannif, H. Al Moatassime, and T. Noel, “Using machinelearning algorithms for breast cancer risk prediction and diagnosis,”Procedia Computer Science, vol. 83, pp. 1064–1069, 2016.

[166] K. Kourou, T. P. Exarchos, K. P. Exarchos, M. V. Karamouzis, andD. I. Fotiadis, “Machine learning applications in cancer prognosis andprediction,” Computational and structural biotechnology journal, vol. 13,pp. 8–17, 2015.

[167] M. Alaggan, S. Gambs, and A.-M. Kermarrec, “Heterogeneous differen-tial privacy,” arXiv preprint arXiv:1504.06998, 2015.

[168] T. Hao, Y. Huang, X. Wen, W. Gao, F. Zhang, C. Zheng, L. Wang, H. Ye,K. Hwang, Z. Ren et al., “Edge aibench: Towards comprehensive end-to-end edge computing benchmarking,” arXiv preprint arXiv:1908.01924,2019.

[169] L. Wang, W. Wang, and B. Li, “Cmfl: Mitigating communicationoverhead for federated learning,” in IEEE 39th International Conferenceon Distributed Computing Systems (ICDCS), 2019.

[170] K. Hsieh, A. Harlap, N. Vijaykumar, D. Konomis, G. R. Ganger,P. B. Gibbons, and O. Mutlu, “Gaia: Geo-distributed machine learningapproaching LAN speeds,” in 14th USENIX Symposium on NetworkedSystems Design and Implementation (NSDI 17). Boston, MA: USENIXAssociation, Mar. 2017, pp. 629–647. [Online]. Available: https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/hsieh

[171] Z. Gu, H. Jamjoom, D. Su, H. Huang, J. Zhang, T. Ma, D. Pendarakis,and I. Molloy, “Reaching data confidentiality and model accountabilityon the caltrain,” 2018.

[172] A. Ghosh, J. Hong, D. Yin, and K. Ramchandran, “Robust federatedlearning in a heterogeneous environment,” 2019.

[173] D. Preuveneers, V. Rimmer, I. Tsingenopoulos, J. Spooren, W. Joosen,and E. Ilie-Zudor, “Chained anomaly detection models for federatedlearning: An intrusion detection case study,” Applied Sciences, vol. 8, p.2663, 12 2018.

[174] H. Kim, J. Park, M. Bennis, and S. Kim, “Blockchained on-devicefederated learning,” IEEE Communications Letters, pp. 1–1, 2019.

[175] I. Stojmenovic, S. Wen, X. Huang, and H. Luan, “An overview offog computing and its security issues,” Concurr. Comput. : Pract.Exper., vol. 28, no. 10, pp. 2991–3005, Jul. 2016. [Online]. Available:https://doi.org/10.1002/cpe.3485

[176] S. Yi, Z. Qin, and Q. Li, “Security and privacy issues of fog computing:A survey,” in WASA, 2015.

[177] M. Mukherjee, R. Matam, L. Shu, L. Maglaras, M. A. Ferrag, N. Choud-hury, and V. Kumar, “Security and privacy in fog computing: Challenges,”IEEE Access, vol. 5, pp. 19 293–19 304, 2017.

[178] T. D. Nguyen, S. Marchal, M. Miettinen, H. Fereidooni, N. Asokan,and A.-R. Sadeghi, “DIot: A federated self-learning anomaly detectionsystem for iot,” 2018.

[179] L. Jiang, R. Tan, X. Lou, and G. Lin, “On lightweight privacy-preservingcollaborative learning for internet-of-things objects,” in Proceedingsof the International Conference on Internet of Things Design andImplementation, ser. IoTDI ’19. New York, NY, USA: ACM, 2019, pp.70–81. [Online]. Available: http://doi.acm.org/10.1145/3302505.3310070

[180] S. Samarakoon, M. Bennis, W. Saad, and M. Debbah, “Federated learningfor ultra-reliable low-latency v2v communications,” 2018.

[181] Y. M. Saputra, D. T. Hoang, D. N. Nguyen, E. Dutkiewicz, M. D. Mueck,and S. Srikanteswara, “Energy demand prediction with federated learningfor electric vehicle networks,” 2019.