ensemble framework to first learn a teacher model ensemble from the datasets distributed among all the parties. The teacher ensemble is then used to make noisy predictions on a public dataset, and the labeled public dataset is finally used to train a student model. The privacy loss is precisely controlled by the number of public data samples labeled by the teacher ensemble. A generative adversarial network (GAN) is further applied in Papernot et al. [2018] to generate synthetic training data for the student model. Although this approach is not limited to a single ML algorithm, it requires an adequate quantity of data at each location.
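The noisy aggregation step of the teacher ensemble can be illustrated with a minimal sketch. The function name and the use of Laplace noise with inverse scale gamma are illustrative assumptions, not the exact implementation of the cited work:

```python
import numpy as np

def noisy_aggregate(teacher_votes, num_classes, gamma, rng=None):
    """Label one public sample by noisy-max voting over teachers.

    teacher_votes: one predicted class label per teacher.
    gamma: inverse Laplace noise scale; a smaller gamma means more
           noise and a stronger privacy guarantee per answered query.
    """
    rng = np.random.default_rng(rng)
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += rng.laplace(scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(counts))

# Each answered query consumes privacy budget, which is why the student
# is trained on only a limited number of labeled public samples.
label = noisy_aggregate([0, 0, 1, 0, 2], num_classes=3, gamma=0.1)
```

Only the noisy labels are released; the student model never sees the private data or the teacher models directly.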
The moments accountant is proposed for differentially private stochastic gradient descent (SGD); it computes the overall privacy cost of neural network training by taking into account the particular noise distribution under consideration [Abadi et al., 2016]. It proves a tighter bound on the privacy loss for appropriately chosen settings of the noise scale and the clipping threshold.
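The training step that the moments accountant analyzes can be sketched as follows: clip each per-example gradient to a fixed norm, average, and add Gaussian noise calibrated to the clipping threshold. This is a minimal illustration of the general recipe, not the cited paper's code:

```python
import numpy as np

def dp_sgd_step(weights, per_example_grads, lr, clip_norm,
                noise_multiplier, rng=None):
    """One differentially private SGD update.

    Each per-example gradient is clipped to clip_norm, the clipped
    gradients are summed, Gaussian noise with standard deviation
    noise_multiplier * clip_norm is added, and the result is averaged.
    """
    rng = np.random.default_rng(rng)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    batch = len(clipped)
    noisy_mean = (np.sum(clipped, axis=0)
                  + rng.normal(scale=noise_multiplier * clip_norm,
                               size=weights.shape)) / batch
    return weights - lr * noisy_mean
```

The moments accountant then tracks the cumulative privacy loss across many such noisy steps, yielding a tighter bound than naive composition.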
A differentially private Long Short-Term Memory (LSTM) language model is built with user-level differential privacy guarantees at only a negligible cost in predictive accuracy [McMahan et al., 2017]. Phan et al. [2017] proposed a private convolutional deep belief network (pCDBN) by leveraging the functional mechanism to perturb the energy-based objective functions of traditional convolutional deep belief networks. Generating differentially private datasets using GANs is explored in Triastcyn and Faltings [2018], where a Gaussian noise layer is added to the discriminator of a GAN to make the output and the gradients differentially private with respect to the training data; the privacy-preserving artificial dataset is then synthesized by the generator. In addition to DP dataset publishing, differentially private model publishing for deep learning is addressed in Yu et al. [2019], where concentrated DP and a dynamic privacy budget allocator are adopted to improve model accuracy.
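The Gaussian noise layer used in the DP-GAN approach above admits a very small sketch. The class below is an illustrative framework-agnostic stand-in, assuming only that noise is injected during training so that everything downstream becomes noisy with respect to the training data:

```python
import numpy as np

class GaussianNoiseLayer:
    """Adds calibrated Gaussian noise to its input during training.

    Placed inside a discriminator, it makes the discriminator's output
    (and hence the gradients flowing to the generator) noisy with
    respect to the real training examples.
    """
    def __init__(self, sigma, rng=None):
        self.sigma = sigma
        self.rng = np.random.default_rng(rng)

    def forward(self, x, training=True):
        if not training:
            return x
        return x + self.rng.normal(scale=self.sigma, size=x.shape)
```

The noise scale sigma is what privacy accounting must be calibrated against; larger sigma gives stronger privacy at the cost of GAN sample quality.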
Geyer et al. [2018] studied differentially private federated learning and proposed an algorithm for client-level DP-preserving federated optimization. It was shown that client-level DP is feasible and that high model accuracy can be achieved when sufficiently many participants take part in federated learning.
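The core server-side operation of client-level DP federated averaging can be sketched as follows: clip each client's model update (rather than each example's gradient), average, and add Gaussian noise. Function and parameter names here are illustrative assumptions in the spirit of the cited algorithm:

```python
import numpy as np

def dp_fed_avg(global_w, client_ws, clip, noise_multiplier, rng=None):
    """One client-level DP federated averaging round.

    Each client's model delta is clipped to norm `clip`, the clipped
    deltas are averaged, and Gaussian noise is added so that no single
    client's contribution can be distinguished.
    """
    rng = np.random.default_rng(rng)
    deltas = []
    for w in client_ws:
        d = w - global_w
        norm = np.linalg.norm(d)
        deltas.append(d * min(1.0, clip / max(norm, 1e-12)))
    m = len(deltas)
    noise = rng.normal(scale=noise_multiplier * clip / m,
                       size=global_w.shape)
    return global_w + np.mean(deltas, axis=0) + noise
```

Because the noise standard deviation is divided by the number of participating clients m, a larger federation drowns the noise in the average, which is why accuracy improves with sufficiently many participants.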
CHAPTER 3
Distributed Machine Learning
As we know from Chapter 1, federated learning and distributed machine learning (DML) share several common features, e.g., both employ decentralized datasets and distributed training. Federated learning is even regarded by some researchers as a special type of DML, see, e.g., Phong and Phuong [2019], Yu et al. [2018], Konecný et al. [2016b], and Li et al. [2019], or as the future and the next step of DML. In order to gain deeper insights into federated learning, in this chapter we provide an overview of DML, covering both the scalability-motivated and the privacy-motivated paradigms.
DML covers many aspects, including distributed storage of training data, distributed operation of computing tasks, and distributed serving of model results. There exists a large volume of survey papers, books, and book chapters on DML, such as Feunteun [2019], Ben-Nun and Hoefler [2018], Galakatos et al. [2018], Bekkerman et al. [2012], Liu et al. [2018], and Chen et al. [2017]. Hence, we do not intend to provide another comprehensive survey on this topic. We focus here on the aspects of DML that are most relevant to federated learning, and refer the readers to the references for more details.
3.1 INTRODUCTION TO DML
3.1.1 THE DEFINITION OF DML
DML, also known as distributed learning, refers to multi-node machine learning (ML) or deep learning (DL) algorithms and systems that are designed to improve performance, preserve privacy, and scale to more training data and bigger models [Trask, 2019, Liu et al., 2017, Galakatos et al., 2018]. For example, Figure 3.1 illustrates a DML system with three workers (a.k.a. computing nodes) and one parameter server [Li et al., 2014]. The training data are split into disjoint data shards and sent to the workers, and the workers carry out stochastic gradient descent (SGD) locally. The workers send their gradients Δw_i or model weights w_i to the parameter server, where they are aggregated (e.g., by taking a weighted average) to obtain the global gradient Δw or global model weights w. Both synchronous and asynchronous SGD algorithms can be applied in DML [Ben-Nun and Hoefler, 2018, Chen et al., 2017].
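The aggregation step at the parameter server can be sketched as a weighted average of the workers' gradients, with weights proportional to each worker's shard size. This is a minimal illustration of a synchronous SGD round, with assumed function names:

```python
import numpy as np

def aggregate(worker_grads, shard_sizes):
    """Parameter-server aggregation: weighted average of worker
    gradients, weighted by the size of each worker's data shard."""
    weights = np.asarray(shard_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * g for w, g in zip(weights, np.asarray(worker_grads)))

def server_step(w, worker_grads, shard_sizes, lr):
    """One synchronous SGD round: the server waits for all workers,
    aggregates their gradients, and applies a single update."""
    return w - lr * aggregate(worker_grads, shard_sizes)
```

In an asynchronous variant, the server would instead apply each worker's gradient as it arrives, trading staleness for reduced waiting time.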