1 Introduction
Many models in the field of machine learning, such as deep neural networks (DNNs) and graphical models, are naturally represented in a layered network structure. The more layers we use in such models, the more complex the functions that are able to be represented. However, models with many layers are difficult to estimate optimally, and thus those in the machine learning field have generally opted to restrict their model to fewer layers, trading model expressivity for simplicity [1]. Deep learning explores ways to effectively train models with many hidden layers in order to retain the model's expressive powers. One of the most effective approaches to deep learning has been proposed by Hinton and Salakhutdinov [2]. Traditionally, estimating the parameters of network‐based models involves an iterative algorithm with the initial parameters being randomly chosen. Hinton's proposed method involves pretraining, or deliberately presetting in an effective manner, the parameters of the model as opposed to randomly initializing them. In this chapter, we review the architectures and properties of DNNs and discuss their applications.
We first briefly discuss the general machine learning framework and basic machine learning methodology in Section 2. We then discuss feedforward neural networks and backpropagation in Section 3. In Section 4, we explore convolutional neural networks (CNNs), the type of architectures that are usually used in computer vision. In Section 5, we discuss autoencoders, the unsupervised learning models that learn latent features without labels. In Section 6, we discuss recurrent neural networks (RNNs), which can handle sequence data.
2 Machine Learning: An Overview
2.1 Introduction
Machine learning is a field focusing on the design and analysis of algorithms that can learn from data [3]. The field originated from artificial intelligence research in the late 1950s, developing independently from statistics. However, by the early 1990s, machine learning researchers realized that a lot of statistical methods could be applied to the problems they were trying to solve. Modern machine learning is an interdisciplinary field that encompasses theory and methodology from both statistics and computer science.
Machine learning methods are grouped into two main categories, based on what they aim to achieve. The first category is known as supervised learning. In supervised learning, each observation in a dataset comes attached with a label. The label, similar to a response variable, may represent a particular class the observation belongs to (categorical response) or an output value (real‐valued response). In either case, the ultimate goal is to make inferences on possibly unlabeled observations outside of the given dataset. Prediction and classification are both problems that fall into the supervised learning category. The second category is known as unsupervised learning. In unsupervised learning, the data come without labels, and the goal is to find a pattern within the data at hand. Unsupervised learning encompasses the problems of clustering, density estimation, and dimension reduction.
2.2 Supervised Learning
Here, we state the problem of supervised learning explicitly. We have a set of training data
We wish to choose
where
2.3 Gradient Descent
The form of the function
Gradient descent is a general optimization algorithm that can be used to find the minimizer of any given function. We pick an arbitrary starting point, and then at each time point, we take a small step in the direction