
Computational Statistics in Data Science

Here, $\hat{\boldsymbol{x}}$ is the output of an autoencoder, and $L(\cdot,\cdot)$ represents the loss function that captures the distance between an input and its corresponding output.

The output of the encoder part is known as the embedding, which is the compressed representation of the input learned by an autoencoder. Autoencoders are useful for dimension reduction, since the dimension of an embedding vector can be set to be much smaller than the dimension of the input. The embedding space is called the latent space, the space in which the autoencoder manipulates the distances between data points. An advantage of the autoencoder is that it can perform unsupervised learning tasks that do not require any labels for the input. Therefore, an autoencoder is sometimes used in a pretraining stage to obtain a good initial point for downstream tasks.
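The encoder, embedding, and reconstruction loss described above can be sketched in a few lines. This is a minimal illustration only: the linear encoder and decoder, the random untrained weights, the dimensions (4 inputs, 2 latent), and the squared-error choice for the loss are all assumptions made for the example, not details from the text.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def encode(x, w_enc):
    """Encoder: map the input to a lower-dimensional embedding."""
    return matvec(w_enc, x)

def decode(z, w_dec):
    """Decoder: map the embedding back to the input space."""
    return matvec(w_dec, z)

def squared_error(x, x_hat):
    """One possible loss: squared Euclidean distance between input and output."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

# Hypothetical sizes and random (untrained) weights, chosen for illustration.
rng = random.Random(0)
input_dim, latent_dim = 4, 2
w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(latent_dim)]
w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for _ in range(input_dim)]

x = [1.0, 0.5, -0.5, 2.0]      # an unlabeled input
z = encode(x, w_enc)           # embedding: 2 numbers instead of 4
x_hat = decode(z, w_dec)       # reconstruction in the input space
loss = squared_error(x, x_hat)
print(len(z), len(x_hat), loss)
```

Training would adjust `w_enc` and `w_dec` to reduce `loss` over many inputs. No label appears anywhere in the computation, which is why the procedure counts as unsupervised.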

5.3 Variational Autoencoder

Many different variants of the autoencoder have been developed in recent years, but the variational autoencoder (VAE) is the one that achieved a major improvement in this field. The VAE is a framework that attempts to describe an observation in latent space in a probabilistic manner. Instead of using a single value to describe each dimension of the latent space, the encoder part of the VAE uses a probability distribution to describe each latent dimension [17].
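The contrast between a single value and a distribution per latent dimension can be made concrete with a small sketch. It assumes, purely for illustration, that each latent dimension is described by a Gaussian whose mean and log-variance are linear functions of the input; the weights and function names are hypothetical.

```python
import math

def linear(weights, x):
    """Apply a linear map given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def deterministic_encoder(x, weights):
    """Plain autoencoder: one number per latent dimension."""
    return linear(weights, x)

def probabilistic_encoder(x, mean_weights, log_var_weights):
    """VAE-style encoder: a (mean, standard deviation) pair per latent dimension."""
    means = linear(mean_weights, x)
    log_vars = linear(log_var_weights, x)
    # Predicting the log-variance keeps the standard deviation positive.
    std_devs = [math.exp(0.5 * lv) for lv in log_vars]
    return list(zip(means, std_devs))

# Hypothetical weights for a 3-dimensional input and 2 latent dimensions.
mean_weights = [[0.1, 0.2, 0.3], [-0.1, 0.0, 0.1]]
log_var_weights = [[0.0, -0.1, 0.0], [0.1, 0.1, -0.1]]
x = [1.0, 2.0, 3.0]

point_embedding = deterministic_encoder(x, mean_weights)
distribution_embedding = probabilistic_encoder(x, mean_weights, log_var_weights)
print(point_embedding)
print(distribution_embedding)
```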

Figure 6 shows the structure of the VAE. The assumption is that each input data point $\boldsymbol{x}_i$ is generated by some random process conditioned on an unobserved random latent variable $\boldsymbol{z}_i$. The random process consists of two steps: $\boldsymbol{z}_i$ is first generated from some prior distribution $p_\theta(\boldsymbol{z})$, and then $\boldsymbol{x}_i$ is generated from a conditional distribution $p_\theta(\boldsymbol{x} \mid \boldsymbol{z})$. The probabilistic decoder part of the VAE performs this random generation process. We are interested in the posterior over the latent variable, $p_\theta(\boldsymbol{z} \mid \boldsymbol{x})$, but it is intractable since the marginal likelihood $p_\theta(\boldsymbol{x})$ is intractable. To approximate the true posterior, the posterior distribution over the latent variable $\boldsymbol{z}$ is assumed to be a distribution $q_\phi(\boldsymbol{z} \mid \boldsymbol{x})$ parameterized by $\phi$.
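The two-step generative process can be sketched as ancestral sampling. The specific distributions are assumptions for the example: a standard normal prior, and a Gaussian conditional whose mean is a linear function of the latent variable. The weights and noise level are hypothetical.

```python
import random

def sample_prior(latent_dim, rng):
    """Step 1: draw a latent vector from the prior, assumed standard normal here."""
    return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]

def sample_conditional(z, decoder_weights, noise_std, rng):
    """Step 2: draw an observation given z, assumed Gaussian around a linear mean."""
    means = [sum(w * v for w, v in zip(row, z)) for row in decoder_weights]
    return [rng.gauss(m, noise_std) for m in means]

rng = random.Random(42)
decoder_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]  # hypothetical values

samples = []
for _ in range(5):
    z = sample_prior(2, rng)                               # unobserved latent variable
    x = sample_conditional(z, decoder_weights, 0.1, rng)   # observed data point
    samples.append((z, x))

for z, x in samples:
    print([round(v, 3) for v in z], [round(v, 3) for v in x])
```

In a real dataset only the `x` values would be available; recovering plausible `z` values from them is the posterior inference problem described above.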

Given an observed dataset $\{\boldsymbol{x}_i\}_{i=1}^{n}$, the marginal log‐likelihood is composed of a sum over the marginal log‐likelihoods of all individual data points: $\log p_\theta(\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_n) = \sum_{i=1}^{n} \log p_\theta(\boldsymbol{x}_i)$, where each marginal log‐likelihood can be written as

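$$\log p_\theta(\boldsymbol{x}_i) = D_{\mathrm{KL}}\big(q_\phi(\boldsymbol{z} \mid \boldsymbol{x}_i) \,\|\, p_\theta(\boldsymbol{z} \mid \boldsymbol{x}_i)\big) + \mathcal{L}(\theta, \phi; \boldsymbol{x}_i) \tag{4}$$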

where the first term is the KL divergence [18] between the approximate and the true posterior, and the second term is called the variational lower bound. Since KL divergence is nonnegative, the variational lower bound is defined as

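$$\log p_\theta(\boldsymbol{x}_i) \geq \mathcal{L}(\theta, \phi; \boldsymbol{x}_i) = \mathbb{E}_{q_\phi(\boldsymbol{z} \mid \boldsymbol{x}_i)}\big[-\log q_\phi(\boldsymbol{z} \mid \boldsymbol{x}_i) + \log p_\theta(\boldsymbol{x}_i, \boldsymbol{z})\big] \tag{5}$$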
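The decomposition of the marginal log-likelihood into a KL term and a lower bound can be checked numerically in a toy model where every quantity has a closed form. The model is an assumption chosen for convenience, not one from the text: $z \sim N(0, 1)$ and $x \mid z \sim N(z, 1)$, so that $x \sim N(0, 2)$ and the true posterior is $N(x/2, 1/2)$. The lower bound is computed in the equivalent form of expected log-likelihood minus the KL divergence from the approximate posterior to the prior.

```python
import math

def kl_gaussians(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ) for univariate Gaussians."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

def log_marginal(x):
    """log p(x) in the toy model, where x ~ N(0, 2)."""
    return -0.5 * math.log(2 * math.pi * 2.0) - x ** 2 / 4.0

def lower_bound(x, m, s):
    """E_q[log p(x | z)] - KL(q || prior) for q = N(m, s^2)."""
    expected_log_lik = -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + s ** 2)
    return expected_log_lik - kl_gaussians(m, s, 0.0, 1.0)

x, m, s = 1.0, 0.3, 0.8  # one observation and an arbitrary approximate posterior

# KL between the approximate posterior and the true posterior N(x/2, 1/2).
kl_to_posterior = kl_gaussians(m, s, x / 2.0, math.sqrt(0.5))
elbo = lower_bound(x, m, s)

# The gap should vanish up to floating-point error.
gap = log_marginal(x) - (kl_to_posterior + elbo)
print(log_marginal(x), kl_to_posterior, elbo, gap)
```

Because the KL term is nonnegative, `elbo` never exceeds `log_marginal(x)`, and the two coincide when the approximate posterior equals the true posterior.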

Figure 6 Architecture of variational autoencoder (VAE).

Therefore, the loss function of training a VAE can be simplified as

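$$L(\theta, \phi; \boldsymbol{x}_i) = -\mathbb{E}_{q_\phi(\boldsymbol{z} \mid \boldsymbol{x}_i)}\big[\log p_\theta(\boldsymbol{x}_i \mid \boldsymbol{z})\big] + D_{\mathrm{KL}}\big(q_\phi(\boldsymbol{z} \mid \boldsymbol{x}_i) \,\|\, p_\theta(\boldsymbol{z})\big) \tag{6}$$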

where the first term captures the reconstruction loss, and the second term is a regularization on the embedding. To optimize the loss function (6), a reparameterization trick is used. For a chosen approximate posterior $q_\phi(\boldsymbol{z} \mid \boldsymbol{x})$, the latent variable $\tilde{\boldsymbol{z}} \sim q_\phi(\boldsymbol{z} \mid \boldsymbol{x})$ is approximated by

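$$\tilde{\boldsymbol{z}} = g_\phi(\boldsymbol{\epsilon}, \boldsymbol{x}), \qquad \boldsymbol{\epsilon} \sim p(\boldsymbol{\epsilon}) \tag{7}$$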
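A common concrete instance of the reparameterization trick, together with a loss made of a reconstruction term and a KL regularizer, can be sketched as follows. The sketch rests on several assumptions not stated in the text: a diagonal Gaussian approximate posterior, a standard normal prior, and a Gaussian decoder with unit variance, so that the reconstruction term reduces to half the squared error up to an additive constant. The `toy_decode` function and all numeric values are hypothetical.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def vae_loss(x, mu, log_var, decode, rng, num_samples=1):
    """Monte Carlo reconstruction loss plus KL regularization on the embedding."""
    recon = 0.0
    for _ in range(num_samples):
        z = reparameterize(mu, log_var, rng)
        x_hat = decode(z)
        # Negative log-likelihood of a unit-variance Gaussian, up to a constant.
        recon += 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon / num_samples + kl_to_standard_normal(mu, log_var)

def toy_decode(z):
    """Hypothetical decoder mapping 2 latent values to 3 outputs."""
    return [z[0], z[1], z[0] + z[1]]

rng = random.Random(0)
x = [0.5, -0.2, 0.3]
mu, log_var = [0.4, -0.1], [-1.0, -2.0]  # would come from the encoder in practice
loss = vae_loss(x, mu, log_var, toy_decode, rng, num_samples=10)
print(loss)
```

The randomness enters only through the noise draw, so the sample is a deterministic, differentiable function of `mu` and `log_var`. This is what allows gradients of the loss to reach the encoder parameters in an automatic-differentiation framework.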
