Group of authors

Computational Statistics in Data Science



$$\boldsymbol{o}^{(t)} = \sigma_g\left(\boldsymbol{W}_{1o}\boldsymbol{x}^{(t)} + \boldsymbol{W}_{2o}\boldsymbol{h}^{(t-1)} + \boldsymbol{b}_o\right)$$

      where $\boldsymbol{W}$ and $\boldsymbol{b}$ are weight matrices and biases, and $\sigma_g(z) = \frac{1}{1 + \exp(-z)}$ is the sigmoid function.
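The sigmoid $\sigma_g(z) = 1/(1 + e^{-z})$ and a single gate can be illustrated with a short Python sketch. It assumes a single scalar unit, so each weight matrix and bias reduces to a float, and it assumes the gate takes the form $\sigma_g(\boldsymbol{W}_{1o}\boldsymbol{x}^{(t)} + \boldsymbol{W}_{2o}\boldsymbol{h}^{(t-1)} + \boldsymbol{b}_o)$ with input $\boldsymbol{x}^{(t)}$. The helper names `sigmoid` and `gate` are hypothetical, not taken from the chapter.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid sigma_g(z) = 1 / (1 + exp(-z)); mathematically its value lies in (0, 1)."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gate(x_t: float, h_prev: float, w1: float, w2: float, b: float) -> float:
    """Hypothetical scalar gate, e.g. o^(t) = sigma_g(W_1o x^(t) + W_2o h^(t-1) + b_o)."""
    return sigmoid(w1 * x_t + w2 * h_prev + b)


print(sigmoid(0.0))                               # 0.5
print(round(gate(1.0, 0.0, 5.0, 1.0, 0.0), 4))   # 0.9933, gate nearly open
print(round(gate(1.0, 0.0, -5.0, 1.0, 0.0), 4))  # 0.0067, gate nearly closed
```

Branching on the sign of $z$ is one common way to avoid overflow in `math.exp` for large negative inputs; it does not change the value of the function.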

      The two hidden states $\boldsymbol{h}^{(t)}$ and $\boldsymbol{c}^{(t)}$ are calculated by

$$\boldsymbol{h}^{(t)} = \boldsymbol{o}^{(t)} \circ \tanh\left(\boldsymbol{c}^{(t)}\right) \tag{14}$$

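$$\boldsymbol{c}^{(t)} = \boldsymbol{f}^{(t)} \circ \boldsymbol{c}^{(t-1)} + \boldsymbol{i}^{(t)} \circ \tanh\left(\boldsymbol{W}_{1c}\boldsymbol{x}^{(t)} + \boldsymbol{W}_{2c}\boldsymbol{h}^{(t-1)} + \boldsymbol{b}_c\right)$$

      where $\circ$ denotes the elementwise product. Since the loss $\ell^{(T)}$ depends on $\boldsymbol{W}_{1f}$ through every cell state $\boldsymbol{c}^{(t)}$, backpropagation through time gives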
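A single LSTM step that combines the three gates with the state updates of Eq. (14) can be sketched as follows. This is a minimal illustration, not the chapter's implementation: it assumes a scalar unit, the standard candidate-cell form $\tanh(W_{1c}x^{(t)} + W_{2c}h^{(t-1)} + b_c)$ for the update of $c^{(t)}$, and a hypothetical parameter dictionary keyed by names such as `"W1f"` and `"bf"`.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable form of 1 / (1 + exp(-z)) via the identity (1 + tanh(z/2)) / 2.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def lstm_step(x_t: float, h_prev: float, c_prev: float, p: dict) -> tuple:
    """One step of a scalar LSTM cell.

    `p` is a hypothetical parameter dictionary with keys W1*, W2*, b* for the
    input (i), forget (f), output (o) gates and the candidate cell value (c).
    """
    i_t = sigmoid(p["W1i"] * x_t + p["W2i"] * h_prev + p["bi"])    # input gate
    f_t = sigmoid(p["W1f"] * x_t + p["W2f"] * h_prev + p["bf"])    # forget gate
    o_t = sigmoid(p["W1o"] * x_t + p["W2o"] * h_prev + p["bo"])    # output gate
    g_t = math.tanh(p["W1c"] * x_t + p["W2c"] * h_prev + p["bc"])  # candidate cell value (assumed form)
    c_t = f_t * c_prev + i_t * g_t   # keep part of the old cell state, add part of the candidate
    h_t = o_t * math.tanh(c_t)       # Eq. (14): h = o * tanh(c), elementwise
    return h_t, c_t


# Usage: with every weight and bias set to zero, all gates equal 0.5 and the
# candidate value is 0, so the cell state is simply halved.
names = [w + g for g in "ifoc" for w in ("W1", "W2", "b")]
params = {name: 0.0 for name in names}
h, c = lstm_step(1.0, 0.0, 1.0, params)
print(round(c, 4), round(h, 4))  # 0.5 0.2311
```

Raising the forget bias `bf` pushes $f^{(t)}$ toward 1, so the old cell state is carried forward almost unchanged; this is the mechanism the gradient argument below relies on.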

$$\begin{aligned}
\frac{\partial \ell^{(T)}}{\partial \boldsymbol{W}_{1f}} &= \sum_{t=0}^{T} \frac{\partial \ell^{(T)}}{\partial \boldsymbol{h}^{(T)}} \frac{\partial \boldsymbol{h}^{(T)}}{\partial \boldsymbol{c}^{(T)}} \left(\prod_{j=t+1}^{T} \frac{\partial \boldsymbol{c}^{(j)}}{\partial \boldsymbol{c}^{(j-1)}}\right) \frac{\partial \boldsymbol{c}^{(t)}}{\partial \boldsymbol{W}_{1f}}\\
&= \sum_{t=0}^{T} \frac{\partial \ell^{(T)}}{\partial \boldsymbol{h}^{(T)}} \frac{\partial \boldsymbol{h}^{(T)}}{\partial \boldsymbol{c}^{(T)}} \left(\prod_{j=t+1}^{T} \left(\boldsymbol{f}^{(j)} + A^{(j)}\right)\right) \frac{\partial \boldsymbol{c}^{(t)}}{\partial \boldsymbol{W}_{1f}}
\end{aligned} \tag{15}$$

      where the $A$ terms collect the remaining contributions to $\partial \boldsymbol{c}^{(j)}/\partial \boldsymbol{c}^{(j-1)}$, which arise because $\boldsymbol{c}^{(j)}$ also depends on $\boldsymbol{c}^{(j-1)}$ indirectly through $\boldsymbol{h}^{(j-1)}$. Because $\boldsymbol{i}^{(t)}$, $\boldsymbol{f}^{(t)}$ and $\boldsymbol{o}^{(t)}$ are computed with the sigmoid function, their entries lie in $(0, 1)$ and in practice are often close to either 0 or 1. When the forget gate is close to 1, the forget-gate term of each factor in the product stays close to 1, so the product does not shrink toward zero and the gradient does not vanish. When it is close to 0, the previous information is not useful for the current state and is forgotten.
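The role of the forget gate in Eq. (15) can be checked numerically. The sketch below assumes a scalar cell whose recurrent weights $W_{2\cdot}$ are zero, so the gates depend only on the input and the extra $A$ terms vanish; in that special case $\partial c^{(T)}/\partial c^{(0)}$ is exactly the product of the forget-gate values, which the code compares with a central finite difference. The function names, default weights and inputs are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def run_cell(c0, xs, w1f, bf, w1i=0.5, bi=0.0, w1c=1.0, bc=0.0):
    """Roll a scalar cell state forward from c0 over the inputs xs.

    Simplifying assumption: all recurrent weights W_2* are zero, so the gates
    depend only on x^(j) and the extra terms A in Eq. (15) vanish.
    Returns the final cell state and the list of forget-gate values.
    """
    c = c0
    forget = []
    for x in xs:
        f = sigmoid(w1f * x + bf)                 # forget gate f^(j)
        i = sigmoid(w1i * x + bi)                 # input gate i^(j)
        c = f * c + i * math.tanh(w1c * x + bc)   # cell update c^(j)
        forget.append(f)
    return c, forget


def grad_c_T_wrt_c0(xs, w1f, bf, eps=1e-3):
    """Product of forget gates versus a central finite difference of c^(T) in c^(0)."""
    _, forget = run_cell(0.0, xs, w1f, bf)
    analytic = math.prod(forget)
    c_plus, _ = run_cell(eps, xs, w1f, bf)
    c_minus, _ = run_cell(-eps, xs, w1f, bf)
    numeric = (c_plus - c_minus) / (2.0 * eps)
    return analytic, numeric


xs = [math.sin(0.3 * t) for t in range(20)]  # arbitrary illustrative inputs, T = 20
for bf in (5.0, 0.0):  # forget gate close to 1 versus close to 0.5
    analytic, numeric = grad_c_T_wrt_c0(xs, w1f=0.1, bf=bf)
    print(f"b_f = {bf}: product of f = {analytic:.3e}, finite difference = {numeric:.3e}")
```

With a forget bias of 5 the product stays above 0.85 after 20 steps, whereas with a bias of 0 (forget gate near 0.5) it drops below $10^{-5}$. In both cases the finite difference agrees with the product, because the extra terms are zero under this assumption.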

      This chapter discussed the architectures of four types of neural networks and their extensions. Many other neural networks have been proposed in recent years, but the ones discussed here are the classical architectures that serve as the foundation for much subsequent work. Although DNNs have achieved breakthroughs in many fields, their performance on many tasks is still far from perfect. Developing new architectures that improve performance on existing tasks or solve new problems is an important research direction, and analyzing the properties and limitations of existing architectures is also of great interest to the community.
