Group of authors

Computational Statistics in Data Science



interest may be in estimating a quantile of $V$. Let $F_h(v)$ be the distribution function of $h(X)$, assumed to be absolutely continuous with a continuous density $f_h(v)$. The $q$-quantile associated with $F_h$ is

$$\phi_q = F_h^{-1}(q) = \inf\{v : F_h(v) \ge q\}$$

      Sample statistics are used to estimate $\phi_q$. That is, let $\hat{\phi}_q = h(X)_{\lceil nq \rceil : n}$ be the $\lceil nq \rceil$th order statistic of $V$. Then, standard arguments for IID sampling and MCMC [11] show that $\hat{\phi}_q \xrightarrow{\text{a.s.}} \phi_q$ as $n \to \infty$.
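      As a minimal sketch of the order-statistic estimator above, the following Python function sorts the evaluated draws $h(X_1), \dots, h(X_n)$ and returns the $\lceil nq \rceil$th smallest value. The function name `order_statistic_quantile` and the use of NumPy are illustrative assumptions, not something the text prescribes.

```python
import math

import numpy as np


def order_statistic_quantile(values, q):
    """Return the ceil(n*q)-th order statistic of a one-dimensional sample.

    `values` holds h(X_1), ..., h(X_n); the name and signature are
    hypothetical and chosen only for this illustration.
    """
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    if n == 0:
        raise ValueError("sample is empty")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    # 1-based rank of the order statistic; n*q is evaluated in floating
    # point, so ranks at exact multiples may be sensitive to rounding.
    k = math.ceil(n * q)
    return v[k - 1]
```

      The same function applies to IID draws and to MCMC output, since the estimator depends on the sample only through its sorted values.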

      2.3 Other Estimators

$$\Lambda = \operatorname{Var}_F[h(X)] = \mathrm{E}_F\left[(h(X) - \theta_h)(h(X) - \theta_h)^T\right]$$

      A natural estimator is the sample covariance matrix

$$\hat{\Lambda}_n = \frac{1}{n - 1} \sum_{t=1}^{n} (h(X_t) - \hat{\theta}_h)(h(X_t) - \hat{\theta}_h)^T$$
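      The sample covariance matrix can be computed directly from its definition. The sketch below assumes the evaluated draws are stored row-wise in an array of shape $(n, p)$, with row $t$ holding $h(X_t)$; the function name `sample_covariance` is hypothetical.

```python
import numpy as np


def sample_covariance(h_values):
    """Sample covariance matrix with divisor n - 1.

    `h_values` has shape (n, p), row t holding h(X_t); a 1-D input is
    treated as p = 1. The layout is an assumption made for this sketch.
    """
    h = np.asarray(h_values, dtype=float)
    if h.ndim == 1:
        h = h[:, None]
    n = h.shape[0]
    if n < 2:
        raise ValueError("need at least two draws")
    theta_hat = h.mean(axis=0)        # Monte Carlo estimate of theta_h
    centered = h - theta_hat          # rows are h(X_t) - theta_hat
    return centered.T @ centered / (n - 1)
```

      For input of this shape, the result should agree with `np.cov(h_values, rowvar=False)`, which uses the same $n - 1$ divisor by default.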

      The strong law of large numbers and the continuous mapping theorem imply that $\hat{\Lambda}_n \xrightarrow{\text{a.s.}} \Lambda$ as $n \to \infty$. For IID samples, $\hat{\Lambda}_n$ is unbiased, but for MCMC samples under stationarity, $\hat{\Lambda}_n$ is typically biased from below [12]:

$$\mathrm{E}_F\left[\hat{\Lambda}_n\right] = \frac{n}{n - 1}\left(\Lambda - \operatorname{Var}_F(\hat{\theta}_h)\right)$$

      For MCMC samples, $\operatorname{Var}_F(\hat{\theta}_h)$ is typically larger than $\Lambda / n$, yielding biased-from-below estimation. If obtaining an unbiased estimator of $\Lambda$ is desirable, a bias correction should be done by estimating $\operatorname{Var}(\hat{\theta}_h)$ using methods described in Section 4.
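      Rearranging the expectation identity above gives $\Lambda = \frac{n-1}{n}\,\mathrm{E}_F[\hat{\Lambda}_n] + \operatorname{Var}_F(\hat{\theta}_h)$, which suggests the plug-in correction $\frac{n-1}{n}\hat{\Lambda}_n + \widehat{\operatorname{Var}}(\hat{\theta}_h)$. The sketch below uses a non-overlapping batch-means estimate for the second term. Batch means is assumed here only as one possible choice, since the text defers the actual method to Section 4; the function name and the `batch_size` parameter are hypothetical. Because the plug-in is itself an estimate, the result is at best approximately unbiased.

```python
import numpy as np


def bias_corrected_covariance(h_values, batch_size):
    """Plug-in bias correction of the sample covariance matrix.

    Assumption for this sketch: Var(theta_hat) is estimated by
    non-overlapping batch means; other estimators could be substituted.
    `h_values` has shape (n, p), row t holding h(X_t).
    """
    h = np.asarray(h_values, dtype=float)
    if h.ndim == 1:
        h = h[:, None]
    n_batches = h.shape[0] // batch_size
    if n_batches < 2:
        raise ValueError("need at least two full batches")
    n = n_batches * batch_size
    h = h[:n]                                  # drop an incomplete last batch

    theta_hat = h.mean(axis=0)
    centered = h - theta_hat
    lambda_hat = centered.T @ centered / (n - 1)

    # Means of consecutive blocks of `batch_size` draws, shape (n_batches, p).
    batch_means = h.reshape(n_batches, batch_size, -1).mean(axis=1)
    bm_centered = batch_means - theta_hat
    # Batch-means estimate of the asymptotic covariance of sqrt(n) * theta_hat.
    sigma_hat = batch_size * (bm_centered.T @ bm_centered) / (n_batches - 1)
    var_theta_hat = sigma_hat / n              # estimate of Var(theta_hat)

    return (n - 1) / n * lambda_hat + var_theta_hat
```

      With `batch_size=1` the correction term reduces to $\hat{\Lambda}_n / n$ and the function returns $\hat{\Lambda}_n$ itself, matching the IID case in which no correction is needed. For positively autocorrelated chains, larger batches typically increase the estimate relative to $\hat{\Lambda}_n$.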

      An asymptotic sampling distribution for estimators in the previous section can be used to summarize the Monte Carlo variability, provided it is available and the limiting variance is estimable. For IID sampling, moment conditions for the function of interest, $h$, with respect to the target distribution, $F$, suffice. For MCMC sampling, more care needs to be taken to ensure that a limiting distribution holds. We present a subset of the conditions under which the estimators exhibit a normal limiting distribution [9, 13]. The main Markov chain assumption is that of polynomial ergodicity. Let $\|\cdot\|_{TV}$ denote the total-variation distance. Let $P^t$ be the $t$-step Markov chain transition kernel, and let $M : \mathcal{X} \to \mathbb{R}^{+}$ be such that $\mathrm{E}M < \infty$ and for $\xi > 0$,

$$\left\| P^t(x, \cdot) - F(\cdot) \right\|_{TV} \le M(x)\, t^{-\xi}$$

      for all