PREFACE

This monograph presents the development of mathematical statistics using models of large dimension and, what is more essential, models in which the number of unknown parameters is so large that it is comparable in magnitude with the sample size or may even exceed it. This branch of statistics is distinguished by a new concept of statistical investigation, by its problem settings, specific phenomena, methods, and results. The theory may be called "multiparametric statistics" or, perhaps more meaningfully, "essentially multiparametric statistics".

At the foundation of the presented theory lie certain restrictions of principle imposed on the degree of dependence of variables, which make it possible to escape the "curse of dimensionality", that is, the necessity of analyzing a combinatorially large variety of abstract possibilities. These restrictions reduce to a bound on the invariant maximum fourth moment of variables and to the requirement of decreasing variance of quadratic forms. They may be called "conditions of applicability of the multiparametric approach", and they seem quite acceptable for the majority of real statistical problems of practical importance; a formal sketch of the two conditions is given below.
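
For concreteness, the two conditions may be written in the following plausible form (the notation $M$, $\nu$, and the normalizations here are illustrative assumptions of this sketch rather than quotations from the text):
$$
M=\sup_{|e|=1}\mathbf{E}\,(e^{T}\mathbf{x})^{4}<\infty,
\qquad
\nu=\sup_{\|\Omega\|\le 1}\operatorname{var}\!\left(\frac{\mathbf{x}^{T}\Omega\,\mathbf{x}}{n}\right)\to 0,
$$
where the suprema are taken over nonrandom unit vectors $e$ and nonrandom symmetric matrices $\Omega$ of spectral norm at most 1.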

A statistician using many-parameter models is hardly interested in the magnitudes of separate parameters and would rather obtain a more satisfactory solution of his concrete statistical problem. The consistency and unbiasedness of estimators, important in low dimensions, are not the most desirable properties in many-dimensional problems. The classical Fisher concept of statistical investigation as the sharpening of our knowledge in a process of unbounded accumulation of data is replaced by the necessity of making optimal decisions for a fixed sample size. In this situation, new estimators are preferable that can provide the maximum gain in the purpose function. This means, first, a turn to the Wald decision-function approach and to the concepts of efficiency and dominance of estimators.
However, it is well known that, for the most part, the best equivariant estimators have not been found to date. The main obstacle, certainly, is that standard quality functions depend on unknown parameters. Fortunately, as is shown in this book, this difficulty of principle may be overcome in the asymptotics of an increasing number of parameters. This asymptotic approach was proposed by A.N.Kolmogorov in 1968--1970. He suggested considering a sequence of statistical problems in which the sample size $N$ increases along with the number of parameters $n$ so that the ratio $n/N \to c > 0$. This setting differs in that a concrete statistical problem is replaced by a sequence of problems, so that the theory may be considered a tool for isolating the leading terms that approximately describe the concrete problem. The ratio $n/N$ marks the boundary of applicability of traditional methods and the origin of new multiparametric features of statistical phenomena. Their essence may be understood if we ascribe a contribution of magnitude $1/n$ to each separate parameter and compare it with the variance of standard estimators, which is of the order of $1/N$. Large ratios $n/N$ mean that the contribution of separate parameters is comparable with the "noise level" produced by the uncertainty of sample data. In this situation, the statistician has no opportunity to seek more precise values of the unknown parameters and is obliged to make practical decisions from the accessible data.
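
In summary form, this signal-to-noise comparison reads (an illustrative restatement of the argument above, not a formula from the text):
$$
\frac{\text{contribution of one parameter}}{\text{variance of its standard estimator}}
\sim\frac{1/n}{1/N}=\frac{N}{n}\to\frac{1}{c},
$$
so that for $c$ not small the two quantities remain of the same order, however large the samples are.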

The proposed theory of multiparametric statistics is advantageous for problems where $n$ and $N$ are large and the variables are boundedly dependent. In this case, the inaccuracies of estimation of a large number of parameters become approximately independent, and their summation produces an additional mechanism of averaging ("mixing") that stabilizes functions depending uniformly on their arguments. It is important that this class of functions includes standard quality functions. As a consequence, their variance proves to be small, and in the multiparametric approach we may speak not of estimation of quality functionals but of their "evaluation". This property radically changes the whole problem of estimator dominance and the methods of searching for admissible estimators. We may say that a new class of mathematical investigations in statistics appears, whose purpose is the systematic construction of \textit{asymptotically better} and \textit{asymptotically unimprovable} solutions.

Another specifically multiparametric phenomenon is the appearance of stable limit relations between sets of observable quantities and sets of parameters. These relations may have the form of Fredholm integral equations of the first kind with respect to unknown functions of the parameters. Since they are produced by random distortions in the process of observation, these relations may be called "dispersion equations". They present functional relations between sets of first and second moments of the variables and sets of their estimators. Higher-order deviations are averaged out, and this leads to a remarkable, essentially multiparametric effect: the leading terms of standard quality functions depend only on the first two moments of the variables. Practically, this means that for the majority of multivariate (regularized) statistical procedures and for a wide class of distributions, standard quality functions do not differ much from those calculated under the normality assumption, and the quality of multiparametric procedures is approximately population-free. Thus, three drawbacks of the existing methods of multivariate statistics prove to be overcome at once: the instability is removed, approximately unimprovable solutions are found, and population-free quality is obtained.
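
One concrete example of such a relation is the limit spectral equation of Marchenko and Pastur for sample covariance matrices, cited below; in one common notation (the symbols here are illustrative) it reads
$$
h(z)=\int\frac{dF(\lambda)}{\lambda\,\bigl(1-c-c\,z\,h(z)\bigr)-z},
$$
where $h(z)$ is the Stieltjes transform of the limit spectrum of the sample covariance matrix, $F$ is the spectral distribution function of the true covariance matrix, and $c=\lim n/N$. Since $h(z)$ is observable while $F$ is not, this is precisely a Fredholm equation of the first kind for an unknown function of the parameters.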

Given approximately nonrandom quality functions, we may solve the appropriate extremum problems and obtain methods of constructing statistical solutions that are "asymptotically unimprovable independently of distributions".

The main result of the investigations presented in the book is a regular multiparametric technology for improving statistical solutions. Briefly, this technology is as follows. First, a generalized family of always-stable statistical solutions is chosen, depending on an a priori vector of parameters or an a priori function that fixes the chosen algorithm. Then dispersion equations are derived and applied to calculate the limit risk as a function of population parameters only. An extremal problem is solved, and the extremal a priori vector or function defining the asymptotically ideal solution is calculated. Then, by using the dispersion equations once more, this ideal solution is approximated by statistics, and a practical procedure is constructed that provides approximately maximum quality. Another way is first to isolate the leading term of the risk function and then to minimize some statistic approximating the risk. It remains to estimate the influence of the small inaccuracies produced by the asymptotic approach.

These technologies are applied in this book to obtain asymptotically improved and unimprovable solutions for a series of the most commonly used statistical problems. These include the estimation of expectation vectors, matrix shrinkage, the estimation of inverse covariance matrices, sample regression, and discriminant analysis. The same technology is applied to minimum-square solutions of large systems of empirical linear algebraic equations (given a single realization of the random coefficient matrix and the random right-hand side vector); a small numerical sketch of this last problem follows.
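
As a minimal numerical sketch of the last problem (the data model, the plain ridge-regularized solution, and the grid search below are assumptions of this illustration, not the book's unimprovable formulas):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, N = 200, 250   # unknowns comparable in number with equations

# One realization of a random system A x ~ b.
x_true = rng.normal(size=n) / np.sqrt(n)
A = rng.normal(size=(N, n))
b = A @ x_true + 0.5 * rng.normal(size=N)

def ridge_solution(A, b, t):
    """Regularized minimum-square solution (A'A + t I)^{-1} A'b."""
    return np.linalg.solve(A.T @ A + t * np.eye(A.shape[1]), A.T @ b)

# Grid search over the ridge parameter t; x_true is used here only to
# demonstrate that some t > 0 beats the unregularized solution
# (in practice t must be chosen from the data alone, as in the book).
ts = np.logspace(-2, 3, 30)
errs = [np.linalg.norm(ridge_solution(A, b, t) - x_true) for t in ts]
print("error, nearly unregularized:",
      np.linalg.norm(ridge_solution(A, b, 1e-8) - x_true))
print("error at the best grid t:   ", min(errs))
\end{verbatim}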

For practical use, the implementation of the simplest two-parametric shrinkage-ridge versions of the existing procedures may be especially interesting. They save the user from the danger of degeneration and provide solutions that are certainly improved for large $n$ and $N$. Asymptotically unimprovable values of the shrinkage and ridge parameters for the problems mentioned above are written out in the book. These two-parametric solutions improve over conventional ones, and also over one-parametric shrinkage and ridge regularization algorithms; they increase the computational effort only insignificantly and are easy to program, as the sketch below suggests.
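
For instance, a two-parametric shrinkage-ridge estimator of the inverse covariance matrix may take the following shape (a sketch with hypothetical parameter values; the unimprovable choices of the parameters alpha and t derived in the book are not reproduced here):
\begin{verbatim}
import numpy as np

def shrinkage_ridge_inverse(X, alpha, t):
    """Two-parametric estimator alpha * (S + t I)^{-1} of the inverse
    covariance matrix, where S is the sample covariance matrix.
    The ridge term t I keeps the inverse well defined even when S is
    degenerate (n comparable with or exceeding N); the shrinkage
    factor alpha scales the estimator toward zero."""
    N, n = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / N
    return alpha * np.linalg.inv(S + t * np.eye(n))

# Usage with hypothetical parameter values (not the optimal ones).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 80))   # N = 100 observations, n = 80 variables
Omega = shrinkage_ridge_inverse(X, alpha=0.5, t=0.3)
\end{verbatim}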

Investigations of the specific phenomena produced by the estimation of a large number of parameters were initiated by A.N.Kolmogorov. Under his guidance, in 1970--1972, Yu.N.Blagovechshenskii, A.D.Deev, L.D.Meshalkin, and Yu.V.Arkharov carried out the first, but basic, investigations in the increasing-dimension asymptotics. In later years, A.N.Kolmogorov also took interest in and supported the early investigations of the author of this book. The main results presented in this book are obtained in the Kolmogorov asymptotics.

The second constituent of the mathematical theory presented in this book is the spectral theory of increasing random matrices created by V.A.Marchenko and L.A.Pastur (1967) and by V.L.Girko (1975--1995), which was later applied to sample covariance matrices by the author. This theory was developed independently under the same asymptotic approach as the Kolmogorov asymptotics. Its main achievements are based on the method of spectral functions, which the author learned from reports and publications by V.L.Girko in 1983. The spectral-function method is used in this book to develop a general technology for constructing improved and asymptotically unimprovable statistical procedures that hold, distribution-free, for a wide class of distributions.

The author would like to express his sincere gratitude to the prominent scientists Yu.V.Prokhorov, V.M.Buchstaber, and S.A.Aivasian, who appreciated the fruitfulness of the multiparametric approach in statistical analysis from the very beginning, supported his investigations, and made the publication of his books possible. The author highly appreciates the significant contributions of his pupils and successors: his son Andrei Serdobolskii, who helped much in developing the theory of optimal solutions to large systems of empirical linear equations, and also V.S.Stepanov and V.A.Glusker, who performed the tiresome numerical investigations that served as convincing confirmation of the practical applicability of the theory developed in this book.