FOREWORD


In the last few decades, the accumulation of large amounts of information in numerous applications has stimulated an increased interest in multivariate analysis. Computer technologies now allow one to use multi-dimensional and multi-parametric models successfully. At the same time, interest has arisen in statistical analysis under a deficiency of sample data. Nevertheless, it is difficult to describe the current state of affairs in applied multivariate methods as satisfactory. Unimprovable (dominating) statistical procedures are still unknown except in a few specific cases. The simplest problem of estimating the mean vector with minimum quadratic risk remains unsolved, even for normal distributions. Commonly used standard linear multivariate procedures based on the inversion of sample covariance matrices can lead to unstable results or provide no solution at all, depending on the data. Programs included in standard statistical packages cannot process `multi-collinear data', and there are no theoretical recommendations except to ignore a part of the data. The probability of data degeneration increases with the dimension n, and for n > N, where N is the sample size, the sample covariance matrix has no inverse. Thus nearly all conventional linear methods of multivariate statistics prove to be unreliable or even inapplicable to high-dimensional data.
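As a small numerical illustration of the last point (a sketch in Python with NumPy, added here for concreteness and not taken from the original text): when the number of variables exceeds the number of observations, the sample covariance matrix is rank-deficient and cannot be inverted.

    # Sketch: with n > N the sample covariance matrix degenerates.
    import numpy as np

    rng = np.random.default_rng(0)
    n, N = 50, 20                      # dimension larger than the sample size
    X = rng.standard_normal((N, n))    # N observations of an n-dimensional vector
    S = np.cov(X, rowvar=False)        # n-by-n sample covariance matrix

    print(np.linalg.matrix_rank(S))    # at most N - 1 = 19, far below n = 50
    # np.linalg.inv(S) would raise an error or return a numerically meaningless result

Since the rank of S is at most N - 1, any procedure that relies on inverting the sample covariance matrix simply fails in this regime.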
This situation is by no means caused by a lack of the theoretical support needed to advance multivariate analysis. The traditional Fisher approach was developed for classical problems with simple models and arbitrarily large samples. The principal requirement on statistical estimators was consistency, i.e., convergence to the true values for a fixed model as the sample size increases. Traditionally, statistical procedures are constructed by substituting consistent estimators into the extremal theoretical solutions (the `plug-in' procedure).
However, component-wise consistency does not provide satisfactory solutions to the problems of multivariate analysis. In high dimensions, the cumulative effect of estimating a large number of parameters can lead to a substantial loss of quality and to the breakdown of multivariate procedures.
It is well known that classical mathematical investigations in multivariate statistical analysis were reduced to the calculation of certain exact distributions and their functions under the assumption that the observations are normal. The well-developed traditional asymptotic theory of statistics is oriented toward one-dimensional and low-dimensional problems. Its formal extrapolation to multi-dimensional problems (replacing scalars by vectors without analyzing the specific effects) enriched statistics neither with new methods nor with qualitatively new theoretical results. One can say that the central problems of multivariate analysis remain unsolved.
Essential progress was achieved after a number of investigations in 1970--1974 pioneered by A.N. Kolmogorov. He suggested a new asymptotic approach in which the sample size N and the dimension n of the variables increase simultaneously so that the ratio n/N tends to a constant. This constant became a new parameter of the asymptotic theory. In contrast to the traditional asymptotic approach of mathematical statistics, this new approach was called the increasing dimension asymptotics. The investigation of terms of the order of magnitude n/N led to the discovery of a series of new phenomena specific to high-dimensional problems, such as the accumulation of estimation errors, the appearance of finite biases and multiples, and a certain normalization effect: under some `restricted dependence conditions', all distributions prove to be equivalent to normal distributions with respect to functionals that depend uniformly on the variables. In particular, this means that the standard quality functions of multivariate procedures prove to be approximately distribution-free and that, at last, we obtain a tool for comparing different versions of procedures.
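The role of the ratio n/N can be seen in a rough simulation (again an illustrative sketch in Python with NumPy, not part of the original exposition): even when the true covariance matrix is the identity, the eigenvalues of the sample covariance matrix spread over an interval whose width is governed by n/N, one visible form of the accumulation of estimation errors mentioned above.

    # Sketch: spread of sample covariance eigenvalues as the ratio c = n/N grows.
    import numpy as np

    rng = np.random.default_rng(0)
    for n, N in [(10, 100), (50, 100), (90, 100)]:   # ratios c = 0.1, 0.5, 0.9
        X = rng.standard_normal((N, n))              # true covariance = identity
        S = np.cov(X, rowvar=False)
        eig = np.linalg.eigvalsh(S)
        print(f"c = n/N = {n/N:.1f}: sample eigenvalues lie in "
              f"[{eig.min():.2f}, {eig.max():.2f}] although every true eigenvalue is 1")

As the ratio approaches 1, the smallest sample eigenvalues approach zero, which is precisely why procedures based on inverting sample covariance matrices become unstable in high dimensions.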
