
Kansas Geological Survey, Computer Contributions 39, originally published in 1969


FORTRAN IV Program for Generalized Statistical Distance and Analysis of Covariance Matrices for the CDC 3600 Computer

by R.A. Reyment (1), Hans-Åke Ramdén (1), and Warren J. Wahlstedt (2)

(1) University of Uppsala and (2) Cities Service Oil Co.



Abstract

Heterogeneity in sample covariance matrices, deriving from differences in the orientation of the major axes of the ellipsoids of scatter, may be of common occurrence. The generalized test of equality of covariance matrices will give a significant result in instances where the scatter ellipsoids are (1) unequally inflated, although identically oriented, (2) identically inflated but differently oriented, and (3) a combination of these conditions. Equations of the major axes of a scatter ellipsoid of morphologic variables represent growth patterns in the variables. An approximate application of the asymptotic test developed by T. W. Anderson is used here to identify the structure of the heterogeneity between two covariance matrices where it exists. The foregoing procedures are preliminary to a treatment of generalized distances in which the path taken by the computer program is decided by the structure of the covariance matrices of the samples. Depending on the nature of the covariance matrices, either Mahalanobis' generalized distance or the Anderson-Bahadur distance for heterogeneous covariance matrices is computed. Tests of significance of the results are provided, and the linear discriminant function coefficients are produced as a by-product.

Introduction

The generalized statistical distance is probably the best known and most widely used of the multivariate techniques employed in taxonomic work. It is also widely used in other contexts.

Calculations performed by this program may be conceived in terms of the geometric properties of two ellipsoids, here the ellipsoids of scatter. Strict application of the statistical theory involved requires that the variables be normally distributed. The ellipsoid of scatter associated with a multivariate normal distribution represents the variability of the population and is directly analogous to the variance of univariate statistics. In univariate statistics, when two populations are compared on the basis of samples drawn from them, one will wish to make sure that the variances of the variable being analyzed are statistically identical for both populations. One will also want to know whether the variables are normally distributed. The same reasoning applies in multivariate analysis. It is not difficult to test for homogeneity of univariate variances, but the multivariate analog is associated with certain drawbacks and difficulties.

If the variables are distributed as a multivariate normal, the shape of a three-dimensional plot of many points will approximate a football flattened equally on two opposite sides. For a two-sample statistical comparison to be valid one would require the footballs to be of the same size and to be oriented in the same direction with respect to all axes. It is possible to employ a large-sample test to ascertain whether each principal axis of the ellipsoid of one sample is collinear with the corresponding principal axis of the other ellipsoid. In the bivariate situation there are four categories of difference to be recognized. For three and higher dimensions the possibility of rotation about axes has to be taken into consideration.
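
As a minimal sketch of the bivariate situation (not the program described in this report, which was written in FORTRAN IV), the following Python fragment computes the inflation and the major-axis orientation of a scatter ellipse from a sample covariance matrix, using the closed-form solution for a symmetric 2 x 2 matrix. The function names and the two small data sets are invented for illustration. The comparison is purely descriptive and is not the large-sample test mentioned above.

```python
import math


def covariance_2d(points):
    """Sample covariance terms (s_xx, s_xy, s_yy) of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    s_yy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return s_xx, s_xy, s_yy


def scatter_ellipse(s_xx, s_xy, s_yy):
    """Return (major-axis variance, minor-axis variance, major-axis angle in degrees).

    The two variances are the eigenvalues of the covariance matrix; the angle
    is that of the eigenvector belonging to the larger eigenvalue, measured
    from the x axis.
    """
    half_trace = (s_xx + s_yy) / 2.0
    root = math.sqrt(((s_xx - s_yy) / 2.0) ** 2 + s_xy ** 2)
    angle = 0.5 * math.atan2(2.0 * s_xy, s_xx - s_yy)
    return half_trace + root, half_trace - root, math.degrees(angle)


# Hypothetical samples: sample_b is sample_a mirrored in the y direction,
# so the two ellipses are identically inflated but differently oriented.
sample_a = [(1, 1), (2, 2), (3, 3), (4, 5), (5, 4)]
sample_b = [(1, 5), (2, 4), (3, 3), (4, 1), (5, 2)]

major_a, minor_a, angle_a = scatter_ellipse(*covariance_2d(sample_a))
major_b, minor_b, angle_b = scatter_ellipse(*covariance_2d(sample_b))

print("sample a:", major_a, minor_a, angle_a)
print("sample b:", major_b, minor_b, angle_b)
```

For these invented data the two ellipses have the same axis variances, but their major axes lie at right angles to one another, which corresponds to condition (2) of the abstract.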

How serious a matter is it if the covariance matrices are not equal? Experience shows that in most neontological investigations, heterogeneity in covariance matrices, although common, usually does not cause serious inaccuracies in the generalized distance. Geologic materials pose a more intricate problem. Owing to the action of geologic (nonbiologic) agencies, such as transport, considerable heterogeneity factors may be introduced into a sample. These factors may be of such an order as to cause serious inaccuracy in generalized distance computations.
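
To see where unequal covariance matrices enter, it may help to look at how the classical generalized distance is formed. The sketch below, in Python and restricted to two variables, pools the two sample covariance matrices and then evaluates D^2 = d' S^-1 d, the usual textbook form of Mahalanobis' distance, where d is the difference between the sample mean vectors. Pooling is justified only if the two matrices are homogeneous. The function names and data are hypothetical, and the Anderson-Bahadur alternative for heterogeneous matrices is not shown.

```python
def mean_and_scatter(points):
    """Mean vector and sums of squares and cross-products about the mean."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    w_xx = sum((x - mean_x) ** 2 for x, _ in points)
    w_yy = sum((y - mean_y) ** 2 for _, y in points)
    w_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return (mean_x, mean_y), (w_xx, w_xy, w_yy)


def mahalanobis_d2(sample_1, sample_2):
    """Squared generalized distance between two bivariate samples.

    Assumes (without testing) that both samples share one covariance matrix,
    which is estimated by pooling.
    """
    mean_1, w_1 = mean_and_scatter(sample_1)
    mean_2, w_2 = mean_and_scatter(sample_2)
    dof = len(sample_1) + len(sample_2) - 2

    # Pooled covariance matrix [[s_xx, s_xy], [s_xy, s_yy]].
    s_xx = (w_1[0] + w_2[0]) / dof
    s_xy = (w_1[1] + w_2[1]) / dof
    s_yy = (w_1[2] + w_2[2]) / dof
    det = s_xx * s_yy - s_xy ** 2

    dx = mean_1[0] - mean_2[0]
    dy = mean_1[1] - mean_2[1]

    # d' S^-1 d written out with the explicit inverse of a 2 x 2 matrix.
    return (s_yy * dx * dx - 2.0 * s_xy * dx * dy + s_xx * dy * dy) / det


# Hypothetical samples: sample_2 is sample_1 shifted two units along x,
# so both have exactly the same covariance matrix.
sample_1 = [(1, 1), (2, 2), (3, 3), (4, 5), (5, 4)]
sample_2 = [(3, 1), (4, 2), (5, 3), (6, 5), (7, 4)]

d2 = mahalanobis_d2(sample_1, sample_2)
print("D^2 =", d2)  # 160/19, about 8.42
```

When the two sample matrices differ markedly in inflation or orientation, the pooled matrix describes neither sample well, and that is the situation in which a distance computed this way may be seriously inaccurate.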

Acknowledgments

The original research work for this paper was done while R.A.R. was Visiting Research Scientist with the Kansas Geological Survey at The University of Kansas (1966-1967). The work was supported by Computing Grant 104104 at the University of Uppsala and Research Contracts 2320-26-7819G and -24-7540G.

The complete text of this report is available as an Adobe Acrobat PDF file.



Kansas Geological Survey
Placed on web Sept. 9, 2019; originally published 1969.
The URL for this page is http://www.kgs.ku.edu/Publications/Bulletins/CC/39/index.html