
Kansas Geological Survey, Computer Contributions 31, originally published in 1969


Multivariate Procedures and FORTRAN IV Program for Evaluation and Improvement of Classifications

by Ferruh Demirmen

Stanford University



Abstract

ITERIM is an IBM System/360 FORTRAN IV(H) program designed primarily to assess and improve classifications, although it can be used also for principal component analysis, discriminant analysis, and one-way multivariate analysis of variance. Three criteria, pooled within-groups sum of squares, Wilks' Lambda, and the sum of the eigenvalues associated with discriminant functions, are computed to assess and compare classifications. The improvement of a classification is achieved through reduction of the pooled within-groups sum of squares in the discriminant space. The classifications compared must contain the same number of items, the same number of groups, and must be defined relative to the same number of variables. A number of options, both as to computations and output, are provided.
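As an illustration of the three criteria named above, the following is a minimal sketch in Python, not a transcription of the FORTRAN IV program. It assumes the usual textbook forms: W is the pooled within-groups matrix of sums of squares and cross-products, T is the corresponding total matrix, B = T - W, the pooled within-groups sum of squares is the trace of W, Wilks' Lambda is |W|/|T|, and the sum of the eigenvalues associated with the discriminant functions is the trace of W^-1 B. The function names (sscp, det, solve, criteria) and the toy data are hypothetical; the exact definitions used by ITERIM are given in the full report.

```python
# Sketch only: three criteria for comparing classifications, in pure Python.
# Assumed definitions (standard forms, not taken from the ITERIM listing):
#   W = pooled within-groups SSCP matrix, T = total SSCP matrix, B = T - W.

def sscp(rows):
    """Sums of squares and cross-products about the mean of `rows`."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows)
             for j in range(p)] for i in range(p)]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def solve(m, rhs):
    """Solve m X = rhs for a matrix rhs (Gauss-Jordan; m must be nonsingular)."""
    n = len(m)
    a = [list(m[i]) + list(rhs[i]) for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]

def criteria(data, labels):
    """Return the three criteria for one classification of `data`."""
    p = len(data[0])
    T = sscp(data)
    W = [[0.0] * p for _ in range(p)]
    for g in set(labels):
        S = sscp([x for x, lab in zip(data, labels) if lab == g])
        W = [[W[i][j] + S[i][j] for j in range(p)] for i in range(p)]
    B = [[T[i][j] - W[i][j] for j in range(p)] for i in range(p)]
    WinvB = solve(W, B)
    return {
        "within_ss": sum(W[i][i] for i in range(p)),         # trace of W
        "wilks_lambda": det(W) / det(T),                     # |W| / |T|
        "eigen_sum": sum(WinvB[i][i] for i in range(p)),     # trace of W^-1 B
    }

# Toy data: six items, two variables, two alternative classifications.
data = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
compact = ["A", "A", "A", "B", "B", "B"]
mixed = ["A", "B", "A", "B", "A", "B"]
print(criteria(data, compact))
print(criteria(data, mixed))
```

For these toy data the compact classification gives a much smaller within-groups sum of squares and Wilks' Lambda, and a much larger eigenvalue sum, than the mixed one. Both classifications have the same items, the same number of groups, and the same variables, as the comparison requires.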

Introduction

Geologists and others dealing with multivariate classification or "cluster analysis" are faced frequently with a great diversity of techniques from which to choose (Sokal and Sneath, 1963; Ball, 1965; Williams and Dale, 1965; Fortier and Solomon, 1966; Goodall, 1966a, 1966b; Gower, 1967a; Johnson, 1967). Some of these techniques concern weighting or standardization of data, others concern similarity measures, and yet others are related to grouping of data. At present there exists little a priori rational basis for choosing between these diverse techniques, although a number of writers (Sokal and Rohlf, 1962; Eades, 1965; Minkoff, 1965; Rohlf and Sokal, 1965; Gower, 1967b) have discussed the merits and demerits of certain techniques. With different clustering techniques, the resulting classifications will be different, and it may be difficult to reconcile the conflicting classifications. A way out of this dilemma seems to be to use a variety of techniques and to evaluate, in retrospect, the resulting classifications. Such evaluation can be made either on a substantive and subjective basis or, alternatively, on an objective basis. Furthermore, it would be desirable if any of the classifications obtained by cluster analysis could be further improved by some criterion.

The computer program (ITERIM) presented here is designed primarily to evaluate and improve classifications by objective criteria. [Note: It is recognized that the word "objective" is a relative term, and the selection of a so-called objective criterion for the evaluation or improvement of a classification involves a certain amount of subjective judgment on the part of the investigator.] In addition, as intermediate steps, the program computes principal components and multiple linear discriminant functions and performs a one-way multivariate analysis of variance. Techniques used for evaluation and improvement are nonprobabilistic in nature. It is assumed that data on which a classification is based are metric in nature, that is, they consist of measurements taken on a continuous scale. For nonmetric or semiquantitative data other techniques of evaluation and improvement might be more appropriate, although, as an exploratory tool, the program may be useful for such data as well. The program accepts a classification as input. It does not do cluster analysis, nor does it assign a new item to a class. In computing the principal components, the classes are ignored and the data are treated as a whole. A number of options, both as to computations and output, are provided.

The ITERIM program described here is an outgrowth of the program originally given by Casetti (1964). The criteria used for evaluation and improvement of a classification are the same as those which Friedman and Rubin (1967) employ to "optimize" a partition in cluster analysis, although ITERIM was written before Friedman and Rubin's paper was published. The papers of Forgy (1965) and MacQueen (1966) also are cognate with the techniques utilized in the program.
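The general idea of improving a given classification by reducing the pooled within-groups sum of squares can be illustrated with a nearest-centroid relocation loop of the kind usually associated with the Forgy and MacQueen line of work. The sketch below, in Python, is an assumption-laden illustration and not ITERIM's actual procedure. It takes the item coordinates as already expressed in whatever space is used (ITERIM is described as working in the discriminant space, which this sketch does not compute), and the function names (centroid, sqdist, within_ss, relocate) are hypothetical.

```python
# Sketch only: a nearest-centroid relocation loop. Moving every item to the
# group with the nearest centroid, then recomputing centroids, cannot increase
# the pooled within-groups sum of squares. Assumption: `points` are already
# given in the space in which distances are to be measured.

def centroid(rows):
    """Mean vector of a non-empty list of points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_ss(points, labels):
    """Pooled within-groups sum of squares for one classification."""
    total = 0.0
    for g in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == g]
        c = centroid(members)
        total += sum(sqdist(p, c) for p in members)
    return total

def relocate(points, labels, max_iter=20):
    """Reassign items to nearest centroids until nothing moves."""
    labels = list(labels)
    for _ in range(max_iter):
        groups = sorted(set(labels))
        cents = {g: centroid([p for p, lab in zip(points, labels) if lab == g])
                 for g in groups}
        new = [min(groups, key=lambda g: sqdist(p, cents[g])) for p in points]
        # Stop when nothing moved, or when a pass would empty a group
        # (assumption: the number of groups is meant to stay fixed).
        if new == labels or len(set(new)) < len(groups):
            break
        labels = new
    return labels

# Toy data: the fourth item starts in the "wrong" group.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
start = ["A", "A", "A", "A", "B", "B"]
improved = relocate(points, start)
print(start, within_ss(points, start))
print(improved, within_ss(points, improved))
```

On these toy data a single pass moves the fourth item from group A to group B, and the within-groups sum of squares falls from 22.5 to about 2.67. The number of items and the number of groups are unchanged, so the starting and improved classifications remain comparable.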

The writer is indebted to Dr. Paul Switzer for many valuable and stimulating discussions, and to Drs. J.E. Klovan and F.J. Rohlf for helpful editorial suggestions. All statements herein, however, are the responsibility of the writer. Partial financial support for the development of the program was provided by a NATO Science Fellowship to the writer and by a National Science Foundation grant (NSF GP 4514) to Dr. J. W. Harbaugh. The School of Earth Sciences of Stanford University furnished most of the computer time.

The complete text of this report is available as an Adobe Acrobat PDF file.



Kansas Geological Survey
Placed on web Aug. 26, 2019; originally published 1969.
The URL for this page is http://www.kgs.ku.edu/Publications/Bulletins/CC/31/index.html