Kansas Geological Survey, Computer Contributions 46, originally published in 1970


FORTRAN IV Program for Q-Mode Cluster Analysis on Distance Function with Printed Dendrogram

by James M. Parks

Lehigh University

Introduction

In the statistical literature, "classification" means the assignment of elements to classes defined a priori (Rohlf, 1963, p. 3). In taxonomy (biology and paleontology), geology, oceanography and this paper, classification is the act or result "of putting similar objects into an unknown number of distinct categories, with the objects in each category being more similar to each other than to the objects in all other categories" (Parks, 1966, p. 703). Various techniques of cluster analysis (also termed "numerical taxonomy"; Sokal and Sneath, 1963) have been used to find "natural" classifications inherent in the data (some recent examples are Sokal and Sneath, 1963; Parks, 1964, 1966; and Bonham-Carter, 1965, 1967). Factor analysis (Imbrie and Purdy, 1962), principal components analysis (McCammon, 1966), canonical analysis (Oxnard, 1969), multiple component analysis (McCammon, 1968a) and optimization of an objective function (Ward, 1963) also have been used for this purpose. Most of these methods produce essentially similar results if applied to the same data (Parks, 1966, 1969).

Early versions of the computer program described in this paper were written in the MAD language for an IBM 704 computer while I was with The Pure Oil Company Research Center. Later, at the Union Oil Company of California Research Center, I rewrote the early version in FORTRAN for an IBM 1620, utilizing disk storage for the similarity matrix. A subsequent version, in which the similarity matrix did not have to be stored at all, was written in FORTRAN for an IBM 360/30 at the Union Research Center. A Univac 1108 was used for the first successful attempts at combining an R-mode principal components analysis with a Q-mode cluster analysis. At Lehigh University I rewrote these programs for a GE 225 computer, utilizing a segment-link-chain technique so that the entire program did not have to occupy memory at one time, but the relatively small memory of the GE 225 imposed a limit of 200 samples. Finally, the program has been rewritten in FORTRAN IV for Lehigh's CDC 6400 computer, using overlays to achieve a capacity of 1000 samples and 200 variables on a 65k memory machine (with 4 scratch tapes). I wish to thank the Computing Center of Lehigh University for providing unsponsored research computer time and helpful advice.

The complete text of this report is available as an Adobe Acrobat PDF file.

Read the PDF version (3 MB)


Kansas Geological Survey
Placed on web Aug. 23, 2019; originally published 1970.
The URL for this page is http://www.kgs.ku.edu/Publications/Bulletins/CC/46/index.html