Kansas Geological Survey, Computer Contributions 17, originally published in 1967


FORTRAN IV Program for Q-Mode Cluster Analysis of Nonquantitative Data using IBM 7090/7094 Computers

by G. F. Bonham-Carter

Stanford University

Abstract

CLUST3 is an IBM 7090/7094 FORTRAN IV program for classifying objects into groups based on a large number of nonquantitative characters considered simultaneously. Either Sokal and Michener's coefficient or Jaccard's coefficient may be used as a measure of similarity. The mean expected value of similarity also is calculated. The weighted or unweighted pair-group method may be used for clustering and results may be displayed automatically by setting a plotting option for drawing a dendrogram. A version for the IBM 360/67 also is obtainable from the Kansas Geological Survey.

Introduction

Cluster analysis is a mathematical method of classification that has been applied in a number of fields, particularly biological taxonomy (Sokal and Sneath, 1963). In this method, a measure of resemblance or similarity is computed between all possible pairs of objects being classified; the objects are then linked progressively to form groups, by the criterion that the average similarity between members of the same group is greater than the average similarity between members of different groups. The technique is particularly useful if each object is described by a large number of variables or attributes, because under these circumstances subjective grouping 'by eye' becomes difficult.
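
As an illustration only, the procedure can be sketched in modern Python. This is not the CLUST3 source, which is written in FORTRAN IV. The function names and the sample data are hypothetical. The two coefficients follow their standard textbook definitions: Sokal and Michener's coefficient counts both joint presences and joint absences as matches, while Jaccard's coefficient ignores joint absences. The clustering routine merges the single most similar pair of groups in each cycle, using the unweighted average of the pairwise similarities. That is one common formulation of the unweighted pair-group method, and CLUST3 may differ in detail.

```python
from itertools import combinations


def simple_matching(x, y):
    """Sokal and Michener: matches (1-1 and 0-0) over all characters."""
    matches = sum(1 for p, q in zip(x, y) if p == q)
    return matches / len(x)


def jaccard(x, y):
    """Jaccard: joint presences over characters present in either object."""
    both = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    either = sum(1 for p, q in zip(x, y) if p == 1 or q == 1)
    return both / either if either else 0.0


def pair_group(data, coefficient):
    """Unweighted pair-group clustering by average similarity (a sketch).

    data maps an object name to its list of 0/1 characters.
    Returns the merges in order as (group_a, group_b, average_similarity).
    """
    # Similarity between every possible pair of objects.
    sim = {frozenset((i, j)): coefficient(data[i], data[j])
           for i, j in combinations(data, 2)}

    def average(a, b):
        # Mean similarity over all object pairs drawn from the two groups.
        total = sum(sim[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in data]  # start with one group per object
    merges = []
    while len(clusters) > 1:
        # Link the two groups with the highest average similarity.
        a, b = max(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((a, b, average(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical presence/absence data: four samples, five characters.
samples = {
    "A": [1, 1, 0, 0, 1],
    "B": [1, 1, 0, 0, 0],
    "C": [0, 0, 1, 1, 0],
    "D": [0, 0, 1, 1, 1],
}

for group_a, group_b, level in pair_group(samples, simple_matching):
    print(group_a, group_b, round(level, 3))
```

The list of merges, with the similarity level of each link, is the information a dendrogram displays. Passing `jaccard` instead of `simple_matching` changes only the similarity measure.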

Despite the usefulness of numerical classification, its application in geology is limited by the difficulty of assigning meaningful numbers to many geological attributes. For instance, properties such as the shape and color of rocks do not easily fit a metric scale of measurement. Furthermore, where measurement techniques do exist, they are laborious and time-consuming, e.g. petrographic modal analysis by point-counting.

This program is designed specifically to perform cluster analyses of data that are essentially qualitative. Observations on each object (geological sample, locality, etc.) are recorded on a two-state nominal scale, e.g. attribute present / attribute absent, character positive / character negative, etc. Semiquantitative measurements made on a scale such as abundant / present / rare / absent may also be used, after coding into a two-state form, as described below.
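
The report's own coding scheme is given in the full text and is not reproduced on this page. As an assumption, the sketch below uses additive binary coding, one widely used way of turning an ordered multistate character into two-state characters. Each state on a four-step scale becomes three 0/1 characters, and a higher state switches on more of them. The names `SCALE`, `additive_code`, and `code_sample` are hypothetical.

```python
# Ordered scale, lowest state first (assumed ordering for illustration).
SCALE = ["absent", "rare", "present", "abundant"]


def additive_code(state, scale=SCALE):
    """Code one ordered state as len(scale) - 1 two-state characters."""
    rank = scale.index(state)
    # Character k is 1 when the state lies above step k of the scale.
    return [1 if rank > k else 0 for k in range(len(scale) - 1)]


def code_sample(states, scale=SCALE):
    """Concatenate the two-state codes for every attribute of one sample."""
    coded = []
    for state in states:
        coded.extend(additive_code(state, scale))
    return coded


for state in SCALE:
    print(state, additive_code(state))
```

With this coding, adjacent states such as rare and present differ in one character, while absent and abundant differ in all three, so the ordering of the scale is preserved in the two-state data.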

By employing cluster analysis to group and simplify these data, geological properties that are difficult to measure quantitatively may be used for multivariate classification. Geological applications of cluster analysis with nonquantitative data include limestone facies analysis (Klovan, 1964; Bonham-Carter, 1965, 1967) and ecology of Recent Foraminifera and Ostracoda (Kaesler, 1966). For geological applications of cluster analysis involving fully quantitative data, see Purdy (1963), Howd (1964), Behrens (1965), and Parks (1966).

The complete text of this report is available as an Adobe Acrobat PDF file.

Read the PDF version (4.6 MB)


Kansas Geological Survey
Placed on web Aug. 29, 2019; originally published 1967.
Comments to webadmin@kgs.ku.edu
The URL for this page is http://www.kgs.ku.edu/Publications/Bulletins/CC/17/index.html