Hodgeman County Study, part 3 of 12
In classical applications of discriminant analysis, one employs a training set including direct indicators of group membership. Medical science provides good examples. In certain instances, a group of sick patients is carefully examined to prepare a clinical record relating internal conditions to their external symptoms and laboratory analyses, a costly and lengthy classification only possible after performing surgery or autopsy. In this context, based only on external symptoms and laboratory analyses, the task of discriminant analysis is to determine the probability that other patients suffer the internal conditions considered in the training set.
In the earth sciences, the prevailing situation is equivalent to
knowing only the external symptoms and never the group assignments.
Even worse, the nature and number of the groups are seldom known.
Determining the groups is as important in such studies as the
assignments themselves. In such a situation one replaces the
training set assignments by a typification using another method
able to decide both the number and characteristics of the groups.
The best alternative is cluster analysis, which, setting aside any
prior information not contained in the data, addresses the following
problem: given a coregionalization sampling of a certain size
comprising several attributes, is there any evidence for clustering
the sites into groups scattered around centroids, as against the
alternative hypothesis that they are an unstructured
coregionalization?
There are several ways to define and find the clusters, depending
partly on the metric used to determine proximity among the sites in
the multidimensional attribute space, the Euclidean distance being
the most common and the only one considered here. Geographical
distance is completely ignored in both cluster analysis and
discriminant analysis, and is considered only at the final mapping
stage of regionalized classification.
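
As an illustration of proximity in attribute space, the following is a minimal sketch in Python of the pairwise Euclidean distances among sites. The function names and the three sample sites are hypothetical and are not taken from the study. Each site is represented only by its attribute values; geographic coordinates are deliberately left out, in keeping with the paragraph above.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two sites given as equal-length attribute vectors."""
    if len(a) != len(b):
        raise ValueError("sites must have the same number of attributes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(sites):
    """Symmetric matrix of pairwise distances among sites in attribute space."""
    n = len(sites)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = euclidean_distance(sites[i], sites[j])
    return d


# Hypothetical data: one row per site, one column per attribute.
# No geographic coordinates are included.
sites = [
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
]
D = distance_matrix(sites)
```

In practice the attributes are often measured in different units, so some form of standardization before computing distances may be advisable; that step is not shown here.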
There have been many studies comparing various methods of cluster analysis using artificial data sets containing known clusters produced by Monte Carlo methods. In most of these studies, Ward's minimum variance method has had the best overall performance in reproducing the known clusters (SAS, 1990, p. 56).
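
To make the minimum variance idea concrete, the following is a naive sketch in Python of agglomerative clustering with Ward's criterion. It is not the SAS implementation, and the function names and the six sample sites are hypothetical. At each step it merges the two clusters whose union gives the smallest increase in the within-cluster sum of squares, which for clusters of sizes n_a and n_b equals n_a n_b / (n_a + n_b) times the squared Euclidean distance between their centroids.

```python
def centroid(points):
    """Mean attribute vector of a list of sites."""
    m = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(m)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq


def ward_clusters(sites, k):
    """Greedily merge clusters until k remain; returns lists of site indices."""
    clusters = [[i] for i in range(len(sites))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(
                    [sites[s] for s in clusters[i]],
                    [sites[s] for s in clusters[j]],
                )
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical data: two well-separated groups of three sites each.
sites = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
]
groups = sorted(sorted(g) for g in ward_clusters(sites, 2))
```

The sketch recomputes centroids on every pass and is only suitable for small examples. The number of groups `k` is supplied by hand here, whereas in the study it is one of the things to be determined.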