Dakota--Hodgeman County Study--5

Dakota Aquifer Program--Geologic Framework

Hodgeman County Study, part 5 of 12

Discriminant Analysis

In practice the answer to the question of how to group vectorial observations is never straightforward. First there is the problem of deciding how many groups to use, which to a large extent one can settle using external information or rules based on the total sum of errors obtained by running cluster analysis.

A second and more difficult problem is the group assignment for observations that are not like any of the typical observations in a group and at the same time poorly resemble more than one group. Cluster analysis assignment for such problematic observations is unstable and varies depending on the measure of similarity and the clustering method.

If the quality of group assignments is a concern, cluster analysis does not offer easy answers. That is the realm of discriminant analysis, which decides the assignment of any vector based on allocation probabilities. Discriminant analysis, however, requires a training classification, which in regionalized classification is the one generally produced by cluster analysis.

The basis of discriminant analysis for assigning observations to one of a given number of groups is the minimization of the total misallocation cost. The procedure assumes that it is possible to partition the attribute space into as many mutually exclusive and exhaustive regions $R_i$ as there are groups. A site is said to belong to group $i$ when the vector of attributes associated with the site falls in $R_i$.

Theorem 1

Let $\mathbf{Z}(x)$ be a vectorial random function with probability density function $f(\mathbf{z})$, and let $\mathbf{z}(x)$ be a realization of $\mathbf{Z}(x)$ at site $x$, $\mathbf{z}$ for short. Let $\pi_i$ be the proportional share of observations in the $i$th group, whose probability density function is $f_i(\mathbf{z})$. Then the probability $p_i(\mathbf{z})$ that the site characterized by $\mathbf{z}$ belongs to the $i$th group is

$$p_i(\mathbf{z}) = \frac{\pi_i \, f_i(\mathbf{z})}{f(\mathbf{z})}.$$

Proof

The proof follows directly from Bayes' Theorem. If $\pi_i$ is regarded as the a priori probability of sampling group $i$, then $p_i(\mathbf{z})$ is the posterior probability that $\mathbf{z}$ belongs to group $i$.

Optimality in terms of the misallocation cost is assured by assigning $\mathbf{z}$ to the group with the highest probability (McLachlan, 1992, p. 7).
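The allocation rule of Theorem 1 can be illustrated with a minimal Python sketch. It assumes the group densities have already been evaluated at the site, and that the overall density is the prior-weighted sum of the group densities. The function names and the numbers are hypothetical, chosen only for illustration.

```python
def posterior_probabilities(priors, densities):
    """Theorem 1: p_i(z) = pi_i * f_i(z) / f(z), with f(z) = sum_j pi_j * f_j(z)."""
    joint = [p * f for p, f in zip(priors, densities)]
    total = sum(joint)
    if total <= 0.0:
        raise ValueError("z has zero density under every group")
    return [j / total for j in joint]


def allocate(priors, densities):
    """Return (index of the most probable group, list of posteriors)."""
    post = posterior_probabilities(priors, densities)
    best = max(range(len(post)), key=post.__getitem__)
    return best, post


# Hypothetical example: three groups, densities f_i(z) already evaluated at z.
priors = [0.5, 0.3, 0.2]
densities = [0.10, 0.40, 0.05]
group, post = allocate(priors, densities)
```

Here the second group wins even though its prior is smaller than that of the first, because its density at the site is four times larger.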

The kernel of discriminant analysis is the calculation of the probabilities in Theorem 1. As is always the case in statistics, there are non-parametric and parametric methods. By far, multivariate normal methods prevail among the latter and overall.

Definition 3

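Let $\mathbf{z}(x)$ be a realization of a vectorial random function at site $x$, $\mathbf{z}$ for short. Let $\mathbf{x}_i$ be the vectorial mean and $\boldsymbol{\Sigma}_i$ the covariance matrix of group $i$. Then the Mahalanobis distance from $\mathbf{z}$ to the centroid of group $i$ is the squared weighted distance

$$d_i^2(\mathbf{z}) = (\mathbf{z} - \mathbf{x}_i)^T \, \boldsymbol{\Sigma}_i^{-1} \, (\mathbf{z} - \mathbf{x}_i).$$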
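As an illustration of Definition 3, the sketch below computes the squared weighted distance in plain Python. It solves a linear system rather than forming the inverse covariance matrix explicitly, which is one possible design choice and not something the text prescribes. The helper names and the sample values are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def mahalanobis_sq(z, centroid, cov):
    """Squared weighted distance (z - c)^T cov^{-1} (z - c)."""
    diff = [zi - ci for zi, ci in zip(z, centroid)]
    weighted = solve(cov, diff)  # cov^{-1} (z - c) without forming the inverse
    return sum(d * w for d, w in zip(diff, weighted))


# Hypothetical two-attribute group.
centroid = [1.0, 2.0]
cov = [[4.0, 0.0], [0.0, 1.0]]
d2 = mahalanobis_sq([3.0, 3.0], centroid, cov)
```

With an identity covariance matrix the result reduces to the ordinary squared Euclidean distance, which is a convenient sanity check.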

Theorem 2

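Let $\mathbf{Z}(x)$ be a normal vectorial random function with heteroscedastic normal group distributions. If $\pi_i$ is the a priori probability of group $i$ and $d_i^2(\mathbf{z})$ is the Mahalanobis distance in Definition 3, then the probability $p_i(\mathbf{z})$ that the site characterized by $\mathbf{z}$ belongs to the $i$th group is

$$p_i(\mathbf{z}) = \frac{\pi_i \, |\boldsymbol{\Sigma}_i|^{-1/2} \exp\left(-\tfrac{1}{2} d_i^2(\mathbf{z})\right)}{\sum_j \pi_j \, |\boldsymbol{\Sigma}_j|^{-1/2} \exp\left(-\tfrac{1}{2} d_j^2(\mathbf{z})\right)},$$

where the sum runs over all groups.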

Proof

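From Theorem 1,

$$p_i(\mathbf{z}) = \frac{\pi_i \, f_i(\mathbf{z})}{f(\mathbf{z})}.$$

If the probability distribution for the $i$th group is normal,

$$f_i(\mathbf{z}) = (2\pi)^{-m/2} \, |\boldsymbol{\Sigma}_i|^{-1/2} \exp\left(-\tfrac{1}{2} d_i^2(\mathbf{z})\right),$$

where $m$ is the dimension of the attribute space and $\pi$ without a subscript is the circle constant. Considering that in addition $f(\mathbf{z}) = \sum_j \pi_j f_j(\mathbf{z})$ (McLachlan, 1992, p. 5), then

$$p_i(\mathbf{z}) = \frac{\pi_i \, (2\pi)^{-m/2} \, |\boldsymbol{\Sigma}_i|^{-1/2} \exp\left(-\tfrac{1}{2} d_i^2(\mathbf{z})\right)}{\sum_j \pi_j \, (2\pi)^{-m/2} \, |\boldsymbol{\Sigma}_j|^{-1/2} \exp\left(-\tfrac{1}{2} d_j^2(\mathbf{z})\right)}.$$

Cancellation of the $(2\pi)^{-m/2}$ terms proves the theorem.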

Theorem 3

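Let $\mathbf{Z}(x)$ be a normal vectorial random function with homoscedastic normal group distributions. If $\pi_i$ is the a priori probability of group $i$ and $d_i^2(\mathbf{z})$ is the Mahalanobis distance in Definition 3 computed using the common covariance matrix $\boldsymbol{\Sigma}$, then the probability $p_i(\mathbf{z})$ that the site characterized by $\mathbf{z}$ belongs to the $i$th group is

$$p_i(\mathbf{z}) = \frac{\exp\left(-\tfrac{1}{2} d_i^2(\mathbf{z}) + \ln \pi_i\right)}{\sum_j \exp\left(-\tfrac{1}{2} d_j^2(\mathbf{z}) + \ln \pi_j\right)}.$$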

Proof

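From Theorem 2, considering that all covariance matrices are the same,

$$p_i(\mathbf{z}) = \frac{\pi_i \, |\boldsymbol{\Sigma}|^{-1/2} \exp\left(-\tfrac{1}{2} d_i^2(\mathbf{z})\right)}{\sum_j \pi_j \, |\boldsymbol{\Sigma}|^{-1/2} \exp\left(-\tfrac{1}{2} d_j^2(\mathbf{z})\right)}.$$

The proof follows by cancellation of the covariance matrix determinant term and consolidation of exponents after the trivial substitution $\pi_i = \exp(\ln \pi_i)$.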
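Once the exponents are consolidated, the probability in Theorem 3 is a ratio of exponentials of one score per group, namely minus half the Mahalanobis distance plus the logarithm of the prior. The following Python sketch evaluates that ratio. Subtracting the largest score before exponentiating is a numerical precaution added here, not part of the theorem; it leaves the ratio unchanged and avoids underflow when all distances are large. The function name and inputs are hypothetical.

```python
import math


def linear_posteriors(dist_sq, priors):
    """Theorem 3 posteriors from squared Mahalanobis distances and priors."""
    scores = [-0.5 * d + math.log(p) for d, p in zip(dist_sq, priors)]
    shift = max(scores)  # subtracting the largest score avoids underflow
    weights = [math.exp(s - shift) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical distances of one site to three group centroids.
post = linear_posteriors([1.0, 4.0, 900.0], [0.5, 0.3, 0.2])
```

When all distances are equal, the posteriors returned by this sketch coincide with the priors, as the formula requires.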

Use of the model in Theorem 3 results in hyperplanes for the boundaries of the group regions $R_i$, hence the term linear discriminant analysis. Quadratic discriminant analysis, in contrast, deals with different covariance matrices and the second-order surfaces arising from the heteroscedastic model.
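The hyperplane claim can be checked with a short derivation. This is a sketch under notation assumed here: $d_i^2(\mathbf{z})$ for the Mahalanobis distance of Definition 3, $\mathbf{x}_i$ for the centroid of group $i$, $\pi_i$ for its a priori probability, and $\boldsymbol{\Sigma}$ for the common covariance matrix. Because the denominator in Theorem 3 is the same for every group, a site lies on the boundary between groups $i$ and $j$ when

$$-\tfrac{1}{2} d_i^2(\mathbf{z}) + \ln \pi_i = -\tfrac{1}{2} d_j^2(\mathbf{z}) + \ln \pi_j.$$

Expanding $d_i^2(\mathbf{z}) = \mathbf{z}^T \boldsymbol{\Sigma}^{-1} \mathbf{z} - 2\,\mathbf{x}_i^T \boldsymbol{\Sigma}^{-1} \mathbf{z} + \mathbf{x}_i^T \boldsymbol{\Sigma}^{-1} \mathbf{x}_i$, the quadratic term $\mathbf{z}^T \boldsymbol{\Sigma}^{-1} \mathbf{z}$ appears on both sides and cancels, leaving

$$(\mathbf{x}_i - \mathbf{x}_j)^T \boldsymbol{\Sigma}^{-1} \mathbf{z} = \tfrac{1}{2}\left(\mathbf{x}_i^T \boldsymbol{\Sigma}^{-1} \mathbf{x}_i - \mathbf{x}_j^T \boldsymbol{\Sigma}^{-1} \mathbf{x}_j\right) - \ln\frac{\pi_i}{\pi_j},$$

which is linear in $\mathbf{z}$. With group-specific covariance matrices the quadratic terms no longer cancel, and the boundary becomes a second-order surface.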

Whatever the model of discriminant analysis used for regionalized classification, it requires a training set that fixes the number of groups and assigns realizations to them, so that there are data for estimating the centroid and the covariance matrix of each group.

Algorithm 2

This is a procedure for the calculation of group probabilities for vectorial measurements. The method employs a normal model of discriminant analysis.

1. Run Algorithm 1 using the whole sampling of the coregionalization.
2. Break the sampling into groups based on the total sum of errors, some external criteria, or both.
3. Use the vectorial measurements to estimate all group centroids and covariance matrices, and decide on the a priori group probabilities. The alternatives are:
   - If the sampling properly represents the true group sizes, the best estimates of the a priori probabilities are the relative proportions of sites per group.
   - If the proportions of sites per group are regarded as not being indicative of the a priori group probabilities and the user believes that he or she has better external information to assess the probabilities, the user may employ that external assessment.
   - If the proportions of sites per group are regarded as not being indicative of the a priori group probabilities and the user lacks ways to assess them, they can be made equal to $1/k$, where $k$ is the number of groups, in which case they are ignored in the calculations.
4. Compare the group covariance matrices and decide whether they are sufficiently similar to assume homoscedasticity.
5. Calculate the Mahalanobis distance in Definition 3 for measurement $\mathbf{z}$. Use the average of all group covariances if the covariance is assumed to be homoscedastic.
6. Compute for measurement $\mathbf{z}$ the probability of belonging to each of the groups,

   $$p_i(\mathbf{z}) = \frac{\exp\left(-\tfrac{1}{2} d_i^2(\mathbf{z}) + c_i\right)}{\sum_j \exp\left(-\tfrac{1}{2} d_j^2(\mathbf{z}) + c_j\right)},$$

   where, if
   - $c_i = 0$, the discriminant analysis is linear with unknown or equal a priori group probabilities;
   - $c_i = \ln \pi_i$, the discriminant analysis is linear with known and different a priori group probabilities;
   - $c_i = -\tfrac{1}{2} \ln |\hat{\boldsymbol{\Sigma}}_i|$, the discriminant analysis is quadratic with unknown or equal a priori group probabilities;
   - $c_i = \ln \pi_i - \tfrac{1}{2} \ln |\hat{\boldsymbol{\Sigma}}_i|$, the discriminant analysis is quadratic with known and different a priori group probabilities.

   The choice of $c_i$ must be consistent with the decision on homoscedasticity made in step 4. The term $d_i^2(\mathbf{z})$ is the Mahalanobis distance in Definition 3, and $\hat{\boldsymbol{\Sigma}}_i$ is the estimate of the covariance matrix for group $i$.
7. Go back to step 5 until group probabilities have been assigned to all sites in the training set.

Major statistical computer packages such as IMSL (1987) and SAS (1990) have implementations of Algorithm 2 among several other procedures.
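The probability part of Algorithm 2 (estimating centroids, covariances, and priors, then computing group probabilities) can be sketched in plain Python. This is an illustrative sketch under several assumptions that do not come from the text: only two attributes per site, so that the 2x2 inverse and determinant have closed forms; priors taken as the group proportions; and every group having at least two sites and a nonsingular covariance. The function names, the `quadratic` and `use_priors` switches, and the training data are hypothetical.

```python
import math


def centroid(rows):
    """Vectorial mean of a list of 2-attribute measurements."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def covariance(rows, mean):
    """Unbiased 2x2 sample covariance stored as (s00, s01, s11)."""
    n = len(rows)
    s00 = sum((r[0] - mean[0]) ** 2 for r in rows) / (n - 1)
    s11 = sum((r[1] - mean[1]) ** 2 for r in rows) / (n - 1)
    s01 = sum((r[0] - mean[0]) * (r[1] - mean[1]) for r in rows) / (n - 1)
    return (s00, s01, s11)


def det(cov):
    s00, s01, s11 = cov
    return s00 * s11 - s01 * s01


def mahalanobis_sq(z, mean, cov):
    """Definition 3 for 2 attributes, using the closed-form 2x2 inverse."""
    s00, s01, s11 = cov
    a, b = z[0] - mean[0], z[1] - mean[1]
    return (s11 * a * a - 2.0 * s01 * a * b + s00 * b * b) / det(cov)


def group_probabilities(z, groups, quadratic=False, use_priors=True):
    """Probability of z belonging to each training group."""
    total = sum(len(g) for g in groups)
    means = [centroid(g) for g in groups]
    covs = [covariance(g, m) for g, m in zip(groups, means)]
    if not quadratic:
        # Homoscedastic model: use the average of the group covariances.
        k = len(covs)
        common = tuple(sum(c[t] for c in covs) / k for t in range(3))
        covs = [common] * k
    scores = []
    for g, m, c in zip(groups, means, covs):
        score = -0.5 * mahalanobis_sq(z, m, c)
        if quadratic:
            score -= 0.5 * math.log(det(c))
        if use_priors:
            score += math.log(len(g) / total)  # priors = group proportions
        scores.append(score)
    shift = max(scores)  # numerical precaution; the ratio is unchanged
    weights = [math.exp(s - shift) for s in scores]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical training set: two groups of four sites each.
group_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
group_b = [[4.0, 4.0], [5.0, 4.0], [4.0, 5.0], [5.0, 5.0]]
probs = group_probabilities([1.0, 1.0], [group_a, group_b])
```

Looping `group_probabilities` over every site in the training set corresponds to the final step of the algorithm. The two switches select among the four cases listed for the probability formula.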
Statisticians concur that quadratic discriminant analysis indeed provides superior results if the group covariances are considerably different and the group sizes are large. Quadratic discriminant analysis, however, is more sensitive to deviations from multinormality and assignment errors in the training set (Lachenbruch, 1982).
Kansas Geological Survey, Dakota Aquifer Program
Updated Sept. 16, 1996.
Scientific comments to P. Allen Macfarlane
The URL for this page is HTTP://www.kgs.ku.edu/Dakota/vol1/geo/hodge5.htm