Dakota Aquifer Program--Geologic Framework

Hodgeman County Study, part 5 of 12


Discriminant Analysis

In practice, the question of how to group vectorial observations z never has a straightforward answer. The first problem is deciding how many groups there should be, which to a large extent can be settled from external information or from rules based on the total sum of errors obtained by running cluster analysis.

A second and more difficult problem is the assignment of observations that do not closely resemble the typical members of any group and at the same time weakly resemble more than one group. Cluster analysis assignments for such problematic observations are unstable and vary with the measure of similarity and the clustering method.

If the quality of group assignments is a concern, cluster analysis does not offer easy answers. That is the realm of discriminant analysis, which decides the assignment of any vector z on the basis of allocation probabilities. Discriminant analysis, however, requires a training classification, which in regionalized classification is generally the one produced by cluster analysis.

The basis of discriminant analysis for assigning observations to one of a given number of groups is the minimization of the total misallocation cost. The procedure assumes that it is possible to partition the attribute space into as many mutually exclusive and exhaustive regions R_i as there are groups. A site is said to belong to group i when the vector z of attributes associated with the site falls in R_i.

Theorem 1

Let Z be a vectorial random function with probability density function f(z) and let z(x) be a realization of Z at site x, z for short. Let π_i be the proportional share of observations in the i-th group, whose probability density function is f_i(z). Then the probability p_i(z) that the site characterized by z belongs to the i-th group is

p_i(z) = \frac{\pi_i \, f_i(z)}{f(z)}

Proof

The proof follows directly from Bayes' theorem. If π_i is regarded as the a priori probability of sampling group i, then p_i(z) is the posterior probability that z belongs to group i.

The optimality in terms of the misallocation cost is assured by assigning z to the group with the highest probability (McLachlan, 1992, p. 7).
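
As a concrete illustration of Theorem 1 and the allocation rule, the following Python/NumPy sketch computes posterior group probabilities from assumed priors and group density values already evaluated at a vector z; all numbers are hypothetical and only show the arithmetic.

    # Minimal sketch of Theorem 1 and the allocation rule, assuming the group
    # densities f_i(z) have already been evaluated at the vector z.
    # The priors and density values below are hypothetical illustration numbers.
    import numpy as np

    prior = np.array([0.5, 0.3, 0.2])        # pi_i, a priori group probabilities
    density = np.array([0.02, 0.11, 0.05])   # f_i(z), group densities evaluated at z

    posterior = prior * density / np.sum(prior * density)  # p_i(z) by Bayes' theorem
    best_group = int(np.argmax(posterior))   # allocation minimizing misallocation cost

    print(posterior, best_group)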

The kernel of discriminant analysis is the calculation of the probabilities in Theorem 1. As is always the case in statistics, there are non-parametric and parametric methods. By far, multivariate normal methods prevail among the latter and overall.

Definition 3

Let z(x) be a realization of a vectorial random function at site x, z for short. Let \bar{x}_i be the vectorial mean and \Sigma_i the covariance matrix of group i. Then the Mahalanobis distance d_i^2(z) from z to the centroid of group i is the squared weighted distance

d_i^2(z) = \left(z - \bar{x}_i\right)^T \, \Sigma_i^{-1} \, \left(z - \bar{x}_i\right)
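
The following Python/NumPy sketch evaluates the squared distance in Definition 3 for an assumed vector, group mean, and covariance matrix; all values are hypothetical.

    # Sketch of the squared Mahalanobis distance in Definition 3.
    # The vector z, the group mean, and the group covariance are hypothetical.
    import numpy as np

    def mahalanobis_sq(z, mean, cov):
        """Squared Mahalanobis distance from z to the centroid of a group."""
        diff = z - mean
        return float(diff @ np.linalg.solve(cov, diff))

    z = np.array([1.2, 0.4])
    group_mean = np.array([1.0, 0.0])
    group_cov = np.array([[1.0, 0.3],
                          [0.3, 2.0]])

    print(mahalanobis_sq(z, group_mean, group_cov))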

Theorem 2

Let Z be a vectorial random function with heteroscedastic normal group distributions. If π_i is the a priori probability of group i and d_i^2(z) is the Mahalanobis distance in Definition 3, then the probability p_i(z) that the site characterized by z belongs to the i-th group is

p_i(z) = \frac{\pi_i \, |\Sigma_i|^{-1/2} \exp\left(-\frac{1}{2} d_i^2(z)\right)}{\sum_j \pi_j \, |\Sigma_j|^{-1/2} \exp\left(-\frac{1}{2} d_j^2(z)\right)}

Proof

From Theorem 1

p_i(z) = \frac{\pi_i \, f_i(z)}{f(z)} .

If the probability distribution for the i-th group is normal,

f_i(z) = (2\pi)^{-p/2} \, |\Sigma_i|^{-1/2} \exp\left(-\frac{1}{2} d_i^2(z)\right)

where p is the dimension of the attribute space. Considering in addition that f(z) = \sum_j \pi_j f_j(z) (McLachlan, 1992, p. 5), then

p_i(z) = \frac{\pi_i \, (2\pi)^{-p/2} \, |\Sigma_i|^{-1/2} \exp\left(-\frac{1}{2} d_i^2(z)\right)}{\sum_j \pi_j \, (2\pi)^{-p/2} \, |\Sigma_j|^{-1/2} \exp\left(-\frac{1}{2} d_j^2(z)\right)} .

Cancellation of the (2\pi)^{-p/2} terms proves the theorem.
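
For readers who prefer code to formulas, the Python/NumPy sketch below evaluates the heteroscedastic probabilities of Theorem 2 by combining the Mahalanobis distances with the log-determinants and priors; the means, covariance matrices, and priors are hypothetical stand-ins for estimates obtained from a training classification.

    # Sketch of the heteroscedastic (quadratic) probabilities in Theorem 2.
    # Means, covariances, and priors are hypothetical illustration values.
    import numpy as np

    def quadratic_posteriors(z, means, covs, priors):
        """p_i(z) for groups with different covariance matrices (Theorem 2)."""
        scores = []
        for mean, cov, prior in zip(means, covs, priors):
            diff = z - mean
            d2 = diff @ np.linalg.solve(cov, diff)   # d_i^2(z), Definition 3
            # log of pi_i |Sigma_i|^(-1/2) exp(-d2/2); the (2 pi)^(-p/2) factor cancels
            scores.append(np.log(prior) - 0.5 * np.linalg.slogdet(cov)[1] - 0.5 * d2)
        scores = np.array(scores)
        weights = np.exp(scores - scores.max())      # subtracting the max only rescales
        return weights / weights.sum()

    means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
    covs = [np.eye(2), np.array([[2.0, 0.5],
                                 [0.5, 1.0]])]
    print(quadratic_posteriors(np.array([1.0, 0.5]), means, covs, priors=[0.6, 0.4]))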

Theorem 3

Let Z be a vectorial random function with homoscedastic normal group distributions. If π_i is the a priori probability of group i and d_i^2(z) is the Mahalanobis distance in Definition 3, computed using the common covariance matrix \Sigma, then the probability p_i(z) that the site characterized by z belongs to the i-th group is

p_i(z) = \frac{\exp\left(\ln \pi_i - \frac{1}{2} d_i^2(z)\right)}{\sum_j \exp\left(\ln \pi_j - \frac{1}{2} d_j^2(z)\right)}

Proof

From Theorem 2, considering that all covariance matrices are equal to the common matrix \Sigma,

p_i(z) = \frac{\pi_i \, |\Sigma|^{-1/2} \exp\left(-\frac{1}{2} d_i^2(z)\right)}{\sum_j \pi_j \, |\Sigma|^{-1/2} \exp\left(-\frac{1}{2} d_j^2(z)\right)} .

The proof follows by cancellation of the covariance determinant term and consolidation of the exponents after the trivial substitution \pi_i = \exp(\ln \pi_i).

Use of the model in Theorem 3 results in hyperplanes for the boundaries of the group regions R_i, hence the term linear discriminant analysis, whereas quadratic discriminant analysis handles different covariance matrices and the second-order boundary surfaces that arise from the heteroscedastic model.
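
A corresponding Python/NumPy sketch for the homoscedastic case of Theorem 3 uses a single common covariance matrix, so the determinant terms cancel and only the priors and Mahalanobis distances remain; again, all numerical values are hypothetical.

    # Sketch of the homoscedastic (linear) probabilities in Theorem 3, using one
    # common covariance matrix for all groups.  All numbers are hypothetical.
    import numpy as np

    def linear_posteriors(z, means, common_cov, priors):
        """p_i(z) when every group shares the covariance matrix Sigma (Theorem 3)."""
        d2 = np.array([(z - m) @ np.linalg.solve(common_cov, z - m) for m in means])
        scores = np.log(priors) - 0.5 * d2      # ln pi_i - d_i^2(z) / 2
        weights = np.exp(scores - scores.max()) # common factors cancel in the ratio
        return weights / weights.sum()

    means = [np.array([0.0, 0.0]), np.array([2.0, 1.0]), np.array([-1.0, 2.0])]
    sigma = np.array([[1.0, 0.2],
                      [0.2, 1.5]])
    print(linear_posteriors(np.array([0.8, 0.6]), means, sigma,
                            priors=np.array([0.5, 0.3, 0.2])))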

Whatever the model of discriminant analysis used for regionalized classification, it requires a training set that fixes the number of groups and assigns realizations to them, thereby providing the data needed to estimate the centroid and the covariance matrix of each group.

Algorithm 2

This is a procedure for the calculation of group probabilities for vectorial measurements. The method employs a normal model of discriminant analysis.
  1. Run Algorithm 1 using the whole sampling of the coregionalization.
  2. Break the sampling into groups based on the total sum of errors, some external criteria, or both.
  3. Use the vectorial measurements to estimate all group centroids and covariance matrices, and make a decision about the a priori group probabilities. The alternatives are to regard them as unknown or equal, or as known and different, for example proportional to the group shares in the training set.
  4. Compare the group covariance matrices and decide whether they are sufficiently similar to assume homoscedasticity.
  5. Calculate the Mahalanobis distance in Definition 3 for measurement z. Use the average of all group covariances if the assumption is that the covariance is homoscedastic.
  6. Compute for measurement z the probability p_i(z) of belonging to each of the groups

    p_i(z) = \frac{\exp\left(-\frac{1}{2} D_i^2(z)\right)}{\sum_j \exp\left(-\frac{1}{2} D_j^2(z)\right)}

    where, if

    D_i^2(z) = d_i^2(z) (option 1), the discriminant analysis is linear with unknown or equal a priori group probabilities;
    D_i^2(z) = d_i^2(z) - 2 \ln \pi_i (option 2), the discriminant analysis is linear with known and different a priori group probabilities;
    D_i^2(z) = d_i^2(z) + \ln|S_i| (option 3), the discriminant analysis is quadratic with unknown or equal a priori group probabilities;
    D_i^2(z) = d_i^2(z) + \ln|S_i| - 2 \ln \pi_i (option 4), the discriminant analysis is quadratic with known and different a priori group probabilities.
    The choice of D_i^2(z) must be consistent with the decision about homoscedasticity made in step 4; d_i^2(z) is the Mahalanobis distance in Definition 3, and S_i is the estimate of the covariance matrix for group i.
  7. Go back to step 5 until group probabilities have been assigned for all sites in the training set.

Major statistical computer packages such as IMSL (1987) and SAS (1990) include implementations of Algorithm 2 among several other procedures.
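
The Python/NumPy sketch below strings steps 3 through 6 together for a small synthetic training set, assuming Algorithm 1 has already supplied group labels; the function name, the data, and the option flag are illustrative only and are not part of the packages cited above.

    # Compact sketch of steps 3-6 of Algorithm 2, assuming the cluster analysis of
    # Algorithm 1 has already produced integer group labels for a training sample.
    # The data, the homoscedasticity decision, and the option flag are hypothetical.
    import numpy as np

    def algorithm2_probabilities(samples, labels, option):
        """Group probabilities p_i(z) for every training site under options 1-4."""
        groups = np.unique(labels)
        # step 3: centroids, covariance matrices, and a priori probabilities
        means = [samples[labels == g].mean(axis=0) for g in groups]
        covs = [np.cov(samples[labels == g], rowvar=False) for g in groups]
        priors = np.array([np.mean(labels == g) for g in groups])
        # step 4: homoscedastic options (1, 2) replace S_i by the average covariance
        if option in (1, 2):
            covs = [np.mean(covs, axis=0)] * len(groups)
        probs = np.empty((len(samples), len(groups)))
        for row, z in enumerate(samples):
            D2 = np.empty(len(groups))
            for i, (m, S) in enumerate(zip(means, covs)):
                diff = z - m
                D2[i] = diff @ np.linalg.solve(S, diff)   # step 5: Mahalanobis distance
                if option in (3, 4):
                    D2[i] += np.linalg.slogdet(S)[1]      # quadratic term ln|S_i|
                if option in (2, 4):
                    D2[i] -= 2.0 * np.log(priors[i])      # known, different priors
            weights = np.exp(-0.5 * (D2 - D2.min()))      # step 6: p_i(z), stabilized
            probs[row] = weights / weights.sum()
        return probs

    # Hypothetical training data: two clusters in a two-attribute space.
    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(3.0, 1.5, (20, 2))])
    labels = np.array([0] * 30 + [1] * 20)
    print(algorithm2_probabilities(data, labels, option=4)[:3])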

Statisticians concur that quadratic discriminant analysis provides superior results when the group covariances are considerably different and the group sizes are large. Quadratic discriminant analysis, however, is more sensitive to deviations from multinormality and to assignment errors in the training set (Lachenbruch, 1982).



Kansas Geological Survey, Dakota Aquifer Program
Updated Sept. 16, 1996.