
Kansas Geological Survey, Open-File Rept. 93-1B
Statistical Methods for Delineating Water Quality--Page 3 of 5


Comparison of Methods

Piper Trilinear Diagrams and Geographic Distribution

The 250 water analyses from the statewide distribution of one sample per location were converted to milliequivalents/liter (meq/L) and to percentage concentrations (%) of cations and anions for use in producing trilinear plots. The cations used were total dissolved calcium (Ca), magnesium (Mg), and sodium (Na). The anions used were total dissolved chloride (Cl), sulfate (SO4), and bicarbonate (HCO3).
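The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure the authors used: the function names are hypothetical, the equivalent weights are the standard values (molar mass divided by ionic charge), and the sample is the first row of table 2, assumed here to be reported in mg/L.

```python
# Equivalent weights (g/eq) = molar mass / |ionic charge|.
EQUIVALENT_WEIGHT = {
    "Ca": 20.04,    # Ca2+
    "Mg": 12.15,    # Mg2+
    "Na": 22.99,    # Na+
    "Cl": 35.45,    # Cl-
    "SO4": 48.03,   # SO4 2-
    "HCO3": 61.02,  # HCO3-
}
CATIONS = ("Ca", "Mg", "Na")
ANIONS = ("Cl", "SO4", "HCO3")


def to_meq_per_liter(sample_mg_per_liter):
    """Convert each ion concentration from mg/L to meq/L."""
    return {ion: conc / EQUIVALENT_WEIGHT[ion]
            for ion, conc in sample_mg_per_liter.items()}


def to_percent(meq, ions):
    """Express each ion as a percentage of the summed meq/L of its group."""
    total = sum(meq[ion] for ion in ions)
    return {ion: 100.0 * meq[ion] / total for ion in ions}


# First row of table 2 (Cloud County); units assumed to be mg/L.
sample = {"Ca": 104, "Mg": 17.9, "Na": 403,
          "Cl": 78.1, "SO4": 568, "HCO3": 684}
meq = to_meq_per_liter(sample)
cation_pct = to_percent(meq, CATIONS)  # plotted on the cation triangle
anion_pct = to_percent(meq, ANIONS)    # plotted on the anion triangle
print(cation_pct)
print(anion_pct)
```

The cation and anion percentages each sum to 100 and give the coordinates of a sample on the two triangles of a trilinear diagram.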

Figure 1. Location of study area subdivided into groups on the basis of geology. Zones 1 to 4 are used for plotting of trilinear diagrams and statistical tests. Symbols represent counties in each zone in figures 2 and 3.

The study area was divided into four zones (based on geologic outcrop, subcrop, and depth of well) to facilitate presentation of similar water chemistry (fig. 1). These zones are the basis for the trilinear diagram groupings. The counties in zones 1 and 4, the outcrop and subcrop regions of the Dakota aquifer, were combined in fig. 2 because of similar chemistry and depth of wells. The counties in zones 2 and 3, the deeper portions of the Dakota Formation, are shown together in fig. 3 because of similar chemistry and depth of wells.

Figure 2. Piper (1944) trilinear diagrams of zones 1 and 4. Areas are grouped together based on their geology. Symbols represent counties in each zone.

Figure 3. Piper (1944) trilinear diagrams of zones 2 and 3. Areas are grouped together based on their geology. Symbols represent counties in each zone.

The trilinear diagrams show an evolution of water types as one moves westward and toward the interior of Kansas. The north-central and southwestern parts of the state (zones 1 and 4; figs. 1 and 2) have predominantly calcium bicarbonate waters with some sodium bicarbonate and a few sodium chloride waters. These water types represent recharge areas in the outcrop and subcrop regions of the Dakota aquifer and reflect the presence of local flow systems.

The west-central and central parts of the state (zones 2 and 3; figs. 1 and 3) have more sodium chloride and mixed-water types than either zone 1 or 4. These water types are representative of confined regional flow areas. In addition, particularly in zone 2, there is evidence of saltwater intrusion upward from the underlying Permian units.

Discriminant Analysis

Discriminant analysis requires that the data be grouped into predefined classes before the analysis is performed. The procedure calculates one or more discriminant functions, depending on the number of predefined groups. These functions are applied to the data to determine whether each sample fits its predefined group. A sample that fails this internal test is reclassified into a different group. The analysis generates a table of misclassified results and a group-fit table giving the percentage of results that fit the predefined groups together with the reclassified data. The group-fit table indicates how well the statistical method works for analysis of the data.

For this study, chloride concentration was used as the group delimiter for the discriminant analysis because it is the most commonly measured constituent and because chloride is a conservative ion; that is, the concentration of chloride in water is not generally affected by microbial or geochemical processes or by water-rock interactions. The group classifications for the data, as total chloride concentration (in mg/L), were Cl < 250, 250 <= Cl <= 1500, and Cl > 1500. The 250 mg/L limit was chosen because this is the maximum recommended contaminant limit set by the U.S. EPA for safe drinking water. The 250 <= Cl <= 1500 range was set for mixed-water types. Concentrations greater than 1500 mg/L are considered brines.
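The grouping rule above can be written as a small function. This is an illustrative sketch; the function name is hypothetical, and the three chloride values are taken from rows of table 2.

```python
def chloride_group(cl_mg_per_liter):
    """Return the predefined group (1, 2, or 3) for a chloride value in mg/L."""
    if cl_mg_per_liter < 250:
        return 1  # below the 250 mg/L drinking-water limit
    if cl_mg_per_liter <= 1500:
        return 2  # mixed-water types
    return 3      # brines


# Chloride values (mg/L) from three rows of table 2.
for cl in (78.1, 250, 1650):
    print(cl, chloride_group(cl))
```

Both boundary values, 250 and 1500 mg/L, fall in group 2 under the inequalities given in the text.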

The group classifications and the use of transformed data resulted in an 87.2% correct classification of the data (table 1). Because sodium is generally associated with chloride, the tests were run both with and without sodium (Na+) to determine whether sodium acted as a secondary group delimiter. The results of the analyses were similar, indicating the effectiveness of using chloride as the sole group delimiter.

The analyses that were misclassified and assigned to another group had large percentages of cations or anions other than chloride in the data set. These other concentrations influenced the calculation of the discriminant function and caused the different classification (Table 2).

Table 1. Discriminant Analysis Classification Results

                                              Predicted group membership
Actual group                  No. of cases    1              2              3
Group 1 (Cl < 250)            181             166 (91.7%)    15 (8.3%)      0 (0.0%)
Group 2 (250 <= Cl <= 1500)   48              12 (25.0%)     36 (75.0%)     0 (0.0%)
Group 3 (Cl > 1500)           21              0 (0.0%)       5 (23.8%)      16 (76.2%)

Percent of "grouped" cases correctly classified: 87.2%
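The percentages in table 1 follow directly from the counts. The short sketch below recomputes them from the classification matrix; the variable and function names are hypothetical, and the counts are those reported in table 1.

```python
# Rows: actual group 1-3; columns: predicted group 1-3 (counts from table 1).
confusion = [
    [166, 15, 0],
    [12, 36, 0],
    [0, 5, 16],
]


def percent_correct(matrix):
    """Percentage of all cases that lie on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def row_percentages(matrix):
    """Each count as a percentage of its actual-group total, to one decimal."""
    return [[round(100.0 * n / sum(row), 1) for n in row] for row in matrix]


overall = percent_correct(confusion)
per_group = row_percentages(confusion)
print(round(overall, 1))  # 87.2
print(per_group)
```

Of the 250 cases, 218 (166 + 36 + 16) lie on the diagonal, which gives the 87.2% reported in the table.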

Table 2. Misclassifications from Discriminant Analysis.

County     Township, range,          Old    New    Ca     Mg     Na     Cl     SO4    HCO3
           section, quarter section  group  group
Cloud      07S 05W 02BBAB            1      2      104    17.9   403    78.1   568    684
Ellis      15S 20W 35CAA             1      2      5      3      350    215    179    320
Gove       14S 26W 14DC              1      2      4.5    2.2    430    210    180    560
Gove       14S 27W 11DC              1      2      2.4    1      370    140    86     620
Greeley    17S 42W 36BBB             1      2      18     8.8    380    220    210    450
Logan      14S 33W 22CDD             1      2      4      1      350    70     230    450
Logan      14S 35W 17BCA             1      2      4.8    1      320    69     100    600
Lane       16S 28W 04BCD             1      2      4.8    2.8    300    120    77     480
Lane       16S 29W 11CCC             1      2      5.2    3      380    110    39     730
Mitchell   07S 06W 10CBB             1      2      10     2.3    350    110    150    570
Pawnee     20S 19W 28AAD             1      2      26     16     370    130    320    480
Rush       17S 16W 20                1      2      5      3      303    119    137    431
Trego      15S 24W 15CCC             1      2      26     1.1    429    244    304    390
Trego      14S 24W 19CCA             1      2      4      2      361    201    192    337
Wichita    18S 37W 24CCCA            1      2      5      2.6    365    63     500    350
Cloud      06S 05W 06CB              2      1      20     13     340    250    74     500
Cloud      08S 05W 01CCCD            2      1      78.3   18     257    294    97     402
Ellsworth  15S 10W 36CCB             2      1      140    55     710    1200   170    370
Ellsworth  16S 08W 04DAA             2      1      130    43     180    310    160    340
Graham     08S 23W 25                2      1      158    22     271    485    197    205
Hodgeman   23S 22W 29DDD             2      1      35     19     460    560    110    330
Hodgeman   62IS21W 3IDDA             2      1      27     14     480    540    190    270
Mitchell   06S 07W 14BADD            2      1      89.7   35.6   380    507    128    458
Pawnee     22S 17W 19CBB             2      1      42     26     490    610    130    340
Rush       16S 17W 16DCDC            2      1      24     7.5    402    422    186    271
Rush       17S 20W 30CCB             2      1      32     25     400    270    400    300
Russell    14S 13W 12DAD             2      1      74     19     400    501    178    299
Ellis      15S 17W 23AB              3      2      35     26     1330   1650   389    370
Republic   04S 05W 23BC              3      2      19     15     1500   1700   260    920
Rice       19S 10W 33AAD             3      2      1100   84     210    2300   31     120
Rooks      10S 19W 35                3      2      56     58     1660   1920   664    630
Russell    15S 14W 07ABD             3      2      22     21     1400   1800   310    460

The advantage of using discriminant analysis is that the resulting groups can be easily plotted and evaluated geographically to determine whether there is a trend in the data resulting from location. Figures 4 and 5 show the original and reclassified data based on the three groups listed in table 1. Figure 6 shows the data distribution and the points that were reclassified.

Figure 4. Plot of original classification of discriminant analysis data based on concentration of chloride.

Figure 5. Reclassification of discriminant analysis data based on concentration of chloride.

Figure 6. Data distribution and reclassification results of discriminant analysis.

Figures 4 and 5 show that the majority of the samples with low chloride concentrations (< 250 mg/L) are in the outcrop and subcrop regions of the Dakota aquifer. The mixed-water types (250 mg/L <= chloride <= 1500 mg/L) are in the southwestern part of Kansas and in parts of the northeast. The zone of high chloride (> 1500 mg/L) is near the middle of the state with a few sites in the northeast.

Comparison of the discriminant analysis data with the trilinear classification of samples (figs. 2-5) suggests that chloride concentration alone may not be the best indicator for water typing in the area. As shown in table 2, the values that were reclassified from group 1 to group 2 are mixed-water types with chloride concentrations of less than 250 mg/L but with a correspondingly high concentration of another anion and its associated cation. The other misclassifications have chloride concentrations of greater than 250 mg/L but also higher concentrations of sulfate (SO4) and/or bicarbonate (HCO3).

The discriminant analysis is in close agreement with the water typing for samples that have chloride concentrations greater than 1500 mg/L (figs. 3-5). Based on these results, discriminant analysis, through the table of misclassified data that it generates, can provide a reasonable basis for defining geographic areas of interest for further study. Further evaluation of the misclassified data is presented in the discussion section.

Factor Analysis

Factor analysis is frequently used to reduce multiple variables for a single location to several factors that help to explain the relationship among the variables. In the current set of water chemistry data for the Dakota aquifer, the factor analysis did not result in any additional clarification of the data beyond that provided by discriminant analysis; however, the results were useful for plotting the data distribution by factor score. The factor score maps support the results of other methods for the distribution of water chemistry throughout Kansas.

The variables in our analysis were projected as vectors onto two arbitrarily oriented new axes termed factor axes and labeled factor I and factor II. If the factor axes are assumed to be of unit length as measured from the origin, then the value (termed loading) of any given projection from a variable vector onto a factor axis must be between -1 and +1.

The effectiveness of factor loadings in representing the relationships between variables is shown by the communality, which is the sum of the squared factor loadings for each variable (Harbaugh and Demirmen, 1964). The present analysis shows that 88.8% (mean communality) of the relationships between the six chemical variables is explained by the two factors (table 3). The factor matrix indicates that five of the variables (Mg, Na, Cl, SO4, and HCO3) contribute to factor I and that one variable (Ca) contributes the most to factor II.

The original data were converted to factor scores by conversion to z scores, z = (x - mu)/s (where x is the sample value, mu is the mean, and s is the standard deviation), and multiplication by the factor score coefficient matrix generated by the SPSS factor analysis program. The calculated factor scores for the dominant factor (factor I, which accounted for 72% of the variation in the data) were plotted on a map of the areal distribution of the Dakota aquifer (fig. 7).
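The two steps described above, standardizing each variable and then weighting by score coefficients, can be sketched as follows. This is only an illustration: the function names are hypothetical, the score coefficients are placeholder values (the report does not list the SPSS coefficient matrix), the sample standard deviation (n - 1 denominator) is an assumption, and the three Na and Cl values are taken from rows of table 2.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values as (x - mu) / s, with s the sample standard deviation."""
    mu = mean(values)
    s = stdev(values)  # assumption: n - 1 denominator
    return [(x - mu) / s for x in values]


def factor_scores(columns, coefficients):
    """Weighted sum of z scores per sample.

    columns: dict mapping variable name to a list of sample values.
    coefficients: dict mapping variable name to its score coefficient.
    """
    z = {var: z_scores(vals) for var, vals in columns.items()}
    n_samples = len(next(iter(columns.values())))
    return [sum(coefficients[var] * z[var][i] for var in columns)
            for i in range(n_samples)]


# Na and Cl from three rows of table 2 (Cloud, Ellsworth, Ellis).
columns = {"Na": [403, 710, 1330], "Cl": [78.1, 1200, 1650]}
# Placeholder coefficients, NOT the values produced by SPSS for this study.
coefficients = {"Na": 0.5, "Cl": 0.5}

scores = factor_scores(columns, coefficients)
print(scores)
```

Because every variable is standardized first, the scores are unitless and centered on zero, so samples with high scores are those with above-average concentrations of the heavily weighted ions.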

Figure 7. Highest factor analysis scores plot in central portion of state.

The area of highest values plots in the central part of Kansas in Russell, Barton, Ellsworth, Ellis, Trego, Rooks, and Rush counties (fig. 7). This concentration of high factor values adds support to the discriminant analysis plots (figs. 4 and 5) which indicate the occurrence of high chloride waters in this part of the state, to the rank order tests for correlation between chloride concentration and depth (next section), and to the predominance of sodium chloride waters on the trilinear diagrams for this area (fig. 3). The implication is that upward recharge of underlying saltwater occurs in this area because the base of the Dakota is in hydraulic contact with lower units, resulting in mixing of waters in this area (Townsend et al., 1989; Macfarlane et al., 1988). Factor analysis of the chemistry data in conjunction with hydrogeologic parameters may result in an increased understanding of the flow system and geochemical mixing in future studies of this area.

Table 3. Factor Matrix (Loadings)

              Ca        Mg        Na        Cl        SO4       HCO3
Factor I      0.45873   0.95046   0.96082   0.95075   0.90435   0.76243
Factor II     0.85499   0.08248   -0.13155  0.03581   -0.01403  -0.47947
Communality   0.94144   0.91018   0.94047   0.90521   0.81805   0.81118
Mean communality = 0.888
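The communalities in table 3 can be checked from the loadings, since each is the sum of the squared loadings on the two factors. The sketch below uses the loadings as printed; the variable names are hypothetical, and small differences in the last decimal place are expected because the printed loadings are rounded.

```python
VARIABLES = ("Ca", "Mg", "Na", "Cl", "SO4", "HCO3")
FACTOR_I = (0.45873, 0.95046, 0.96082, 0.95075, 0.90435, 0.76243)
FACTOR_II = (0.85499, 0.08248, -0.13155, 0.03581, -0.01403, -0.47947)

# Communality of a variable = sum of its squared loadings on all factors.
communality = {var: a * a + b * b
               for var, a, b in zip(VARIABLES, FACTOR_I, FACTOR_II)}
mean_communality = sum(communality.values()) / len(communality)

# Share of total variance carried by factor I = mean squared loading on it.
share_factor_i = sum(a * a for a in FACTOR_I) / len(FACTOR_I)

for var in VARIABLES:
    print(var, round(communality[var], 5))
print(round(mean_communality, 3))
print(round(share_factor_i, 2))
```

The mean of the six communalities reproduces the 0.888 given in the table, and the mean squared loading on factor I is consistent with the 72% of variation attributed to that factor in the text.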

Nonparametric Correlation

Nonparametric tests (Kendall's tau and Spearman's rho) were used to determine whether a correlation exists between chloride concentration and depth and to measure its strength. Both methods are rank-order tests of the degree of independence between random variables. There are 926 valid cases in the data set. The calculated Spearman's rho of 0.365 is statistically significant, indicating that some relationship exists between depth and chloride content (table 4). The value of Kendall's tau (0.247) also shows that some correlation exists between chloride and depth. Both values are small, however, suggesting that the correlation is not strong.
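The two rank statistics can be sketched in plain Python as follows. This is a minimal illustration rather than the SPSS procedure used in the study: the function names are hypothetical, the tau-b variant of Kendall's statistic is an assumption, significance levels are not computed, and the depth and chloride values are invented for the example.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, corrected for ties."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue        # tied in both variables
            if dx == 0:
                ties_x += 1     # tied in x only
            elif dy == 0:
                ties_y += 1     # tied in y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / ((untied + ties_x) * (untied + ties_y)) ** 0.5


# Hypothetical depths and chloride concentrations (mg/L) for five wells.
depth = [50, 120, 200, 310, 450]
chloride = [40, 25, 300, 180, 1600]
rho = spearman_rho(depth, chloride)
tau = kendall_tau_b(depth, chloride)
print(round(rho, 3), round(tau, 3))
```

Both statistics depend only on the ordering of the values, which is why they suit chloride data spanning several orders of magnitude. For the same data Kendall's tau is typically smaller in magnitude than Spearman's rho, as in table 4.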

The correlation between depth and chloride concentration (in mg/L) in the full data set (926 measurements) was also evaluated by smaller groups based on county code and approximate geologic depth of the Dakota Formation (table 4; fig. 1). The results show that the north-central tier of counties (zone 1), which corresponds to the outcrop regions of the Dakota Formation, has low Kendall tau and Spearman rho values, indicating that depth and chloride are not well correlated; zones 3 (west-central) and 4 (south-central) also show low correlations; and zone 2 (central) shows the best correlation for both methods.

Table 4. Kendall tau and Spearman rho Correlation for Chloride and Depth

Area of interest   Sample size   Kendall tau        Spearman rho
All zones          926           0.247a             0.365a
Zone 1             357           -0.002b (0.951)    0.006b (0.908)
Zone 2             301           0.465a             0.662a
Zone 3             100           0.326a             0.469a
Zone 4             150           0.189b (0.0007)    0.300b (0.0002)

a. Significant at the 0.0001 level.
b. Significance level indicated in parentheses.



Kansas Geological Survey, Dakota Aquifer Program
Original report available from the Kansas Geological Survey.
Electronic version placed online Nov. 1998
Scientific comments to P. Allen Macfarlane
URL=http://www.kgs.ku.edu/Dakota/vol3/ofr93_1b/rep03.htm