Kansas Geological Survey, Open-File Rept. 93-1B
Statistical Methods for Delineating Water Quality--Page 3 of 5
The 250 water analyses from the statewide distribution of one sample per location were converted to milliequivalents/liter (meq/L) and to concentrations (%) of cations and anions for use in producing trilinear plots. The cations used were total dissolved calcium (Ca), magnesium (Mg), and sodium (Na). The anions used were total dissolved chloride (Cl), sulfate (SO4), and bicarbonate (HCO3).
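The conversion described above can be sketched as follows. This is a minimal illustration, not the procedure used in the report: it assumes the input concentrations are in mg/L, uses equivalent weights computed from standard atomic weights, and the sample values and function names are hypothetical.

```python
# Equivalent weight (g/eq) = molar mass / ionic charge.
EQUIV_WEIGHT = {
    "Ca": 40.078 / 2,
    "Mg": 24.305 / 2,
    "Na": 22.990 / 1,
    "Cl": 35.453 / 1,
    "SO4": 96.06 / 2,
    "HCO3": 61.017 / 1,
}
CATIONS = ("Ca", "Mg", "Na")
ANIONS = ("Cl", "SO4", "HCO3")


def to_meq(sample_mg_per_l):
    """Convert a dict of mg/L concentrations to meq/L."""
    return {ion: sample_mg_per_l[ion] / EQUIV_WEIGHT[ion] for ion in EQUIV_WEIGHT}


def to_percent(meq, ions):
    """Express each ion as a percentage of the summed meq/L of its group."""
    total = sum(meq[ion] for ion in ions)
    return {ion: 100.0 * meq[ion] / total for ion in ions}


# Hypothetical sample in mg/L (not taken from the report's data set).
sample = {"Ca": 40.0, "Mg": 12.0, "Na": 23.0, "Cl": 35.5, "SO4": 48.0, "HCO3": 122.0}
meq = to_meq(sample)
cation_pct = to_percent(meq, CATIONS)  # plotted on the cation triangle
anion_pct = to_percent(meq, ANIONS)    # plotted on the anion triangle
```

Cations and anions are normalized separately because each triangle of a trilinear diagram sums to 100% on its own.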
Figure 1. Location of study area subdivided into groups on the basis of geology. Zones 1 to 4 are used for plotting of trilinear diagrams and statistical tests. Symbols represent counties in each zone in figures 2 and 3.
The study area was divided into four zones (based on geologic outcrop, subcrop, and depth of well) to facilitate presentation of similar water chemistry (fig. 1). These zones are the basis for the trilinear diagram groupings. The counties in zones 1 and 4, the outcrop and subcrop regions of the Dakota aquifer, were combined in fig. 2 because of similar chemistry and depth of wells. The counties in zones 2 and 3, the deeper portions of the Dakota Formation, are shown together in fig. 3 because of similar chemistry and depth of wells.
Figure 2. Piper (1944) trilinear diagrams of zones 1 and 4. Areas are grouped together based on their geology. Symbols represent counties in each zone.
Figure 3. Piper (1944) trilinear diagrams of zones 2 and 3. Areas are grouped together based on their geology. Symbols represent counties in each zone.
The trilinear diagrams show an evolution of water types as one moves westward and toward the interior of Kansas. The north-central and southwestern parts of the state (zones 1 and 4; figs. 1 and 2) have predominantly calcium bicarbonate waters with some sodium bicarbonate and a few sodium chloride waters. These water types represent recharge areas in the outcrop and subcrop regions of the Dakota aquifer and reflect the presence of local flow systems.
The west-central and central parts of the state (zones 2 and 3; figs. 1 and 3) have more sodium chloride and mixed-water types than either zone 1 or 4. These water types are representative of confined regional flow areas. In addition, particularly in zone 2, there is evidence of saltwater intrusion upward from the underlying Permian units.
Discriminant analysis calls for the data to be grouped into predefined classes before the analysis can be performed. The test calculates one or more discriminant functions, depending on the number of predefined groups. These functions are applied to the data to determine whether each sample fits its predefined group. Samples that fail this internal test are reclassified into a new group. The analysis generates a table of misclassified results and a group-fit table giving the percentage of results that fit the predefined groups together with the reclassified data. The group-fit table indicates how well the statistical method works for analysis of the data.
For this study chloride concentration was used as the group delimiter for the discriminant analysis because it is the most common constituent measured, and chloride is a conservative ion; that is, the concentration of chloride in water is not generally affected by microbial or geochemical processes or by water-rock interactions. The group classifications for the data as total chloride concentration (in mg/L) were <250, 250 <= Cl <= 1500, > 1500. The 250 mg/L limit was chosen because this is the maximum recommended contaminant limit set by the U.S. EPA for safe drinking water. The 250 <= Cl <= 1500 was set for mixed-water types. Concentrations greater than 1500 mg/L are considered brines.
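The grouping rule above can be written as a small function. This is a sketch only: the function name is hypothetical, and the group numbers 1 to 3 follow the old-group and new-group columns of table 2.

```python
def chloride_group(cl_mg_per_l):
    """Return the predefined group (1, 2, or 3) for a chloride concentration in mg/L."""
    if cl_mg_per_l < 250:
        return 1  # below the 250 mg/L drinking-water limit
    if cl_mg_per_l <= 1500:
        return 2  # mixed-water types
    return 3      # brines


# Chloride values taken from three rows of table 2.
groups = [chloride_group(cl) for cl in (78.1, 250, 1650)]
```

Note that a sample at exactly 250 mg/L falls in group 2, matching the 250 <= Cl <= 1500 definition.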
The group classifications and the use of transformed data resulted in an 87.2% correct classification of the data (Table 1). Because sodium is generally associated with chloride, the tests were run both with and without sodium (Na+) to determine if sodium acted as a secondary group delimiter. The results of the analyses were similar, indicating the effectiveness of using chloride as the sole group delimiter.
The analyses that were misclassified and assigned to another group had large percentages of cations or anions other than chloride in the data set. These other concentrations influenced the calculation of the discriminant function and caused the different classification (Table 2).
Table 1. Discriminant Analysis Classification Results
|Actual group||No. of cases||Predicted group membership|
(CL < 250)
(250 <= CL <= 1500)
(CL > 1500)
Table 2. Misclassifications from Discriminant Analysis.
|County||Location (township, range, section, quarter section)||Old group||New group||Ca||Mg||Na||Cl||SO4||HCO3|
|Cloud||07S 05W 02BBAB||1||2||104||17.9||403||78.1||568||684|
|Ellis||15S 20W 35CAA||1||2||5||3||350||215||179||320|
|Gove||14S 26W 14DC||1||2||4.5||2.2||430||210||180||560|
|Gove||14S 27W 11DC||1||2||2.4||1||370||140||86||620|
|Greeley||17S 42W 36BBB||1||2||18||8.8||380||220||210||450|
|Logan||14S 33W 22CDD||1||2||4||1||350||70||230||450|
|Logan||14S 35W 17BCA||1||2||4.8||1||320||69||100||600|
|Lane||16S 28W 04BCD||1||2||4.8||2.8||300||120||77||480|
|Lane||16S 29W 11CCC||1||2||5.2||3||380||110||39||730|
|Mitchell||07S 06W 10CBB||1||2||10||2.3||350||110||150||570|
|Pawnee||20S 19W 28AAD||1||2||26||16||370||130||320||480|
|Rush||17S 16W 20||1||2||5||3||303||119||137||431|
|Trego||15S 24W 15CCC||1||2||26||1.1||429||244||304||390|
|Trego||14S 24W 19CCA||1||2||4||2||361||201||192||337|
|Wichita||18S 37W 24CCCA||1||2||5||2.6||365||63||500||350|
|Cloud||06S 05W 06CB||2||1||20||13||340||250||74||500|
|Cloud||08S 05W 01CCCD||2||1||78.3||18||257||294||97||402|
|Ellsworth||15S 10W 36CCB||2||1||140||55||710||1200||170||370|
|Ellsworth||16S 08W 04DAA||2||1||130||43||180||310||160||340|
|Graham||08S 23W 25||2||1||158||22||271||485||197||205|
|Hodgeman||23S 22W 29DDD||2||1||35||19||460||560||110||330|
|Mitchell||06S 07W 14BADD||2||1||89.7||35.6||380||507||128||458|
|Pawnee||22S 17W 19CBB||2||1||42||26||490||610||130||340|
|Rush||16S 17W 16DCDC||2||1||24||7.5||402||422||186||271|
|Rush||17S 20W 30CCB||2||1||32||25||400||270||400||300|
|Russell||14S 13W 12DAD||2||1||74||19||400||501||178||299|
|Ellis||15S 17W 23AB||3||2||35||26||1330||1650||389||370|
|Republic||04S 05W 23BC||3||2||19||15||1500||1700||260||920|
|Rice||19S 10W 33AAD||3||2||1100||84||210||2300||31||120|
|Rooks||10S 19W 35||3||2||56||58||1660||1920||664||630|
|Russell||15S 14W 07ABD||3||2||22||21||1400||1800||310||460|
The advantage of using discriminant analysis is that the resulting groups can be easily plotted and evaluated geographically to determine whether there is a trend in the data resulting from location. Figures 4 and 5 show the original and reclassified data based on the three groups listed in table 1. Figure 6 shows the data distribution and the points that were reclassified.
Figure 4. Plot of original classification of discriminant analysis data based on concentration of chloride.
Figure 5. Reclassification of discriminant analysis data based on concentration of chloride.
Figure 6. Data distribution and reclassification results of discriminant analysis.
Figures 4 and 5 show that the majority of the samples with low chloride concentrations (<250 mg/L) are in the outcrop and subcrop regions of the Dakota aquifer. The mixed-water types (250 mg/L <= chloride <= 1500 mg/L) are in the southwestern part of Kansas and in parts of the northeast. The zone of high chloride (> 1500 mg/L) is near the middle of the state with a few sites in the northeast.
Comparison of the discriminant analysis data with the trilinear classification of samples (figs. 2-5) suggests that chloride concentration alone may not be the best indicator of water typing for the area. As shown in table 2, the values that were misclassified from group 1 to 2 are mixed-water types with chloride concentrations of less than 250 mg/L but with a correspondingly high concentration of another anion and associated cation. The other misclassifications have chloride concentrations of greater than 250 mg/L but also higher concentrations of sulfate (SO4^2-) and/or bicarbonate (HCO3^-).
The discriminant analysis is in close agreement with the water typing for samples that have chloride concentrations greater than 1500 mg/L (figs. 3-5). Based on these results, discriminant analysis can provide a reasonable estimate for defining geographic areas of interest for further study by generating the table of misclassified data. Further evaluation of the misclassified data is presented in the discussion section.
Factor analysis is frequently used to reduce multiple variables for a single location to several factors that help to explain the relationship among the variables. In the current set of water chemistry data for the Dakota aquifer, the factor analysis did not result in any additional clarification of the data beyond that provided by discriminant analysis; however, the results were useful for plotting the data distribution by factor score. The factor score maps support the results of other methods for the distribution of water chemistry throughout Kansas.
The variables in our analysis were projected as vectors onto two arbitrarily oriented new axes termed factor axes and labeled factor I and factor II. If the factor axes are assumed to be of unit length as measured from the origin, then the value (termed loading) of any given projection from a variable vector onto a factor axis must be between -1 and +1.
The effectiveness of factor loadings in representing the relationships between variables is shown by the commonality, which is the sum of the squared factor loadings for each variable (Harbaugh and Demirmen, 1964). The present analysis shows that 88.8% (mean commonality) of the relationships between the six chemical variables is explained by the two factors (table 3). The factor matrix indicates that five of the variables (Mg, Na, Cl, SO4, and HCO3) contribute to factor I and that one variable (Ca) contributes the most to factor II.
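As a sketch of the commonality calculation described above, the sum of squared loadings can be computed per variable and then averaged. The loadings below are hypothetical placeholders, not the values of table 3, and the function names are illustrative.

```python
def commonality(loadings):
    """Sum of squared factor loadings for one variable."""
    return sum(a * a for a in loadings)


def mean_commonality(factor_matrix):
    """Average commonality over all variables in a {variable: loadings} dict."""
    values = [commonality(row) for row in factor_matrix.values()]
    return sum(values) / len(values)


# Hypothetical (factor I, factor II) loadings for two variables.
factor_matrix = {
    "X1": (0.9, 0.1),  # 0.81 + 0.01 = 0.82
    "X2": (0.3, 0.8),  # 0.09 + 0.64 = 0.73
}
result = mean_commonality(factor_matrix)  # (0.82 + 0.73) / 2 = 0.775
```

A commonality near 1 means the two factors reproduce almost all of that variable's variance; a low value means the variable is poorly represented by them.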
The original data were converted to factor scores by conversion to z scores, z = (x - x̄)/s (where x is the sample value, x̄ is the mean, and s is the standard deviation), and multiplied by the factor score coefficient matrix generated by the SPSS factor analysis program. The calculated factor scores for the dominant factor (factor I, which accounted for 72% of the variation in the data) were plotted on a map of the areal distribution of the Dakota aquifer (fig. 7).
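The two steps above (standardize, then weight by the factor score coefficients) can be sketched as follows. This is an illustration under stated assumptions: the sample standard deviation (n - 1 denominator) is assumed, and the variable names, data, and coefficients are hypothetical rather than output from the SPSS run.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values as (x - mean) / s, with s the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]


def factor_scores(data, coefficients):
    """Return one factor score per sample.

    data: {variable: [value for each sample]}
    coefficients: {variable: factor score coefficient for one factor}
    """
    z = {var: z_scores(vals) for var, vals in data.items()}
    n_samples = len(next(iter(data.values())))
    return [
        sum(coefficients[var] * z[var][i] for var in data)
        for i in range(n_samples)
    ]


# Hypothetical data for two variables at three locations.
data = {"A": [1, 2, 3], "B": [2, 4, 6]}
coefficients = {"A": 0.5, "B": 0.5}  # hypothetical factor score coefficients
scores = factor_scores(data, coefficients)
```

Each score is the weighted sum of one sample's standardized values, so a location with high values of the heavily weighted variables plots as a high factor score on the map.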
Figure 7. Highest factor analysis scores plot in central portion of state.
The area of highest values plots in the central part of Kansas in Russell, Barton, Ellsworth, Ellis, Trego, Rooks, and Rush counties (fig. 7). This concentration of high factor values adds support to the discriminant analysis plots (figs. 4 and 5) which indicate the occurrence of high chloride waters in this part of the state, to the rank order tests for correlation between chloride concentration and depth (next section), and to the predominance of sodium chloride waters on the trilinear diagrams for this area (fig. 3). The implication is that upward recharge of underlying saltwater occurs in this area because the base of the Dakota is in hydraulic contact with lower units, resulting in mixing of waters in this area (Townsend et al., 1989; Macfarlane et al., 1988). Factor analysis of the chemistry data in conjunction with hydrogeologic parameters may result in an increased understanding of the flow system and geochemical mixing in future studies of this area.
Table 3. Factor Matrix (Loadings)
|Mean commonality = 0.888|
Nonparametric tests (Kendall's τ and Spearman's ρ) were used to measure the degree of correlation between chloride concentration and depth. Both methods are rank-order tests of the degree of independence between random variables. There are 926 valid cases in the data set. The calculated Spearman's ρ of 0.365 is a significant value, indicating that some relationship exists between depth and chloride content (table 4). The value of Kendall's τ (0.247) also shows that some correlation exists between chloride and depth (table 4). The values of both Spearman's ρ and Kendall's τ are small, suggesting that the correlation is not strong (table 4).
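Both rank correlations can be sketched in a few lines. This is a minimal illustration, not the routine used for the report: it assumes the tie-corrected tau-b form of Kendall's statistic and average ranks for ties in Spearman's, and the depth and chloride values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, corrected for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from the denominator
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt(
        (untied + tied_x_only) * (untied + tied_y_only)
    )


# Hypothetical well depths and chloride concentrations (mg/L).
depth = [10, 20, 30, 40, 50]
chloride = [5, 3, 8, 12, 10]
rho = spearman_rho(depth, chloride)
tau = kendall_tau_b(depth, chloride)
```

Because both statistics depend only on rank order, they are insensitive to the strongly skewed distribution typical of chloride concentrations. Kendall's value is usually smaller in magnitude than Spearman's for the same data, as in the values reported above.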
The entire data set (926 measurements) was also evaluated for a correlation between depth and chloride concentration (in mg/L) by smaller groups based on county code and approximate geologic depth of the Dakota Formation (table 4; fig. 1). The results show that the north-central tier of counties (zone 1), which corresponds to the outcrop regions of the Dakota Formation, has low Kendall and Spearman values, indicating that depth and chloride are not well correlated; zones 3 (west-central) and 4 (south-central) also show low correlations; and zone 2 (central) shows the best correlation for both methods.
Table 4. Kendall and Spearman Correlation for Chloride and Depth
|Area of interest||Sample size||Kendall||Spearman|