
KGS, Open-file Report 2000-7


Preliminary Report on Statistical Quality Control for Year 2000 Water Well Measurements

by John C. Davis

January 14, 2000

Introduction

The Quality Control and Assurance Program for the year 2000 observation well measurement season followed the general outline for quality assurance developed during the preceding three years, as described in Miller, Davis, and Olea (1997). This discussion of procedures is taken primarily from that source.

The primary variable measured in the water well observation program is depth to water in an observation well. This primary variable is associated with three secondary variables: the ground elevation, east-west coordinate, and north-south coordinate of the well. The secondary variables serve to locate the primary variable in space, and make it possible to determine spatial relationships between observation wells, including mapping the water table and calculating changes in aquifer volume. Historically, the three location variables were determined initially by the U.S. Geological Survey for each well and were not re-determined unless a serious error in the original coordinates was suspected. In the 1997 ground water observation measurement program conducted by the Kansas Geological Survey, the geographic (latitude and longitude) coordinates of all wells were re-determined by GPS techniques. In subsequent years' measurement programs, the coordinates of all observation wells were again re-determined by GPS. The 1999 GPS locations are considered the most accurate for reasons discussed in Miller, Davis, and Olea (1998) and are used in this study, except that year 2000 GPS locations are used for wells that had no 1999 GPS measurements.

In addition, several secondary characteristics of the observation wells and of the measurement procedure were noted in order to determine if these might influence the quality of the measurements being made (in statistical parlance, these extra measurements are called exogenous variables). As part of the quality control program, water level measurements were repeated two or more times on 120 wells, yielding a collection of 150 quality control observations. Because these data include replicates, they provide an additional check on estimates of the influence of well conditions or measuring techniques on water levels. A subsequent round of measurements resampled 56 wells selected at random from the original set for quality assurance purposes.

The primary variable, depth to water, varies with geographic location and differences in topography so much that these factors will overwhelm all other sources of variation. This means that any errors in location may have a profound effect on the water table elevation. To avoid the complications of simultaneously considering uncertainties in the secondary variables, this statistical quality control study is based on first differences (specifically, the difference between 2000 and 1999 depth-to-water measurements). The secondary variables cancel out, leaving only the difference in depth, which is numerically identical to the year's change in water level. In this statistical quality control study, the difference between 2000 and 1999 corrected depth measurements is abbreviated "'00-'99." If the water table is lower this year, the variable '00-'99 will be a positive number. Because all wells measured in the current program were also measured in 1999, there are a total of 548 wells having the variable '00-'99.
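The first-difference calculation can be sketched as follows, assuming the corrected depths to water for each year are held in dictionaries keyed by KGS ID. The function name and the sample values are hypothetical. Wells measured in only one year drop out, and a positive result means the water table was lower (deeper) in 2000.

```python
def first_differences(depth_2000, depth_1999):
    """Return {well_id: depth_2000 - depth_1999} for wells measured in both years.

    Depths are depth to water below land surface, in feet. Because the same
    well is differenced, its elevation and coordinates cancel out; a positive
    result means the water level was lower (deeper) in 2000.
    """
    common = depth_2000.keys() & depth_1999.keys()
    return {well: round(depth_2000[well] - depth_1999[well], 2) for well in sorted(common)}


# Hypothetical depths to water, in feet (not survey data).
d00 = {"WELL-A": 105.40, "WELL-B": 62.10, "WELL-C": 88.00}
d99 = {"WELL-A": 104.90, "WELL-B": 63.00}
print(first_differences(d00, d99))  # WELL-C has no 1999 value and is skipped
```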

The objective in our quality control study is to identify and assess possible sources of unwanted variation in water level measurements made by the KGS. The purpose of the analysis is to provide guidance to the KGS field measurement program, to suggest ways in which field measurements might be improved, and to provide information necessary to identify past or current measurements that are suspect. The statistical quality control and field measurement programs have been intimately intertwined from the outset, when the KGS assumed responsibility in 1997 for measuring observation wells formerly measured by the USGS. A comparison of results from 2000 with those from previous years shows that the desired improvements in the measurement program are being achieved through quality control.

Statistical Procedures

An analysis of variance (ANOVA) procedure was used to estimate the significance of the effects of different well and procedural characteristics on '00-'99 in the initial set of 548 observations. The following variables were recorded for each well.

1. Depth to water
2. GPS longitude
3. GPS latitude
4. Date
5. Measurer's initials
6. Well Access
1 = good
0 = poor
7. Downhole Access
1 = good
0 = poor
8. Weighted Tape
1 = yes
0 = no
9. Oil on Water
1 = yes
0 = no
10. Chalk Cut Quality
2 = excellent
1 = good
0 = poor

In addition, each well has a unique USGS ID number and KGS ID designation, a surface elevation, a legal description of the well location, a decimal latitude and longitude (obtained by LEO conversion of the legal description), and the purpose for which the well is used. The variable Aquifer Code describes the primary source of water in the well; the manner in which aquifer code values were assigned is summarized in Miller, Davis, and Olea (1997). The additional variables taken from the historical records are:

11. Well Use
H = household water supply
S = stock water
I = irrigation
U = unused
12. Aquifer Code
KD = Cretaceous Dakota aquifer
KJ = undifferentiated Cretaceous/Jurassic aquifer
KN = Cretaceous Niobrara aquifer
KU = undifferentiated Cretaceous aquifer
QA = Quaternary alluvium aquifer
QAQU = Quaternary alluvium and undifferentiated aquifers
QAQUTO = Quaternary alluvium and undifferentiated aquifers and Tertiary Ogallala aquifer
QATO = Quaternary alluvium and Tertiary Ogallala aquifers
QU = Quaternary undifferentiated aquifer
QUTO = Quaternary undifferentiated and Tertiary Ogallala aquifers
QUTOKJ = Quaternary undifferentiated, Tertiary Ogallala, and Cretaceous/Jurassic aquifers
QUKD = Quaternary undifferentiated and Cretaceous Dakota aquifers
TO = Tertiary Ogallala aquifer
TOKD = Tertiary Ogallala and Cretaceous Dakota aquifers
TOKJ = Tertiary Ogallala and undifferentiated Cretaceous/Jurassic aquifers
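One way to hold a single observation and its coded variables, with a check that every code is legal, is sketched below. The class name, field names, and validation rules are hypothetical; only the code values are taken from the lists above.

```python
from dataclasses import dataclass

# Allowed codes, transcribed from the variable lists above.
BINARY = {0, 1}                    # Well Access, Downhole Access, Weighted Tape, Oil on Water
CHALK_CUT = {0, 1, 2}              # 0 = poor, 1 = good, 2 = excellent
WELL_USE = {"H", "S", "I", "U"}    # household, stock, irrigation, unused
AQUIFER_CODES = {
    "KD", "KJ", "KN", "KU", "QA", "QAQU", "QAQUTO", "QATO",
    "QU", "QUTO", "QUTOKJ", "QUKD", "TO", "TOKD", "TOKJ",
}


@dataclass
class WellRecord:
    """One water-level observation with its exogenous variables (hypothetical layout)."""
    kgs_id: str
    depth_to_water_ft: float
    measurer: str
    well_access: int
    downhole_access: int
    weighted_tape: int
    oil_on_water: int
    chalk_cut: int
    well_use: str
    aquifer_code: str

    def validate(self) -> list:
        """Return a list of problems; an empty list means every code is legal."""
        problems = []
        for name in ("well_access", "downhole_access", "weighted_tape", "oil_on_water"):
            if getattr(self, name) not in BINARY:
                problems.append(f"{name} must be 0 or 1")
        if self.chalk_cut not in CHALK_CUT:
            problems.append("chalk_cut must be 0, 1, or 2")
        if self.well_use not in WELL_USE:
            problems.append(f"unknown well_use {self.well_use!r}")
        if self.aquifer_code not in AQUIFER_CODES:
            problems.append(f"unknown aquifer_code {self.aquifer_code!r}")
        return problems


rec = WellRecord("HYPOTHETICAL-01", 120.5, "ABC", 1, 0, 1, 0, 2, "I", "TO")
print(rec.validate())
```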

The initial statistical model includes all exogenous variables recorded during the quality control study that may contribute to the variability in the response, '00-'99, plus the variables Well Use and Aquifer Code. In contrast to results obtained in 1999, no exogenous variables contribute significantly to the total variance except for a significant operator effect as measured by the variable Measurer. As expected, Aquifer Code also is a significant source of variance. (In contrast, in 1999 Weighted Tape, Chalk Cut, and Measurer were highly significant, and Well Use and Aquifer Code were significant at a lower level.)

Analysis of Variance table for initial model
Source               DF  Sum of Squares  Mean Square  F Ratio     Prob>F
Model                26        586.8229      22.5701   3.4837  <0.0001**
Measurer              6       213.99124     35.66521   5.5050  <0.0001**
Well Access           1         9.98314      9.98314   1.5409   0.2150ns
Downhole Access       1         0.00154      0.00154   0.0002   0.9877ns
Weighted Tape         1         1.70515      1.70515   0.2632   0.6082ns
Well Use              3        36.17535     12.05845   1.8612   0.1351ns
Oil on Water          1         0.38886      0.38886   0.0600   0.8066ns
Chalk Cut Quality     2        15.47985      7.73993   1.1947   0.3036ns
Aquifer Code         11       364.23350     33.11214   5.1109  <0.0001**
Error               520       3368.9333       6.4787
Total               546       3955.7563
RSquare = 0.148347
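Each row of the table follows the usual ANOVA relations: mean square = sum of squares / DF, F ratio = mean square / error mean square, and RSquare = model sum of squares / total sum of squares. The sketch below (hypothetical function name) recomputes a few rows from the printed DF and sums of squares as a consistency check; the Prob>F column requires the F distribution and is not recomputed here.

```python
def anova_rows(effects, ss_error, df_error, ss_total):
    """Recompute mean squares, F ratios, and R-square from sums of squares.

    effects: {source: (df, sum_of_squares)}; the "Model" row must be included
    so that R-square = SS_model / SS_total can be formed.
    """
    ms_error = ss_error / df_error
    rows = {}
    for source, (df, ss) in effects.items():
        ms = ss / df
        rows[source] = (df, ss, round(ms, 4), round(ms / ms_error, 4))
    r_square = effects["Model"][1] / ss_total
    return rows, round(ms_error, 4), round(r_square, 6)


# DF and sums of squares transcribed from the initial-model table above.
effects = {
    "Model": (26, 586.8229),
    "Measurer": (6, 213.99124),
    "Aquifer Code": (11, 364.23350),
}
rows, ms_error, r2 = anova_rows(effects, ss_error=3368.9333, df_error=520, ss_total=3955.7563)
for source, (df, ss, ms, f) in rows.items():
    print(f"{source:<14}{df:>4}{ss:>12.4f}{ms:>10.4f}{f:>8.4f}")
print("MS error =", ms_error, " RSquare =", r2)
```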

A revised model was run that combined aquifers into classes similar to those used in 1997 through 1999. This 5-part classification distinguishes between (1) bedrock (Cretaceous) aquifers, (2) alluvial aquifers, (3) alluvial aquifers plus other unconsolidated aquifers, (4) the High Plains (Ogallala) aquifer, and (5) bedrock plus unconsolidated aquifers. Grouping reduces the degrees of freedom required for the aquifer term from 11 to 4, and thus increases the sensitivity of the analysis.

Analysis of Variance table for grouped aquifers
Source               DF  Sum of Squares  Mean Square  F Ratio     Prob>F
Model                19        337.2486      17.7499   2.5851   0.0003**
Measurer              6       168.27236     28.04539   4.0845   0.0005**
Well Access           1        10.06818     10.06818   1.4663   0.2265ns
Downhole Access       1         0.16162      0.16162   0.0239   0.8772ns
Weighted Tape         1         0.04445      0.04445   0.0065   0.9359ns
Well Use              3        35.01685     11.67228   1.7000   0.1660ns
Oil on Water          1         1.65566      1.65566   0.2411   0.6236ns
Chalk Cut Quality     2        20.52210     10.26105   1.4944   0.2253ns
Aquifer Group         4       114.65918    28.664795   4.1747   0.0024**
Error               527       3618.5076       6.8662
Total               546       3955.7563
RSquare = 0.08526

Most of the components in the revised model are not significant, in strong contrast with results obtained in 1999, when most components of the equivalent model were significant. Unfortunately, past models are not directly comparable because there are different numbers of degrees of freedom assigned to some variables, and the response (annual change in water level) has significantly different variances from year to year. It is interesting to note that the variance of the response variable in 2000 is about 40 percent of its value in 1999, which was about three times the variance measured in 1998, which in turn was about one-third the variance in 1997. Although the year-to-year changes in total variance are highly significant, the cause is speculative.

One way to improve the statistical results of the measurement program is to discard wells in which exogenous variables make unusually high contributions to the total variance, arguing that the readings from such wells are atypical and likely erroneous. In 1999, we identified 24 wells which seemed to make spurious contributions to the variance. Most of these wells had been repeatedly measured, indicating that the measurers either had difficulty obtaining a reliable reading or that the initial depth to water was significantly out-of-trend. These wells were deleted from the network in 2000; it is likely that their removal is responsible for the notable reduction in total variance in 2000.

Importance of contributing variables

We can determine the relative contributions of each category of the contributing variables by examining the least-squares means (averages) of '00-'99 for a specified state of a variable, while holding all other variables at their average values. (In statistical parlance, these averages are referred to as expected values of the response.) A positive value indicates that the average depth to water in a well is greater in 2000 than in 1999; that is, the water level in the well has dropped since last year's measurement. The following list gives the least-squares means for the complete data set.

Operator
Level            Original Least Sq Mean
DRL              -1.8798
JBE              -1.3370
JLT              -2.1814
JMA              -0.6825
MWF              -0.4329
RCB              -1.3735
RDM              -1.3543

Well Access
Level            Original Least Sq Mean
0                -1.0065
1                -1.6339

Downhole Access
Level            Original Least Sq Mean
0                -1.2921
1                -1.3482

Weighted Tape
Level            Original Least Sq Mean
0                -1.3431
1                -1.2973

Well Use
Level            Original Least Sq Mean
H                -2.3343
I                -1.0086
S                -0.4431
U                -1.4948

Oil on Water
Level            Original Least Sq Mean
0                -1.2293
1                -1.4111

Chalk Cut
Level            Original Least Sq Mean
0                -1.3202
1                -0.2109
2                -0.1182

Geologic Group
Level            Original Least Sq Mean
1 (Cretaceous)   -1.6449
2 (Alluvium)     -1.4135
3 (Al. + Tert.)  -1.2447
4 (Tertiary)     -0.3446
5 (Tert. + K)    -1.9532
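A least-squares mean is the fitted value for one level of a factor, averaged with equal weight over the levels of the other factors, so in an unbalanced data set it can differ from the raw group average. The sketch below illustrates the idea for an additive two-factor model fitted with NumPy's `lstsq`. It is a simplified illustration with hypothetical names and data, not a reproduction of JMP's computation, which also handles more factors and covariates.

```python
import numpy as np


def ls_means_two_factor(a, b, y):
    """Least-squares means of factor `a` in the additive model y ~ a + b.

    a, b: sequences of level labels; y: responses. Each LS mean is the fitted
    value for one level of `a`, averaged with equal weight over the levels of `b`.
    """
    a_levels, b_levels = sorted(set(a)), sorted(set(b))

    def row(ai, bi):
        # Reference coding: intercept, then indicators for all but the first level.
        return ([1.0]
                + [1.0 if ai == lev else 0.0 for lev in a_levels[1:]]
                + [1.0 if bi == lev else 0.0 for lev in b_levels[1:]])

    X = np.array([row(ai, bi) for ai, bi in zip(a, b)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return {lev: float(np.mean([np.dot(row(lev, bl), coef) for bl in b_levels]))
            for lev in a_levels}


# Hypothetical unbalanced data: measurer P mostly measured wells in aquifer group 2.
measurer = ["P", "P", "P", "Q", "Q", "Q"]
group = [2, 2, 1, 1, 1, 2]
change = [0.5, 0.7, -1.2, -1.0, -0.8, 0.9]
print(ls_means_two_factor(measurer, group, change))
```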

Summary of the Analyses of Variance

Data collected in 2000 show significant variation attributable to Measurer, in addition to differences among the aquifers tapped by the wells. The standard deviation of variable '00-'99 is 2.69 ft., roughly two-thirds the standard deviation of variable '99-'98 (4.21 ft.) and about the same as the 2.48 ft. standard deviation of variable '98-'97. The median decline in water level from 1999 to 2000 is 0.31 ft., less than half the decline noted from 1998 to 1999 (0.72 ft.). (The decline from 1997 to 1998 was 0.41 ft.)
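The year-to-year comparisons can be checked directly from the quoted standard deviations. Because variance is the square of the standard deviation, a modest ratio of standard deviations becomes a larger contrast in variance. The sketch below simply recomputes both ratios from the values given in the text; a formal F test of equal variances would also require the number of wells measured in each year, which is not given for every year here.

```python
# Standard deviations of the annual change in water level, in feet (from the text).
sd = {"'98-'97": 2.48, "'99-'98": 4.21, "'00-'99": 2.69}


def ratio(num, den):
    """Return (ratio of standard deviations, ratio of variances), rounded."""
    r = sd[num] / sd[den]
    return round(r, 2), round(r * r, 2)


print("2000 vs 1999:", ratio("'00-'99", "'99-'98"))
print("1999 vs 1998:", ratio("'99-'98", "'98-'97"))
```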

There are significant differences between the operators, particularly JLT and MWF. (Note that measurer MLA was not part of the measurement team but a visiting administrator who measured the water depth on a single well. To avoid inflating degrees of freedom in the model, this operator and this specific well have been removed from the data set prior to analysis.)

Water levels measured in 2000 in exclusively Cretaceous aquifers (Group 1) tend to be unchanged from those measured in 1999. The Ogallala aquifer (Group 4) tends to be almost a half-foot deeper than last year. Measurements made in wells tapping alluvial aquifers (Group 2) or alluvial plus other sources (Group 3) tend to be unchanged or slightly shallower. Water levels in wells tapping Cretaceous aquifers plus Quaternary and/or Tertiary aquifers (Group 5) tend to be about 0.7 feet shallower this year. There are highly significant differences in the annual change in water level among aquifers, but these are due in part to an extreme decline (over 32 ft.) in well 27S 38W 15BBB 01. In the year 2000, there were no other statistically significant contributors to total variance among the variables recorded by the field crew.

The ANOVA equation can be used to create an expected value and residual (difference between observed and expected value) for each well. The distribution of residuals should be approximately normal. Examination of the residual outliers will reveal any well measurements which cannot be explained by extreme combinations of the different sources of variation. Three wells have been identified by this process. These wells show changes in water level between 1999 and 2000 that are outside the range expected. These well measurements may be correct and reflect highly unusual changes in aquifer level; the wrong wells may have been measured in 1999; or changes in well construction or other factors may have altered the measurability of a well. The three wells, with their residuals, are:

Well ID Residual, ft.
27S 38W 15BBB 01 -25.8
23S 22W 07DAA 01 +15.0
24S 33W 18BDB 02 -10.8

An additional five wells were flagged because their residuals depart significantly from those expected under a normal distribution of errors. Although the residuals for these wells are less extreme than those listed above, these wells also may be candidates for replacement in the network.

Well ID Residual, ft.
29S 34W 11ADD 01 +10.0
25S 36W 28CBD 01 -9.9
24S 33W 18BDB 02 -9.2
24S 23W 06AAB 01 -8.6
27S 37W 04ABB 01 -7.5
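The residual screening described above can be sketched as follows. This is a simplified stand-in, not the report's procedure: the fitted value is just the mean of each well's aquifer group (a one-factor model rather than the full multiway ANOVA), residuals are scaled by the root-mean-square error, and the cutoff of 3 is an assumed threshold. The function name and data are hypothetical.

```python
import math
from collections import defaultdict


def flag_residual_outliers(records, cutoff=3.0):
    """Flag wells whose standardized residual exceeds `cutoff` in absolute value.

    records: list of (well_id, group, change_ft). The fitted value is the group
    mean, a one-factor stand-in for the full ANOVA model.
    Returns (well_id, residual, standardized residual) for each flagged well.
    """
    by_group = defaultdict(list)
    for _, group, y in records:
        by_group[group].append(y)
    means = {g: sum(v) / len(v) for g, v in by_group.items()}

    residuals = [(well, y - means[g]) for well, g, y in records]
    df_error = len(records) - len(means)          # one DF lost per fitted group mean
    rmse = math.sqrt(sum(r * r for _, r in residuals) / df_error)
    return [(well, round(r, 1), round(r / rmse, 1))
            for well, r in residuals if abs(r / rmse) > cutoff]


# Hypothetical data: one extreme change among otherwise unchanged wells.
records = ([(f"A{i:02d}", "A", 0.0) for i in range(19)]
           + [("A-OUT", "A", -20.0)]
           + [(f"B{i:02d}", "B", 0.0) for i in range(20)])
print(flag_residual_outliers(records))
```

A normal probability plot of the same residuals is a common complement, because it shows departures from normality that fall short of a fixed cutoff.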

Quality Assurance (remeasure) Program

The year 2000 Quality Assurance program of random remeasurements showed that the QA data contained only one statistically significant source of variation. Fifty-six randomly selected QA wells were remeasured during the late January supplementary data collection period. Combined with measurements made during the regular collection period, these yielded 131 measurements for statistical quality control. In spite of the additional control provided by replication, a significant contribution was detected from only one exogenous variable, Well Access. As expected, the variance among the QA replicates is about 11% lower than the variance of the complete data set. The most extreme value of '00-'99 among the QA wells is only 8.1 feet, compared to an extreme of -32.2 feet in the complete data set.

Within the QA data set, there are no significant contributions due to Measurer, although the mean difference between readings by MWF and other measurers is nearly significant. Wells in the QA set measured by MWF show a change (decline in water level) in '00-'99 of over 2 feet, more than three times the average change for the QA data. Because 2000 is the first year in which MWF has measured groundwater levels, this difference probably is attributable to lack of experience. Well Access is a highly significant source of variation, but much of this is attributable to two wells, 23S 24W 11DAA 01 and 35S 39W 10CAD 01: when these wells are removed from the data set, Well Access becomes insignificant as a source of variation. Wells with poor access tend to measure about 3 feet deeper than wells with good access, possibly as a result of tape deflections and hang-ups in the wells. An identical difference was noted in the 1999 QA measurements. No other exogenous variables, including Geological Units, show any significant differences between levels in the QA data set.

Tracking log analysis

For the Year 2000 measurement season, WaterWitch software was modified so the tracking log generated by the Garmin GPS satellite locational systems was recorded by each GPS/computer field unit. The tracking log could then be analyzed to determine the time spent at each well location, which presumably is related to well conditions and difficulty of access. The results may be useful in deciding which wells should be replaced in the High Plains Aquifer Network by considering the cost effectiveness of measurements made on specific wells.

Unfortunately, because of errors in computer settings, the tracking logs from two of the seven field units were so incomplete as to be unusable. Even though tracking log data are not available for all wells, the results from this initial year's tracking data collection do suggest which factors may significantly affect the time required for well measurement. The remaining five logs record the well measurement activities of three experienced and two first-time well measurers. The data may be analyzed using multiple regression models similar to those used to analyze the change in water level, except that the dependent variable is time spent on station (variable Minutes) rather than the change in water level elevation (variable '00-'99).

Figure 1--Example of track (red) recorded by WaterWitch based on GPS signals. Observation well locations are shown by blue dots.


The tracking logs corresponding to the visual record shown in Figure 1 require considerable editing before the data are in a format that can be used for statistical analysis. A sample of the GPS tracking output log is shown below.

37.40180 -100.37980 01/07/00-18:04:27 038-samps. 19.8Min Well=30S 28W 33AAA 01 - 0059.ft
37.74437 -100.01865 01/07/00-19:06:37 006-samps. 02.6Min Well=26S 24W 31DDA 01 - 11171.ft
37.75389 -100.03392 01/07/00-19:15:31 020-samps. 09.9Min Well=26S 25W 16DCC 01 - 11791.ft
***** Time Break 01/07/00 19:27:02 01/08/00 19:27:44
Invalid time stamp records 710 and 711
37.91421 -100.42906 01/08/00-15:22:42 020-samps. 09.9Min Well=24S 28W 31DD 01 - 0055.ft
37.80672 -100.34750 01/08/00-15:50:15 009-samps. 04.1Min Well=26S 28W 10ACB 02 - 5182.ft

Each line is a record that gives the latitude and longitude of the location where the vehicle was stationary, the starting and ending time at the stationary location, the number of satellite readings (taken at the rate of 2 readings per minute) made while stationary, the elapsed time at the stationary location (in minutes), the KGS ID of the nearest well in the WaterWitch data base, and the distance to this closest well (in feet). The first line is the record for a position only 59 feet from an observation well, so this probably represents a genuine stop for the purpose of well measurement. The second and third lines record stops that are more than two miles from the closest well, so these probably are not stops that were made for well measurements. Such entries (those over 5,000 feet from the nearest observation well) are edited out of the file. The fourth and fifth lines represent times when the GPS device was turned off and also are edited from the file. The sixth line again seems to represent a stop to measure a well, which is only 55 feet away. The seventh line exceeds the 5,000-foot threshold and is edited from the file.
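The editing rules just described lend themselves to a small parser. The sketch below assumes the record layout of the sample log above (with OCR damage repaired); the regular expression is an assumption inferred from that sample, not a WaterWitch specification, and the function and field names are hypothetical. The 5,000-foot threshold is the one given in the text.

```python
import re

# Pattern inferred from the sample log lines; not an official format definition.
RECORD = re.compile(
    r"^(?P<lat>-?\d+\.\d+)\s+(?P<lon>-?\d+\.\d+)\s+(?P<stamp>\S+)\s+"
    r"(?P<samples>\d+)-samps\.\s*(?P<minutes>\d+\.\d+)Min\s+"
    r"Well=(?P<well>.+?)\s+-\s*(?P<dist>\d+)\.ft$"
)
MAX_DISTANCE_FT = 5000  # stops farther than this from any well are dropped


def parse_tracking_log(lines, max_distance=MAX_DISTANCE_FT):
    """Keep stationary periods that plausibly correspond to a well measurement."""
    stops = []
    for line in lines:
        m = RECORD.match(line.strip())
        if m is None:            # time breaks, invalid-time-stamp notes, etc.
            continue
        dist = int(m["dist"])
        if dist > max_distance:  # vehicle stopped, but not at a well
            continue
        stops.append({
            "kgs_id": " ".join(m["well"].split()),
            "minutes": float(m["minutes"]),
            "samples": int(m["samples"]),
            "distance_ft": dist,
        })
    return stops


sample = [
    "37.40180 -100.37980 01/07/00-18:04:27 038-samps. 19.8Min Well=30S 28W 33AAA 01 - 0059.ft",
    "37.75389 -100.03392 01/07/00-19:15:31 020-samps. 09.9Min Well=26S 25W 16DCC 01 - 11791.ft",
    "***** Time Break 01/07/00 19:27:02 01/08/00 19:27:44",
    "37.80672 -100.34750 01/08/00-15:50:15 009-samps. 04.1Min Well=26S 28W 10ACB 02 - 5182.ft",
]
print(parse_tracking_log(sample))
```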

Next, the longitude, latitude, starting time and ending time are stripped from each line, as are all labels (samps, Min, Well=, and ft). The format is changed to make the data files compatible with the JMP statistical package for Macintosh. Each edited tracking log is then entered into a JMP spreadsheet and combined with well measurement data from the master Year 2000 observation well database, using the variable KGS ID as the criterion for joining. Each file is then searched and all lines showing operators other than the operator assigned to the specific tracking log are deleted. Next, all multiple entries for an individual well are identified and lines showing the greatest distance to the closest well are deleted, unless these represent re-measurements. Finally, all of the edited tracking logs are recombined into a single file. The data file used for this analysis is given in Appendix A.
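The joining and de-duplication steps can be sketched in Python as an alternative to the spreadsheet operations. This is an assumption-laden illustration: field names are hypothetical, only the operator is copied onto each joined record, and "unless these represent re-measurements" is interpreted as keeping, for a well measured n times by the log's operator, the n closest stops.

```python
from collections import defaultdict


def join_log_to_measurements(stops, measurements, log_operator):
    """Attach tracking-log stops to well measurements (hypothetical field names).

    stops: dicts with 'kgs_id', 'minutes', 'distance_ft' from one tracking log.
    measurements: dicts with 'kgs_id' and 'measurer'.
    Only measurements by `log_operator` are used. For a well measured n times,
    the n closest stops are kept, so extra stops are dropped but
    re-measurements survive.
    """
    counts = defaultdict(int)
    for m in measurements:
        if m["measurer"] == log_operator:
            counts[m["kgs_id"]] += 1

    by_well = defaultdict(list)
    for s in stops:
        if s["kgs_id"] in counts:           # inner join on KGS ID
            by_well[s["kgs_id"]].append(s)

    joined = []
    for well, well_stops in by_well.items():
        closest = sorted(well_stops, key=lambda s: s["distance_ft"])[: counts[well]]
        joined.extend({**s, "measurer": log_operator} for s in closest)
    return joined


stops = [
    {"kgs_id": "W1", "minutes": 12.0, "distance_ft": 40},
    {"kgs_id": "W1", "minutes": 3.0, "distance_ft": 900},   # extra stop near W1
    {"kgs_id": "W2", "minutes": 20.0, "distance_ft": 60},
    {"kgs_id": "W2", "minutes": 8.0, "distance_ft": 75},
    {"kgs_id": "W3", "minutes": 5.0, "distance_ft": 30},
]
measurements = [
    {"kgs_id": "W1", "measurer": "AAA"},
    {"kgs_id": "W2", "measurer": "AAA"},
    {"kgs_id": "W2", "measurer": "AAA"},  # re-measurement
    {"kgs_id": "W3", "measurer": "BBB"},  # another operator's well
]
for rec in join_log_to_measurements(stops, measurements, "AAA"):
    print(rec)
```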

The final data file contains 443 records representing 351 observation wells; 81 wells were measured a second time, 9 a third time, and 2 a fourth time. The variables include those listed under "Statistical Procedures" above, plus Minutes, the dependent variable from the tracking log that is to be assessed.

Examination of the variable Minutes shows that the distribution of time spent at wells is skewed to the right, with a mean time of 16.6 minutes per well (Fig. 2). Seventy-five percent of the wells were measured in less than 21 minutes, but 10% of the wells required more than a half-hour to obtain acceptable measurements. The greatest length of time spent at a well was almost two hours. The following list contains the upper ten percent of wells that required the longest times to measure, as estimated from the tracking log. The list also gives the initials of the measurer and whether two or more repeat measurements were made.

Figure 2--Distribution of minutes spent stationary at well sites, Jan. 2000 well measurement program.

[Bar chart of minutes per well, peaking at 5-10 minutes; most wells fall between about 5 and 15 minutes.]

Re-run  KGS ID             Initials  Time (min)
1       03S 40W 35AAC 01   RDM       56.3
2       03S 40W 35AAC 01   RDM       56.3
1       24S 33W 09CCD 01   RCB       43.1
1       33S 33W 12AAD 01   RCB       40
2       33S 33W 12AAD 01   RCB       40
1       27S 38W 15BBB 01   JMA       39
2       27S 38W 15BBB 01   JMA       39
3       27S 38W 15BBB 01   JMA       39
1       07S 38W 28DAA 01   JLT       43.4
2       24S 33W 19DBB 01   JLT       44.1
1       29S 31W 09CB 01    JLT       80.5
1       05S 39W 25CDA 01   JBE       110.3
1       06S 39W 09DDD 01   JBE       74.3
1       28S 21W 25ABB 01   JBE       42.1
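The summary statistics quoted for the variable Minutes (mean, 75th and 90th percentiles, share of wells over a half-hour) can be computed with Python's standard library. This sketch assumes Python 3.8 or later for `statistics.quantiles`; the function name and sample data are hypothetical, and percentile conventions differ between packages, so values may differ slightly from those reported by JMP. In a right-skewed sample the mean exceeds the median.

```python
import statistics


def summarize_minutes(minutes):
    """Summary statistics for the skewed time-at-station distribution."""
    q = statistics.quantiles(minutes, n=20, method="inclusive")  # cut points at 5% steps
    return {
        "n": len(minutes),
        "mean": round(statistics.mean(minutes), 1),
        "median": round(statistics.median(minutes), 1),
        "p75": round(q[14], 1),   # 75th percentile
        "p90": round(q[17], 1),   # 90th percentile
        "share_over_30": round(sum(m > 30 for m in minutes) / len(minutes), 2),
        "max": max(minutes),
    }


# Hypothetical times at station, in minutes (not the survey's data).
sample_minutes = [5, 6, 7, 8, 9, 10, 12, 15, 20, 45]
print(summarize_minutes(sample_minutes))
```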

Although the distribution of time required to measure wells is interesting in itself, it does not suggest the possible causes of excessive times spent at particular wells. We can investigate possible sources of variation by running a statistical analysis similar to that used to study variations in changes in water level in the wells. An initial unbalanced, multiway analysis of variance yields the following ANOVA table:

Analysis of Variance table for tracking data
Source               DF  Sum of Squares  Mean Square  F Ratio     Prob>F
Model                19       10808.275      568.857   5.7677  <0.0001**
Measurer              4       2321.1019     580.2755   5.8834  <0.0001**
Well Access           1        187.3109     187.3109   1.8992   0.1689ns
Downhole Access       1       1186.5681    1186.5681  12.0306   0.0006**
Weighted Tape         1         46.7869      46.7869   0.4744   0.4914ns
Well Use              3        131.1153      43.7051   0.4431   0.7223ns
Oil on Water          1        152.7028     152.7028   1.5483   0.2141ns
Chalk Cut Quality     2       3130.0822    1565.0411  15.8680  <0.0001**
Aquifer Group         4        431.3731     107.8433   1.0934   0.3593ns
'00 Depth             1        189.5258     189.5258   1.9216   0.1664ns
'00-'99               1        540.7979     540.7979   5.4832    0.0197*
Lack of Fit         393       40988.779     104.2971   4.2791  <0.0001**
Pure Error           30         731.205      24.3731
Total Error         423       41719.984      98.6288
Total               442       52528.259
RSquare = 0.2057
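The Lack of Fit and Pure Error rows split the Total Error: pure error is the scatter among records that share identical predictor settings (here probably the repeated records for the same well), and lack of fit is the remainder. A minimal sketch with hypothetical function names recomputes the lack-of-fit F ratio from the sums of squares in the table and shows how pure error is formed from replicate groups.

```python
def pure_error(groups):
    """Pure-error SS and DF from replicate groups.

    groups: lists of responses that share identical predictor settings.
    SS = sum of squared deviations from each group's own mean;
    DF = sum over groups of (n_g - 1).
    """
    ss, df = 0.0, 0
    for ys in groups:
        mean = sum(ys) / len(ys)
        ss += sum((y - mean) ** 2 for y in ys)
        df += len(ys) - 1
    return ss, df


def lack_of_fit_f(ss_error, df_error, ss_pe, df_pe):
    """Split residual error into lack of fit and pure error; return the F ratio."""
    ss_lof, df_lof = ss_error - ss_pe, df_error - df_pe
    return (ss_lof / df_lof) / (ss_pe / df_pe)


# Values from the tracking-data table: Total Error 41719.984 on 423 DF,
# Pure Error 731.205 on 30 DF.
print(round(lack_of_fit_f(41719.984, 423, 731.205, 30), 2))
```

A large lack-of-fit F, as in the table, indicates that the linear model leaves systematic variation in Minutes unexplained, beyond the scatter among replicates.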

Note that this linear model contains two continuous predictor variables, '00 Depth and '00-'99. These components of the model express the tendency for time at station (Minutes) to vary linearly with depth to water or with the change in water level between 1999 and 2000.
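Mixing dummy-coded factors with continuous predictors in this way amounts to an analysis of covariance fitted by least squares. The sketch below assumes NumPy and uses hypothetical names and synthetic data; it keeps only Measurer and the two continuous terms, and the coefficient on '00-'99 is a minutes-per-foot slope of the kind behind the half-minute-per-foot figure discussed next.

```python
import numpy as np


def fit_minutes_model(measurer, depth_ft, change_ft, minutes):
    """Least-squares fit of Minutes ~ Measurer + '00 Depth + '00-'99.

    Measurer is dummy-coded against its first (alphabetical) level; the two
    continuous predictors enter linearly, so their coefficients are slopes in
    minutes per foot.
    """
    levels = sorted(set(measurer))
    X = np.column_stack(
        [np.ones(len(minutes))]
        + [np.array([m == lev for m in measurer], dtype=float) for lev in levels[1:]]
        + [np.asarray(depth_ft, dtype=float), np.asarray(change_ft, dtype=float)]
    )
    coef, *_ = np.linalg.lstsq(X, np.asarray(minutes, dtype=float), rcond=None)
    names = ["intercept"] + [f"measurer[{lev}]" for lev in levels[1:]] + ["depth", "change"]
    return dict(zip(names, coef.round(3)))


# Synthetic data built from known coefficients (hypothetical, not survey data).
meas = ["A", "A", "A", "B", "B", "B", "A", "B"]
depth = [50, 120, 80, 60, 150, 90, 200, 40]
change = [0.5, -1.0, 2.0, 0.0, 3.0, -2.0, 1.0, 4.0]
mins = [10 + 5 * (m == "B") + 0.02 * d + 0.5 * c for m, d, c in zip(meas, depth, change)]
print(fit_minutes_model(meas, depth, change, mins))
```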

The ANOVA shows that significant effects can be attributed to differences between Measurers: RDM and JLT required on average almost 20 minutes to measure a well, while JBE, JMA, and RCB required on average five minutes less per well. Wells that were flagged as having poor Downhole Access required on average more than seven additional minutes per measurement. Wells in which a poor Chalk Cut was obtained required on average more than 10 minutes longer to measure than did wells in which the chalk cut was excellent. Finally, there is a significant tendency for wells that show the largest '00-'99 change in water level to require more time to measure (approximately one-half minute of additional measurement time for every foot of water-level decline from 1999 to 2000).

Conclusions

The purpose of the Quality Control and Assurance Program is to identify wells and procedural conditions that may contribute significantly to the variance of Depth to Water measured in observation wells but that do not reflect true changes in the water table elevation. Gathering Quality Control information requires little additional effort by the field crews, emphasizes the importance of procedural consistency, and certifies performance. Quality Control for the year 2000 field season is remarkably free of inconsistencies compared to the 1999 field season, and is more like the 1998 season in this regard. The results can be interpreted as reinforcing the need for training and the desirability of deleting troublesome wells from the monitoring program. The QA process continues to identify specific wells as troublesome, and flags well locations that require verification before being permanently incorporated into the WIZARD data base.

The Quality Control program has achieved its objectives of identifying and quantifying sources of unwanted variation in observation well data collection, and in flagging wells whose measurements required verification. It detected a small number of spurious values, confirming the benefits of "cleaning" the data base in past years. As the Quality Control process is routinely applied to KGS observation well measurements in the future, and particularly if it is applied to the entire Kansas observation well network, the quality of the groundwater measurement data will continue to be progressively improved with time.

The inferences that can be drawn from the tracking log data for 2000 are limited because the data are incomplete and the proper association of an interval of time when a vehicle was stationary with an attempt to measure a specific well is still uncertain. Changes in field procedures next year may result in fewer ambiguous time measurements, and improvements in computer software may make associating the tracking log file with the well measurement file easier and more exact. Nonetheless, some inferences may be drawn from the current, initial attempt to analyze the tracking data.

  1. Typically, it takes 15 to 20 minutes to measure a well, for either experienced or first-time measurers.
  2. As noted in the analysis of water level changes, there seems to be uncertainty on the part of the measurers as to the distinction between well access and downhole access. These variables seem to be confounded with use of a weighted tape and the quality of chalk cut. It may be useful to revise the exogenous variables recorded in next year's program.
  3. The time spent measuring a well, and hence its cost of measurement, should be taken into account along with well history, water level consistency, and spatial requirements when considering replacing a well in the network.

The initial year's attempt to extract and utilize information from the GPS tracking logs should be considered successful and worth pursuing in the future. With continued refinement of the data capture and processing methodology, the tracking logs will yield useful information for the maintenance of the High Plains aquifer network.

References

Miller, R.D., Davis, J.C., and Olea, R.A., 1997, Acquisition Activity, Statistical Quality Control, and Spatial Quality Control for 1997 Annual Water Level Data Acquired by the Kansas Geological Survey: Kansas Geological Survey, Open-File Report 97-33, 45 p.

Miller, R.D., Davis, J.C., and Olea, R.A., 1998, 1998 Annual Water Level Raw Data Report for Kansas: Kansas Geological Survey, Open-File Report 98-7, 275 p., 6 plates, and 1 compact disk.



Kansas Geological Survey, Geohydrology
Placed online Dec. 3, 2010, original report dated February 2000
The URL for this page is http://www.kgs.ku.edu/Hydro/Publications/2000/OFR00_07/index.html