
Kansas Geological Survey, Computer Contributions 30, originally published in 1968


Sampling a Geological Population

by John C. Griffiths¹ and Charles W. Ondrick²

¹Pennsylvania State University and ²Kansas Geological Survey



Introduction

Statement of Problem

When the outcome of an experiment is important, it possesses some tangible value. For example, the Bureau of the Census was created to sample the (human) population of the United States for information required by government. Although this population is a sizable unit, census taking is evidently a worthwhile aid to governmental decisions. Many private polls also supply, at a price, information on characteristics of the American people. The important principle here is that information is of value, and when it is, people will pay the necessary price. The population of the United States consists of only about 200 million individuals; the geologist is faced with the problem of sampling units of the earth's crust, each of which includes a larger population of elements. If geological information is valuable, it is worthwhile investing the necessary effort to ensure that it is adequately reliable for the purpose for which it is to be used.

No experiment is better than the constraints included in its sampling arrangement; therefore it is necessary to be deliberate in planning an experiment and, in particular, in planning its sampling design to fit the objective for which the experiment is performed (Steinmetz, 1962). In our experiment, a geological population is characterized by describing certain properties of its individual elements, and the experiment illustrates the different results that arise from various sampling arrangements. Essentially the objective is simple: to determine the mean, and the dispersion around the mean, of some specified characteristics of the population. It is assumed that these two estimators are valuable items of information and that both are necessary and sufficient to describe the specified characteristics of the population (Griffiths, 1961, 1964).

A number of constraints are imposed on the experiment by these apparently simple requirements, and fulfilling them makes the procedure elaborate (Fig. 1). All sampling experiments to determine "best estimators" of population parameters are among the most difficult and expensive experiments to complete successfully (Stephan and McCarthy, 1963). To achieve the objectives within specified limits, it is necessary to be specific about each step in the procedure, and the steps must be designed so that the data fulfill the constraints (Fig. 1).

Figure 1—Flow diagram for sampling geological population.


Defining the Problem

The objectives entail determining statistical estimators of the required population parameters. Thus, from a population of elements, a sample of size "n" is taken. The problem of estimating the parameters of a detrital sediment is outlined by Griffiths (in press). Notation is given in Table 1. It is essential that, as the sample size "n" increases towards the population size "N", the sample estimators (x̄, σ̂²) converge on their respective population parameters (μ, σ²).

Table 1—Relationship between statistical estimators and population parameters.

| Characteristic | Statistical Estimator | Population Parameter |
|---|---|---|
| Mean | x̄ | μ |
| Variance | σ̂² | σ² |
| Sample size | n | N |
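A minimal Python sketch of the estimators in Table 1, and of why they must converge as n approaches N, follows. It assumes simple random sampling without replacement from a finite population and uses the textbook finite population correction (1 − n/N); the function names and the example lengths are hypothetical, not taken from the report.

```python
import math
import statistics


def sample_estimators(sample):
    """Return (x_bar, sigma_hat_squared) for a list of measurements.

    statistics.variance uses the n - 1 divisor, the usual unbiased form.
    """
    x_bar = statistics.fmean(sample)
    sigma_hat_sq = statistics.variance(sample, x_bar)
    return x_bar, sigma_hat_sq


def standard_error_of_mean(sigma_hat_sq, n, N):
    """Standard error of x_bar for simple random sampling without
    replacement from a population of N elements.

    The factor (1 - n/N) is the finite population correction: it shrinks
    the error to zero when the sample is the whole population (n == N).
    """
    if not 1 <= n <= N:
        raise ValueError("sample size must satisfy 1 <= n <= N")
    return math.sqrt(sigma_hat_sq / n * (1 - n / N))


# Hypothetical long-axis lengths (mm) of four pebbles.
x_bar, s2 = sample_estimators([12.0, 15.0, 11.0, 14.0])
print(x_bar, s2)  # x_bar = 13.0, sigma_hat_sq = 10/3

# Holding sigma_hat_sq fixed, the error falls as n approaches N = 100
# and is exactly 0.0 at n == N.
for n in (4, 50, 100):
    print(n, standard_error_of_mean(s2, n, N=100))
```

The correction term is what ties the sampling error to the population size: once every element has been measured, the estimate no longer has any sampling error.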

The statistical estimators are "best estimators" of their respective parameters if they are consistent, unbiased, efficient and sufficient (Fisher, 1948, p. 10). To obtain statistical estimators that fulfill these requirements, it is necessary to commence with random samples from known frequency distributions. The design of the experiment is therefore based essentially on the constraints. In effect, we have decided what form the answers to our questions must take and worked backwards to the experimental design that will yield them. Most of the analysis is concerned with ensuring that the requirements are fulfilled adequately.
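Of Fisher's criteria, unbiasedness is the easiest to check directly. The following sketch assumes independent draws with replacement from a small hypothetical population and enumerates every possible sample exactly (with fractions rather than simulation). It shows that the n − 1 divisor gives an estimator whose average equals σ², whereas the n divisor falls short by the factor (n − 1)/n. The helper names are our own.

```python
from fractions import Fraction
from itertools import product


def population_variance(values):
    """sigma^2: mean squared deviation about mu, divisor N."""
    N = len(values)
    mu = Fraction(sum(values), N)
    return sum((Fraction(v) - mu) ** 2 for v in values) / N


def expected_estimate(values, n, ddof):
    """Exact average of sum((x - x_bar)^2) / (n - ddof) over every ordered
    sample of size n drawn with replacement (all equally likely)."""
    samples = list(product(values, repeat=n))
    total = Fraction(0)
    for sample in samples:
        x_bar = Fraction(sum(sample), n)
        ss = sum((Fraction(x) - x_bar) ** 2 for x in sample)
        total += ss / (n - ddof)
    return total / len(samples)


population = [1, 2, 4]  # hypothetical element values
print(population_variance(population))           # 14/9
print(expected_estimate(population, 2, ddof=1))  # 14/9: unbiased
print(expected_estimate(population, 2, ddof=0))  # 7/9: biased low
```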

The second step is to select the characteristic parameters to be estimated. Some parameters are defined as fundamental properties of populations (Griffiths, 1961, 1967), and the analysis may be extended simply to the remaining properties and to derived properties. The fundamental properties of a population are a function of composition, size, shape, orientation, and packing. Data are either measurements or counts. Measurements arise if the variate exhibits a continuous range of variation, as do size and shape. Counts arise if the variate differs in discrete steps. Experimental determination of the proportions of elements of different types, that is, of composition, generally yields discrete data, as does packing (Kahn, 1956; Miller and Kahn, 1962).

An important distinction between the two types of data arises essentially from their different frequency distributions. Continuous variates, if measured appropriately, exhibit normal frequency distributions; count data generally yield binomial and Poisson distributions. These are called constant probability models. By suitable definition of the sampling design and measurement procedure, it is usually possible to achieve a constant probability model. Again, knowing the desired result, that is, the constant probability model, it is possible to use tests of random sampling that permit adjustment of the sampling design so that a constant probability model is appropriate (Griffiths and Drew, 1966).
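The tests of random sampling are not spelled out here. One classical check that count data fit a binomial (constant probability) model is the index-of-dispersion statistic sketched below. It is offered as an illustration under that assumption, not as the Griffiths and Drew (1966) procedure. It assumes k samples, each a count out of n elements (for example, hypothetical counts of quartz grains per 100 grains), and compares the observed spread of the counts with the binomial variance n p̂ (1 − p̂).

```python
def binomial_dispersion(counts, n):
    """Index of dispersion for k counts, each out of n elements.

    Under a constant probability (binomial) model the statistic is
    approximately chi-square with k - 1 degrees of freedom; values far
    from that distribution suggest the samples do not share one p.
    """
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two samples")
    if any(not 0 <= c <= n for c in counts):
        raise ValueError("each count must lie between 0 and n")
    p_hat = sum(counts) / (k * n)
    if p_hat in (0.0, 1.0):
        raise ValueError("all-or-none counts have zero binomial variance")
    expected = n * p_hat
    statistic = sum((c - expected) ** 2 for c in counts) / (n * p_hat * (1 - p_hat))
    return statistic, k - 1


stat, df = binomial_dispersion([48, 52, 50, 50], n=100)
print(stat, df)  # about 0.32 with 3 degrees of freedom
```

The 5% point of chi-square with 3 degrees of freedom is about 7.81, so these hypothetical counts give no evidence against a constant probability model.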

This is, admittedly, a somewhat elaborate and rigorously prescribed procedure to ensure that the outcomes of the experiments are of the desired form, and it may well be questioned whether a less elaborate procedure would be adequate in a geological sense. Fortunately, all the constraints can be used as an interpretive base to yield conclusions of geological interest, and there appear to be no other procedures that yield such unequivocal answers to geological questions. The desirable features ("best estimators" as described) are obtained by arranging an equivalence between the geological and statistical questions.

If, for example, a variate is continuous and the arrangement of its different values in the population is random, then random samples are simple to achieve, and the variate values will generate a normal distribution after suitable transformation. Geologically, this is equivalent to randomly sampling a population that possesses a massive structure, that is, one in which there is no discernible systematic arrangement of variation in this variate.
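The text leaves "suitable transformation" unspecified. For size data a logarithmic transform is a common choice, so the sketch below assumes it: lengths are converted to log10 values, summarized on that scale, and the mean is transformed back as a geometric mean. The function name and the example lengths are hypothetical.

```python
import math
import statistics


def log_summary(lengths_mm):
    """Summarize positive measurements on a log10 scale.

    Returns (mean_log, sd_log, geometric_mean_mm); sd_log uses n - 1.
    """
    if len(lengths_mm) < 2:
        raise ValueError("need at least two measurements")
    if any(x <= 0 for x in lengths_mm):
        raise ValueError("a log transform needs positive measurements")
    logs = [math.log10(x) for x in lengths_mm]
    mean_log = statistics.fmean(logs)
    sd_log = statistics.stdev(logs, mean_log)
    return mean_log, sd_log, 10 ** mean_log


print(log_summary([10.0, 100.0, 1000.0]))  # about (2.0, 1.0, 100.0)
```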

On the other hand, if the population is stratified or layered, it is necessary to be more circumspect about the sampling. If it is unnecessary to describe the layering, channel samples are adequate to achieve the required estimators. Channel samples, if they are random, will be self-weighting and will yield unbiased best estimators. To ensure that the channel samples are unbiased and self-weighting, each channel sample must cross the entire population, which is possible only if the entire population is accessible to the sampling procedure.
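Why a channel that crosses every layer is self-weighting can be shown with a little arithmetic. The sketch assumes, hypothetically, two layers of uniform element density, 1 m and 3 m thick, with mean pebble lengths of 2 and 6 units. The channel collects material in proportion to thickness and so recovers the population mean, whereas one spot sample per layer weights the thin layer too heavily.

```python
def channel_mean(thicknesses, layer_means):
    """Mean of a channel sample that crosses every layer.

    Assumes each layer contributes material in proportion to its
    thickness, so layer means are weighted by thickness.
    """
    if len(thicknesses) != len(layer_means) or not thicknesses:
        raise ValueError("need one thickness per layer mean")
    total = sum(thicknesses)
    return sum(t * m for t, m in zip(thicknesses, layer_means)) / total


def one_spot_per_layer_mean(layer_means):
    """Mean of one equally weighted spot sample from each layer."""
    return sum(layer_means) / len(layer_means)


thicknesses = [1.0, 3.0]  # hypothetical layer thicknesses (m)
means = [2.0, 6.0]        # hypothetical layer mean lengths
print(channel_mean(thicknesses, means))   # 5.0, the population mean
print(one_spot_per_layer_mean(means))     # 4.0, biased toward the thin layer
```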

If it is necessary to define the layering as well, additional constraints are imposed on the sampling arrangement. The samples must be confined to single layers; that is, sedimentation-unit sampling is required (Otto, 1938). If samples transgress the layers, the estimators will be biased and certainly will not be "best estimators." In fact, in this instance the frequency distributions usually will not yield constant probability models.
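The bias from samples that transgress layers can be made concrete with the law of total variance. If a sample mixes two layers in fixed proportions, its variance is the weighted within-layer variance plus a between-layer term, so it overstates the variation within any one sedimentation unit; a mixture of two normal layers with different means is, moreover, no longer normal. The proportions, means, and variances below are hypothetical.

```python
def mixed_sample_variance(weights, means, variances):
    """Mean and variance of material drawn from several layers.

    weights are the proportions of the sample taken from each layer.
    Returns (grand_mean, within, between, total), where
    total = within + between (law of total variance).
    """
    if not (len(weights) == len(means) == len(variances)) or not weights:
        raise ValueError("need matching, non-empty inputs")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    grand_mean = sum(w * m for w, m in zip(weights, means))
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means))
    return grand_mean, within, between, within + between


# A sample straddling a thin layer (25%) and a thick layer (75%),
# each with within-layer variance 1.0.
print(mixed_sample_variance([0.25, 0.75], [2.0, 6.0], [1.0, 1.0]))
# (5.0, 1.0, 3.0, 4.0): the total is four times the within-layer variance
```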

On this basis, by suitable sampling design, it is possible to determine the pattern of variation in the population, in other words, to define the structure of the population. A sampling model for detrital sediments using this approach has been described by Griffiths (in Milner, 1962; Fig. 2). Tests of the model are described by Steinmetz (1962), Wood and Griffiths (1963), Dahlberg (1964), and Ondrick (1968).

Figure 2—Source area, process, stage, and variation among and within samples of different sedimentary deposits; basic model for defining sampling patterns.

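Figure 2 separates variation among samples from variation within them. A standard way to estimate those two parts, assuming k samples with the same number m of measurements each, is a balanced one-way analysis of variance. The sketch below uses it as an illustration and does not claim to reproduce the full model of Griffiths (in Milner, 1962). Setting a negative among-sample estimate to zero is a common convention, adopted here as an assumption.

```python
import statistics


def variance_components(groups):
    """Balanced one-way analysis: among- and within-sample variation.

    groups: k lists of m measurements each (k >= 2, m >= 2).
    Returns (ms_among, ms_within, among_component); ms_within estimates
    the within-sample variance, among_component the variance among samples.
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two samples")
    m = len(groups[0])
    if m < 2 or any(len(g) != m for g in groups):
        raise ValueError("samples must all have the same size m >= 2")
    group_means = [statistics.fmean(g) for g in groups]
    # With equal sample sizes this is also the overall mean.
    grand_mean = statistics.fmean(group_means)
    ss_among = m * sum((gm - grand_mean) ** 2 for gm in group_means)
    ss_within = sum((x - gm) ** 2 for g, gm in zip(groups, group_means) for x in g)
    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (k * (m - 1))
    among_component = max(0.0, (ms_among - ms_within) / m)
    return ms_among, ms_within, among_component


# Two hypothetical samples of three measurements each.
print(variance_components([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```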

By ensuring that the constraints are fulfilled, it is possible both to define the structure of the variate in the population and to achieve best estimates of the population parameters. Without this level of attainment, there is no guarantee of achieving either objective. If the information resulting from the experiment is valuable, then the estimators must be reliable, and in general best estimators are essential; to ensure best estimators, fulfillment of the constraints is essential.

Suppose the object of interest in the investigation (the target population in the terminology of Cochran, Mosteller and Tukey, 1954; Krumbein, 1960; Krumbein and Graybill, 1965) is a gravel pit, although it could be any population, and the variate is the length of the long axis, suitably defined, of white pebbles. The population, therefore, is defined as all long axes of white pebbles in the gravel pit. In this example attention is confined to the exposed face of the pit, so that all elements of the population are equally accessible. The different types of target populations that a geologist considers have been defined by Griffiths (in Milner, 1962; 1967) as follows:

  1. The hypothetical population: that population of elements which was formed by geological processes operating in a specified geographical area and through a specified time interval. The whole population may no longer exist because of erosion or nondeposition; thus it is hypothetical and defined on the basis of geological information.
  2. The existent population: that population of elements which now exists in some circumscribed unit volume of the earth's crust. This is the population which may be defined rigorously and which yields the most statistical and geological information.

Frequently the geologist complains that the existent population is not equally accessible and so he cannot sample it. Such a complaint presupposes a judgement of value; what the geologist is actually saying is that it will cost too much to make this existent population accessible to random sampling. In other words, he has decided that the result is not worth the investment! He compromises and collects data that cost little and, of course, are also worth little. Until we resolve this compromise and show that geological information, properly collected, is worth the investment necessary to collect it, our level of predictability will suffer. Indeed, information collected on the basis of such a compromise will generally be cheap to obtain and of little value. In this experiment the exposed face of the gravel pit is the available population and, by definition, represents the existent population. It is emphasized, however, that all conclusions are limited to the face of the gravel pit!

  3. The available population: that population of elements which is readily available and usually sampled. If it is equivalent to the existent population, then the information will be adequate. If, however, this available population is either not sampled randomly or not representative of the existent population, the estimators will be biased. The bias cannot be measured, and the conclusions and interpretation will apply solely to the sampled population. In this instance it may be difficult to relate the results to the existent geological population.
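The point about the available population (item 3 above) can be illustrated with a hypothetical face in which long-axis length increases with height above the pit floor. The sketch assumes only the lowest fifth of the face is within reach: a perfectly random sample drawn from that part estimates the available mean, not the existent mean, and no increase in sample size removes the difference. The numbers and names are invented for illustration.

```python
import random
import statistics


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements, every subset of size n equally likely."""
    return random.Random(seed).sample(frame, n)


# Hypothetical existent population: one white pebble at each height
# h = 0..99 on the face, with long axis 10 + 0.5 * h mm.
existent = [(h, 10 + 0.5 * h) for h in range(100)]
# Hypothetical available population: pebbles within reach (h < 20).
available = [(h, length) for h, length in existent if h < 20]

existent_mean = statistics.fmean([length for _, length in existent])
available_mean = statistics.fmean([length for _, length in available])
sample = simple_random_sample(available, 10, seed=1)
sample_mean = statistics.fmean([length for _, length in sample])

print(existent_mean, available_mean)  # 34.75 14.75
print(sample_mean)  # necessarily between 10.0 and 19.5 mm
```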

It seems worthwhile emphasizing this aspect of sampling and investigation because few scientists, except perhaps in the behavioral sciences, have accepted the available in place of the existent population. Results of the Lanarkshire Milk experiment (Student, 1931; Pearson and Wishart, 1942), the presidential polls of 1936 and 1948, and the Kinsey report (Cochran, Mosteller, and Tukey, 1954) have been criticized severely because the conclusions, while correctly applied to the available population, were extrapolated incorrectly to a different target population (Stephan and McCarthy, 1963, p. 235-272). Indeed, if the experimental result is used in decision making, as most geological investigations are, then the outcome should be reliable so that the decisions are rationally sound. Geological information is considered low-order because of its poor predictability, which is a function of the reliability of the information. To improve reliability and predictability, it is necessary to invest the effort and money required to make the data worthwhile, and to ensure that the conclusions and interpretations actually apply to the existent population.

In summary, then, it has been decided to obtain random samples from homogeneous populations, and the investigation is concerned with the existent population. These are stringent requirements, and the result will suffice not only to yield unbiased estimators of population parameters but also, by suitable arrangement, to yield information on the pattern of variation or structure of the population. Random samples of homogeneous populations will yield frequency distributions that are constant probability models.

The complete text of this report is available as an Adobe Acrobat PDF file.



Kansas Geological Survey
Placed on web Sept. 4, 2019; originally published 1968.
The URL for this page is http://www.kgs.ku.edu/Publications/Bulletins/CC/30/index.html