Reviewing the values and characteristics of a selected variable
Filtering (including, excluding, and modifying only specific values for a variable)
Transforming variables (systematic mathematical modification of values)
Analyzing relationships (correlation matrices and scatter plots)
Once the geographic range, the cell type(s) and the desired variables have been selected from the environmental database, the variable review and summary page offers a number of options.
Reviewing the values and characteristics of a selected variable
The "Info" button immediately following the variable name displays a statistical summary of the data set for the selected variable in the selected region. This summary includes the mean, standard deviation, maximum and minimum values, the total number of cells selected, and the number of those cells populated with data for the variable, among other statistics. A ten-interval histogram of the variable values is also displayed. The histogram can be redrawn with different limiting values or a different number of intervals.
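The kind of summary and histogram described above can be sketched as follows. This is a minimal illustration, not LOICZVIEW's own code. It assumes a variable is a plain list of cell values with -9999 marking an unpopulated cell, as in the database. It also assumes the standard deviation is the sample (n-1) form and that the histogram uses equal-width intervals; the tool's actual conventions are not stated here.

```python
from statistics import mean, stdev

NULL = -9999  # null marker used in the database for an unpopulated cell


def summarize(values):
    """Summary statistics for one variable over the selected cells."""
    populated = [v for v in values if v != NULL]
    return {
        "cells_selected": len(values),
        "cells_populated": len(populated),
        "mean": mean(populated),
        "std_dev": stdev(populated),  # sample standard deviation (assumed)
        "min": min(populated),
        "max": max(populated),
    }


def histogram(values, intervals=10, low=None, high=None):
    """Count populated values in equal-width intervals between low and high.

    low/high default to the data minimum and maximum, mimicking the default
    ten-interval display; values outside the chosen limits are not counted.
    """
    populated = [v for v in values if v != NULL]
    low = min(populated) if low is None else low
    high = max(populated) if high is None else high
    if high <= low:
        raise ValueError("high must be greater than low")
    width = (high - low) / intervals
    counts = [0] * intervals
    for v in populated:
        if low <= v <= high:
            # a value equal to high falls into the last interval
            i = min(int((v - low) / width), intervals - 1)
            counts[i] += 1
    return counts
```

For example, `summarize([1, 2, 2, 3, 10, -9999])` reports 6 cells selected, 5 populated, and a mean of 3.6. `histogram([1, 2, 2, 3, 10, -9999], intervals=3)` returns `[4, 0, 1]`, which already hints at a right-skewed distribution.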
The summary information and the histogram let the user see whether the variable conforms to expectations and looks useful for the intended purposes. The histogram also shows whether the distribution of values is normal or highly skewed, which is an important consideration for use in clustering exercises or off-line statistical analysis.
Below the histogram is an additional feature that lets the user view the values as transformed by certain selected functions. Probably the most commonly used are the logarithmic transforms (base 10 and natural), which can often be used to 'normalize' skewed distributions. This viewing option does not transform the actual data set; that is done by the transform option discussed below.
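To see why a log preview can 'normalize' a skewed variable, the sketch below compares a simple skewness measure before and after a base-10 log. It is illustrative only. The moment coefficient of skewness is used here as a convenient yardstick, not necessarily the measure LOICZVIEW reports. Non-positive values are simply skipped because the logarithm is undefined for them. As with the on-screen preview, the original list is left unchanged.

```python
import math


def skewness(xs):
    """Moment coefficient of skewness: near 0 for a symmetric distribution."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def preview_log10(xs):
    """Return log10 of the positive values; the input list is not modified."""
    return [math.log10(x) for x in xs if x > 0]


values = [1, 10, 100, 1000, 10000]   # strongly right-skewed
print(skewness(values))                # about 1.47
print(skewness(preview_log10(values))) # 0.0, symmetric after the transform
```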
Filtering (including, excluding, or modifying only specific values for a variable)
Why filter? There are a number of reasons why the user may wish to modify a dataset by “filtering”, that is, selecting only a certain range of values for one or more of the variables. Examples include:
The first two examples can conveniently be done either within the database (the on-line filter option) or off-line, in a dataset downloaded for modification and uploading. The geographic range definition is best done off-line at present.
On-line filtering: After the variables have been selected, proceed to the variable review page, where a "Filter" button is available next to the listing of each variable. On the same line as the "Info" button is a dropdown menu labelled "operator." For a description of the available operators and their uses, click here.
Repeat the process for as many variables as desired (it is not necessary to filter all variables, but the data set will be treated as filtered if any component is).
This variable review page confirms the geographic range, cell type, and variables selected. It also offers a choice of "No Null" for each of the variables. If this box is checked, any cells that have no value associated with that variable (indicated by -9999 in the database entry) will be dropped from the final data set. In general, elimination of null values will make a more statistically satisfying cluster group, but at the expense of omitting parts of the geographic visualization.
Once you have made the decisions at this stage, proceed to the Generate Cluster Data step.
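The combined effect of several variable filters plus the "No Null" option can be sketched as below. The operator names "include" and "exclude" and their range semantics are assumptions made for illustration; the actual operators are described on the separate operator page. Each cell is modeled as a dict of variable values, with -9999 as the null marker. Here a null value passes a range filter and is removed only when "No Null" is checked for that variable, which is also an assumption.

```python
NULL = -9999  # database null marker


def apply_filters(cells, filters, no_null=()):
    """Keep the cells that pass every variable filter.

    cells   -- list of dicts mapping variable name -> value (one dict per cell)
    filters -- {variable: (operator, low, high)}; the hypothetical operator
               "include" keeps low <= value <= high, "exclude" drops that range
    no_null -- variables whose "No Null" box is checked
    """
    kept = []
    for cell in cells:
        # "No Null": drop the cell if any checked variable is null
        if any(cell[var] == NULL for var in no_null):
            continue
        keep = True
        for var, (op, low, high) in filters.items():
            if op not in ("include", "exclude"):
                raise ValueError(f"unknown operator: {op}")
            value = cell[var]
            if value == NULL:
                continue  # nulls are left to the No Null option (assumed)
            inside = low <= value <= high
            if (op == "include" and not inside) or (op == "exclude" and inside):
                keep = False
                break
        if keep:
            kept.append(cell)
    return kept
```

Filtering only one variable still filters the whole data set, because a cell that fails any filter is dropped with all of its variables.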
Off-line filtering: Although the data selection process provides basic filter capabilities, it will never be possible to provide every kind of tool that the advanced user might desire. Fortunately, the LOICZVIEW capability to accept uploaded datasets, in combination with the database download option, permits the user to adjust data sets using relatively simple spreadsheet operations. At present, offline filtering is the only practical way to modify a geographic range to an irregular (non-rectangular) shape. The following example provides a procedural outline.
EXAMPLE:
Clustering of the Australia-New Zealand region yielded poor results for hydrologic variables when the standard geographic region selection (Zones 21 and 26) was used. This was because the rectangular lat-long boxes that include all of Australia and NZ also include portions of Indonesia and New Guinea, which have a very different rainfall regime. The coordinate selection boxes cannot solve the problem, because a rectangular box that includes all of Australia (south of 10 degrees S latitude) still clips enough of Indonesia to skew the data distribution.
The following procedure can be used to adjust the Australia-NZ geographic region:
Once a geographic template is constructed, it can be applied to future dataset downloads.
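Applying a saved geographic template to a new download amounts to keeping only the rows whose cell identifier belongs to the irregular region. The sketch below assumes the download is a CSV file with a header row. The identifier column name `cell_id` and the template as a set of ID strings are both hypothetical; check the actual column names in your download.

```python
import csv
import io


def apply_template(download_csv, template_ids, id_column="cell_id"):
    """Keep only the rows whose cell ID is in the geographic template.

    download_csv -- CSV text of a dataset downloaded from the database
    template_ids -- set of cell IDs (as strings) making up the region
    id_column    -- name of the cell identifier column (hypothetical)
    """
    reader = csv.DictReader(io.StringIO(download_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[id_column] in template_ids:
            writer.writerow(row)
    return out.getvalue()
```

For the Australia-NZ case, the template would list the Australian and New Zealand cells and omit the Indonesian and New Guinean cells inside the same rectangle. The filtered CSV can then be uploaded in place of the original download.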
Transforming variables (systematic mathematical modification of values)
These transforms modify values in order to reduce the effects of extreme variance or skewed distributions on the clustering. Transforms operate on the entire data set. The presently available transformations are log base 10, natural log, absolute value, and square root. Click here for more detailed descriptions/instructions.
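A transform applied to the entire data set can be sketched as mapping one function over every value of a variable. The four functions below match the list above. The handling of nulls and out-of-domain values is an assumption: nulls (-9999) stay null, and values a function cannot accept (zero or negatives for the logs, negatives for the square root) are set to null. The real tool may treat such values differently.

```python
import math

NULL = -9999  # database null marker

TRANSFORMS = {
    "log10": math.log10,  # log base 10
    "ln": math.log,       # natural log
    "abs": abs,           # absolute value
    "sqrt": math.sqrt,    # square root
}


def transform(values, name):
    """Apply one transform to every value of a variable."""
    func = TRANSFORMS[name]
    result = []
    for v in values:
        if v == NULL:
            result.append(NULL)  # nulls stay null
            continue
        try:
            result.append(func(v))
        except ValueError:  # math domain error, e.g. log10(0) or sqrt(-4)
            result.append(NULL)  # treated as null here (assumed)
    return result
```

Because the transform replaces the values themselves, any range filter must be set before transforming, since its limits refer to the original units.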
You may use the transform function to alter datasets after they are filtered or modified, but you cannot filter/modify datasets after they are transformed.
The “Info” button in the “Variable” name box permits you to view the original dataset characteristics and distribution, and to preview some of the filters or transforms you might desire, but it does not describe or view the actual filtered or transformed dataset.
Analyzing relationships (correlation matrices and scatter plots)
At the bottom of the VERIFY, EXAMINE, AND/OR MODIFY SELECTED VARIABLES page is a button labelled "compute correlation matrix". This calculates a correlation coefficient for each pair of selected variables and displays the results in a matrix. The display also summarizes the variables used and any modifications made to them with the 'exclude null' choice or with any of the include/exclude, reset, or transform operators.
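A pairwise correlation matrix can be sketched as below. The text does not say which coefficient LOICZVIEW computes or how it handles null cells. This sketch assumes the Pearson coefficient and uses only the cells where both variables in a pair are populated (pairwise deletion of -9999 values).

```python
import math

NULL = -9999  # database null marker


def pearson(xs, ys):
    """Pearson r over the cells where both variables are populated."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x != NULL and y != NULL]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two populated cell pairs")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when a variable is constant
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data):
    """data: {variable: list of values, one per cell}; returns {a: {b: r}}."""
    names = list(data)
    return {a: {b: pearson(data[a], data[b]) for b in names} for a in names}
```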
Within the matrix, the numerical values of the coefficients are presented as hypertext links. Clicking on one of these links produces a scatter plot of the two variables it represents.
At the bottom of the correlation matrix display page is a "View or download variable correlation file" link; clicking it produces a downloadable .csv file of the matrix and supporting information.
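For off-line work it can help to write a correlation matrix out in the same spreadsheet-friendly form. The sketch below serializes a `{row: {column: r}}` matrix to CSV text. The layout is an assumption for illustration: variable names in the first row and column, and coefficients rounded to three decimals. The actual download also carries supporting information that is not reproduced here.

```python
import csv
import io


def matrix_to_csv(matrix):
    """Serialize a {row: {col: r}} correlation matrix as CSV text."""
    names = list(matrix)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([""] + names)  # header row: blank corner cell, then names
    for a in names:
        writer.writerow([a] + [f"{matrix[a][b]:.3f}" for b in names])
    return out.getvalue()


print(matrix_to_csv({"a": {"a": 1.0, "b": -0.5}, "b": {"a": -0.5, "b": 1.0}}))
# ,a,b
# a,1.000,-0.500
# b,-0.500,1.000
```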