U.S. National Science Foundation Project OCE 00-03970


GIS, database design and web interfacing by Jeremey Bartley and Grimay Misgna

GIS database development and web interfacing by Casey McLaughlin

Database administrative support: Kurt Look

Dataset Refinement and Editing:

Why filter? There are a number of reasons why the user may wish to modify a dataset by “filtering”, that is, selecting only a certain range of values for one or more of the variables. Examples include:

  1. To apply climatic rather than geographic definitions of the region to be analyzed – for example, areas where the mean annual precipitation is at least 1000 mm, or where the mean monthly temperature never falls below zero.
  2. To eliminate ‘outlier’ values that dominate the clustering process but are not significant to the analysis of interest.
  3. To tailor a geographic region of analysis to an irregular shape.

The first two examples can conveniently be done either within the database (the on-line filter option) or off-line, in a dataset downloaded for modification and uploading. The geographic range definition is best done off-line at present.

On-line filtering: After the variables have been selected, proceed to the variable review page, where a "Filter" button is available next to each listed variable. Clicking a Filter button displays the range of values for that variable over the current geographic and cell type selection, along with a set of button choices: Greater than, Less than, Equal, Not Equal, and Between. Select one of these and enter the appropriate numerical value in the box. For the "Between" choice, enter the two bounds separated by "and", in this format:  111 and 222

Repeat the process for as many variables as desired (it is not necessary to filter all variables, but the data set will be treated as filtered if any component is).
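The filter choices described above can be pictured as simple per-variable tests. The following is a minimal Python sketch for illustration only, not the LOICZVIEW implementation: the function names, the variable names (`precip_mm`, `min_temp_c`) and the inclusive treatment of the "Between" bounds are all assumptions.

```python
import operator

# Hypothetical mapping of the filter page's button choices to comparisons.
CHOICES = {
    "Greater than": operator.gt,
    "Less than": operator.lt,
    "Equal": operator.eq,
    "Not Equal": operator.ne,
}

def make_filter(choice, text):
    """Build a test for one variable from a button choice and the box text."""
    if choice == "Between":
        # Documented entry format: two numbers separated by "and".
        low, high = (float(part) for part in text.split("and"))
        # Assumption: both bounds are inclusive; the page does not say.
        return lambda value: low <= value <= high
    bound = float(text)
    compare = CHOICES[choice]
    return lambda value: compare(value, bound)

def apply_filters(rows, filters):
    """Keep rows that pass every filter; unfiltered variables are not tested."""
    return [row for row in rows
            if all(test(row[name]) for name, test in filters.items())]

# Made-up cells with two hypothetical variables.
rows = [
    {"precip_mm": 450.0, "min_temp_c": -3.0},
    {"precip_mm": 1200.0, "min_temp_c": 4.0},
    {"precip_mm": 2100.0, "min_temp_c": 12.0},
]
filters = {"precip_mm": make_filter("Between", "1000 and 2000")}
kept = apply_filters(rows, filters)
```

Only one variable is filtered here, yet the whole result counts as a filtered data set, which matches the behavior described above.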

This variable review page confirms the geographic range, cell type, and variables selected. It also offers a choice of "No Null" for each of the variables. If this box is checked, any cells that have no value associated with that variable (indicated by -9999 in the database entry) will be dropped from the final data set. In general, elimination of null values will make a more statistically satisfying cluster group, but at the expense of omitting parts of the geographic visualization.
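The effect of the "No Null" box can be sketched in a few lines of Python. This is an illustrative assumption about the behavior described above, not the database's own code; the variable names and values are made up, and only the -9999 null marker comes from the text.

```python
NULL_VALUE = -9999  # null marker used in the database entries, per the text

def drop_nulls(rows, no_null_vars):
    """Drop any cell that is null for a variable with "No Null" checked."""
    return [row for row in rows
            if all(row[name] != NULL_VALUE for name in no_null_vars)]

# Made-up cells; "runoff" and "salinity" are hypothetical variable names.
cells = [
    {"cell_id": 1, "runoff": 12.5, "salinity": 34.1},
    {"cell_id": 2, "runoff": NULL_VALUE, "salinity": 35.0},
    {"cell_id": 3, "runoff": 3.0, "salinity": NULL_VALUE},
]
kept = drop_nulls(cells, ["runoff"])   # "No Null" checked for runoff only
dropped = len(cells) - len(kept)       # cells lost from the map
```

Counting the dropped cells before clustering gives a quick sense of how much of the geographic visualization is being traded away.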

Once you have made the decisions at this stage, proceed to the Generate Cluster Data step.

Off-line filtering: Although the data selection process provides basic filter capabilities, it will never be possible to provide every kind of tool that the advanced user might desire. Fortunately, the LOICZVIEW capability to accept uploaded datasets, in combination with the database download option, permits the user to adjust datasets using relatively simple spreadsheet operations. At present, off-line filtering is the only practical way to modify a geographic range to an irregular (non-rectangular) shape. The following example provides a procedural outline.

EXAMPLE:
Clustering of the Australia-New Zealand region yielded poor results for hydrologic variables when the standard geographic region selection (Zones 21 and 26) was used. This was because the rectangular lat-long boxes that include all of Australia and NZ also include portions of Indonesia and New Guinea, which have a very different rainfall regime. Use of the coordinate selection boxes cannot solve the problem, because a rectangular box that includes all of Australia (south of 10 degrees S latitude) still clips enough of Indonesia to skew the data distribution.

The following procedure can be used to adjust the Australia-NZ geographic region:

  1. Select the desired variables and cell types for 10-47 S Lat, 110-180 E Long, and select View/Download when the data set is assembled. 
  2. Save the resulting file as a text file (e.g. select all, copy, paste into Notepad or a similar application).
  3. Open a spreadsheet and import the text file. For Excel, open a new worksheet and use 'open file' (specify ".txt" for type) to open the saved data file; the import Wizard should then step through the choices for reading the comma-delimited file into a spreadsheet.
  4. Select the entire sheet, and 'sort data' by Latitude in ascending order, then by Longitude in ascending order.  This places the problem portion of the geographic range (the northwest corner) at the top of the spreadsheet.
  5. Identify from a separate source the latitude and longitude ranges to be excluded. In this case, these are 10-12 S between 110 and 127 E, and 10-12 S between 145 and 155 E. To exclude New Caledonia as well, remove latitudes north of 25 S that lie east of 155 E longitude.
  6. Delete the rows with lat-long values in these ranges (for complex shapes, it may be easiest to re-sort the table to address different parts).
  7. Upload the edited dataset to LOICZVIEW for clustering, following the instructions at the example upload site or within LOICZVIEW.

Once a geographic template is constructed, it can be applied to future dataset downloads.
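For users who prefer a script to a spreadsheet, such a template can be written as a list of exclusion boxes and reapplied to each new download. The Python sketch below uses the ranges from the Australia-NZ example. It rests on several assumptions that should be checked against the actual download: the column names `Latitude` and `Longitude`, a sign convention with south latitudes negative and east longitudes positive, inclusive box edges, and a comma-delimited file with a header row. The sample values are made up.

```python
import csv
import io

# Exclusion boxes from the example, as (south, north, west, east) in degrees.
# Assumption: south latitudes are negative and east longitudes are positive.
AUS_NZ_TEMPLATE = [
    (-12.0, -10.0, 110.0, 127.0),
    (-12.0, -10.0, 145.0, 155.0),
    (-25.0, -10.0, 155.0, 180.0),  # optional box that removes New Caledonia
]

def excluded(lat, lon, template):
    """True if the point falls inside any exclusion box (edges inclusive)."""
    return any(s <= lat <= n and w <= lon <= e for s, n, w, e in template)

def apply_template(text, template, lat_col="Latitude", lon_col="Longitude"):
    """Return comma-delimited text with the excluded rows removed."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if not excluded(float(row[lat_col]), float(row[lon_col]), template):
            writer.writerow(row)
    return out.getvalue()

# Made-up sample in the assumed download layout.
sample = (
    "Latitude,Longitude,Precip\n"
    "-10.5,120.5,2500\n"   # inside the first exclusion box
    "-20.5,130.5,300\n"    # interior of Australia, kept
    "-21.5,165.5,1100\n"   # inside the New Caledonia box
    "-41.5,174.5,1200\n"   # New Zealand, kept
)
filtered = apply_template(sample, AUS_NZ_TEMPLATE)
```

The text returned in `filtered` can be saved to a file and uploaded in the same way as a dataset edited by hand.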