Dataset Verification and Modification Information

 

The ‘Variables Selected’ page of the environmental database 'front-end' provides final review and modification opportunities for the selected data set.

Please check the latitude-longitude range, cell types, and the variable names, to verify your selections.

The “Info” button in the “Variable” name box lets you view the characteristics and distribution of the original dataset, and preview some of the filters or transforms you might want to apply. It does not describe or display the actual filtered or transformed dataset.



You may use the "operator" and "criteria" dropdown menus to make the following adjustments to the dataset, for any, all or none of the variables:

Exclude cells that contain null (no data) values from the analysis;

Include cells with values above, below, or between selected value intervals;

Exclude a specific value of a variable;

Reset values above, below, or outside of a selected value interval to limiting values;

Transform the values using Log10, LogN, Absolute Value, or Square Root operators.

 

After you make the final verifications and any transformations and click the ‘generate cluster data’ button, the next page will contain a ‘View or Download Variable Selection Report File’ option.  This option permits you to check your final output and save a record of dataset contents and modifications.

 

Exclude Null checkbox:  If selected, this eliminates from the dataset all cells that contain the ‘no data’ marker (-9999) for that variable.  This applies both to the original dataset and to any transformations that result in null values (see below).
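
The effect of the Exclude Null checkbox can be sketched in a few lines of Python. This is only an illustration, not the front-end's actual code: the cell layout (one dictionary per cell), the variable names, and the function name `exclude_null` are all assumptions made for the example. Only the -9999 marker comes from the description above.

```python
NO_DATA = -9999  # the 'no data' marker described above


def exclude_null(cells, variable):
    """Keep only the cells whose value for `variable` is not the no-data marker."""
    return [cell for cell in cells if cell[variable] != NO_DATA]


# Hypothetical cells; the variable names are invented for this sketch.
cells = [
    {"id": 1, "pop_density": 12.0, "min_temp": -3.0},
    {"id": 2, "pop_density": NO_DATA, "min_temp": 4.5},
    {"id": 3, "pop_density": 830.0, "min_temp": NO_DATA},
]

kept = exclude_null(cells, "pop_density")
print([cell["id"] for cell in kept])  # prints [1, 3]
```

Note that a dropped cell is lost for every variable: cell 2 is removed from the whole dataset even though its `min_temp` value is valid.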


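Include/Exclude operators:

These filters allow you to eliminate selected ranges of values, and/or the cells containing these values, from the entire dataset.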

 

Include only <:   Enter a number within the range of the variable's values.  All values above that number are changed to the ‘null’ (no data, or -9999) entry.  If the ‘no-null’ (Exclude Null) box is checked, all of the cells excluded for this variable will be dropped from the entire dataset.  Example:  if you only want to cluster rural areas, select a value for population density that would exclude urban areas (e.g., 500/km²) and check no-null.

 

Include only >:   Enter a number within the range of the variable's values.  All values below that number are changed to the ‘null’ (no data, or -9999) entry.  If the ‘no-null’ (Exclude Null) box is checked, all of the cells excluded for this variable will be dropped from the entire dataset.  Example:  if you only want to cluster areas that do not experience sustained freezing, select zero (or some other appropriate value) for minimum monthly temperature and check no-null.

 

Include only between:   Enter two numbers within the range of the variable's values, separated by “and” (format example:  100 and 200).  All values above the upper limit or below the lower limit are changed to the ‘null’ (no data, or -9999) entry.  If the ‘no-null’ (Exclude Null) box is checked, all of the cells excluded for this variable will be dropped from the entire dataset.  Example:  if you only want to cluster temperate regions, select values for temperature variables to identify the desired range (e.g., “8 and 18” for mean annual temperature) and check no-null.

 

Exclude =:   Enter the value you wish to drop from the dataset; all cells with that value will be changed to the ‘null’ (no data, or -9999) entry.  If the ‘no-null’ (Exclude Null) box is checked, all of the cells excluded for this variable will be dropped from the entire dataset.  Example:  this could be used to sharpen the comparison between extremes in a classified variable dataset by removing the value associated with the middle class.  It could also be used to clean or test a dataset if you suspect a value of being in error.
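
The four Include/Exclude operators described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the front-end's implementation: the function names are invented, and the treatment of values exactly equal to a limit (dropped by the strict `<` and `>` forms, kept by `between`) is a guess, since the text does not say how boundary values are handled.

```python
NO_DATA = -9999  # the 'no data' marker described above


def _mask(values, keep):
    """Replace every value that is already null or fails `keep` with NO_DATA."""
    return [v if v != NO_DATA and keep(v) else NO_DATA for v in values]


def include_only_lt(values, limit):
    """'Include only <': values at or above the limit become null (assumed strict)."""
    return _mask(values, lambda v: v < limit)


def include_only_gt(values, limit):
    """'Include only >': values at or below the limit become null (assumed strict)."""
    return _mask(values, lambda v: v > limit)


def include_only_between(values, low, high):
    """'Include only between': values outside [low, high] become null (assumed inclusive)."""
    return _mask(values, lambda v: low <= v <= high)


def exclude_eq(values, target):
    """'Exclude =': every occurrence of `target` becomes null."""
    return _mask(values, lambda v: v != target)


# Hypothetical data for the examples in the text.
pop_density = [12.0, 480.0, 2500.0, NO_DATA]
mean_annual_temp = [5.0, 8.0, 12.5, 21.0]

print(include_only_lt(pop_density, 500))             # rural cells only
print(include_only_between(mean_annual_temp, 8, 18))  # temperate cells only
```

The operators only mark values as null. Whether the affected cells then leave the dataset depends on the no-null box, which acts as a separate step.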

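Reset operators:

These filters allow you to modify selected ranges of values, in order to keep the cells containing these values in the dataset for clustering while avoiding the effects of (for example) extreme variance on the clustering.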

Reset Outside Of:   Enter two numbers within the range of the variable's values, separated by “and” (format example:  100 and 200).  All values below the lower limit are changed to the lower limit, and all values above the upper limit are changed to the upper limit.  Example:  if you want to focus on interactions of other variables with slope or topographic complexity, you might select values of standard deviation of elevation (bathymetry) that emphasize the transition from operationally flat to operationally steep (or variable).

 

Reset >:   Enter a number within the range of the variable's values.  All values above that number are changed to the entered value.  Example:  if you want to examine rural-to-urban transition regions and you find that the clusters are dominated by a few high-density urban outliers, select an upper limit on population density (e.g., 1000/km²) that classifies all urban areas together and retains them, but not their variance, in the clustering.

 

Reset <:   Enter a number within the range of the variable's values.  All values below that number are changed to the entered value and retained in the dataset.  Example:  if you want to cluster shallow-water areas on the basis of depth, select 100 (or some other appropriate value) for minimum or mean bathymetry.  This effectively establishes a numerical definition of ‘deep water’ and retains that category while distinguishing among variations in the shallower range.
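
The Reset operators amount to clamping values at one or both limits. The sketch below illustrates this; the function names are invented for the example, and leaving existing null (-9999) entries untouched is an assumption, since the text does not say how the Reset operators treat nulls.

```python
NO_DATA = -9999  # the 'no data' marker described above


def reset_gt(values, limit):
    """'Reset >': values above the limit are set to the limit."""
    return [v if v == NO_DATA else min(v, limit) for v in values]


def reset_lt(values, limit):
    """'Reset <': values below the limit are set to the limit."""
    return [v if v == NO_DATA else max(v, limit) for v in values]


def reset_outside_of(values, low, high):
    """'Reset Outside Of': values are clamped into the interval [low, high]."""
    return [v if v == NO_DATA else min(max(v, low), high) for v in values]


# Hypothetical population densities with two urban outliers.
pop_density = [12.0, 480.0, 2500.0, 14000.0, NO_DATA]

capped = reset_gt(pop_density, 1000.0)
print(capped)  # the two urban cells now share the value 1000.0
```

Unlike the Include/Exclude operators, no cell becomes null here: every cell stays in the dataset, but the outliers no longer contribute their variance.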


Transform Operators:

These operators allow you to modify values in order to reduce the effects of extreme variance or a skewed distribution on the clustering.  Transforms operate on the entire data set.  The presently available transformations are log base 10, natural log, absolute value, and square root.

You may use the transform function to alter datasets after they are filtered or modified, but you cannot filter/modify datasets after they are transformed.

The “Info” button in the “Variable” name box lets you view the characteristics and distribution of the original dataset, and preview some of the filters or transforms you might want to apply. It does not describe or display the actual filtered or transformed dataset.


Log10:  This transform ignores null values (if present), assigns a null value (-9999) to all negative numbers, and adds one to each remaining number before taking the base-10 logarithm, so that a value of zero is transformed to zero rather than being undefined.  This avoids most of the problems associated with zero and negative values in both the original and transformed data.  It is most useful for highly skewed data that have an extremely wide range of values, such as population and basin runoff.

 

LogN:  This transform is similar to Log10, except that it uses the natural logarithm (base e).  The results are about 2.3 times the corresponding Log10 values, giving a more finely divided logarithmic scale, which makes it useful for variables that rescale with less compression, such as ocean bathymetry.

 

Abs:  This returns the absolute value of the number (in effect, transforms negative numbers into positive ones).  This will probably be most useful in dealing with changes and rates of change, where the magnitude of the difference is more important than the direction.

 

Sq.Root:  Returns the square root of the selected value. Negative values are transformed to null values.  The transform is potentially useful for scaling area-based values to compare with linear variables.
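
The four transforms can be sketched in Python as follows. This is an illustration under stated assumptions rather than the front-end's code: the function names are invented, LogN is assumed to add one before taking the logarithm in the same way as Log10 (the text says only that it is "similar"), and Abs and Sq.Root are assumed to leave existing null entries unchanged.

```python
import math

NO_DATA = -9999  # the 'no data' marker described above


def _apply(values, func, null_negatives):
    """Apply `func` to each value; nulls stay null, negatives optionally become null."""
    out = []
    for v in values:
        if v == NO_DATA:
            out.append(NO_DATA)      # existing nulls are passed through
        elif null_negatives and v < 0:
            out.append(NO_DATA)      # negative input cannot be transformed
        else:
            out.append(func(v))
    return out


def log10_transform(values):
    """Log10: log10(v + 1), negatives become null."""
    return _apply(values, lambda v: math.log10(v + 1), True)


def logn_transform(values):
    """LogN: ln(v + 1), negatives become null (the +1 is assumed, as for Log10)."""
    return _apply(values, lambda v: math.log(v + 1), True)


def abs_transform(values):
    """Abs: magnitude of each value, sign discarded."""
    return _apply(values, abs, False)


def sqrt_transform(values):
    """Sq.Root: square root of each value, negatives become null."""
    return _apply(values, math.sqrt, True)


# Hypothetical values spanning several orders of magnitude.
population = [0.0, 9.0, 99.0, -5.0, NO_DATA]
print(log10_transform(population))
print(sqrt_transform([16.0, 2.25, -4.0]))
```

Because Log10, LogN, and Sq.Root can create new null entries from negative input, the Exclude Null checkbox also affects cells that held valid values before the transform.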
