Mexico Web-LV developments (5/01)

The new features that we developed at the Mexico workshop for Web-LV include:

The standard K-means clustering now has a user-specifiable random seed.
Improved summary report: the report now contains all the information necessary to re-create a clustering run (random seed, variables and weights, distance measures, and methods).
Average archetype supervised clustering:
- Takes one or more examples for each cluster
- Averages the examples together to form a set of cluster average vectors
- Classifies each point in the data set by its distance to these average vectors, using the distance measure specified by the user
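The average archetype method can be sketched as a nearest-average (nearest-centroid) classifier. This is a minimal sketch, not Web-LV's code: it assumes each point goes to the cluster whose average vector is closest, uses Euclidean distance as a stand-in for whichever measure the user selects, and the cluster names "wet" and "dry" are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance; a stand-in for whichever measure the user picks."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_vectors(examples: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average each cluster's example vectors into one archetype vector."""
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in examples.items()
    }


def classify(
    points: List[Vector],
    examples: Dict[str, List[Vector]],
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> List[str]:
    """Assign each point to the cluster whose average vector is nearest."""
    archetypes = average_vectors(examples)
    return [
        min(archetypes, key=lambda label: distance(p, archetypes[label]))
        for p in points
    ]


# Hypothetical examples: two for "wet", one for "dry".
examples = {"wet": [[1.0, 9.0], [3.0, 11.0]], "dry": [[8.0, 1.0]]}
print(classify([[2.0, 8.0], [7.0, 2.0]], examples))  # ['wet', 'dry']
```

Because the two "wet" examples are averaged to (2, 10) before any point is classified, adding more examples to a cluster shifts its archetype rather than adding more targets.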
k-NN supervised clustering:
- Takes one or more examples of each cluster
- The average distance from a point to its k nearest neighbors within each cluster determines the point's classification
- The distance measure is specified by the user
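The k-NN rule above can be read as: for each cluster, average the distances from the point to that cluster's k closest examples, then pick the cluster with the smallest average. The sketch below follows that reading; it is an illustration under assumptions, not Web-LV's implementation. Euclidean distance stands in for the user's chosen measure, and the handling of a cluster with fewer than k examples (use all of them) is an assumption the notes do not address.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance; a stand-in for the user's chosen measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(
    point: Vector,
    examples: Dict[str, List[Vector]],
    k: int = 3,
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> str:
    """Pick the cluster whose k nearest examples are closest on average."""
    best_label, best_score = "", math.inf
    for label, vecs in examples.items():
        # Distances from the point to this cluster's examples, nearest first.
        nearest = sorted(distance(point, v) for v in vecs)[:k]
        # Assumption: a cluster with fewer than k examples uses all of them.
        score = sum(nearest) / len(nearest)
        if score < best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical clusters; cluster "a" has one outlying example at (10, 10).
examples = {"a": [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]], "b": [[5.0, 5.0], [5.0, 6.0]]}
print(knn_classify([9.0, 9.0], examples, k=1))  # a
print(knn_classify([9.0, 9.0], examples, k=3))  # b
```

With k=1 the single outlier at (10, 10) pulls (9, 9) into cluster "a"; with k=3 the average over all of "a"'s examples sends it to "b". Larger k makes the classification less sensitive to individual examples.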
Covariance matrix store: You can now store a covariance matrix associated with a clustering run (for example, an unsupervised clustering of a regional area). To do this, select a visualization file, click the Source button, and then click the store button next to the cov button. You can then tell Web-LV to use that covariance matrix for a supervised clustering of a new data set with identical variables (for example, to upscale a local clustering to a larger data set).
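The notes do not say how Web-LV uses a stored covariance matrix inside a distance measure. One standard use is the Mahalanobis distance, d(x, y) = sqrt((x - y)^T S^-1 (x - y)), where S is the covariance matrix. The sketch below assumes that use: it computes S from a regional data set, "stores" it as JSON (a hypothetical format chosen for illustration), and reloads it to measure distances on new data with the same variables.

```python
import json
import math
from typing import List

Matrix = List[List[float]]


def covariance(rows: List[List[float]]) -> Matrix:
    """Sample covariance matrix (n - 1 denominator) of the columns of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (a must be non-singular)."""
    p = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * p
    for i in reversed(range(p)):
        x[i] = (m[i][p] - sum(m[i][j] * x[j] for j in range(i + 1, p))) / m[i][i]
    return x


def mahalanobis(a: List[float], b: List[float], cov: Matrix) -> float:
    """sqrt(d^T S^-1 d) with d = a - b, computed by solving S x = d."""
    d = [x - y for x, y in zip(a, b)]
    return math.sqrt(sum(di * xi for di, xi in zip(d, solve(cov, d))))


regional = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
stored = json.dumps(covariance(regional))  # "store" the matrix (hypothetical format)
cov = json.loads(stored)                   # reuse it on a new data set
print(mahalanobis([0.0, 0.0], [2.0, 0.0], cov))  # about 1.732, i.e. sqrt(3)
```

Solving S x = d avoids forming S^-1 explicitly. The design point for upscaling is that S comes from the regional run, so distances on the larger data set are scaled the same way as in the original clustering.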
Random seed setting for unsupervised clustering: You can now specify the random seed for an unsupervised clustering. Given the data set, variable list, variable weights, and random seed, all of which are stored in the improved summary report, a user can exactly re-create an unsupervised clustering.
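Why a stored seed makes a run repeatable can be shown with a small weighted K-means whose only randomness is the choice of initial centers, drawn from a generator seeded from a summary record. This is a minimal sketch, not Web-LV's algorithm: the RunSummary fields are a hypothetical subset of the summary report, the distance is weighted squared Euclidean, and a fixed iteration count replaces a convergence test.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical subset of a summary report: enough to re-create a run."""
    seed: int
    variables: Tuple[str, ...]  # column names, in the order the rows use
    weights: Tuple[float, ...]  # one weight per variable
    k: int
    iterations: int = 20


def kmeans(rows: List[Sequence[float]], s: RunSummary) -> List[int]:
    """Weighted K-means whose only source of randomness is s.seed."""
    rng = random.Random(s.seed)  # private generator, unaffected by global state

    def dist2(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(w * (x - y) ** 2 for w, x, y in zip(s.weights, a, b))

    # Random initial centers: this is the step the seed makes repeatable.
    centers = [list(r) for r in rng.sample(rows, s.k)]
    labels = [0] * len(rows)
    for _ in range(s.iterations):
        labels = [min(range(s.k), key=lambda c: dist2(r, centers[c])) for r in rows]
        for c in range(s.k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


rows = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [9.8, 0.2], [10.0, 0.0]]
summary = RunSummary(seed=7, variables=("x", "y"), weights=(1.0, 1.0), k=3)
first = kmeans(rows, summary)
again = kmeans(rows, summary)
print(first == again)  # True: same summary, same clusters
```

Using a dedicated random.Random instance rather than the module-level functions keeps the result independent of any other code that draws random numbers.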
Robustness: Web-LV is now more robust to long header names and long identifying fields. Although the program may ignore the ends of long header names and identifying fields, Web-LV still handles the data as usual.
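Ignoring the ends of long fields amounts to truncating them while leaving the data values alone. The sketch below illustrates that idea for CSV input; the length limit MAX_FIELD_LEN and the assumption that the first column is the identifying field are both hypothetical, since the notes give neither.

```python
import csv
import io
from typing import List, Tuple

MAX_FIELD_LEN = 16  # hypothetical; the notes do not give Web-LV's real limit


def read_table(text: str, max_len: int = MAX_FIELD_LEN) -> Tuple[List[str], List[List[str]]]:
    """Parse CSV text, truncating long header names and identifiers only.

    Assumes (hypothetically) that the first column holds the identifying field;
    data values in the other columns are passed through unchanged.
    """
    reader = csv.reader(io.StringIO(text))
    header = [name[:max_len] for name in next(reader)]
    rows = [[row[0][:max_len]] + row[1:] for row in reader]
    return header, rows


text = "station_identifier_name,annual_rainfall_millimetres\nSTATION-0001-WITH-LONG-SUFFIX,812.5\n"
header, rows = read_table(text)
print(header)  # ['station_identifi', 'annual_rainfall_']
print(rows)    # [['STATION-0001-WIT', '812.5']]
```

Plain truncation can make two distinct names identical (for example, two headers sharing their first 16 characters), so any real implementation has to decide how to keep such columns apart.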
In the Source menu there is now a new downloadable file called the SUP file. It gives the average vectors for an unsupervised or supervised clustering run in a format that is easy to add back into a data set, and it is intended for upscaling. For example, you can run an unsupervised clustering on one data set, then take the vectors in the SUP file from that run and add them to a new data set (with the same variables) as examples for supervised clustering.
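The upscaling workflow can be sketched in two steps: compute each cluster's average vector from a finished run, then append those averages to a new data set as labeled example rows. The actual SUP file layout is not described here, so the in-memory (row, label) pairs below, with None marking unlabeled rows, are a hypothetical stand-in.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def cluster_averages(rows: List[Sequence[float]], labels: List[int]) -> Dict[int, List[float]]:
    """Average vector of each cluster from a finished clustering run."""
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in sorted(groups.items())
    }


def add_as_examples(
    new_rows: List[Sequence[float]], averages: Dict[int, List[float]]
) -> List[Tuple[List[float], Optional[int]]]:
    """Append each average as a labeled example; ordinary rows get no label."""
    data: List[Tuple[List[float], Optional[int]]] = [(list(row), None) for row in new_rows]
    data += [(avg, label) for label, avg in averages.items()]
    return data


local_rows = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]]
local_labels = [0, 0, 1]  # labels from a (hypothetical) unsupervised run
avgs = cluster_averages(local_rows, local_labels)
print(avgs)  # {0: [2.0, 3.0], 1: [10.0, 10.0]}
print(add_as_examples([[5.0, 5.0]], avgs))
# [([5.0, 5.0], None), ([2.0, 3.0], 0), ([10.0, 10.0], 1)]
```

Each local cluster then contributes one example to the supervised run on the larger data set, which is why the new data set must use the same variables in the same order.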
We made the labels more intelligent: they now take into account what percentage of a variable is populated before declaring it important. In other words, a variable's influence on the cluster labeling is now proportional to the percentage of data points in that cluster that have valid data for that variable.
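The proportional rule can be sketched as multiplying each variable's labeling score by the fraction of the cluster's points that have a valid value for it. How Web-LV computes the underlying score is not described, so the raw scores below are hypothetical inputs; treating None and NaN as missing is likewise an assumption.

```python
import math
from typing import Dict, List, Optional


def populated_fraction(values: List[Optional[float]]) -> float:
    """Fraction of a cluster's points with a valid (non-missing, non-NaN) value."""
    if not values:
        return 0.0
    valid = [v for v in values if v is not None and not math.isnan(v)]
    return len(valid) / len(values)


def weighted_importance(
    raw_scores: Dict[str, float], cluster_columns: Dict[str, List[Optional[float]]]
) -> Dict[str, float]:
    """Scale each variable's raw labeling score by how populated it is."""
    return {
        var: score * populated_fraction(cluster_columns[var])
        for var, score in raw_scores.items()
    }


columns = {"rain": [1.0, 2.0, None, 4.0], "soil": [None, None, None, 3.0]}
raw = {"rain": 2.0, "soil": 3.0}  # hypothetical raw scores
print(weighted_importance(raw, columns))  # {'rain': 1.5, 'soil': 0.75}
```

Here "soil" has the higher raw score but is only 25% populated in the cluster, so after weighting "rain" (75% populated) is the more important variable for the label.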