Mexico Web-LV developments (5/01)

The new features that we developed at the Mexico workshop for Web-LV include:

  1. The standard K-means clustering now has a user-specifiable random seed

  2. Improved summary report that contains all information necessary to re-create a cluster run (random seed, variables & weighting, distance measures and methods)

  3. Average archetype supervised clustering

  4. k-NN supervised clustering

  5. Covariance matrix store: You can now store a covariance matrix associated with a clustering run (for example, an unsupervised clustering of a regional area). To do this, select a visualization file, click on the Source button, and then look for a store button next to the cov button. You can then tell Web-LV to use that covariance matrix for a supervised clustering on a new data set with identical variables (for example, to upscale a local clustering to a larger data set).

  6. Random seed setting for unsupervised clustering: You can now specify the random seed for an unsupervised clustering. This enables the user to exactly recreate an unsupervised clustering given a data set, variable list, variable weights, and random seed, all of which are stored in the improved summary report (a minimal reproducibility sketch follows this list).

  7. Robustness: Web-LV is now more robust to long header names and long identifying fields. The ends of overly long headers and identifying fields may be ignored by the program, but Web-LV will still handle the data as usual.

  8. In the source menu there is now a new file you can download called the SUP file. This file gives the average vectors for an unsupervised or supervised clustering run in a format that is easy to add back into a data set. It is intended for upscaling: run an unsupervised clustering on one data set, then take the vectors in the SUP file from that run and add them to a new data set (with the same variables) as examples for supervised clustering (see the upscaling sketch after this list).

  9. We made the labels more intelligent so that they take into account the %populated of a variable before declaring it to be important. In other words, the ability of a variable to contribute to the cluster labeling is now proportional to the percent of the data points in that cluster that have valid data for that variable (a sketch of this weighting also follows the list).
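
To make items 1, 2, and 6 concrete, here is a minimal reproducibility sketch written against scikit-learn rather than Web-LV itself: given the same data set, variable list, variable weights, and random seed recorded in the summary report, the clustering comes out identical. The function and variable names are illustrative, not Web-LV's.

```python
# Sketch only (not Web-LV code): reproducing a k-means run from the
# information stored in the summary report -- data set, variable list,
# variable weights, and random seed.
import numpy as np
from sklearn.cluster import KMeans

def reproducible_kmeans(data, variables, weights, n_clusters, seed):
    """Cluster `data` (dict of column name -> 1-D array) on the listed
    variables, each scaled by its weight, with a fixed random seed."""
    X = np.column_stack([np.asarray(data[v], dtype=float) * w
                         for v, w in zip(variables, weights)])
    return KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(X)

# Two runs with the same seed (and the same data, variables, and weights)
# give identical cluster assignments.
rng = np.random.default_rng(0)
data = {"elev": rng.normal(size=200), "ndvi": rng.normal(size=200)}
a = reproducible_kmeans(data, ["elev", "ndvi"], [1.0, 2.0], n_clusters=4, seed=42)
b = reproducible_kmeans(data, ["elev", "ndvi"], [1.0, 2.0], n_clusters=4, seed=42)
assert np.array_equal(a, b)
```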
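
Items 3, 4, 5, and 8 together describe an upscaling workflow. The sketch below is an outside interpretation of that workflow using numpy, scipy, and scikit-learn, not Web-LV code: cluster the local data, keep the per-cluster average vectors (the SUP-file contents) and the covariance matrix, then assign each point of a larger data set with identical variables to its nearest archetype under the Mahalanobis distance, with a k-NN variant for item 4. All names and the choice of libraries are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def cluster_and_archive(X_local, n_clusters, seed):
    """Unsupervised run on the local data.  Returns the per-point labels,
    the average-archetype vectors (one row per cluster, i.e. the SUP-file
    contents), and the covariance matrix to store for later runs."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(X_local)
    return km.labels_, km.cluster_centers_, np.cov(X_local, rowvar=False)

def supervised_upscale(X_large, archetypes, cov):
    """Supervised run on the larger data set: label each point with the
    index of the closest archetype under the stored covariance matrix
    (Mahalanobis distance)."""
    d = cdist(X_large, archetypes, metric="mahalanobis", VI=np.linalg.inv(cov))
    return d.argmin(axis=1)

rng = np.random.default_rng(1)
X_local = rng.normal(size=(300, 3))    # local / regional data set
X_large = rng.normal(size=(5000, 3))   # larger data set, identical variables

local_labels, centers, cov = cluster_and_archive(X_local, n_clusters=5, seed=7)
labels_archetype = supervised_upscale(X_large, centers, cov)

# k-NN variant (item 4): keep the labeled local points themselves as the
# training examples and classify each new point by its k nearest neighbors.
labels_knn = KNeighborsClassifier(n_neighbors=5).fit(X_local, local_labels).predict(X_large)
```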
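
For item 9, the exact importance score Web-LV computes is not spelled out above, so the sketch below only illustrates the stated rule: whatever a variable's raw importance is for a cluster label, it is scaled by the fraction of points in that cluster that have valid data for the variable. The distance-from-overall-mean score used here is a stand-in assumption, not Web-LV's formula.

```python
import numpy as np

def label_scores(X, labels, cluster_id):
    """X: 2-D array with NaN marking missing values; labels: cluster id per
    row.  Returns one labeling score per variable for the given cluster,
    down-weighted by that cluster's percent populated for the variable."""
    in_cluster = X[labels == cluster_id]
    raw = (np.abs(np.nanmean(in_cluster, axis=0) - np.nanmean(X, axis=0))
           / np.nanstd(X, axis=0))                        # stand-in "importance"
    pct_populated = np.mean(~np.isnan(in_cluster), axis=0)  # fraction valid
    return raw * pct_populated    # sparsely populated variables count less

# Example: a variable that is mostly missing inside a cluster contributes
# little to that cluster's label, even if its few valid values are extreme.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
X[:40, 2] = np.nan                  # variable 2 is mostly missing in cluster 0
labels = np.repeat([0, 1], 50)
print(label_scores(X, labels, cluster_id=0))
```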