Mexico Web-LV developments (5/01)
The new features that we developed at the Mexico workshop for Web-LV include:
- The standard K-means clustering now has a user-specifiable random seed
- Improved summary report that contains all information necessary to
re-create a cluster run (random seed, variables & weighting, distance
measures and methods)
- Average archetype supervised clustering:
  - Takes one or more examples for each cluster
  - Averages the examples together to form a set of cluster average vectors
  - Classifies the data set using the distance measure specified by the user
- KNN (k-nearest-neighbor) supervised clustering:
  - Takes one or more examples of each cluster
  - The average distance to the k nearest neighbors in each cluster
    determines each point's classification
  - The distance measure is specified by the user
- Covariance matrix store: You can now store a covariance matrix associated
with a clustering run (for example an unsupervised clustering of a regional
area). To do this, select a visualization file, click on the Source button,
and then look for a store button next to the cov button. Then you can tell
Web-LV to use that covariance matrix for a supervised clustering on a new data
set with identical variables (for example, to upscale a local clustering to a
larger data set).
- Random seed setting for unsupervised clustering: You can now specify the
random seed in an unsupervised clustering. This enables the user to exactly
recreate an unsupervised clustering given a data set, variable list, variable
weights, and random seed, all of which are stored in the improved summary
report.
- Robustness: Web-LV is now more robust to long header names and long
identifying fields. The ends of long header names and identifying fields may
be ignored by the program, but Web-LV will still handle the data as usual.
- In the source menu there is now a new file you can download called the SUP
file. This file gives the average vectors for an unsupervised or supervised
clustering run in a format that is easy to add back into a data set. It is
intended for upscaling: you can run an unsupervised clustering on one data
set, then take the vectors in the SUP file from that run and add them to a
new data set (with the same variables) as examples for supervised clustering.
- We made the labels more intelligent so they take into account the
%populated of a variable before declaring it to be important. In other words,
the ability of a variable to be used in the cluster labeling is now
proportional to the percent of the data points in that cluster that have
valid data for that variable.
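
The two supervised methods listed above (average archetype and KNN) can be
sketched roughly as follows. This is an illustration only, not Web-LV's actual
code: the function names, the (label, vector) input format, and the choice of
Euclidean distance as the stand-in for the user-specified measure are all
assumptions, as is the reading that a point goes to the cluster with the
smallest average distance.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance; a stand-in for whatever measure the user selects."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_archetypes(examples):
    """Average the examples of each cluster into one vector per cluster.

    `examples` is a list of (label, vector) pairs (hypothetical format).
    """
    grouped = defaultdict(list)
    for label, vector in examples:
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify_by_archetype(point, archetypes, distance=euclidean):
    """Assign `point` to the cluster whose average vector is closest."""
    return min(archetypes, key=lambda label: distance(point, archetypes[label]))


def classify_by_knn(point, examples, k=1, distance=euclidean):
    """Assign `point` to the cluster with the smallest mean distance to its
    k nearest examples (all of its examples if it has fewer than k)."""
    distances = defaultdict(list)
    for label, vector in examples:
        distances[label].append(distance(point, vector))

    def mean_of_nearest(dists):
        nearest = sorted(dists)[:k]
        return sum(nearest) / len(nearest)

    return min(distances, key=lambda label: mean_of_nearest(distances[label]))


# Made-up example data: two clusters with two examples each.
examples = [
    ("A", [0.0, 0.0]), ("A", [2.0, 0.0]),
    ("B", [10.0, 10.0]), ("B", [12.0, 10.0]),
]
archetypes = average_archetypes(examples)
print(archetypes["A"])                                # [1.0, 0.0]
print(classify_by_archetype([1.5, 0.5], archetypes))  # A
print(classify_by_knn([9.0, 9.0], examples, k=2))     # B
```

Passing a different `distance` function to either classifier mirrors the
user-selectable distance measure described in the list.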