Dissimilarity: To Configuration (kruskal)...

A command that creates a Configuration object from a Dissimilarity object.

Settings

Number of dimensions (standard value: 2)
The dimensionality of the Configuration.
Distance metric (standard value: 2, i.e. Euclidean)
the general distance between points xi and xj (i,j = 1..numberOfPoints) is:
(∑k=1..numberOfDimensions |xikxjk|metric)1/metric
Sort distances
determines the handling of ties in the data. When off, whenever two or more dissimilarities are equal we do not care whether the fitted distances are equal or not. Consequently, no constraints are imposed on the fitted distances. When on, however, we impose the constaint that the fitted distances be equal whenever the dissimilarities are equal.

For the calculation of stress:

Formula1 (default)
stress = √(∑(distancekfittedDistancek)2 / ∑ distancek2)
Formula2
stress = √(∑(distancekfittedDistancek)2 / ∑ (distancekaverageDistance)2)
Note that values of stress 2 are generally more than double those of stress 1 for the same degree of fit.

Finding the optimal Configuration involves a minimization process:

Tolerance
When successive values for the stress differ less than Tolerance the minimization process stops.
Maximum number of iterations
Minimization stops after this number of iterations has been reached.
Number of repetitions
When chosen larger than 1, the minimalization process will be repeated, each time with another random start configuration. The configuration that results in minimum stress will be saved.

Precautions

When there are few objects it is impossible to recover many dimensions. A rough rule of thumb is that there should be at least twice as many number of observations, i.e. the numberOfPoints · (numberOfPoints - 1) / 2 (dis)similarities, than parameters to be estimated, i.e. the numberOfPoints · numberOfDimensions position coordinates. A practical guide is:

for numberOfDimensions = 1 you need ≥ 5 objects
for numberOfDimensions = 2 you need ≥ 9 objects
for numberOfDimensions = 3 you need ≥ 13 objects

There is no feasible way to be certain that you have found the true global minimum. However, by using a great number of different random starting configurations to scale the same data it is often possible to obtain practical certainty. Although the procedure for obtaining an initial configuration is based on a linear relation between distance and (dis)similarity, it gives a very good approximation of the optimal Configuration and the Minimizer practically always finds the global minimum from it (I guess...). A way to find out is to try the numberOfRepetitions parameter which gives you the possibility to fit many times and each time start with another random initial configuration.

Algorithm

1. The Dissimilarity object is converted to a Distance object in the same way as in Dissimilarity: To Distance....)
2. From the Distance object an initial Configuration is found by first transforming the Distance object to a matrix with scalar products of distances and subsequently solving for the first numberOfDimensions eigenvectors of this matrix.
3. A minimalization algorithm is started that tries to minimize a function. In this function:
• 3.1 We normalize the current Configuration from the minimizer
• 3.2 Calculate a new Distance object from the configuration
• 3.3 Do a monotone regression of this Distance on the Dissimilarity. This results in a new Distance object.
• 3.4 Calculate stress from this Distance and the Distance obtained from Dissimilarity.

The optimization process is ccontrolledby a conjugate gradient minimization algorithm that tries to minimize the stress function. In Kruskal (1964), a steepest descent algorithm is used wwhichis less efficient.


© djmw, April 7, 2004