Confusion: To Dissimilarity (pdf)...

A command that creates a Dissimilarity from every selected Confusion.

Settings

Symmetrize first: when on, the confusion matrix is symmetrized before we calculate dissimilarities.
Maximum dissimilarity (units of sigma): specifies the dissimilarity assigned to confusion matrix elements that are zero.

Algorithm

1. Normalize rows by dividing each row element by the row sum (optional).
2. Symmetrize the matrix by averaging f_ij and f_ji.
3. Transform the confusion measure, which is a kind of similarity measure, into a dissimilarity measure.
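
Steps 1 and 2 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's own code: the function names normalize_rows and symmetrize and the example counts are hypothetical, and leaving all-zero rows unchanged is an assumption made here to avoid division by zero.

```python
def normalize_rows(matrix):
    """Divide each element by its row sum (step 1).

    Assumption: a row that sums to zero is left unchanged.
    """
    result = []
    for row in matrix:
        total = sum(row)
        result.append([v / total for v in row] if total else list(row))
    return result


def symmetrize(matrix):
    """Replace f_ij and f_ji by their average (step 2)."""
    n = len(matrix)
    return [[(matrix[i][j] + matrix[j][i]) / 2 for j in range(n)]
            for i in range(n)]


# Hypothetical confusion counts for three objects.
counts = [[8, 2, 0],
          [1, 6, 3],
          [0, 4, 6]]
f = symmetrize(normalize_rows(counts))
```

After these two steps f_ij equals f_ji for every pair, so step 3 yields one dissimilarity per pair of objects.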

Similarity and dissimilarity have an inverse relationship: the greater the similarity, the smaller the dissimilarity, and vice versa. Both have a monotonic relationship with distance. The simplest way to transform the similarities f_ij into dissimilarities is:

dissimilarity_ij = maximumSimilarity – similarity_ij
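
As a small worked illustration of this linear transformation, the sketch below takes maximumSimilarity to be the largest element of the similarity matrix. That choice, the function name linear_dissimilarity, and the example values are assumptions made for the example.

```python
def linear_dissimilarity(similarity):
    """Return maximumSimilarity - similarity_ij for every element.

    Assumption: maximumSimilarity is the largest value in the matrix.
    """
    max_sim = max(max(row) for row in similarity)
    return [[max_sim - s for s in row] for row in similarity]


# Hypothetical symmetrized similarities for two objects.
similarity = [[0.80, 0.15],
              [0.15, 0.60]]
dissimilarity = linear_dissimilarity(similarity)
```

The ordering of the pairs is reversed but the spacing between them is unchanged, which is why the result is linear in the confusion.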

For ordinal analyses like Kruskal this transformation is fine because only order relations are important in this analysis. However, for metrical analyses like INDSCAL this is not optimal. In INDSCAL, distance is a linear function of dissimilarity. This means that, with the transformation above, you ultimately fit an INDSCAL model in which the distance between object i and j will be linearly related to the confusion between i and j.

For the relation between confusion and dissimilarity, the model implemented here assumes that the amount of confusion between objects i and j is related to the amount by which their probability density functions (pdfs) overlap. Because we do not know these pdfs, we assume that both are normal, have equal sigma, and are one-dimensional. The parameter to be determined is the distance between the centres of the two pdfs. According to formula 26.2.23 in Abramowitz & Stegun (1970), for each fraction f_ij, we have to find an x that solves:

f_ij = 1/√(2π) ∫_x^∞ e^(-t²/2) dt

This x will be used as the dissimilarity between i and j. The relation between x and f_ij is monotonic: the larger the confusion, the smaller x. This means that the results for a Kruskal analysis will not change much. For INDSCAL, you will generally note a significantly better fit.
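
The inversion described above can be sketched in Python. This is a hedged illustration, not the program's implementation: it uses the standard library's statistics.NormalDist.inv_cdf instead of the Abramowitz & Stegun rational approximation, the function name confusion_to_dissimilarity and the default of 3.0 sigma are hypothetical, and mapping fractions of 0.5 or more to zero is an assumption made here because the equation has no positive solution in that range.

```python
from statistics import NormalDist


def confusion_to_dissimilarity(f, max_dissimilarity=3.0):
    """Find x such that the upper-tail normal probability Q(x) equals f.

    Assumptions (not taken from the manual page):
    - f >= 0.5 is treated as complete overlap and mapped to 0;
    - results are capped at max_dissimilarity (in units of sigma),
      which is also the value used for f == 0.
    """
    if f <= 0.0:
        return max_dissimilarity
    if f >= 0.5:
        return 0.0
    # Q(x) = 1 - Phi(x) = f, so x = Phi^-1(1 - f).
    x = NormalDist().inv_cdf(1.0 - f)
    return min(x, max_dissimilarity)


# Hypothetical confusion fractions, from no confusion to heavy confusion.
for fraction in (0.0, 0.01, 0.1, 0.3, 0.5):
    print(fraction, round(confusion_to_dissimilarity(fraction), 3))
```

The cap plays the role of the Maximum dissimilarity setting: without it, a fraction of zero would correspond to an infinite distance.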
