Journal of Innovative Applied Mathematics and Computational Sciences
Volume 3, Issue 2, Pages 156-161
2024-01-21
Authors: Nielsen Dreas
Distances between data sets are used in analyses such as classification and clustering. Some existing distance metrics, such as the Manhattan (City Block or L1) distance, are suitable for categorical data whose subtype is numeric or, more specifically, integer. However, ordinality of categories imposes additional constraints on data distributions, and the ordering of categories should be considered when distances are calculated. A new distance metric is presented here that is based on the number of misclassifications that must have occurred within one data set if it were in fact identical to another. This "misclassification distance" is equivalent to the number of reclassifications necessary to transform one data set into the other. The metric takes account not only of the numbers of observations in corresponding ordinal categories, but also of the number of categories across which observations must be moved to correct all misclassifications. Each stepwise movement of an observation across one or more categories that is required to equalize the distributions increases the distance; the method is therefore referred to as a stepwise ordinal misclassification distance (SOMD). An algorithm is provided for calculating this metric.
Keywords: ordinal; distance; multinomial; categorical; misclassification
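
The abstract's central idea, that a distance over ordinal categories should count how far observations must be moved and not only how many, can be illustrated with a minimal sketch. The sketch is not the paper's algorithm, which is not reproduced on this page. It rests on three assumptions made here for illustration: both data sets are given as counts per category in the same category order, they contain the same total number of observations, and moving one observation to an adjacent category costs one step. Under those assumptions the minimum number of steps equals the sum of absolute running surpluses across the category boundaries. The function names `stepwise_moves` and `manhattan` are hypothetical.

```python
from itertools import accumulate


def manhattan(counts_a, counts_b):
    """Manhattan (L1) distance between two vectors of category counts."""
    return sum(abs(a - b) for a, b in zip(counts_a, counts_b))


def stepwise_moves(counts_a, counts_b):
    """Minimum number of one-category steps needed to turn counts_a into counts_b.

    Assumption (not taken from the paper): equal totals and a cost of one
    per observation per adjacent-category step.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both data sets must use the same ordered categories")
    if sum(counts_a) != sum(counts_b):
        raise ValueError("this sketch assumes equal numbers of observations")
    # The running surplus after category k is the number of observations
    # that must cross the boundary between category k and category k + 1.
    surplus = accumulate(a - b for a, b in zip(counts_a, counts_b))
    return sum(abs(s) for s in surplus)


base = [5, 3, 2]   # counts in three ordered categories
near = [3, 5, 2]   # two observations shifted by one category
far = [3, 3, 4]    # two observations shifted by two categories

print(manhattan(base, near), manhattan(base, far))            # 4 4
print(stepwise_moves(base, near), stepwise_moves(base, far))  # 2 4
```

In this example the Manhattan distance cannot tell the two comparisons apart, while the step count doubles when the same two observations have to cross two categories instead of one.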