Please use this identifier to cite or link to this item:
http://inaoe.repositorioinstitucional.mx/jspui/handle/1009/188
Hierarchical multi-label classification for tree and DAG hierarchies | |
MALLINALI RAMIREZ CORONA | |
LUIS ENRIQUE SUCAR SUCCAR; EDUARDO FRANCISCO MORALES MANZANARES | |
Open Access | |
Attribution-NonCommercial-NoDerivatives | |
Classification chains Ontologies Proteins | |
The core of supervised classification consists of assigning to an object or phenomenon one of a previously specified set of categories or classes. In more complex problems, a set of labels, rather than a single label, is assigned to each instance; this is called multi-label classification. When the labels in a multi-label classification problem are arranged in a predefined structure, typically a tree or a Directed Acyclic Graph (DAG), the task is called Hierarchical Multi-label Classification (HMC). Some HMC methods create a global model that takes advantage of the relations among the labels (the predefined structure). However, these methods tend to produce models too complex to be usable on large-scale data. Other methods divide the problem into a set of subproblems, an approach that usually does not benefit from the predefined structure. This thesis addresses the problem of hierarchical classification for tree and DAG structures, considering large datasets with a considerable number of labels. A local classifier is trained for each non-leaf (parent) node in the hierarchy. Our method exploits the correlation of each label with its ancestors in the hierarchy and evaluates each possible path from the root to a leaf node, taking into account the level of the predicted labels to give a score to each path, and finally returns the path with the best score. For some instances the labels do not reach a leaf node; for these cases we developed an extension of the base method for Non-Mandatory Leaf Node Prediction (NMLNP), in which a pruning phase is performed before selecting the best path. We noticed that many evaluation measures score short paths that predict only the most general cases better than longer, more specific paths; we therefore also propose a new evaluation measure that avoids the bias toward conservative predictions in the case of NMLNP.
We tested our methods on 18 datasets with tree- and DAG-structured hierarchies against a number of state-of-the-art methods. The evaluation shows the advantages of our methods in terms of predictive performance, execution time, and scalability compared with other methods for hierarchical classification. Our methods obtained superior results on deep hierarchies and competitive results on shallower ones. | |
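The path-selection idea in the abstract (one local classifier per parent node, then a level-aware score for every root-to-leaf path) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the toy tree `CHILDREN`, the hard-coded probabilities `PROBS` standing in for local classifier outputs, the linearly decreasing `level_weight`, and the weighted log-probability sum in `score_path`. The thesis may define the score differently, and DAG hierarchies (nodes with several parents) and the NMLNP pruning phase are not covered.

```python
import math

# Hypothetical toy tree: parent -> list of children. "root" is the hierarchy root.
CHILDREN = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
}

# Hypothetical outputs of the local (per-parent) classifiers for one instance:
# PROBS[parent][child] = estimated P(child | parent, instance).
PROBS = {
    "root": {"A": 0.6, "B": 0.4},
    "A": {"A1": 0.3, "A2": 0.7},
    "B": {"B1": 1.0},
}


def root_to_leaf_paths(children, node="root", prefix=()):
    """Yield every path from the root to a leaf (the root itself is omitted)."""
    kids = children.get(node, [])
    if not kids:
        yield prefix
        return
    for kid in kids:
        yield from root_to_leaf_paths(children, kid, prefix + (kid,))


def level_weight(level, max_level):
    """Assumed weighting: shallower levels count more, decreasing linearly."""
    return 1.0 - level / (max_level + 1)


def score_path(path, probs, max_level):
    """Assumed score: level-weighted sum of log-probabilities along the path."""
    score = 0.0
    parent = "root"
    for level, label in enumerate(path, start=1):
        p = max(probs[parent][label], 1e-12)  # floor avoids log(0)
        score += level_weight(level, max_level) * math.log(p)
        parent = label
    return score


def best_path(children, probs):
    """Score every root-to-leaf path and return the highest-scoring one."""
    paths = list(root_to_leaf_paths(children))
    max_level = max(len(p) for p in paths)
    return max(paths, key=lambda p: score_path(p, probs, max_level))


if __name__ == "__main__":
    print(best_path(CHILDREN, PROBS))
```

In this toy setting, scoring whole paths rather than greedily following the most probable child at each node lets a strong deeper prediction compensate for a weaker shallow one; the weighting controls how much each level contributes.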
Instituto Nacional de Astrofísica, Óptica y Electrónica | |
2014-10 | |
Master's thesis | |
English | |
Students; Researchers; General public | |
Ramirez-Corona M. | |
COMPUTER SCIENCE | |
Accepted version | |
acceptedVersion - Accepted version | |
Appears in collections: | Maestría en Ciencias Computacionales |
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
RamirezCM.pdf | | 2.51 MB | Adobe PDF | View/Open |