Please use this identifier to cite or link to this item: http://inaoe.repositorioinstitucional.mx/jspui/handle/1009/188
Hierarchical multi-label classification for tree and DAG hierarchies
MALLINALI RAMIREZ CORONA
LUIS ENRIQUE SUCAR SUCCAR
EDUARDO FRANCISCO MORALES MANZANARES
Open Access
Attribution-NonCommercial-NoDerivatives
Classification chains
Ontologies
Proteins
The core of supervised classification consists of assigning an object or phenomenon to one of a previously specified set of categories or classes. In more complex problems, a set of labels, instead of a single label, is assigned to each instance; this is called multi-label classification. When the labels of a multi-label classification problem are organized in a predefined structure, typically a tree or a Directed Acyclic Graph (DAG), the task is called Hierarchical Multi-label Classification (HMC). Some HMC methods build a single global model that takes advantage of the relations among the labels (the predefined structure). However, these methods tend to produce models that are too complex to be used on large-scale data. Other methods divide the problem into a set of subproblems and usually do not benefit from the predefined structure. This thesis addresses hierarchical classification for tree and DAG structures on large datasets with a considerable number of labels. A local classifier per parent node is trained for each non-leaf node in the hierarchy. Our method exploits the correlation of each label with its ancestors in the hierarchy and evaluates every possible path from the root to a leaf node, taking into account the level of the predicted labels to score each path, and finally returns the path with the best score. In some cases, the labels of an instance do not reach a leaf node; for these cases we developed an extension of the base method for Non-Mandatory Leaf Node Prediction (NMLNP), in which a pruning phase is performed before selecting the best path. We noticed that many evaluation measures give better scores to short paths that predict only the most general classes than to longer, more specific paths; therefore, we also propose a new evaluation measure that avoids this bias toward conservative predictions in the NMLNP setting.
We tested our methods on 18 datasets with tree- and DAG-structured hierarchies and compared them against several state-of-the-art methods. The evaluation shows the advantages of our methods over other hierarchical classification methods in terms of predictive performance, execution time, and scalability. They obtained superior results on deep hierarchies and competitive results on shallower ones.
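The path-evaluation idea described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the trained local classifiers are replaced by a hypothetical table of probabilities P(child | x, parent), the level weight w(l) = 1 - l/(L+1) and the averaging over edges are assumed choices, and the NMLNP pruning rule (drop a child whose probability falls below a threshold) is a simplification. The names `enumerate_paths`, `path_score`, `prune` and `best_path` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical types: a hierarchy maps each parent to its children (a node
# may have several parents, so DAGs are allowed), and EdgeProbs holds the
# output of the local classifiers, one per parent node:
# probs[(parent, child)] = P(child | x, parent).
Hierarchy = Dict[str, List[str]]
EdgeProbs = Dict[Tuple[str, str], float]


def enumerate_paths(h: Hierarchy, node: str = "root") -> List[List[str]]:
    """Return every path from `node` down to a node with no children."""
    children = h.get(node, [])
    if not children:
        return [[node]]
    return [[node] + rest for c in children for rest in enumerate_paths(h, c)]


def path_score(path: List[str], probs: EdgeProbs, max_level: int) -> float:
    """Level-weighted mean log-probability of the edges on a path.

    Assumed weight: w(l) = 1 - l / (max_level + 1), so edges near the root
    (small l) count more than deep ones. Dividing by the number of edges is
    also an assumption, used here so paths of different depth compare fairly.
    """
    edges = list(zip(path, path[1:]))
    if not edges:  # a path made of the root alone
        return 0.0
    total = 0.0
    for level, edge in enumerate(edges, start=1):
        w = 1.0 - level / (max_level + 1)
        total += w * math.log(probs[edge])
    return total / len(edges)


def prune(h: Hierarchy, probs: EdgeProbs, threshold: float) -> Hierarchy:
    """Assumed NMLNP pruning rule: drop children whose local probability is
    below `threshold`, so a surviving path may end at an internal node."""
    return {p: [c for c in cs if probs[(p, c)] >= threshold] for p, cs in h.items()}


def best_path(h: Hierarchy, probs: EdgeProbs,
              nmlnp: bool = False, threshold: float = 0.5) -> List[str]:
    """Score every root-to-leaf path and return the highest-scoring one."""
    # Depth is measured on the full hierarchy, before any pruning.
    max_level = max(len(p) for p in enumerate_paths(h)) - 1
    if nmlnp:
        h = prune(h, probs, threshold)
    return max(enumerate_paths(h), key=lambda p: path_score(p, probs, max_level))


# Toy example (made-up probabilities, not data from the thesis).
hierarchy: Hierarchy = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
probs: EdgeProbs = {("root", "A"): 0.7, ("root", "B"): 0.3,
                    ("A", "A1"): 0.4, ("A", "A2"): 0.45, ("B", "B1"): 0.9}

if __name__ == "__main__":
    print(best_path(hierarchy, probs))              # → ['root', 'A', 'A2']
    print(best_path(hierarchy, probs, nmlnp=True))  # → ['root', 'A']
```

With these toy numbers, mandatory leaf prediction picks the leaf A2, while with pruning neither child of A reaches the 0.5 threshold, so the prediction stops at the more general label A.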
Instituto Nacional de Astrofísica, Óptica y Electrónica
2014-10
Master's thesis
English
Students
Researchers
General public
Ramirez-Corona M.
COMPUTER SCIENCE
Accepted version
Appears in collections: Maestría en Ciencias Computacionales (Master's in Computer Science)

Files:
RamirezCM.pdf (2.51 MB, Adobe PDF)