
Please use this identifier to cite or link to this item: http://inaoe.repositorioinstitucional.mx/jspui/handle/1009/1190

Title:	Using the Web as corpus for self-training text categorization
Authors:	RAFAEL GUZMAN CABRERA; MANUEL MONTES Y GOMEZ; Paolo ROSSO; LUIS VILLASEÑOR PINEDA
Access level:	Open Access
License:	Attribution-NonCommercial-NoDerivatives
Subject:	Text categorization; Semi-supervised learning; Self-training; Web as corpus; Authorship attribution
Abstract or description:	Most current methods for automatic text categorization are based on supervised learning techniques and, therefore, they face the problem of requiring a great number of training instances to construct an accurate classifier. In order to tackle this problem, this paper proposes a new semi-supervised method for text categorization, which considers the automatic extraction of unlabeled examples from the Web and the application of an enriched self-training approach for the construction of the classifier. This method, even though language independent, is more pertinent for scenarios where large sets of labeled resources do not exist. That, for instance, could be the case of several application domains in different non-English languages such as Spanish. The experimental evaluation of the method was carried out in three different tasks and in two different languages. The achieved results demonstrate the applicability and usefulness of the proposed method.
Publisher:	Springer Science+Business Media
Publication date:	2009
Publication type:	Article
Language:	English
Audience:	Students; Researchers; General public
Citation:	Guzmán-Cabrera, R., et al., (2009). Using the Web as corpus for self-training text categorization, Springer Science Inf. Retrieval (12): 400–415
Knowledge area:	COMPUTER SCIENCE
Publication version:	Accepted version (acceptedVersion)
Appears in collections:	Artículos de Ciencias Computacionales (Computer Science Articles)

Files in this item:

File	Size	Format
2009-MontesyGomezManuel-Using the Web as Corpus for Self-training Text Categorization.pdf	273.52 kB	Adobe PDF