Mi INAOE Alertas Editar Perfil

Por favor, use este identificador para citar o enlazar este ítem: http://inaoe.repositorioinstitucional.mx/jspui/handle/1009/807

Título :	Hardware acceleration of frequent itemsets mining on data streams
Autor:	LAZARO BUSTIO MARTINEZ
Colaborador:	RENE ARMANDO CUMPLIDO PARRA RAUDEL HERNANDEZ LEON
Nivel de acceso:	Acceso Abierto
Licencia:	Atribución-NoComercial-SinDerivadas
Materia:	Frequent itemset mining Data stream FPGAS Systolic trees Hashing Lexicografic order
Resumen o descripción:	In recent times, processing of data streams is gaining the attention of the scientific community due to its practical applications, e.g. network traffic analysis and surveillance, sensor networks control, and financial transactions mining, among others. Data streams are unbounded and infinite flows of data arriving at high rates and; therefore, traditional Data Mining approaches (such as Data Classification, Clustering, and Frequent Itemsets Mining among others) cannot be used straightforwardly. Frequent Itemsets Mining is a Data Mining technique focused on obtaining hidden patterns in data, and it has been used to extract useful knowledge from various data sources, recently including data streams. Finding alternatives to improve the processing time in the discovery of frequent itemsets on data streams is an active research topic. One option to improve the discovery of frequent itemsets is to develop single-pass parallel algorithms that can be implemented in hardware architectures (taking advantage of the inherent parallelism of such devices). Two methods for Frequent Itemsets Mining on data streams are proposed in this dissertation. The first method is based on systolic trees, and the other relies on hash functions and the lexicographic order of items received in transactions of data streams. Both methods fulfill the Continuity, Expiration and Infinity constraints of data streams. The general problem of finding frequent itemsets on data streams can be divided into four subproblems: 1. Data streams are composed of long transactions in a large alphabet. 2. Data streams are composed of long transactions in a small alphabet. 3. Data streams are composed of short transactions in a small alphabet. 4. Data streams are composed of short transactions in a large alphabet. The methods reported in this dissertation are oriented to solve the second and fourth subproblems, and by solving these two subproblems, the third subproblem is indirectly also solved. The first subproblem was not faced because it has been attacked from the software perspective. The method based on systolic tree faces the second subproblem by implementing a hardware architecture based on a tree-like data structure where transactions flow from the top to the bottom and from the left to the right concurrently without interruptions. This method outperforms the processing time of the algorithms reported in the state-of-the-art using software and hardware development platforms and can process the incom
Editor:	Instituto Nacional de Astrofísica, Óptica y Electrónica
Fecha de publicación :	2017-06
Tipo de publicación :	Tesis de doctorado
Idioma:	Inglés
Audiencia:	Estudiantes Investigadores Público en general
Forma de citación:	Bustio-Martinez L.
Área de conocimiento:	BANCOS DE DATOS
Versión de la publicación:	Versión aceptada
Versión de la publicación:	acceptedVersion - Versión aceptada
Aparece en las colecciones:	Doctorado en Ciencias Computacionales

Cargar archivos:

Fichero	Tamaño	Formato
BustioMaL.pdf	2.58 MB	Adobe PDF	Visualizar/Abrir