Please use this identifier to cite or link to this item: http://inaoe.repositorioinstitucional.mx/jspui/handle/1009/807
Hardware acceleration of frequent itemsets mining on data streams
LAZARO BUSTIO MARTINEZ
RENE ARMANDO CUMPLIDO PARRA
RAUDEL HERNANDEZ LEON
Acceso Abierto
Atribución-NoComercial-SinDerivadas
Frequent itemset mining
Data stream
FPGAS
Systolic trees
Hashing
Lexicografic order
In recent times, processing of data streams is gaining the attention of the scientific community due to its practical applications, e.g. network traffic analysis and surveillance, sensor networks control, and financial transactions mining, among others. Data streams are unbounded and infinite flows of data arriving at high rates and; therefore, traditional Data Mining approaches (such as Data Classification, Clustering, and Frequent Itemsets Mining among others) cannot be used straightforwardly. Frequent Itemsets Mining is a Data Mining technique focused on obtaining hidden patterns in data, and it has been used to extract useful knowledge from various data sources, recently including data streams. Finding alternatives to improve the processing time in the discovery of frequent itemsets on data streams is an active research topic. One option to improve the discovery of frequent itemsets is to develop single-pass parallel algorithms that can be implemented in hardware architectures (taking advantage of the inherent parallelism of such devices). Two methods for Frequent Itemsets Mining on data streams are proposed in this dissertation. The first method is based on systolic trees, and the other relies on hash functions and the lexicographic order of items received in transactions of data streams. Both methods fulfill the Continuity, Expiration and Infinity constraints of data streams. The general problem of finding frequent itemsets on data streams can be divided into four subproblems: 1. Data streams are composed of long transactions in a large alphabet. 2. Data streams are composed of long transactions in a small alphabet. 3. Data streams are composed of short transactions in a small alphabet. 4. Data streams are composed of short transactions in a large alphabet. The methods reported in this dissertation are oriented to solve the second and fourth subproblems, and by solving these two subproblems, the third subproblem is indirectly also solved. The first subproblem was not faced because it has been attacked from the software perspective. The method based on systolic tree faces the second subproblem by implementing a hardware architecture based on a tree-like data structure where transactions flow from the top to the bottom and from the left to the right concurrently without interruptions. This method outperforms the processing time of the algorithms reported in the state-of-the-art using software and hardware development platforms and can process the incom
Instituto Nacional de Astrofísica, Óptica y Electrónica
2017-06
Tesis de doctorado
Inglés
Estudiantes
Investigadores
Público en general
Bustio-Martinez L.
BANCOS DE DATOS
Versión aceptada
acceptedVersion - Versión aceptada
Appears in Collections:Doctorado en Ciencias Computacionales

Upload archives


File SizeFormat 
BustioMaL.pdf2.58 MBAdobe PDFView/Open