Please use this identifier to cite or link to this item:
http://inaoe.repositorioinstitucional.mx/jspui/handle/1009/807
Hardware acceleration of frequent itemsets mining on data streams | |
LAZARO BUSTIO MARTINEZ | |
RENE ARMANDO CUMPLIDO PARRA RAUDEL HERNANDEZ LEON | |
Acceso Abierto | |
Atribución-NoComercial-SinDerivadas | |
Frequent itemset mining Data stream FPGAS Systolic trees Hashing Lexicografic order | |
In recent times, processing of data streams is gaining the attention of the scientific community due to its practical applications, e.g. network traffic analysis and surveillance, sensor networks control, and financial transactions mining, among others. Data streams are unbounded and infinite flows of data arriving at high rates and; therefore, traditional Data Mining approaches (such as Data Classification, Clustering, and Frequent Itemsets Mining among others) cannot be used straightforwardly. Frequent Itemsets Mining is a Data Mining technique focused on obtaining hidden patterns in data, and it has been used to extract useful knowledge from various data sources, recently including data streams. Finding alternatives to improve the processing time in the discovery of frequent itemsets on data streams is an active research topic. One option to improve the discovery of frequent itemsets is to develop single-pass parallel algorithms that can be implemented in hardware architectures (taking advantage of the inherent parallelism of such devices). Two methods for Frequent Itemsets Mining on data streams are proposed in this dissertation. The first method is based on systolic trees, and the other relies on hash functions and the lexicographic order of items received in transactions of data streams. Both methods fulfill the Continuity, Expiration and Infinity constraints of data streams. The general problem of finding frequent itemsets on data streams can be divided into four subproblems: 1. Data streams are composed of long transactions in a large alphabet. 2. Data streams are composed of long transactions in a small alphabet. 3. Data streams are composed of short transactions in a small alphabet. 4. Data streams are composed of short transactions in a large alphabet. The methods reported in this dissertation are oriented to solve the second and fourth subproblems, and by solving these two subproblems, the third subproblem is indirectly also solved. The first subproblem was not faced because it has been attacked from the software perspective. The method based on systolic tree faces the second subproblem by implementing a hardware architecture based on a tree-like data structure where transactions flow from the top to the bottom and from the left to the right concurrently without interruptions. This method outperforms the processing time of the algorithms reported in the state-of-the-art using software and hardware development platforms and can process the incom | |
Instituto Nacional de Astrofísica, Óptica y Electrónica | |
2017-06 | |
Tesis de doctorado | |
Inglés | |
Estudiantes Investigadores Público en general | |
Bustio-Martinez L. | |
BANCOS DE DATOS | |
Versión aceptada | |
acceptedVersion - Versión aceptada | |
Appears in Collections: | Doctorado en Ciencias Computacionales |
Upload archives
File | Size | Format | |
---|---|---|---|
BustioMaL.pdf | 2.58 MB | Adobe PDF | View/Open |