Please use this identifier to cite or link to this item:
http://inaoe.repositorioinstitucional.mx/jspui/handle/1009/214
Hardware architecture for frequent itemset mining in static datasets using a segmentation strategy | |
MAURO MARTIN LETRAS LUNA | |
RENE ARMANDO CUMPLIDO PARRA; RAUDEL HERNANDEZ LEON | |
Open Access | |
Attribution-NonCommercial-NoDerivatives | |
Hardware architecture; Frequent itemset; FPGA | |
In recent years, the amount of information generated in distinct domains has increased significantly, and the size of datasets overwhelms the human capacity to process them and obtain valuable information. Because of this, Data Mining has emerged as a set of techniques and algorithms dedicated to finding patterns in datasets; these patterns are then used to classify or predict the behavior of phenomena related to the data. Association Rule Mining is an important branch of Data Mining that consists of finding relationships among the data in the form of implication rules. The problem is usually decomposed into two subproblems. The first is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent itemsets. The second is to generate association rules from those frequent itemsets. This research explores Frequent Itemset Mining because, given the exhaustive nature of the problem, the huge amount of data in some cases makes it difficult to obtain a response within the time the application requires. There are many algorithms for finding frequent itemsets; the most widely used are Apriori, FP-Growth, and Eclat. They use strategies such as breadth-first search and depth-first search to traverse the search space, and all of them have to scan the dataset. Some, like Apriori, must access the dataset many times. FP-Growth reads the dataset only twice, but it must keep large amounts of data in memory. Frequent Itemset Mining is an exhaustive task because the database must be read many times, regardless of how the data is stored (in main memory or on hard disk).
Two ways to accelerate Frequent Itemset Mining have been reported in the literature: the first consists of improving the existing software algorithms by proposing new heuristics to save time, and the second consists of developing hardware architectures dedicated to this task. The main goal of this research is to design a hardware architecture to accelerate the Frequent Itemset Mining process. A segmentation strategy based on equivalence classes is proposed to guarantee that all frequent itemsets will be found regardless of the available hardware resources. An FPGA implementation will be carried out to validate the proposed architecture and compare it with software-only implementations. | |
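The two ideas in the abstract, support counting against a threshold and splitting the search space into equivalence classes, can be illustrated with a short software sketch. This is not the hardware architecture proposed in the thesis. It is a minimal Eclat-style example in Python, and the function name `frequent_itemsets`, the `min_support` parameter, and the choice to treat "exceeds the threshold" as "support count >= min_support" are all assumptions made for illustration.

```python
from collections import defaultdict


def frequent_itemsets(transactions, min_support):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets.

    Assumption: an itemset is frequent when its support count is
    >= min_support (the source only says it must exceed a threshold).
    """
    # Vertical layout: map each item to the ids of the transactions containing it.
    tidsets = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets[item].add(tid)

    result = {}

    def mine_class(prefix, members):
        # 'members' are (item, tidset) pairs that all extend the same prefix,
        # i.e. one equivalence class. Each class is mined without looking at
        # any other class, so classes could be processed one at a time.
        for i, (item, tids) in enumerate(members):
            itemset = prefix + (item,)
            result[itemset] = len(tids)
            next_class = []
            for other, other_tids in members[i + 1:]:
                common = tids & other_tids  # support of itemset + (other,)
                if len(common) >= min_support:
                    next_class.append((other, common))
            if next_class:
                mine_class(itemset, next_class)

    # Frequent single items, in a fixed order, form the top-level class.
    roots = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    mine_class((), roots)
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(frequent_itemsets(data, 3).items()):
        print(itemset, support)
```

Because every itemset belongs to exactly one prefix class, mining the classes one after another still enumerates every frequent itemset, which is the property a segmentation strategy relies on when resources are limited.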
Instituto Nacional de Astrofísica, Óptica y Electrónica | |
2015-11 | |
Master's thesis | |
English | |
Students; Researchers; General public | |
Letras-Luna M.M. | |
COMPUTER SCIENCE | |
Accepted version | |
acceptedVersion - Accepted version | |
Appears in Collections: | Maestría en Ciencias Computacionales |
Upload archives
File | Description | Size | Format | |
---|---|---|---|---|
LetrasLMM.pdf | | 1.39 MB | Adobe PDF | View/Open |