hpclabisti's BibTeX entry:  

An Efficient Parallel and Distributed Algorithm for Counting Frequent Sets

High Performance Computing for Computational Science — VECPAR 2002, pp. 3--29, 2003.
Authors: Salvatore Orlando and Paolo Palmerini and Raffaele Perego and Fabrizio Silvestri
URL: http://dx.doi.org/10.1007/3-540-36569-9_28
Description: SpringerLink - Book Chapter
Tags: data-mining
Abstract: Due to the huge increase in the number and dimension of available databases, efficient solutions for counting frequent sets are nowadays very important within the Data Mining community. Several sequential and parallel algorithms were proposed, which in many cases exhibit excellent scalability. In this paper we present ParDCI, a distributed and multithreaded algorithm for counting the occurrences of frequent sets within transactional databases. ParDCI is a parallel version of DCI (Direct Count & Intersect), a multi-strategy algorithm which is able to adapt its behavior not only to the features of the specific computing platform (e.g. available memory), but also to the features of the dataset being processed (e.g. sparse or dense datasets). ParDCI enhances previous proposals by exploiting the highly optimized counting and intersection techniques of DCI, and by relying on a multi-level parallelization approach which explicitly targets clusters of SMPs, an emerging computing platform. We focused our work on the efficient exploitation of the underlying architecture. Intra-Node multithreading effectively exploits the memory hierarchies of each SMP node, while Inter-Node parallelism exploits smart partitioning techniques aimed at reducing communication overheads. In depth experimental evaluations demonstrate that ParDCI reaches nearly optimal performances under a variety of conditions.
@inproceedings{orlando02efficient,
title = {An Efficient Parallel and Distributed Algorithm for Counting Frequent Sets},
author = {Salvatore Orlando and Paolo Palmerini and Raffaele Perego and Fabrizio Silvestri},
booktitle = {High Performance Computing for Computational Science — VECPAR 2002},
pages = {3--29},
url = {http://dx.doi.org/10.1007/3-540-36569-9_28},
year = {2003},
description = {SpringerLink - Book Chapter},
abstract = {Due to the huge increase in the number and dimension of available databases, efficient solutions for counting frequent sets are nowadays very important within the Data Mining community. Several sequential and parallel algorithms were proposed, which in many cases exhibit excellent scalability. In this paper we present ParDCI, a distributed and multithreaded algorithm for counting the occurrences of frequent sets within transactional databases. ParDCI is a parallel version of DCI (Direct Count & Intersect), a multi-strategy algorithm which is able to adapt its behavior not only to the features of the specific computing platform (e.g. available memory), but also to the features of the dataset being processed (e.g. sparse or dense datasets). ParDCI enhances previous proposals by exploiting the highly optimized counting and intersection techniques of DCI, and by relying on a multi-level parallelization approach which explicitly targets clusters of SMPs, an emerging computing platform. We focused our work on the efficient exploitation of the underlying architecture. Intra-Node multithreading effectively exploits the memory hierarchies of each SMP node, while Inter-Node parallelism exploits smart partitioning techniques aimed at reducing communication overheads. In depth experimental evaluations demonstrate that ParDCI reaches nearly optimal performances under a variety of conditions.},
keywords = {data-mining}
}