Feature Selection In Document Clustering Using Rough Set Theory

Document Type: Original Article

Abstract

A fundamental problem in rough set theory is the search for subsets of attributes
that provide the same information for classification purposes as the full set of
attributes. This paper applies rough set theory to feature selection in document
clustering. We emphasize the role of reducts, the basic construct of the rough set
approach to feature selection. Because generating all reducts is problematic, we
propose a method, based on rough set theory, that generates a single best reduct
of the data. As an example, the method is applied to hierarchical clustering of a
document data set. Finally, we compare the clustering results obtained from the
original data set with those obtained from the reduced data set.
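
To make the notion of a reduct concrete, the following minimal Python sketch checks whether a subset of attributes induces the same indiscernibility partition of the objects as the full attribute set. The toy document-term table, the term names, and the function names are hypothetical illustrations and are not taken from the paper. The exhaustive search shown here only illustrates the definition; it is not the best-reduct method proposed in the paper, which is designed to avoid enumerating reducts.

```python
from itertools import combinations


def partition(rows, attrs):
    """Group row indices into indiscernibility classes:
    rows with identical values on `attrs` fall into the same block."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(i)
    return sorted(blocks.values())


def preserves_information(rows, subset, all_attrs):
    """True if `subset` distinguishes the rows exactly as `all_attrs` does."""
    return partition(rows, subset) == partition(rows, all_attrs)


def smallest_reduct(rows, all_attrs):
    """Brute-force search for a smallest attribute subset that preserves
    the partition (illustration only; exponential in the number of attributes)."""
    for size in range(1, len(all_attrs) + 1):
        for subset in combinations(all_attrs, size):
            if preserves_information(rows, subset, all_attrs):
                return list(subset)
    return list(all_attrs)


# Hypothetical toy data: four documents described by the presence (1)
# or absence (0) of three terms.
terms = ["rough", "set", "cluster"]
docs = [
    {"rough": 1, "set": 1, "cluster": 0},
    {"rough": 1, "set": 1, "cluster": 1},
    {"rough": 0, "set": 0, "cluster": 1},
    {"rough": 0, "set": 0, "cluster": 0},
]

print(smallest_reduct(docs, terms))
```

In this toy table the terms "rough" and "set" always co-occur, so one of them is redundant: the pair "rough" and "cluster" separates the four documents just as well as all three terms do.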

Keywords