Data mining aims to discover hidden facts in databases and data warehouses. The discovered knowledge should not reveal information that individuals or groups consider private. In recent years, the growing collection of personal data by institutions and online merchants has raised privacy concerns, and there has been increasing interest in building accurate data mining models over aggregate data while protecting privacy at the level of individual records. One approach to this problem is to randomize the values in individual records and disclose only the randomized values. This retains privacy while still giving access to the information implicit in the original attributes. The distribution of the original data set is important, and estimating it is one of the goals of the data mining algorithms. This paper introduces the privacy concerns and the conflict between privacy and data mining, then presents two approaches to resolving this conflict: the randomization approach and the cryptographic approach. We consider data mining classification over randomized data and propose two algorithms based on Bayes' rule, Step-Class and Global-Decision, whose accuracy is high compared with classification algorithms for non-perturbed data. Both algorithms are tested experimentally to measure their classification accuracy. Our empirical results show that the Step-Class algorithm achieves a better classification accuracy ratio than the Global-Decision algorithm.
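The randomization idea described in the abstract can be illustrated with a minimal sketch. It is not the Step-Class or Global-Decision algorithm, which the abstract does not specify. It assumes additive zero-mean Gaussian noise with a publicly known standard deviation, and the names `randomize`, `estimate_original_moments`, and the synthetic `ages` data are hypothetical. Only the perturbed values are disclosed, yet the mean and variance of the original distribution can still be estimated in aggregate.

```python
import random
import statistics


def randomize(values, noise_sd, rng):
    """Perturb each value with independent zero-mean Gaussian noise (assumed noise model)."""
    return [v + rng.gauss(0.0, noise_sd) for v in values]


def estimate_original_moments(disclosed, noise_sd):
    """Estimate the mean and variance of the original values from the disclosed ones.

    The noise is zero-mean and independent of the data, so
    E[disclosed] = E[original] and Var[disclosed] = Var[original] + noise_sd**2.
    """
    mean = statistics.fmean(disclosed)
    # Subtract the known noise variance; clamp at zero for small samples.
    variance = max(statistics.pvariance(disclosed) - noise_sd ** 2, 0.0)
    return mean, variance


rng = random.Random(0)  # fixed seed so the sketch is reproducible
ages = [rng.uniform(20, 60) for _ in range(20000)]  # synthetic private attribute
disclosed = randomize(ages, noise_sd=15.0, rng=rng)  # what the miner actually sees
est_mean, est_var = estimate_original_moments(disclosed, noise_sd=15.0)

print("true mean/variance:     ", statistics.fmean(ages), statistics.pvariance(ages))
print("estimated mean/variance:", est_mean, est_var)
```

Individual disclosed values can differ from the originals by tens of units, while the aggregate estimates stay close to the true statistics. A classifier over randomized data would build on this kind of distribution estimate rather than on the individual perturbed records.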
(2007). A Proposed Model to Allow Data Mining Classification Avoiding Privacy Concerns. Journal of the ACS Advances in Computer Science, 1(1), 95-118. doi: 10.21608/asc.2007.147564