Utility of Knowledge Discovered from Sanitized Data

dc.contributor.author: Sramka, Michal [eng]
dc.contributor.author: Safavi-Naini, Reihaneh [eng]
dc.contributor.author: Denzinger, Jorg [eng]
dc.contributor.author: Askari, Mina [eng]
dc.contributor.author: Gao, Jie [eng]
dc.date.accessioned: 2012-06-15T20:31:46Z
dc.date.available: 2012-06-15T20:31:46Z
dc.date.issued: 2008-09-30
dc.description.abstract: While much attention has been paid to data sanitization methods with the aim of protecting users’ privacy, far less emphasis has been placed on the usefulness of the sanitized data from the viewpoint of knowledge discovery systems. We consider this question and ask whether sanitized data can be used to obtain knowledge that is not defined at the time of the sanitization. We propose a utility function for knowledge discovery algorithms, which quantifies the value of the knowledge from the perspective of its users. We then use this utility function to evaluate the usefulness of the extracted knowledge when knowledge building is performed over the original data, and compare it to the case when knowledge building is performed over the sanitized data. Our experiments use an existing cooperative learning model of knowledge discovery and medical data that is anonymized and perturbed using two widely known sanitization techniques, ε-differential privacy and k-anonymity. Our experimental results show that although the utility of sanitized data can be drastically reduced and in some cases completely lost, there are cases where the utility can be preserved. This confirms our strategy of looking at triples, each consisting of a utility function, a sanitization mechanism, and a knowledge discovery algorithm, that are useful in practice. We categorize a few instances of such triples based on the usefulness obtained from experiments over a single database of medical records. We discuss our results and outline directions for future work. [eng]
dc.description.refereed: No [eng]
dc.identifier.department: 2008-910-23 [eng]
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/30983
dc.identifier.uri: http://hdl.handle.net/1880/49026
dc.language.iso: eng [eng]
dc.publisher.corporate: University of Calgary [eng]
dc.publisher.faculty: Science [eng]
dc.subject: utility of knowledge [eng]
dc.subject: privacy-preserving data mining [eng]
dc.subject: differential privacy [eng]
dc.subject: k-anonymity [eng]
dc.subject: cooperative learning [eng]
dc.subject.other: knowledge discovered from Sanitized Data [eng]
dc.title: Utility of Knowledge Discovered from Sanitized Data [eng]
dc.type: technical report [eng]
thesis.degree.discipline: Computer Science [eng]
Files
Original bundle
Name: 19712.pdf
Size: 380.89 KB
Format: Adobe Portable Document Format