Browsing by Author "Barker, Kenneth"
Open Access
Data Privacy in High-Dimensional and Big Data Age (2015-12-07)
Zakerzadeh, Hessam; Barker, Kenneth; Savavi Naeni, Reyhaneh; Elhajj, Reda; Hagen, Gregory; Matwin, Stan

The prevalent need for publicly available data sets, together with the privacy breaches that occur when such data is released, increases the need for resilient and precise privacy-preserving data publishing techniques. To this end, numerous privacy models and algorithms have been developed for different data types. However, privacy algorithms still suffer from two fundamental problems: data dimensionality and cardinality growth. Data dimensionality has remained a challenge for a wide variety of algorithms in data mining, clustering, classification and privacy. In the privacy domain, simply applying existing privacy algorithms to high-dimensional data results in unacceptable information loss. Like dimensionality, cardinality growth is an open problem in the privacy realm: privacy algorithms cannot run in acceptable time over terabyte-scale data sets. This thesis shows that some common properties of real data can be leveraged to ameliorate the negative effects of the curse of dimensionality in practice. In real data sets, many dimensions are highly correlated with one another. Such correlations enable a process known as vertical fragmentation, which creates vertical subsets of smaller dimensionality. These subsets can then be anonymized by a process that combines the results from multiple independent fragments. This dissertation presents a vertical fragmentation approach general enough to be applied to both the k-anonymity and l-diversity models. In addition, it presents a new approach to privacy-preserving data mining of very large data sets using MapReduce. Two of the most widely used anonymization models, k-anonymity and l-diversity, are studied.
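The following is a minimal sketch of the two privacy models named above, written as a map step and a reduce step over records. Its purpose is to show why fewer dimensions make anonymity easier to reach. Everything in it is illustrative: the records, the attribute names and the choice of fragments are hypothetical. The map and reduce phases are plain Python functions that only mimic MapReduce; this is not Hadoop code. The sketch also does not reproduce how the thesis forms fragments or combines their results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]
Key = Tuple[str, ...]
Groups = Dict[Key, List[str]]


def map_phase(records: Iterable[Record], qi: List[str], sensitive: str) -> Iterator[Tuple[Key, str]]:
    """Mapper: emit (quasi-identifier values, sensitive value) for each record."""
    for record in records:
        yield tuple(record[attr] for attr in qi), record[sensitive]


def reduce_phase(pairs: Iterable[Tuple[Key, str]]) -> Groups:
    """Reducer: collect the sensitive values of each equivalence class."""
    groups: Dict[Key, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def is_k_anonymous(groups: Groups, k: int) -> bool:
    """Every equivalence class must contain at least k records."""
    return all(len(values) >= k for values in groups.values())


def is_l_diverse(groups: Groups, l_min: int) -> bool:
    """Distinct l-diversity: every class needs at least l_min distinct sensitive values."""
    return all(len(set(values)) >= l_min for values in groups.values())


# Hypothetical records whose values are already generalized.
records = [
    {"age": "20-29", "zip": "T2N*", "sex": "F", "diagnosis": "flu"},
    {"age": "20-29", "zip": "T2N*", "sex": "M", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "T3A*", "sex": "F", "diagnosis": "flu"},
    {"age": "30-39", "zip": "T3A*", "sex": "M", "diagnosis": "diabetes"},
]

# The full quasi-identifier set versus two smaller, hypothetical vertical fragments.
for qi in (["age", "zip", "sex"], ["age", "zip"], ["sex"]):
    groups = reduce_phase(map_phase(records, qi, "diagnosis"))
    print(qi, is_k_anonymous(groups, 2), is_l_diverse(groups, 2))
# ['age', 'zip', 'sex'] False False
# ['age', 'zip'] True True
# ['sex'] True False
```

With all three quasi-identifiers, every record sits in its own class, so the table is not 2-anonymous. Each smaller fragment is 2-anonymous on its own. The ["sex"] fragment still fails 2-diversity because its "F" class contains only "flu".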
We also investigate privacy in publishing graph data, which commonly forms big data sets (e.g., social networks). Graph data is generally more difficult to anonymize because an attacker can leverage the structural information "hidden" in the graph to infer sensitive information. In big graph data publishing, we focus only on protecting attributes, as they typically carry the sensitive information.

Open Access
Dynamic Role Lease Authorization Protocol for a Distributed Computing System (2013-01-31)
Chu, Nelson Chung Ngok; Barker, Kenneth

A distributed computing system such as a Grid can be a very dynamic environment, and its user groups are likely to be highly diverse. A user group could be formed by users from different networks, organizations or administrative domains with different hardware/software infrastructures and managerial policies. Handling requests from such a wide range of users becomes a challenge when attempting to accommodate all of these differences. Service providers find it impossible to track every user in a Grid, since the number of users can be very large. Therefore, an access control mechanism is required that gives users appropriate access to resources in a dynamic environment. Role Based Access Control (RBAC) models have been demonstrated to be an effective and efficient way for an administrator to manage access in a computing system. Much work has adapted the RBAC concept to Grids, focusing on the authorization and verification of a user's dynamic factors or contexts, such as time, location and rank. Some applications also allow administrators to change policies during the authorization process, but they do not handle authorization in a real-time, on-demand manner in a Grid. This is a critical authorization requirement for a dynamic environment.
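To make the idea of leasing roles on demand concrete, here is a minimal sketch. A user is granted an RBAC role as a time-limited lease, every request is checked against the unexpired leases, and a policy change can revoke a role immediately. This is a hypothetical illustration of role leasing in general, not the DRLA protocol itself. The class names, the permission format and the explicit `now` parameter (added for deterministic testing) are all assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Lease:
    role: str
    expires_at: float  # absolute time (seconds) after which the role is no longer valid


@dataclass
class LeaseAuthority:
    # Hypothetical policy: role -> set of permitted (action, resource) pairs.
    role_permissions: Dict[str, Set[Tuple[str, str]]]
    leases: Dict[str, List[Lease]] = field(default_factory=dict)

    def grant(self, user: str, role: str, duration_s: float, now: Optional[float] = None) -> Lease:
        """Lease a role to a user on demand for a limited time."""
        now = time.time() if now is None else now
        lease = Lease(role, now + duration_s)
        self.leases.setdefault(user, []).append(lease)
        return lease

    def revoke_role(self, role: str) -> None:
        """Policy change: drop every outstanding lease for the role at once."""
        for user, user_leases in self.leases.items():
            self.leases[user] = [lease for lease in user_leases if lease.role != role]

    def check(self, user: str, action: str, resource: str, now: Optional[float] = None) -> bool:
        """Allow the request only if an unexpired lease grants a matching permission."""
        now = time.time() if now is None else now
        return any(
            lease.expires_at > now
            and (action, resource) in self.role_permissions.get(lease.role, set())
            for lease in self.leases.get(user, [])
        )


auth = LeaseAuthority({"analyst": {("read", "dataset-A")}})
auth.grant("alice", "analyst", duration_s=60, now=0)
print(auth.check("alice", "read", "dataset-A", now=30))   # True: lease still active
print(auth.check("alice", "read", "dataset-A", now=90))   # False: lease expired
print(auth.check("alice", "write", "dataset-A", now=30))  # False: role lacks permission
```

Because access expires with the lease, a service provider does not need to track users long-term. It only needs to check active leases when a request arrives.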
This problem motivated us to develop a new dynamic authorization protocol, Dynamic Role Lease Authorization (DRLA), that is suitable for a dynamic distributed computing environment.

Open Access
Facilitating Programming-by-Demonstration for Bioinformatics using Semantic Web Resources (2014-01-07)
Gordon, Paul-Michael; Barker, Kenneth; Sensen, Christoph

Many software applications have "macro" capabilities, allowing users to record keystrokes and then replay them verbatim to save time on repetitive tasks. Programming by demonstration (PbD) is a more sophisticated technique in which the application translates the semantics (i.e., meaning) of users' actions into programming language notation. The PbD engine generalizes actions to a functional level rather than replaying rote sequences of keystrokes, which can help nonprogrammers robustly automate their computer-based tasks. Most extant PbD systems base user actions on physical phenomena such as maze traversal, but automating gaming tasks is of limited interest to most computer users. One user group that could benefit from task automation without programming is molecular biologists, who analyze large genetics datasets ("bioinformatics") but typically have no programming skills. Visual workflow systems help automate bioinformatics, but through empirical studies I show five significant barriers to their use. I propose that a domain-savvy PbD system can mitigate these barriers by inferring data analysis workflows from molecular biologists' Web browsing sessions. I call this technique Workflow by Demonstration (WbD). The success of this approach depends on replacing the typical physical-phenomenon model of PbD with a biology-specific data model and hypertext interface. Emerging informatics standards ("Semantic Web" technologies) facilitate the use of common data models across different data providers on the Web.
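The difference between a macro and PbD can be sketched as follows. A macro would replay the recorded literal values. A generalizing engine instead traces which recorded output fed which later input, and it turns the value the user started from into a parameter. This toy sketch is not Seahawk's implementation. The tool names, the string stand-ins for Web services and the accession values are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# One observed browser action: (tool name, input value, output value).
Observed = Tuple[str, str, str]
# Where a workflow step takes its input from: ("param", None), ("step", index) or ("const", value).
Source = Tuple[str, Any]


def generalize(session: List[Observed], example_input: str) -> List[Tuple[str, Source]]:
    """Turn a recorded session into a parameterized workflow by tracing data flow."""
    workflow: List[Tuple[str, Source]] = []
    produced_by: Dict[str, int] = {}  # observed output value -> step that produced it
    for index, (tool, value_in, value_out) in enumerate(session):
        if value_in in produced_by:
            source: Source = ("step", produced_by[value_in])  # output of an earlier step
        elif value_in == example_input:
            source = ("param", None)  # the value the user started from becomes a parameter
        else:
            source = ("const", value_in)  # anything else is kept literally
        workflow.append((tool, source))
        produced_by[value_out] = index
    return workflow


def run(workflow: List[Tuple[str, Source]], tools: Dict[str, Callable[[str], str]], value: str) -> List[str]:
    """Execute the generalized workflow on a new input value."""
    results: List[str] = []
    for tool, (kind, ref) in workflow:
        if kind == "param":
            arg = value
        elif kind == "step":
            arg = results[ref]
        else:
            arg = ref
        results.append(tools[tool](arg))
    return results


# Hypothetical stand-ins for the Web services a biologist used while browsing.
tools = {
    "fetch_record": lambda accession: f"record[{accession}]",
    "get_sequence": lambda record: f"sequence<{record}>",
}
session = [
    ("fetch_record", "P12345", "record[P12345]"),
    ("get_sequence", "record[P12345]", "sequence<record[P12345]>"),
]
workflow = generalize(session, example_input="P12345")
print(workflow)  # [('fetch_record', ('param', None)), ('get_sequence', ('step', 0))]
print(run(workflow, tools, "Q99999")[-1])  # sequence<record[Q99999]>
```

The inferred workflow runs on an accession that was never demonstrated, which a verbatim macro could not do.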
Many molecular biology resources are Web based, so this work uses Semantic Web technologies to facilitate WbD. Biologists were given pen-and-paper workflow design tasks, revealing the types of data flow they intuitively understood. These defined the types of workflow "code" a WbD system should support, and the corresponding hypertext demonstration actions were modeled. A browser (Seahawk) implements these action-to-code mappings. User studies evaluating Seahawk show that biologists could 1) demonstrate Web-based analysis for realistic tasks, 2) understand the automatically generated workflows, and 3) use them in the workflow environment Taverna. This suggests that WbD is a viable technique for bioinformatics. Although the data model used was biology-specific, the underlying semantic technologies are domain agnostic, so the techniques described here may be applicable to novice programmers in other domains.

Open Access
System Catalogue design for a Privacy Preserving Relational Database Management System (2012-12-20)
Singh, Sharmila; Barker, Kenneth

This thesis presents a privacy system catalogue design for a relational database management system (RDBMS). The design includes the predicates of a privacy taxonomy [BAB+09], such as purpose (p), visibility (v), granularity (g) and retention (r). The aim is to propose a normalized and implementable design for an RDBMS. The main contributions are a study of how privacy predicates are represented, the basic design of the system tables, query processing, an analysis of the design and a partial implementation. Since the additional privacy features affect query processing, an algorithm for processing SELECT queries is described, implemented and analyzed. An alternative design is also suggested so that other design possibilities are fully considered. Lastly, an example application is provided to help explain the proposed design.
The design, analysis and implementation of the proposed catalogue and its query processing, together with the example system, allowed us to conclude that the design is normalized and implementable.
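The sketch below shows one way a SELECT could consult the four privacy predicates. A column is released only if a catalogue rule matches the requester's purpose (p) and visibility (v). The rule's granularity (g) decides whether the value is exact or coarsened, and values older than the retention period (r) are withheld. This is a hypothetical in-memory illustration, not the catalogue schema or SELECT algorithm from the thesis. The rule structure, the "exact"/"coarse" levels, the `collected` field and the coarsening function are assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PrivacyRule:
    purpose: str          # p: what the data may be used for
    visibility: str       # v: who may see it
    granularity: str      # g: "exact" or "coarse" (hypothetical levels)
    retention_days: int   # r: how long after collection the value may be used


# Hypothetical catalogue table: (table, column) -> rules that permit access.
Catalogue = Dict[Tuple[str, str], List[PrivacyRule]]


def find_rule(catalogue: Catalogue, table: str, column: str,
              purpose: str, visibility: str) -> Optional[PrivacyRule]:
    for rule in catalogue.get((table, column), []):
        if rule.purpose == purpose and rule.visibility == visibility:
            return rule
    return None  # no matching rule: the column is not released


def coarsen(value: Any) -> str:
    # Hypothetical coarse granularity: keep only the first three characters.
    return str(value)[:3] + "*"


def privacy_select(rows: List[Dict[str, Any]], table: str, columns: List[str],
                   catalogue: Catalogue, purpose: str, visibility: str,
                   today: date) -> List[Dict[str, Any]]:
    """SELECT columns FROM table, filtered by the privacy predicates in the catalogue."""
    result = []
    for row in rows:
        out: Dict[str, Any] = {}
        for column in columns:
            rule = find_rule(catalogue, table, column, purpose, visibility)
            if rule is None:
                continue
            if today - row["collected"] > timedelta(days=rule.retention_days):
                continue  # retention period has passed
            out[column] = row[column] if rule.granularity == "exact" else coarsen(row[column])
        if out:  # rows with no releasable columns are omitted
            result.append(out)
    return result


catalogue: Catalogue = {
    ("patient", "name"): [PrivacyRule("billing", "staff", "exact", 365)],
    ("patient", "postal"): [
        PrivacyRule("billing", "staff", "exact", 365),
        PrivacyRule("research", "analyst", "coarse", 3650),
    ],
}
rows = [{"name": "Alice", "postal": "T2N1N4", "collected": date(2024, 1, 1)}]

print(privacy_select(rows, "patient", ["name", "postal"], catalogue, "research", "analyst", date(2024, 6, 1)))
# [{'postal': 'T2N*'}]
print(privacy_select(rows, "patient", ["name", "postal"], catalogue, "billing", "staff", date(2025, 6, 1)))
# []
```

Here the same row yields a coarsened postal code for research use, and nothing at all for billing once the 365-day retention period has expired.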