UNITY - A DATABASE INTEGRATION TOOL
Date
2000-10-16
Abstract
The World-Wide Web (WWW) provides users with the ability to access
a vast number of data sources distributed across the planet. Internet
protocols such as TCP/IP and HTTP have provided the mechanisms for exchanging
the data. However, a fundamental problem with distributed data access is
determining which data are semantically equivalent. Ideally, users should be able
to extract data from multiple Internet sites and have it automatically combined
and presented to them in a usable form. No system has been able to accomplish
these goals due to limitations in expressing and capturing data semantics.
This paper details the construction, function, and deployment of Unity, a
database integration software package that captures database semantics so
that the databases can be integrated automatically. Unity is the tool that
we use to implement the integration architecture detailed in our previous work.
Our integration architecture focuses on capturing the semantics of data stored
in databases with the goal of integrating data sources within a company, across
a network, and even on the World-Wide Web. Our approach to capturing data
semantics revolves around a standardized dictionary that provides terms for
referencing and categorizing data. These standardized terms are then recorded
in semantic specifications called X-Specs, which store metadata and semantic
descriptions of the data. Using these semantic specifications,
it becomes possible to integrate diverse data sources even though they were
not originally designed to work together. We present the centralized
version of the architecture, which allows data source information
(represented using X-Specs) to be integrated independently into a
unified view of the data. The architecture preserves full autonomy of the
underlying databases, which are transparently accessed by the user from a
central portal. Distributing the architecture would bypass the central
portal and allow integration of web data sources to be performed by a user's
browser. A system that achieves such automatic integration of data sources
would have a major impact on how the Web is used and delivered. Unity
is the bridge between concept and implementation. Unity is a complete
software package that supports the construction and modification of
standardized dictionaries, parses database schema and metadata to
construct X-Specs, and implements the integration algorithm that
combines X-Specs into an integrated view. Further, Unity provides a
mechanism for building queries on the integrated view and algorithms for
mapping semantic queries on the integrated view to structural (SQL) queries
on the underlying data sources.
Notes: Joint technical report, released as TR-00-17 by the University of
Manitoba and as 2000-664-16 by the University of Calgary.
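The pipeline summarized in the abstract (dictionary terms, X-Specs, an integrated view, and mapping semantic queries to SQL) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `XSpec` class, the `integrate` and `to_sql` functions, the term names, and the table and column names are hypothetical and are not taken from Unity itself. The sketch only handles the simplest case, where each source supplies all requested terms from a single table.

```python
from dataclasses import dataclass, field


@dataclass
class XSpec:
    """Hypothetical stand-in for an X-Spec (not Unity's actual format)."""
    source: str
    # dictionary term -> (table, column) in this data source
    fields: dict = field(default_factory=dict)


def integrate(specs):
    """Build an integrated view: term -> list of (source, table, column)."""
    view = {}
    for spec in specs:
        for term, (table, column) in spec.fields.items():
            view.setdefault(term, []).append((spec.source, table, column))
    return view


def to_sql(view, terms):
    """Map a semantic query (a list of terms) to one SELECT per source.

    Simplification: a source is skipped unless it supplies every
    requested term from a single table; joins are out of scope.
    """
    per_source = {}
    for term in terms:
        for source, table, column in view.get(term, []):
            per_source.setdefault(source, []).append((table, column, term))

    queries = {}
    for source, cols in per_source.items():
        tables = {table for table, _, _ in cols}
        if len(cols) != len(terms) or len(tables) != 1:
            continue
        select = ", ".join(f"{column} AS {term}" for _, column, term in cols)
        queries[source] = f"SELECT {select} FROM {tables.pop()}"
    return queries


# Two independently designed sources described with the same dictionary terms.
orders_db = XSpec("orders_db", {
    "customer_id": ("cust", "cid"),
    "customer_name": ("cust", "cname"),
})
crm_db = XSpec("crm_db", {
    "customer_id": ("clients", "client_no"),
    "customer_name": ("clients", "full_name"),
})

view = integrate([orders_db, crm_db])
queries = to_sql(view, ["customer_id", "customer_name"])
for source, sql in queries.items():
    print(source, "->", sql)
```

The point of the sketch is that the two sources never reference each other: each is described only in terms of the shared dictionary, and the integrated view falls out of matching terms.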
Keywords
Computer Science