Proactive Data Management System (PDMS)
Date
2007-04-30
Abstract
Proactive Data Management System (PDMS) is designed to manage large
datasets within grid environments. PDMS is particularly useful in scientific
environments where large amounts of data are often moved between computing and
data archiving sites. PDMS facilitates management and movement of data using
metadata, i.e., the data items are identified using their inherent
properties and characteristics rather than the file names in which they are
stored. The use of metadata abstracts away the physical location of a file,
allowing PDMS to transparently manage replicas of that file. PDMS is intended
to be used by groups needing to manage large datasets across several locations.
PDMS utilizes well-known Data Grid services. This allows it to interoperate
with various workflow managers in use today. Management of data using
metadata allows replication requests to PDMS to be specified in terms of
metadata. For example, a replication request to PDMS can be "move the data
generated by user A for project B within the last three months to site X".
The metadata in this example are (i) generated by user A, (ii)
belonging to project B, and (iii) generated in the last three months. These
metadata correspond to some logical files in which the data is
stored. The logical files can be physically present at multiple locations, in
which case PDMS locates all pieces of the dataset and initiates a transfer
of all the pieces. Thus, for a given replication request, PDMS needs to
perform two key tasks before initiating transfers: (i) use metadata to
establish the logical names of the files that match the metadata query and
(ii) select sources of replicas for those logical files not already at the
destination. This management of replicas on the basis of metadata fills a gap
in the previously available data management services. PDMS is
designed to restrict access to authorized and authenticated users who have
permission to use the system. Files are stored in logical groupings referred
to as collections. PDMS also restricts users' access to specific collections.
These access restrictions resemble file ownership, with ownership of
collections as well as read and write privileges. A more complete description
of the access control can be found in [3]. PDMS maintains the consistency of
the data for a collection. This currently includes not allowing the same
physical file to be registered twice (as two separate logical files). PDMS
also ensures that users conform to the schema they include in their
registration request. PDMS can be configured to require each collection to
conform to a specific schema. This is particularly useful in large groups that
need to be sure all metadata contains certain information and want to prevent
buggy registration processes from introducing inconsistent or incomplete
metadata. Consistency requirements for PDMS are intended to be
configurable, as consistency checking can be expensive.
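
The two tasks the abstract describes, resolving a metadata query to logical file names and then selecting replica sources for files not already at the destination, can be illustrated with a minimal Python sketch. Everything named here (`LogicalFile`, `match_logical_files`, `plan_transfers`, the in-memory catalog, and the "alphabetically first site" source policy) is a hypothetical stand-in chosen for illustration; it is not the PDMS interface or its actual selection strategy.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class LogicalFile:
    """Hypothetical catalog entry: a logical name, its metadata, and replica sites."""
    name: str
    owner: str
    project: str
    created: date
    replicas: tuple  # names of sites holding a physical copy


def match_logical_files(catalog, owner, project, since):
    """Task (i): use metadata to find the logical files matching the query."""
    return [
        f for f in catalog
        if f.owner == owner and f.project == project and f.created >= since
    ]


def plan_transfers(files, destination):
    """Task (ii): pick a source replica for each file not already at the destination.

    The policy (alphabetically first site) is an assumption made to keep the
    sketch deterministic; a real system would weigh load, bandwidth, and so on.
    """
    plan = {}
    for f in files:
        if destination in f.replicas or not f.replicas:
            continue  # already present, or no known source to copy from
        plan[f.name] = min(f.replicas)
    return plan


# "Move the data generated by user A for project B within the last three
# months to site X", evaluated against a small made-up catalog.
catalog = [
    LogicalFile("lfn-1", "A", "B", date(2007, 3, 1), ("siteY", "siteZ")),
    LogicalFile("lfn-2", "A", "B", date(2007, 4, 1), ("siteX",)),
    LogicalFile("lfn-3", "A", "C", date(2007, 4, 1), ("siteY",)),
    LogicalFile("lfn-4", "A", "B", date(2006, 1, 1), ("siteY",)),
]
since = date(2007, 4, 30) - timedelta(days=90)
matches = match_logical_files(catalog, owner="A", project="B", since=since)
print(plan_transfers(matches, destination="siteX"))
```

In this sketch only `lfn-1` needs a transfer: `lfn-2` already has a replica at the destination, `lfn-3` belongs to another project, and `lfn-4` falls outside the time window.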
Keywords
Computer Science