A short review on Distributed database Systems
Before going on to Distributed Database Systems, let me briefly explain you about Centralised Database Systems.
Here all system components i.e the database and the Database Management System (DBMS) reside at a single computer or site.
Users may be able to access the Centralised Database System remotely via terminals connected to the site;
however all the data access and processing takes place at the central site.
The following figure shows a Centralised DB System.
Centralized database System
In a Distributed Database System the database is stored/spread physically across computers or sites in different
locations that are connected together by some form of data communication network. They may be spread over WAN or LAN.
The computers may be of different types such as IBM Mainframes, VAXs, SUN work station, PCs etc managed by different operating
systems and each fragment of the data base may be managed by a different DBMS such as Oracle, Ingress, and Microsoft SOL server.
Distributed database management system (DDBMS)
In a DDS, database applications running at any of the system's sites should be able to operate on any of
the database fragments transparently i.e., as if the data come from a single database managed by one DBMS.
The software that manages a distributed database in such a way is called DDBMS.
The notion of distributed database is different from that of decentralised database.
The latter does not imply sharing of data by a communication network. The former implies a collection of sites connected
together with some kind of network and where each site has a database in its own right, but the sites work together as
if data was stored at only one site.
Distributed database System Decentralized database System
Distributed database design:
The methodology used for the logical design of a centralized database applies to the design of the distributed one as well.
However, for a distributed database three additional factors have to be considered.
Data Fragmentation: Before we decide how to distribute the data we must determine the logical units of distribution.
The database may be broken up into logical units called fragments which will be stored at different sites.
The simplest logical units are the tables themselves.
Horizontal fragmentation: A horizontal fragment of a table is a subset of rows in it. So horizontal fragmentation divides a table 'horizontally' by selecting the relevant rows and these fragments can be assigned to different sides in the distributed system (for ex. Euston Road branch gets the fragment where myTable.branch ='Euston Road').
Vertical fragmentation: a vertical fragment of a table keeps only certain attributes of it. It divides a table vertically by columns. It is necessary to include the primary key of the table in each vertical fragment so that the full table can be reconstructed if needed.
Mixed fragmentation: in a mixed fragmentation each fragment can be specified by a SELECT-PROJECT combination of operations. In this case the original table can be reconstructed be applying union and natural join operations in the appropriate order.
Data Replication: A copy of each fragment can be maintained at several sites. Data replication is the
design process of deciding which fragments will be replicated.
Data Allocation: Each fragment has to be allocated to one or more sites, where it'll be stored.
There are three strategies regarding the allocation of data:
Fragmented (or Partitioned): The database is partitioned into disjoint fragments, with each fragment assigned to one site (no replication). This is also called 'non-redundant allocation'.
Complete replication: A complete copy of the database is maintained at each site (no fragmentation). Here, storage costs and communication costs for updates are most expensive. To overcome some of these problems, snapshots are sometimes used. A snapshot is a copy of the data at a given time. Copies are updated periodically.
Selective replication: A combination of fragmentation and replication.
Types of DDBMS
A DDBMS can be classified as :
Homogeneous, if all sites use the same DBMS product
Heterogeneous, if sites may run different DBMS products, which need not be based on the same underlying data model and so may be composed of Relational, Network, Hierarchical and Object-oriented DBMSs.
The term DBMS transparency refers to the ability of DDBMS to hide the implementation details from the user.
Thus the fact that a distributed database is split into several fragments that can be stored on different computers
and perhaps replicated is hidden from the user. The objective of the transparency is to make the distributed system
appear like a centralized system.
We can identify four main types of transparency in a DDBMS:
Distributed systems mirror the structure of an enterprise.
Local autonomy (control). Security, integrity, storage representation and hardware are controlled locally. At the same the same time user can access remote data when necessary.
No reliance on a central site. Avoid bottlenecks and system vulnerability.
Reliability and availability. Continue to operate if one or more sites go down or communication links fail.
Speed up of query processing. Queries about data stored locally are answered faster. Moreover, queries can be split to execute in parallel at different sites or they can be redirected to less busy sites.
Modular growth. It is much easier to add another site than to expand a centralised system.
Software complexity and high costs. A DDBMS that hides the distributed nature from the user and provides
an acceptable level of performance, reliability and availability is inherently more complex than a centralized DBMS.
Processing overheads: Increased query processing costs, catalogue management and consistency maintenance.
Database design more complex