Table of Contents
Designing a database and a software application to access it is a multi-phase process. Usually, the starting phase is to architect the logical design of the database, which involves defining a database, determining tables and their fields, establishing relationships among these, identifying Primary and Foreign keys and their mappings. The next phase involves implementation of the above logical design within a specific database program using proper tools to implement data integrity. In this phase a database software (for example MySQL, Postgres, Oracle, etc.) is selected, and database, tables are created physically. In the last phase the end-user application is developed. This application allows users to interact with the data stored in the database by using interfaces (such as web pages, GUI, etc.).
This essay will concentrate mainly on the logical design of databases and provide an introduction to databases and Structured Query Language (SQL).
A database is a collection of related data or information, which usually consists of three elements:
Tables
Columns
Rows
Tables, also referred to as entities, are the data structures holding the content of the database. For example, you might choose to store information about your library staff and the departments they work for. In this case, staff would become one table, and departments become another.
Columns, also referred to as fields, represent the attributes of a table. In the above example, a staff member can be described as an entity with its own attributes such as first name, last name, address, city, state, ZIP code, phone number, and so on, and a department can have its own attributes.
Rows, also referred to as records, represent the actual data. Whereas fields describe what data is stored, the rows of a table are where the actual data is stored. Each record is a collection of information about one specific thing -- in this example, it's your staff member or his/her department.
Depending on the database architecture, there are many types of databases, such as flat file databases, relational databases, hierarchical databases, network databases and more. Flat file databases, and relational databases will be discussed in this essay.
A flat file database is described as a simple database model, where all the information is stored in a plain text file, one record per line. Each record is divided into fields using delimiters (such as comma, tab, etc.) or at fixed column positions. The data is "flat", as in a sheet of paper, as compared to a more complex model such as a relational database. Strictly, a flat file database should consist of nothing but data and delimiters, with no relationships or links between records and fields except the table structure.
A single table can run into problems quickly when trying to represent anything but simple data types. While simple, this system rapidly becomes inefficient as the number of records grows, and makes it difficult for users to organize data. This system is not as secure as relational databases, and the data can be easily corrupted if more than one process is writing into database at the same time. There is no inherent locking mechanism that detects when a file is being used or modified.
Flat-file databases are ideal for small amounts of data that need to be human readable or edited by hand.
In an Relational Database Model (RDM), the data in different tables is mapped with relations. The goal of relational database design is to store information without unnecessary redundancy and retrieve information easily and accurately.
Unlike flat file databases, in RDM data integrity is built into the model at various levels, such as the field level, to ensure the accuracy of the data; at the table level to ensure that records are not duplicated; at the relationship level to ensure that the relationship between a pair of tables is valid; and at the business level to ensure that the data is accurate in terms of the business itself. Data is consistent and accurate due to the various levels of integrity you can impose within the database.
Data modeling is an act of analyzing your organization's or client's goals about the database application you will be developing. In this process, you define tables, relations and how these are related to each other. Data modeling asks the question "What?" instead of the more common data processing question, "How?"
The simplest data model consists of entities and relationships. As mentioned before, entities are real world objects storing data. A relationship is a significant association between two entities.
Entities can be related to each other in a variety of ways. Functional dependencies are formed when a column of one table relates to attributes of other tables. The simplest relationship is the one-to-one relationship, in which one record in a table is related to another record in a separate table. A one-to-many relationship is one in which one record in a table is related to multiple records in another table. A many-to-one relationship defines the reverse situation; more than one record in a single table relates to only one record in another table. Finally, in a many-to-many relationship, more than one record in a table relates to more than one record in another table.
A key is an entity in a table that distinguishes one record of data from another. The key can be a single column, or it can consist of a group of columns that uniquely identifies a record. Tables can contain primary keys which differentiate records from one another. Primary keys can be an individual attribute, or a combination of attributes. Foreign keys relate tables in the database to one another. A foreign key in one table is a primary key in another. The foreign keys generally define parent-to-child relationships between tables. Database design is the process of transforming a logical data model into a physical database design and then implementing the physical model as an actual database. A logical data model is required before you can even begin to design a physical database
An index is structured to make it easier to find data in the database, with fewer I/O operations. Therefore, queries can perform faster when using an index to look up data based on specific key values. Try to build indexes on large tables to support the most frequently run queries, and also create indexes on the most-referenced columns in frequently run queries in your application.
Another aspect of database modeling is the creation of database views to support specific application data requirements. Views are not required to access a physical database, but they can be helpful to support specific application and user requirements. No physical structure is required of a view; it is a representation of data that is stored in other tables
Normalization is a design approach that minimizes data redundancy and optimizes tables by systematically and properly placing data elements into the appropriate groupings. The normalized data will ensure that each entity is well formed and that each attribute is assigned to the proper entity. The normal forms are defined as follows:
The objective of first normal form (1NF) is to eliminate repeating groups from an entity. When data conforms to 1NF, each attribute of the entity is a single discrete fact.
Second normal form (2NF) ensures that all the attributes of each entity are dependent on the primary key. To transform 1NF data into 2NF, create separate entities for sets of attributes that apply to multiple records and assign a foreign key to the new entity to relate it to its previous entity. Simply stated, entity occurrences should not depend on anything other than the entity's primary key.
Third normal form (3NF) ensures that no relationships between attributes exist within an entity. Every attribute in the entity should depend only on the primary key.
Boyce Codd normal form (BCNF) is a further refinement of 3NF. Fourth normal form (4NF) states that no entity can have more than a single one-to-many relationship if the one-to-many attributes are independent of each other. Fifth normal form (5NF) specifies that every join dependency for the entity must be a consequence of its candidate keys.
There are three types of data integrity that are implemented during the database design process.
This ensures that the field that identifies each record within the table is unique and is never missing its value. This will make certain that there are no duplicate records in a table; every record in a table is identified by a Primary key value; every Primary key value is unique; and Primary key values are not null.
This ensures that the structure of every column is well defined, that the values in each field are valid, consistent, and accurate throughout the database. Field Specifications help to warrant that the identity and purpose of each field is clear, and that all of the tables in which it appears are properly identified; field definitions are consistent throughout the database; and the values of the field are consistent and valid.
SQL, a Data Manipulation and Data Definition Language, allows users to access data in relational database management systems, such as Oracle, Sybase, Informix, Microsoft SQL Server, MySQL, and others, by allowing users to describe the data the user wishes to see.
As a Data Manipulation Language (DML), it can be used to query and update data in the database:
SELECT - extracts data from a database table
UPDATE - updates data in a database table
DELETE - deletes data from a database table
INSERT INTO - inserts new data into a database table
As a Data Definition Language (DDL), SQL permits database tables to be created or deleted. The most important DDL statements in SQL are:
CREATE TABLE - creates a new database table
ALTER TABLE - alters (changes) a database table
DROP TABLE - deletes a database table
CREATE INDEX - creates an index (search key)
DROP INDEX - deletes an index
Here I only touched the surface of Relational Database Design model and SQL. I hope the information provided here can be useful as an introduction to your database application development project.
Hernandez, Michael J. Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design, Second Edition
Database Types http://websiteowner.info/articles/cgi/databasetypes.asp
What is a Relational Database? http://www.wisegeek.com/what-is-a-relational-database.htm