Database Design – 30 STEM Links a Week

What’s a Database?

If you do not know, a database is a place to store information used by software applications. For example, you could have a web page with a list of companies and all their locations with contact information for each location. Or a banking application on your computer to sort and manage your checkbook. In both cases, it makes sense to store the data in one piece of software called a database. The database has a structure and rules about how to add, edit, delete, and read data stored in the database.

Databases also reside in many different places. Some databases exist on only your computer. Other databases have their data shared and divided across hundreds or thousands of databases located in many data centers all over the world.

Types of Databases

There are at least two types of databases each with different design restrictions. The traditional SQL (Structured Query Language) database, also called a relational database, tends to have more tables (rows of data) and more references, relationships, and consistency between data in those tables. NoSQL violates many of the data consistency rules of SQL databases while providing benefits mostly unavailable with SQL databases.

For example, a NoSQL database works best for applications with massive amounts of data where most activity involves reading data from the database with some writing of data to the database. Reading is less intensive than writing because writing data to a database requires tracking when a database table is open. NoSQL databases tend to be on multiple machines and, in some cases, machines in multiple data centers. Keeping data in sync is comparatively easier and less complex with NoSQL databases. Even in cases where large data sets are not involved, some developers prefer the easier interactions between their code and a NoSQL database.

Another key difference between these two types of databases is design flexibility. SQL databases tend to require more work and care because the underlying structure of one or more tables need to be adjusted when changes happen. NoSQL databases, in contrast, have table structures that make it comparatively easier to change the number of fields included in a database table.

As with anything technical, there are all sorts of exceptions you will encounter. For example, relational databases use sharding and other techniques to manage the synchronization of data across machines and data centers. This article is only an overview to provide context as you learn more about database design.

Data Models

The first step in any database design is the creation of a data model. The model distills all the functionality requirements for an application into data collections, for example, products, customers, and suppliers for an ecommerce site, as well as properties and relationships between these collections of data.

There are several risks data models help limit or avoid:

Business processes some times can be duplicated in the database structure, creating problems if a process changes. A good data model provides flexibility independent from any process.
Needless duplicated table in multiple locations within the same database. This is a big issue in relational databases.
Data models for related applications differ for no reason. Ideally, a data model takes into account other applications used by the business or individual.
Data might be difficult to extract or share with other software applications. If data sharing is important, a data model should ensure data can be extracted easily.

Database and data models typically are represented as graphs. Early stages of development, however, use business requirements and functional specifications to clarify the system a data model must represent and support. In some cases, for example, health care or finance, there may be examples of data models used widely which are adopted or adapted.

The data model also is one of several factors in the decision about what database management system (DBMS) to use, relational or NoSQL.

NoSQL Database Design

Key-value pairs are the main feature of these databases. Keys are names or unique ID numbers and values range from simple data to documents to columns of data to structured lists (arrays) of key-value data. Each row in a NoSQL table includes the key and its value. The design of NoSQL databases depends on the type of database, called stores:

Document Stores pair each key identifier with a document which can be a document, key-value pairs, or key-value arrays.
Graph Stores are designed to hold data best represented by graphs, interconnected data with an unknown number of relations between the data, for example, social networks or road maps.
Key-Value Stores are the simplest type with with every bit of data stored with a name (as key) and its data (value).
Wide Column Stores are optimized for queries across large data sets.

There are other ways to describe the range of NoSQL databases available but these are the simplest and most comprehensive categories. And within each type of NoSQL database, functionality differs which can impact database design. For example, MongoDB was evolved from the MySQL project, changing the data model from relational to NoSQL, yet retains most of the indexing, dynamic queries, and other useful features of relational databases.

Perhaps the key design difference between NoSQL and relational databases is the structure of data in each database. Relational databases require data be organized ahead of time. NoSQL databases can have their structure modified on the fly with little impact because they use key-value pairs; updating a data structure in NoSQL can involve adding additional data to the value of one or more keys while leaving other key-value pairs in the database untouched.

Design strategies for NoSQL databases depend on the type of database and the virtues (or negatives) of different data model techniques. Where relational databases have a user-centered approach, asking “What answers can I get from the database?”, NoSQL databases have an application-centered approach, asking “What questions do I have?”

This is a critical difference both in data structures as well as approaches to designing a database.

Configuring a database to provide specific answers entails lots of design and structure up front which limits future flexibility and makes future changes likely to be complicated. Configuring a database to handle many possible questions, in contrast, results in a more flexible database design. Typically data is duplicated in many different places in a database to help answer questions with less effort. NoSQL database design uses a set of rules called BASE (basically available, soft-state, eventually consistent) to guide their design.

NoSQL database data model techniques include:

Denormalization puts all data needed to answer a query in one place, typically a single database table, instead of splitting the data into multiple tables.
Aggregates use light or no validation of data types, for example, strings or integers.
Joins are done at the application level, not as part of a database query. This requires more planning to match one type or set of data with another, for example, all examples of a product type (jeans) sorted by manufacturer in an online store.
Indexes and key tables to identify and sort data quickly for retrieval.
Tree structures can be modeled as a single data entity, for example, a comment with all its responses.

The NoSQL Data Modeling Techniques article linked at the bottom of this article includes a more comprehensive list, additional explanations, and links to learn more about specific data model techniques.

Relational (SQL) Database Design

SQL database design relies mostly on techniques called “normalization.” The goal of normalization is to reduce or eliminate duplicate data in a database to reduce errors in stored data. Each table in a relational database ideally holds data of one type or thing, for example, addresses. The trade-off is less flexibility when application changes impact more than one database table. Relational databases use a set of rules called ACID (Atomicity, Consistency, Integrity, Durability) to guide database design.

The key design steps for a relational database include:

Define the purpose of the database.
Research and collect all information about data to be included and the purpose of the database over time.
Divide the data to be included into subjects or types, for example, user account information. Each of these will (or should) become individual database tables.
For each database table, identify the data points to include. Each data point becomes a column in the database table.
For each database table, identify the optimal primary key to uniquely identify each row of data.
Compare and evaluate how data in your tables relate to each other. Add fields, or possibly tables, to clarify relationships between data within each table. For example, a database with contact information for companies might need to include multiple addresses, phone numbers, and other data for each company.
Test your database design with paper then code queries for the most common tasks. Refine your table design as needed.
Normalize the database design to ensure each table represents one thing, or concept, with references and relationships to other tables if/as needed.

Learn More

Introduction to Relational (SQL) Databases

http://www.techopedia.com/definition/1234/relational-database-rdb
http://en.wikipedia.org/wiki/Relational_database

Introduction to NoSQL Databases

http://www.mongodb.com/learn/nosql
http://www.techopedia.com/definition/27689/nosql-database
http://www.techopedia.com/definition/25218/non-relational-database
https://en.wikipedia.org/wiki/NoSQL

10 Useful Articles on Database Design

While an old article, many of the points still pertain. Focus is on relational databases.
http://woork.blogspot.com/2008/09/10-useful-articles-about-database.html

Database Design Basics

Microsoft’s take on designing Access relational databases.
http://office.microsoft.com/en-us/access-help/database-design-basics-HA001224247.aspx

Database Design (Wikipedia)

Written mostly from a relational database viewpoint.
https://en.wikipedia.org/wiki/Database_design

Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence

An extensive Microsoft tutorial to show how to combine relational and NoSQL databases in a single application.
http://msdn.microsoft.com/en-us/library/dn271399.aspx

Why You Should Never Use MongoDB

While her title is overwrought, and inaccurate, this is a great description how people model data in the real world and how they work with technology. It also shows how non-relational databases are still young technology with limitations compared to older SQL databases which have a few decades of evolution and iteration behind them. Comments on this article also are interesting.
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

What’s a Database?

Types of Databases

Data Models

NoSQL Database Design

Relational (SQL) Database Design

Learn More

Introduction to Relational (SQL) Databases

Introduction to NoSQL Databases

10 Useful Articles on Database Design

Database Design Basics

Database Design (Wikipedia)

Data Models

NoSQL Data Modeling Techniques

Designing NoSQL DBs

Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence

Why You Should Never Use MongoDB