Intro to Database Systems

💾Intro to Database Systems Unit 4 – Relational Database Design

Relational database design is a crucial skill for organizing and managing data efficiently. This unit covers key concepts like the relational model, primary and foreign keys, normalization, and SQL fundamentals. Understanding these principles helps create robust database structures that ensure data integrity and support complex queries. Entity-Relationship diagrams and normalization techniques are essential tools for designing effective databases. The unit also explores common design patterns and practical applications in various industries, highlighting the importance of proper database design for scalability, performance, and data security.

Key Concepts

  • Relational model represents data as a collection of tables (relations) with rows (tuples) and columns (attributes)
  • Primary key uniquely identifies each row in a table and can consist of one or more columns
    • Composite key is a primary key that consists of multiple columns
  • Foreign key is a column or set of columns in one table that refers to the primary key of another table establishing a relationship between the two tables
  • Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity
    • Involves dividing larger tables into smaller tables and defining relationships between them based on the dependencies
  • SQL (Structured Query Language) is used to manage and manipulate relational databases
    • Includes commands for creating, modifying, and querying database structures and data
  • Entity-Relationship (ER) diagrams visually represent the relationships between entities in a database
    • Entities are objects or concepts that are represented in the database (person, place, thing, event)
    • Attributes are the properties or characteristics of an entity

Relational Model Basics

  • Relational model is based on mathematical concepts from set theory and predicate logic
  • Relations (tables) consist of tuples (rows) and attributes (columns)
    • Each tuple represents a single instance of an entity or relationship
    • Each attribute represents a specific piece of data about the entity or relationship
  • Relational algebra defines a set of operations that can be performed on relations
    • Operations include select, project, union, intersection, difference, and join
  • Relational calculus is a formal language for defining and manipulating relations using first-order logic
  • Relational databases enforce data integrity through constraints
    • Entity integrity ensures that each row in a table has a unique identifier (primary key)
    • Referential integrity ensures that relationships between tables are consistent and valid (foreign keys)
  • Relational databases support ACID properties (Atomicity, Consistency, Isolation, Durability) to ensure data reliability and consistency

Entity-Relationship Diagrams

  • ER diagrams provide a high-level conceptual view of the database structure
  • Entities are represented as rectangles and attributes as ovals connected to the entities
  • Relationships between entities are represented as diamonds connected to the participating entities
    • Relationships can be one-to-one (1:1), one-to-many (1:M), or many-to-many (M:N)
  • Cardinality specifies the number of instances of an entity that can be associated with instances of another entity
    • Minimum cardinality indicates the minimum number of instances (0 or 1)
    • Maximum cardinality indicates the maximum number of instances (1 or many)
  • Participation constraint specifies whether the existence of an entity depends on its relationship with another entity
    • Total participation means every instance of an entity must participate in the relationship
    • Partial participation means instances of an entity may or may not participate in the relationship
  • ER diagrams can be translated into relational schemas by mapping entities to tables and relationships to foreign keys

Normalization Techniques

  • Normalization is the process of organizing data to minimize redundancy and dependency
  • Normal forms define different levels of normalization based on the presence of specific types of dependencies
    • First Normal Form (1NF) eliminates repeating groups and ensures atomic values in each cell
    • Second Normal Form (2NF) removes partial dependencies on composite keys
    • Third Normal Form (3NF) eliminates transitive dependencies on non-prime attributes
    • Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF that allows no functional dependencies on non-prime attributes
  • Functional dependency is a relationship between attributes where the value of one attribute determines the value of another
  • Multivalued dependency occurs when the presence of one attribute value implies the presence of other attribute values, regardless of any other attributes
  • Denormalization is the intentional introduction of redundancy to improve query performance, but it must be balanced against the cost of maintaining data consistency

SQL Fundamentals

  • SQL is a declarative language used to manage and manipulate relational databases
  • Data Definition Language (DDL) statements are used to define and modify database structures
    • CREATE statement is used to create tables, views, indexes, and other database objects
    • ALTER statement is used to modify the structure of existing database objects
    • DROP statement is used to delete database objects
  • Data Manipulation Language (DML) statements are used to insert, update, and delete data in tables
    • INSERT statement is used to add new rows to a table
    • UPDATE statement is used to modify existing rows in a table
    • DELETE statement is used to remove rows from a table
  • Data Query Language (DQL) statements are used to retrieve data from tables
    • SELECT statement is used to query data from one or more tables based on specified conditions
    • JOIN clauses are used to combine rows from multiple tables based on related columns
      • INNER JOIN returns only the matching rows between tables
      • LEFT JOIN and RIGHT JOIN return all rows from one table and the matching rows from the other table
      • FULL OUTER JOIN returns all rows from both tables, with NULL values for non-matching rows
  • Aggregate functions (COUNT, SUM, AVG, MIN, MAX) are used to perform calculations on groups of rows
  • Subqueries are nested queries that can be used in various parts of an SQL statement to retrieve data based on complex conditions

Database Design Process

  • Conceptual design involves identifying entities, attributes, and relationships based on the business requirements
    • Includes creating ER diagrams to represent the high-level structure of the database
  • Logical design involves translating the conceptual model into a relational schema
    • Includes defining tables, columns, primary keys, foreign keys, and constraints
    • Normalization is applied during this stage to ensure data integrity and minimize redundancy
  • Physical design involves implementing the logical design in a specific database management system (DBMS)
    • Includes defining storage structures, indexes, partitions, and other performance-related aspects
  • Database testing and validation ensure that the designed database meets the business requirements and performs efficiently
    • Includes testing data integrity, query performance, and user acceptance
  • Database maintenance and evolution involve making changes to the database structure and data over time
    • Includes handling schema changes, data migrations, and performance optimizations

Common Design Patterns

  • Supertype-Subtype pattern (generalization/specialization) models entities that share common attributes but have distinct specialized attributes
    • Supertypes contain common attributes and subtypes inherit these attributes and add specific ones (Vehicle as supertype, Car and Truck as subtypes)
  • Recursive relationship pattern models entities that have a hierarchical or self-referencing relationship
    • Useful for representing organizational structures, bill of materials, or comment threads (Employee table with a self-reference to the manager)
  • Many-to-Many relationship with attributes pattern models relationships that have associated attributes
    • Involves creating an intermediate table (associative entity) to store the relationship attributes (Student, Course, and Enrollment tables)
  • Arc-Edge pattern models graph-like structures where entities have complex relationships
    • Nodes represent entities and edges represent relationships between nodes (Social network with Person nodes and Friend edges)
  • Exclusive Arc pattern models relationships where an entity can have only one of several possible relationships
    • Useful for representing mutually exclusive roles or states (Employee can be either a Manager or a Supervisor, but not both)

Practical Applications

  • Relational databases are widely used in various domains, such as e-commerce, healthcare, finance, and social media
  • E-commerce applications use relational databases to store product catalogs, customer information, orders, and transactions
    • Design patterns like Many-to-Many with attributes are used to model product categories, shopping carts, and order details
  • Healthcare applications use relational databases to store patient records, medical history, and treatment plans
    • Supertype-Subtype pattern is used to model different types of medical conditions and treatments
  • Financial applications use relational databases to store account information, transactions, and financial instruments
    • Recursive relationship pattern is used to model account hierarchies and portfolio structures
  • Social media applications use relational databases to store user profiles, connections, posts, and interactions
    • Arc-Edge pattern is used to model user relationships and content sharing
  • Proper database design is crucial for ensuring data integrity, scalability, and performance in these applications
    • Normalization and indexing techniques are applied to optimize query performance and minimize data redundancy
    • Security measures, such as access control and data encryption, are implemented to protect sensitive information


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.