Link: A History and Evaluation of System R
Summary
- Demonstrate the advantages of relational data model
- Describe 3 principle phases of the System R
Introduction
- Trends: data independence - immunity of applications to change in storage structure and access strategy
- Relational data model: Proposed by E.F. Codd, store information in two ways:
- by the contents of records stored in the database
- by the ways in which these records are connected together
- Two properties:
- all information is represented by data values
- the system supports a very high-level language
Phase 0
- Relational access method: XRM.
- XRM stores relations in the form of “tuples”, each of which has a unique 32-bit “tuple identifier”: TID.
- The most challenging task: design of optimizer algorithms for efficient execution of SQL statements on top of XRM. => optimizer made extensive use of inversions
- Demonstrated the usability of SQL.
- Pros & Cons:
- Pros: fast w/ finding distinct jobs
- Cons:
- Too many I/Os For each tuple, look up all its fields
- Use “inversions” to find TIDs with a given value for a field
- Lessons learned:
- The optimizer should take into account not just the cost of fetching tuples, but the costs of creating and manipulating TID lists.
- A better measure of cost would have been number of “I/Os”.
- Cost measure should be a weighted sum of CPU time and I/O count.
- Importance of “join”
- Simpler interaction, minimize the “path length” for simple SQL statements.
Phase 1: B-Tree
- Assess method: Research Storage System (RSS)
- Store data in the individual records of the database.
- Pros: all the data value of a record could be fetched by a single I/O.
- B-tree indexes.
- Optimizing SQL processor: Relational Data System (RDS)
- Subsystems:
- View & Authorization subsystem
- View definition in the form of an SQL parse tree.
- Authorization is based on privileges, controlled by GRANT and REVOKE.
- Recovery subsystem
- Media(disk) failure: image dump + change log.
- System failure: change log + shadow pages.
- Why do we need both shadow pages and change logs?
- You may need log for transaction replay
- You can only have log because shadow mechanism is quite expensive for large logs
- Why do we need both shadow pages and change logs?
- Transaction failure: process the change log backwards.
- Locking subsystem
- “predicate locks” (abandoned)
- determine whether predicates are satisfiable are difficult and time-consuming
- two predicates may appear to conflict
- hierarchy of locks (used by MySQL)
- “intention” locks
- “predicate locks” (abandoned)
- View & Authorization subsystem
References
https://marcoma.xyz/2019/04/10/system_r_ibm_reading/