Jes2ica.IO

coding, trading, reading

  1. 1. Summary
  2. 2. Introduction
  3. 3. Phase 0
  4. 4. Phase 1: B-Tree

Link: A History and Evaluation of System R

Summary

  • Demonstrate the advantages of relational data model
  • Describe 3 principle phases of the System R

Introduction

  • Trends: data independence - immunity of applications to change in storage structure and access strategy
  • Relational data model: Proposed by E.F. Codd, store information in two ways:
    • by the contents of records stored in the database
    • by the ways in which these records are connected together
  • Two properties:
    • all information is represented by data values
    • the system supports a very high-level language

Phase 0

  • Relational access method: XRM.
  • XRM stores relations in the form of “tuples”, each of which has a unique 32-bit “tuple identifier”: TID.
  • The most challenging task: design of optimizer algorithms for efficient execution of SQL statements on top of XRM. => optimizer made extensive use of inversions
  • Demonstrated the usability of SQL.
  • Pros & Cons:
    • Pros: fast w/ finding distinct jobs
    • Cons:
      • Too many I/Os For each tuple, look up all its fields
      • Use “inversions” to find TIDs with a given value for a field
  • Lessons learned:
    • The optimizer should take into account not just the cost of fetching tuples, but the costs of creating and manipulating TID lists.
    • A better measure of cost would have been number of “I/Os”.
    • Cost measure should be a weighted sum of CPU time and I/O count.
    • Importance of “join”
    • Simpler interaction, minimize the “path length” for simple SQL statements.

Phase 1: B-Tree

  • Assess method: Research Storage System (RSS)
    • Store data in the individual records of the database.
    • Pros: all the data value of a record could be fetched by a single I/O.
    • B-tree indexes.
  • Optimizing SQL processor: Relational Data System (RDS)
  • Subsystems:
    • View & Authorization subsystem
      • View definition in the form of an SQL parse tree.
      • Authorization is based on privileges, controlled by GRANT and REVOKE.
    • Recovery subsystem
      • Media(disk) failure: image dump + change log.
      • System failure: change log + shadow pages.
        • Why do we need both shadow pages and change logs?
          • You may need log for transaction replay
          • You can only have log because shadow mechanism is quite expensive for large logs
      • Transaction failure: process the change log backwards.
    • Locking subsystem
      • “predicate locks” (abandoned)
        • determine whether predicates are satisfiable are difficult and time-consuming
        • two predicates may appear to conflict
      • hierarchy of locks (used by MySQL)
      • “intention” locks

References
https://marcoma.xyz/2019/04/10/system_r_ibm_reading/

This article was last updated on days ago, and the information described in the article may have changed.