ANSI/SPARC Architecture	An architecture describing three different levels in a database system: 1. External level: different users view the data in different ways. Data may be displayed in different format. There may be derived values that are not stored persistently in the database. 2. Conceptual level: the logical structure of the entire database described by the conceptual schema, including all data entities (and their attributes), relations and integrity constraints. All external views must be derivable from conceptual schema. Independent of any storage details. 3. Internal level: describes the physical storage characteristics of the data, including storage space allocations, indexes, compression/encryption, etc. Independent of any particular DBMS software.
Advantage of the 3 schema representation	1. Each user/application can have his own customized view of the data, and should be able to change this view without affecting other users. 2. Changes to the conceptual level should only affect those users/applications accessing the changed data - this is called logical data independence. 3. Changes to the internal level should not affect user views - this is called physical data independence. 4. Changes to physical storge device should not affect other levels.
Data Model	A description of the data, relationships between data, and the constraints on data. There are external, conceptual and internal data models. Chronologically: hierarchical, network, relational, ER and OO models.
Candidate key	One or a set of attributes with the following properties: 1. Unique: no two turples have the same candidate key. 2. Minimal: no attribute of the candicate key can be removed without violating the uniqueness property.
Foreign key	One or a set of attributes within one relation defined over the same domain as the primary key of another relation. Foreign keys implement relationships between tables.
Relational database	A collection of nomalized relations.
Types of tables	1. Base table: a table stroed as a file on disk. Physically persistent. 2. Derived table: a temporary table produced as a result of a query. Exists only for the duration of the operation which engenders it. 3. View: a virtual table. A predefined query. Excuted to get the derived table only when invoked.
Relational Languages	1. Data Definiton Language (DDL) - specify the structure of data. Such as create and delete. 2. Data Manipulation Language (DML)
Relational DMLs	1. Relational calculus - non-procedual. 2. Relational algebra - procedual. Operators apply to at most two relations at a time. 3. Transform oriented languages 4. Graphical languages 5. Fourth and fifth generation languages.
Relational algebra	1. Selection 2. Projection 3. Join 4. Union 5. Intersection 6. Difference 7. Division
Join	1. Theta-join 2. Equi-join: "select * from A, B where ..." May contain duplicated columns. 3. Natural join: "select column1, column2, ... from ..." Equi-join without duplicated columns. 4. Outer join: a join which includes non-matching rows in one table.
Division	The result of table (x, y) divided by (x) is a table containing all y values in table (x, y) whose coresponding x values includes all values in table (x).
Database design cycle	1. Requirements definition: identify and analyse user views. External level of ANSI/SPARC architecture. 2. Conceptual design: develop enterprise data model independent of any specific database model. Various design methodologies can be used, such as ERD. Conceptual level of ANSI/SPARC architecture. 3. Logical design: develop a data model targeting a specific database model, but still independent of particular DBMS package. Conceptual level of ANSI/SPARC architecture. 4. Physical design: physical implementation of the logical design, including file organizations, access methods, storage structures, etc. Depending on the used DBMS. Internal level of ANSI/SPARC architecture.
Local & Global data model	Each user view defines a local data model. Merging all user views forms an enterprise data model or global data model.
Integrity constraint	A condition which data in the database must satisfy. Including: 1. Entity integrity: no part of the primary key can be null. 2. Referential integrity: a foreign key can only be null, or equal to the value of an existing primary key in another relation.
Referential integrity update rules	1. Update restricted: a primary key is not allowed to be updated if it is referred to by a foreign key. 2. Update cascaded: if a primary key is updated, all foreign keys which refer to it will be accordingly updated. 3. Update nullified: if a primary key is updated, all foreign keys which refer to it will be nullified.
Referential integrity delete rules	1. Delete restricted: a primary key is not allowed to be deleted if it is referred to by a foreign key. 2. Delete cascaded: if a primary key is deleted, all foreign keys which refer to it will be accordingly deleted. 3. Delete nullified: if a primary key is deleted, all foreign keys which refer to it will be nullified.
DBDL (database define lauguage)	customer(customer_id, surname, initial, address, telephone)
Normalization	To get rid of three anomalies: 1. Modification anomaly: when a single data item is changed, there is a need to make multiple changes. 2. Creation anomaly: where a new row is created, extra data should be filled in. 3. Deletion anomaly: when a row is deleted, other data is lost.
Normalization procedures	1. Remove repeating groups. 2. Remove partial key dependencies. 3. Remove transitive dependencies.
File organisation	Physical organization of a file - the arrangement of records in a file. There are three types: 1. Serial files. 2. Random files. 3. Indexed files.
File access methods	The techniques used to make records available to programs and users. Three types: 1. Serial access. 2. Random or direct access. 3. Dynamic access.
Random file collision resolutions	1. Linear probing. 2. Double hashing. 3. Separate overflow.
Two types of indexed files	1. Indexed non-sequential files. Must be fully indexed. 2. Indexed sequential files. Dynamic access method can be used.
Secondary index	Indexes on non-key attributes. Non-key attributes which appears in WHERE clauses may be indexed to speed up the query.
Physical index structures	1. ISAM. 2. Binary tree. 3. B tree. 4. B+ tree. 5. VSAM (proprietary implementation of B+ tree).
ISAM	1. By IBM. 2. Indexed sequential access mchanism. 3. Hardware oriented. Master index => cylinder index => track index => data tracks. 4. Initial loading run with percent-fill factor. Need to be periodically re-organised. Therefore not efficient for frequently updated files. 5. Not supported by RDBMS.
Binary tree	Binary tree is unbalanced. Data pointers are dispersed throughout every level of the tree. This results in an wildly unequal access time and complex traversal programming.
B-tree	Balanced. Unequal access time improved but not solved. Traversal still complex.
B+ tree	Balanced. All data pointers on leaf nodes. Unique access time. Easy traversal.
Oracle data dictionary	System maintained tables containing information about all tables, views, indexes and other objects that make up the database. They are accessible through views: 1. DICT 2. USER_TABLES 3. USER_VIEWS 4. USER_INDEXES 5. USER_TABLESPACES 6. USER_CATLOG 7. ALL_CATLOG
Transaction	A logical unit of work. Contains a number of processes, involving database access. It has two properties: 1. Atomic: either completely done, or not done at all. 2. Durable: once committed, the result must not be lost.
Process of a transaction	A transaction has an identifier. It commences with a start-transaction command, reaches a commit point, and terminates with either a commit or a rollback.
Logging and saving	Writing data to a serial log file is faster than formally saving it on disk. So all transaction details are immediately recorded in the log file, but only saved on disk at a certain checkpoint which happens periodically. Logging contains: 1. Start-transaction and end-transaction records. 2. Before and after images. 3. Checkpoints.
Checkpoint procedure	Checkpoints are taken periodically. Procedures: 1. Current transaction is suspended. Accepting new transactions is temporarily halted. 2. All commited transactions are written to disk. 3. A checkpoint record is written in the log. 4. Resume normal execution.
Two kinds of crashes	1. Soft crash: volatile storage is lost. No demage to the hard disk. Need a restart. 2. Hard crash: hard disk is made permanently unreadable. Need a recovery.
Soft crash restart	1. The log is rewound to the last checkpoint. 2. Read the log forward from the checkpoint, generating a REDO list containing all committed-but-unsaved transactions, and a UNDO list containing all uncommitted transactions. 3. Undo transactions listed in the UNDO list using before-images, and redo transactions listed in the REDO list using after-images.
Hard crash recovery	1. Re-build the database from the most recent backup. 2. Roll forward the database by applying the accumated after-images taken after that backup.
Centralised concurrency control	Without it, two problems can occur: 1. Lost update problem: if two users acquires the same data to update before any of them writes back, the first one's result written back will be overwritten by the second one. 2. Allow the one transaction to view the partial result of another. Solution: locking the data.
Lock types	1. Shared lock: multiple transactions can lock the same data simultaneously to read it. 2. Exclusive lock: if a process wants to update a data, it must acquire an exclusive lock of the data. It will not get the lock until all other locks on that data is released.
Granularity (lock level)	1. File level 2. Record level 3. Field level
Dead lock	A transaction can only release a lock after it is finished. Suppose transaction 1 nees to update data A and B. It locks A but can not lock B because B has been locked by transaction 2. Transaction 2 has locked B but it also needs to lock data A. So the two transactions are waiting for each other to finish to release a lock, so both of them can not finish. To prevent deadlock, a transaction must acquire all locks before it starts to update any data. If it can not acquire all locks, it release all acquired locks and try later. In Oracle, an updated data will be locked until the update is commited.
Pre-relational database models	1. Network model (including both trees and network structures). CODASYL model. 2. Pointer linked data structures 3. Hierarchical model
4 ways to store physical trees	1. A single hierarchical file. No foreign keys. 2. Two files without foreign key plus an index connecting the two data types. 3. Two files without foreign key linked by pointers. 4. Two fully normalized files. Relational model.
Pointer linked data structures	1. Linked list: parent record points to the first child, first child points to the second, ... 2. Ring: linked list, last child points back to the parent. 3. Double-linked ring. 4. Multiple child pointers. 5. Additional owner points: all above methods plus a pointer from child to parent. 6. Multi-level trees.