Course Code: B63005H
Course Name: Database System (B)
Credits: 2.0
Level: Undergraduate
Pre-requisite: Data Structure, Programming, Principles of Computer
Lecture Time: 10 weeks, 2 sessions/week, 2 hours/session
Instructors: Dr. Shimin Chen
Course Description
Database system is the core technology in the age of big data. Student should master the following basic knowledge through the study of this course. (1) Model and theory of relational database; (2) Use of database and design of database application; (3) The classic data structures and algorithms in database system; (4) Characteristics and key technologies of different types of database system. At the same time, we hope that students can use the real system to solve practical problems and cultivate their practical ability through programming experiments, thus having a more profound understanding of class content. This course will lay a solid foundation, when students are engaged in scientific research, application development, system optimization, data analysis in the direction of database system and big data processing.
Topics and Schedule
1.1. Origin and development of database
1.2. Relationship between database and operating system
1.3. Relationship between database and file system
1.4. Relationship between database and data storage
1.5. Relationship between database and data mining
2.1. E-R model
2.2. Integrity constraints
2.3. Functional dependency
2.4. Normalized paradigm
2.5. Data view
2.6. Relational algebra and set-representation
2.7. Selection, projection and join
2.8. Relational algebra expression ability
2.9. Atomicity, consistency, isolation and durability (ACID)
3.1. Origin and development of structured Query Language
3.2. Combination relationship model and algebra
3.3. SQL, DDL, DML, DCL, TCL and Nested SQL
3.4. UDF and stored procedure
3.5. Expression ability of SQL
3.6. Typical table definition, query and update based on TPC benchmarking
3.7. Database Systems Lab
4.1. Install open source database system (PostgreSQL or MySQL)
4.2. Use TPC-H dbgen to generate TPC-H database and data loading
4.3. Write SQL according to requirements
4.4. Run TPC-H query to test performance
4.5. Database Systems Lab
5.1. Commonly used database system (Oracle, Microsoft SQL Server, IBM DB2, MySQL and PostgreSQL)
5.2. Procedure of database application design
5.3. JDBC/ODBC programming
5.4. Web server, application server and db server
Lab 2:
Design an application which based on back-end services of database, it is modeled on the Amazon.com, 12306, CMT/Easychair conference system, Linkedin, or lending system, etc. A project team should have three people to carry on it.
6.1. Front-end and back-end
6.2. Syntax parsing and system directory
6.3. Generation of operation plan
6.4. Query optimization
6.5. Data storage, index, buffer pool and query execution
7.1. Storage mode (file and partition) and space management
7.2. Internal structure of data page
7.3. Line and column storage
7.4. Insert, delete and modification operations
7.5. Operation of buffer pool
7.6. Replacement algorithm
7.7. Index structure, B+/- Tree and Hash index
Lab 3:
Design to implement a database system for data analysis which support value type, type string data and support the function: select from where group by. A project team should have three people to carry on it.
8.1. Storage mode (file and partition) and space management
8.2. Algorithm of selection, group by, aggregation and distrinct
8.3. Cost analysis
8.4. Design of operator pull/push
8.5. Blocking/non-blocking operator
8.6. Nested loop join and nested loop index join
8.7. Sort merge join
8.8. Hash join
9.1. Relationship between modules in a complete database system
9.2. Database Systems Lab
10.1. Statistics and cost analysis
10.2. Equivalent transformation
10.3. Optimization of operation plan and query
11.1. Transaction management
11.2. 2-phase locking
11.3. Multi-Version Concurrency Control (MVCC)
11.4. Write-ahead logging
11.5. Crash recovery
12.1. Transaction management
12.2. Online Analytical Processing (OLAP)
12.3. Data cube
12.4. Materialized view
13.1. Optimization of CPU cache
13.2. Type of main memory database
13.3. Structure of MonetDB
13.4. Structure of SAP HANA
14.1. Characteristics of big data
14.2. MapReduce
14.3. KV Store
Grading
Three experimental assignments and labs will be given. These will be graded by group, and their scores will count for 60% of the total. At the end of the class it will be followed by a final examination which will count for 40%.
Textbook
Raghu Ramakrishnan, Johannes Gehrke. McGraw-Hill. Database Management Systems. 3rd edition (August 14, 2002)
Reference