IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …
Transcript of IN3020/IN4020 –Database Systems Spring 2020, Week 15.2 …
IN3020/IN4020 – Database Systems Spring 2020, Week 15.2
SUMMARY
Dr. M. Naci Akkøk, Chief Architect, Oracle NordicsBased upon slides by E. Thorstensen from Spring 2019
The course
o This is a Database Systems course
o IN3020 is a bachelor level course, and this is the last lecture covering IN3020 curriculum (common with IN4020 so far), though the rest is also recommended
o IN4020 is a higher-level course. There will be approximately 5 x 45 minutes (5 lecture hours) of material. 2 hours by external lecturers, rest by me.
Today
o We will use 1 hour for summary and exam form etc. Relevant for both IN3020 and IN4020
o We will use the second hour to go through DBMS architecture and dependencies/interactions between layers/components.
o The second hour is a requirement for IN5020 – Distributed Systems. Relevant and required for IN4020 and recommended (but not required) for IN3020 students planning to take IN5020 any time soon.
o Do ask questions, but in chat.
Repetition & Highlights
Goals:At the end of the course, you should be able to…o Understand what Database Management Systems (DBMS) do,
how they work and how they are implementedo Have a good grasp of the theoretical & practical aspects of
database administration (which you will hear more about next week Monday in “Current and Future Role of the DBA”)
o Have a good grasp of use of SQL and how queries are optimizedo Understand the principles of transaction managemento Know about the TX management types (isolation levels) that
modern DBMS offer, and their respective strengths/weaknesses
We looked at these challenges (and more)
o PERFORMANCE, OPTIMIZATION:What can one do if the queries are slow?
o CONCURRENCY:How can one read from and write to the same database at the same time without problems? How would the performance be effected?
o RECOVERY:What happens if the database crashes while updating the data?
We took up some background we needed to build up the course
o DBMS architecture (relevant also for IN5020)o DBMS file systemso Various data structures and algorithmso Relational algebra (relevant for query optimization)
The course is divided into several parts
o Main focus:o Queries, SQL and query optimizationo DBMS characteristics and mechanismso ACID, transaction management, concurrency, isolation
levels o Other DBMS (slightly less than previous years)
o DBMS architecture (syllabus for IN4020)o Additional bit about security, DBA´s role, what data science &
analytics require of DBMS, emerging technologies and database research
Syllabus: Queries, SQL and query optimization
o SQL (repetition and a bit more)o Query optimization (including relational algebra)o Indexes, index usage and the underlying searcho Query plans and optimization
Syllabus: ACID, DBMS characteristics and Mechanisms o Managing multiple concurrent uses/users and protecting
the datao ACID – what it stands foro Transaction Management, concurrencyo Locking, logging, buffer & cache managemento Relationships between these mechanismso Synchronization/replication challengeso Serialization, isolation
Syllabus: Other database systems (briefly) –NoSQL DBMS and newer DBMS technologieso Categorization of the NoSQL DBMS and the problems they
address (their “reason of being”)o A bit more about spatial & graph databases, semantic
technologies and databases with built-in intelligence (AI/ML capabilities in and outside the “black box”)
o Autonomous databaseso Multi-model databases
FOR IN4020(recommended otherwise as well)
Lectures for IN4020 (recommended for all)
o DBMS architecture and relations between DBMS layers/components (today, next hour)
o Security (guest, coming Monday)o Current and future role of the DBA (guest, coming Monday)
o Implications of data science & analytics, modern DBMS and newer mechanisms (coming Wednesday)
Lectures for IN4020 (recommended for all)
o R&D themeso Performance – how and for what purpose, exactly?o Big or fast data – or both? How? How can newer architectures like the Data
Mesh help?o What is the relation doing in the table? What happens to indexes and keys
and search (queries, joins etc.) if it is not part of the data? Can it be as cleanly formalized in an “extended” relational algebra?
o Intelligence in databases – autonomy, auto-inference (also in RDF semantics) and other built in mechanisms of learning and intelligence.
o Extraction-transformation-loading challenges: Intelligence in data quality management and I/O
EXAM INFO
ABSOLTELY RECOMMENDED FOR ALL J
Exam informationo The final exam is now a take-home examo Evaluation: Pass/Fail
(preferably pass, of course J) o
You can use books and notes and slides and Google (alle hjelpemidler tillat)
o You cannot cooperate or ask your brother/sister to do it for you! Remember that strict social distancing is required J
Exam informationo You will receive the exam on the 26th of May 2020 as
announcedo You have 1 week to deliver the take-home examo The work will be the equivalent of maximum 2 days.o You will be informed how you are to deliver the take-home
(most likely Inspera but you will be informed when you get the exam)
Exam informationo This all means that there won´t be standard exam questions
you can simply look upo It means that there will be actual “do it in practice”
exercises directing you to formulate or try things out, show results and reason about them
o It mean that there will be discussion question, questions that ask you to reason etc., showing your understanding of the syllabus
In other words…GOOD LUCK!
(IN3020) IN4020 – Database Systems Spring 2020, Week 15.2
DBMS ARCHITECTURE & DEPENDENCIES BETWEEN COMPONENTS
Dr. M. Naci Akkøk, Chief Architect, Oracle NordicsBased upon slides by V. Goebel & E. Munthe-Kaas
Example DBMS architecture
The components
Typi
cal D
BMS
Com
pone
nts
Same architecture as functional layers
Applications
User Interfaces / View Management
Semantic Integrity Control / Authorization
Query Processing and Optimization
Storage Structures
Buffer Management
Concurrency Control / Logging
Interface
Control
Compilation
Execution
Data Access
Consistency
Tran
sact
ion
Man
agem
ent
Database
Dependencies between layers/componentso Central components (system functions): components that
manage their resources directly down to the operating system interface - system buffer management, lock component, log component (with savepoint management).
o Higher-level components (system functions): components demanding/requiring the central components as prerequisites - transaction management, access path management, sorting component, etc.
Dependencies between central components
LOG COMPONENT(with savepoint management)
LOCK COMPONENT
SYSTEM BUFFER MANAGEMENT
1 3
2
Interaction 1: Log & Buffer (pool)o WAL (write ahead log) principle: log information is written to
secondary storage before the “real” information (pages) is written to secondary storage.
o Page replacement strategy determines which protocol (logging) approaches are applicable. For instance, logical protocol approach rquires indirect replacement strategy.
Interaction 2: Log & Locko Logging unit ≤ Locking unito In case of logical protocol approaches, the logging unit is the set of
all data objects changed by a DML operation. If a rollback is performed no other transactions must be affected (damaged).
o If the lock component realizes only a simple 2-phase-locking protocol - instead of a strict 2-phase-commit protocol (all locks are kept until end-of-transaction), all relationships (interferences) between transactions (also read-only) about commonly used data items must be logged to enable a recursive rollback in case of failure.
Interaction 3: Lock & Buffer (pool)o The main task of the lock component is to guarantee the
“logical single user mode”.
o This requires that the lock component controlls which pages a fixed and which pages are unfixed in the system buffer by the system buffer management (influencing paging strategy) to guarantee isolation (locking).
Dependencies between central & higher-level components
CENTRAL COMPONENTS
ACCESS PATH MANAGEMENT
TRANSACTION MANAGEMENT
SORTINGCOMPONENT
B CA D E F
Interaction A: Log & Access path mgmnt. o Physical state logging:
Requires the maintenance and logging of access paths when inserting new data items.
o Logical state logging: requires no extra effort of access path maintenance because DML operations can be rolled back (inverted) and repeated.
Interaction B: Lock & Access path mgmnt.o For B* trees (special access paths) an unadjusted locking
concept can decrease parallelism. No exclusive locks should be set on the root of the tree, because this would block all access paths along that tree.
Interaction C: Transaction mgmnt. & Logo The “all-or-nothing” principle of a transaction requires the
logging of UNDO information for rollback and REDO information to repeat a transaction.
o In case of very many short transactions the logging component can become the bottleneck of the DBMS.
o One solution to this problem is to group transactions and to defer their end-of-transaction in order to write the log information in a blocked way to secondary storage.
Interaction D: Tx & System buffer mgmnto The system buffer management is the central component
concerning performance optimization. o The number of parallel transactions has a strong influence
on the paging rate. o Transaction management should defer the activation of a
transaction if this would cause a decrease of the paging rate.
o Transaction management should group transactions in a buffer-oriented way for performance reasons.
Interaction E: Tx management & Locko The locking component guarantees the transaction-oriented isolation
of data items. o It is possible that deadlock situations (circular resource allocations)
occur that require “unjustified” rollbacks of transactions to resolve the deadlock.
o Transaction management has to determine which transaction(s) have to be rolled back in order to cause minimum work loss.
o Transaction management and locking component have to be adjusted so that deadlocks are not possible, e.g., by pre-claiming(*), or sequentialization of transactions which have overlapping data areas.
Pre-Claiming Lock Protocolo Pre-claiming protocols evaluate their operations and create
a list of data items on which they need locks in advance.o Before initiating an execution, the transaction requests the
system for all the locks it needs beforehand. o If all the locks are granted, the transaction executes and
releases all the locks when all its operations are over. o If all the locks are not granted, the transaction rolls back
and waits until all the locks are granted.
Interaction F: Sorting component & System buffer managemento Special precautions for system buffer management
are necessary if the sorting component is active (especially relevant for relational DBMS).
o The paging strategy should not be contra-productive to the applied sorting algorithm.
SEE YOU NEXT WEEK!STAY SAFE, STAY HEALTHY!