DATABASE MANAGEMENT SYSTEM

1 DATABASE MANAGEMENT SYSTEM
The slides for this text are organized into chapters. This lecture covers Chapter 1.
- Chapter 1: Introduction to Database Systems
- Chapter 2: The Entity-Relationship Model
- Chapter 3: The Relational Model
- Chapter 4 (Part A): Relational Algebra
- Chapter 4 (Part B): Relational Calculus
- Chapter 5: SQL: Queries, Programming, Triggers
- Chapter 6: Query-by-Example (QBE)
- Chapter 7: Storing Data: Disks and Files
- Chapter 8: File Organizations and Indexing
- Chapter 9: Tree-Structured Indexing
- Chapter 10: Hash-Based Indexing
- Chapter 11: External Sorting
- Chapter 12 (Part A): Evaluation of Relational Operators
- Chapter 12 (Part B): Evaluation of Relational Operators: Other Techniques
- Chapter 13: Introduction to Query Optimization
- Chapter 14: A Typical Relational Optimizer
- Chapter 15: Schema Refinement and Normal Forms
- Chapter 16 (Part A): Physical Database Design
- Chapter 16 (Part B): Database Tuning
- Chapter 17: Security
- Chapter 18: Transaction Management Overview
- Chapter 19: Concurrency Control
- Chapter 20: Crash Recovery
- Chapter 21: Parallel and Distributed Databases
- Chapter 22: Internet Databases
- Chapter 23: Decision Support
- Chapter 24: Data Mining
- Chapter 25: Object-Database Systems
- Chapter 26: Spatial Data Management
- Chapter 27: Deductive Databases
- Chapter 28: Additional Topics

2 Basic Definitions
- Database: A logically coherent collection of data that represents a mini-world, collected for a particular purpose and for a group of intended users. A change in the mini-world brings about a change in the database.
- Data: Meaningful facts, text, graphics, images, sound, and video segments that can be recorded and have an implicit meaning.
- Metadata: Data that describes data.
- File Processing System: A collection of application programs that perform services for end users, such as producing reports. Each program defines and manages its own data.
- Database Management System (DBMS): A software package or system that facilitates the creation and maintenance of a computerized database.
- Database System: The DBMS software together with the data itself (database + DBMS). Sometimes the applications are also included.

3 Simplified database system environment

4 Evolution of DB Systems
- Flat files
- Hierarchical – 1970s
- Network – 1970s
- Relational – 1980s - present
- Object-oriented – 1990s - present
- Object-relational – 1990s - present
- Data warehousing – 1980s - present
- Web-enabled – 1990s - present

5 Purpose of Database Systems
Database management systems were developed to handle the difficulties of typical file-processing systems supported by conventional operating systems

6 Disadvantages of File Processing
- Program-Data Dependence: File structure is defined in the program code. All programs maintain metadata for each file they use.
- Duplication of Data (Data Redundancy): Different systems or programs hold separate copies of the same data. This wastes space and can leave the same item with different values or different formats.
- Limited Data Sharing: No centralized control of data. Programs are written in different languages, and so cannot easily access each other's files.
- Lengthy Development Times: Programmers must design their own file formats.
- Excessive Program Maintenance: 80% of the information systems budget.
- Vulnerability to Inconsistency: A change in one table requires matching changes in the corresponding tables; otherwise the data becomes inconsistent.

7 Advantages of Database Approach
- Data independence and efficient access.
- Data integrity and security.
- Uniform data administration.
- Concurrent access, recovery from crashes.
- Replication control.
- Reduced application development time.
- Improved Data Sharing: Different users get different views of the data.
- Enforcement of Standards: All data access is done in the same way.
- Improved Data Quality: Constraints, data validation rules.
- Better Data Accessibility/Responsiveness: Use of a standard data query language (SQL).
- Security, Backup/Recovery, Concurrency: Disaster recovery is easier.

8 Costs and Risks of the Database Approach
Up-front costs:
- Installation Management Cost and Complexity
- Conversion Costs
Ongoing Costs:
- Requires New, Specialized Personnel
- Need for Explicit Backup and Recovery
Organizational Conflict:
- Old habits die hard

9 Database Applications
- Banking: all transactions
- Airlines: reservations, schedules
- Universities: registration, grades
- Sales: customers, products, purchases
- Manufacturing: production, inventory, orders, supply chain
- Human resources: employee records, salaries, tax deductions
Databases touch all aspects of our lives.

10 Levels of Abstraction
Many views, a single conceptual (logical) schema, and a single physical schema.
- Views describe how users see the data.
- The conceptual schema defines the logical structure.
- The physical schema describes the files and indexes used.
[Figure: View 1, View 2, and View 3 sit above the Conceptual Schema, which sits above the Physical Schema.]
Schemas are defined using DDL; data is modified/queried using DML.

11 Example: University Database
Conceptual schema:
- Students(sid: string, name: string, login: string, age: integer, gpa: real)
- Courses(cid: string, cname: string, credits: integer)
- Enrolled(sid: string, cid: string, grade: string)
Physical schema:
- Relations stored as unordered files.
- Index on first column of Students.
External schema (view):
- Course_info(cid: string, enrollment: integer)
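The three levels of the university example can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the slide's string/integer/real types are mapped to SQLite's TEXT/INTEGER/REAL, the index name idx_students_sid and the sample Enrolled rows are made up, and Course_info is assumed to count enrollments per course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema (string/integer/real mapped to TEXT/INTEGER/REAL)
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- Physical schema: index on the first column of Students
    CREATE INDEX idx_students_sid ON Students (sid);

    -- External schema: a view that exposes only per-course enrollment counts
    CREATE VIEW Course_info AS
        SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

# Hypothetical sample rows, not taken from the slides
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [("s1", "c1", "A"), ("s2", "c1", "B"), ("s1", "c2", "A")],
)

# A user of the view sees enrollment counts, not individual grades
course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
print(course_info)
```

The view hides sid and grade from its users, which is the sense in which an external schema describes how a particular group of users sees the data.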

12 Instances and Schemas
Similar to types and variables in programming languages.
- Schema: the logical structure of the database (e.g., a set of customers and accounts and the relationship between them).
- Instance: the actual content of the database at a particular point in time.

13 Data Independence
Data independence is the ability to modify a schema definition at one level without affecting a schema definition at the other levels. The interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
Two levels of data independence:
- Physical data independence: protection from changes in the physical structure of data.
- Logical data independence: protection from changes in the logical structure of data.

14 Instances and Schemas
Similar to types and variables in programming languages.
- Schema: the logical structure of the database (e.g., the database consists of information about a set of customers and accounts and the relationship between them). Analogous to the type information of a variable in a program.
  - Physical schema: database design at the physical level.
  - Logical schema: database design at the logical level.
- Instance: the actual content of the database at a particular point in time. Analogous to the value of a variable.
- Physical Data Independence: the ability to modify the physical schema without changing the logical schema. Applications depend on the logical schema.
In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
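Physical data independence can be illustrated with a small sqlite3 sketch: a change to the physical schema (adding an index) leaves both the query text and its answer unchanged. The account table, its rows other than A-101, and the index name are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
# Hypothetical rows; only the account number A-101 appears in the slides
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-102", 400)],
)

# The application is written against the logical schema only
QUERY = (
    "SELECT account_number FROM account "
    "WHERE balance > 450 ORDER BY account_number"
)

before = conn.execute(QUERY).fetchall()

# Physical schema change: add an index on balance
conn.execute("CREATE INDEX idx_account_balance ON account (balance)")

after = conn.execute(QUERY).fetchall()
print(before == after)
```

The index may change how the rows are located, but the application did not have to be edited.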

15 Database Languages
Data Definition Language (DDL)
- Specification notation for defining the database schema.
- The DDL compiler generates a set of tables stored in a data dictionary.
- The data dictionary contains metadata (data about data).
- Data storage and definition language: a special type of DDL in which the storage structure and access methods used by the database system are specified.
Data Manipulation Language (DML)
- Language for accessing and manipulating the data organized by the appropriate data model.
- Two classes of languages:
  - Procedural: the user specifies what data is required and how to get those data.
  - Nonprocedural: the user specifies what data is required without specifying how to get those data.
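The difference between procedural and nonprocedural access can be sketched as follows. The account records are hypothetical; the loop stands in for a procedural language, and the SQL statements stand in for DDL (CREATE TABLE) and nonprocedural DML (SELECT).

```python
import sqlite3

# Hypothetical account records: (account_number, balance)
rows = [("A-101", 500), ("A-215", 700), ("A-102", 400)]

# Procedural style: the program spells out how to find the data
procedural = []
for account_number, balance in rows:
    if balance > 450:
        procedural.append(account_number)
procedural.sort()

# Nonprocedural style: DDL defines the schema, DML states what is wanted
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")  # DDL
conn.executemany("INSERT INTO account VALUES (?, ?)", rows)                  # DML
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT account_number FROM account WHERE balance > 450 "
        "ORDER BY account_number"
    )
]

print(procedural, declarative)
```

Both styles give the same answer; in the second, the choice of scan order or index is left to the database system.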

16 Database Users
Users are differentiated by the way they expect to interact with the system.
- Application programmers: interact with the system through DML calls.
- Sophisticated users: form requests in a database query language.
- Specialized users: write specialized database applications that do not fit into the traditional data processing framework.
- Naïve users: invoke one of the permanent application programs that have been written previously, e.g., people accessing a database over the web, bank tellers, clerical staff.

17 Database Administrator
Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise's information resources and needs.
The database administrator's duties include:
- Schema definition
- Storage structure and access method definition
- Schema and physical organization modification
- Granting user authority to access the database
- Specifying integrity constraints
- Acting as liaison with users
- Monitoring performance and responding to changes in requirements

18 Data Models
A collection of tools for describing:
- Data relationships
- Data semantics
- Data constraints
Object-based logical models:
- Entity-relationship model
- Object-oriented model
- Semantic model
- Functional model
Record-based logical models:
- Relational model (e.g., SQL/DS, DB2)
- Network model
- Hierarchical model (e.g., IMS)

19 Entity-Relationship Model
The basics of Entity-Relationship modelling:
- Entities (objects), e.g., customers, accounts, bank branch
- Attributes
- Relationships between entities, e.g., account A-101 is held by customer Johnson. The relationship set depositor associates customers with accounts.
Widely used for database design: a database design in the E-R model is usually converted to a design in the relational model, which is used for storage and processing.

20 ER Model Basics
[Figure: entity set Employees with attributes ssn, name, and lot.]
- Entity: A real-world object distinguishable from other objects. An entity is described using a set of attributes. Each attribute has a domain.
- Entity Set: A collection of similar entities, e.g., all employees.
  - All entities in an entity set have the same set of attributes. (Until we consider ISA hierarchies, anyway!)
  - Each entity set has a key.
- Weak Entities: A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.
The slides for this text are organized into several modules. Each lecture contains about enough material for a 1.25 hour class period. (The time estimate is very approximate: it will vary with the instructor, and lectures also differ in length, so use this as a rough guideline.) This covers Lectures 1 and 2 (of 6) in Module (5).
- Module (1): Introduction (DBMS, Relational Model)
- Module (2): Storage and File Organizations (Disks, Buffering, Indexes)
- Module (3): Database Concepts (Relational Queries, DDL/ICs, Views and Security)
- Module (4): Relational Implementation (Query Evaluation, Optimization)
- Module (5): Database Design (ER Model, Normalization, Physical Design, Tuning)
- Module (6): Transaction Processing (Concurrency Control, Recovery)
- Module (7): Advanced Topics

21 ER Model Basics
[Figure: ER diagram in which Employees (ssn, name, lot) and Departments (did, dname, budget) are linked by Works_In (since), and Employees is linked to itself by Reports_To in the roles supervisor and subordinate.]
- Relationship: Association among two or more entities, e.g., Attishoo works in the Pharmacy department.
- Relationship Set: Collection of similar relationships. An n-ary relationship set R relates n entity sets E1, ..., En; each relationship in R involves entities e1 ∈ E1, ..., en ∈ En.
- The same entity set could participate in different relationship sets, or in different "roles" in the same set.

22 E-R Diagrams
- Rectangles represent entity sets.
- Diamonds represent relationship sets.
- Lines link attributes to entity sets and entity sets to relationship sets.
- Ellipses represent attributes.
- Double ellipses represent multivalued attributes.
- Dashed ellipses denote derived attributes.
- Underline indicates primary key attributes (will study later).

23 Mapping Cardinality Constraints
Express the number of entities to which another entity can be associated via a relationship set. Most useful in describing binary relationship sets.
For a binary relationship set, the mapping cardinality must be one of the following types:
- One to one
- One to many
- Many to one
- Many to many
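As a rough illustration, the four types can be told apart by inspecting a relationship instance given as (a, b) pairs. The function name mapping_cardinality and the customer/account pairs are hypothetical. A mapping cardinality is a constraint on the schema, so an instance can only show the tightest type it is consistent with.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Return the tightest mapping cardinality (from A to B) that a
    binary relationship instance, given as (a, b) pairs, satisfies."""
    a_to_b = defaultdict(set)
    b_to_a = defaultdict(set)
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)
    # Does some entity in A relate to several entities in B, and vice versa?
    a_has_many = any(len(bs) > 1 for bs in a_to_b.values())
    b_has_many = any(len(as_) > 1 for as_ in b_to_a.values())
    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"
    if b_has_many:
        return "many to one"
    return "one to one"


# Hypothetical depositor pairs: (customer, account)
print(mapping_cardinality([("Johnson", "A-101"), ("Johnson", "A-215")]))
print(mapping_cardinality([("Johnson", "A-101"), ("Smith", "A-101")]))
```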

24 Mapping Cardinalities
One to one One to many Many to one Many to many

25 Participation Constraints
Does every department have a manager? If so, this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial).
Every Departments entity must appear in an instance of the Works_In relationship (that is, have an employee), and every Employees entity must be in a department. Both Employees and Departments participate totally in Works_In.
[Figure: ER diagram of Employees (ssn, name, lot) and Departments (did, dname, budget) linked by Manages (since) and Works_In (since).]

26 Keys
- A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.
- A candidate key of an entity set is a minimal super key. For example, customer_id is a candidate key of customer, and account_number is a candidate key of account.
- Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
- Alternate keys are the candidate keys that are not selected as the primary key.
- A foreign key is a set of attributes of an entity that refers to the primary key of another entity. Foreign keys act as cross-references between entities.
- A composite key consists of two or more attributes that together uniquely identify an entity.
- Non-key attributes are the attributes or fields of a table other than the candidate key attributes.
- Non-prime attributes are the attributes that are not part of any candidate key.
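The definitions of super key and candidate key can be checked mechanically against a sample instance. The sketch below is an illustration only: the helper names and the customer rows are hypothetical, and a key is a constraint on every legal instance, so uniqueness in one sample is necessary but not sufficient.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal super keys of this instance, smallest sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Skip supersets of a key already found: they are not minimal
            if any(set(key) <= set(attrs) for key in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys


# Hypothetical customer instance
customers = [
    {"customer_id": "C1", "name": "Johnson", "city": "Palo Alto"},
    {"customer_id": "C2", "name": "Smith", "city": "Rye"},
    {"customer_id": "C3", "name": "Johnson", "city": "Rye"},
]

print(candidate_keys(customers, ["customer_id", "name", "city"]))
```

In this sample (name, city) also happens to be unique, which shows why a designer chooses keys from the meaning of the data rather than from one snapshot.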

27 Relational Model Example of tabular data in the relational model:

28 Relational Model (Basic)
The relational model uses the basic concept of a relation, or table.
- Tuple: a row in a table.
- Attribute: a named column of a relation.
- Domain: the set of allowable values for one or more attributes.
- Degree: the number of columns in a table is called the degree of the relation.
- Cardinality: the number of rows in a relation is called the cardinality of the relation.

29 Integrity Constraints
Integrity constraints guard against accidental damage to the database by ensuring that authorized changes to the database do not result in a loss of data consistency.
- Domain Constraints: the value of each attribute x must be an atomic value from the domain of x.
- Key Constraints: the primary key must have a unique value in the relational table.
- Referential Integrity: if a foreign key in table A refers to the primary key of table B, then every value of the foreign key in table A must be null or be available in table B.
- Entity Integrity: no attribute of a primary key can have a null value.
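The four kinds of constraint can be watched in action with sqlite3. This is a sketch under assumptions: the simplified Company and Product tables, the CHECK on Price (used here as a stand-in for a domain constraint), and the helper violates are illustrative choices. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign keys are off by default in SQLite
conn.executescript("""
    CREATE TABLE Company (
        CName   TEXT PRIMARY KEY NOT NULL,
        Country TEXT
    );
    CREATE TABLE Product (
        PName        TEXT PRIMARY KEY NOT NULL,          -- key + entity integrity
        Price        REAL CHECK (Price >= 0),            -- stand-in for a domain rule
        Manufacturer TEXT REFERENCES Company (CName)     -- referential integrity
    );
    INSERT INTO Company VALUES ('Canon', 'Japan');
    INSERT INTO Product VALUES ('SingleTouch', 149.99, 'Canon');
""")


def violates(sql):
    """Run one statement; report whether the DBMS rejected it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


results = {
    "key": violates("INSERT INTO Product VALUES ('SingleTouch', 10.0, 'Canon')"),
    "entity": violates("INSERT INTO Product VALUES (NULL, 10.0, 'Canon')"),
    "referential": violates("INSERT INTO Product VALUES ('Gizmo', 19.99, 'GizmoWorks')"),
    "domain": violates("INSERT INTO Product VALUES ('Freebie', -1.0, 'Canon')"),
    # A null foreign key is allowed by referential integrity
    "null_foreign_key": violates("INSERT INTO Product VALUES ('Widget', 5.0, NULL)"),
}
print(results)
```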

30 A Sample Relational Database

31 SQL Introduction
SQL (Structured Query Language) is the standard language for querying and manipulating data.
- Many standards out there: ANSI SQL, SQL92 (a.k.a. SQL2), SQL99 (a.k.a. SQL3), ...
- Vendors support various subsets: watch for fun discussions in class!

32 SQL
Data Definition Language (DDL)
- Create/alter/delete tables and their attributes
- Following lectures.
Data Manipulation Language (DML)
- Query one or more tables (discussed next!)
- Insert/delete/modify tuples in tables

33 Tables in SQL
Table name: Product. The column headings are the attribute names; each row below them is a tuple.

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

34 Tables Explained
The schema of a table is the table name and its attributes: Product(PName, Price, Category, Manufacturer)
A key is an attribute whose values are unique; we underline a key.

35 Data Types in SQL
Atomic types:
- Characters: CHAR(20), VARCHAR(50)
- Numbers: INT, BIGINT, SMALLINT, FLOAT
- Others: MONEY, DATETIME, ...
Every attribute must have an atomic type. Hence tables are flat. Why?

36 Tables Explained
- A tuple = a record. Restriction: all attributes are of atomic type.
- A table = a set of tuples. Like a list, but it is unordered: no first(), no next(), no last().

37 SQL Query
Basic form (plus many, many more bells and whistles):
SELECT <attributes> FROM <one or more tables> WHERE <conditions>

38 Simple SQL Query
Product:

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Query ("selection"):
SELECT * FROM Product WHERE category = 'Gadgets'

Result:

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks

39 Simple SQL Query
Product:

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Query ("selection" and "projection"):
SELECT PName, Price, Manufacturer FROM Product WHERE Price > 100

Result:

PName        Price    Manufacturer
SingleTouch  $149.99  Canon
MultiTouch   $203.99  Hitachi
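The two queries above can be run against the Product table with sqlite3. This is a sketch under assumptions: prices are stored as plain numbers without the dollar sign, and Powergizmo is assumed to share Gizmo's category and manufacturer. ORDER BY is added only to make the output order predictable, since a table is an unordered set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?, ?)",
    [
        ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
        ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),  # category/manufacturer assumed
        ("SingleTouch", 149.99, "Photography", "Canon"),
        ("MultiTouch", 203.99, "Household", "Hitachi"),
    ],
)

# Selection: keep whole rows that satisfy the condition
selection = conn.execute(
    "SELECT * FROM Product WHERE Category = 'Gadgets' ORDER BY PName"
).fetchall()

# Selection and projection: filter rows, then keep only some columns
selection_projection = conn.execute(
    "SELECT PName, Price, Manufacturer FROM Product WHERE Price > 100 ORDER BY Price"
).fetchall()

print(selection)
print(selection_projection)
```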

40 Notation
Input schema: Product(PName, Price, Category, Manufacturer)
SELECT PName, Price, Manufacturer FROM Product WHERE Price > 100
Output schema: Answer(PName, Price, Manufacturer)

41 Keys and Foreign Keys
Company (key: CName):

CName       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan

Product (key: PName; foreign key: Manufacturer, which refers to Company.CName):

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

42 Joins
Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)
Find all products under $200 manufactured in Japan; return their names and prices.
Join between Product and Company:
SELECT PName, Price FROM Product, Company WHERE Manufacturer = CName AND Country = 'Japan' AND Price < 200

43 Joins
Product:

PName        Price    Category     Manufacturer
Gizmo        $19.99   Gadgets      GizmoWorks
Powergizmo   $29.99   Gadgets      GizmoWorks
SingleTouch  $149.99  Photography  Canon
MultiTouch   $203.99  Household    Hitachi

Company:

CName       StockPrice  Country
GizmoWorks  25          USA
Canon       65          Japan
Hitachi     15          Japan

SELECT PName, Price FROM Product, Company WHERE Manufacturer = CName AND Country = 'Japan' AND Price < 200
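The join can be traced with sqlite3. The sketch makes three assumptions: the price condition is written as Price < 200, following the task statement "under $200"; Hitachi's country is taken to be Japan; and Powergizmo is taken to be a GizmoWorks gadget.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (CName TEXT, StockPrice INTEGER, Country TEXT);
    CREATE TABLE Product (PName TEXT, Price REAL, Category TEXT, Manufacturer TEXT);

    INSERT INTO Company VALUES ('GizmoWorks', 25, 'USA');
    INSERT INTO Company VALUES ('Canon', 65, 'Japan');
    INSERT INTO Company VALUES ('Hitachi', 15, 'Japan');      -- country assumed

    INSERT INTO Product VALUES ('Gizmo', 19.99, 'Gadgets', 'GizmoWorks');
    INSERT INTO Product VALUES ('Powergizmo', 29.99, 'Gadgets', 'GizmoWorks');  -- assumed
    INSERT INTO Product VALUES ('SingleTouch', 149.99, 'Photography', 'Canon');
    INSERT INTO Product VALUES ('MultiTouch', 203.99, 'Household', 'Hitachi');
""")

# Each Product row is paired with the Company row of its manufacturer,
# then filtered by country and price
japanese_under_200 = conn.execute("""
    SELECT PName, Price
    FROM Product, Company
    WHERE Manufacturer = CName
      AND Country = 'Japan'
      AND Price < 200
    ORDER BY PName
""").fetchall()

print(japanese_under_200)
```

MultiTouch is made in Japan but costs more than $200, and the two GizmoWorks products are cheap but made in the USA, so only SingleTouch survives both conditions.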

44 More Joins
Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)
Find all Chinese companies that manufacture products both in the 'electronic' and 'toy' categories.
SELECT cname
FROM
WHERE
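One possible way to complete the query, offered as a sketch and not as the slide's intended answer, is to join Company with two copies of Product: one copy must supply an 'electronic' product and the other a 'toy' product. The company names, product names, and rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (cname TEXT, stockPrice INTEGER, country TEXT);
    CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);

    -- Hypothetical data
    INSERT INTO Company VALUES ('Acme', 10, 'China');
    INSERT INTO Company VALUES ('Bolt', 20, 'China');
    INSERT INTO Company VALUES ('Canon', 65, 'Japan');

    INSERT INTO Product VALUES ('radio',  30.0, 'electronic', 'Acme');
    INSERT INTO Product VALUES ('robot',  15.0, 'toy',        'Acme');
    INSERT INTO Product VALUES ('kite',    5.0, 'toy',        'Bolt');
    INSERT INTO Product VALUES ('camera', 99.0, 'electronic', 'Canon');
    INSERT INTO Product VALUES ('puzzle',  9.0, 'toy',        'Canon');
""")

# p1 and p2 range over Product independently, so a company qualifies
# only if it has at least one product in each category
both_categories = conn.execute("""
    SELECT DISTINCT c.cname
    FROM Company c, Product p1, Product p2
    WHERE c.country = 'China'
      AND p1.manufacturer = c.cname AND p1.category = 'electronic'
      AND p2.manufacturer = c.cname AND p2.category = 'toy'
    ORDER BY c.cname
""").fetchall()

print(both_categories)
```

A single copy of Product would not work, because no one tuple can have category 'electronic' and 'toy' at the same time.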

45 NULLS in SQL
Whenever we don't have a value, we can put a NULL.
- Can mean many things: the value does not exist, the value exists but is unknown, the value is not applicable, etc.
- The schema specifies for each attribute whether it can be null (nullable attribute) or not.
- How does SQL cope with tables that have NULLs?
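A short sqlite3 sketch gives a first answer to that question. A comparison with NULL is neither true nor false, so a row with a NULL price is returned by neither a condition nor its negation; it has to be asked for with IS NULL. The table below, with a nullable Price and a non-nullable PName, and the helper names are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PName is declared non-nullable; Price is nullable
conn.execute("CREATE TABLE Product (PName TEXT NOT NULL, Price REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [("Gizmo", 19.99), ("Powergizmo", None), ("SingleTouch", 149.99)],
)


def names(where):
    """Product names satisfying a WHERE condition, in name order."""
    sql = "SELECT PName FROM Product WHERE " + where + " ORDER BY PName"
    return [row[0] for row in conn.execute(sql)]


cheap = names("Price < 100")
not_cheap = names("NOT (Price < 100)")
missing = names("Price IS NULL")

# The schema rejects a NULL in a non-nullable attribute
try:
    conn.execute("INSERT INTO Product VALUES (NULL, 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(cheap, not_cheap, missing, rejected)
```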

46 Outer Joins
- Left outer join: include the left tuple even if there's no match.
- Right outer join: include the right tuple even if there's no match.
- Full outer join: include both left and right tuples even if there's no match.
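A left outer join can be compared with an ordinary join in sqlite3. In this sketch the Product table is deliberately cut down so that Hitachi has no matching product; that reduction is an assumption made for the example. Only LEFT OUTER JOIN is shown because not every SQLite version supports the right and full variants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (CName TEXT, Country TEXT);
    CREATE TABLE Product (PName TEXT, Manufacturer TEXT);

    INSERT INTO Company VALUES ('GizmoWorks', 'USA');
    INSERT INTO Company VALUES ('Canon', 'Japan');
    INSERT INTO Company VALUES ('Hitachi', 'Japan');

    -- Reduced product list: Hitachi has no product here
    INSERT INTO Product VALUES ('Gizmo', 'GizmoWorks');
    INSERT INTO Product VALUES ('SingleTouch', 'Canon');
""")

inner = conn.execute("""
    SELECT c.CName, p.PName
    FROM Company c JOIN Product p ON p.Manufacturer = c.CName
    ORDER BY c.CName
""").fetchall()

# The left tuple (Company) is kept even without a match;
# the missing Product attributes are filled with NULL (None in Python)
left_outer = conn.execute("""
    SELECT c.CName, p.PName
    FROM Company c LEFT OUTER JOIN Product p ON p.Manufacturer = c.CName
    ORDER BY c.CName
""").fetchall()

print(inner)
print(left_outer)
```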

47 Modifying the Database
Three kinds of modifications:
- Insertions
- Deletions
- Updates
Sometimes they are all called "updates".

48 Insertions
General form: INSERT INTO R(A1, ..., An) VALUES (v1, ..., vn)
Example: insert a new purchase into the database:
INSERT INTO Purchase(buyer, seller, product, store) VALUES ('Joe', 'Fred', 'wakeup-clock-espresso-machine', 'The Sharper Image')
- A missing attribute is set to NULL.
- The attribute names may be dropped if the values are given in order.

49 Insertions
INSERT INTO PRODUCT(name)
SELECT DISTINCT Purchase.product
FROM Purchase
WHERE Purchase.date > '10/26/01'
The query replaces the VALUES keyword. Here we insert many tuples into PRODUCT.

50 Insertion: an Example
Product(name, listPrice, category)
Purchase(prodName, buyerName, price)
prodName is a foreign key referring to Product.name.
Suppose the database got corrupted and we need to fix it:

Purchase:

prodName  buyerName  price
camera    John       200
gizmo     Smith      80
camera    …          225

Product:

name   listPrice  category
gizmo  100        gadgets

Task: insert in Product all prodNames from Purchase.

51 Insertion: an Example
INSERT INTO Product(name)
SELECT DISTINCT prodName
FROM Purchase
WHERE prodName NOT IN (SELECT name FROM Product)

Product after the insertion:

name    listPrice  category
gizmo   100        Gadgets
camera  -          -
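The repair can be replayed with sqlite3. The sketch assumes integer prices and a made-up buyer for the second camera purchase. The attributes not listed in the INSERT come back as NULL (None in Python), which the slide shows as "-".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product  (name TEXT, listPrice INTEGER, category TEXT);
    CREATE TABLE Purchase (prodName TEXT, buyerName TEXT, price INTEGER);

    INSERT INTO Product VALUES ('gizmo', 100, 'gadgets');

    INSERT INTO Purchase VALUES ('camera', 'John', 200);
    INSERT INTO Purchase VALUES ('gizmo', 'Smith', 80);
    INSERT INTO Purchase VALUES ('camera', 'Smith', 225);  -- buyer assumed
""")

# Insert every purchased product name that Product does not know yet;
# DISTINCT keeps 'camera' from being inserted twice
conn.execute("""
    INSERT INTO Product(name)
    SELECT DISTINCT prodName
    FROM Purchase
    WHERE prodName NOT IN (SELECT name FROM Product)
""")

products = conn.execute(
    "SELECT name, listPrice, category FROM Product ORDER BY name"
).fetchall()
print(products)
```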

52 Insertion: an Example
INSERT INTO Product(name, listPrice)
SELECT DISTINCT prodName, price
FROM Purchase
WHERE prodName NOT IN (SELECT name FROM Product)

Product after the insertion:

name    listPrice  category
gizmo   100        Gadgets
camera  200        -
camera  225 ??     -

?? Whether the second camera row is inserted depends on the implementation.

53 Deletions
Example:
DELETE FROM PURCHASE
WHERE seller = 'Joe' AND product = 'Brooklyn Bridge'
Factoid about SQL: there is no way to delete only a single occurrence of a tuple that appears twice in a relation.
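The factoid can be observed directly: a DELETE removes every tuple that satisfies its condition, so two identical tuples are always deleted together. The sketch below uses sqlite3 with a simplified, hypothetical Purchase table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Purchase (buyer TEXT, seller TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO Purchase VALUES (?, ?, ?)",
    [
        ("Fred", "Joe", "Brooklyn Bridge"),
        ("Fred", "Joe", "Brooklyn Bridge"),  # exact duplicate of the row above
        ("Fred", "Joe", "gizmo"),
    ],
)

cursor = conn.execute(
    "DELETE FROM Purchase WHERE seller = 'Joe' AND product = 'Brooklyn Bridge'"
)
deleted = cursor.rowcount  # number of tuples removed by the statement
remaining = conn.execute("SELECT buyer, seller, product FROM Purchase").fetchall()

print(deleted, remaining)
```

Any condition that matches one copy of the duplicated tuple also matches the other, because the two copies agree on every attribute.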

54 Updates
Example:
UPDATE PRODUCT
SET price = price/2
WHERE Product.name IN (SELECT product FROM Purchase WHERE Date = 'Oct, 25, 1999');
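The update can be tried out with sqlite3. In this sketch the tables are reduced to the attributes the statement touches, dates are stored as plain text in the slide's format, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product  (name TEXT, price REAL);
    CREATE TABLE Purchase (product TEXT, date TEXT);

    -- Hypothetical rows
    INSERT INTO Product VALUES ('gizmo', 100.0);
    INSERT INTO Product VALUES ('camera', 200.0);

    INSERT INTO Purchase VALUES ('gizmo', 'Oct, 25, 1999');
    INSERT INTO Purchase VALUES ('camera', 'Oct, 26, 1999');
""")

# Halve the price of every product purchased on the given date
conn.execute("""
    UPDATE Product
    SET price = price / 2
    WHERE Product.name IN
        (SELECT product FROM Purchase WHERE date = 'Oct, 25, 1999')
""")

prices = conn.execute("SELECT name, price FROM Product ORDER BY name").fetchall()
print(prices)
```

Only gizmo was purchased on that date, so only its price changes.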

55 BIG DATA Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source. Big data was originally associated with three key concepts: volume, variety, and velocity. Other concepts later attributed with big data are veracity (i.e., how much noise is in the data) and value.

56 BIG DATA Current usage of the term big data tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that's not the most relevant characteristic of this new data ecosystem."

57 BIG DATA Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on." Scientists, business executives, practitioners of medicine, advertising and governments alike regularly meet difficulties with large data sets in areas including Internet search, fintech, urban informatics, and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics, connectomics, complex physics simulations, biology and environmental research.