SQL Server 2005 New Features

SQL Server 2005 New Features & Business Intelligence Kleanthis Georgaris Technology Specialist Microsoft Hellas


Transcript of SQL Server 2005 New Features

Page 1: SQL Server 2005 New Features

SQL Server 2005 New Features & Business Intelligence

Kleanthis Georgaris

Technology Specialist

Microsoft Hellas

Page 2: SQL Server 2005 New Features

SQL Server 2005: A Complete Enterprise Data Management and BI Solution

[Diagram: Analysis Services (OLAP & Data Mining), Data Transformation Services (ETL), SQL Server Relational Engine, and Reporting Services, framed by Management Tools and Development Tools]

Page 3: SQL Server 2005 New Features

Agenda

- XML Support in SQL Server 2005
- .NET Inside the Database
  - A step towards Object Oriented Programming
  - User Defined Types
- Business Intelligence
  - OLAP
  - Data Mining


Page 5: SQL Server 2005 New Features

Data Representations

- Data can be represented in two ways
  - Relational (databases): requires infrastructure
  - Structured (XML): it is simply text
- Data are exchanged in XML format but stored in relational form
- We need convergence of the two models
- Three alternatives
  - XML can be stored as text
    - loses much of the value of the XML representation
  - XML can be decomposed into multiple relational tables
    - allows use of relational technologies
  - XML can be stored as an xml data type
    - allows use of XML technologies

Page 6: SQL Server 2005 New Features

Mapping Data Models

- Sometimes you need to mix data models
  - middle-tier processing is done with XML tools
  - a web service requires message content in XML
  - a browser requires XML for client-side processing
- But you have relational data
  - most data is stored using the relational model

company table (in the database):

id  name  company
37  Joe   D Inc.
41  May   A Co.
14  Sam   H Inc.
58  Bev   K Inc.

Content & identifiers are mapped to the XML required for the message:

<organization>
  <title sn="37" org="D Inc."/>
  <title sn="41" org="A Co."/>
  <title sn="14" org="H Inc."/>
  ...
</organization>
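A minimal Python sketch of the mapping on this slide, assuming the four company rows shown above. The helper name rows_to_organization_xml is hypothetical and only illustrates the idea; it is not how SQL Server performs the mapping.

```python
import xml.etree.ElementTree as ET

# Rows of the company table from the slide: (id, name, company).
COMPANY_ROWS = [
    (37, "Joe", "D Inc."),
    (41, "May", "A Co."),
    (14, "Sam", "H Inc."),
    (58, "Bev", "K Inc."),
]


def rows_to_organization_xml(rows):
    """Map each relational row to a <title> element (hypothetical helper)."""
    root = ET.Element("organization")
    for row_id, _name, company in rows:
        # The id column becomes the sn attribute, company becomes org.
        ET.SubElement(root, "title", {"sn": str(row_id), "org": company})
    return ET.tostring(root, encoding="unicode")


print(rows_to_organization_xml(COMPANY_ROWS))
```

The name column is dropped here because the message on the slide carries only the identifier and the company.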

Page 7: SQL Server 2005 New Features

XML as a data type

- The XML data type is a native database type
  - used as the type of a column in a table
  - used as the type of a parameter in a stored procedure
  - used as the type of the return value of a user-defined function
  - used as the type of a variable

Page 8: SQL Server 2005 New Features

XML data type - Example

CREATE TABLE xml_tab (the_id INTEGER, xml_col XML)
GO

-- auto conversion
INSERT INTO xml_tab VALUES(1, '<doc/>')
INSERT INTO xml_tab VALUES(2, N'<doc/>')

SELECT CAST(xml_col AS VARCHAR(MAX))
FROM xml_tab WHERE the_id < 10

-- fails, not well-formed
INSERT INTO xml_tab
VALUES(3, '<doc><x1><x2></x1></x2></doc>')
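The last INSERT fails because the closing tags are not properly nested. The same well-formedness rule can be sketched outside the database with Python's standard XML parser; is_well_formed is a hypothetical helper name, and this mimics only the check, not SQL Server's own parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The two documents accepted by the INSERT statements above.
print(is_well_formed("<doc/>"))
# The rejected document: </x1> closes before </x2>.
print(is_well_formed("<doc><x1><x2></x1></x2></doc>"))
```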

Page 9: SQL Server 2005 New Features

XML column usage

- An XML column is not just a TEXT column
- XML technologies are supported
  - the contents can be validated using XML Schema
  - XML-aware indexes are supported
  - XQuery and XPath 2.0 are supported
  - in-database XML-related functionality works on the type
    - FOR XML
    - OpenXML

Page 10: SQL Server 2005 New Features

XML Demo

Page 11: SQL Server 2005 New Features

Agenda

- XML Support in SQL Server 2005
- .NET Inside the Database
- Business Intelligence
  - OLAP
  - Data Mining

Page 12: SQL Server 2005 New Features

Hosted CLR

- .NET CLR hosted inside SQL Server to improve performance
  - applications run in the same address space as SQL Server
  - stored procedures in any language supported by the CLR
  - web services can run inside SQL Server

[Diagram: user .NET code and T-SQL functions run next to the database inside the SQL Server process]

Page 13: SQL Server 2005 New Features

.NET and Visual Studio Integration: Breakthrough in Developer Productivity

- Choice of programming language
  - T-SQL for data-intensive functions and procedures
  - .NET languages for CPU-intensive functions and procedures
- Choice of where to run logic
  - Database or mid-tier
  - Symmetric data access model: ADO.NET
- Integrated debugging experience across mid-tier and database tier
  - Seamlessly step cross-language: T-SQL and .NET
  - Set breakpoints anywhere, inspect anything
- Flexible and extensible
  - User-defined functions, procedures, triggers
  - User-defined types and aggregates
  - XML data type

Page 14: SQL Server 2005 New Features

Development Environment

- New SQL Server Project template in VS 2005 for SQL Server 2005 managed code
- Server debug integration
  - Full debugger visibility
  - Set breakpoints anywhere
- Single-step support
  - Between languages
  - Between deployment tiers
- Auto-deployment
- Attributes

Page 15: SQL Server 2005 New Features

The Developer Experience

[Diagram: a VS .NET project (VB, C#, C++) is built into the assembly "TaxLib.dll", which is registered in SQL Server and runs in the runtime hosted by SQL Server (in-proc)]

SQL Data Definition:
create assembly …
create function …
create procedure …
create trigger …
create type …

SQL Queries:
select sum(tax(sal, state))
from Emp
where county = 'King'

Page 16: SQL Server 2005 New Features

SQL Web Services

- Native SOAP access
  - Standards-based access to SQL Server
  - No client dependency
  - Improved interoperability
- New "ENDPOINT AS HTTP" object
  - Configure connection info
  - Configure authentication
  - Expose functions & SPs
  - Expose T-SQL batches

[Diagram: a kernel-mode listener receives requests for http://server1/aspnet/default.aspx and http://server1/sql/pubs?wsdl]

Page 17: SQL Server 2005 New Features

Why user-defined types?

- Add scalars that extend the type system
  - used in sorts, aggregates
  - customized sort orders and arithmetic calculations
- Allows scalars to be implemented efficiently
  - compact representation
  - operations written in a compiled language

Page 18: SQL Server 2005 New Features

UDTs on the client

- SQL Server UDTs are "normal" .NET classes
  - can be used in clients as
    - parameters
    - DataReader column values
- Methods can be used on the client or server
- Code can be
  - locally available to clients
  - stored in the GAC

Page 19: SQL Server 2005 New Features

Using UDTs with T-SQL

- Using UDTs through Transact-SQL involves nothing new

/* assuming a UDT called Point has m_x and m_y properties
CREATE TABLE point_tab (oid integer, point_col POINT)
*/
SqlConnection conn = new SqlConnection("my connect string");
SqlCommand cmd = new SqlCommand();
cmd.Connection = conn;
conn.Open();
// insert a row, converting the string '10:10' to a Point
cmd.CommandText = "insert into point_tab values(1, convert(Point, '10:10'))";
int i;
i = cmd.ExecuteNonQuery();
// update a single property of the stored Point
cmd.CommandText = "update point_tab set point_col::m_x = 15 where oid = 1";
i = cmd.ExecuteNonQuery();

Page 20: SQL Server 2005 New Features

UDTs and procedural code

-- T-SQL procedure
CREATE PROCEDURE GetPoints (@a PointCls)
AS
SELECT thepoint::m_x, thepoint::m_y
FROM point_tab
WHERE thepoint::m_x > @a::m_x
GO

DECLARE @p PointCls
SET @p = CONVERT(PointCls, '1:1')
EXEC GetPoints @p

-- .NET function
CREATE FUNCTION AddPoints (@a PointCls, @b PointCls)
RETURNS PointCls
EXTERNAL NAME Point:PointCls::AddPoints
GO

DECLARE @a PointCls, @b PointCls, @c PointCls
SET @a = CONVERT(PointCls, '100:200')
SET @b = CONVERT(PointCls, '3:4')
SET @c = dbo.AddPoints(@a, @b)
SELECT @c::m_x

Page 21: SQL Server 2005 New Features

Agenda

- XML Support in SQL Server 2005
- .NET Inside the Database
- Business Intelligence
  - OLAP
  - Data Mining

Page 22: SQL Server 2005 New Features

What Is a Data Warehouse?

- Defined in many different ways, but not rigorously.
- A decision support database that is maintained separately from the organization's operational database.
- Supports information processing by providing a solid platform of consolidated, historical data for analysis.
- "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
- Data warehousing:
  - The process of constructing and using data warehouses

Page 23: SQL Server 2005 New Features

Data Warehouse—Subject-Oriented

- Organized around major subjects, such as customer, product, sales.
- Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
- Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

Page 24: SQL Server 2005 New Features

Data Warehouse—Integrated

- Constructed by integrating multiple, heterogeneous data sources
  - relational databases, flat files, on-line transaction records
- Data cleaning and data integration techniques are applied.
  - Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
    - E.g., hotel price: currency, tax, breakfast covered, etc.
  - When data is moved to the warehouse, it is converted.

Page 25: SQL Server 2005 New Features

Data Warehouse—Time Variant

- The time horizon for the data warehouse is significantly longer than that of operational systems.
  - Operational database: current value data.
  - Data warehouse data: provides information from a historical perspective (e.g., past 5-10 years)
- Every key structure in the data warehouse
  - contains an element of time, explicitly or implicitly
  - but the key of operational data may or may not contain a "time element".

Page 26: SQL Server 2005 New Features

Data Warehouse—Non-Volatile

- A physically separate store of data transformed from the operational environment.
- Operational update of data does not occur in the data warehouse environment.
  - Does not require transaction processing, recovery, and concurrency control mechanisms
  - Requires only two operations in data accessing:
    - initial loading of data and access of data.

Page 27: SQL Server 2005 New Features

OLTP vs. OLAP

                    OLTP                            OLAP
users               clerk, IT professional          knowledge worker
function            day-to-day operations           decision support
DB design           application-oriented            subject-oriented
data                current, up-to-date;            historical, summarized,
                    detailed, flat relational;      multidimensional;
                    isolated                        integrated, consolidated
usage               repetitive                      ad-hoc
access              read/write;                     lots of scans
                    index/hash on prim. key
unit of work        short, simple transaction       complex query
# records accessed  tens                            millions
# users             thousands                       hundreds
DB size             100MB-GB                        100GB-TB
metric              transaction throughput          query throughput, response

Page 28: SQL Server 2005 New Features

Conceptual Modeling of Data Warehouses

- Modeling data warehouses: dimensions & measures
  - Star schema: a fact table in the middle connected to a set of dimension tables
  - Snowflake schema: a refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
  - Fact constellations: multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation

Page 29: SQL Server 2005 New Features

Example of Star Schema

Sales Fact Table
  keys:     time_key, item_key, branch_key, location_key
  measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time:     time_key, day, day_of_the_week, month, quarter, year
  item:     item_key, item_name, brand, type, supplier_type
  branch:   branch_key, branch_name, branch_type
  location: location_key, street, city, state_or_province, country
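A minimal sketch of querying a star schema, using Python's built-in sqlite3 module rather than SQL Server so that it runs anywhere. It assumes a cut-down version of the slide's schema (only the time and item dimensions); the table names time_dim, item_dim and sales_fact, and all sales figures, are invented for illustration.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
        CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
        CREATE TABLE sales_fact (
            time_key     INTEGER REFERENCES time_dim(time_key),
            item_key     INTEGER REFERENCES item_dim(item_key),
            units_sold   INTEGER,
            dollars_sold REAL
        );
        INSERT INTO time_dim VALUES (1, 'Q1', 2004), (2, 'Q2', 2004);
        INSERT INTO item_dim VALUES (1, 'TV', 'BrandA'), (2, 'PC', 'BrandB');
        INSERT INTO sales_fact VALUES
            (1, 1, 2, 1000.0), (1, 2, 1, 1500.0), (2, 1, 3, 1800.0), (1, 1, 1, 500.0);
    """)
    return conn


def dollars_by_quarter_and_item(conn):
    """Join the fact table to its dimensions and aggregate one measure."""
    return conn.execute("""
        SELECT t.quarter, i.item_name, SUM(s.dollars_sold)
        FROM sales_fact AS s
        JOIN time_dim AS t ON t.time_key = s.time_key
        JOIN item_dim AS i ON i.item_key = s.item_key
        GROUP BY t.quarter, i.item_name
        ORDER BY t.quarter, i.item_name
    """).fetchall()


print(dollars_by_quarter_and_item(build_star_schema()))
```

Every analytical query follows the same shape: the fact table supplies the measures, and each join to a dimension table supplies the attributes to group or filter by.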

Page 30: SQL Server 2005 New Features

Example of Snowflake Schema

Sales Fact Table
  keys:     time_key, item_key, branch_key, location_key
  measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time:     time_key, day, day_of_the_week, month, quarter, year
  item:     item_key, item_name, brand, type, supplier_key
  supplier: supplier_key, supplier_type
  branch:   branch_key, branch_name, branch_type
  location: location_key, street, city_key
  city:     city_key, city, state_or_province, country

Page 31: SQL Server 2005 New Features

Example of Fact Constellation

Sales Fact Table
  keys:     time_key, item_key, branch_key, location_key
  measures: units_sold, dollars_sold, avg_sales

Shipping Fact Table
  columns:  time_key, item_key, shipper_key, from_location, to_location, dollars_cost, units_shipped

Dimension tables
  time:     time_key, day, day_of_the_week, month, quarter, year
  item:     item_key, item_name, brand, type, supplier_type
  branch:   branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_state, country
  shipper:  shipper_key, shipper_name, location_key, shipper_type

Page 32: SQL Server 2005 New Features

Multidimensional Data

- Sales volume as a function of product, month, and region

[Figure: cube with axes Product, Region, Month]

Dimensions: Product, Location, Time

Hierarchical summarization paths:
  Product:  Industry > Category > Product
  Location: Region > Country > City > Office
  Time:     Year > Quarter > Month or Week > Day

Page 33: SQL Server 2005 New Features

A Concept Hierarchy: Dimension (location)

[Figure: concept hierarchy tree for the location dimension, by level]

  all:     all
  region:  Europe, ..., North_America
  country: Germany, ..., Spain, Canada, ..., Mexico
  city:    Frankfurt, ..., Vancouver, ..., Toronto
  office:  L. Chan, ..., M. Wind

Page 34: SQL Server 2005 New Features

A Sample Data Cube

[Figure: data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum) and Country (U.S.A, Canada, Mexico, sum); the highlighted cell is the total annual sales of TV in the U.S.A.]
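The "sum" cells of the cube can be sketched in a few lines of Python: every fact contributes to each combination of a concrete value or "sum" along the three dimensions. The sales amounts below are invented for illustration, and build_cube is a hypothetical helper, not an Analysis Services API.

```python
from collections import defaultdict
from itertools import product

ALL = "sum"  # marker for "aggregated over this dimension", as on the slide

# (product, country, quarter, sales amount); the amounts are made up.
FACTS = [
    ("TV", "U.S.A", "1Qtr", 100),
    ("TV", "U.S.A", "2Qtr", 120),
    ("TV", "Canada", "1Qtr", 40),
    ("PC", "U.S.A", "1Qtr", 300),
    ("VCR", "Mexico", "4Qtr", 25),
]


def build_cube(rows):
    """Pre-compute every cell of the cube, including all roll-up cells."""
    cube = defaultdict(int)
    for prod, country, quarter, amount in rows:
        # Each fact falls into 2 * 2 * 2 = 8 cells: per dimension, either
        # its own value or the aggregated "sum" slot.
        for cell in product((prod, ALL), (country, ALL), (quarter, ALL)):
            cube[cell] += amount
    return dict(cube)


cube = build_cube(FACTS)
print(cube[("TV", "U.S.A", ALL)])  # total annual sales of TV in the U.S.A.
print(cube[(ALL, ALL, ALL)])       # grand total
```

Storing the roll-up cells in advance is the trade-off behind MOLAP-style engines: more storage in exchange for answering summary queries by lookup.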

Page 35: SQL Server 2005 New Features

OLAP Server Architectures

- Relational OLAP (ROLAP)
  - Uses a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support missing pieces
  - Includes optimization of the DBMS backend, implementation of aggregation navigation logic, and additional tools and services
  - Greater scalability
- Multidimensional OLAP (MOLAP)
  - Array-based multidimensional storage engine (sparse matrix techniques)
  - Fast indexing to pre-computed summarized data
- Hybrid OLAP (HOLAP)
  - User flexibility, e.g., low level: relational, high level: array
- Specialized SQL servers
  - Specialized support for SQL queries over star/snowflake schemas

Page 36: SQL Server 2005 New Features

Data Warehouse Usage

- Three kinds of data warehouse applications
  - Information processing
    - supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs
  - Analytical processing
    - multidimensional analysis of data warehouse data
    - supports basic OLAP operations, slice-dice, drilling, pivoting
  - Data mining
    - knowledge discovery from hidden patterns
    - supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools.
- Differences among the three tasks

Page 37: SQL Server 2005 New Features

IT for the Past, Present and Future

- Archiving the past: storage, writing, etc.
- Awareness of the present: networking, telecom, etc.
- Predicting the future: this is where the action is!
- What is needed?
  - Data about the past and present
  - Models for how systems evolve
  - Ability to associate data with system models
  - Predict the future and develop a course of action
- Let's enumerate some applications...

Page 38: SQL Server 2005 New Features

Necessity Is the Mother of Invention

- Data explosion problem
  - Automated data collection tools and mature database technology lead to tremendous amounts of data accumulated and/or to be analyzed in databases, data warehouses, and other information repositories
- We are drowning in data, but starving for knowledge!
- Solution: data warehousing and data mining
  - Data warehousing and on-line analytical processing
  - Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Page 39: SQL Server 2005 New Features

What Is Data Mining?

- Data mining (knowledge discovery from data)
  - Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
  - Data mining: a misnomer?
- Alternative names
  - Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Page 40: SQL Server 2005 New Features

Data Mining Process

- Data mining: core of the knowledge discovery process

[Diagram: Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]

Page 41: SQL Server 2005 New Features

Complete Set of Algorithms

- Decision Trees
- Clustering
- Time Series
- Sequence Clustering
- Association
- Naïve Bayes
- Neural Net

Introduced in SQL Server 2000

Page 42: SQL Server 2005 New Features

What Is Association Mining?

- Association rule mining:
  - Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
  - Frequent pattern: pattern (set of items, sequence, etc.) that occurs frequently in a database [AIS93]
- Motivation: finding regularities in data
  - What products were often purchased together? Beer and diapers?!
  - What are the subsequent purchases after buying a PC?
  - What kinds of DNA are sensitive to this new drug?
  - Can we automatically classify web documents?
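To make "frequently purchased together" concrete, here is a small Python sketch that counts item pairs and computes the usual support and confidence measures. The five transactions are invented for illustration, and the brute-force pair counting is a simplification, not the algorithm used by SQL Server's Association model.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of items per transaction.
TRANSACTIONS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "milk"},
    {"milk", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_pairs(transactions, min_support):
    """All item pairs whose support reaches min_support (brute force)."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


print(frequent_pairs(TRANSACTIONS, min_support=0.4))
print(confidence({"beer"}, {"diapers"}, TRANSACTIONS))
```

With these baskets, beer and diapers appear together in two of five transactions, and two of the three beer baskets also contain diapers.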

Page 43: SQL Server 2005 New Features

Classification vs. Prediction

- Classification:
  - predicts categorical class labels (discrete or nominal)
  - classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
- Prediction:
  - models continuous-valued functions, i.e., predicts unknown or missing values
- Typical applications
  - credit approval
  - target marketing
  - medical diagnosis
  - treatment effectiveness analysis

Page 44: SQL Server 2005 New Features

Classification Process (1): Model Construction

Training Data

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Training Data -> Classification Algorithms -> Classifier (Model):

IF rank = 'professor' OR years > 6
THEN tenured = 'yes'

Page 45: SQL Server 2005 New Features

Classification Process (2): Use the Model in Prediction

The Classifier is first applied to the Testing Data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

It is then applied to Unseen Data: (Jeff, Professor, 4) -> Tenured?
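The two steps can be sketched in Python: the rule from the model-construction slide becomes a function, is scored against the testing data, and is then applied to the unseen record. The function names are hypothetical; the data is taken from the slides.

```python
def predict_tenured(rank, years):
    """The learned rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"


# Testing data from the slide: (name, rank, years, actual tenured label).
TESTING_DATA = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def accuracy(rows):
    """Share of test rows where the rule agrees with the recorded label."""
    hits = sum(predict_tenured(rank, years) == label
               for _name, rank, years, label in rows)
    return hits / len(rows)


print(accuracy(TESTING_DATA))            # the rule misclassifies Merlisa
print(predict_tenured("Professor", 4))   # unseen record: Jeff
```

Merlisa has 7 years but is not tenured, so the rule gets three of the four test rows right; this is why a model is evaluated on data it was not built from before it is trusted on unseen records.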

Page 46: SQL Server 2005 New Features

Training Dataset

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no

This follows an example from Quinlan's ID3.
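ID3 picks the attribute to split on by information gain. The sketch below computes the gain of each attribute on the training dataset above; it covers only this first split decision, not the full tree-building algorithm, and the age range is written as "31..40" in plain ASCII.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ("age", "income", "student", "credit_rating")

# Training rows from the slide; the last field is the buys_computer label.
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the labels minus the weighted entropy after the split."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for value in {r[attr_index] for r in rows}:
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


GAINS = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
for name in sorted(GAINS, key=GAINS.get, reverse=True):
    print(f"{name}: {GAINS[name]:.3f}")
```

The attribute with the highest gain becomes the root test of the tree.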

Page 47: SQL Server 2005 New Features

Output: A Decision Tree for “buys_computer”

age?
  <=30:   student?
            no:  no
            yes: yes
  31…40:  yes
  >40:    credit rating?
            excellent: no
            fair:      yes
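Read as code, the tree is a pair of nested tests. This Python sketch writes it as a function and checks it against the 14 training rows from the previous slide; the function name is hypothetical, income is omitted because the tree never tests it, and the age range is written as "31..40" in plain ASCII.

```python
def buys_computer(age, student, credit_rating):
    """Walk the decision tree from the root (age) down to a leaf."""
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    if age == "31..40":
        return "yes"
    # Remaining branch: age > 40, decided by credit rating.
    return "yes" if credit_rating == "fair" else "no"


# (age, student, credit_rating, buys_computer) for the 14 training rows.
TRAINING_ROWS = [
    ("<=30", "no", "fair", "no"),
    ("<=30", "no", "excellent", "no"),
    ("31..40", "no", "fair", "yes"),
    (">40", "no", "fair", "yes"),
    (">40", "yes", "fair", "yes"),
    (">40", "yes", "excellent", "no"),
    ("31..40", "yes", "excellent", "yes"),
    ("<=30", "no", "fair", "no"),
    ("<=30", "yes", "fair", "yes"),
    (">40", "yes", "fair", "yes"),
    ("<=30", "yes", "excellent", "yes"),
    ("31..40", "no", "excellent", "yes"),
    ("31..40", "yes", "fair", "yes"),
    (">40", "no", "excellent", "no"),
]

correct = sum(buys_computer(age, student, rating) == label
              for age, student, rating, label in TRAINING_ROWS)
print(f"{correct} of {len(TRAINING_ROWS)} training rows classified correctly")
```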