Exploring Database Archival Strategies With emphasis on “Why” and “When” before delving into...


Exploring Database Archival Strategies

With emphasis on “Why” and “When” before delving into “How to” options

By: Ben Aminnia

President, L.A. SQL Server Professionals Group www.sql.la
Database Architect, Pointer Corporation www.pointercorp.com

Objectives

• What “is” and what “is not” covered in this presentation
• The first question we should ask ourselves is not “How to archive” but rather …
• Why do we need to archive and what happens if we don’t?
• And then …

Objectives

• The “How to” part will encompass multiple questions …
• How do we plan and design the archive from an architect’s perspective?
• How do we look for alternative approaches?
• How do we choose among those alternative approaches?
• And more importantly …
• Remember that we’re not alone in the decision process.

Management Concerns

• Before the archive process starts, management must approve the approach.
• How do we defend and justify our selected approach?
  • When talking to IT management
  • When talking to non-IT management: CEOs, CFOs
• How do we put ourselves in their shoes?
• How do we build a decision matrix to compare various alternative approaches?
• What criteria columns should we put on the decision matrix?
• What’s the cost / benefit summary of the different alternatives on our decision matrix?

Management Concerns

• How do we measure cost / benefit?
  • One-time / Initial Costs
  • Recurring / Annual Costs
  • What about measuring benefits? Have you thought about this?
• Do we have a Service Level Agreement (SLA)?

Management Concerns

But the challenge of seeking management approval may go well beyond that.

They may also ask:

• How did we get here?
• Why didn’t we do this and that earlier?
• Why can’t we save nothing and just recreate?
• Trying to solve one problem while creating other problems …
  • L.A. Traffic is Bad
  • Make all highways one-way to Big Bear Lake
  • That will solve L.A.’s traffic problem
  • Let the mayor of Big Bear Lake worry about their traffic problem; that’s not my problem!

Why Do We Archive?

• Increasing Cost of Storage / Hardware
• Performance Degradation / Response Time
• Regulatory / Government Requirements
• Application Requirements
  • Must show current year only
  • Data transfer to disconnected users
  • Part of a bigger picture, beyond the scope of our role in the project
• It’s part of SLA!

When Do We Archive?

• Once a year
  • During spring cleaning season
• When something breaks unexpectedly and then everyone wakes up and says “Oops! We forgot to archive.”
• When we have budget
• When we have nothing else to do
• When we are told to get it done by Monday

When Do We Archive?

• When DB size approaches a predefined threshold …
  • 1 GB
  • 10 GB
  • 100 GB
  • 1 TB
• The important point is to understand the issue and to have a strategy for addressing it.
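A size threshold only helps if something checks it. The following is a minimal sketch of such a check, not part of the original slides: the function name `archival_status` and the 80% warning ratio are assumptions chosen for illustration, and the thresholds are whatever value from the list above fits your system.

```python
# Hypothetical helper; the name and the 0.8 warning ratio are assumptions.
GB = 1024 ** 3

def archival_status(db_size_bytes: int, threshold_bytes: int, warn_ratio: float = 0.8) -> str:
    """Classify the current database size against a predefined threshold."""
    if db_size_bytes >= threshold_bytes:
        return "archive now"
    if db_size_bytes >= warn_ratio * threshold_bytes:
        return "plan archival"
    return "ok"

# Example with a 10 GB threshold (one of the values listed on the slide).
for size_gb in (5, 9, 12):
    print(size_gb, "GB:", archival_status(size_gb * GB, 10 * GB))
```

The middle state is the useful one: it gives time to plan before the threshold is actually crossed.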

Time for a Quick De-Tour!

Using ASP.NET, XSLT, and XML to Take SQL Server to a New Height

SQL Server as a Document Repository

Part 4 - The Database Archival Challenge

By: Ben Aminnia

President, L.A. SQL Server Professionals Group www.sql.la
Database Architect, Pointer Corporation www.pointercorp.com

Agenda

• Part 1 Review Summary: Background and Overview of the VIP System Architecture
• Part 2 Review Summary: Generating Reports & Graphs with SSRS and MS-Chart
• Part 3 Review Summary: The Road Ahead – Using SQL Server as a Document Repository
• Part 4: The Database Archival Challenge!
• Questions and Answers

Architectural Notes and Challenge for the DBA

• Each record is about 100 KB in size;
• So it takes about ten thousand records to reach one GB in DB size;
• There’s no physical deletion; deleted records are only marked for deletion (with [isdeleted]=1);
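To see why these two notes matter together, here is a small sketch using Python's built-in `sqlite3` as a stand-in for SQL Server. The table name `letters` and the `body` column are assumptions; only the `isdeleted` flag and the 100 KB record size come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE letters ("
    " id INTEGER PRIMARY KEY,"
    " body TEXT,"
    " isdeleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO letters (body) VALUES (?)", [("<letter/>",)] * 3)

def soft_delete(conn, letter_id):
    """Mark a record as deleted without removing the row."""
    conn.execute("UPDATE letters SET isdeleted = 1 WHERE id = ?", (letter_id,))

soft_delete(conn, 2)
total = conn.execute("SELECT COUNT(*) FROM letters").fetchone()[0]
visible = conn.execute("SELECT COUNT(*) FROM letters WHERE isdeleted = 0").fetchone()[0]
print("rows stored:", total, "rows visible:", visible)

# Rough sizing: at about 100 KB per record, how many records fit in 1 GB?
RECORD_KB = 100
records_per_gb = (1024 * 1024) // RECORD_KB
print("records per GB:", records_per_gb)
```

A soft-deleted row still occupies its full space, so deletions never shrink the database; at roughly ten thousand records per GB, growth is driven by inserts alone.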

Some Fundamental Questions

• There are many questions about the How, What, and Why of the database archival process. Who’s on 1st?
• Should we plan archival when we’re running out of space? … or performance has gone down? … or some other company policy mandates it?
• OR … as the famous database architect, Julie Andrews, sings in The Sound of Music …
• LET’S START AT THE VERY BEGINNING!

The First Question: HOW?, WHAT?, or WHY?

There are six different possible orders to ask these questions …

I think the answer is:

• WHY?
• WHAT?
• HOW?

Why?

• Increasing cost of storage / hardware
• Performance Degradation / Response Time
• Application requirements (e.g. must show current year only)

What and Where to?

• The whole record is moved to another location and deleted from the main location
• Only part of the record is moved
• Destination could be …
  • To another DB
  • To the file system
  • No longer online

How?

• How to archive?
• How to retrieve the archived record?
• What are the possible alternatives from an architectural perspective?

Four Alternative Ways …

Method 1: Store in archived location (e.g. on the network file system) from the beginning.

• There’s nothing to archive periodically or at a later time.
• This used to be the most common way for document archival, before the XML technology which we started to use in the VIP Letters system.
• I stored over 50,000 documents from one of my applications this way.
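As an illustration of Method 1, the sketch below writes each document straight to a file location and keeps only its path in the database. It uses `sqlite3` and a temporary directory as stand-ins for SQL Server and a network share; the table `documents`, the column `file_path`, and the function `store_document` are hypothetical names, not taken from the author's application.

```python
import sqlite3
import tempfile
from pathlib import Path

def store_document(conn, archive_dir: Path, doc_id: int, content: str) -> Path:
    """Write the document to the archive location and record only its path."""
    path = archive_dir / f"{doc_id}.txt"
    path.write_text(content, encoding="utf-8")
    conn.execute(
        "INSERT INTO documents (id, file_path) VALUES (?, ?)", (doc_id, str(path))
    )
    return path

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, file_path TEXT NOT NULL)")

archive_dir = Path(tempfile.mkdtemp())  # stand-in for a network share
saved = store_document(conn, archive_dir, 1, "Dear customer, ...")

stored_path = conn.execute("SELECT file_path FROM documents WHERE id = 1").fetchone()[0]
print(Path(stored_path).read_text(encoding="utf-8"))
```

Because the document body never enters the database, there is nothing to move later; the trade-off is that the database and the file location must be kept in step.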

Four Alternative Ways …

Method 2: Periodic archive of the whole record to a different database.

• Move last year’s data to a different database with identical format.
• Delete the entire archived record from the main database.
• Main database remains small and portable.
  • Helps with response time.
  • Also helps with portability (e.g. when laptop users need to have a small version of database on their local drive, while disconnected from corporate network or the internet).
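The move-then-delete step of Method 2 can be sketched as follows. This is an assumption-laden illustration, not the presenter's implementation: it uses `sqlite3` with an attached in-memory database in place of a second SQL Server database, and the table `letters`, its columns, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS archive")  # the "different database"

# Identical format in both databases, as the slide requires.
SCHEMA = "(id INTEGER PRIMARY KEY, created TEXT NOT NULL, body TEXT)"
conn.execute(f"CREATE TABLE main.letters {SCHEMA}")
conn.execute(f"CREATE TABLE archive.letters {SCHEMA}")
conn.executemany(
    "INSERT INTO main.letters VALUES (?, ?, ?)",
    [
        (1, "2004-03-15", "old letter"),
        (2, "2004-11-02", "old letter"),
        (3, "2005-02-20", "current letter"),
    ],
)
conn.commit()

def archive_before(conn, cutoff: str) -> int:
    """Copy rows older than cutoff to the archive, then delete them from main."""
    with conn:  # commit both statements together, roll back if either fails
        conn.execute(
            "INSERT INTO archive.letters SELECT * FROM main.letters WHERE created < ?",
            (cutoff,),
        )
        deleted = conn.execute(
            "DELETE FROM main.letters WHERE created < ?", (cutoff,)
        ).rowcount
    return deleted

moved = archive_before(conn, "2005-01-01")
main_count = conn.execute("SELECT COUNT(*) FROM main.letters").fetchone()[0]
archive_count = conn.execute("SELECT COUNT(*) FROM archive.letters").fetchone()[0]
print(moved, main_count, archive_count)
```

Running the copy and the delete in one transaction is the important design choice: a failure between the two steps should not leave a record in both places or in neither.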

Four Alternative Ways …

Method 3: Partial archive of the old records …

• This is the case of our VIP Letters architecture.
• Each record is about 100 KB.
• 10,000 records are almost 1.0 GB.
• Most of it is the XML column data which holds the saved letter.
• The archive process will move the XML part of it to the network share on the file system …
• Keeping the other data columns in the main DB.
• You then set the “IsArchived” column to 1.
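The steps above can be sketched as a single function. Only the `IsArchived` column name comes from the slide; the table `Letters`, the columns `Recipient` and `LetterXml`, the file naming, and the use of `sqlite3` and a temporary directory in place of SQL Server and a network share are all assumptions for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path

def archive_letter_xml(conn, share: Path, letter_id: int) -> Path:
    """Move the XML of one letter to the file share and flag the row as archived."""
    row = conn.execute(
        "SELECT LetterXml FROM Letters WHERE Id = ? AND IsArchived = 0", (letter_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"letter {letter_id} not found or already archived")
    path = share / f"letter_{letter_id}.xml"
    path.write_text(row[0], encoding="utf-8")  # write the file before touching the row
    with conn:
        conn.execute(
            "UPDATE Letters SET LetterXml = NULL, IsArchived = 1 WHERE Id = ?",
            (letter_id,),
        )
    return path

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Letters ("
    " Id INTEGER PRIMARY KEY,"
    " Recipient TEXT,"
    " LetterXml TEXT,"
    " IsArchived INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "INSERT INTO Letters (Id, Recipient, LetterXml) "
    "VALUES (1, 'A. Customer', '<letter>Hello</letter>')"
)
conn.commit()

share = Path(tempfile.mkdtemp())  # stand-in for the network share
path = archive_letter_xml(conn, share, 1)
row = conn.execute(
    "SELECT Recipient, LetterXml, IsArchived FROM Letters WHERE Id = 1"
).fetchone()
print(row, path.name)
```

The file is written first and the column emptied second, so an interruption leaves a harmless extra file rather than a lost letter.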

Four Alternative Ways …

Method 4: No longer online …

• Very common practice when a government or regulatory agency mandates only x number of years to keep records online.
• The archived records are then scanned and stored offline (on a tape or in paper form).

Back to Our Discussion Here

In Summary

• Archival is done in Phase 4
• And that’s when it should be done
• Nobody said archival should be done in Phase 1
• But we should have it in sight – on the horizon – from the beginning
• Archival planning / implementation should not come as a surprise!
• Example: DB and Website Size Tracking (before the archival time)
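One way to keep archival "on the horizon" is to project when the tracked size will reach the threshold. The sketch below is a simple linear estimate under stated assumptions: the function `days_until_threshold` is hypothetical, and the sample dates and sizes are made-up illustration data, not measurements from the presentation.

```python
from datetime import date

def days_until_threshold(samples, threshold_gb):
    """Linear estimate from the first and last (date, size_gb) samples.

    Returns None when no growth was observed, so nothing can be projected.
    """
    (first_day, first_size), (last_day, last_size) = samples[0], samples[-1]
    elapsed_days = (last_day - first_day).days
    if elapsed_days <= 0 or last_size <= first_size:
        return None
    growth_per_day = (last_size - first_size) / elapsed_days
    return max(0.0, (threshold_gb - last_size) / growth_per_day)

# Made-up tracking data: size in GB on three dates.
samples = [
    (date(2009, 1, 1), 4.0),
    (date(2009, 1, 31), 5.5),
    (date(2009, 3, 2), 7.0),
]
remaining = days_until_threshold(samples, threshold_gb=10.0)
print("estimated days until threshold:", remaining)
```

Real growth is rarely linear, so a figure like this is a planning aid to be recomputed as new samples arrive, not a deadline.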

Four Methods to Archive

Again, from my other presentation, we looked into four methods to archive:

• Method 1: Store in archived location (e.g. on the network file system) from the beginning.
• Method 2: Periodic archive of the whole record to a different database.
• Method 3: Partial archive of the old records
• Method 4: No longer online

A Closer Look at Method 3

Method 3: Partial archive of the old records

• Save a 2nd copy of the document on the file system and then delete it from the XML column of the database
• What tracking columns do we add to the main Document Archive table?
• How / when to copy document(s) to the file system …
  • During the original creation
  • Later; one-by-one; on demand
  • Later; in batches (e.g. older than 1/1/2005)
• How to retrieve it back from the file system
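The slide leaves the tracking columns and the retrieval path as open questions. One possible answer is sketched below, with every name an assumption: a flag `IsArchived`, a path column `ArchivePath`, and a timestamp `ArchivedOn` on a hypothetical `Letters` table, using `sqlite3` and a temporary directory as stand-ins for SQL Server and the file share.

```python
import sqlite3
import tempfile
from pathlib import Path

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Letters ("
    " Id INTEGER PRIMARY KEY,"
    " LetterXml TEXT,"                        # emptied once archived
    " IsArchived INTEGER NOT NULL DEFAULT 0,"  # tracking column (assumed)
    " ArchivePath TEXT,"                       # tracking column (assumed)
    " ArchivedOn TEXT)"                        # tracking column (assumed)
)

def get_letter_xml(conn, letter_id: int) -> str:
    """Return the letter XML from the database or, if archived, from the file system."""
    row = conn.execute(
        "SELECT LetterXml, IsArchived, ArchivePath FROM Letters WHERE Id = ?",
        (letter_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"letter {letter_id} not found")
    xml, is_archived, archive_path = row
    if not is_archived:
        return xml
    return Path(archive_path).read_text(encoding="utf-8")

# One live letter and one that was archived earlier.
share = Path(tempfile.mkdtemp())
archived_file = share / "letter_2.xml"
archived_file.write_text("<letter>Old</letter>", encoding="utf-8")
conn.execute("INSERT INTO Letters (Id, LetterXml) VALUES (1, '<letter>New</letter>')")
conn.execute(
    "INSERT INTO Letters (Id, IsArchived, ArchivePath, ArchivedOn) "
    "VALUES (2, 1, ?, '2005-01-01')",
    (str(archived_file),),
)

print(get_letter_xml(conn, 1))
print(get_letter_xml(conn, 2))
```

With the path stored on the row, the caller does not need to know whether a letter has been archived; the same call serves both cases.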

A Closer Look at Method 4

Method 4: No longer online

• That is, neither in the XML column nor on the file system
• How do we recreate it later when needed?
• Scan the paper copy?
• I don’t think so!
• Regenerate using the original letter’s parameter values, which are still in the database?
• What if the original template (XSLT) has changed, and the recreated letter no longer looks like the original, or the regeneration fails?
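The template-change risk raised above can at least be detected. The sketch below shows one possible safeguard that is not described in the presentation: store a fingerprint of the XSLT template when the letter is first generated, and compare it before regenerating. The function names and the sample template strings are hypothetical.

```python
import hashlib

def template_fingerprint(xslt_text: str) -> str:
    """SHA-256 digest of the template text, to be stored alongside the letter's parameters."""
    return hashlib.sha256(xslt_text.encode("utf-8")).hexdigest()

def can_regenerate_faithfully(stored_fingerprint: str, current_xslt: str) -> bool:
    """True only if the current template is byte-for-byte the one used originally."""
    return stored_fingerprint == template_fingerprint(current_xslt)

original_template = "<xsl:template match='/'>Dear <xsl:value-of select='name'/></xsl:template>"
revised_template = original_template.replace("Dear", "Hello")

stored = template_fingerprint(original_template)  # saved at original creation time
print(can_regenerate_faithfully(stored, original_template))
print(can_regenerate_faithfully(stored, revised_template))
```

A mismatch does not fix anything by itself; it only tells you that a regenerated letter may differ from what was originally sent, which is the case for keeping old template versions or the original document.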

Beyond Four Methods

• Method 5: Totally move from primary database to another (non-DB) medium (e.g. network share / tape)
  • More common in legacy systems; not one of my options.
• Method 6: Don’t move anything; just set the archived flag and use a VIEW to filter out archived records.
  • Makes sense if the objective is “visibility” and not “space” or “response time” or “data transmission between network and local for disconnected mode”
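Method 6 is small enough to show in full. The sketch below uses `sqlite3` in place of SQL Server; the table `Letters`, the view `ActiveLetters`, and the sample rows are hypothetical names and data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Letters ("
    " Id INTEGER PRIMARY KEY,"
    " Subject TEXT,"
    " IsArchived INTEGER NOT NULL DEFAULT 0)"
)
# The application queries the view, so archived rows simply disappear from sight.
conn.execute(
    "CREATE VIEW ActiveLetters AS "
    "SELECT Id, Subject FROM Letters WHERE IsArchived = 0"
)
conn.executemany(
    "INSERT INTO Letters (Id, Subject) VALUES (?, ?)",
    [(1, "Welcome"), (2, "Renewal"), (3, "Reminder")],
)

conn.execute("UPDATE Letters SET IsArchived = 1 WHERE Id = 2")  # "archive" one letter

active_ids = [r[0] for r in conn.execute("SELECT Id FROM ActiveLetters ORDER BY Id")]
stored_rows = conn.execute("SELECT COUNT(*) FROM Letters").fetchone()[0]
print(active_ids, stored_rows)
```

All three rows are still stored, which is why this approach addresses visibility only and does nothing for space or response time.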

Beyond Four Methods

• Method 7: Use “The Cloud”
  • A viable alternative for space limitations in a hosted environment or cost considerations
  • It’s really NOT an archival alternative.
    • Gave it a shot
    • Asked for a compatibility test
    • Nice presentation, but not compatible!
    • Analogy with a personal power-generator vs. electrical outlet connected to DWP …

The “Cloud” - Before

My Personal Power Generator

The “Cloud” - After

X

The “Cloud” - After

X

What if I need a 150 V generator?

Final Thoughts
Before Meeting with Management

• Be Prepared to Answer Questions
• Which functions will NOT work on archived records? (e.g. Full-Text Search)
• Have a Decision Matrix, showing all options with pros and cons of each
• User Interface to Retrieve an Archived Record
  • From the main application
  • From a secondary application solely for the purpose of archive retrieval
  • By sending a request to a service application or email to a designated contact
• Turnaround time and other pros and cons for each of the above approaches

The Solution Matrix

Solution alternatives versus problems solved:

| Solution Alternatives | Cost of Storage | Performance / Response Time | Application Requirements |
|---|---|---|---|
| 1. Store in archive location from beginning | ? | ? | ? |
| 2. Move from primary to history DB | √ | √ | √ |
| 3. Partially move; e.g. keep the record but empty the XML column | √ | √ | X |
| 4. Just delete the XML column without moving anything; rebuild (based on most recent template) if necessary | √ | √ | X |
| 5. Move from primary table to another (non-DB) medium (e.g. network / tape) | √ | √ | √ |
| 6. Don’t move anything; use filters | X | X | √ |
| 7. Use “the cloud” | √ | X | X |

The Solution Matrix
Final Thoughts

• Ideally, each solution alternative on the matrix needs the following:
  • Development Costs
  • Infrastructure Costs
  • Maintenance Costs
  • Which problem(s) are being addressed
  • Which problem(s) are NOT being addressed
  • What new problems might be introduced
• Eventually, as the IT architects, we will be responsible for the outcome!
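A decision matrix of this kind is easy to hold as data and query. The sketch below encodes the √ / X / ? marks from the solution matrix in this presentation; the class and function names are hypothetical, the reading of "?" as "not yet evaluated" is an assumption, and the cost fields are left empty because the slides give no figures.

```python
from dataclasses import dataclass
from typing import Optional

YES, NO, OPEN = "√", "X", "?"  # "?" is treated here as not yet evaluated (assumption)

@dataclass
class Alternative:
    name: str
    cost_of_storage: str
    performance: str
    application_requirements: str
    # Cost columns named on the slide; no values are given there, so they stay unset.
    development_cost: Optional[float] = None
    infrastructure_cost: Optional[float] = None
    maintenance_cost: Optional[float] = None

MATRIX = [
    Alternative("1. Store in archive location from beginning", OPEN, OPEN, OPEN),
    Alternative("2. Move from primary to history DB", YES, YES, YES),
    Alternative("3. Partially move; keep the record but empty the XML column", YES, YES, NO),
    Alternative("4. Delete the XML column; rebuild if necessary", YES, YES, NO),
    Alternative("5. Move from primary table to another (non-DB) medium", YES, YES, YES),
    Alternative("6. Don't move anything; use filters", NO, NO, YES),
    Alternative("7. Use the cloud", YES, NO, NO),
]

def alternatives_solving(matrix, *criteria):
    """Names of the alternatives marked as solving every listed criterion."""
    return [a.name for a in matrix if all(getattr(a, c) == YES for c in criteria)]

solves_all = alternatives_solving(
    MATRIX, "cost_of_storage", "performance", "application_requirements"
)
solves_app = alternatives_solving(MATRIX, "application_requirements")
print(solves_all)
print(solves_app)
```

Once real cost estimates are filled in, the same structure can be sorted or filtered by cost, which is the comparison management is likely to ask for.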

Questions and Answers

Contact Information

• Emails: [email protected] [email protected]
• Websites: www.sql.la www.pointercorp.com www.vipletters.com

Thank You!