Storage Systems CSE 598D, Spring 2007 Lecture 1: Introduction and Overview January 25, 2007.


Page 2:

How this course will work

• Class meetings twice a week
  – Tue, Thu: 5.30 - 6.45 pm, 223B

• Lectures by me in most classes
• Some student presentations
  – To be determined as the course progresses

• Everyone should participate in discussions
  – Part of your grade for participation!
  – Scribe notes to record lectures and discussions

• 2-3 assignments
  – May involve some simple system building: details to be decided
  – Some written homeworks

• Online resources
  – Class URL via my Web page
  – Slides/scribe notes/assignments on Angel
  – Please make sure your Angel email works

Page 3:

How this course will work

• Background expected
  – Operating systems (411-level)
    • Basic knowledge of file systems, I/O subsystem, DMA, device drivers, …
  – Distributed systems
    • Consistency semantics, replication, caching, synchronization, …
  – Algorithms and data structures (undergraduate-level)
    • Analysis of algorithms, basic data structures

• Will cover background material whenever needed
  – Your feedback important in deciding what to cover

Page 4:

How this course will work

• No text-book
  – I will use some chapters from a set of books
    • If needed, photocopies of these will be made available to you

• Syllabus consists of material presented in class
  – Most of it based on research papers made available on the course page
    • Not up yet but will be soon
    • Additional reading material: for background or to delve deeper

• What you need to do
  – Read assigned papers BEFORE each class
  – During the class
    • Ask questions, express your opinions, argue!

• Goal: Learn about storage systems
  – Also learn
    • How to read a research paper?
    • How to write a good systems paper?
  – What separates good (systems) research from bad?

Page 5:

How this course will work

• Grading
  – Scribe notes: 10%
    • Detailed notes that one can go back to and find everything that was presented and discussed in the class
      – And that you can use for revision before the exam!
  – Participation in class: 10%
  – Mid-term exam: 20%
  – Presentation: 10%
  – Assignments (2-3): 20%
  – Survey or Project: 30%

Page 6:

How this course will work

• Survey
  – A 10-15 page comprehensive exploration/synthesis of an area related to storage systems, due at the end of the semester

• Project
  – Groups of up to 2 students
  – Identify a problem and motivate the need to solve it
    • Show convincingly where existing research falls short
  – Develop and evaluate your solution
  – Present it in a paper-style write-up at the end of the semester

Page 7:

• Today
  – Some background/history on storage systems
  – Overview of course content
    • A superset of topics we will study

Page 8:

Introduction

Page 9:

Why Applications Need Storage

• Memory is
  – Volatile: Durability is needed
  – Not enough: High Capacity is needed
  – Not easy to share/move: Portability is needed
  – Expensive

• Non-volatile, cheap, long-lasting, reliable, abundant storage is needed for numerous applications
  – Personal/individual applications
  – Scientific applications
  – Enterprise applications
  – Internet scale applications
  – Emerging sensor networks, highly distributed systems such as some P2P systems

Page 10:

Personal Applications

• Email, Contacts, Schedules, …
• Financial data, personal files, …
• Media files
• Gaming

Page 11:

Scientific Applications

• Manipulate large data sets: Either explicitly (files) or implicitly (VM).

Sanger Institute Sequencing facility to add 100 TB each yr.

CERN Particle Collider

NASA EOSDIS

Page 12:

Enterprise Applications

• File and Email servers
• OLTP
• OLAP
• Other Database applications
• SAP
• Financial workloads
• …

Page 13:

Internet Scale Applications

Data Grids

Page 14:

Sensor Networks

Page 15:

IBM 305 RAMAC - 1956
Random Access Method of Accounting and Control

• 5 MB capacity, 50 disks each 24” diameter, 2000 bits/sq-inch density
• First computer with magnetic hard disk
  – Replaced the “magnetic drum”
• Could store roughly 2000 pages of text!
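
The capacity figures above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, assuming decimal megabytes and one byte per character (neither is stated on the slide); the variable names are illustrative.

```python
# Back-of-the-envelope check of the RAMAC figures quoted on the slide.
# ASSUMPTIONS (not from the slide): "MB" means 10**6 bytes, one byte per character.
RAMAC_CAPACITY_BYTES = 5 * 10**6   # "5 MB capacity"
PAGES_OF_TEXT = 2000               # "roughly 2000 pages of text"
NUM_DISKS = 50                     # "50 disks"

# Page size implied by the two slide figures taken together.
bytes_per_page = RAMAC_CAPACITY_BYTES / PAGES_OF_TEXT

# Average share of the total capacity held by each disk.
bytes_per_disk = RAMAC_CAPACITY_BYTES / NUM_DISKS

print(f"Implied page size: {bytes_per_page:.0f} characters")
print(f"Average per disk:  {bytes_per_disk / 1000:.0f} KB")
```

Under these assumptions the slide's numbers imply about 2500 characters per page and about 100 KB per 24-inch disk.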

Page 16:

Seagate Savvio 10K.1 - 2004

• 10K RPM, 73.4 GBytes
• Can read and write complete works of Shakespeare 15 times each second!
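
To turn the Shakespeare claim into a transfer rate, one needs a size for the complete works, which the slide does not give. The sketch below assumes roughly 5 MB of plain text; that figure is an assumption made here for illustration, so the result is only an order-of-magnitude estimate.

```python
# Rough throughput implied by "Shakespeare 15 times each second".
SHAKESPEARE_BYTES = 5 * 10**6   # ASSUMPTION: ~5 MB of plain text; not a figure from the slide
COPIES_PER_SECOND = 15          # from the slide
DRIVE_BYTES = 73.4 * 10**9      # "73.4 GBytes", assuming decimal gigabytes

implied_bytes_per_second = SHAKESPEARE_BYTES * COPIES_PER_SECOND
implied_mb_per_second = implied_bytes_per_second / 10**6

# Time to stream the whole drive once at that rate.
full_scan_seconds = DRIVE_BYTES / implied_bytes_per_second

print(f"Implied transfer rate: about {implied_mb_per_second:.0f} MB/s")
print(f"Full-drive scan: about {full_scan_seconds / 60:.0f} minutes")
```

Under the 5 MB assumption this works out to about 75 MB/s, or roughly 16 minutes to read the full 73.4 GB once.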

Page 17:

Seagate Savvio 15K - 2007

• 15K RPM, 73.4 GBytes
• World’s fastest disk?
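
One concrete effect of the jump from 10K to 15K RPM is on rotational latency. On average, a request waits half a revolution for the target sector to come under the head. The sketch below computes that average for both spindle speeds; the function name is illustrative, and seek and transfer times are ignored.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution, in milliseconds."""
    seconds_per_revolution = 60.0 / rpm
    return 0.5 * seconds_per_revolution * 1000.0

# Spindle speeds of the two Savvio drives mentioned in the slides.
for rpm in (10_000, 15_000):
    print(f"{rpm} RPM -> {avg_rotational_latency_ms(rpm):.1f} ms")
```

This gives about 3 ms at 10K RPM and about 2 ms at 15K RPM.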

Page 20:

Storage Devices/Hardware

RAID Arrays

Storage Area Networks

Tape Archives

Page 21:

Overview of Course Content

Page 22:

Overview of Course

• What goes on inside a disk?
  – Hardware
  – Modeling the disk
  – Performance optimizations
    • Disk scheduling
    • Rearranging data blocks

• How do you improve bandwidth to/from disks?
  – RAID arrays
  – Reduce data transferred from disks (Active Disks)
  – Storage Area Networks to allow concurrent transfers to/from several hosts
    • Shared Storage Model
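
The slide only names RAID arrays as a way to improve bandwidth. As one illustration of the idea, here is a minimal sketch of round-robin block striping (RAID-0 style): consecutive logical blocks land on different disks, so a large sequential request can be served by several disks in parallel. The function names and the four-disk array are hypothetical, and real arrays stripe in larger units and often add redundancy.

```python
from typing import List, Tuple

def stripe_location(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    return logical_block % num_disks, logical_block // num_disks

def disks_touched(first_block: int, count: int, num_disks: int) -> List[int]:
    """Distinct disks that serve a sequential request of `count` blocks."""
    return sorted({stripe_location(b, num_disks)[0]
                   for b in range(first_block, first_block + count)})

NUM_DISKS = 4  # hypothetical array size, chosen for illustration

# Where the first eight logical blocks end up.
layout = [stripe_location(b, NUM_DISKS) for b in range(8)]
print(layout)

# A 3-block sequential read starting at block 2 spans disks 2, 3 and 0.
print(disks_touched(first_block=2, count=3, num_disks=NUM_DISKS))
```

With this mapping, any request of at least `NUM_DISKS` consecutive blocks touches every disk, which is where the aggregate bandwidth gain comes from.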

Page 23:

• How can software take advantage of these enhancements?
  – Review of the OS I/O subsystem
  – How sys-admins manage storage
  – File Systems for NAS/SAN
  – Caching and Pre-fetching

• Theory of storage
  – Which problems are hard?
  – Important data structures

• With shared storage, and a very complicated storage system, how do we manage this hierarchy?
  – Storage Provisioning
  – QoS Control/Virtualization
  – Security
  – Case-studies of enterprise storage systems (e.g., EMC, Veritas)

Page 24:

• Requirements are becoming more stringent - we need to guarantee availability, and store data for a long time (archival storage). How do we achieve this?
  – Dependability/Availability issues
  – Disaster management
  – Data lifetime

• Power and thermal management of storage systems

• Storage in highly distributed systems
  – Storage in P2P systems
  – Sensor storage
  – Grid-like infrastructure based storage: E.g., OceanStore

• Storage in search, information retrieval
  – Google File System

• Are disks going to be the norm in the future?
  – Future of magnetic storage
  – MEMS
  – Flash storage
    • Windows Vista for laptops

Page 25:

• Part of the material will be from these books
  – “Storage Networks Explained” (Wiley) by Troppens, Erkens, and Muller
  – “The Holy Grail of Storage Management” by Toigo
  – “Storage Area Network Essentials” (Wiley) by Barker and Massiglia

Page 26:

Next time

• Hard disk
• Certain aspects of I/O subsystem
  – Spanning hardware and OS

Page 27:

I/O System View

[Figure: block diagram of the I/O path from CPU to disk. Labels recoverable from the slide:]
• CPU with iL1, dL1, and L2 caches
• Main Memory on the Memory Bus (e.g. PC133)
• Disk Controller on the I/O Bus (e.g. PCI)
• Software Stack: Appln., File System, Buffer Manager, Device Driver
• Controller (ASIC)
• Device Firmware, Cache, DMA engine
• Disk interconnect (e.g. SCSI)
• Platters, Actuator, Motors, Electronics