
Computer Systems: A Programmer's Perspective

Randal E. Bryant, Carnegie Mellon University

David R. O'Hallaron, Carnegie Mellon University and Intel Labs

    Prentice Hall

    Boston Columbus Indianapolis New York San Francisco Upper Saddle River

    Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto

    Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo

Editorial Director: Marcia Horton
Editor-in-Chief: Michael Hirsch
Acquisitions Editor: Matt Goldstein
Editorial Assistant: Chelsea Bell
Director of Marketing: Margaret Waples
Marketing Coordinator: Kathryn Ferranti
Managing Editor: Jeff Holcomb
Senior Manufacturing Buyer: Carol Melville
Art Director: Linda Knowles
Cover Designer: Elena Sidorova
Image Interior Permission Coordinator: Richard Rodrigues
Cover Art: Randal E. Bryant and David R. O'Hallaron
Media Producer: Katelyn Boller
Project Management and Interior Design: Paul C. Anagnostopoulos, Windfall Software
Composition: Joe Snowden, Coventry Composition
Printer/Binder: Edwards Brothers
Cover Printer: Lehigh-Phoenix Color/Hagerstown

Copyright © 2011, 2003 by Randal E. Bryant and David R. O'Hallaron. All rights reserved. Manufactured in the United States of America. This publication is protected by Copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission(s) to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, 501 Boylston Street, Suite 900, Boston, Massachusetts 02116.

Many of the designations by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in initial caps or all caps.

    Library of Congress Cataloging-in-Publication Data

Bryant, Randal.
  Computer systems : a programmer's perspective / Randal E. Bryant, David R.
O'Hallaron. -- 2nd ed.
  p. cm.
  Includes bibliographical references and index.
  ISBN-13: 978-0-13-610804-7 (alk. paper)
  ISBN-10: 0-13-610804-0 (alk. paper)
  1. Computer systems. 2. Computers. 3. Telecommunication. 4. User interfaces
(Computer systems) I. O'Hallaron, David Richard. II. Title.
  QA76.5.B795 2010
  004--dc22
                                                          2009053083

10 9 8 7 6 5 4 3 2 1 -- EB -- 14 13 12 11 10

ISBN 10: 0-13-610804-0
ISBN 13: 978-0-13-610804-7

    To the students and instructors of the 15-213

    course at Carnegie Mellon University, for inspiring

us to develop and refine the material for this book.


    Contents

    Preface xix

    About the Authors xxxiii

1 A Tour of Computer Systems 1

1.1 Information Is Bits + Context 3
1.2 Programs Are Translated by Other Programs into Different Forms 4
1.3 It Pays to Understand How Compilation Systems Work 6
1.4 Processors Read and Interpret Instructions Stored in Memory 7

1.4.1 Hardware Organization of a System 7
1.4.2 Running the hello Program 10

1.5 Caches Matter 12
1.6 Storage Devices Form a Hierarchy 13
1.7 The Operating System Manages the Hardware 14

1.7.1 Processes 16
1.7.2 Threads 17
1.7.3 Virtual Memory 17
1.7.4 Files 19

1.8 Systems Communicate with Other Systems Using Networks 20
1.9 Important Themes 21

1.9.1 Concurrency and Parallelism 21
1.9.2 The Importance of Abstractions in Computer Systems 24

1.10 Summary 25
Bibliographic Notes 26

    Part I Program Structure and Execution

2 Representing and Manipulating Information 29

2.1 Information Storage 33

2.1.1 Hexadecimal Notation 34
2.1.2 Words 38
2.1.3 Data Sizes 38


2.1.4 Addressing and Byte Ordering 39
2.1.5 Representing Strings 46
2.1.6 Representing Code 47
2.1.7 Introduction to Boolean Algebra 48
2.1.8 Bit-Level Operations in C 51
2.1.9 Logical Operations in C 54
2.1.10 Shift Operations in C 54

2.2 Integer Representations 56
2.2.1 Integral Data Types 57
2.2.2 Unsigned Encodings 58
2.2.3 Two's-Complement Encodings 60
2.2.4 Conversions Between Signed and Unsigned 65
2.2.5 Signed vs. Unsigned in C 69
2.2.6 Expanding the Bit Representation of a Number 71
2.2.7 Truncating Numbers 75
2.2.8 Advice on Signed vs. Unsigned 76

2.3 Integer Arithmetic 79
2.3.1 Unsigned Addition 79
2.3.2 Two's-Complement Addition 83
2.3.3 Two's-Complement Negation 87
2.3.4 Unsigned Multiplication 88
2.3.5 Two's-Complement Multiplication 89
2.3.6 Multiplying by Constants 92
2.3.7 Dividing by Powers of Two 95
2.3.8 Final Thoughts on Integer Arithmetic 98

2.4 Floating Point 99
2.4.1 Fractional Binary Numbers 100
2.4.2 IEEE Floating-Point Representation 103
2.4.3 Example Numbers 105
2.4.4 Rounding 110
2.4.5 Floating-Point Operations 113
2.4.6 Floating Point in C 114

2.5 Summary 118
Bibliographic Notes 119
Homework Problems 119
Solutions to Practice Problems 134

3 Machine-Level Representation of Programs 153

3.1 A Historical Perspective 156
3.2 Program Encodings 159


3.2.1 Machine-Level Code 160
3.2.2 Code Examples 162
3.2.3 Notes on Formatting 165

3.3 Data Formats 167
3.4 Accessing Information 168

3.4.1 Operand Specifiers 169
3.4.2 Data Movement Instructions 171
3.4.3 Data Movement Example 174

3.5 Arithmetic and Logical Operations 177
3.5.1 Load Effective Address 177
3.5.2 Unary and Binary Operations 178
3.5.3 Shift Operations 179
3.5.4 Discussion 180
3.5.5 Special Arithmetic Operations 182

3.6 Control 185
3.6.1 Condition Codes 185
3.6.2 Accessing the Condition Codes 187
3.6.3 Jump Instructions and Their Encodings 189
3.6.4 Translating Conditional Branches 193
3.6.5 Loops 197
3.6.6 Conditional Move Instructions 206
3.6.7 Switch Statements 213

3.7 Procedures 219
3.7.1 Stack Frame Structure 219
3.7.2 Transferring Control 221
3.7.3 Register Usage Conventions 223
3.7.4 Procedure Example 224
3.7.5 Recursive Procedures 229

3.8 Array Allocation and Access 232
3.8.1 Basic Principles 232
3.8.2 Pointer Arithmetic 233
3.8.3 Nested Arrays 235
3.8.4 Fixed-Size Arrays 237
3.8.5 Variable-Size Arrays 238

3.9 Heterogeneous Data Structures 241
3.9.1 Structures 241
3.9.2 Unions 244
3.9.3 Data Alignment 248

3.10 Putting It Together: Understanding Pointers 252
3.11 Life in the Real World: Using the gdb Debugger 254
3.12 Out-of-Bounds Memory References and Buffer Overflow 256

3.12.1 Thwarting Buffer Overflow Attacks 261


3.13 x86-64: Extending IA32 to 64 Bits 267
3.13.1 History and Motivation for x86-64 268
3.13.2 An Overview of x86-64 270
3.13.3 Accessing Information 273
3.13.4 Control 279
3.13.5 Data Structures 290
3.13.6 Concluding Observations about x86-64 291

3.14 Machine-Level Representations of Floating-Point Programs 292
3.15 Summary 293

Bibliographic Notes 294
Homework Problems 294
Solutions to Practice Problems 308

4 Processor Architecture 333

4.1 The Y86 Instruction Set Architecture 336

4.1.1 Programmer-Visible State 336
4.1.2 Y86 Instructions 337
4.1.3 Instruction Encoding 339
4.1.4 Y86 Exceptions 344
4.1.5 Y86 Programs 345
4.1.6 Some Y86 Instruction Details 350

4.2 Logic Design and the Hardware Control Language HCL 352
4.2.1 Logic Gates 353
4.2.2 Combinational Circuits and HCL Boolean Expressions 354
4.2.3 Word-Level Combinational Circuits and HCL Integer Expressions 355
4.2.4 Set Membership 360
4.2.5 Memory and Clocking 361

4.3 Sequential Y86 Implementations 364
4.3.1 Organizing Processing into Stages 364
4.3.2 SEQ Hardware Structure 375
4.3.3 SEQ Timing 379
4.3.4 SEQ Stage Implementations 383

4.4 General Principles of Pipelining 391
4.4.1 Computational Pipelines 392
4.4.2 A Detailed Look at Pipeline Operation 393
4.4.3 Limitations of Pipelining 394
4.4.4 Pipelining a System with Feedback 398

4.5 Pipelined Y86 Implementations 400
4.5.1 SEQ+: Rearranging the Computation Stages 400


4.5.2 Inserting Pipeline Registers 401
4.5.3 Rearranging and Relabeling Signals 405
4.5.4 Next PC Prediction 406
4.5.5 Pipeline Hazards 408
4.5.6 Avoiding Data Hazards by Stalling 413
4.5.7 Avoiding Data Hazards by Forwarding 415
4.5.8 Load/Use Data Hazards 418
4.5.9 Exception Handling 420
4.5.10 PIPE Stage Implementations 423
4.5.11 Pipeline Control Logic 431
4.5.12 Performance Analysis 444
4.5.13 Unfinished Business 446

4.6 Summary 449
4.6.1 Y86 Simulators 450
Bibliographic Notes 451
Homework Problems 451
Solutions to Practice Problems 457

5 Optimizing Program Performance 473

5.1 Capabilities and Limitations of Optimizing Compilers 476
5.2 Expressing Program Performance 480
5.3 Program Example 482
5.4 Eliminating Loop Inefficiencies 486
5.5 Reducing Procedure Calls 490
5.6 Eliminating Unneeded Memory References 491
5.7 Understanding Modern Processors 496

5.7.1 Overall Operation 497
5.7.2 Functional Unit Performance 500
5.7.3 An Abstract Model of Processor Operation 502

5.8 Loop Unrolling 509
5.9 Enhancing Parallelism 513

5.9.1 Multiple Accumulators 514
5.9.2 Reassociation Transformation 518

5.10 Summary of Results for Optimizing Combining Code 524
5.11 Some Limiting Factors 525

5.11.1 Register Spilling 525
5.11.2 Branch Prediction and Misprediction Penalties 526

5.12 Understanding Memory Performance 531
5.12.1 Load Performance 531
5.12.2 Store Performance 532


5.13 Life in the Real World: Performance Improvement Techniques 539
5.14 Identifying and Eliminating Performance Bottlenecks 540

5.14.1 Program Profiling 540
5.14.2 Using a Profiler to Guide Optimization 542
5.14.3 Amdahl's Law 545

5.15 Summary 547
Bibliographic Notes 548
Homework Problems 549
Solutions to Practice Problems 552

6 The Memory Hierarchy 559

6.1 Storage Technologies 561

6.1.1 Random-Access Memory 561
6.1.2 Disk Storage 570
6.1.3 Solid State Disks 581
6.1.4 Storage Technology Trends 583

6.2 Locality 586
6.2.1 Locality of References to Program Data 587
6.2.2 Locality of Instruction Fetches 588
6.2.3 Summary of Locality 589

6.3 The Memory Hierarchy 591
6.3.1 Caching in the Memory Hierarchy 592
6.3.2 Summary of Memory Hierarchy Concepts 595

6.4 Cache Memories 596
6.4.1 Generic Cache Memory Organization 597
6.4.2 Direct-Mapped Caches 599
6.4.3 Set Associative Caches 606
6.4.4 Fully Associative Caches 608
6.4.5 Issues with Writes 611
6.4.6 Anatomy of a Real Cache Hierarchy 612
6.4.7 Performance Impact of Cache Parameters 614

6.5 Writing Cache-friendly Code 615
6.6 Putting It Together: The Impact of Caches on Program Performance 620

6.6.1 The Memory Mountain 621
6.6.2 Rearranging Loops to Increase Spatial Locality 625
6.6.3 Exploiting Locality in Your Programs 629

6.7 Summary 629
Bibliographic Notes 630
Homework Problems 631
Solutions to Practice Problems 642


    Part II Running Programs on a System

7 Linking 653

7.1 Compiler Drivers 655
7.2 Static Linking 657
7.3 Object Files 657
7.4 Relocatable Object Files 658
7.5 Symbols and Symbol Tables 660
7.6 Symbol Resolution 663

7.6.1 How Linkers Resolve Multiply Defined Global Symbols 664
7.6.2 Linking with Static Libraries 667
7.6.3 How Linkers Use Static Libraries to Resolve References 670

7.7 Relocation 672
7.7.1 Relocation Entries 672
7.7.2 Relocating Symbol References 673

7.8 Executable Object Files 678
7.9 Loading Executable Object Files 679
7.10 Dynamic Linking with Shared Libraries 681
7.11 Loading and Linking Shared Libraries from Applications 683
7.12 Position-Independent Code (PIC) 687
7.13 Tools for Manipulating Object Files 690
7.14 Summary 691

Bibliographic Notes 691
Homework Problems 692
Solutions to Practice Problems 698

8 Exceptional Control Flow 701

8.1 Exceptions 703

8.1.1 Exception Handling 704
8.1.2 Classes of Exceptions 706
8.1.3 Exceptions in Linux/IA32 Systems 708

8.2 Processes 712
8.2.1 Logical Control Flow 712
8.2.2 Concurrent Flows 713
8.2.3 Private Address Space 714
8.2.4 User and Kernel Modes 714
8.2.5 Context Switches 716


8.3 System Call Error Handling 717
8.4 Process Control 718

8.4.1 Obtaining Process IDs 719
8.4.2 Creating and Terminating Processes 719
8.4.3 Reaping Child Processes 723
8.4.4 Putting Processes to Sleep 729
8.4.5 Loading and Running Programs 730
8.4.6 Using fork and execve to Run Programs 733

8.5 Signals 736
8.5.1 Signal Terminology 738
8.5.2 Sending Signals 739
8.5.3 Receiving Signals 742
8.5.4 Signal Handling Issues 745
8.5.5 Portable Signal Handling 752
8.5.6 Explicitly Blocking and Unblocking Signals 753
8.5.7 Synchronizing Flows to Avoid Nasty Concurrency Bugs 755

8.6 Nonlocal Jumps 759
8.7 Tools for Manipulating Processes 762
8.8 Summary 763

Bibliographic Notes 763
Homework Problems 764
Solutions to Practice Problems 771

9 Virtual Memory 775

9.1 Physical and Virtual Addressing 777
9.2 Address Spaces 778
9.3 VM as a Tool for Caching 779

9.3.1 DRAM Cache Organization 780
9.3.2 Page Tables 780
9.3.3 Page Hits 782
9.3.4 Page Faults 782
9.3.5 Allocating Pages 783
9.3.6 Locality to the Rescue Again 784

9.4 VM as a Tool for Memory Management 785
9.5 VM as a Tool for Memory Protection 786
9.6 Address Translation 787

9.6.1 Integrating Caches and VM 791
9.6.2 Speeding up Address Translation with a TLB 791
9.6.3 Multi-Level Page Tables 792
9.6.4 Putting It Together: End-to-end Address Translation 794

    9.7 Case Study: The Intel Core i7/Linux Memory System 799


9.7.1 Core i7 Address Translation 800
9.7.2 Linux Virtual Memory System 803

9.8 Memory Mapping 807
9.8.1 Shared Objects Revisited 807
9.8.2 The fork Function Revisited 809
9.8.3 The execve Function Revisited 810
9.8.4 User-level Memory Mapping with the mmap Function 810

9.9 Dynamic Memory Allocation 812
9.9.1 The malloc and free Functions 814
9.9.2 Why Dynamic Memory Allocation? 816
9.9.3 Allocator Requirements and Goals 817
9.9.4 Fragmentation 819
9.9.5 Implementation Issues 820
9.9.6 Implicit Free Lists 820
9.9.7 Placing Allocated Blocks 822
9.9.8 Splitting Free Blocks 823
9.9.9 Getting Additional Heap Memory 823
9.9.10 Coalescing Free Blocks 824
9.9.11 Coalescing with Boundary Tags 824
9.9.12 Putting It Together: Implementing a Simple Allocator 827
9.9.13 Explicit Free Lists 835
9.9.14 Segregated Free Lists 836

9.10 Garbage Collection 838
9.10.1 Garbage Collector Basics 839
9.10.2 Mark&Sweep Garbage Collectors 840
9.10.3 Conservative Mark&Sweep for C Programs 842

9.11 Common Memory-Related Bugs in C Programs 843
9.11.1 Dereferencing Bad Pointers 843
9.11.2 Reading Uninitialized Memory 843
9.11.3 Allowing Stack Buffer Overflows 844
9.11.4 Assuming that Pointers and the Objects They Point to Are the Same Size 844
9.11.5 Making Off-by-One Errors 845
9.11.6 Referencing a Pointer Instead of the Object It Points to 845
9.11.7 Misunderstanding Pointer Arithmetic 846
9.11.8 Referencing Nonexistent Variables 846
9.11.9 Referencing Data in Free Heap Blocks 847
9.11.10 Introducing Memory Leaks 847

    9.12 Summary 848

    Bibliographic Notes 848

    Homework Problems 849

    Solutions to Practice Problems 853


Part III Interaction and Communication Between Programs

10 System-Level I/O 861

10.1 Unix I/O 862
10.2 Opening and Closing Files 863
10.3 Reading and Writing Files 865
10.4 Robust Reading and Writing with the Rio Package 867

10.4.1 Rio Unbuffered Input and Output Functions 867
10.4.2 Rio Buffered Input Functions 868

10.5 Reading File Metadata 873
10.6 Sharing Files 875
10.7 I/O Redirection 877
10.8 Standard I/O 879
10.9 Putting It Together: Which I/O Functions Should I Use? 880
10.10 Summary 881

Bibliographic Notes 882
Homework Problems 882
Solutions to Practice Problems 883

11 Network Programming 885

11.1 The Client-Server Programming Model 886
11.2 Networks 887
11.3 The Global IP Internet 891

11.3.1 IP Addresses 893
11.3.2 Internet Domain Names 895
11.3.3 Internet Connections 899

11.4 The Sockets Interface 900
11.4.1 Socket Address Structures 901
11.4.2 The socket Function 902
11.4.3 The connect Function 903
11.4.4 The open_clientfd Function 903
11.4.5 The bind Function 904
11.4.6 The listen Function 905
11.4.7 The open_listenfd Function 905
11.4.8 The accept Function 907
11.4.9 Example Echo Client and Server 908


11.5 Web Servers 911
11.5.1 Web Basics 911
11.5.2 Web Content 912
11.5.3 HTTP Transactions 914
11.5.4 Serving Dynamic Content 916

11.6 Putting It Together: The Tiny Web Server 919
11.7 Summary 927

Bibliographic Notes 928
Homework Problems 928
Solutions to Practice Problems 929

12 Concurrent Programming 933

12.1 Concurrent Programming with Processes 935

12.1.1 A Concurrent Server Based on Processes 936
12.1.2 Pros and Cons of Processes 937

12.2 Concurrent Programming with I/O Multiplexing 939
12.2.1 A Concurrent Event-Driven Server Based on I/O Multiplexing 942
12.2.2 Pros and Cons of I/O Multiplexing 946

12.3 Concurrent Programming with Threads 947
12.3.1 Thread Execution Model 948
12.3.2 Posix Threads 948
12.3.3 Creating Threads 950
12.3.4 Terminating Threads 950
12.3.5 Reaping Terminated Threads 951
12.3.6 Detaching Threads 951
12.3.7 Initializing Threads 952
12.3.8 A Concurrent Server Based on Threads 952

12.4 Shared Variables in Threaded Programs 954
12.4.1 Threads' Memory Model 955
12.4.2 Mapping Variables to Memory 956
12.4.3 Shared Variables 956

12.5 Synchronizing Threads with Semaphores 957
12.5.1 Progress Graphs 960
12.5.2 Semaphores 963
12.5.3 Using Semaphores for Mutual Exclusion 964
12.5.4 Using Semaphores to Schedule Shared Resources 966
12.5.5 Putting It Together: A Concurrent Server Based on Prethreading 970
12.6 Using Threads for Parallelism 974


12.7 Other Concurrency Issues 979
12.7.1 Thread Safety 979
12.7.2 Reentrancy 980
12.7.3 Using Existing Library Functions in Threaded Programs 982
12.7.4 Races 983
12.7.5 Deadlocks 985

12.8 Summary 988
Bibliographic Notes 989
Homework Problems 989
Solutions to Practice Problems 994

A Error Handling 999

A.1 Error Handling in Unix Systems 1000
A.2 Error-Handling Wrappers 1001

    References 1005

    Index 1011

    Preface

This book (CS:APP) is for computer scientists, computer engineers, and others who want to be able to write better programs by learning what is going on under the hood of a computer system.

Our aim is to explain the enduring concepts underlying all computer systems, and to show you the concrete ways that these ideas affect the correctness, performance, and utility of your application programs. Other systems books are written from a builder's perspective, describing how to implement the hardware or the systems software, including the operating system, compiler, and network interface. This book is written from a programmer's perspective, describing how application programmers can use their knowledge of a system to write better programs. Of course, learning what a system is supposed to do provides a good first step in learning how to build one, and so this book also serves as a valuable introduction to those who go on to implement systems hardware and software.

If you study and learn the concepts in this book, you will be on your way to becoming the rare "power programmer" who knows how things work and how to fix them when they break. Our aim is to present the fundamental concepts in ways that you will find useful right away. You will also be prepared to delve deeper, studying such topics as compilers, computer architecture, operating systems, embedded systems, and networking.

Assumptions about the Reader's Background

The presentation of machine code in the book is based on two related formats supported by Intel and its competitors, colloquially known as x86. IA32 is the machine code that has become the de facto standard for a wide range of systems. x86-64 is an extension of IA32 to enable programs to operate on larger data and to reference a wider range of memory addresses. Since x86-64 systems are able to run IA32 code, both of these forms of machine code will see widespread use for the foreseeable future. We consider how these machines execute C programs on Unix or Unix-like (such as Linux) operating systems. (To simplify our presentation, we will use the term "Unix" as an umbrella term for systems having Unix as their heritage, including Solaris, Mac OS, and Linux.) The text contains numerous programming examples that have been compiled and run on Linux systems. We assume that you have access to such a machine and are able to log in and do simple things such as changing directories.

If your computer runs Microsoft Windows, you have two choices. First, you can get a copy of Linux (www.ubuntu.com) and install it as a dual boot option, so that your machine can run either operating system. Alternatively, by installing a copy of the Cygwin tools (www.cygwin.com), you can run a Unix-like shell under


Windows and have an environment very close to that provided by Linux. Not all features of Linux are available under Cygwin, however.

We also assume that you have some familiarity with C or C++. If your only prior experience is with Java, the transition will require more effort on your part, but we will help you. Java and C share similar syntax and control statements. However, there are aspects of C, particularly pointers, explicit dynamic memory allocation, and formatted I/O, that do not exist in Java. Fortunately, C is a small language, and it is clearly and beautifully described in the classic "K&R" text by Brian Kernighan and Dennis Ritchie [58]. Regardless of your programming background, consider K&R an essential part of your personal systems library.

Several of the early chapters in the book explore the interactions between C programs and their machine-language counterparts. The machine-language examples were all generated by the GNU gcc compiler running on IA32 and x86-64 processors. We do not assume any prior experience with hardware, machine language, or assembly-language programming.

    New to C? Advice on the C programming language

To help readers whose background in C programming is weak (or nonexistent), we have also included these special notes to highlight features that are especially important in C. We assume you are familiar with C++ or Java.

    How to Read the Book

Learning how computer systems work from a programmer's perspective is great fun, mainly because you can do it actively. Whenever you learn something new, you can try it out right away and see the result first hand. In fact, we believe that the only way to learn systems is to do systems, either working concrete problems or writing and running programs on real systems.

This theme pervades the entire book. When a new concept is introduced, it is followed in the text by one or more practice problems that you should work immediately to test your understanding. Solutions to the practice problems are at the end of each chapter. As you read, try to solve each problem on your own, and then check the solution to make sure you are on the right track. Each chapter is followed by a set of homework problems of varying difficulty. Your instructor has the solutions to the homework problems in an Instructor's Manual. For each homework problem, we show a rating of the amount of effort we feel it will require:

. Should require just a few minutes. Little or no programming required.

. Might require up to 20 minutes. Often involves writing and testing some code. Many of these are derived from problems we have given on exams.

. Requires a significant effort, perhaps 1-2 hours. Generally involves writing and testing a significant amount of code.

. A lab assignment, requiring up to 10 hours of effort.


code/intro/hello.c

#include <stdio.h>

int main()
{
    printf("hello, world\n");
    return 0;
}

code/intro/hello.c

    Figure 1 A typical code example.

Each code example in the text was formatted directly, without any manual intervention, from a C program compiled with gcc and tested on a Linux system. Of course, your system may have a different version of gcc, or a different compiler altogether, and so your compiler might generate different machine code, but the overall behavior should be the same. All of the source code is available from the CS:APP Web page at csapp.cs.cmu.edu. In the text, the file names of the source programs are documented in horizontal bars that surround the formatted code. For example, the program in Figure 1 can be found in the file hello.c in directory code/intro/. We encourage you to try running the example programs on your system as you encounter them.

To avoid having a book that is overwhelming, both in bulk and in content, we have created a number of Web asides containing material that supplements the main presentation of the book. These asides are referenced within the book with a notation of the form CHAP:TOP, where CHAP is a short encoding of the chapter subject, and TOP is a short code for the topic that is covered. For example, Web Aside data:bool contains supplementary material on Boolean algebra for the presentation on data representations in Chapter 2, while Web Aside arch:vlog contains material describing processor designs using the Verilog hardware description language, supplementing the presentation of processor design in Chapter 4. All of these Web asides are available from the CS:APP Web page.

    Aside What is an aside?

You will encounter asides of this form throughout the text. Asides are parenthetical remarks that give you some additional insight into the current topic. Asides serve a number of purposes. Some are little history lessons. For example, where did C, Linux, and the Internet come from? Other asides are meant to clarify ideas that students often find confusing. For example, what is the difference between a cache line, set, and block? Other asides give real-world examples. For example, how a floating-point error crashed a French rocket, or what the geometry of an actual Seagate disk drive looks like. Finally, some asides are just fun stuff. For example, what is a hoinky?


    Book Overview

The CS:APP book consists of 12 chapters designed to capture the core ideas in computer systems:

. Chapter 1: A Tour of Computer Systems. This chapter introduces the major ideas and themes in computer systems by tracing the life cycle of a simple "hello, world" program.

. Chapter 2: Representing and Manipulating Information. We cover computer arithmetic, emphasizing the properties of unsigned and two's-complement number representations that affect programmers. We consider how numbers are represented and therefore what range of values can be encoded for a given word size. We consider the effect of casting between signed and unsigned numbers. We cover the mathematical properties of arithmetic operations. Novice programmers are often surprised to learn that the (two's-complement) sum or product of two positive numbers can be negative. On the other hand, two's-complement arithmetic satisfies the algebraic properties of a ring, and hence a compiler can safely transform multiplication by a constant into a sequence of shifts and adds. We use the bit-level operations of C to demonstrate the principles and applications of Boolean algebra. We cover the IEEE floating-point format in terms of how it represents values and the mathematical properties of floating-point operations.

Having a solid understanding of computer arithmetic is critical to writing reliable programs. For example, programmers and compilers cannot replace the expression (x < y) with (x - y < 0), due to the possibility of overflow. (See the sketch following this chapter list.)


. Chapter 8: Exceptional Control Flow. In this part of the presentation, we step beyond the single-program model by introducing the general concept of exceptional control flow (i.e., changes in control flow that are outside the normal branches and procedure calls). We cover examples of exceptional control flow that exist at all levels of the system, from low-level hardware exceptions and interrupts, to context switches between concurrent processes, to abrupt changes in control flow caused by the delivery of Unix signals, to the nonlocal jumps in C that break the stack discipline.

This is the part of the book where we introduce the fundamental idea of a process, an abstraction of an executing program. You will learn how processes work and how they can be created and manipulated from application programs. We show how application programmers can make use of multiple processes via Unix system calls. When you finish this chapter, you will be able to write a Unix shell with job control. It is also your first introduction to the nondeterministic behavior that arises with concurrent program execution.

. Chapter 9: Virtual Memory. Our presentation of the virtual memory system seeks to give some understanding of how it works and its characteristics. We want you to know how it is that the different simultaneous processes can each use an identical range of addresses, sharing some pages but having individual copies of others. We also cover issues involved in managing and manipulating virtual memory. In particular, we cover the operation of storage allocators such as the Unix malloc and free operations. Covering this material serves several purposes. It reinforces the concept that the virtual memory space is just an array of bytes that the program can subdivide into different storage units. It helps you understand the effects of programs containing memory referencing errors such as storage leaks and invalid pointer references. Finally, many application programmers write their own storage allocators optimized toward the needs and characteristics of the application. This chapter, more than any other, demonstrates the benefit of covering both the hardware and the software aspects of computer systems in a unified way. Traditional computer architecture and operating systems texts present only part of the virtual memory story.

. Chapter 10: System-Level I/O. We cover the basic concepts of Unix I/O such as files and descriptors. We describe how files are shared, how I/O redirection works, and how to access file metadata. We also develop a robust buffered I/O package that deals correctly with a curious behavior known as short counts, where the library function reads only part of the input data. We cover the C standard I/O library and its relationship to Unix I/O, focusing on limitations of standard I/O that make it unsuitable for network programming. In general, the topics covered in this chapter are building blocks for the next two chapters on network and concurrent programming.

. Chapter 11: Network Programming. Networks are interesting I/O devices to program, tying together many of the ideas that we have studied earlier in the text, such as processes, signals, byte ordering, memory mapping, and dynamic


storage allocation. Network programs also provide a compelling context for concurrency, which is the topic of the next chapter. This chapter is a thin slice through network programming that gets you to the point where you can write a Web server. We cover the client-server model that underlies all network applications. We present a programmer's view of the Internet, and show how to write Internet clients and servers using the sockets interface. Finally, we introduce HTTP and develop a simple iterative Web server.

. Chapter 12: Concurrent Programming. This chapter introduces concurrent programming using Internet server design as the running motivational example. We compare and contrast the three basic mechanisms for writing concurrent programs (processes, I/O multiplexing, and threads) and show how to use them to build concurrent Internet servers. We cover basic principles of synchronization using P and V semaphore operations, thread safety and reentrancy, race conditions, and deadlocks. Writing concurrent code is essential for most server applications. We also describe the use of thread-level programming to express parallelism in an application program, enabling faster execution on multi-core processors. Getting all of the cores working on a single computational problem requires a careful coordination of the concurrent threads, both for correctness and to achieve high performance.
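As promised in the Chapter 2 description above, here is a minimal sketch (ours, not from the book) of the arithmetic surprises mentioned there. It assumes a 32-bit two's-complement int, as on the IA32/Linux systems the book targets; strictly speaking, signed overflow is undefined in C, and the wraparound shown is what typical two's-complement hardware produces:

    #include <stdio.h>

    int main()
    {
        /* The sum of two positive ints can come out negative:
           4,000,000,000 does not fit in 32 bits, so it wraps around. */
        int x = 2000000000;
        int y = 2000000000;
        printf("%d + %d = %d\n", x, y, x + y);   /* prints a negative value */

        /* Ring properties let a compiler rewrite multiplication by a
           constant as shifts and adds: 12*n == (n << 3) + (n << 2). */
        int n = 37;
        printf("12 * %d = %d = %d\n", n, 12 * n, (n << 3) + (n << 2));
        return 0;
    }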

    New to this Edition

The first edition of this book was published with a copyright of 2003. Considering the rapid evolution of computer technology, the book content has held up surprisingly well. Intel x86 machines running Unix-like operating systems and programmed in C proved to be a combination that continues to encompass many systems today. Changes in hardware technology and compilers and the experience of many instructors teaching the material have prompted a substantial revision.

Here are some of the more significant changes:

. Chapter 2: Representing and Manipulating Information. We have tried to make this material more accessible, with more careful explanations of concepts and with many more practice and homework problems. We moved some of the more theoretical aspects to Web asides. We also describe some of the security vulnerabilities that arise due to the overflow properties of computer arithmetic.

. Chapter 3: Machine-Level Representation of Programs. We have extended our coverage to include x86-64, the extension of x86 processors to a 64-bit word size. We also use the code generated by a more recent version of gcc. We have enhanced our coverage of buffer overflow vulnerabilities. We have created Web asides on two different classes of instructions for floating point, and also a view of the more exotic transformations made when compilers attempt higher degrees of optimization. Another Web aside describes how to embed x86 assembly code within a C program.


. Chapter 4: Processor Architecture. We include a more careful exposition of exception detection and handling in our processor design. We have also created a Web aside showing a mapping of our processor designs into Verilog, enabling synthesis into working hardware.

. Chapter 5: Optimizing Program Performance. We have greatly changed our description of how an out-of-order processor operates, and we have created a simple technique for analyzing program performance based on the paths in a data-flow graph representation of a program. A Web aside describes how C programmers can write programs that make use of the SIMD (single-instruction, multiple-data) instructions found in more recent versions of x86 processors.

. Chapter 6: The Memory Hierarchy. We have added material on solid-state disks, and we have updated our presentation to be based on the memory hierarchy of an Intel Core i7 processor.

. Chapter 7: Linking. This chapter has changed only slightly.

. Chapter 8: Exceptional Control Flow. We have enhanced our discussion of how the process model introduces some fundamental concepts of concurrency, such as nondeterminism.

. Chapter 9: Virtual Memory. We have updated our memory system case study to describe the 64-bit Intel Core i7 processor. We have also updated our sample implementation of malloc to work for both 32-bit and 64-bit execution.

. Chapter 10: System-Level I/O. This chapter has changed only slightly.

. Chapter 11: Network Programming. This chapter has changed only slightly.

. Chapter 12: Concurrent Programming. We have increased our coverage of the general principles of concurrency, and we also describe how programmers can use thread-level parallelism to make programs run faster on multi-core machines.

In addition, we have added and revised a number of practice and homework problems.

    Origins of the Book

The book stems from an introductory course that we developed at Carnegie Mellon University in the Fall of 1998, called 15-213: Introduction to Computer Systems (ICS) [14]. The ICS course has been taught every semester since then, each time to about 150-250 students, ranging from sophomores to master's degree students and with a wide variety of majors. It is a required course for all undergraduates in the CS and ECE departments at Carnegie Mellon, and it has become a prerequisite for most upper-level systems courses.

The idea with ICS was to introduce students to computers in a different way. Few of our students would have the opportunity to build a computer system. On the other hand, most students, including all computer scientists and computer engineers, will be required to use and program computers on a daily basis. So we


decided to teach about systems from the point of view of the programmer, using the following filter: we would cover a topic only if it affected the performance, correctness, or utility of user-level C programs.

For example, topics such as hardware adder and bus designs were out. Topics such as machine language were in, but instead of focusing on how to write assembly language by hand, we would look at how a C compiler translates C constructs into machine code, including pointers, loops, procedure calls, and switch statements. Further, we would take a broader and more holistic view of the system as both hardware and systems software, covering such topics as linking, loading, processes, signals, performance optimization, virtual memory, I/O, and network and concurrent programming.

This approach allowed us to teach the ICS course in a way that is practical, concrete, hands-on, and exciting for the students. The response from our students and faculty colleagues was immediate and overwhelmingly positive, and we realized that others outside of CMU might benefit from using our approach. Hence this book, which we developed from the ICS lecture notes, and which we have now revised to reflect changes in technology and how computer systems are implemented.

    For Instructors: Courses Based on the Book

Instructors can use the CS:APP book to teach five different kinds of systems courses (Figure 2). The particular course depends on curriculum requirements, personal taste, and the backgrounds and abilities of the students. From left to right in the figure, the courses are characterized by an increasing emphasis on the programmer's perspective of a system. Here is a brief description:

. ORG: A computer organization course with traditional topics covered in an untraditional style. Traditional topics such as logic design, processor architecture, assembly language, and memory systems are covered. However, there is more emphasis on the impact for the programmer. For example, data representations are related back to the data types and operations of C programs, and the presentation on assembly code is based on machine code generated by a C compiler rather than hand-written assembly code.

. ORG+: The ORG course with additional emphasis on the impact of hardware on the performance of application programs. Compared to ORG, students learn more about code optimization and about improving the memory performance of their C programs.

. ICS: The baseline ICS course, designed to produce enlightened programmers who understand the impact of the hardware, operating system, and compilation system on the performance and correctness of their application programs. A significant difference from ORG+ is that low-level processor architecture is not covered. Instead, programmers work with a higher-level model of a modern out-of-order processor. The ICS course fits nicely into a 10-week quarter, and can also be stretched to a 15-week semester if covered at a more leisurely pace.


[Figure 2 is a table matching chapters to courses. Rows: 1 Tour of systems, 2 Data representation (d), 3 Machine language, 4 Processor architecture, 5 Code optimization, 6 Memory hierarchy (a), 7 Linking (c), 8 Exceptional control flow, 9 Virtual memory (b), 10 System-level I/O, 11 Network programming, 12 Concurrent programming. Columns: ORG, ORG+, ICS, ICS+, SP. The marks indicating which chapters each course covers did not survive transcription.]

Figure 2 Five systems courses based on the CS:APP book. Notes: (a) Hardware only, (b) No dynamic storage allocation, (c) No dynamic linking, (d) No floating point. ICS+ is the 15-213 course from Carnegie Mellon.

. ICS+: The baseline ICS course with additional coverage of systems programming topics such as system-level I/O, network programming, and concurrent programming. This is the semester-long Carnegie Mellon course, which covers every chapter in CS:APP except low-level processor architecture.

. SP: A systems programming course. Similar to the ICS+ course, but drops floating point and performance optimization, and places more emphasis on systems programming, including process control, dynamic linking, system-level I/O, network programming, and concurrent programming. Instructors might want to supplement from other sources for advanced topics such as daemons, terminal control, and Unix IPC.

The main message of Figure 2 is that the CS:APP book gives a lot of options to students and instructors. If you want your students to be exposed to lower-level processor architecture, then that option is available via the ORG and ORG+ courses. On the other hand, if you want to switch from your current computer organization course to an ICS or ICS+ course, but are wary of making such a drastic change all at once, then you can move toward ICS incrementally. You can start with ORG, which teaches the traditional topics in a nontraditional way. Once you are comfortable with that material, then you can move to ORG+, and eventually to ICS. If students have no experience in C (for example they have only programmed in Java), you could spend several weeks on C and then cover the material of ORG or ICS.


Finally, we note that the ORG+ and SP courses would make a nice two-term (either quarters or semesters) sequence. Or you might consider offering ICS+ as one term of ICS and one term of SP.

    Classroom-Tested Laboratory Exercises

The ICS+ course at Carnegie Mellon receives very high evaluations from students. Median scores of 5.0/5.0 and means of 4.6/5.0 are typical for the student course evaluations. Students cite the fun, exciting, and relevant laboratory exercises as the primary reason. The labs are available from the CS:APP Web page. Here are examples of the labs that are provided with the book:

. Data Lab. This lab requires students to implement simple logical and arithmetic functions, but using a highly restricted subset of C. For example, they must compute the absolute value of a number using only bit-level operations. This lab helps students understand the bit-level representations of C data types and the bit-level behavior of the operations on data.

. Binary Bomb Lab. A binary bomb is a program provided to students as an object-code file. When run, it prompts the user to type in six different strings. If any of these is incorrect, the bomb "explodes," printing an error message and logging the event on a grading server. Students must "defuse" their own unique bombs by disassembling and reverse engineering the programs to determine what the six strings should be. The lab teaches students to understand assembly language, and also forces them to learn how to use a debugger.

. Buffer Overflow Lab. Students are required to modify the run-time behavior of a binary executable by exploiting a buffer overflow vulnerability. This lab teaches the students about the stack discipline, and teaches them about the danger of writing code that is vulnerable to buffer overflow attacks.

. Architecture Lab. Several of the homework problems of Chapter 4 can be combined into a lab assignment, where students modify the HCL description of a processor to add new instructions, change the branch prediction policy, or add or remove bypassing paths and register ports. The resulting processors can be simulated and run through automated tests that will detect most of the possible bugs. This lab lets students experience the exciting parts of processor design without requiring a complete background in logic design and hardware description languages.

. Performance Lab. Students must optimize the performance of an application kernel function such as convolution or matrix transposition. This lab provides a very clear demonstration of the properties of cache memories, and gives students experience with low-level program optimization.

. Shell Lab. Students implement their own Unix shell program with job control, including the ctrl-c and ctrl-z keystrokes and the fg, bg, and jobs commands. This is the students' first introduction to concurrency, and gives them a clear idea of Unix process control, signals, and signal handling.


. Malloc Lab. Students implement their own versions of malloc, free, and (optionally) realloc. This lab gives students a clear understanding of data layout and organization, and requires them to evaluate different trade-offs between space and time efficiency.

. Proxy Lab. Students implement a concurrent Web proxy that sits between their browsers and the rest of the World Wide Web. This lab exposes the students to such topics as Web clients and servers, and ties together many of the concepts from the course, such as byte ordering, file I/O, process control, signals, signal handling, memory mapping, sockets, and concurrency. Students like being able to see their programs in action with real Web browsers and Web servers.

The CS:APP Instructor's Manual has a detailed discussion of the labs, as well as directions for downloading the support software.

    Acknowledgments for the Second Edition

We are deeply grateful to the many people who have helped us produce this second edition of the CS:APP text.

First and foremost, we would like to recognize our colleagues who have taught the ICS course at Carnegie Mellon for their insightful feedback and encouragement: Guy Blelloch, Roger Dannenberg, David Eckhardt, Greg Ganger, Seth Goldstein, Greg Kesden, Bruce Maggs, Todd Mowry, Andreas Nowatzyk, Frank Pfenning, and Markus Pueschel.

Thanks also to our sharp-eyed readers who contributed reports to the errata page for the first edition: Daniel Amelang, Rui Baptista, Quarup Barreirinhas, Michael Bombyk, Jorg Brauer, Jordan Brough, Yixin Cao, James Caroll, Rui Carvalho, Hyoung-Kee Choi, Al Davis, Grant Davis, Christian Dufour, Mao Fan, Tim Freeman, Inge Frick, Max Gebhardt, Jeff Goldblat, Thomas Gross, Anita Gupta, John Hampton, Hiep Hong, Greg Israelsen, Ronald Jones, Haudy Kazemi, Brian Kell, Constantine Kousoulis, Sacha Krakowiak, Arun Krishnaswamy, Martin Kulas, Michael Li, Zeyang Li, Ricky Liu, Mario Lo Conte, Dirk Maas, Devon Macey, Carl Marcinik, Will Marrero, Simone Martins, Tao Men, Mark Morrissey, Venkata Naidu, Bhas Nalabothula, Thomas Niemann, Eric Peskin, David Po, Anne Rogers, John Ross, Michael Scott, Seiki, Ray Shih, Darren Shultz, Erik Silkensen, Suryanto, Emil Tarazi, Nawanan Theera-Ampornpunt, Joe Trdinich, Michael Trigoboff, James Troup, Martin Vopatek, Alan West, Betsy Wolff, Tim Wong, James Woodruff, Scott Wright, Jackie Xiao, Guanpeng Xu, Qing Xu, Caren Yang, Yin Yongsheng, Wang Yuanxuan, Steven Zhang, and Day Zhong. Special thanks to Inge Frick, who identified a subtle deep copy bug in our lock-and-copy example, and to Ricky Liu, for his amazing proofreading skills.

Our Intel Labs colleagues Andrew Chien and Limor Fix were exceptionally supportive throughout the writing of the text. Steve Schlosser graciously provided some disk drive characterizations. Casey Helfrich and Michael Ryan installed and maintained our new Core i7 box. Michael Kozuch, Babu Pillai, and Jason Campbell provided valuable insight on memory system performance, multi-core


systems, and the power wall. Phil Gibbons and Shimin Chen shared their considerable expertise on solid-state disk designs.

We have been able to call on the talents of many, including Wen-Mei Hwu, Markus Pueschel, and Jiri Simsa, to provide both detailed comments and high-level advice. James Hoe helped us create a Verilog version of the Y86 processor and did all of the work needed to synthesize working hardware.

Many thanks to our colleagues who provided reviews of the draft manuscript: James Archibald (Brigham Young University), Richard Carver (George Mason University), Mirela Damian (Villanova University), Peter Dinda (Northwestern University), John Fiore (Temple University), Jason Fritts (St. Louis University), John Greiner (Rice University), Brian Harvey (University of California, Berkeley), Don Heller (Penn State University), Wei Chung Hsu (University of Minnesota), Michelle Hugue (University of Maryland), Jeremy Johnson (Drexel University), Geoff Kuenning (Harvey Mudd College), Ricky Liu, Sam Madden (MIT), Fred Martin (University of Massachusetts, Lowell), Abraham Matta (Boston University), Markus Pueschel (Carnegie Mellon University), Norman Ramsey (Tufts University), Glenn Reinmann (UCLA), Michela Taufer (University of Delaware), and Craig Zilles (UIUC).

Paul Anagnostopoulos of Windfall Software did an outstanding job of typesetting the book and leading the production team. Many thanks to Paul and his superb team: Rick Camp (copyeditor), Joe Snowden (compositor), MaryEllen N. Oliver (proofreader), Laurel Muller (artist), and Ted Laux (indexer).

Finally, we would like to thank our friends at Prentice Hall. Marcia Horton has always been there for us. Our editor Matt Goldstein provided stellar leadership from beginning to end. We are profoundly grateful for their help, encouragement, and insights.

    Acknowledgments from the First Edition

We are deeply indebted to many friends and colleagues for their thoughtful criticisms and encouragement. A special thanks to our 15-213 students, whose infectious energy and enthusiasm spurred us on. Nick Carter and Vinny Furia generously provided their malloc package.

Guy Blelloch, Greg Kesden, Bruce Maggs, and Todd Mowry taught the course over multiple semesters, gave us encouragement, and helped improve the course material. Herb Derby provided early spiritual guidance and encouragement. Allan Fisher, Garth Gibson, Thomas Gross, Satya, Peter Steenkiste, and Hui Zhang encouraged us to develop the course from the start. A suggestion from Garth early on got the whole ball rolling, and this was picked up and refined with the help of a group led by Allan Fisher. Mark Stehlik and Peter Lee have been very supportive about building this material into the undergraduate curriculum. Greg Kesden provided helpful feedback on the impact of ICS on the OS course. Greg Ganger and Jiri Schindler graciously provided some disk drive characterizations and answered our questions on modern disks. Tom Stricker showed us the memory mountain. James Hoe provided useful ideas and feedback on how to present processor architecture.


A special group of students were instrumental in helping us develop the content of the course: Khalil Amiri, Angela Demke Brown, Chris Colohan, Jason Crawford, Peter Dinda, Julio Lopez, Bruce Lowekamp, Jeff Pierce, Sanjay Rao, Balaji Sarpeshkar, Blake Scholl, Sanjit Seshia, Greg Steffan, Tiankai Tu, Kip Walker, and Yinglian Xie. In particular, Chris Colohan established a fun (and funny) tone that persists to this day, and invented the legendary "binary bomb" that has proven to be a great tool for teaching machine code and debugging concepts.

Chris Bauer, Alan Cox, Peter Dinda, Sandhya Dwarkadas, John Greiner, Bruce Jacob, Barry Johnson, Don Heller, Bruce Lowekamp, Greg Morrisett, Brian Noble, Bobbie Othmer, Bill Pugh, Michael Scott, Mark Smotherman, Greg Steffan, and Bob Wier took time that they did not have to read and advise us on early drafts of the book. A very special thanks to Al Davis (University of Utah), Peter Dinda (Northwestern University), John Greiner (Rice University), Wei Hsu (University of Minnesota), Bruce Lowekamp (College of William & Mary), Bobbie Othmer (University of Minnesota), Michael Scott (University of Rochester), and Bob Wier (Rocky Mountain College) for class testing the Beta version. A special thanks to their students as well!

We would also like to thank our colleagues at Prentice Hall. Marcia Horton, Eric Frank, and Harold Stone have been unflagging in their support and vision. Harold also helped us present an accurate historical perspective on RISC and CISC processor architectures. Jerry Ralya provided sharp insights and taught us a lot about good writing.

Finally, we would like to acknowledge the great technical writers Brian Kernighan and the late W. Richard Stevens, for showing us that technical books can be beautiful.

    Thank you all.

Randy Bryant
Dave O'Hallaron
Pittsburgh, Pennsylvania

    About the Authors

Randal E. Bryant received his Bachelor's degree from the University of Michigan in 1973 and then attended graduate school at the Massachusetts Institute of Technology, receiving a Ph.D. degree in computer science in 1981. He spent three years as an Assistant Professor at the California Institute of Technology, and has been on the faculty at Carnegie Mellon since 1984. He is currently a University Professor of Computer Science and Dean of the School of Computer Science. He also holds a courtesy appointment with the Department of Electrical and Computer Engineering.

He has taught courses in computer systems at both the undergraduate and graduate level for over 30 years. Over many years of teaching computer architecture courses, he began shifting the focus from how computers are designed to one of how programmers can write more efficient and reliable programs if they understand the system better. Together with Professor O'Hallaron, he developed the course 15-213 "Introduction to Computer Systems" at Carnegie Mellon that is the basis for this book. He has also taught courses in algorithms, programming, computer networking, and VLSI design.

Most of Professor Bryant's research concerns the design of software tools to help software and hardware designers verify the correctness of their systems. These include several types of simulators, as well as formal verification tools that prove the correctness of a design using mathematical methods. He has published over 150 technical papers. His research results are used by major computer manufacturers, including Intel, FreeScale, IBM, and Fujitsu. He has won several major awards for his research. These include two inventor recognition awards and a technical achievement award from the Semiconductor Research Corporation, the Kanellakis Theory and Practice Award from the Association for Computing Machinery (ACM), and the W. R. G. Baker Award, the Emmanuel Piore Award, and the Phil Kaufman Award from the Institute of Electrical and Electronics Engineers (IEEE). He is a Fellow of both the ACM and the IEEE and a member of the U.S. National Academy of Engineering.


David R. O'Hallaron is the Director of Intel Labs Pittsburgh and an Associate Professor in Computer Science and Electrical and Computer Engineering at Carnegie Mellon University. He received his Ph.D. from the University of Virginia.

He has taught computer systems courses at the undergraduate and graduate levels on such topics as computer architecture, introductory computer systems, parallel processor design, and Internet services. Together with Professor Bryant, he developed the course at Carnegie Mellon that led to this book. In 2004, he was awarded the Herbert Simon Award for Teaching Excellence by the CMU School of Computer Science, an award for which the winner is chosen based on a poll of the students.

Professor O'Hallaron works in the area of computer systems, with specific interests in software systems for scientific computing, data-intensive computing, and virtualization. The best known example of his work is the Quake project, a group of computer scientists, civil engineers, and seismologists who have developed the ability to predict the motion of the ground during strong earthquakes. In 2003, Professor O'Hallaron and the other members of the Quake team won the Gordon Bell Prize, the top international prize in high-performance computing.

CHAPTER 1

A Tour of Computer Systems

    1.1 Information Is Bits + Context 3

    1.2 Programs Are Translated by Other Programs into Different Forms 4

    1.3 It Pays to Understand How Compilation Systems Work 6

    1.4 Processors Read and Interpret Instructions Stored in Memory 7

    1.5 Caches Matter 12

    1.6 Storage Devices Form a Hierarchy 13

    1.7 The Operating System Manages the Hardware 14

    1.8 Systems Communicate with Other Systems Using Networks 20

    1.9 Important Themes 21

    1.10 Summary 25

    Bibliographic Notes 26


A computer system consists of hardware and systems software that work together to run application programs. Specific implementations of systems change over time, but the underlying concepts do not. All computer systems have similar hardware and software components that perform similar functions. This book is written for programmers who want to get better at their craft by understanding how these components work and how they affect the correctness and performance of their programs.

You are poised for an exciting journey. If you dedicate yourself to learning the concepts in this book, then you will be on your way to becoming a rare "power programmer," enlightened by an understanding of the underlying computer system and its impact on your application programs.

You are going to learn practical skills such as how to avoid strange numerical errors caused by the way that computers represent numbers. You will learn how to optimize your C code by using clever tricks that exploit the designs of modern processors and memory systems. You will learn how the compiler implements procedure calls and how to use this knowledge to avoid the security holes from buffer overflow vulnerabilities that plague network and Internet software. You will learn how to recognize and avoid the nasty errors during linking that confound the average programmer. You will learn how to write your own Unix shell, your own dynamic storage allocation package, and even your own Web server. You will learn the promises and pitfalls of concurrency, a topic of increasing importance as multiple processor cores are integrated onto single chips.

In their classic text on the C programming language [58], Kernighan and Ritchie introduce readers to C using the hello program shown in Figure 1.1. Although hello is a very simple program, every major part of the system must work in concert in order for it to run to completion. In a sense, the goal of this book is to help you understand what happens and why, when you run hello on your system.

We begin our study of systems by tracing the lifetime of the hello program, from the time it is created by a programmer, until it runs on a system, prints its simple message, and terminates. As we follow the lifetime of the program, we will briefly introduce the key concepts, terminology, and components that come into play. Later chapters will expand on these ideas.

code/intro/hello.c

1    #include <stdio.h>
2
3    int main()
4    {
5        printf("hello, world\n");
6    }

code/intro/hello.c

    Figure 1.1 The hello program.


    1.1 Information Is Bits + Context

Our hello program begins life as a source program (or source file) that the programmer creates with an editor and saves in a text file called hello.c. The source program is a sequence of bits, each with a value of 0 or 1, organized in 8-bit chunks called bytes. Each byte represents some text character in the program.

Most modern systems represent text characters using the ASCII standard that represents each character with a unique byte-sized integer value. For example, Figure 1.2 shows the ASCII representation of the hello.c program.

The hello.c program is stored in a file as a sequence of bytes. Each byte has an integer value that corresponds to some character. For example, the first byte has the integer value 35, which corresponds to the character '#'. The second byte has the integer value 105, which corresponds to the character 'i', and so on. Notice that each text line is terminated by the invisible newline character '\n', which is represented by the integer value 10. Files such as hello.c that consist exclusively of ASCII characters are known as text files. All other files are known as binary files.

The representation of hello.c illustrates a fundamental idea: all information in a system, including disk files, programs stored in memory, user data stored in memory, and data transferred across a network, is represented as a bunch of bits. The only thing that distinguishes different data objects is the context in which we view them. For example, in different contexts, the same sequence of bytes might represent an integer, floating-point number, character string, or machine instruction.
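To see "context" in action, here is a tiny program of our own (not from the book) that views the same four bytes first as a float and then as an unsigned integer. It assumes, as on the machines discussed here, that float and unsigned int are both 4 bytes:

#include <stdio.h>
#include <string.h>

int main(void)
{
    float f = 3.14f;
    unsigned int bits;
    memcpy(&bits, &f, sizeof bits);     /* copy the raw bytes, unchanged */
    printf("as a float: %f  as hex bits: 0x%08x\n", f, bits);
    return 0;
}

The bytes never change; only the context (float versus integer) in which we view them does.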

As programmers, we need to understand machine representations of numbers because they are not the same as integers and real numbers. They are finite approximations that can behave in unexpected ways. This fundamental idea is explored in detail in Chapter 2.
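As a taste of those unexpected ways, consider this small example of ours; Chapter 2 explains exactly why each line prints what it does:

#include <stdio.h>

int main(void)
{
    printf("%d\n", 0.1 + 0.2 == 0.3);            /* prints 0: none of these values is exact in binary */
    printf("%.1f\n", (3.14f + 1e10f) - 1e10f);   /* prints 0.0: 3.14 is lost to rounding */
    return 0;
}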

Each pair of rows below shows a stretch of hello.c: the characters on top (SP marks a space character) and their integer values beneath.

#    i    n    c    l    u    d    e    SP   <    s    t    d    i    o    .
35   105  110  99   108  117  100  101  32   60   115  116  100  105  111  46

h    >    \n   \n   i    n    t    SP   m    a    i    n    (    )    \n   {
104  62   10   10   105  110  116  32   109  97   105  110  40   41   10   123

\n   SP   SP   SP   SP   p    r    i    n    t    f    (    "    h    e    l
10   32   32   32   32   112  114  105  110  116  102  40   34   104  101  108

l    o    ,    SP   w    o    r    l    d    \    n    "    )    ;    \n   }
108  111  44   32   119  111  114  108  100  92   110  34   41   59   10   125

Figure 1.2 The ASCII text representation of hello.c.


    Aside Origins of the C programming language

C was developed from 1969 to 1973 by Dennis Ritchie of Bell Laboratories. The American National Standards Institute (ANSI) ratified the ANSI C standard in 1989, and this standardization later became the responsibility of the International Standards Organization (ISO). The standards define the C language and a set of library functions known as the C standard library. Kernighan and Ritchie describe ANSI C in their classic book, which is known affectionately as "K&R" [58]. In Ritchie's words [88], C is "quirky, flawed, and an enormous success." So why the success?

. C was closely tied with the Unix operating system. C was developed from the beginning as the system programming language for Unix. Most of the Unix kernel, and all of its supporting tools and libraries, were written in C. As Unix became popular in universities in the late 1970s and early 1980s, many people were exposed to C and found that they liked it. Since Unix was written almost entirely in C, it could be easily ported to new machines, which created an even wider audience for both C and Unix.

. C is a small, simple language. The design was controlled by a single person, rather than a committee, and the result was a clean, consistent design with little baggage. The K&R book describes the complete language and standard library, with numerous examples and exercises, in only 261 pages. The simplicity of C made it relatively easy to learn and to port to different computers.

. C was designed for a practical purpose. C was designed to implement the Unix operating system. Later, other people found that they could write the programs they wanted, without the language getting in the way.

C is the language of choice for system-level programming, and there is a huge installed base of application-level programs as well. However, it is not perfect for all programmers and all situations. C pointers are a common source of confusion and programming errors. C also lacks explicit support for useful abstractions such as classes, objects, and exceptions. Newer languages such as C++ and Java address these issues for application-level programs.

1.2 Programs Are Translated by Other Programs into Different Forms

The hello program begins life as a high-level C program because it can be read and understood by human beings in that form. However, in order to run hello.c on the system, the individual C statements must be translated by other programs into a sequence of low-level machine-language instructions. These instructions are then packaged in a form called an executable object program and stored as a binary disk file. Object programs are also referred to as executable object files.

On a Unix system, the translation from source file to object file is performed by a compiler driver:

    unix> gcc -o hello hello.c


Figure 1.3 The compilation system. [The figure shows the four-phase pipeline: hello.c (source program, text) -> preprocessor (cpp) -> hello.i (modified source program, text) -> compiler (cc1) -> hello.s (assembly program, text) -> assembler (as) -> hello.o (relocatable object program, binary) -> linker (ld), which also reads printf.o -> hello (executable object program, binary).]

Here, the gcc compiler driver reads the source file hello.c and translates it into an executable object file hello. The translation is performed in the sequence of four phases shown in Figure 1.3. The programs that perform the four phases (preprocessor, compiler, assembler, and linker) are known collectively as the compilation system.

. Preprocessing phase. The preprocessor (cpp) modifies the original C program according to directives that begin with the # character. For example, the #include <stdio.h> command in line 1 of hello.c tells the preprocessor to read the contents of the system header file stdio.h and insert it directly into the program text. The result is another C program, typically with the .i suffix.

. Compilation phase. The compiler (cc1) translates the text file hello.i into the text file hello.s, which contains an assembly-language program. Each statement in an assembly-language program exactly describes one low-level machine-language instruction in a standard text form. Assembly language is useful because it provides a common output language for different compilers for different high-level languages. For example, C compilers and Fortran compilers both generate output files in the same assembly language.

. Assembly phase. Next, the assembler (as) translates hello.s into machine-language instructions, packages them in a form known as a relocatable object program, and stores the result in the object file hello.o. The hello.o file is a binary file whose bytes encode machine-language instructions rather than characters. If we were to view hello.o with a text editor, it would appear to be gibberish.

. Linking phase. Notice that our hello program calls the printf function, which is part of the standard C library provided by every C compiler. The printf function resides in a separate precompiled object file called printf.o, which must somehow be merged with our hello.o program. The linker (ld) handles this merging. The result is the hello file, which is an executable object file (or simply executable) that is ready to be loaded into memory and executed by the system.
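If you are curious, gcc will stop after any of these phases and leave the intermediate file behind for inspection. A sketch using standard gcc options (the output names follow the conventions above; details can vary by version):

unix> gcc -E hello.c -o hello.i     (run only the preprocessor)
unix> gcc -S hello.i                (compile to assembly in hello.s)
unix> gcc -c hello.s                (assemble to the object file hello.o)
unix> gcc -o hello hello.o          (link with the C library to produce hello)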


    Aside The GNU project

GCC is one of many useful tools developed by the GNU (short for "GNU's Not Unix") project. The GNU project is a tax-exempt charity started by Richard Stallman in 1984, with the ambitious goal of developing a complete Unix-like system whose source code is unencumbered by restrictions on how it can be modified or distributed. The GNU project has developed an environment with all the major components of a Unix operating system, except for the kernel, which was developed separately by the Linux project. The GNU environment includes the emacs editor, gcc compiler, gdb debugger, assembler, linker, utilities for manipulating binaries, and other components. The gcc compiler has grown to support many different languages, with the ability to generate code for many different machines. Supported languages include C, C++, Fortran, Java, Pascal, Objective-C, and Ada.

The GNU project is a remarkable achievement, and yet it is often overlooked. The modern open-source movement (commonly associated with Linux) owes its intellectual origins to the GNU project's notion of free software (free as in "free speech," not "free beer"). Further, Linux owes much of its popularity to the GNU tools, which provide the environment for the Linux kernel.

    1.3 It Pays to Understand How Compilation Systems Work

For simple programs such as hello.c, we can rely on the compilation system to produce correct and efficient machine code. However, there are some important reasons why programmers need to understand how compilation systems work:

. Optimizing program performance. Modern compilers are sophisticated tools that usually produce good code. As programmers, we do not need to know the inner workings of the compiler in order to write efficient code. However, in order to make good coding decisions in our C programs, we do need a basic understanding of machine-level code and how the compiler translates different C statements into machine code. For example, is a switch statement always more efficient than a sequence of if-else statements? How much overhead is incurred by a function call? Is a while loop more efficient than a for loop? Are pointer references more efficient than array indexes? Why does our loop run so much faster if we sum into a local variable instead of an argument that is passed by reference? How can a function run faster when we simply rearrange the parentheses in an arithmetic expression?

In Chapter 3, we will introduce two related machine languages: IA32, the 32-bit code that has become ubiquitous on machines running Linux, Windows, and more recently the Macintosh operating systems, and x86-64, a 64-bit extension found in more recent microprocessors. We describe how compilers translate different C constructs into these languages. In Chapter 5, you will learn how to tune the performance of your C programs by making simple transformations to the C code that help the compiler do its job better. In Chapter 6, you will learn about the hierarchical nature of the memory system, how C compilers store data arrays in memory, and how your C programs can exploit this knowledge to run more efficiently. (The first code sketch following this list returns to the local-variable question above.)


. Understanding link-time errors. In our experience, some of the most perplexing programming errors are related to the operation of the linker, especially when you are trying to build large software systems. For example, what does it mean when the linker reports that it cannot resolve a reference? What is the difference between a static variable and a global variable? What happens if you define two global variables in different C files with the same name? What is the difference between a static library and a dynamic library? Why does it matter what order we list libraries on the command line? And scariest of all, why do some linker-related errors not appear until run time? You will learn the answers to these kinds of questions in Chapter 7. (The second sketch after this list previews the duplicate-global question.)

. Avoiding security holes. For many years, buffer overflow vulnerabilities have accounted for the majority of security holes in network and Internet servers. These vulnerabilities exist because too few programmers understand the need to carefully restrict the quantity and forms of data they accept from untrusted sources. A first step in learning secure programming is to understand the consequences of the way data and control information are stored on the program stack. We cover the stack discipline and buffer overflow vulnerabilities in Chapter 3 as part of our study of assembly language. We will also learn about methods that can be used by the programmer, compiler, and operating system to reduce the threat of attack. (The third sketch after this list shows the classic unsafe pattern.)
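To make the local-variable question concrete, here is a minimal sketch; the function names and code are ours, not the book's, and Chapter 5 develops the real analysis:

/* Hypothetical illustration: summing through a pointer argument
   versus summing into a local variable. */
void sum_ref(const int *a, int n, int *dest)
{
    *dest = 0;
    for (int i = 0; i < n; i++)
        *dest += a[i];          /* reads and writes memory on every iteration */
}

void sum_local(const int *a, int n, int *dest)
{
    int acc = 0;                /* accumulator can live in a register */
    for (int i = 0; i < n; i++)
        acc += a[i];
    *dest = acc;                /* a single memory write at the end */
}

A compiler often cannot keep *dest in a register in sum_ref, because dest might alias the array a; the local accumulator removes that obstacle.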
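Here, likewise, is the duplicate-global pitfall in miniature; the file and variable names are hypothetical:

/* foo.c */
int x = 15213;      /* definition with an initializer: a "strong" symbol */

/* bar.c */
int x;              /* uninitialized definition: traditionally a "weak" symbol */

Depending on your toolchain, linking foo.o and bar.o either silently makes both files share a single x or fails with a multiple-definition error (recent versions of gcc default to the latter). Chapter 7 explains the rules.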
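And finally, a sketch of the classic unsafe pattern (our code, not the book's); the long-deprecated gets function was removed from the C standard for exactly this reason, and scanf with a bare %s is just as dangerous:

#include <stdio.h>

void echo(void)
{
    char buf[8];            /* deliberately tiny */
    scanf("%s", buf);       /* no bounds check: a long word overruns buf */
    printf("%s\n", buf);
}

Replacing the unbounded read with fgets(buf, sizeof buf, stdin) limits the input to what buf can hold and removes the overflow.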

1.4 Processors Read and Interpret Instructions Stored in Memory

At this point, our hello.c source program has been translated by the compilation system into an executable object file called hello that is stored on disk. To run the executable file on a Unix system, we type its name to an application program known as a shell:

    unix> ./hello

    hello, world

    unix>

The shell is a command-line interpreter that prints a prompt, waits for you to type a command line, and then performs the command. If the first word of the command line does not correspond to a built-in shell command, then the shell assumes that it is the name of an executable file that it should load and run. So in this case, the shell loads and runs the hello program and then waits for it to terminate. The hello program prints its message to the screen and then terminates. The shell then prints a prompt and waits for the next input command line.
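The read/eval loop just described fits in a few lines of C. This is only a crude sketch of ours under obvious simplifications: a real shell forks and loads the program itself, as Chapter 8 explains, whereas system here delegates the whole job to /bin/sh:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char cmdline[128];
    while (1) {
        printf("unix> ");               /* print a prompt */
        if (fgets(cmdline, sizeof cmdline, stdin) == NULL)
            break;                      /* end of input: exit the shell */
        system(cmdline);                /* "perform the command" */
    }
    return 0;
}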

    1.4.1 Hardware Organization of a System

To understand what happens to our hello program when we run it, we need to understand the hardware organization of a typical system, which is shown in Figure 1.4. This particular picture is modeled after the family of Intel Pentium systems, but all systems have a similar look and feel.


Figure 1.4 Hardware organization of a typical system. CPU: Central Processing Unit, ALU: Arithmetic/Logic Unit, PC: Program counter, USB: Universal Serial Bus. [The figure shows the CPU (register file, PC, ALU, bus interface) connected by the system bus to an I/O bridge, which connects by the memory bus to main memory and by the I/O bus to a USB controller (mouse, keyboard), a graphics adapter (display), a disk controller (disk, where the hello executable is stored), and expansion slots for other devices such as network adapters.]

Don't worry about the complexity of this figure just now. We will get to its various details in stages throughout the course of the book.

    Buses

Running throughout the system is a collection of electrical conduits called buses that carry bytes of information back and forth between the components. Buses are typically designed to transfer fixed-sized chunks of bytes known as words. The number of bytes in a word (the word size) is a fundamental system parameter that varies across systems. Most machines today have word sizes of either 4 bytes (32 bits) or 8 bytes (64 bits). For the sake of our discussion here, we will assume a word size of 4 bytes, and we will assume that buses transfer only one word at a time.

    I/O Devices

Input/output (I/O) devices are the system's connection to the external world. Our example system has four I/O devices: a keyboard and mouse for user input, a display for user output, and a disk drive (or simply disk) for long-term storage of data and programs. Initially, the executable hello program resides on the disk.

Each I/O device is connected to the I/O bus by either a controller or an adapter. The distinction between the two is mainly one of packaging. Controllers are chip sets in the device itself or on the system's main printed circuit board (often called the motherboard). An adapter is a card that plugs into a slot on the motherboard. Regardless, the purpose of each is to transfer information back and forth between the I/O bus and an I/O device.


Chapter 6 has more to say about how I/O devices such as disks work. In Chapter 10, you will learn how to use the Unix I/O interface to access devices from your application programs. We focus on the especially interesting class of devices known as networks, but the techniques generalize to other kinds of devices as well.

    Main Memory

The main memory is a temporary storage device that holds both a program and the data it manipulates while the processor is executing the program. Physically, main memory consists of a collection of dynamic random access memory (DRAM) chips. Logically, memory is organized as a linear array of bytes, each with its own unique address (array index) starting at zero. In general, each of the machine instructions that constitute a program can consist of a variable number of bytes. The sizes of data items that correspond to C program variables vary according to type. For example, on an IA32 machine running Linux, data of type short requires two bytes, types int, float, and long four bytes, and type double eight bytes.
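You can check these sizes on your own machine with the sizeof operator. This little program is ours, and its output will match the numbers above only on an IA32 Linux system:

#include <stdio.h>

int main(void)
{
    printf("short=%zu int=%zu float=%zu long=%zu double=%zu\n",
           sizeof(short), sizeof(int), sizeof(float),
           sizeof(long), sizeof(double));
    return 0;
}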

Chapter 6 has more to say about how memory technologies such as DRAM chips work, and how they are combined to form main memory.

    Processor

The central processing unit (CPU), or simply processor, is the engine that interprets (or executes) instructions stored in main memory. At its core is a word-sized storage device (or register) called the program counter (PC). At any point in time, the PC points at (contains the address of) some machine-language instruction in main memory.1

From the time that power is applied to the system, until the time that the power is shut off, a processor repeatedly executes the instruction pointed at by the program counter and updates the program counter to point to the next instruction. A processor appears to operate according to a very simple instruction execution model, defined by its instruction set architecture. In this model, instructions execute in strict sequence, and executing a single instruction involves performing a series of steps. The processor reads the instruction from memory pointed at by the program counter (PC), interprets the bits in the instruction, performs some simple operation dictated by the instruction, and then updates the PC to point to the next instruction, which may or may not be contiguous in memory to the instruction that was just executed.

There are only a few of these simple operations, and they revolve around main memory, the register file, and the arithmetic/logic unit (ALU). The register file is a small storage device that consists of a collection of word-sized registers, each with its own unique name. The ALU computes new data and address values. Here are some examples of the simple operations that the CPU might carry out at the request of an instruction:

1. PC is also a commonly used acronym for "personal computer." However, the distinction between the two should be clear from the context.


. Load: Copy a byte or a word from main memory into a register, overwriting the previous contents of the register.

. Store: Copy a byte or a word from a register to a location in main memory, overwriting the previous contents of that location.

. Operate: Copy the contents of two registers to the ALU, perform an arithmetic operation on the two words, and store the result in a register, overwriting the previous contents of that register.

. Jump: Extract a word from the instruction itself and copy that word into the program counter (PC), overwriting the previous value of the PC.
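To make the fetch-execute cycle concrete, here is a toy interpreter of our own for a made-up three-instruction machine; the opcodes and encoding are entirely hypothetical, not any real ISA:

#include <stdio.h>

enum { LOAD, ADD, HALT };                       /* hypothetical opcodes */

int main(void)
{
    int program[] = { LOAD, 5, ADD, 7, HALT };  /* a tiny "machine program" in memory */
    int pc = 0, reg = 0, running = 1;

    while (running) {
        switch (program[pc]) {                  /* fetch and interpret the instruction at the PC */
        case LOAD: reg  = program[pc + 1]; pc += 2; break;
        case ADD:  reg += program[pc + 1]; pc += 2; break;
        case HALT: running = 0;                     break;
        }                                       /* each case updates the PC for the next fetch */
    }
    printf("reg = %d\n", reg);                  /* prints reg = 12 */
    return 0;
}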

We say that a processor appears to be a simple implementation of its instruction set architecture, but in fact modern processors use far more complex mechanisms to speed up program execution. Thus, we can distinguish the processor's instruction set architecture, describing the effect of each machine-code instruction, from its microarchitecture, describing how the processor is actually implemented. When we study machine code in Chapter 3, we will consider the abstraction provided by the machine's instruction set architecture. Chapter 4 has more to say about how processors are actually implemented.

    1.4.2 Running the hello Program

Given this simple view of a system's hardware organization and operation, we can begin to understand what happens when we run our example program. We must omit a lot of details here that will be filled in later, but for now we will be content with the big picture.

Initially, the shell program is executing its instructions, waiting for us to type a command. As we type the characters "./hello" at the keyboard, the shell program reads each one into a register, and then stores it in memory, as shown in Figure 1.5.

When we hit the enter key on the keyboard, the shell knows that we have finished typing the command. The shell then loads the executable hello file by executing a sequence of instructions that copies the code and data in the hello object file from disk to main memory. The data include the string of characters "hello, world\n" that will eventually be printed out.

Using a technique known as direct memory access (DMA, discussed in Chapter 6), the data travels directly from disk to main memory, without passing through the processor. This step is shown in Figure 1.6.

Once the code and data in the hello object file are loaded into memory, the processor begins executing the machine-language instructions in the hello program's main routine. These instructions copy the bytes in the "hello, world\n" string from memory to the register file, and from there to the display device, where they are displayed on the screen. This step is shown in Figure 1.7.


Figure 1.5 Reading the hello command from the keyboard. [Same hardware diagram as Figure 1.4: as the user types "hello", the characters flow from the keyboard through the USB controller and I/O bridge into a CPU register, and from there into main memory.]

Figure 1.6 Loading the executable from disk into main memory. [Same diagram: the hello executable stored on disk is copied directly into main memory, which then holds the hello code and the string "hello, world\n".]


Figure 1.7 Writing the output string from memory to the display. [Same diagram: the string "hello, world\n" travels from main memory through the register file to the graphics adapter, which sends it to the display.]

    1.5 Caches Matter

An important lesson from this simple example is that a system spends a lot of time moving information from one place to another. The machine instructions in the hello program are originally stored on disk. When the program is loaded, they are copied to main memory. As the processor runs the program, instructions are copied from main memory into the processor. Similarly, the data string "hello, world\n", originally on disk, is copied to main memory, and then copied from main memory to the display device. From a programmer's perspective, much of this copying is overhead that slows down the "real work" of the program. Thus, a major goal for system designers is to make these copy operations run as fast as possible.

Because of physical laws, larger storage devices are slower than smaller storage devices. And faster devices are more expensive to build than their slower counterparts. For example, the disk drive on a typical system might be 1000 times larger than the main memory, but it might take the processor 10,000,000 times longer to read a word from disk than from memory.

Similarly, a typical register file stores only a few hundred bytes of information, as opposed to billions of bytes in the main memory. However, the processor can read data from the register file almost 100 times faster than from memory. Even more troublesome, as semiconductor technology progresses over the years, this processor-memory gap continues to increase. It is easier and cheaper to make processors run faster than it is to make main memory run faster.

To deal with the processor-memory gap, system designers include smaller, faster storage devices called cache memories (or simply caches) that serve as temporary staging areas for information that the processor is likely to need in the near future. Figure 1.8 shows the cache memories in a typical system.


Figure 1.8 Cache memories. [The figure shows the CPU chip containing the register file, ALU, cache memories, and bus interface, connected through the system bus and I/O bridge to main memory on the memory bus.]

An L1 cache on the processor chip holds tens of thousands of bytes and can be accessed nearly as fast as the register file. A larger L2 cache with hundreds of thousands to millions of bytes is connected to the processor by a special bus. It might take 5 times longer for the processor to access the L2 cache than the L1 cache, but this is still 5 to 10 times faster than accessing the main memory. The L1 and L2 caches are implemented with a hardware technology known as static random access memory (SRAM). Newer and more powerful systems even have three levels of cache: L1, L2, and L3. The idea behind caching is that a system can get the effect of both a very large memory and a very fast one by exploiting locality, the tendency for programs to access data and code in localized regions. By setting up caches to hold data that is likely to be accessed often, we can perform most memory operations using the fast caches.
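Locality is something you can see directly in C code. In the hypothetical pair of functions below (ours, not the book's), both compute the same sum, but sumrows touches memory in the same order it is laid out and typically runs far faster on a cached machine:

#include <stdio.h>

#define N 1024
int a[N][N];

int sumrows(void)                   /* stride-1 access: good spatial locality */
{
    int sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

int sumcols(void)                   /* stride-N access: poor spatial locality */
{
    int sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}

int main(void)
{
    printf("%d %d\n", sumrows(), sumcols());    /* same answers; time them to see the gap */
    return 0;
}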

One of the most important lessons in this book is that application programmers who are aware of cache memories can exploit them to improve the performance of their programs by an order of magnitude. You will learn more about these important devices and how to exploit them in Chapter 6.

    1.6 Storage Devices Form a Hierarchy

This notion of inserting a smaller, faster storage device (e.g., cache memory) between the processor and a larger, slower device (e.g., main memory) turns out to be a general idea. In fact, the storage devices in every computer system are organized as a memory hierarchy similar to Figure 1.9. As we move from the top of the hierarchy to the bottom, the devices become slower, larger, and less costly per byte. The register file occupies the top level in the hierarchy, which is known as level 0, or L0. We show three levels of caching, L1 to L3, occupying memory hierarchy levels 1 to 3. Main memory occupies level 4, and so on.

The main idea of a memory hierarchy is that storage at one level serves as a cache for storage at the next lower level. Thus, the register file is a cache for the L1 cache. Caches L1 and L2 are caches for L2 and L3, respectively. The L3 cache is a cache for the main memory, which is a cache for the disk. On some networked systems with distributed file systems, the local disk serves as a cache for data stored on the disks of other systems.


Figure 1.9 An example of a memory hierarchy. [A pyramid from smaller, faster, and costlier (per byte) storage devices at the top to larger, slower, and cheaper (per byte) devices at the bottom. L0: registers (CPU registers hold words retrieved from cache memory); L1: L1 cache, SRAM (holds cache lines retrieved from the L2 cache); L2: L2 cache, SRAM (holds cache lines retrieved from the L3 cache); L3: L3 cache, SRAM (holds cache lines retrieved from memory); L4: main memory, DRAM (holds disk blocks retrieved from local disks); L5: local secondary storage, local disks (hold files retrieved from disks on remote network servers); L6: remote secondary storage (distributed file systems, Web servers).]

Just as programmers can exploit knowledge of the different caches to improve performance, programmers can exploit their understanding of the entire memory hierarchy. Chapter 6 will have much more to say about this.

    1.7 The Operating System Manages the Hardware

Back to our hello example. When the shell loaded and ran the hello program, and when the hello program printed its message, neither program accessed the keyboard, display, disk, or main memory directly. Rather, they relied on the services provided by the operating system. We can think of the operating system as a layer of software interposed between the application program and the hardware, as shown in Figure 1.10. All attempts by an application program to manipulate the hardware must go through the operating system.

The operating system has two primary purposes: (1) to protect the hardware from misuse by runaway applications, and (2) to provide applications with simple and uniform mechanisms for manipulating complicated and often wildly different low-level hardware devices.

Figure 1.10 Layered view of a computer system. [Software layers: application programs on top of the operating system. Hardware layer: processor, main memory, and I/O devices.]


Figure 1.11 Abstractions provided by an operating system. [Processes abstract the processor, main memory, and I/O devices; virtual memory abstracts the main memory and disks; files abstract I/O devices.]

The operating system achieves both goals via the fundamental abstractions shown in Figure 1.11: processes, virtual memory, and files. As this figure suggests, files are abstractions for I/O devices, virtual memory is an abstraction for both the main memory and disk I/O devices, and processes are abstractions for the processor, main memory, and I/O devices. We will discuss each in turn.

    Aside Unix and Posix

The 1960s was an era of huge, complex operating systems, such as IBM's OS/360 and Honeywell's Multics systems. While OS/360 was one of the most successful software projects in history, Multics dragged on for years and never achieved wide-scale use. Bell Laboratories was an original partner in the Multics project