CSE 3302 Lecture 1: Introduction 26 Aug 2010 Nate Nystrom University of Texas at Arlington

What is a programming language?

• Medium for communicating our intentions to machines, and to other people, and to ourselves

• A language should express computation:
  • precisely
  • at a high level
  • so we (and the machine) can reason about it

• Make it easier to write programs that really work

Why study languages?

• Learn new ways of thinking about programming
• Understanding the tools helps you avoid nasty surprises
• Become a sophisticated, skeptical consumer of languages
• Learn to reason about your programs
• Get a job

Story: buffer overflows

• Oldest software defect in the book

copy(Array a, Array b) {
    for (int i = 0; i < a.length; i++)
        a[i] = b[i];
}

• Blows up if b.length < a.length

Not only fail-stop

#include <string.h>

void foo(char *bar)
{
    char c[12];
    memcpy(c, bar, strlen(bar));  // no bounds checking...
}

int main(int argc, char **argv)
{
    foo(argv[1]);
}

[Figure: two panels, Benign and Malicious]

Overwrite return address, causing branch to injected code

Possible fixes

• Don’t execute code on the stack or heap
  • There are ways to work around this (return-to-libc)

• Use safe libraries
  • How to enforce?

• Use safe languages
  • Java, C#, ML, Cyclone, ...
  • What about legacy code?
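
One way to read "use safe libraries" is to route every copy through a helper that knows the size of the destination. The sketch below is only an illustration under that assumption: `checked_copy` and `foo_checked` are hypothetical names invented here (they are not from the lecture or from the C standard library), and `foo_checked` mirrors the shape of `foo` from the "Not only fail-stop" slide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the string src into dst, which has room for
 * dst_size bytes. Returns 0 on success, or -1 if src plus its
 * terminating NUL would not fit. Nothing is written on failure. */
int checked_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src);
    if (n + 1 > dst_size) {
        return -1;               /* reject instead of overflowing */
    }
    memcpy(dst, src, n + 1);     /* n characters plus the NUL */
    return 0;
}

/* Same shape as foo() on the slide, but the copy is bounds-checked. */
int foo_checked(const char *bar)
{
    char c[12];
    return checked_copy(c, sizeof c, bar);
}
```

With this version an over-long argument is rejected rather than written past the end of `c`. It does not answer the slide's question of how to enforce that every call site actually uses the helper.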

CCured

• George Necula et al., 2002, 2003
• source-to-source translator for C
• determines smallest number of run-time checks that must be inserted to (statically) guarantee no memory safety violations
• resulting program is memory safe
• but: need to store and check array bounds information
  • => can hurt performance
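
To make "store and check array bounds information" concrete, here is a minimal sketch of a pointer that carries its length and is checked on every access. This is an assumption-laden illustration, not CCured's actual representation or output: the type `checked_array` and the functions `checked_get` and `checked_array_copy` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat pointer": a base address plus an element count. */
typedef struct {
    int   *base;
    size_t len;
} checked_array;

/* Checked read: stores element i in *out and returns 1 if i is in
 * bounds, otherwise returns 0 and leaves *out untouched. */
int checked_get(checked_array a, size_t i, int *out)
{
    assert(out != NULL);
    if (i >= a.len) {
        return 0;                /* the run-time check */
    }
    *out = a.base[i];
    return 1;
}

/* Checked version of copy(a, b) from the buffer overflow slide:
 * refuses to run if src is shorter than dst. */
int checked_array_copy(checked_array dst, checked_array src)
{
    if (src.len < dst.len) {
        return 0;
    }
    for (size_t i = 0; i < dst.len; i++) {
        dst.base[i] = src.base[i];
    }
    return 1;
}
```

The extra `len` field and the comparison on each access are the storage and time overhead the slide refers to.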

Some questions...

What’s your favorite programming language?

What’s the best programming language?

Why choose one language over another?

Why study different languages?

• Some languages are more powerful than others

• Happy user of Blub:
  • “Blub beats Cobol and assembly.”
  • “Use Haskell, Scheme, ML? Hell no! ’cause they are all equivalent to Blub plus some bizarre stuff no one uses.”

• Habit blinds us to power

• Only valid reason to use an inferior language is backward compatibility with legacy libraries and tools

Red-black tree insertion (C)

void LeftRotate(rb_red_blk_tree* tree, rb_red_blk_node* x) {
  rb_red_blk_node* y;
  rb_red_blk_node* nil=tree->nil;

  y=x->right;
  x->right=y->left;

  if (y->left != nil) y->left->parent=x;

  y->parent=x->parent;

  if( x == x->parent->left) {
    x->parent->left=y;
  } else {
    x->parent->right=y;
  }
  y->left=x;
  x->parent=y;
}

void RightRotate(rb_red_blk_tree* tree, rb_red_blk_node* y) {
  rb_red_blk_node* x;
  rb_red_blk_node* nil=tree->nil;

  x=y->left;
  y->left=x->right;

  if (nil != x->right) x->right->parent=y;

  x->parent=y->parent;
  if( y == y->parent->left) {
    y->parent->left=x;
  } else {
    y->parent->right=x;
  }
  x->right=y;
  y->parent=x;
}

rb_red_blk_node * RBTreeInsert(rb_red_blk_tree* tree, void* key, void* info) {
  rb_red_blk_node * y;
  rb_red_blk_node * x;
  rb_red_blk_node * newNode;

  x=(rb_red_blk_node*) SafeMalloc(sizeof(rb_red_blk_node));
  x->key=key;
  x->info=info;

  TreeInsertHelp(tree,x);
  newNode=x;
  x->red=1;
  while(x->parent->red) {
    if (x->parent == x->parent->parent->left) {
      y=x->parent->parent->right;
      if (y->red) {
        x->parent->red=0;
        y->red=0;
        x->parent->parent->red=1;
        x=x->parent->parent;
      } else {
        if (x == x->parent->right) {
          x=x->parent;
          LeftRotate(tree,x);
        }
        x->parent->red=0;
        x->parent->parent->red=1;
        RightRotate(tree,x->parent->parent);
      }
    } else { /* case for x->parent == x->parent->parent->right */
      y=x->parent->parent->left;
      if (y->red) {
        x->parent->red=0;
        y->red=0;
        x->parent->parent->red=1;
        x=x->parent->parent;
      } else {
        if (x == x->parent->left) {
          x=x->parent;
          RightRotate(tree,x);
        }
        x->parent->red=0;
        x->parent->parent->red=1;
        LeftRotate(tree,x->parent->parent);
      }
    }
  }
  tree->root->left->red=0;
  return(newNode);
}

void TreeInsertHelp(rb_red_blk_tree* tree, rb_red_blk_node* z) {
  rb_red_blk_node* x;
  rb_red_blk_node* y;
  rb_red_blk_node* nil=tree->nil;

  z->left=z->right=nil;
  y=tree->root;
  x=tree->root->left;
  while( x != nil) {
    y=x;
    if (1 == tree->Compare(x->key,z->key)) { /* x.key > z.key */
      x=x->left;
    } else { /* x.key <= z.key */
      x=x->right;
    }
  }
  z->parent=y;
  if ( (y == tree->root) ||
       (1 == tree->Compare(y->key,z->key))) { /* y.key > z.key */
    y->left=z;
  } else {
    y->right=z;
  }
}

Red-black tree insertion (Scala)

abstract class RBMap[K: Ordered, V] {
  protected def blacken(n: RBMap[K,V]) = n match {
    case L() => n
    case T(_,l,k,v,r) => T(B,l,k,v,r)
  }

  protected def balance (c: Color) (l: RBMap[K,V]) (k: K) (v: Option[V]) (r: RBMap[K,V]) = (c,l,k,v,r) match {
    case (B,T(R,T(R,a,xK,xV,b),yK,yV,c),zK,zV,d) => T(R,T(B,a,xK,xV,b),yK,yV,T(B,c,zK,zV,d))
    case (B,T(R,a,xK,xV,T(R,b,yK,yV,c)),zK,zV,d) => T(R,T(B,a,xK,xV,b),yK,yV,T(B,c,zK,zV,d))
    case (B,a,xK,xV,T(R,T(R,b,yK,yV,c),zK,zV,d)) => T(R,T(B,a,xK,xV,b),yK,yV,T(B,c,zK,zV,d))
    case (B,a,xK,xV,T(R,b,yK,yV,T(R,c,zK,zV,d))) => T(R,T(B,a,xK,xV,b),yK,yV,T(B,c,zK,zV,d))
    case (c,a,xK,xV,b) => T(c,a,xK,xV,b)
  }

  private[map] def modWith (k: K, f: (K, Option[V]) => Option[V]): RBMap[K,V]

  def modifiedWith (k: K, f: (K, Option[V]) => Option[V]): RBMap[K,V] = blacken(modWith(k,f))

  def insert (k: K, v: V) = modifiedWith (k, (_,_) => Some(v))
}

private case class L[K: Ordered, V] extends RBMap[K,V] {
  private[map] def modWith (k: K, f: (K, Option[V]) => Option[V]) = T(R, this, k, f(k,None), this)
}

private case class T[K: Ordered, V](c: Color, l: RBMap[K,V], k: K, v: Option[V], r: RBMap[K,V]) extends RBMap[K,V] {
  private[map] def modWith (k: K, f: (K, Option[V]) => Option[V]): RBMap[K,V] = {
    if (k < this.k) (balance (c) (l.modWith(k,f)) (this.k) (this.v) (r)) else
    if (k == this.k) (T(c,l,k,f(this.k,this.v),r)) else
    (balance (c) (l) (this.k) (this.v) (r.modWith(k,f)))
  }
}

Agenda

• Intellectual tools to understand and evaluate programming languages
  • focus is on language features

• Learn by doing
  • write mostly short programs, thought required
  • implement language features to understand how they work

What abstractions should a language support?

Language features

• Choose abstractions (i.e., language features) to suit the needs

• Some features to help build your vocabulary:
  • higher-order functions
  • polymorphism
  • pattern matching
  • data for symbolic computing: lists, tables, sets
  • abstract data types, encapsulation
  • objects and subtyping
  • modules and parameterization
  • searching and backtracking

What features?

“A programming language should be designed not by piling feature on top of feature, but by removing the weaknesses that make additional features appear necessary.”

–The Scheme Report

What influences language design?

Design influences on language design:

• Theory
• Programmer productivity
• Implementation

Theory

• Functional programming
• Type systems
• Formal semantics (defining the language precisely)
  • Operational semantics (tools of the trade)
  • Denotational semantics (for mathematicians)
  • Axiomatic semantics (for logicians)

Productivity

• Programming methodologies
• Software engineering
• Goals
  • fewer bugs, easier to isolate
    • including performance bugs
  • code reuse
• Techniques
  • strong typing (static or dynamic)
  • abstract data types
  • modules (including generics)
  • objects and inheritance
  • separate compilation
  • automatic memory management

Implementation

• Cannot design a language without considering how it will be implemented

• Techniques
  • Parser generators
  • Memory allocation
  • Garbage collection
  • Runtime typing, tagging
  • Reflection
• Performance
  • fast execution, fast compilation, low footprint, predictability

Design dimensions

• Typing
  • strong vs. weak
  • static vs. dynamic
  • monomorphic vs. polymorphic
• First-class values
  • structures?
  • procedures?
  • are built-in types different?
• Safety
  • no unexplained crashes
  • security?
• Control flow
  • stack-based
  • heap-based (closures and continuations)
  • search-based (logic programming, unification)

Design non-dimensions

• Simplicity

• Orthogonality

• Readability

A short history of programming languages

In the beginning...

• 1800s Jacquard loom: punch cards => cloth designs
• 1830-40s Charles Babbage: difference engine
  • finally built in 1991
  • http://www.youtube.com/watch?v=Lcedn6fxgS0

• Ada Lovelace wrote some notes on how to program Babbage’s later Analytical Engine

• 1941 Z3: first digital computer (electromechanical)
• 1943 ENIAC: first electronic computer
• 1945 EDVAC: von Neumann architecture (program is data)
  • earlier machines such as ENIAC were programmed by rewiring

Machine code and Assembly

• Machine code: bit sequences
  • 00000 00001 00010 00110 00000 10000

• Assembly language
  • symbolic representation of machine code
  • machine-specific
    • ld r2, 0[r1]
    • addi r3, r2, 1
    • st 0[r1], r3
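
For comparison, the three assembly instructions above correspond roughly to a single C statement. This is a sketch under the assumption that `r1` holds the address of an integer; the function name `increment` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Add 1 to the integer that p points to.
 *   ld   r2, 0[r1]   load the value at address r1 (here: *p)
 *   addi r3, r2, 1   add the constant 1
 *   st   0[r1], r3   store the result back to the same address */
void increment(int *p)
{
    assert(p != NULL);
    *p = *p + 1;
}
```

The C version names no registers, which is what lets the same source be compiled for different machines.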

Towards more abstraction

• Fortran
  • John Backus et al. (IBM) 1954-57
  • arrays, loops, if statement
• Cobol
  • Grace Murray Hopper (DoD) 1959-60
  • record structure, separate data structures from execution
• Algol60
  • type declarations, block structure, recursion
• LISP
  • John McCarthy 1960
  • first-class functions, garbage collection, eval
• CLU
  • Liskov et al. 1972-80
  • abstract data types, iterators, exceptions

Major paradigms

• Imperative
• Functional
• Logic
• Object-oriented

Functional languages

• LISP (John McCarthy 1960)
  • Scheme (Guy Steele and Gerald Sussman 1975)
• ML (Milner)
  • OCaml (Leroy), F# (Syme)
• Haskell
  • pure functional language (no mutable state), lazy evaluation
• Key features
  • first-class functions (aka higher-order functions, closures)

      val square = fn x => x*x
      val xs = [1,2,3,4]
      val ys = map square xs         (* [1,4,9,16] *)
      val zs = foldl (op +) 0 xs     (* 0+1+2+3+4 = 10 *)

  • pattern matching

      fun length xs = case xs of [] => 0 | y::ys => 1 + length ys
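
For contrast with the ML snippets, here is a rough C analogue of `map` and `foldl` using function pointers. It is a sketch only: the names `map_int`, `foldl_int`, `unary_fn` and `binary_fn` are invented for this example, and the functions are specialized to `int` arrays because C has no parametric polymorphism.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*unary_fn)(int);        /* a function from int to int */
typedef int (*binary_fn)(int, int);  /* a function combining two ints */

int square(int x) { return x * x; }
int add(int a, int b) { return a + b; }

/* map: apply f to each of the n elements of in, writing results to out. */
void map_int(unary_fn f, const int *in, int *out, size_t n)
{
    assert(f != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = f(in[i]);
    }
}

/* foldl: combine the elements left to right, starting from init. */
int foldl_int(binary_fn f, int init, const int *xs, size_t n)
{
    assert(f != NULL);
    int acc = init;
    for (size_t i = 0; i < n; i++) {
        acc = f(acc, xs[i]);
    }
    return acc;
}
```

Calling `map_int(square, xs, ys, 4)` on `{1, 2, 3, 4}` fills `ys` with the squares, and `foldl_int(add, 0, xs, 4)` sums the array. Note that a C function pointer cannot capture local variables, so this covers passing functions as arguments but not closures.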

Logic languages

• Prolog
  • search-based evaluation

sibling(A,B) :- parent(A,X), parent(B,X).
sister(A,B) :- sibling(A,B), female(B).
brother(A,B) :- sibling(A,B), male(B).

parent("Apollo", "Zeus").
parent("Artemis", "Zeus").
parent("Ares", "Zeus").
male("Apollo").
male("Ares").
female("Artemis").

?- brother("Artemis", X).
--> X="Apollo", X="Ares"

OO languages

• Simula67
  • Kristen Nygaard, Ole-Johan Dahl
  • objects, classes, inheritance, virtual methods, coroutines
• Smalltalk
  • 1972-80 Alan Kay, Dan Ingalls, et al. (Xerox PARC)
  • pure OO (everything is an object)
  • classes, metaclasses
  • blocks (closures)
• C++ - C with Simula67 classes
• Java
  • 1995 James Gosling et al. (Sun)
  • C++ syntax but portable, type-safe, GC, rich libraries
• C# - Microsoft’s Java with a few improvements

Multi-paradigm languages

• Scala
  • Martin Odersky et al. (EPFL) 2005-10
  • Runs on JVM and CLR
  • OO + FP features
    • pure OO (first-class functions are objects)
    • type inference
    • implicit conversions

• Functional logic languages

Scripting languages

• Perl, Python, PHP, Ruby, Tcl, Groovy, JavaScript
  • dynamically typed
  • high-level data structures (list, map, etc.) [many borrowed from functional languages]

Domain-specific languages

• SQL - databases
• TeX, LaTeX, troff - text processing
• Matlab, Mathematica - math
• AutoLisp - CAD
• Processing, NodeBox - graphics

Hot topics in PL

• Effects
  • How to reason about side-effects (state, I/O, concurrency)
• Concurrency
  • abstractions for concurrent programming
  • type systems for eliminating concurrency-related bugs
  • much more later
• Security
  • enforcing security policies in the language
• Bug-finding
  • program analysis to find bugs

Moore’s law

Multicore

• Moore’s law still holds:
  • 2x transistors every 18 months
  • Intel: 32nm in early 2010, 4nm projected for 2022
  • Others: about one generation behind (IBM @ 45nm late 2008)

• Use transistors to add smaller, simpler cores
  • IBM shifted to multicore in 2002, Intel in 2004
    • Intel scrapped Prescott (3.4GHz P4)
  • Run at lower clock frequency
  • Less work per transistor ⇒ less heat

Another trend: hybrid architectures

• Cell
  • 1 Power processor
  • 8-16 “synergistic processing elements” (vector processors)
• GPGPU
  • many vector processors (NVIDIA GTX 480 = 480 cores)
• Can take advantage of these architectures if you can express your computation as vector operations
  • e.g. CUDA for NVIDIA GPUs
    • no recursion, no virtual dispatch
    • limited memory
    • must manually manage movement of data to/from GPU

Concurrency for the masses

• Parallelism is the way to get high performance on modern architectures

• Parallel programming is becoming mainstream

• No longer domain of the expert

Concurrent programming

• ... is hard:
  • data races
  • deadlock
  • livelock
  • overlocking
  • underlocking
  • priority inversion

• how to parallelize effectively?

Concurrent programming languages

• Need new languages to hide the complexity
  • abstractions for concurrent programming
  • type systems to rule out errors, improve performance

My research projects

• Thorn
  • a scalable concurrent scripting language
  • http://www.thorn-lang.org
• X10
  • a concurrent OO language for HPC
  • http://www.x10-lang.org
• Firepile
  • a Scala library for GPU programming
• Polyglot
  • an extensible compiler framework
  • http://www.cs.cornell.edu/Projects/polyglot

• See me for more

Rest of the course

• Objects
• Functional programming
• Concurrency

Administrivia

Grading

• three exams: 15% each
• 10 homeworks: 45% total
  • can drop lowest grade
• term paper: 10%
  • must get a passing grade on term paper to pass the course

Assignments

• 10 small assignments
  • some writing
  • some (short, but sometimes tricky) programming
  • late penalty: 100%
  • readability and speling counts!

Working together

• Collaborate! (up to a point)
  • that’s what professionals do
  • vital to your success
  • discuss problems, techniques, ideas
  • all discussions must be acknowledged
  • if in doubt, ask me
  • if still in doubt, don’t collaborate

• Must not collaborate on code
  • don’t even look!

Method of study

• focus on
  • semantics, not syntax
  • the unusual, not the usual (weird but powerful)

• case studies of interpreters
  • learn foundations of languages by studying and modifying implementations

• study abstracted “essentials” of languages
• supplement by
  • descriptive tools: operational semantics, lambda calculus, type systems

Questions?