An Introduction to C Prepared for UCSD summer 2009 class.


Overview
Low level programming language
Language of choice when speed/efficiency is at a premium
A compiled language
Supports both Pass by Reference (through pointers) and Pass by Value
Very similar syntax to R (in fact, R closely adopted C syntax)

When to use C over R
WEAKNESSES
Not a statistics environment
Limited graphical support (i.e. no plot() commands)
Pointless for small problems, or when you don’t need to optimize speed
STRENGTHS
Custom estimators that take a long time to converge
Working with huge data sets

/* A standard program to print out a greeting */

#include <stdio.h>

int main(void){
    printf("\nHello world");
    return(0);
}

Hello.c --- Our first C program

> gcc -o hello hello.c

> hello

Note commenting conventions: ‘//’ vs. ‘/* ... */’

// is just like # in R and comments out the rest of the line; /* ... */ is for long comments that can span several lines.

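#include <stdio.h> plays much the same role as the library() command in R: it makes the functions declared in stdio.h, such as printf(), available to the program.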

Every executable program has a function called “main”, which must always return a number of type int. Functions are always defined by the type of thing they output (int), their name (main), and their input (void in this case, which means no input).

gcc is the GNU C compiler. -o tells it the name of the output file (hello here, or hello.exe on Windows), and the final argument is the source code file.

In C, you must explicitly use return to hand back a value; unlike R, a function does not return the last value it evaluates. The brackets in return(0) are optional.

Brackets: Note that, like R, C uses different kinds of brackets. Round brackets ‘()’ are for functions, squiggly brackets ‘{}’ are for control flow, and square brackets ‘[]’ are for indexes. This is identical to R.

All statements end with semicolons. A statement can span multiple lines, because the semicolon, not the line break, marks where it ends.

Working with Vectors
A container of objects, in some order
Can be a list of characters, integers, doubles, or structs
In R, just like sequences you would observe from functions like seq(), rep()
Static vs Dynamic Allocation: When to do what
What is a memory leak?

#include <stdio.h>

int main(void){
    double numlist[4];

    numlist[0] = 3;
    numlist[1] = 4;
    numlist[2] = 5;
    numlist[3] = 10;
    printf("\n numlist[0] == %f", numlist[0]);
    printf("\n numlist[1] == %f", numlist[1]);
    printf("\n numlist[2] == %f", numlist[2]);
    printf("\n numlist[3] == %f", numlist[3]);
    printf("\n\nWhy will numlist[4] give an error?");
    return(0);
}

Vector.c --- Static Allocation of Vector

First, notice you declare your variables up front. This is considered good form for compiled procedural language programming. To declare, it should be the type (double here) and a name (numlist here). If it is a vector and you are statically allocating it, use square brackets and size.

Doubles are basically the same thing as “numeric” in R

Assigning values to the container in order. Notice that it starts at 0. This is different from R, so be very careful here.

Printing the results, just like how sprintf() is done in R. %f is to show floating point, which is what a double is.

#include <stdio.h>
#include <stdlib.h>

int main(void){
    double *numlist;
    numlist = (double *) malloc(10*sizeof(double));

    numlist[0] = 3;
    numlist[1] = 4;
    numlist[2] = 5;
    numlist[3] = 10;
    printf("\n numlist[0] == %f", numlist[0]);
    printf("\n numlist[1] == %f", numlist[1]);
    printf("\n numlist[2] == %f", numlist[2]);
    printf("\n numlist[3] == %f", numlist[3]);
    printf("\n\nWhy will numlist[4] give an error?");
    free(numlist);
    return(0);
}

Vector.c --- Dynamic Allocation of Vector

To dynamically allocate, declare it as a pointer with a * (more on this later). At this point it has no space reserved; it can hold the address of a double but does not yet point to usable memory.

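malloc() takes a number of bytes, reserves that much space, and returns a pointer to it. First, remember to cast the result to the pointer type (double *); C does not strictly require the cast, but it makes the intent explicit. Second, remember that you must reserve space equal to the number of slots (10 slots here) x the amount of space for each slot (sizeof(double) in this case).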

calloc() and realloc() are similar functions: calloc() initializes everything to 0, and realloc() resizes a previously allocated array.

Always free the space afterwards; this is the whole point. Space that is allocated but never freed is a memory leak.

Working with Arrays
Arrays are basically matrices, though they can span more than two dimensions
Can think of them as “vectors of vectors”
Often these are stored as vectors that need to be unrolled (we will see this later)
Dynamic allocation is still supported

#include <stdio.h>

int main(void){
    double randMat[3][3];
    randMat[0][0] = 1;
    randMat[1][2] = 3;
    printf("\nrandMat[0][0] = %f", randMat[0][0]);
    printf("\nrandMat[1][2] = %f", randMat[1][2]);
    return(0);
}

Array.c --- Static Allocation of Array

To create an array, declare the type, then the size of each dimension in the square brackets like this.

The only thing to note here is that the notation is different from R: not randMat[1,2], but randMat[1][2].


#include <stdio.h>
#include <stdlib.h>

int main(void){
    double **randMat;

    /* first reserve the vector of 3 row pointers, then each row */
    randMat = (double **) malloc (3*sizeof(double *));
    randMat[0] = (double *) malloc (3*sizeof(double));
    randMat[1] = (double *) malloc (3*sizeof(double));
    randMat[2] = (double *) malloc (3*sizeof(double));
    randMat[0][0] = 1;
    randMat[1][2] = 3;
    printf("\nrandMat[0][0] = %f", randMat[0][0]);
    printf("\nrandMat[1][2] = %f", randMat[1][2]);

    free(randMat[0]);
    free(randMat[1]);
    free(randMat[2]);
    free(randMat);

    return(0);
}

Array.c --- Dynamic Allocation of Array

Dynamic allocation as pointer of pointers. Essentially 3x3 matrix is represented as 3 vectors, in a vector.

Obviously you can do this allocation in a for() loop, but I haven’t gotten there yet.

This technique is particularly effective for sparse matrices.

When freeing, you have to free each row individually, and then the vector of row pointers itself. This is very important.

Control Flow
for(), if(), and while() are the ones we are concerned about
Use {} brackets
“Controls flow” in the sense that it may not simply do the next command on the line
if(): used to condition execution of a statement
while(): used to loop over a chunk of code for as long as a condition is met
for(): used to loop over a chunk of code in some order

#include <stdio.h>

int main(){
    int i, a=4, b=10;
    int temp[10];

    if(b>a){
        printf("\nb>a is true");
    }
    if(b<a){
        printf("\nb<a is true");
    }
    while(a != b){
        printf("\na equals %i", a);
        a=a+1;
    }
    for(i=0;i<10;i++){
        temp[i] = i;
    }
    for(i=0;i<10;i++){
        printf("\nElement %i in temp = %i", i, temp[i]);
    }
    return(0);
}

Control.c --- Control Flow demonstration

if() statements evaluate booleans (ie. True/False statements), which can also be 1/0 integers. Evaluative statements include >, <, ==, !=, >=, <=. note: ‘==’ is not the same as ‘=’

while() statements have the same syntax as if(), but an important point to note here is that the condition has to change at some point (a=a+1 here). Otherwise you will get an infinite loop.

for() loops typically loop around vectors like this, doing something to each element. Note that 3 things are present in the syntax. First, a counter is initialized to a start value (i=0). Next, a condition is set, and the loop runs while the condition is met (i<10). Finally, a piece of code that increments/decrements the counter at the end of each loop is included (i++).

Functions
Just like functions in R, think of these as black boxes
Usually output something after getting some (multiple) inputs
In fact, with pointers you can have multiple outputs (we will see this soon)
Use ‘()’ brackets, match arguments to call
One question to think about: what is the computer actually doing here?

#include <stdio.h>

double convert(double fahrenheit);

int main(){
    printf("\n\t30 degrees fahrenheit == %f celsius",convert(30));
    printf("\n\t20 degrees fahrenheit == %f celsius",convert(20));
    return(0);
}

double convert(double fahrenheit){
    double celcius;
    celcius = (fahrenheit - 32)/1.8;
    return(celcius);
}

Function.c --- Temperature Converter

Put a header (a prototype) for each function at the top of the code. For trivial programs this won’t matter, but it will matter a lot for larger programs.

This defines a function. The first double declares the output type. ‘convert’ is the name. “double fahrenheit” defines the input and its type. This line is equivalent to convert <- function(fahrenheit) in R, except that R declares neither the input type nor the output type.

Just like in R: define a quantity like “celcius”, compute it from the input, then call return() on it. Remember that the value you return has to be of the output type you declared!

Call the function with the function name, and arguments in ‘()’ brackets.

Pointers
Consider the following line in R, where each matrix is NxN and huge
superMat = superMat %*% anotherMat
What is/could be happening here?
Why might this be stupid?
What is the alternative?
Pass by Reference vs. Pass by Value (convert() was done pass by value)
Two key commands: ‘*’ (dereference) and ‘&’ (reference, i.e. take the address of)
Matrices are special because of this

#include <stdio.h>

void convert(double *celcius);

int main(){
    double temperature = 80;

    printf("\n\tBefore conversion == %f fahrenheit",temperature);

    convert(&temperature);

    printf("\n\tAfter conversion == %f celsius\n\n",temperature);
    return(0);
}

void convert(double *celcius){
    double temp;
    temp = *celcius;
    temp = (temp - 32)/1.8;
    *celcius = temp;
}

Pointer.c --- Temp Converter with Pointer

temp = *celcius takes the value stored at the address held in celcius (‘*’ is the dereferencing operator), and stores it in the variable temp.

At the end here, we dereference celcius again and store the new temperature at the address it points to. Hence, the original variable “temperature” has been modified.

Note that an address to temperature is being passed here. This means temperature is being passed by reference, so it can be modified by the function it is passed to.

LAPACK/BLAS
Often you will want to do linear algebra routines to matrices
While you can write functions to do calculations manually, this is definitely not advised
LAPACK/BLAS standardizes the functions, and runs much more efficiently
General rule: Need to read documentation very carefully, and test with small examples
Let’s work through a manual calculation of OLS using dgesv() and dgemm()
You really need to have the documentation of the functions to understand this example as I walk through this. See handouts. For other functions, just Google them!

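#include <stdio.h>

/* Fortran BLAS/LAPACK routines: every argument is passed by reference */
void dgemm_(char *transa, char *transb, int *m, int *n, int *k, double *alpha, double *a, int *lda, double *b, int *ldb, double *beta, double *c, int *ldc);
void dgesv_(int *n, int *nrhs, double *a, int *lda, int *ipiv, double *b, int *ldb, int *info);

int main(){
    int i, info, ipiv[2];
    char trans = 't', notrans ='n';
    double alpha = 1.0, beta=0.0;
    int ncol=2;
    int nrow=5;
    int one=1;
    double XprimeX[4];
    double X[10] = {1,1,1,1,1,0.3,-0.2,0.4,-0.5,0.3};
    double Y[5] = {0.7,-0.5,0.9,-1.1,0.7};
    double XXinv[4] = {1,0,0,1};
    double XXinvX[10];
    double coef[2];

    printf("\n\nX = ");
    for(i=0;i<5;i++) printf("\n%f %f", X[i],X[i+5]);
    printf("\n\nY = ");
    for(i=0;i<5;i++) printf("\n%f", Y[i]);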

ols.c --- Manual OLS with BLAS functions

Everything here should be pretty straightforward. Just define a few variables, planning to solve for the beta hats in OLS given X and Y.

The only new thing here so far is that X is a 5x2 matrix, but I am storing it as a 10x1 vector, column by column (the first 5 entries are column 1, the next 5 are column 2). This is very common when working with LAPACK/BLAS.

    //solve X'X
    dgemm_(&trans,&notrans,&ncol,&ncol,&nrow,&alpha,X,&nrow,X,&nrow,&beta,XprimeX,&ncol);
    printf("\n\nX'X = ");
    for(i=0;i<2;i++) printf("\n%f %f",XprimeX[i], XprimeX[i+2]);

    //solve (X'X)-1: XXinv starts as the identity, so solving (X'X)*B = I leaves the inverse in XXinv
    dgesv_(&ncol,&ncol,XprimeX,&ncol,ipiv,XXinv,&ncol,&info);
    printf("\n\n(X'X)-1 = ");
    for(i=0;i<2;i++) printf("\n%f %f",XXinv[i], XXinv[i+2]);

    //solve (X'X)-1X'
    dgemm_(&notrans,&trans,&ncol,&nrow,&ncol,&alpha,XXinv,&ncol,X,&nrow,&beta,XXinvX,&ncol);

    //solve (X'X)-1X'Y
    dgemm_(&notrans,&notrans,&ncol,&one,&nrow,&alpha,XXinvX,&ncol,Y,&nrow,&beta,coef,&nrow);

    printf("\n\nB0 = %f", coef[0]);
    printf("\nB1 = %f\n\n", coef[1]);

    return(0);
}

ols.c --- continued

Data Input and Output
Up to now, we have just manually created data. What if we want to read data from a file?
Core idea: Create a file pointer, open the file with permissions, and then read or write to it
Even better: Have some error checking on the reading
In many cases: you will start with a large memory space you read data into that you will need to resize
Common mistake here: Incorrect casting of what you are reading
Two programs here are attached. We first generate some random data in a file. Then, we read the data in another program.

#include <stdlib.h>
#include <stdio.h>

int main(void){
    FILE *fp;
    double data[10];
    int i = 0;

    for(i=0;i<10;i++){
        data[i] = ( (double)rand() / ((double)(RAND_MAX)+(double)(1)));
    }

    fp = fopen("data.txt","w");

    for(i=0;i<10;i++){
        fprintf(fp, "%2.3f \n", data[i]);
    }

    fclose(fp);
    return(0);
}

writedata.c --- Writes random numbers into file

You will need some C libraries here for file and random number functions.

Declare a file pointer here. The file pointer has not yet been opened.

rand() generates a random integer from 0 to RAND_MAX, so dividing by RAND_MAX + 1 in this line gives a random number from 0 up to, but not including, 1.

Here you open a file with the file pointer you created, with the “w” permission to write to it.

Always close your file pointers with fclose() after you are done with them!!

fprintf() works just like printf(), except you have to specify the file pointer it is printing to.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(void) {
    int MAXVOTES = 10000;
    FILE *fp;
    double *numlist;
    int i;

    numlist = (double *) malloc (MAXVOTES*sizeof(double));

    if((fp = fopen("data.txt","r"))==NULL) {
        printf("\nUnable to open file DATA.TXT: %s\n", strerror(errno));
        exit(1);
    } else {
        i=0;
        /* stop at end of file, when the array is full, or on a failed read */
        while (!feof(fp) && i < MAXVOTES) {
            /* %lf reads a double; %f would read a float */
            if (fscanf(fp,"%lf", &numlist[i]) != 1) break;
            i++;
        }
    }

    fclose(fp);
    numlist = (double *) realloc(numlist, i* sizeof(double));
    printf("\nAllocation OK, %i votes allocated.\n", i);
    free(numlist);
    return(0);
}

readdata.c --- Reads random numbers from file

Open required libraries again, including one for error handling.

Very typical error recovery. Try opening the file with read permissions. If it fails, print error message.

Usually you will allocate a ton of space before reading data because you don’t know how much space you need. Declaring that space up front is a good idea, like MAXVOTES

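feof() returns true once a read has hit the end of the file, so this while() loop reads data until there is no more to be read. Because feof() only becomes true after a read has failed, it is safest to check the return value of fscanf() as well.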

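fscanf() is the reverse of fprintf(): it reads an observation from a data file. The syntax is the same, except that you pass the address of the variable to fill (with ‘&’), and a double is read with %lf rather than %f.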

Notice I used ‘i’ to count the number of entries, then I resized the array. If you only have 10 entries, you don’t need a container that can contain 10,000!
