An Introduction to C Prepared for UCSD summer 2009 class.
Overview
- Low-level programming language
- Language of choice when speed/efficiency is at a premium
- A compiled language
- Supports both pass by value and pass by reference (the latter via pointers)
- Very similar syntax to R (in fact, R closely adopted C syntax)
When to use C over R
WEAKNESSES
- Not a statistics environment
- Limited graphical support (i.e. no plot() commands)
- Pointless for small problems, or when you don’t need to optimize speed
STRENGTHS
- Custom estimators that take a long time to converge
- Working with huge data sets
/* A standard program to print out a greeting */
#include <stdio.h>

int main(void){
    printf("\nHello world");
    return(0);
}
Hello.c --- Our first C program
> gcc -o hello hello.c
> hello
Note the commenting conventions: ‘//’ vs. ‘/* */’
// is just like # in R: it comments out the rest of the line. /* */ is for long comments and can span several lines.
#include <stdio.h> plays much the same role as the library() command in R: it makes the standard input/output functions, such as printf(), available.
Every executable starts from a function called “main”, which must always return a number of type int. Functions are always defined by the type of thing they output (int), their name (main), and their input (void in this case, which means no input).
gcc is the GNU compiler. -o tells it the name of the file to output (hello, or hello.exe on Windows); the final argument is the source code file.
In C, you must explicitly use the return statement to return a value. Writing ‘0’ on the last line, as you might in R, does not return it.
Brackets: Note that, like R, C uses different kinds of brackets. Round brackets ‘()’ are for functions, squiggly brackets ‘{}’ are for control flow and function bodies, and square brackets ‘[]’ are for indexes. This is identical to R.
All statements end with semicolons. The semicolon, not the line break, ends a statement, so a long statement may span multiple lines.
Working with Vectors
- A container of objects, in some order
- Can be a list of characters, integers, doubles, or structs
- In R, just like the sequences you would get from functions like seq() and rep()
- Static vs. dynamic allocation: when to do what
- What is a memory leak?
#include <stdio.h>

int main(void){
    double numlist[4];

    numlist[0] = 3;
    numlist[1] = 4;
    numlist[2] = 5;
    numlist[3] = 10;
    printf("\n numlist[0] == %f", numlist[0]);
    printf("\n numlist[1] == %f", numlist[1]);
    printf("\n numlist[2] == %f", numlist[2]);
    printf("\n numlist[3] == %f", numlist[3]);
    printf("\n\nWhy will numlist[4] give an error?");
    return(0);
}
Vector.c --- Static Allocation of Vector
First, notice you declare your variables up front. This is considered good form for compiled procedural language programming. To declare, it should be the type (double here) and a name (numlist here). If it is a vector and you are statically allocating it, use square brackets and size.
Doubles are basically the same thing as “numeric” in R
Assigning values to the container in order. Notice that indexing starts at 0, so the valid indexes of a 4-element vector are 0 to 3. This is different from R, so be very careful here.
Printing the results works just like sprintf() in R. %f displays a floating-point number, which is what a double is.
#include <stdio.h>
#include <stdlib.h>

int main(void){
    double *numlist;

    numlist = (double *) malloc(10*sizeof(double));

    numlist[0] = 3;
    numlist[1] = 4;
    numlist[2] = 5;
    numlist[3] = 10;
    printf("\n numlist[0] == %f", numlist[0]);
    printf("\n numlist[1] == %f", numlist[1]);
    printf("\n numlist[2] == %f", numlist[2]);
    printf("\n numlist[3] == %f", numlist[3]);
    printf("\n\nWhy will numlist[4] give an error?");
    free(numlist);
    return(0);
}
Vector.c --- Dynamic Allocation of Vector
To dynamically allocate, declare it as a pointer with a * (more on this later). At this point no space is reserved; the pointer is uninitialized and must not be used until malloc() has given it memory to point to.
malloc() takes a number of bytes, reserves that much space, and returns a pointer to it. First, the result is cast to (double *) here; the cast is optional in C (it is required in C++). Second, remember that you must reserve space equal to the number of slots (10 slots here) x the amount of space for each slot (sizeof(double) in this case).
calloc() and realloc() are similar functions: calloc() initializes everything to 0, and realloc() resizes arrays.
Always free the space when you are done with it; this is the whole point of dynamic allocation. Space that is never freed is a memory leak.
Working with Arrays
- Arrays are basically matrices, though they can span more than two dimensions
- Can think of them as “vectors of vectors”
- Often these are stored as vectors that need to be unrolled (we will see this later)
- Dynamic allocation is still supported
#include <stdio.h>

int main(void){
    double randMat[3][3];

    randMat[0][0] = 1;
    randMat[1][2] = 3;
    printf("\nrandMat[0][0] = %f", randMat[0][0]);
    printf("\nrandMat[1][2] = %f", randMat[1][2]);
    return(0);
}
Array.c --- Static Allocation of Array
To create an array, declare the type, then the size of each dimension in the square brackets like this.
The only thing to note here is that the notation is different from R: not randMat[1,2], but randMat[1][2].
#include <stdio.h>
#include <stdlib.h>

int main(void){
    double **randMat;

    /* reserve the vector of 3 row pointers before the rows themselves */
    randMat = (double **) malloc (3*sizeof(double *));
    randMat[0] = (double *) malloc (3*sizeof(double));
    randMat[1] = (double *) malloc (3*sizeof(double));
    randMat[2] = (double *) malloc (3*sizeof(double));
    randMat[0][0] = 1;
    randMat[1][2] = 3;
    printf("\nrandMat[0][0] = %f", randMat[0][0]);
    printf("\nrandMat[1][2] = %f", randMat[1][2]);

    free(randMat[0]);
    free(randMat[1]);
    free(randMat[2]);
    free(randMat);

    return(0);
}
Array.c --- Dynamic Allocation of Array
Dynamic allocation as a pointer to pointers. Essentially, a 3x3 matrix is represented as 3 vectors held in a vector, so the outer vector of row pointers must be allocated before the rows themselves.
Obviously you can do this allocation in a for() loop, but I haven’t gotten there yet.
This technique is particularly effective for sparse matrices.
When freeing, you have to free each thing individually: each row first, then the vector of row pointers. This is very important.
Control Flow
- for(), if(), and while() are the ones we are concerned about
- Use {} brackets
- “Controls flow” in the sense that it may not simply do the next command on the line
- if(): used to condition execution of a statement
- while(): used to loop over a chunk of code for as long as a condition holds
- for(): used to loop over a chunk of code in some order
#include <stdio.h>

int main(){
    int i, a=4, b=10;
    int temp[10];

    if(b>a){
        printf("\nb>a is true");
    }
    if(b<a){
        printf("\nb<a is true");
    }
    while(a != b){
        printf("\na equals %i", a);
        a=a+1;
    }
    for(i=0;i<10;i++){
        temp[i] = i;
    }
    for(i=0;i<10;i++){
        printf("\nElement %i in temp = %i", i, temp[i]);
    }
    return(0);
}
Control.c --- Control Flow demonstration
if() statements evaluate booleans (i.e. true/false statements), which in C are integers: 0 is false and anything nonzero is true. Comparison operators include >, <, ==, !=, >=, <=. Note: ‘==’ (comparison) is not the same as ‘=’ (assignment).
while() statements have the same syntax as if(), but an important point to note here is that the condition has to change at some point (a=a+1 here). Otherwise you will get an infinite loop.
for() loops typically loop around vectors like this, doing something to each element. Note that 3 things are present in the syntax. First, a counter is initialized to a start value (i=0). Next, a condition is set, and the loop runs while the condition holds (i<10). Finally, a piece of code that increments/decrements the counter at the end of each pass is included (i++).
Functions
- Just like functions in R, think of these as black boxes
- Usually output something after getting some (multiple) inputs
- In fact, with pointers you can have multiple outputs (we will see this soon)
- Use ‘()’ brackets, match arguments to call
- One question to think about: what is the computer actually doing here?
#include <stdio.h>

double convert(double fahrenheit);

int main(){
    printf("\n\t30 degrees fahrenheit == %f celsius",convert(30));
    printf("\n\t20 degrees fahrenheit == %f celsius",convert(20));
    return(0);
}

double convert(double fahrenheit){
    double celcius;

    celcius = (fahrenheit - 32)/1.8;
    return(celcius);
}
Function.c --- Temperature Converter
Put a header (a prototype) for each function at the top of the code. For trivial programs this won’t matter, but it will matter a lot for larger programs.
This defines a function. The first double declares the output type. ‘convert’ is the name. “double fahrenheit” declares the input and its type. This line is equivalent to convert <- function(fahrenheit) in R, except that R declares neither input nor output types.
Just like in R: define a quantity like “celcius”, do some stuff with it, then call return() on it. Remember that the value you return has to be of the same type you declared as output!
Call the function with the function name, and arguments in ‘()’ brackets.
Pointers
- Consider the following line in R, where each matrix is NxN and huge:
  superMat = superMat %*% anotherMat
- What is/could be happening here?
- Why might this be stupid?
- What is the alternative?
- Pass by Reference vs. Pass by Value (convert() was done pass by value)
- Two key operators: ‘*’ (dereference) and ‘&’ (address-of, i.e. reference)
- Matrices are special because of this
#include <stdio.h>

void convert(double *celcius);

int main(){
    double temperature = 80;

    printf("\n\tBefore conversion == %f fahrenheit",temperature);
    convert(&temperature);
    printf("\n\tAfter conversion == %f celsius\n\n",temperature);
    return(0);
}

void convert(double *celcius){
    double temp;

    temp = *celcius;
    temp = (temp - 32)/1.8;
    *celcius = temp;
}
Pointer.c --- Temp Converter with Pointer
temp = *celcius takes the value stored at the address held in celcius (‘*’ is the dereferencing operator) and stores it in the variable temp.
At the end here, we dereference celcius again, this time to store the new temperature at that address. Hence, the original variable “temperature” has been modified.
Note that the address of temperature (&temperature) is being passed here. This means temperature is being passed by reference, so it can be modified by the function it is passed to.
LAPACK/BLAS
- Often you will want to apply linear algebra routines to matrices
- While you can write functions to do the calculations manually, this is definitely not advised
- LAPACK/BLAS standardizes the functions, and runs much more efficiently
- General rule: you need to read the documentation very carefully, and test with small examples
- Let’s work through a manual calculation of OLS using dgesv() and dgemm()
- You really need to have the documentation of the functions to understand this example as I walk through it. See handouts.
- For other functions, just Google them!
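#include <stdio.h>

/* Fortran routines from BLAS and LAPACK; every argument is passed by address */
void dgemm_(char *transa, char *transb, int *m, int *n, int *k, double *alpha, double *a, int *lda, double *b, int *ldb, double *beta, double *c, int *ldc);
void dgesv_(int *n, int *nrhs, double *a, int *lda, int *ipiv, double *b, int *ldb, int *info);

int main(){
    int i,info, ipiv[2];
    char trans = 't', notrans ='n';
    double alpha = 1.0, beta=0.0;
    int ncol=2;
    int nrow=5;
    int one=1;
    double XprimeX[4];
    double X[10] = {1,1,1,1,1,0.3,-0.2,0.4,-0.5,0.3};
    double Y[5] = {0.7,-0.5,0.9,-1.1,0.7};
    double XXinv[4] = {1,0,0,1};
    double XXinvX[10];
    double coef[2];

    printf("\n\nX = ");
    for(i=0;i<5;i++) printf("\n%f %f", X[i],X[i+5]);
    printf("\n\nY = ");
    for(i=0;i<5;i++) printf("\n%f", Y[i]);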
ols.c --- Manual OLS with BLAS functions
Everything here should be pretty straightforward. Just define a few variables, planning to solve for the beta hats in OLS given X and Y.
The only new thing here so far is that X is a 5x2 matrix, but I am storing it as a 10x1 vector, one column after the other (LAPACK/BLAS expect column-major order). This is very common when working with LAPACK/BLAS.
    // compute X'X
    dgemm_(&trans,&notrans,&ncol,&ncol,&nrow,&alpha,X,&nrow,X,&nrow,&beta,XprimeX,&ncol);
    printf("\n\nX'X = ");
    for(i=0;i<2;i++) printf("\n%f %f",XprimeX[i], XprimeX[i+2]);

    // compute (X'X)^-1 by solving (X'X) Z = I
    dgesv_(&ncol,&ncol,XprimeX,&ncol,ipiv,XXinv,&ncol,&info);
    printf("\n\n(X'X)^-1 = ");
    for(i=0;i<2;i++) printf("\n%f %f",XXinv[i], XXinv[i+2]);

    // compute (X'X)^-1 X'
    dgemm_(&notrans,&trans,&ncol,&nrow,&ncol,&alpha,XXinv,&ncol,X,&nrow,&beta,XXinvX,&ncol);

    // compute (X'X)^-1 X'Y; the 2x1 result has leading dimension ncol
    dgemm_(&notrans,&notrans,&ncol,&one,&nrow,&alpha,XXinvX,&ncol,Y,&nrow,&beta,coef,&ncol);

    printf("\n\nB0 = %f", coef[0]);
    printf("\nB1 = %f\n\n", coef[1]);

    return(0);
}
ols.c --- continued
Data Input and Output
- Up to now, we have just manually created data. What if we want to read data from a file?
- Core idea: create a file pointer, open the file with permissions, and then read or write to it
- Even better: have some error checking on the reading
- In many cases: you will start with a large memory space you read data into that you will need to resize
- Common mistake here: incorrect casting of what you are reading
- Two programs are attached here. We first generate some random data in a file. Then, we read the data in another program.
#include <stdlib.h>
#include <stdio.h>

int main(void){
    FILE *fp;
    double data[10];
    int i = 0;

    for(i=0;i<10;i++){
        data[i] = ( (double)rand() / ((double)(RAND_MAX)+(double)(1)));
    }

    fp = fopen("data.txt","w");

    for(i=0;i<10;i++){
        fprintf(fp, "%2.3f \n", data[i]);
    }

    fclose(fp);
    return(0);
}
writedata.c --- Writes random numbers into file
You will need some C libraries here for file and random number functions.
Declare a file pointer here. The file pointer has not yet been opened.
rand() generates a random integer from 0 to RAND_MAX, so this line, which divides by RAND_MAX + 1, generates a random number from 0 up to (but never including) 1.
Here you open a file with the file pointer you created, with the “w” permission to write to it.
Always close your files after you are done with them!!
fprintf() works just like printf(), except you have to specify the file pointer it is printing to.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(void) {
    int MAXVOTES = 10000;
    FILE *fp;
    double *numlist;
    int i;

    numlist = (double *) malloc (MAXVOTES*sizeof(double));

    if((fp = fopen("data.txt","r"))==NULL) {
        printf("\nUnable to open file DATA.TXT: %s\n", strerror(errno));
        exit(1);
    } else {
        i=0;
        while (!feof(fp) && i < MAXVOTES) {
            /* %lf reads a double; stop if no number could be read */
            if (fscanf(fp,"%lf", &numlist[i]) != 1) break;
            i++;
        }
    }

    fclose(fp);
    numlist = (double *) realloc(numlist, i* sizeof(double));
    printf("\nAllocation OK, %i votes allocated.\n", i);
    free(numlist);
    return(0);
}
readdata.c --- Reads random numbers from file
Include the required libraries again, including those for error handling.
Very typical error recovery. Try opening the file with read permissions. If it fails, print error message.
Usually you will allocate a ton of space before reading data because you don’t know how much space you need. Declaring that space up front is a good idea, like MAXVOTES
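feof() returns true once a read has hit the end of the file, so this while() loop reads data until there is no more to be read. Because feof() only becomes true after a read has failed, it is safest to also check the return value of fscanf() before counting an entry.
fscanf() is the reverse of fprintf(): it reads an observation from a data file. The syntax is the same, except that you pass the address of the variable to fill (&numlist[i]) and must use %lf, not %f, to read into a double.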
Notice I used ‘i’ to count the number of entries, then I resized the array. If you only have 10 entries, you don’t need a container that can contain 10,000!