Shunting data: Copying structures

Post on 26-Mar-2015


Shunting data

Copying structures

TO (columns): Text file | DB table | double[] | gsl_vector | gsl_matrix | apop_data

FROM (rows):
Text file: C F F
DB table: Q Q Q Q
double[]: C F F
gsl_vector: P P F C F F
gsl_matrix: P P F V C F
apop_data: P P F S S C

METHODS OF CONVERSION

• C – Copying
• F – Function call
• Q – Querying
• P – Printing
• V – Views
• S – Subelements

C – Copying

• The gsl_vector_memcpy and gsl_matrix_memcpy functions are used.
• int gsl_vector_memcpy(gsl_vector * dest, const gsl_vector * src)
– This function copies the elements of the vector src into the vector dest. The two vectors must have the same length.
• This function assumes that the destination to which the data is copied has already been allocated.
• apop_data* apop_data_copy(const apop_data * in)
– Copies one apop_data structure to another. That is, all data is duplicated.
• memmove(&second, &first, sizeof(datatype))
– It goes to the location of first and blindly copies what it finds to the location of second, up to the size of one datatype.
• int apop_system(const char * fmt, ... )
– Calls system(), but with printf-style arguments.
– E.g.: char filenames[] = "apop_asst.c apop_asst.o"; apop_system("ls -l %s", filenames);
– Returns: the return value of the system() call.
• int gsl_matrix_memcpy(gsl_matrix * dest, const gsl_matrix * src)
– This function copies the elements of the matrix src into the matrix dest. The two matrices must have the same size.
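The memmove form of copying can be sketched in standard C alone. The record type and the two function names below are hypothetical, chosen only for illustration. As with the GSL copy functions, the destination has to exist before the copy.

```c
#include <string.h>

/* Hypothetical record type, used only for this illustration. */
typedef struct {
    int id;
    double values[3];
} record;

/* Copy one record over another. The destination must already exist:
   memmove only overwrites bytes, it allocates nothing. */
void copy_record(record *dest, const record *src) {
    memmove(dest, src, sizeof(record));
}

/* Copy n doubles from src into a caller-allocated dest array. */
void copy_doubles(double *dest, const double *src, size_t n) {
    memmove(dest, src, n * sizeof(double));
}
```

A call such as copy_record(&second, &first) leaves second as a byte-for-byte duplicate of first.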

F – Function call

• These functions are designed to convert one format to another.
• There are two ways to express a matrix of doubles:
– Using pointers to declare a list of pointers to pointers.
– Using an automatically allocated array, which allows double subscripts.
• The second method is more convenient, but it allows the matrix to be declared only once.
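The two declaration styles can be put side by side in a short sketch. The function names matrix_alloc, matrix_free and sum_automatic are hypothetical, and the numbers are arbitrary sample values.

```c
#include <stdlib.h>

/* Method one: a list of pointers to rows, allocated by hand.
   Returns NULL if any allocation fails. */
double **matrix_alloc(size_t rows, size_t cols) {
    double **m = malloc(rows * sizeof(double *));
    if (!m) return NULL;
    for (size_t i = 0; i < rows; i++) {
        m[i] = calloc(cols, sizeof(double));
        if (!m[i]) {
            while (i > 0) free(m[--i]);  /* release the rows already made */
            free(m);
            return NULL;
        }
    }
    return m;
}

void matrix_free(double **m, size_t rows) {
    if (!m) return;
    for (size_t i = 0; i < rows; i++) free(m[i]);
    free(m);
}

/* Method two: an automatically allocated array with double subscripts.
   Its size and initializer are fixed at the declaration. */
double sum_automatic(void) {
    double m[2][3] = {{2, 3, 4}, {5, 6, 7}};
    double total = 0;
    for (size_t i = 0; i < 2; i++)
        for (size_t j = 0; j < 3; j++)
            total += m[i][j];
    return total;
}
```

Both forms are indexed as m[i][j]. The first needs explicit allocation and freeing, and the second is released automatically when its scope ends.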


P – Printing

• The output can be directed to the screen, a file, a database, or the system (through a pipe).
• The apop_opts.output_type setting selects the destination. It has the following choices:
– 's': print to screen (default)
– 'f': print to file
– 'd': store the result in a database table
– 'p': write to the pipe in apop_opts.output_pipe

Q – Querying

• Queries can be used to get data from the database.
• There are four ways:
– apop_query_to_float
– apop_query_to_vector
– apop_query_to_data
– apop_query_to_matrix
• int apop_query(const char * fmt, ... )
– Sends a query to the database that returns no data.
– As with the apop_query_to_... functions, the query can include printf-style format specifiers, such as: apop_query("create table %s(id, name, age);", tablename).
• apop_data* apop_query_to_data(const char * fmt, ... )
– Queries the database and dumps the result into an apop_data set.
– Most data will be in the matrix element of the output. Column names are appropriately placed.
• double apop_query_to_float(const char * fmt, ... )
– Queries the database and dumps the result into a single double-precision floating-point number.
– This calls apop_query_to_data and returns the (0,0)th element of the returned matrix. Thus, if your query returns multiple rows, you will get no warning, and the function will return the first in the list.
• gsl_matrix* apop_query_to_matrix(const char * fmt, ... )
– Queries the database and dumps the result into a matrix.
– Uses apop_query_to_data and returns just the matrix part.
– Returns: a gsl_matrix.
• gsl_vector* apop_query_to_vector(const char * fmt, ... )
– Queries the database and dumps the first column of the result into a gsl_vector.
– Uses apop_query_to_data internally, then throws away all but the first column of the matrix.
– Returns: a gsl_vector holding the first column of the returned matrix. Thus, if your query returns multiple columns, you will get no warning, and the function will return only the first.
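How a printf-style format string expands into the final query text can be sketched with standard C. build_query is a hypothetical helper, not part of Apophenia, and it never contacts a database. It only shows the expansion step that the apop_query family is described as accepting.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: expand a printf-style format into buf.
   Returns what vsnprintf returns: the length the full text needs,
   or a negative value on error. */
int build_query(char *buf, size_t len, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int written = vsnprintf(buf, len, fmt, args);
    va_end(args);
    return written;
}
```

With a table name of "people", the format "create table %s(id, name, age);" expands to the text create table people(id, name, age);.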

S – Subelements

• Only some of the data items are pulled out of the entire set.
• For this method of copying, the functions from F above can be used.

V – Views

• Views are similar to database views.
• A view can hold a subset of the original matrix.
• Changes made to the original data are reflected in the view, and vice versa.
• The following Apophenia macros create views of a gsl_matrix:
– Apop_matrix_row(m, row, v)
– Apop_matrix_col(m, col, v)
– Apop_submatrix(m, srow, scol, nrows, ncols, o)
• Apop_matrix_col(m, col, v)
– After this call, v will hold a vector view of column col of m.
– E.g.: Apop_matrix_col(m, 5, col_v)
– This produces a gsl_vector named col_v holding column 5 of m (the sixth column, since indices start at zero).
• Apop_matrix_row(m, row, v)
– After this call, v will hold a vector view of row row of m.
– E.g.: Apop_matrix_row(m, 3, row_v)
– This produces a gsl_vector named row_v holding row 3 of m (the fourth row, since indices start at zero).

Apop_submatrix(m, srow, scol, nrows, ncols, o)

• It pulls a pointer to a submatrix into a gsl_matrix.
• Parameters:
– m: the root matrix
– srow: the first row (in the root matrix) of the top of the submatrix
– scol: the first column (in the root matrix) of the left edge of the submatrix
– nrows: the number of rows in the submatrix
– ncols: the number of columns in the submatrix
– o: the name of the resulting submatrix

Example

• Apop_submatrix(m, 2, 4, 6, 8, submat)
– This produces a gsl_matrix * named submat whose (0,0)th element is at (2,4) in the original matrix.
• For data sets, we use these functions with row/column names:
– Apop_row_t(m, "fourth_row", row_v)
– Apop_col_t(m, "fifth column", col_v)
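Why a change to the original shows up in a view, and the reverse, can be illustrated without the library. The vec_view type and its helpers below are hypothetical stand-ins, not the GSL or Apophenia types. The sketch assumes a row-major matrix stored in a flat array, and the view holds only a pointer and a stride into it.

```c
#include <stddef.h>

/* Hypothetical view type: it owns no data, it only points into a parent array. */
typedef struct {
    double *data;   /* first element of the view inside the parent */
    size_t stride;  /* distance between consecutive view elements */
    size_t size;    /* number of elements in the view */
} vec_view;

/* View of column col of a row-major matrix with nrows rows and ncols columns. */
vec_view column_view(double *m, size_t nrows, size_t ncols, size_t col) {
    vec_view v = { m + col, ncols, nrows };
    return v;
}

/* View of one row of the same kind of matrix. */
vec_view row_view(double *m, size_t ncols, size_t row) {
    vec_view v = { m + row * ncols, 1, ncols };
    return v;
}

double view_get(vec_view v, size_t i) { return v.data[i * v.stride]; }
void view_set(vec_view v, size_t i, double x) { v.data[i * v.stride] = x; }
```

Because no element is copied, view_set writes straight into the parent matrix, and a later write to the parent is visible through view_get.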

LINEAR ALGEBRA

• apop_data* apop_dot(const apop_data * d1, const apop_data * d2, char form1, char form2)
• A convenience function for dot products.
– d1 may be a vector or a matrix, and the same for d2, so this function can do vector dot matrix, matrix dot matrix, and so on.
– If d1 includes both a vector and a matrix, then later parameters indicate which to use.
– form1 and form2 are flags, one for each input, indicating what to do with it: 't' for transpose, 'v' for vector, 0 to use the matrix as it is.
• E.g.: apop_dot(X, X, 't', 0)
– This computes X'X, i.e. it takes the dot product of X with itself, where the first copy of X is transposed and the second is not.
• If the first element is a vector, it is always taken to be a row; if the second element is a vector, it is always taken to be a column.
• int gsl_blas_ddot(const gsl_vector * x, const gsl_vector * y, double * result)
– Computes the dot product of the vectors x and y and stores it in result.
– E.g.: double dotprod; gsl_blas_ddot(x, y, &dotprod);
• The Basic Linear Algebra Subprograms (BLAS) define a set of fundamental operations on vectors and matrices which can be used to create optimized higher-level linear algebra functionality.
• The functions are declared in the file gsl_blas.h.
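The arithmetic behind a vector dot product and behind X'X can be written out by hand. This is a plain-C sketch of the operations, not the BLAS or Apophenia implementation. The names dot and xtx are hypothetical, and matrices are assumed to be stored row-major in flat arrays.

```c
#include <stddef.h>

/* Dot product of two length-n arrays. */
double dot(const double *x, const double *y, size_t n) {
    double sum = 0;
    for (size_t i = 0; i < n; i++) sum += x[i] * y[i];
    return sum;
}

/* out = X'X for a row-major X with n rows and k columns.
   out must have room for k * k doubles. */
void xtx(const double *X, size_t n, size_t k, double *out) {
    for (size_t a = 0; a < k; a++)
        for (size_t b = 0; b < k; b++) {
            double sum = 0;
            /* entry (a, b) is the dot product of columns a and b of X */
            for (size_t i = 0; i < n; i++)
                sum += X[i * k + a] * X[i * k + b];
            out[a * k + b] = sum;
        }
}
```

Each entry of X'X is the dot product of two columns of X, which is why transposing the first copy and leaving the second as it is gives a square, symmetric result.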

MATRIX INVERSION AND EQUATION SOLVING

• gsl_matrix* apop_matrix_inverse(const gsl_matrix * in)
– Inverts a matrix. The in matrix is not destroyed in the process.
– You may need to call apop_matrix_determinant first to check that your input is invertible, or use apop_det_and_inv to do both at once.
– Parameters: in is the matrix to be inverted.
– Returns: its inverse.
• double apop_matrix_determinant(const gsl_matrix * in)
– Finds the determinant of a matrix. The in matrix is not destroyed in the process.
– See also apop_matrix_inverse, or apop_det_and_inv to do both at once.
– Parameters: in is the matrix whose determinant is computed.
– Returns: the determinant.
• double apop_det_and_inv(const gsl_matrix * in, gsl_matrix ** out, int calc_det, int calc_inv)
– Calculates the determinant of a matrix, its inverse, or both, via LU decomposition. The in matrix is not destroyed in the process.
– in: the matrix to be inverted/determined.
– out: if you want an inverse, this is where to place the matrix to be filled with the inverse. It will be allocated by the function.
– calc_det: 0: do not calculate the determinant. 1: do.
– calc_inv: 0: do not calculate the inverse. 1: do.
– Returns: if calc_det == 1, the determinant; otherwise zero. If calc_inv != 0, then *out is pointed to the matrix inverse.
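A determinant computed through row elimination can be sketched as follows. This is not Apophenia's code: it is a hand-written Gaussian elimination with partial pivoting, which performs the same row operations that an LU decomposition records. It is fixed at 3x3 to stay short, and it computes only the determinant. The name determinant3 and the constant DIM are hypothetical.

```c
#include <stddef.h>

#define DIM 3  /* fixed size keeps the sketch short */

/* Determinant of a row-major DIM x DIM matrix.
   Works on a copy, so the input is not destroyed. */
double determinant3(const double *in) {
    double a[DIM][DIM];
    for (size_t i = 0; i < DIM; i++)
        for (size_t j = 0; j < DIM; j++)
            a[i][j] = in[i * DIM + j];

    double det = 1.0;
    for (size_t c = 0; c < DIM; c++) {
        /* pick the row with the largest absolute entry in column c */
        size_t p = c;
        for (size_t r = c + 1; r < DIM; r++) {
            double cand = a[r][c] < 0 ? -a[r][c] : a[r][c];
            double best = a[p][c] < 0 ? -a[p][c] : a[p][c];
            if (cand > best) p = r;
        }
        if (a[p][c] == 0.0) return 0.0;  /* singular matrix */
        if (p != c) {
            for (size_t j = 0; j < DIM; j++) {
                double t = a[c][j];
                a[c][j] = a[p][j];
                a[p][j] = t;
            }
            det = -det;  /* each row swap flips the sign */
        }
        det *= a[c][c];
        /* eliminate column c from the rows below the pivot */
        for (size_t r = c + 1; r < DIM; r++) {
            double f = a[r][c] / a[c][c];
            for (size_t j = c; j < DIM; j++) a[r][j] -= f * a[c][j];
        }
    }
    return det;
}
```

A result of zero, or one very close to zero, signals that the matrix cannot be reliably inverted, which is the check the slides recommend before calling the inverse.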

Numbers

• Floating-point numbers can take three special values:
– INFINITY
– -INFINITY
– NAN (not a number), mainly used for missing data
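Using NAN as a missing-data marker might look like the sketch below, which relies only on the standard math.h macros. The function name mean_skip_missing is hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Mean of the entries that are not NAN; NAN marks a missing value.
   Returns NAN if every entry is missing. */
double mean_skip_missing(const double *x, size_t n) {
    double sum = 0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (isnan(x[i])) continue;  /* NAN never compares equal, so test with isnan */
        sum += x[i];
        count++;
    }
    return count ? sum / count : NAN;
}
```

The isnan test is needed because a comparison such as x == NAN is always false, even when x is itself NAN.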

MODELS

apop_model
• Similar to apop_data, it encapsulates model information in a uniform manner.
• It allows models to be passed to functions that can take any model as input.
• A model is an intermediary between data and parameters. From there, the model can go in three directions:

1. X ⇒ β: given data, estimate parameters (OLS parameters or covariance)

2. β ⇒ X: given parameters, generate artificial data (Monte Carlo)

3. (X, β) ⇒ p: given both parameters and data, estimate their likelihood or probability (Bayesian estimation)

apop_model* apop_estimate(apop_data * d, apop_model m)

• Estimates the parameters of a model given data.
• This function copies the input model, preps it, and calls m.estimate(d, &m).
• If your model has no estimate method, then it assumes apop_maximum_likelihood(d, m), with the default MLE parameters.
• Parameters:
– d: the data
– m: the model
• Returns: a pointer to an output model, which typically matches the input model but has its parameters element filled in.
• E.g.: apop_model *est = apop_estimate(data, apop_normal);

Examples

• Cook's distance
• Network data
• MLE models
• Utility maximization

Cook's Distance

• It is an estimate of how much each data point affects a regression.
• In a practical ordinary least squares analysis, Cook's distance can be used in several ways:
1) to indicate data points that are particularly worth checking for validity;
2) to indicate regions of the design space where it would be good to be able to obtain more data points.
• It is named after the American statistician R. Dennis Cook, who introduced the concept in 1977.
• Cook's distance measures the effect of deleting a given observation. Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression. Points with a large Cook's distance are considered to merit closer examination in the analysis.
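For a regression with a single explanatory variable, Cook's distance has a closed form that avoids refitting the model once per deleted point. The sketch below assumes the simple model y = a + b*x with p = 2 estimated coefficients, and uses the standard identity D_i = e_i^2 / (p s^2) * h_i / (1 - h_i)^2, where e_i is the residual, s^2 the residual variance and h_i the leverage. The function name cooks_distance is hypothetical, and this is not how Apophenia computes it.

```c
#include <stddef.h>

/* Cook's distance for each point of the simple regression y = a + b*x.
   Requires n > 2, x values that are not all equal, and a fit that is
   not perfect (otherwise s^2 is zero).
   Results are written to d, which must hold n doubles. */
void cooks_distance(const double *x, const double *y, size_t n, double *d) {
    const double p = 2.0;  /* intercept and slope */
    double mx = 0, my = 0;
    for (size_t i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;

    double sxx = 0, sxy = 0;
    for (size_t i = 0; i < n; i++) {
        sxx += (x[i] - mx) * (x[i] - mx);
        sxy += (x[i] - mx) * (y[i] - my);
    }
    double b = sxy / sxx;     /* OLS slope */
    double a = my - b * mx;   /* OLS intercept */

    double sse = 0;
    for (size_t i = 0; i < n; i++) {
        double e = y[i] - (a + b * x[i]);
        sse += e * e;
    }
    double s2 = sse / (n - p);  /* residual variance */

    for (size_t i = 0; i < n; i++) {
        double e = y[i] - (a + b * x[i]);                        /* residual */
        double h = 1.0 / n + (x[i] - mx) * (x[i] - mx) / sxx;    /* leverage */
        d[i] = e * e / (p * s2) * h / ((1 - h) * (1 - h));
    }
}
```

A point far from the mean of x (high leverage) that also sits far from the fitted line (large residual) receives the largest distance, matching the description above.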

MLE models

• Maximum-likelihood estimation (MLE) is a method of estimating the parameters of a statistical model.

• When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters.

Example

• One may be interested in the heights of adult female penguins, but be unable to measure the height of every single penguin in a population due to cost or time constraints.

• Assuming that the heights are normally (Gaussian) distributed with some unknown mean and variance, the mean and variance can be estimated with MLE while only knowing the heights of some sample of the overall population.

• MLE would accomplish this by taking the mean and variance as parameters and finding particular parametric values that make the observed results the most probable (given the model).
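For the normal model in this example, the maximizing values are known in closed form, so no numerical search is needed: the likelihood is maximized by the sample mean and by the variance that divides by n. The sketch below uses that closed form rather than a general-purpose optimizer. The function name normal_mle is hypothetical.

```c
#include <stddef.h>

/* Maximum-likelihood estimates for a normal model.
   The variance divides by n, not n - 1. Requires n > 0. */
void normal_mle(const double *x, size_t n, double *mean, double *variance) {
    double sum = 0;
    for (size_t i = 0; i < n; i++) sum += x[i];
    *mean = sum / n;

    double ss = 0;
    for (size_t i = 0; i < n; i++)
        ss += (x[i] - *mean) * (x[i] - *mean);
    *variance = ss / n;
}
```

Given a sample of measured heights, the two outputs are the parameter values under which that sample is most probable, given the model.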

GRAPHICS


• Gnuplot is a free, command-driven, interactive, function and data plotting program.

• Any mathematical expression accepted by C, FORTRAN, Pascal, or BASIC may be plotted. The precedence of operators is determined by the specifications of the C programming language.

• plot and splot are the primary commands in Gnuplot. They plot functions and data in many ways. plot is used to plot 2-d functions and data, while splot plots 3-d surfaces and data.

Syntax

• plot {[ranges]} {[function] | {"[datafile]" {datafile-modifiers}}} {axes [axes] } { [title-spec] } {with [style] } {, {definitions,} [function] ...}

• where either a [function] or the name of a data file enclosed in quotes is supplied.

• To plot functions, simply type plot [function] at the gnuplot> prompt.
• For example:
– gnuplot> plot sin(x)/x
– gnuplot> splot sin(x*y/20)
– gnuplot> plot sin(x) title 'Sine Function', tan(x) title 'Tangent'

• Discrete data contained in a file can be displayed by specifying the name of the data file (enclosed in quotes) on the plot or splot command line.

• Data files should have the data arranged in columns of numbers.

• Columns should be separated by white space only (tabs or spaces, no commas).

• Lines beginning with a # character are treated as comments and are ignored by Gnuplot.

• A blank line in the data file results in a break in the line connecting data points.
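A C program can produce a file that follows these rules. The sketch below writes two series to an open stream: a comment line, whitespace-separated columns, and a blank line between the series. The function name write_plot_data and the header text are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Write two (x, y) series to out in a layout Gnuplot can read.
   Returns the number of data rows written. */
size_t write_plot_data(FILE *out,
                       const double *x1, const double *y1, size_t n1,
                       const double *x2, const double *y2, size_t n2) {
    fprintf(out, "# x y\n");                     /* comment line, ignored by Gnuplot */
    for (size_t i = 0; i < n1; i++)
        fprintf(out, "%g\t%g\n", x1[i], y1[i]);  /* tab-separated columns */
    fprintf(out, "\n");                          /* blank line: break in the connecting line */
    for (size_t i = 0; i < n2; i++)
        fprintf(out, "%g\t%g\n", x2[i], y2[i]);
    return n1 + n2;
}
```

If the stream were a file named data.txt, a command such as plot "data.txt" with lines would then draw the two series as separate line segments.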

• Customization of the axis ranges, axis labels, and plot title, as well as many other features, is specified using the set command. Specific examples of the set command follow.

• Create a title: > set title "Force-Deflection Data"
• Put a label on the x-axis: > set xlabel "Deflection (meters)"
• Put a label on the y-axis: > set ylabel "Force (kN)"
• Change the x-axis range: > set xrange [0.001:0.005]
• Change the y-axis range: > set yrange [20:500]
• Have Gnuplot determine ranges: > set autoscale
• Move the key: > set key 0.01,100
• Delete the key: > unset key
• Put a label on the plot: > set label "yield point" at 0.003, 260
• Remove all labels: > unset label
• Plot using log-axes: > set logscale
• Plot using log-axes on y-axis: > unset logscale; set logscale y
• Change the tic-marks: > set xtics (0.002,0.004,0.006,0.008)
• Return to the default tics: > unset xtics; set xtics auto