Programming for Geographical Information Analysis: Advanced Skills
Lecture 8: Libraries II: Science
Dr Andy Evans
Things we might want to do
Mathematics: Algebra; calculus; vector and matrix mathematics; etc.
Statistics: Hypothesis testing; sample comparisons; regression; etc.
Graph theory/Network analysis: Network form analysis; statistics of centrality etc.; flow efficiency.
Text processing: Parsing; Natural Language Processing; Statistics.
Mathematics
Statistics
Graphs and Networks
Text and Language
Mathematics
The classic texts for scientific computing are the Numerical Recipes books.
Java code available to buy: http://www.nr.com/aboutJava.html
Numerical Recipes
For Java there is also Hang T. Lau (2003) A Numerical Library in Java for Scientists and Engineers.
Colt: http://acs.lbl.gov/software/colt/
JScience: http://www.jscience.org/
JAMA: http://math.nist.gov/javanumerics/jama/
Maple
Commercial mathematics application (commercial licence $2,845).
Does, for example, algebraic manipulation, calculus, etc.
Can output its calculations as C, C#, Java, Fortran, Visual Basic, and MATLAB code.
C and Java APIs for connecting it to programs.
Statistics
R (GNU): http://www.r-project.org/
Developed as a free version of the stats language S, combined with a functional programming language.
Programming languages
We’ve dealt with Imperative Programming languages: commands about what to do to change the state of the program (i.e. its collected variables).
These are usually also Procedural, in that the program is divided into procedures to change states.
Most Procedural languages are now Object Orientated.
Programming languages
The other branch of languages allows Declarative Programming: this concentrates on describing what a program should do, not how, and avoids state changes.
The clearest examples are Functional Programming languages, where everything is defined by reference to other functions:
a = x + 10;
x = y + 2;
Run program for the argument y = 12
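Java is an imperative language, but since Java 8 it can imitate this style with function objects. The following is a minimal sketch, not a description of how a functional language evaluates programs internally. Here `x` and `a` are defined only as functions of `y`, and nothing is calculated until `a` is applied to y = 12. The class and field names are illustrative.

```java
import java.util.function.DoubleUnaryOperator;

class FunctionalSketch {
    // x is defined as a function of y: x = y + 2
    static final DoubleUnaryOperator X = y -> y + 2;

    // a is defined by reference to x: a = x + 10, so a(y) = (y + 2) + 10
    static final DoubleUnaryOperator A = X.andThen(x -> x + 10);

    public static void main(String[] args) {
        // "Run the program" for the argument y = 12
        System.out.println(A.applyAsDouble(12)); // prints 24.0
    }
}
```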
There is also Logic Programming: the same kind of thing, but based on finding logical proofs/derivations.
All things that are human fall into the category mortal.
Socrates is human.
Run the program to find out: is Socrates mortal?
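Logic languages such as Prolog find answers by searching for proofs, using unification and backtracking. The Java toy below is far simpler and only chains single "is-a" links. It does, however, show how an answer is derived from stated facts and rules rather than from step-by-step commands. The map layout and method name are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class LogicSketch {
    // Facts and rules as "is-a" links: each thing or category points to a broader category
    static final Map<String, String> IS_A = new HashMap<>();
    static {
        IS_A.put("Socrates", "human"); // fact: Socrates is human
        IS_A.put("human", "mortal");   // rule: everything human is mortal
    }

    // Try to derive "thing is-a category" by following the links (assumes no cycles)
    static boolean derive(String thing, String category) {
        String current = thing;
        while (current != null) {
            if (current.equals(category)) return true;
            current = IS_A.get(current);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Is Socrates mortal? " + derive("Socrates", "mortal")); // true
    }
}
```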
Declarative languages
Examples: Lisp; Prolog; (bits of SQL)
Beloved of academics, but weren’t used much in the real world, until recently (except SQL).
The advantage is that they limit internal and external state changes, so programs are much easier to check and predict.
Prolog is useful for language processing.
A version of Lisp, Scheme, inspired elements of R.
R
Language and a series of packages.
Written in C/C++/Fortran but Java can be used.
Functional language but with procedural and OOP elements.
Uses scalars, matrices, vectors, and lists.
Can replace the GUI with a variety of alternatives.
Powerful and increasingly stats software of choice, but steep learning curve and massive range of add-on packages.
RGui
Packages
Lots come with it.
Comprehensive R Archive Network (CRAN): http://cran.r-project.org/web/packages/available_packages_by_name.html
Packages → Set CRAN Mirror…
Packages → Install package(s)…
library() : list packages
library(packageName) : load package for use
library(help = packageName) : what’s in a package
detach("package:packageName") : unload
Example
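data1 <- read.delim("m:\\r-projects\\data.tab", header = TRUE)  # assumes a tab-separated file; use read.csv() for comma-separated files
attach(data1)  # columns can now be used by name
plot(Age, Desperation, main="Age vs. Desperation")
lineeq <- lm(Desperation ~ Age, data=data1)  # fit a straight line by least squares
x <- seq(min(Age), max(Age), by=10.0)
newData <- data.frame(Age = x)
predictions <- predict(lineeq, newdata = newData)
lines(x, predictions)  # predictions correspond to x, not to the original Age values
detach(data1)
rm(data1, lineeq, newData, predictions, x)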
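With a single predictor, `lm(Desperation ~ Age)` fits an ordinary least squares line. The minimal Java sketch below is only a rough cross-check on what R is calculating. R's `lm()` computes the fit differently internally (via a QR decomposition) and returns far more information. The data here are made up and lie exactly on y = 1 + 2x.

```java
class SimpleRegression {
    // Fits y = intercept + slope * x by ordinary least squares (one predictor only)
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    // Rough equivalent of predict(): evaluate the fitted line at new x values
    static double[] predict(double[] coef, double[] newX) {
        double[] out = new double[newX.length];
        for (int i = 0; i < newX.length; i++) out[i] = coef[0] + coef[1] * newX[i];
        return out;
    }

    public static void main(String[] args) {
        double[] age = {20, 30, 40, 50};          // made-up illustrative data
        double[] desperation = {41, 61, 81, 101};
        double[] coef = fit(age, desperation);
        System.out.println("intercept = " + coef[0] + ", slope = " + coef[1]); // intercept = 1.0, slope = 2.0
    }
}
```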
Working with R
R uses ‘Workspace’ directories. It is good practice to work in a new directory for each project (File → Change Dir…).
Dataset (and other object) names must start with a letter, or a dot not followed by a number: data1 is valid, 1data is not.
R constructs data objects, which can be listed with objects() and removed with rm(objectName).
If you save the workspace, these objects are saved in an .RData file.
Working with R
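Commands can be separated by new lines or semicolons, and grouped with braces: {command; command; command}
If a command is incomplete (for example, a bracket or quote has not been closed), R shows the continuation prompt “+” and waits for the rest.
You can load scripts of commands with source(). On Windows, be careful with file paths: backslashes must be doubled, because a single \ starts an escape sequence in R strings:
source("c:\\scripts\\commands.r")
Or use forward slashes instead:
source("c:/scripts/commands.r")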
The scripts are just text files of commands.
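Quick tips
The simplest data structure is the vector of data:
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
attach() makes the columns of a data frame available by column name (cf. detach(name), which reverses this).
Vector elements can be searched and selected using indices or expressions in [], e.g.:
y <- x[!is.na(x)]
which keeps only the elements that are not NA (“Not Available”, R's marker for missing values).
In operations using two vectors, the shorter one is recycled (reused from its start) as often as needed; R warns if the longer length is not a multiple of the shorter.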
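Java arrays have no recycling rule, so mixing vectors of different lengths is an easy source of surprises when moving between the two languages. This is a minimal sketch of R's rule for element-wise addition. The class and method names are illustrative, and R's warning is imitated here with a plain message to standard error.

```java
import java.util.Arrays;

class RecyclingSketch {
    // Element-wise addition following R's recycling rule: the result is as long as
    // the longer vector, and the shorter one is reused from its start as needed.
    static double[] add(double[] a, double[] b) {
        if (a.length == 0 || b.length == 0) return new double[0]; // R also gives a zero-length result
        int n = Math.max(a.length, b.length);
        if (n % a.length != 0 || n % b.length != 0) {
            // R issues a warning, not an error, in this case
            System.err.println("Warning: longer object length is not a multiple of shorter object length");
        }
        double[] result = new double[n];
        for (int i = 0; i < n; i++) {
            result[i] = a[i % a.length] + b[i % b.length];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6};
        double[] y = {10, 20};
        System.out.println(Arrays.toString(add(x, y))); // [11.0, 22.0, 13.0, 24.0, 15.0, 26.0]
    }
}
```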
Other data structures
Matrices, or more generally arrays.
Factors (handle categorical data).
Vectors or lists (the latter can be recursive).
Data frames: tables of data.
Functions (store code).
Each data element is assigned a mode, or data type: logical, numeric, complex, character or raw.
Quick tips
$ can be used to look inside objects, e.g. myData$column1
Operators: +, -, *, / and ^ (i.e. raise to the power)
Functions include: log, exp, sin, cos, tan, sqrt, max, min, length, sum, mean, var (variance), sort.
Help
Best start is “Introduction to R”: http://cran.r-project.org/doc/manuals/R-intro.html
?solve : help for the solve function
help.start() : start the HTML help
??solve : search the help for "solve"
?help : info on other help systems
R-Spatial
Large number of packages dealing with spatial analysis, including:
Mapping (incl. GoogleMap/Chart, and KML production).
Point pattern and cluster analysis.
Geographically Weighted Regression.
Network mathematics.
Kriging and other interpolation.
Excellent starting point is James Cheshire’s (CASA): http://spatialanalysis.co.uk/r/
Non-package addons
R-Forge: http://r-forge.r-project.org/
GUIs, bridges to other languages, etc.
Programming R
Has its own flow control:
if (condition) {
  statement 1
} else {
  statement 2
}
for (i in 1:3) print(i)
Note that this is actually a “for-each” loop: 1:3 just generates a vector of numbers (1, 2, 3), so you can loop over any other vector too:
x <- c("Hello","World")
for (i in x) print(i)
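Java's enhanced for loop is also a for-each loop. Java has no built-in equivalent of R's `1:3`, so this sketch uses a small helper, `seq()`, to generate the sequence first. The helper is an illustrative name, and unlike R's `:` it only counts upwards.

```java
import java.util.ArrayList;
import java.util.List;

class ForEachSketch {
    // Rough equivalent of R's from:to, for increasing integer sequences only
    static List<Integer> seq(int from, int to) {
        List<Integer> values = new ArrayList<>();
        for (int i = from; i <= to; i++) values.add(i);
        return values;
    }

    public static void main(String[] args) {
        // R: for (i in 1:3) print(i)
        for (int i : seq(1, 3)) System.out.println(i);

        // R: x <- c("Hello","World"); for (i in x) print(i)
        String[] x = {"Hello", "World"};
        for (String s : x) System.out.println(s);
    }
}
```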
Programming with R
Various options, but best is rJava: http://cran.r-project.org/web/packages/rJava/index.html
Two parts:
rJava itself: lets R use Java objects.
JRI (Java/R Interface): lets Java use R.
JRI
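Start by setting up an Rengine object. It can be run with or without an R prompt GUI.
Send in standard R commands using Rengine’s eval(String) method.
You can also assign() values of various types to a symbol, e.g. a numeric vector: re.assign("x", new double[] {10.0, 20.0, 30.0});
(Passing a String, as in re.assign("x", "10.0,20.0,30.0"), assigns a single character string rather than three numbers.)
Methods for dealing with GUI elements (see also the iPlots and JavaGD packages).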
Getting data back
Two mechanisms:
Push: get back an object containing the information R would have output to the console (and a bit more).
Callback: Java provides methods which R calls when it carries out different tasks.
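The difference between the two mechanisms can be shown in plain Java without JRI. In the sketch below, `OutputListener` and `FakeEngine` are hypothetical stand-ins, not part of the JRI API. The engine both returns a result to the caller (the "push" style above) and calls back every registered listener.

```java
import java.util.ArrayList;
import java.util.List;

class PushVsCallbackSketch {
    // Hypothetical listener interface standing in for a callback object (NOT JRI's API)
    interface OutputListener {
        void onOutput(String text);
    }

    // Hypothetical engine standing in for something like an R engine (NOT JRI's API)
    static class FakeEngine {
        private final List<OutputListener> listeners = new ArrayList<>();

        void addListener(OutputListener listener) {
            listeners.add(listener);
        }

        String eval(String command) {
            String result = "result of " + command; // pretend evaluation
            for (OutputListener l : listeners) {
                l.onOutput(result);                 // callback: the engine calls our code
            }
            return result;                          // push: the caller gets an object back
        }
    }

    public static void main(String[] args) {
        FakeEngine engine = new FakeEngine();
        List<String> received = new ArrayList<>();
        engine.addListener(received::add);          // register the callback
        String returned = engine.eval("mean(x)");
        System.out.println("Returned: " + returned);
        System.out.println("Via callback: " + received);
    }
}
```

The returned value suits one-off queries. Callbacks suit events the caller cannot predict, such as R asking for console input.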
Push
Get back a REXP object:
Contains R output and other information.
rexp.toString() : shows content.
Can filter out information with:
rexp.asDoubleArray()
rexp.asStringArray()
etc.
Callback
Add an object to handle events:
rengine.addMainLoopCallbacks(RMainLoopCallbacks)
Largely set up to manage user interface interaction.
RMainLoopCallbacks contains methods called at key moments, for example:
rReadConsole()
Called while R is waiting for user input.
Floating point numbers
Be aware that floating point numbers are rounded.
For example, in R (as with Java's double type), floating point numbers are rounded to (typically) 53 binary digits of accuracy.
This means results may differ depending on the sequence of operations used to generate them.
Even simple decimal numbers such as 0.1 cannot be stored exactly in binary, so there is no guarantee they will be accurate at large decimal places, even if they don’t appear to use them.
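A quick demonstration in Java, whose `double` is the same IEEE 754 64-bit format, with a 53-bit significand. The same three numbers are added in two different orders and give results that are not equal. A tolerance-based comparison is a common workaround. The method names and the 1e-9 tolerance are illustrative choices, and a suitable tolerance depends on the magnitudes involved.

```java
class FloatingPointSketch {
    // The same three numbers added in two different orders
    static double leftFirst() { return (0.1 + 0.2) + 0.3; }
    static double rightFirst() { return 0.1 + (0.2 + 0.3); }

    // Compare doubles with a tolerance rather than ==
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        System.out.println(leftFirst());
        System.out.println(rightFirst());
        System.out.println(leftFirst() == rightFirst());                  // false
        System.out.println(nearlyEqual(leftFirst(), rightFirst(), 1e-9)); // true
    }
}
```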
Floating point numbers
David Goldberg (1991), “What Every Computer Scientist Should Know About Floating-Point Arithmetic”, ACM Computing Surveys, 23/1, 5–48 http://www.validlab.com/goldberg/paper.pdf
http://floating-point-gui.de/
Hacker's Delight by Henry S. Warren Jr
Randall Hyde’s “Write Great Code” series.
http://ta.twi.tudelft.nl/users/vuik/wi211/disasters.html
Graph/Network maths
Graph theory deals with networks as collections of nodes (vertices) joined by edges (links).
It was limited to relatively simple graphs until more data on links and more processing power became available.
It is now a huge research and development area.
Network statistics
Distribution/average of node degree (edges connected).
Distances:
Eccentricity: distance from a node to the node furthest from it.
Average path length: the average shortest-path distance between all pairs of nodes.
Radius: minimum eccentricity in the graph.
Diameter: maximum eccentricity in the graph.
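In an unweighted graph, all these distance measures can be built from breadth-first search, which gives shortest path lengths counted in edges. This minimal sketch assumes a connected, undirected graph stored as an adjacency list of node indices. The class and method names are illustrative, and a weighted graph would need something like Dijkstra's algorithm instead.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class GraphDistances {
    // Breadth-first search: shortest path lengths (in edges) from one node;
    // -1 marks nodes that cannot be reached.
    static int[] bfs(int[][] adj, int source) {
        int[] dist = new int[adj.length];
        Arrays.fill(dist, -1);
        dist[source] = 0;
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(source);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int v : adj[u]) {
                if (dist[v] == -1) {
                    dist[v] = dist[u] + 1;
                    queue.add(v);
                }
            }
        }
        return dist;
    }

    // Eccentricity of each node: distance to the node furthest from it.
    // Assumes a connected graph (otherwise eccentricity is infinite).
    static int[] eccentricities(int[][] adj) {
        int[] ecc = new int[adj.length];
        for (int s = 0; s < adj.length; s++) {
            ecc[s] = Arrays.stream(bfs(adj, s)).max().getAsInt();
        }
        return ecc;
    }

    // Average path length: mean shortest-path distance over all pairs of distinct nodes
    static double averagePathLength(int[][] adj) {
        int n = adj.length;
        long total = 0;
        for (int s = 0; s < n; s++) {
            for (int d : bfs(adj, s)) total += d; // the source itself contributes 0
        }
        return (double) total / ((long) n * (n - 1));
    }

    public static void main(String[] args) {
        // A simple chain: 0 - 1 - 2 - 3
        int[][] chain = {{1}, {0, 2}, {1, 3}, {2}};
        int[] ecc = eccentricities(chain);
        System.out.println("Eccentricities: " + Arrays.toString(ecc));          // [3, 2, 2, 3]
        System.out.println("Radius: " + Arrays.stream(ecc).min().getAsInt());   // 2
        System.out.println("Diameter: " + Arrays.stream(ecc).max().getAsInt()); // 3
        System.out.println("Average path length: " + averagePathLength(chain)); // 20/12, about 1.67
    }
}
```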
Global clustering: the proportion of connected triplets (a node plus two of its neighbours) that are closed into complete triangles (triadic closures).
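Counting triplets directly makes the definition concrete. This sketch uses a boolean adjacency matrix for an undirected graph and checks every pair of neighbours around each node. That is fine for small teaching examples but slow for large networks, where libraries such as JUNG or R's igraph are a better choice. The helper names are illustrative.

```java
class GlobalClustering {
    // Builds a symmetric adjacency matrix for an undirected graph from an edge list
    static boolean[][] fromEdges(int n, int[][] edges) {
        boolean[][] adj = new boolean[n][n];
        for (int[] e : edges) {
            adj[e[0]][e[1]] = true;
            adj[e[1]][e[0]] = true;
        }
        return adj;
    }

    // Global clustering coefficient = closed triplets / connected triplets.
    // A connected triplet is a node plus two of its neighbours; it is closed
    // if those two neighbours are also linked (so the three form a triangle).
    static double globalClustering(boolean[][] adj) {
        int n = adj.length;
        long closed = 0, connected = 0;
        for (int centre = 0; centre < n; centre++) {
            for (int a = 0; a < n; a++) {
                if (a == centre || !adj[centre][a]) continue;
                for (int b = a + 1; b < n; b++) {
                    if (b == centre || !adj[centre][b]) continue;
                    connected++;
                    if (adj[a][b]) closed++;
                }
            }
        }
        return connected == 0 ? 0.0 : (double) closed / connected;
    }

    public static void main(String[] args) {
        // Triangle 0-1-2 with an extra node 3 hanging off node 2
        boolean[][] g = fromEdges(4, new int[][] {{0, 1}, {0, 2}, {1, 2}, {2, 3}});
        System.out.println(globalClustering(g)); // 0.6 (3 closed out of 5 connected triplets)
    }
}
```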
Other key statistics
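Centrality: there are various measures, including degree, but two common ones are:
Betweenness centrality: the number of shortest paths between other pairs of nodes that pass through a node.
Closeness centrality: based on the average shortest-path distance from a node to all other nodes (usually taken as its reciprocal, so that more central nodes score higher).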
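A minimal sketch of closeness centrality, taken here as (n - 1) divided by the sum of shortest-path distances, i.e. the reciprocal of the average distance. Other normalisations exist, so check which one a given package uses. The graph is assumed to be connected, unweighted and undirected, and the names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class ClosenessCentrality {
    // Closeness of a node = (n - 1) / (sum of shortest-path distances to all other nodes),
    // found with a breadth-first search. Assumes a connected graph.
    static double closeness(int[][] adj, int node) {
        int[] dist = new int[adj.length];
        Arrays.fill(dist, -1);
        dist[node] = 0;
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(node);
        long sum = 0;
        while (!queue.isEmpty()) {
            int u = queue.poll();
            sum += dist[u]; // each node is dequeued once, with its final distance
            for (int v : adj[u]) {
                if (dist[v] == -1) {
                    dist[v] = dist[u] + 1;
                    queue.add(v);
                }
            }
        }
        return (adj.length - 1) / (double) sum;
    }

    public static void main(String[] args) {
        // A star: node 0 is linked to nodes 1, 2 and 3
        int[][] star = {{1, 2, 3}, {0}, {0}, {0}};
        for (int i = 0; i < star.length; i++) {
            System.out.printf("node %d: %.3f%n", i, closeness(star, i));
        }
    }
}
```

In the star, the hub scores 1.0 (every other node is one step away) and each leaf scores 0.6.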
Node degree (or other) correlation: how similar are nodes to their neighbours?
Software
Masses of software, e.g. Inflow:
Network Centrality; Small-World Networks; Cluster Analysis; Network Density; Prestige / Influence; Structural Equivalence; Network Neighborhood; External / Internal Ratio; Weighted Average Path Length; Shortest Paths & Path Distribution.
http://en.wikipedia.org/wiki/Social_network_analysis_software
Pajek - for Large Network Analysis: http://pajek.imfm.si/doku.php?id=pajek
Programming Graphs
GUESS (Open Source Java program): http://graphexploration.cond.org/
Nicely uses GraphML, an XML format for representing graphs.
JUNG library: http://jung.sourceforge.net/
R: various packages, including igraph.
Text analysis
Processing of text.
Natural language processing and statistics.
Processing text: Regex
Java Regular Expressions: java.util.regex
Regular expressions: powerful search, compare (and replace) tools.
(other types of regex include direct replace options – in java regex these are separate methods)
Regex
Standard Java:
if ((email.indexOf("@") > 0) &&
    (email.endsWith(".org"))) {
    return true;
}
Regex version (matches() must match the whole string):
if (email.matches("[A-Za-z]+@[A-Za-z]+\\.org")) return true;
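The two checks are not equivalent, and running them side by side shows this. The regex version is stricter because `matches()` must match the whole string, and the pattern allows only letters on either side of the "@". The test addresses are made up, and neither check is a realistic email validator.

```java
class EmailCheck {
    // Plain String methods: an "@" after the first character, and ending in ".org"
    static boolean plainCheck(String email) {
        return email.indexOf("@") > 0 && email.endsWith(".org");
    }

    // Regex: letters, "@", letters, then the literal ".org" (the dot is escaped)
    static boolean regexCheck(String email) {
        return email.matches("[A-Za-z]+@[A-Za-z]+\\.org");
    }

    public static void main(String[] args) {
        String[] tests = {"andy@example.org", "a.evans@example.org", "andy@example.com"};
        for (String t : tests) {
            System.out.println(t + "  plain=" + plainCheck(t) + "  regex=" + regexCheck(t));
        }
    }
}
```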
Example components
[abc] a, b, or c (simple class)
[^abc] Any character except a, b, or c (negation)
[a-zA-Z] a through z, or A through Z, inclusive (range)
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
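A few of these components can be tried directly with `Pattern.matches(regex, input)`, which is true only if the whole input matches. Note that each backslash has to be doubled inside a Java string literal. The helper name `full` is illustrative.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True only if the WHOLE input matches the regex
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("[abc]", "b"));         // true: simple class
        System.out.println(full("[^abc]", "d"));        // true: negation
        System.out.println(full("[a-d[m-p]]", "n"));    // true: union
        System.out.println(full("[a-z&&[def]]", "e"));  // true: intersection
        System.out.println(full("[a-z&&[^bc]]", "b"));  // false: subtraction removes b
        System.out.println(full("\\d+", "2024"));       // true: one or more digits
        System.out.println(full("\\w+", "A12_x"));      // true: word characters only
        System.out.println(full("colou?r", "color"));   // true: ? makes the u optional
    }
}
```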
Matching
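Find all words that start with a number (digit).
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// \b = word boundary, \d = a digit, \w* = any further word characters
Pattern p = Pattern.compile("\\b\\d\\w*");
Matcher m = p.matcher(stringToSearch);
while (m.find()) {
    String temp = m.group();  // the text of the current match
    System.out.println(temp);
}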
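A self-contained version of the matching loop with made-up input text. The word boundary `\b` matters: without it, the pattern would also pick up "12" from inside "A12".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordsStartingWithDigit {
    // \b = word boundary, \d = one digit, \w* = any further word characters
    private static final Pattern P = Pattern.compile("\\b\\d\\w*");

    static List<String> find(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            found.add(m.group()); // the text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(find("Meet at 10am, not 9 or 3rd Avenue; room A12 is closed.")); // [10am, 9, 3rd]
    }
}
```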
Replacing
replaceFirst(String regex, String replacement)
replaceAll(String regex, String replacement)
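Both are methods of `String`. A short sketch of each, with made-up inputs. Captured groups `(...)` can be reused in the replacement text as `$1`, `$2` and so on.

```java
class ReplaceDemo {
    static String collapseSpaces(String s) {
        // Every run of one or more whitespace characters becomes a single space
        return s.replaceAll("\\s+", " ");
    }

    static String maskFirstNumber(String s) {
        // Only the first run of digits is replaced
        return s.replaceFirst("\\d+", "#");
    }

    static String swapNames(String s) {
        // (\w+) captures a group; $1 and $2 reuse the groups in the replacement
        return s.replaceAll("(\\w+), (\\w+)", "$2 $1");
    }

    public static void main(String[] args) {
        System.out.println(collapseSpaces("Leeds  LS2   9JT")); // Leeds LS2 9JT
        System.out.println(maskFirstNumber("Leeds LS2 9JT"));   // Leeds LS# 9JT
        System.out.println(swapNames("Evans, Andy"));           // Andy Evans
    }
}
```

Because `$` and `\` are special in the replacement string, wrap literal replacement text in `Matcher.quoteReplacement()` if it might contain them.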
Regex
Good start is the tutorial at: http://docs.oracle.com/javase/tutorial/essential/regex/
Also Mehran Habibi’s Java Regular Expressions.
Natural Language Processing
A large part is Part of Speech (POS) Tagging: marking up text into nouns, verbs, etc., usually based on the location in the text and other context rules.
Often formulates these rules using machine-learning (of various kinds), training the program on corpora of marked-up text.
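The simplest trained tagger just learns, from a hand-tagged corpus, which tag each word most often carries. It then applies that tag regardless of context. Real taggers, such as those in OpenNLP or NLTK, use context-based statistical models and are far more accurate, so treat this only as a baseline sketch. The `word/TAG` training format, the tiny corpus, the Penn Treebank-style tag names and the default tag `NN` for unseen words are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class UnigramTagger {
    // word -> (tag -> how often the word carried that tag in the training corpus)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Training text in an assumed toy format: word/TAG tokens separated by spaces
    void train(String taggedText) {
        for (String token : taggedText.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash <= 0) continue; // skip malformed tokens
            String word = token.substring(0, slash).toLowerCase();
            String tag = token.substring(slash + 1);
            counts.computeIfAbsent(word, w -> new HashMap<>()).merge(tag, 1, Integer::sum);
        }
    }

    // Give each word the tag it had most often in training; unseen words default to NN
    String tag(String sentence) {
        StringBuilder out = new StringBuilder();
        for (String word : sentence.trim().split("\\s+")) {
            Map<String, Integer> tags = counts.get(word.toLowerCase());
            String best = "NN";
            if (tags != null) {
                best = tags.entrySet().stream()
                        .max(Map.Entry.comparingByValue())
                        .get().getKey();
            }
            if (out.length() > 0) out.append(' ');
            out.append(word).append('/').append(best);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        UnigramTagger tagger = new UnigramTagger();
        tagger.train("the/DT dog/NN barks/VBZ we/PRP walk/VBP dogs/NNS walk/VBP a/DT walk/NN");
        System.out.println(tagger.tag("We walk the dog")); // We/PRP walk/VBP the/DT dog/NN
    }
}
```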
Used for:
Text understanding.
Knowledge capture and use.
Text forensics.
NLP Libraries
Popular are:
Natural Language Toolkit (NLTK; Python): http://www.nltk.org/
OpenNLP (Java): http://opennlp.apache.org/index.html
OpenNLP
Sentence recognition and tokenising.
Name extraction (including placenames).
POS Tagging.
Text classification.
For clear examples, see the manual at: http://opennlp.apache.org/documentation.html
Other info
Other than the Numerical Recipes books, the other classic texts are Donald E. Knuth’s The Art of Computer Programming:
Fundamental Algorithms
Seminumerical Algorithms
Sorting and Searching
Combinatorial Algorithms
But at this stage, you’re better off getting…
Other info
Michael T. Goodrich and Roberto Tamassia’s Data Structures and Algorithms in Java.
Basic Java, arrays and lists.
Recursion in algorithms.
Key mathematical algorithms.
Algorithm analysis.
Data storage structures (stacks, queues, hashtables, binary trees, etc.)
Search and sort.
Text processing.
Graph/network analysis.
Memory management.
Next Lecture
Modelling I: Netlogo
Practical: R; rJava; Network visualisation