Post on 28-Dec-2015
• SQL stands for « Structured Query Language »
• Programming language for databases, closer to natural English than other languages (based on « sentences » instead of « procedures »)
• Its aim is to ease the querying of data by humans and the programming of interfaces
• Powerful functions for text recognition
• Powerful extensions for GIS (PostGIS, Oracle)
• Standardized and recognized by most recent relational databases
BUT
1) ...minor differences of syntax between vendors, and vendor-specific enhanced functions, prevent easy interoperability between products
2) SQL databases often imply that the development of the interfaces is distinct from the development of the core of the database
Interoperability problem between vendors
– possible solutions
• use an intermediate layer between the database and the interface
– ODBC/JDBC (connectors used by other software on Windows/Java)
– use ORM (Object-Relational Mapper) software that allows the programmer to use the same syntax when developing interfaces, e.g.: Doctrine (Open-Source)
SQL and NoSQL
• SQL is very useful for normalized databases where the control of data integrity is important (scientific value)
• ...but it is not scalable: a huge amount of data (> 300 000) lowers performance
• for the last 4 to 5 years, with the explosion of the Internet, there has been a trend towards NoSQL databases: faster databases that can handle huge amounts of data rapidly, e.g.: Solr (to index Word and PDF documents), MongoDB, Cassandra, etc.
• NoSQL offers speed, fast replication between locations and a flexible structure, but no control of integrity. It doesn’t replace SQL but complements it
SQL => control of the integrity and of the completeness of data is more important than speed + good interaction with GIS
NoSQL => high availability of data on the Internet, but no schema to validate integrity and no GIS plug-in yet
Problem: a scientific information network requires both quality control and high availability
SQL: 4 parts
• Data Query Language (DQL)
– search and display data matching specific criteria
• Data Manipulation Language (DML)
– modify data (insert, update, delete)
– lock (atomicity of data: two users cannot modify the same data in parallel)
– use transactions (rollback to the previous state of the database if a modification fails)
• Data Definition Language (DDL)
– create the schema of the database (the normalised structure, the indexes): you can define yourself how to check the integrity of the database
• Data Control Language (DCL)
– create authorizations and access rules for users
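The parts listed above can be tried out in a few lines. This is only a minimal sketch, assuming Python's built-in sqlite3 module with an in-memory database as a stand-in for a server such as PostgreSQL; the table and its UNIQUE rule are illustrative. SQLite has no users, so the DCL part (GRANT/REVOKE) is not shown.

```python
import sqlite3

# In-memory SQLite database (assumption: stand-in for a real server).
conn = sqlite3.connect(":memory:")

# DDL: define the schema, including integrity rules (NOT NULL, UNIQUE).
conn.execute(
    "CREATE TABLE localities ("
    " pk_locality INTEGER PRIMARY KEY,"
    " locality TEXT NOT NULL UNIQUE)"
)

# DML inside a transaction: both inserts succeed or neither does.
try:
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO localities (locality) VALUES ('Tienen')")
        # The duplicate violates UNIQUE and aborts the whole transaction.
        conn.execute("INSERT INTO localities (locality) VALUES ('Tienen')")
except sqlite3.IntegrityError:
    pass

# DQL: the table is still empty because the first insert was rolled back.
rows_after_rollback = conn.execute("SELECT locality FROM localities").fetchall()

with conn:
    conn.execute("INSERT INTO localities (locality) VALUES ('Bunsbeek')")
rows_after_commit = conn.execute("SELECT locality FROM localities").fetchall()

print(rows_after_rollback)
print(rows_after_commit)
```

The failed second insert undoes the first one as well, which is the "rollback to the previous state" behaviour described for transactions.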
Recommendations
To ease manipulation with SQL when creating a database:
– avoid uppercase letters in field names
– avoid accented characters in field names (but you must keep them in the content of course!)
– replace white spaces with underscores
– avoid at any price other non-alphabetical or non-numerical characters
– avoid giving the same name to two fields in different tables (not always possible...)
– table names in plural
– field names in singular
– use descriptive field names (e.g.: not 'dc' but 'date_collected')
Querying
Pattern: SELECT <comma-separated list of fields> FROM <name of table>;
e.g.
SELECT locality FROM localities;
SELECT locality, country FROM localities;
SELECT * FROM localities;
« * » => all fields (wildcard)
Querying II
Pattern: SELECT <comma-separated list of fields> FROM <name of table> WHERE [condition];
e.g.
SELECT pk_locality, latitude_decimals, longitude_decimals
FROM localities
WHERE locality = 'Tienen';
Querying II
Pattern: SELECT <comma-separated list of fields> FROM <name of table> WHERE [condition];
e.g.
SELECT * FROM localities
WHERE latitude_decimals > 50.80
AND latitude_decimals < 50.85;
Querying III (boolean)
Compare the results:
SELECT * FROM localities
WHERE latitude_decimals > 50.80
AND latitude_decimals < 50.85;

SELECT * FROM localities
WHERE latitude_decimals > 50.80
OR latitude_decimals < 50.85;
Querying IV (boolean)
Compare the results:
SELECT * FROM localities
WHERE locality = 'Tienen'
AND locality = 'Bunsbeek';

SELECT * FROM localities
WHERE locality = 'Tienen'
OR locality = 'Bunsbeek';
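The two comparisons above can be checked on a tiny table. A minimal sketch, assuming Python's sqlite3 module and three invented rows (place_a, place_b and place_c are hypothetical names, with latitudes chosen to fall below, inside and above the band):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE localities ("
    " pk_locality INTEGER PRIMARY KEY,"
    " locality TEXT,"
    " latitude_decimals REAL)"
)
# Invented sample rows: one below, one inside and one above 50.80-50.85.
conn.executemany(
    "INSERT INTO localities (locality, latitude_decimals) VALUES (?, ?)",
    [("place_a", 50.75), ("place_b", 50.82), ("place_c", 50.90)],
)


def names(where):
    """Return the locality names matching a WHERE clause, in insertion order."""
    sql = f"SELECT locality FROM localities WHERE {where} ORDER BY pk_locality"
    return [row[0] for row in conn.execute(sql)]


band_and = names("latitude_decimals > 50.80 AND latitude_decimals < 50.85")
band_or = names("latitude_decimals > 50.80 OR latitude_decimals < 50.85")
name_and = names("locality = 'place_a' AND locality = 'place_b'")
name_or = names("locality = 'place_a' OR locality = 'place_b'")

print(band_and)  # only the row inside the band
print(band_or)   # every row: any latitude is above 50.80 or below 50.85
print(name_and)  # empty: one field cannot hold two different values
print(name_or)   # both named rows
```

AND narrows the result and OR widens it. The OR version of the latitude filter accepts every non-null latitude, and the AND version of the name filter can never match.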
Querying II
Pattern: SELECT <comma-separated list of fields> FROM <name of table> WHERE [condition];
e.g.
SELECT * FROM localities
WHERE locality <> 'Hensberg';

SELECT * FROM localities
WHERE locality IS NULL;
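One point worth checking with these two queries: a row whose locality is NULL is not returned by the <> comparison, and has to be looked up with IS NULL. A minimal sketch, assuming Python's sqlite3 module and three invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE localities (pk_locality INTEGER PRIMARY KEY, locality TEXT)"
)
# Invented sample rows; None is stored as NULL.
conn.executemany(
    "INSERT INTO localities (locality) VALUES (?)",
    [("Hensberg",), ("Tienen",), (None,)],
)

# <> keeps 'Tienen' but silently skips the NULL row.
different = [
    row[0]
    for row in conn.execute(
        "SELECT locality FROM localities WHERE locality <> 'Hensberg'"
    )
]

# IS NULL is the only way to find the missing value...
missing = conn.execute(
    "SELECT COUNT(*) FROM localities WHERE locality IS NULL"
).fetchone()[0]

# ...because '= NULL' never evaluates to true.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM localities WHERE locality = NULL"
).fetchone()[0]

print(different, missing, equals_null)
```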
Joining I
SELECT *
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
[+ WHERE condition];
Joining II
• Exercise
• Find the collectors of 'Agrostis'
SELECT collector_name, genus
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
WHERE genus = 'Agrostis';
Joining III
• Exercise
– Find the scientific names that have been collected in Tienen
SELECT scientific_name
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
JOIN localities ON specimens.fk_locality = localities.pk_locality
WHERE locality = 'Tienen';
Joining III (ordering)
• Exercise
– Find the scientific names that have been collected in Tienen, sorted alphabetically
SELECT scientific_name
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
JOIN localities ON specimens.fk_locality = localities.pk_locality
WHERE locality = 'Tienen'
ORDER BY scientific_name;
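The joins above can be run end to end on a toy dataset. A minimal sketch, assuming Python's sqlite3 module; the table layout is inferred from the field names used in the queries, and the collectors and specimen rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scientific_names (
        pk_scientific_name INTEGER PRIMARY KEY,
        scientific_name TEXT,
        genus TEXT);
    CREATE TABLE localities (
        pk_locality INTEGER PRIMARY KEY,
        locality TEXT);
    CREATE TABLE specimens (
        pk_specimen INTEGER PRIMARY KEY,
        collector_name TEXT,
        fk_scientific_name INTEGER REFERENCES scientific_names (pk_scientific_name),
        fk_locality INTEGER REFERENCES localities (pk_locality));

    -- Invented sample data.
    INSERT INTO scientific_names VALUES
        (1, 'Agrostis capillaris', 'Agrostis'),
        (2, 'Bellis perennis', 'Bellis'),
        (3, 'Betula pendula', 'Betula');
    INSERT INTO localities VALUES (1, 'Tienen'), (2, 'Bunsbeek');
    INSERT INTO specimens VALUES
        (1, 'Collector A', 2, 1),
        (2, 'Collector B', 1, 1),
        (3, 'Collector A', 3, 2);
    """
)

# Two-table join: who collected the genus Agrostis?
agrostis_collectors = [
    row[0]
    for row in conn.execute(
        "SELECT collector_name FROM specimens "
        "JOIN scientific_names "
        "  ON specimens.fk_scientific_name = scientific_names.pk_scientific_name "
        "WHERE genus = 'Agrostis'"
    )
]

# Three-table join with ordering: names collected in Tienen.
tienen_names = [
    row[0]
    for row in conn.execute(
        "SELECT scientific_name FROM specimens "
        "JOIN scientific_names "
        "  ON specimens.fk_scientific_name = scientific_names.pk_scientific_name "
        "JOIN localities ON specimens.fk_locality = localities.pk_locality "
        "WHERE locality = 'Tienen' "
        "ORDER BY scientific_name"
    )
]

print(agrostis_collectors)
print(tienen_names)
```

Each JOIN ... ON pairs a foreign key (fk_...) in specimens with the primary key (pk_...) of the table it points to.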
Joining III
• Exercise
– Find the collectors of 'Balsaminaceae'
SELECT collector_name
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
JOIN families ON scientific_names.fk_family = families.pk_family
WHERE family = 'Balsaminaceae';
Views
'Save' complex queries and make them permanent in the database (useful for programming filters)
CREATE VIEW v_specimen_names_localities AS
SELECT scientific_name
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
JOIN localities ON specimens.fk_locality = localities.pk_locality;
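Once created, a view is queried like a table. A minimal sketch, assuming Python's sqlite3 module and invented rows. Unlike the slide, the view here also exposes the locality column (an assumption made for this example), so that it can be filtered afterwards:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scientific_names (
        pk_scientific_name INTEGER PRIMARY KEY,
        scientific_name TEXT);
    CREATE TABLE localities (
        pk_locality INTEGER PRIMARY KEY,
        locality TEXT);
    CREATE TABLE specimens (
        pk_specimen INTEGER PRIMARY KEY,
        fk_scientific_name INTEGER REFERENCES scientific_names (pk_scientific_name),
        fk_locality INTEGER REFERENCES localities (pk_locality));

    -- Invented sample data.
    INSERT INTO scientific_names VALUES (1, 'Bellis perennis'), (2, 'Betula pendula');
    INSERT INTO localities VALUES (1, 'Tienen'), (2, 'Bunsbeek');
    INSERT INTO specimens VALUES (1, 1, 1), (2, 2, 2);

    -- The view stores the join once; 'locality' is added for this sketch.
    CREATE VIEW v_specimen_names_localities AS
        SELECT scientific_name, locality
        FROM specimens
        JOIN scientific_names
          ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
        JOIN localities ON specimens.fk_locality = localities.pk_locality;
    """
)

# The interface code no longer needs to repeat the joins.
in_tienen = [
    row[0]
    for row in conn.execute(
        "SELECT scientific_name FROM v_specimen_names_localities "
        "WHERE locality = 'Tienen'"
    )
]
print(in_tienen)
```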
Search on Text Patterns (I)
a) match one position: '_'
=> '_' means any single character, present exactly once
b) match several positions: '%'
=> '%' means any sequence of zero or more characters
Note: a white space counts as one character
Search on Text Patterns (II)
• SQL syntax
SELECT ... WHERE field LIKE 'pattern';
• PostgreSQL syntax
SELECT ... WHERE field SIMILAR TO 'pattern';
Search on Text Patterns (III)
Example: find the scientific names having 'e' as the second letter of the genus:
SELECT scientific_name
FROM scientific_names
WHERE genus SIMILAR TO '_e%';
Search on Text Patterns (IV)
Example:
Pattern: '_e%'
Response:
'Aegopodium'
'Aethusa'
'Bellis'
'Betula'
...
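The '_e%' example can be reproduced with the standard LIKE operator. A minimal sketch, assuming Python's sqlite3 module; the four matching genus names come from the slide, while 'Agrostis' and 'Poa' are added here as non-matching controls:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scientific_names (genus TEXT)")
conn.executemany(
    "INSERT INTO scientific_names (genus) VALUES (?)",
    [
        ("Aegopodium",),
        ("Aethusa",),
        ("Agrostis",),  # second letter is 'g': must not match
        ("Bellis",),
        ("Betula",),
        ("Poa",),  # second letter is 'o': must not match
    ],
)

# '_' = exactly one character, 'e' = literal, '%' = anything (or nothing) after.
second_letter_e = [
    row[0]
    for row in conn.execute(
        "SELECT genus FROM scientific_names WHERE genus LIKE '_e%' ORDER BY genus"
    )
]
print(second_letter_e)
```

Note that LIKE in SQLite ignores case for ASCII letters by default, whereas LIKE in PostgreSQL is case-sensitive. The lowercase 'e' pattern gives the same result in both for this data.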
Search on text pattern (VI)
• Interval of characters
• Use brackets
[a-z]: any lowercase letter
[A-Z]: any uppercase letter
[0-9]: any digit
[aA]: 'a' or 'A'
Search on text pattern (VII)
• Useful to control nomenclature
• Exercise: search for the species containing uppercase characters:
SELECT *
FROM scientific_names
WHERE species SIMILAR TO '%[A-Z]%';
Search on text pattern (VIII)
Exercise: search for the genus names containing uppercase letters after the first letter:
SELECT *
FROM scientific_names
WHERE genus SIMILAR TO '_%[A-Z]%';
Search on text pattern (IX)
Exercise: search for the genus names containing more than one word:
SELECT *
FROM scientific_names
WHERE genus SIMILAR TO '%[a-z]% %[a-z]%';
Search on text pattern (X)
• PostgreSQL also supports an even more powerful mechanism called « regular expressions »
– standard syntax shared by several programming languages
– allows matching complex patterns
– can perform replacements and extractions
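The nomenclature checks from the previous exercises can also be written as regular expressions. A minimal sketch, assuming Python's sqlite3 and re modules: SQLite has no SIMILAR TO, so a REGEXP function is registered by hand here, whereas PostgreSQL offers the ~ operator for the same purpose. The faulty rows are invented for illustration.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Back the SQL expression 'value REGEXP pattern' with Python's re module."""
    if value is None:
        return 0
    return int(re.search(pattern, value) is not None)


conn = sqlite3.connect(":memory:")
# SQLite calls the two-argument function regexp(pattern, value) for REGEXP.
conn.create_function("REGEXP", 2, regexp)

conn.execute("CREATE TABLE scientific_names (genus TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO scientific_names VALUES (?, ?)",
    [
        ("Bellis", "perennis"),   # correct
        ("BEllis", "perennis"),   # uppercase after the first letter of the genus
        ("Betula", "Pendula"),    # uppercase in the species
        ("Poa annua", "annua"),   # two words in the genus
    ],
)


def genus_where(condition):
    """Return the genus values of the rows matching a WHERE condition."""
    sql = f"SELECT genus FROM scientific_names WHERE {condition}"
    return [row[0] for row in conn.execute(sql)]


upper_in_species = genus_where("species REGEXP '[A-Z]'")
upper_after_first = genus_where("genus REGEXP '^.+[A-Z]'")
several_words = genus_where("genus REGEXP '[a-z].* .*[a-z]'")

print(upper_in_species, upper_after_first, several_words)
```

Unlike SIMILAR TO, which must match the whole value, re.search looks for the pattern anywhere in the value, so the leading and trailing '%' are not needed.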
<optional, if somebody asks how to group information in one row>
Group specimens collected in Tienen per collector
SELECT array_to_string(array_agg(scientific_name), ','), collector_name
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
JOIN localities ON specimens.fk_locality = localities.pk_locality
WHERE locality = 'Tienen'
GROUP BY collector_name
ORDER BY collector_name;
Group localities per collector
SELECT array_to_string(array_agg(locality), ','), collector_name
FROM specimens
JOIN scientific_names ON specimens.fk_scientific_name = scientific_names.pk_scientific_name
JOIN localities ON specimens.fk_locality = localities.pk_locality
GROUP BY collector_name
ORDER BY collector_name;
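The grouping query can be tried without PostgreSQL. A minimal sketch, assuming Python's sqlite3 module: array_agg and array_to_string are PostgreSQL functions, so SQLite's group_concat is swapped in here as the nearest equivalent. The collectors and rows are invented, and the join to scientific_names is left out because the result does not use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE localities (
        pk_locality INTEGER PRIMARY KEY,
        locality TEXT);
    CREATE TABLE specimens (
        pk_specimen INTEGER PRIMARY KEY,
        collector_name TEXT,
        fk_locality INTEGER REFERENCES localities (pk_locality));

    -- Invented sample data.
    INSERT INTO localities VALUES (1, 'Tienen'), (2, 'Bunsbeek');
    INSERT INTO specimens VALUES
        (1, 'Collector A', 1),
        (2, 'Collector A', 2),
        (3, 'Collector B', 1);
    """
)

# One row per collector, with all of that collector's localities in one field.
rows = conn.execute(
    "SELECT collector_name, group_concat(locality, ',') "
    "FROM specimens "
    "JOIN localities ON specimens.fk_locality = localities.pk_locality "
    "GROUP BY collector_name "
    "ORDER BY collector_name"
).fetchall()

# The order inside each concatenated field is not guaranteed, so sort it.
per_collector = {
    collector: sorted(joined.split(",")) for collector, joined in rows
}
print(per_collector)
```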