Databases for Renaissance and Early Modern Sources
Session Tutor: Sarah Richardson
[email protected] (warwick.ac.uk)
Using Databases
Databases may be used in a number of ways to support your research:
  Bibliography (see later sessions)
  For simple lists
  To analyse complex sources
Overview
Source assessment and data-modelling
  The challenge of sources
  How will relational databases help?
  Source analysis
  Database design and creation
  Free text databases
  Methodological issues
Challenges
Unstructured source material
Missing data
Complications with numbers and dates
Data comes from more than one source
Databases should look like this?
Voter ID  First Name  Surname  Address  Occupation  Voting Preference
001       John        Smith    Halifax  Butcher     X
002       John        Smith    Halifax  Butcher     X
003       John        Smith    Halifax  Butcher     X
004       John        Smith    Halifax  Butcher     X
Labels on the slide:
  Unique identifier or primary key (the Voter ID column)
  Column or field or attribute
  Row or record
  Field name or attribute name (the header row)
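As a rough illustration of the table above, the following sketch declares it in SQLite using Python's built-in sqlite3 module. The table name and column names (voters, voter_id and so on) are illustrative choices, not taken from the slide, and only two of the slide's rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    """
    CREATE TABLE voters (
        voter_id          TEXT PRIMARY KEY,  -- unique identifier for each record
        first_name        TEXT,
        surname           TEXT,
        address           TEXT,
        occupation        TEXT,
        voting_preference TEXT
    )
    """
)
rows = [
    ("001", "John", "Smith", "Halifax", "Butcher", "X"),
    ("002", "John", "Smith", "Halifax", "Butcher", "X"),
]
conn.executemany("INSERT INTO voters VALUES (?, ?, ?, ?, ?, ?)", rows)

# The primary key refuses a second record that reuses an existing identifier,
# even though every other field may legitimately repeat.
try:
    conn.execute("INSERT INTO voters VALUES (?, ?, ?, ?, ?, ?)", rows[0])
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

record_count = conn.execute("SELECT COUNT(*) FROM voters").fetchone()[0]
print(duplicate_rejected, record_count)
```

The point of the sketch is that two John Smiths of Halifax can coexist as separate records only because the identifier, not the name, distinguishes them.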
But what do you do with this?
Letter from the Medici Granducal Archive
From Source to Database
Frankpledge: Original source from The National Archive translated to the Thame Database
How will relational databases help?
A relational database is a database created with many tables linked together.
Each table has a common factor which links it to others in the database.
For complex sources a number of tables may be created to deal with different aspects of the data.
Relational model
Sentence Table: Defendant ID, Case Number, Verdict, Sentence, Comments
Offences Table: Defendant ID, Case Number, Offence Type, Place of Offence, Date of Offence, Description, Comments
Occupational Categorisation Table: Occupation Title, Occupational Categorisation 1, Occupational Categorisation 2
Witnesses Table: Case Number, Witness 1 First name, Witness 1 Surname, Witness 1 Address, Witness 1 Sex, Witness 2 First name, Witness 2 Surname, Witness 2 Address, Witness 2 Sex, Comments
Defendant Table: Defendant ID, First name, Surname, Address, Age, Sex, Occupation Title, Comments
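To show how a shared field links tables like these, here is a minimal sketch using Python's sqlite3 module. It builds cut-down versions of the Defendant and Offences tables and joins them on Defendant ID. The lower-case names and the sample records are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE defendant (
        defendant_id     INTEGER PRIMARY KEY,
        first_name       TEXT,
        surname          TEXT,
        occupation_title TEXT
    );
    CREATE TABLE offences (
        defendant_id INTEGER REFERENCES defendant(defendant_id),
        case_number  INTEGER,
        offence_type TEXT
    );
    -- Invented sample records, for illustration only.
    INSERT INTO defendant VALUES (1, 'John', 'Smith', 'Butcher');
    INSERT INTO offences VALUES (1, 101, 'Theft');
    INSERT INTO offences VALUES (1, 102, 'Assault');
    """
)

# Defendant ID is the common field that links the two tables.
linked = conn.execute(
    """
    SELECT d.surname, o.case_number, o.offence_type
    FROM defendant AS d
    JOIN offences AS o ON o.defendant_id = d.defendant_id
    ORDER BY o.case_number
    """
).fetchall()
print(linked)
```

The defendant's details are stored once, however many offences are recorded against him.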
A more complex relational database
Source analysis
Data should be broken down into components that collect groups of information into objects or events.
For example, information relating to a person, an organisation, a document, an object or a building, or to events such as a marriage, a transaction, the making of a will, or an election.
In database terminology these are referred to as entities.
Each entity will form a table in the final database.
Attributes
Once each entity has been identified, list the data associated with each.
For example, the Defendant table has information on the first name, surname, address, age, sex and occupation of each defendant.
This information will produce the fields for each table.
The fields are also known as attributes.
Field types
Text: For alphabetical or numerical data, but beware that numbers will be treated like text if you choose this data type.
Numbers: For all numbers, but you may wish to use one of the types below for currency/dates.
Date/Time: For dates and/or times.
Currency: In most commercial database software this is applicable only to modern currency.
AutoNumber: Allocates a unique identifier to each record. It is useful for ID fields.
Memo: For fields containing much unstructured information. Useful for comments fields.
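The warning about numbers stored as text can be seen in a few lines of Python. This is only an illustration of the general behaviour, with made-up ages, not the output of any particular database package.

```python
ages_as_text = ["9", "10", "100", "25"]           # numbers typed into a Text field
ages_as_numbers = [int(a) for a in ages_as_text]  # the same values in a Number field

sorted_text = sorted(ages_as_text)        # compared character by character
sorted_numbers = sorted(ages_as_numbers)  # compared by value

print(sorted_text)     # ['10', '100', '25', '9']
print(sorted_numbers)  # [9, 10, 25, 100]
```

Sorting, range queries and calculations all depend on the field type chosen at design time.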
Issues for field types
Size
Calculations
Dates
Currency
Unstructured data
Unique identifiers
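One possible workaround for the currency issue, assuming the source uses pre-decimal sterling (12 pence to the shilling, 20 shillings to the pound), is to store every sum as a whole number of pence in a Number field. The function names below are hypothetical.

```python
PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20
PENCE_PER_POUND = PENCE_PER_SHILLING * SHILLINGS_PER_POUND  # 240

def to_pence(pounds, shillings, pence):
    """Convert a pounds/shillings/pence sum to pence so it can be added and compared."""
    return pounds * PENCE_PER_POUND + shillings * PENCE_PER_SHILLING + pence

def from_pence(total):
    """Convert a number of pence back to (pounds, shillings, pence)."""
    pounds, remainder = divmod(total, PENCE_PER_POUND)
    shillings, pence = divmod(remainder, PENCE_PER_SHILLING)
    return pounds, shillings, pence

total = to_pence(1, 2, 6) + to_pence(0, 19, 8)
print(total, from_pence(total))  # 506 (2, 2, 2)
```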
Relationships
One-to-one relationships: records in one table have only one match with records in a second table.
One-to-many relationships: records in the first table match many in the second, but those in the second table only have one match.
Many-to-many relationships: records in each table may match many records in the other.
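A many-to-many relationship is commonly handled with a third, linking table. The sketch below assumes a case can have many witnesses and a witness can appear in many cases. The table names and sample records are invented, and this is one conventional design rather than the layout used in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cases (case_number INTEGER PRIMARY KEY);
    CREATE TABLE witnesses (witness_id INTEGER PRIMARY KEY, surname TEXT);
    -- Linking table: one row for each pairing of a case with a witness.
    CREATE TABLE case_witnesses (
        case_number INTEGER REFERENCES cases(case_number),
        witness_id  INTEGER REFERENCES witnesses(witness_id),
        PRIMARY KEY (case_number, witness_id)
    );
    INSERT INTO cases VALUES (101), (102);
    INSERT INTO witnesses VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO case_witnesses VALUES (101, 1), (101, 2), (102, 1);
    """
)

# One case, many witnesses.
witnesses_in_101 = [
    row[0]
    for row in conn.execute(
        """
        SELECT w.surname
        FROM case_witnesses AS cw
        JOIN witnesses AS w ON w.witness_id = cw.witness_id
        WHERE cw.case_number = 101
        ORDER BY w.surname
        """
    )
]

# One witness, many cases.
cases_for_smith = [
    row[0]
    for row in conn.execute(
        "SELECT case_number FROM case_witnesses WHERE witness_id = 1 ORDER BY case_number"
    )
]
print(witnesses_in_101, cases_for_smith)
```

Compared with fixed "Witness 1" and "Witness 2" columns, a linking table places no limit on the number of witnesses per case and leaves no blank fields.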
Data entry tips
Fields may be designated as ‘required’.
Default values may be entered.
A field may be restricted to one of only two options, such as Yes/No, True/False, Guilty/Not Guilty.
‘Look-up’ tables: a fixed list of values that may be entered into a particular field.
Validation rules.
Automatic generation of unique numbers.
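Most of these tips have a direct equivalent in SQL. The sketch below shows how they might look in SQLite through Python's sqlite3 module. The table and column names, the age limits and the sample records are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces look-ups when asked
conn.executescript(
    """
    -- Look-up table: the only verdicts that may be entered.
    CREATE TABLE verdicts (verdict TEXT PRIMARY KEY);
    INSERT INTO verdicts VALUES ('Guilty'), ('Not Guilty');

    CREATE TABLE sentences (
        sentence_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- automatic unique number
        surname     TEXT NOT NULL,                      -- required field
        verdict     TEXT NOT NULL REFERENCES verdicts(verdict),
        age         INTEGER CHECK (age IS NULL OR age BETWEEN 0 AND 120),  -- validation rule
        comments    TEXT DEFAULT ''                     -- default value
    );
    """
)

def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

conn.execute("INSERT INTO sentences (surname, verdict) VALUES ('Smith', 'Guilty')")
missing_surname = rejected("INSERT INTO sentences (verdict) VALUES ('Guilty')")
unknown_verdict = rejected("INSERT INTO sentences (surname, verdict) VALUES ('Jones', 'Maybe')")
impossible_age = rejected(
    "INSERT INTO sentences (surname, verdict, age) VALUES ('Brown', 'Guilty', 400)"
)
stored = conn.execute("SELECT sentence_id, surname, comments FROM sentences").fetchall()
print(missing_surname, unknown_verdict, impossible_age, stored)
```

Allowing a missing age (NULL) is a deliberate choice here, since historical sources often lack the information.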
Free Text Databases
Free text databases search unstructured texts and images provided in digital form.
They work by ‘tagging’ the text in a mark-up language (e.g. HTML, XML, SGML). In the past users had to do this. Now most programmes will do it for you.
The database may then be searched in a number of ways: full-text; wildcard searches with * and ?; Boolean searches (AND, OR, and NOT); proximity searches; numeric searches (>, <, >=, <=, <>); date searches; fuzzy searches.
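Three of these search types can be imitated with Python's standard library, which may help in understanding what a free text package does behind the scenes. The word list, the sample texts and the 0.8 similarity cutoff are invented for the example, and real packages will differ in detail.

```python
import difflib
import fnmatch

surnames = ["Smith", "Smyth", "Smithson", "Jones"]
texts = [
    "indicted for theft of a horse",
    "indicted for theft of a silver spoon",
    "indicted for assault",
]

# Wildcard searches: ? stands for one character, * for any run of characters.
one_character = fnmatch.filter(surnames, "Sm?th")
any_ending = fnmatch.filter(surnames, "Smith*")

# Boolean search: theft AND NOT horse.
theft_not_horse = [t for t in texts if "theft" in t and "horse" not in t]

# Fuzzy search: entries whose spelling is close to the search term.
fuzzy = difflib.get_close_matches("Smithe", surnames, n=3, cutoff=0.8)

print(one_character, any_ending, theft_not_horse, fuzzy)
```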
Zotero
Zotero is an easy-to-use yet powerful research tool that helps you gather, organize, and analyze sources (citations, full texts, web pages, images, and other objects), and lets you share the results of your research in a variety of ways. It stores author, title, and publication fields and exports that information as formatted references. It also has the ability to interact, tag, and search in advanced ways.
http://www.zotero.org/
For anyone who writes with footnotes, Zotero is a fabulous tool. With a click of a mouse, it imports catalogue records from a library database, or JSTOR, or even Amazon, allowing a scholar to create a personal reference database on his desktop. Better still, it permits extensive annotations, keyword tagging, and hyperlinks both to other items in the database and to external materials. Some users know that it can catalogue images, too, pulling metadata from Flickr. If you already run Zotero and need to work with images, try it. The possibilities are mind-bending for those of us who work with visual resources.
Old Bailey Online
http://www.oldbaileyonline.org/
Methodological Issues
Nominal record linkage
Coding
Occupational analysis
Prosopography
Community reconstruction
Nominal Record Linkage
Concerns all historians using data containing names.
How do we determine that sources relate to the same person and not another person with the same name?
Particularly difficult for early modern sources where names are not fixed.
Two problems:
  The existence of multiple common names. This problem is particularly acute in local communities where certain surnames are dominant.
  Variation in spellings.
Solutions
Coding surnames using standardisation schemes, e.g. SOUNDEX or FISKs
Using multiple passes through the data, changing variables each time as the data is matched
Using a combination of computer and manual techniques
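The multiple-pass idea can be sketched as follows. The first pass links records that agree exactly. The second pass relaxes the surname to a similar spelling. Anything still unmatched is left for manual checking. The forenames, the two small source lists and the 0.7 similarity cutoff are invented for the example, and difflib is used here simply as a stand-in for a surname coding scheme.

```python
import difflib

source_a = [("John", "Ayres"), ("Mary", "Morris"), ("Thomas", "Porter")]
source_b = [("John", "Eyres"), ("Mary", "Morris"), ("William", "Bloy")]

def link(records_a, records_b, cutoff=0.7):
    """Link records in two passes and return (links, records needing manual review)."""
    links = []
    unmatched_b = list(records_b)
    remaining_a = []

    # Pass 1: exact agreement on forename and surname.
    for record in records_a:
        if record in unmatched_b:
            unmatched_b.remove(record)
            links.append((record, record, "exact"))
        else:
            remaining_a.append(record)

    # Pass 2: same forename, similar surname spelling.
    manual_review = []
    for forename, surname in remaining_a:
        candidates = [s for f, s in unmatched_b if f == forename]
        close = difflib.get_close_matches(surname, candidates, n=1, cutoff=cutoff)
        if close:
            match = (forename, close[0])
            unmatched_b.remove(match)
            links.append(((forename, surname), match, "similar spelling"))
        else:
            manual_review.append((forename, surname))
    return links, manual_review

links, manual_review = link(source_a, source_b)
print(links)
print(manual_review)
```

Recording which pass produced each link makes it easier to judge later how much weight a match deserves.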
SOUNDEX
Number  Represents the letters
1       B, F, P, V
2       C, G, J, K, Q, S, X, Z
3       D, T
4       L
5       M, N
6       R
SOUNDEX rules
Names with Double Letters: if the surname has any double letters, they should be treated as one letter.
Names with Letters Side-by-Side that have the Same SOUNDEX Code Number: these should be treated as one letter. For example, Jackson (c, k and s all have code 2) or Schmidt (d and t both have code 3).
Names with Prefixes: names with prefixes such as Van or De should be coded twice, with and without the prefix.
Consonant Separators: if a vowel (A, E, I, O, U) separates two consonants that have the same SOUNDEX code, the consonant to the right of the vowel is coded.
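The table and rules above can be turned into a short function. This sketch follows the common American Soundex convention, which adds some details the slides do not state and which are therefore assumptions here: the first letter of the surname is kept as a letter, the code is padded with zeros or cut to one letter plus three digits, Y is treated like a vowel, and H and W are skipped without separating consonants. The prefix rule is not implemented.

```python
CODES = {}
for digit, letters in {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
}.items():
    for letter in letters:
        CODES[letter] = digit

def soundex(surname):
    """Return a letter-plus-three-digits Soundex code for a surname."""
    letters = [c for c in surname.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")  # the first letter's own code also counts
    for letter in letters[1:]:
        if letter in "HW":
            continue  # assumption: H and W do not separate consonants
        code = CODES.get(letter, "")  # vowels and Y have no code
        if code and code != previous:
            digits.append(code)  # skip doubles and side-by-side same codes
        previous = code  # a vowel resets this, so the next consonant is coded
    return (first + "".join(digits) + "000")[:4]

for name in ["Jackson", "Schmidt", "Smith", "Smyth"]:
    print(name, soundex(name))
```

Smith, Smyth and Schmidt all receive the code S530, which shows both the strength of the scheme (spelling variants are grouped) and its weakness (distinct surnames can collide).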
Problems with SOUNDEX
Does not work so well for European names. Works best with names of English origin.
Does not work as well with early modern names and spelling variants.
One solution for early modern historians is FISK.
Four Letter Initial Surname Codes (FISK)
Consists of letters and punctuation marks.
Generated from the first letter of a surname variant plus up to three further consonants from the surname.
Vowels are only used when they are the first letter of the surname.
A full stop is used where no second, third or fourth letter is available for use.
If surname variants are deduced to be of the same surname base these names are considered to form a distinct surname group and the same FISK is allocated. Thus:
  Eyres is coded as ARS. (group Ayres)
  Morrice is coded as MRS. (group Morris)
  Bowyer is coded with Boyer, and Springall with Springold
  Davies and Davidson are placed in one group
  ap Howell is included in the group Powell
Five letter FISKs
Used to differentiate between similar but distinct surname groups.
The fifth letter would normally be a distinctive letter from the end of the surname, but any letter could be used, and often a vowel from the start of the surname would be convenient.
To distinguish Partridge from Porter (FISK = PRTR) an additional letter g is added to make the new FISK for Partridge (PRTRG). The code for Porter remains as (PRTR).
To distinguish Bailey from Bloy (FISK = BLY.) an additional letter y is added to make the new FISK for Bailey (BLY.Y). The code for Bloy remains as (BLY.).
Coding
Used to be necessary because databases could not handle large amounts of text.
Historians still code:
  Data entry may be speeded up by using simple codes, e.g. ‘M’ for married, ‘U’ for unmarried, and ‘W’ for widowed, but complicated coding may slow data entry down.
  Is a form of close assessment of the data and may lead to the development of categories for ease
  May facilitate the process of record linkage
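A simple code scheme such as the marital status example amounts to a small look-up. In the sketch below the function name decode and the way unknown codes are flagged are illustrative choices, not part of the original scheme.

```python
MARITAL_STATUS = {"M": "married", "U": "unmarried", "W": "widowed"}

def decode(code):
    """Expand a one-letter code; unknown codes are flagged rather than guessed."""
    return MARITAL_STATUS.get(code.strip().upper(), "UNKNOWN CODE: " + code)

entered = ["M", "w", "U", "X"]
decoded = [decode(code) for code in entered]
print(decoded)
```

Flagging an unrecognised code, rather than silently dropping it, keeps data entry errors visible when the data is checked.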
Deciding to code
Should coding take place before or after data entry?
Should codes be letters or numbers?
  Numeric codes carry a high risk of error.
Decisions about coding schemes should be made in the light of other classification systems used by historians.
A full code book should be developed as part of the documentation to accompany the database.
Occupational analysis
Form of post-coding
Assists in analysing fields with numerous values
Most common type is categorisation of occupational information.
Must be able to compare with other research in the field and to provide as complete a picture as possible regarding the status and occupation of the population.
Coding schemes
Modern historians use standardised occupational classification systems.
Early modern historians often each devise their own schema.
A compromise is to use a multi-dimensional approach: each occupation is classified using several different methods. Occasionally individual occupational titles may be isolated where any categorisation would destroy the nuances of work experiences.
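The multi-dimensional approach can be pictured as a look-up from each occupation title to two or more categorisations, much like an occupational categorisation table with one title column and two category columns. The categories and the list of occupations below are invented purely to show the mechanics and are not a recommended scheme.

```python
from collections import Counter

# Hypothetical look-up: occupation title -> (categorisation 1, categorisation 2).
CATEGORIES = {
    "Butcher": ("Food and drink", "Tradesman"),
    "Baker": ("Food and drink", "Tradesman"),
    "Weaver": ("Textiles", "Craftsman"),
}

occupations = ["Butcher", "Weaver", "Baker", "Butcher", "Cordwainer"]

def tally(titles, dimension):
    """Count occupations under one categorisation (0 or 1)."""
    counts = Counter()
    for title in titles:
        scheme = CATEGORIES.get(title)
        # Titles with no category are kept under their own name, not forced into a group.
        counts[scheme[dimension] if scheme else title] += 1
    return counts

by_sector = tally(occupations, 0)
by_status = tally(occupations, 1)
print(by_sector)
print(by_status)
```

The original titles are never overwritten, so the same data can be re-analysed under a different scheme for comparison with other research.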
Prosopography
Mostly used for study of elites
Database is created not from a single source but many, bringing biographical data together
Use relational design to avoid very large, multi-field databases containing many blank fields
Consider issues of nominal record linkage
Community Reconstruction
Concentrates on bringing together all records from one place
Needs careful design
Primary methodological issue is one of record linkage, so documents, place names and individuals may all have their own ID codes