From legal Language to computer language (2009)

From Legal Language to Computer Language Radboud Winkels Emile de Maat

Overview of our research to (semi-)automatically get from sources of law in natural language to formal computer models of these sources.

Page 1: From legal Language to computer language (2009)

From Legal Language to Computer Language

Radboud Winkels

Emile de Maat

Page 2: From legal Language to computer language (2009)

Outline

Leibniz Center for Law

From sources of law to ICT applications

Structure

References

Content

Empirical results

Conclusions and current research

09/08/2010

Page 3: From legal Language to computer language (2009)

Leibniz Center for Law

Computational Legal Theory and Legal Knowledge Management

(Formal) Models of:

Legal Knowledge

Sources? Elementary legal concepts?

Constituents of norms, coherence, …

Valid Legal Reasoning

Case assessment, causality, legal comparison,

Page 4: From legal Language to computer language (2009)

Leibniz Center for Law (2)

Applied Topics:

Improve quality of legal products

Legislation, decisions, advice, etc.

Improve access to legal information and knowledge

Support teaching and learning of legal knowledge and skills

Legal organisations and change management

Page 5: From legal Language to computer language (2009)

Norms and Language

Page 6: From legal Language to computer language (2009)

“Legal Engineering”

Legislation can be seen as the specification of a normative system.

Legislation is underspecified.

It suffers from anomalies:

• inconsistencies

• circular reasoning

• open evaluative terms

• ambiguities

Page 7: From legal Language to computer language (2009)

From Sources of Law to ICT Applications

[Diagram with three parts: Sources, Formal Models, Applications. Labels shown: doctrine; case law on legislation; case law; legislation; concepts; p1, p2, …; q1, q2, …; norms; O(α | β); meta-knowledge; tasks and reasoning; GTerm: "This means that and has relations with those"; e-Court; CLIME; FOLaw; LLD; Sartor; LRI-core; …]

Page 8: From legal Language to computer language (2009)

Sources of Law

Most important source of "knowledge"

Explicit links between sources and knowledge models essential for:

Validation

Maintenance (traceability)

Justification

Link at right level of detail (granularity)

Page 9: From legal Language to computer language (2009)

From Sources of Law to Formal Models

Automatic support:

Increase quality of models and efficiency of the process

Increase inter-coder reliability

[Pipeline diagram: NL text → Structured text with explicit and typed refs → Model of individual provisions → Integrated model of meaning. Steps shown: Recognizing and classifying; Model fragment suggestions.]

Page 11: From legal Language to computer language (2009)

Structure Marking

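Hoofdstuk 1 (Chapter 1)
    Paragraaf 1 (Section 1)
        Artikel 1 (Article 1)
            Lid 1 (Member 1)
            Lid 2 (Member 2)
        Artikel 2 (Article 2)
        Artikel 3 (Article 3)
    Paragraaf 2 (Section 2)
Hoofdstuk 2 (Chapter 2)

Nesting levels: Hoofdstuk (chapter), Paragraaf (section), Artikel (article), Lid (member)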
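As a rough illustration of structure marking, the following Python sketch nests heading lines into a tree. The regular expression, the dictionary layout and the function name `mark_structure` are assumptions made for this example; they are not taken from the parser described in the talk.

```python
import re

# Hierarchy levels from the slide, outermost first (Dutch labels).
LEVELS = ["Hoofdstuk", "Paragraaf", "Artikel", "Lid"]
HEADING = re.compile(r"^(Hoofdstuk|Paragraaf|Artikel|Lid)\s+(\S+)$")


def mark_structure(lines):
    """Nest heading lines into a tree of dicts with label, number and children."""
    root = {"label": "document", "number": None, "children": []}
    stack = [(-1, root)]  # (depth, node) pairs for the currently open path
    for line in lines:
        match = HEADING.match(line.strip())
        if not match:
            continue  # body text is ignored in this sketch
        depth = LEVELS.index(match.group(1))
        node = {"label": match.group(1), "number": match.group(2), "children": []}
        # Close every open element at the same or a deeper level.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root


slide_lines = [
    "Hoofdstuk 1", "Paragraaf 1", "Artikel 1", "Lid 1", "Lid 2",
    "Artikel 2", "Artikel 3", "Paragraaf 2", "Hoofdstuk 2",
]
tree = mark_structure(slide_lines)
```

With the headings from the slide as input, the tree has two chapters, and the first section of chapter 1 holds three articles, the first of which has two members.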

Page 12: From legal Language to computer language (2009)

Relations between Sources of Law

Legislation

Adm. Case

Law

Case Law

Doctrine

Page 13: From legal Language to computer language (2009)

Characteristics of Sources of Law

Legislation

Precise grammar for reference, clear identity and version criteria

(Adm.) Case Law

Precise grammar for reference, precise identity, no versions

Doctrine

Sloppy reference, no identity markings, sloppy versioning

Page 14: From legal Language to computer language (2009)

The Structure of References: Simple References

Simple references

Name

Customs Law

Label and number

Article 1

Label, number and publication date

The law of April 13th, 2006

Indirect references

That article

Page 15: From legal Language to computer language (2009)

The Structure of References: Complex References

Multi-valued references

Articles 1, 5 and 12

Multi-layered references

Customs Law, article 5, first member

Multi-valued, multi-layered references

Customs Law, articles 1, 5, first member, and 12

Page 16: From legal Language to computer language (2009)

The Structure of References: Ordering

Zooming in

Customs Law, article 5, first member

Zooming out

first member, article 5, Customs Law

Zooming in, then zooming out

article 5, first member, Customs Law
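One way to cope with the different orderings is to normalise every multi-layered reference to a fixed outermost-first order before resolving it. The sketch below assumes only the three levels used in the examples (document, article, member) and recognises them with very simple string tests; `level_of` and `normalise` are hypothetical helper names.

```python
# Rank of each level, outermost first. Only the levels from the slide examples.
LEVEL_RANK = {"document": 0, "article": 1, "member": 2}


def level_of(part):
    """Guess the level of one reference part with naive string tests."""
    lowered = part.lower()
    if lowered.startswith("article"):
        return "article"
    if lowered.endswith("member"):
        return "member"
    return "document"  # anything else is treated as a document name


def normalise(reference):
    """Return the parts of a reference in 'zooming in' order."""
    parts = [part.strip() for part in reference.split(",")]
    return sorted(parts, key=lambda part: LEVEL_RANK[level_of(part)])


zoom_in = normalise("Customs Law, article 5, first member")
zoom_out = normalise("first member, article 5, Customs Law")
mixed = normalise("article 5, first member, Customs Law")
```

All three orderings from the slide end up as the same list of parts, so later steps only need to handle one form.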

Page 17: From legal Language to computer language (2009)

The Structure of References: Miscellaneous

Opening words

Article 12, opening words and parts 1 and 2

Exceptions

Articles 5-21, with the exception of article 9

Each time

Articles 5-10, each time the first member

Page 18: From legal Language to computer language (2009)

Complete and incomplete references

Complete references

Mention the document that is being referred to

Customs Law, article 5, first member

Incomplete references

Do not mention the document that is being referred to

Article 5, first member

Page 19: From legal Language to computer language (2009)

Finding references

Use a context-free grammar, e.g.:

<article> ::=
    "article" <designation> [[","] <lower_level>]
    [ "-" <designation> [[","] <lower_level>] ]
    [
        ( [","] <designation> [[","] <lower_level>] )*
        "and" <designation> [[","] <lower_level>]
    ]
    [[","] ["or"] <higher_level>]
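To make the rule above more concrete, here is a minimal Python sketch that approximates part of it with a regular expression. It is a simplification under stated assumptions: designations are plain numbers with an optional letter, the lower level is an ordinal followed by "member", enumerated designations are always separated by a comma, and the `<higher_level>` part is left out. The names `ARTICLE_REF` and `find_refs` are made up for this example.

```python
import re

# Assumed building blocks (not from the talk): ordinals, designations, lower level.
ORDINAL = r"(?:first|second|third|fourth|fifth)"
DESIGNATION = r"\d+[a-z]?"
LOWER = rf"{ORDINAL} member"
ITEM = rf"{DESIGNATION}(?:,? {LOWER})?"  # <designation> [[","] <lower_level>]

ARTICLE_REF = re.compile(
    rf"\barticles? {ITEM}"               # "article" plus first designation
    rf"(?:-{ITEM})?"                     # optional range, e.g. 5-21
    rf"(?:(?:, {ITEM})*,? and {ITEM})?",  # optional enumeration ending in "and"
    re.IGNORECASE,
)


def find_refs(text):
    """Return the text of every article reference found in the input."""
    return [match.group(0) for match in ARTICLE_REF.finditer(text)]


multi = find_refs("Customs Law, articles 1, 5, first member, and 12")
ranged = find_refs("Articles 5-21, with the exception of article 9")
single = find_refs("See article 5, first member.")
```

The sketch does not handle document names or headings; those are the problems discussed on the next slide.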

Page 20: From legal Language to computer language (2009)

Problems

Names cannot be recognised

Add names as a list to the grammar

Headings will (falsely) be recognised as a reference

Mark headings beforehand; use MetaLex as input

Page 21: From legal Language to computer language (2009)

Resolving references

Incomplete references

Reference needs to be completed from context

Within a regulation, an incomplete reference refers to the regulation itself

Within commentaries, an incomplete reference refers back to an earlier complete reference
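The two completion rules can be sketched as follows. This is an illustration only: references are assumed to be dictionaries with a `part` key and an optional `document` key, and the function name and parameters are hypothetical.

```python
def resolve_references(references, containing_document, is_regulation):
    """Fill in the missing document of incomplete references.

    In a regulation, an incomplete reference points to the regulation itself.
    In a commentary, it points to the document of the most recent complete
    reference (or stays unresolved if there is none yet).
    """
    resolved = []
    last_complete = None
    for ref in references:
        document = ref.get("document")
        if document is not None:
            last_complete = document  # complete reference: remember it
        elif is_regulation:
            document = containing_document
        else:
            document = last_complete
        resolved.append({**ref, "document": document})
    return resolved


refs = [
    {"part": "article 5"},
    {"document": "Customs Law", "part": "article 1"},
    {"part": "article 2"},
]
in_regulation = resolve_references(refs, "Mining Act", is_regulation=True)
in_commentary = resolve_references(refs, "Some commentary", is_regulation=False)
```

In the regulation case the first and third references resolve to the Mining Act; in the commentary case the first one stays unresolved and the third one follows the earlier reference to the Customs Law.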

Page 22: From legal Language to computer language (2009)

Automatic Parsing

1. Determine identity of the source

In doc: Title, citation title

In metadata

2. Parse document

“Natural language” – model sentences

3. Find references

4. Determine type of reference

E.g. attribution and delegation of power; definitions; enactment; change

5. Determine identity of the goal

I.e. the thing it refers to

Page 23: From legal Language to computer language (2009)

Results simple parser

99% of all simple references correctly identified

95% of all complex references correctly identified

Few false positives

Works adapted for Flemish law

Opsomer (2009)

Page 24: From legal Language to computer language (2009)

Causes of errors

Failing to detect a reference

Missing labels or names

Textual errors

False positives

Homonyms: a label has a second meaning in addition to being part of a reference

the first member

Page 25: From legal Language to computer language (2009)

Conclusions

Automatic detection of references is entirely feasible

No complicated methods are needed; regular grammars may suffice

Page 26: From legal Language to computer language (2009)

From Sources of Law to Formal Models

From structured text to models of individual sentences…

[Pipeline diagram: NL text → Structured text with explicit and typed refs → Model of individual provisions → Integrated model of meaning. Steps shown: Recognizing and classifying; Model fragment suggestions.]

Page 27: From legal Language to computer language (2009)

Towards Automatic modelling

Page 28: From legal Language to computer language (2009)

Automatic modelling – Sentences (1)

Start with sentences

Independent unit.

Often marked, otherwise easy to recognize

Different types of sentences require different translation, different model

Page 29: From legal Language to computer language (2009)

Conclusions From Earlier Research

Dutch Law:

Provisions usually match one sentence

Several types of sentences can be easily distinguished

Limited number of language constructs per type

Automatic recognition and classification seems doable

Types not specific to Dutch law (cf. Tiscornia et al. for Italian law)

Page 30: From legal Language to computer language (2009)

Categories

1. Definitions

2. Deeming Provision

3. Norm - Right/Permission

4. Norm - Obligation/Duty

5. Application Provision

6. Value Assignment

7. Change*

8. Delegation

9. Enactment Date

10. Citation Title

11. Penalisation

Each category uses specific language constructs that can be used to identify it.

Page 31: From legal Language to computer language (2009)

Example: Penalisation Provision

Penalisation provisions set punishments for breaking the law, and mark such an act as either a misdemeanour or a crime.

Mining Act, article 133

1. Breaking article 43, sub 2, is punished with a monetary fine of the second category.

2. The fact marked as punishable by this article is a misdemeanour.

Page 32: From legal Language to computer language (2009)

Example: Norms (1)

Normative sentences form the core of each regulation, stating obligations and rights

Rights can be denoted by a wide range of verbs: can, may, is allowed to, has a right to, …

Similarly, obligations can be denoted by the use of certain verbs: is prohibited, is charged with

Many variations

Page 33: From legal Language to computer language (2009)

Example: Norms (2)

However, obligations are often represented as a "statement of fact"

Funeral Act, article 46, section 1

No bodies are interred on a closed cemetery.

May be about any subject

No common signal words or patterns

Preferred by the Guidelines for Legal Drafting

Page 34: From legal Language to computer language (2009)

Experiment (1)

Classifier

Based on 88 patterns

Implemented in Java

Based on input in which sentences and quoted text have already been marked (MetaLex)

Assumes a statement of fact norm if no explicit pattern is used
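The classification idea can be sketched in a few lines of Python. This is not the Java classifier from the experiment: the four patterns below are English stand-ins taken from the example phrases on the earlier slides, whereas the real classifier uses 88 patterns for Dutch text. The names `PATTERNS`, `DEFAULT` and `classify` are assumptions for this example.

```python
import re

# Illustrative English stand-ins for the Dutch patterns; the first match wins.
PATTERNS = [
    ("Definition", re.compile(r"\bis understood by\b", re.IGNORECASE)),
    ("Penalisation", re.compile(r"\bis punished with\b", re.IGNORECASE)),
    ("Norm - Right/Permission",
     re.compile(r"\b(?:may|can|is allowed to|has a right to)\b", re.IGNORECASE)),
    ("Norm - Obligation/Duty",
     re.compile(r"\b(?:is prohibited|is charged with)\b", re.IGNORECASE)),
]

# Fallback category used when no explicit pattern is found.
DEFAULT = "Norm - Statement of Fact"


def classify(sentence):
    """Return the category of the first matching pattern, or the default."""
    for label, pattern in PATTERNS:
        if pattern.search(sentence):
            return label
    return DEFAULT


penalty = classify(
    "Breaking article 43, sub 2, is punished with a monetary fine "
    "of the second category."
)
fact = classify("No bodies are interred on a closed cemetery.")
```

The Funeral Act sentence contains none of the patterns, so it falls through to the default category, which mirrors the "statement of fact" assumption above.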

Page 35: From legal Language to computer language (2009)

Experiment (2) - Lists

A list is classified based on its header, if the header contains a pattern; otherwise, each item is classified separately (without the header)

Tobacco Act, article 1

In this law, and in the stipulations based on it, is understood by:

a. tobacco products: … ;

b. Our Minister: …;

c. appendix: …;
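The list rule can be sketched as below. The pattern matcher is passed in as a function so that the sketch stays independent of any particular pattern set; `toy_matcher`, `classify_list` and the choice to fall back to the statement-of-fact default for items without a pattern are assumptions made for this example.

```python
def classify_list(header, items, match_pattern, default="Norm - Statement of Fact"):
    """Return one label per list item.

    If the header contains a pattern, every item gets the header's label.
    Otherwise each item is classified on its own, without the header.
    """
    header_label = match_pattern(header)
    if header_label is not None:
        return [header_label for _ in items]
    return [match_pattern(item) or default for item in items]


def toy_matcher(text):
    """Hypothetical matcher that knows a single definition pattern."""
    return "Definition" if "is understood by" in text else None


tobacco_act = classify_list(
    "In this law, and in the stipulations based on it, is understood by:",
    ["a. tobacco products: … ;", "b. Our Minister: …;", "c. appendix: …;"],
    toy_matcher,
)
```

For the Tobacco Act example the header already carries the definition pattern, so all three items are labelled as definitions.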

Page 36: From legal Language to computer language (2009)

Experiment – Test Set

18 texts

One royal decree

Three new bills

Fourteen amending bills

All 'recent'

No overlap with the training set

654 sentences

592 'regular' sentences

62 lists

Page 37: From legal Language to computer language (2009)

Results per Document (1)

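| Source | Sentences: Total | Sentences: Correct | Sentences: % | Lists: Total | Lists: Correct | Lists: Partial | Lists: % | Type |
|---|---|---|---|---|---|---|---|---|
| Royal Decree Stb. 1945, F 214 | 26 | 23 | 88% | 4 | 4 | 0 | 100% | New |
| Bill 20 585 nr. 2 | 31 | 30 | 97% | 4 | 3 | 1 | 75% | New |
| Bill 22 139 nr. 2 | 22 | 20 | 91% | 2 | 2 | | 100% | New |
| Bill 27 570 nr. 4 | 21 | 16 | 76% | | | | | Change |
| Bill 27 611 nr. 2 | 11 | 11 | 100% | 1 | 1 | | 100% | Change |
| Bill 30 411 nr. 2 | 141 | 128 | 91% | 25 | 20 | 3 | 80% | New |
| Bill 30 435 nr. 2 | 40 | 39 | 98% | 4 | 3 | 1 | 75% | Change |
| Bill 30 583 nr. A | 27 | 27 | 100% | | | | | Change |
| Bill 31 531 nr. 2 | 3 | 3 | 100% | | | | | Change |

Relatively low score due to a misapplied pattern (3x)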

Page 38: From legal Language to computer language (2009)

Results per Document (2)

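| Source | Sentences: Total | Sentences: Correct | Sentences: % | Lists: Total | Lists: Correct | Lists: Partial | Lists: % | Type |
|---|---|---|---|---|---|---|---|---|
| Bill 31 537 nr. 2 | 29 | 29 | 100% | 2 | 2 | 0 | 100% | Change |
| Bill 31 540 nr. 2 | 7 | 7 | 100% | | | | | Change |
| Bill 31 541 nr. 2 | 8 | 8 | 100% | | | | | Change |
| Bill 31 713 nr. 2 | 7 | 6 | 86% | 2 | 2 | 0 | 100% | Change |
| Bill 31 722 nr. 2 | 31 | 22 | 71% | 6 | 5 | 0 | 83% | Change |
| Bill 31 726 nr. 2 | 78 | 67 | 86% | 2 | 1 | 1 | 50% | Change |
| Bill 31 832 nr. 2 | 7 | 7 | 100% | 3 | 3 | | 100% | Change |
| Bill 31 833 nr. 2 | 4 | 4 | 100% | | | | | Change |
| Bill 31 835 nr. 2 | 99 | 90 | 91% | 7 | 4 | 3 | 57% | Change |
| Total | 592 | 537 | 91% | 62 | 50 | 9 | 81% | |

Relatively low score due to a pattern appearing in an auxiliary sentence (5x)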

Page 39: From legal Language to computer language (2009)

Overall Results

91% of all regular sentences have been correctly classified

71%-100% across laws

81% of all lists have been correctly classified

50%-100% across laws
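The two overall figures follow directly from the totals in the per-document tables (537 of 592 sentences, 50 of 62 lists). The short Python check below reproduces them; the lenient variant that also counts the 9 partially correct lists is an extra calculation added here for illustration and is not reported on the slides.

```python
def percentage(correct, total):
    """Share of correctly classified items, rounded to a whole percent."""
    return round(100 * correct / total)


sentences = percentage(537, 592)      # regular sentences
lists_strict = percentage(50, 62)     # only fully correct lists
lists_lenient = percentage(50 + 9, 62)  # also counting partially correct lists

print(sentences, lists_strict, lists_lenient)
```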

Page 40: From legal Language to computer language (2009)

Results per Type (1)

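| Type | In corpus (%) | In corpus (count) | Missed | False |
|---|---|---|---|---|
| Definition | 2% | 12 | 1 | 0 |
| Norm - Right/Permission | 11% | 64 | 4 | 13 |
| Norm - Duty | 5% | 29 | 0 | 1 |
| Delegation | 3% | 19 | 6 | 0 |
| Publication Provision | 1% | 4 | 0 | 0 |
| Application Provision | 7% | 40 | 1 | 8 |
| Enactment Date | 3% | 17 | 1 | 0 |
| Citation Title | 1% | 3 | 0 | 0 |
| Value Assignment/Change | 0% | 1 | 0 | 0 |
| Penalisation | 0% | 0 | 0 | 2 |
| Change | 41% | 241 | 16 | 8 |
| Mixed Type | 1% | 3 | 3 | 0 |
| Norm - Statement of Fact (default) | 27% | 159 | 23 | 23 |
| Total | | 592 | 55 | 55 |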
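The per-type counts can be turned into recall and precision figures. The slides do not report these, so the sketch below is only an illustration, and it rests on an assumption: "Missed" is read as the number of sentences of a type that were not recognised (false negatives) and "False" as the number of sentences wrongly assigned to that type (false positives).

```python
def recall_precision(in_corpus, missed, false):
    """Return (recall, precision) for one sentence type.

    Either value is None when its denominator is zero.
    """
    found = in_corpus - missed  # sentences of this type that were recognised
    recall = found / in_corpus if in_corpus else None
    precision = found / (found + false) if (found + false) else None
    return recall, precision


# Rows taken from the table above: (in corpus, missed, false).
change = recall_precision(241, 16, 8)
default_norm = recall_precision(159, 23, 23)
penalisation = recall_precision(0, 0, 2)
```

Under that reading, the "Change" type comes out at roughly 93% recall and 97% precision, while the default statement-of-fact category sits near 86% for both.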

Page 41: From legal Language to computer language (2009)

Results per Type (2)

Mostly norms and modifications

right/permission 11%

obligation/duty 27% + 5%

change 41%

Several definitions and application provisions

Barely any of the others

Page 42: From legal Language to computer language (2009)

Results – Patterns Used

| Type | Patterns known | Patterns used |
|---|---|---|
| Definition | 14 | 5 |
| Norm - Right/Permission | 17 | 3 |
| Norm - Obligation/Duty | 15 | 8 |
| Delegation | 7 | 5 |
| Publication Provision | 1 | 1 |
| Application Provision | 5 | 5 |
| Enactment Date | 1 | 1 |
| Citation Title | 2 | 2 |
| Value Assignment | 8 | 1 |
| Penalisation | 3 | 1 |
| Change - Scope | 2 | 2 |
| Change - Insertion | 4 | 4 |
| Change - Replacement | 3 | 3 |
| Change - Repeal | 2 | 1 |
| Change - Renumbering | 3 | 2 |
| Total | 87 | 44 |

About 50% of the known patterns have been used

Difference in age between test and training set?

Underrepresented sentence types

Page 43: From legal Language to computer language (2009)

Problems (1)

Patterns appearing in auxiliary sentences instead of the main sentence

Mostly happens with rights and application provisions:

If x has the right to …

If x is able to …

If x applies …

Page 44: From legal Language to computer language (2009)

Problems (2)

Lists need a more serious approach

Some can be classified by the header only;

Some can be classified by the list item only;

Some can only be classified by the header combined with the item.

Lists need to be converted to individual sentences (header plus list item)
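Converting a list into individual sentences could look roughly like this. The example header and items are invented for illustration, and the way item labels and punctuation are stripped is an assumption; real legislative lists will need more careful handling.

```python
import re

# Item labels such as "a.", "1." or "b)" at the start of a list item.
ITEM_LABEL = re.compile(r"^\s*[a-z0-9]{1,3}[.)]\s+", re.IGNORECASE)


def expand_list(header, items):
    """Combine the list header with each item into a stand-alone sentence."""
    stem = header.strip().rstrip(":").rstrip()
    sentences = []
    for item in items:
        body = ITEM_LABEL.sub("", item).strip().rstrip(";,.").rstrip()
        sentences.append(f"{stem} {body}.")
    return sentences


expanded = expand_list(
    "It is prohibited to:",
    ["a. sell tobacco products to minors;", "b. advertise tobacco products."],
)
```

Each resulting sentence can then be fed to the sentence classifier like any regular sentence.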

Page 45: From legal Language to computer language (2009)

Minor problems

Missing patterns

Mixed sentences

Difficult to solve, but does not occur often

Patterns used for other purposes

Repeal of fines instead of repeal of regulations

Specific patterns for specific laws

E.g. Tax Law (value assignment)

Page 46: From legal Language to computer language (2009)

Conclusions

This (symbolic) approach is feasible

Using obligation as a default category seems acceptable

No major categories are missing

We expect it to generalise to other Dutch regulations

The approach could be used for other (civil) jurisdictions and languages

Biagioli et al. (2005): similar results for Italian law, but with a statistical approach

Page 47: From legal Language to computer language (2009)

Next Step

[Pipeline diagram: NL text → Structured text with explicit and typed refs → Model of individual provisions → Integrated model of meaning. Steps shown: Recognizing and classifying; Model fragment suggestions.]

Page 48: From legal Language to computer language (2009)

Next Step

Divide each sentence into different terms that are linked through relations

Classification (and base pattern) gives a rough division, and a rough relation

More detailed division of the sentences is needed

Use of Dutch grammar parsers

Page 49: From legal Language to computer language (2009)

Current Research (1)

Page 50: From legal Language to computer language (2009)

Automatic modelling – Reference parser

References are important in legal texts

Useful when the computer understands these better

Better understanding is possible

References do not fit well in "normal Dutch sentence structure"

Separate reference parser

Page 51: From legal Language to computer language (2009)

Things to think about – Granularity

Granularity: how far do we want to go with the splitting of text?

Liquor: those drinks that, at a temperature of twenty degrees Celsius, consist of at least fifteen percent alcohol by volume, with the exception of wine.

Page 52: From legal Language to computer language (2009)

Things to think about – Norms

Classification distinguishes only a limited set of norms

Do we need more distinction?

For computer calculations?

For interaction with the user?

Page 53: From legal Language to computer language (2009)

Things to think about - Procedures

Procedures use the same language constructs as other norms (at least in Dutch), but:

Procedures have a more specific context

Procedures have a stronger ordering

Page 54: From legal Language to computer language (2009)

Overall Conclusions

Distance from Legal Language to Computer Language is too big to cross in one step

Automatic modelling support is already partially possible:

Structure and References

Classification of sentences in legislation

Generalisation to all Dutch legislation possible

Same method for other languages and jurisdictions

Generalisation to other sources of law more difficult

Page 55: From legal Language to computer language (2009)

www.LeibnizCenter.org

Questions?