META-NORD
Baltic and Nordic Branch of the European Open Linguistic Infrastructure
Project no. 270899
Deliverable 4.4
Second upload of language resources
Version No. 1.0
31/07/2012
Contract no. 270899
D4.4 V1.0 Page 2 of 45
Document Information
Deliverable number: D4.4
Deliverable title: Second upload of language resources
Due date of deliverable: 31/07/2012
Actual submission date of deliverable: 31/07/2012
Main Authors: Jussi Piitulainen, Imre Bartis
Participants: All
Internal reviewer: TILDE
Workpackage: WP4
Workpackage title: Cross-national collaboration and Pilot service
Workpackage leader: UHEL
Dissemination Level: PU
Version: 1.0
Keywords: Resources, meta-data
Meta-data model applied to metadata: the META-SHARE V2.1 metadata model
History of Versions
Version | Date | Status | Name of the Author (Partner) | Contributions | Description/Approval Level
0.3 | 11/07/2012 | Initial draft | UHEL (Jussi Piitulainen, Imre Bartis) | - | Draft
0.4 | 14/07/2012 | Review draft | Tilde | Pre-final review | Draft
0.5 | 23/07/2012 | Final draft | UHEL, Tilde | Final review | Final draft
1.0 | 31/07/2012 | Final | Tilde | Submitted to PO | Submitted to PO
EXECUTIVE SUMMARY
This report describes the second upload of language resources, carried out at M18. The second upload contains metadata descriptions of the resources provided by the META-NORD partners, complying with the formats agreed by the META-NET projects. The data provided in this second upload are publicly available at: http://metashare21.tilde.lv/, http://spraakbanken.gu.se/metashare/, http://metashare.csc.fi/ and http://metashare.ut.ee/.
Table of Contents
Abbreviations
1. Overall summary of the language resources with metadata
2. List of language resources metadata
3. Description of the meta-data schema adopted by the consortium
4. Description of nodes
5. Licences used in the second batch
6. Feedback on upload procedure
7. Concluding remarks
References
Appendix A: List of second batch metadata
Appendix B: Planned and Actual
Abbreviations
Table 1 Abbreviations
Abbreviation Term/definition
LRT Language resources and tools
DoW The META-NORD Description of Work document
CC Creative Commons
TILDE TILDE SIA (Latvia)
UCPH Københavns Universitet (Denmark)
UT Tartu Ülikool (Estonia)
UIB Universitetet i Bergen Organisasjonsedd (Norway)
UHEL Helsingin Yliopisto (Finland)
HI Háskóli Íslands (Iceland)
LKI Lietuvių Kalbos Institutas (Lithuania)
UGOT Göteborgs Universitet (Sweden)
LRT Language Resources and Technologies
IPR Intellectual Property Rights
CLARIN Common Language Resources and Technology Infrastructure
BLARK The Basic Language Resource Kit
1. Overall summary of the language resources with metadata
An important aim of META-NORD is to upgrade and harmonize national language
resources and tools in order to make them widely available and usable, within languages
and across languages, with respect to their data formats.
A further central aim is the definition of standardized resource and tool metadata and
mechanisms for making these metadata harvestable, so that distributed resources and
tools can be effectively utilized in language technology applications, both in academic
research and in industry.
The META-SHARE metadata model and its supporting software have matured
significantly since the first META-NORD upload in November 2011.
The four META-SHARE nodes of META-NORD (TILDE, UGOT, UHEL and UT)
installed the beta (v2.0) of the software and made it available for editing by all partners
right after it was published by T4ME. The nodes were upgraded to v2.1 in time for the
second upload.
Since synchronization of the nodes is not yet implemented, META-NORD made the TILDE node its master repository for the second upload. Partners who edited in the other nodes exported their records as XML, uploaded them to the TILDE node and published them there.
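Export and upload create copies, not links. A partner collecting records for the master node may therefore want to check that two exported files do not describe the same resource. The following sketch is illustrative only. It assumes that exported records use the META-SHARE namespace `http://www.ilsp.gr/META-XMLSchema` and store the name at `identificationInfo/resourceName`. Both assumptions should be checked against the v2.1 schema. The file names in the demo are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed namespace and element path; check both against the v2.1 XSD.
NS = {"ms": "http://www.ilsp.gr/META-XMLSchema"}
NAME_PATH = "./ms:identificationInfo/ms:resourceName"


def resource_names(xml_text):
    """Return the resource names declared in one exported record."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.findall(NAME_PATH, NS) if el.text]


def find_duplicates(records):
    """Return {resource name: [record ids]} for names used by more than one record.

    `records` maps a record id (for example an exported file name) to its XML text.
    """
    seen = defaultdict(list)
    for record_id, xml_text in records.items():
        # A set, so a name repeated inside one record is counted once.
        for name in set(resource_names(xml_text)):
            seen[name].append(record_id)
    return {name: ids for name, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    template = (
        '<resourceInfo xmlns="http://www.ilsp.gr/META-XMLSchema">'
        '<identificationInfo><resourceName lang="en">{}</resourceName>'
        "</identificationInfo></resourceInfo>"
    )
    exported = {  # hypothetical file names
        "tilde_1.xml": template.format("EuroTermBank"),
        "uhel_7.xml": template.format("EuroTermBank"),
        "ut_3.xml": template.format("Estonian WordNet"),
    }
    print(find_duplicates(exported))  # {'EuroTermBank': ['tilde_1.xml', 'uhel_7.xml']}
```

The check matches on names only. Two different resources that happen to share a name would also be flagged, so a person still decides which copy to keep.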
The metadata records from the first META-NORD upload were upgraded to the current
version of the metadata model. The automatic transformations from v1.1 to v2.0 to v2.1
were not completely reliable, so some manual labor was required.
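Appendix A lists one such manual fix for upgraded batch 1 records. The "und" language tags were removed, and three-letter ISO 639 codes were replaced with two-letter ones. The following sketch shows that step. The mapping is a small, hypothetical excerpt for the META-NORD languages, not a complete ISO 639 crosswalk. Codes without an entry are kept unchanged so that a person can review them.

```python
# Hypothetical excerpt: three-letter (ISO 639-2/639-3) to two-letter (ISO 639-1) codes.
ISO639_3_TO_1 = {
    "dan": "da", "eng": "en", "est": "et", "fin": "fi", "isl": "is",
    "lav": "lv", "lit": "lt", "nob": "nb", "nno": "nn", "nor": "no",
    "rus": "ru", "swe": "sv",
}


def normalise_language_codes(codes):
    """Drop 'und' (undetermined) tags and map three-letter codes to two-letter ones.

    Codes missing from the mapping (including codes that are already
    two-letter) are returned unchanged for manual review.
    """
    result = []
    for code in codes:
        code = code.strip().lower()
        if code == "und":
            continue
        result.append(ISO639_3_TO_1.get(code, code))
    return result


print(normalise_language_codes(["eng", "und", "lav", "lt"]))  # ['en', 'lv', 'lt']
```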
2. List of language resources metadata
In total, metadata for 174 new resources were described using the META-SHARE editor tool. The data provided in this second upload are publicly available at:
http://metashare21.tilde.lv/
http://spraakbanken.gu.se/metashare/
http://metashare.csc.fi/
http://metashare.ut.ee/
Metadata for 81 of the new resources (79 delivered by UGOT and 2 by UCPH) were generated semi-automatically as XML with manual revision. They did not require any work on clearing up intellectual property rights (IPR) issues. The metadata for the other 93 resources were added manually in the META-SHARE editor. Of these 93 resources, 52 required work on licensing issues. This work included IPR guidance to the IPR holders, encouraging them to make the distribution agreement flexible, negotiating user terms and organizing a seminar with the IPR holders on licensing issues.
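The counts in this section fit together as follows. The short check below only restates the figures given in the text: 81 semi-automatic plus 93 manual records, 52 of the 93 manual records needing IPR work, and 114 own plus 60 external resources.

```python
# Figures exactly as stated in Section 2; the script only checks that they add up.
semi_automatic = {"UGOT": 79, "UCPH": 2}   # semi-automatically generated records
manual = 93                                # records added manually in the editor
manual_with_ipr_work = 52                  # manual records that needed IPR work
own, outside = 114, 60                     # consortium vs. external resources

semi_total = sum(semi_automatic.values())
total = semi_total + manual
manual_without_ipr_work = manual - manual_with_ipr_work

print(semi_total, total, manual_without_ipr_work, own + outside)  # 81 174 41 174
```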
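Figure 1 Work carried out to meet the META-SHARE metadata schema (general overview): 81 semi-automatically generated metadata records; 93 manually generated records, all of which required technical work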
The 93 manually added resource descriptions also required technical work, such as:
- developing the tools that were uploaded together with their metadata;
- converting the LRs to standard downloadable formats;
- gathering information on the LRs from various written documentation and from their owners;
- giving technical feedback to the META-SHARE developers;
- promoting the use of standard formats (XML/LMF);
- translating the metadata from one language to another.
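Several entries in Appendix A describe converting content from a proprietary format into TMX with UTF-8 encoding. The following sketch shows what such a conversion can produce for simple sentence pairs. It makes several assumptions:
- the input is a list of source/target string pairs;
- the language codes are examples;
- the `creationtool` name is hypothetical.

A real conversion would also carry over whatever metadata the original format holds.

```python
import xml.etree.ElementTree as ET

# ElementTree serialises this qualified name as the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, src_lang, tgt_lang):
    """Build a minimal TMX 1.4 document, returned as UTF-8 encoded bytes.

    `pairs` is an iterable of (source sentence, target sentence) strings.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "conversion-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",                    # original format, left unspecified
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per sentence pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    # The xml_declaration argument requires Python 3.8 or later.
    return ET.tostring(tmx, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    print(pairs_to_tmx([("Labdien!", "Good day!")], "lv", "en").decode("utf-8"))
```

ElementTree escapes markup characters in the segments and writes non-ASCII characters as UTF-8, which covers the "UTF-8 encoding is used" requirement in the appendix entries.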
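Figure 2 IPR and technical work carried out to meet the META-SHARE metadata schema: of the 93 manually generated records, 52 required IPR-related work and 41 did not; all 93 required technical work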
For more information on the uploaded metadata and the work carried out to meet the
META-SHARE metadata schema see Appendix A: List of second batch metadata.
Of the 174 LRs provided for the second upload, 114 are the consortium's own resources, while 60 come from outside the consortium:
Figure 3 Location of resources
Most of the LRs provided for the second upload were corpora (111). The other resource types were lexical resources (6), lexical conceptual resources (29), corpus-ngrams (9), speech (3), tools (10), lexicons (5) and ontology (1):
Figure 4 Typology of resources
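As a consistency check, the type counts listed above sum to the 174 resources of the second upload. The numbers below are copied from the text.

```python
# Resource counts per type, as given in the text.
type_counts = {
    "corpora": 111,
    "lexical resources": 6,
    "lexical conceptual": 29,
    "corpus-ngrams": 9,
    "speech": 3,
    "tools": 10,
    "lexicons": 5,
    "ontology": 1,
}
types_total = sum(type_counts.values())
print(types_total)  # 174
```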
3. Description of the meta-data schema adopted by the consortium
The META-SHARE metadata model is documented in the META-SHARE support community’s internet forum (footnote 1).
In the model, each language resource belongs to one of the four overall types listed below. Creating a record in the META-SHARE editor starts with selecting one of these types. The editor then supports the description of a resource of that specific type:
* corpus (including written/text, oral/spoken, multimodal/multimedia corpora),
* lexical / conceptual resource (including terminological resources, word lists,
semantic lexica, ontologies etc.),
* language description (including grammars),
* tool / service (including basic processing tools, applications, web services etc.
required for processing data resources).
Administrative information is the same for all resource types. It covers, for example, the name and description of the resource, contact and location information, origins, availability and licensing.
Language resources may consist of different types of media. The model and the editor
provide separate support for the description of the following media types where
applicable:
* text (+textNumerical, textNgram),
* audio,
* image,
* video.
The details include the languages of the resource, the size of the resource in various units,
annotation information, validation, format standards to which the resource conforms,
actual and foreseen use, and technicalities like character sets and operating environment.
The metadata model is implemented as an XML schema and a supporting editor that is
accessed through a web browser. Records can be created, published and maintained in a
META-SHARE node using the editor. Anyone can browse and search the published
records. The editor can also export selected records as XML documents. Conforming
XML records can be uploaded to a META-SHARE node through the editor.
The META-SHARE XML schema itself is available at: http://metashare.ilsp.gr/META-XMLSchema/v2.1/META-SHARE-Resource.xsd.
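Only conforming XML records can be uploaded. A quick local pre-check can therefore save a round trip for exported or hand-edited files. Full validation requires an XSD-aware tool run against the schema URL above. The standard-library sketch below only checks well-formedness, the root element and the presence of a few elements. The namespace and the element paths in `REQUIRED_PATHS` are assumptions about the v2.1 schema, chosen for illustration. They are not the authoritative list of mandatory fields.

```python
import xml.etree.ElementTree as ET

MS = "{http://www.ilsp.gr/META-XMLSchema}"  # assumed META-SHARE namespace

# Hypothetical selection of paths to check; the XSD defines the real mandatory set.
REQUIRED_PATHS = [
    "identificationInfo/resourceName",
    "identificationInfo/description",
    "distributionInfo",
    "contactPerson",
    "metadataInfo",
    "resourceComponentType",
]


def precheck(xml_text):
    """Return a list of problems found in one record (an empty list means none found)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != MS + "resourceInfo":
        problems.append(f"unexpected root element {root.tag}")
    for path in REQUIRED_PATHS:
        # Qualify every step of the path with the namespace, e.g. {ns}a/{ns}b.
        qualified = "/".join(MS + step for step in path.split("/"))
        if root.find(qualified) is None:
            problems.append(f"missing {path}")
    return problems
```

For example, a record that contains only `identificationInfo/resourceName` yields five "missing ..." messages, one for each remaining path. Passing this pre-check does not guarantee that the node will accept the record.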
4. Description of nodes
All four META-NORD nodes (TILDE, UGOT, UHEL and UT) were upgraded to version 2.1 soon after its release in May 2012. Version 2.1 is the first proper release of the software. It follows beta release 2.0, which was published as open source in March (footnote 2).
1 See http://metashare.ilsp.gr/portal/knowledgebase/OverviewOfTheMetadataModel and http://metashare.ilsp.gr/portal/knowledgebase/DetailedPresentationOfTheModel.
2 See https://github.com/metashare/META-SHARE.
Version 3 of the META-SHARE software is expected, within a few months, to allow the nodes to be joined into a synchronized network. This will make the tedious and error-prone manual copying of records between nodes unnecessary. (Export and upload actually work relatively well, but they create copies.)
TILDE’s META-SHARE node (footnote 3):
Figure 5 TILDE’s META-SHARE node
5. Licences used in the second batch
Of the 174 language resources whose metadata were provided in the second upload, only one, the “STO-LMF, morphology” lexicon, is available in a widely known resource database, namely that of ELDA (footnote 4). The LDC Corpus Catalog (footnote 5) does not contain any of the resources described by META-NORD in the second upload.
3 http://metashare21.tilde.lv/.
4 http://catalog.elra.info/.
5 http://www.ldc.upenn.edu/Catalog/.
Figure 6 Licences used in the second batch
As Figure 6 shows, only 12 of the 174 resources of the second batch have licences that are still under negotiation. The licences of all the others have been settled. The most frequent licence is CC_BY-SA_3.0, and CC_BY, CC_BY-NC-SA_3.0 and GPL are also well represented. This indicates strong interest in sharing resources in the spirit of open data within the META-NORD network.
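A tally like the one in Figure 6 can be computed from exported records. The following sketch counts licence values per record. The element path `distributionInfo/licenceInfo/licence` and the namespace are assumptions about the v2.1 schema. Records without a licence element are counted under a placeholder label. The sketch does not try to detect "under negotiation" status, because the report does not say how that is recorded.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"ms": "http://www.ilsp.gr/META-XMLSchema"}                  # assumed namespace
LICENCE_PATH = "./ms:distributionInfo/ms:licenceInfo/ms:licence"  # assumed path


def licence_counts(xml_records):
    """Count licence labels across records.

    A record declaring several distinct licences counts once for each of them;
    a record with no licence element is counted as "(none given)".
    """
    counts = Counter()
    for xml_text in xml_records:
        root = ET.fromstring(xml_text)
        licences = {el.text.strip() for el in root.findall(LICENCE_PATH, NS) if el.text}
        counts.update(licences or {"(none given)"})
    return counts
```

Calling `licence_counts(records).most_common()` lists the licences in descending order of frequency, which matches the ordering of a bar chart such as Figure 6.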
6. Feedback on upload procedure
The second upload went smoothly and efficiently, thanks to the major improvements to the META-SHARE software.
The metadata of the first and second uploads are publicly available at:
http://metashare21.tilde.lv/
http://spraakbanken.gu.se/metashare/
http://metashare.csc.fi/
http://metashare.ut.ee/
The META-SHARE metadata model and its accompanying editor software are beginning to function as promised. The META-SHARE developers have been active and helpful during and after the beta cycle. All META-NORD partners had access to the editor in time and were able to create the required descriptions for the second upload.
The ability to export metadata records from the META-SHARE editor as XML documents and to upload them again has made it possible to move records between the current META-NORD nodes with minimal need for human communication. These facilities will remain useful even after future versions of META-SHARE introduce synchronized networking, since they make it easy to share resource descriptions with the wider world.
UiB in particular raised the issue of collaborative editing. People need to be able to help each other, but they also need control over their own metadata descriptions. In v2, editing rights mean the right to edit anything. Collaboration is therefore possible at present, but it involves a considerable element of trust. A solution is needed so that many content owners can be allowed in to describe and share their own resources, individually and
collaboratively, without much fear of accidentally (or even maliciously) stepping on other people's toes. The developers have acknowledged this issue.
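One way to picture the requested behaviour is per-record ownership. The owner can always edit a record and can grant editing rights on that record to named collaborators. This contrasts with the v2 situation, where editing rights apply to everything. The following sketch is a hypothetical data model for discussion only. It does not describe how META-SHARE implements, or plans to implement, editing rights.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A metadata record with one owner and an explicit set of co-editors (hypothetical model)."""
    record_id: str
    owner: str
    editors: set = field(default_factory=set)

    def can_edit(self, user):
        # The owner can always edit; others only if the owner has granted access.
        return user == self.owner or user in self.editors

    def grant(self, granting_user, new_editor):
        # Only the owner may share editing rights on this record.
        if granting_user != self.owner:
            raise PermissionError(f"{granting_user} does not own {self.record_id}")
        self.editors.add(new_editor)
```

Under this model, helping someone requires an explicit grant from the record's owner. That is the trade-off between mutual help and control described above.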
Many minor issues remain while the software continues to be refined. For example, several people were confused when the editor would not let them mark their metadata record as "published" because it had not first been "ingested". In a few cases, users also lost partially completed records without receiving any warning message. Fortunately, the latter issue was quickly corrected in the source repository.
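The confusion described above comes from the record life cycle: a record must be ingested before it can be published. The following minimal state-machine sketch illustrates that rule. The name "internal" for a freshly created record and the exact set of allowed transitions are assumptions for illustration, not a description of the META-SHARE code.

```python
from enum import Enum


class Status(Enum):
    INTERNAL = "internal"    # assumed name for a record still being edited
    INGESTED = "ingested"
    PUBLISHED = "published"


# Assumed transitions: publishing is only allowed from the ingested state.
ALLOWED = {
    Status.INTERNAL: {Status.INGESTED},
    Status.INGESTED: {Status.PUBLISHED, Status.INTERNAL},
    Status.PUBLISHED: {Status.INGESTED},
}


def change_status(current, target):
    """Return the new status, or raise an error that names the allowed next states."""
    if target in ALLOWED[current]:
        return target
    raise ValueError(
        f"cannot go from {current.value} to {target.value}; "
        f"allowed next states: {sorted(s.value for s in ALLOWED[current])}"
    )
```

Trying to publish an internal record raises an error whose message points to "ingested" as the required next step. This is the kind of feedback the confused users lacked.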
The META-SHARE editor provides in-line documentation that is generally helpful. It also links to an extensive, detailed description of the whole model (see Section 3 of the present report).
The current release of the META-SHARE software provided by T4ME shows substantial improvements over the one available for the first upload. T4ME has taken the feedback from the META-NORD partners into consideration. The collaboration between the two projects continues, with META-NORD partners reporting shortcomings of the META-SHARE software that should be fixed in the next version of the editor.
A recurring minor problem has been the disappearance of metadata descriptions from the presentation side (browsing and searching) of META-SHARE nodes, at least at UHEL and TILDE. Sometimes the nodes also report an internal error instead of showing a description, and sometimes permission to delete a record is denied. Such problems need to be reported to, and addressed by, the developers (on GitHub, footnote 6).
7. Concluding remarks
The second upload was very successful. In all, metadata for 174 language resources were provided, compared with the 127 envisioned in D2.4, “Selection of resources, agreements, detailed work plan”. Appendix B, Planned and Actual, gives an overview of the status of the envisioned resources, the metadata descriptions actually provided and the location of the resources. The number of META-SHARE nodes also increased from three to four with the release of the UT node on 28 June 2012. Four additional META-SHARE nodes are planned for the next data upload.
The META-SHARE model of describing language resources, together with the supporting software, matured significantly during the first half of 2012. The system now appears to be usable in individual nodes and also provides the means to move descriptions into and out of such nodes as XML documents. Minor problems are being solved continuously.
UiB has expressed reservations about the expressiveness of the metadata model. Some objects of interest to language researchers in the humanities seem to need a more flexible content model than the one META-SHARE has created (CLARIN has such a model). Perhaps the META-SHARE software could even be adapted to such needs.
The most significant omission in the software turned out to be the management of editing
rights. There is a need for individual metadata descriptions to be both owned and shared
by people responsible for them.
The most significant planned feature of the software is undoubtedly the synchronization of META-SHARE nodes. This feature is expected to be usable in time for the third upload. The META-NORD partners should aim to run into the inevitable initial problems early and help the developers solve them by sending feedback.
6 https://github.com/metashare/META-SHARE.
The META-NORD consortium hopes that META-SHARE will be the basis of a strong
and constantly growing community keen on sharing both language resources and their
respective metadata.
Appendix A: List of second batch metadata
Resource name | Resource type | Work carried out to meet the META-SHARE metadata schema
Tilde
Accurat Comparable Corpora | corpus | Metadata manually added using the META-SHARE editor v2.1. Clearing up licensing issues: IPR-free resource.
Corpus of Latvian literature | corpus | Metadata manually added using the META-SHARE editor v2.1. Clearing up licensing issues. Content of the corpus prepared for download: transformed from a proprietary format into XML, using UTF-8 encoding.
EASTIN-CL Multilingual Ontology of Assistive Technology | ontology | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1. To validate the data, removed "und" language tags and updated other language codes from the three-letter ISO 639 standard (e.g. "eng" for English) to the two-letter standard (e.g. "en" for English).
Estonian-Latvian dictionary | LexicalConceptual | Updated the batch 1 metadata to conform with the requirements of the new META-SHARE editor v2.1 and with the information added in the batch 2 upload.
EuroTermBank | LexicalConceptual | Updated the batch 1 metadata to conform with the requirements of the new META-SHARE editor v2.1 and with the information added in the batch 2 upload.
Latvian-English Ngram corpus, Legislation of Republic of Latvia | corpus | Metadata manually added using the META-SHARE editor v2.1. Clearing up licensing issues. Content of the corpus updated according to standards and prepared for download: transformed from a proprietary format into TMX, using UTF-8 encoding.
Latvian-Lithuanian dictionary | LexicalConceptual | Updated the batch 1 metadata to conform with the requirements of the new META-SHARE editor v2.1 and with the information added in the batch 2 upload.
Lithuanian-Latvian dictionary | LexicalConceptual | Updated the batch 1 metadata to conform with the requirements of the new META-SHARE editor v2.1 and with the information added in the batch 2 upload.
Latvian-Russian Person Names Glossary | LexicalConceptual | Metadata manually added using the META-SHARE editor v2.1. Clearing up licensing issues: MSCommons_BY-NC-ND. Content updated according to standards and prepared for download: transformed from a proprietary format into TMX, using UTF-8 encoding.
Resource name | Resource type | Work carried out to meet the META-SHARE metadata schema
UCPH
DanNet | Lexicon | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1. Added documentation in English.
Finnish-Danish Linked Wordnets | Lexicon | Assigned metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1.
Danish-Swedish Linked Wordnets | Lexicon | Assigned metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1.
Finnish-Estonian Linked Wordnets | Lexicon | Assigned metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1.
Finnish-Swedish Linked Wordnets | Lexicon | Assigned metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1.
STO-LMF | Lexical resource | Updated the batch 1 metadata to conform with the requirements of the new META-SHARE editor v2.1 and with the information added in the batch 2 upload.
Copenhagen Danish-English Dependency Treebank | Corpus | Semi-automatically generated with manual revision.
Copenhagen Dependency Treebank | Corpus | Semi-automatically generated with manual revision.
Resource name | Resource type | Work carried out to meet the META-SHARE metadata schema
UT
The Estonian Reference Corpus | corpus | Checking and specifying the information required for filling in the META-SHARE editor’s mandatory fields; clearing up licensing issues (still in progress).
Estonian Corpus with shallow syntactic annotation | corpus | Checking and specifying the information required for filling in the META-SHARE editor’s mandatory fields; clearing up licensing issues (still in progress).
Morphosyntactic disambiguator and shallow parser for Estonian | tool | Manually added using the META-SHARE editor v2.1, gathering information from various written documentation.
Corpus of Institute of the Estonian Language | corpus | Manually added using the META-SHARE editor v2.1.
Dictionary of Standard Estonian ÕS 2006 | LexicalConceptual | Manually added using the META-SHARE editor v2.1.
English-Estonian Machine Translation Dictionary | LexicalConceptual | Manually added using the META-SHARE editor v2.1.
Estonian Emotional Speech Corpus | corpus | Manually added using the META-SHARE editor v2.1.
Estonian-Russian Dictionary | LexicalConceptual | Manually added using the META-SHARE editor v2.1.
Morphological Toolset for Estonian | tool | Manually added using the META-SHARE editor v2.1.
Estonian WordNet | LexicalConceptual | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor.
The database of Estonian multi-word expressions | LexicalConceptual | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor.
Corpus of morphologically disambiguated Estonian texts | Corpus | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor.
Estonian-English parallel corpus | Corpus | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor.
Estonian Treebank | Corpus | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor.
Semantically disambiguated corpus of Estonian | Corpus | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor.
Resource name | Resource type | Work carried out to meet the META-SHARE metadata schema
UIB
Norwegian newspaper corpus | Corpus | Manually added using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from written documentation; IPR matters resolved through contact with the distributors (Språkbanken, The National Library of Norway).
Norwegian wordnet | LexicalConceptual | Manually added using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from written documentation; IPR matters resolved through contact with the distributors (Språkbanken, The National Library of Norway).
n-grams for Norwegian bokmål and nynorsk | corpus-ngrams | Manually added using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from written documentation; IPR matters resolved through contact with the distributors (Språkbanken, The National Library of Norway).
The Norwegian Language Council's dictionary from Norwegian Bokmål to Nynorsk | LexicalConceptual | The resource only existed in HTML on the Norwegian Language Council's (LC) web pages. META-NORD/UIB has arranged for Språkbanken (the National Library of Norway) to make the resource freely downloadable. META-NORD/UIB manually added metadata using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from the Norwegian Language Council. META-NORD/UIB has promoted the use of standard formats (XML/LMF) and encouraged the IPR holder and distributor to make the distribution agreement as flexible (open and extendable) as possible. Some of the lists have already been converted to a downloadable format through the treebanking project INESS and made available to the distributor.
The Norwegian Language Council's list over language names in Norwegian | LexicalConceptual | The resource only existed in HTML on the Norwegian Language Council's (LC) web pages. META-NORD/UIB has arranged for Språkbanken (the National Library of Norway) to make the resource freely downloadable. META-NORD/UIB manually added metadata using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from the Norwegian Language Council. META-NORD/UIB has promoted the use of standard formats (XML/LMF) and encouraged the IPR holder and distributor to make the distribution agreement as flexible (open and extendable) as possible. Some of the lists have already been converted to a downloadable format through the treebanking project INESS and made available to the distributor.
The Norwegian Language Council's list over geographical names | LexicalConceptual | The resource only existed in HTML on the Norwegian Language Council's (LC) web pages. META-NORD/UIB has arranged for Språkbanken (the National Library of Norway) to make the resource freely downloadable. META-NORD/UIB manually added metadata using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from the Norwegian Language Council. META-NORD/UIB has promoted the use of standard formats (XML/LMF) and encouraged the IPR holder and distributor to make the distribution agreement as flexible (open and extendable) as possible. Some of the lists have already been converted to a downloadable format through the treebanking project INESS and made available to the distributor.
The Norwegian Language Council's list over names of historical events and persons | LexicalConceptual | The resource only existed in HTML on the Norwegian Language Council's (LC) web pages. META-NORD/UIB has arranged for Språkbanken (the National Library of Norway) to make the resource freely downloadable. META-NORD/UIB manually added metadata using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from the Norwegian Language Council. META-NORD/UIB has promoted the use of standard formats (XML/LMF) and encouraged the IPR holder and distributor to make the distribution agreement as flexible (open and extendable) as possible. Some of the lists have already been converted to a downloadable format through the treebanking project INESS and made available to the distributor.
The Norwegian Language Council's list over names in Norwegian of inhabitants | LexicalConceptual | The resource only existed in HTML on the Norwegian Language Council's (LC) web pages. META-NORD/UIB has arranged for Språkbanken (the National Library of Norway) to make the resource freely downloadable. META-NORD/UIB manually added metadata using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from the Norwegian Language Council. META-NORD/UIB has promoted the use of standard formats (XML/LMF) and encouraged the IPR holder and distributor to make the distribution agreement as flexible (open and extendable) as possible. Some of the lists have already been converted to a downloadable format through the treebanking project INESS and made available to the distributor.
The Norwegian Language Council's list over names in Norwegian of public departments | LexicalConceptual | The resource only existed in HTML on the Norwegian Language Council's (LC) web pages. META-NORD/UIB has arranged for Språkbanken (the National Library of Norway) to make the resource freely downloadable. META-NORD/UIB manually added metadata using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from the Norwegian Language Council. META-NORD/UIB has promoted the use of standard formats (XML/LMF) and encouraged the IPR holder and distributor to make the distribution agreement as flexible (open and extendable) as possible. Some of the lists have already been converted to a downloadable format through the treebanking project INESS and made available to the distributor.
The Norwegian Language Council's list over names in Norwegian of states | LexicalConceptual | The resource only existed in HTML on the Norwegian Language Council's (LC) web pages. META-NORD/UIB has arranged for Språkbanken (the National Library of Norway) to make the resource freely downloadable. META-NORD/UIB manually added metadata using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from the Norwegian Language Council. META-NORD/UIB has promoted the use of standard formats (XML/LMF) and encouraged the IPR holder and distributor to make the distribution agreement as flexible (open and extendable) as possible. Some of the lists have already been converted to a downloadable format through the treebanking project INESS and made available to the distributor.
n-gram for Danish (based on the NST text corpus) | Corpus-ngrams | Manually added using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from written documentation; IPR matters resolved through contact with the distributors (Språkbanken, The National Library of Norway).
n-gram for Norwegian Bokmål (based on NNC) | Corpus-ngrams | Manually added using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from written documentation; IPR matters resolved through contact with the distributors (Språkbanken, The National Library of Norway).
n-gram for Norwegian Bokmål (based on NST news text) | Corpus-ngrams | Manually added using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from written documentation; IPR matters resolved through contact with the distributors (Språkbanken, The National Library of Norway).
n-gram for Norwegian Bokmål (based on NNC and NST news text) | Corpus-ngrams | Manually added using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from written documentation; IPR matters resolved through contact with the distributors (Språkbanken, The National Library of Norway).
n-gram for Norwegian Nynorsk (based on NNC and NST) | Corpus-ngrams | Manually added using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from written documentation; IPR matters resolved through contact with the distributors (Språkbanken, The National Library of Norway).
n-gram for Swedish (based on the NST Text Corpus) | Corpus-ngrams | Manually added using the META-SHARE editor v2.1, including technical feedback to the META-SHARE developers. Information collected from written documentation; IPR matters resolved through contact with the distributors (Språkbanken, The National Library of Norway).
Dependency Part of BulTreeBank
Corpus
Manually added using the META-SHARE editor v2.1, with technical feedback provided to the META-SHARE developers. Information collected through a meeting with the resource provider and from written documentation. META-NORD added the resource to the INESS treebank infrastructure.
The Morphologically Annotated Part of BulTreeBank
Corpus
Manually added using the META-SHARE editor v2.1, with technical feedback provided to the META-SHARE developers. Information collected through a meeting with the resource provider and from written documentation. META-NORD added the resource to the INESS treebank infrastructure.
Acoustic database for Danish Speech
Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1 (among other things, v2.1 allows more than one RestrictionOfUse to be added, and UIB had to change the language codes from three-letter ISO 639 codes [e.g. "eng" for English] to two-letter ISO 639-1 codes [e.g. "en" for English]).
Acoustic database for Norwegian Speech
Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1 (among other things, v2.1 allows more than one RestrictionOfUse to be added, and UIB had to change the language codes from three-letter ISO 639 codes [e.g. "eng" for English] to two-letter ISO 639-1 codes [e.g. "en" for English]).
Acoustic database for Swedish Speech
Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1 (among other things, v2.1 allows more than one RestrictionOfUse to be added, and UIB had to change the language codes from three-letter ISO 639 codes [e.g. "eng" for English] to two-letter ISO 639-1 codes [e.g. "en" for English]).
Lexical database for Danish
LexicalConceptual
Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1 (among other things, v2.1 allows more than one RestrictionOfUse to be added, and UIB had to change the language codes from three-letter ISO 639 codes [e.g. "eng" for English] to two-letter ISO 639-1 codes [e.g. "en" for English]).
Lexical database for Norwegian
LexicalConceptual
Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1 (among other things, v2.1 allows more than one RestrictionOfUse to be added, and UIB had to change the language codes from three-letter ISO 639 codes [e.g. "eng" for English] to two-letter ISO 639-1 codes [e.g. "en" for English]).
Lexical database for Swedish
LexicalConceptual
Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1 (among other things, v2.1 allows more than one RestrictionOfUse to be added, and UIB had to change the language codes from three-letter ISO 639 codes [e.g. "eng" for English] to two-letter ISO 639-1 codes [e.g. "en" for English]).
Norsk ordbank, bokmål
LexicalConceptual
Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1 (among other things, v2.1 allows more than one RestrictionOfUse to be added, and UIB had to change the language codes from three-letter ISO 639 codes [e.g. "eng" for English] to two-letter ISO 639-1 codes [e.g. "en" for English]).
Norsk ordbank, nynorsk
LexicalConceptual
Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1 (among other things, v2.1 allows more than one RestrictionOfUse to be added). UIB discussed language codes with the META-SHARE developers and, among other things, had to change the language codes from three-letter ISO 639 codes [e.g. "eng" for English] to two-letter ISO 639-1 codes [e.g. "en" for English].
SCARRIE lexicon
LexicalConceptual
Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1 (among other things, v2.1 allows more than one RestrictionOfUse to be added). UIB discussed language codes with the META-SHARE developers and, among other things, had to change the language codes from three-letter ISO 639 codes [e.g. "eng" for English] to two-letter ISO 639-1 codes [e.g. "en" for English].
TRIS Spanish-German parallel corpus v0.1
Corpus
Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1. UIB is negotiating the location of distribution of this resource as well as its licensing; among other options, distribution through ELDA is being discussed.
Oslo-Bergen tagger
Tool
Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1.
Frequency lists (tokens) from NoWaC - Norwegian Web as Corpus
Corpus - n-grams
Manually added using the META-SHARE editor v2.1, with technical feedback provided to the META-SHARE developers. Information, including IPR matters, collected through direct contact with the resource providers and from written documentation.
Frequency lists (lemmas) from NoWaC - Norwegian Web as Corpus
Corpus - n-grams
Manually added using the META-SHARE editor v2.1, with technical feedback provided to the META-SHARE developers. Information, including IPR matters, collected through direct contact with the resource providers and from written documentation.
NoWaC - Norwegian Web as Corpus
Corpus
Manually added using the META-SHARE editor v2.1, with technical feedback provided to the META-SHARE developers. Information, including IPR matters, collected through direct contact with the resource providers and from written documentation.
TRIS Spanish-German parallel corpus v0.2
Corpus
Manually added using the META-SHARE editor v2.1, with technical feedback provided to the META-SHARE developers. UIB is negotiating the location of distribution of this resource as well as its licensing; among other options, distribution through ELDA is being discussed.
UHR's termbase for Norwegian higher education institutions
LexicalConceptual
The resource existed in searchable form at the Norwegian Association of Higher Education Institutions (UHR). UIB assisted UHR in choosing a license and signed a Depositor's agreement with UHR. UIB has converted the term base from an implicitly structured HTML table into the standard TBX format and is distributing the TBX files through GitHub, since META-SHARE does not yet offer a server for data uploads. Metadata manually added using the META-SHARE editor v2.1, with technical feedback provided to the META-SHARE developers.
Sofie monolingual treebank
Corpus
Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1 (RestrictionsOfUse, language codes). UIB has also used the recent META-SHARE legal templates and renegotiated the user terms for the novel "Sofies verden" [Sophie's World]. We now have formal clearance from the IPR holder and are in the process of obtaining a formal Depositor's agreement (including the license conditions) from the author of the novel.
Sofie multilingual treebank
Corpus
Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1 (RestrictionsOfUse, language codes). UIB is in the process of renegotiating the user terms for the translations of the original Norwegian novel "Sofies verden" [Sophie's World]: we are sending formal requests to publishing houses, based on the successful clearance of the original. The aim is to have all parties sign a META-SHARE Depositor's agreement.
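Several UIB entries above mention replacing three-letter ISO 639 codes (e.g. "eng") with two-letter codes (e.g. "en") in the metadata. The following is a minimal sketch of that kind of normalisation, not the procedure UIB actually used. The mapping table is an illustrative subset chosen for the languages in this report (an assumption; a real conversion would use the full ISO 639 code tables), and the function names are hypothetical. Codes without a two-letter ISO 639-1 equivalent are passed through unchanged, because not every three-letter code has one.

```python
"""Hypothetical sketch: normalise three-letter ISO 639 codes to two-letter ISO 639-1 codes."""

# Illustrative subset of ISO 639-2/T (three-letter) -> ISO 639-1 (two-letter).
# Assumption: only these languages are covered; extend from the official tables as needed.
ISO639_3_TO_1 = {
    "eng": "en",  # English
    "dan": "da",  # Danish
    "nor": "no",  # Norwegian
    "nob": "nb",  # Norwegian Bokmål
    "nno": "nn",  # Norwegian Nynorsk
    "swe": "sv",  # Swedish
    "fin": "fi",  # Finnish
    "isl": "is",  # Icelandic (terminological code; the bibliographic code is "ice")
    "est": "et",  # Estonian
    "lav": "lv",  # Latvian
    "lit": "lt",  # Lithuanian
    "deu": "de",  # German (terminological code; the bibliographic code is "ger")
    "spa": "es",  # Spanish
    "bul": "bg",  # Bulgarian
}


def to_two_letter(code: str) -> str:
    """Return the ISO 639-1 code for a three-letter code, or the normalised input.

    Codes that are already two letters, or that have no two-letter
    equivalent in the table, are returned lower-cased but otherwise as-is.
    """
    normalised = code.strip().lower()
    return ISO639_3_TO_1.get(normalised, normalised)


def convert_codes(codes: list[str]) -> list[str]:
    """Convert every language code of one (hypothetical) metadata record."""
    return [to_two_letter(c) for c in codes]


if __name__ == "__main__":
    print(convert_codes(["eng", "nob", "nno", "smi"]))
```

Running the script prints `['en', 'nb', 'nn', 'smi']`: "smi" (the collective code for Sami languages) has no ISO 639-1 equivalent and is therefore kept as it is rather than guessed.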
Resource name
Resource type
Work carried out to meet the META-SHARE metadata schema
UHEL
International Corpus of Learner Finnish (ICLFI)
Corpus
Seminar with the members of the corpus project, gathering the information required for filling in the META-SHARE editor’s mandatory fields, IPR guidance, clearing up licensing issues.
Helsinki University Conversation Data Archive
Corpus
Gathering the information required for filling in the META-SHARE editor’s mandatory fields, IPR guidance, clearing up licensing issues.
Oulu corpus
Corpus
Gathering the information required for filling in the META-SHARE editor’s mandatory fields, clearing up licensing issues.
Geographic Names Register of the National Land Survey
LexicalConceptual
Gathering the information required for filling in the META-SHARE editor’s mandatory fields, IPR guidance, clearing up licensing issues, translating the metadata from Finnish to English.
Samples of Spoken Finnish
Corpus
Gathering the information required for filling in the META-SHARE editor’s mandatory fields, IPR guidance, clearing up licensing issues, translating the metadata from Finnish to English.
Finland-Swedish Text Collection
Corpus
Gathering the information required for filling in the META-SHARE editor’s mandatory fields, clearing up licensing issues.
Finnish Text Collection
Corpus
Gathering the information required for filling in the META-SHARE editor’s mandatory fields, clearing up licensing issues.
The Helsinki Corpus of English Texts
Corpus
Gathering the information required for filling in the META-SHARE editor’s mandatory fields, IPR guidance, negotiations and a seminar with the IPR holders on licensing issues.
WWW-Lemmie
Tool
Gathering the information required for filling in the META-SHARE editor’s mandatory fields, clearing up licensing issues.
Finnish WordNet
LexicalConceptual
Developing the resource, ensuring that the current version is available and functional, filling in the META-SHARE editor’s mandatory fields. Updating the batch 1 metadata to conform with the requirements and possibilities of the v2.1 META-SHARE editor.
Open morphology for Finnish
Tool
Developing the tool, ensuring that the current version is available and functional, filling in the META-SHARE editor’s mandatory fields.
Helsinki Finite-State Transducer Technology
Tool
Developing the tool, ensuring that the current version is available and functional, filling in the META-SHARE editor’s mandatory fields.
FinnTreeBank 2
Tool
Developing the tool, ensuring that the current version is available and functional, filling in the META-SHARE editor’s mandatory fields.
Resource name
Resource type
Work carried out to meet the META-SHARE metadata schema
HI
CombiTagger
Tool
Metadata manually added using the META-SHARE editor v2.1. IPR resolved through contact with the distributor (Reykjavík University).
IceNLP - Tagger, Parser, Lemmatizer
Tool
Metadata manually added using the META-SHARE editor v2.1. IPR resolved through contact with the distributor (Reykjavík University).
Apertium-is-en Translation System
Tool
Metadata manually added using the META-SHARE editor v2.1. IPR resolved through contact with the distributor (Reykjavík University).
Tagged Icelandic Corpus (MÍM)
Corpus
Metadata manually added using the META-SHARE editor v2.1. IPR resolved through contact with the individual copyright holders (work performed by The Arni Magnusson Institute for Icelandic Studies before the META-NORD project started) and the distributor (The Arni Magnusson Institute for Icelandic Studies). The META-NORD project has enabled further work on the corpus, such as completing the (automatic) tagging, converting the corpus to a standard format (TEI-conformant XML) and some work on the search interface.
Database of Modern Icelandic Inflection (DMII)
LexicalConceptual
Metadata manually added using the META-SHARE editor v2.1. IPR resolved through contact with the distributor (The Arni Magnusson Institute for Icelandic Studies).
Icelandic Term Bank – Terminology
LexicalConceptual
Metadata manually added using the META-SHARE editor v2.1. IPR resolved through contact with the individual copyright holders (owners of individual terminologies) and the distributor (The Arni Magnusson Institute for Icelandic Studies). The terminologies will be converted to a standard format (XML/TBX) and made available for download. Distribution will be under the CC BY-SA 3.0 license.
Íslenskur orðasjóður - Large Corpus
Corpus
Metadata manually added using the META-SHARE editor v2.1. IPR resolved through contact with the distributor (Deutscher Wortschatz and Erla Hallsteinsdóttir).
The Jensson Corpus
Corpus
Metadata manually added using the META-SHARE editor v2.1. IPR resolved through contact with the distributor (Arnar Jensson).
The Thor Corpus
Corpus
Metadata manually added using the META-SHARE editor v2.1. IPR resolved through contact with the distributor (Arnar Jensson).
The Broadcast News RUV-1 Corpus
Corpus
Metadata manually added using the META-SHARE editor v2.1. IPR resolved through contact with the distributor (Arnar Jensson).
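Two entries in this report involve turning tabular term lists into TBX: UIB converted the UHR term base from an HTML table into TBX, and the Icelandic Term Bank terminologies are to be converted to XML/TBX. The sketch below illustrates that step under stated assumptions and is not the converter used by either partner. It assumes the table has already been parsed into rows mapping a language code to a term; the function name and sample terms are hypothetical. The element layout follows the general TBX-Basic shape (martif, martifHeader, text/body, termEntry, langSet, tig, term), but the output has not been validated against a TBX schema.

```python
"""Hypothetical sketch: write multilingual term rows as a minimal TBX-Basic-style document."""

import xml.etree.ElementTree as ET

# ElementTree serialises this namespace URI with the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def rows_to_tbx(rows: list[dict[str, str]], source_lang: str = "nb") -> ET.Element:
    """Build one termEntry per row and one langSet per language in that row.

    Each row is assumed to map a language code (e.g. "nb", "en") to a term.
    Empty cells are skipped, since source tables may lack some translations.
    """
    martif = ET.Element("martif", {"type": "TBX-Basic", XML_LANG: source_lang})
    header = ET.SubElement(martif, "martifHeader")
    source_desc = ET.SubElement(ET.SubElement(header, "fileDesc"), "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Converted from a tabular term list (illustrative)"
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")

    for index, row in enumerate(rows, start=1):
        entry = ET.SubElement(body, "termEntry", {"id": f"c{index}"})
        for lang, term in row.items():
            if not term.strip():
                continue  # no term for this language in this row
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")  # term information group
            ET.SubElement(tig, "term").text = term.strip()
    return martif


if __name__ == "__main__":
    sample = [  # hypothetical sample rows, not taken from the UHR term base
        {"nb": "studiepoeng", "en": "credit"},
        {"nb": "mastergrad", "en": "master's degree"},
    ]
    tree = rows_to_tbx(sample)
    ET.indent(tree)  # pretty-printing; available from Python 3.9
    print(ET.tostring(tree, encoding="unicode"))
```

Keeping one termEntry per table row treats each row as one concept, which matches a simple bilingual or multilingual term list; richer sources (definitions, subject fields) would need additional TBX data categories.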
Resource name
Resource type
Work carried out to meet the META-SHARE metadata schema
LKI
Geoinformational Database of Lithuanian Toponyms
LexicalConceptual
Metadata manually added using the META-SHARE editor v2.1. IPR resolved. Conversion to standard downloadable formats added.
Database of Lithuanian Historical Ethnic Place Names
LexicalConceptual
Metadata manually added using the META-SHARE editor v2.1. IPR resolved. Conversion to standard downloadable formats added.
Database Synonymy of Lithuanian Terms
LexicalConceptual
Metadata manually added using the META-SHARE editor v2.1. IPR resolved. Conversion to standard downloadable formats added.
Resource name
Resource type
Work carried out to meet the META-SHARE metadata schema
UGOT
Academic texts – Humanities
corpus
Semi-automatically generated XML with manual revision
Adult bloggers
corpus
Semi-automatically generated XML with manual revision
Astra
corpus
Semi-automatically generated XML with manual revision
August Strindberg's letters
corpus
Semi-automatically generated XML with manual revision
August Strindberg's novels
corpus
Semi-automatically generated XML with manual revision
Bellman
corpus
Semi-automatically generated XML with manual revision
Blog mix
corpus
Semi-automatically generated XML with manual revision
Bonnier novels I (1976/77)
corpus
Semi-automatically generated XML with manual revision
Bonniers novels II (1980/81)
corpus
Semi-automatically generated XML with manual revision
Corpus for health care technical language
corpus
Semi-automatically generated XML with manual revision
Corpus Oral de Referencia del Español Contemporáneo (SOL)
corpus
Semi-automatically generated XML with manual revision
DiabetologNytt (1996–1999)
corpus
Semi-automatically generated XML with manual revision
DN 1987
corpus
Semi-automatically generated XML with manual revision
Dramawebben (demo)
corpus
Semi-automatically generated XML with manual revision
Essayistic literature 1970-2011
corpus
Semi-automatically generated XML with manual revision
Fiction 1970-2011
corpus
Semi-automatically generated XML with manual revision
FNB 1999
corpus
Semi-automatically generated XML with manual revision
FNB 2000
corpus
Semi-automatically generated XML with manual revision
Forskning & Framsteg
corpus
Semi-automatically generated XML with manual revision
GP – Två dagar
corpus
Semi-automatically generated XML with manual revision
GP 1994
corpus
Semi-automatically generated XML with manual revision
GP 2001
corpus
Semi-automatically generated XML with manual revision
GP 2002
corpus
Semi-automatically generated XML with manual revision
GP 2003
corpus
Semi-automatically generated XML with manual revision
GP 2004
corpus
Semi-automatically generated XML with manual revision
GP 2005
corpus
Semi-automatically generated XML with manual revision
GP 2006
corpus
Semi-automatically generated XML with manual revision
GP 2007
corpus
Semi-automatically generated XML with manual revision
GP 2008
corpus
Semi-automatically generated XML with manual revision
GP 2009
corpus
Semi-automatically generated XML with manual revision
GP 2010
corpus
Semi-automatically generated XML with manual revision
GP 2011
corpus
Semi-automatically generated XML with manual revision
Hanken
corpus
Semi-automatically generated XML with manual revision
Hufvudstadsbladet 1991
corpus
Semi-automatically generated XML with manual revision
Hufvudstadsbladet 1998
corpus
Semi-automatically generated XML with manual revision
Hufvudstadsbladet 1999
corpus
Semi-automatically generated XML with manual revision
Jakobstads Tidning 1999
corpus
Semi-automatically generated XML with manual revision
Jakobstads Tidning 2000
corpus
Semi-automatically generated XML with manual revision
Källan 2008-2010
corpus
Semi-automatically generated XML with manual revision
Lagtexter 1990–2000
corpus
Semi-automatically generated XML with manual revision
Läkartidningen medical journal 1996
corpus
Semi-automatically generated XML with manual revision
Läkartidningen medical journal 1997
corpus
Semi-automatically generated XML with manual revision
Läkartidningen medical journal 1998
corpus
Semi-automatically generated XML with manual revision
Läkartidningen medical journal 1999
corpus
Semi-automatically generated XML with manual revision
Läkartidningen medical journal 2000
corpus
Semi-automatically generated XML with manual revision
Läkartidningen medical journal 2001
corpus
Semi-automatically generated XML with manual revision
LäSBarT
corpus
Semi-automatically generated XML with manual revision
Meddelanden från Åbo Akademi 2002–2010
corpus
Semi-automatically generated XML with manual revision
Myndighetsprosa 1990–2000
corpus
Semi-automatically generated XML with manual revision
Non-fiction 1970-2011
corpus
Semi-automatically generated XML with manual revision
Norstedts novels (1999)
corpus
Semi-automatically generated XML with manual revision
Nya Argus 2010–2011
corpus
Semi-automatically generated XML with manual revision
Older Swedish novels
corpus
Semi-automatically generated XML with manual revision
ORDAT
corpus
Semi-automatically generated XML with manual revision
Österbottens tidning 2011
corpus
Semi-automatically generated XML with manual revision
Österbottens tidning 2012
corpus
Semi-automatically generated XML with manual revision
Parole corpus
corpus
Semi-automatically generated XML with manual revision
Press 65
corpus
Semi-automatically generated XML with manual revision
Press 76
corpus
Semi-automatically generated XML with manual revision
Press 95
corpus
Semi-automatically generated XML with manual revision
Press 96
corpus
Semi-automatically generated XML with manual revision
Press 97
corpus
Semi-automatically generated XML with manual revision
Press 98
corpus
Semi-automatically generated XML with manual revision
Psalm book (1937)
corpus
Semi-automatically generated XML with manual revision
Smittskydd
corpus
Semi-automatically generated XML with manual revision
SNP 1978-79
corpus
Semi-automatically generated XML with manual revision
Studentbladet 2011
corpus
Semi-automatically generated XML with manual revision
Svenskbygden 2010-2011
corpus
Semi-automatically generated XML with manual revision
Swedish party programs and election manifestos
corpus
Semi-automatically generated XML with manual revision
Swedish statute book 1978-81
corpus
Semi-automatically generated XML with manual revision
Swedish Wikipedia Corpus
corpus
Semi-automatically generated XML with manual revision
SweWaC
corpus
Semi-automatically generated XML with manual revision
Syd-Österbotten 2012
corpus
Semi-automatically generated XML with manual revision
Syntag treebank
corpus
Semi-automatically generated XML with manual revision
Talbanken
corpus
Semi-automatically generated XML with manual revision
Vasabladet 1991
corpus
Semi-automatically generated XML with manual revision
Vasabladet 2012
corpus
Semi-automatically generated XML with manual revision
Parole+
lexical resource
Semi-automatically generated XML with manual revision
WordNet-SALDO
lexical resource
Semi-automatically generated XML with manual revision
Appendix A: Planned and Actual
Resource name
Planned batch for metadata in accordance with D2.4
Status
Resource availability
Responsible partner
Eurotermbank 1 ✓ http://www.eurotermbank.eu/ TILDE
Lithuanian-Latvian dictionary 1 ✓ http://lietuviu.letonika.lv TILDE
Latvian-Lithuanian dictionary 1 ✓ http://www.letonika.lv/lvlt/ TILDE
Estonian-Latvian dictionary 1 ✓ http://eesti.letonika.lv TILDE
Latvian-English legislation corpus of Republic of Latvia (Latvian-English Ngram corpus, Legislation of Republic of Latvia) 1 ✓ http://dl.tilde.lv/META-NORD/LegislationNgramCorpusOfTheRepublicOfLatvia.html TILDE
Multilingual dictionary of person names 1 ✓ http://www.letonika.lv/personvardi TILDE
Tilde’s POS-tagger 3 3 Planned to be made available at Batch 3 TILDE
Corpus of Latvian literature 1 ✓ http://www.letonika.lv/literatura/ TILDE
EASTIN-CL multilingual ontology 3 ✓ http://dl.tilde.lv/META-NORD/EastinClMultilingualOntologyOfAssistiveTechnology.html TILDE
Latvian Russian person names and geo names glossary 2 ✓ http://dl.tilde.lv/META-NORD/LvRuPersonNamesGlossary.html TILDE
Not initially planned in D2.4
Accurat Comparable Corpora ✓ http://dl.tilde.lv/META-NORD/AccuratComparableCorpora.html TILDE
Resource name
Planned batch for metadata in accordance with D2.4
Status
Resource availability
Responsible partner
Danish wordnet, DanNet 1,2,3 ✓ http://wordnet.dk/dannet/dannet/menu?item=2 UCPH
Cross-lingually linked resources (with FIN and SWE) 2,3 2,3 Planned to be made available at Batch 3 UCPH
SprogTeknologisk Ordbase 1, 2, 3 ✓ Planned to be made available at Batch 3 UCPH
Copenhagen Dependency Treebanks 1 ✓ http://code.google.com/p/copenhagen-dependency-treebank/wiki/CDT UCPH
The Copenhagen Danish-English Dependency Treebank 1 ✓ http://code.google.com/p/copenhagen-dependency-treebank/wiki/CDT UCPH
Danish first encounters NOMCO corpus 3 3 Planned to be made available at Batch 3 UCPH
Reference corpus for Danish - - Not available UCPH
Corpus of sublanguage texts (2000-2010) - - Not available UCPH
Danish XLE grammar - - Not available UCPH
CstTokeniser 3 3 Planned to be made available at Batch 3 UCPH
CstNER 3 3 Planned to be made available at Batch 3 UCPH
CstTagger 3 3 Planned to be made available at Batch 3 UCPH
CstLemma 3 3 Planned to be made available at Batch 3 UCPH
CstKeyExt - - Not available UCPH
CstNP-Rec - - Not available UCPH
CstRep - - Not available UCPH
HPSG grammar - - Not available UCPH
Not initially planned in D2.4
STO-LMF, morphology ✓ Not available UCPH
Danish-Swedish linked wordnets ✓ Not available UCPH
Finnish-Danish linked wordnets ✓ Not available UCPH
Finnish-Estonian linked wordnets ✓ Not available UCPH
Finnish-Swedish linked wordnets ✓ Not available UCPH
NST Lexical database for Danish ✓ http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar UCPH
Resource name
Planned batch for metadata in accordance with D2.4
Status
Resource availability
Responsible partner
The Estonian Reference Corpus 1 ✓ http://www.cl.ut.ee/korpused/segakorpus/ UT
Treebank 1 ✓ Planned to be made available at Batch 3 UT
Estonian WordNet 1 ✓ http://www.cl.ut.ee/ressursid/teksaurus/ UT
BABEL Estonian Database 3 3 Not available UT
Corpora of morphologically disambiguated texts 1 ✓ http://www.cl.ut.ee/korpused/morfkorpus/index.php?lang=en UT
Corpora with shallow syntactic annotation 1 ✓ http://math.ut.ee/~kaili/Korpus/pindmine/ UT
Corpus of emotional speech 2 ✓ http://peeter.eki.ee:5000/ UT
Corpus of Institute of Estonian Language 2 ✓ http://en.eki.ee/corpus UT
Corpus of Spoken Estonian 3 3 Not available UT
Cross-lingually linked resource 3 3 Planned to be made available at Batch 3 UT
Dictionaries Estonian-Russian, 2 ✓ http://portaal.eki.ee/dict/evs/ UT
English-Estonian and Estonian-English parallel corpus 1 ✓ Planned to be made available at Batch 3 UT
Estonian Foreign Accent Corpus 3 3 Planned to be made available at Batch 3 UT
Monolingual dictionaries 2 ✓ Not available UT
Dictionary of Standard Estonian ÕS 2006 2 ✓ http://www.eki.ee/dict/qs UT
Semantically disambiguated corpus 1 ✓ http://www.cl.ut.ee/korpused/semkorpus/ UT
The database of Estonian verbal multi-word expressions 1 ✓ https://svn.spraakdata.gu.se/repos/metanord/pub/ut/LexicalConceptual/ESTMWE.gz UT
Estonian text-speech synthesizer 3 3 Planned to be made available at Batch 3 UT
Morphological analyzer 3 3 Planned to be made available at Batch 3 UT
Morphological Toolset for Estonian 2 ✓ http://eelex.eki.ee/pub/Install/ekiMorfo/ekiMorfoSetup.msi (ver 4.2 executable); ftp://ftp.eki.ee/pub/keeletehnoloogia/morfana/ (ver 3.2 source code, executable) UT
Morph syntactic disambiguator and shallow parser 2 ✓ http://www.ut.ee/~kaili/grammatika/ UT
Not initially planned in D2.4
English-Estonian Machine Translation Dictionary ✓ ftp://ftp.eki.ee/pub/keeletehnoloogia/inglise-eesti/ UT
Resource name
Planned batch for metadata in accordance with D2.4
Status
Resource availability
Responsible partner
Acoustic database for Danish 1 ✓ http://www.nb.no/spraakbanken/tilgjengelege-ressursar/taledatabasar UIB
Acoustic database for Norwegian 1 ✓ http://www.nb.no/spraakbanken/tilgjengelege-ressursar/taledatabasar UIB
Acoustic database for Swedish | 1 | ✓ | http://www.nb.no/spraakbanken/tilgjengelege-ressursar/taledatabasar | UIB
NST Lexical database for Danish | 1,2 | ✓ | http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar | UIB
NST Lexical database for Norwegian | 1,2 | ✓ | http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar | UIB
NST Lexical database for Swedish | 1,2 | ✓ | http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar | UIB
Norsk ordbank, bokmål | 1 | ✓ | http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar | UIB
Norsk Ordbank, nynorsk | 1 | ✓ | http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar | UIB
Oslo-Bergen tagger | 1 | ✓ | http://tekstlab.uio.no/obt-ny/english/download.html | UIB
SCARRIE lexicon | 1,2 | ✓ | http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar | UIB
Sofietrebanken | 1 | ✓ | http://iness.uib.no | UIB
TRIS Spanish-German parallel corpus | 1 | ✓ | Available upon request (commercial license) | UIB
Norwegian-Vietnamese digital dictionary | 1 | X | Not available | UIB
Acquis communautaire | 2 | X | Not available | UIB
Leksikografisk bokmålskorpus | 2 | X | Not available | UIB
Milterm | 2 | X | Not available | UIB
NHH Termbase | 2 | X | Not available | UIB
UHR's Termbase for Norwegian higher education institutions | 2 | ✓ | http://github.com/clarino/uhrtermlists/ | UIB
Stadsnamnsamlinga | 2 | X | Not available | UIB
Translation Corpus Aligner 2 | 2 | X | Not available | UIB
The Norwegian-Spanish Parallel Corpus | 2 | X | Not available | UIB
Det nynorske tekstkorpuset | 3 | 3 | Planned to be made available in Batch 3 | UIB
International Computer Archive of Modern and Medieval English | 3 | 3 | Planned to be made available in Batch 3 | UIB
n-grams for Norwegian bokmål and nynorsk | 3 | ✓ | Available, see alternative resources | UIB
Norwegian newspaper corpus | 3 | ✓ | http://www.nb.no/sbfil/tekst/norsk_aviskorpus.zip | UIB
Norwegian reference corpus for bokmål and nynorsk | 3 | 3 | Planned to be made available in Batch 3 | UIB
Norwegian wordnet | 3 | ✓ | Available, see alternative resources | UIB
Was not initially planned in D2.4
n-gram for Norwegian Bokmål (based on NNC and NST news text) | | ✓ | http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar | UIB
n-gram for Norwegian Bokmål (based on NNC) | | ✓ | http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar | UIB
n-gram for Norwegian Bokmål (based on NST news text) | | ✓ | http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar | UIB
n-gram for Danish (based on the NST text corpus) | | ✓ | http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar | UIB
n-gram for Norwegian Nynorsk (based on NNC and NST) | | ✓ | http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar | UIB
n-gram for Swedish (based on the NST Text Corpus) | | ✓ | http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar | UIB
NoWaC - Norwegian Web as Corpus | | ✓ | http://omilia.uio.no/swamp/index.php | UIB
Frequency lists (tokens) from NoWaC - Norwegian Web as Corpus | | ✓ | http://omilia.uio.no/swamp/index.php | UIB
Frequency lists (lemmas) from NoWaC - Norwegian Web as Corpus | | ✓ | http://omilia.uio.no/swamp/index.php | UIB
Sofie Parallel Treebank | | ✓ | http://iness.uib.no | UIB
The Wordnet for Norwegian Bokmål | | ✓ | http://www.nb.no/sbfil/leksikalske_databaser/ordnett_nob_0.2.zip | UIB
The Wordnet for Norwegian Nynorsk | | ✓ | http://www.nb.no/sbfil/leksikalske_databaser/ordnett_nno_0.2.zip | UIB
The Norwegian Language Council's list of names of historical events and persons | | ✓ | http://xn--sprkrdet-c0ac.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Historiske_navn/ | UIB
The Norwegian Language Council's list of names of inhabitants in Norwegian | | ✓ | http://xn--sprkrdet-c0ac.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Innbyggjarnamn/ | UIB
The Norwegian Language Council's dictionary from Norwegian Bokmål to Norwegian Nynorsk | | ✓ | http://xn--sprkrdet-c0ac.no/nb-NO/Sprakhjelp/Raad/Fra_bokmaal_til_nynorsk/ | UIB
The Norwegian Language Council's list of language names in Norwegian | | ✓ | http://xn--sprkrdet-c0ac.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Navn_paa_spraak/ | UIB
The Norwegian Language Council's list of state names in Norwegian | | ✓ | http://xn--sprkrdet-c0ac.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Navn_paa_stater/ | UIB
The Norwegian Language Council's list of names of public departments in Norwegian | | ✓ | http://xn--sprkrdet-c0ac.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Navn-pa-statsorganer/ | UIB
The Norwegian Language Council's list of geographical names in Norwegian | | ✓ | http://www.sprakradet.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Geografiske_namn/ | UIB
The Morphologically Annotated Part of BulTreeBank | | ✓ | http://iness.uib.no | UIB
The Dependency Part of BulTreeBank | | ✓ | http://iness.uib.no | UIB
Resource name | Planned batch for metadata in accordance with D2.4 | Status | Resource availability | Responsible partner
Finnish TreeBank Grammar Definition Corpus | 1 | ✓ | http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/index.shtml | UHEL
Finnish WordNet | 1 | ✓ | http://www.ling.helsinki.fi/en/lt/research/finnwordnet/ | UHEL
Written corpora of old literary Finnish (Vanha kirjasuomi) | 1 | ✓ | http://kaino.kotus.fi/korpus/vks/meta/vks_coll_rdf.xml | UHEL
Corpus of early modern Finnish (Varhaisnykysuomen korpus) | 1 | ✓ | http://kaino.kotus.fi/korpus/1800/meta/1800_coll_rdf.xml | UHEL
Finnish literature classics (Suomalaisen kirjallisuuden klassikoita) | 1 | ✓ | http://kaino.kotus.fi/korpus/klassikot/meta/klassikot_coll_rdf.xml | UHEL
Up-to-date word list of modern Finnish (Ajantasainen nykysuomen sanalista) | 1 | ✓ | http://kaino.kotus.fi/sanat/nykysuomi/ | UHEL
Frequency list of words in written Finnish (Kirjoitetun suomen kielen sanojen taajuuslista) | 1 | ✓ | http://kaino.kotus.fi/sanat/taajuuslista/parole.php | UHEL
Kansainvälinen oppijansuomen korpus (International Corpus of Learner Finnish, ICLFI) | 2 | X | Will not be available | UHEL
Corpus of Conversational Finnish (Keskusteluntutkimuksen arkisto) | 2 | ✓ | Will not be available | UHEL
Open Source (Finnish) Morphology | 2 | ✓ | http://code.google.com/p/omorfi/ | UHEL
Oulu corpus (Language Bank of Finland) | 2 | ✓ | Will not be available | UHEL
Geographic Names Register of the National Land Survey | 2 | ✓ | Will not be available | UHEL
Samples of Spoken Finnish (Suomen kielen näytteitä) | 2 | ✓ | http://lat.csc.fi | UHEL
Helsinki Finite-State Transducer Technology | 2 | ✓ | http://hfst.sourceforge.net/ | UHEL
Finland-Swedish Text Collection (Kielipankki, Language Bank of Finland) | 2 | ✓ | Will not be available | UHEL
Finnish Text Collection (Kielipankki, Language Bank of Finland) | 2 | ✓ | http://www.csc.fi/tutkimus/alat/kielitiede | UHEL
Lemmie | 2 | ✓ | Will not be available | UHEL
Helsinki Corpus | 2 | ✓ | Will not be available | UHEL
Finnish TreeBank | 2 | ✓ | http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/ | UHEL
FinINTAS corpus (includes the FDC - Finnish Dialogue Corpus) | 3 | 3 | Will not be available | UHEL
ProoF Corpus | 3 | 3 | Will not be available | UHEL
UTA Cross-Language Information Retrieval System | 3 | 3 | Will not be available