
  • META-NORD Baltic and Nordic Branch of the European Open Linguistic

    Infrastructure Project no. 270899

    Deliverable 4.4

    Second upload of language resources

    Version No. 1.0

    31/07/2012

  • Contract no. 270899

    D4.4 V1.0 Page 2 of 45

    Document Information

    Deliverable number: D4.4

    Deliverable title: Second upload of language resources

    Due date of deliverable: 31/07/2012

    Actual submission date of deliverable: 31/07/2012

    Main Authors: Jussi Piitulainen, Imre Bartis

    Participants: All

    Internal reviewer: TILDE

    Workpackage: WP4

    Workpackage title: Cross-national collaboration and Pilot service

    Workpackage leader: UHEL

    Dissemination Level: PU

    Version: 1.0

    Keywords: Resources, meta-data

    Meta-data model applied to metadata: the META-SHARE V2.1 metadata model

    History of Versions

    Version | Date | Status | Name of the Author (Partner) | Contributions | Description/Approval Level

    0.3 | 11/07/2012 | Initial draft | UHEL | Jussi Piitulainen, Imre Bartis | Draft

    0.4 | 14/07/2012 | Review Draft | Tilde | Pre final review | Draft

    0.5 | 23/07/2012 | Final draft | UHEL, Tilde | Final review | Final Draft

    1.0 | 31/07/2012 | Final | Tilde | Submitted to PO | Submitted to PO

    EXECUTIVE SUMMARY

    This report describes the second upload of language resources at M18. The second upload contains metadata descriptions of the resources provided by META-NORD partners, complying with the formats agreed by the META-NET projects. Data provided in this second upload are publicly available at http://metashare21.tilde.lv/, http://spraakbanken.gu.se/metashare/, http://metashare.csc.fi/ and http://metashare.ut.ee/.


    Table of Contents

    Abbreviations

    1. Overall summary of the language resources with metadata

    2. List of language resources metadata

    3. Description of the meta-data schema adopted by the consortium

    4. Description of nodes

    5. Licences used in the second batch

    6. Feedback on upload procedure

    7. Concluding remarks

    References

    Appendix A: List of second batch metadata

    Appendix B: Planned and Actual


    Abbreviations

    Table 1 Abbreviations

    Abbreviation Term/definition

    LRT Language resources and tools

    DoW The META-NORD Description of Work document

    CC Creative Commons

    TILDE TILDE SIA (Latvia)

    UCPH Københavns Universitet (Denmark)

    UT Tartu Ülikool (Estonia)

    UIB Universitetet i Bergen Organisasjonsedd (Norway)

    UHEL Helsingin Yliopisto (Finland)

    HI Háskóli Íslands (Iceland)

    LKI Lietuvių Kalbos Institutas (Lithuania)

    UGOT Göteborgs Universitet (Sweden)

    LRT Language Resources and Technologies

    IPR Intellectual Property Rights

    CLARIN Common Language Resources and Technology Infrastructure

    BLARK The Basic Language Resource Kit


    1. Overall summary of the language resources with metadata

    An important aim of META-NORD is to upgrade and harmonize national language

    resources and tools in order to make them widely available and usable, within languages

    and across languages, with respect to their data formats.

    A further central aim is the definition of standardized resource and tool metadata and

    mechanisms for making these metadata harvestable, so that distributed resources and

    tools can be effectively utilized in language technology applications, both in academic

    research and in industry.

    The META-SHARE metadata model and its supporting software have matured

    significantly since the first META-NORD upload in November 2011.

    The four META-SHARE nodes of META-NORD (TILDE, UGOT, UHEL and UT)

    installed the beta (v2.0) of the software and made it available for editing by all partners

    right after it was published by T4ME. The nodes were upgraded to v2.1 in time for the

    second upload.

    Since the synchronization of the nodes is not yet implemented, META-NORD made the

    TILDE node their master repository for the second upload. The partners who edited in the

    other nodes exported their records as XML, uploaded them to the TILDE node, and

    published them there.

    The metadata records from the first META-NORD upload were upgraded to the current

    version of the metadata model. The automatic transformations from v1.1 to v2.0 to v2.1

    were not completely reliable, so some manual labor was required.

    2. List of language resources metadata

    In total, metadata for 174 new resources have been described using the META-SHARE editor. Data provided in this second upload are publicly available at:

    http://metashare21.tilde.lv/

    http://spraakbanken.gu.se/metashare/

    http://metashare.csc.fi/

    http://metashare.ut.ee/

    The metadata of 81 new resources, of which 79 were delivered by UGOT and 2 by UCPH, were semi-automatically generated as XML with manual revision and did not require any work on clearing up issues related to intellectual property rights. The other 93 metadata records were manually added to the META-SHARE editor. Of these 93 records, 52 required work on licensing-related issues, such as giving IPR guidance to the IPR holders, encouraging them to make the distribution agreement flexible, negotiating user terms, and organizing a seminar with the IPR holders on licensing issues.

    Figure 1 Work carried out to meet the META-SHARE metadata schema (general overview): semi-automatically generated metadata, 81; manually generated metadata (all of them required technical work), 93

    The 93 metadata records that were manually added also required technical work, such as developing the tools that were uploaded together with their metadata, converting the LR to standard downloadable formats, gathering information on the LR from various written documentation and from the owner of the LR, giving technical feedback to the META-SHARE developers, promoting the use of standard formats (xml/lmf), and translating the metadata from one language to another.

    Figure 2 IPR and technical work carried out to meet the META-SHARE metadata schema: metadata requiring IPR-related work, 52; metadata that did not require IPR-related work, 41; metadata that required technical work, 93

    For more information on the uploaded metadata and the work carried out to meet the META-SHARE metadata schema, see Appendix A: List of second batch metadata.

    Out of the 174 LRs provided for the second upload, 114 are resources of the consortium, while 60 are from outside the consortium:

    Figure 3 Location of resources (own resources: 114; resources from outside the consortium: 60)

    Most of the LRs provided for the second upload were corpora (111). The other types of resources were lexical resources (6), lexical conceptual (29), corpus-ngrams (9), speech (3), tools (10), lexicons (5) and ontology (1):

    Figure 4 Typology of resources
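    The counts quoted in this section can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only figures stated in the text and charts of this section; the dictionary keys and function name are illustrative labels chosen here, not META-SHARE field names.

```python
# Counts quoted in section 2 of this report; the dictionary keys are
# illustrative labels, not META-SHARE field names.
TOTAL_RESOURCES = 174

BREAKDOWNS = {
    "generation": {"semi-automatic": 81, "manual": 93},
    "location": {"own": 114, "outside consortium": 60},
    "typology": {
        "corpora": 111,
        "lexical resources": 6,
        "lexical conceptual": 29,
        "corpus-ngrams": 9,
        "speech": 3,
        "tools": 10,
        "lexicons": 5,
        "ontology": 1,
    },
}

# Split of the 93 manually added records by IPR-related work.
MANUAL_RECORDS = 93
IPR_SPLIT = {"IPR work": 52, "no IPR work": 41}


def check_counts():
    """Return a mapping from breakdown name to whether it sums correctly."""
    results = {
        name: sum(parts.values()) == TOTAL_RESOURCES
        for name, parts in BREAKDOWNS.items()
    }
    results["ipr split"] = sum(IPR_SPLIT.values()) == MANUAL_RECORDS
    return results


if __name__ == "__main__":
    for name, ok in check_counts().items():
        print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

    Each breakdown is checked against the total it is supposed to partition, so a transcription error in any single figure would show up as a mismatch.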


    3. Description of the meta-data schema adopted by the consortium

    The META-SHARE metadata model is documented in the META-SHARE support

    community’s internet forum.1

    In the model, language resources are considered to be of four overall types listed below.

    The creation of a record in the META-SHARE editor starts with the selection of one of

    these types. The editor supports specifically the description of a resource of the chosen

    type:

    * corpus (including written/text, oral/spoken, multimodal/multimedia corpora),

    * lexical / conceptual resource (including terminological resources, word lists,

    semantic lexica, ontologies etc.),

    * language description (including grammars),

    * tool / service (including basic processing tools, applications, web services etc.

    required for processing data resources).

    Administrative information is the same for all resource types, e.g. the name of the

    resource, the description of the resource, contact and location information, origins,

    availability, and licensing.

    Language resources may consist of different types of media. The model and the editor

    provide separate support for the description of the following media types where

    applicable:

    * text (+textNumerical, textNgram),

    * audio,

    * image,

    * video.

    The details include the languages of the resource, the size of the resource in various units,

    annotation information, validation, format standards to which the resource conforms,

    actual and foreseen use, and technicalities like character sets and operating environment.
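    The overall shape of a record described above (one resource type, shared administrative information, and one or more media parts) can be pictured with a small data structure. This is only a sketch under stated assumptions: the class and attribute names below are hypothetical and do not reproduce the element names of the actual META-SHARE XML schema, and the example record is invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ResourceType(Enum):
    """The four overall resource types listed in this section."""
    CORPUS = "corpus"
    LEXICAL_CONCEPTUAL = "lexical / conceptual resource"
    LANGUAGE_DESCRIPTION = "language description"
    TOOL_SERVICE = "tool / service"


class MediaType(Enum):
    """Media types with separate support in the model and editor."""
    TEXT = "text"
    TEXT_NUMERICAL = "textNumerical"
    TEXT_NGRAM = "textNgram"
    AUDIO = "audio"
    IMAGE = "image"
    VIDEO = "video"


@dataclass
class MediaPart:
    # Hypothetical subset of the per-media details (languages, size).
    media_type: MediaType
    languages: List[str] = field(default_factory=list)
    size: str = ""


@dataclass
class ResourceRecord:
    # Administrative information shared by all resource types.
    name: str
    description: str
    resource_type: ResourceType
    licence: str = ""
    media_parts: List[MediaPart] = field(default_factory=list)

    def media_types(self) -> List[str]:
        """Return the sorted media type names used by this record."""
        return sorted(part.media_type.value for part in self.media_parts)


# Invented example record, for illustration only.
example = ResourceRecord(
    name="Example corpus",
    description="A made-up text corpus used to illustrate the structure.",
    resource_type=ResourceType.CORPUS,
    licence="CC_BY-SA_3.0",
    media_parts=[MediaPart(MediaType.TEXT, languages=["et"])],
)
```

    Choosing the resource type first, as the editor does, corresponds here to fixing `resource_type` before any media parts are described.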

    The metadata model is implemented as an XML schema and a supporting editor that is

    accessed through a web browser. Records can be created, published and maintained in a

    META-SHARE node using the editor. Anyone can browse and search the published

    records. The editor can also export selected records as XML documents. Conforming

    XML records can be uploaded to a META-SHARE node through the editor.

    The META-SHARE XML schema itself is available at: http://metashare.ilsp.gr/META-XMLSchema/v2.1/META-SHARE-Resource.xsd.
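    Before uploading an exported record, it can be useful to confirm that the file is at least well-formed XML and to look at the names it carries. The sketch below uses only the Python standard library, which can parse XML but cannot validate against an XSD; full validation against the schema above would need a separate validating tool. The sample record and the element names `resourceInfo`, `identificationInfo` and `resourceName` are assumptions made for illustration and should be checked against the schema itself.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified record; element names are assumptions
# for illustration and are not guaranteed to match the real schema.
SAMPLE_RECORD = """
<resourceInfo>
  <identificationInfo>
    <resourceName lang="en">Example corpus</resourceName>
    <resourceName lang="sv">Exempelkorpus</resourceName>
  </identificationInfo>
</resourceInfo>
"""


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def resource_names(xml_text):
    """Map each language code to the resource name given in that language."""
    root = ET.fromstring(xml_text)
    return {el.get("lang"): el.text for el in root.iter("resourceName")}


if __name__ == "__main__":
    print(is_well_formed(SAMPLE_RECORD))
    print(resource_names(SAMPLE_RECORD))
```

    A well-formedness check of this kind only catches broken markup; it says nothing about whether mandatory fields are present.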

    4. Description of nodes

    All four META-NORD nodes (TILDE, UGOT, UHEL and UT) were upgraded to version 2.1 soon after it was released in May 2012. Version 2.1 is the first proper release of the software, following the beta release 2.0 that was published in March as open source2.

    1 See http://metashare.ilsp.gr/portal/knowledgebase/OverviewOfTheMetadataModel and http://metashare.ilsp.gr/portal/knowledgebase/DetailedPresentationOfTheModel.

    2 See https://github.com/metashare/META-SHARE.

    Version 3 of the META-SHARE software is expected to provide, within a few months, the means to make the nodes part of a synchronized network. This will make the tedious and error-prone manual copying of records between the nodes unnecessary (export/upload actually works relatively well, but it creates copies).

    TILDE’s META-SHARE node3:

    Figure 5 TILDE’s META-SHARE node

    5. Licences used in the second batch

    Out of the 174 language resources whose metadata were provided in the second upload, only one, the “STO-LMF, morphology” lexicon, is available in a widely known resource database, namely that of ELDA4. The LDC Corpus Catalog5 does not list any of the resources described in the second upload by META-NORD.

    3 http://metashare21.tilde.lv/.

    4 http://catalog.elra.info/.

    5 http://www.ldc.upenn.edu/Catalog/.


    Figure 6 Licences used in the second batch

    As illustrated in the diagram above, of the 174 resources of the second batch only 12 have licences that are still under negotiation, while the licences of the others are settled. The most popular licence is CC_BY-SA_3.0, while CC_BY, CC_BY-NC-SA_3.0 and GPL are also well represented. This indicates that interest in sharing resources in the spirit of open data is strong in the META-NORD network.

    6. Feedback on upload procedure

    The second upload was performed in a smooth and efficient manner, due to the great

    improvements to the META-SHARE software.

    The metadata of the first and second uploads are publicly available at:

    http://metashare21.tilde.lv/

    http://spraakbanken.gu.se/metashare/

    http://metashare.csc.fi/

    http://metashare.ut.ee/

    The META-SHARE metadata model and its accompanying editor software are beginning to function as promised. The META-SHARE developers have been active and helpful during and after the beta cycle. All META-NORD partners had access to the editor in time and were able to create the required descriptions for the second upload.

    The ability to export metadata records from the META-SHARE editor as XML documents and to upload them again has made it possible to move records between the current META-NORD nodes with minimal need for human communication. These facilities will remain useful once synchronized networking arrives in future versions of META-SHARE, since they make it easy to share resource descriptions with the wider world.

    UiB in particular raised the issue of collaborative editing. People need to be able to help each other, and also to control their own metadata descriptions. In v2, editing rights mean the right to edit anything, so collaboration is possible at the moment, but a considerable element of trust is involved. A solution is needed so that many content owners can be let in to describe and share their own resources, individually and collaboratively, without too much fear of accidentally (or even maliciously) stepping on other people's toes. The developers have acknowledged this issue.

    Many minor issues remain while the software continues to be refined. For example,

    several people were confused when the editor would not make their metadata record

    "published" because it had not been "ingested" first, and in a few cases some lost partial

    records without receiving any warning message. Luckily the latter issue was quickly

    corrected in the source repository.
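    The confusion about publishing can be pictured as a simple ordered workflow in which a record must be "ingested" before it can be "published". The sketch below illustrates that rule only; it is not the editor's implementation. The status names "ingested" and "published" come from the text above, while the initial status name "internal" and the function names are assumptions made for this example.

```python
# Assumed ordering of record statuses; "internal" is a hypothetical
# name for the initial state of a newly created record.
STATUSES = ("internal", "ingested", "published")


def advance(status):
    """Return the next status in the assumed workflow."""
    index = STATUSES.index(status)
    if index == len(STATUSES) - 1:
        raise ValueError("record is already published")
    return STATUSES[index + 1]


def publish(status):
    """Publish a record, refusing if it has not been ingested first."""
    if status != "ingested":
        raise ValueError(
            f"cannot publish a record in status {status!r}; ingest it first"
        )
    return "published"


if __name__ == "__main__":
    status = "internal"
    status = advance(status)   # internal -> ingested
    status = publish(status)   # ingested -> published
    print(status)
```

    An explicit error message of this kind at the point of refusal is what the affected users were missing.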

    The META-SHARE editor provides in-line documentation that is generally helpful,

    together with a link to an extensive description of the whole model in detail (see chapter 3

    of the present report).

    The current release of the META-SHARE software provided by T4ME shows substantial improvements over the one available for the first upload. T4ME has taken the feedback given by META-NORD partners into consideration. The collaboration between the two projects continues: META-NORD partners are reporting shortcomings of the META-SHARE software that should be fixed for the next version of the editor.

    A recurrent minor problem has been the disappearance of metadata descriptions from the presentation side (browsing and searching) of META-SHARE nodes (at least at UHEL and TILDE). Sometimes the nodes also report an internal error instead of showing a description, and sometimes the permission to delete a record is denied. Such problems need to be reported to the developers (on GitHub6) and addressed by them.

    7. Concluding remarks

    The second upload was very successful. In all, metadata for 174 language resources were provided, compared with the 127 envisioned in D2.4, “Selection of resources, agreements, detailed work plan”. Appendix B, Planned and Actual, provides an overview of the status of the envisioned resources, the metadata descriptions actually provided, and the location of the resources. The number of META-SHARE nodes also increased from three to four with the release of the UT node on 28 June 2012. For the next data upload, four additional META-SHARE nodes are planned.

    The META-SHARE model of describing language resources, together with the

    supporting software, has matured significantly during the first half of 2012. Now the

    system appears to be usable in individual nodes and also provides the means to move

    descriptions in and out of such nodes as XML documents. Minor problems are

    continuously solved.

    UiB has expressed reservations about the expressiveness of the metadata model. There

    seem to be objects of interest for language researchers in the humanities whose

    description would benefit from a more flexible content model than the one META-

    SHARE has created (CLARIN has one). Perhaps the META-SHARE software could even

    be adapted to such needs.

    The most significant omission in the software turned out to be the management of editing

    rights. There is a need for individual metadata descriptions to be both owned and shared

    by people responsible for them.

    The most significant planned feature of the software is without any doubt the synchronization of META-SHARE nodes. This feature is expected to be usable in time for the third upload. The META-NORD partners should endeavour to encounter the inevitable initial problems early and help the developers solve them by sending feedback.

    6 https://github.com/metashare/META-SHARE.

    The META-NORD consortium hopes that META-SHARE will be the basis of a strong

    and constantly growing community keen on sharing both language resources and their

    respective metadata.

    References

    [1] C. Federmann, B. Georgantopoulos, R. del Gratta, O. Hamon, B. Magnini, D. Mavroeidis, S. Piperidis, M. Schroeder, M. Speranza. META-NET Deliverable D7.1.1 – META-SHARE Functional and Technical Specification, 2011.

    [2] S. Piperidis. The META-SHARE Language Resources Sharing Infrastructure: Principles, Challenges, Solutions. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, 36-42, 2012.


    Appendix A: List of second batch metadata

    Resource name | Resource type | Work carried out to meet the META-SHARE meta-data schema

    Tilde

    Accurat Comparable Corpora | corpus | Metadata manually added using the META-SHARE editor v2.1. Clearing up licensing issues: IPR free resource.

    Corpus of Latvian literature | corpus | Metadata manually added using the META-SHARE editor v2.1. Clearing up licensing issues. Content of corpus is prepared for download: content is transformed from proprietary format into XML, UTF-8 encoding is used.

    EASTIN-CL Multilingual Ontology of Assistive Technology | ontology | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1. To validate the data, removed "und" language tags, updated other language codes from the three-letter ISO 639 standard (e.g. "eng" English) to the two-letter standard (e.g. "en" English).

    Estonian-Latvian dictionary | LexicalConceptual | Updated the batch 1 metadata to conform with the requirements of the new META-SHARE editor v2.1 and to conform with the information added to the batch 2 upload.

    EuroTermBank | LexicalConceptual | Updated the batch 1 metadata to conform with the requirements of the new META-SHARE editor v2.1 and to conform with the information added to the batch 2 upload.

    Latvian-English Ngram corpus, Legislation of Republic of Latvia | corpus | Metadata manually added using the META-SHARE editor v2.1. Clearing up licensing issues. Content of corpus is updated according to standards and prepared for download: content is transformed from proprietary format into TMX, UTF-8 encoding is used.

    Latvian-Lithuanian dictionary | LexicalConceptual | Updated the batch 1 metadata to conform with the requirements of the new META-SHARE editor v2.1 and to conform with the information added to the batch 2 upload.

    Lithuanian-Latvian dictionary | LexicalConceptual | Updated the batch 1 metadata to conform with the requirements of the new META-SHARE editor v2.1 and to conform with the information added to the batch 2 upload.

    Latvian-Russian Person Names Glossary | LexicalConceptual | Metadata manually added using the META-SHARE editor v2.1. Clearing up licensing issues: MSCommons_BY-NC-ND. Content of corpus is updated according to standards and prepared for download: content is transformed from proprietary format into TMX, UTF-8 encoding is used.
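    The language-code cleanup described for the EASTIN-CL ontology record (dropping "und" tags and replacing three-letter ISO 639 codes with their two-letter equivalents) can be sketched as follows. The mapping covers only a handful of languages relevant to this report and would need extending for other data; the function name is illustrative and is not part of the META-SHARE software.

```python
# Partial mapping from three-letter to two-letter ISO 639 codes.
# Only a few languages are listed; extend as needed.
THREE_TO_TWO_LETTER = {
    "eng": "en",
    "lav": "lv",
    "lit": "lt",
    "est": "et",
    "fin": "fi",
    "swe": "sv",
    "dan": "da",
    "nor": "no",
    "isl": "is",
    "rus": "ru",
}


def normalize_language_tags(tags):
    """Drop 'und' tags and convert known three-letter codes to two letters.

    Codes that are not in the mapping are kept unchanged (lower-cased),
    so that nothing is silently lost.
    """
    normalized = []
    for tag in tags:
        code = tag.strip().lower()
        if code == "und":
            continue  # undetermined language: removed, as in the report
        normalized.append(THREE_TO_TWO_LETTER.get(code, code))
    return normalized


if __name__ == "__main__":
    print(normalize_language_tags(["eng", "und", "lav", "en"]))
```

    Keeping unknown codes unchanged is a deliberate choice in this sketch: it leaves them visible for manual review instead of guessing a replacement.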

    Resource name | Resource type | Work carried out to meet the META-SHARE meta-data schema

    UCPH

    DanNet | Lexicon | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1. Added documentation in English.

    Finnish-Danish Linked Wordnets | Lexicon | Assigned metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1.

    Danish-Swedish Linked Wordnets | lexicon | Assigned metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1.

    Finnish Estonian Linked Wordnets | lexicon | Assigned metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1.

    Finnish-Swedish Linked Wordnets | Lexicon | Assigned metadata to conform with the requirements and possibilities of the new META-SHARE editor v2.1.

    STO-LMF | Lexical resource | Updated the batch 1 metadata to conform with the requirements of the new META-SHARE editor v2.1 and to conform with the information added to the batch 2 upload.

    Copenhagen Danish-English Dependency Treebank | Corpus | Semi-automatically generated with manual revision

    Copenhagen Dependency Treebank | Corpus | Semi-automatically generated with manual revision

    Resource name | Resource type | Work carried out to meet the META-SHARE meta-data schema

    UT

    The Estonian Reference Corpus | corpus | Checking and specifying the information required for filling in the META-SHARE editor’s mandatory fields, clearing up licensing issues; the latter is still in progress.

    Estonian Corpus with shallow syntactic annotation | corpus | Checking and specifying the information required for filling in the META-SHARE editor’s mandatory fields, clearing up licensing issues; the latter is still in progress.

    Morphosyntactic disambiguator and shallow parser for Estonian | tool | Manually added using the META-SHARE editor v2.1, gathering information from various written documentation.

    Corpus of Institute of the Estonian Language | corpus | Manually added using the META-SHARE editor v2.1

    Dictionary of Standard Estonian ÕS 2006 | LexicalConceptual | Manually added using the META-SHARE editor v2.1

    English-Estonian Machine Translation Dictionary | LexicalConceptual | Manually added using the META-SHARE editor v2.1

    Estonian Emotional Speech Corpus | corpus | Manually added using the META-SHARE editor v2.1

    Estonian-Russian Dictionary | LexicalConceptual | Manually added using the META-SHARE editor v2.1

    Morphological Toolset for Estonian | tool | Manually added using the META-SHARE editor v2.1

    Estonian WordNet | LexicalConceptual | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor.

    The database of Estonian multi-word expressions | LexicalConceptual | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor.

    Corpus of morphologically disambiguated Estonian texts | Corpus | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor.

    Estonian-English parallel corpus | Corpus | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor.

    Estonian Treebank | Corpus | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor.

    Semantically disambiguated corpus of Estonian | Corpus | Updated the batch 1 metadata to conform with the requirements and possibilities of the new META-SHARE editor.


    Resource name Resource

    Type

    Work carried out to meet the META-SHARE

    meta-data schema

    UIB

    Norwegian

    newspaper corpus Corpus

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information collected from

    written documentation; IPR matters resolved

    through contact with the distributors (Språkbanken -

    The National Library of Norway).

    Norwegian wordnet LexicalConcep

    tual

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information collected from

    written documentation; IPR matters resolved

    through contact with the distributors (Språkbanken -

    The National Library of Norway).

    n-grams for

    Norwegian bokmål

    and nynorsk

    corpus -

    ngrams

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information collected from

    written documentation; IPR matters resolved

    through contact with the distributors (Språkbanken -

    The National Library of Norway).

    The Norwegian

    Language Council's

    dictionary from

    Norwegian Bokmål

    to Nynorsk

    LexicalConcep

    tual

    The resource only existed in HTML at the

    Norwegian Language Council's (LC) webpages.

    META-NORD/UIB has arranged for Språkbanken

    (the National Library of Norway) to make the

    resource freely downloadable. META-NORD/UIB

    manually added metadata using the META-SHARE

    editor v2.1, including technical feedback to the

    META-SHARE developers. Information collected

    from the Norw. Language Council. META-

    NORD/UIB has promoted the use of standard

    formats (xml/lmf) and encouraged the IPR holder

    and Distributor to make the distribution agreement

    as flexible (open and extendable) as possible. Some

    of the lists have already been converted to

    downloadable format through the treebanking

    project INESS, and are made available to the

    distributor.

    The Norwegian

    Language Council's

    list over language

    names in

    Norwegian

    LexicalConcep

    tual

    The resource only existed in HTML at the

    Norwegian Language Council's (LC) webpages.

    META-NORD/UIB has arranged for Språkbanken

    (the National Library of Norway) to make the

    resource freely downloadable. META-NORD/UIB

    manually added metadata using the META-SHARE

    editor v2.1, including technical feedback to the

    META-SHARE developers. Information collected

    from the Norw. Language Council. META-

    NORD/UIB has promoted the use of standard

  • Contract no. 270899

    D4.4 V1.0 Page 17 of 45

    Resource name Resource

    Type

    Work carried out to meet the META-SHARE

    meta-data schema

    UIB

    formats (xml/lmf) and encouraged the IPR holder

    and Distributor to make the distribution agreement

    as flexible (open and extendable) as possible. Some

    of the lists have already been converted to

    downloadable format through the treebanking

    project INESS, and are made available to the

    distributor.

    The Norwegian

    Language Council's

    list over

    geographical names

    LexicalConcep

    tual

    The resource only existed in HTML at the

    Norwegian Language Council's (LC) webpages.

    META-NORD/UIB has arranged for Språkbanken

    (the National Library of Norway) to make the

    resource freely downloadable. META-NORD/UIB

    manually added metadata using the META-SHARE

    editor v2.1, including technical feedback to the

    META-SHARE developers. Information collected

    from the Norw. Language Council. META-

    NORD/UIB has promoted the use of standard

    formats (xml/lmf) and encouraged the IPR holder

    and Distributor to make the distribution agreement

    as flexible (open and extendable) as possible. Some

    of the lists have already been converted to

    downloadable format through the treebanking

    project INESS, and are made available to the

    distributor.

    The Norwegian

    Language Council's

    list over names of

    historical events

    and persons

    LexicalConceptual

    The resource only existed in HTML at the

    Norwegian Language Council's (LC) webpages.

    META-NORD/UIB has arranged for Språkbanken

    (the National Library of Norway) to make the

    resource freely downloadable. META-NORD/UIB

    manually added metadata using the META-SHARE

    editor v2.1, including technical feedback to the

    META-SHARE developers. Information collected

    from the Norw. Language Council. META-

    NORD/UIB has promoted the use of standard

    formats (xml/lmf) and encouraged the IPR holder

    and Distributor to make the distribution agreement

    as flexible (open and extendable) as possible. Some

    of the lists have already been converted to

    downloadable format through the treebanking

    project INESS, and are made available to the

    distributor.

    The Norwegian

    Language Council's

    list over names in

    Norwegian of

    inhabitants

    LexicalConceptual

    The resource only existed in HTML at the

    Norwegian Language Council's (LC) webpages.

    META-NORD/UIB has arranged for Språkbanken

    (the National Library of Norway) to make the

    resource freely downloadable. META-NORD/UIB

    manually added metadata using the META-SHARE

    editor v2.1, including technical feedback to the

    META-SHARE developers. Information collected

    from the Norw. Language Council. META-

    NORD/UIB has promoted the use of standard

    formats (xml/lmf) and encouraged the IPR holder

    and Distributor to make the distribution agreement

    as flexible (open and extendable) as possible. Some

    of the lists have already been converted to

    downloadable format through the treebanking

    project INESS, and are made available to the

    distributor.

    The Norwegian

    Language Council's

    list over names in

    Norwegian of

    public departments

    LexicalConceptual

    The resource only existed in HTML at the

    Norwegian Language Council's (LC) webpages.

    META-NORD/UIB has arranged for Språkbanken

    (the National Library of Norway) to make the

    resource freely downloadable. META-NORD/UIB

    manually added metadata using the META-SHARE

    editor v2.1, including technical feedback to the

    META-SHARE developers. Information collected

    from the Norw. Language Council. META-

    NORD/UIB has promoted the use of standard

    formats (xml/lmf) and encouraged the IPR holder

    and Distributor to make the distribution agreement

    as flexible (open and extendable) as possible. Some

    of the lists have already been converted to

    downloadable format through the treebanking

    project INESS, and are made available to the

    distributor.

    The Norwegian

    Language Council's

    list over names in

    Norwegian of states

    LexicalConceptual

    The resource only existed in HTML at the

    Norwegian Language Council's (LC) webpages.

    META-NORD/UIB has arranged for Språkbanken

    (the National Library of Norway) to make the

    resource freely downloadable. META-NORD/UIB

    manually added metadata using the META-SHARE

    editor v2.1, including technical feedback to the

    META-SHARE developers. Information collected

    from the Norw. Language Council. META-

    NORD/UIB has promoted the use of standard

    formats (xml/lmf) and encouraged the IPR holder

    and Distributor to make the distribution agreement

    as flexible (open and extendable) as possible. Some

    of the lists have already been converted to

    downloadable format through the treebanking

    project INESS, and are made available to the

    distributor.

    n-gram for Danish

    (based on the NST

    text corpus)

    Corpus -

    ngrams

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information collected from

    written documentation; IPR matters resolved

    through contact with the distributors (Språkbanken -

    The National Library of Norway).

    n-gram for

    Norwegian Bokmål

    (based on NNC)

    Corpus -

    ngrams

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information collected from

    written documentation; IPR matters resolved

    through contact with the distributors (Språkbanken -

    The National Library of Norway).

    n-gram for

    Norwegian Bokmål

    (based on NST

    news text)

    Corpus -

    ngrams

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information collected from

    written documentation; IPR matters resolved

    through contact with the distributors (Språkbanken -

    The National Library of Norway).

    n-gram for

    Norwegian Bokmål

    (based on NNC and

    NST news text)

    Corpus -

    ngrams

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information collected from

    written documentation; IPR matters resolved

    through contact with the distributors (Språkbanken -

    The National Library of Norway).

    n-gram for

    Norwegian

    Nynorsk (based on

    NNC and NST)

    Corpus -

    ngrams

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information collected from

    written documentation; IPR matters resolved

    through contact with the distributors (Språkbanken -

    The National Library of Norway).

    n-gram for Swedish

    (based on the NST

    Text Corpus)

    Corpus -

    ngrams

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information collected from

    written documentation; IPR matters resolved

    through contact with the distributors (Språkbanken -

    The National Library of Norway).

    Dependency Part of

    BulTreeBank Corpus

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information collected through

    a meeting with the resource provider and from

    written documentation. META-NORD added the

    resource to the INESS treebank infrastructure.

    The

    Morphologically

    Annotated Part of

    BulTreeBank

    Corpus

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information collected through

    a meeting with the resource provider and from

    written documentation. META-NORD added the

    resource to the INESS treebank infrastructure.

    Acoustic database

    for Danish Speech

    Updated the batch 1 metadata to conform with the

    requirements and possibilities of the new META-

    SHARE editor v2.1 (among other things, the v2.1

    allows adding more than one RestrictionOfUse, and

    UIB had to update language codes from the three-

    letter ISO 639 standard [e.g. "eng" English] to the

    two-letter standard [e.g. "en" English]).

    Acoustic database

    for Norwegian Speech

    Updated the batch 1 metadata to conform with the

    requirements and possibilities of the new META-

    SHARE editor v2.1 (among other things, the v2.1

    allows adding more than one RestrictionOfUse, and

    UIB had to update language codes from the three-

    letter ISO 639 standard [e.g. "eng" English] to the

    two-letter standard [e.g. "en" English]).

    Acoustic database

    for Swedish Speech

    Updated the batch 1 metadata to conform with the

    requirements and possibilities of the new META-

    SHARE editor v2.1 (among other things, the v2.1

    allows adding more than one RestrictionOfUse, and

    UIB had to update language codes from the three-

    letter ISO 639 standard [e.g. "eng" English] to the

    two-letter standard [e.g. "en" English]).

    Lexical database

    for Danish

    LexicalConceptual

    Updated the batch 1 metadata to conform with the

    requirements and possibilities of the new META-

    SHARE editor v2.1 (among other things, the v2.1

    allows adding more than one RestrictionOfUse, and

    UIB had to update language codes from the three-

    letter ISO 639 standard [e.g. "eng" English] to the

    two-letter standard [e.g. "en" English]).

    Lexical database

    for Norwegian

    LexicalConceptual

    Updated the batch 1 metadata to conform with the

    requirements and possibilities of the new META-

    SHARE editor v2.1 (among other things, the v2.1

    allows adding more than one RestrictionOfUse, and

    UIB had to update language codes from the three-

    letter ISO 639 standard [e.g. "eng" English] to the

    two-letter standard [e.g. "en" English]).

    Lexical database

    for Swedish

    LexicalConceptual

    Updated the batch 1 metadata to conform with the

    requirements and possibilities of the new META-

    SHARE editor v2.1 (among other things, the v2.1

    allows adding more than one RestrictionOfUse, and

    UIB had to update language codes from the three-

    letter ISO 639 standard [e.g. "eng" English] to the

    two-letter standard [e.g. "en" English]).

    Norsk ordbank,

    bokmål

    LexicalConceptual

    Updated the batch 1 metadata to conform with the

    requirements and possibilities of the new META-

    SHARE editor v2.1 (among other things, the v2.1

    allows adding more than one RestrictionOfUse, and

    UIB had to update language codes from the three-

    letter ISO 639 standard [e.g. "eng" English] to the

    two-letter standard [e.g. "en" English]).

    Norsk ordbank,

    nynorsk

    LexicalConceptual

    Updated the batch 1 metadata to conform with the

    requirements and possibilities of the new META-

    SHARE editor v2.1 (among other things, the v2.1

    allows adding more than one RestrictionOfUse).

    UIB discussed language codes with the META-

    SHARE developers and, among other things, had to

    update language codes from the three-letter ISO

    639 standard [e.g. "eng" English] to the two-letter

    standard [e.g. "en" English].

    SCARRIE lexicon LexicalConceptual

    Updated the batch 1 metadata to conform with the

    requirements and possibilities of the new META-

    SHARE editor v2.1 (among other things, the v2.1

    allows adding more than one RestrictionOfUse).

    UIB discussed language codes with the META-

    SHARE developers and, among other things, had to

    update language codes from the three-letter ISO

    639 standard [e.g. "eng" English] to the two-letter

    standard [e.g. "en" English].

    TRIS Spanish-

    German parallel

    corpus v0.1

    Corpus

    Updated the batch 1 metadata to conform with the

    requirements and possibilities of the new META-

    SHARE editor v2.1. UIB is negotiating the location

    of distribution of this resource as well as licensing;

    distribution through ELDA is among the options

    under discussion.

    Oslo-Bergen tagger tool

    Updated the batch 1 metadata to conform with the

    requirements and possibilities of the new META-

    SHARE editor v2.1.

    Frequency lists

    (tokens) from

    NoWaC -

    Norwegian Web as

    Corpus

    Corpus -

    ngrams

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information, including IPR

    matters, collected through direct contact with

    resource providers and from written documentation.

    Frequency lists

    (lemmas) from

    NoWaC -

    Norwegian Web as

    Corpus

    Corpus -

    ngrams

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information, including IPR

    matters, collected through direct contact with

    resource providers and from written documentation.

    NoWaC -

    Norwegian Web as

    Corpus

    Corpus

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. Information, including IPR

    matters, collected through direct contact with

    resource providers and from written documentation.

    TRIS Spanish-

    German parallel

    corpus v0.2

    Corpus

    Manually added using the META-SHARE editor

    v2.1, including technical feedback to the META-

    SHARE developers. UIB is negotiating the

    location of distribution of this resource as well as

    licensing; distribution through ELDA is among the

    options under discussion.

    UHR's termbase for

    Norwegian higher

    education

    institutions

    LexicalConceptual

    The resource existed in searchable form at the

    Norwegian Association of Higher Education

    Institutions (UHR). UIB assisted UHR in choosing

    a license, signed a Depositor's agreement with

    UHR. UIB has converted the term base from an

    implicitly structured HTML table into the standard

    TBX format and is distributing the TBX files

    through Github since META-SHARE does not yet

    offer a server for data uploads. Manually added

    using the META-SHARE editor v2.1, including

    technical feedback to the META-SHARE

    developers.

    Sofie monolingual

    treebank Corpus

    Updated the batch 1 metadata to conform with the

    requirements and possibilities of the new META-

    SHARE editor v2.1 (RestrictionsOfUse, language

    codes). UIB has also used the recent META-

    SHARE legal templates and re-negotiated the user

    terms for the novel "Sofies verden" [Sophie's

    world]. We now have a formal clearance from the

    IPR holder, and are in the process of getting a

    formal Depositor's agreement (including the license

    conditions) from the author of the novel.

    Sofie multilingual

    treebank Corpus

    Updated the batch 1 metadata to conform with the

    requirements and possibilities of the new META-

    SHARE editor v2.1 (RestrictionsOfUse, language

    codes). UIB is in the process of renegotiating the

    user terms for the translations of the original novel

    in Norwegian, "Sofies verden" [Sophie's world]: we

    are sending formal requests to publishing houses,

    based on the successful clearance of the original.

    The aim is to make everybody sign a META-

    SHARE Depositor's agreement.
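    Several UIB entries above mention updating language codes from the
    three-letter ISO 639 form to the two-letter form. A minimal sketch of
    such a conversion is given below. The mapping is an illustrative subset
    chosen for this example, and the function name to_two_letter is
    hypothetical; neither is taken from the META-SHARE editor itself.

```python
# Illustrative subset of ISO 639 three-letter -> two-letter codes.
# This table is an assumption made for the example, not a META-SHARE artefact.
ISO639_3_TO_1 = {
    "eng": "en",  # English
    "dan": "da",  # Danish
    "swe": "sv",  # Swedish
    "nor": "no",  # Norwegian
    "nob": "nb",  # Norwegian Bokmål
    "nno": "nn",  # Norwegian Nynorsk
    "fin": "fi",  # Finnish
    "isl": "is",  # Icelandic
    "lit": "lt",  # Lithuanian
    "lav": "lv",  # Latvian
    "est": "et",  # Estonian
    "deu": "de",  # German
    "spa": "es",  # Spanish
    "bul": "bg",  # Bulgarian
}


def to_two_letter(code):
    """Return the two-letter code for a three-letter code in the table.

    Raises ValueError for codes the table does not cover, so that unknown
    values are reviewed by hand instead of being passed through silently.
    """
    key = code.strip().lower()
    try:
        return ISO639_3_TO_1[key]
    except KeyError:
        raise ValueError(f"no two-letter code known for {code!r}") from None


if __name__ == "__main__":
    for old in ("eng", "nob", "nno"):
        print(old, "->", to_two_letter(old))
```

    Failing loudly on unknown codes is a deliberate choice in this sketch:
    not every three-letter code has a two-letter counterpart, so such cases
    need a manual decision.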

    Resource name Resource

    Type

    Work carried out to meet the META-SHARE

    meta-data schema

    UHEL

    International

    Corpus of Learner

    Finnish (ICLFI)

    Corpus

    Seminar with the members of the corpus project,

    gathering the information required for filling in the

    META-SHARE editor’s mandatory fields, IPR

    guidance, clearing up licensing issues.

    Helsinki University

    Conversation Data

    Archive

    Corpus

    Gathering the information required for filling in the

    META-SHARE editor’s mandatory fields, IPR

    guidance, clearing up licensing issues.

    Oulu corpus Corpus

    Gathering the information required for filling in the

    META-SHARE editor’s mandatory fields, clearing

    up licensing issues.

    Geographic Names

    Register of the

    National Land

    Survey

    LexicalConceptual

    Gathering the information required for filling in the

    META-SHARE editor’s mandatory fields, IPR

    guidance, clearing up licensing issues, translating

    the metadata from Finnish to English.

    Samples of Spoken

    Finnish Corpus

    Gathering the information required for filling in the

    META-SHARE editor’s mandatory fields, IPR

    guidance, clearing up licensing issues, translating

    the metadata from Finnish to English.

    Finland-Swedish

    Text Collection Corpus

    Gathering the information required for filling in the

    META-SHARE editor’s mandatory fields, clearing

    up licensing issues.

    Finnish Text

    Collection Corpus

    Gathering the information required for filling in the

    META-SHARE editor’s mandatory fields, clearing

    up licensing issues.

    The Helsinki

    Corpus of English

    Texts

    Corpus

    Gathering the information required for filling in the

    META-SHARE editor’s mandatory fields, IPR

    guidance, negotiations and a seminar with the IPR

    holders on licensing issues.

    WWW-Lemmie Tool

    Gathering the information required for filling in the

    META-SHARE editor’s mandatory fields, clearing

    up licensing issues.

    Finnish WordNet LexicalConceptual

    Developing the tool, ensuring that the current

    version is available and functional, filling in the

    META-SHARE editor’s mandatory fields.

    Updating the batch 1 metadata to conform with the

    requirements and possibilities of the v2.1 META-

    SHARE editor.

    Open morphology

    for Finnish Tool

    Developing the tool, ensuring that the current

    version is available and functional, filling in the

    META-SHARE editor’s mandatory fields.

    Helsinki Finite-

    State Transducer

    Technology

    Tool

    Developing the tool, ensuring that the current

    version is available and functional, filling in the

    META-SHARE editor’s mandatory fields.

    FinnTreeBank 2 Tool

    Developing the tool, ensuring that the current

    version is available and functional, filling in the

    META-SHARE editor’s mandatory fields.

    Resource name Resource

    Type

    Work carried out to meet the META-SHARE

    meta-data schema

    HI

    CombiTagger Tool

    Metadata manually added using the META-SHARE

    editor v2.1. IPR resolved through contact with the

    distributor (Reykjavík University).

    IceNLP - Tagger,

    Parser, Lemmatizer Tool

    Metadata manually added using the META-SHARE

    editor v2.1. IPR resolved through contact with the

    distributor (Reykjavík University).

    Apertium-is-en

    Translation System Tool

    Metadata manually added using the META-SHARE

    editor v2.1. IPR resolved through contact with the

    distributor (Reykjavík University).

    Tagged Icelandic

    Corpus (MÍM) Corpus

    Metadata manually added using the META-SHARE

    editor v2.1. IPR resolved through contact with the

    individual copyright holders (work performed by

    The Arni Magnusson Institute for Icelandic Studies

    before META-NORD project started) and the

    distributor (The Arni Magnusson Institute for

    Icelandic Studies). The META-NORD project has

    enabled further work to be performed on the corpus,

    such as finishing (automatic) tagging, converting to

    standard format (TEI-conformant XML-format) and

    some work on the search interface.

    Database of

    Modern Icelandic

    Inflection (DMII)

    LexicalConceptual

    Metadata manually added using the META-SHARE

    editor v2.1. IPR resolved through contact with the

    distributor (The Arni Magnusson Institute for

    Icelandic Studies).

    Icelandic Term

    Bank –

    Terminology

    LexicalConceptual

    Metadata manually added using the META-SHARE

    editor v2.1. IPR resolved through contact with

    individual copyright holders (owners of individual

    terminologies) and the distributor (The Arni

    Magnusson Institute for Icelandic Studies).

    Terminologies will be converted to standard format

    (xml/TBX) and made available for download.

    Distribution will be under the CC BY-SA 3.0 license.

    Íslenskur

    orðasjóður - Large

    Corpus

    Corpus

    Metadata manually added using the META-SHARE

    editor v2.1. IPR resolved through contact with the

    distributor (Deutscher Wortschatz and Erla

    Hallsteinsdóttir).

    The Jensson

    Corpus Corpus

    Metadata manually added using the META-SHARE

    editor v2.1. IPR resolved through contact with the

    distributor (Arnar Jensson).

    The Thor Corpus Corpus

    Metadata manually added using the META-SHARE

    editor v2.1. IPR resolved through contact with the

    distributor (Arnar Jensson).

    The Broadcast

    News RUV-1

    Corpus

    Corpus

    Metadata manually added using the META-SHARE

    editor v2.1. IPR resolved through contact with the

    distributor (Arnar Jensson).
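    The UIB and HI entries above mention converting term lists, in one case
    from an implicitly structured HTML table, into TBX. The following is a
    minimal sketch of that kind of conversion, not the procedure the partners
    actually used. It assumes that the first table row is a header, that each
    column holds the term in one language, and that the language codes are
    supplied by the caller. The output is a simplified TBX-style skeleton: it
    omits the header section a complete TBX file needs and is not validated.
    The names TableRows and html_table_to_tbx are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Qualified name that ElementTree serialises as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_tbx(html, langs):
    """Turn table rows into termEntry elements, one langSet per column."""
    parser = TableRows()
    parser.feed(html)
    parser.close()

    root = ET.Element("martif", {"type": "TBX", XML_LANG: "en"})
    body = ET.SubElement(ET.SubElement(root, "text"), "body")

    # Skip the first row, which is assumed to be the table header.
    for number, row in enumerate(parser.rows[1:], start=1):
        entry = ET.SubElement(body, "termEntry", {"id": f"t{number}"})
        for lang, term in zip(langs, row):
            if not term:
                continue  # no term recorded for this language
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            ET.SubElement(ET.SubElement(lang_set, "tig"), "term").text = term

    return ET.tostring(root, encoding="unicode")
```

    A call such as html_table_to_tbx(page_source, ["nb", "en"]) would return
    the XML as a string; page_source stands for the HTML text of the table.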

    Resource name Resource

    Type

    Work carried out to meet the META-SHARE

    meta-data schema

    LKI

    Geoinformational

    Database of

    Lithuanian

    Toponyms

    LexicalConceptual

    Metadata manually added using the META-SHARE

    editor v2.1. IPR resolved. Conversion to standard

    downloadable formats added.

    Database of

    Lithuanian

    Historical Ethnic

    Place Names

    LexicalConceptual

    Metadata manually added using the META-SHARE

    editor v2.1. IPR resolved. Conversion to standard

    downloadable formats added.

    Database

    Synonymy of

    Lithuanian Terms

    LexicalConceptual

    Metadata manually added using the META-SHARE

    editor v2.1. IPR resolved. Conversion to standard

    downloadable formats added.

    Resource name Resource

    Type

    Work carried out to meet the META-SHARE

    meta-data schema

    UGOT

    Academic texts –

    Humanities corpus

    Semi-automatically generated xml with manual

    revision

    Adult bloggers corpus Semi-automatically generated xml with manual

    revision

    Astra corpus Semi-automatically generated xml with manual

    revision

    August Strindberg's

    letters corpus

    Semi-automatically generated xml with manual

    revision

    August Strindberg's

    novels corpus

    Semi-automatically generated xml with manual

    revision

    Bellman corpus Semi-automatically generated xml with manual

    revision

    Blog mix corpus Semi-automatically generated xml with manual

    revision

    Bonnier novels I

    (1976/77) corpus

    Semi-automatically generated xml with manual

    revision

    Bonniers novels II

    (1980/81) corpus

    Semi-automatically generated xml with manual

    revision

    Corpus for health

    care technical

    language

    corpus Semi-automatically generated xml with manual

    revision

    Corpus Oral de

    Referencia del

    Español

    Contemporáneo

    (SOL)

    corpus Semi-automatically generated xml with manual

    revision

    DiabetologNytt

    (1996–1999) corpus

    Semi-automatically generated xml with manual

    revision

    DN 1987 corpus Semi-automatically generated xml with manual

    revision

    Dramawebben

    (demo) corpus

    Semi-automatically generated xml with manual

    revision

    Essayistic literature

    1970-2011 corpus

    Semi-automatically generated xml with manual

    revision

    Fiction 1970-2011 corpus Semi-automatically generated xml with manual

    revision

    FNB 1999 corpus Semi-automatically generated xml with manual

    revision

    FNB 2000 corpus Semi-automatically generated xml with manual

    revision

    Forskning &

    Framsteg corpus

    Semi-automatically generated xml with manual

    revision

    GP – Två dagar corpus Semi-automatically generated xml with manual

    revision

    GP 1994 corpus Semi-automatically generated xml with manual

    revision

    GP 2001 corpus Semi-automatically generated xml with manual

    revision

    GP 2002 corpus Semi-automatically generated xml with manual

    revision

    GP 2003 corpus Semi-automatically generated xml with manual

    revision

    GP 2004 corpus Semi-automatically generated xml with manual

    revision

    GP 2005 corpus Semi-automatically generated xml with manual

    revision

    GP 2006 corpus Semi-automatically generated xml with manual

    revision

    GP 2007 corpus Semi-automatically generated xml with manual

    revision

    GP 2008 corpus Semi-automatically generated xml with manual

    revision

    GP 2009 corpus Semi-automatically generated xml with manual

    revision

    GP 2010 corpus Semi-automatically generated xml with manual

    revision

    GP 2011 corpus Semi-automatically generated xml with manual

    revision

    Hanken corpus Semi-automatically generated xml with manual

    revision

    Hufvudstadsbladet

    1991 corpus

    Semi-automatically generated xml with manual

    revision

    Hufvudstadsbladet

    1998 corpus

    Semi-automatically generated xml with manual

    revision

    Hufvudstadsbladet

    1999 corpus

    Semi-automatically generated xml with manual

    revision

    Jakobstads Tidning

    1999 corpus

    Semi-automatically generated xml with manual

    revision

    Jakobstads Tidning

    2000 corpus

    Semi-automatically generated xml with manual

    revision

    Källan 2008-2010 corpus Semi-automatically generated xml with manual

    revision

    Lagtexter 1990–

    2000 corpus

    Semi-automatically generated xml with manual

    revision

    Läkartidningen

    medical journal

    1996

    corpus Semi-automatically generated xml with manual

    revision

    Läkartidningen

    medical journal

    1997

    corpus Semi-automatically generated xml with manual

    revision

    Läkartidningen

    medical journal

    1998

    corpus Semi-automatically generated xml with manual

    revision

    Läkartidningen

    medical journal

    1999

    corpus Semi-automatically generated xml with manual

    revision

    Läkartidningen

    medical journal

    2000

    corpus Semi-automatically generated xml with manual

    revision

    Läkartidningen

    medical journal

    2001

    corpus Semi-automatically generated xml with manual

    revision

    LäSBarT corpus Semi-automatically generated xml with manual

    revision

    Meddelanden från

    Åbo Akademi

    2002–2010

    corpus Semi-automatically generated xml with manual

    revision

    Myndighetsprosa

    1990–2000 corpus

    Semi-automatically generated xml with manual

    revision

    Non-fiction 1970-

    2011 corpus

    Semi-automatically generated xml with manual

    revision

    Norstedts novels

    (1999) corpus

    Semi-automatically generated xml with manual

    revision

    Nya Argus 2010–

    2011 corpus

    Semi-automatically generated xml with manual

    revision

    Older Swedish

    novels corpus

    Semi-automatically generated xml with manual

    revision

    ORDAT corpus Semi-automatically generated xml with manual

    revision

    Österbottens

    tidning 2011 corpus

    Semi-automatically generated xml with manual

    revision

    Österbottens

    tidning 2012 corpus

    Semi-automatically generated xml with manual

    revision

    Parole corpus corpus Semi-automatically generated xml with manual

    revision

    Press 65 corpus Semi-automatically generated xml with manual

    revision

    Press 76 corpus Semi-automatically generated xml with manual

    revision

    Press 95 corpus Semi-automatically generated xml with manual

    revision

    Press 96 corpus Semi-automatically generated xml with manual

    revision

    Press 97 corpus Semi-automatically generated xml with manual

    revision

    Press 98 corpus Semi-automatically generated xml with manual

    revision

    Psalm book (1937) corpus Semi-automatically generated xml with manual

    revision

    Smittskydd corpus Semi-automatically generated xml with manual

    revision

    SNP 1978-79 corpus Semi-automatically generated xml with manual

    revision

    Studentbladet 2011 corpus Semi-automatically generated xml with manual

    revision

    Svenskbygden

    2010-2011 corpus

    Semi-automatically generated xml with manual

    revision

    Swedish party

    programs and

    election manifestos

    corpus Semi-automatically generated xml with manual

    revision

    Swedish statute

    book 1978-81 corpus

    Semi-automatically generated xml with manual

    revision

    Swedish Wikipedia

    Corpus corpus

    Semi-automatically generated xml with manual

    revision

    SweWaC corpus Semi-automatically generated xml with manual

    revision

    Syd-Österbotten

    2012 corpus

    Semi-automatically generated xml with manual

    revision

    Syntag treebank corpus Semi-automatically generated xml with manual

    revision

    Talbanken corpus Semi-automatically generated xml with manual

    revision

    Vasabladet 1991 corpus Semi-automatically generated xml with manual

    revision

    Vasabladet 2012 corpus Semi-automatically generated xml with manual

    revision

    Parole+ lexical

    resource

    Semi-automatically generated xml with manual

    revision

    WordNet-SALDO lexical

    resource

    Semi-automatically generated xml with manual

    revision

    Appendix A: Planned and Actual

    Resource name

    Planned

    Batch for

    metadata

    in

    accordance

    with D2.4

    Status Resource availability Responsible partner

    Eurotermbank 1 ✓ http://www.eurotermbank.eu/ TILDE

    Lithuanian-Latvian dictionary 1 ✓ http://lietuviu.letonika.lv TILDE

    Latvian-Lithuanian dictionary 1 ✓ http://www.letonika.lv/lvlt/ TILDE

    Estonian-Latvian dictionary 1 ✓ http://eesti.letonika.lv TILDE

    Latvian-English legislation

    corpus of Republic of Latvia

    (Latvian-English Ngram

    corpus, Legislation of Republic

    of Latvia)

    1

    http://dl.tilde.lv/META-NORD/LegislationNgramCorpusOfTheRepublicOfLatvia.html

    TILDE

    Multilingual dictionary of

    person names 1

    ✓ http://www.letonika.lv/personvardi TILDE

    Tilde’s POS-tagger 3 3 Planned to be made available at Batch 3 TILDE

    Corpus of Latvian literature 1 ✓ http://www.letonika.lv/literatura/ TILDE

    EASTIN-CL multilingual

    ontology 3

    http://dl.tilde.lv/META-NORD/EastinClMultilingualOntologyOfAssistiveTechnology.html

    TILDE

    Latvian Russian person names

    and geo names glossary 2

    http://dl.tilde.lv/META-

    NORD/LvRuPersonNamesGlossary.ht

    ml

    TILDE

    Not initially planned in D2.4

    Accurat Comparable Corpora ✓

    http://dl.tilde.lv/META-

    NORD/AccuratComparableCorpora.ht

    ml

    TILDE

    Resource name Planned Batch for metadata in accordance with D2.4 Status Resource availability Responsible partner

    Danish wordnet, DanNet 1,2,3 ✓ http://wordnet.dk/dannet/dannet/menu?item=2 UCPH

    Cross-lingually linked resources (with FIN and SWE) 2,3 2,3 Planned to be made available at Batch 3 UCPH

    SprogTeknologisk Ordbase 1, 2, 3 ✓ Planned to be made available at Batch 3 UCPH

    Copenhagen Dependency Treebanks 1 http://code.google.com/p/copenhagen-dependency-treebank/wiki/CDT UCPH

    The Copenhagen Danish-English Dependency Treebank 1 http://code.google.com/p/copenhagen-dependency-treebank/wiki/CDT UCPH

    Danish first encounters NOMCO corpus 3 3 Planned to be made available at Batch 3 UCPH


    Reference corpus for Danish - - Not available UCPH

    Corpus of sublanguage texts (2000 – 2010) - - Not available UCPH

    Danish XLE grammar - - Not available UCPH

    CstTokeniser 3 3 Planned to be made available at Batch 3 UCPH

    CstNER 3 3 Planned to be made available at Batch 3 UCPH

    CstTagger 3 3 Planned to be made available at Batch 3 UCPH

    CstLemma 3 3 Planned to be made available at Batch 3 UCPH

    CstKeyExt - - Not available UCPH

    CstNP-Rec - - Not available UCPH

    CstRep - - Not available UCPH

    HPSG grammar - - Not available UCPH

    Not initially planned in D2.4

    STO-LMF, morphology ✓ Not available UCPH

    Danish-Swedish linked wordnets ✓ Not available UCPH

    Finnish-Danish linked wordnets ✓ Not available UCPH

    Finnish-Estonian linked wordnets ✓ Not available UCPH

    Finnish-Swedish linked wordnets ✓ Not available UCPH

    NST Lexical database for Danish ✓ http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar UCPH

    Resource name Planned Batch for metadata in accordance with D2.4 Status Resource availability Responsible partner

    The Estonian Reference Corpus 1 ✓ http://www.cl.ut.ee/korpused/segakorpus/ UT

    Treebank 1 ✓ Planned to be made available at Batch 3 UT

    Estonian WordNet 1 ✓ http://www.cl.ut.ee/ressursid/teksaurus/ UT

    BABEL Estonian Database 3 3 Not available UT

    Corpora of morphologically disambiguated texts 1 ✓ http://www.cl.ut.ee/korpused/morfkorpus/index.php?lang=en UT

    Corpora with shallow syntactic annotation 1 ✓ http://math.ut.ee/~kaili/Korpus/pindmine/ UT

    Corpus of emotional speech 2 ✓ http://peeter.eki.ee:5000/ UT

    Corpus of Institute of Estonian Language 2 ✓ http://en.eki.ee/corpus UT

    Corpus of Spoken Estonian 3 3 Not available UT

    Cross-lingually linked resource 3 3 Planned to be made available at Batch 3 UT

    Dictionaries Estonian-Russian, 2 ✓ http://portaal.eki.ee/dict/evs/ UT

    English-Estonian and Estonian-English parallel corpus 1 ✓ Planned to be made available at Batch 3 UT

    Estonian Foreign Accent Corpus 3 3 Planned to be made available at Batch 3 UT

    Monolingual dictionaries 2 ✓ Not available UT

    Dictionary of Standard Estonian ÕS 2006 2 ✓ http://www.eki.ee/dict/qs UT

    Semantically disambiguated corpus 1 ✓ http://www.cl.ut.ee/korpused/semkorpus/ UT

    The database of Estonian verbal multi-word expressions 1 https://svn.spraakdata.gu.se/repos/metanord/pub/ut/LexicalConceptual/ESTMWE.gz UT

    Estonian text-speech synthesizer 3 3 Planned to be made available at Batch 3 UT

    Morphological analyzer 3 3 Planned to be made available at Batch 3 UT

    Morphological Toolset for Estonian 2 http://eelex.eki.ee/pub/Install/ekiMorfo/ekiMorfoSetup.msi (ver 4.2 executable) ftp://ftp.eki.ee/pub/keeletehnoloogia/morfana/ (ver 3.2 source code, executable) UT

    Morph syntactic disambiguator and shallow parser 2 ✓ http://www.ut.ee/~kaili/grammatika/ UT

    Was not initially planned in D2.4

    English-Estonian Machine Translation Dictionary ftp://ftp.eki.ee/pub/keeletehnoloogia/inglise-eesti/ UT

    Resource name Planned Batch for metadata in accordance with D2.4 Status Resource availability Responsible partner

    Acoustic database for Danish 1 ✓ http://www.nb.no/spraakbanken/tilgjengelege-ressursar/taledatabasar UIB

    Acoustic database for Norwegian 1 http://www.nb.no/spraakbanken/tilgjengelege-ressursar/taledatabasar UIB


    Acoustic database for Swedish 1 ✓ http://www.nb.no/spraakbanken/tilgjengelege-ressursar/taledatabasar UIB

    NST Lexical database for Danish 1,2 http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar UIB

    NST Lexical database for Norwegian 1,2 http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar UIB

    NST Lexical database for Swedish 1,2 http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar UIB

    Norsk ordbank, bokmål 1 ✓ http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar UIB

    Norsk Ordbank, nynorsk 1 ✓ http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar UIB

    Oslo-Bergen tagger 1 ✓ http://tekstlab.uio.no/obt-ny/english/download.html UIB

    SCARRIE lexicon 1,2 ✓ http://www.nb.no/spraakbanken/tilgjengelege-ressursar/leksikalske-databasar UIB

    Sofietrebanken 1 ✓ http://iness.uib.no UIB

    TRIS Spanish-German parallel corpus 1 ✓ Available upon request (commercial license) UIB

    Norwegian-Vietnamese digital dictionary 1 X Not available UIB

    Acquis communautaire 2 X Not available UIB

    Leksikografisk bokmålskorpus 2 X Not available UIB

    Milterm 2 X Not available UIB

    NHH Termbase 2 X Not available UIB

    UHR's Termbase for Norwegian higher education institutions 2 ✓ http://github.com/clarino/uhrtermlists/ UIB

    Stadsnamnsamlinga 2 X Not available UIB

    Translation Corpus Aligner 2 2 X Not available UIB

    The Norwegian-Spanish Parallel Corpus 2 X Not available UIB

    Det nynorske tekstkorpuset 3 3 Planned to be made available at Batch 3 UIB

    International Computer Archive of Modern and Medieval English 3 3 Planned to be made available at Batch 3 UIB

    n-grams for Norwegian bokmål and nynorsk 3 ✓ Available, see alternative resources UIB

    Norwegian newspaper corpus 3 ✓ http://www.nb.no/sbfil/tekst/norsk_aviskorpus.zip UIB

    Norwegian reference corpus for bokmål and nynorsk 3 3 Planned to be made available at Batch 3 UIB

    Norwegian wordnet 3 ✓ Available, see alternative resources UIB


    Was not initially planned in D2.4

    n-gram for Norwegian Bokmål (based on NNC and NST news text) http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar UIB

    n-gram for Norwegian Bokmål (based on NNC) ✓ http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar UIB

    n-gram for Norwegian Bokmål (based on NST news text) ✓ http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar UIB

    n-gram for Danish (based on the NST text corpus) ✓ http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar UIB

    n-gram for Norwegian Nynorsk (based on NNC and NST) http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar UIB

    n-gram for Swedish (based on the NST Text Corpus) ✓ http://www.nb.no/spraakbanken/tilgjengelege-ressursar/tekstressursar UIB

    NoWaC - Norwegian Web as Corpus ✓ http://omilia.uio.no/swamp/index.php UIB

    Frequency lists (tokens) from NoWaC - Norwegian Web as Corpus ✓ http://omilia.uio.no/swamp/index.php UIB

    Frequency lists (lemmas) from NoWaC - Norwegian Web as Corpus ✓ http://omilia.uio.no/swamp/index.php UIB

    Sofie Parallel Treebank ✓ http://iness.uib.no UIB

    The Wordnet for Norwegian Bokmål ✓ http://www.nb.no/sbfil/leksikalske_databaser/ordnett_nob_0.2.zip UIB

    The Wordnet for Norwegian Nynorsk ✓ http://www.nb.no/sbfil/leksikalske_databaser/ordnett_nno_0.2.zip UIB

    The Norwegian Language Council's list over names of historical events and persons http://xn--sprkrdet-c0ac.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Historiske_navn/ UIB

    The Norwegian Language Council's list over names of inhabitants in Norwegian http://xn--sprkrdet-c0ac.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Innbyggjarnamn/ UIB

    The Norwegian Language Council's dictionary from Norwegian Bokmål to Norwegian Nynorsk http://xn--sprkrdet-c0ac.no/nb-NO/Sprakhjelp/Raad/Fra_bokmaal_til_nynorsk/ UIB

    The Norwegian Language Council's list over language names in Norwegian http://xn--sprkrdet-c0ac.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Navn_paa_spraak/ UIB


    The Norwegian Language Council's list over state names in Norwegian http://xn--sprkrdet-c0ac.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Navn_paa_stater/ UIB

    The Norwegian Language Council's list over names of public departments in Norwegian http://xn--sprkrdet-c0ac.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Navn-pa-statsorganer/ UIB

    The Norwegian Language Council's list over geographical names in Norwegian http://www.sprakradet.no/nb-NO/Sprakhjelp/Rettskrivning_Ordboeker/Geografiske_namn/ UIB

    the Morphologically Annotated Part of BulTreeBank ✓ http://iness.uib.no UIB

    the Dependency Part of BulTreeBank ✓ http://iness.uib.no UIB

    Resource name Planned Batch for metadata in accordance with D2.4 Status Resource availability Responsible partner

    Finnish TreeBank Grammar Definition Corpus 1 ✓ http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/index.shtml UHEL

    Finnish WordNet 1 ✓ http://www.ling.helsinki.fi/en/lt/research/finnwordnet/ UHEL

    Written corpora of old literary Finnish (Vanha kirjasuomi) 1 ✓ http://kaino.kotus.fi/korpus/vks/meta/vks_coll_rdf.xml UHEL

    Corpus of early modern Finnish (Varhaisnykysuomen korpus) 1 http://kaino.kotus.fi/korpus/1800/meta/1800_coll_rdf.xml UHEL

    Finnish literature classics (Suomalaisen kirjallisuuden klassikoita) 1 http://kaino.kotus.fi/korpus/klassikot/meta/klassikot_coll_rdf.xml UHEL

    Up-to-date word list of modern Finnish (Ajantasainen nykysuomen sanalista) 1 ✓ http://kaino.kotus.fi/sanat/nykysuomi/ UHEL

    Frequency list of words in written Finnish (Kirjoitetun suomen kielen sanojen taajuuslista) 1 http://kaino.kotus.fi/sanat/taajuuslista/parole.php UHEL

    Kansainvälinen oppijansuomen korpus (ICLFI) 2 X Will not be available UHEL


    Corpus of Conversational Finnish (Keskusteluntutkimuksen arkisto) 2 Will not be available UHEL

    Open Source (Finnish) Morphology 2 ✓ http://code.google.com/p/omorfi/ UHEL

    Oulu corpus (Language Bank Of Finland) 2 ✓ Will not be available UHEL

    Geographic Names Register of the National Land Survey 2 ✓ Will not be available UHEL

    Samples of Spoken Finnish (Suomen kielen näytteitä) 2 ✓ http://lat.csc.fi UHEL

    Helsinki Finite-State Transducer Technology 2 ✓ http://hfst.sourceforge.net/ UHEL

    Finland-Swedish Text Collection (Kielipankki, Language Bank of Finland) 2 ✓ Will not be available UHEL

    Finnish Text Collection (Kielipankki, Language Bank of Finland) 2 http://www.csc.fi/tutkimus/alat/kielitiede UHEL

    Lemmie 2 ✓ Will not be available UHEL

    Helsinki Corpus 2 ✓ Will not be available UHEL

    Finnish TreeBank 2 ✓ http://www.ling.helsinki.fi/kieliteknologia/tutkimus/treebank/ UHEL

    FinINTAS corpus (includes the FDC - Finnish Dialogue Corpus) 3 3 Will not be available UHEL

    ProoF Corpus 3 3 Will not be available UHEL

    UTA Cross-Language Information Retrieval System 3 3 Will not be available