Keyword search in text
Information:
1. It is very important.
2. Its amount increases very fast.
The problem is:
«How to find the necessary information?»
Simple example
Consider an e-library: gen.lib.rus.ec*, a library of scientific literature. It contains more than 250k books.
*(this is not an advertisement, just an example)
Search “физика” (“physics”): 993 results
Search “закон Ньютона” (“Newton’s law”): 0 results
The question arises:
«How to get the list of keywords from each book?»
I’ll try to answer it in my coursework.
Zipf’s law
Zipf’s law (1940s) is an empirical law.
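In its usual form the law says that a word's frequency is roughly inversely proportional to its rank in the frequency table, so rank times frequency stays about constant. A minimal sketch of that check in Python; the function name `rank_frequency` and the toy word list are assumptions made for illustration and are not taken from the coursework.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, word, count, rank * count) rows, most frequent word first."""
    counts = Counter(words)
    rows = []
    for rank, (word, count) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, count, rank * count))
    return rows

# Toy word list constructed so that it follows the law exactly
words = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
rows = rank_frequency(words)
for row in rows:
    print(row)
```

On real texts the product in the last column only stays approximately constant, and mostly in the middle of the ranking.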
TF-IDF weight
Resulting weight = TF * IDF
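The slide gives only the product, so the sketch below assumes one common variant: TF is the word's share of all words in a text, and IDF is the natural logarithm of the number of texts divided by the number of texts containing the word. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """documents: list of word lists. Returns one {word: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each word occurs
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            word: (count / total) * math.log(n_docs / df[word])  # TF * IDF
            for word, count in counts.items()
        })
    return weights

docs = [
    ["newton", "law", "force", "law"],
    ["ohm", "law", "current"],
]
weights = tf_idf(docs)
print(weights[0])
```

A word that occurs in every text, such as "law" here, gets weight 0 under this variant, no matter how often it is repeated.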
Lemmatisation
Mystem is a program that performs lemmatisation.
It is free for non-commercial use.
By
Algorithm
1. Get a text.
2. For each word:
   – Perform lemmatisation
   – Count its occurrences in the text
3. Get a list of keywords using Zipf’s law.
4. Get a more accurate list of words using TF-IDF.
5. Get the next text.
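The steps above can be sketched roughly as follows. Several things here are assumptions: `lemmatise` is a toy lookup table standing in for a real lemmatiser such as Mystem, and the slides do not say how Zipf's law selects candidates, so the sketch uses one plausible reading (skip the top-ranked words and the rare ones). All function names, parameters, and sample texts are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in for a real lemmatiser such as Mystem
TOY_LEMMAS = {"laws": "law", "forces": "force", "bodies": "body"}

def lemmatise(word):
    word = word.lower()
    return TOY_LEMMAS.get(word, word)

def count_lemmas(text):
    """Step 2: lemmatise every word and count its occurrences."""
    return Counter(lemmatise(w) for w in text.split())

def zipf_candidates(counts, skip_top=1, min_count=2):
    """Step 3 (assumed reading): drop the most frequent ranks and the rare words."""
    ranked = counts.most_common()
    return [w for w, c in ranked[skip_top:] if c >= min_count]

def keywords(texts, top_k=2):
    """Steps 1-5 over a collection; returns [(word, weight), ...] for each text."""
    all_counts = [count_lemmas(t) for t in texts]
    # Number of texts in which each lemma occurs
    df = Counter()
    for counts in all_counts:
        df.update(counts.keys())
    result = []
    for counts in all_counts:
        total = sum(counts.values())
        # Step 4: weight the Zipf candidates by TF * IDF
        scored = [
            (w, counts[w] / total * math.log(len(texts) / df[w]))
            for w in zipf_candidates(counts)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result.append(scored[:top_k])
    return result

texts = [
    "the law the force the laws the forces the body",
    "the current the current the wire the wire the the",
]
result = keywords(texts)
for text_keywords in result:
    print(text_keywords)
```

The thresholds `skip_top` and `min_count` would need tuning on real books; the values here only suit the toy input.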
Algorithm of keywords search
Algorithm of text search by query
Result
The list of keywords (with their weights) for each text
OWL ontology
Classes:
• Library
• Text
• Keyword
Relations:
• Contains
• Has keyword
• Appears in text
• Has TFIDF equals
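To make the classes and relations concrete, here is a small sketch that stores them as subject-predicate-object triples in plain Python rather than in an OWL file. The direction of each relation, the instance names, and the weight value are assumptions made for illustration; the slides list only the names.

```python
# Each triple is (subject, relation, object); all instances are made up
triples = [
    ("library", "contains", "text_1"),
    ("library", "contains", "text_2"),
    ("text_1", "has_keyword", "law"),
    ("law", "appears_in_text", "text_1"),
    ("law", "has_tfidf_equals", 0.14),  # placeholder weight, not a real result
]

def objects(triples, subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]

print(objects(triples, "library", "contains"))
print(objects(triples, "text_1", "has_keyword"))
```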