Text mining meets neural nets
Transcript of Text mining meets neural nets
![Page 1: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/1.jpg)
Dan Sullivan
October 21, 2015
Portland, OR
Text Mining Meets Neural Nets: Mining the Biomedical Literature
![Page 2: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/2.jpg)
* Overview
* Introduction to Natural Language Processing and Text Mining
* Linguistic and Statistical Approaches
* Critiquing Classifier Results
* A New Dawn: Deep Learning
* What’s Next
![Page 3: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/3.jpg)
* My Background
* Enterprise Architect, Big Data and Analytics
* Former Research Scientist, bioinformatics institute
* Completing PhD in Computational Biology with focus on text mining
* Author
* Contact: [email protected], @dsapptech, Linkedin.com/in/dansullivanpdx
![Page 4: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/4.jpg)
* Introduction to Natural Language Processing and Text Mining
![Page 5: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/5.jpg)
*“Text is unstructured”
![Page 6: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/6.jpg)
*Unstructured?
![Page 7: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/7.jpg)
Challenges in Text Analysis
* Manual procedures are time-consuming and costly
* Volume of literature continues to grow
* Commonly used search techniques (keyword search, similarity search, metadata filtering, etc.) can still yield volumes of literature that are difficult to analyze manually
* Some success with popular tools, but limitations remain
![Page 8: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/8.jpg)
* Dominant Eras in NLP
* Linguistic (from 1960s)
  * Focus on syntax
  * Transformational Grammar
  * Sentence parsing
* Statistical (from 1990s)
  * Focus on words, n-grams, etc.
  * Statistics and Probability
  * Related work in Information Retrieval
  * Topic Modeling and Classification
* Deep Learning (from ~2006)
  * Focus on multi-layered neural nets computing non-linear functions
  * Light on theory, heavy on engineering
  * Multiple NLP tasks
![Page 9: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/9.jpg)
*Symbolic vs Sub-Symbolic
![Page 10: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/10.jpg)
* Linguistic and Statistical Approaches
http://www.slideshare.net/DanSullivan10/text-mining-meets-neural-nets
![Page 11: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/11.jpg)
*Linguistic Approaches
![Page 12: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/12.jpg)
* Linguistic Approaches - Syntax
Image: http://www.nltk.org/book_1ed/ch08.html
![Page 13: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/13.jpg)
*Linguistic Approaches - Semantics
Stephen H. Chen et al. Physiol. Genomics 2005;22:257-267
![Page 14: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/14.jpg)
*Statistical Approaches
![Page 15: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/15.jpg)
* Statistical Approach: Topic Models
* Technique for identifying dominant themes in a document
* Does not require training
* Multiple Algorithms
  * Probabilistic Latent Semantic Indexing (PLSI)
  * Latent Dirichlet Allocation (LDA)
* Assumptions
  * Documents are about a mixture of topics
  * Words used in a document are attributable to a topic
Source: http://www.keepcalm-o-matic.co.uk/p/keep-calm-theres-no-training-today/
![Page 16: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/16.jpg)
Debt, Law, Graduation
Debt, EU, Greece, Euro
EU, Greece, Negotiations, Varoufakis
Source: http://www.nytimes.com/pages/business/index.html April 27, 2015
![Page 17: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/17.jpg)
* Topic Modeling Techniques
* Topics represented by words; documents about a set of topics
  * Doc 1: 50% politics, 50% presidential
  * Doc 2: 25% CPU, 30% memory, 45% I/O
  * Doc 3: 30% cholesterol, 40% arteries, 30% heart
* Learning Topics
  * Assign each word to a topic
  * For each word and topic, compute:
    * Probability of topic given a document, P(topic|doc)
    * Probability of word given a topic, P(word|topic)
  * Reassign word to a new topic with probability P(topic|doc) * P(word|topic)
  * Reassignment based on probability that topic T generated use of word W
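The reassignment step above can be sketched in a few lines of Python. The probabilities below are invented for illustration; a real LDA implementation (e.g. collapsed Gibbs sampling) would derive them from corpus counts.

```python
# Sketch of the LDA-style reassignment step: a word moves to topic T with
# probability proportional to P(topic|doc) * P(word|topic).

def reassignment_probs(topic_given_doc, word_given_topic):
    """Return normalized reassignment probabilities, one per topic."""
    scores = [p_td * p_wt for p_td, p_wt in zip(topic_given_doc, word_given_topic)]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical values: the document is 70% topic A and 30% topic B, but the
# word "arteries" is far more probable under topic B (0.20 vs. 0.01).
probs = reassignment_probs([0.7, 0.3], [0.01, 0.20])
# The word is most likely reassigned to topic B despite the document prior.
```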
![Page 18: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/18.jpg)
Image Source: David Blei, “Probabilistic Topic Models” http://yosinski.com/mlss12/MLSS-2012-Blei-Probabilistic-Topic-Models/
![Page 19: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/19.jpg)
* Training a Text Classifier
* 3 Key Components
  * Data
  * Representation scheme
  * Algorithms
* Data
  * Positive examples: examples from a representative corpus
  * Negative examples: randomly selected from the same publications
* Representation
  * TF-IDF
  * Vector space representation
  * Cosine of vectors as a measure of similarity
* Algorithms: supervised learning
  * SVMs, Ridge Classifier, Perceptrons, kNN, SGD Classifier, Naïve Bayes, Random Forest, AdaBoost
![Page 20: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/20.jpg)
*Text Classification Process
Source: Steven Bird, Ewan Klein, and Edward Loper. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. http://www.nltk.org/book/
![Page 21: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/21.jpg)
* Representation: TF-IDF
* Term Frequency (TF): tf(t,d) = number of occurrences of term t in document d
* Inverse Document Frequency (IDF): idf(t,D) = log(N / |{d in D : t in d}|), where D is the set of documents and N is the number of documents
* TF-IDF = tf(t,d) * idf(t,D)
* TF-IDF is
  * large when the term is frequent in the document but appears in few documents overall
  * small when the term appears in many documents
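The formulas above translate directly into Python. This is a minimal sketch with an invented three-sentence corpus; real pipelines normalize, tokenize, and smooth more carefully.

```python
import math

def tf(term, doc):
    # tf(t, d): raw count of term t in document d (whitespace tokenization)
    return doc.split().count(term)

def idf(term, docs):
    # idf(t, D) = log(N / |{d in D : t in d}|)
    n_containing = sum(1 for d in docs if term in d.split())
    return math.log(len(docs) / n_containing)

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

docs = ["the gene is a known virulence factor",
        "the cell translocates the protein",
        "virulence genes in the host cell"]
# "the" appears in every document, so idf = log(3/3) = 0 and its TF-IDF
# vanishes; "virulence" appears in only two, so its TF-IDF is positive.
```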
![Page 22: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/22.jpg)
* Sparse Representations

One-Hot Representation:

| word | vector |
| --- | --- |
| The | 1 0 0 0 0 0 0 |
| Esp8 | 0 1 0 0 0 0 0 |
| gene | 0 0 1 0 0 0 0 |
| is | 0 0 0 1 0 0 0 |
| a | 0 0 0 0 1 0 0 |
| known | 0 0 0 0 0 1 0 |
| virulence | 0 0 0 0 0 0 1 |

TF-IDF Representation:

| | translocates | reduced | levels | of | Esp8 | host | cell |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sentence 1 | 0.193 | 0.2828 | 0.078 | 0.0001 | 0.389 | 0.0144 | 0.011 |
| Sentence 2 | 0 | 0.0091 | 0.0621 | 0 | 0 | 0 | 0 |
| Sentence 3 | 0 | 0 | 0 | 0 | 0.028 | 0.0113 | 0 |
| Sentence 4 | 0.021 | 0 | 0 | 0 | 0 | 0 | 0 |
![Page 23: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/23.jpg)
* Representation: Vector Space
* Bag-of-words model
* Ignores structure (syntax) and meaning (semantics) of sentences
* Representation vector length is the number of unique words in the corpus
* Stemming used to remove morphological differences
* Each word is assigned an index in the representation vector, V
* The value V[i] is non-zero if the word appears in the sentence represented by the vector
* The non-zero value is a function of the frequency of the word in the sentence and the frequency of the term in the corpus
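A minimal sketch of the vector space construction described above, using raw counts over an invented two-sentence corpus (no stemming or TF-IDF weighting):

```python
def build_vocab(corpus):
    # Each unique word in the corpus gets an index in the representation vector
    vocab = sorted({w for sent in corpus for w in sent.split()})
    return {w: i for i, w in enumerate(vocab)}

def bow_vector(sentence, vocab):
    # V[i] is non-zero iff word i appears in the sentence; the value here is
    # the raw in-sentence frequency (TF-IDF would rescale it)
    v = [0] * len(vocab)
    for w in sentence.split():
        if w in vocab:
            v[vocab[w]] += 1
    return v

corpus = ["the gene is known", "the host cell"]
vocab = build_vocab(corpus)        # 6 unique words -> vectors of length 6
vec = bow_vector("the gene", vocab)
```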
![Page 24: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/24.jpg)
* Classification Algorithms
* Support Vector Machine (SVM) is a large-margin classifier
* Commonly used in text classification
* Initial results based on a life-sciences sentence classifier
Image Source: http://en.wikipedia.org/wiki/File:Svm_max_sep_hyperplane_with_margin.png
![Page 25: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/25.jpg)
*Critiquing Classifier Results
![Page 26: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/26.jpg)
Virulence Factor (VF) Misclassification Examples

Non-VF, predicted VF:
* “Collectively, these data suggest that EPEC 30-5-1(3) translocates reduced levels of EspB into the host cell.”
* “Data were log-transformed to correct for heterogeneity of the variances where necessary.”
* “Subsequently, the kanamycin resistance cassette from pVK4 was cloned into the PstI site of pMP3, and the resulting plasmid pMP4 was used to target a disruption in the cesF region of EHEC strain 85-170.”

VF, predicted Non-VF:
* “Here, it is reported that the pO157-encoded Type V-secreted serine protease EspP influences the intestinal colonization of calves.”
* “Here, we report that intragastric inoculation of a Shiga toxin 2 (Stx2)-producing E. coli O157:H7 clinical isolate into infant rabbits led to severe diarrhea and intestinal inflammation but no signs of HUS.”
* “The DsbLI system also comprises a functional redox pair.”
![Page 27: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/27.jpg)
Preliminary Results: Training Error

* Adding additional examples is not likely to substantially improve results, as seen by the error curve

[Figure: training error and validation error curves; error (0 to 0.5) vs. number of training examples (0 to 10,000)]
![Page 28: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/28.jpg)
Alternative Supervised Learning Algorithms
* 8 alternative algorithms
* Select 10,000 most important features using chi-square
![Page 29: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/29.jpg)
* Improving Quality
* Given:
  * High-quality data in sufficient quantity
  * State-of-the-art machine learning algorithms
* How to improve results:
  * Increase quantity of data (not always helpful; see error curves)
  * Improve quality of data
  * Utilize multiple supervised algorithms, ensemble and non-ensemble
  * Use unlabeled data and semi-supervised techniques
  * Feature selection
  * Parameter tuning
  * Feature engineering
  * Change representation?
![Page 30: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/30.jpg)
* Representation Schemes
* TF-IDF
  * Loses syntactic and semantic information
  * No relation between term index and meaning
  * No support for disambiguation
  * Feature engineering extends the vector representation or substitutes more general terms for specific ones: a crude way to capture semantic properties
* Ideal Representation
  * Captures semantic similarity of words
  * Does not require feature engineering
  * Requires minimal pre-processing, e.g. no mapping to ontologies
  * Improves precision and recall
![Page 31: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/31.jpg)
*A New Dawn: Deep Learning
![Page 32: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/32.jpg)
* Word Embeddings
* Dense vector representation (n = 50 to 300 or more)
* Captures semantics: similar words are close by the cosine measure
* Captures language features
  * Syntactic relations
  * Semantic relations
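Closeness "by the cosine measure" is the cosine of the angle between two vectors. A minimal sketch with invented 3-dimensional vectors (real embeddings, like the 200-dimensional one on the next slide, are far larger):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot(u, v) / (|u| * |v|); 1.0 means same direction
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented toy "embeddings": two related biomedical terms and an unrelated one
malaria = [0.9, 0.1, 0.2]
plasmodium = [0.85, 0.15, 0.25]
keyboard = [0.1, 0.9, 0.1]
# cosine(malaria, plasmodium) should far exceed cosine(malaria, keyboard)
```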
![Page 33: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/33.jpg)
*Dense Word Representation
[0.160610 -0.547976 -0.444522 -0.037896 0.044305 0.245423 -0.261498 0.000294 -0.275621 -0.021201 -0.432955 0.388905 0.106494 0.405797 -0.159357 -0.073897 0.177182 0.043535 0.600987 0.064762 -0.348964 0.189289 0.650318 0.112554 0.374456 -0.227780 0.208623 0.065362 0.235401 -0.118003 0.032858 -0.309767 0.024085 -0.055148 0.158807 0.171749 -0.153825 0.090301 0.033275 0.089936 0.187864 -0.044472 0.421533 0.209217 -0.142092 0.153070 -0.168291 -0.052823 -0.090984 0.018695 -0.265503 -0.055572 -0.212252 -0.326411 -0.083590 -0.009575 -0.125065 0.376738 0.059734 -0.005585 -0.085654 0.111499 -0.099688 0.147020 -0.419087 -0.042069 -0.241274 0.154339 -0.008625 -0.298928 0.060612 0.216670 -0.080013 -0.218985 -0.805539 0.298797 0.089364 0.071044 0.390878 0.167600 -0.101478 -0.017312 -0.260500 0.392749 0.184021 -0.258466 -0.222133 0.357018 -0.244508 0.221385 -0.012634 -0.073752 -0.409362 0.113296 0.048397 0.000424 0.146018 -0.060891 -0.139045 -0.180432 0.014984 0.023384 -0.032300 -0.161608 -0.188434 0.018036 0.023236 0.060335 -0.173066 0.053327 0.523037 -0.330135 -0.014888 -0.124564 0.046332 -0.124301 0.029865 0.144504 0.163142 -0.018653 -0.140519 0.060562 0.098858 -0.128970 0.762193 -0.230067 -0.226374 0.100086 0.367147 0.160035 0.148644 -0.087583 0.248333 -0.033163 -0.312134 0.162414 0.047267 0.383573 -0.271765 -0.019852 -0.033213 0.340789 0.151498 -0.195642 -0.105429 -0.172337 0.115681 0.033890 -0.026444 -0.048083 -0.039565 -0.159685 -0.211830 0.191293 0.049531 -0.008248 0.119094 0.091608 -0.077601 -0.050206 0.147080 -0.217278 -0.039298 -0.303386 0.543094 -0.198962 -0.122825 -0.135449 0.190148 0.262060 0.146498 -0.236863 0.140620 0.128250 -0.157921 -0.119241 0.059280 -0.003679 0.091986 0.105117 0.117597 -0.187521 -0.388895 0.166485 0.149918 0.066284 0.210502 0.484910 0.396106 -0.118060 -0.076609 -0.326138 -0.305618 -0.297695 -0.078404 -0.210814 0.423335 -0.377239 -0.323599 0.282586]
immune_system
![Page 34: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/34.jpg)
* Learning Word Representation
* Distributional hypothesis: linguistic terms with similar distributions have similar meaning
* Large volume of data
  * Billions of words in context
  * Multiple passes over the data
* Algorithms
  * Word2Vec (CBOW and Skip-gram)
  * GloVe

T. Mikolov et al. “Efficient Estimation of Word Representations in Vector Space.” 2013. http://arxiv.org/pdf/1301.3781.pdf
![Page 35: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/35.jpg)
* Skip-gram predicts surrounding words
Image: https://drive.google.com/file/d/0B7XkCwpI5KDYRWRnd1RzWXQ2TWc
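The (center, context) training pairs that skip-gram learns to predict can be generated with a sliding window; the sentence and window size below are illustrative:

```python
def skipgram_pairs(tokens, window=2):
    # For each position, emit (center, context) pairs for every word within
    # `window` positions of the center -- skip-gram's prediction targets.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("the gene is known".split(), window=1)
# e.g. "gene" predicts its neighbors "the" and "is"
```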
![Page 36: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/36.jpg)
* CBOW predicts the current word
Image: https://drive.google.com/file/d/0B7XkCwpI5KDYRWRnd1RzWXQ2TWc
![Page 37: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/37.jpg)
*Word Similarity - Malaria
![Page 38: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/38.jpg)
*Word Similarity: Alanine (Amino Acid)
![Page 39: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/39.jpg)
*Word Similarity: Leukocyte
![Page 40: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/40.jpg)
*Word Similarity: Shigella
![Page 41: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/41.jpg)
*Analogy I (correct)
Heart : Cardiovascular as Kidney:
![Page 42: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/42.jpg)
*Analogy II (near miss)
Salmonella : Proteobacteria as Staphylococcus :
![Page 43: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/43.jpg)
*Analogy III (miss)
Salmonella : Enterobacteriaceae as Staphylococcus : Staphylococcaceae
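Analogy queries like the ones above are typically answered by vector arithmetic: the answer to a : b :: c : ? is the vocabulary word nearest to b - a + c by cosine. A toy sketch with invented vectors and a hypothetical tiny vocabulary (a real model would supply learned embeddings):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def analogy(a, b, c, vocab):
    # Solve a : b :: c : ? as the nearest neighbor of (b - a + c),
    # excluding the three query words themselves.
    target = [vb - va + vc for va, vb, vc in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], target))

# Invented 3-d vectors, chosen so organ/system direction is the 2nd dimension
vocab = {
    "heart": [1.0, 0.0, 0.9],
    "cardiovascular": [1.0, 1.0, 0.9],
    "kidney": [0.0, 0.0, 0.8],
    "renal": [0.0, 1.0, 0.8],
    "keyboard": [0.5, 0.5, -1.0],
}
answer = analogy("heart", "cardiovascular", "kidney", vocab)
```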
![Page 44: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/44.jpg)
*Quick Intro to Neural Networks
![Page 45: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/45.jpg)
* Feed-forward neural network
Image: http://u.cs.biu.ac.il/~yogo/nnlp.pdf
![Page 46: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/46.jpg)
* Calculating with Neural Nets
https://en.wikibooks.org/wiki/Artificial_Neural_Networks/Activation_Functions
![Page 47: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/47.jpg)
* Key Characteristics
* Non-linear activation function
  * Sigmoid
  * Hyperbolic tangent (tanh)
  * Rectifier (ReLU)
* Word embeddings
* Window size
* Loss function
  * Binary
  * Multiclass
  * Cross-entropy
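The three non-linear activations listed above, written as scalar Python functions:

```python
import math

def sigmoid(x):
    # Squashes any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Hyperbolic tangent: squashes into (-1, 1), zero-centered
    return math.tanh(x)

def relu(x):
    # Rectifier: identity for positive inputs, zero otherwise
    return max(0.0, x)
```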
![Page 48: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/48.jpg)
* Training a Neural Network: Stochastic Gradient Descent
Images: http://u.cs.biu.ac.il/~yogo/nnlp.pdf; http://blog.datumbox.com/tuning-the-learning-rate-in-gradient-descent/
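A minimal sketch of stochastic gradient descent on an invented one-parameter regression (y = w*x with squared-error loss), showing the per-example update w <- w - lr * dLoss/dw that the slide's images illustrate:

```python
# Toy data generated from y = 2*x, so the optimal weight is w = 2
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w, lr = 0.0, 0.05
for _ in range(200):          # epochs
    for x, y in data:         # one gradient step per example (stochastic)
        grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
        w -= lr * grad
# w converges to (approximately) 2.0
```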
![Page 49: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/49.jpg)
* Convolutional Neural Network for Text
Image: https://aclweb.org/anthology/P/P14/P14-2105.xhtml
![Page 50: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/50.jpg)
* Sentence Classification with Convolutional Networks
![Page 51: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/51.jpg)
*What’s Next?
![Page 52: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/52.jpg)
*Survey n-dimensional Word Embedding Space
Image: http://greg.org/archive/2010/07/05/the_planck_all-sky_survey.html
![Page 53: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/53.jpg)
* Formalize a Mathematical Model of Semantics
Image: http://riotwire.com/column/immigrants-socialists-and-semantics-oh-my/
![Page 54: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/54.jpg)
*Tools and References
![Page 55: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/55.jpg)
* Word Embedding Tools
* Word2Vec: command-line tool
* Gensim: Python topic-modeling tool with word2vec module
* GloVe (Global Vectors for Word Representation): command-line tool
![Page 56: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/56.jpg)
* Deep Learning Tools
* Theano: Python CPU/GPU symbolic expression compiler
* Torch: scientific computing framework for LuaJIT
* PyLearn2: Python deep learning platform
* Lasagne: lightweight framework on Theano
* Keras: Python library for working with Theano
* DeepDist: deep learning on Spark
* Deeplearning4j: Java and Scala, integrated with Hadoop and Spark
![Page 57: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/57.jpg)
* References
* Deep Learning Bibliography: http://memkite.com/deep-learning-bibliography/
* Deep Learning Reading List: http://deeplearning.net/reading-list/
* Kim, Yoon. “Convolutional Neural Networks for Sentence Classification.” arXiv preprint arXiv:1408.5882 (2014).
* Goldberg, Yoav. “A Primer on Neural Network Models for Natural Language Processing.” http://u.cs.biu.ac.il/~yogo/nnlp.pdf
![Page 58: Text mining meets neural nets](https://reader033.fdocuments.us/reader033/viewer/2022052318/587147e11a28ab55588b5ce7/html5/thumbnails/58.jpg)
*Q & A