Search PubMed with R


Query PubMed titles for systemic lupus erythematosus with the R package RISmed

#Type the following in the R console:
library(RISmed)
lupus <- EUtilsSummary('lupus[Ti] erythematosus[ti] systemic[Ti]', retmax=200)

# retmax sets the maximum number of records to retrieve (see ?EUtilsSummary for the default).

fetch.lupus <- EUtilsGet(lupus)
fetch.lupus

# Results: 
PubMed query: lupus[Ti] AND erythematosus[ti] AND systemic[Ti] Records: 200
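
The query string tags each word with [Ti] so that PubMed matches it against article titles, and the printed query shows the terms joined with AND. A minimal sketch of building such a string in R follows; build_title_query is a hypothetical helper, not part of RISmed, and it writes the AND out explicitly.

```r
# Hypothetical helper (not a RISmed function): tag each term with the
# PubMed title field [Ti] and join the terms with AND.
build_title_query <- function(terms) {
  paste0(terms, "[Ti]", collapse = " AND ")
}

query <- build_title_query(c("lupus", "erythematosus", "systemic"))
print(query)
```

The resulting string can be passed as the first argument of EUtilsSummary in place of the hand-typed query.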

lupus.tit <- ArticleTitle(fetch.lupus)
lupus.tit[1:10] # view the first 10 titles

# export results to text file

write(lupus.tit,file="lupusRISmedTi.txt")
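
To check the export without leaving R, the file can be read back with readLines. The sketch below uses made-up sample titles and a temporary file as stand-ins for lupus.tit and lupusRISmedTi.txt.

```r
# Sample titles standing in for lupus.tit (made up for illustration).
sample.tit <- c("Systemic lupus erythematosus in children",
                "Renal involvement in systemic lupus erythematosus")

out.file <- tempfile(fileext = ".txt")  # temporary file instead of lupusRISmedTi.txt
write(sample.tit, file = out.file)      # character vectors are written one element per line

read.back <- readLines(out.file)        # one element per line of the file
print(length(read.back))                # number of titles in the file
print(identical(read.back, sample.tit)) # TRUE when the round trip preserved the titles
```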

References
1- RISmed package: Stephanie Kovalchik (2013). RISmed: Download content from NCBI databases. R package version 2.1.0.


View results of the exported text file

The write() call above saves the titles, one per line, to lupusRISmedTi.txt. Open that file in Excel or any text editor to check the results.

Find the Title Verb Relation with Reverb

ReVerb [1] is an open information extraction program, distributed as an executable jar, developed by the University of Washington's Turing Center.

• Note that ReVerb depends on Java; it is not an R program.

• ReVerb is powerful and provides useful information about the relation structure of a text. It is relatively easy to use and runs very fast.

• In our case we will apply ReVerb to our exported title results.

Reference:
@inproceedings{ReVerb2011,
  author = {Anthony Fader and Stephen Soderland and Oren Etzioni},
  title = {Identifying Relations for Open Information Extraction},
  booktitle = {Proceedings of the Conference of Empirical Methods in Natural Language Processing ({EMNLP} '11)},
  year = {2011},
  month = {July 27-31},
  address = {Edinburgh, Scotland, UK}
}

Install Reverb
You can download the latest ReVerb jar from

This executable jar file is easy to run from the MS-DOS command prompt.

The ReVerb documentation explains how to use it and provides the following example, which illustrates what it does:

"ReVerb takes raw text as input, and outputs (argument1, relation phrase, argument2) triples. For example, given the sentence "Bananas are an excellent source of potassium," ReVerb will extract the triple (bananas, be source of, potassium)."

In order to run Reverb you need to have Java installed on your computer. You can install Java from

Use of Reverb

Place the reverb-latest.jar file and the result file "lupusRISmedTi.txt" in the same folder.

The figure shows an example of the 2 files in the same folder (which we named Reverb-Java).


1- Open the MS-DOS command prompt (cmd) and change to the folder (Reverb-Java in our example) containing both files: reverb-latest.jar and lupusRISmedTi.txt

2- Type the following cmd line to view results on the console:

java -Xmx512m -jar reverb-latest.jar lupusRISmedTi.txt

Results are displayed in the MS-DOS window.

Use of Reverb - export the results to a file

3- Type the following command line to export the results to a file:

java -Xmx512m -jar reverb-latest.jar lupusRISmedTi.txt > ReverbLupusRISmedTi.txt

(The output file is named ReverbLupusRISmedTi.txt here. You can use another name, or give it an .xls extension, e.g. ReverbLupusRISmedTi.xls, so that it opens directly in Excel; the content is still tab-separated text.)
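
The same export can also be started from R rather than typed into the command window. The sketch below uses base R's system2; reverb_args is a hypothetical helper, and the call is only attempted when Java, the jar and the input file are all found, so nothing runs otherwise.

```r
# Hypothetical helper: the arguments that follow "java" in the command above.
reverb_args <- function(input, jar = "reverb-latest.jar", heap = "512m") {
  c(paste0("-Xmx", heap), "-jar", jar, input)
}

args <- reverb_args("lupusRISmedTi.txt")
print(args)

# Run ReVerb only if Java, the jar and the input file are all available.
if (nzchar(Sys.which("java")) && file.exists("reverb-latest.jar") &&
    file.exists("lupusRISmedTi.txt")) {
  # stdout = <file name> redirects the output, like ">" on the command line
  system2("java", args, stdout = "ReverbLupusRISmedTi.txt")
}
```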

Open the ReVerb result file ReverbLupusRISmedTi.txt with MS Excel.

Reverb output
The Reverb output has 18 columns

(see results in the excel file)
The most interesting are:

Col 3 (Col C): Argument1
Col 4 (Col D): Verb relation phrase
Col 5 (Col E): Argument2

(Col 12 is the confidence that the extraction is correct, and Col 2 is the number of the sentence the extraction came from.)
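
Once the output is in a data frame, these column positions can be used to keep only the more reliable extractions. The following is a minimal sketch on a toy data frame: only the columns described here are filled in, and the confidence values and the 0.5 cutoff are made up for illustration.

```r
# Toy stand-in for the 18-column ReVerb output (columns V1..V18).
toy <- as.data.frame(matrix(NA, nrow = 3, ncol = 18))
toy$V2  <- 1:3                                   # sentence number
toy$V3  <- c("childhood-onset systemic lupus erythematosus",
             "renal involvement", "Prednisone")  # argument 1
toy$V4  <- c("is associated with", "are lower in", "induced")  # relation
toy$V5  <- c("ethnicity", "ACE inhibitor-treated patients",
             "two-way myocardial development")   # argument 2
toy$V12 <- c(0.91, 0.42, 0.77)                   # confidence (made-up values)

min_conf <- 0.5  # hypothetical cutoff
kept <- toy[toy$V12 >= min_conf, c("V2", "V3", "V4", "V5", "V12")]
names(kept) <- c("sentence", "arg1", "relation", "arg2", "confidence")
print(kept)
```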

Reverb Results
Results of the first 5 rows (excel) from columns 3-5

1- childhood-onset systemic lupus erythematosus is associated with ethnicity
2- renal involvement are lower in ACE inhibitor-treated patients
3- Prednisone induced two-way myocardial development
4- Acetylated histones contribute to the immunostimulatory potential of Neutrophil Extracellular Traps
5- clinical practice monitor the impact of systemic lupus erythematosus

Note: Blue refers to argument 1, white to the verb relation, and orange to argument 2.


Prepare Reverb Results data for R Wordcloud

# read the ReVerb output with read.table (settings from reference 1) as follows:
d <- read.table('ReverbLupusRISmedTi.txt', quote='', comment.char='',
                allowEscapes=FALSE, sep='\t', header=FALSE, stringsAsFactors=FALSE)

# read.table already returns a data frame, so data.frame(d) is just a copy
e <- data.frame(d)
# merge columns 3-5 into a single text sentence
f <- paste(e$V3, e$V4, e$V5)
f[1:3] # view the first 3 lines
# [1] "childhood-onset systemic lupus erythematosus is associated with ethnicity"
# [2] "renal involvement are lower in ACE inhibitor-treated patients"
# [3] "Prednisone induced two-way myocardial development"
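
Titles often contain apostrophes or # signs, which read.table treats by default as quote and comment characters; quote='' and comment.char='' switch that off. A small sketch on made-up, five-column data (the real ReVerb file has 18 columns) shows the effect:

```r
# Made-up tab-separated lines containing an apostrophe and a '#'.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "titles.txt\t0\tpatient's age\tis associated with\trisk #1",
  "titles.txt\t1\tPrednisone\tinduced\tremission"
), tmp)

# With quoting and comments disabled, every line is read in full.
safe <- read.table(tmp, sep = "\t", quote = "", comment.char = "",
                   header = FALSE, stringsAsFactors = FALSE)
sentences <- paste(safe$V3, safe$V4, safe$V5)
print(dim(safe))
print(sentences)
```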

Reference:
1- John Mount (December 7th, 2012). Please stop using Excel-like formats to exchange data.


Represent Reverb Results in R Wordcloud

library(tm)
my.corpus <- Corpus(VectorSource(f))
summary(my.corpus)
inspect(my.corpus[1:3])
my.corpus <- tm_map(my.corpus, removeWords, stopwords("english"))
#my.corpus <- tm_map(my.corpus, stemDocument)
# The control argument is cut off in the source, so the tm defaults are used here.
myTdm <- TermDocumentMatrix(my.corpus)
myTdm # print the summary shown below

# A term-document matrix (140 terms, 26 documents)
# Non-/sparse entries: 163/3477
# Sparsity : 96%
# Maximal term length: 22 
# Weighting : term frequency (tf)
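
The summary counts the cells of the matrix: 140 terms by 26 documents gives 3640 cells, of which 163 are non-zero and 3477 are zero, hence the sparsity of about 96%. The same quantities can be sketched in base R on two made-up sentences; this illustrates the idea only and is not how tm builds its matrix.

```r
docs <- c("lupus is associated with ethnicity",
          "lupus damage increased")             # made-up sentences

tokens <- strsplit(tolower(docs), "[^a-z]+")    # split each sentence into words
terms  <- sort(unique(unlist(tokens)))          # vocabulary, one row per term

# Count how often each term occurs in each document (terms x documents).
tdm <- vapply(tokens,
              function(tok) as.integer(table(factor(tok, levels = terms))),
              integer(length(terms)))
dimnames(tdm) <- list(terms, paste0("doc", seq_along(docs)))

print(tdm)
non.zero  <- sum(tdm != 0)                      # non-sparse entries
sparsity  <- 1 - non.zero / length(tdm)         # share of zero cells
term.freq <- sort(rowSums(tdm), decreasing = TRUE)  # total count per term
print(term.freq)
```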


findFreqTerms(myTdm, lowfreq=2)
# [1] "associated" "damage" "distinct" "erythematosus"
# [5] "increased" "independently" "lupus" "systemic"

termFrequency <- rowSums(as.matrix(myTdm))
termFrequency <- subset(termFrequency, termFrequency>=10)
m <- as.matrix(myTdm)
wordFreq <- sort(rowSums(m), decreasing=TRUE) # This yields Word Frequency

library(wordcloud)
library(RColorBrewer) # provides brewer.pal()
set.seed(375)
pal1 <- brewer.pal(6, "Dark2")
wordcloud(words=names(wordFreq), freq=wordFreq, scale=c(2,.9),
          min.freq=1, random.order=FALSE, colors=pal1)

R Wordcloud of Reverb Results