Edanz Journal Selector case study: a prototype based on Solr/Nutch/Hadoop

Post on 28-Aug-2014

description

Presented by Liang Shen, Developer, European Bioinformatics Institute. I'm going to introduce a project I built in 2011: Edanz Journal Selector. It's a tool that helps scholars find the right journals in which to publish their manuscripts. This will be a typical “How We Did It” development case study. We built Edanz Journal Selector on Solr/Lucene/Hadoop/Hive and deployed it on Amazon Web Services. I'm going to share what we learned about architecture, the cloud, and more from this project.

Transcript of Edanz Journal Selector case study: a prototype based on Solr/Nutch/Hadoop

© 2013 LucidWorks

Edanz Journal Selector Case Study: a Prototype based on Solr/Nutch/Hadoop

Liang SHEN @shenzhuxi

European Bioinformatics Institute

Edanz Journal Selector

a Prototype based on Solr/Nutch/Hadoop

English editing for scientists

Help scientists publish papers

Target journal?

Journal Selector

Open Access

PubMed

Journal TOCs

created in 2009

21,498 journals from 1,677 publishers

Institute for Computer Based Learning, Heriot-Watt University

Partner

• Springer Metadata API

Provides metadata for over 5 million online documents

• Springer Open Access API

Provides metadata, full-text content, and images for over 80,000 open access articles

Open Source Stack

• Infrastructure: Amazon Web Services

• Data processing: Hadoop/Hive

• Index: Solr/Lucene

• Web service: Drupal

• Secret Sauce/Custom Works

Infrastructure: Amazon EC2

Data processing (diagram labels: API, Feeds, Web Pages, HDFS, Index)

<script> http://global.js.widget.eja.hk/ja/edanz_ja/w.js </script>

Web service

Embeddable web widget

Split Index for performance

The index can be divided without losing ranking, provided every query always includes a facet field.
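The idea on this slide can be illustrated with a small sketch. It is not the project's Solr configuration: the `subject` facet field, the sample documents, and the toy term-count score are all assumptions made for illustration. The score here depends only on the document and the query, not on corpus-wide statistics, so a query that always filters on the facet returns the same order from the partition as from the whole index.

```python
from collections import defaultdict


def score(query_terms, text):
    """Toy relevance: total occurrences of the query terms in the text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query_terms)


def search(docs, query, facet_value, field="subject", limit=10):
    """Rank the docs matching the facet filter; ties are broken by id."""
    terms = query.lower().split()
    hits = []
    for doc in docs:
        if doc[field] != facet_value:
            continue
        s = score(terms, doc["text"])
        if s > 0:
            hits.append((s, doc["id"]))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits[:limit]]


def split_by_facet(docs, field="subject"):
    """Partition one index into one smaller index per facet value."""
    parts = defaultdict(list)
    for doc in docs:
        parts[doc[field]].append(doc)
    return dict(parts)


# Hypothetical sample data; "subject" stands in for the facet field.
DOCS = [
    {"id": "j1", "subject": "biology", "text": "gene expression in yeast gene networks"},
    {"id": "j2", "subject": "biology", "text": "protein folding and gene regulation"},
    {"id": "j3", "subject": "physics", "text": "expression for the energy of a quantum well"},
]

whole_index_result = search(DOCS, "gene expression", "biology")
partitions = split_by_facet(DOCS)
split_index_result = search(partitions["biology"], "gene expression", "biology")
```

Because each query is routed to exactly one partition, every search scans fewer documents and no results have to be merged across partitions.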

@shenzhuxi

Thanks!

Questions?