Installing Warcbase on OS X and Linux - WordPress.com · Installing Warcbase • Finally, we need...

Installing Warcbase on OS X and Linux Ian Milligan Assistant Professor @ianmilligan1


Page 1

Installing Warcbase on OS X and Linux

Ian Milligan Assistant Professor

@ianmilligan1

Page 2

This Guide

• Installing Warcbase on OS X

• Installing Warcbase on Ubuntu (Linux)

• And a link to the rudimentary Windows instructions

Page 3

This is a supplement to the install docs at http://lintool.github.io/warcbase-docs/Getting-Started/.

Page 4

Installing Warcbase

• Warcbase is a tricky installation!

• http://lintool.github.io/warcbase-docs/Getting-Started/

• These slides walk through the installation on the major platforms.

Page 5

Installing Warcbase

• On OS X, Warcbase requires a few dependencies first

• Install Homebrew: https://brew.sh/

• brew install git

• brew install maven

• brew cask install java
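Before moving on, it can help to confirm the tools actually landed on your PATH. The following is a minimal sketch written as a POSIX sh function; the name `check_deps` is hypothetical and not part of Homebrew or Warcbase, and it only looks commands up without installing anything. One caveat: OS X ships a `/usr/bin/java` stub even when no JDK is installed, so a `java` hit here does not by itself prove a JDK is present.

```shell
#!/bin/sh
# check_deps: hypothetical helper that reports whether each named command
# is on PATH. Prints one line per tool; returns 1 if any are missing.
check_deps() {
  missing=0
  for tool in "$@"; do
    if found_at=$(command -v "$tool" 2>/dev/null); then
      echo "found:   $tool ($found_at)"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_deps git mvn java` should report all three as found once Homebrew has finished (the Maven formula provides the `mvn` command).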

Page 6

Installing Warcbase

• For some reason, on OS X, JAVA_HOME is the bane of my existence. I don’t know why.

• export JAVA_HOME=$(/usr/libexec/java_home)
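Since JAVA_HOME is the usual culprit, a quick sanity check can save some head-scratching. This is a sketch under one assumption: a usable JDK has executable `bin/java` and `bin/javac` under JAVA_HOME. Maven needs `javac` to compile, so a JRE-only path fails the check. The function name `check_java_home` is hypothetical.

```shell
#!/bin/sh
# check_java_home: hypothetical helper. Takes a directory (default: $JAVA_HOME)
# and checks that it contains executable bin/java and bin/javac, i.e. a JDK.
check_java_home() {
  home=${1:-$JAVA_HOME}
  if [ -z "$home" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  for exe in java javac; do
    if [ ! -x "$home/bin/$exe" ]; then
      echo "missing or not executable: $home/bin/$exe" >&2
      return 1
    fi
  done
  echo "JAVA_HOME looks OK: $home"
}
```

Run `check_java_home` right after the `export` line; it either echoes the directory it accepted or names the executable it could not find.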

Page 7

Installing Warcbase

• Now, with Git, Maven, and JAVA_HOME set, you want to install Warcbase

• Clone the repo with:

• git clone http://github.com/lintool/warcbase.git

Page 8

Installing Warcbase

• Now go to the warcbase directory (cd warcbase) and build just warcbase-core:

• mvn clean package -pl warcbase-core -DskipTests

Page 9

Installing Warcbase

• In theory, you should now see the successful build output at right (Maven finishes with BUILD SUCCESS). Hurray!

• If not, please feel free to post your build error as an issue on the GitHub repo (warcbase.org).
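Besides reading the Maven output, you can check that the build actually produced the fat jar that Spark Shell will load. A minimal sketch, assuming the jar keeps the `warcbase-core-<version>-fatjar.jar` naming used in the `spark-shell` command on a later slide; `find_fatjar` is a hypothetical helper name, and it relies on `find -maxdepth`, which GNU and BSD find both support.

```shell
#!/bin/sh
# find_fatjar: hypothetical helper. Given a warcbase checkout (default: .),
# print the path of the warcbase-core fat jar built by
#   mvn clean package -pl warcbase-core -DskipTests
# or return 1 if the build did not produce one.
find_fatjar() {
  target=${1:-.}/warcbase-core/target
  jar=$(find "$target" -maxdepth 1 -name 'warcbase-core-*-fatjar.jar' 2>/dev/null | head -n 1)
  if [ -z "$jar" ]; then
    echo "no fat jar in $target; did the build succeed?" >&2
    return 1
  fi
  echo "$jar"
}
```

From inside the warcbase directory, `find_fatjar` prints the jar path you will later pass to `--jars`.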

Page 10

Installing Warcbase

• Now we need to install “Spark Shell” so we can interface with Warcbase.

• As of April 20th, install this version (Spark 1.6.1) in a different directory.

• wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.1-bin-hadoop2.6.tgz
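The Spark tarball is large, so if you re-run these steps it is handy to skip the download when the file is already there. This sketch uses a hypothetical `fetch_once` helper that writes to a `.part` file first, so an interrupted transfer never looks complete. OS X does not ship `wget`. You can add it with `brew install wget` or adapt the helper to `curl -L -o`; neither choice is part of the original steps.

```shell
#!/bin/sh
# fetch_once: hypothetical helper. Download URL to DEST (default: the last
# path component of URL) unless DEST already exists and is non-empty.
# The transfer goes to DEST.part and is renamed only on success.
fetch_once() {
  url=$1
  dest=${2:-$(basename "$url")}
  if [ -s "$dest" ]; then
    echo "already downloaded: $dest"
    return 0
  fi
  if wget -O "$dest.part" "$url"; then
    mv "$dest.part" "$dest"
  else
    rm -f "$dest.part"
    return 1
  fi
}
```

For example, `fetch_once http://d3kbcqa49mib13.cloudfront.net/spark-1.6.1-bin-hadoop2.6.tgz`. If that CloudFront link no longer resolves, the Apache release archive (`https://archive.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz`) is assumed to carry the same file.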

Page 11

Installing Warcbase

• Now we need to untar it

• tar -xvf spark-1.6.1-bin-hadoop2.6.tgz
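The tarball unpacks into a directory named after the archive. Rather than typing that name by hand, you can read it from the archive listing. This is a minimal sketch with a hypothetical `tar_topdir` helper. It assumes, as with the Spark tarball, that every entry sits under a single top-level folder.

```shell
#!/bin/sh
# tar_topdir: hypothetical helper. Print the top-level directory of a
# gzipped tarball by listing (not extracting) its first entry.
tar_topdir() {
  tar -tzf "$1" | head -n 1 | cut -d/ -f1
}
```

For example, `cd "$(tar_topdir spark-1.6.1-bin-hadoop2.6.tgz)"` moves into the freshly unpacked Spark directory.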

Page 12

Warning: The next screen has a path – make sure to change it to match your own system!

Page 13

Installing Warcbase

• We are almost ready!

• cd to the Spark directory (spark-1.6.1-bin-hadoop2.6)

• And then run the following command (make sure to point --jars at the warcbase jar)

• ./bin/spark-shell --jars ~/penn/warcbase/warcbase-core/target/warcbase-core-0.1.0-SNAPSHOT-fatjar.jar
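A typo in either path may only surface later, for example as a failed `import` at the Scala prompt, so it can be worth wrapping the command. This hypothetical `warcbase_shell` function is a sketch. It checks that `bin/spark-shell` and the jar both exist, then runs the same `spark-shell --jars` invocation as above.

```shell
#!/bin/sh
# warcbase_shell: hypothetical wrapper. Arguments: the unpacked Spark
# directory and the warcbase-core fat jar. Checks both exist, then runs
# spark-shell with the jar added via --jars.
warcbase_shell() {
  spark_dir=$1
  jar=$2
  if [ ! -x "$spark_dir/bin/spark-shell" ]; then
    echo "no executable bin/spark-shell in $spark_dir" >&2
    return 1
  fi
  if [ ! -f "$jar" ]; then
    echo "warcbase jar not found: $jar" >&2
    return 1
  fi
  "$spark_dir/bin/spark-shell" --jars "$jar"
}
```

With example paths (adjust to your own system): `warcbase_shell ~/spark-1.6.1-bin-hadoop2.6 ~/penn/warcbase/warcbase-core/target/warcbase-core-0.1.0-SNAPSHOT-fatjar.jar`.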

Page 14

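Installing Warcbase

• You should now be at a scala> prompt.

• Type :paste and press enter

• Now paste the script at right, MAKING SURE TO CHANGE THE PATH TO POINT TO WARCBASE-CORE, then press Ctrl+D to run it

import org.warcbase.spark.matchbox._
import org.warcbase.spark.rdd.RecordRDD._

// sc is the SparkContext that spark-shell creates for you
val r = RecordLoader.loadArchives("/Users/ianmilligan1/penn/warcbase/warcbase-core/src/test/resources/arc/example.arc.gz", sc)
  .keepValidPages()                    // keep only valid web pages
  .map(r => ExtractDomain(r.getUrl))   // reduce each page to its domain
  .countItems()                        // count pages per domain
  .take(10)                            // return the first 10 (domain, count) pairs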

Page 15

Installing Warcbase

• Success if you see the results at right!

Page 16

Now for Linux (tested on Ubuntu 14)

Page 17

Installing Warcbase

• Similar to OS X, but with some differences.

• Install git

• sudo apt-get install git

Page 18

Installing Warcbase

• Download Apache Maven manually

• wget http://apache.mirror.gtcomm.net/maven/maven-3/3.5.0/binaries/apache-maven-3.5.0-bin.tar.gz

• Untar it

• tar -xvf apache-maven-3.5.0-bin.tar.gz

Page 19

Installing Warcbase

• Now we need to set Maven variables so we can run it from any directory

• export M2_HOME=/home/ubuntu/apache-maven-3.5.0

• export M2=$M2_HOME/bin

• export PATH=$M2:$PATH

• You may need to change the path in M2_HOME to match where you untarred Maven
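These `export` lines only last for the current terminal session and the programs started from it. To keep them, they can go into a start-up file such as `~/.bashrc`, which interactive bash shells on Ubuntu read. This is a minimal sketch with a hypothetical `append_once` helper. It skips lines that are already present, so the setup can be re-run safely.

```shell
#!/bin/sh
# append_once: hypothetical helper. Append LINE to FILE only if that exact
# line is not already in FILE (whole-line, fixed-string match).
append_once() {
  file=$1
  line=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage with the slide's paths; adjust M2_HOME to where you untarred Maven.
# Single quotes keep $M2_HOME etc. unexpanded until the file is read.
# append_once ~/.bashrc 'export M2_HOME=/home/ubuntu/apache-maven-3.5.0'
# append_once ~/.bashrc 'export M2=$M2_HOME/bin'
# append_once ~/.bashrc 'export PATH=$M2:$PATH'
```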

Page 20

Installing Warcbase

• Finally, we need to install the Java JDK. The following should work on Ubuntu 14 (next slide for 16):

• sudo apt-get install openjdk-7-jdk

• export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

• Again, you may need to change paths (the above was tested in April 2017 on an AWS Ubuntu 14 VM).

Page 21

Installing Warcbase

• Finally, we need to install the Java JDK. The following should work on Ubuntu 16 (last slide for 14):

• sudo add-apt-repository ppa:openjdk-r/ppa

• sudo apt-get update

• sudo apt-get install openjdk-7-jdk

• export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

• Again, you may need to change paths (the above was tested in April 2017 on an AWS Ubuntu 14 VM).
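If the JDK ends up somewhere other than `/usr/lib/jvm/java-7-openjdk-amd64`, you can derive JAVA_HOME from wherever `javac` lives. On Ubuntu, `/usr/bin/javac` is a chain of symlinks (via `/etc/alternatives`) into the JDK's `bin` directory. This sketch uses a hypothetical `java_home_from_javac` helper that follows those links with GNU `readlink -f`, so it targets Linux rather than older OS X releases.

```shell
#!/bin/sh
# java_home_from_javac: hypothetical helper. Resolve a javac binary
# (default: the one on PATH) through all symlinks with GNU readlink -f,
# then strip the trailing /bin/javac to get the JDK directory.
java_home_from_javac() {
  javac_path=${1:-$(command -v javac)}
  if [ -z "$javac_path" ]; then
    echo "javac not found; is a JDK installed?" >&2
    return 1
  fi
  real=$(readlink -f "$javac_path") || return 1
  echo "${real%/bin/javac}"
}

# Usage: export JAVA_HOME="$(java_home_from_javac)"
```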

Page 22

Installing Warcbase

• Phew!

• Now we’re ready to install warcbase. Go back to your home directory (cd ~).

• git clone http://github.com/lintool/warcbase.git

Page 23

Installing Warcbase

• Now let’s build. From the warcbase directory:

• mvn clean package -pl warcbase-core -DskipTests

Page 25

Installing Warcbase

• Hey, we’re looking good now!

• Now let’s download Spark Shell.

• wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.1-bin-hadoop2.6.tgz

• And then untar it with

• tar -xvf spark-1.6.1-bin-hadoop2.6.tgz

Page 26

Installing Warcbase

• Go to the Spark directory (cd spark-1.6.1-bin-hadoop2.6)

• And then run the following, MAKING SURE TO CHANGE THE PATH TO POINT AT THE WARCBASE JAR

• ./bin/spark-shell --jars ~/warcbase/warcbase-core/target/warcbase-core-0.1.0-SNAPSHOT-fatjar.jar

Page 27

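Installing Warcbase

• You should now be at a scala> prompt.

• Type :paste and press enter

• Now paste the script at right, MAKING SURE TO CHANGE THE PATH TO POINT TO WARCBASE-CORE, then press Ctrl+D to run it

import org.warcbase.spark.matchbox._
import org.warcbase.spark.rdd.RecordRDD._

// sc is the SparkContext that spark-shell creates for you
val r = RecordLoader.loadArchives("/home/ubuntu/warcbase/warcbase-core/src/test/resources/arc/example.arc.gz", sc)
  .keepValidPages()                    // keep only valid web pages
  .map(r => ExtractDomain(r.getUrl))   // reduce each page to its domain
  .countItems()                        // count pages per domain
  .take(10)                            // return the first 10 (domain, count) pairs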

Page 28

For Windows, I unfortunately don’t have access to a Windows box.

Instructions are at http://lintool.github.io/warcbase-docs/Getting-Started/

Page 29

Thanks!

Ian Milligan Assistant Professor

@ianmilligan1