Boardgamegeek scraping

Boardgames, Webscraping & R Lightning Talk

boardgamegeek.com

• Online community dedicated to all things tabletop-game related

• Includes game reviews, ratings, forums, a trading community, eBay auction links, and collection management

• 80,000+ games in collection

• Also includes BGG XML API2, a back-end API you can use to create rich datasets: https://boardgamegeek.com/wiki/page/BGG_XML_API2
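
As a rough illustration of what the API returns, the sketch below parses a small, made-up XML snippet shaped like a response from the API's "thing" endpoint. The sample data, the column names, and the helpers `get_value` and `parse_game` are assumptions for illustration, and the element names should be checked against the wiki page linked above. It uses xml2 (the parser rvest builds on); a live call would pass a request URL to `read_xml()` instead of a string.

```r
library(xml2)

# Made-up sample shaped like a "thing" response with stats; not real BGG data.
sample_xml <- '<items>
  <item type="boardgame" id="1">
    <name type="primary" value="Example Game"/>
    <minplayers value="2"/>
    <maxplayers value="4"/>
    <playingtime value="45"/>
    <statistics>
      <ratings>
        <usersrated value="1200"/>
        <average value="7.4"/>
      </ratings>
    </statistics>
  </item>
</items>'

# Return the "value" attribute of the first node matching an XPath expression.
get_value <- function(node, xpath) {
  xml_attr(xml_find_first(node, xpath), "value")
}

# Flatten one <item> node into a one-row data frame.
parse_game <- function(item) {
  data.frame(
    id = as.integer(xml_attr(item, "id")),
    name = get_value(item, "./name[@type='primary']"),
    min_players = as.integer(get_value(item, "./minplayers")),
    max_players = as.integer(get_value(item, "./maxplayers")),
    playing_time = as.integer(get_value(item, "./playingtime")),
    users_rated = as.integer(get_value(item, "./statistics/ratings/usersrated")),
    avg_rating = as.numeric(get_value(item, "./statistics/ratings/average")),
    stringsAsFactors = FALSE
  )
}

doc <- read_xml(sample_xml)
items <- xml_find_all(doc, "//item")

# One row per game, ready for filtering with dplyr.
games <- do.call(rbind, lapply(items, parse_game))
print(games)
```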

My Boardgame Collection

What should I get next? Amazon recommendations skewed by casual gamers…

My Purchase Criteria

• Limited set of adult friends who play boardgames…mostly game with people who don’t want to spend time poring over rulebooks

• Mostly play with my kids…but I don’t want to be bored to tears with classic games (Chutes & Ladders, Candyland, etc.)

• Games must have depth, but not get bogged down in long playtimes

• Want “biggest bang for the buck”…games can be expensive

Basic Idea for Project

[Flow diagram. Labels: boardgamegeek, MySQL database, R script, All Games & Game IDs, Stats for all Games, My Game Library, Game Recommendations, shiny app?]

Packages & Tools Used

• rvest for simple scraping and accessing BGG XML API2

• dplyr for filtering of data sets

• SelectorGadget (http://selectorgadget.com/)

Let’s look at some data

• Wanted to pull game IDs from site…but soon discovered that there’s no clean way to get all the game IDs…need to code a spider to crawl the site…will take days to run.

• Grabbed an old snapshot of the site in CSV format to get up and running

• When prototyping…make it work, don’t make it pretty…I can code up a spider to crawl the site later…

Current Feature Implementation

Simple filters on game universe

• 2-6 players

• < 60 minutes max playtime

• no expansion packs
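
The three filters above could be expressed with dplyr roughly as follows. The data frame, its column names, and the games in it are invented for illustration, and "2-6 players" is read here as a supported player range that sits entirely within 2 to 6.

```r
library(dplyr)

# Hypothetical game universe; a real one would come from the site snapshot.
games <- data.frame(
  name = c("Quick Duel", "Epic Campaign", "Family Night",
           "Family Night: Expansion", "Solo Puzzle"),
  min_players = c(2, 2, 2, 2, 1),
  max_players = c(4, 5, 6, 6, 1),
  max_playtime = c(30, 240, 45, 45, 20),   # minutes
  is_expansion = c(FALSE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

candidates <- games %>%
  filter(
    min_players >= 2, max_players <= 6,  # 2-6 players
    max_playtime < 60,                   # under 60 minutes max playtime
    !is_expansion                        # no expansion packs
  )

print(candidates$name)
```

With this sample data only "Quick Duel" and "Family Night" pass all three filters.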

Created a custom ranking mechanism prototype by scaling user rankings for all games
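
The talk does not say which scaling was used. One simple option, assuming "user rankings" refers to each game's average user rating, is min-max scaling onto a 0-1 range and then ranking on the result. The data and the `rescale01` helper below are hypothetical.

```r
library(dplyr)

# Hypothetical average user ratings.
ratings <- data.frame(
  name = c("Game A", "Game B", "Game C"),
  avg_rating = c(6.0, 7.0, 8.0),
  stringsAsFactors = FALSE
)

# Min-max scaling: lowest value maps to 0, highest to 1.
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

ranked <- ratings %>%
  mutate(
    scaled_score = rescale01(avg_rating),
    rank = min_rank(desc(scaled_score))  # rank 1 = highest scaled score
  ) %>%
  arrange(rank)

print(ranked)
```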

Pulled sample stats for a single game for feature coding

Current Personal Game Ranks

Next Steps…

• Serious data cleanup…(e.g. improve weighting of ranks based on number of people ranking)

• Pull game collections by user ID from the site

• Game categorization and ranking logic to pair my collection with possible future purchases

• Link my collections with other users based on compatibility

• Track changes in collections and predict changes
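
For the data cleanup item above (weighting ranks by the number of people ranking), one common approach is a Bayesian-style average that pulls each game's rating toward the overall mean, more strongly when few people have rated it. This is a sketch of that general technique, not the method the project uses; the data, the column names, and the prior weight `m` are assumptions.

```r
library(dplyr)

# Hypothetical ratings with very different numbers of raters.
ratings <- data.frame(
  name = c("Game A", "Game B", "Game C"),
  avg_rating = c(9.0, 8.0, 6.0),
  num_ratings = c(10, 5000, 1000),
  stringsAsFactors = FALSE
)

m <- 100                                  # assumed prior weight, in "virtual" ratings
overall_mean <- mean(ratings$avg_rating)  # prior rating every game is pulled toward

weighted <- ratings %>%
  mutate(
    weighted_rating =
      (num_ratings * avg_rating + m * overall_mean) / (num_ratings + m)
  ) %>%
  arrange(desc(weighted_rating))

print(weighted)
```

Game A has the highest raw average but only 10 ratings, so after weighting it falls below Game B.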