Building a Graph-based Analytics Platform

Post on 21-Apr-2017


(graphs)-[:are]->(everywhere)

Building a graph-based analytics platform

© All Rights Reserved 2014 | Neo Technology, Inc.

@kennybastani

Neo4j Developer Evangelist

Using Meetup as an example use case

Meetup.com is a valuable source of data for understanding trends around products or brands.

Understanding demand is key for delivering compelling content at meetups.

It sounded like a great use case for Neo4j.

The Problem

Track meetup group growth over time.

Apply tags to meetup groups and report combined growth of all groups over time.

Questions

Question #1

Given a start date and an end date, what is the time series that plots the membership growth of a given meetup group?

Question #2

Given a start date, an end date, and a combination of tags, what is the time series that plots the combined membership growth of all meetup groups with those tags?

Question #3

How do you generate the JSON data of a time series for a basic JS line chart plugin?

The Goal

The GraphGist Project

The GraphGist project is a way to quickly build a graph-based proof of concept on Neo4j.

I started with a GraphGist.

Neo4j for Graph Analytics: Meetup.com Example

Graph Data Model

How are groups connected?

How are locations connected?

How are tags/topics connected?

How are stats connected?

How are days connected?

How are weeks connected?

How are months connected?

How are years connected?

Tackling Time in Neo4j

How do you implement a time series in Neo4j?

For any node that represents a unit of time, use a timestamp. Traversals can be costly for selecting a time series. Expose a REST API that accepts a normal date format and converts it to an integer, which lets you select a range of dates in your Neo4j Cypher query.
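As a rough illustration of the date-to-integer idea, the sketch below converts a `YYYY-MM-DD` string to a UTC epoch-millisecond integer and builds a parameterized range query. The `Day` label, the `timestamp` property, and both function names are assumptions made for this example; they are not taken from the project's actual schema or code.

```typescript
// Hypothetical sketch: the Day label and timestamp property are assumed,
// not taken from the project's real data model.

/** Convert a YYYY-MM-DD string to a UTC epoch-millisecond integer. */
function toTimestamp(isoDate: string): number {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDate);
  if (!match) {
    throw new Error("Expected YYYY-MM-DD, got: " + isoDate);
  }
  // Date.UTC takes a zero-based month.
  return Date.UTC(Number(match[1]), Number(match[2]) - 1, Number(match[3]));
}

interface CypherRequest {
  query: string;
  params: { from: number; to: number };
}

/** Build a parameterized Cypher query selecting day nodes in a date range. */
function buildDayRangeQuery(startDate: string, endDate: string): CypherRequest {
  const from = toTimestamp(startDate);
  const to = toTimestamp(endDate);
  if (from > to) {
    throw new Error("startDate must not be after endDate");
  }
  return {
    // $from / $to is the current Cypher parameter syntax;
    // Neo4j 2.x-era queries wrote {from} / {to} instead.
    query:
      "MATCH (day:Day) " +
      "WHERE day.timestamp >= $from AND day.timestamp <= $to " +
      "RETURN day ORDER BY day.timestamp",
    params: { from, to },
  };
}
```

Passing the integers as query parameters, rather than concatenating them into the query string, keeps user-supplied dates out of the Cypher text itself.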

Scale it up!

It started with a GraphGist, and then I said, “Why not? Let’s build something cool using Neo4j.”

Challenges

I decided to take my GraphGist and make a full platform.

There were some challenges.

Challenge #1

How do I get historical Meetup group statistics for all groups?

Challenge #2

How do I handle the data import on a daily basis?

Challenge #3

What kind of reports do I want to create? What do I want to know about Meetup groups?

Challenge #4

How do I safely expose Neo4j to a client-side charting control?

Ask Questions

I decided to start asking some questions about my data model.

What do I want to know?

Assuming I had as much historical Meetup data as I pleased, what kind of questions would I want to ask about that data?

How would I want to present it?

What’s the combined growth percent of Meetup groups having a certain topic?

What’s the cumulative growth of Meetup groups with a specific topic?

What’s the relative growth of Meetup groups with a topic for a date range?

How many groups does a topic have relative to others?

What’s the growth percent of all groups for a topic in a location for a date range?

How do I give users a clean set of controls to filter and search?

Scaling it up

Designing a graph-based analytics platform using Node.js and Neo4j

Architecture

Front-end web-based dashboard in Node.js and Bootstrap

REST API via Neo4j Swagger in Node.js

Data import services in Node.js

Data storage in Neo4j graph database

Applications

Analytics REST API (Node.js): Web

Dashboard (Node.js): Web

Analytics Data Import Scheduler (Node.js): Console

[Architecture diagram: Neo4j (JVM) provides graph data storage, the REST API (Node.js) runs analytical queries, and the Dashboard (Node.js) handles presentation and filtering. The Import Scheduler (Node.js) polls the Meetup API. Flows between components: Filter, Query, Import.]

[Component diagram: the Analytics Dashboard and the Analytics REST API are web apps that visualize and retrieve report data, alongside the Data Import Scheduler.]

REST API

The REST API is a fork of Neo4j Swagger. Swagger is a specification and complete framework implementation for describing, producing, consuming, and visualizing RESTful web services.

Demo

http://meetup-analytics-api.herokuapp.com/

Swagger

The REST API module of this project is based on a fork of Swagger.

The Neo4j Swagger Project

The Swagger project was modified to use Neo4j as its data source. The REST API module of this project extends the Neo4j Swagger project.

REST API Methods

Get Weekly Growth

Get Monthly Growth

Get Monthly Growth By Tag

Get Monthly Growth By Location

Get Cities

Get Countries

Get Group Count By Tag

Get Weekly Growth

Gets the weekly growth percent of meetup groups as a time series. Returns a set of data points containing the week of the year, the meetup group name, and membership count.
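The slides do not spell out how growth percent is calculated. A minimal sketch, assuming it means the percentage change in membership count from one week to the next for a single group, could look like this. The interface and function names are hypothetical.

```typescript
// Hypothetical sketch: assumes "growth percent" is the week-over-week
// percentage change in membership count for one group within one year.

interface MembershipPoint {
  week: number;   // week of the year
  group: string;  // meetup group name
  members: number;
}

interface GrowthPoint extends MembershipPoint {
  growthPercent: number | null; // null when there is no usable previous week
}

function weeklyGrowth(points: MembershipPoint[]): GrowthPoint[] {
  // Sort a copy so the caller's array is left untouched.
  const sorted = points.slice().sort((a, b) => a.week - b.week);
  return sorted.map((point, i) => {
    const previous = i > 0 ? sorted[i - 1] : undefined;
    const growthPercent =
      previous !== undefined && previous.members > 0
        ? ((point.members - previous.members) / previous.members) * 100
        : null;
    return {
      week: point.week,
      group: point.group,
      members: point.members,
      growthPercent,
    };
  });
}
```

The first data point has no predecessor, so its growth is reported as `null` rather than zero, which lets a chart skip it instead of plotting a misleading flat start.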

Get Monthly Growth

Gets the monthly growth percent of meetup groups as a time series. Returns a set of data points containing the month of the year, the meetup group name, and membership count.

Get Monthly Growth By Tag

Gets the monthly growth percent of meetup group tags as a time series. Returns a set of data points containing the month of the year, the meetup group tag name, and membership count.
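One way to picture the tag-level series is as a sum of per-group membership counts for each tag and month. The sketch below is an assumption about that aggregation, not the project's actual query; the row shapes and the `combineByTag` name are hypothetical, and months are assumed to be `YYYY-MM` strings so that they sort lexically.

```typescript
// Hypothetical sketch: sums per-group membership counts into one
// data point per (tag, month). Row shapes are assumed for illustration.

interface GroupMonthCount {
  month: string; // assumed "YYYY-MM"
  tag: string;
  group: string;
  members: number;
}

interface TagMonthCount {
  month: string;
  tag: string;
  members: number;
}

function combineByTag(rows: GroupMonthCount[]): TagMonthCount[] {
  const totals = new Map<string, TagMonthCount>();
  for (const row of rows) {
    // JSON-encode the pair so tags containing separators cannot collide.
    const key = JSON.stringify([row.tag, row.month]);
    const existing = totals.get(key);
    if (existing) {
      existing.members += row.members;
    } else {
      totals.set(key, { month: row.month, tag: row.tag, members: row.members });
    }
  }
  // Order by tag, then chronologically by month.
  return Array.from(totals.values()).sort(
    (a, b) => a.tag.localeCompare(b.tag) || a.month.localeCompare(b.month)
  );
}
```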

Get Monthly Growth By Location

Gets the monthly growth percent of meetup group locations and tags as a time series. Returns a set of data points containing the month of the year, the meetup group tag name, the city, and membership count.

Get Cities

Gets a list of cities that meetup groups reside in. Returns a distinct list of cities for typeahead.

Get Countries

Gets a list of countries that meetup groups reside in. Returns a distinct list of countries for typeahead.

Get Group Count By Tag

Gets a count of groups by tag. Returns a list of tags and the number of groups per tag.

Analytics Dashboard

The dashboard is a web application that uses client-side JavaScript to call the Neo4j Swagger REST API and populate a series of interactive chart controls with data. It uses Bootstrap for the front-end styles and Highcharts for the charting controls.
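To make the chart-population step concrete, here is a small sketch that groups flat data points into the `{ name, data: [[x, y], ...] }` series shape that Highcharts line charts accept. The `DataPoint` shape and the `toLineSeries` name are assumptions for illustration; the real API responses may be structured differently.

```typescript
// Hypothetical sketch: reshapes flat API data points into line-chart series.
// The DataPoint fields are assumed, not taken from the real API.

interface DataPoint {
  timestamp: number; // x value, e.g. epoch milliseconds
  name: string;      // series name, e.g. a group or tag
  members: number;   // y value
}

interface LineSeries {
  name: string;
  data: [number, number][];
}

function toLineSeries(points: DataPoint[]): LineSeries[] {
  const byName = new Map<string, [number, number][]>();
  for (const point of points) {
    const data = byName.get(point.name) || [];
    data.push([point.timestamp, point.members]);
    byName.set(point.name, data);
  }
  const series: LineSeries[] = [];
  // Map iteration follows insertion order, so series appear
  // in the order their names were first seen.
  byName.forEach((data, name) => {
    data.sort((a, b) => a[0] - b[0]); // ascending x for a line chart
    series.push({ name, data });
  });
  return series;
}
```

The result can be serialized with `JSON.stringify({ series: toLineSeries(points) })` and merged into a chart's options object on the client.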

Demo

http://meetup-analytics-dashboard.herokuapp.com/

Reports

Meetup Tag Growth %

Cumulative Meetup Growth

Category Growth %

Groups By Tag

Meetup Tag Growth By Location

Meetup Tag Growth %

https://github.com/kbastani/meetup-analytics/blob/master/docs/DOCS.md#meetup-tag-growth-

Cumulative Meetup Growth

https://github.com/kbastani/meetup-analytics/blob/master/docs/DOCS.md#cumulative-meetup-growth

Category Growth %

https://github.com/kbastani/meetup-analytics/blob/master/docs/DOCS.md#category-growth-

Groups By Tag

https://github.com/kbastani/meetup-analytics/blob/master/docs/DOCS.md#group-count-by-tag

Meetup Tag Growth By Location

Filter & Search

Data Import Scheduler

https://github.com/kbastani/meetup-analytics/blob/master/docs/DOCS.md#data-import-scheduler

GitHub Repository

https://github.com/kbastani/meetup-analytics

Full Documentation

https://github.com/kbastani/meetup-analytics/blob/master/docs/DOCS.md


(Next steps)


Get in touch

Twitter: @kennybastani

LinkedIn: /in/kennybastani

Email: kenny.bastani@neotechnology.com