GTRI Proprietary / Limited Distribution

Quick Start Guide

Analytics Framework

Ashley Scripka Beavers

Architecture

File System

DataLoader API Analytics API Visualization API

MongoDB

Resource Management Layer

Python Algorithms

SmallK

Sentiment Stats

Easy Vis

Special Vis

Custom Dashboard (DiamondEye, MINT)

Ingest Modules

CSV/XLSX

MongoDB

Text Documents

Filters

DateTime

GTRI LEAN

Text Processing

Documentation

• All interfaces: docs/
    • INGEST_INTERFACE.py
    • FILTER_INTERFACE.py
    • ANALYTIC_INTERFACE.py
    • VISUALIZATION_INTERFACE.py
• Utility command-line application: configure.py
    • Usage: python configure.py --api {ingest, filters, analytics, visualization} --filename [filename of filter without .py extension] --mode {add, remove}
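
The usage line above can be illustrated with a small argparse sketch. This is not the actual configure.py, whose internals are not shown here; it only mirrors the three flags and their listed choices. Treating every flag as required, the help strings, and the file name my_filter are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flags in the usage line (illustrative only)."""
    parser = argparse.ArgumentParser(prog="configure.py")
    parser.add_argument(
        "--api", required=True,
        choices=["ingest", "filters", "analytics", "visualization"],
        help="API whose module list is being changed")
    parser.add_argument(
        "--filename", required=True,
        help="module file name without the .py extension")
    parser.add_argument(
        "--mode", required=True, choices=["add", "remove"],
        help="register (add) or unregister (remove) the module")
    return parser


# Hypothetical invocation: register a filter stored in my_filter.py
args = build_parser().parse_args(
    ["--api", "filters", "--filename", "my_filter", "--mode", "add"])
print(args.api, args.filename, args.mode)
```

Restricting --api and --mode with choices makes argparse reject any value outside the sets shown in the usage line before other code runs.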

Repository Structure

analytics-framework
    dataloader: DataLoader API
    conf: Apache web server configuration file for default installation
    analytics: Analytics API
    visualization: Visualization API
    minimal: Sample data optionally used during installation
    docs: Interface documentation and templates
    configure.py: Command-line application for adding/removing modules from the APIs

Repository Structure

analytics-framework
    dataloader
        data: Location for storing source files and matrix files
        python: Python files for the associated API
        wsgi: WSGI configuration file
        js: JavaScript files, used for DetectDatatype exploration of MongoDB

Repository Structure

analytics-framework
    dataloader
        data
        python
            DataLoader: Python module for DataLoader functionality
            DataLoaderAPIv01.py: Flask file for controlling API endpoints
        wsgi
        js

Repository Structure

analytics-framework
    dataloader
        data
        python
            DataLoader
                filters: Location of all Filter modules; when adding a new Filter, the file goes here
                ingest: Location of all Ingest modules; when adding a new Ingest, the file goes here
                Various other helper files
            DataLoaderAPIv01.py
        wsgi
        js

Repository Structure

analytics-framework
    analytics
        data: Location for storing results files
        python: Python files for the associated API
        wsgi: WSGI configuration file

Repository Structure

analytics-framework
    analytics
        data
        python
            Analytics: Python module for Analytics functionality
            AnalyticsAPIv01.py: Flask file for controlling API endpoints
        wsgi

Repository Structure

analytics-framework
    analytics
        data
        python
            Analytics
                algorithms: Location of all Analytics modules; when adding a new analytic, the file goes here
                Various other helper files
            AnalyticsAPIv01.py
        wsgi

Repository Structure

analytics-framework
    visualization
        python: Python files for the associated API
        wsgi: WSGI configuration file

Repository Structure

analytics-framework
    visualization
        python
            Visualization: Python module for Visualization functionality
            VisualizationAPIv01.py: Flask file for controlling API endpoints
        wsgi

Repository Structure

analytics-framework
    visualization
        python
            Visualization
                vis: Location of all Visualization modules; when adding a new vis, the file goes here
                Various other helper files
            VisualizationAPIv01.py
        wsgi

Repositories

• Analytics-Framework
    • Main Analytics-Framework repository
    • https://github.gatech.edu/ascripka3/Analytics-Framework-GTRI_PROPRIETARY
• Repository access for DiamondEye
    • https://github.gatech.edu/DiamondEye
    • diamondeye: Main web UI
    • analytics: Submodule in the Analytics API for additional analytics, including SmallK’s HierNMF
    • lean: Submodule in the DataLoader API for text processing
    • smallk: Supporting code for SmallK and Pysmallk (older version which supports Pysmallk)
    • diamondeye-basic: Vagrant box for use in setting up a development environment
        • Follow the “Developer Instructions” or the “User Instructions”
    • diamondeye-bells: Vagrant box with all additional repos for use in setting up dev environment
        • Follow the “User Instructions”

Available libraries

• Python 2.7 dev
• Java 8
• Mongo server
• Python libraries:
    • numpy
    • scipy
    • pandas
    • scikit-learn
    • flask, flask-restful, flask-cors, flask-restplus, django
    • xlrd
    • arrow, parsedatetime, geocoder
    • brewer2mpl, vincent, markdown
    • feedparser
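
A quick way to confirm that the Python libraries listed above are importable in a given environment is sketched below. The helper name check_modules is hypothetical and not part of the framework. The import names are the usual ones for these packages, which differ from the install names in a few cases (scikit-learn imports as sklearn, flask-restful as flask_restful, and so on).

```python
import importlib

# Import names for the libraries on this slide (several differ from the pip names).
IMPORT_NAMES = [
    "numpy", "scipy", "pandas", "sklearn",
    "flask", "flask_restful", "flask_cors", "flask_restplus", "django",
    "xlrd", "arrow", "parsedatetime", "geocoder",
    "brewer2mpl", "vincent", "markdown", "feedparser",
]


def check_modules(names):
    """Return {import_name: True/False} depending on whether the import succeeds."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:  # a missing or broken install both count as unavailable
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_modules(IMPORT_NAMES).items()):
        print("{:<16} {}".format(name, "ok" if ok else "MISSING"))
```

The check only reports whether each import works; it does not verify versions, Java 8, or the Mongo server.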

Repository Structure for Submodules

analytics-framework
    dataloader
    conf
    analytics
        data
        python
            Analytics
                algorithms
                    smallk (separate git repo)
                        __init__.py
                        python
                            __init__.py
                            hiernmf.py
                Various other helper files
            AnalyticsAPIv01.py
        wsgi
    visualization
    minimal
    docs
    configure.py

• When adding submodules, place the module repo within the module directory (‘algorithms’ in this case)
• The submodule must include __init__.py files all the way down
• There must be a file that conforms to the appropriate interface
• When using configure.py to register the new module, include the relative path to the file:
    • python configure.py --api analytics --filename smallk.python.hiernmf --mode add
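
The "__init__.py files all the way down" requirement can be checked before registering a submodule. The sketch below assumes the dotted --filename value mirrors the directory path relative to the module directory, which is how Python packages normally resolve; how configure.py itself resolves the name is not shown here. The helpers missing_init_files and module_file are hypothetical, and the demo builds a throwaway directory rather than touching a real checkout.

```python
import os
import tempfile


def missing_init_files(module_dir, dotted_name):
    """Return package directories along dotted_name that lack an __init__.py."""
    missing = []
    current = module_dir
    for package in dotted_name.split(".")[:-1]:  # last part is the module file
        current = os.path.join(current, package)
        if not os.path.isfile(os.path.join(current, "__init__.py")):
            missing.append(current)
    return missing


def module_file(module_dir, dotted_name):
    """Map a dotted name such as smallk.python.hiernmf to its .py file path."""
    return os.path.join(module_dir, *dotted_name.split(".")) + ".py"


# Demo layout: algorithms/smallk/python/hiernmf.py, one __init__.py left out.
algorithms = os.path.join(tempfile.mkdtemp(), "algorithms")
os.makedirs(os.path.join(algorithms, "smallk", "python"))
open(os.path.join(algorithms, "smallk", "__init__.py"), "w").close()
open(os.path.join(algorithms, "smallk", "python", "hiernmf.py"), "w").close()

before = missing_init_files(algorithms, "smallk.python.hiernmf")
print("missing before fix:", before)

# Add the missing file, then check again.
open(os.path.join(algorithms, "smallk", "python", "__init__.py"), "w").close()
after = missing_init_files(algorithms, "smallk.python.hiernmf")
print("missing after fix:", after)
```

An empty list means every directory between the module directory and the target file is an importable package.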