Data Science Innovation: Transforming Instagram Data Into Location Intelligence and Internet of Things
April 2014
linkedin.com/in/sureshsood
Topic Areas
1. Statistics/Data mining or Data Science?
2. Data Science workflows/discovery
3. Research informing our thinking about location intelligence
4. Data Science innovation and exploratory analysis
5. Motivations for Instagram project
6. Pattern mining trajectories/Data mining
7. Instagram analytics tools
8. NoSQL - MongoDB
9. Datafication 3 back end (walk thru)
10. Location Social Recommender system
11. Q&A
Statistics, Data Mining or Data Science?
• Statistics
– precise deterministic causal analysis over precisely collected data
• Data Mining
– deterministic causal analysis over re-purposed data carefully sampled
• Data Science
– trending/correlation analysis over existing data using bulk of population i.e. big data
Adapted from:
NIST Big Data taxonomy draft report (see http://bigdatawg.nist.gov/show_InputDoc.php)
Data Science Workflows & Discovery
Useful References Informing our Thinking about Location Intelligence
• Silva et al. (2013), A comparison of Foursquare and Instagram to the study of city dynamics and urban social behavior. Proceedings of the 2nd ACM SIGKDD International Workshop on Urban Computing.
– Instagram and Foursquare datasets might be compatible in finding popular regions of a city.
• Chaoming Song et al. (2010), Limits of Predictability in Human Mobility. Science.
– There is a potential 93% average predictability in user mobility, an exceptionally high value rooted in the inherent regularity of human behavior. Yet it is not the 93% predictability that we find the most surprising. Rather, it is the lack of variability in predictability across the population.
• Scellato et al. (2011), NextPlace: A Spatio-temporal Prediction Framework for Pervasive Systems. Proceedings of the 9th International Conference on Pervasive Computing (Pervasive'11).
– Daily and weekly routines => Few significant places every day => Regularity in human activities => Regularity leads to predictability
• De Domenico, M., Lima, A. and Musolesi, M. (2012), Interdependence and Predictability of Human Mobility and Social Interactions. Proceedings of the Nokia Mobile Data Challenge Workshop.
– We have shown that it is possible to exploit the correlation between movement data and social interactions in order to improve the accuracy of forecasting of the future geographic position of a user. In particular, mobility correlation, measured by means of mutual information, and the presence of social ties can be used to improve movement forecasting by exploiting mobility data of friends. Moreover, this correlation can be used as indicator of potential existence of physical or distant social interactions and vice versa.
• Sadilek, A. and Krumm, J. (2012), Far Out: Predicting Long-Term Human Mobility.
– Where are you going to be 285 days from now at 2pm … we show that it is possible to predict location of a wide variety of hundreds of subjects even years into the future and with high accuracy.
“One of the most fascinating aspects of location-based data is the stability and predictability of patterns that can be mined from seemingly unrelated data. A cluster of random dots on a map can represent a daily transportation route, the most popular dating spots or the neighborhoods with the highest concentration of gang violence. These patterns, analyzed over time and in large numbers, begin to allow for informed predictions of behaviors and events. For government, this analytical capability enables better resource allocation and more effective outcomes”.
Interview with G. Edward DeSeve, former White House ARRA chief administrator, December 15, 2011. Seen in “The power of zoom: Transforming government through location intelligence” by Deloitte Consulting LLP.
Source: https://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/Federal/us_fed_govlab_power_of_zoom_report_100212.pdf
Useful NSW Govt resources on Location Intelligence
• NSW Globe
– globe.six.nsw.gov.au
– Uses Google Earth to explore spatial data and images
• NSW Location Intelligence Strategy (April 2014)
– http://www.finance.nsw.gov.au/ict/sites/default/files/NSW Location Intelliegence Strategy.pdf
• NSW Government datasets
– http://data.nsw.gov.au/
Data Science Innovation
Data Science innovation is something an organization has not done before or even something nobody anywhere has done before. A data science innovation focuses on discovering and using new or untraditional data sources to solve new problems.
Adapted from: Franks, B. (2012) Taming the Big Data Tidal Wave, p. 255, John Wiley & Sons
The ANZ Heavy Traffic Index comprises flows of vehicles weighing more than 3.5 tonnes (primarily trucks) on 11 selected roads around NZ. It is contemporaneous with GDP growth.
The ANZ Light Traffic Index is made up of light or total traffic flows (primarily cars and vans) on 10 selected roads around the country. It gives a six-month lead on GDP growth.
http://www.anz.co.nz/commercial-institutional/economic-markets-research/truckometer/
Discovery (Exploratory) Analytics
Exploratory
– Unstructured
– Machine learning
– Data mining
– Complex analysis
– Data diversity
Richness of new sources
X Business Intelligence
– Dashboard
– Real time decisioning
– Alerts
– Fresh data
– Response time
Speed of Query
Data Science Innovation
New sources of information for data driven applications and Internet of Things
• Number of journeys made
• Distances travelled
• Types of roads used
• Speed
• Time of travel
• Levels of acceleration and braking
• Any accidents which may occur
The Industrial Ecology Lab - towards an integrated Australian research platform
Black Box Insurance
• Telematics technology (black box) helps assess driving behavior and deliver true driver-centric premiums by capturing:
– Number of journeys
– Distances travelled
– Types of roads
– Speed
– Time of travel
– Acceleration and braking
– Any accidents
• Benefits low mileage, smooth and safe drivers
• Privacy vs. saving money on insurance (Canada)
– http://bit.ly/Black_box
Internet of Things
“trillion sensors”
Source: www.tsensorssummit.org
Smartphone, Google Glass or Apple Watch Will Know What You Want Before You Do
“…from 2014 your phone [glasses or watch] will anticipate your needs, do the research, tell you what you want to know – sometimes before the question even occurs to you…”
Chapman, Jake (2013), The Wired World in 2014
Push Notification Providers
1. Appboy
2. Urban Airship
3. StackMob
4. Parse
5. https://notifica.re
6. http://www.xtify.com
7. http://push.io
8. http://streamin.io
9. https://pushbots.com
10. http://appsfire.com
11. mBlox
12. http://quickblox.com/
13. https://www.mobdb.net
14. http://www.elementwave.com
15. Kahuna - http://www.usekahuna.com/
http://www.quora.com/What-are-some-alternatives-to-Urban-Airship-for-mobile-push
Mobile Relationship Management Workflow (Urban Airship)
What/When?/Where?
Apple Passbook Styles Urban Airship
Motivations for Instagram Project
• Trajectory data (not i.i.d. – independent and identically distributed)
• A new authentication approach based on trajectory
• Predictive capability for phones, glasses and watches
• Internet of Things (Sensors, RFID, Wheelchairs and Drones)
• Indoor GPS
• Car parking “anywhere”
• Location based services e.g. advertising
• Tourist recommender system
• Food analytics and traceability (farm to fork)
• Mobile apps with trajectory data e.g. Foursquare, Instagram, Nike+, EveryTrail
• Insurance “pay as you drive”
– telematics black box based insurance policy
Pattern Mining Trajectories
Group of Trajectories
Trajectory Patterns:
1. Hot regions (basic unit)
2. Trajectory pattern is relationships amongst regions
Opportunities:
• Location based networks
• Destination prediction
• Car-pooling
• Personal route planning
• Group buying
• Loyalty
• Credit card data
Adapted from: Chang, Wei, Yeh and Peng, “Discovering Personalised Routes from Trajectories”, ACM, LBSN’11, Chicago, Illinois, USA, 1 November 2011
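The two ideas on this slide (hot regions as the basic unit, and patterns as relationships amongst regions) can be illustrated with a minimal Python sketch. This is not the algorithm from Chang et al. or from the Instagram project: the grid-cell approach, the cell size, the `min_support` threshold and the sample coordinates are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical sample data: each trajectory is a time-ordered list of
# (latitude, longitude) points for one user. Values are illustrative only.
TRAJECTORIES = [
    [(-33.8568, 151.2153), (-33.8612, 151.2090), (-33.8731, 151.2065)],
    [(-33.8570, 151.2151), (-33.8615, 151.2093), (-33.8915, 151.2767)],
    [(-33.8566, 151.2155), (-33.8610, 151.2091), (-33.8730, 151.2067)],
]

CELL_SIZE = 0.005  # grid cell edge in degrees (an assumed value)


def to_cell(point, cell_size=CELL_SIZE):
    """Map a (lat, lon) point to the integer index of its grid cell."""
    lat, lon = point
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def hot_regions(trajectories, min_support=2):
    """Return cells visited by at least `min_support` distinct trajectories."""
    visits = Counter()
    for trajectory in trajectories:
        # A set, so one trajectory counts at most once per cell.
        visits.update({to_cell(p) for p in trajectory})
    return {cell for cell, n in visits.items() if n >= min_support}


def region_transitions(trajectories, regions):
    """Count moves between consecutive, distinct hot regions."""
    moves = Counter()
    for trajectory in trajectories:
        cells = [to_cell(p) for p in trajectory if to_cell(p) in regions]
        for src, dst in zip(cells, cells[1:]):
            if src != dst:
                moves[(src, dst)] += 1
    return moves


regions = hot_regions(TRAJECTORIES)
patterns = region_transitions(TRAJECTORIES, regions)
for (src, dst), count in patterns.most_common():
    print(src, "->", dst, "moves:", count)
```

A fixed grid is the simplest way to turn raw points into regions; a density-based clustering step could replace `to_cell` without changing the transition counting.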
Open Source Artifact Highlighting 68 Data Mining Algorithms
First Australian Instagram Study Conducted by UTS:AAI
Why is Instagram Popular?
• Mobile photo sharing app + social network
• Mobile first Workflow:
– take picture or select => crop/filter => geo-tag/hashtag/description/share
• Instagram is “Twitter but with photo updates”
• Status updates are transformed photos
• By default, pictures and accounts are public
• Pictures include:
– Geolocation, hashtags, comments and likes
• Mobile app friendly vs. desktop
Instagram Analytics Tools (off the shelf)
• Statigram
– Lifetime likes
– Total comments
– New followers/last 7 days
– Most liked photos
• Simply Measured
– Total engagement Instagram, Facebook and Twitter
– Engaging photo/filter/location
– Top photos by date
– Active commenters
– Best time for engagement
– Best day for engagement
– Top filters
• Nitrogram
– Countries of followers
– Most engaging
– Most commented
– Likes and comments on a photo
MongoDB - An Innovation in Databases?
“MongoDB gets the job done”
“document-oriented NoSQL database”
“MongoDB is natural choice when dealing with JSON”
“Same data model in code = same model in database”
“Data structure store to model applications”
“In MongoDB an Instagram post can be stored in a single collection, exactly as it is represented in the program, as one object. In a relational database an Instagram post would occupy multiple tables.”
“MongoDB understands geo-spatial co-ordinates and supports geo-spatial indexing”
“Initial MongoDB prototype on RedHat OpenShift (Public/Private or Community “Platform as a Service”)”
Recommendation engine integrating Mahout libraries and MongoDB (see Roadmap)
As discussed @ Journey to MongoDB: Trajectory Pattern Mining in Australian Instagram
By Suresh Sood and Xinhua Zhu
Sydney MongoDB Meetup 30 April 2013
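To make the "one object, one document" and geo-spatial points above concrete, here is a minimal Python sketch using only the standard library. The field names and values are hypothetical and are not taken from the project's data model. The distance filter is a plain haversine calculation written out by hand; in MongoDB itself the equivalent work would be done server-side with a geospatial index (for example `2dsphere`) and a query operator such as `$near`.

```python
import json
import math

# A hypothetical Instagram post held as one nested document. Field names are
# illustrative and are not taken from the project's data model.
post = {
    "user": "example_user",
    "caption": "Sunset at the harbour",
    "hashtags": ["sydney", "sunset"],
    "likes": 42,
    "comments": [{"user": "friend_1", "text": "Great shot"}],
    # GeoJSON point: coordinates are [longitude, latitude].
    "location": {"type": "Point", "coordinates": [151.2153, -33.8568]},
}

# The same structure serialises directly to JSON, with no table mapping.
print(json.dumps(post, indent=2))


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def posts_near(posts, lon, lat, max_km):
    """Return posts whose location lies within `max_km` of (lon, lat)."""
    result = []
    for p in posts:
        p_lon, p_lat = p["location"]["coordinates"]
        if haversine_km(lon, lat, p_lon, p_lat) <= max_km:
            result.append(p)
    return result


# Posts within 1 km of a point near the Sydney Opera House.
print(len(posts_near([post], 151.2153, -33.8568, 1.0)))
```

Storing the location as a GeoJSON point keeps the document usable both as application data and as input to geospatial queries.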
JSON Sources Driving Internet of Things
• RaZberry
– http://www.theregister.co.uk/Print/2013/09/16/zwave_pi_its_time_the_raspberry_pi_took_control/
• Teradata
– http://www.teradata.com.au/newsrelease.aspx?LangType=3081
• Google
– http://googledevelopers.blogspot.com.au/2012/10/got-big-json-bigquery-expands-data.html
MongoDB Analytics Support of Instagram Project
• Rich query language
• Native secondary indexes
• Geospatial indexes & search
• Text indexes & search
• Aggregation framework (see Mongo doc for Release 2.4.9)
• Map-Reduce (JavaScript) implementation
• Client-side analytics
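The map-reduce style of analysis listed among these features can be sketched in plain Python. This is an in-memory illustration of the map, group and reduce steps applied to hashtag counting; it does not call MongoDB, and the post structure and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical posts; only the field used here is shown.
POSTS = [
    {"hashtags": ["sydney", "sunset"]},
    {"hashtags": ["Sydney", "food"]},
    {"hashtags": ["sunset"]},
    {"hashtags": ["sydney"]},
]


def map_post(post):
    """Map step: emit one (hashtag, 1) pair per hashtag in a post."""
    for tag in post.get("hashtags", []):
        yield tag.lower(), 1


def reduce_counts(tag, values):
    """Reduce step: combine all values emitted for one hashtag."""
    return sum(values)


def map_reduce(posts):
    """Group mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for post in posts:
        for key, value in map_post(post):
            grouped[key].append(value)
    return {key: reduce_counts(key, values) for key, values in grouped.items()}


counts = map_reduce(POSTS)
# Sort by descending count, then alphabetically for a stable order.
top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(top)
```

The same grouping could be expressed as an aggregation pipeline; the map-reduce form is shown because it maps one-to-one onto the map and reduce functions a database would run.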
Architectural Implementation using MongoDB
[Architecture diagram. Labels: Instagram via API; Data Collection; Mongo Database distributed across shards; Name Node; Map Reduce; Stats]
Client for Instagram project
datafication.com.au/instagram
Timeline based Trajectory Analysis
Google Map based Trajectory Analysis
Social Relationship Analysis
Location based Retrieval
Popular HashTag Analysis
Popular Image Analysis
Peak Usage Time Analysis
Active User Analysis
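As an illustration of one of the analyses listed above, peak usage time can be found by bucketing post creation times by hour of day. The sketch below is an assumption about how such an analysis might look, not the project's implementation; the timestamps are made up and are assumed to already be in the local time zone of interest.

```python
from collections import Counter
from datetime import datetime

# Hypothetical post creation times (ISO 8601, assumed local time).
CREATED = [
    "2014-04-28T08:05:00",
    "2014-04-28T12:30:00",
    "2014-04-29T12:10:00",
    "2014-04-29T20:45:00",
    "2014-04-30T12:55:00",
]


def posts_per_hour(timestamps):
    """Count posts in each hour-of-day bucket (0 to 23)."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


def peak_hour(timestamps):
    """Return (hour, count) for the busiest hour of day."""
    return posts_per_hour(timestamps).most_common(1)[0]


hour, count = peak_hour(CREATED)
print(f"Peak hour: {hour:02d}:00 with {count} posts")
```

The same counting pattern applies to the active user and popular hashtag analyses by swapping the key that is counted.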
Roadmap
Data collection
Individual(Group) Analysis
Find Preference and Behavior pattern(including Trajectory pattern)
Recommendation
Recommend the right product (or service) to the right person (or group) at the right time and place
Manually Automatically
MongoDB Mahout or Mortar Recommender
Recommended Trajectories
• Trajectories
• Points of Interest
• User profiles
• Image details
• Recommender engine (Mahout or Mortar)
Algorithms
MongoDB Connector for Hadoop Version 1.2.0
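The recommendation step in the roadmap can be illustrated with a very small co-occurrence recommender over points of interest. This is a simplified sketch in plain Python and is not Mahout's or Mortar's algorithm; the users, places and scoring rule are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical visit histories: user -> set of points of interest.
VISITS = {
    "user_a": {"Opera House", "Harbour Bridge", "Bondi Beach"},
    "user_b": {"Opera House", "Harbour Bridge"},
    "user_c": {"Opera House", "Bondi Beach", "Taronga Zoo"},
    "user_d": {"Harbour Bridge"},
}


def co_occurrence(visits):
    """Count how often each pair of places is visited by the same user."""
    pairs = defaultdict(Counter)
    for places in visits.values():
        for a, b in combinations(sorted(places), 2):
            pairs[a][b] += 1
            pairs[b][a] += 1
    return pairs


def recommend(user, visits, top_n=3):
    """Rank unvisited places by co-occurrence with the user's visited places."""
    pairs = co_occurrence(visits)
    seen = visits[user]
    scores = Counter()
    for place in seen:
        for other, count in pairs[place].items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [place for place, _ in ranked[:top_n]]


print(recommend("user_d", VISITS))
print(recommend("user_b", VISITS))
```

A production recommender would also weight by time and place, in line with the roadmap's aim of recommending at the right time and location; this sketch uses visit co-occurrence only.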
Supporting Documentation
• Instagram project documentation – Data Model and Data Collection Procedure (V2.0)
• MongoDB Aggregation and Data Processing Release 2.4.9