SHW Engaging Data Final
8/3/2019 SHW Engaging Data Final
1/22
Skyhook Wireless 2009
Modeling Social Behavior with Aggregated Location Requests
Engaging Data, October 2009
First International Forum on the Application and Management of Personal Electronic Information
-
What is Skyhook?
Location Technology
Smartphones, netbooks, tablets, laptops, digital cameras
Hybrid of GPS, cellular positioning and Wi-Fi localization
iPhone, iPod, Mac OS
Dell netbooks, laptops
Android handsets
-
How Skyhook Builds Coverage
400-500 full-time drivers
Scanning every street with Skyhook equipment
Automatic data capture and processing
110 million Wi-Fi APs
1+ million cell towers
-
WPS constellation
-
US & European Coverage
-
Research Background
50+ million devices around the globe
Billions of anonymous location requests every month
20+ months of cumulative data
Smartphone mobile users
Using London and Manhattan as examples
Questions:
What does aggregate user behavior tell us? About users? Groups? Activities? Events? Locations?
Can we discern patterns in the data?
Can we classify time/space/frequency/phase based on these patterns?
How can this information be leveraged? Operational? Applications?
-
Citywide Analysis and Comparisons
[Figure: two map panels, London Sunday and London Monday]
Consistent intensity on weekly basis
Quite different, Sunday vs Monday
Monday approx. 2x request intensity of Sunday
Magenta area = 1000 requests per square km per day
-
Classification
Emergence bursts (e.g. South Station): aggregated use pattern where users emerge into a context of uncertainty
Impedance clustering (e.g. accident on Mass Pike): pattern where multiple users are trying to navigate around traffic blockages or unanticipated impediments
Social affinity / tribal clustering (e.g. Presidential Inauguration): groups of users have gathered together voluntarily around or in anticipation of a cultural event
Arterial accumulation (e.g. Commonwealth Ave.): commuting pathways or pedestrian routes, generally occurring in temporal pulses
Institutional nucleation (e.g. Longwood Medical): usage clusters which have been identified occurring within the confines of academic campuses or hospital facilities
-
Activity Based Analysis
Time/space based analysis
  'heat' or aggregate request analysis based on different scales in time and space
Frequency/phase domain
  find temporal pulses (hourly, weekly, daily, etc.)
  group like frequency and phase activity areas
  can be applied at different spatial dimensions
Baseline/anomaly detection
  use training data set to compute baseline and noise threshold for spatial region
  run data against baseline and detect anomalies
  classify anomalies using above methods
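The baseline/anomaly steps above can be sketched roughly as follows. This is only an illustration: the slides do not say how the baseline or the noise threshold is computed, so the mean, the `k` standard deviations, the function names and the sample counts are all assumptions.

```python
from statistics import mean, stdev


def fit_baseline(training_counts, k=3.0):
    """Return (baseline, threshold) from request counts seen on ordinary days.

    Assumption: the baseline is the mean and the noise threshold sits k sample
    standard deviations above it. The slides do not specify either choice.
    """
    baseline = mean(training_counts)
    threshold = baseline + k * stdev(training_counts)
    return baseline, threshold


def detect_anomalies(observed, baseline, threshold):
    """Return (index, count, ratio to baseline) for every count above the threshold."""
    return [
        (i, count, count / baseline if baseline else float("inf"))
        for i, count in enumerate(observed)
        if count > threshold
    ]


# Hypothetical daily request counts for one spatial region (not real data).
training = [100, 96, 104, 98, 102, 100]
baseline, threshold = fit_baseline(training)
anomalies = detect_anomalies([101, 97, 275, 103], baseline, threshold)
print(anomalies)
```

The ratio to baseline is reported alongside each anomaly so that results can be read in the same "times baseline" terms used elsewhere in the deck.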
-
Large Scale Events
Recurring Affinity Cluster
Event Viewing and Impedance Clustering
Control Sample
'Control' day versus St. Patrick's Day Parade
> 2.7x baseline control average over the area
-
Local Intensity
Boston Sunday
Yankee Stadium, No Game vs. Game
Magenta = 1000 requests per square km per day
10 Monday games over 30-week sample
-
Tile Size
~400 m x ~400 m
~1000 tiles in Manhattan
~4B tiles cover the Earth
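A quick order-of-magnitude check of these tile counts, as a sketch. The Earth surface area constant is an approximate figure brought in for the check and is not from the slides; the tile side is the slide's "~400 m" taken at face value.

```python
EARTH_SURFACE_KM2 = 510.1e6  # approximate total surface area of the Earth (assumption)
TILE_SIDE_KM = 0.4           # "~400 m" from the slide

tile_area_km2 = TILE_SIDE_KM ** 2                      # about 0.16 km^2 per tile
tiles_for_earth = EARTH_SURFACE_KM2 / tile_area_km2    # about 3.2 billion tiles
manhattan_tiled_area_km2 = 1000 * tile_area_km2        # area implied by ~1000 tiles

print(f"{tiles_for_earth:.2e} tiles, {manhattan_tiled_area_km2:.0f} km^2")
```

With exactly 400 m sides the global count comes out near 3.2 billion, the same order of magnitude as the slide's ~4B; the gap is within what the "~" on both numbers allows.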
-
Baseline for Monday
-
Daily Comparison: Red squares measure activity. Comparison of Manhattan at 2 AM on Saturday versus Monday.
[Figure: two map panels, Saturday 2 AM and Monday 2 AM]
-
Request Intensity Versus Variance
[Figure: two map panels, Monday 6 PM and Monday 3 AM]
Activity and Consistency: Red squares measure activity (request counts per hour per tile); yellow circles measure variance. Where red dominates, usage is consistent for that tile-hour.
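One way to compute the two quantities on this slide, as a hedged sketch: mean request count and variance for each tile-hour across weeks. The record layout, the hour-of-week convention (0 = Monday 00:00), the tile name and the counts are all hypothetical, and the variance-to-mean ratio is just one possible way to express "red dominates".

```python
from collections import defaultdict
from statistics import mean, pvariance


def tile_hour_stats(records):
    """records: iterable of (tile_id, hour_of_week, count), one record per observed week.

    Returns {(tile_id, hour_of_week): (mean_count, variance)}.
    """
    samples = defaultdict(list)
    for tile_id, hour_of_week, count in records:
        samples[(tile_id, hour_of_week)].append(count)
    return {key: (mean(vals), pvariance(vals)) for key, vals in samples.items()}


# Hypothetical counts for one tile over four weeks (not real data).
records = (
    [("tile-A", 18, c) for c in (40, 42, 38, 40)]   # Monday 6 PM: steady usage
    + [("tile-A", 3, c) for c in (2, 30, 0, 8)]     # Monday 3 AM: erratic usage
)
stats = tile_hour_stats(records)

# Variance-to-mean ratio: small values mean activity outweighs variance.
dispersion = {key: var / avg for key, (avg, var) in stats.items() if avg}
print(dispersion)
```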
-
Sample Location of Tiles
Midtown (48th & 8th)
Washington Square / Greenwich Village
Houston
-
Tile Activity Counts
[Figure: Tile Activity Counts. x-axis: Time (hrs), 0 to 5000, covering Feb 24, 2009 to Sep 12, 2009; y-axis: Number of Requests, 0 to 180; series: tiles B73AC00B, B73A8C84, B739BFBF; location labels: Midtown, Houston, Greenwich Village]
Raw request logs
Bin requests by hour
Discretize into 'tiles' 400m square
Hourly requests per tile for 3 sample tiles
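The binning pipeline above (raw logs, hourly bins, 400 m tiles) might look like the following minimal sketch. The slides do not describe the actual tiling scheme or how tile IDs such as B73AC00B are derived, so the flat grid, the reference latitude, the `(row, col)` tile key and the sample requests are all assumptions.

```python
import math
from collections import Counter
from datetime import datetime

TILE_SIDE_M = 400.0
REF_LAT_DEG = 40.75        # reference latitude near Manhattan (assumption, city-scale grid)
M_PER_DEG_LAT = 111_320.0  # approximate metres per degree of latitude
M_PER_DEG_LON = M_PER_DEG_LAT * math.cos(math.radians(REF_LAT_DEG))


def tile_of(lat, lon):
    """Map a coordinate to a hypothetical (row, col) tile index on a ~400 m grid."""
    return (
        math.floor(lat * M_PER_DEG_LAT / TILE_SIDE_M),
        math.floor(lon * M_PER_DEG_LON / TILE_SIDE_M),
    )


def hour_of(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_counts(requests):
    """requests: iterable of (datetime, lat, lon). Returns a Counter keyed by (tile, hour)."""
    counts = Counter()
    for ts, lat, lon in requests:
        counts[(tile_of(lat, lon), hour_of(ts))] += 1
    return counts


# Hypothetical anonymous requests (not real data).
requests = [
    (datetime(2009, 2, 24, 9, 5), 40.7590, -73.9860),
    (datetime(2009, 2, 24, 9, 40), 40.7591, -73.9861),   # same tile, same hour
    (datetime(2009, 2, 24, 10, 15), 40.7590, -73.9860),  # same tile, next hour
    (datetime(2009, 2, 24, 9, 20), 40.7308, -73.9973),   # different tile
]
counts = hourly_counts(requests)
```

A fixed reference latitude keeps the grid regular within one city; a global tiling would need a scheme that handles the change in longitude spacing with latitude.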
-
Frequency Analysis
Intensity in frequency domain
Identify periodic patterns of activity
Project future based on past dependable patterns
Largest spike shows at the 24 hour cycle
Fits intuition regarding daily cycle
Not the only cycles that can be found
A normalized frequency of 0.5 is the maximum frequency we can analyze given the 1-hour sample interval (a 2-hour period).
0.04167 is 1/12th of the maximum frequency, i.e. a 24-hour period.
[Figure: Tile Spectra. x-axis: Frequency (normalized), 0 to 0.5; y-axis: Intensity, 0 to 2 x 10^4; markers at periods of 2, 4, 8, 12, 16 and 24 hours, 1 week and 1 month; series: tiles B73AC00B, B73A8C84, B739BFBF]
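To make the normalized-frequency axis concrete, here is a small sketch that computes the spectrum of a synthetic hourly series and recovers the 24-hour peak near 0.04167. The series is made up for illustration, and a direct discrete Fourier sum from the standard library stands in for whatever transform the authors actually used; with real data an FFT routine would be the usual choice.

```python
import cmath
import math


def dft_intensity(series, cycles_per_sample):
    """Magnitude of the discrete Fourier component at one normalized frequency."""
    n = len(series)
    avg = sum(series) / n  # remove the mean so the zero-frequency term does not dominate
    total = sum(
        (x - avg) * cmath.exp(-2j * math.pi * cycles_per_sample * t)
        for t, x in enumerate(series)
    )
    return abs(total)


# Synthetic hourly counts for 4 weeks: a strong daily cycle plus a weaker weekly one.
N_HOURS = 24 * 7 * 4
series = [
    50 + 30 * math.cos(2 * math.pi * (t - 18) / 24) + 10 * math.cos(2 * math.pi * t / 168)
    for t in range(N_HOURS)
]

# Frequencies run from 1/N up to 0.5 cycles per hourly sample (the upper limit).
spectrum = {
    k / N_HOURS: dft_intensity(series, k / N_HOURS)
    for k in range(1, N_HOURS // 2 + 1)
}
peak_freq = max(spectrum, key=spectrum.get)
peak_period_hours = 1 / peak_freq
print(round(peak_freq, 5), round(peak_period_hours, 2))
```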
-
24-hour Frequency and Phase
Left shows intensity at the 24 hour cycle
All sample tiles show strong affinity for daily activity
Right shows phase of peak
Green tile peaks approximately 5 hours before the red tile every day
Can help partition and classify tiles based on their phase
[Figure, left: Tile Spectra. x-axis: Frequency (normalized), 0.038 to 0.045; y-axis: Intensity, 0 to 2 x 10^4; marker at 24 Hrs; series: tiles B73AC00B, B73A8C84, B739BFBF]
[Figure, right: Tile Phases. x-axis: Frequency (normalized), 0.038 to 0.045; y-axis: Phase (Cycles), -0.5 to 0.5; marker at 24 Hrs; series: tiles B73AC00B, B73A8C84, B739BFBF]
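A phase measured in cycles at the 24-hour frequency converts directly to a time-of-day offset: one cycle is 24 hours. The sketch below shows this on two synthetic series whose daily peaks are 5 hours apart. The series, the peak hours and the sign convention (phase of the component with kernel e^(-2*pi*i*f*t)) are assumptions; the slides do not state which convention their phase plot uses.

```python
import cmath
import math

DAILY_FREQ = 1 / 24  # cycles per hourly sample


def phase_cycles(series, cycles_per_sample):
    """Phase of one Fourier component, in cycles, within (-0.5, 0.5]."""
    n = len(series)
    avg = sum(series) / n
    total = sum(
        (x - avg) * cmath.exp(-2j * math.pi * cycles_per_sample * t)
        for t, x in enumerate(series)
    )
    return cmath.phase(total) / (2 * math.pi)


def daily_peak_hour(series):
    """Hour of day at which the 24-hour component peaks (index 0 is hour 0)."""
    return (-phase_cycles(series, DAILY_FREQ) * 24) % 24


def synthetic_tile(peak_hour, days=14):
    """Made-up hourly counts with a single daily peak at peak_hour."""
    return [
        50 + 30 * math.cos(2 * math.pi * (t - peak_hour) / 24)
        for t in range(24 * days)
    ]


green = synthetic_tile(peak_hour=13)
red = synthetic_tile(peak_hour=18)
lag_hours = (daily_peak_hour(red) - daily_peak_hour(green)) % 24
print(round(lag_hours, 3))
```

Tiles could then be grouped by peak hour, which is one reading of the slide's suggestion to partition and classify tiles by phase.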
-
Tile Spectra
Only one of our example tiles shows strong monthly periodicity
Greenwich Village tile
Pattern is consistent month after month
[Figure: Tile Spectra. x-axis: Frequency (normalized), 0 to 4 x 10^-3; y-axis: Intensity, 0 to 2 x 10^4; markers at 1 Wk and 1 Mo; series: tiles B73AC00B, B73A8C84, B739BFBF]
-
90th Percentile Baseline By Hour
[Figure: Baseline Week: 90th Percentile. x-axis: Time (Hours), 0 to 180; y-axis: Requests, 0 to 70; annotation: Monday 19:00 EST; series: tiles B73AC00B, B73A8C84, B739BFBF]
Outlier detection using training data
Use 90th percentile hourly behavior per tile for a week
Figure shows 7 (daily) peaks for all tiles
Red curve peaks several hours behind green curve each day
Calculated this data for all tiles in Manhattan
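A minimal sketch of a per-tile baseline week built from 90th percentile hourly behavior, and of flagging hours that exceed it. The nearest-rank percentile, the "count above baseline" outlier rule, the week starting at Monday 00:00 and the training numbers are all assumptions for illustration; the slides give only the percentile and the weekly layout.

```python
import math
from collections import defaultdict

HOURS_PER_WEEK = 168


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct given as an integer 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def baseline_week(hourly_counts, pct=90):
    """Return 168 values: the pct-th percentile count for each hour of the week.

    hourly_counts must start at the first hour of a week and cover at least one
    full week, so that every hour of the week has at least one sample.
    """
    by_hour = defaultdict(list)
    for t, count in enumerate(hourly_counts):
        by_hour[t % HOURS_PER_WEEK].append(count)
    return [percentile(by_hour[h], pct) for h in range(HOURS_PER_WEEK)]


def outliers(hourly_counts, baseline):
    """Return (hour index, count) for every hour whose count exceeds the baseline."""
    return [
        (t, count)
        for t, count in enumerate(hourly_counts)
        if count > baseline[t % HOURS_PER_WEEK]
    ]


# Hypothetical training data for one tile: 10 weeks of hourly counts (not real data).
training = [10 + (h % 24) + week for week in range(10) for h in range(HOURS_PER_WEEK)]
baseline = baseline_week(training)

# A new week that matches the usual shape except for a burst at Monday 19:00.
new_week = [10 + (h % 24) for h in range(HOURS_PER_WEEK)]
new_week[19] = 90
flagged = outliers(new_week, baseline)
print(flagged)
```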
-
What's Next
Explore other analysis techniques, e.g. Eigenplaces (e.g. Francesco Calabrese, Carlo Ratti 2009)
Systematize processing and classification
Determine real-world activities associated with virtual analysis
Push towards real-time analysis