
Knowledge Discovery from Mobile Phone Communication Activity Data Streams

Fergal Walsh

Data Stream

Research presented in this poster was funded by a Strategic Research Cluster Grant (07/SRC/I1168) by Science Foundation Ireland under the National Development Plan. The authors gratefully acknowledge this support.

Data Exploration

• Raw CDR Data: 1 week of data (> 200 million records)
• Stream Processor: data stream processor for pre-processing each record and computing aggregates
• Indexed Database: spatial, temporal and user indices for efficient querying
• Exploratory Query Tool: web-based tool for ad hoc spatio-temporal queries
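As a rough illustration of the spatial, temporal and user indices, here is a minimal in-memory sketch in Python. The `Record` fields, the `IndexedStore` class and its methods are hypothetical, and plain dictionary indices stand in for the indexed database used in the actual system. Each record is appended once by the stream processor and can then be looked up by cell, by hour or by user.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal CDR layout (field names are assumptions)."""
    user: str       # anonymised user id
    cell: int       # cell (spatial area) id
    timestamp: int  # Unix time in seconds


class IndexedStore:
    """Toy stand-in for the indexed database: one index per query dimension."""

    def __init__(self):
        self.records = []
        self.by_cell = defaultdict(list)  # spatial index: cell -> positions
        self.by_hour = defaultdict(list)  # temporal index: hour bucket -> positions
        self.by_user = defaultdict(list)  # user index: user -> positions

    def add(self, rec):
        """Called once per record by the stream processor."""
        pos = len(self.records)
        self.records.append(rec)
        self.by_cell[rec.cell].append(pos)
        self.by_hour[rec.timestamp // 3600].append(pos)
        self.by_user[rec.user].append(pos)

    def query(self, cells, start, end):
        """Ad hoc spatio-temporal query: records in `cells` with start <= t < end."""
        hits = []
        for hour in range(start // 3600, (end - 1) // 3600 + 1):
            for pos in self.by_hour.get(hour, []):
                rec = self.records[pos]
                if rec.cell in cells and start <= rec.timestamp < end:
                    hits.append(rec)
        return hits

    def cell_records(self, cell):
        return [self.records[p] for p in self.by_cell.get(cell, [])]

    def user_history(self, user):
        return [self.records[p] for p in self.by_user.get(user, [])]


store = IndexedStore()
for r in [Record("u1", 10, 0), Record("u2", 11, 1800),
          Record("u1", 11, 4000), Record("u3", 10, 7300)]:
    store.add(r)
print(store.query({11}, 0, 3600))  # [Record(user='u2', cell=11, timestamp=1800)]
```

Because the stream delivers records in time order, each index list is already sorted by time, which a larger-scale version could exploit with binary search instead of scanning whole hour buckets.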

Figure: Communication event counts per cell per hour (weekday average), shown at 00:00, 08:00, 12:00 and 18:00
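The weekday-average map can be computed from per-record events roughly as follows. This Python sketch assumes each event has already been reduced to a `(cell_id, unix_seconds)` pair and, for simplicity, buckets hours in UTC. The function name and the choice to divide by the number of distinct weekdays observed (unless `n_weekdays` is given) are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime, timezone


def weekday_hourly_average(events, n_weekdays=None):
    """Average weekday event count per (cell, hour of day).

    `events` yields (cell_id, unix_seconds) pairs. Hours are taken in UTC here;
    a real deployment would convert to local time first.
    """
    counts = Counter()  # (cell, hour of day) -> total weekday events
    days_seen = set()   # distinct weekday dates present in the data
    for cell, ts in events:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        if dt.weekday() < 5:  # Monday=0 ... Friday=4; weekends are skipped
            counts[(cell, dt.hour)] += 1
            days_seen.add(dt.date())
    days = n_weekdays or len(days_seen) or 1
    return {key: total / days for key, total in counts.items()}
```

For one week of data, passing `n_weekdays=5` makes the divisor explicit rather than inferring it from the dates present in the data.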

Figure: Trajectories of 2 sample users

Figure: Location of caller and callee for 2 sample users
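Trajectory plots like these can be derived directly from the records, bearing in mind that a user's position is only observed when a communication event occurs. The Python sketch below is a hypothetical helper, not the poster's code: it takes `(user, cell, timestamp)` tuples, orders them by time and keeps one entry per visit to a cell, so repeated events in the same cell collapse into a single step of the trajectory.

```python
from collections import defaultdict


def trajectories(records):
    """Per-user trajectories from (user, cell, timestamp) tuples.

    Returns {user: [(cell, time of first event in that visit), ...]} in time order;
    consecutive events in the same cell are merged into one visit.
    """
    paths = defaultdict(list)
    for user, cell, ts in sorted(records, key=lambda r: r[2]):
        path = paths[user]
        if not path or path[-1][0] != cell:
            path.append((cell, ts))
    return dict(paths)


recs = [("a", 1, 0), ("a", 1, 60), ("a", 2, 120), ("b", 5, 30), ("a", 1, 300)]
print(trajectories(recs))  # {'a': [(1, 0), (2, 120), (1, 300)], 'b': [(5, 30)]}
```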

Anonymised Customer Data Records (CDR) from Meteor, Ireland’s 3rd largest mobile phone network

• More than 1 million customers
• One record per call/SMS sent or received
• About 40 million records per day
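The stated volumes are consistent with each other, and they also give the sustained rate a stream processor must keep up with. A short Python check, using only the figures above:

```python
SECONDS_PER_DAY = 24 * 60 * 60
RECORDS_PER_DAY = 40_000_000  # "about 40 million records per day"


def records_for(days, per_day=RECORDS_PER_DAY):
    """Expected record count for a period of `days` days."""
    return days * per_day


def mean_rate(per_day=RECORDS_PER_DAY):
    """Average records per second over a full day."""
    return per_day / SECONDS_PER_DAY


print(records_for(7))      # 280000000, i.e. "> 200 million records" for one week
print(round(mean_rate()))  # 463 records per second on average
```

The daytime peak is higher than this 24-hour average, since communication activity varies strongly over the day.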

Information retrieval using stream data mining and machine learning techniques

• Find users similar to some example users (classification using Support Vector Machines):
  • Users who travel from Maynooth to Dublin daily
  • Users who travel to Dublin from rural areas daily (using semantics of spatial areas)
  • Groups of users who are planning a meet-up (using communication motifs)
• Find areas with similar phone usage activity profiles (clustering):
  • Nightlife, business, residential, rural
• Find clusters of users with similar activity profiles (clustering)
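The poster does not say which clustering algorithm was used. As one plausible choice, the Python sketch below applies plain k-means (Lloyd's algorithm) to 24-hour activity profiles, for example hourly event counts per cell or per user. Each profile is first normalised to sum to 1, so that clusters group areas by the shape of their daily activity (such as a night-time peak versus office hours) rather than by total volume. The function names and the normalisation step are assumptions.

```python
import random


def normalise(profile):
    """Scale a 24-hour count profile to sum to 1 (all-zero profiles stay zero)."""
    total = sum(profile)
    return [c / total for c in profile] if total else [0.0] * len(profile)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iters=100, seed=0):
    """Lloyd's k-means on normalised profiles; returns one cluster label per profile."""
    data = [normalise(p) for p in profiles]
    centroids = [list(x) for x in random.Random(seed).sample(data, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(x, centroids[j]))
                      for x in data]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Cluster labels are arbitrary integers; naming a cluster "nightlife" or "business" would still be done by inspecting its centroid profile.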

Development of (ncg.nuim.ie/i2maps/)

Future Work

Learn activity chains (probabilistic models) of each user's communication and movement events. These will use semantic labels rather than raw spatial locations.

Predict movement and communication events from learned models.
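The poster leaves the form of the probabilistic model open. One simple candidate, shown here only as an illustrative assumption, is a first-order Markov chain over semantic labels: transition counts between consecutive labels are learned from each user's daily sequences, and the most frequent successor is used as the prediction. The class name and the labels (`home`, `work`, and so on) are hypothetical.

```python
from collections import Counter, defaultdict


class ActivityChain:
    """First-order Markov chain over semantic labels (an assumed model form)."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # label -> counts of next labels

    def fit(self, sequences):
        """Count transitions between consecutive labels in each sequence."""
        for seq in sequences:
            for current, nxt in zip(seq, seq[1:]):
                self.transitions[current][nxt] += 1
        return self

    def prob(self, current, nxt):
        """Maximum-likelihood estimate of P(next label | current label)."""
        counts = self.transitions.get(current)
        if not counts:
            return 0.0
        return counts[nxt] / sum(counts.values())

    def predict(self, current):
        """Most frequent next label, or None if `current` was never observed."""
        counts = self.transitions.get(current)
        return counts.most_common(1)[0][0] if counts else None


days = [["home", "work", "home"],
        ["home", "work", "shop", "home"],
        ["home", "gym", "home"]]
model = ActivityChain().fit(days)
print(model.predict("home"))       # work
print(model.prob("home", "work"))  # 0.6666666666666666
```

A richer model would also condition on time of day and on communication events, but the counting-and-normalising pattern stays the same.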

Current Work

About 7000 cells (spatial areas)

Cell areas range from < 1 km² to ~50 km²

Publications

Pozdnoukhov A., Walsh F., Exploratory Novelty Identification in Human Activity Data Streams, ACM SIGSPATIAL International Workshop on GeoStreaming at the 18th ACM SIGSPATIAL GIS, 2010.

Pozdnoukhov A., Walsh F., Kaiser F., Statistical Machine Learning from VGI, position paper at the Role of Volunteered Geographic Information in Advancing Science Workshop at GIScience'10, 2010.

Kaiser C., Walsh F., Farmer C., Pozdnoukhov A., User-centric Time-Distance Representation of Road Networks, full paper in Springer LNCS Proc. of GIScience'10, 2010.

Records are ordered by time and independent of each other, making this data ideally suited to stream processing.
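The time ordering is what makes single-pass, bounded-memory processing possible. In the Python sketch below (a hypothetical helper operating on assumed `(cell_id, unix_seconds)` pairs), an hour is known to be complete as soon as a record from a later hour arrives, so its per-cell counts can be emitted and discarded rather than kept for the whole week. Out-of-order input is rejected because it would break that assumption.

```python
from collections import Counter


def hourly_counts(stream):
    """One pass over a time-ordered stream of (cell_id, unix_seconds) records.

    Yields (hour_start_seconds, Counter of cell -> event count) for each hour
    that contains records. Only the current hour's counts are held in memory.
    """
    current_hour = None
    counts = Counter()
    for cell, ts in stream:
        hour = ts // 3600
        if current_hour is not None and hour < current_hour:
            raise ValueError("stream is not time-ordered")
        if hour != current_hour:
            if current_hour is not None:
                yield current_hour * 3600, counts  # previous hour is complete
            current_hour, counts = hour, Counter()
        counts[cell] += 1
    if current_hour is not None:
        yield current_hour * 3600, counts  # flush the last hour


for hour_start, per_cell in hourly_counts([(1, 10), (2, 20), (1, 3599), (1, 3600), (3, 7300)]):
    print(hour_start, dict(per_cell))
# 0 {1: 2, 2: 1}
# 3600 {1: 1}
# 7200 {3: 1}
```

Because records are also independent of each other, each one can be handled on arrival without looking back at earlier records, which is what keeps the per-record cost constant.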

Acknowledgements

The authors gratefully acknowledge the support of Meteor for providing the data used in this poster, in particular Mr. John Bathe and Mr. Adrian Whitwham.

Thanks to Ronan Farrell (IMWS) for obtaining the data from Meteor for StratAG.

Thanks to John Doyle for providing the cell tessellation used in the examples above.