Monitoring Communication Channels between Targeted Individuals Ross Sparks MAY 2013.

Outline

• Social networks as a source of information
– Communications volume between persons of interest
– Business intelligence
– Twitter messages: syndromic surveillance and disaster management

• Review of spatio-temporal surveillance
– Similarities with monitoring communication levels between targeted people
– Differences

• A suggested solution
– Order statistics and qq-plots
– Deciding on the appropriate level of network aggregation
– Some simulation results
– Extensions to higher dimensions

Social networks as a source of information

• In Australia, Twitter messages have been used successfully in the real-time management of bushfires
– Who is affected?
– How are they affected?
– Where is the fire spreading and how fast is it moving? (e.g., a combination of a tornado and a bushfire: very fast, devastating)

• Social media information is being mined for security purposes
– Facebook is proving useful in criminal investigations: addresses, photos, activities, etc.; conversations and networks

• Suspected terrorists and friends are being followed – phone, e-mail and cloud services are all being mined

Social networks: a source of business intelligence

• Companies are monitoring what customers say about them and their competitors.

• Companies are monitoring their employees to better manage their risks
– What do employees say to each other?
– What do they say to others outside the company?

• HR departments are looking at people’s Facebook pages to better evaluate a person’s suitability for joining the company.

• Hence social network monitoring is likely to increase in the future.

Ethical issues and privacy concerns

• Privacy is clearly a concern.

• Cyber bullying is a concern.

• Cyber crime is on the rise
– Exploiting children/child pornography
– Cyber scams
– Misinformation

• This paper does not deal with the ethical issues relating to social media, but raises them as an important consideration.

Setting the scene for monitoring spatial point processes

[Figure: spatial point process, plotted by longitude and latitude]

The time dimension

[Figure: spatio-temporal block, with axes time, longitude and latitude]

The scan statistic counts the number of incidents in a spatio-temporal block and compares it with the expected count.
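
The idea of comparing a block count with its expected count can be sketched as follows. This is a minimal illustration, not the method used in the talk: the grid, the planted cluster, the block size and the Poisson-style standardisation (observed minus expected, divided by the square root of expected) are all assumptions made for the example.

```python
import numpy as np

def scan_statistic(counts, expected, block):
    """Return (observed, expected, ratio) for one spatio-temporal block.

    counts, expected: arrays indexed [time, latitude, longitude].
    block: tuple of three slices selecting the block to scan.
    """
    obs = counts[block].sum()
    exp = expected[block].sum()
    # Poisson-style standardisation (an assumption of this sketch).
    return obs, exp, (obs - exp) / np.sqrt(exp)

# Hypothetical data: 4 weeks on a 5 x 5 grid, 2 incidents expected per cell.
expected = np.full((4, 5, 5), 2.0)
counts = np.full((4, 5, 5), 2)
counts[2:, 1:3, 1:3] += 3   # planted cluster in the last two weeks

# Scan every 2 x 2 x 2 block and keep the one with the largest ratio.
best = max(
    (scan_statistic(counts, expected,
                    (slice(t, t + 2), slice(i, i + 2), slice(j, j + 2)))[2],
     (t, i, j))
    for t in range(3) for i in range(4) for j in range(4)
)
print(best)   # largest ratio and the (time, lat, lon) corner of its block
```

The exhaustive loop over blocks is what becomes expensive once "neighbours" are no longer defined by a grid.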

Applications

• Disease outbreaks that cluster spatially

• Detecting emerging traffic hot-spots

• Pockets of Australia where domestic violence is increasing significantly more than forecast

• Criminal activity that clusters spatially

• Identifying geographical regions with higher-than-expected sales of specific items

• Identifying geographical regions where more people than expected are ceasing their household insurance policies

• etc.

Social networks

Who are “neighbours” in the social network?

Number of times A contacts B, etc.

      cA  cB  cC  cD  cE  cF  cG  cH  cI  cJ  cK  cL  cM  cN  cO  cP  cQ  cR  cS  cT
rA     0   6   6   6  10   5   7   7   6   6   1   0   1   1   3   2   0   0   1   2
rB     6   0   6  11   8   8  10   8   9  11   1   2   0   2   0   0   0   1   1   0
rC     4   8   0   6  10   8   5   8   8   6   1   1   0   0   1   2   0   1   3   1
rD     6  11   8   0   7   6   8   7   4   8   1   3   0   1   2   0   0   2   1   3
rE     8  10   9   8   0   6  12   8   6   8   1   0   0   1   0   1   0   0   1   2
rF     6   8   6   9   8   0   6  10  10   8   0   1   2   2   0   0   1   0   0   0
rG     9   8   6   8  12  10   0   8  10  10   3   0   1   0   1   2   1   0   1   1
rH     7   9   8   8  11  10   8   0  10  10   0   0   0   0   0   2   0   1   1   1
rI     8   6   7   4   6   9   8  10   0   8   0   1   1   1   1   1   0   0   1   1
rJ     6  10   6  10   8   6   8   8   6   0   0   2   1   3   1   1   0   3   3   0
rK     1   2   0   0   1   2   2   1   0   3   0   4  15   8   6   6   4  12   8   5
rL     1   2   1   1   0   1   1   1   2   0   2   0   8  10  10  10   6  10   6   9
rM     1   0   2   2   0   3   0   1   0   1  11  10   0   9  10   8  12   8   7   9
rN     1   0   4   0   1   1   0   3   4   0   9  10   9   0   6  11   9  10   6   7
rO     3   0   0   0   2   2   1   0   0   0   8   8  10   6   0   4   8   8  14  14
rP     2   1   1   1   0   4   0   1   1   4   6  10   6   9   6   0  10   8   6  10
rQ     0   0   1   0   0   1   1   0   0   0   4   7  14   9  10   9   0   8  10   8
rR     1   1   3   0   1   0   1   0   1   3  10  10   8  10   8   8  10   0   9   5
rS     1   1   1   0   0   0   0   0   0   0   8   6   7   7  12   6  10   9   0   6
rT     1   1   0   0   0   1   1   2   2   0   8   9   7   6  10   8   9   8   6   0

Monitoring people who are a security risk

• Assume that there are 1000 past criminals (out of jail) that you wish to monitor.

• The scan statistic
– Looking for gangs of 5 in the above network
– An exhaustive “SCAN” would need to investigate close to 10 billion (using the long scale) potential gangs, i.e. about 8.25 × 10^12

• A computationally feasible alternative is needed.
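
The size of the exhaustive search can be checked directly, assuming a "potential gang" means any subset of 5 people out of the 1000 being monitored:

```python
import math

n_people = 1000   # monitored individuals (from the slide)
gang_size = 5     # size of the gangs being searched for (from the slide)

# Number of distinct 5-person subsets an exhaustive scan would visit.
n_candidates = math.comb(n_people, gang_size)
print(f"{n_candidates:,}")     # 8,250,291,250,200
print(f"{n_candidates:.2e}")   # 8.25e+12
```

On the long scale a billion is 10^12, so this is a little over 8 billion candidate gangs for a single time period.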

Dynamic aggregation levels

• In spatio-temporal monitoring we try to decide dynamically on the level and position of spatial aggregation that best detects an outbreak.

• In the social network case, the natural neighbours in the network are potentially dynamic
– e.g., neighbours in the social sense may differ from neighbours in terms of criminal gangs.
– As such, the scan statistic is unlikely to work well for monitoring communication levels unless we are lucky and have people in the appropriate order.

• Neighbours are not easy to define.

How to define the best network aggregation?

Order Statistics are often useful in defining anomalous cells

• For each communication cell, calculate its signal-to-noise ratio, which measures how far its count departs from the expected count.

• Rank these from smallest to largest.

• Plot the ranked ratios against their theoretical distribution under the assumption that the network communication level has not changed (in-control).

Example

Communication counts (rows: contacting person; columns: contacted person)

       A    B    C    D    E    F    G    H    I    J
A      0    6    5    0    0    0    0    0    0    0
B      4    0    3    0    0    0    0    0    0    0
C      3    3    0    0    0    0    0    0    0    0
D      0    0    0    0    5    2    1    0    0    0
E      0    0    0    2    0    2    0    0    0    0
F      0    0    0    2    0    0    1    0    0    0
G      0    0    0    1    1    1    0    0    0    0
H      0    0    0    0    0    0    0    0    3    2
I      0    0    0    0    0    0    0    2    0    2
J      0    0    0    0    0    0    0    2    2    0

Expected weekly communication levels (rows: contacting person; columns: contacted person)

       A    B    C    D    E    F    G    H    I    J
A      0  2.1  1.7    0    0    0    0    0    0    0
B    1.9    0  1.3    0    0    0    0    0    0    0
C    0.6  0.4    0    0    0    0    0    0    0    0
D      0    0    0    0  4.5  2.1  0.6    0    0    0
E      0    0    0  2.5    0  2.4  0.4    0    0    0
F      0    0    0  1.5  0.6    0  0.9    0    0    0
G      0    0    0  0.5  1.0  0.8    0    0    0    0
H      0    0    0    0    0    0    0    0  2.5  2.4
I      0    0    0    0    0    0    0  2.1    0  2.0
J      0    0    0    0    0    0    0  1.9  1.8    0
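
The ranking step can be sketched on the example above. The ratio used here, (count - expected) / sqrt(expected), is an assumption consistent with Poisson counts. The in-control reference quantiles are obtained by simulation rather than from a closed-form distribution, which is a choice made for this sketch only.

```python
import numpy as np

people = list("ABCDEFGHIJ")
counts = np.array([
    [0, 6, 5, 0, 0, 0, 0, 0, 0, 0],
    [4, 0, 3, 0, 0, 0, 0, 0, 0, 0],
    [3, 3, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 5, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 0, 2, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 3, 2],
    [0, 0, 0, 0, 0, 0, 0, 2, 0, 2],
    [0, 0, 0, 0, 0, 0, 0, 2, 2, 0],
])
expected = np.array([
    [0, 2.1, 1.7, 0, 0, 0, 0, 0, 0, 0],
    [1.9, 0, 1.3, 0, 0, 0, 0, 0, 0, 0],
    [0.6, 0.4, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 4.5, 2.1, 0.6, 0, 0, 0],
    [0, 0, 0, 2.5, 0, 2.4, 0.4, 0, 0, 0],
    [0, 0, 0, 1.5, 0.6, 0, 0.9, 0, 0, 0],
    [0, 0, 0, 0.5, 1.0, 0.8, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 2.5, 2.4],
    [0, 0, 0, 0, 0, 0, 0, 2.1, 0, 2.0],
    [0, 0, 0, 0, 0, 0, 0, 1.9, 1.8, 0],
])

active = expected > 0                       # cells with a known channel
mu = expected[active]
snr = (counts[active] - mu) / np.sqrt(mu)   # assumed Poisson-style ratio

# Reference quantiles: simulate in-control Poisson counts and average the
# ordered ratios (a simulation stand-in for the theoretical distribution).
rng = np.random.default_rng(0)
sim = (rng.poisson(mu, size=(5000, mu.size)) - mu) / np.sqrt(mu)
reference = np.sort(sim, axis=1).mean(axis=0)

order = np.argsort(snr)                     # smallest to largest
rows, cols = np.nonzero(active)
top6 = [(people[rows[k]], people[cols[k]]) for k in order[-6:]]
for k in order[-6:]:
    print(f"{people[rows[k]]}->{people[cols[k]]}: {snr[k]:.2f}")
# QQ-plot points are the pairs (reference[i], snr[order][i]).
```

In this example the six largest ratios all belong to cells linking A, B and C.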

QQ-plot

An alternative is to compute p-values and use a pp-plot of actual vs theoretical values.

Is there another way?

• Group all cells with unusually high signal-to-noise ratios, i.e. those whose counts are greater than their expected quantile in the previous QQ-plot, and sum their counts.

• Calculate the signal-to-noise ratio for this group.

• See if it exceeds a threshold.

Which cells should be aggregated?

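Cells with the highest signal-to-noise ratios (counts and expected weekly communication levels as in the earlier example)

Counts = 6 + 5 + … + 3 = 24
Expected = 2.1 + 1.7 + … + 0.4 = 8

Signal-to-noise ratio for the aggregated group = (24-8)/2.828, where 2.828 = √8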
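
The aggregated ratio can be reproduced as below. The six cells listed are an assumption: they are the cells linking A, B and C in the example tables, and they reproduce the totals of 24 and 8 quoted above.

```python
import math

# (count, expected) for each assumed cell: contacting -> contacted person.
cells = {
    ("A", "B"): (6, 2.1), ("A", "C"): (5, 1.7),
    ("B", "A"): (4, 1.9), ("B", "C"): (3, 1.3),
    ("C", "A"): (3, 0.6), ("C", "B"): (3, 0.4),
}

total_count = sum(c for c, _ in cells.values())
total_expected = sum(e for _, e in cells.values())

# Aggregated ratio: excess over expected, scaled by sqrt(expected).
group_snr = (total_count - total_expected) / math.sqrt(total_expected)
print(total_count, round(total_expected, 1), round(group_snr, 2))
```

The resulting value, 16 / 2.828, is the figure that would be compared with the threshold.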

Advantages of this ad hoc procedure

• No need to order the network into neighbours

• It works well even in the spatio-temporal setting where “neighbours” are well defined – a paper will soon appear in Communications in Statistics.

• It works out who to aggregate over and thus determines the number of cells to aggregate. Thus the approach adapts to the size (and shape/network) of the outbreak.

• The approach is very simple, intuitive and easy for non-statisticians to understand.

Some other applications

• Monitoring several hundred symptoms collected from Twitter messages in several countries around the world.

• Supermarket sales of several hundred or thousands of products at thousands of supermarket stores in Australia.

• Monitoring various crimes at several hundred key locations.

• Cancellation of life insurance policies for clients at various geographical locations (SLA) by age group.

• Number of banking transactions by type of transaction and location in Australia.

• Number of people travelling between train stations at the peak times of the day in big cities (e.g., Sydney).

Simulated example

• We monitor a group of 1000 targeted people.

• We assume 100 independent social networks of 10 individuals each.

• The mean daily communication count between individuals is taken as:
– uniform on the interval 0.1 to 3 for individuals in the same gang during periods when no crime is being planned, and
– 0.0001 for individuals not in the same gang.

• A step change of delta in communications is simulated for all individuals within a few specific gangs; these gangs are then hidden.

• We apply the approach to see how early we detect these “unknown” increases.
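
A minimal sketch of this set-up is given below. The function names are hypothetical, and two details are assumptions not stated on the slide: daily counts are drawn from a Poisson distribution, and nobody contacts themselves (zero diagonal, as in the earlier tables).

```python
import numpy as np

def in_control_means(n_gangs=100, gang_size=10, low=0.1, high=3.0,
                     background=0.0001, rng=None):
    """Mean daily counts: uniform(low, high) within a gang, tiny elsewhere."""
    rng = np.random.default_rng() if rng is None else rng
    n = n_gangs * gang_size
    means = np.full((n, n), background)
    for g in range(n_gangs):
        s = slice(g * gang_size, (g + 1) * gang_size)
        means[s, s] = rng.uniform(low, high, size=(gang_size, gang_size))
    np.fill_diagonal(means, 0.0)   # assumption: no self-contact
    return means

def add_step_change(means, gangs, delta, gang_size=10):
    """Add delta to every within-gang cell mean of the chosen gangs."""
    shifted = means.copy()
    for g in gangs:
        idx = np.arange(g * gang_size, (g + 1) * gang_size)
        shifted[np.ix_(idx, idx)] += delta
        shifted[idx, idx] = 0.0    # keep the diagonal at zero
    return shifted

rng = np.random.default_rng(1)
mu0 = in_control_means(rng=rng)                   # 1000 x 1000 in-control means
mu1 = add_step_change(mu0, gangs=[3], delta=1.0)  # gang 3 is "planning"
day = rng.poisson(mu1)                            # one simulated day of counts
print(day.shape, int(day.sum()))
```

Repeating the last draw day after day gives the stream of count matrices to which the monitoring approach is applied.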

Different simulated criminal planning outbreaks

• Scenario 1: One cell of ten individuals.
– Total communication mean count=137.

• Scenario 2: Two neighbouring cells of ten individuals.
– Total communication mean count=275.

• Scenario 3: Two non-neighbouring cells of ten individuals.
– Total mean count=295.

• Scenario 4: Three independent cells involving 7 of the 10 within each cell.
– Total mean count=204.

• Scenario 5: Four independent cells involving 6 of the 10 within each cell.
– Total mean count=195.

Fixed number of order statistics

• Simulations: generating a 1000 by 1000 counts matrix.

• I tried aggregating over the top 25, 50, 75, 100, 150, 200, 250, 300 cells with the highest signal-to-noise ratio to see which provided the earliest signals of out-of-control events.

• The in-control average run length was taken as 100.

• Daily counts were generated. The first 500 days were used to estimate in-control cell means. Thereafter hidden out-of-control communication cells were simulated and the technology was used to find them, recording the run lengths. These were averaged over 100 simulations to give the average run lengths.
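
The top-m aggregation and the run-length recording can be sketched as follows. This is an illustration under stated assumptions, not the code behind the reported results: counts are Poisson, the true in-control means stand in for the 500-day estimates, the network is a small hypothetical one with two gangs and no cross-gang cells, and the threshold is an arbitrary placeholder rather than one tuned to an in-control average run length of 100.

```python
import numpy as np

def top_m_snr(counts, expected, m):
    """Aggregate the m cells with the largest signal-to-noise ratios."""
    c, e = counts.ravel(), expected.ravel()
    keep = e > 0                           # ignore cells with no channel
    c, e = c[keep], e[keep]
    snr = (c - e) / np.sqrt(e)
    top = np.argpartition(snr, -m)[-m:]    # indices of the m largest ratios
    return (c[top].sum() - e[top].sum()) / np.sqrt(e[top].sum())

def run_length(rng, true_means, expected, m, threshold, max_days=1000):
    """Days until the top-m statistic first exceeds the threshold."""
    for day in range(1, max_days + 1):
        counts = rng.poisson(true_means)
        if top_m_snr(counts, expected, m) > threshold:
            return day
    return max_days                        # censored: no signal in max_days

# Small hypothetical network: two gangs of ten.
rng = np.random.default_rng(2)
expected = np.zeros((20, 20))
expected[:10, :10] = rng.uniform(0.1, 3.0, size=(10, 10))
expected[10:, 10:] = rng.uniform(0.1, 3.0, size=(10, 10))
np.fill_diagonal(expected, 0.0)

shifted = expected.copy()
shifted[:10, :10] += 2.0                   # step change of delta = 2 in gang 1
np.fill_diagonal(shifted, 0.0)

lengths = [run_length(rng, shifted, expected, m=25, threshold=8.0)
           for _ in range(100)]
print(np.mean(lengths))                    # average run length for this setup
```

Because the cells are selected for having large ratios, the aggregated statistic is biased upwards even when nothing has changed, so in practice the threshold would itself be set by in-control simulation.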

Generation of unusual communication “outbreaks”

• It is assumed that, when planning a crime, all participants communicate at the same increased level, i.e., the increase is not proportional to their social communications.
– This means that those who don’t communicate much socially but do when planning a crime are going to have bigger communication cell signal-to-noise ratios.
– The opposite is true if the increase in calls is proportional to the expected counts of their social calls.
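
The contrast between the two kinds of increase can be made concrete. The sketch assumes the ratio (count - mean) / sqrt(mean), so that its expected value after a shift is the mean excess divided by sqrt(mean). The function names and the example values of delta and factor are hypothetical.

```python
import math

def expected_snr_additive(mu, delta):
    """Mean ratio when the cell mean rises from mu to mu + delta."""
    return delta / math.sqrt(mu)

def expected_snr_proportional(mu, factor):
    """Mean ratio when the cell mean rises from mu to factor * mu."""
    return (factor - 1.0) * math.sqrt(mu)

# A quiet and a busy social channel (the ends of the simulated range).
for mu in (0.1, 3.0):
    print(mu,
          round(expected_snr_additive(mu, 1.0), 2),
          round(expected_snr_proportional(mu, 2.0), 2))
```

With an additive increase the quiet channel gives the larger ratio; with a proportional increase the busy channel does.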

Scenarios

• Scenario 1: One cell of ten individuals. (Total communication mean count=136.61).

• Scenario 2: Two neighbouring cells of ten individuals. (Total communication mean count=275.25).

• Scenario 3: Two non-neighbouring cells of ten individuals. (Total mean count=294.76).

• Scenario 4: Three independent cells involving 7 of the 10 within each cell. (Total mean count=204.2).

• Scenario 5: Four independent cells involving 6 of the 10 within each cell. (Total mean count=194.95).

• Scenario 6: Four independent cells involving 1 of the 10 within each cell. (Total mean count=194.95).

Scenario 1: One cell of ten individuals.
Scenario 2: Two neighbouring cells of ten individuals each.

Average run length by delta and number of order statistics (m)

                 Scenario 1                      Scenario 2
delta   m=25   m=50   m=75  m=100       m=25   m=50   m=75  m=100
0.0    100.8   99.8  100.3  100.2      101.4  100.2  100.3  100.2
0.5     21.7   22.2   22.6   23.8       15.9   16.0   16.2   16.2
1.0      8.8    9.7   10.5   10.9        7.1    7.5    7.8    8.1
2.0      4.5    5.0    5.5    5.9        3.8    3.9    4.3    4.8
3.0      3.4    3.7    3.9    4.1        2.8    2.9    2.9    3.4
4.0      2.8    2.9    2.9    3.4        1.9    1.9    2.6    2.9
6.0      1.9    2.0    2.0    2.7        1.6    1.6    1.9    2.0

Scenario 3: Two non-neighbouring cells of ten individuals each.
Scenario 4: Four independent non-neighbouring cells involving 7 of the 10 people within each cell.

Average run length by delta and number of order statistics (m)

                 Scenario 3                      Scenario 4
delta   m=25   m=50   m=75  m=100       m=25   m=50   m=75  m=100
0.0    101.4   99.8  100.3  100.2      101.4  100.2  100.3  100.2
0.5     15.8   15.9   16.3   16.9       15.6   16.4   16.5   16.8
1.0      7.2    7.4    8.0    8.2        7.1    7.5    8.1    8.3
2.0      3.8    3.9    4.1    4.8        3.6    3.9    4.2    4.8
3.0      2.8    2.9    2.9    3.4        2.9    3.0    3.0    3.2
4.0      1.9    1.9    2.5    2.9        2.0    2.0    2.5    2.9
6.0      1.6    1.5    1.9    2.0        1.8    1.8    1.9    2.0

Scenario 5: Four independent cells involving 6 of the 10 within each cell. (Total mean count=194.95).
Scenario 6: Four non-neighbouring cells of ten individuals.

Average run length by delta and number of order statistics (m)

                 Scenario 5                      Scenario 6
delta   m=25   m=50   m=75  m=100       m=25   m=50   m=75  m=100
0.0    101.4   99.8  100.3  100.2      101.4  100.2  100.3  100.2
0.5     14.7   14.8   15.5   15.6       12.0   12.1   12.3   12.6
1.0      6.6    7.3    7.5    8.1        5.4    5.9    6.1    6.4
2.0      3.2    3.9    3.9    4.8        2.9    2.9    3.4    3.8
3.0      2.2    2.9    2.9    3.4        2.2    2.2    2.6    2.9
4.0      1.9    2.0    2.0    2.9        1.9    1.9    2.0    2.0
6.0      1.7    1.6    1.7    2.0

Conclusions

• As long as the increase in calls when planning a crime is at least twice the normal number of calls, it can be flagged within a week.

– This is probably sufficient to prevent a gang-related crime or a gang-related terrorist activity.

• Simulations of large-scale networks are challenging and need better computing skills than I currently possess.

• The technology can be scaled up to higher dimensions if the simulation process can be improved.

CSIRO Computational Informatics
Ross Sparks
Research scientist
t +61 2 9123 4567
e [email protected]
http://www.csiro.au/

CSIRO COMPUTATIONAL INFORMATICS/DIGITAL PRODUCTIVITY FLAGSHIP

Thank you. Questions?