Textual Report Generation from Email utilizing Temporal Topic Analysis · 2019-01-02 · Input...

● Use doc2vec for topic calculus

● Use model trained on Wikipedia articles for topics

● Extract topic labels by comparing email vectors and cluster keyword sets to topic vectors

● Choose a set of topics that together best describe an email
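The comparison of an email vector against topic vectors can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the doc2vec vectors have already been inferred, and the three-dimensional vectors, the topic labels, and the helper name `nearest_topics` are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_topics(email_vec, topic_vecs, k=2):
    """Return the k topic labels whose vectors are most similar to email_vec."""
    scored = [(cosine(email_vec, vec), label) for label, vec in topic_vecs.items()]
    scored.sort(reverse=True)
    return [label for _, label in scored[:k]]

# Toy stand-ins for vectors from a Wikipedia-trained doc2vec model (assumed values).
topic_vecs = {
    "Energy trading": [0.9, 0.1, 0.0],
    "Litigation": [0.0, 0.2, 0.9],
    "Travel": [0.1, 0.9, 0.1],
}
email_vec = [0.8, 0.3, 0.1]

print(nearest_topics(email_vec, topic_vecs))
```

In practice the vectors would have a few hundred dimensions and the candidate set would be the article titles of the Wikipedia-trained model.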

Topic Analysis

Input

Communication groups

Temporal Chains

Textual Report Generation from Email utilizing Temporal Topic Analysis

● Two email datasets: ENRON & Avocado

● Enron contains ~500K emails from 150 employees

● Avocado Research Email Collection contains ~1M emails from 282 accounts

● Group people into clusters based on communication frequency

● Draw a graph of communications, weighting each edge by email count

● Extract topics for each cluster

● Use clusters to determine communication patterns & anomalies

● Resulting components represent communication groups
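One way to read the steps above is as a weighted graph whose connected components become the communication groups. The sketch below makes that concrete under an assumption the poster does not state: edges below a hypothetical `min_count` threshold are dropped before components are extracted. The function name and the sample addresses are illustrative only.

```python
from collections import Counter, defaultdict

def communication_groups(emails, min_count=2):
    """Group people by connected components of the email graph.

    emails: iterable of (sender, recipient) pairs.
    min_count: assumed threshold; weaker edges are ignored.
    """
    # Edge weight = number of emails exchanged between two people.
    weights = Counter()
    for sender, recipient in emails:
        if sender != recipient:
            weights[frozenset((sender, recipient))] += 1

    # Keep only edges that are frequent enough.
    adjacency = defaultdict(set)
    for pair, count in weights.items():
        if count >= min_count:
            a, b = tuple(pair)
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Depth-first search to collect connected components.
    groups, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups

emails = [
    ("ann", "bob"), ("bob", "ann"), ("ann", "bob"),
    ("cat", "dan"), ("dan", "cat"),
    ("ann", "cat"),  # a single email, below the threshold
]
print(communication_groups(emails))
```

With the threshold at 2, the lone email between ann and cat does not merge the two groups; lowering `min_count` to 1 would.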

Report Generation

Topic Ranking

● Use the hierarchical structure from the analysis (communication groups, email grouping, topic chains, anomalies, etc.)

● Select relevant details to help the user understand the report's context, based on the chosen template (summary vs. anomalies)

● Reason over the content to select a good organization and display style

● Supports multiple report templates, including summary- and anomaly-focused output, with modular extensibility for other styles
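The modular template idea could be organized as a small registry, sketched below. Everything here is hypothetical (the `template` decorator, the shape of the `analysis` dictionary, the sample data); it only illustrates how a new report style might be added without touching existing ones.

```python
TEMPLATES = {}

def template(name):
    """Register a report template under the given name."""
    def register(fn):
        TEMPLATES[name] = fn
        return fn
    return register

@template("summary")
def summary(analysis):
    # One line per communication group listing its topics.
    return [
        f"{', '.join(g['members'])}: {', '.join(g['topics'])}"
        for g in analysis["groups"]
    ]

@template("anomalies")
def anomalies(analysis):
    # Only groups with at least one anomaly appear in this report.
    return [
        f"{', '.join(g['members'])}: {a}"
        for g in analysis["groups"]
        for a in g["anomalies"]
    ]

def generate_report(analysis, style):
    """Render the analysis with the chosen template."""
    return "\n".join(TEMPLATES[style](analysis))

# Assumed output shape of the earlier analysis stages.
analysis = {
    "groups": [
        {"members": ["ann", "bob"], "topics": ["budget"], "anomalies": []},
        {"members": ["cat", "dan"], "topics": ["hiring"],
         "anomalies": ["spike in volume on day 12"]},
    ]
}
print(generate_report(analysis, "summary"))
print(generate_report(analysis, "anomalies"))
```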

Reply / Forward / Related

● Organize emails into topic chains by looking at replies, forwards, and by comparing topics

● Identify topic flow/change over time
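A rough sketch of chain building follows. It assumes each email carries a hypothetical `ref` field (the message it replies to or forwards) and a single `topic` label, and it links emails either through `ref` or through a shared topic. The poster does not specify how topics are compared, so exact label equality is used here as a stand-in.

```python
def build_chains(emails):
    """Group emails into chains via reply/forward links or shared topics."""
    parent = {e["id"]: e["id"] for e in emails}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_with_topic = {}
    for e in emails:
        if e.get("ref") in parent:  # reply or forward of a known email
            union(e["id"], e["ref"])
        first = first_with_topic.setdefault(e["topic"], e["id"])
        union(e["id"], first)       # same topic label (assumed criterion)

    chains = {}
    for e in sorted(emails, key=lambda e: e["time"]):
        chains.setdefault(find(e["id"]), []).append(e["id"])
    return sorted(chains.values())

def topic_flow(emails, chain):
    """Topics along a chain in time order, with consecutive repeats collapsed."""
    by_id = {e["id"]: e for e in emails}
    flow = []
    for email_id in chain:
        topic = by_id[email_id]["topic"]
        if not flow or flow[-1] != topic:
            flow.append(topic)
    return flow

emails = [
    {"id": "m1", "ref": None, "time": 1, "topic": "budget"},
    {"id": "m2", "ref": "m1", "time": 2, "topic": "budget"},
    {"id": "m3", "ref": None, "time": 3, "topic": "hiring"},
    {"id": "m4", "ref": "m2", "time": 4, "topic": "audit"},
    {"id": "m5", "ref": None, "time": 5, "topic": "hiring"},
]
chains = build_chains(emails)
print(chains)
print(topic_flow(emails, chains[0]))
```

Here m4 stays in the first chain because it replies to m2 even though its topic has shifted, which is the kind of topic change over time the chain is meant to expose.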

Collaboration

We are proud of a successful collaboration between NC State and the LAS, including monthly meetings with excellent feedback and ideas.

• We use doc2vec to compute similarity via cosine distance

• For topic labeling, we rank topics using additional criteria:

○ PageRank

○ Coverage

○ Redundancy
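The poster names the three ranking criteria but not how they are combined. The sketch below shows one plausible combination, a greedy selection that adds a precomputed PageRank score to the coverage gain and subtracts a redundancy penalty. The scoring formula, the `redundancy_weight` parameter, and the sample data are assumptions made for illustration.

```python
def rank_topics(candidates, email_keywords, k=2, redundancy_weight=0.5):
    """Greedily pick k topic labels.

    candidates: label -> (pagerank_score, set of keywords the topic matches).
    email_keywords: non-empty set of keywords extracted from the email.
    """
    selected, covered = [], set()
    remaining = dict(candidates)
    total = len(email_keywords)

    def score(label):
        prior, words = remaining[label]
        words = words & email_keywords
        gain = len(words - covered) / total     # coverage: newly explained keywords
        overlap = len(words & covered) / total  # redundancy: already explained
        return prior + gain - redundancy_weight * overlap

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)
        selected.append(best)
        covered |= remaining.pop(best)[1] & email_keywords
    return selected

email_keywords = {"gas", "pipeline", "contract", "price"}
candidates = {
    "Natural gas": (0.30, {"gas", "pipeline"}),
    "Pipeline transport": (0.25, {"pipeline", "gas"}),
    "Contract law": (0.20, {"contract"}),
}
print(rank_topics(candidates, email_keywords))
```

"Pipeline transport" has a higher prior than "Contract law" but is skipped in the second round because it covers nothing new once "Natural gas" has been chosen.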

Colin M. Potts

NC State University

cmpotts@ncsu.edu

Sean Lynch & Tracy Standafer

Laboratory for Analytic Science

sclynch@ncsu.edu | tstanda@ncsu.edu
