8/8/2019 2003 - Discovery of Working Activities by Email Analysis
Discovery of Working Activities by Email Analysis
Yueyu Fu & Hong Zhang
School of Library and Information Science
Indiana University, Bloomington
Email {yufu | honzhang} @indiana.edu
April 30, 2003
Abstract
Email has become one of the most widely used computer applications. As the
number of emails we exchange increases at a high rate, so does the number of uses for email.
Patterns in email data can reveal other useful information, such as personal
activities. However, no existing visualization tool serves this purpose well.
Our goal is to explore the working activities involved in email communication using the Treemap
algorithm. The results show that the Treemap layout can successfully present the various
activities involved in the email flow.
Introduction
Email has become one of the most widely used computer applications. As the
number of emails we exchange increases at a high rate, so does the number of uses for email.
Although email was originally designed as a communication tool, it is
currently being used for a number of additional functions, including personal archiving
and task management. In addition, patterns in email data can reveal other useful
information, such as personal activities. However, no existing visualization tool
serves these purposes well.
There have been a number of studies aimed at exploring the current uses of email
and identifying the problems commonly encountered by users dealing with large numbers
of emails. They have focused on two features of email: threads and time. Timestore
organizes emails by time and sender in a two-dimensional grid, with a focus on time-based
email archiving and retrieval. Outlook 2000 and NEC's VisualMail also have time-based
views. However, such a view can become cluttered and hard to understand when there are too many
emails. Threading is necessary to help manage conversation history and track the status
of a conversation in email. Usually, a thread is defined as a series of emails sharing the
same subject line, where prefixes such as "Re:" and "Fw:" are ignored. The time attribute of
an email is very important in both visualization approaches. However, none of these
systems can reveal the various activity trends hidden in email communications. In this
paper, we propose a visualization of an email dataset to help users perform this kind of task.
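The usual thread definition mentioned above (same subject line, reply and forward prefixes ignored) can be illustrated with a short sketch. This is only an illustration, not the method of any of the systems cited: the function names, the sample emails, and the decisions to compare subjects case-insensitively and to also strip "Fwd:" are assumptions.

```python
import re
from collections import defaultdict

# One or more leading reply/forward prefixes such as "Re:", "RE:", "Fw:" or "Fwd:".
# Handling "Fwd:" and ignoring case are assumptions made for this sketch.
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def thread_key(subject):
    """Return the subject with reply/forward prefixes removed, lower-cased."""
    return _PREFIX.sub("", subject).strip().lower()


def group_into_threads(emails):
    """Group (email_id, subject) pairs by their normalized subject line."""
    threads = defaultdict(list)
    for email_id, subject in emails:
        threads[thread_key(subject)].append(email_id)
    return dict(threads)


# Hypothetical sample data, not taken from the dataset used in this paper.
sample = [
    (1, "Build fails on Linux"),
    (2, "Re: Build fails on Linux"),
    (3, "Fw: Re: build fails on Linux"),
    (4, "Meeting notes"),
]
threads = group_into_threads(sample)
```

Here the first three sample emails fall into one thread and the fourth forms a thread of its own.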
Visualization Goal
The outcome of our project is intended to be used to explore working activities by
analyzing email flow.
User Analysis
The intended audience of this project can be anyone communicating extensively with
others by email. For instance, programmers email each other to fix programming bugs,
and researchers ask for help locating research papers by email. The users can be globally
distributed and of various genders, ages, professions, lifestyles, and technical skills. But
all of them know how to use email and how to operate a mouse for interaction.
To use our visualization tool, users have to be able to identify patterns, color hues,
and intensity levels. They must also be able to distinguish between different morphological
elements such as words, shapes, and images. In addition, their desktops need
adequate graphics capability, a color monitor, and browsing software that allows a visualization
tool such as a Java Applet to work properly.
Task Analysis
Well-organized emails may give people a clearer idea of their working progress.
It is therefore not unusual for people to spend a lot of time arranging their mailboxes. They
categorize emails, create folders, and file emails with the same subject in one folder. Most
email systems provide features such as creating folders, moving an email to a specific folder,
and deleting an email or a folder. Those functions allow people to clean up their mailboxes and
organize them to some degree. However, this approach has the following disadvantages:
- The organization work is time-consuming.
- People have to figure out how to label their folders. The name of each folder
serves as a reminder of the email contents in that folder. But if the name is too long,
only part of it is visible because of limited space. And if people use an
abbreviation in place of the full name, they risk forgetting
what the abbreviation means.
- The number of emails in a folder and the relationships between them are hard to see.
Based on the above, we propose a new system that provides the following features:
- Each email will be represented as a rectangle in the Treemap.
- Emails with the same subject will be grouped together.
- The colors represent different senders.
- The sender, subject, and time of an email will be shown when the mouse moves over
the rectangle for that email. Thus people do not need to open each
email to see this information.
- Clicking the rectangle assigned to an email will open a pop-up window
containing the email content.
- The control panel will allow the user to manipulate the Treemap to get an optimal
view.
Data Mining
The dataset of our project was provided by Mr. Jason Baumgartner, the instructor of
the course Information Visualization. This dataset consists of 1695 emails, each of which
contains a subject title, content body, sender's name, sender's email address, sent type,
receiver's name, receiver's email address, and received type. The senders and receivers
of those emails are software developers who communicated extensively by email during
software development. The email contents are closely related to the problems and
progress encountered during the development of new software. We hope to discover the working
activities of those developers by analyzing those emails.
The dataset is stored as a table in Microsoft Access. Each record has nine fields:
identification number, subject, content, sender name, sender address, sent type, receiver
name, receiver address, and received type. Because our goal is to visualize emails
to explore workflow, attributes such as subject, body content, and sender are
important. By quickly browsing the table contents, we discovered three characteristics of our
dataset. First, the subject title of each email is a good representative of its body
content. Second, the developers held discussions on several topics, and the
number of emails involved in each discussion may indicate how much interest the developers
have in each topic. Third, the dataset presents an obvious hierarchical structure.
Based on these three points, we decided to categorize the data according to the subject
field. By going through the whole dataset, we identified that the first sub-level contains
three categories: subjects with "Fluency", subjects with "Knownspace-teama", and subjects
without either "Fluency" or "Knownspace-teama". We ran queries on both the original
and derived tables. Finally, the dataset was divided into tables with a tree-like structure
(see Figure 1). It has three first-level nodes, two of which contain three and five
second-level nodes, respectively. The number of third-level nodes contained by the
eight second-level ones is 2, 3, 0, 2, 2, 2 and 3, respectively. This structure should
lend itself well to visualization with a Treemap.
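The first-level split by subject can be sketched as follows. The paper performed this step with database queries, so the Python below is only an illustration: the function name, the sample subjects, the case-insensitive match, and the rule that "Fluency" takes precedence when both keywords occur are all assumptions.

```python
from collections import Counter

# The two keywords named in the text; any other subject falls into "Other".
KEYWORDS = ("Fluency", "Knownspace-teama")


def first_level_category(subject):
    """Return the first-level category for an email subject.

    Assumptions for this sketch: matching ignores case, and the first
    keyword in KEYWORDS wins if a subject contains both.
    """
    lowered = subject.lower()
    for keyword in KEYWORDS:
        if keyword.lower() in lowered:
            return keyword
    return "Other"


# Hypothetical subjects, not taken from the actual dataset.
subjects = [
    "Fluency: layout bug",
    "Re: fluency release plan",
    "[Knownspace-teama] meeting",
    "Lunch on Friday?",
]
category_counts = Counter(first_level_category(s) for s in subjects)
```

Counting emails per category in this way also gives a rough measure of how much discussion each topic attracted.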
Each of these derived tables keeps the same format as the original table. Our
algorithm has been redesigned so that it can retrieve the field contents of each of those tables
automatically.
Figure 1. Hierarchical Structure of Dataset
Visualization & Interaction
To pursue our design goals, the Treemap layout is used to visualize our dataset.
The Treemap implementation we chose for this project was developed by Christophe Bouthier.
It uses a space-filling technique to map a tree structure into nested rectangles, with each
rectangle representing a node. A rectangular area is first allocated to hold the representation
of the tree, and this area is then subdivided into a set of rectangles that represent the top
level of the tree. This process continues recursively on the resulting rectangles to
represent each lower level of the tree, with each level alternating between vertical and
horizontal subdivision. According to Ben Shneiderman, the Treemap layout is best suited to
hierarchies in which the content of the leaf nodes and the structure of the hierarchy are of
primary importance, and the content information associated with internal nodes is largely
derived from their children. However, Treemaps should not be used to convey
the hierarchical structure of a very large data set.
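The recursive subdivision described above, with the cut direction alternating at each level, can be sketched in a few lines. This is a minimal illustration of the classic layout and not the code of the package used in this project: the `Node` class, the function names, and the example tree are hypothetical, and node names are assumed to be unique.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                     # assumed unique within the tree
    size: float = 1.0             # leaf weight; constant, as in this project
    children: list = field(default_factory=list)


def weight(node):
    """Weight of a node: its own size for a leaf, else the sum of its children."""
    if not node.children:
        return node.size
    return sum(weight(child) for child in node.children)


def layout(node, x, y, w, h, vertical=True, out=None):
    """Assign the rectangle (x, y, w, h) to node, then split it among its children.

    vertical=True places the children side by side along the x axis;
    the direction flips at each level of the tree.
    """
    if out is None:
        out = {}
    out[node.name] = (x, y, w, h)
    total = weight(node)
    offset = 0.0
    for child in node.children:
        share = weight(child) / total
        if vertical:
            layout(child, x + offset * w, y, share * w, h, False, out)
        else:
            layout(child, x, y + offset * h, w, share * h, True, out)
        offset += share
    return out


# Hypothetical two-level tree laid out in an 8 x 6 area.
tree = Node("root", children=[
    Node("A", children=[Node("a1"), Node("a2")]),
    Node("B", children=[Node("b1"), Node("b2")]),
])
rects = layout(tree, 0.0, 0.0, 8.0, 6.0)
```

In this example the root area is first cut vertically into two halves for A and B, and each half is then cut horizontally for its two leaves.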
To use the Treemap package, each node of the data tree should implement the
treemap.TMNode interface. Once each node implements the TMNode interface, one only
needs to create a new Treemap and pass it the root of the data tree in the constructor.
We used the file directory structure to construct the hierarchy of the dataset. Each
folder represents a category. Folders may contain various numbers of sub-folders. The
bottom folder contains a text file, which lists all the record IDs corresponding to that
category. The Treemap can then provide any number of views of the data tree that was passed
as a parameter. Each view can be configured independently of the others, and if the data
tree is changed, all views will be updated.
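The idea of reading the category hierarchy from a folder structure can be sketched as below. This is a Python illustration rather than the Java code used with the Treemap package. The function name, the folder names, the file name `ids.txt`, and the format of whitespace-separated integer record IDs are assumptions.

```python
import tempfile
from pathlib import Path


def load_category_tree(folder):
    """Build a nested dict from a folder: sub-folders become child categories,
    and any .txt file contributes the record IDs it lists."""
    folder = Path(folder)
    node = {"name": folder.name, "children": [], "record_ids": []}
    for entry in sorted(folder.iterdir()):
        if entry.is_dir():
            node["children"].append(load_category_tree(entry))
        elif entry.suffix == ".txt":
            # Assumed format: whitespace-separated integer record IDs.
            node["record_ids"].extend(int(tok) for tok in entry.read_text().split())
    return node


# Build a small hypothetical folder structure in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "emails"
    (root / "Fluency" / "bugs").mkdir(parents=True)
    (root / "Other").mkdir()
    (root / "Fluency" / "bugs" / "ids.txt").write_text("3\n17\n42\n")
    (root / "Other" / "ids.txt").write_text("5\n")
    tree = load_category_tree(root)
```

One convenience of this representation is that recategorizing emails only requires moving folders or editing the ID files, without touching the layout code.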
Interfaces
Two interfaces were created to explore the dataset. They applied the
Treemap algorithm in different ways. Both interfaces represented the hierarchical structure
in a Treemap layout, but each one used a different color coding scheme.
Figure 2. Interface I: individual activities
Interface I (see Figure 2) was designed to discover the individual activities
involved in the lifecycle of the software development. Software development is
teamwork. Exploring the interaction trends in the email communication can help us understand the
development process better and may provide guidance for further software
development. To visualize these personal activities, a distinct color was assigned to
the emails from each of the most active senders in the email flow. The emails from the inactive
senders were represented by another color. A threshold was applied to select the most
active senders based on the volume of emails they sent. In a Treemap, the size of a
node is usually determined by a numeric attribute associated with that node. For our data
set, the size of the node is set to a constant because there is no other suitable attribute
associated with the nodes.
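The threshold-based color coding of Interface I can be sketched as follows. The paper does not state the threshold value or the colors used, so the function name, the sample senders, the palette, the threshold of 3, and the rule that every sender below the threshold shares a single color are all assumptions made for illustration.

```python
from collections import Counter


def assign_sender_colors(senders, threshold, palette, inactive_color="gray"):
    """Map each sender to a color.

    Senders with at least `threshold` emails each get their own palette color,
    most active first; all other senders share `inactive_color`.
    """
    counts = Counter(senders)
    active = [s for s, n in counts.most_common() if n >= threshold]
    if len(active) > len(palette):
        raise ValueError("palette has too few colors for the active senders")
    active_colors = {s: palette[i] for i, s in enumerate(active)}
    return {s: active_colors.get(s, inactive_color) for s in counts}


# Hypothetical sender list: one entry per email sent.
senders = ["alice"] * 5 + ["bob"] * 3 + ["carol"] + ["dave"]
colors = assign_sender_colors(senders, threshold=3, palette=["red", "blue", "green"])
```

Raising the threshold leaves fewer individually colored senders, which keeps the palette small enough for the colors to remain distinguishable.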
Figure 3. Interface II: component distribution
Interface II (see Figure 3) was designed to explore the critical human efforts
involved in the lifecycle of the software development. Developing software requires a great deal of
team-based intellectual effort. It would be interesting to see how this effort is
distributed over the whole development process. The interface also reveals the major parts of the
process visually. We hope this will help improve project management when planning
project progress and assigning human labor. In this Treemap layout, the size of the nodes
is set to a constant for the same reason as above. The color of each node depends on
the category that the email belongs to.
Interactions
The Treemap provides a control panel that allows the user to freely change the way
emails are organized (see Figure 4).
Figure 4. User Interface Overview
When the mouse moves over a node, a tooltip containing the information of that node
appears (see Figure 5). Users can see the detailed information by left-clicking on the node
(see Figure 6). A pop-up window will show up with the whole email message. Each
email message has the subject, the sender, and the body. A potential problem with this
interaction is that email bodies are not well formatted, so the display may sometimes
be messy.
Figure 5. Interactive Function: Tooltip
Figure 6. Interactive Function: Pop-up Window
Discussion
The Treemap layout is designed to handle large data sets. In this visualization, it
can handle a much larger data set than the one we are using now. Though the
nodes will become smaller and smaller, the goals of the visualization won't be affected.
The patterns hidden in the data set can still be discovered. However, as the data set gets
larger, the topics involved may also increase in number and become more diverse. This can make it
difficult to categorize the data into meaningful groups, a step that depends on human judgment.
Comparing the standard classic Treemap and the squarified Treemap, the latter is better at
dealing with large data sets. From our results, in the standard classic Treemap, the
individual cells are hard to identify and sometimes do not even display well. The
cells can become so crowded that only a black area can be seen. A desirable extension of the
current system is to provide additional options for analyzing the data. Also, providing a
legend to explain the color coding scheme would help users interpret the graphs.
Acknowledgement
Thanks to Jason Baumgartner, who provided the data set and helped us in the
development of this project.
References
1. C. Bouthier, Treemap visualization package, 2001.
2. J. Baumgartner, Y. Zou, and K. Börner, Space Filling or Treemap Algorithms, at
http://iv.slis.indiana.edu/treemap.html
3. S. L. Rohall, D. Gruen, P. Moody, and S. Kellerman, Email Visualizations to Aid
Communications, Late-Breaking Topics, Proceedings of the IEEE Symposium on
Information Visualization, October 22-23, 2001, San Diego, CA, pp. 12-15.
4. S. Sudarsky and R. Hjelsvold, Visualizing Electronic Mail, International Conference
on Information Visualization, 10-12 July 2002, London, England, UK, pp. 3-9.
5. Y. A. Kim and M. Shin, Project Report, retrieved from
www.cs.umd.edu/class/spring2001/cmsc838b/Project/Kim_Shin/FinalReport.doc