Visual Analysis of Macroscopic Patterns

Visual Analysis of Macroscopic Patterns. Chaomei Chen, College of Information Science and Technology. Drexel Computer Science Colloquium. November 12, 2007.

Transcript of Visual Analysis of Macroscopic Patterns

  • Visual Analysis of Macroscopic Patterns. Chaomei Chen, College of Information Science and Technology. Drexel Computer Science Colloquium. November 12, 2007

  • Questions. Question 1: How do we recognize that something is interesting, or suspicious, or worth pursuing?

  • Questions. Question 2: What does it take for us to decide whether it will be worthwhile going through a collection of information or a complex network of data?

  • Questions. Question 3: How can we strategically fast-forward through a complex web of information at a higher level of aggregation?

  • Outline: Puzzles and mysteries; Information foraging and scent following; Bayesian reasoning; Detect surprises and semantic outliers; The role of structural holes in information networks; Understanding high-profile and low-profile information patterns

  • The Connecting-the-Dots Problem. "I don't think anybody could have predicted that they would try to use an airplane as a missile, a hijacked airplane as a missile," said national security adviser Condoleezza Rice on May 16, 2002. "How is it possible we have a national security advisor coming out and saying we had no idea they could use planes as weapons when we had FBI records from 1991 stating that this is a possibility," said Kristen Breitweiser, one of four New Jersey widows who lobbied Congress and the president to appoint the commission.

    The widows want to know why various government agencies didn't connect the dots before Sept. 11, such as warnings from FBI offices in Minnesota and Arizona about suspicious student pilots.

  • First Monday - Uncloaking Terrorist Networks by Valdis Krebs

  • Connectable Dots? Prior to the 9/11 terrorist attacks, several foreign nationals enrolled in different civilian flying schools to learn how to fly large commercial aircraft. They were interested in learning how to navigate civilian airlines, but not in landings or takeoffs. And they all paid cash for their lessons.

  • Puzzles vs. Mysteries. Puzzles: Where is bin Laden? Mysteries: Why did Enron collapse?

  • Solving Mysteries. We may have all the necessary information in front of us and yet fail to see the connection or recognize an emergent pattern.

    To solve a mystery, one needs to ask the right question.

  • Solving Mysteries: decomposition; aggregation

  • Macroscopic and Microscopic Levels: words; phrases; sentences; documents; digital libraries; concepts; associations; clusters; specialties; domains; disciplines

  • Information Foraging and Sense Making

  • What is my profitability here? Gain = ? Cost = ?

  • Information Foraging Theory. People adapt their search strategies to maximize their profitability, or the profit-investment ratio. Profit: finding relevant information. Cost: time spent. People may adapt their search by reconfiguring the information environment.
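The profit-investment ratio can be made concrete with a small sketch. The patch names and the gain and cost figures below are hypothetical, chosen only to illustrate the ratio; the slides do not give numbers, and `profitability` and `best_patch` are illustrative helper names.

```python
def profitability(gain: float, cost: float) -> float:
    """Return gain per unit of cost (the profit-investment ratio)."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return gain / cost


# Hypothetical information patches: (relevant items expected, minutes needed).
patches = {
    "digital library": (12.0, 30.0),
    "web search": (5.0, 5.0),
    "citation index": (8.0, 16.0),
}


def best_patch(candidates: dict) -> str:
    """Pick the patch with the highest gain-to-cost ratio."""
    return max(candidates, key=lambda name: profitability(*candidates[name]))


print(best_patch(patches))
```

Under these assumed numbers a forager would favor the patch that yields the most relevant items per minute, not the one with the largest absolute gain.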

  • Information Scent. Information scent is the perception of the value, cost, or accessible path of information sources.

  • Information Foraging at Macroscopic Levels through Information Networks

  • Information Entropy and Uncertainty

  • Uncertainty. A good example: voting in political elections. Voters deal with overwhelmingly diverse information; differentiate political positions; accommodate conflicting views; update beliefs in light of new evidence; make macroscopic, categorical decisions.

  • Evidence and Beliefs. The USS Scorpion was lost at sea in May 1968. The search for the USS Scorpion nuclear submarine is a frequently told story of a successful application of Bayesian reasoning.
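A minimal sketch of the kind of belief update used in Bayesian search: after a search of one cell fails to find the target, probability mass shifts toward the unsearched cells. The cell names, priors, and detection probability are hypothetical, and this is not a reconstruction of the actual Scorpion search procedure.

```python
def update_after_failed_search(prior: dict, searched: str, detect_prob: float) -> dict:
    """Apply Bayes' rule after searching one cell without finding the target.

    prior       -- probability that the target lies in each cell (sums to 1)
    searched    -- the cell that was searched
    detect_prob -- chance of finding the target in that cell if it is there
    """
    # Probability that the search fails: 1 - P(target in cell) * P(detect | in cell)
    p_fail = 1.0 - prior[searched] * detect_prob
    posterior = {}
    for cell, p in prior.items():
        # A miss is certain for unsearched cells, and has probability
        # (1 - detect_prob) for the searched cell.
        likelihood = (1.0 - detect_prob) if cell == searched else 1.0
        posterior[cell] = p * likelihood / p_fail
    return posterior


# Hypothetical three-cell search grid.
prior = {"A": 0.5, "B": 0.3, "C": 0.2}
posterior = update_after_failed_search(prior, searched="A", detect_prob=0.8)
print(posterior)
```

With these assumed values, cell A drops from the most likely location to the least likely one after a single failed search, so the next search would move to cell B.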

  • NSF Awards. Preprints; Publications; Citations; Grants; Long-Term Plans; Patents; Textbook; Foresightness; arXiv, ADS; Science, Nature; USPTO; Web of Science; NSF (SGER); NSF Annual Budget Requests

  • NSF Small Grants for Exploratory Research (SGER) (2000-2007)

  • NSF Budget Requests FY2004-FY2008. CISE. p=0.5

  • Saliency and Novelty

  • Structural Holes and Brokerage. A structural hole is a lack of comprehensive connectivity among components in a social network. Information flows are restricted to the privileged few who are strategically positioned over structural holes. The presence of a structural hole offers the potential for gaining distinct advantages.
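One common way to quantify how well a node is positioned over structural holes is Burt's effective size, sketched here in Borgatti's simplified form for unweighted, undirected networks: the number of contacts minus their average degree among themselves. The slides do not say which measure was used, so this is an illustrative choice, and the toy network is hypothetical.

```python
def effective_size(graph: dict, ego: str) -> float:
    """Effective size of ego's network: n - 2t/n, where n is the number of
    ego's contacts and t is the number of ties among those contacts."""
    alters = graph[ego]
    n = len(alters)
    if n == 0:
        return 0.0
    # Each tie between two alters is seen from both ends, so halve the count.
    ties = sum(1 for a in alters for b in graph[a] if b in alters) / 2
    return n - 2 * ties / n


# Hypothetical network: "broker" bridges two otherwise unconnected pairs.
graph = {
    "broker": {"a", "b", "c", "d"},
    "a": {"broker", "b"},
    "b": {"broker", "a"},
    "c": {"broker", "d"},
    "d": {"broker", "c"},
}

print(effective_size(graph, "broker"))
print(effective_size(graph, "a"))
```

A larger effective size relative to the number of contacts means more non-redundant contacts, that is, more structural holes spanned by the node.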

  • Previous hot topic? Turning point? Transition path? Current hot topic?

  • Macroscopic Views of Information Contents: Information Entropy (Vocabulary)

  • relative entropy
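A minimal sketch of the two quantities named on these slides: the Shannon entropy of a term distribution, and a symmetric relative entropy between two distributions (for example, the vocabularies of two years). The symmetrization (the sum of both directions of Kullback-Leibler divergence), the additive smoothing constant, and the toy term lists are assumptions; the slides do not specify them.

```python
import math
from collections import Counter


def distribution(terms: list) -> dict:
    """Turn a list of terms into relative frequencies."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def entropy(p: dict) -> float:
    """Shannon entropy in bits."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def _smoothed(p: dict, vocab: set, eps: float) -> dict:
    """Give unseen terms a small probability so the log ratio stays finite."""
    raw = {term: p.get(term, 0.0) + eps for term in vocab}
    total = sum(raw.values())
    return {term: v / total for term, v in raw.items()}


def relative_entropy(p: dict, q: dict, eps: float = 1e-6) -> float:
    """Kullback-Leibler divergence D(p || q) in bits, over the joint vocabulary."""
    vocab = set(p) | set(q)
    ps, qs = _smoothed(p, vocab, eps), _smoothed(q, vocab, eps)
    return sum(ps[t] * math.log2(ps[t] / qs[t]) for t in vocab)


def symmetric_relative_entropy(p: dict, q: dict) -> float:
    """Assumed symmetrization: D(p || q) + D(q || p)."""
    return relative_entropy(p, q) + relative_entropy(q, p)


# Hypothetical keyword lists for two years.
year_a = distribution(["bombing", "trauma", "response", "policy"])
year_b = distribution(["bombing", "bombing", "bioterrorism", "policy"])

print(entropy(year_a))
print(symmetric_relative_entropy(year_a, year_b))
```

Computing `symmetric_relative_entropy` for every pair of years would give a matrix of the kind the notes describe, with small values between years that use similar vocabularies.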

  • Information Indices

  • Interestingness, Unexpectedness and Actionability: interestingness; objective; subjective; unexpectedness; actionability; beliefs; "interested in learning how to navigate civilian airlines, but not in landings or takeoffs."

  • [Diagram labels] natural language processing; entity-relation extraction; statistical modeling; feature selection; association rules; classification; summarization; information theory; information indices; graphical models; belief networks; ontology construction; predictive models; decision trees; information scent; interestingness; novelty; uncertainty; predictability; search strategies; emergent properties; novelty detection; topic tracking; sense making; formulate hypotheses; evaluate evidence; decision making; macroscopic views; new theories; solved mysteries; Analytical Reasoning; Information Foraging; Aggregation & Transformation; Microscopic Structures

  • Technical Contents: http://www.pages.drexel.edu/~cc345/papers/papers.html

    CiteSpace: http://cluster.cis.drexel.edu/~cchen/citespace
    Chen, C. (2006) CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3), 359-377. http://cluster.cis.drexel.edu/%7Ecchen/citespace/doc/jasist2006.pdf

    Differentiating Conflicting Opinions
    Chen, C., SanJuan, F. I., SanJuan, E., & Weaver, C. (2006) Visual analysis of conflicting opinions. IEEE Symposium on Visual Analytics Science and Technology (VAST 2006), Baltimore, MA. Oct 31-Nov 2, 2006. pp. 59-66. http://cluster.cis.drexel.edu/%7Ecchen/papers/confs/vast2006-chen.pdf

    Scientific Discoveries
    Chen, C., Zhang, J., Zhu, W., Vogeley, M. (2007) Delineating the citation impact of scientific discoveries. IEEE/ACM Joint Conference on Digital Libraries (JCDL 2007). June 17-22, 2007. Vancouver, British Columbia, Canada. http://cluster.cis.drexel.edu/%7Ecchen/papers/confs/jcdl2007.pdf

    Knowledge Diffusion
    Chen, C., Zhu, W., Tomaszewski, B., MacEachren, A. (2007) Tracing conceptual and geospatial diffusion of knowledge. HCI International 2007. Beijing, China. July 22-27, 2007. http://cluster.cis.drexel.edu/%7Ecchen/papers/confs/hcii2007.pdf

  • Acknowledgements

    National Visualization and Analytics Center (NVAC); Northeast Visualization and Analytics Center (NEVAC); NSF IIS Award #0612129; SEI: Coordinated Visualization and Analysis of Sky Survey Data and Astronomical Literature

  • Credits: http://www.care2.com/c2c/groups/disc.html?gpp=12960&pst=600297&archival=&posts=7 http://www.princeton.edu/~rvdb/JAVA/election2004/

  • Before After!

  • #1729: Catholic Bashing Lite (2003-12-25). I have just finished "The Da Vinci Code". What an utter waste of time! Dan Brown adds nothing to the murder mystery genre with this book. Furthermore, the entire premise is implausible as the key element of the mystery, the "Priory of Sion" and its guardianship of the Holy Grail has been proven to be a complete hoax. Mr. Brown's biblical scholarship is shoddy, his analyses of the artworks of Leonardo are facile and, of course, he provides no motive for the secret, which has been kept for so long, to be kept. I am truly amazed that this book has received as much attention as it has. I am sorry to say that I wasted my money on this book. A much better read would be Umberto Eco's "Foucault's Pendulum" or "Badolino". [Highlighted terms] Holy Grail; Umberto Eco

    SGER-General (2000-2007.4): 2,096 awards; 919 awards with amount >= $75,000; 17,027 phrases extracted, with 2~4 words. SDSS: N articles; M phrases. Climate Change (1995-2007.4); 2003-2007.4; 2,185 awards

    Information entropies of the literature of terrorism research between 1990 and the first half of 2007. The two steep increases correspond to the Oklahoma City bombing in 1995 and the September 11 terrorist attacks in 2001.

    Entropies are computed retrospectively based on the accumulated vocabulary throughout the entire period. Two consecutive and steep increases of entropy are prominently revealed, corresponding to 1995-1997 and 2001-2003. These pronounced increases in uncertainty send a strong message that the overall landscape of terrorism research must have been fundamentally altered. The unique advantage of the information-theoretic insight is that it identifies emergent macroscopic properties without overwhelming analysts with a large amount of microscopic detail. In the terminology of information foraging, these two periods have transmitted the strongest information scent. Note that using the number of unique keywords fails to detect the first period identified by information entropy. Subsequent analysis at microscopic levels reveals that the two periods are associated with the Oklahoma City bombing in 1995 and the September 11 terrorist attacks in 2001.

    The symmetric relative entropy matrix shows the divergence between the overall use of terms across different years. The most recent few years are most similar to each other. The boundaries between areas in different colors indicate significant changes of underlying topics.

    The figure (figure c) shows the distributions of terms based on four different ranking mechanisms: frequency-based, entropy-based, relative entropy-based, and information bias-based. One of the aims is to identify low-frequency but significant terms from unstructured text. Overlaying terms based on the various rankings can be used to compare and contrast salient and novel terms.

    A network of keywords in the terrorism research literature (1995-1997). High-frequency terms are shown in black, whereas outlier terms identified by informational bias are shown in dark red.