Future of Text Analysis: Forrester Briefing

Post on 14-Jun-2015

A fall 2011 briefing for personnel at Forrester in Cambridge MA.

The Future of Text Analysis

Dr. Stuart Shulman

Texifter, LLC

Thursday, April 13, 2023

Briefing Agenda

• R&D in Annotation and Public Comments
• “The Future of Text Analysis” – The vision
• “What is DiscoverText?” – The software
• The Features – The basics
– Capturing social media, importing other text
– Creating archives, buckets and datasets
– Coding a dataset or training a classifier

Dr. Stuart W. Shulman
Founder & CEO, Texifter, LLC
Assistant Professor, Department of Political Science
University of Massachusetts Amherst
Director, Qualitative Data Analysis Program (QDAP)
Associate Director, National Center for Digital Government
Editor, Journal of Information Technology & Politics
413-545-5375
stu@polsci.umass.edu
http://people.umass.edu/stu/

Major Project Components

Credentials

The Future of Projects

Projects leverage users’ credentials to control access to documents, tools, and resources

Documents Peers

Advanced ‘Social’ Search

Tools for Tagging Shared Analysis

Metadata Networks Filtering

Qualitative & Quantitative Findings

The Future of Documents

Import & archive data from multiple sources into a single, searchable, unified repository

Email

Files Web

The Future of Search

eDiscovery will search, merge, filter & classify unlimited amounts of text and other data

Filter

Search

Classify

Report

Well Worth Reading

The Future of Tools

Duplicate & near duplicate detection

Dynamic user-seeded tag clouds

Adaptable, intuitive and reusable topic models & shared memos

Sentiment detection, redaction & seamless adjudication

Text processing tools will enable quicker processing and more accurate results
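
One common way to approach duplicate and near duplicate detection is to compare overlapping word sequences (shingles) with Jaccard similarity. The sketch below illustrates that general technique only; it is not taken from the briefing and is not a description of how DiscoverText works. The function names, the shingle size `k` and the `threshold` value are all assumptions chosen for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in a text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.5, k=3):
    """Return (i, j, score) for every document pair at or above the threshold.

    Exact duplicates score 1.0; the threshold is an illustrative assumption.
    This compares all pairs, so it only suits small collections.
    """
    sets = [shingles(d, k) for d in docs]
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            score = jaccard(sets[i], sets[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


comments = [
    "I oppose the proposed rule because it harms small farms",
    "I oppose the proposed rule because it harms family farms",
    "Completely different text about something else entirely",
]
for i, j, score in near_duplicates(comments):
    print(i, j, round(score, 2))
```

Form-letter public comments that differ by a word or two would cluster together under this kind of measure, while unrelated comments would not.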

The Future of Peer Relations

Utilize trusted peers to scale your knowledge resources, increase productivity & lower total project costs

Peer Groups

Securely segment your peers into project groups by agency, firm, department, location, or affiliation, while controlling their access via credentials

Security & Credentials

Data will be encrypted, secure and accessible only to peers who are granted specific permissions via their credentials
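
The idea of segmenting peers into groups and granting each group specific permissions can be sketched as a small group-based permission check. This is a hypothetical illustration, not the product's actual design: the `Peer` and `Project` classes, the group names and the permission names are all assumptions, and encryption is not covered.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Peer:
    """A hypothetical peer who belongs to zero or more groups."""
    name: str
    groups: frozenset


@dataclass
class Project:
    """A hypothetical project that grants permissions per peer group."""
    name: str
    grants: dict = field(default_factory=dict)  # group name -> set of permissions

    def grant(self, group, *permissions):
        """Give every member of a group the listed permissions."""
        self.grants.setdefault(group, set()).update(permissions)

    def can(self, peer, permission):
        """True if any of the peer's groups holds the permission."""
        return any(permission in self.grants.get(g, set()) for g in peer.groups)


project = Project("public-comments")
project.grant("agency", "read", "code")   # agency staff may read and code
project.grant("firm", "read")             # outside firm may only read

alice = Peer("alice", frozenset({"agency"}))
bob = Peer("bob", frozenset({"firm"}))

print(project.can(alice, "code"))  # agency members hold "code"
print(project.can(bob, "code"))    # firm members do not
```

Access is denied by default in this sketch: a peer with no groups, or a group with no grants, gets nothing.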

Coding, Tagging or Labeling

Annotation enhances your analysis by applying human interpretation to machine results

Coding in Flexible Teams

Crowdsourcing

“This is really the biggest paradigm shift in innovation since the Industrial Revolution”

- MIT professor Eric von Hippel, specialist in innovation management

Crowdsourcing will bring widely distributed wisdom to the process of text analysis

Active Machine Learning

Search

Code

Validate

Share

Analyze

By utilizing information and decisions previously captured, we can enhance future machine-based decisions

Active Learning Loop
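
Reusing previously captured human decisions to improve later machine decisions is the core of an active learning loop. The sketch below shows one generic variant, uncertainty sampling with a tiny word-count classifier. It is an assumption-laden illustration, not the classifier described in the briefing; `train`, `predict_proba`, `active_learning_loop` and the `ask_human` callback are hypothetical names.

```python
import math
from collections import Counter


def train(labeled):
    """Build per-word log-odds from (text, label) pairs, label in {0, 1}."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in labeled:
        counts[label].update(text.lower().split())
    vocab = set(counts[0]) | set(counts[1])
    # Laplace smoothing so unseen words in one class do not zero out.
    totals = {c: sum(counts[c].values()) + len(vocab) for c in (0, 1)}
    return {
        w: math.log((counts[1][w] + 1) / totals[1])
        - math.log((counts[0][w] + 1) / totals[0])
        for w in vocab
    }


def predict_proba(model, text):
    """Probability of label 1; words the model has not seen contribute 0."""
    score = sum(model.get(w, 0.0) for w in text.lower().split())
    score = max(-30.0, min(30.0, score))  # keep exp() in a safe range
    return 1.0 / (1.0 + math.exp(-score))


def active_learning_loop(labeled, unlabeled, ask_human, rounds=3):
    """Repeatedly retrain, then ask a human to code the least certain item."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        # Uncertainty sampling: pick the item whose probability is nearest 0.5.
        pick = min(pool, key=lambda t: abs(predict_proba(model, t) - 0.5))
        pool.remove(pick)
        labeled.append((pick, ask_human(pick)))
    return train(labeled), labeled, pool


seed = [("great helpful rule", 1), ("terrible harmful rule", 0)]
pool = ["great idea", "harmful idea", "rule"]
# Stand-in for a human coder; a real loop would prompt a person here.
model, coded, remaining = active_learning_loop(
    seed, pool, ask_human=lambda t: 1 if "great" in t else 0, rounds=2
)
print(len(coded), len(remaining))
```

Each round spends the human coder's effort on the item the current model is least sure about, so every new code is as informative as possible for the next retraining.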

What is DiscoverText?

DiscoverText is a:
• personal or organizational archive in the cloud
• search engine for eDiscovery
• social media comment aggregator
• de-duplication and near duplicate clustering engine
• FOIA redaction toolkit
• coding, reporting and validation team workbench
• repository of human annotation (text about text), and
• customizable machine-learning classifier

– (beta launched April 2011)