506 Forms Management Evolution to Records, Information ... · Valora Technologies Sandy is...
Speaker: Sandra E. Serkes, President & CEO, Valora Technologies
Sandy is President & CEO of Valora Technologies, with an extensive background spanning over 20 years in entrepreneurship, software marketing, product management and corporate strategy, particularly in information governance, predictive analytics, document data mining and processing, computer telephony and speech recognition.
A graduate of Harvard Business School and MIT, she is a frequent industry speaker and panelist.
Since the fall of 2015, Sandy has served as Adjunct Professor at the Columbia University School of Professional Studies, Knowledge Management Program.
What’s a form?
• Formatted document with fields for data entry
  ◦ Takes the place of spoken Q&A
• How is this definition/concept changing? Examples:
  ◦ Is a web portal a form?
  ◦ How about an auto-fill-in PDF?
  ◦ What about audio Q&A? “Say ‘yes’ to fill a prescription...”
• Forms vs. contracts
• Forms evolved as a way to gather information efficiently
  ◦ Particularly repetitive info
  ◦ Easier to convey & change complex instructions
Webopedia: A form is a formatted document containing blank fields that users can fill in with data.
Wikipedia: A form is a document with spaces (also named fields or placeholders) in which to write or select, for a series of documents with similar contents.
What’s an information repository?
• Repository, ECM, DMS, CMS, database: all the same thing?
• How is this definition/concept changing?
  ◦ Increased focus on metadata
  ◦ Data lifecycle management
  ◦ Cloud storage options
  ◦ Content Services
• Forms vs. repositories
• Repositories evolved as a way to store information
  ◦ Not well-suited for retrieval or lifecycle management
Wikipedia: A content repository is a database of digital content with an associated set of data management, search and access methods allowing application-independent access to the content, rather like a digital library, but with the ability to store and modify content in addition to searching and retrieving.
What’s the next logical step?
• Mining the information collected by forms & stored by repositories
  ◦ Content analytics
  ◦ Data mining
  ◦ Predictive analytics
  ◦ AI & Machine Learning
• Why?
  ◦ Organization & control
  ◦ Forecasting/trending
  ◦ Damage control & prevention
  ◦ Legal/regulatory requirements
  ◦ Cost savings
  ◦ Context
How do forms & information “get into” the repository?
• Brief description of intake approaches
  ◦ Scanned
  ◦ Email
  ◦ Upload/Drag & drop
  ◦ Physical media
  ◦ Migration
• How information is stored in a repository
  ◦ Records and fields, like a database
  ◦ Programmatic access
  ◦ Structured vs. unstructured data
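The storage model above (records and fields, programmatic access, structured fields alongside unstructured content) can be sketched with Python's built-in sqlite3 module. The table name, column names and sample record are illustrative assumptions, not the schema of any particular repository product.

```python
import sqlite3

# In-memory database standing in for a repository (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        doc_type TEXT,       -- structured field (metadata)
        intake_method TEXT,  -- e.g. scanned, email, upload
        body TEXT            -- unstructured content stored as-is
    )
    """
)

# One record: structured fields plus the raw text of the form.
conn.execute(
    "INSERT INTO documents (doc_type, intake_method, body) VALUES (?, ?, ?)",
    ("Job Application Form", "scanned", "Position: Waiter"),
)
conn.commit()


def find_by_type(connection, doc_type):
    """Programmatic access: look up records by a structured field."""
    rows = connection.execute(
        "SELECT id, intake_method FROM documents WHERE doc_type = ?",
        (doc_type,),
    )
    return rows.fetchall()


matches = find_by_type(conn, "Job Application Form")
```

The structured columns can be queried directly, while the `body` column is opaque to the database until something extracts fields from it.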
What is structured content?
• Structured content lives in a database and/or has annotations (metadata)
  ◦ CRM Systems
  ◦ Employee Database
  ◦ Financial/ERP Systems
  ◦ Online shopping listings
Structured data is data that has been organized into a formatted repository, typically a database, so that its elements can be made addressable for more effective processing and analysis.
-- Tech Target
What is unstructured content?
• Content that doesn’t have implicit organization or structure
  ◦ Email and attachments
  ◦ Shared files, active and archived
  ◦ Desktop and “loose” files
  ◦ Paper files, imaged files
  ◦ Cloud storage & collaborative files
• By contrast, structured content lives in a database and/or has annotations (metadata)
Unstructured data is a generic label for describing data that is not contained in a database or some other type of data structure.
Unstructured data can be textual or non-textual. Textual unstructured data is generated in media like email messages, PowerPoint presentations, Word documents, collaboration software and instant messages.
Non-textual unstructured data is generated in media like JPEG images, MP3 audio files and Flash video files.
-- Tech Target
“Note that while these sorts of files may have an internal structure, they are still considered "unstructured" because the data they contain doesn't fit neatly in a database.” -- Webopedia
Why should we care?
[Chart: Growth in File Storage Data, By Type. Series: Unstructured data (CAGR 61.8%), Structured data (CAGR 23.7%)]
What’s an exabyte? One billion gigabytes!
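To make the two growth rates concrete, here is a small sketch of what a compound annual growth rate (CAGR) implies over time. The five-year horizon and the starting volume of 1 unit are assumptions chosen for illustration; the chart residue does not state either.

```python
def grow(start, cagr, years):
    """Compound a starting volume at a fixed annual growth rate."""
    return start * (1 + cagr) ** years


# Rates are taken from the chart; horizon and starting volume are assumed.
unstructured = grow(1.0, 0.618, 5)
structured = grow(1.0, 0.237, 5)

# One exabyte expressed in gigabytes (decimal units: 10^18 bytes / 10^9 bytes).
EXABYTE_IN_GB = 10**18 // 10**9

print(f"Unstructured: about {unstructured:.1f}x after 5 years")
print(f"Structured: about {structured:.1f}x after 5 years")
```

Under these assumptions, unstructured data grows roughly elevenfold over five years while structured data roughly triples.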
What does AutoClassification bring to the table?
AutoClassification makes unstructured data… structured!
That is, providing structure to forms data – without typing or “rubber banding”
Key Things Needed: Metadata + Text = Context
Forms are high profile
• Most likely to contain sensitive data
  ◦ PII, PHI, PCI and more
• Most likely to become obsolete, need refreshing/monitoring
• Often represent contractual terms
  ◦ Indicate acceptance/signature/proof
• Serve as access barrier
• Serve as system of record
• Most likely to be kept/retained
How do forms join other document types to become systems of record?
• Workflow origin point
• Contractual/legal document
• Explicit or implicit retention schedule
Introduction to AutoClassification
• Computer software that performs automated analysis and disposition of file/document content
• Software contains recognition algorithms for
  ◦ Document Type
  ◦ Content analytics
  ◦ Indexing & Tagging
  ◦ Recommended locations & naming
• “Middleware” that sits between file locations (storage) and file uses (applications), providing an intelligent filter and control system for all content.
• Definitely NOT “rubber-banding”
AutoClassification = Rich Metadata + Rules
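One way to read "Rich Metadata + Rules" is as a rule table evaluated against each document's metadata. The sketch below is a hypothetical illustration: the `Rule` class, the metadata field names and the disposition labels are all assumptions, not Valora's actual rule set.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    applies: Callable[[dict], bool]  # predicate over a metadata dict
    disposition: str


# Hypothetical rules; a real rule set would be organization-specific.
RULES = [
    Rule("pii", lambda m: m.get("contains_pii", False), "restrict-and-redact"),
    Rule(
        "applications",
        lambda m: m.get("doc_type") == "Job Application Form",
        "hr-folder",
    ),
]


def classify(metadata: dict) -> Optional[str]:
    """Return the disposition of the first matching rule, or None."""
    for rule in RULES:
        if rule.applies(metadata):
            return rule.disposition
    return None
```

Rule order matters in this sketch: a job application that also contains PII is routed by the PII rule because it is listed first.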
How does machine learning automatically derive information from forms? Why do organizations need this?
How does AutoClassification Work?
• Processing (aka Intake) is ingesting data or processing it in place
  ◦ Creating OCR for scanned images
  ◦ Extracting text from native files & email
  ◦ Speech to text for audio/video files
  ◦ Translating content to English
  ◦ Re-ordering or re-aligning pages
  ◦ Applying redactions
• Tagging (aka Coding, Indexing, Sequencing) is the process of extracting key information and attributes about each document
  ◦ Document Type, Important Dates
  ◦ Key Names & Phrases
  ◦ Topics, Keywords & Themes
  ◦ File, Content and DocType attributes
  ◦ Relation to other documents (duplicate, related, attached, contradictory, etc.)
• Disposition (rules) is the process of creating a destination or status for each document
  ◦ Retention status & duration
  ◦ Folder (taxonomy) location
  ◦ Labelling & keywords display
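The three stages above can be sketched as a small pipeline. This is a toy illustration under stated assumptions: the function names are hypothetical, whitespace normalization stands in for OCR and text extraction, the regular expressions stand in for real recognition algorithms, and the three-year retention period is invented for the example.

```python
import re


def process(raw: str) -> str:
    """Processing: stand-in for OCR/text extraction; only normalizes whitespace."""
    return " ".join(raw.split())


def tag(text: str) -> dict:
    """Tagging: pull a few attributes out of the text with simple patterns."""
    tags = {}
    if re.search(r"job application", text, re.IGNORECASE):
        tags["doc_type"] = "Job Application Form"
    dates = re.findall(r"\b\d{1,2}/\d{1,2}/\d{4}\b", text)
    if dates:
        tags["dates"] = dates
    return tags


# Assumed retention periods in years, keyed by document type.
RETENTION_YEARS = {"Job Application Form": 3}


def dispose(tags: dict) -> dict:
    """Disposition: assign a folder and retention period from the tags."""
    doc_type = tags.get("doc_type", "Unclassified")
    return {
        "folder": f"/{doc_type}",
        "retention_years": RETENTION_YEARS.get(doc_type),
    }


def auto_classify(raw: str) -> dict:
    """Run the three stages in order on one document."""
    text = process(raw)
    tags = tag(text)
    return {"tags": tags, "disposition": dispose(tags)}


result = auto_classify("JOB  APPLICATION\nDate: 10/18/2007")
```

Keeping the stages as separate functions mirrors the slide: each stage consumes only the output of the previous one, so any stage can be swapped for a more capable implementation.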
[Diagram: native → text → fielded data → disposition]
AutoClassifying a document (Completed Job Application)
• DocType = Job Application Form
• Position = Waiter
• Employment History
• Author = Sanchez, Carlos Antonio
• Author Address = 2385 Hickory Blvd. Hudson, NC 28638
• Date Format = US
• Date = 05 April 2003
AutoClassifying an attachment (patent application)
DocType = Patent Application
Date = 10/18/2007
Date Format = US
Author = Patent Authors, Author City, Author Country
Assignee = RIM
Tone = Neutral to slightly positive
Embedded Graphic with Title
Other Capturable Data Elements:
• Patent Number
• Filing Date
• Key Phrases & Terms
• Managing PTO
• Implied/Attached Docs
• Bar Code Present
• And many more . . .
Implied status: Responsive, Nonprivileged
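The Date Format tag matters because a numeric date such as 10/18/2007 is only unambiguous once the day/month order is known. Here is a minimal sketch using Python's standard datetime module; the pattern table and the "EU" label are assumptions added for contrast and do not appear in the slides.

```python
from datetime import date, datetime

# Assumed mapping from a Date Format tag to a strptime pattern.
DATE_PATTERNS = {
    "US": "%m/%d/%Y",  # month first
    "EU": "%d/%m/%Y",  # day first (hypothetical label, not in the slides)
}


def parse_tagged_date(value: str, date_format: str) -> date:
    """Interpret a numeric date string according to its Date Format tag."""
    return datetime.strptime(value, DATE_PATTERNS[date_format]).date()


doc_date = parse_tagged_date("10/18/2007", "US")

# The same string yields different dates depending on the tag.
as_us = parse_tagged_date("04/05/2003", "US")
as_eu = parse_tagged_date("04/05/2003", "EU")
```

Storing the format alongside the value lets downstream rules (for example, retention calculations) work from a normalized date rather than the raw string.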
Identifying & AutoClassifying PII with Analytics
Clear PII: SSN
Implied classification: Active PII, needs protection & redaction
Likely PII (“warning sign”)
Clear PII: Home Phone Number
Not PII: Interest Rate
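A highly simplified sketch of how pattern matching might separate clear PII from non-PII values like those labeled above. The regular expressions, function names and sample text are assumptions; real PII detection needs far more context (the "Likely PII" warning signs, for instance) than two patterns can provide.

```python
import re

# Simplified US-style patterns (assumptions; they miss many real formats).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_PATTERN = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")


def find_pii(text: str) -> dict:
    """Return pattern matches grouped by PII category."""
    return {
        "ssn": SSN_PATTERN.findall(text),
        "phone": PHONE_PATTERN.findall(text),
    }


def implied_classification(text: str) -> str:
    """Derive a coarse status from whether any clear PII was found."""
    hits = find_pii(text)
    if hits["ssn"] or hits["phone"]:
        return "Active PII, needs protection & redaction"
    return "No clear PII found"


# Fabricated sample values for illustration only.
sample = "SSN: 123-45-6789, Home phone: 555-010-4477, Interest rate: 4.25%"
hits = find_pii(sample)
status = implied_classification(sample)
```

The interest rate in the sample matches neither pattern, which corresponds to the "Not PII" label on the slide.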
What is Information Governance and how do forms fit in?
Forms are some of the most critical document types in Information Governance. They are the building blocks of content management, compliance and risk.
Typical Enterprise Setup
Data Silos:
• Misc Fileshares
• Personal/Group Work Product
• Email
• Collaborative Cloud Apps
• Databases, Repositories & Apps
• Paper
Data Needs:
• Search & Retrieval
• Retention & Legal Hold
• Data Privacy & Security
• Migration & Archival
• Data Lifecycle Planning & Mgmt
Who Serves as the Traffic Cop In Between?
What’s the future for forms, information and document management?
• Forms and unstructured content are converging as structured information
  ◦ Thus subject to Information Lifecycle Management
• “Forms” will generate on the fly
  ◦ With all information captured automatically
  ◦ Full predictive analytics
  ◦ Automated security, obsolescence and disposition
• Because forms are high profile, they will be subject to greater scrutiny, protections and penalties
  ◦ Opt in vs. opt out and the coming era of consent
Thank You!
For More Information:
Valora Technologies, Inc.
101 Great Road, Suite 220
Bedford, MA 01730
781.229.2265