506 Forms Management Evolution to Records, Information ... · Valora Technologies Sandy is...



Tuesday May 22, 2018

Forms Management Evolution to Records, Information Governance, and Beyond

1

Speaker:

Sandra E. Serkes
President & CEO
Valora Technologies

Sandy is President & CEO of Valora Technologies, with a background spanning more than 20 years in entrepreneurship, software marketing, product management and corporate strategy, particularly in information governance, predictive analytics, document data mining and processing, computer telephony and speech recognition.

A graduate of Harvard Business School and MIT, she is a frequent industry speaker and panelist.

Since the fall of 2015, Sandy has served as Adjunct Professor in the Knowledge Management Program at the Columbia University School of Professional Studies.

2

What’s a form?

• Formatted document with fields for data entry
  ◦ Takes the place of spoken Q&A
• How is this definition/concept changing? Examples:
  ◦ Is a web portal a form?
  ◦ How about an auto-fill-in PDF?
  ◦ What about audio Q&A? “Say ‘yes’ to fill a prescription.”
• Forms vs. contracts
• Forms evolved as a way to gather information efficiently
  ◦ Particularly repetitive info
  ◦ Easier to convey & change complex instructions

Webopedia: A form is a formatted document containing blank fields that users can fill in with data.

Wikipedia: A form is a document with spaces (also named fields or placeholders) in which to write or select, for a series of documents with similar contents.

3

Is it a form?

4

What’s an information repository?

• Repository, ECM, DMS, CMS, database: all the same thing?
• How is this definition/concept changing?
  ◦ Increased focus on metadata
  ◦ Data lifecycle management
  ◦ Cloud storage options
  ◦ Content Services
• Forms vs. repositories
• Repositories evolved as a way to store information
  ◦ Not well-suited for retrieval or lifecycle management

5

Wikipedia: A content repository is a database of digital content with an associated set of data management, search and access methods allowing application-independent access to the content, rather like a digital library, but with the ability to store and modify content in addition to searching and retrieving.

What’s the next logical step?

• Mining the information collected by forms & stored by repositories
  ◦ Content analytics
  ◦ Data mining
  ◦ Predictive analytics
  ◦ AI & Machine Learning

• Why?
  ◦ Organization & control
  ◦ Forecasting/trending
  ◦ Damage control & prevention
  ◦ Legal/regulatory requirements
  ◦ Cost savings
  ◦ Context

6

How do forms & information “get into” the repository?

• Brief description of intake approaches
  ◦ Scanned
  ◦ Email
  ◦ Upload/Drag & drop
  ◦ Physical media
  ◦ Migration

• How information is stored in a repository
  ◦ Records and fields, like a database
  ◦ Programmatic access
  ◦ Structured vs. unstructured data

7

What is structured content?

• Structured content either lives in a database and/or has annotations (metadata)
  ◦ CRM Systems
  ◦ Employee Database
  ◦ Financial/ERP Systems
  ◦ Online shopping listings

Structured data is data that has been organized into a formatted repository, typically a database, so that its elements can be made addressable for more effective processing and analysis.

-- Tech Target

8

What is unstructured content?

• Content that doesn’t have implicit organization or structure
  ◦ Email and attachments
  ◦ Shared files, active and archived
  ◦ Desktop and “loose” files
  ◦ Paper files, imaged files
  ◦ Cloud storage & collaborative files

Structured content either lives in a database and/or has annotations (metadata)

Unstructured data is a generic label for describing data that is not contained in a database or some other type of data structure.

Unstructured data can be textual or non-textual. Textual unstructured data is generated in media like email messages, PowerPoint presentations, Word documents, collaboration software and instant messages.

Non-textual unstructured data is generated in media like JPEG images, MP3 audio files and Flash video files.

-- Tech Target

“Note that while these sorts of files may have an internal structure, they are still considered "unstructured" because the data they contain doesn't fit neatly in a database.” -- Webopedia

9

Why should we care?

[Chart: Growth in File Storage Data, By Type]

• Unstructured data (CAGR 61.8%)
• Structured data (CAGR 23.7%)

What’s an exabyte? One billion gigabytes!
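To make the gap between the two growth rates concrete, here is a minimal arithmetic sketch. Only the two CAGR figures and the exabyte conversion come from the slide; the starting volume of 1 EB and the five-year horizon are assumptions chosen purely for illustration.

```python
def project(volume: float, cagr: float, years: int) -> float:
    """Compound a starting volume forward at a constant annual growth rate."""
    return volume * (1 + cagr) ** years


UNSTRUCTURED_CAGR = 0.618   # from the slide
STRUCTURED_CAGR = 0.237     # from the slide
START_EB = 1.0              # hypothetical starting volume, in exabytes
GB_PER_EB = 1_000_000_000   # one exabyte is one billion gigabytes

for year in range(6):
    unstructured = project(START_EB, UNSTRUCTURED_CAGR, year)
    structured = project(START_EB, STRUCTURED_CAGR, year)
    print(
        f"year {year}: unstructured {unstructured:6.2f} EB "
        f"({unstructured * GB_PER_EB:,.0f} GB), "
        f"structured {structured:5.2f} EB, "
        f"ratio {unstructured / structured:4.2f}"
    )
```

Under these assumptions, five years of compounding takes the unstructured volume to about 11 times its starting size, against about 2.9 times for structured data.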

10

What does AutoClassification bring to the table?

AutoClassification makes unstructured data… structured!

That is, providing structure to forms data – without typing or “rubber banding”

Key Things Needed

Metadata + Text = Context

11

Understanding Context: Wonder if anyone would mind…

12

Forms are high profile

• Most likely to contain sensitive data
  ◦ PII, PHI, PCI and more

• Most likely to become obsolete, need refreshing/monitoring

• Often represent contractual terms
  ◦ Indicate acceptance/signature/proof

• Serve as access barrier
• Serve as system of record
• Most likely to be kept/retained

13

How do forms join other document types to become systems of record?

• Workflow origin point
• Contractual/legal document
• Explicit or implicit retention schedule

14

How do forms join other document types to become systems of record?

15

Introduction to AutoClassification

• Computer software that performs automated analysis and disposition of file/document content

• Software contains recognition algorithms for
  ◦ Document Type
  ◦ Content analytics
  ◦ Indexing & Tagging
  ◦ Recommended locations & naming

• “Middleware” that sits between file locations (storage) and file uses (applications), providing an intelligent filter and control system for all content.

• Definitively NOT “rubber-banding”

AutoClassification = Rich Metadata + Rules

How does machine learning automatically derive information from forms? Why do organizations need this?

16

How does AutoClassification work?

• Processing (aka Intake) is ingesting data or processing in place
  ◦ Creating OCR for scanned images
  ◦ Extracting text for native files & email
  ◦ Speech to text for audio/video files
  ◦ Translating content to English
  ◦ Re-ordering or re-aligning pages
  ◦ Applying redactions

• Tagging (aka Coding, Indexing, Sequencing) is the process of extracting key information and attributes about each document
  ◦ Document Type, Important Dates
  ◦ Key Names & Phrases
  ◦ Topics, Keywords & Themes
  ◦ File, Content and DocType attributes
  ◦ Relation to other documents (duplicate, related, attached, contradictory, etc.)

• Disposition (rules) is the process of creating a destination or status for each document
  ◦ Retention status & duration
  ◦ Folder (taxonomy) location
  ◦ Labelling & keywords display
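The three stages above can be sketched as a small pipeline. This is an illustrative toy, not Valora's implementation: the `Document` class, the keyword and regex rules, the `RULES` table, and its folder names and retention periods are all hypothetical, and the Processing step assumes text has already been produced by OCR or text extraction.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    text: str  # assumed to come from OCR or native text extraction
    tags: dict = field(default_factory=dict)
    disposition: dict = field(default_factory=dict)


def process(name: str, raw: str) -> Document:
    """Processing (intake): here only whitespace normalization."""
    return Document(name=name, text=" ".join(raw.split()))


def tag(doc: Document) -> Document:
    """Tagging: derive a document type and a date with toy rules."""
    lowered = doc.text.lower()
    if "application for employment" in lowered:
        doc.tags["DocType"] = "Job Application Form"
    elif "patent" in lowered:
        doc.tags["DocType"] = "Patent Application"
    else:
        doc.tags["DocType"] = "Unknown"
    match = re.search(r"\b\d{1,2}/\d{1,2}/\d{4}\b", doc.text)
    if match:
        doc.tags["Date"] = match.group(0)
    return doc


# Hypothetical rules table: DocType -> (folder, retention in years).
RULES = {
    "Job Application Form": ("HR/Applications", 3),
    "Patent Application": ("Legal/IP", 20),
}


def dispose(doc: Document) -> Document:
    """Disposition: map tags to a folder location and retention period."""
    folder, years = RULES.get(doc.tags["DocType"], ("Unsorted", 1))
    doc.disposition = {"folder": folder, "retention_years": years}
    return doc


def autoclassify(name: str, raw: str) -> Document:
    return dispose(tag(process(name, raw)))


doc = autoclassify(
    "scan_0001.txt",
    "APPLICATION FOR EMPLOYMENT\nPosition:  Waiter\nDate: 04/05/2003",
)
print(doc.tags)
print(doc.disposition)
```

Keeping the rules in a separate table from the tagging logic mirrors the "Rich Metadata + Rules" split: retention or folder policy can change without touching the recognition code.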

17

AutoClassifying a document (Completed Job Application)

[Figure: annotated job application, showing the flow native → text → fielded data → disposition]

• DocType = Job Application Form
• Position = Waiter
• Author = Sanchez, Carlos Antonio
• Date = 05 April 2003
• Date Format = US
• Employment History

18

Author Address = 2385 Hickory Blvd. Hudson, NC 28638

AutoClassifying an attachment (patent application)

DocType = Patent Application

Date = 10/18/2007

Date Format = US

Author = Patent Authors, Author City, Author Country

Assignee = RIM

Tone = Neutral to slightly positive

Embedded Graphic with Title

Other Capturable Data Elements:
• Patent Number
• Filing Date
• Key Phrases & Terms
• Managing PTO
• Implied/Attached Docs
• Bar Code Present
• And many more . . .

Implied status: Responsive, Nonprivileged
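Both examples record a Date Format next to the Date, which matters because a numeric date such as 04/05/2003 reads differently under different conventions. Here is a minimal sketch of using that tag to normalize dates. The `FORMATS` table and the "EU" label are hypothetical, and the raw string 04/05/2003 is an assumed rendering of the job application's 05 April 2003.

```python
from datetime import date, datetime

# Hypothetical mapping from a Date Format tag to a strptime pattern.
FORMATS = {
    "US": "%m/%d/%Y",  # month first
    "EU": "%d/%m/%Y",  # day first
}


def normalize_date(raw: str, date_format: str) -> date:
    """Parse a numeric date string according to its Date Format tag."""
    return datetime.strptime(raw, FORMATS[date_format]).date()


print(normalize_date("10/18/2007", "US"))  # 2007-10-18 (patent example)
print(normalize_date("04/05/2003", "US"))  # 2003-04-05
print(normalize_date("04/05/2003", "EU"))  # 2003-05-04
```

The same digits yield 5 April or 4 May depending on the tag, which is why the format is captured as its own field rather than guessed later.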

19

Identifying & AutoClassifying PII with Analytics

Clear PII: SSN

Implied classification: Active PII, needs protection & redaction

Likely PII (“warning sign”)

Clear PII: Home Phone Number

Not PII: Interest Rate
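The distinctions on this slide (clear PII, likely PII, not PII) can be approximated with pattern matching. The sketch below is a simplified stand-in, not the analytics the slide refers to: the regular expressions cover only common US formats, `WARNING_TERMS` is a hypothetical list, and the sample values are fictitious.

```python
import re

# Simplified patterns for common US formats (assumptions, not exhaustive).
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")

# Hypothetical "warning sign" phrases that suggest PII may be nearby.
WARNING_TERMS = ("social security", "date of birth", "account number")


def classify_pii(text: str) -> dict:
    """Label text as Clear PII, Likely PII, or No PII found."""
    ssns = SSN.findall(text)
    phones = PHONE.findall(text)
    lowered = text.lower()
    warnings = [term for term in WARNING_TERMS if term in lowered]
    if ssns or phones:
        status = "Clear PII"
    elif warnings:
        status = "Likely PII"
    else:
        status = "No PII found"
    return {"status": status, "ssn": ssns, "phone": phones, "warnings": warnings}


sample = "Home Phone: (828) 555-0134  SSN: 123-45-6789  Interest Rate: 4.25%"
print(classify_pii(sample))
print(classify_pii("Please provide your date of birth"))
print(classify_pii("Interest Rate: 4.25%"))
```

A "Clear PII" result would then feed the implied classification on the slide: active PII that needs protection and redaction.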

20

What is Information Governance and how do forms fit in?

Forms are some of the most critical document types in Information Governance. They are the building blocks of content management, compliance and risk.

21

Typical Enterprise Setup

[Diagram: data silos on one side, data needs on the other]

Data Silos
• Misc Fileshares
• Personal/Group Work Product
• Email
• Collaborative Cloud Apps
• Databases, Repositories & Apps
• Paper

Data Needs
• Search & Retrieval
• Retention & Legal Hold
• Data Privacy & Security
• Migration & Archival
• Data Lifecycle Planning & Mgmt

Who Serves as the Traffic Cop In Between?

22

23

Forms are part of this tableau

What’s the future for forms, information and document management?

• Forms and unstructured content are converging as structured information
  ◦ Thus subject to Information Lifecycle Management

• “Forms” will generate on the fly
  ◦ With all information captured automatically
  ◦ Full predictive analytics
  ◦ Automated security, obsolescence and disposition

• Because forms are high profile, they will be subject to greater scrutiny, protections and penalties
  ◦ Opt in vs. opt out and the coming era of consent

24

Thank You!

For More Information:

Valora Technologies, Inc.
101 Great Road, Suite 220
Bedford, MA 01730
781.229.2265

[email protected]

25