Introduction to Content Engineering

Copyright © Stilo International plc 2008 An Introduction to Content Engineering Joe Gollner VP e-Publishing Solutions [email protected]

description

This is an introductory tutorial that presents, in a whirlwind fashion, the core concepts underlying Content Engineering.

Transcript of Introduction to Content Engineering

Page 1: Introduction to Content Engineering

Copyright © Stilo International plc 2008

An Introduction to Content Engineering

Joe Gollner, VP e-Publishing Solutions

Page 2: Introduction to Content Engineering

Introduction to Content Engineering: Topics

What is Content?

Content Engineering & the Content Processing Roadmap

The Business Context of Content Engineering

Aims:
- Establish the nature of, and need for, Content Engineering
- Define a rubric of terminology for the tools and techniques that constitute a practical working framework for discussing, designing, developing and deploying content management and processing systems

Page 3: Introduction to Content Engineering

What is Content?

Page 4: Introduction to Content Engineering

Content is how we Communicate

[Figure labels: Narrative Structures, Implied Associations; sender: Associative Memory, Acquired Perspectives, Imperfect Expression; receiver: Associative Memory, Acquired Perspectives, Imperfect Interpretation]

- Content is the physical form of human communication
- Content is meaningful because it entails context
- Content is typically serialized due to the ways we express, store and interpret information

Page 5: Introduction to Content Engineering

The Document as the Popular Face of Content

The document has proven to be a powerful device for communicating and retaining content

While documents provide effective physical containers for content, they also lead to multiple modes of exchange and potential obsolescence

Page 6: Introduction to Content Engineering

Content is Everywhere

This has been true since the dawn of civilization and its importance grows daily

Content populates an ecosystem where people receive, internalize, modify, create and share that content. Content connects everything.

Page 7: Introduction to Content Engineering

The Truth about Content

We are faced with:
- Massively expanding content volumes
- Diversifying venues for content delivery
- Proliferating format varieties
- Rising expectations of users
- Escalating specialization of content
- Evolving interconnectedness of content
- Multiplying problems related to content security
- Continuing lifecycle challenges (obsolescence remains a risk)
- Increasing complexity of content (the reintegration of data & documents)
- Growing recognition of the central importance of content

Page 8: Introduction to Content Engineering

What Lies Ahead?

What are the biggest challenges you face today in managing and using content?

What do you suspect will be the biggest challenge you will be facing in the next five years?

What are the opportunities emerging to leverage content in your business?

Page 9: Introduction to Content Engineering

An Essential Response: Content Engineering

Working Definition
- The application of rigorous engineering discipline to the design, development and deployment of content management and processing systems

Distinguishing Features
- Systematic approach
- Progressive use of technology
- Awareness of
  - Lifecycle considerations
  - Total cost of ownership
  - Solution scalability

Page 10: Introduction to Content Engineering

Engineering and Content

Organizing work
- Laying out work spaces
- Sequencing of process steps
- Optimizing tasks
- Refining tools
- Improving materials
- Transferring results between stages
- Sharing resources
- Performing maintenance
- Troubleshooting problems

[Image: Differential Analyzer – Vannevar Bush (1930s)]

Page 11: Introduction to Content Engineering

Content Engineering

- Content Engineering: governing discipline, goal-directed
- Content Management: protect value
- Content Processing: enhance value
- People: create value (planning, designing, authoring, editing)

Page 12: Introduction to Content Engineering

Content Management Components

Content Management
- Control: organize resources, access and lifecycle
- Change: facilitate the evolution of content and the associated services
- Deploy: enable the services the content makes possible

Page 13: Introduction to Content Engineering

Content Management and Content Processing

A Close Relationship
- CM cannot exist without content processing services
- Expanding CM services demands more processing
- The sophistication of the processing functions increases more rapidly than management functions
- Many CMS solutions are constrained by weak content processing capabilities

Page 14: Introduction to Content Engineering

Content Processing Components

Content Processing
- Convert
- Transform
- Publish

Key Focus in Content Engineering

Page 15: Introduction to Content Engineering

Content Processing Components

Content Processing
- Convert
- Transform
- Publish

Transformation breaks down into
- Refactor
- Relate
- Collect
- Resolve
- Compile

Emphasis on leveraging efficient automation

Page 16: Introduction to Content Engineering

The Content Processing Roadmap

[Figure: roadmap diagram. Phases: ACQUIRE, ENRICH, DELIVER. Labels: CONTEXT, CONNECTIONS, CONTENT. Content Processing steps: Convert, Refactor, Collect Metadata, Relate Links, Resolve, Compile, Publish. Manage functions: Import, Select.]

Page 17: Introduction to Content Engineering

Convert Content

[Figure: the Content Processing Roadmap (ACQUIRE, ENRICH, DELIVER), here focused on the Convert step.]

Page 18: Introduction to Content Engineering

Converting Content

Conversion: changing the format of legacy content to make it increasingly suitable for efficient management, revision, reuse and publishing.

Page 19: Introduction to Content Engineering

The Harsh Reality of Legacy Content

Legacy Content
- All content resources that require modification in order to be useful

The Legacy Content Spectrum
- Opaque: not directly processable (e.g., paper)
- Annoying: aggressively proprietary, with little or no predictability in usage
- Polluted: normally processable but frequently filled with deviations & additions (HTML)
- Tolerable: documented format that exposes format & structure in a processable form

Page 20: Introduction to Content Engineering

Conversion Fundamentals

Conversion is unavoidable and always under-estimated

Conversion is fundamentally a matter of interpretation
- Parsing the legacy format & layout
- Inferring a meaning from this information
- Correlating the format & layout to a target structure
- Addressing problems introduced by format peculiarities
- Leveraging the content itself to guide format interpretation
- Enhancing interpretive rules by matching content patterns

Automating conversion typically relies on two stages:
- Format Interpreter that can make sense of source formatting
- Rules-based Correlation Processor that maps content into structures
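
The two stages named on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a description of any particular conversion tool: the legacy layout (capitalized lines as headings, lines starting with "* " as list items), the function names `interpret_format` and `correlate`, and the target element names are all invented for the example.

```python
from xml.etree import ElementTree as ET

# Stage 1: a format interpreter that makes sense of source formatting.
# Assumed legacy conventions (hypothetical): ALL-CAPS lines are headings,
# lines starting with "* " are list items, anything else is body text.
def interpret_format(text):
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("* "):
            tokens.append(("bullet", line[2:]))
        elif line.isupper():
            tokens.append(("heading", line.title()))
        else:
            tokens.append(("body", line))
    return tokens

# Stage 2: a rules-based correlation processor that maps the interpreted
# formatting onto target structures. Element names are illustrative only.
RULES = {"heading": "title", "body": "p", "bullet": "li"}

def correlate(tokens):
    root = ET.Element("topic")
    current_list = None
    for kind, value in tokens:
        if kind == "bullet":
            # Consecutive bullets share one <ul> wrapper.
            if current_list is None:
                current_list = ET.SubElement(root, "ul")
            ET.SubElement(current_list, RULES[kind]).text = value
        else:
            current_list = None
            ET.SubElement(root, RULES[kind]).text = value
    return root

legacy = """INSTALLATION
Follow these steps.
* Unpack the unit
* Connect power
"""
xml_out = ET.tostring(correlate(interpret_format(legacy)), encoding="unicode")
print(xml_out)
```

Keeping the rules in a separate table from the interpreter is one way to let subject matter experts adjust the mapping without touching the parsing code.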

Page 21: Introduction to Content Engineering

Conversion Process Template

[Figure: conversion process workflow. Inputs: Legacy Source Content, Existing Conversion Rules, Target XML Schema. Three passes: (1) Example Set, (2) Sample Set 10%, (3) Complete Set 100%. Activities: Source Analysis, Source to Target Mapping, Modify Conversion Process, Execute Conversion Process, Result Analysis, Validation & Verification, Application Tests. Other labels: Subject Matter Experts, Interaction, Identified Issues, Modified Conversion Rules, Manual Editing Guidance, Complete.]

Page 22: Introduction to Content Engineering

Refactor Content

[Figure: the Content Processing Roadmap (ACQUIRE, ENRICH, DELIVER), here focused on the Refactor step.]

Page 23: Introduction to Content Engineering

Refactoring Content

Refactoring: restructuring content, without loss of meaning, to improve its suitability for management, maintenance and specifically reuse.

Page 24: Introduction to Content Engineering

Aspects of Refactoring

Refactoring breaks down into two tasks
- Bursting
- Normalization

Content Bursting
- Decomposing content into components optimized for reuse

Content Normalization
- Systematic removal of redundancies to improve maintainability

Challenges
- Ensuring content components remain meaningful & manageable
- Maintaining a complete equivalence with the original
- Adapting the linking mechanisms so they remain valid and functional
  - Usually entails introduction of an indirect referencing scheme
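
Bursting, normalization and indirect referencing can be illustrated with a small sketch. It assumes, purely for the example, that a component is a blank-line-separated paragraph and that a short content hash serves as the component identifier. Neither choice comes from the slides, and the function names are hypothetical.

```python
import hashlib

def burst(document):
    """Bursting: split a document into paragraph-level components."""
    return [block.strip() for block in document.split("\n\n") if block.strip()]

def normalize(documents):
    """Normalization: store each distinct component once and replace
    document bodies with ordered lists of component identifiers."""
    store = {}      # component id -> component text
    outlines = {}   # document name -> ordered list of component ids
    for name, text in documents.items():
        refs = []
        for component in burst(text):
            # Assumed identifier scheme: first 8 hex digits of a SHA-1 hash.
            cid = hashlib.sha1(component.encode("utf-8")).hexdigest()[:8]
            store.setdefault(cid, component)
            refs.append(cid)
        outlines[name] = refs
    return store, outlines

def reassemble(store, refs):
    """Follow the indirect references to rebuild the original text."""
    return "\n\n".join(store[cid] for cid in refs)

docs = {
    "manual_a": "Warning: disconnect power.\n\nRemove the cover.",
    "manual_b": "Warning: disconnect power.\n\nReplace the filter.",
}
store, outlines = normalize(docs)
print(len(store), "components stored for", len(docs), "documents")
```

The `reassemble` function is what lets the equivalence challenge be tested: a refactored document should resolve back to the same text it started as.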

Page 25: Introduction to Content Engineering

Refactoring Strategies

- Strategy needed to ensure adequate returns on investment
- Refactor content that undergoes the highest rates of change first

[Figure labels: Conversion, Compare, Outputs]

Page 26: Introduction to Content Engineering

Collect Metadata

[Figure: the Content Processing Roadmap (ACQUIRE, ENRICH, DELIVER), here focused on the Collect Metadata step.]

Page 27: Introduction to Content Engineering

Collecting Metadata

Metadata: a set of data that provides information about other data.

Collecting Metadata: extracting, validating, integrating, supplementing, synchronizing and storing metadata from, and about, the content.

Page 28: Introduction to Content Engineering

The Function of Metadata

Metadata is used to make the context of content explicit

Used to facilitate
- Control
  - Security
  - Limitation of rights
  - Orderly storage & retrieval
- Discovery
  - Searching
  - Navigating
- Exchange

Surprisingly important point
- The boundary between metadata and content is never completely clear

[Image: Yale University Library]

Page 29: Introduction to Content Engineering

The Storage of Metadata

Useful Design Pattern: Detachable Metadata
- Key metadata clustered into a document sub-component
- Shareable amongst many uses
- Incorporated into document when important to do so & only then
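
One possible reading of the detachable metadata pattern is sketched below. The record layout, the `metadata_ref` field and the `attach_metadata` helper are assumptions made for illustration. The point is only that metadata lives in one shared record and is copied into a document on demand.

```python
import copy

# Metadata kept apart from the content it describes (hypothetical records).
metadata_store = {
    "meta-001": {"title": "Pump Maintenance", "security": "internal", "language": "en"},
}

# Two documents share the same metadata record by reference.
documents = {
    "doc-17": {"metadata_ref": "meta-001", "body": "Inspect the seals every 500 hours."},
    "doc-18": {"metadata_ref": "meta-001", "body": "Replace worn seals."},
}

def attach_metadata(doc_id, fields=None):
    """Return a copy of the document with metadata incorporated.

    `fields` limits which metadata values are embedded; None embeds all.
    The stored document and the shared metadata record are left unchanged.
    """
    doc = copy.deepcopy(documents[doc_id])
    record = metadata_store[doc.pop("metadata_ref")]
    doc["metadata"] = {k: v for k, v in record.items() if fields is None or k in fields}
    return doc

# Embed only what an external exchange needs, and only at export time.
exported = attach_metadata("doc-17", fields={"title", "language"})
print(exported)
```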

Page 30: Introduction to Content Engineering

Ontologies, Taxonomies & Metadata

The Meaning of Metadata
- Metadata categories and values relate content to aspects of an Ontology
- The Ontology provides the context for metadata

Ontologies
- Describe a domain of knowledge
- Can be used as the basis of:
  - Taxonomies (classification schemes)
  - Link networks
  - Context driven navigational aids

[Figure labels: Ontology, Taxonomy, Link Network, Topic, metadata]

Page 31: Introduction to Content Engineering

Establish Relationships

[Figure: the Content Processing Roadmap (ACQUIRE, ENRICH, DELIVER), here focused on the Relate Links step.]

Page 32: Introduction to Content Engineering

Establishing Relationships

Links: the connections or relationships between things that represent a significant portion of the meaning and value of content

Explicit Links (Actual)
Identifier    Source    Target    Type
A1
A2

Implicit Links (Potential)
Identifier    Source    Target    Type
B1
B2

Reuse Links (Physical)
Identifier    Resource    Request    Condition
R1
R2

Page 33: Introduction to Content Engineering

Link Management

- Increasingly important
- Increasingly complex
- Link Analysis
  - Significant processing
  - Leverages external storage of links & link metadata
- Link generation becoming critical

[Figure labels: Link Base, metadata, Outbound Link, Transclusion Link, Inbound Link, Bidirectional External Link]

Link Analysis:
- Outbound Links: Intact or broken
- Transclusions: Where used
- Inbound Links: Track-back / Where cited
- External Links: Network participation
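
A link base stored outside the content makes these analyses simple queries. The sketch below assumes a flat list of (identifier, source, target, type) records, which echoes the column headings on the previous slide. The sample data and the `analyze_links` function are hypothetical.

```python
from collections import defaultdict

# A link base held outside the content: (identifier, source, target, type).
# Identifiers and component names are invented for the example.
LINK_BASE = [
    ("A1", "intro", "setup", "reference"),
    ("A2", "setup", "safety", "transclusion"),
    ("A3", "repair", "safety", "transclusion"),
    ("A4", "repair", "parts-list", "reference"),
]
COMPONENTS = {"intro", "setup", "safety", "repair"}  # "parts-list" is missing

def analyze_links(link_base, components):
    broken = []                     # outbound links whose target does not exist
    where_used = defaultdict(list)  # transcluded target -> sources that reuse it
    inbound = defaultdict(list)     # cited target -> sources that cite it
    for link_id, source, target, link_type in link_base:
        if target not in components:
            broken.append(link_id)
            continue
        if link_type == "transclusion":
            where_used[target].append(source)
        else:
            inbound[target].append(source)
    return broken, dict(where_used), dict(inbound)

broken, where_used, inbound = analyze_links(LINK_BASE, COMPONENTS)
print("broken:", broken)
print("where used:", where_used)
print("where cited:", inbound)
```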

Page 34: Introduction to Content Engineering

Deliver Content

[Figure: the Content Processing Roadmap (ACQUIRE, ENRICH, DELIVER), here focused on the Resolve, Compile and Publish steps.]

Page 35: Introduction to Content Engineering

Delivering Content

- Resolve: assemble content and instantiate applicable relationships
- Compile: convert resolved content into a form suitable for rendition
- Publish: render the content in the forms required by the context
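
The three delivery steps can be pictured as a small pipeline. The sketch below is an assumption-laden illustration rather than a reference implementation: the component store, the `{link:...}` placeholder syntax, the block representation and the two output formats are all invented for the example.

```python
import html

# Hypothetical component store and output plan for one deliverable.
COMPONENTS = {
    "c1": {"title": "Safety", "text": "Disconnect power before servicing."},
    "c2": {"title": "Filter", "text": "Replace the filter; see {link:c1}."},
}
OUTPUT_PLAN = ["c1", "c2"]

def resolve(plan, components):
    """Assemble the requested components and instantiate their links."""
    resolved = []
    for cid in plan:
        item = dict(components[cid])
        for target, other in components.items():
            item["text"] = item["text"].replace("{link:%s}" % target, other["title"])
        resolved.append(item)
    return resolved

def compile_content(resolved):
    """Convert resolved content into a neutral, formattable representation."""
    blocks = []
    for item in resolved:
        blocks.append(("heading", item["title"]))
        blocks.append(("para", item["text"]))
    return blocks

def publish(blocks, fmt):
    """Render the compiled blocks in the form the context requires."""
    if fmt == "html":
        tags = {"heading": "h2", "para": "p"}
        return "".join("<%s>%s</%s>" % (tags[k], html.escape(v), tags[k]) for k, v in blocks)
    if fmt == "text":
        return "\n".join(v.upper() if k == "heading" else v for k, v in blocks)
    raise ValueError("unsupported format: %s" % fmt)

blocks = compile_content(resolve(OUTPUT_PLAN, COMPONENTS))
web = publish(blocks, "html")
plain = publish(blocks, "text")
print(web)
print(plain)
```

Because the compiled blocks are format-neutral, the same resolved content feeds both renditions without being reassembled.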

Page 36: Introduction to Content Engineering

The Goal: High Fidelity Automation

Delivery Processing
- Assembling the inputs
  - Content requested
  - Supporting assets
  - Applicable stylesheets & rules
- Resolve into a processable whole
- Compile formattable content representations
- Publish final formatted renditions

[Figure: delivery workflow. Labels: Content; Deliver (Resolve, Compile, Publish); Output Plan (Map & View); Templates; Assets; Rules; Output Variants; Transformations; Render; Print Publishing (PDF); Web Publishing (Portal / Portable); Output Web Products (XHTML); Output Print Products (PDF).]

Page 37: Introduction to Content Engineering

Content Processing & Validation

Validation
- Essential capability
- Enables consistent processing
- Streamlines processes

Validation must be
- Accurate
- Manageable
- Informative
- Actionable
- Pro-active
- Continuously improving

Page 38: Introduction to Content Engineering

Validate & Transform: Simple

Content Validation
- DTD structural rules
- Instance conformance

Content Transformation
- Traditionally focused on arranging content for formatting
- Supporting primarily structural manipulation

Validated Outputs
- Inputs to rendition processes
- HTML outputs
- XML outputs

Page 39: Introduction to Content Engineering

Validate & Transform: Complex

Content Validation & Verification
- Schema structural rules
- Rules governing content values
- Instance conformance

Content Transformation
- Continuous process of improvement
- Parse, validate, align, verify…repeat
- Manipulation of many content types

Validated Outputs
- Inputs to rendition processes
- HTML outputs
- XML outputs
- Data outputs for applications

[Figure labels: Content Instance, Schema, Rules, Structure Validation, Content Verification, Transformation Processing, Outputs]
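
The distinction between structure validation and content verification can be shown with a toy example. A real solution would validate against a DTD or schema. Here a small dictionary stands in for the structural rules, and the element names, the `T-0000` identifier pattern and both function names are assumptions made for illustration.

```python
import re
from xml.etree import ElementTree as ET

# Structural rules: which child elements each element may contain.
# This dictionary is a stand-in for a DTD or schema.
STRUCTURE = {"task": ["title", "step"], "title": [], "step": []}

def validate_structure(elem):
    """Check that every element is known and appears in an allowed parent."""
    allowed = STRUCTURE.get(elem.tag)
    if allowed is None:
        return ["unknown element <%s>" % elem.tag]
    issues = []
    for child in elem:
        if child.tag not in allowed:
            issues.append("<%s> not allowed in <%s>" % (child.tag, elem.tag))
        else:
            issues.extend(validate_structure(child))
    return issues

def verify_content(root):
    """Check rules about values that structural validation cannot express."""
    issues = []
    if not re.fullmatch(r"T-\d{4}", root.get("id", "")):
        issues.append("task id must look like T-0000")
    for step in root.iter("step"):
        if not (step.text or "").strip():
            issues.append("empty step")
    return issues

bad = ET.fromstring('<task id="T-12"><title>Clean</title><step/><note>x</note></task>')
good = ET.fromstring('<task id="T-0012"><title>Clean</title><step>Wipe the lens.</step></task>')

print(validate_structure(bad), verify_content(bad))
print(validate_structure(good), verify_content(good))
```

Returning lists of messages rather than a pass/fail flag reflects the earlier point that validation should be informative and actionable.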

Page 40: Introduction to Content Engineering

Complexity and the Cost of Quality

Complexity is inherent in the nature of content

Increasing content complexity increases the amount and sophistication of content processing tasks

Increases in content processing tasks result in a significant increase in the total cost of quality

Page 41: Introduction to Content Engineering

Solution Architectures
- Assembles components to provide integrated services
- Technology selection & integration
- Standards selection & integration
- Multiple solution instances will exist

[Figure labels: Solution Architectures; Content Engineering; Content Management; Content Processing (Convert, Transform, Publish); Refactor, Relate, Collect, Resolve, Compile; Validate]

Page 42: Introduction to Content Engineering

Managing Solution Risk

Integration risk represents
- The potential loss of services
- The potential loss of assets

Integration risk increases with the number of technologies used to build a solution

System complexity
- Can be managed
- Ultimately limits solution affordability and even viability
- Addressed in design selections

Page 43: Introduction to Content Engineering

Technology Selection

Key Considerations
- Solution context
- Scored against requirements
- Scoring scale
  - 0 – No Fit
  - 6 – Total Fit
- Results weighed against acquisition cost
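
A scoring exercise of this kind might be tabulated as in the sketch below. Only the 0 to 6 scale and the idea of weighing results against acquisition cost come from the slide. The requirement weights, candidate names, scores, costs and the "fit per 10,000 units of cost" measure are all invented assumptions.

```python
# Requirement weights and candidate scores are invented for illustration.
# Scores use the 0 (no fit) to 6 (total fit) scale.
WEIGHTS = {"conversion": 3, "validation": 2, "publishing": 1}

CANDIDATES = {
    "Tool A": {"scores": {"conversion": 6, "validation": 4, "publishing": 2}, "cost": 50000},
    "Tool B": {"scores": {"conversion": 3, "validation": 5, "publishing": 6}, "cost": 20000},
}

MAX_SCORE = 6

def fit(scores, weights):
    """Weighted fit as a fraction of the best possible score (0.0 to 1.0)."""
    for value in scores.values():
        if not 0 <= value <= MAX_SCORE:
            raise ValueError("scores must be between 0 and 6")
    total = sum(weights[req] * scores[req] for req in weights)
    return total / (MAX_SCORE * sum(weights.values()))

def rank(candidates, weights):
    """Order candidates by fit per 10,000 units of acquisition cost."""
    rows = []
    for name, data in candidates.items():
        f = fit(data["scores"], weights)
        rows.append((name, round(f, 3), round(f / (data["cost"] / 10000), 3)))
    return sorted(rows, key=lambda row: row[2], reverse=True)

ranking = rank(CANDIDATES, WEIGHTS)
for name, fit_value, fit_per_cost in ranking:
    print(name, fit_value, fit_per_cost)
```

In this made-up data the tool with the better raw fit ranks second once cost is considered, which is the kind of trade-off the weighing step is meant to expose.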

Page 44: Introduction to Content Engineering

Technology Lifecycle Considerations

Solution context includes
- Urgency
- Complexity
- Criticality
- Constraints

Projected lifecycle
- Expected lifespan
- Rate of change
- Influencing factors

[Figure: Measuring Overall Productivity over Time. Axis labels: Time, Complexity. Scale labels: Low, High.]

Page 45: Introduction to Content Engineering

Solution Component Dependencies

Because all components within a solution evolve, their inter-dependencies require explicit description and management.

[Figure: dependency diagram. Labels: Media Sources, Data Sources, Import Sources, Content Files, Configuration Files, Schemas, Structure Maps, Process Rules, Processing Scripts, Style Sheets, Document Templates, Relationships, Log Reports, Quality Reports, Analysis Reports.]
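
One way to make inter-dependencies explicit is to record them as data and query them when something changes. In the sketch below the component names echo the diagram labels, but the specific dependencies between them are assumed for illustration, as is the `impacted_by` helper.

```python
# Each component lists the components it depends on. Names follow the
# diagram labels; the dependencies themselves are assumed for illustration.
DEPENDS_ON = {
    "schemas": [],
    "process_rules": [],
    "structure_maps": ["schemas"],
    "processing_scripts": ["schemas", "process_rules"],
    "style_sheets": ["schemas"],
    "document_templates": ["style_sheets"],
    "quality_reports": ["processing_scripts"],
}

def impacted_by(changed, depends_on):
    """Return every component that directly or indirectly depends on `changed`."""
    impacted = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for component, needs in depends_on.items():
            if current in needs and component not in impacted:
                impacted.add(component)
                frontier.append(component)
    return sorted(impacted)

print(impacted_by("schemas", DEPENDS_ON))
print(impacted_by("process_rules", DEPENDS_ON))
```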

Page 46: Introduction to Content Engineering

Evaluating Standards as Potential Tools

- Independence: from parochial interests, proprietary claims, external influences
- Formality: of creation, validation, approval & modification process
- Stability: of standard over time & the backward compatibility of changes
- Completeness: sufficiency for declared scope as well as availability of useful documentation & reference implementations
- Adoption: extent of support amongst tool vendors, authorities & users
- Practicality: the extent to which all, or parts, of the standard can be deployed

Page 47: Introduction to Content Engineering

Evaluating a Specialized Industry Standard

Scenario
- Industry specification
- Broad scope
- Specialized stakeholder community
- Continuously changing & expanding

Strategy
- Implement where necessary
- Address risk areas

Page 48: Introduction to Content Engineering

Evaluating a Cross-Industry Standard

Scenario
- Addressing widespread issues
- Broad stakeholder community
- Mature
- Further capabilities emerging

Strategy
- Plan for adoption
- Consider for use in variety of areas

Page 49: Introduction to Content Engineering

Content Solution Architecture Framework

[Figure: framework diagram. Labels: Content Architecture; Enterprise; Programs; Domains; Inputs; Outputs; Document Sources; Ontology Sources; Data Sources; Active; External; Legacy; Users; Authors; Subject Matter Experts; Administrators; Information Architects; Developers; Tools; Content Management; Content Processing; Content Authoring; Development Tools; Web Services; Resources; Budget; Personnel; Infrastructure; Mechanisms; Controls; Specialized Models; Rules; Publishing Services; Discovery Services; Data Services; Web; Print; Application; Integrate.]

Page 50: Introduction to Content Engineering

Content Architecture
- Establishes governing model of the knowledge domain
  - The knowledge that has informed the content
  - The knowledge being encapsulated in the solutions
- Supports multiple solution instances

[Figure labels: Content Architecture; Solution Architectures; Content Engineering; Content Management; Content Processing (Convert, Transform, Publish); Refactor, Relate, Collect, Resolve, Compile; Validate]

Page 51: Introduction to Content Engineering

The Central Role of the Content Architecture

[Figure: Content Architecture at the center of the diagram. Labels: Specialized Information Types; Specialized Domains; Specialized Taxonomies; Specialized Delivery Processes; Service Requirements; Discovery Requirements; Concept; Reference; Task; Topic; Procedure; Description; Data; Effectivity; Formatting; Annotation; Change.]

Page 52: Introduction to Content Engineering

Content Solution Design Principles

The nature of content demands an adaptable architecture

- Technology components should be loosely-coupled
  - Content must always be available in its simplest self-describing form
- Data stores should be replaceable by stored instances
  - True for content, metadata and links
- Content processing events can be performed many ways
  - Simple methods must be present, sophisticated methods may be
- All interfaces established as the exchange of validated content
  - Processing rules are, themselves, managed & processable content
- Content Processing should be extensively leveraged
  - Content validation, analysis and reporting at every stage
  - Used to manage & optimize solution components to improve efficiency

Page 53: Introduction to Content Engineering

Content Engineering Maturity Model

Modeled on the Software Engineering Institute's (SEI) Capability Maturity Model Integration (CMMI)
- “managed” used instead of “quantitatively managed” for level 4
- “repeated” used instead of “managed” for level 2
- “reactive” used instead of “performed” for level 1

Objective
- Follow software engineering in emphasizing the importance of formalization & quantitative methods for continuous improvement

Content Engineering Maturity Model
Level    Name
5        Optimized
4        Managed
3        Defined
2        Repeated
1        Reactive
0        Incomplete
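
For teams that want to track maturity in tooling or reports, the six levels can be captured as an ordered enumeration. This is a minimal sketch. The class name `CEMaturity` and the helper `next_level` are hypothetical, and the mapping records only the three CMMI names the slide says were replaced.

```python
from enum import IntEnum

class CEMaturity(IntEnum):
    """Content Engineering maturity levels, ordered from 0 to 5."""
    INCOMPLETE = 0
    REACTIVE = 1
    REPEATED = 2
    DEFINED = 3
    MANAGED = 4
    OPTIMIZED = 5

# CMMI level names that this model replaces, as stated on the slide.
CMMI_EQUIVALENT = {
    CEMaturity.REACTIVE: "performed",
    CEMaturity.REPEATED: "managed",
    CEMaturity.MANAGED: "quantitatively managed",
}

def next_level(level):
    """Return the next maturity level to aim for, or None at the top."""
    return CEMaturity(level + 1) if level < CEMaturity.OPTIMIZED else None

current = CEMaturity.REPEATED
print(current.name, "->", next_level(current).name)
```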

Page 54: Introduction to Content Engineering

CE Maturity Model: Level 0 Incomplete

Incomplete
- Often the complete absence of a documented process
- A process that is documented but not followed also qualifies

Features
- New requirements addressed using available tools
- Each solution seeks cost minimization
- No persistent infrastructure
- No improvement between projects

Page 55: Introduction to Content Engineering

CE Maturity Model: Level 1 Reactive

Reactive
- A process exists for specific goals
- Sufficient for the needs of selected products
- Not institutionalized and not integrated with institutional processes

Features
- Not designed to handle new or changing requirements
- Can result in multiple solutions each created as a reaction


Page 56: Introduction to Content Engineering

CE Maturity Model: Level 2 Repeated

Repeated
- A managed process exists and is supported by basic infrastructure
- Predictability can be achieved in process performance & products
- Reviews are conducted to identify & initiate improvements

Features
- A common set of tools has been selected
- Procedures exist for steps
- Solution components documented


Page 57: Introduction to Content Engineering

CE Maturity Model: Level 3 Defined

Defined
- Standardization in processes established on an institutional level
- Common tools & techniques used across processes & projects

Features
- A single infrastructure used to support multiple processes & projects
- Processes defined with reference to enterprise models
- Interrelationships are known


Page 58: Introduction to Content Engineering

CE Maturity Model: Level 4 Managed

Managed
- Processes are managed using quantitative measurement
- Automation is maximized in the execution of process steps
- A single integrated & managed environment supports all processes

Features
- Infrastructure components managed as content with automation used to adapt behaviour
- High levels of quality sustained


Page 59: Introduction to Content Engineering

CE Maturity Model: Level 5 Optimized

Optimized
- Continuous orientation towards improvement
- Continuous refactoring of solution and content to achieve efficiencies
- Continuous identification & implementation of heightened standards

Features
- Systematic analysis & correction of variations
- Proactive identification of new products & services that can be offered
- Industry innovation


Page 60: Introduction to Content Engineering

General Observations

- Content is inherently complex
- Current trends have moved content to the center of attention
- Content Engineering is an essential response
  - Provides the necessary discipline & the conceptual framework
  - Content has not typically received this level of attention in the past
- Effective Content Processing is central to success
  - Content Management services are enabled by content processes
  - Adaptive content processing is essential for addressing change
- Effective Content Solutions are designed to cover the complete content lifecycle and all stakeholder perspectives
- The efficient management and processing of content remains an elusive goal for most organizations

Page 61: Introduction to Content Engineering

Content Engineering and Business Value

The design of Content Solutions should
- Continuously minimize the costs of acquiring, enriching, managing and delivering content
- Continuously improve content resources through enrichment
- Continuously increase the benefits realized through the delivery of content
- Continuously reduce risks threatening content assets or the services being supported

Each of these represents an increase in value

Page 62: Introduction to Content Engineering

Top Ten Secrets of Content Solution Success

1. Don’t underestimate your content or your business
2. Don’t underestimate the power of good automation
3. Choose an appropriate tool set and validate your choices
4. Don’t invest in content management technology too early
5. Carefully plan and execute migration activities
6. Take a “customer service” focus in delivering tangible benefits (new products / services) from your investments
7. Be demanding of your suppliers (expect quality)
8. Engage your stakeholders and “take control” of the solution
9. Leverage standards, don’t be enslaved by them
10. Be an active part of the community as a way to learn and as a way to share what you have learned

Page 63: Introduction to Content Engineering

The End

Admittedly an awful lot to cover in a single go. Hopefully some of the ideas connect with some of your experiences and perhaps help in framing aspects of your next project.

Joe Gollner
VP e-Publishing Solutions
Stilo International