Extreme Content Makeover: Migrating Content to DITA

Joe Gollner, Vice President, e-Publishing Solutions, Stilo International ([email protected])
Copyright © Stilo International 2008

Description

Presented by Joe Gollner at Documentation and Training West, May 6-9, 2008, in Vancouver, BC.

While most organizations would not want to admit it, their content currently exists in a state of disarray. In all too many cases, legacy content is in such a state that it could become the subject of a prime-time television show, provided someone could be found who would agree to mount such a large-scale renovation job. However daunting the prospect might be, market movement in almost every industry sector means that these renovations are not only unavoidable but often urgently needed.

Transcript of Extreme Content Makeover: Migrating Content to DITA

Page 1: Extreme Content Makeover: Migrating Content to DITA

Extreme Content Makeover: Migrating Content to DITA
Joe Gollner
Vice President, e-Publishing Solutions
Stilo International
[email protected]
Copyright © Stilo International 2008

Page 2

The Essence of Content Conversion

Got this! Want that!

Page 3

Legacy Content Edition

Page 4

Topics
The Growing Demand for High-Quality Content

Challenges with Converting Content

Solution Patterns for Converting Content

Conversion, Refactoring, Metadata, Linking, Validation

Conclusions: Key Lessons Learned

Page 5

An Inconvenient Truth – About Content

Page 6

Case Study: Drug Look-up Tool

Migrating drug information into a precise digital form represented a key challenge.

Sources: Miles33, Quark & vendor monographs

Page 7

Enterprise Content Frameworks

[Diagram: an enterprise content framework linking inputs (document, data, ontology, external and legacy sources) to outputs (publishing, discovery and data services; web; print; application integration) through content authoring, management and processing, supported by users (authors, subject matter experts, administrators, information architects, developers), tools and resources (budget, personnel, infrastructure).]

Service Oriented Content Architectures lead to high demands being placed on content resources and the affordability of the overall process.

Page 8

Observations on Content Expectations
Within this larger context, what is expected of content?

1. Content will be available as valid XML
2. Content will be modularized
3. Content will be discretely addressable
4. Content will be uniquely identifiable using metadata
5. Content will be linked to related content
6. Content will be processable with almost perfect confidence

How much legacy content is ready to play this role? (How much XML content is even ready for this?)

Page 9

The Harsh Reality of Legacy Content

Legacy content: all content resources that require modification in order to be useful

The Legacy Content Spectrum:
Opaque: not directly processable (e.g., paper)
Annoying: aggressively proprietary, with little or no predictability in usage
Polluted: normally processable but frequently filled with deviations & additions (e.g., HTML)
Tolerable: a documented format that exposes format & structure in a processable form

Page 10

Content Processing Roadmap

[Diagram: content processing across three phases (Acquire, Enrich, Deliver) and three layers: Context (metadata), Content, and Connections (links). Each layer is imported and selected; content is also managed and published. The content processing steps are Convert, Refactor, Collect, Relate, Compile and Resolve.]

Page 11

Convert Content

[Content Processing Roadmap diagram, with the Convert step highlighted]

Page 12

Converting Content

Conversion: changing the format of legacy content to make it increasingly suitable for efficient management, revision, reuse and publishing.

Page 13

Conversion Fundamentals
Conversion is unavoidable and always underestimated

Conversion is fundamentally a matter of interpretation:
Parsing the legacy format & layout
Inferring a meaning from this information
Correlating the format & layout to a target structure
Addressing problems introduced by format peculiarities
Leveraging the content itself to guide format interpretation
Enhancing interpretive rules by matching content patterns

Automating conversion typically relies on two stages:
A Format Interpreter that can make sense of source formatting
A Rules-based Correlation Processor that maps content into structures
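The two stages above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's tooling:

- The legacy input is assumed to arrive as hypothetical `(style, text)` pairs.
- `interpret_format` produces an interim XML that only records formatting.
- `correlate` applies a rules table (`RULES`, also hypothetical) to build a simplified DITA-like task.

The output is not validated against the DITA DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy input: (style name, text) pairs, as a word-processor
# export might expose them. A real project needs a real parser per format.
LEGACY = [
    ("Title", "Replacing the Filter"),
    ("Body", "Turn off the pump before starting."),
    ("Step", "Open the housing."),
    ("Step", "Remove the old filter."),
]

# Correlation rules kept as data: source style -> target element.
RULES = {"Title": "title", "Body": "p", "Step": "step"}


def interpret_format(records):
    """Stage 1, format interpreter: record what the source says about
    formatting (here only the style name) without deciding what it means."""
    interim = ET.Element("interim")
    for style, text in records:
        ET.SubElement(interim, "para", {"style": style}).text = text
    return interim


def correlate(interim, rules, topic_id="t1"):
    """Stage 2, rules-based correlation processor: map interim paragraphs
    into a simplified DITA-like task structure."""
    task = ET.Element("task", {"id": topic_id})
    title = ET.SubElement(task, "title")
    body = ET.SubElement(task, "taskbody")
    context = ET.SubElement(body, "context")
    steps = ET.SubElement(body, "steps")
    for para in interim:
        style = para.get("style")
        target = rules.get(style)
        if target is None:
            # Unmapped formatting is reported, never silently dropped.
            raise ValueError(f"no rule for style {style!r}")
        if target == "title":
            title.text = para.text
        elif target == "p":
            ET.SubElement(context, "p").text = para.text
        elif target == "step":
            step = ET.SubElement(steps, "step")
            ET.SubElement(step, "cmd").text = para.text
        else:
            raise ValueError(f"unsupported target element {target!r}")
    # Drop containers that received no content.
    for container in (context, steps):
        if len(container) == 0:
            body.remove(container)
    return task


task = correlate(interpret_format(LEGACY), RULES)
print(ET.tostring(task, encoding="unicode"))
```

Keeping the rules in a table rather than in code is one way to revise the correlation step without reprogramming it. An unmapped style raises an error, so it surfaces as an identified issue.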

Page 14

Conversion Process Template

[Diagram: an iterative conversion process. Source analysis and source-to-target mapping, guided by subject matter experts and the target XML schema, define the conversion rules. The conversion process is executed and its results analyzed. Identified issues lead to modified conversion rules or manual editing, followed by validation & verification and application tests. The process runs in three passes: (1) Example Set, (2) Sample Set (10%), (3) Complete Set (100%).]
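The three passes in the template (example set, 10% sample set, complete set) imply some way of choosing the sample. One simple option is a seeded random sample, so that every revision of the rules is checked against the same files. It is sketched here with a hypothetical function `sample_set`; the seed value is arbitrary. Drawing the sample per control collection would be a reasonable refinement.

```python
import random


def sample_set(files, percent=10, seed=2008):
    """Pick a reproducible sample of legacy files for the second pass.

    The fixed seed means the same files are re-converted after each rule
    change, so results stay comparable between iterations."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    if not files:
        return []
    k = (len(files) * percent + 99) // 100   # round up, so at least one file
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(files), k))


# Hypothetical file names for a 200-file legacy collection.
files = [f"manual/ch{n:03d}.doc" for n in range(1, 201)]
print(len(sample_set(files)))        # 20 (the 10% sample set)
print(len(sample_set(files, 100)))   # 200 (the complete set)
```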

Page 15

Show Me!

Page 16

Conversion Process Initiation
Content Analysis:

Document all features of source content and format

Establish Control Collections:
Collections can be used to group files with similar features
Rules can be tailored to address these features
Collections provide useful management units for tracking & reporting
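Control collections can be formed mechanically once content analysis has recorded which source features each file uses. The sketch below assumes a hypothetical `FEATURES` inventory that maps each file path to its set of style names. It groups files whose feature sets match exactly; real projects may prefer looser similarity measures.

```python
from collections import defaultdict

# Hypothetical content-analysis results: the source styles found in each file.
FEATURES = {
    "ops/ch01.doc": {"Title", "Body", "Step"},
    "ops/ch02.doc": {"Title", "Body", "Step"},
    "ref/parts.doc": {"Title", "TableRow"},
    "ref/specs.doc": {"Title", "TableRow"},
    "misc/notes.doc": {"Body", "Sidebar"},
}


def control_collections(features):
    """Group files whose feature signatures (sets of styles) match exactly."""
    groups = defaultdict(list)
    for path, styles in sorted(features.items()):
        groups[frozenset(styles)].append(path)
    return groups


# One line per collection: its features, file count and members, for reporting.
for signature, paths in control_collections(FEATURES).items():
    print(sorted(signature), len(paths), paths)
```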

Clearly Define the Target End State:
Should be well-suited to validation & verification activities
Conversion should be separate from refactoring, which can follow it
Ensure that application testing is performed for verification

Structural validation is not sufficient: the converted content must support its intended uses

Page 17

Conversion Process Planning
Prepare a Conversion Specification:
Document analysis results & content mapping rules
Incorporate naming conventions to be applied (instances, media resources, identifiers, cross-references…)

Establish a representative Example Set early in the process:
A limited set of files that exhibit the main features of the source content
Matched with converted content that illustrates the intended result
Used to iteratively refine rules & troubleshoot problems
Forms part of the Conversion Specification
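The Example Set is most useful when it is checked automatically after every rule change. This sketch compares the converted output for each example file with its hand-checked expected result. It uses the standard library's `xml.etree.ElementTree.canonicalize` (Python 3.8+), so attribute order and indentation do not register as differences. The file names and XML snippets are hypothetical.

```python
import xml.etree.ElementTree as ET


def same_xml(a, b):
    """Compare two XML strings after C14N 2.0 canonicalization; strip_text
    ignores leading/trailing whitespace in text, such as indentation."""
    return ET.canonicalize(a, strip_text=True) == ET.canonicalize(b, strip_text=True)


def compare_example_set(expected, actual):
    """Return names of example files whose converted output is missing or
    differs from the hand-checked expected result."""
    return [name for name, want in sorted(expected.items())
            if name not in actual or not same_xml(want, actual[name])]


# Hypothetical Example Set: expected results and the latest conversion run.
expected = {
    "ex01": '<topic id="a"><title>Pump</title></topic>',
    "ex02": '<topic id="b"><title>Valve</title></topic>',
}
actual = {
    "ex01": '<topic  id="a">\n  <title>Pump</title>\n</topic>',  # layout only
    "ex02": '<topic id="b"><title>Valves</title></topic>',      # real change
}
print(compare_example_set(expected, actual))   # ['ex02']
```

Because `strip_text` trims text at element boundaries, a spacing change next to inline markup would not be caught. A stricter comparison can drop that option.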

Prepare a Conversion Plan:
Document intervention procedures to be followed
Define manual editing guidelines
Explore outsourcing opportunities to enhance the process or reduce costs
Prepare schedule & cost estimates

Page 18

Conversion Process Refinement
Implement initial Conversion Process:
Maximize automation
Develop validation & verification scenarios that leverage automation
Ensure conversion rules can be modified by non-programmers

The goal is to interact with Subject Matter Experts efficiently
Based on the Conversion Specification & Example Set
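One way to let non-programmers maintain conversion rules is to keep them in a spreadsheet, export it as CSV and load it at the start of the conversion process. The column names (`source_style`, `target_element`, `notes`) and the `load_rules` function below are assumptions for illustration. The point is that rule errors are reported with the row number the editor sees in the spreadsheet.

```python
import csv
import io

# Hypothetical rules sheet, as exported from a spreadsheet.
RULES_CSV = """source_style,target_element,notes
Title,title,One per file
Body,p,
Step,step,Numbered list items
Warning,note,Set type to warning
"""


def load_rules(text):
    """Read the rules table into {source style: target element},
    rejecting blank or duplicate entries."""
    rules = {}
    # Row 1 is the header, so data rows start at 2 (as in the spreadsheet).
    for row_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        style = (row.get("source_style") or "").strip()
        target = (row.get("target_element") or "").strip()
        if not style or not target:
            raise ValueError(f"row {row_no}: style and target are required")
        if style in rules:
            raise ValueError(f"row {row_no}: duplicate rule for {style!r}")
        rules[style] = target
    return rules


print(load_rules(RULES_CSV))
```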

Test Conversion Process:
Follow the process from beginning to end, including application tests & output review
Look for opportunities to enhance automation
Perform trial interventions & manual editing to improve procedures
Revise the Conversion Specification, Example Set & automation

Page 19

Conversion Process Execution & Adaptation
Process refinement should continue throughout conversion:
Improve automation as the first response to identified issues
Minimize manual editing and ensure it is made as routine as possible

Suitable for outsourcing under knowledgeable guidance

Application testing is important (verification):
Where not all target applications are available, develop tests that will minimize risks

Reduce risk of rework:
Manual clean-up after format interpretation is less at risk
Manual editing as part of content mapping is at greater risk

Separate format interpretation from content mapping:
An interim XML format should be used as an interface
The interim format should retain all details available in the source content
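Separating format interpretation from content mapping makes it possible to check the interim format on its own, before any mapping decisions are made. Below is a minimal sketch of one such check, assuming the source text can be extracted as plain text. It compares the non-whitespace characters of the source with the text carried by the interim XML, which catches dropped or altered content; changes in spacing are deliberately ignored. The `<interim>`/`<para style>` vocabulary is hypothetical.

```python
import xml.etree.ElementTree as ET


def squeeze(text):
    """Remove all whitespace so only the characters themselves are compared."""
    return "".join(text.split())


def text_is_preserved(source_text, interim_xml):
    """True if the interim XML carries exactly the source's non-whitespace
    characters, in order. itertext() yields text and tails in document order."""
    interim = ET.fromstring(interim_xml)
    return squeeze(source_text) == squeeze("".join(interim.itertext()))


# Hypothetical source text and two candidate interim files.
source = "Caution: hot surface.\nOpen the housing."
good = ('<interim><para style="Advisory"><b>Caution:</b> hot surface.</para>'
        '<para style="Step">Open the housing.</para></interim>')
bad = '<interim><para style="Step">Open the housing.</para></interim>'
print(text_is_preserved(source, good), text_is_preserved(source, bad))  # True False
```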

Page 20

Refactor Content

[Content Processing Roadmap diagram, with the Refactor step highlighted]

Page 21

Refactoring Content

Refactoring: restructuring content, without loss of meaning, to improve its suitability for management, maintenance and, specifically, reuse. Refactoring entails two activities: bursting & normalization.

Page 22

Aspects of Refactoring
Refactoring breaks down into two tasks:

Content Bursting: decomposing content into components optimized for reuse

Content Normalization: systematic removal of redundancies to improve maintainability

Challenges:
Maintaining a complete equivalence with the original
Adapting the linking mechanisms so they remain valid and functional

Usually entails introduction of an indirect referencing scheme
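Bursting, together with the indirect referencing it usually requires, can be sketched as follows: split each section of a converted chapter into its own topic and record the topics in a DITA map. The map's `topicref/@href` entries then provide the indirection, because the chapter's structure lives in the map rather than in the topics. The input vocabulary (`<chapter>`, `<section>`), the id scheme in `slug` and the output are simplified assumptions, not validated DITA.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical converted chapter: one <section> per candidate topic.
CHAPTER = """<chapter>
  <section><title>Safety Precautions</title><p>Wear gloves.</p></section>
  <section><title>Replacing the Filter</title><p>Open the housing.</p></section>
</chapter>"""


def slug(title):
    """Derive a file name / topic id from a title (hypothetical scheme)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def burst(chapter_xml):
    """Split each section into its own topic and build a map that
    references the topics by file name instead of nesting them."""
    chapter = ET.fromstring(chapter_xml)
    topics = {}
    ditamap = ET.Element("map")
    for section in chapter.findall("section"):
        title = section.findtext("title")
        topic_id = slug(title)
        topic = ET.Element("topic", {"id": topic_id})
        ET.SubElement(topic, "title").text = title
        body = ET.SubElement(topic, "body")
        for child in section:
            if child.tag != "title":
                body.append(child)
        href = f"{topic_id}.dita"
        topics[href] = ET.tostring(topic, encoding="unicode")
        ET.SubElement(ditamap, "topicref", {"href": href})
    return topics, ET.tostring(ditamap, encoding="unicode")


topics, ditamap = burst(CHAPTER)
print(sorted(topics))
print(ditamap)
```

Titles that produce the same slug would collide, so a real id scheme would need a disambiguation rule drawn from the naming conventions in the Conversion Specification.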

Page 23

Refactoring Strategies
A strategy is needed to ensure adequate returns on investment

Approach must balance cost, risk, effort and time in a practical way

[Diagram labels: Conversion Outputs; Compare Outputs]

Page 24

Refactoring: Planning Granularity Level
Finding the Right Level of Granularity:
What are the most “natural” joints where content can be burst?
How is content most meaningfully managed, authored and used?

Ideally there is a level of granularity that is consistent across these views.

What to Avoid:
Over-ambition in defining the granularity level. At some point of decomposition, content becomes meaningless, very difficult to manage, very expensive to achieve across large sets of content, and challenging for authors to work with.

Page 25

Normalization
Normalization is an optimization appropriate for content that:
Has a long lifespan
Exhibits a significant rate of change
Will be translated into other languages

Normalization occurs at two levels:
At the level of managed granularity (component):
Commonly performed tasks in technical documentation
Example: procedures for accessing a control interface

At a sub-component level:
Boilerplate text (e.g., copyright notice or disclaimer)
Advisories (e.g., safety warnings)

Automation can support the process under guidance:
Identify redundancies & implement replacement decisions
Facilitate verification that there has been no content loss or output impact
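Automated support for normalization can start by finding candidate redundancies for an editor to review. This sketch fingerprints each paragraph after folding case and whitespace, which is an assumption; some projects will want exact matching. It then reports where each repeated paragraph occurs. Replacing the copies with a single reused source, for example through a DITA content reference, remains a guided decision.

```python
import hashlib
from collections import defaultdict


def fingerprint(text):
    """Fold case and collapse whitespace, then hash, so trivially different
    copies of the same boilerplate count as one candidate."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_redundancies(topics, min_copies=2):
    """Return {fingerprint: [(topic_id, paragraph_index), ...]} for every
    paragraph that occurs at least min_copies times.
    topics: {topic_id: [paragraph text, ...]}"""
    places = defaultdict(list)
    for topic_id, paragraphs in sorted(topics.items()):
        for index, para in enumerate(paragraphs):
            places[fingerprint(para)].append((topic_id, index))
    return {fp: where for fp, where in places.items() if len(where) >= min_copies}


# Hypothetical topics reduced to their paragraph text.
TOPICS = {
    "filter": ["WARNING: Disconnect power before servicing.", "Open the housing."],
    "pump": ["Warning:  disconnect power before servicing.", "Prime the pump."],
    "valve": ["Close the inlet."],
}
for where in find_redundancies(TOPICS).values():
    print(where)   # [('filter', 0), ('pump', 0)]
```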

Page 26

Realizing Savings through Refactoring

Page 27

Collect Metadata

[Content Processing Roadmap diagram, with the Collect step highlighted]

Page 28

Collecting Metadata

Metadata: a set of data that provides information about other data.

Collecting Metadata: extracting, validating, integrating, supplementing, synchronizing and storing metadata from, and about, the content.

Page 29

Sources of Metadata
Internal metadata:
Segments of content designated as valuable metadata
Attributes available in source format
Keywords & abstract
Annotations

External metadata:
System data (file information)
Associated keywords & descriptions
Ratings & commentary
Process context
Additional information drawn from other sources (e.g., a parts database)

[Diagram: metadata is identified, extracted and inserted into topics, which are related through an ontology (taxonomy) and a link network.]
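The collecting activities defined above (extracting, validating, integrating, supplementing and storing) can be illustrated on a single topic. In this sketch:

- Internal metadata comes from paragraphs that conversion flagged with a hypothetical `outputclass="legacy-keywords"`.
- System data and external keywords are passed in as plain values.
- The result is stored in a DITA `<prolog>` using `critdates/revised`, `metadata/keywords` and `othermeta`.

The `source-file` name used for `othermeta` is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical converted topic with a flagged keywords paragraph.
TOPIC = ('<topic id="pump-maint"><title>Pump Maintenance</title><body>'
         '<p outputclass="legacy-keywords">pump; seals</p>'
         '<p>Check seals monthly.</p></body></topic>')


def collect_metadata(topic_xml, system_data, external_keywords):
    topic = ET.fromstring(topic_xml)
    body = topic.find("body")
    # Extract: internal keywords from paragraphs flagged during conversion.
    internal = []
    for p in body.findall("p[@outputclass='legacy-keywords']"):
        internal.extend((p.text or "").split(";"))
        body.remove(p)
    # Validate and integrate: trim, drop blanks, de-duplicate ignoring case.
    keywords, seen = [], set()
    for kw in internal + list(external_keywords):
        kw = kw.strip()
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            keywords.append(kw)
    # Supplement with system data and store everything in a prolog.
    prolog = ET.Element("prolog")
    critdates = ET.SubElement(prolog, "critdates")
    ET.SubElement(critdates, "revised", {"modified": system_data["modified"]})
    metadata = ET.SubElement(prolog, "metadata")
    keywords_el = ET.SubElement(metadata, "keywords")
    for kw in keywords:
        ET.SubElement(keywords_el, "keyword").text = kw
    ET.SubElement(metadata, "othermeta",
                  {"name": "source-file", "content": system_data["path"]})
    # DITA expects <prolog> before <body>.
    topic.insert(list(topic).index(body), prolog)
    return topic


topic = collect_metadata(TOPIC, {"path": "legacy/pump.doc", "modified": "2008-03-14"},
                         ["Seals", "feed pump"])
print([k.text for k in topic.iter("keyword")])   # ['pump', 'seals', 'feed pump']
```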

Page 30

Establish Relationships

[Content Processing Roadmap diagram, with the Relate step highlighted]

Page 31

Establishing Relationships

Explicit Links (Actual): columns Identifier, Source, Target, Type (example rows A1, A2)

Implicit Links (Potential): columns Identifier, Source, Target, Type (example rows B1, B2)

Reuse Links (Physical): columns Identifier, Resource, Request, Condition (example rows R1, R2)

Links: the connections or relationships between things that represent a significant portion of the meaning and value of content

Page 32

All About Links
Increasingly important:
Essential for portals (enabling navigation)

Adding links:
Source / target identification
Link specification
Link generation
Link validation
Link extraction
Link reporting
Link activation

Level of precision:
The level of precision required is high, as is the potential for error
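Three of the link activities above (extraction, validation and reporting) can be sketched over a small set of converted topics. Each extracted link is recorded with an identifier, source, target and type, mirroring the columns of the Explicit Links table. Validation here only checks two things: that the target file exists and, if a fragment is given, that it matches the target topic's `id`. Element-level fragments (`file.dita#topic/element`) are not resolved. The topic content is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical converted topics, keyed by file name.
TOPICS = {
    "filter.dita": '<topic id="filter"><title>Filter</title><body><p>See '
                   '<xref href="pump.dita"/> and <xref href="valve.dita"/>.</p>'
                   '</body></topic>',
    "pump.dita": '<topic id="pump"><title>Pump</title><body><p>Back to '
                 '<xref href="filter.dita#filter"/>.</p></body></topic>',
}


def extract_links(topics):
    """Link extraction: one (identifier, source, target, type) record per xref."""
    links = []
    for source, xml_text in sorted(topics.items()):
        for n, xref in enumerate(ET.fromstring(xml_text).iter("xref"), start=1):
            links.append((f"{source}:{n}", source, xref.get("href", ""), "xref"))
    return links


def is_valid(href, topics):
    """Link validation: the file must exist and a topic-level fragment,
    if present, must match the target topic's id."""
    file_part, _, fragment = href.partition("#")
    if file_part not in topics:
        return False
    if fragment:
        topic_id = fragment.split("/", 1)[0]
        return ET.fromstring(topics[file_part]).get("id") == topic_id
    return True


def broken_links(links, topics):
    return [link for link in links if not is_valid(link[2], topics)]


# Link reporting.
for ident, source, target, _ in broken_links(extract_links(TOPICS), TOPICS):
    print(f"{ident}: {source} -> {target} (target not found)")
```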

Page 33

Content Validation
Validation is an essential capability:
Enables consistent processing
Streamlines processes
Confirms the conversion end-point

Validation must be:
Accurate
Manageable
Informative
Actionable
Pro-active
Continuously improving

[Diagram: validation applied across the processing steps Convert, Refactor, Collect, Relate, Compile, Resolve, Transform and Publish]
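To be informative and actionable, a validation report can state both what is wrong and what to do about it. The sketch below applies a few hypothetical rule-based checks: well-formedness, topic `id`, non-empty title and empty paragraphs. It attaches a suggested action to each finding. This is not a substitute for DTD or schema validation, which needs a validating parser such as lxml; it only illustrates the shape of an actionable report.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Finding:
    file: str
    problem: str   # informative: what is wrong
    action: str    # actionable: what to do about it


def check_topic(name, xml_text):
    """Hypothetical rule-based checks; not DTD or schema validation."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [Finding(name, f"not well-formed XML ({err})",
                        "fix the conversion rule that produced this file")]
    findings = []
    if not root.get("id"):
        findings.append(Finding(name, "topic has no id",
                                "assign an id following the naming conventions"))
    if not (root.findtext("title") or "").strip():
        findings.append(Finding(name, "missing or empty title",
                                "check the title mapping rule"))
    for p in root.iter("p"):
        if not "".join(p.itertext()).strip():
            findings.append(Finding(name, "empty paragraph",
                                    "remove it or correct the source mapping"))
    return findings


# Hypothetical converted files.
FILES = {
    "ok.dita": '<topic id="ok"><title>Pump</title><body><p>Text</p></body></topic>',
    "bad.dita": '<topic><title> </title><body><p/></body></topic>',
    "broken.dita": '<topic id="x"><title>Oops</topic>',
}
for name, text in FILES.items():
    for finding in check_topic(name, text):
        print(f"{finding.file}: {finding.problem} -> {finding.action}")
```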

Page 34

Conclusions
Content conversion is an unavoidable undertaking

Performance support portals demand high-precision content

Content conversion is a challenging undertaking

Particularly given the precision being demanded of the results

Content conversion is a manageable undertaking

Guided automation:
Substantially reduces costs
Dramatically improves quality

But there is no magic…

Page 35

Your Dreams Can Come True