WTUI18 - Modelling and Parsing Business Data Using DFDL

Post on 11-Jul-2015


I18: Data Format Description Language (DFDL)

Modeling and Parsing Business Data

Alex Wood IBM DFDL Development Team Lead

IBM Hursley Lab, UK

© 2014 IBM Corporation

Please Note

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.

The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.


Agenda

• Introduction
• OGF DFDL – a standard for modelling text and binary data
• IBM DFDL Component
• DFDL and IIB – demo
• Questions

Modeling Text and Binary Data

• Much of the data in the world resides in files, is not XML, is a mixture of textual and binary with custom syntax and encodings, and does not have a shareable machine-readable description
• But there has been no universal standard for modelling this data!
  – XML -> use XML Schema
  – RDBMS -> use database schema
  – Text/binary -> ??
• Existing standards are too prescriptive: “Put your data in this format!”
• Organizations including IBM evolved their own way of modelling text and binary data based on customer need.
• IBM® examples…
  – WebSphere® Message Broker: MRM message set
  – WebSphere Transformation Extender: Type Trees
  – WebSphere DataPower: FFD
  – WebSphere Cast Iron: Flat File Schema
  – Sterling B2B Integrator: DDF and IDF files

✓ DFDL: a universal, shareable, non-prescriptive description for general text & binary data formats


Data Format Description Language (DFDL)

• A new open standard
  – From the Open Grid Forum (OGF): http://www.ogf.org/
  – Version 1.0
  – ‘Proposed Recommendation’ status
• A way of describing data…
  – It is NOT a data format itself!
• A powerful modeling language…
  – Text, binary and bit
  – Commercial record-oriented
  – Scientific and numeric
  – Modern and legacy
  – Industry standards
• While allowing high performance…
  – You choose the right data format for the job
• Leverage XML Schema technology
  – Uses W3C XML Schema 1.0 subset & type system to describe the logical structure of the data
  – Uses XSDL annotations to describe the physical representation of the data
  – The result is a DFDL schema
• Keep simple cases simple
• Annotations are human readable
• Both read and write
  – A DFDL Processor can parse and serialize data using a DFDL schema
• Intelligent parsing
  – Automatically resolve choices and optionality
• Validation of data when parsing and serializing

Example – Delimited Text Data

    cat=5;lbound=-7.1E8

• “cat=” and “lbound=” are initiators
• “;” is the separator
• “5” is an ASCII text integer
• “-7.1E8” is an ASCII text floating point number
• Separators, initiators (aka tags), & terminators are all examples in DFDL of delimiters
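The roles of the separator and the initiators in this example can be sketched in a few lines of Python. This is only an illustration of the delimiter concept, not how a DFDL processor is implemented; the function name `split_delimited` and its parameters are hypothetical and do not come from any DFDL API.

```python
def split_delimited(data, separator, initiators):
    """Split a text record on its separator, then strip each field's initiator.

    Hypothetical helper for illustration only; a real DFDL processor is
    driven by a DFDL schema rather than by hard-coded arguments.
    """
    fields = data.split(separator)
    if len(fields) != len(initiators):
        raise ValueError("unexpected number of fields")
    values = []
    for field, initiator in zip(fields, initiators):
        if not field.startswith(initiator):
            raise ValueError(f"field {field!r} lacks initiator {initiator!r}")
        # What remains after the initiator is the text of the value itself.
        values.append(field[len(initiator):])
    return values


# The sample record from the slide: two initiated fields, one separator.
values = split_delimited("cat=5;lbound=-7.1E8", ";", ["cat=", "lbound="])
print(values)
```

The values are still text at this point; converting them to an integer and a float is the job of the type information in the schema.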

Example – DFDL Schema

Data: cat=5;lbound=-7.1E8

    <xs:complexType name="numbers">
      <xs:sequence>
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">
            <dfdl:sequence separator=";" encoding="ascii" …/>
          </xs:appinfo>
        </xs:annotation>
        <xs:element name="category" type="xs:int">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">
              <dfdl:element representation="text"
                            textNumberPattern="###0" encoding="ascii"
                            lengthKind="delimited" initiator="cat=" …/>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
        <xs:element name="lowerBound" type="xs:float">
          <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/v1.0">
              <dfdl:element representation="text"
                            textNumberPattern="##0.0#E0" encoding="ascii"
                            lengthKind="delimited" initiator="lbound=" …/>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
    </xs:complexType>

• DFDL annotation: the dfdl:sequence and dfdl:element elements inside xs:appinfo
• DFDL properties: the attributes on those elements, such as separator and initiator

Example – DFDL Schema (Short Form)

Data: cat=5;lbound=-7.1E8

    <xs:complexType name="numbers">
      <xs:sequence dfdl:separator=";" dfdl:encoding="ascii" …>
        <xs:element name="category" type="xs:int"
                    dfdl:representation="text"
                    dfdl:textNumberPattern="###0" dfdl:encoding="ascii"
                    dfdl:lengthKind="delimited" dfdl:initiator="cat=" … />
        <xs:element name="lowerBound" type="xs:float"
                    dfdl:representation="text"
                    dfdl:textNumberPattern="##0.0#E0" dfdl:encoding="ascii"
                    dfdl:lengthKind="delimited" dfdl:initiator="lbound=" … />
      </xs:sequence>
    </xs:complexType>

• DFDL properties: the dfdl:-prefixed attributes placed directly on the XML Schema elements

DFDL Processor

• A DFDL processor uses a DFDL schema to understand a data stream
• It consists of a DFDL parser and (optionally) a DFDL serializer
• The DFDL parser reads a data stream and creates a DFDL ‘infoset’
• The DFDL serializer takes a DFDL ‘infoset’ and writes a data stream

Diagram: data stream <-> DFDL Processor (using the DFDL schema) <-> infoset

Data stream:

    cat=5;lbound=-7.1E8

DFDL schema:

    <xs:complexType name="numbers">
      <xs:sequence dfdl:separator=";" dfdl:encoding="ascii" ... >
        <xs:element name="category" type="xs:int"
                    dfdl:representation="text" dfdl:encoding="ascii"
                    dfdl:textNumberPattern="###0"
                    dfdl:lengthKind="delimited" dfdl:initiator="cat=" ... />
        <xs:element name="lowerBound" type="xs:float"
                    dfdl:representation="text" dfdl:encoding="ascii"
                    dfdl:textNumberPattern="##0.0#E0"
                    dfdl:lengthKind="delimited" dfdl:initiator="lbound=" ... />
      </xs:sequence>
    </xs:complexType>

Infoset:

    <Document>
      <Element name="numbers">
        <Element name="category" dataType="xs:int" dataValue="5"/>
        <Element name="lowerBound" dataType="xs:float" dataValue="-7.1E08"/>
      </Element>
    </Document>
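The parse and serialize directions can be illustrated with a small hand-written Python sketch. Everything here is an assumption made for illustration: `MODEL` is a hypothetical dictionary standing in for the DFDL schema, the infoset is reduced to a list of (name, value) pairs, and `format_value` is a simplified stand-in for textNumberPattern rather than an implementation of it.

```python
# Hypothetical, hand-written stand-in for the 'numbers' schema; not a DFDL API.
MODEL = {
    "name": "numbers",
    "separator": ";",
    "children": [
        {"name": "category", "type": int, "initiator": "cat="},
        {"name": "lowerBound", "type": float, "initiator": "lbound="},
    ],
}


def parse(data, model):
    """Read a data stream and return an infoset-like list of (name, value)."""
    fields = data.split(model["separator"])
    children = model["children"]
    if len(fields) != len(children):
        raise ValueError("unexpected number of fields")
    infoset = []
    for text, child in zip(fields, children):
        initiator = child["initiator"]
        if not text.startswith(initiator):
            raise ValueError(f"missing initiator {initiator!r}")
        # Convert the remaining text to the logical type from the model.
        infoset.append((child["name"], child["type"](text[len(initiator):])))
    return infoset


def format_value(value):
    """Simplified number formatting; an assumption, not textNumberPattern."""
    if isinstance(value, float):
        mantissa, exponent = f"{value:.1E}".split("E")
        return f"{mantissa}E{int(exponent)}"
    return str(value)


def serialize(infoset, model):
    """Take an infoset-like list and write the data stream back out."""
    parts = []
    for (name, value), child in zip(infoset, model["children"]):
        if name != child["name"]:
            raise ValueError(f"unexpected element {name!r}")
        parts.append(child["initiator"] + format_value(value))
    return model["separator"].join(parts)


infoset = parse("cat=5;lbound=-7.1E8", MODEL)
round_trip = serialize(infoset, MODEL)
print(infoset, round_trip)
```

The point of the sketch is the symmetry: one description of the format drives both directions, which is what a DFDL schema provides to a DFDL processor.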

DFDL 1.0 Features

• Text data types such as strings, numbers, zoned decimals, calendars, booleans
• Binary data types such as integers, floats, BCD, packed decimals, calendars, booleans
• Fixed length data and data delimited by text or binary markup
• Bi-directional text
• Bit data of arbitrary length
• Pattern languages for text numbers and calendars
• Ordered, unordered and floating content
• Default values on parsing and serializing
• Nil values for handling out-of-band data
• Fixed and variable arrays
• XPath 2.0 expression language including variables to model dynamic data
• Speculative parsing to resolve choices and optional content
• Validation to XML Schema 1.0 rules
• Scoping mechanism to allow common property values to be applied at multiple points
• Hide elements in the data
• Calculate element values
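Of the features listed, speculative parsing is perhaps the easiest to picture in code. The sketch below shows the general idea only: try each branch of a choice in order and move to the next branch when one fails. The branch functions and the name `parse_choice` are hypothetical, and the sketch does not reflect the detailed rules the DFDL specification defines for resolving choices.

```python
def parse_int_field(text):
    # int() raises ValueError when the text is not an integer.
    return ("intValue", int(text))


def parse_float_field(text):
    # float() raises ValueError when the text is not a number.
    return ("floatValue", float(text))


def parse_string_field(text):
    # A string branch accepts any text, so it acts as the fallback.
    return ("stringValue", text)


def parse_choice(text, branches):
    """Try each branch in order; the first one that succeeds wins."""
    errors = []
    for branch in branches:
        try:
            return branch(text)
        except ValueError as error:
            # Record the failure and backtrack to the next branch.
            errors.append(f"{branch.__name__}: {error}")
    raise ValueError("no branch matched: " + "; ".join(errors))


BRANCHES = [parse_int_field, parse_float_field, parse_string_field]

for sample in ("5", "-7.1E8", "Hursley"):
    print(sample, "->", parse_choice(sample, BRANCHES))
```

Branch order matters in this sketch: placing the string branch first would make every input resolve to a string.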

When should I use DFDL?

• DFDL’s sweet spot is when you need to model and parse a text or binary data format and where either:
  – You have a specification of the data format ‘on the wire’
  – You have actual wire examples of the data format
• DFDL is recommended to model:
  – Binary data from COBOL, C, PL/1, ASM programs
  – Text data with delimiters such as CSV
  – Text industry standards such as SWIFT, HL7, EDIFACT, X12, HIPAA…
  – Binary industry standards such as ISO8583, TLog, ...
• DFDL is not recommended to model:
  – XML: XML parsers and XML Schema / DTDs already exist
  – JSON: JSON parsers already exist, and JSON schema is under design
  – GPB, HDF5, …: with serialization formats like GPB, the wire format is never exposed to the consumer and the data is accessed through APIs

DFDL Adoption

• IBM DFDL reusable component ships with:
  – WebSphere Message Broker v8.0
  – IBM Integration Bus v9.0
  – IBM Integration Bus open-beta
  – Rational® Performance Test Server v8 (v8.0.1 onwards)
  – Rational Test Virtualization Server v8 (v8.0.1 onwards)
  – Rational Test Workbench v8 (v8.0.1 onwards)
  – Rational Developer for System z v8.5
  – InfoSphere® Master Data Management v11
• Further IBM products and appliances looking to adopt
• Open-source DFDL implementation in progress: ‘Daffodil’
  – Available as an alpha release (parser only)
  – More features added every release
• DFDL web community on GitHub for collaborative authoring of DFDL schemas for commercial and scientific data formats

DFDL Schemas Web Community

• Free public repository for DFDL models
• Hosted on the popular GitHub community website
• Unlimited read-only access
• Collaboration encouraged
• Evolving content

Getting Started with DFDL


IBM DFDL

• Designed as an embeddable component
  – First shipped in 2011
• DFDL processor
  – High performance parser and serializer
  – Java and C
  – Streaming, on-demand, speculative
  – Pre-compiles DFDL schema
  – Parser emits SAX-like events
• Tooling for creating DFDL models
  – DFDL Schema editor Eclipse plugins
  – Wizards for CSV, COBOL & C
  – Debug model using real data from within tooling
• IBM DFDL implements the majority of the OGF DFDL 1.0 specification
  – Some more advanced features of DFDL are not yet available
  – Will be added in future DFDL deliverables until 100% achieved

Diagram: data stream <-> IBM DFDL Processor (using the DFDL schema) <-> infoset

Data stream:

    cat=5;lbound=-7.1E8

DFDL schema:

    <xs:schema …>
      <xs:annotation>
        <xs:appinfo …>
        </xs:appinfo>
      </xs:annotation>
      ...
    </xs:schema>

Infoset:

    <Document>
      <Element name="numbers">
        <Element name="category" …/>
        <Element name="lowerBound" …/>
      </Element>
    </Document>

What’s New in IBM DFDL

• Latest release of IBM DFDL is v1.1.1
• Since IBM DFDL v1.0 several spec features have been added:
  – Extracting data using a prefixed length
  – Extracting data using a regular expression
  – Extracting binary data with delimiters
  – User-defined variables
  – Default values when serializing
  – More XPath functions in expressions
  – Unordered sequences
  – Asserts with recoverable errors
• Since IBM DFDL v1.0 several tooling features have been added:
  – Keyboard shortcuts in DFDL editor
  – MBCS enablement in DFDL debugger
  – Copy/paste in DFDL editor
  – Generation of sample values
• Continual increase in performance
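As a rough illustration of what “extracting data using a prefixed length” means, the Python sketch below reads fields that are each preceded by their own length. The layout is an assumption chosen for the example (a 2-byte big-endian unsigned length followed by that many bytes of content); in DFDL the prefix type and units are described in the schema, and `read_prefixed` is a hypothetical helper, not part of IBM DFDL.

```python
import struct


def read_prefixed(buffer, offset=0):
    """Read one length-prefixed field and return (content, next_offset).

    Assumed layout: 2-byte big-endian unsigned length, then the content.
    """
    (length,) = struct.unpack_from(">H", buffer, offset)
    start = offset + 2
    end = start + length
    if end > len(buffer):
        raise ValueError("data stream ends before the prefixed length is satisfied")
    return buffer[start:end], end


# Two fields back to back: length 3 + b"IBM", then length 5 + b"DFDL!".
data = b"\x00\x03IBM\x00\x05DFDL!"
first, position = read_prefixed(data)
second, position = read_prefixed(data, position)
print(first, second, position)
```

No delimiters are needed with this kind of format, which is why it is common in binary data where any byte value may occur inside the content.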

DFDL Schemas for Industry Formats

• Fully supported, full function DFDL schemas will appear as part of IBM products such as the industry connectivity packs for IIB
• Unsupported, part function DFDL schemas will appear on the DFDLSchemas GitHub site as and when available
• HL7 v2.5.1, v2.6 and v2.7
  – IIB Connectivity Pack for Healthcare & GitHub
• HIPAA v5010 mandatory transactions
  – IIB Connectivity Pack for Healthcare
• IBM/Toshiba 4690 SurePOS ACE v7r3 TLog
  – IIB Retail Pack & GitHub
• ISO 8583 1987 & 1993
  – WMB v8 and IIB v9 sample & GitHub
• NACHA 2013
  – GitHub
• EDIFACT (all releases)
  – GitHub
• SWIFT FIN (2013-2014)
  – New! IBM Integration Bus Solution for SWIFT FIN Messaging (DFDL Edition)


IBM DFDL in WMB and IIB

• WMB v8 embeds IBM DFDL v1.0
• IIB v9 embeds IBM DFDL v1.1
• DFDL domain and parser
  – Available in usual way: input nodes, ESQL, Java, …
  – More capable and higher performing than MRM CWF/TDS
• DFDL models
  – DFDL schema files reside in libraries, not in message sets
• DFDL wizards and editor for creating DFDL models
• DFDL model debug and test
  – Debug and test parsing & writing of data within toolkit
  – No message flow or runtime deploy necessary!
• DFDL schema deployed in BAR file
  – No dictionary file
• No automatic migration from MRM

IBM DFDL Demo

New Message Model (screenshot callouts)
• Launcher for creating Message Models
• Selecting one of these creates a DFDL schema

New Message Model – CSV Wizard (screenshot callouts)
• Choose the end of line character
• Contains a header record?
• Change the number of columns

DFDL Editor – With Generated CSV Model (screenshot callouts)
• Logical structure view
• DFDL properties view
• Problems view

Test and Debug (screenshot callouts)
• Start a test parse
• Parsed ‘infoset’
• Parsed data
• Delimiters highlighted

Test and Debug – Failure (screenshot callouts)
• Parsed data up to error
• Parsed ‘infoset’ up to error
• Object in error
• Error message
• Trace console
• Model and data linked

Using the DFDL domain and parser (screenshot callouts)

• DFDL domain
• Specify message name
• On Demand or Complete parsing
• Optional validation

DFDL message tree

    ( ['MQROOT' : 0xd6d218]
      (0x01000000:Name):Properties = ( ['MQPROPERTYPARSER' : 0x141d34e8]
        (0x03000000:NameValue):MessageSet = '' (CHARACTER)
        (0x03000000:NameValue):MessageType = '{}:Company' (CHARACTER)
        (0x03000000:NameValue):MessageFormat = '' (CHARACTER)
        (0x03000000:NameValue):Encoding = 273 (INTEGER)
        (0x03000000:NameValue):CodedCharSetId = 850 (INTEGER)
        ....
      )
      (0x01000000:Name):DFDL = ( ['dfdl' : 0xd812c8]
        (0x01000000:Name):Company = (
          (0x03000000:NameValue):CompanyName = 'IBM' (CHARACTER)
          (0x01000000:Name):Employee = (
            (0x03000000:NameValue):EmpNo = 12345 (INTEGER)
            (0x03000000:NameValue):Dept = 21 (INTEGER)
            (0x03000000:NameValue):EmpName = 'Steve Hanson' (CHARACTER)
            ...
          )
        )
      )
    )

• DFDL message name: MessageType = '{}:Company'
• DFDL domain: the DFDL folder owned by the 'dfdl' parser
• Message name in tree (like XMLNSC): Company
• Compact ‘Name/Value’ syntax elements
• Data types from DFDL schema

Using IBM DFDL Outside a Broker

• The IBM DFDL Java classes may be used to create stand-alone Java applications for parsing/serializing text & binary data formats
• IIB license permits IBM DFDL classes to be used by Java applications:
  – On a computer where IIB is installed
  – On a remote computer where IIB is not installed
• When used remotely, charged as if IIB was installed (via ITLM)
• IBM DFDL for Java is fully supported when used in this manner
• Javadoc for APIs and a sample program are provided

Diagram: data stream <-> IBM DFDL for Java (inside a stand-alone Java application) <-> infoset

Data stream:

    intval=5;fltval=-7.1E8

Infoset:

    <Document>
      <Element name="myNumbers">
        <Element name="myInt" dataType="xs:int" dataValue="5"/>
        <Element name="myFloat" dataType="xs:float" dataValue="-7.1E08"/>
      </Element>
    </Document>

Summary

• DFDL – an OGF standard for describing and parsing text and binary data
  – Defines a rich set of features to handle a broad range of custom and industry data formats.
  – Leverages XML Schema logical model and logical validation.
  – DFDL Schemas community for sharing of standard industry data format descriptions
• IBM DFDL – the parser for text and binary data in IBM Integration Bus and other IBM products.
  – Highly performing runtime parser inside IIB and other IBM products.
  – Rich set of DFDL development and test tools in IIB Studio and other IBM products.

Useful DFDL links

• DFDL 1.0 specification: http://www.ogf.org/documents/GFD.174.pdf
• DFDL tutorials: http://redmine.ogf.org/dmsf/dfdl-wg?folder_id=5485
• DFDL developerWorks: http://www.ibm.com/developerworks/library/se-dfdl/index.html
• DFDL Wikipedia page: http://en.wikipedia.org/wiki/DFDL
• DFDL Schemas on GitHub: https://github.com/DFDLSchemas
• OGF DFDL home page: http://www.ogf.org/dfdl/
• Daffodil open source project: https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL


For Additional Information

• IBM Training: http://www.ibm.com/training
• IBM WebSphere: http://www-01.ibm.com/software/be/websphere/
• IBM developerWorks: www.ibm.com/developerworks/websphere/websphere2.html
• WebSphere forums and community: www.ibm.com/developerworks/websphere/community/

 
