WTUI18 - Modelling and Parsing Business Data Using DFDL

33
I18: Data Format Description Language (DFDL) Modeling and Parsing Business Data Alex Wood IBM DFDL Development Team Lead IBM Hursley Lab ,UK

Transcript of WTUI18 - Modelling and Parsing Business Data Using DFDL

Page 1: WTUI18 - Modelling and Parsing Business Data Using DFDL

I18: Data Format Description

Language (DFDL)

Modeling and Parsing Business Data

Alex Wood IBM DFDL Development Team Lead

IBM Hursley Lab ,UK  

Page 2: WTUI18 - Modelling and Parsing Business Data Using DFDL

©  2014  IBM  Corpora/on    

Please  Note  IBM’s  statements  regarding  its  plans,  direc/ons,  and  intent  are  subject  to  change  or  withdrawal  without  no/ce  at  IBM’s  sole  discre/on.  Informa/on  regarding  poten/al  future  products  is  intended  to  outline  our  general  product  direc/on  and  it  should  not  be  relied  on  in  making  a  purchasing  decision.    

The  informa/on  men/oned  regarding  poten/al  future  products  is  not  a  commitment,  promise,  or  legal  obliga/on  to  deliver  any  material,  code  or  func/onality.  Informa/on  about  poten/al  future  products  may  not  be  incorporated  into  any  contract.  The  development,  release,  and  /ming  of  any  future  features  or  func/onality  described  for  our  products  remains  at  our  sole  discre/on  

Performance  is  based  on  measurements  and  projec/ons  using  standard  IBM  benchmarks  in  a  controlled  environment.    The  actual  throughput  or  performance  that  any  user  will  experience  will  vary  depending  upon  many  factors,  including  considera/ons  such  as  the  amount  of  mul/programming  in  the  user’s  job  stream,  the  I/O  configura/on,  the  storage  configura/on,  and  the  workload  processed.    Therefore,  no  assurance  can  be  given  that  an  individual  user  will  achieve  results  similar  to  those  stated  here.  

2  

Page 3: WTUI18 - Modelling and Parsing Business Data Using DFDL

•  Introduc/on  

• OGF  DFDL  –  a  standard  for  modelling  text  and  binary  data  

•  IBM  DFDL  Component  

• DFDL  and  IIB  -­‐  demo  

• Ques/ons  

Agenda  

Page 4: WTUI18 - Modelling and Parsing Business Data Using DFDL

Modeling  Text  and  Binary  Data  •  Much  of  the  data  in  the  world  resides  in  files,  is  not  XML,  is  a  mixture  of  textual  and  binary  with  

custom  syntax  and  encodings,  and  does  not  have  a  shareable  machine-­‐readable  descrip/on  

•  But  there  has  been  no  universal  standard  for  modelling  this  data!  – XML  -­‐>  use  XML  Schema  – RDBMS  -­‐>  use  database  schema  – Text/binary  -­‐>  ??  

•  Exis/ng  standards  are  too  prescrip/ve:  “Put  your  data  in  this  format!”  •  Organiza/ons  including  IBM  evolved  their  own  way  of  modelling  text  and  binary  data  based  on  

customer  need.    •  IBM®  examples…  

– WebSphere®  Message  Broker:    MRM  message  set  – WebSphere  Transforma/on  Extender:  Type  Trees  – WebSphere  DataPower:  FFD  – WebSphere  Cast  Iron:  Flat  File  Schema  – Sterling  B2B  Integrator:  DDF  and  IDF  files  

ü  DFDL: a universal, shareable, non-prescriptive description for general text & binary data formats

Page 5: WTUI18 - Modelling and Parsing Business Data Using DFDL

•  Introduc/on  

• OGF  DFDL  –  a  standard  for  modelling  text  and  binary  data  

•  IBM  DFDL  Component  

• DFDL  and  IIB  

• Ques/ons  

Agenda  

Page 6: WTUI18 - Modelling and Parsing Business Data Using DFDL

Data  Format  Descrip:on  Language  (DFDL)  

§ A new open standard – From the Open Grid Forum (OGF)

–  http://www.ogf.org/ – Version 1.0

–  ‘Proposed Recommendation’ status

§ A way of describing data… –  It is NOT a data format itself!

§ A powerful modeling language … – Text, binary and bit – Commercial record-oriented – Scientific and numeric – Modern and legacy –  Industry standards

§ While allowing high performance … – You choose the right data format

for the job

§ Leverage XML Schema technology – Uses W3C XML Schema 1.0 subset

& type system to describe the logical structure of the data

– Uses XSDL annotations to describe the physical representation of the data

– The result is a DFDL schema § Keep simple cases simple § Annotations are human readable § Both read and write

– A DFDL Processor can parse and serialize data using a DFDL schema

§  Intelligent parsing – Automatically resolve choices and

optionality § Validation of data when parsing and

serializing

Page 7: WTUI18 - Modelling and Parsing Business Data Using DFDL

Example  –  Delimited  Text  Data  

cat=5;lbound=-7.1E8 ASCII text

integer ASCII text

floating point

Separator

Separators, initiators (aka tags), & terminators are all examples in DFDL of delimiters

Initiator Initiator

§ 7

Page 8: WTUI18 - Modelling and Parsing Business Data Using DFDL

Example  –  DFDL  Schema  <xs:complexType name=“numbers"> <xs:sequence> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:sequence separator=“;” encoding=“ascii” …/> </xs:appinfo>

</xs:annotation> <xs:element name=“category" type=“xs:int”> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:element representation="text"

textNumberPattern="###0" encoding="ascii" lengthKind="delimited" initiator=“cat=" …/> </xs:appinfo>

</xs:annotation> </xs:element>

<xs:element name=“lowerBound" type=“xs:float”> <xs:annotation> <xs:appinfo source="http://www.ogf.org/dfdl/v1.0"> <dfdl:element representation="text" textNumberPattern="##0.0#E0" encoding="ascii"

lengthKind="delimited" initiator=“lbound=" …/> </xs:appinfo>

</xs:annotation> </xs:element> </xs:sequence> </xs:complexType>

DFDL properties

DFDL annotation

cat=5;lbound=-7.1E8

Page 9: WTUI18 - Modelling and Parsing Business Data Using DFDL

Example  –  DFDL  Schema  (Short  Form)  

<xs:complexType name=“numbers"> <xs:sequence dfdl:separator=“;” dfdl:encoding=“ascii” …> <xs:element name=“category" type=“xs:int” dfdl:representation="text"

dfdl:textNumberPattern="###0" dfdl:encoding="ascii" dfdl:lengthKind="delimited" dfdl:initiator=“cat=" … /> <xs:element name=“lowerBound" type=“xs:float” dfdl:representation="text"

dfdl:textNumberPattern="##0.0#E0" dfdl:encoding="ascii" dfdl:lengthKind="delimited" dfdl:initiator="lbound=" … /> </xs:sequence> </xs:complexType>

DFDL properties

cat=5;lbound=-7.1E8

Page 10: WTUI18 - Modelling and Parsing Business Data Using DFDL

§  A DFDL processor uses a DFDL schema to understand a data stream

§  It consists of a DFDL parser and (optionally) a DFDL serializer

§  The DFDL parser reads a data stream and creates a DFDL ‘infoset’

§  The DFDL serializer takes a DFDL ‘infoset’ and writes a data stream

DFDL Processor

<Document> <Element name=“numbers”/> <Element name=“category” dataType=“xs:int” dataValue=“5”/> <Element name=“lowerBound” dataType=“xs:float” dataValue=“-7.1E08”/> </Element> </Document>

cat=5;lbound=-7.1E8

<xs:complexType name=“numbers"> <xs:sequence dfdl:separator=“;” dfdl:encoding=“ascii” ... > <xs:element name=“category" type=“xs:int” dfdl:representation="text"

dfdl:encoding="ascii“ dfdl:textNumberPattern=“###0” dfdl:lengthKind="delimited" dfdl:initiator=“cat=“ ... /> <xs:element name=“lowerBound" type=“xs:float” dfdl:representation="text"

dfdl:encoding="ascii“ dfdl:textNumberPattern=“##0.0#E0” dfdl:lengthKind="delimited" dfdl:initiator=“lbound=“ ... /> </xs:sequence> </xs:complexType>

DFDL Processor

Page 11: WTUI18 - Modelling and Parsing Business Data Using DFDL

DFDL  1.0  Features  §  Text data types such as strings, numbers, zoned decimals, calendars, booleans §  Binary data types such as integers, floats, BCD, packed decimals, calendars, booleans §  Fixed length data and data delimited by text or binary markup §  Bi-directional text §  Bit data of arbitrary length §  Pattern languages for text numbers and calendars §  Ordered, unordered and floating content §  Default values on parsing and serializing §  Nil values for handling out-of-band data §  Fixed and variable arrays §  XPath 2.0 expression language including variables to model dynamic data §  Speculative parsing to resolve choices and optional content §  Validation to XML Schema 1.0 rules §  Scoping mechanism to allow common property values to be applied at multiple points §  Hide elements in the data §  Calculate element values

Page 12: WTUI18 - Modelling and Parsing Business Data Using DFDL

When  should  I  use  DFDL?  §  DFDL’s sweet spot is when you need to model and parse a text or binary data format and where either:

§  You have a specification of the data format ‘on the wire’ §  You have actual wire examples of the data format

§  DFDL is recommended to model: §  Binary data from COBOL, C, PL/1, ASM programs §  Text data with delimiters such as CSV §  Text industry standards such as SWIFT, HL7, EDIFACT, X12, HIPPA… §  Binary industry standards such as ISO8583, TLog, ...

§  DFDL is not recommended to model: §  XML

§  Already have XML parsers and XML Schema / DTDs §  JSON

§  Already have JSON parsers, and JSON schema under design §  GPB, HDF5, …

§  With serialization formats like GPB, the wire format is never exposed to the consumer and access to the data is using APIs

Page 13: WTUI18 - Modelling and Parsing Business Data Using DFDL

DFDL  Adop:on  •  IBM  DFDL  reusable  component  ships  with:  

–  WebSphere  Message  Broker  v8.0    –  IBM  Integra/on  Bus  v9.0  –  IBM  Integra/on  Bus  open-­‐beta  –  Ra/onal®  Performance  Test  Server  v8  (v8.0.1  onwards)  –  Ra/onal  Test  Virtualiza/on  Server  v8  (v8.0.1  onwards)  –  Ra/onal  Test  Workbench  v8  (v8.0.1  onwards)  –  Ra/onal  Developer  for  System  z  v8.5  –  InfoSphere®  Master  Data  Management  v11  

•  Further  IBM  products  and  appliances  looking  to  adopt  

•  Open-­‐source  DFDL  implementa/on  in  progress  ‘Daffodil’  –  Available  as  an  alpha  release  (parser  only)  –  More  features  added  every  release  

•  DFDL  web  community  on  GitHub  for  collabora/ve  authoring  of  DFDL  schemas  for  commercial  and  scien/fic  data  formats  

Page 14: WTUI18 - Modelling and Parsing Business Data Using DFDL

DFDL  Schemas  Web  Community  •  Free  public  repository                                                                              for  DFDL  models  

•  Hosted  on  the  popular  GitHub    community  website  

•  Unlimited  read-­‐only  access    •  Collabora/on  encouraged  •  Evolving  content  

   

Page 15: WTUI18 - Modelling and Parsing Business Data Using DFDL

GeMng  Started  with  DFDL    

Page 16: WTUI18 - Modelling and Parsing Business Data Using DFDL

•  Introduc/on  

• OGF  DFDL  –  a  standard  for  modelling  text  and  binary  data  

•  IBM  DFDL  Component  

• DFDL  and  IIB  

• Ques/ons  

Agenda  

Page 17: WTUI18 - Modelling and Parsing Business Data Using DFDL

IBM  DFDL    •  Designed  as  an  embeddable  component  

–  First  shipped  in  2011  •  DFDL  processor  

–  High  performance  parser  and  serializer  –  Java  and  C  –  Streaming,  on-­‐demand,  specula/ve  –  Pre-­‐compiles  DFDL  schema  –  Parser  emits  SAX-­‐like  events  

•  Tooling  for  crea/ng  DFDL  models  –  DFDL  Schema  editor  eclipse  plugins  –  Wizards  for  CSV,  COBOL  &  C  –  Debug  model  using  real  data  from  within  tooling    

•  IBM  DFDL  implements  majority  of  the  OGF  DFDL  1.0  specifica/on  –  Some  more  advanced  features  of  DFDL  are  not  yet  available  –  Will  be  added  in  future  DFDL  deliverables  un/l  100%  achieved  

<Document> <Element name=“numbers”/> <Element name=“category” …/> <Element name=“lowerBound” …/> </Element> </Document>

cat=5;lbound=-7.1E8

<xs:schema …> <xs:annotation> <xs:appinfo …> </xs:appinfo> </xs:annotation> ... </xs:schema>

IBM DFDL Processor

Page 18: WTUI18 - Modelling and Parsing Business Data Using DFDL

What’s  New  in  IBM  DFDL  •  Latest  release  of  IBM  DFDL  is  v1.1.1  •  Since  IBM  DFDL  v1.0  several  spec  features  have  been  added:  

–  Extrac/ng  data  using  a  prefixed  length  –  Extrac/ng  data  using  a  regular  expression  –  Extrac/ng  binary  data  with  delimiters    –  User-­‐defined  variables  –  Default  values  when  serializing  –  More  XPath  func/ons  in  expressions  –  Unordered  sequences  –  Asserts  with  recoverable  errors  

•  Since  IBM  DFDL  v1.0  several  tooling  features  have  been  added:  –  Keyboard  shortcuts  in  DFDL  editor  –  MBCS  enablement  in  DFDL  debugger  –  Copy/paste  in  DFDL  editor  –  Genera/on  of  sample  values  

•  Con/nual  increase  in  performance  

Page 19: WTUI18 - Modelling and Parsing Business Data Using DFDL

   DFDL  Schemas  for  Industry  Formats  •  Fully  supported,  full  func/on  DFDL  schemas  will  appear  as  part  of  IBM  products  such  as  the  industry  connec/vity  packs  for  IIB  

•  Unsupported,  part  func/on  DFDL  schemas  will  appear  on  DFDLSchemas  GitHub  site  as  and  when  available  

•  HL7  v2.5.1,  v2.6  and  v2.7  –  IIB  Connec/vity  Pack  for  Healthcare  &  GitHub  

•  HIPPA  v5010  mandatory  transac/ons  –  IIB  Connec/vity  Pack  for  Healthcare    

•  IBM/Toshiba  4690  SurePos  ACE  v7r3  TLog  –  IIB  Retail  Pack  &  GitHub  

•  ISO  8583  1987  &  1993  – WMB  v8  and  IIB  v9  sample  &  GitHub    

•  NACHA  2013  – GitHub  

•  EDIFACT  (all  releases)  – GitHub  

•  SWIFT  FIN  (2013-­‐2014)  – New!  IBM  Integra/on  Bus  Solu/on  for  SWIFT  FIN  Messaging  (DFDL  Edi/on)  

Page 20: WTUI18 - Modelling and Parsing Business Data Using DFDL

•  Introduc/on  

• OGF  DFDL  –  a  standard  for  modelling  text  and  binary  data  

•  IBM  DFDL  Component  

• DFDL  and  IIB  -­‐  demo  

• Ques/ons  

Agenda  

Page 21: WTUI18 - Modelling and Parsing Business Data Using DFDL

IBM  DFDL  in  WMB  and  IIB  • WMB  v8  embeds  IBM  DFDL  v1.0  •  IIB  v9  embeds  IBM  DFDL  v1.1    •  DFDL  domain  and  parser  

– Available  in  usual  way  –  input  nodes,  ESQL,  Java,  …  – More  capable  and  higher  performing  than  MRM  CWF/TDS    

•  DFDL  models  – DFDL  schema  files  reside  in  libraries,  not  in  message  sets  

•  DFDL  wizards  and  editor  for  crea/ng  DFDL  models    •  DFDL  model  debug  and  test  

– Debug  and  test  parsing  &  wri/ng  of  data  within  toolkit    – No  message  flow  or  run/me  deploy  necessary!  

•  DFDL  schema  deployed  in  BAR  file  – No  dic/onary  file  

• No  automa/c  migra/on  from  MRM  

Page 22: WTUI18 - Modelling and Parsing Business Data Using DFDL

IBM  DFDL    Demo  

Page 23: WTUI18 - Modelling and Parsing Business Data Using DFDL

Launcher for creating Message Models

Selecting one of these creates

a DFDL schema

New  Message  Model  

Page 24: WTUI18 - Modelling and Parsing Business Data Using DFDL

New  Message  Model  –  CSV  Wizard  

Choose the end of line character

Contains a header record?

Change the number of columns

Page 25: WTUI18 - Modelling and Parsing Business Data Using DFDL

DFDL  Editor  –  With  Generated  CSV  Model  

Logical structure

view DFDL

properties view Problems

view

Page 26: WTUI18 - Modelling and Parsing Business Data Using DFDL

Test  and  Debug  

Start a test parse

Parsed ‘infoset’

Parsed data

Delimiters highlighted

Page 27: WTUI18 - Modelling and Parsing Business Data Using DFDL

Test  and  Debug  –  Failure    

Parsed data up to

error

Trace console

Error message

Object in error

Parsed ‘infoset’ up to error

Model and data

linked

Page 28: WTUI18 - Modelling and Parsing Business Data Using DFDL

Using  the  DFDL  domain  and  parser  

On Demand or Complete

parsing

Optional validation

DFDL domain

Specify message

name

Page 29: WTUI18 - Modelling and Parsing Business Data Using DFDL

DFDL  message  tree  

( ['MQROOT' : 0xd6d218] (0x01000000:Name):Properties = ( ['MQPROPERTYPARSER' : 0x141d34e8] (0x03000000:NameValue):MessageSet = '' (CHARACTER) (0x03000000:NameValue):MessageType =

'{}:Company' (CHARACTER) (0x03000000:NameValue):MessageFormat = '' (CHARACTER) (0x03000000:NameValue):Encoding = 273 (INTEGER) (0x03000000:NameValue):CodedCharSetId = 850 (INTEGER) .... ) (0x01000000:Name):DFDL = ( ['dfdl' : 0xd812c8] (0x01000000:Name):Company = ( (0x03000000:NameValue):CompanyName = 'IBM' (CHARACTER) (0x01000000:Name):Employee = ( (0x03000000:NameValue):EmpNo = 12345 (INTEGER)

(0x03000000:NameValue):Dept = 21 (INTEGER) (0x03000000:NameValue):EmpName = 'Steve Hanson' (CHARACTER) ... )

) ) )

DFDL message

name

Message name in tree (like XMLNSC)

DFDL domain

Compact ‘Name/Value’

syntax elements

Data types from DFDL

schema

Page 30: WTUI18 - Modelling and Parsing Business Data Using DFDL

•  The  IBM  DFDL  Java  classes  may  be  used  to  create  stand-­‐alone  Java  applica/ons  for  parsing/serializing  text  &  binary  data  formats  

•  IIB  license  permits  IBM  DFDL  classes  to  be  used  by  Java  applica/ons:  – On  a  computer  where  IIB  is  installed  – On  a  remote  computer  where  IIB  is  not  installed  

• When  used  remotely,  charged  as  if  IIB  was  installed  (via  ITLM)  •  IBM  DFDL  for  Java  is  fully  supported  when  used  in  this  manner  •  Javadoc  for  APIs  and  a  sample  program  are  provided  

<Document> <Element name=“myNumbers”/> <Element name=“myInt” dataType=“xs:int” dataValue=“5”/> <Element name=“myFloat” dataType=“xs:float” dataValue=“-7.1E08”/> </Element> </Document>

intval=5;fltval=-7.1E8 IBM DFDL for Java

Stand-alone Java Application

Using IBM DFDL Outside a Broker

Page 31: WTUI18 - Modelling and Parsing Business Data Using DFDL

Summary  •  DFDL  –  an  OGF  standard  for  describing  and  parsing  text  and  

binary  data  –  Defines  a  rich  set  of  features  to  handle  broad  range  of  custom  and  

industry  data  formats.  –  Leverages  XML  schema  logical  model  and  logical  valida/on.    –  DFDL  Schemas  community  for  sharing  of  standard  industry  data  

format  descrip/ons  

•  IBM  DFDL  –  the  parser  for  test  and  binary  data  in  IBM  Integra/on  Bus  and  other  IBM  products.    –  Highly  performing  run/me  parser  inside  IIB  and  other  IBM  products.  –  Rich  set  of  DFDL  development  and  test  tools  in  IIB  Studio  and  other  

IBM  products.    

Page 32: WTUI18 - Modelling and Parsing Business Data Using DFDL

•  DFDL  1.0  specifica:on:    hvp://www.ogf.org/documents/GFD.174.pdf    

•  DFDL  tutorials:    hvp://redmine.ogf.org/dmsf/dfdl-­‐wg?folder_id=5485    

•  DFDL  developerWorks:    hvp://www.ibm.com/developerworks/library/se-­‐dfdl/index.html    

•  DFDL  Wikipedia  page:    hvp://en.wikipedia.org/wiki/DFDL    

•  DFDL  Schemas  on  GitHub:    hvps://github.com/DFDLSchemas    

•  OGF  DFDL  home  page:    hvp://www.ogf.org/dfdl/    

•  Daffodil  open  source  project:    hvps://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL  

Useful  DFDL  links  

Page 33: WTUI18 - Modelling and Parsing Business Data Using DFDL

©  2014  IBM  Corpora/on    

For  Addi:onal  Informa:on  l  IBM  Training  

hvp://www.ibm.com/training    l  IBM  WebSphere  

hvp://www-­‐01.ibm.com/sozware/be/websphere/    l  IBM  developerWorks  

www.ibm.com/developerworks/websphere/websphere2.html    l  WebSphere  forums  and  community  

www.ibm.com/developerworks/websphere/community/    

 

33