The Mystical Principles of XSLT: Enlightenment through Software Visualization
Transcript of The Mystical Principles of XSLT: Enlightenment through Software Visualization
Evan Lenz, PresidentAugust 2, 2016
The Mystical Principles of XSLTEnlightenment through Software Visualization
2
Grokking XSLT Easy to learn just enough to be dangerous
− Lots of terrible code in the wild
Understanding template rules is elusive− Not what procedural programmers expect− Lots of explanations that are partial, imprecise, or ambiguous
XSLT really isn’t that difficult a language− But only if you grok template rules
A lot that’s invisible
3
Avoiding xsl:apply-templates
4
Befriending xsl:apply-templates
5
By analogy to OO From XSLT 1.0 Pocket Reference:
− http://lenzconsulting.com/how-xslt-works/#applying_template_rules
• “In object-oriented (OO) terms, xsl:apply-templates is like a function that iterates over a list of objects (nodes) and, for each object, calls the same polymorphic function. Each template rule in your stylesheet defines a different implementation of that single polymorphic function. Which implementation is chosen depends on the runtime characteristics of the object (node). Loosely speaking, you define all the potential bindings by associating a “type” (pattern) with each implementation (template rule).”
− http://lenzconsulting.com/how-xslt-works/#modes• “Furthering the OO analogy, a mode name identifies which polymorphic
function to execute repeatedly in a call to xsl:apply-templates. When mode="foo" is set, foo acts as the name of a polymorphic function, and each template rule with mode="foo" defines an implementation of the foo ‘function.’”
6
By analogy to other instructions This:
<xsl:apply-templates select="animal"/>…<xsl:template match="animal[@type eq 'cat']">…</xsl:template><xsl:template match="animal[@type eq 'dog']">…</xsl:template>
is (more or less) equivalent to this:
<xsl:for-each select="animal"> <xsl:choose> <xsl:when test="@type eq 'cat'">…</xsl:when> <xsl:when test="@type eq 'dog'">…</xsl:when> </xsl:choose></xsl:for-each>
7
De-mystifying XSLT wizardry Mysticism:
− “a theory postulating the possibility of direct and intuitive acquisition of ineffable knowledge or power”
In relation to XSLT:− Belief in the possibility that there is a way to make XSLT’s
behavior more directly and intuitively understandable• By making implicit behavior explicit and the abstract concrete, through an
interactive visualization
8
Elusiveness of the task Alternating between faith and doubt that what I’m
imagining is possible or useful Grand schemes that collapse into themselves Making abstract ideas concrete
− Shapes in my mind converted to shapes on the screen
Disillusionment− Finding out that certain ideas weren’t well-formed− The crucible of incarnation
The feeling that there’s nothing really “there”
9
Like a debugger? Debugging is like the scientific method
− The debugger is the instrument of measurement− Start with a hypothesis− Set:
• breakpoints• watch variables
− One step at a time, careful not to “step over” when we want to “step into”—otherwise we’ve missed our chance and have to start all over (because we can’t go backwards in time)
10
Not like a debugger Software visualization for XSLT:
− A holistic experience, rather than a narrow line of inquiry− A space in which to freely explore, in any direction− Not bound by our previous steps− No need to start with a question− Just start playing around and see what we will see
11
Not like a debugger Sri Aurobindo on gnostic knowledge:
− “For while the reason [like a debugger] proceeds from moment to moment of time and loses and acquires and again loses and again acquires, the gnosis [or visualization tool] dominates time in a one view and perpetual power and links past, present and future in their indivisible connections, in a single continuous map of knowledge, side by side.” (The Synthesis of Yoga, p. 464)
12
Escaping the bounds of time A transformation is fundamentally:
− a data set− not a process
XSLT’s functional nature− More like an abstract sculpture, showing relationships− Less like a movie
Time is just one way to traverse the relationships:− we can go forward and backward in time (result document order)− we can step outside of time− we can traverse the relationships using a different dimension
The essence of transformation
14
Transformation as microcosm “The Big Bang”
− the initial context node
“The Engine of Creation”− template rules
“Emergence/Manifestation”− the result tree
15
Scope and method Scope:
− Template rules (and maybe xsl:for-each)• XSLT’s core processing model
− XPath, though foundational, is out of scope
Method:− Explode a transformation into all its parts− Put them back together
16
Fundamental unit: the focus As defined in the XSLT Recommendation, consists of:
− the context item (.),− the context position (position()), and− the context size (last()).
17
An expanded definition of focus We will define “focus” as:
− the particular instance of a focus− i.e. each instantiation of a focus-changed sequence constructor
• body of <xsl:template>, body of <xsl:for-each>, etc.
Then we can consider the shallow result chunk implied by a focus to be an intrinsic property of that focus
− 1:1 relationship
18
For example Given a single instantiation of this rule:
<xsl:template match="heading"> <title> <xsl:value-of select="."/> </title></xsl:template>
We could represent the “focus” like this:
<trace:focus context-id="n1a833c7b3a005151" context-position="1" context-size="1" rule-id="nfa49863d62353482" invocation-id="bc08a736-13f7-e97b-a272"> <title>This is the title</title></trace:focus>
19
For example Given a single instantiation of this rule:
<xsl:template match="heading"> <title> <xsl:value-of select="."/> </title></xsl:template>
We could represent the “focus” like this:
<trace:focus context-id="n1a833c7b3a005151" context-position="1" context-size="1" rule-id="nfa49863d62353482" invocation-id="bc08a736-13f7-e97b-a272"> <title>This is the title</title></trace:focus>
Cosmic address (GUID) for the
current invocation of <xsl:apply-templates/>
20
For example Given a single instantiation of this rule:
<xsl:template match="heading"> <title> <xsl:value-of select="."/> </title></xsl:template>
We could represent the “focus” like this:
<trace:focus context-id="n1a833c7b3a005151" context-position="1" context-size="1" rule-id="nfa49863d62353482" invocation-id="bc08a736-13f7-e97b-a272"> <title>This is the title</title></trace:focus>
How many nodes are in the
list being processed by that invocation
21
For example Given a single instantiation of this rule:
<xsl:template match="heading"> <title> <xsl:value-of select="."/> </title></xsl:template>
We could represent the “focus” like this:
<trace:focus context-id="n1a833c7b3a005151" context-position="1" context-size="1" rule-id="nfa49863d62353482" invocation-id="bc08a736-13f7-e97b-a272"> <title>This is the title</title></trace:focus>
In other words: how many foci belong to the
current invocation
22
For example Given a single instantiation of this rule:
<xsl:template match="heading"> <title> <xsl:value-of select="."/> </title></xsl:template>
We could represent the “focus” like this:
<trace:focus context-id="n1a833c7b3a005151" context-position="1" context-size="1" rule-id="nfa49863d62353482" invocation-id="bc08a736-13f7-e97b-a272"> <title>This is the title</title></trace:focus>
The position of the current
focus in that list
23
For example Given a single instantiation of this rule:
<xsl:template match="heading"> <title> <xsl:value-of select="."/> </title></xsl:template>
We could represent the “focus” like this:
<trace:focus context-id="n1a833c7b3a005151" context-position="1" context-size="1" rule-id="nfa49863d62353482" invocation-id="bc08a736-13f7-e97b-a272"> <title>This is the title</title></trace:focus>
generated ID of the node being processed (a <heading> element)
Cosmic address (generated ID) of the node being
processed(a <heading>
element)
24
For example Given a single instantiation of this rule:
<xsl:template match="heading"> <title> <xsl:value-of select="."/> </title></xsl:template>
We could represent the “focus” like this:
<trace:focus context-id="n1a833c7b3a005151" context-position="1" context-size="1" rule-id="nfa49863d62353482" invocation-id="bc08a736-13f7-e97b-a272"> <title>This is the title</title></trace:focus>
generated ID of the node being processed (a <heading> element)
Cosmic address (generated ID)of the matching template rule
in the stylesheet
25
For example Given a single instantiation of this rule:
<xsl:template match="heading"> <title> <xsl:value-of select="."/> </title></xsl:template>
We could represent the “focus” like this:
<trace:focus context-id="n1a833c7b3a005151" context-position="1" context-size="1" rule-id="nfa49863d62353482" invocation-id="bc08a736-13f7-e97b-a272"> <title>This is the title</title></trace:focus>
The shallow result chunk that this focus creates
26
For example Given a single instantiation of this rule:
<xsl:template match="heading"> <title> <xsl:value-of select="."/> </title></xsl:template>
We could represent the “focus” like this:
<trace:focus context-id="n1a833c7b3a005151" context-position="1" context-size="1" rule-id="nfa49863d62353482" invocation-id="bc08a736-13f7-e97b-a272"> <title>This is the title</title></trace:focus>
The shallow result chunk that this focus creates
27
Why it’s called a “shallow” result chunk It doesn’t include the results of nested invocations:
<trace:focus context-id="n292122aca28a94ca" context-position="1" context-size="1" rule-id="n8ef8c66cac078ff" invocation-id="4dd7580b-d44e-a0e4-714d"> <html> <head> <trace:invocation id="bc08a736-13f7-e97b-a272"/>
</head> <body> <trace:invocation id="6129c19f-2632-e99c-53b5"/>
</body> </html></trace:focus>
28
Why it’s called a “shallow” result chunk It doesn’t include the results of nested invocations:
<trace:focus context-id="n292122aca28a94ca" context-position="1" context-size="1" rule-id="n8ef8c66cac078ff" invocation-id="4dd7580b-d44e-a0e4-714d"> <html> <head> <trace:invocation id="bc08a736-13f7-e97b-a272"/> <!--invocation of: <xsl:apply-templates select="/doc/heading"/>--> </head> <body> <trace:invocation id="6129c19f-2632-e99c-53b5"/> <!--invocation of: <xsl:apply-templates select="/doc/para"/>--> </body> </html></trace:focus>
29
Why it’s called a “shallow” result chunk It doesn’t include the results of nested invocations:
<trace:focus context-id="n292122aca28a94ca" context-position="1" context-size="1" rule-id="n8ef8c66cac078ff" invocation-id="4dd7580b-d44e-a0e4-714d"> <html> <head> <trace:invocation id="bc08a736-13f7-e97b-a272"/> <!--invocation of: <xsl:apply-templates select="/doc/heading"/>--> </head> <body> <trace:invocation id="6129c19f-2632-e99c-53b5"/> <!--invocation of: <xsl:apply-templates select="/doc/para"/>--> </body> </html></trace:focus>
Links to zero or more other foci
having the given invocation ID
30
Mystical principle: To see is to create Inner seeing results in outer manifestation
− They are not separate; they are two sides of the same coin.
To focus on the input is to create the output− (as defined by the template rule)
They are inextricably linked, 1-to-1− As defined here, the focus and the result chunk are intrinsic to
each other.
31
Mystical principle: All is accessible Each focus is connected to every other focus via
successive invocation-id links Every entity (node, rule, or focus) has a “cosmic address”
(GUID or generated node ID)− Unique, globally accessible address within the world of the
transformation
You can begin anywhere in the world and traverse to anywhere else in the world, including past, present, future
Nothing is lost through the passage of time
32
Mystical principle: Inner upholds outer The result tree is not only isomorphic to the tree of foci
(the XSLT execution tree) It is an adornment of that tree
− The result is the part we see− The inner structure upholds the outer result
The visible world is a decoration of the invisible− a manifestation of a deeper, unseen reality
33
Mystical principle: There is no separation Distinctions are only by definition and thus arbitrary A focus and result chunk are “intrinsic” to each other only
because we are looking at it that way A focus is “extrinsic” to another focus only because we
defined it that way Ultimately, there is only an unbroken whole But:
− Division and distinction are functions we possess− There is joy in the division and joy in the reunion− That’s why we do it
Case Study: Enrichment Tracer
35
Bringing this down to earth: a case study Client project to rewrite an enrichment engine Technical articles are automatically enriched with search
terms based on their content− (using a taxonomy of terms and reverse queries in MarkLogic)
There are various business rules about what types of article should be enriched with what types of terms
The rules need to evolve over time Most importantly, the rule behavior needs to be
transparent to the analysts and taxonomy managers
36
XSLT pipeline for enrichment rules For the rewritten engine’s enrichment rules, I decided to
use XSLT as the rules language (surprise, surprise!)− declarative and powerful− simplified by using a multi-stage pipeline− good in conjunction with the modified identity transform
The article passes through unchanged, except that new metadata (“enrichment”) is inserted into each section
Example stages:− get-terms− add-inherited− remove-subsumed
− remove-excluded− add-more-terms− filter-terms
37
Distinction between “rules” and “engine” Entire enrichment engine consists of XSLT template rules
− (and imported XQuery libraries)
But not all template rules are created equal Some modes are “engine-level”
− not meant to be regularly customized
Some modes are “rules-level”− direct implementations of the business rules− meant to be customized to handle different content scenarios
The latter are the ones that need to be made transparent
38
Making the business rules transparent Modes are grouped by pipeline stage Each mode has a default behavior
− e.g. mode="terms" by default uses a reverse query to find all matching terms
− can be overridden to provide a different behavior• (such as to fix the term for a particular article type, regardless of the content)
The default rules don’t need to be shown− Only the custom overrides
Enrichment Tracer demo
40
Trace-enabling the XSLT Used XSLT to pre-process (“trace-enable”) the original
engine and rules XSLT The trace-enabled stylesheet additionally generates:
− inline trace data• <trace:match-start> and <trace:match-end> markers in the result tree
− out-of-band trace data• using xdmp:set() in MarkLogic
The “rule-level” modes are annotated in the original engine code
− so the trace-enabler knows which template rules to augment− [show example, line 538 of engine.xsl]− [also show trace data example in oXygen]
41
Other techniques used Documenting the rules inline, so they get fed straight to
the tracer interface Storing the trace data for each input document into the
(MarkLogic) database for faster subsequent renders Automatically invalidating (and thus forcing re-generation
of) the cached trace data whenever a rule or data change is detected
Generalizing to arbitrary XSLT
43
Characteristics to preserve Colored lines to depict relationships Visually represent XML as XML Interactive slider for building/un-building the result
44
Difficulties How to visualize temporary trees?
− i.e. results stored in <xsl:variable>
Modifying template rule results could break the stylesheet− type errors− node count dependencies, etc.
45
Solution: Store trace data out-of-band Via side effects
− Write a document to a database, or− Update a global variable
Benefits:− avoid type errors− avoid unexpected (altered) stylesheet behavior− shredded trace data may suggest more scalable visualization
implementations• (e.g. for large source documents)
Demo of early results
47
Links Online demo here:
− http://xmlportfolio.com/xslt-visualizer-demo/
Code on GitHub:− http://github.com/evanlenz/xslt-visualizer