Analyzing differences between W1 and GDLs using tree alignment
Morten Rhiger (The IT-University of Copenhagen)
Outline
1. The problem (the upgrade problem: migrating partner customizations from version N to version N+1)
2. Our solution (daisy-chaining procedures, à la AOP)
3. Other solutions (repositories with versioning, software merging, …)
4. Validating our solution (measuring the number of good customizations using a tree diff)
5. Numbers…
NAV lifecycle
[Figure: boxes for W1, DE, GB, BE, and DK version 5.0 and for W1 and DK version 2009. Labels: “Partners customize”, “Microsoft evolves”, “Time”.]
The NAV upgrade problem
• There are no language features for controlling customization in NAV's C/AL
• Customizations are (destructive) source-code modifications
• There is no versioning in NAV
  – Some (clever) partners maintain repositories of their edits
  – www.mergetool.com
  – On the other hand, few (VAR) partners are IT professionals
• Consequently, partners face a serious problem when migrating their old customizations to the new version
  – Migration takes up to 30% of the effort required to implement the first derived version
Our solution
• Distinguish between
  – the location of a customization in the original version (a customization point), and
  – the modification a customization performs
• (Reminiscent of AOP)
NAV lifecycle
[Figure: boxes for W1 version 5.0, DK version 5.0, W1 version 2009, and DK version 2009, with three annotations:]
• Modifying customizations, but leaving customization points unchanged.
• Moving around customization points.
• Plugging old customizations into (possibly moved) customization points (trivial).
Customization points where?
W1 version 5.0:
    PROCEDURE UpdateBalance();
    BEGIN
      GenJnlManagement.CalcBalance(…);
    END

DE version 5.0:
    PROCEDURE UpdateBalance();
    BEGIN
      GenJnlManagement.CalcBalance(…);
      TotalPayAmount := 0;
      TempGenJourLine.COPY(Rec);
    END

With the customization marked as a customization point:
    PROCEDURE UpdateBalance();
    BEGIN
      GenJnlManagement.CalcBalance(…);
      <<customization point>>
    END

W1 version 2009:
    PROCEDURE UpdateBalance();
    BEGIN
      <<customization point>>
      GenJnlManagement.CalcBalance(…);
    END
Legal?
Customization points where?
• When is it legal to move a customization point? Where can it be moved to? …
• Procedure calls are useful customization points (we hypothesize):
  – If a procedure call can be moved, so can customizations at that point
• We probably still need something more fine-grained (we also hypothesize)
Daisy-chaining procedures
• Daisy-chaining procedures and triggers (a proposal due to Lars)
• Reminiscent of aspect-oriented programming
• A property (Trigger) on a procedure or trigger controls what is (also) invoked when that procedure is called
Daisy-chaining procedures
• Existing procedure and trigger property:
    [Trigger(“*”)] PROCEDURE Foo(…) = …
• Adding code to execute at the end of Foo:
    PROCEDURE FooMorten() = …
• The “*” says that after Foo is invoked, all procedures with prefix Foo should also be invoked (in some unspecified order)
  – Resolved “late”
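The prefix rule can be tried out outside of C/AL. The following is a minimal Python sketch, not the proposed implementation: the class ProcedureTable and its methods are hypothetical names, followers run in sorted order only because the proposal leaves the order open, and followers run their bodies without applying their own Trigger property.

```python
from typing import Callable, Dict, List, Optional


class ProcedureTable:
    """Hypothetical registry standing in for the procedures of one NAV object."""

    def __init__(self) -> None:
        self._bodies: Dict[str, Callable[[], None]] = {}
        self._triggers: Dict[str, Optional[str]] = {}

    def define(self, name: str, body: Callable[[], None],
               trigger: Optional[str] = None) -> None:
        self._bodies[name] = body
        self._triggers[name] = trigger

    def invoke(self, name: str) -> None:
        self._bodies[name]()  # the original body always runs first
        if self._triggers[name] == "*":
            # Resolved "late": followers are looked up at call time.
            # Sorting is an assumption; the proposal leaves the order open.
            for other in sorted(self._bodies):
                if other != name and other.startswith(name):
                    self._bodies[other]()


log: List[str] = []
table = ProcedureTable()
table.define("Foo", lambda: log.append("Foo"), trigger="*")
table.invoke("Foo")   # no followers defined yet
table.define("FooMorten", lambda: log.append("FooMorten"))
table.invoke("Foo")   # FooMorten now runs after Foo
print(log)
```

Because the lookup happens inside invoke, a procedure added after Foo was defined is still picked up, which is one way to read “resolved late”.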
Customization points where?
W1 version 5.0:
    [Trigger(“*”)] PROCEDURE UpdateBalance();
    BEGIN
      GenJnlManagement.CalcBalance(…);
    END

DE version 5.0:
    PROCEDURE UpdateBalanceDE();
    BEGIN
      TotalPayAmount := 0;
      TempGenJourLine.COPY(Rec);
    END

W1 version 2009:
    [Trigger(“*”)] PROCEDURE UpdateBalance();
    BEGIN
      …
      GenJnlManagement.CalcBalance(…);
      …
    END

• The late partner (here the GDL) has decided that the new code should be invoked whenever UpdateBalance is invoked, after the original.
• The early partner (here Microsoft) is free to modify the body of the procedure.
• There is an understanding that the calls to UpdateBalance are the relevant customization point for the German customization.
Other Trigger properties
• Daisy chaining:
    [Trigger(“*”)] PROCEDURE Foo() = …
    [Trigger(“*”)] PROCEDURE FooMorten() = …
    PROCEDURE FooMortenMore() = …
Invoking Foo also invokes FooMorten and FooMortenMore (in that order).
Other Trigger properties
• “Hijacking” (or replacing) a procedure:
    [Trigger(“Other”)] PROCEDURE Foo() = …
    PROCEDURE Other() = …
A call to Foo discards the body of Foo and executes Other instead.
Other Trigger properties
• Dynamic dispatch:
    [Trigger(“=Dispatch”)] PROCEDURE Foo() = …
    PROCEDURE Dispatch() = …
A call to Foo invokes Dispatch to produce the string controlling the trigger. For example,
    PROCEDURE Dispatch() = RETURN “*”; or
    PROCEDURE Dispatch() = RETURN “Other”; or even
    PROCEDURE Dispatch() = RETURN “=NewDispath”;
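The three forms of the Trigger string can be read as a small interpreter. The Python sketch below shows one possible reading under stated assumptions: the function resolve, the dispatcher names First and Second, and the hop limit are all hypothetical, and followers of “*” are returned in sorted order because the slides do not fix one.

```python
from typing import Callable, Dict, List


def resolve(trigger: str, name: str, names: List[str],
            dispatchers: Dict[str, Callable[[], str]],
            max_hops: int = 10) -> List[str]:
    """Return the procedures to run, in order, when `name` is called.

    Assumed reading of the trigger string:
      "*"      run `name`, then every other procedure whose name starts with `name`
      "=Proc"  call Proc to obtain a new trigger string and resolve that
      "Proc"   run Proc instead of `name` (hijacking)
    """
    hops = 0
    while trigger.startswith("="):
        if hops == max_hops:  # guard against dispatchers that loop forever
            raise RuntimeError("dispatch chain too long")
        trigger = dispatchers[trigger[1:]]()
        hops += 1
    if trigger == "*":
        followers = sorted(n for n in names if n != name and n.startswith(name))
        return [name] + followers
    return [trigger]


names = ["Foo", "FooMorten", "Other"]
dispatchers = {
    "Dispatch": lambda: "Other",
    "First": lambda: "=Second",   # a dispatcher may name another dispatcher
    "Second": lambda: "*",
}
print(resolve("*", "Foo", names, dispatchers))
print(resolve("Other", "Foo", names, dispatchers))
print(resolve("=Dispatch", "Foo", names, dispatchers))
print(resolve("=First", "Foo", names, dispatchers))
```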
Evaluating the proposal
• Benefits
  – Little or no new C/AL syntax required
  – A class of existing customizations can be handled without modifying the corresponding W1
• Drawbacks
  – Probably not flexible enough
  – Editing experience messed up
More fine-grained customizations
• Inserting new customization points (procedure calls) in W1:

    PROCEDURE Foo() =          PROCEDURE Foo() =
      A();                       A();
      B();                       <<customization>>
                                 B();
    -------------------------------------------------
    PROCEDURE Foo() =          PROCEDURE InFooMorten() =
      A();                       <<customization>>
      InFoo();
      B();
    [Trigger(“*”)] PROCEDURE InFoo();

• The need for a customization point must be passed back through the chain of developers
Goals …
• … to measure how well existing customizations fit the model
  – … to test-drive our analysis tool (currently)
    • A tree-diff engine discovering tree alignments
    • Test data:
      – W1 5.0 SP1, and
      – 39 GDLs of the same version: DK 5.0 SP1, …
  – … to make the tool available for other analyses (long term)
A “diff” for C/AL source code
Sequence-based diff for source code?
• Traditional sequence-based diff (e.g., UNIX diff) does not take program structure into account
• Valid for software merging (e.g., UNIX diff3)
• Not appropriate for identifying whole-statement modifications:

    IF X = 0 THEN BEGIN        IF X = 0 THEN BEGIN
      Foo();                     Foo();
      Bar();                   END ELSE BEGIN   // add
    END;                         Bar();
                               END
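The effect is easy to reproduce with any line-based differ. The sketch below uses Python's standard difflib as a stand-in for UNIX diff (an assumption; the two tools use different matching algorithms) and writes both versions with a trailing "END;" so that only the structural change differs.

```python
from difflib import SequenceMatcher

old = ["IF X = 0 THEN BEGIN", "  Foo();", "  Bar();", "END;"]
new = ["IF X = 0 THEN BEGIN", "  Foo();", "END ELSE BEGIN", "  Bar();", "END;"]

# Compare the two versions line by line and keep only the non-equal spans.
opcodes = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
changes = [(tag, old[i1:i2], new[j1:j2])
           for tag, i1, i2, j1, j2 in opcodes if tag != "equal"]
print(changes)
```

The line-based result is a single inserted line, which says nothing about Bar() having moved from the THEN branch to a new ELSE branch.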
Tree-based diff?
• Yes, but what is a tree-based diff?
  – Preserve depth?
  – Allow general movements?
  – Allow re-ordering siblings?
  – …
• We propose a tree alignment for ordered trees [Jiang,Wang,Zhang CPM’94] as an appropriate way to identify customizations.
Tree alignment
• A tree alignment A of two trees T, U is a tree whose nodes are pairs of the form

    (t, u)         (t, -)           (-, u)
    (copy node)    (delete node)    (insert node)

  where t, u are nodes from T, U, and that satisfies an erasure property: discarding the second components and removing “-” nodes and their paths gives the original T, and (vice versa) discarding the first components gives the original U.
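The erasure property can be checked mechanically. The Python sketch below is illustrative only: the types Node and Pair, the function erase, and the example labels are hypothetical, and removing a “-” node is read here as splicing its children into its parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class Pair:
    left: Optional[str]    # node of T, or None for "-"
    right: Optional[str]   # node of U, or None for "-"
    children: List["Pair"] = field(default_factory=list)


def erase(pairs: List[Pair], keep_left: bool) -> List[Node]:
    """Project an alignment forest onto one side, splicing out "-" nodes."""
    forest: List[Node] = []
    for pair in pairs:
        label = pair.left if keep_left else pair.right
        kept_children = erase(pair.children, keep_left)
        if label is None:
            forest.extend(kept_children)   # children move up to the parent
        else:
            forest.append(Node(label, kept_children))
    return forest


# T = IF[Foo, Bar] and U = IF[Foo, ELSE[Bar]]: ELSE is an insert node.
alignment = [Pair("IF", "IF", [
    Pair("Foo", "Foo"),
    Pair(None, "ELSE", [Pair("Bar", "Bar")]),
])]
T = erase(alignment, keep_left=True)
U = erase(alignment, keep_left=False)
print(T)
print(U)
```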
Tree alignment
• A tree alignment for ordered trees
  – … does not preserve depth,
  – … does not allow re-ordering of siblings,
  – … does not allow general movements of subtrees
• From the alignment, an edit script (similar to the output of UNIX diff) can be generated
• Interactive examples…
Tree alignment algorithm
• Dynamic programming for sequence-based diff:

             a   b   a   c   c
         0   1   2   3   4   5
     a   1   0   1   2   3   4
     c   2   1   1   2   2   3
     b   3   2   1   2   3   3
     a   4   3   2   1   2   3
     a   5   4   3   2   2   3

  Copy: c → c+u    Delete: c → c+d    Add: c → c+a
  Minimum cost = edit script

• Dynamic programming for tree alignment
  – More “complicated”
  – More complex: O(|T|×|U|×(deg(T)+deg(U))²) time complexity
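The table for the two sequences can be recomputed with the textbook recurrence. The Python sketch below assumes unit costs for delete, add, and update and zero cost for copying an equal element; the slide does not state its costs, but its numbers agree with this choice. The function name edit_distance_table is hypothetical.

```python
from typing import List, Sequence


def edit_distance_table(old: Sequence[str], new: Sequence[str],
                        delete: int = 1, add: int = 1,
                        update: int = 1) -> List[List[int]]:
    """Fill the dynamic-programming table row by row (rows: old, columns: new)."""
    rows, cols = len(old) + 1, len(new) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = table[i - 1][0] + delete
    for j in range(1, cols):
        table[0][j] = table[0][j - 1] + add
    for i in range(1, rows):
        for j in range(1, cols):
            step = 0 if old[i - 1] == new[j - 1] else update
            table[i][j] = min(table[i - 1][j] + delete,    # delete old[i-1]
                              table[i][j - 1] + add,       # add new[j-1]
                              table[i - 1][j - 1] + step)  # copy or update
    return table


table = edit_distance_table("acbaa", "abacc")
for row in table:
    print(row)
```

The bottom-right cell holds the minimum cost; following the choices that produced it back to the top-left cell yields an edit script.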
Quantitative analyses of single versions (W1, DK, DE, …)
Sizes of code pieces
• Code piece = procedure or trigger
• Code pieces are uniquely identified by a code path, e.g.,
  – Table/317/FIELDS/0/OnValidate
  – Codeunit/530/CODE/ValidateEnumVal
  – Form/31/CONTROLS/4/Menu/MENUITEMS/2/OnPush
• Code size measured in AST nodes (≈ number of statements)
W1 5.0 SP1 code pieces
W1 5.0 SP1 code amount
W1 5.0 SP1 numbers
• 39,946 code pieces:
  – 45% have 3 statements or fewer,
  – another 30% have 4-10 statements,
  – yet another 16% have 11-30 statements.
• 357,713 statements:
  – 8% are in code with 3 statements or fewer,
  – another 17% are in code with 4-10 statements,
  – yet another 18% are in code with 11-20 statements.
• Roughly the same numbers for GDLs.
• The complexity of the tree alignment algorithm is under control (a W1-GDL diff takes 14-18 minutes on my laptop)
W1 5.0 SP1 numbers: details
• Four code pieces have more than 1,000 statements:
  – Codeunit 80 “Sales-Post” PROPERTIES/OnRun (1462 statements, nontrivial)
  – Codeunit 90 “Purch.-Post” PROPERTIES/OnRun (1492 statements, nontrivial)
  – Report 83 “Change Global Dimensions” CODE/ChangeGlobalDim (1751, trivial code duplications)
  – Codeunit 406 “Setup Checklist Management” CODE/TransferContents (2033, trivial code duplications)
Quantitative analyses of differences between versions
Amount of customization
Amount of customization: details
• Much variance:
  – 91 very mild customizations in IS 5.0 SP1
  – 2,593 customizations in TH 5.0 SP1
• Some agreement, too:
  – 2,593 customizations in all of APAC, ID, MY, PH, SG, and TH
  – Same for {GB, IE}, {NA-US, NA-USCA, NA-USCAMX}, and {DE, AT}
  – Not a coincidence: These versions differ only in language
  – (But gives a “proof of concept”)
Customization point usage (hotspots)
[Figure: chart of customization point usage, ranging from cold spots to hotspots; labeled values 4005, 2053, and 1125. Annotation: “Probably false positives due to hotfix”.]
Customization point usage (hotspots, details)
• Many cold customization points are used by only one (4000), two (2000), or three (1000) GDLs.
• A nontrivial number of customization points (42) are used by all GDLs!
  – (Consistent renamings of, e.g., “.name” to “.Name”)
  – Probably a hotfix not captured in the repository
  – (But gives a “proof of concept”)
Hot objects
[Figure: chart of objects, ranging from cold objects to hot objects.]
Hot objects
    Object                Number of GDLs customizing object
    Codeunit/2            30
    Codeunit/11           31
    Codeunit/80           31
    Table/39              32
    Table/37              33
    Table/38              33
    Table/36              36
    Codeunit/12           37
    Table/81              37
    Codeunit/1            39
    Codeunit/424          39
    Codeunit/5054         39
    Codeunit/5300         39
    Codeunit/7152         39
    Codeunit/99008517     39
    Report/99008512       39
Classes of customization by version
False positives (due to measuring)
Example customizations
Example modifications: modification that should be avoided!
• Codeunit/80/CODE/FillInvPostingBuffer
  – W1 5.0 SP1:
      InvPostingBuffer[1]."Line Discount Amount" := "Line Discount Amount";
      InvPostingBuffer[1]."Inv. Discount Amount" := "Inv. Discount Amount";
  – TH 5.0 SP1:
      InvPostingBuffer[1]."Inv. Discount Amount" := "Inv. Discount Amount";
      InvPostingBuffer[1]."Line Discount Amount" := "Line Discount Amount";
Example modifications: modification that could be avoided
• Table/4/CODE/InitRoundingPrecision
  – W1 5.0 SP1:
      "Unit-Amount Rounding Precision" := 0.00001
  – ES 5.0 SP1:
      "Unit-Amount Rounding Precision" := 0.000001
Example modifications
• Table/14/FIELDS/5703/OnValidate
  – W1 5.0 SP1:
      BEGIN
        Postcode.ValidateCity(City, "Post Code");
      END
  – ES 5.0 SP1:
      BEGIN
        Postcode.ValidateCity(City, "Post Code", County);
      END
  – APAC 5.0 SP1:
      BEGIN
        PostCodeCheck.ValidateCity(CurrFieldNo, DATABASE::Location, Rec.GETPOSITION, 0, Name, "Name 2", Contact, Address, "Address 2", City, "Post Code", County, "Country/Region Code");
      END
• Candidate for hijacking
Example modifications
• Codeunit 99000889/CODE/SetSalesHeader
  – W1 5.0 SP1:
      REPEAT SalesLine.NEXT = 0
      BEGIN
        "Entry No." := SalesLine."Line No."
        TransferFromSalesLine(SalesLine)
        SalesLine.CALCFIELDS("Reserved Qty. (Base)")
        ...
      END
  – APAC 5.0 SP1:
      REPEAT SalesLine.NEXT = 0
      BEGIN
        IF SalesLine."Build Kit" THEN
          TransferFromKitSalesLine(SalesLine,OrderPromisingLine)
        ELSE BEGIN
          "Entry No." := SalesLine."Line No."
          TransferFromSalesLine(SalesLine)
          "Source Sub Line No." := 0
          SalesLine.CALCFIELDS("Reserved Qty. (Base)")
          ...
        END
      END
Using tree alignment for C/AL source code
Identifying code modifications
• Operations that can be applied to the old document:
  – Here (and elsewhere): delete(L1), add(L2), update(L1, L2), copy(L1)
• These operations have costs
• An edit script is a sequence of operations transforming the old document into the new.
• An optimal edit script is one with least cost
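For a flat sequence of statements, the four operations and the notion of script cost can be sketched as follows. This Python fragment is an illustration, not the tool's representation: the tuple encoding, the function names, the example statements, and the cost values are all assumptions.

```python
from typing import Dict, List, Tuple

# ("copy", L1), ("delete", L1), ("add", L2), or ("update", L1, L2)
Op = Tuple[str, ...]


def apply_script(old: List[str], script: List[Op]) -> List[str]:
    """Replay an edit script against the old document, producing the new one."""
    new: List[str] = []
    pos = 0
    for op in script:
        kind = op[0]
        if kind == "add":
            new.append(op[1])
            continue
        # copy, delete, and update each consume the next old element.
        if pos >= len(old) or old[pos] != op[1]:
            raise ValueError(f"{kind} expects {op[1]!r} at position {pos}")
        pos += 1
        if kind == "copy":
            new.append(op[1])
        elif kind == "update":
            new.append(op[2])
        elif kind != "delete":
            raise ValueError(f"unknown operation {kind!r}")
    if pos != len(old):
        raise ValueError("script does not consume the whole old document")
    return new


def script_cost(script: List[Op], costs: Dict[str, int]) -> int:
    return sum(costs[op[0]] for op in script)


old = ["Foo();", "Bar();"]
script: List[Op] = [("copy", "Foo();"), ("add", "Baz();"),
                    ("update", "Bar();", "Bar(1);")]
new = apply_script(old, script)
cost = script_cost(script, {"copy": 0, "delete": 1, "add": 1, "update": 1})
print(new, cost)
```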
Finding optimal edit scripts
• In revision control systems,
  – for merging code: UNIX diff (sequence based)
• In bioinformatics,
  – for globally aligning protein sequences (sequence based),
  – for comparing RNA secondary structure (tree based)
• In (semi-)structured data models,
  – for comparing XML documents, etc.
Assigning costs to edit operations
• Costs for deleting an old node, adding a new node, and updating an old node with the label of a new.
• Costs with the right properties give rise to a distance between two trees (in a certain metric space):
    D(x,y) ≥ 0
    D(x,y) = 0, only if x = y
    D(x,y) = D(y,x)
    D(x,z) ≤ D(x,y) + D(y,z)
Which edit costs give “best” edit scripts?
• High costs for updates, low costs for adds and deletes:
  – Pro: Doesn’t equate unrelated statements
  – Con: Fails to detect actual updates
• Low costs for updates, high costs for adds and deletes:
  – Pro: Detects actual updates
  – Con: Equates unrelated statements
High costs for updates
• codeunit 73/properties/OnRun from W1-5.0 SP1:
    ...
    PurchHeader.TESTFIELD(Status,PurchHeader.Status::Open);
    FromBOMComp.SETRANGE("Parent Item No.","No.");
    NoOfBOMComp := FromBOMComp.COUNT;
    IF NoOfBOMComp = 0 THEN
      ERROR(Text001, "No.");
    Selection := STRMENU(Text005,2);
    ...
• codeunit 73/properties/OnRun from TH-5.0 SP1:
    ...
    PurchHeader.TESTFIELD(Status,PurchHeader.Status::Open);
    Item.GET("No.");
    IF Item."Kit BOM No." = '' THEN
      ERROR(Text001, "No.");
    KitManagement.GetKitProdBOM(...);
    IF NoOfBOMComp = 0 THEN
      ERROR(Text001, "No.");
    Selection := STRMENU(Text005,2);
    ...
• Aha! A smart way to achieve this. Thus, updating the IF should have low cost.
• Hmm… no. These lines replaced these. Thus, updating the IF should have high cost.
Low costs for updates
• codeunit 73/properties/OnRun from W1-5.0 SP1:
    REPEAT
      ToPurchLine.INIT;
      NextLineNo := NextLineNo + LineSpacing;
      ToPurchLine."Line No." := NextLineNo;
      CASE FromBOMComp.Type OF
        FromBOMComp.Type::" ":
          ToPurchLine.Type := ToPurchLine.Type::" ";
        FromBOMComp.Type::Item:
          ...
• codeunit 73/properties/OnRun from TH-5.0 SP1:
    REPEAT
      ToPurchLine.INIT;
      NextLineNo := NextLineNo + LineSpacing;
      ToPurchLine."Line No." := NextLineNo;
      CASE TempProdBOMLine.Type OF
        TempProdBOMLine.Type::" ":
          ToPurchLine.Type := ToPurchLine.Type::" ";
        TempProdBOMLine.Type::Item:
          ...
Which edit costs give “best” edit scripts?
• Ideally, we would require
update(L1, L2) < delete(L1) + add(L2)
while still taking the content into account. (For example, updating an IF to a WHILE should have a very high cost.)
Which edit costs give “best” edit scripts?
• Currently,
    delete(L1)     = |L1| / 2
    add(L2)        = |L2| / 2
    update(L1, L2) = 2 × sift3(L1, L2, 5),  if L1.kind = L2.kind
                   = |L1| + |L2| - 1,       otherwise
• Somewhat arbitrary. (Is it a metric?) What works in one case might not work in another...
• sift3 is a linear-time approximate string distance “algorithm.” (A true sequence alignment would be a better alternative.)
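The current cost functions can be written down directly. The Python sketch below rests on several assumptions: |L| is taken to be the length of a label's text, the Label type with kind and text fields is hypothetical, and sift3 is transcribed from the commonly circulated version of that heuristic, which may differ in detail from the one used in the tool.

```python
from dataclasses import dataclass


def sift3(s1: str, s2: str, max_offset: int = 5) -> float:
    """Approximate string distance; a transcription of the usual sift3 heuristic."""
    if not s1:
        return float(len(s2))
    if not s2:
        return float(len(s1))
    c = offset1 = offset2 = lcs = 0
    while c + offset1 < len(s1) and c + offset2 < len(s2):
        if s1[c + offset1] == s2[c + offset2]:
            lcs += 1
        else:
            # Look ahead a bounded distance for a place to resynchronize.
            offset1 = offset2 = 0
            for i in range(max_offset):
                if c + i < len(s1) and s1[c + i] == s2[c]:
                    offset1 = i
                    break
                if c + i < len(s2) and s1[c] == s2[c + i]:
                    offset2 = i
                    break
        c += 1
    return (len(s1) + len(s2)) / 2 - lcs


@dataclass(frozen=True)
class Label:
    kind: str   # hypothetical: the AST node kind, e.g. "IF" or "WHILE"
    text: str   # hypothetical: the node's source text

    def __len__(self) -> int:
        return len(self.text)


def delete_cost(l1: Label) -> float:
    return len(l1) / 2


def add_cost(l2: Label) -> float:
    return len(l2) / 2


def update_cost(l1: Label, l2: Label) -> float:
    if l1.kind == l2.kind:
        return 2 * sift3(l1.text, l2.text, 5)
    return len(l1) + len(l2) - 1


if_stmt = Label("IF", "IF X = 0")
while_stmt = Label("WHILE", "WHILE X = 0")
print(update_cost(if_stmt, if_stmt))
print(update_cost(if_stmt, while_stmt), delete_cost(if_stmt) + add_cost(while_stmt))
```

With these definitions, updating an IF to a WHILE costs more than deleting one and adding the other, so the alignment would prefer a delete plus an add for nodes of different kinds.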
Which edit costs give “best” edit scripts?
• Example dump of codeunit 73/properties/OnRun for W1 5.0 SP1 vs. TH 5.0 SP1
Conclusions
Future work
• (Bugfixes…)
• Run on W1 version N versus W1 version N+1
• Run on partner-customized versions
• Input annotated programs, to facilitate more precise program analyses of diffs
• Save diffs to a database, to ease querying(?)