High-performance sheet-defined functions in Excel - Peter Sestoft at Sems 2014



www.itu.dk 1

High-performance sheet-defined functions in spreadsheets

Peter Sestoft

IT University of Copenhagen

SEMS 2014-07-02

With thanks to Thomas S Iversen, Daniel Cortes, Morten Hansen, Poul Serek, Morten Poulsen, Hui Xu, Mainul Liton, Poul Brønnum, Tim Garbos, Kasper Videbæk, Jens Hamann, Jonas Druedahl Rask, Simon Eikeland Timmermann


The trouble with functions
•  One cannot define functions in a spreadsheet
•  To define new functions, “experts” use VBA
•  Often very poorly; witness the newsgroup microsoft.public.excel.programming
•  Many (Excel) built-in functions are bad:
   –  Week numbers: two kinds, but not the ISO standard
•  Possible answers to this mess:
   –  “People should not use spreadsheets”
   –  “Only computer scientists should define functions”
   –  “All necessary functions should be built in”
   –  Or: functions within the spreadsheet metaphor (Nuñez 2000, Peyton-Jones et al. 2003)
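To make the week-number complaint concrete, here is a small Python sketch (function names are hypothetical). It contrasts a numbering in which week 1 is the week containing January 1, with weeks starting on Sunday or on Monday, which we assume is what the two non-ISO kinds have in common, with ISO 8601 numbering, where weeks start on Monday and belong to the year that contains their Thursday. The two disagree around New Year, for example on Friday 1 January 2021.

```python
from datetime import date, timedelta
from typing import Tuple

def simple_week(d: date, week_starts_monday: bool = False) -> int:
    """Week 1 is the week containing January 1; weeks start on Sunday
    (default) or Monday. An assumed model of the non-ISO convention."""
    jan1 = date(d.year, 1, 1)
    if week_starts_monday:
        offset = jan1.weekday()            # Monday = 0 ... Sunday = 6
    else:
        offset = (jan1.weekday() + 1) % 7  # Sunday = 0 ... Saturday = 6
    day_of_year = d.timetuple().tm_yday    # 1 for January 1
    return (day_of_year - 1 + offset) // 7 + 1

def iso_week(d: date) -> Tuple[int, int]:
    """ISO 8601 (year, week): weeks start on Monday, and a week belongs
    to the year that contains its Thursday."""
    thursday = d + timedelta(days=3 - d.weekday())
    iso_year = thursday.year
    week = (thursday - date(iso_year, 1, 1)).days // 7 + 1
    return iso_year, week

d = date(2021, 1, 1)  # a Friday
print(simple_week(d), simple_week(d, week_starts_monday=True), iso_week(d))
# 1 1 (2020, 53)
```

The ISO rule needs no special cases: locating the Thursday of the week decides both the ISO year and the week number.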


Problem example: Area of triangles
•  The area of a triangle with sides a, b, c is SQRT(s(s-a)(s-b)(s-c)), where s = (a+b+c)/2
•  Either (1) compute s in column D: an annoying intermediate result
•  Or (2) try to inline s in the area formula: horrible and error-prone
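For illustration only, here are the two options written as Python functions: Heron's formula with the semi-perimeter s kept as a named intermediate, and the same computation with s inlined by hand.

```python
import math

def triarea(a: float, b: float, c: float) -> float:
    """Heron's formula with s as a named intermediate result."""
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

def triarea_inlined(a: float, b: float, c: float) -> float:
    """The same formula with s written out four times; each copy
    is another chance to mistype a cell reference."""
    return math.sqrt((a + b + c) / 2 * ((a + b + c) / 2 - a)
                     * ((a + b + c) / 2 - b) * ((a + b + c) / 2 - c))

print(triarea(3, 4, 5))  # 6.0
```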


A solution: Sheet-defined function TRIAREA

[Figure: a function sheet defining TRIAREA, with its input cells and output cell marked, next to an ordinary sheet that calls it]
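One way to picture a function sheet is as a set of input cells, intermediate formula cells and one output cell, turned into an ordinary callable. The Python sketch below is a hypothetical model: the helper make_sheet_function and the cell layout (A1:A3, B1, B2) are our own choices, not Funcalc's. Calling the function binds the input cells to the arguments and evaluates the output cell, computing other cells on demand.

```python
import math
from typing import Callable, Dict, List

# Hypothetical model: a formula cell is a function of a cell-lookup function.
Lookup = Callable[[str], float]
Formula = Callable[[Lookup], float]

def make_sheet_function(inputs: List[str], output: str,
                        formulas: Dict[str, Formula]) -> Callable[..., float]:
    """Turn a function sheet (input cells, formula cells, one output
    cell) into an ordinary callable."""
    def call(*args: float) -> float:
        if len(args) != len(inputs):
            raise TypeError(f"expected {len(inputs)} arguments, got {len(args)}")
        values: Dict[str, float] = dict(zip(inputs, args))  # bind input cells

        def cell(name: str) -> float:
            if name not in values:  # evaluate a formula cell at most once per call
                values[name] = formulas[name](cell)
            return values[name]

        return cell(output)
    return call

# TRIAREA with a hypothetical layout: inputs A1:A3, s in B1, area in B2.
TRIAREA = make_sheet_function(
    inputs=["A1", "A2", "A3"],
    output="B2",
    formulas={
        "B1": lambda c: (c("A1") + c("A2") + c("A3")) / 2,
        "B2": lambda c: math.sqrt(c("B1") * (c("B1") - c("A1"))
                                  * (c("B1") - c("A2")) * (c("B1") - c("A3"))),
    },
)

print(TRIAREA(3, 4, 5))  # 6.0
```

The intermediate cell B1 plays the role of the "annoying" column D, but it now lives inside the function and is invisible to callers.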


How to use sheet-defined functions
•  Assumptions
   –  End-users understand spreadsheet models
   –  End-users do not understand VBA, C#, VB.NET, …
•  Sheet-defined functions in the organization
   –  Models are developed in ordinary spreadsheets
   –  After a while, functions are factored out of models
   –  Functions can be further developed interactively
   –  An organization can develop and share libraries
   –  Without preventing further evolution by users
•  Works only if
   –  Sheet-defined functions are fast enough


Dual implementation
•  Ordinary sheets: interpretive evaluation
   –  Frequently edited, rarely evaluated (at “recalculation”)
•  Function sheets: compiled evaluation
   –  Rarely edited, frequently evaluated (at function calls)
   –  Run-time code generation permits interactive editing
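The difference between the two strategies can be sketched in Python (the AST classes are hypothetical, not Funcalc's). An interpreter walks the formula tree on every evaluation; a compiler walks it once and returns code that is then called repeatedly. Here that "code" is a tree of Python closures, a much weaker stand-in for run-time generated .NET bytecode.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union

@dataclass(frozen=True)
class Num:
    value: float

@dataclass(frozen=True)
class Ref:
    cell: str

@dataclass(frozen=True)
class Call:
    op: str
    args: Tuple["Expr", ...]

Expr = Union[Num, Ref, Call]
Env = Dict[str, float]

OPS: Dict[str, Callable[..., float]] = {
    "+": lambda x, y: x + y,
    "*": lambda x, y: x * y,
    "SQRT": math.sqrt,
}

def interpret(e: Expr, env: Env) -> float:
    """Interpretive evaluation: dispatch on node types at every evaluation."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Ref):
        return env[e.cell]
    return OPS[e.op](*(interpret(a, env) for a in e.args))

def compile_expr(e: Expr) -> Callable[[Env], float]:
    """Compiled evaluation: dispatch on node types once, build closures."""
    if isinstance(e, Num):
        v = e.value
        return lambda env: v
    if isinstance(e, Ref):
        name = e.cell
        return lambda env: env[name]
    f = OPS[e.op]
    parts = [compile_expr(a) for a in e.args]
    return lambda env: f(*(p(env) for p in parts))

# =SQRT(A1*A1+A2*A2)
formula = Call("SQRT", (Call("+", (Call("*", (Ref("A1"), Ref("A1"))),
                                   Call("*", (Ref("A2"), Ref("A2"))))),))
env = {"A1": 3.0, "A2": 4.0}
compiled = compile_expr(formula)
print(interpret(formula, env), compiled(env))  # 5.0 5.0
```

Compilation costs an extra pass whenever the formula changes, which is why it pays off for rarely edited, frequently called function sheets.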


Runtime code generation

Spreadsheet formula:

    =SQRT(a1*a1+a2*a2)

My compiler translates the formula to .NET bytecode:

    ldloc 2
    ldloc 2
    mul
    ldloc 3
    ldloc 3
    mul
    add
    call Math.Sqrt

The JIT compiler then translates the bytecode to x86 machine code:

    fldl 0xfffffff0(%ebp)
    fldl 0xfffffff0(%ebp)
    fmulp %st,%st(1)
    fldl 0xffffffe8(%ebp)
    fldl 0xffffffe8(%ebp)
    fmulp %st,%st(1)
    faddp %st,%st(1)
    fsqrt

Result: a very fast, portable spreadsheet implementation
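The first step, from formula to stack-machine code, amounts to a postorder walk of the formula tree: emit code for the operands, then for the operator. The Python sketch below imitates it under stated assumptions: formulas are nested tuples, the mnemonics only mimic CIL (nothing real is emitted), and cells a1 and a2 are assumed to live in local variables 2 and 3 to match the bytecode above. A small stack machine runs the result.

```python
import math
import operator
from typing import Dict, List, Tuple

# Formulas as nested tuples (a hypothetical representation):
#   ("ref", name) | ("num", x) | ("SQRT", arg) | (op, left, right)
Instr = Tuple[str, ...]
BINOPS = {"+": "add", "-": "sub", "*": "mul", "/": "div"}
EXEC = {"add": operator.add, "sub": operator.sub,
        "mul": operator.mul, "div": operator.truediv}

def compile_to_stack(e: tuple, slots: Dict[str, int]) -> List[Instr]:
    """Postorder walk: code for the operands, then the operator."""
    kind = e[0]
    if kind == "ref":
        return [("ldloc", str(slots[e[1]]))]
    if kind == "num":
        return [("ldc.r8", repr(float(e[1])))]
    if kind == "SQRT":
        return compile_to_stack(e[1], slots) + [("call", "Math.Sqrt")]
    return (compile_to_stack(e[1], slots) + compile_to_stack(e[2], slots)
            + [(BINOPS[kind],)])

def run(code: List[Instr], local_vars: List[float]) -> float:
    """Execute the instructions on an evaluation stack."""
    stack: List[float] = []
    for ins in code:
        op = ins[0]
        if op == "ldloc":
            stack.append(local_vars[int(ins[1])])
        elif op == "ldc.r8":
            stack.append(float(ins[1]))
        elif op == "call" and ins[1] == "Math.Sqrt":
            stack.append(math.sqrt(stack.pop()))
        elif op in EXEC:
            y = stack.pop()  # the right operand is on top
            x = stack.pop()
            stack.append(EXEC[op](x, y))
        else:
            raise ValueError(f"unknown instruction {ins}")
    return stack.pop()

formula = ("SQRT", ("+", ("*", ("ref", "a1"), ("ref", "a1")),
                         ("*", ("ref", "a2"), ("ref", "a2"))))
code = compile_to_stack(formula, {"a1": 2, "a2": 3})
listing = " ".join(" ".join(ins) for ins in code)
print(listing)  # ldloc 2 ldloc 2 mul ldloc 3 ldloc 3 mul add call Math.Sqrt
print(run(code, [0.0, 0.0, 3.0, 4.0]))  # 5.0
```

In the real pipeline the second step, from bytecode to x86, is done by the .NET JIT compiler rather than by an interpreter like run.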


New book (next month)
•  Spreadsheet Implementation Technology: Basics and Extensions, MIT Press, August 2014

[Cover: Peter Sestoft, Spreadsheet Implementation Technology: Basics and Extensions, The MIT Press, Cambridge, Massachusetts and London, England; draft version 0.99.5 of 2014-05-10]

•  A standard spreadsheet implementation
•  Sheet-defined functions
•  Examples
•  Design choices
•  Scalability and speed
•  Implementation details
•  Funcalc user manual


Example function: NORMDISTCDF

•  Cumulative distribution function of the standard normal distribution N(0,1)
•  As accurate as Excel’s built-in NORMSDIST(z), and faster

[Figure: the NORMDISTCDF function sheet, with its input cell and output cell marked]
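The constants visible in the bytecode listing for NORMDISTCDF (37, 7.071, 220.206867912376, 221.213596169931, …) appear to match Hart's double-precision rational approximation to the normal CDF, as popularized by Graeme West. Assuming that is the algorithm, a Python transcription looks like this; the function name is our own, and it can be checked against the exact identity Φ(z) = erfc(−z/√2)/2.

```python
import math

def normdistcdf(z: float) -> float:
    """Standard normal CDF via Hart's rational approximation (assumed,
    not confirmed, to be what NORMDISTCDF computes)."""
    x = abs(z)
    if x > 37.0:
        tail = 0.0  # the tail probability underflows to zero
    else:
        e = math.exp(-x * x / 2)
        if x < 7.07106781186547:
            # Ratio of two polynomials in x, evaluated by Horner's rule
            num = 3.52624965998911e-02 * x + 0.700383064443688
            num = num * x + 6.37396220353165
            num = num * x + 33.912866078383
            num = num * x + 112.079291497871
            num = num * x + 221.213596169931
            num = num * x + 220.206867912376
            den = 8.83883476483184e-02 * x + 1.75566716318264
            den = den * x + 16.064177579207
            den = den * x + 86.7807322029461
            den = den * x + 296.564248779674
            den = den * x + 637.333633378831
            den = den * x + 793.826512519948
            den = den * x + 440.413735824752
            tail = e * num / den
        else:
            # Continued fraction for the far tail
            cf = x + 0.65
            cf = x + 4 / cf
            cf = x + 3 / cf
            cf = x + 2 / cf
            cf = x + 1 / cf
            tail = e / cf / 2.506628274631
    # tail = P(Z <= -|z|); use symmetry for positive z
    return 1.0 - tail if z > 0 else tail

print(normdistcdf(0.0))  # 0.5
```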


NORMDISTCDF generated code

•  Approximately 118 ns/call on a 2.66 GHz Intel Core 2
•  For comparison: VBA 1760 ns; Excel built-in 1140 ns; C# 64 ns; C 54 ns

    0000 ldarg V_0                0068 ldloc.0                    0198 div
    0004 call ValueToDoubleOrNan  0069 call Double.IsInfinity     0199 add
    0009 stloc.s V_6              006e brtrue IL_01a0             019a div
    000b ldloc.s V_6              0073 ldloc.0                    019b br IL_01a1
    000d call Math.Abs            0074 call Double.IsNaN          01a0 ldloc.0
    0012 stloc.3                  0079 brtrue IL_01a0             01a1 br IL_01a7
    0013 ldc.r8 -1                007e ldloc.0                    01a6 ldloc.0
    001c ldloc.3                  007f ldc.r8 7.071               01a7 stloc.s V_5
    001d mul                      0088 bge IL_0144                01a9 ldloc.s V_6
    001e ldloc.3                  008d ldloc.s V_4                01ab stloc.0
    001f mul                      008f ldc.r8 220.206867912376    01ac ldloc.0
    0020 ldc.r8 2                 0098 ldloc.3                    01ad call Double.IsInfinity
    0029 div                      0099 ldc.r8 221.213596169931    01b2 brtrue IL_01f3
    002a call Math.Exp            00a2 ldloc.3                    01b7 ldloc.0
    002f stloc.s V_4              00a3 ldc.r8 112.07929149787     01b8 call Double.IsNaN
    0031 ldloc.3                  00ac ldloc.3                    01bd brtrue IL_01f3
    0032 stloc.0                  00ad ldc.r8 33.912866078383     01c2 ldloc.0
    0033 ldloc.0                  00b6 ldloc.3                    01c3 ldc.r8 0
    0034 call Double.IsInfinity   00b7 ldc.r8 6.37396220353165    01cc bge IL_01dd
    0039 brtrue IL_01a6           00c0 ldloc.3                    01d1 ldloc.s V_5
    003e ldloc.0                  00c1 ldc.r8 0.700383064443688   01d3 call NumberValue.Make
    003f call Double.IsNaN        00ca ldloc.3                    01d8 br IL_01ee
    0044 brtrue IL_01a6           00cb ldc.r8 0.035262496599891   01dd ldc.r8 1
    0049 ldloc.0                  00d4 mul                        01e6 ldloc.s V_5
    004a ldc.r8 37                00d5 add                        01e8 sub
    0053 ble IL_0066              00d6 mul                        01e9 call NumberValue.Make
    0058 ldc.r8 0                 00d7 add                        01ee br IL_01f9
    0061 br IL_01a1               00d8 mul                        01f3 ldloc.0
    0066 ldloc.3                  00d9 add                        01f4 call NumberValue.Make
    0067 stloc.0                  ... 61 lines left out ...       01f9 ret

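Annotations on the listing: a single unwrapping of the argument (call ValueToDoubleOrNan at 0004) and a wrapping of the result at each of the three exits (call NumberValue.Make at 01d3, 01e9 and 01f4).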
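The annotations point at a design choice: the argument is unwrapped from a spreadsheet value to a raw double once, the arithmetic works on plain doubles, and only the final result is wrapped again. A hypothetical Python sketch of that shape follows; ErrorValue, square_plus_one and the rule that NaN or infinity becomes #NUM! are our assumptions, not taken from Funcalc.

```python
import math
from dataclasses import dataclass
from typing import Union

# Hypothetical boxed spreadsheet values. NumberValue echoes a name in the
# listing; ErrorValue and the #NUM! rule below are assumptions.
@dataclass(frozen=True)
class NumberValue:
    value: float

@dataclass(frozen=True)
class ErrorValue:
    code: str

Value = Union[NumberValue, ErrorValue]

def value_to_double_or_nan(v: Value) -> float:
    """Unwrap a boxed value to a raw double; anything else becomes NaN."""
    return v.value if isinstance(v, NumberValue) else math.nan

def square_plus_one(arg: Value) -> Value:
    """Shape of the compiled code: unwrap once, compute on raw doubles,
    wrap only the final result."""
    x = value_to_double_or_nan(arg)   # the single unwrapping
    if not math.isfinite(x):          # NaN or infinity: report an error
        return ErrorValue("#NUM!")
    result = x * x + 1.0              # intermediate results are never wrapped
    if not math.isfinite(result):
        return ErrorValue("#NUM!")
    return NumberValue(result)        # wrapping at the exit

print(square_plus_one(NumberValue(3.0)))  # NumberValue(value=10.0)
```

A naive evaluator would instead box every intermediate result (here x * x) and unbox it again for the next operation.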


Examples: Calendrical functions
•  Excel’s calendar functions are poor
   –  Wrong before 1900, no ISO week numbers, cannot easily find the first Monday of a month, Easter, …
•  Easy to implement as sheet-defined functions
•  Example: Easter in a given year (1400 ns/call)
   –  By MSc students Xu and Liton, following Dershowitz & Reingold (Calendrical Calculations, 3rd ed., Cambridge UP)

[Figure: the Easter function sheet, with input cell “year” and output cell “Easter fixdate”]

•  Some other functions:
   –  Fixdate to/from day-month-year
   –  Fixdate to/from ISO week and ISO year
   –  Last/nth Monday (etc.) before a given date
   –  First/nth Monday (etc.) after a given date
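A Python sketch of the Easter computation, following the Dershowitz & Reingold algorithm as we understand it; the function names are our own. It relies on their fixed dates (R.D. 1 = Monday, 1 January of year 1, proleptic Gregorian) coinciding with Python's date.toordinal(), so the day of the week is the fixdate modulo 7, with Sunday = 0.

```python
from datetime import date

SUNDAY = 0  # day of week = fixdate mod 7, so R.D. 0 is a Sunday

def fixed_from_gregorian(year: int, month: int, day: int) -> int:
    return date(year, month, day).toordinal()

def kday_on_or_before(k: int, fixed: int) -> int:
    """Latest fixdate on or before `fixed` that falls on weekday k."""
    return fixed - (fixed - k) % 7

def kday_after(k: int, fixed: int) -> int:
    """Earliest fixdate strictly after `fixed` that falls on weekday k."""
    return kday_on_or_before(k, fixed + 7)

def easter(year: int) -> int:
    """Gregorian Easter Sunday as a fixdate."""
    century = year // 100 + 1
    golden = year % 19  # position in the 19-year lunar cycle
    shifted_epact = (14 + 11 * golden - (3 * century) // 4
                     + (5 + 8 * century) // 25) % 30
    if shifted_epact == 0 or (shifted_epact == 1 and 10 < golden):
        adjusted_epact = shifted_epact + 1
    else:
        adjusted_epact = shifted_epact
    paschal_moon = fixed_from_gregorian(year, 4, 19) - adjusted_epact
    return kday_after(SUNDAY, paschal_moon)

print(date.fromordinal(easter(2014)))  # 2014-04-20
```

The same two helpers also give "first/nth Monday after a given date" by changing k and the starting fixdate.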


Higher-order functions: Sheet-defined functions as values
•  New built-ins to manipulate functions
   –  CLOSURE(“name”, a1, …) evaluates to a closure: a partially applied sheet-defined function
   –  APPLY(f, b1, …) applies a function value
•  Example function “ndie”, a general n-sided die
•  Defining & rolling 6-sided and 20-sided dice

[Figure: the ndie function sheet, with its input cell and output cell marked]
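Python's functools.partial gives a rough analogue of CLOSURE and APPLY. In the sketch below the registry, the define decorator, and the assumption that CLOSURE's arguments fill the leading parameters are all hypothetical; ndie(n) is modelled as a uniformly random integer from 1 to n.

```python
import random
from functools import partial
from typing import Any, Callable, Dict

# Hypothetical registry standing in for the function sheets of a workbook.
SHEET_FUNCTIONS: Dict[str, Callable[..., Any]] = {}

def define(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a Python function under a sheet-function name."""
    def register(f: Callable[..., Any]) -> Callable[..., Any]:
        SHEET_FUNCTIONS[name] = f
        return f
    return register

def closure(name: str, *bound: Any) -> Callable[..., Any]:
    """Like CLOSURE("name", a1, ...): look the function up by name and
    fix its leading arguments, giving a function value."""
    return partial(SHEET_FUNCTIONS[name], *bound)

def apply(f: Callable[..., Any], *args: Any) -> Any:
    """Like APPLY(f, b1, ...): call a function value on further arguments."""
    return f(*args)

@define("ndie")
def ndie(n: int) -> int:
    """A general n-sided die: a uniformly random integer in 1..n."""
    return random.randint(1, n)

d6 = closure("ndie", 6)    # a 6-sided die as a function value
d20 = closure("ndie", 20)  # a 20-sided die
print(apply(d6), apply(d20))
```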


Funsheet: Linking Excel and Funcalc
•  Sheet-defined functions in Excel!
•  MSc thesis by Eikeland and Timmermann, June 2014
•  Via Excel DNA, an Excel-.NET bridge
•  Generated code is as fast as in Funcalc
•  Call speed Excel -> Funcalc suffers from general Excel slowness: 11 µs/call or so
•  Complete Funcalc functionality: DEFINE, CLOSURE, APPLY, SPECIALIZE, BENCHMARK
•  Prototype, so still a number of defects


TO DO: Validation
•  Improve the Excel <-> Funcalc link
•  Demonstrate one application area
•  Fix obvious problems
•  Perform development experiments
•  Perform maintenance experiments
•  ...
•  But experiments are not my area of expertise