Simon Murphy simon.murphy@codematic Developer – Codematic Ltd
description
Transcript of Simon Murphy simon.murphy@codematic Developer – Codematic Ltd
Management Paper - Comparison of Spreadsheets with other development
tools (limitations, solutions, workarounds and alternatives)
Simon Murphy
Developer – Codematic Ltd
Spreadsheet background• 5-30 Mb size• 20-200,000 formulas• 1-10,000 unique formulas• 5-10,000 lines of VBA• £1M to Billions in values• Often linked to other technologies such as
OLAP, ADO, COM or .net etc.• Finance, Banking and Sales and Marketing
areas• Growth rate 500 pa pa (12k data items ph)
Agenda• 2 Apologies – Excel v Spreadsheet, Critical• Motivation• Definitions• Brief Summary Of Main Points• In Depth Analysis And Demos
– Select by location– Worksheet & VBA insecurity– Formula complexity
• Culture• Summary• Conclusion• Any Questions
Motivation (For writing the paper)
• Spreadsheets seem more difficult to test effectively than databases and source code.
• Most mainstream/formally trained developers shun spreadsheets as a development tool.
• Most spreadsheet developers only work in spreadsheets, rarely databases, or procedural languages.
• Spreadsheets seem to be stuck on ‘Garbage in Garbage Out’ whilst mainstream development has moved to the much more robust ‘No Garbage In’ approach.
• Many observers recommend extra tools, methodologies or training to improve spreadsheet quality, they miss the key point – People use spreadsheets because of their flexibility not in spite of it.
• Is it realistic to comply with the spirit of Sarbanes-Oxley with spreadsheets?
• I believe spreadsheets should have a much more limited role in important information systems.
Definitions• Spreadsheet – powerful and flexible, single end user,
analysis and presentation tool, optimised for speed of initial development.
• Spreadsheet Paradigm – normal reference based formulas, excludes lookups and pivots etc.
• Systems Development lifecycle – Requirements, Analysis, Design,<Technology Choice>, Construction, Test, Release, Maintain. In some shape or form.
• Spreadsheet Development lifecycle – “Oh! I need a model” – clickety-click, reasonableness check, release, (Test & Maintain in live environment).
Brief Summary
1. Inherently fragile2. Not type safe3. Only have global scope4. Lack of data/logic separation5. Insecure6. Don’t scale well7. Limited development tools8. VBA generally poorly written9. Often ad-hoc10. External links can be dangerous
• Spreadsheets are great for many jobs, but their flexibility make them ...
1. Focus on smaller models2. Clear layout, data validation3. Use blocks with spaces4. Layout for understanding5. Use compiled language6. Use in requirements phase7. 3rd Party tools from web8. Take training/coaching9. Sketch design on paper first10. Use auditable import routine
Issue Possible workaround
External link example from a commercial model
• 1 workbook linked to 34 other workbooks, – 20 of which were found – 14 missing (one was
‘book1’ (i.e. not saved)) – over 100 links found in
total. • Excel generally does not
know or warn about circular references through external links.
• Links lock directory structure
• Note: Approx 1.4 Million data items
In Depth and Demos
• Select by Location v Select by Value
• Worksheet and VBA Insecurity
• Formula complexity
In Depth and Demo 1
Select by location not value
Original paper spreadsheet had only one mode of access, by the human user, reading and writing the text and numbers.
In Depth and Demo 1
Select by location not value
Electronic spreadsheets have two modes of access1.The human user, reading and typing the text and numbers, as before. 2.The spreadsheet itself, calculating formulas based on cell address (location).
This disconnect is great for flexibility but makes spreadsheets very fragile.
In Depth and Demo 1
Database approach
In a database the user issues queries based on the values in the actual data, the database uses these same values to provide results. There is only one access mode, there is no disconnect.
SELECT * FROM PL WHERE LineItem = ‘Sales’
In Depth and Demo 1
Select by location not value
Demo 1: Which worksheet is correct
Demo 2: The database approach
In Depth and Demo 2
Worksheet and VBA InsecurityKey point 1: Security is measured in time and effort.
Eg. Safes – TXTL-60X6 – Torch, explosive and tool resistant for 60 minutes (all 6 walls)In software – User skill, access and specialist tools are considerations
(Personal experience: Relying on spreadsheet security can make bad things happen.)
Demo 1: Break sheet protection faster than setting it.
Demo 2: Cracking VBA protection for fun.
Key point 2: Spreadsheet security is not weak or poor. If anything it is probably too strong (for a single end user tool).
In Depth and Demo 3Formula Complexity (1)
How many people are intermediate or better in spreadsheets?
How many people have any experience of C#?
In Depth and Demo 3Formula Complexity (2)
Who can tell us what this (array) formula does?
{=INDEX(Circulation!CU1:CU175,
MAX(ROW(Circulation!CU71:CU175)
*(Circulation!CU71:CU175<>0)))}
In Depth and Demo 3Formula Complexity (3)
Who can tell us what this custom worksheet formula does?
=LastNonZeroValueFromVerticalList(
Circulation!CU71:CU175)
In Depth and Demo 3Formula Complexity (4)
Note: Although it is possible to write worksheet functions in .net, c++ xlls are still the standard for performance reasons.
Note: function defined once, used many times in workbook
In Depth and Demo 3Formula Complexity (4)
Performance (57,000 Calcs, 10 times)Version Time secs % of Excel
Excel 27 100%VBA 29 107%C++ 13 48%C# 260 963%(Uses XLL+ from Planatech)
Optional Relative complexitySpreadsheets v mainstream
Glass suggests that for each 25% increase in problem complexity the complexity of the required solution doubles.
Spreadsheet
Database/ ProceduralLang
Glass 1:4 ratio
Effort required to learn data theory and/or structured
programming
Problem Complexity
So
luti
on
Co
mp
lexi
ty
Relative Complexity - Spreadsheets v Mainstream tools
Culture• If Spreadsheets have such clear limits, why are
they so regularly abused?• Must be commercial pressure, preference for speed over
accuracy. Considering current skills, time and cost to build, spreadsheets are a great tool. Considering Total Cost of Ownership they aren’t.
• If the research is right about error rates why no large scale business collapse?
• Spreadsheet errors must not be material (or not mission critical at least). Probably spreadsheets are only a part of the decision support system.
Summary• Although spreadsheets have many uses, they
also have some features that make them inappropriate for certain types of development. We looked at 3 in depth:– Inherent weakness of the ‘Select by location’
approach compared to ‘Select by value’– No real security to protect model integrity or
intellectual property.– Rather cryptic native syntax (even experienced
developers sometimes have difficulty)• Spreadsheet robustness and reliability can be
increased by use of complimentary technologies like .net and databases.
Conclusion• The ‘Select by Location’ approach is a
fundamental cause of spreadsheet complexity and fragility.
• Spreadsheets are a superb tool with many uses, mainly at the smaller and shorter-term end of system developments
• Be aware of their limitations
• You have a choice of tools – choose wisely.
Questions?
• [email protected]– Spreadsheet consulting, reviewing,
maintaining, rescuing, migrating, add-in development etc.
• Websites– www.codematic.net– www.xlanalyst.co.uk