Agile Analysis 101: Agile Stats v Command & Control Maths
Agile Analysis 101
Part 1: Introducing Basic Analysis
Analysis for Dummies
• Most Agile Teams
– Can’t Identify Influential Delivery Factors Plus…
– …Over-reliance on Cycle-Time & Throughput
– Equals Shooting in the Dark!
• Little’s Law Applies Only When ‘Predictable’
• ‘Fixed’ Mathematics Doesn’t Adequately Facilitate Self-Organisation
– Morphogenesis & Chaos
• Too hard for most
• We don’t know enough (yet)
• Enterprise Mathematical Models too Hard or Based on Unrealistic Assumptions
– e.g. Efficient Market Hypothesis
• Requires rational investors
• What can Agilists Do?
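For reference, Little's Law relates the long-run averages of a stable system: average work in progress = average throughput x average cycle time. A minimal sketch follows; the function name and the numbers are hypothetical, and the result only means something while the team's flow is stable ('predictable').

```python
def average_cycle_time(avg_wip, avg_throughput):
    """Little's Law rearranged: cycle time = WIP / throughput.

    Assumes a stable system; both inputs are long-run averages.
    """
    if avg_throughput <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / avg_throughput


# Hypothetical team: 12 tickets in progress, 3 tickets finished per day.
days = average_cycle_time(avg_wip=12, avg_throughput=3)
print(f"Average cycle time: {days} days")
```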
Analysis Forms
Controlling Maths v Agile Stats
Traditional Mathematical Analysis
• Modelled the environment in its entirety
– Every variable identified and mapped
– Every factor had to be understood in detail
– …and managed
• Fit command-and-control really well!
• Provided an exact answer
– Useful comfort blanket
• Exclusivity: very few people understood it
– Needed Masters & PhDs in numerate subjects
– MBAs not always enough
– Mathematics, Physics, Operations Research, Engineering…
Area of a Circle (Traditional Way)
• Given origin (h,k) & radius r
• Typically learned for GCSE
• Have to know:
– The equation
– r is a factor & how to get it
– What ‘squared’ means
– Pi is a constant
– Know maths
• What if you didn’t?
Source: Google Images
Statistical Analysis
• Doesn’t require an exact model
• Doesn’t produce an exact answer
– Do you need one? Can you rely on one?
• Isn’t variable/factor centric
– Though they may come out
• Looks for correlations
– Which tell you where else to look for more
– CAREFUL! Correlations aren’t causations!
– If you find a link, it doesn’t necessarily mean it’s so
• Can be refined, akin to ‘learning’
– Increasing the number of samples in a known range
– …akin to reducing Kanban batch size or story size
– Can also use Bayesian Inference
• Fits Lean-Agility really well
• A 10-year-old can often do it!
Area of a Circle (Statistical Way)
• Grid around the Circle
• Count Squares at least half inside the circle
• Need more accuracy? Easy! Use a finer grid!
• Typically learned at 10 years old!
Question
Take a look at the examples on the right. Which grid is closer to the Actual Area?
8 x 8 Grid of 1cm Squares
Diameter = 8 x 1cm squares = 8cm
Radius = half the diameter, i.e. 8/2 = 4cm
Area is the number of squares at least half inside the circle.
52 squares: 52 x (1 x 1) = 52cm²
20 x 20 Grid of 0.4cm Squares
Diameter = 20 x 0.4cm squares = 8cm
Radius = half the diameter, i.e. 8/2 = 4cm
Area is the number of squares at least half inside the circle.
312 squares: 312 x (0.4 x 0.4) = 49.92cm²
Actual Area
When r = 4
Area = Pi x (4 x 4) = 50.27cm²
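The grid count can be sketched in a few lines. This is only an illustration under one stated assumption: a square is treated as 'at least half inside' when its centre lies inside the circle, which is a simple stand-in for the hand count, so totals on finer grids may differ slightly from the slide. The function name is hypothetical.

```python
import math


def grid_area(radius, cells_per_side):
    """Estimate a circle's area by counting grid squares.

    Assumption: a square counts when its centre is inside the circle,
    used here as a proxy for 'at least half inside'.
    """
    cell = 2 * radius / cells_per_side  # side length of one square
    count = 0
    for i in range(cells_per_side):
        for j in range(cells_per_side):
            # Centre of square (i, j), with the circle centred on the origin.
            cx = -radius + (i + 0.5) * cell
            cy = -radius + (j + 0.5) * cell
            if cx * cx + cy * cy <= radius * radius:
                count += 1
    return count, count * cell * cell


actual = math.pi * 4 ** 2
for cells in (8, 20):
    squares, area = grid_area(radius=4, cells_per_side=cells)
    print(f"{cells} x {cells}: {squares} squares, area {area:.2f}, error {abs(area - actual):.2f}")
```

The finer grid lands closer to Pi x r², which is the point of the question above: accuracy is bought by shrinking the squares, not by knowing the formula.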
Image Source: Google Images
Compare to Kanban
• Backlog the Tickets
• Batch together related epic tickets
• If you need more accuracy, make the batches smaller!
– …and/or sprints shorter
• Statistical form is standard in Monte Carlo Algorithms
– Always Fast to run…
– …But ‘probably’ correct
• In any case, accurate to a particular range
• If that range is good enough, use it!
Technical Note!
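As a rough illustration of the Monte Carlo form, the same circle area can be estimated by throwing random points at the bounding square and counting how many land inside. The function name, sample count and seed below are arbitrary assumptions; the answer is 'probably' correct to within a range that narrows as samples increase.

```python
import math
import random


def monte_carlo_circle_area(radius, samples, seed=0):
    """Estimate a circle's area from uniformly random points in its bounding square."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(samples):
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            hits += 1
    square_area = (2 * radius) ** 2
    return square_area * hits / samples


estimate = monte_carlo_circle_area(radius=4, samples=100_000)
print(f"Estimate {estimate:.2f} vs actual {math.pi * 16:.2f}")
```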
What is Good Enough?
Guide to a Nebulous Term
Definition of Good Enough?
Definitions
What I tell Managers: “Any measure with an accuracy matching your ability to change is good enough.”
What I tell Techies: “Sampling twice as frequently as the change is good enough.”
- Ethar Alali
• Any more accurate/frequent is waste
• Any less and you can’t make decisions
– So a risk mitigation strategy may be necessary
Example: CD Quality Sound
In ye olden days we had these: the Compact Disc
• 44.1kHz sample rate
• Stereo Sound
• 16-bit Digital Sampling
• CD stores 650MB
Image Source: Google Images
Example: Compact Disc Encoding
Focusing on Useful Data Storage
Ignore Reed-Solomon error correction & detection
• A signed 16-bit number can segment audio into 65,536 parts
• 44.1kHz means it takes one 16-bit number in this range every 1/44,100th of a second
• Stereo sound means two sets of microphones and hence 2 sample channels
Total storage needs for a 3 minute song:
• 44,100 samples per second x 2 bytes per sample x 2 channels x 3 minutes x 60 seconds = 31.752MB raw per song
• Album = 20 songs = 635MB of digital data, which fills a 650MB CD
Great for music :-)
Attribution: Image Courtesy of Grahammitchell.com
What About: Telephone Voice on CD?
• Voice on telephones is mono, not stereo
– Needs only one channel!
• Telephone-quality voice changes pitch at 3kHz at worst!
• Voice doesn’t have the refined nature of music!
– Hence it can be recorded in 8-bit (256 parts)
• 3kHz means it takes one 8-bit number in this range every 1/3,000th of a second
Total storage needs for a 3 minute conversation:
• 3,000 samples per second x 1 byte per sample x 1 channel x 3 minutes x 60 seconds = 540KB raw
• Album = 20 conversations = 10.8MB of digital data
Stored on a 650MB CD, you have almost 640MB of WASTE!
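The storage arithmetic for both cases can be checked with a short sketch. The function and variable names are hypothetical; the figures are the ones quoted on the slides, with 1MB taken as 1,000,000 bytes.

```python
def raw_bytes(sample_rate_hz, bytes_per_sample, channels, seconds):
    """Uncompressed storage for a sampled signal, ignoring error correction."""
    return sample_rate_hz * bytes_per_sample * channels * seconds


CD_CAPACITY_MB = 650
TRACK_SECONDS = 3 * 60
TRACKS_PER_ALBUM = 20

cd_song = raw_bytes(44_100, 2, 2, TRACK_SECONDS)   # 16-bit stereo music
phone_call = raw_bytes(3_000, 1, 1, TRACK_SECONDS)  # 8-bit mono voice

music_album_mb = TRACKS_PER_ALBUM * cd_song / 1_000_000
voice_album_mb = TRACKS_PER_ALBUM * phone_call / 1_000_000
waste_mb = CD_CAPACITY_MB - voice_album_mb

print(f"Music album: {music_album_mb:.2f}MB, voice album: {voice_album_mb:.1f}MB")
print(f"Unused space with voice on a CD: {waste_mb:.1f}MB")
```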
Attribution: Image Courtesy of Grahammitchell.com
What If: We Sampled Less?
• Not an Accurate Picture!
Note: The dashed red line is an edge case, which samples exactly at transition points. In real scenarios this never happens with sound, since change isn’t periodic.
RED = 2/3 as fast sampling
AMBER = Twice as frequent sampling
GREEN = 4 times as frequent sampling
Which is Closer to Actual?
RED = 2/3 as fast sampling
AMBER = Twice as frequent sampling
GREEN = 4 times as frequent sampling
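A toy sketch of the three sampling rates, under assumptions that are not in the slides: the 'change' is modelled as a two-state signal that flips every 12 ticks, and sampling starts 1 tick late so it never lands exactly on a transition (the dashed edge case). All names and numbers are hypothetical.

```python
def count_observed_flips(sample_every, flip_every=12, duration=1200, offset=1):
    """Count state changes seen between consecutive samples of a two-state signal."""
    seen = 0
    previous = None
    for t in range(offset, duration, sample_every):
        state = (t // flip_every) % 2  # signal flips every `flip_every` ticks
        if previous is not None and state != previous:
            seen += 1
        previous = state
    return seen


true_flips = count_observed_flips(sample_every=1, offset=0)  # sample every tick
red = count_observed_flips(sample_every=18)    # 2/3 as fast as the change
amber = count_observed_flips(sample_every=6)   # twice as frequent as the change
green = count_observed_flips(sample_every=3)   # 4 times as frequent

print(f"True changes: {true_flips}")
print(f"RED sees {red}, AMBER sees {amber}, GREEN sees {green}")
```

In this model, sampling more slowly than the change misses transitions, twice as fast catches them all, and four times as fast catches nothing extra, which is the 'any more is waste' argument in miniature.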
Traditional Samples in Business
• Annual Accounts
– PLCs have mid-term or quarterly accounts
– If they want to be more agile, make it monthly
• Regulatory Reporting
• Charity Commission Reports
• Franchise Brand Inspections
– Once every 2-3 years, inspected annually
• FCA
• …
Identify: Easy! Usually associated with an ‘Audit’ of some kind.
• Self-governing/managing teams sample themselves!
Correlation != Causation
What they are, How to find them and What they mean
Causation
• One thing occurs as a deterministic consequence of something else
– Fingers in a high-voltage socket cause death
• Link a number of causes to establish behaviour
• Needs Two Factors
– Functional process, including all variables
– Initial condition (aka Pre-condition)
• ‘Given’ in Gherkin syntax
• Great for Forecasting…
– As long as the causal chain always happens
• Near useless in chaotic environments
– Depending on when you look at it
• Initial condition may not be known
• Sensitive dependence + Feedback injects uncertainty!
• Code runs deterministically, teams normally work chaotically…
• …until they reach predictability, then Little’s Law can apply
Example: Causation
• y = 2 + x <- function/process
• x = 3 <- Initial [pre]condition
• y = 5 <- Final outcome/post-condition
• Post-condition = acceptance test criteria
– ‘Then’ in Gherkin Syntax
• Really easy for code! Mostly predictable
– Fits Gherkin, OCL, VDM, Z etc. perfectly
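The example above maps directly onto a Given/When/Then check. A minimal sketch, with hypothetical function names:

```python
def process(x):
    """The function/process from the example: y = 2 + x."""
    return 2 + x


def check_causation_example():
    x = 3           # Given: the initial [pre]condition
    y = process(x)  # When: the deterministic process runs
    assert y == 5   # Then: the post-condition / acceptance criterion
    return y


print(check_causation_example())
```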
Correlation
• Aims to find [statistical] links between samples
– When causal links are not known or samples appear ‘random’
– Also shows strength of relationship
• First step in Factor Analysis
– Locate influential factors for dependent variables
• Cycle-time
• Throughput
• Value delivered
• Can be plotted on a graph
• Needs Manipulation to Fit Gherkin :(
• All aim to locate where to sniff next!
Correlations Can Be Seen
• Correlations can be modelled with Linear Regression
• Seen when an increase in one variable increases/decreases another
Source: Image from Utah.edu Mesowest weather
Source: Scatterplot Image from knottwiki teaching
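A minimal sketch of measuring a correlation and fitting a straight line through it, using the standard Pearson coefficient and ordinary least squares. The data points are invented purely for illustration, and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient: strength of the linear link, from -1 to 1."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares regression line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Invented sample: as x increases, y increases too.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
slope, intercept = least_squares_line(xs, ys)
print(f"r = {r:.3f}, y = {slope:.2f}x + {intercept:.2f}")
```

A high r only says the two variables move together. It tells you where to sniff next, not what causes what.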
Example: Burnage Library
• Correlation Matrix
Example: Burnage Library
• Manchester City Council claim: Library closure based on 11 variables for deprivation
– Tasked with saving £80 million a year
• Correlation matrix showed strong correlations between Population of Library catchment area &:
– Total Library Visitors: larger catchments correlate with more library visitors
– Active users: larger catchments correlate with more active users
– Participation in Events
– …
• But all factors correlated with each other!
Dependent v Independent Correlation
Very high correlations of the dependent combined score & other allegedly independent factors with catchment population
Independent Variable Inter-correlation
Leads to Q: How come they are so highly correlated?
A: High inter-correlation between independent variables!
Correlation: Deprivation
Q: Was deprivation a factor?
A: Deprivation wasn’t a significant consideration, despite the claims of the Council
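Inter-correlation of this kind can be spotted by building a correlation matrix over every pair of variables. The sketch below uses invented numbers, not the Council's data, and the column names and the 0.8 threshold are assumptions chosen only to show the mechanics.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise correlations for a dict of {name: samples}."""
    return {a: {b: pearson_r(columns[a], columns[b]) for b in columns} for a in columns}


# Invented figures for five hypothetical libraries (not real data).
columns = {
    "catchment_population": [10, 20, 30, 40, 50],
    "total_visitors": [12, 19, 33, 41, 48],
    "active_users": [5, 11, 14, 21, 26],
    "deprivation_index": [3, 1, 4, 1, 5],
}

matrix = correlation_matrix(columns)
THRESHOLD = 0.8  # assumed cut-off for 'highly correlated'
for a, b in combinations(columns, 2):
    flag = "inter-correlated" if abs(matrix[a][b]) > THRESHOLD else ""
    print(f"{a} v {b}: {matrix[a][b]:+.2f} {flag}")
```

When supposedly independent variables all track one underlying factor, as the first three invented columns do here, weighting them separately counts that factor several times over.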
Example: Burnage Library Conclusion
• Basics showed that claims weren't supported
– Could have done better with Null Hypothesis
• Interdependence of allegedly independent variables meant weighting of catchment area was 5x more important than deprivation
– Not likely based on deprivation index, as was claimed
– Potentially hinting at a political decision
• Controversial ;)
NEXT TIME: Agile Teams
• In Part 2, we examine how this applies to teams.
• In summary:
– Gather Cycle-time, Throughput & Value delivered across a few sprints
– Match & Correlate Respective
• Bugs
• Blockers
• Days of week
• Team size
• Story
• Anything else you already have data for
• Don’t
– Make too many inferences early on
Thanks for Viewing
Further Reading
• Business Planning Example, Monte Carlo Simulation Tutorial in Excel: http://www.solver.com/monte-carlo-simulation-example
• “Statistics in Psychosocial Research, Lecture 8: Factor Analysis I”, Johns Hopkins University: http://ocw.jhsph.edu/courses/statisticspsychosocialresearch/pdfs/lecture8.pdf
• “Correlation & Dependence”, Wikipedia: http://en.wikipedia.org/wiki/Correlation_and_dependence
Ethar Alali @EtharUK @Dynacognetics
Managing Director & Chief Architect
Polymath-MathMo. Programming since 9 years old. TOGAF 9 Certified, change agent.
Blog: GoadingtheITGeek.blogspot.co.uk
Specialist ICT Strategists & Advisors.
Member of HiveMind Network for some of the biggest household and corporate multi-nationals.
Accredited Growth Voucher Advisors certified to deliver IT & Web Growth Consultancy as part of the government’s Growth Voucher Scheme.
About Us
Accreditations & Associations