Revenue Protection South America · unponte2018.stat.unipd.it/slides/Larcher.pdf
Revenue Protection South America
A machine learning approach to reduce non-technical losses
Mario Namtao Shianti Larcher
Data Competence Center
Global Digital Solutions
Topics
Our journey toward better identification of fraud and malfunctions
Introduction
Current modelling approach
Architectural details
Results
Enel
Who we are
71 million end users around the world
Over 70,000 people in 34 countries
Thermal capacity: 46.6 GW
Renewable capacity: 42.5 GW
Revenue Protection
What do we mean by revenue protection?
In the utility sector, it is very important to carry out targeted field inspections in order to maximize energy recovery.
Revenue Protection is a global project with the goal of identifying fraud and malfunctions using advanced analytics.
Data
We live in a Big Data world
Geography
Consumption
Meter
Contract
Historical Events & Inspections
Modelling Pipeline
From raw data to a score
Feature Engineering
Inject our domain knowledge into the model
Consumption
    Localization of the drop in consumption
    Estimation of the consumption lost
    Consumption statistics (mean, standard deviation, etc.)
Meter
    Ease of tampering
    Malfunction rate
Contract
    Behavior based on the tariff type
    Behavior based on the industry sector code
Geography
    Latitude
    Longitude
    Area hit rate
Historical Events & Inspections
    Suspension history
    Previous inspection results
    Meter / customer changes
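To make two of the listed features concrete, here is a minimal Python sketch. It is not the project's actual code: the function names are hypothetical, and "area hit rate" is assumed here to mean the share of past inspections in an area that found a fraud or malfunction, which the slides do not define.

```python
from statistics import mean, pstdev


def consumption_stats(readings):
    """Summary statistics over one customer's consumption history (e.g. monthly kWh)."""
    return {
        "mean": mean(readings),
        "std": pstdev(readings),  # population standard deviation
        "min": min(readings),
        "max": max(readings),
    }


def area_hit_rate(inspections):
    """Share of past inspections per area that found a fraud or malfunction.

    `inspections` is an iterable of (area_id, found) pairs. This definition
    of "hit rate" is an assumption made for illustration.
    """
    totals, hits = {}, {}
    for area, found in inspections:
        totals[area] = totals.get(area, 0) + 1
        hits[area] = hits.get(area, 0) + (1 if found else 0)
    return {area: hits[area] / totals[area] for area in totals}
```

In practice a feature like this would be computed only from inspections that happened before the scoring date, so that the outcome being predicted does not leak into its own feature.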
Feature Engineering
Example: Localization of the drop in consumption
BiggestDrop: The biggest drop in absolute value
Greater20Drop: Last drop greater than 20%
MinMaxDrop: Minimax algorithm, locate the drop using game theory
[Figure: example plot with labels BiggestDrop, MinMaxDrop, Greater20Drop, PlayerMax, PlayerMin]
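The first two drop locators can be sketched in a few lines of Python. This is an illustrative reading of the one-line definitions, not the authors' implementation: drops are assumed to be measured between consecutive readings, and the function names are hypothetical. MinMaxDrop is left out because the slides do not describe the game it plays.

```python
def biggest_drop(series):
    """Index of the reading that follows the largest absolute decrease.

    Returns None when the series never decreases.
    """
    best_idx, best_drop = None, 0.0
    for i in range(1, len(series)):
        drop = series[i - 1] - series[i]
        if drop > best_drop:
            best_idx, best_drop = i, drop
    return best_idx


def greater20_drop(series, threshold=0.20):
    """Index of the most recent reading that fell by more than `threshold`
    (as a fraction of the previous reading). Returns None if there is none.
    """
    last_idx = None
    for i in range(1, len(series)):
        prev = series[i - 1]
        if prev > 0 and (prev - series[i]) / prev > threshold:
            last_idx = i
    return last_idx


# Made-up monthly consumption with two drops
consumption = [100, 98, 60, 62, 45, 44]
print(biggest_drop(consumption), greater20_drop(consumption))
```

On this made-up series the two locators disagree: the largest absolute drop is the first one (98 to 60), while the last drop above 20% is the second one (62 to 45). Presumably that is why several variants are kept as separate features.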
Probabilistic decomposition
How we break the problem
\[
E(\text{Energy} \mid X) = \underbrace{E(\text{Energy} \mid X, \text{Fraud or Malf})}_{\text{Regression}} \cdot \underbrace{P(\text{Fraud or Malf} \mid X)}_{\text{Classification}}
\]
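As a sketch of how the two factors combine into a single score, the Python below multiplies a classifier output by a regressor output and ranks customers by the product. The product form implicitly assumes that no energy is recovered when there is neither fraud nor malfunction. The customer IDs and numbers are made up, and the function names are hypothetical.

```python
def expected_energy(p_fraud_or_malf, energy_if_fraud_or_malf):
    """E(Energy | X) = E(Energy | X, Fraud or Malf) * P(Fraud or Malf | X)."""
    return energy_if_fraud_or_malf * p_fraud_or_malf


def rank_customers(predictions):
    """Sort customer ids by expected recovered energy, highest first.

    `predictions` maps customer id -> (probability from the classifier,
    energy estimate from the regressor).
    """
    return sorted(
        predictions,
        key=lambda cid: expected_energy(*predictions[cid]),
        reverse=True,
    )


# Made-up model outputs: (P(Fraud or Malf | X), E(Energy | X, Fraud or Malf) in kWh)
predictions = {"A": (0.9, 100.0), "B": (0.2, 1000.0), "C": (0.5, 300.0)}
print(rank_customers(predictions))
```

Customer A is the most likely fraud case but ranks last here, because the energy at stake is small. Ranking by the product rather than by probability alone is what ties the score to energy recovery.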
Gradient boosting
Beyond linear models
“If linear regression was a Fiat 500 [Toyota Camry in the original quote], then Gradient Boosting would be a UH-60 Blackhawk Helicopter” (Ben Gorman)
Left: Example of a partial dependence plot. Many features have a clear non-linear relation with the target.
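For readers unfamiliar with partial dependence, the curve in such a plot can be computed as below: force one feature to each grid value for every row, then average the model's predictions. This is a generic pure-Python sketch. `toy_model` is a hypothetical stand-in for a fitted gradient boosting model, and the feature names are invented.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each value in `grid`.

    `predict` is any callable taking a dict of features; `rows` is a list
    of such dicts (the dataset the averages are taken over).
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the dataset is not mutated
            modified[feature] = value  # overwrite the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical non-linear scorer: a step in drop_pct plus a meter effect."""
    base = 0.6 if row["drop_pct"] > 0.2 else 0.05
    return base + 0.1 * row["old_meter"]


rows = [{"drop_pct": 0.0, "old_meter": 0}, {"drop_pct": 0.4, "old_meter": 1}]
curve = partial_dependence(toy_model, rows, "drop_pct", [0.0, 0.1, 0.3, 0.5])
```

The resulting curve is flat, jumps once the drop exceeds 20%, and is flat again. A single linear coefficient cannot represent that shape, whereas tree-based boosting can.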
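Sampling bias
We have the wrong data for our goal
\[
\underbrace{P(\text{Fraud or Malf} \mid X)}_{\text{what we would like to estimate}} = \underbrace{P(\text{Fraud or Malf} \mid X, \text{Insp})}_{\text{what we are estimating}} \cdot P(\text{Insp} \mid X) + \underbrace{P(\text{Fraud or Malf} \mid X, \text{NotInsp})}_{\text{what we cannot estimate}} \cdot \bigl(1 - P(\text{Insp} \mid X)\bigr)
\]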
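A small numeric sketch shows why this matters. With made-up numbers, the rate observed among inspected customers can sit far from the population rate, and when the uninspected term is unknown the identity only bounds the target (the unknown probability must lie between 0 and 1). The function names and all values are illustrative assumptions.

```python
def fraud_probability(p_given_insp, p_given_not_insp, p_insp):
    """Law of total probability over inspected / not inspected customers."""
    return p_given_insp * p_insp + p_given_not_insp * (1.0 - p_insp)


def fraud_probability_bounds(p_given_insp, p_insp):
    """Range of P(Fraud or Malf | X) when the uninspected term is unknown.

    The unknown P(Fraud or Malf | X, NotInsp) is only known to be in [0, 1].
    """
    lower = p_given_insp * p_insp
    upper = lower + (1.0 - p_insp)
    return lower, upper


# Made-up values: 10% of customers inspected, 30% hit rate among them,
# and a hypothetical 5% rate among the uninspected (unobservable in practice).
p_true = fraud_probability(p_given_insp=0.30, p_given_not_insp=0.05, p_insp=0.10)
low, high = fraud_probability_bounds(p_given_insp=0.30, p_insp=0.10)
```

Here the model trained on inspections would report 0.30 while the population rate is 0.075, and without information on the uninspected group the data alone only pin the target down to roughly the interval from 0.03 to 0.93.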
General Overview of Data Flow
We’re using a Big Data Architecture
[Diagram: data flow. Recoverable labels:]
Sources: ED Chile, ED Peru, Codensa, Edesur, ED Rio, ED Ceara, CELG
Local legacy systems + new Enel systems
Data Lake Market
Data Lake NCO I&N (7 worker nodes), computation nodes
21 extractors to extract, clean and organize data
Hundreds of tables; 120 final tables
Build-up of complex variables & machine learning predictive models
Bases; Score
Nightly refresh; monthly refresh
Results
The impact of our Machine Learning solution
Preliminary results (Oct 2017-Sep 2018)
Improvement in the TPE rate with the new ML approach (figure scale: +0% to > +500%)
CODENSA: +700%
ED CHILE: +50%
ED CELG: +300%
ED RIO: +70%
ED CEARA: +100%
ED ARGENTINA: +80%
Even compared with previous sophisticated approaches, our new 100% Machine Learning solution appears to be a winning strategy.
THANK YOU!