Post on 16-Dec-2015

Chicago Insurance Redlining Example

Were insurance companies in Chicago denying insurance in neighborhoods based on race?

The background

• In some US cities, services such as insurance are denied based on race

• This is sometimes called “redlining.”

• For insurance, many states have a “FAIR” plan available for (and limited to) those who cannot obtain insurance in the regular market.

• So an area with high numbers of FAIR plan policies is an area where it is hard to get insurance in the regular market.

The data (for 47 zip codes near Chicago)

• involact = # of new FAIR plan policies and renewals per 100 housing units

• race = % minority

• theft = thefts per 1000 population

• fire = fires per 100 housing units

• income = median family income in $1000s

First, some description

• Descriptive statistics for the variables

• Box plots

• Histograms

• Matrix plots

• etc.

Descriptive Statistics: race, fire, theft, age, involact, income

Variable   N  N*     Mean  SE Mean    StDev  Minimum       Q1   Median       Q3  Maximum
race      47   0    34.99     4.75    32.59     1.00     3.10    24.50    59.80    99.70
fire      47   0    12.28     1.36     9.30     2.00     5.60    10.40    16.50    39.70
theft     47   0    32.36     3.25    22.29     3.00    22.00    29.00    39.00   147.00
age       47   0    60.33     3.29    22.57     2.00    48.00    65.00    78.10    90.10
involact  47   0   0.6149   0.0925   0.6338   0.0000   0.0000   0.4000   0.9000   2.2000
income    47   0   10.696    0.402    2.754    5.583    8.330   10.694   12.102   21.480
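As a quick consistency check on the table, the SE Mean column should equal StDev divided by the square root of N. The sketch below recomputes it from the printed values; the dictionary and function names are illustrative and not part of the Minitab output, and small differences are expected because the printed StDev values are rounded.

```python
import math

# (N, StDev, SE Mean) copied from the descriptive statistics table.
printed = {
    "race": (47, 32.59, 4.75),
    "fire": (47, 9.30, 1.36),
    "theft": (47, 22.29, 3.25),
    "age": (47, 22.57, 3.29),
    "involact": (47, 0.6338, 0.0925),
    "income": (47, 2.754, 0.402),
}


def se_mean(stdev, n):
    """Standard error of the mean: StDev / sqrt(N)."""
    return stdev / math.sqrt(n)


# Recompute SE Mean for each variable and show it next to the printed value.
computed = {name: se_mean(stdev, n) for name, (n, stdev, _) in printed.items()}
for name, (n, stdev, reported) in printed.items():
    print(f"{name:9s} computed {computed[name]:.4f}   reported {reported}")
```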

[Figure: Histogram of race, fire, theft, age, involact, income (one panel per variable; vertical axis: Frequency)]

[Figure: Boxplot of race, fire, theft, age, involact, income (one panel per variable)]

[Figure: Matrix Plot of race, fire, theft, age, involact, income vs race, fire, theft, age, involact, income]

Simple linear regression model

• Fit a model with involact as the response and race as the predictor

• A strong positive relationship gives some evidence for redlining
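A minimal sketch of what fitting a one-predictor model involves, using the closed-form least-squares formulas. The function names are hypothetical, and the five data points are toy values invented for illustration only; they are not the Chicago zip-code data, which the slides do not list.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def r_squared(x, y, b0, b1):
    """Share of the variation in y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Toy values for illustration only; NOT the Chicago data.
toy_race = [0.0, 25.0, 50.0, 75.0, 100.0]
toy_involact = [0.1, 0.5, 0.8, 1.2, 1.4]

b0, b1 = fit_line(toy_race, toy_involact)
r2 = r_squared(toy_race, toy_involact, b0, b1)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}, R-Sq = {100 * r2:.1f}%")
```

A positive slope together with a high R-Sq is what "a strong positive relationship" would look like in this kind of output.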

Fitted Line Plot: involact = 0.1292 + 0.01388 race

S = 0.448832, R-Sq = 50.9%, R-Sq(adj) = 49.9%

[Figure: fitted line plot of involact (vertical axis) versus race (horizontal axis)]
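To make the fitted line concrete, the sketch below evaluates it at a few values of race. The coefficients come from the fitted line plot; the function name and the chosen percentages are illustrative assumptions.

```python
# Coefficients reported on the fitted line plot.
INTERCEPT = 0.1292
SLOPE = 0.01388


def predicted_involact(race_pct):
    """Predicted FAIR plan policies per 100 housing units for a given % minority."""
    return INTERCEPT + SLOPE * race_pct


# Evaluate the line at a few illustrative percentages.
for race_pct in (0, 50, 100):
    print(f"race = {race_pct:3d}%  ->  involact = {predicted_involact(race_pct):.4f}")

# Difference in predicted involact between a 100% and a 0% minority zip code.
spread = predicted_involact(100) - predicted_involact(0)
print(f"difference across the full range of race: {spread:.3f}")
```

Under this one-predictor model, each additional percentage point of minority population is associated with about 0.014 more FAIR plan policies per 100 housing units.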

What’s next

• The matrix plot showed that race is correlated with other predictors, e.g., income, fire, etc.

• So it’s possible that these, rather than race, are the important factors influencing involact

• Next, the full model is fit

The regression equation is

involact = - 0.609 + 0.00913 race + 0.0388 fire - 0.0103 theft + 0.00827 age + 0.0245 income

Predictor       Coef    SE Coef      T      P
Constant     -0.6090     0.4953  -1.23  0.226
race        0.009133   0.002316   3.94  0.000
fire        0.038817   0.008436   4.60  0.000
theft      -0.010298   0.002853  -3.61  0.001
age         0.008271   0.002782   2.97  0.005
income       0.02450    0.03170   0.77  0.444

S = 0.335126   R-Sq = 75.1%   R-Sq(adj) = 72.0%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       5  13.8749  2.7750  24.71  0.000
Residual Error  41   4.6047  0.1123
Total           46  18.4796
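The summary figures in the output (F, S, R-Sq, R-Sq(adj)) all follow from the sums of squares and degrees of freedom in the analysis of variance table. The sketch below recomputes them using the standard definitions; the variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from the analysis of variance table.
ss_regression, df_regression = 13.8749, 5
ss_error, df_error = 4.6047, 41
ss_total, df_total = 18.4796, 46

ms_regression = ss_regression / df_regression   # mean square for regression
ms_error = ss_error / df_error                  # mean square error (MSE)

f_stat = ms_regression / ms_error               # overall F statistic
s = math.sqrt(ms_error)                         # residual standard deviation S
r_sq = ss_regression / ss_total                 # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total) # R-Sq(adj)

print(f"F = {f_stat:.2f}")
print(f"S = {s:.6f}")
print(f"R-Sq = {100 * r_sq:.1f}%   R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```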

What have we learned?

• Race is still highly significant (t = 3.94, p-value < 0.001) in the full model

• Income is not significant (this isn’t surprising, since race and income are highly correlated).
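Each T value in the coefficient table is the coefficient divided by its standard error. The sketch below recomputes the ratios and flags those that exceed a two-sided 5% cutoff; the cutoff of about 2.02 for 41 degrees of freedom is hardcoded as an approximation rather than computed, and the names are illustrative.

```python
# (Coef, SE Coef) copied from the full-model regression output.
coefficients = {
    "Constant": (-0.6090, 0.4953),
    "race": (0.009133, 0.002316),
    "fire": (0.038817, 0.008436),
    "theft": (-0.010298, 0.002853),
    "age": (0.008271, 0.002782),
    "income": (0.02450, 0.03170),
}

# Approximate two-sided 5% critical value of t with 41 degrees of freedom
# (an assumption hardcoded here, not taken from the slides).
T_CRITICAL = 2.02

t_ratios = {name: coef / se for name, (coef, se) in coefficients.items()}
significant = [name for name, t in t_ratios.items() if abs(t) > T_CRITICAL]

for name, t in t_ratios.items():
    flag = "significant" if name in significant else "not significant"
    print(f"{name:9s} t = {t:6.2f}  ({flag} at the 5% level)")
```

This reproduces the pattern on the slide: race, fire, theft and age clear the cutoff, while income does not.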

Diagnostics

• Some residual plots are next.

• They are uninteresting (good!)

• We’ll skip more substantial diagnostics, such as leverage and influence, although these should be done.

[Figure: Residual Plots for involact. Four panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]

Model selection

Response is involact

Candidate predictors: race, fire, theft, age, income

Vars  R-Sq  R-Sq(adj)  Mallows Cp        S  Included
   1  50.9       49.9        37.7  0.44883  X
   2  63.0       61.3        19.8  0.39406  X X
   3  69.3       67.2        11.5  0.36310  X X X
   4  74.7       72.3         4.6  0.33352  X X X X
   5  75.1       72.0         6.0  0.33513  X X X X X
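One way to read the model selection table is through Mallows' Cp, which compares each subset's residual sum of squares with the full model's mean square error. The sketch below recomputes Cp from the printed S values using the standard formula Cp = SSE_p / MSE_full - n + 2(p + 1), with SSE_p recovered as S² × (n - p - 1). The names are illustrative, and small differences from the table are expected because S is rounded.

```python
n = 47  # number of zip codes

# (number of predictors, S) for each row of the model selection table.
subsets = [(1, 0.44883), (2, 0.39406), (3, 0.36310), (4, 0.33352), (5, 0.33513)]

# The last row is the full five-predictor model; its S squared is MSE_full.
full_p, full_s = subsets[-1]
mse_full = full_s ** 2


def mallows_cp(p, s):
    """Mallows' Cp for a subset with p predictors and residual standard deviation s."""
    sse_p = s ** 2 * (n - p - 1)   # residual sum of squares of the subset model
    return sse_p / mse_full - n + 2 * (p + 1)


cp_values = {p: mallows_cp(p, s) for p, s in subsets}
best_p = min(cp_values, key=cp_values.get)

for p, cp in cp_values.items():
    print(f"{p} predictor(s): Cp = {cp:.1f}")
print(f"smallest Cp: the {best_p}-predictor model")
```

The four-predictor model has the smallest Cp and the highest R-Sq(adj), which is consistent with income adding little once the other predictors are in the model.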