Important definitions in statistics


ABOUBAKR ELNASHAR

Benha University Hospital, Egypt


Sensitivity:

Probability that the test is positive when the disease is present

(true positive rate)

Specificity:

Probability that the test is negative when the disease is absent

(true negative rate)
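As a minimal sketch of the two definitions above, both can be computed from the counts of a 2x2 table. The counts (90 true positives, 10 false negatives, 160 true negatives, 40 false positives) and the function names are hypothetical, chosen only for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: positives detected among all diseased subjects."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: negatives detected among all healthy subjects."""
    return tn / (tn + fp)


# Hypothetical test results (illustrative numbers, not from any study)
tp, fn, tn, fp = 90, 10, 160, 40

print(sensitivity(tp, fn))  # 0.9
print(specificity(tn, fp))  # 0.8
```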

Systematic review

A qualitative report that summarizes the available studies on a question

Meta-analysis

Quantitative analysis of the studies included in a systematic review


Precision

A description of a level of measurement that yields consistent results when repeated. It is associated with the concept of "random error", a form of observational error that leads to measurable values being inconsistent when repeated.


https://en.wikipedia.org/wiki/Observational_error


Precision or positive predictive value

The proportion of true positives among all positive results (both true positives and false positives).
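A short sketch of this proportion, assuming hypothetical counts of 90 true positives and 40 false positives; the function name is illustrative only.

```python
def positive_predictive_value(tp: int, fp: int) -> float:
    """Share of positive test results that are true positives."""
    return tp / (tp + fp)


# Hypothetical counts: 90 true positives, 40 false positives
ppv = positive_predictive_value(90, 40)
print(round(ppv, 3))  # 0.692
```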


https://en.wikipedia.org/wiki/Positive_predictive_value


https://en.wikipedia.org/wiki/False_positive


Accuracy

Two definitions:

A level of measurement with no inherent limitation (i.e. free of systematic error, another form of observational error).

ISO definition: a level of measurement that yields true (no systematic errors) and consistent (no random errors) results.




https://en.wikipedia.org/wiki/International_Organization_for_Standardization

Accuracy

Also used as a statistical measure of how well a binary classification test correctly identifies or excludes a condition.

Accuracy is the proportion of true results (both true positives and true negatives) among the total number of cases examined.

To make the context clear by the semantics, it is often referred to as the "Rand accuracy" or "Rand index". It is a parameter of the test.


https://en.wikipedia.org/wiki/Binary_classification


https://en.wikipedia.org/wiki/True_positive


https://en.wikipedia.org/wiki/True_negative

https://en.wikipedia.org/wiki/Rand_index


Accuracy may be determined from sensitivity and specificity, provided prevalence is known, using the equation:

accuracy = (sensitivity × prevalence) + (specificity × (1 - prevalence))
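The relation accuracy = sensitivity × prevalence + specificity × (1 - prevalence) can be checked against the direct definition (true results divided by all cases). The sketch below uses hypothetical counts and illustrative function names; it only shows that the two routes agree.

```python
import math


def accuracy_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Proportion of true results among all cases examined."""
    return (tp + tn) / (tp + tn + fp + fn)


def accuracy_from_rates(sens: float, spec: float, prevalence: float) -> float:
    """Accuracy as a prevalence-weighted mix of sensitivity and specificity."""
    return sens * prevalence + spec * (1 - prevalence)


# Hypothetical 2x2 table: 100 diseased, 200 healthy
tp, fn, tn, fp = 90, 10, 160, 40
total = tp + fn + tn + fp

direct = accuracy_from_counts(tp, tn, fp, fn)
via_rates = accuracy_from_rates(
    sens=tp / (tp + fn),
    spec=tn / (tn + fp),
    prevalence=(tp + fn) / total,
)

print(round(direct, 3), round(via_rates, 3))
print(math.isclose(direct, via_rates))
```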

The accuracy paradox for predictive analytics states that predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. It may be better to avoid the accuracy metric in favor of other metrics such as precision and recall.

In situations where the minority class is more important, the F-measure may be more appropriate, especially in situations with very skewed class imbalance.
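A small sketch of the paradox, using two hypothetical models on an invented dataset of 1,000 cases with only 10 diseased. Model A calls everyone negative; model B finds most of the diseased at the cost of some false positives. Setting precision to 0 when no positives are predicted is a convention assumed here, not something the slides specify.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F-measure (F1) from 2x2 counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Assumed convention: undefined ratios are reported as 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical data: 10 diseased, 990 healthy
model_a = metrics(tp=0, fp=0, fn=10, tn=990)   # always predicts "negative"
model_b = metrics(tp=8, fp=30, fn=2, tn=960)   # detects 8 of 10 diseased

for name, m in [("A", model_a), ("B", model_b)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Model A has the higher accuracy yet detects no diseased case, while model B has lower accuracy and a higher F-measure.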


https://en.wikipedia.org/wiki/Prevalence

https://en.wikipedia.org/wiki/Accuracy_paradox

https://en.wikipedia.org/wiki/Predictive_analytics

https://en.wikipedia.org/wiki/Predictive_power

https://en.wikipedia.org/wiki/Precision_and_recall


https://en.wikipedia.org/wiki/F-measure



Another useful performance measure is the balanced accuracy, which avoids inflated performance estimates on imbalanced datasets.

It is defined as the arithmetic mean of sensitivity and specificity, or the average accuracy obtained on either class:

balanced accuracy = (sensitivity + specificity) / 2
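As a sketch, balanced accuracy is the mean of sensitivity and specificity. The counts below are hypothetical (10 diseased among 1,000 cases) and the function name is illustrative.

```python
def balanced_accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Arithmetic mean of sensitivity and specificity."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return (sens + spec) / 2


# Hypothetical imbalanced data: 10 diseased, 990 healthy
always_negative = balanced_accuracy(tp=0, fp=0, fn=10, tn=990)
useful_test = balanced_accuracy(tp=8, fp=30, fn=2, tn=960)

print(always_negative)        # 0.5
print(round(useful_test, 3))  # 0.885
```

A test that labels everyone negative scores 0.5 here even though its plain accuracy would be 0.99.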


Confidence interval

A way of expressing certainty about the findings from a study or group of studies, using statistical techniques.

A confidence interval describes a range of possible effects (of a treatment or intervention) that is consistent with the results of a study or group of studies.

Informally: "I am 95% confident that the true value lies between so and so."

If the interval for a ratio measure such as relative risk (RR) or odds ratio (OR) crosses 1, the result is not statistically significant.

95% CI (1.05-1.15) = I am 95% confident that the relative risk lies between 1.05 and 1.15.
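One common way to obtain such an interval for a relative risk is the log (Katz) method; the slides do not say which method they have in mind, so this is an assumption. The event counts and the function name below are hypothetical.

```python
import math
from statistics import NormalDist


def relative_risk_ci(events_exposed, n_exposed,
                     events_unexposed, n_unexposed, level=0.95):
    """Relative risk with a log-method (Katz) confidence interval."""
    rr = (events_exposed / n_exposed) / (events_unexposed / n_unexposed)
    # Standard error of ln(RR)
    se_log_rr = math.sqrt(1 / events_exposed - 1 / n_exposed
                          + 1 / events_unexposed - 1 / n_unexposed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical study: 30/100 events in exposed, 20/100 in unexposed
rr, lower, upper = relative_risk_ci(30, 100, 20, 100)
significant = not (lower <= 1.0 <= upper)
print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}-{upper:.2f}), "
      f"significant: {significant}")
```

In this invented example the interval contains 1, so the apparent 50% increase in risk is not statistically significant.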


A wide confidence interval indicates a lack of certainty or precision about the true size of the clinical effect and is seen in studies with too few patients.

Where confidence intervals are narrow they indicate more precise estimates of effects and a larger sample of patients studied.

It is usual to interpret a '95%' confidence interval as the range of effects within which we are 95% confident that the true effect lies.
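The link between sample size and interval width can be illustrated with a simple Wald interval for a proportion. The Wald method is chosen here only because it is short; the event rate of 20% and both sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def wald_ci(events: int, n: int, level: float = 0.95):
    """Wald (normal approximation) confidence interval for a proportion."""
    p = events / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Same observed proportion (20%) in a small and a large hypothetical study
for events, n in [(10, 50), (1000, 5000)]:
    low, high = wald_ci(events, n)
    print(f"n = {n}: 95% CI ({low:.3f}-{high:.3f}), width {high - low:.3f}")
```

With the same observed proportion, the larger study gives the narrower interval.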


In a case-control study

It is better to have more controls than cases.

In clinical studies

It is better for the case and control groups to be the same size.

For numerical data: t test

For percentages: chi-square test
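A rough sketch of both tests using only the Python standard library. It assumes Welch's form of the t statistic (no p-value is computed) and Pearson's chi-square for a 2x2 table without continuity correction; the slides do not specify either variant, and all data are invented.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic for two independent samples of numbers."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], 1 degree of freedom.

    Returns (statistic, p_value), without continuity correction.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail for 1 df
    return chi2, p_value


# Numerical data (hypothetical measurements in two groups): t test
cases = [5.1, 4.9, 5.6, 5.8, 6.0]
controls = [4.2, 4.8, 4.5, 5.0, 4.4]
print("t =", round(welch_t(cases, controls), 2))

# Percentages (hypothetical: 30% vs 20% with the outcome): chi-square test
chi2, p = chi_square_2x2(30, 70, 20, 80)
print("chi-square =", round(chi2, 2), "p =", round(p, 3))
```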


RR (relative risk)

If 1: no association

<1: negative association

>1: positive association

RR = 2, i.e. the risk is doubled

RR = 5, i.e. the risk is 5 times as high

RR = 0.5, i.e. negative association and the risk is halved

OR (odds ratio)

Similar to RR and interpreted in the same way
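A minimal sketch of how RR and OR are computed from a 2x2 table. The counts (30 of 100 exposed and 20 of 100 unexposed with the outcome) and the function names are hypothetical.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk in exposed (a of a+b) divided by risk in unexposed (c of c+d)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds in exposed (a to b) divided by odds in unexposed (c to d)."""
    return (a * d) / (b * c)


# Hypothetical table: exposed 30 with outcome / 70 without,
# unexposed 20 with outcome / 80 without
a, b, c, d = 30, 70, 20, 80

print("RR =", round(relative_risk(a, b, c, d), 2))  # RR = 1.5
print("OR =", round(odds_ratio(a, b, c, d), 2))     # OR = 1.71
```

Both values are above 1, indicating a positive association. The two measures are numerically close only when the outcome is rare.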


Term          Frequency            Colloquial equivalent
Very common   1/1 to 1/10          A person in a family
Common        1/10 to 1/100        A person in a street
Uncommon      1/100 to 1/1000      A person in a village
Rare          1/1000 to 1/10,000   A person in a small town
Very rare     <1/10,000            A person in a large town

Source: Royal College of Obstetricians and Gynaecologists
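The frequency bands above can be expressed as a simple lookup. Which band a value exactly on a boundary (e.g. 1/10) falls into is not stated in the table, so the choice made below, assigning it to the more frequent band, is an assumption, as is the function name.

```python
def frequency_term(risk: float) -> str:
    """Map a risk (0 to 1) to the descriptive frequency term."""
    if not 0 <= risk <= 1:
        raise ValueError("risk must be between 0 and 1")
    # Assumption: a boundary value belongs to the more frequent band
    if risk >= 1 / 10:
        return "Very common"
    if risk >= 1 / 100:
        return "Common"
    if risk >= 1 / 1000:
        return "Uncommon"
    if risk >= 1 / 10_000:
        return "Rare"
    return "Very rare"


for risk in (0.5, 0.05, 0.005, 0.0005, 0.00005):
    print(risk, "->", frequency_term(risk))
```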


Incidence

The rate of new (or newly diagnosed) cases of the disease.

It is generally reported as the number of new cases occurring within a period of time (e.g., per month, per year).

It is more meaningful when the incidence rate is reported as a fraction of the population at risk of developing the disease (e.g., per 100,000 or per million population).
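As a sketch, an incidence rate expressed per 100,000 population at risk can be computed as follows. The figures (150 new cases in one year in a population at risk of 1,200,000) and the function name are hypothetical.

```python
def incidence_rate(new_cases: int, population_at_risk: int,
                   per: int = 100_000) -> float:
    """New cases in the period, expressed per `per` people at risk."""
    return new_cases * per / population_at_risk


# Hypothetical: 150 new cases in one year, 1,200,000 people at risk
print(incidence_rate(150, 1_200_000))             # 12.5 per 100,000 per year
print(incidence_rate(150, 1_200_000, 1_000_000))  # 125.0 per million per year
```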


The accuracy of incidence data depends upon the accuracy of diagnosis and reporting of the disease.

In some cases (including end-stage renal disease, ESRD) it may be more appropriate to report the rate of treatment of new cases, since these are known, whereas the actual incidence of untreated cases is not.

Incidence rates can be further categorized according to different subsets of the population, e.g., by gender, by racial origin, by age group or by diagnostic category.


Prevalence

The actual number of cases alive with the disease, either during a period of time (period prevalence) or at a particular date in time (point prevalence).

Period prevalence provides the better measure of the disease load since it includes all new cases and all deaths between two dates.

Point prevalence only counts those alive on a particular date.

Prevalence is also most meaningfully reported as the number of cases as a fraction of the total population at risk and can be further categorized according to different subsets of the population.
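The difference between the two counts can be sketched with a small list of patient records. Everything here is hypothetical: the `Case` record, the dates, and the population of 50,000. Recovery is ignored for simplicity, so a case only leaves the count by dying.

```python
from datetime import date
from typing import NamedTuple, Optional


class Case(NamedTuple):
    onset: date                   # date of diagnosis
    death: Optional[date] = None  # None means still alive


def point_prevalence(cases, on_date: date) -> int:
    """Cases diagnosed by on_date and still alive on that date."""
    return sum(1 for c in cases
               if c.onset <= on_date and (c.death is None or c.death > on_date))


def period_prevalence(cases, start: date, end: date) -> int:
    """Cases with the disease at any time between start and end,
    including those who died during the period."""
    return sum(1 for c in cases
               if c.onset <= end and (c.death is None or c.death >= start))


cases = [
    Case(date(2019, 3, 1)),
    Case(date(2020, 6, 15), date(2020, 11, 30)),  # died during 2020
    Case(date(2020, 9, 1)),
    Case(date(2018, 1, 10), date(2019, 12, 31)),  # died before 2020
    Case(date(2021, 2, 1)),                       # diagnosed after 2020
]
population = 50_000  # hypothetical population at risk

point = point_prevalence(cases, date(2020, 12, 31))
period = period_prevalence(cases, date(2020, 1, 1), date(2020, 12, 31))
print(point, period)  # 2 3
print(point * 100_000 / population, period * 100_000 / population)
```

The period count is higher because it also includes the patient who died during the year.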
