Severity Calculator

Use this severity calculator to easily calculate the severity criterion (SEV). Use it to determine which inferences about the difference between the means of proportions or continuous metrics are warranted (that is, have passed a severe test) given some data. The calculator is developed in accordance with the severe testing concept of Prof. Deborah Mayo and Prof. Aris Spanos [1].

What "severity" and "severely tested" mean

Severe testing is a meta-statistical principle of statistical inference that addresses some common issues faced by users of statistical data and by statisticians when interpreting p-values (significance levels) and observed power in post-hoc analyses of tests. In particular, it helps avoid the fallacies that arise when tests of significance are overly sensitive or not sensitive enough.

The idea is that only statistical hypotheses that have passed severe tests should be inferred from the data. Severity is a post-data assessment, similar to the p-value / significance level and to observed (post-hoc) power, but it evaluates the probativeness of a given test with regard to a particular inference. The higher the severity value, the more warranted the statistical hypothesis.

Just as statistical power is evaluated at the point of the minimum detectable effect, severity is evaluated at a point of interest. This point is denoted μ1 and is equal to (μ0 + γ), where μ0 is the value specified by the null hypothesis and γ is a discrepancy from it.

More formally put: data x0 in test T provides good evidence for inferring a hypothesis H (just) to the extent that H passes severely with x0, i.e., to the extent that H would (very probably) not have survived the test so well were H false. [1]

Severity criterion

The severity criterion can be formally expressed as [1]:

SEV(T(α), d(x0), μ ≤ μ1)

where μ1 = (μ0 + γ), for some γ ≥ 0. Where there is no risk of confusion, (T(α), d(x0)) can be suppressed and the abbreviation SEV(μ ≤ μ1) used.

Written out in full, the severity equation is:

SEV(μ ≤ μ1) = P(d(X) > d(x0); μ ≤ μ1 false) = P(d(X) > d(x0); μ > μ1)

where d(X) is the test statistic and d(x0) is its value for the observed data x0.
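
For the common case of a one-sided test of H0: μ ≤ μ0 against H1: μ > μ0 with a Normally distributed test statistic (an assumption made here for illustration, as in the textbook Normal-mean case with a known standard error σ_X̄), both severity probabilities reduce to the standard Normal CDF Φ. In each case the probability is evaluated at the boundary μ = μ1, the least favourable value among those being ruled out:

```latex
\begin{aligned}
d(X) &= \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}}, \\
\mathrm{SEV}(\mu \le \mu_1) &= P\bigl(d(X) > d(x_0);\ \mu = \mu_1\bigr)
  = 1 - \Phi\!\left(\frac{\bar{x}_0 - \mu_1}{\sigma_{\bar{X}}}\right), \\
\mathrm{SEV}(\mu > \mu_1) &= P\bigl(d(X) \le d(x_0);\ \mu = \mu_1\bigr)
  = \Phi\!\left(\frac{\bar{x}_0 - \mu_1}{\sigma_{\bar{X}}}\right).
\end{aligned}
```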

The value of SEV depends on the outcome of the statistical test and on the claim about μ you want to assess. If the test statistic falls in the rejection region, SEV(μ > μ1) is calculated, estimating the severity with which one can accept the claim (μ > μ1). If the test statistic falls outside the rejection region, SEV(μ ≤ μ1) is calculated, estimating the severity of the test passed by the claim (μ ≤ μ1).
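
The decision rule above can be sketched in Python for a one-sided z-test. This is a minimal sketch, assuming a known standard error and H0: μ ≤ μ0; the function `severity` and its parameters are hypothetical names introduced for illustration, not the calculator's own code. It relies only on `statistics.NormalDist` from the Python standard library (3.8+).

```python
from statistics import NormalDist
from typing import Tuple

STD_NORMAL = NormalDist()  # standard Normal: mean 0, sd 1


def severity(observed: float, se: float, mu1: float,
             mu0: float = 0.0, alpha: float = 0.05) -> Tuple[str, float]:
    """Hypothetical helper: severity for a one-sided z-test of H0: mu <= mu0.

    observed: observed estimate of mu (e.g. a difference in means)
    se:       standard error of that estimate (assumed known)
    mu1:      point of interest at which severity is evaluated
    Returns the claim being assessed and its severity.
    """
    d_x0 = (observed - mu0) / se              # observed test statistic d(x0)
    critical = STD_NORMAL.inv_cdf(1 - alpha)  # one-sided rejection threshold
    z1 = (observed - mu1) / se                # observed estimate standardised at mu = mu1
    if d_x0 > critical:
        # In the rejection region: assess mu > mu1, SEV = P(d(X) <= d(x0); mu = mu1)
        return "mu > mu1", STD_NORMAL.cdf(z1)
    # Outside the rejection region: assess mu <= mu1, SEV = P(d(X) > d(x0); mu = mu1)
    return "mu <= mu1", 1 - STD_NORMAL.cdf(z1)


# Hypothetical inputs: estimate 2.0, standard error 1.0 (rejects at alpha = 0.05)
print(severity(2.0, 1.0, mu1=0.5))
# Hypothetical inputs: estimate 1.0, standard error 1.0 (does not reject)
print(severity(1.0, 1.0, mu1=2.0))
```

The same function covers both branches, so the claim that is assessed is chosen by the test outcome rather than by the user.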

In a severity evaluation, the original statistical design (sample size, power and significance threshold) is not altered; rather, we check how well a given inference of interest was probed by the data at hand. In other words, we ask how warranted such an inference would be.

How to interpret severity

Interpreting the severity of a given test in relation to a given inference about μ1 is straightforward: the higher the severity, the more probative the test is relative to that μ1, and the more warranted the inference, be it (μ ≤ μ1) or (μ > μ1). Conversely, the lower the severity, the less warranted the statistical inference.
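
The monotone relationship described above can be seen by computing SEV(μ > μ1) over a grid of points of interest. This is a minimal sketch, assuming a one-sided z-test that has rejected H0 and a hypothetical estimate of 0.03 with standard error 0.015: severity falls as the claimed discrepancy μ1 grows, and equals exactly 0.5 when μ1 equals the observed estimate.

```python
from statistics import NormalDist


def sev_greater(observed: float, se: float, mu1: float) -> float:
    """SEV(mu > mu1) = P(d(X) <= d(x0); mu = mu1) for a one-sided z-test."""
    return NormalDist().cdf((observed - mu1) / se)


observed, se = 0.03, 0.015  # hypothetical estimate and standard error
grid = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05]
curve = [sev_greater(observed, se, mu1) for mu1 in grid]

for mu1, sev in zip(grid, curve):
    print(f"mu1 = {mu1:.2f}   SEV(mu > mu1) = {sev:.3f}")
```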

This is in contrast with a low p-value or a non-significant result, which is uninformative by itself, and with the lack of information about which values other than the original null hypothesis are also excluded at a given significance level, for which confidence intervals are needed.
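
One way to see the link to confidence intervals: after a rejection in a one-sided z-test, the point μ1 at which SEV(μ > μ1) equals 0.95 is the lower limit of a one-sided 95% confidence interval, and other severity levels give the bounds at other levels. The sketch below assumes hypothetical numbers (estimate 0.03, standard error 0.015); `warranted_bound` is a name introduced here for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sev_greater(observed: float, se: float, mu1: float) -> float:
    """SEV(mu > mu1) = P(d(X) <= d(x0); mu = mu1) for a one-sided z-test."""
    return STD_NORMAL.cdf((observed - mu1) / se)


def warranted_bound(observed: float, se: float, level: float) -> float:
    """Hypothetical helper: the mu1 at which SEV(mu > mu1) equals `level`."""
    # Invert Phi((observed - mu1) / se) = level for mu1.
    return observed - STD_NORMAL.inv_cdf(level) * se


observed, se = 0.03, 0.015  # hypothetical estimate and standard error
for level in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"SEV = {level:.2f} for the claim mu > {warranted_bound(observed, se, level):.4f}")

# Lower limit of the one-sided 95% confidence interval, for comparison.
ci_lower = observed - STD_NORMAL.inv_cdf(0.95) * se
print(f"one-sided 95% CI lower limit: {ci_lower:.4f}")
```

Reading the printed table row by row shows the whole range of discrepancies and how well each is warranted, rather than a single interval at one fixed level.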

Examples

Suppose we have a binomial trial with two groups, control (A) and treatment (B), planned for and executed with 1,000 observations in each group, and with the significance threshold set at α = 0.05. The observed event rate is 12% (0.12) in group A and 15% (0.15) in group B.
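
The test in this example can be sketched as a one-sided two-proportion z-test. The pooled standard error used below is an assumption about how the calculator works, though it is consistent with the severity figures quoted in this section; `two_proportion_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(p_a: float, p_b: float, n_a: int, n_b: int):
    """Hypothetical helper: one-sided z-test of H1: p_b > p_a with a pooled SE."""
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 1 - NormalDist().cdf(z)  # one-sided p-value
    return se, z, p_value


alpha = 0.05
se, z, p_value = two_proportion_z(0.12, 0.15, 1000, 1000)
print(f"SE = {se:.6f}, z = {z:.4f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```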

How well corroborated is the claim that the event rate in the treatment group is higher than 16% (0.16), that is, μ > 16%? Plugging the numbers into the severity calculator gives SEV(μ > 16%) = 0.256443. That is a very low probability, roughly 1 in 4, so this statistical inference would be poorly justified in most situations.
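
The 0.256443 figure can be reproduced under stated assumptions: μ is the treatment event rate, severity is evaluated at the discrepancy μ1 − 12% from the control rate, and the pooled standard error of the z-test is used. `sev_treatment_greater` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def sev_treatment_greater(p_a: float, p_b: float, n_a: int, n_b: int, mu1: float) -> float:
    """Hypothetical helper: SEV(mu > mu1) for the treatment rate mu after a rejection.

    Severity is evaluated at the discrepancy mu1 - p_a, using a pooled standard error.
    """
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    observed_diff = p_b - p_a
    # SEV(mu > mu1) = P(d(X) <= d(x0); discrepancy = mu1 - p_a)
    return NormalDist().cdf((observed_diff - (mu1 - p_a)) / se)


sev = sev_treatment_greater(0.12, 0.15, 1000, 1000, mu1=0.16)
print(f"SEV(mu > 16%) = {sev:.6f}")
```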

On the other hand, if we want to check how warranted the conclusion is that the event rate in the treatment group is higher than 12.4% (0.124), we get SEV(μ > 12.4%) = 0.955558, which means that this claim is well probed by the data.
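
Under the same assumptions (treatment rate μ, discrepancy μ1 − 12%, pooled standard error), the second figure follows from the same formula evaluated at μ1 = 12.4%. The helper name is again hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sev_treatment_greater(p_a: float, p_b: float, n_a: int, n_b: int, mu1: float) -> float:
    """Hypothetical helper: SEV(mu > mu1) for the treatment rate mu (pooled SE)."""
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    # Standardised distance between the observed difference and the discrepancy mu1 - p_a
    return NormalDist().cdf(((p_b - p_a) - (mu1 - p_a)) / se)


sev = sev_treatment_greater(0.12, 0.15, 1000, 1000, mu1=0.124)
print(f"SEV(mu > 12.4%) = {sev:.6f}")
```

Because the claimed discrepancy (0.4 percentage points) is well below the observed one (3 percentage points), the severity is high.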

Expanding on the first example, suppose that the treatment group (B) had an event rate of 14% instead of 15%, so that the test fails to reject the null at α = 0.05. In this case, how warranted is it to accept the hypothesis that μ ≤ 12% given the data at hand? A severity calculation gives SEV(μ ≤ 12%) = 0.091793, meaning that such an inference would be very poorly supported.
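
Under the same assumptions (one-sided test, pooled standard error), the non-rejection case can be sketched as follows. Since H0 is not rejected, the claim assessed is μ ≤ μ1, with SEV = P(d(X) > d(x0); μ = μ1). When μ1 equals the control rate (zero discrepancy), this severity coincides with the test's one-sided p-value, which the sketch checks. `pooled_se` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pooled_se(p_a: float, p_b: float, n_a: int, n_b: int) -> float:
    """Hypothetical helper: pooled standard error of p_b - p_a."""
    pooled = (p_a * n_a + p_b * n_b) / (n_a + n_b)
    return sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))


p_a, p_b, n = 0.12, 0.14, 1000
se = pooled_se(p_a, p_b, n, n)
observed_diff = p_b - p_a

# One-sided p-value: the test does not reject at alpha = 0.05.
p_value = 1 - STD_NORMAL.cdf(observed_diff / se)

# SEV(mu <= mu1) = P(d(X) > d(x0); discrepancy = mu1 - p_a), here with mu1 = 12%.
mu1 = 0.12
sev = 1 - STD_NORMAL.cdf((observed_diff - (mu1 - p_a)) / se)

print(f"p-value = {p_value:.6f}, SEV(mu <= 12%) = {sev:.6f}")
```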

References

[1] Mayo D.G., Spanos A. (2006) – "Severe Testing as a Basic Concept in a Neyman–Pearson Philosophy of Induction", The British Journal for the Philosophy of Science, 57:323–357

[2] Mayo D.G., Spanos A. (2010) – "Error Statistics", in P. S. Bandyopadhyay & M. R. Forster (Eds.), Philosophy of Statistics, Handbook of the Philosophy of Science, Vol. 7, pp. 152–198. The Netherlands: Elsevier.

Cite this calculator & page

If you'd like to cite this online calculator resource and information as provided on the page, you can use the following citation:
Georgiev G.Z., "Severity Calculator", [online] Available at: URL [Accessed Date: 22 Jun, 2018].