# Confidence Interval Calculator

Use this **confidence interval calculator** to easily calculate the confidence bounds for a one-sample statistic, or for differences between two proportions or means (two independent samples). One- and two-sided intervals are supported, as well as intervals for the relative difference (percent difference). It also outputs the **p-value and Z-score** when calculating the difference between two groups.

### Quick navigation:

- Using the confidence interval calculator
- What is a confidence interval and "confidence level"
- Confidence interval formula & critical values
- How to interpret a confidence interval
- Common misinterpretations of confidence intervals
- One-sided vs. two-sided intervals
- Confidence intervals for relative difference

## Using the confidence interval calculator

This **confidence interval calculator** allows you to perform a post-hoc statistical evaluation of a set of data when the outcome of interest is the absolute difference of two proportions (binomial data, e.g. conversion rate or event rate), the absolute difference of two means (continuous data, e.g. height, weight, speed, time, revenue), or the relative difference between two proportions or two means. You can also calculate a confidence interval for the mean of a single group. It uses the Z-distribution (normal distribution), and you can select any confidence level you require.

If you are interested in a CI from a single group, then you need to know the sample size, sample standard deviation and the sample mean.

If entering proportions data, you need to know the sample sizes of the two groups as well as the number or rate of events. You can enter that as a proportion (e.g. 0.10), percentage (e.g. 10%) or just the raw number of events (e.g. 50).

If entering means data, simply copy/paste or type in the raw data, with each observation separated by a comma, space, new line or tab. Copy-pasting from a Google Sheets or Excel spreadsheet works fine.

The **confidence interval calculator will output**: two-sided confidence interval, left-sided and right-sided confidence interval, as well as the mean or difference ± the standard error of the mean (SEM). For means data it will also output the sample sizes, means, and pooled standard error of the mean. The Z-score and the p-value for the **one-sided hypothesis** (one-tailed test) will also be printed when calculating for the difference between proportions or means, allowing you to infer the direction of the effect.
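To make these outputs concrete, here is a minimal Python sketch for two proportions. It assumes an unpooled standard error and a one-sided alternative that the treatment rate exceeds the control rate; the calculator's exact standard-error choice is not stated on this page, and the function name `two_proportion_summary` is purely illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_summary(events1, n1, events2, n2, level=0.95):
    """Difference in proportions (treatment - control) with a two-sided CI,
    the Z-score, and a one-sided p-value for H1: p2 > p1.

    Assumption: unpooled standard error; the calculator may differ.
    """
    std_normal = NormalDist()
    p1, p2 = events1 / n1, events2 / n2
    diff = p2 - p1
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)  # 1.96 for 95%
    z_score = diff / se
    p_one_sided = 1 - std_normal.cdf(z_score)  # upper-tail probability
    return {
        "diff": diff,
        "se": se,
        "ci": (diff - z_crit * se, diff + z_crit * se),
        "z": z_score,
        "p_one_sided": p_one_sided,
    }


# Hypothetical data: 100 events out of 1000 (control), 120 out of 1000 (treatment)
result = two_proportion_summary(100, 1000, 120, 1000)
print(result)
```

With these hypothetical numbers the difference is 0.02, the two-sided 95% interval spans zero, and the one-sided p-value is about 0.076.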

**Warning:** You must fix the sample size / stopping time of your experiment in advance; otherwise you will be guilty of optional stopping (fishing for significance), which results in intervals whose actual coverage is lower than the nominal level. Also, you should not use this calculator to compare more than two means or proportions, or to compare two groups on more than one metric. If your experiment involves more than one treatment group or more than one outcome variable, you need a more advanced tool that corrects for multiple comparisons and multiple testing. This statistical calculator might help.
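The effect of optional stopping can be illustrated with a small simulation. The sketch below is a hypothetical setup, not the calculator's code: it assumes normally distributed data with a known standard deviation of 1 and a true mean of 0, computes a 95% Z-interval at five equally spaced interim looks, and stops as soon as an interval excludes the true value.

```python
import random
from statistics import NormalDist


def false_exclusion_rate(looks, n_per_look, reps=2000, level=0.95, seed=1):
    """Share of simulated experiments in which the CI for the mean excludes
    the true mean (0) at any interim look, stopping at the first exclusion.

    Assumes standard normal data with a known sigma of 1; looks=1 is a
    fixed-sample design analysed once at the end.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    excluded = 0
    for _ in range(reps):
        total, n = 0.0, 0
        for _ in range(looks):
            total += sum(rng.gauss(0.0, 1.0) for _ in range(n_per_look))
            n += n_per_look
            mean = total / n
            half_width = z / n ** 0.5  # Z * sigma / sqrt(n) with sigma = 1
            if abs(mean) > half_width:  # interval misses the true value 0
                excluded += 1
                break  # optional stopping: end the experiment early
    return excluded / reps


fixed = false_exclusion_rate(looks=1, n_per_look=250)
peeking = false_exclusion_rate(looks=5, n_per_look=50)
print(f"fixed sample: {fixed:.3f}, five peeks: {peeking:.3f}")
```

Both designs end with at most 250 observations. The fixed design excludes the true value about 5% of the time, as intended, while peeking five times pushes that rate to roughly 14%, so the effective coverage falls well below 95%.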

## What is a confidence interval and "confidence level"

A confidence interval is defined by an upper and a lower bound for the value of a variable of interest, and it aims to help assess the uncertainty associated with a measurement, usually in an experimental context. The wider an interval is, the more uncertainty there is in the estimate. Every confidence interval is constructed for a particular required confidence level, e.g. 0.90, 0.95, 0.99 (90%, 95%, 99%), which is also the coverage probability of the interval. A 95% confidence interval (CI), for example, **will contain the true value of interest** 95% of the time (in 95 out of 100 similar experiments).
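The coverage interpretation can be checked by simulation. The following sketch (illustrative names and parameters, normally distributed data assumed) repeats the same experiment many times and counts how often the 95% Z-interval contains the true mean.

```python
import random
from statistics import NormalDist, fmean, stdev


def coverage(true_mean=10.0, sd=2.0, n=50, level=0.95, reps=5000, seed=42):
    """Share of simulated experiments whose Z-interval contains true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = fmean(sample)
        half_width = z * stdev(sample) / n ** 0.5  # sample SD in place of sigma
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / reps


rate = coverage()
print(f"empirical coverage: {rate:.3f}")
```

Expect a value close to 0.95. In expectation it is slightly lower (roughly 0.944 for n = 50), because using the sample standard deviation with a Z critical value makes the interval a little too narrow; the shortfall vanishes as n grows.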

Simple two-sided confidence intervals are symmetrical around the observed mean, but in certain scenarios asymmetrical intervals may be produced. In any particular case the true value may lie anywhere within the interval, or it might not be contained within it, no matter how high the confidence level is. Raising the confidence level widens the interval, while lowering it makes it narrower. Similarly, larger sample sizes result in narrower intervals: the width shrinks roughly in proportion to 1/√n, so the interval contracts toward a single point as the sample size grows.
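Both effects follow from the half-width Z·s/√n of a simple interval. This sketch, with an illustrative helper `half_width`, prints interval widths for a few confidence levels and sample sizes, assuming a standard deviation of 2.

```python
from statistics import NormalDist


def half_width(sd, n, level):
    """Half-width of a two-sided Z-interval for a mean: Z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sd / n ** 0.5


for level in (0.90, 0.95, 0.99):
    widths = [round(2 * half_width(2.0, n, level), 3) for n in (25, 100, 400)]
    print(f"{level:.0%}: widths for n = 25, 100, 400 -> {widths}")
```

Each row grows with the confidence level, and within a row quadrupling the sample size halves the width.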

## Confidence interval formula

The formula when calculating a one-sample confidence interval is:

$$\bar{X} \pm Z \cdot \frac{\sigma}{\sqrt{n}}$$

where $n$ is the number of observations in the sample, $\bar{X}$ (read "X bar") is the arithmetic mean of the sample and $\sigma$ is the sample standard deviation.
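As a minimal sketch of the one-sample formula, X̄ ± Z·σ/√n, the function below (a hypothetical helper, not the calculator's implementation) takes the three summary statistics needed for a single group and derives Z from the chosen confidence level using Python's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_ci(mean, sd, n, level=0.95):
    """Two-sided Z-interval for a single mean: mean ± Z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin


print(one_sample_ci(mean=50.0, sd=10.0, n=100))              # 95% interval
print(one_sample_ci(mean=50.0, sd=10.0, n=100, level=0.99))  # 99% interval
```

For a mean of 50, a standard deviation of 10 and 100 observations, the 95% interval is about (48.04, 51.96).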

The formula for the two-sample confidence interval for the difference of means or proportions is:

$$(\mu_2 - \mu_1) \pm Z \cdot \sigma_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where $\mu_1$ is the mean of the baseline or control group, $\mu_2$ is the mean of the treatment group, $n_1$ is the sample size of the baseline or control group, $n_2$ is the sample size of the treatment group, and $\sigma_p$ is the pooled standard deviation of the two groups.

In both formulas $Z$ is the critical value of the standard normal distribution corresponding to the desired confidence level. For a two-sided interval at confidence level 1 − α (e.g. 0.90, so α = 0.10) the critical value is $Z_{1-\alpha/2}$, revealing that a two-sided interval, similarly to a two-sided p-value, is constructed by conjoining two one-sided intervals, each with half the error rate. E.g. a Z-score of 1.6449 is used both for a 0.95 (95%) one-sided confidence interval and for a 90% two-sided interval, while 1.9600 is used for a 0.975 (97.5%) one-sided confidence interval and a 0.95 (95%) two-sided interval. It is therefore important to use the right kind of interval; see One-sided vs. two-sided intervals below.
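The two-sample formula, (μ₂ − μ₁) ± Z·σ_p·√(1/n₁ + 1/n₂), can be sketched for raw means data as below. The pooled standard deviation uses the usual weighted-variance formula, which is an assumption since this page does not spell it out; `two_sample_ci` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def two_sample_ci(control, treatment, level=0.95):
    """CI for mean(treatment) - mean(control):
    (mu2 - mu1) ± Z * sigma_p * sqrt(1/n1 + 1/n2)."""
    n1, n2 = len(control), len(treatment)
    mu1, mu2 = fmean(control), fmean(treatment)
    # Assumed pooled variance: sample variances weighted by degrees of freedom
    pooled_var = ((n1 - 1) * variance(control)
                  + (n2 - 1) * variance(treatment)) / (n1 + n2 - 2)
    sigma_p = sqrt(pooled_var)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma_p * sqrt(1 / n1 + 1 / n2)
    diff = mu2 - mu1
    return diff - margin, diff + margin


print(two_sample_ci([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))
```

For these two small samples the difference is 2 and the 95% interval is roughly (0.04, 3.96).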

### Common critical values Z

Below is a table with common critical values used for constructing two-sided confidence intervals.

Two-sided confidence level | Critical value (Z) |
---|---|
80% | 1.2816 |
90% | 1.6449 |
95% | 1.9600 |
97.5% | 2.2414 |
98% | 2.3263 |
99% | 2.5758 |
99.9% | 3.2905 |

For one-sided intervals, use the critical value for the two-sided level with twice the error rate. E.g. for a 95% one-sided interval (5% error), use the critical value for a 90% two-sided interval (10% error) from the table above: 1.6449.
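Rather than looking values up, the critical values can be computed from the inverse of the standard normal CDF. This sketch uses `statistics.NormalDist.inv_cdf` from the Python standard library; `critical_value` is an illustrative helper name.

```python
from statistics import NormalDist


def critical_value(level, two_sided=True):
    """Z critical value for a confidence level (e.g. 0.95)."""
    alpha = 1 - level
    tail = alpha / 2 if two_sided else alpha  # two-sided splits the error
    return NormalDist().inv_cdf(1 - tail)


for level in (0.80, 0.90, 0.95, 0.975, 0.98, 0.99, 0.999):
    print(f"{level:.1%} two-sided: Z = {critical_value(level):.4f}")
```

Passing `two_sided=False` applies the doubling rule: `critical_value(0.95, two_sided=False)` gives the same 1.6449 as the 90% two-sided entry.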

## How to interpret a confidence interval

Confidence intervals are useful in visualizing the full **range of effect sizes compatible with the data**. Basically, any value outside of the interval is rejected: a null hypothesis with that value would be rejected by a null hypothesis significance test (NHST) with a significance threshold α equal to one minus the interval's confidence level (its p-value would fall below α). Conversely, any value inside the interval cannot be rejected, so when the null hypothesis of interest is covered by the interval, it cannot be rejected. The latter, of course, assumes that exact interval bounds can be calculated; many confidence interval procedures achieve their nominal coverage only approximately. This is especially true in complex scenarios not covered by this confidence interval calculator.
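The duality between intervals and tests can be verified numerically. The sketch below assumes a normally distributed estimate with a known standard error (hypothetical values 0.8 and 0.3) and checks, for several null values, that a value lies inside the 95% interval exactly when its two-sided p-value is at least 0.05.

```python
from statistics import NormalDist


def ci_and_p(estimate, se, null_value, level=0.95):
    """Two-sided Z-interval and two-sided p-value for H0: value == null_value."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    interval = (estimate - z_crit * se, estimate + z_crit * se)
    z = (estimate - null_value) / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    return interval, p


estimate, se, alpha = 0.8, 0.3, 0.05  # hypothetical estimate and standard error
for null_value in (0.0, 0.2, 0.5, 1.0, 1.5):
    (lo, hi), p = ci_and_p(estimate, se, null_value, level=1 - alpha)
    inside = lo <= null_value <= hi
    print(f"null {null_value}: inside CI = {inside}, p = {p:.4f}, rejected = {p < alpha}")
```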

The above essentially means that **the values outside the interval are the ones we can make inferences about**. For the values within the interval we can only say that they cannot be rejected given the data at hand. When assessing which effect sizes would be refuted by the data, you can construct as many confidence intervals at different confidence levels from the same set of data as you want; this is not a multiple testing issue. A better approach is to calculate the severity criterion of the null of interest, which will also allow you to make decisions about accepting the null.

What then, if our null hypothesis of interest is completely outside the interval? What inference can we make from seeing a result which was quite improbable if the null was true?

**Logically, we can infer one of three things:**

- There is a true effect from the tested treatment or intervention.
- There is no true effect, but we happened to observe a rare outcome.
- The statistical model is invalid (does not reflect reality).

Obviously, one can't simply jump to the first conclusion and claim it with one hundred percent confidence. This would go against the whole idea of the confidence interval. Instead, we can say that at the 95% confidence level (or another level chosen) we can reject the null hypothesis. To use the confidence interval as part of a decision process, you need to consider external factors that are part of the experimental design process, including the choice of confidence level, sample size and power (power analysis), and the expected effect size, among other things.

## Common misinterpretations of confidence intervals

While confidence intervals tend to lead to fewer misinterpretations than p-values, they are still ripe for misuse and misinterpretation. Here are some of the most common ones, according to Greenland et al. [1].

### Probability statements about specific intervals

Strictly speaking, a given interval either contains or does not contain the true value. It would therefore be incorrect to state about a particular 99% (or any other level) confidence interval that there is a 99% probability that it contains the true effect or true value. What you can say is that the procedure used to construct the interval will produce intervals containing the true value 99% of the time.

The complementary statement, that there is just a 1% probability that the true value lies outside the interval, is equally incorrect, as it assigns probability to a hypothesis instead of to the procedure. What you can say is that, whatever the true value is, intervals constructed this way will exclude it only 1% of the time.

### A 95% confidence interval predicts where 95% of estimates from future studies will fall

A confidence interval makes no such prediction, and the probability that outcomes of future experiments fall within any specific interval is usually considerably lower than the interval's confidence level.
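A quick simulation shows the gap. Assuming normally distributed data with a known standard deviation and two studies of equal size (a simplified, hypothetical setup), the sketch below counts how often a replication's mean lands inside the original study's 95% interval.

```python
import random
from statistics import NormalDist


def capture_rate(n=100, sd=1.0, level=0.95, reps=20000, seed=7):
    """Share of simulations in which a replication's sample mean falls inside
    the original study's two-sided Z-interval.

    Assumes both studies have the same size and the same true mean (0), and
    that sigma is known, so each sample mean is exactly N(0, sd^2 / n).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sd / n ** 0.5
    hits = 0
    for _ in range(reps):
        original = rng.gauss(0.0, se)     # mean observed in the original study
        replication = rng.gauss(0.0, se)  # mean observed in a future study
        if abs(replication - original) <= z * se:  # inside the original CI?
            hits += 1
    return hits / reps


capture = capture_rate()
print(f"share of replications inside the original 95% CI: {capture:.3f}")
```

The result is about 0.83 rather than 0.95: the difference between the two means has twice the variance of a single mean, so the capture probability is P(|Z| ≤ 1.96/√2) ≈ 0.834.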

### An interval containing the null is less precise than one excluding it

How precise an interval is does not depend on whether or not it contains the null. The precision of a confidence interval is determined by its width: the narrower the interval, the more precise the estimate drawn from the data.

## One-sided vs. two-sided intervals

While confidence intervals are customarily given in their two-sided form, this can be misleading if we are interested in whether a particular value below or above the interval can be excluded at a given significance level. A one-sided interval, in which one bound is plus or minus infinity, is appropriate when we have a null hypothesis about, or want to make statements about, a value lying **either above or below** the top or bottom bound. By design, a two-sided interval is constructed as the overlap of two one-sided intervals, each at half the error rate [2].

For example, if we have the two-sided 90% interval (2.5, 10), we can actually say that values less than 2.5 are excluded with 95% confidence, precisely because a 90% two-sided interval is nothing more than two conjoined 95% one-sided intervals, (2.5, +∞) and (−∞, 10).
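The equivalence can be checked directly. In this sketch the estimate 6.25 and standard error 2.28 are hypothetical values chosen to give an interval close to (2.5, 10), and the helper names are illustrative; the two-sided 90% bounds coincide with the bounds of the two one-sided 95% intervals.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided(estimate, se, level):
    """Two-sided Z-interval (lower, upper)."""
    z = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return estimate - z * se, estimate + z * se


def one_sided_lower(estimate, se, level):
    """Lower bound of the one-sided interval (lower, +infinity)."""
    return estimate - STD_NORMAL.inv_cdf(level) * se


def one_sided_upper(estimate, se, level):
    """Upper bound of the one-sided interval (-infinity, upper)."""
    return estimate + STD_NORMAL.inv_cdf(level) * se


estimate, se = 6.25, 2.28  # hypothetical values giving roughly (2.5, 10)
print(two_sided(estimate, se, 0.90))
print(one_sided_lower(estimate, se, 0.95), one_sided_upper(estimate, se, 0.95))
```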

Therefore, a directional statement based on a two-sided interval carries a higher confidence level than the interval itself (95% rather than 90% in the example above). In such cases it is better to **use the appropriate one-sided interval** instead, to avoid confusion.

## Confidence intervals for relative difference

When comparing two independent groups and the variable of interest is the relative difference (a.k.a. relative change, percent change, percentage difference), as opposed to the absolute difference between the two means or proportions, different confidence intervals need to be constructed. This is because calculating the relative difference involves an additional division by a random variable, the mean (e.g. the conversion rate) of the control group observed during the experiment, which adds variance to the estimate.

In simulations [3], naively extrapolating a confidence interval with 95% coverage for the absolute difference to the relative difference resulted in coverage between 90% and 94.8%, depending on the size of the true difference. In other words, its error rate ranged from slightly above the nominal 5% to about twice that. At the same time, a properly constructed 95% confidence interval for the relative difference had coverage of about 95%.

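The formula for a confidence interval around the relative difference (percent effect) is [4]:

$$\text{CI}_{\text{RelDiff}} = (\text{RelDiff} + 1)\cdot\frac{1 \pm Z\sqrt{CV_1^2 + CV_2^2 - Z^2\,CV_1^2\,CV_2^2}}{1 - Z^2\,CV_1^2} - 1$$

where $\text{RelDiff}$ is calculated as $\mu_2 / \mu_1 - 1$, $CV_1$ is the coefficient of variation for the control group (the standard error of its mean divided by the mean), $CV_2$ is the coefficient of variation for the treatment group, and $Z$ is the critical value expressed as a standardized score.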
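The relative-difference interval can be sketched as below. It assumes independent groups, treats group 1 as the control, and takes CV as the standard error of each group's mean divided by that mean; the helper name and the example conversion counts are hypothetical. It computes (RelDiff + 1)·(1 ± Z·√(CV₁² + CV₂² − Z²·CV₁²·CV₂²))/(1 − Z²·CV₁²) − 1.

```python
from math import sqrt
from statistics import NormalDist


def relative_difference_ci(mean1, se1, mean2, se2, level=0.95):
    """CI for RelDiff = mean2 / mean1 - 1 (group 1 = control).

    se1 and se2 are standard errors of the two means, so CV = se / mean.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    cv1, cv2 = se1 / mean1, se2 / mean2
    rel_diff = mean2 / mean1 - 1
    denom = 1 - z**2 * cv1**2
    if denom <= 0:  # control mean too uncertain for a finite interval
        raise ValueError("Z * CV1 must be below 1")
    # Positive whenever denom > 0, since Z^2 * CV1^2 * CV2^2 < CV2^2 then
    root = sqrt(cv1**2 + cv2**2 - z**2 * cv1**2 * cv2**2)
    lower = (rel_diff + 1) * (1 - z * root) / denom - 1
    upper = (rel_diff + 1) * (1 + z * root) / denom - 1
    return rel_diff, lower, upper


# Hypothetical conversion data: 100/1000 in control, 120/1000 in treatment
p1, p2, n1, n2 = 0.10, 0.12, 1000, 1000
se1 = sqrt(p1 * (1 - p1) / n1)
se2 = sqrt(p2 * (1 - p2) / n2)
print(relative_difference_ci(p1, se1, p2, se2))
```

With these numbers the relative difference is 20% and the interval runs from about −6.6% to about +55.2%; note that it is not symmetric around the point estimate.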

#### References

[1] Greenland et al. (2016) "Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations", *European Journal of Epidemiology* 31:337–350

[2] Georgiev G.Z. (2017) "One-tailed vs Two-tailed Tests of Significance in A/B Testing", [online] http://blog.analytics-toolkit.com/2017/one-tailed-two-tailed-tests-significance-ab-testing/ (accessed Apr 28, 2018)

[3] Georgiev G.Z. (2018) "Confidence Intervals & P-values for Percent Change / Relative Difference" [online] http://blog.analytics-toolkit.com/2018/confidence-intervals-p-values-percent-change-relative-difference/ (accessed Jun 15, 2018)

[4] Kohavi et al. (2009) "Controlled experiments on the web: survey and practical guide" *Data Mining and Knowledge Discovery* 18:151

#### Cite this calculator & page

If you'd like to cite this online calculator resource and information as provided on the page, you can use the following citation:

Georgiev G.Z., *"Confidence Interval Calculator"*, [online] Available at: https://www.gigacalculator.com/calculators/confidence-interval-calculator.php URL [Accessed Date: 15 Dec, 2018].