# Z-score to P-value Calculator

Use this **Z to P calculator** to easily convert Z-scores to P-values (one or two-tailed) and see if a result is statistically significant. Z-score to percentile calculator with detailed information on p-values, interpretation, and the difference between one-sided and two-sided percentiles.

## Using the z-score to p-value calculator

If you obtained a Z-score statistic from a given set of data and want to convert it to its corresponding p-value (percentile), this Z to P calculator is right for you. Just enter the Z-score that you know and choose the type of significance test: one-tailed or two-tailed to calculate the corresponding p-value using the normal CPDF (cumulative probability density function of the normal distribution). If you want to make directional inferences (say something about the direction or sign of the effect), select one-tailed, which corresponds to a one-sided composite null hypothesis. If the direction of the effect does not matter, select two-tailed, which corresponds to a point null hypothesis.

Since the normal distribution is symmetrical, it does not matter if you are computing a left-tailed or right-tailed p-value: just select one-tailed and you will get the correct result for the direction in which the observed effect is. If you want the p-value for the other tail of the distribution, just subtract it from 1.

## What is a "p-value"

The p-value is used in the context of a Null-Hypothesis statistical test (NHST) and it is the **probability of observing the result which was observed, or a more extreme one, assuming the null hypothesis is true** ^{1}. It is essentially the proportional area of the Z distribution cut off at the point of the observed Z score. In notation this is expressed as:

**p(x _{0}) = Pr(d(X) > d(x_{0}); H_{0})**

where **x _{0}** is the observed data (x

_{1},x

_{2}...x

_{n}),

**d**is a special function (statistic, e.g. calculating a Z-score),

**X**is a random sample (X

_{1},X

_{2}...X

_{n}) from the sampling distribution of the null hypothesis. This can be visualized in this way:

In terms of possible inferential errors, the p-value expresses the probability of committing a **type I error**: rejecting the null hypothesis if it is in fact true. The p-value is a worst-case bound on that probability. The p-value can be thought of as a percentile expression of a standard deviation measure, which the Z-score is, e.g. a Z-score of 1.65 denotes that the result is 1.65 standard deviations away from the arithmetic mean under the null hypothesis. Therefore, one can think of the p-value as a more user-friendly expression of how many standard deviations away from the normal a given observation is.

## How to interpret a low p-value (statistically significant result)

Saying that a result is **statistically significant** means that the p-value is below the evidential threshold decided for the test before it was conducted. For example, if observing something which would only happen 1 out of 20 times if the null hypothesis is true is considered sufficient evidence to reject the null hypothesis, the threshold will be 0.05. In such case, observing a p-value of 0.025 would mean that the result is statistically significant.

Let us examine what inferences are warranted when seeing a result which was quite improbable if the null was true. **Observing a low p-value can be due to one of three reasons ^{[2]}:**

- There is a true effect from the tested treatment or intervention.
- There is no true effect, but we happened to observe a rare outcome. The lower the p-value, the rarer (less likely, less probable) the outcome.
- The statistical model is invalid (does not reflect reality).

Obviously, one can't simply jump to conclusion 1.) and claim it with one hundred percent certainty, as this would go against the whole idea of the p-value. In order to use the p-value as a part of a decision process you need to consider external factors, which are a part of the experimental design process, which includes deciding on the significance threshold, sample size and power (power analysis), and the expected effect size, among other things.

If you are happy going forward with this much (or this little) uncertainty as is indicated by the p-value, then you have quantifiable guarantees related to the effect and future performance of whatever you are testing.

## Common misinterpretations of p-values

There are several common misinterpretations of p-values and statistical significance and no calculator can save you from falling for them. The most common errors are mistaking a low p-value with evidence for lack of effect or difference, mistaking statistical significance with practical significance, as well as treating the p-value as a probability, related to a hypothesis, e.g. a p-value of 0.05 means that the probability the null hypothesis is true is 5%, or that the probability that the alternative hypothesis is true is 95%. >More details about these misinterpretations.

## Z score to P-value conversion table

Below are some commonly encountered standard scores and their corresponding p-values and percentiles, assuming a **one-tailed hypothesis**.

Standard score (Z) | P-value | Percentile |
---|---|---|

0.8416 | 0.2000 | 80% |

1.2816 | 0.1000 | 90% |

1.6449 | 0.0500 | 95% |

1.9600 | 0.0250 | 97.5% |

2.0537 | 0.0200 | 98% |

2.3263 | 0.0100 | 99% |

3.0902 | 0.0010 | 99.9% |

3.2905 | 0.0005 | 99.95% |

## One-tailed vs. two-tailed p-values

There are wide-spread misconceptions about one-tailed and two-tailed tests, often referred to as one-sided and two-sided hypotheses, and their corresponding p-values ^{[4]}. This is not surprising given that even the Wikipedia article on the topic gets it wrong by stating that one-sided tests are appropriate only if "the estimated value can depart from the reference value in just one direction". Consequently, people often prefer two-sided tests due to the believe that using one-tailed tests results in bias, and/or higher than nominal error rates (type I errors), as well as that it involves more assumptions (about the direction of the effect).

Nothing could be further from the truth. There is **no practical or theoretical situation in which a two-tailed test is appropriate** since for it to be appropriate, the inference drawn or action taken has to be the same regardless of the direction of the effect of interest. That is never the case.

**"A two-sided hypothesis and a two-tailed test should be used only when we would act the same way, or draw the same conclusions, if we discover a statistically significant difference in any direction."** ^{[3]}. Doing otherwise leads to a host of issues, including reporting higher error rates than actual and uncontrolled type III errors: inferring the direction of an effect to be the same as the observed direction, when it is not.

On the other hand, **"A one-sided hypothesis and a one-tailed test should be used when we would act a certain way, or draw certain conclusions, if we discover a statistically significant difference in a particular direction, but not in the other direction."** ^{[5]} which describes all practical and scientific applications of p-values and tests of significance in general. If you carefully formulate your hypotheses, you will find that you always want to state something about the direction of the null and alternative hypothesis.

So, not only do one-tailed tests answer the question you are actually asking, but they are also more efficient, assuming the same significance threshold is used, resulting in %20-%60 faster experiments ^{[3]}. There is no good reason to use two-tailed tests as it results in 20-60% slower tests that answer a question you did not ask. A more in-depth reading and explanations: One-tailed vs Two-tailed Tests of Significance and reference #3 below.

#### References

[1] Fisher R.A. (1935) – "The Design of Experiments", *Edinburgh: Oliver & Boyd*

[2] Georgiev G.Z. (2017) "Statistical Significance in A/B Testing – a Complete Guide", [online] http://blog.analytics-toolkit.com/2017/statistical-significance-ab-testing-complete-guide/ (accessed Apr 27, 2018)

[3] Georgiev G.Z. (2017) "One-tailed vs Two-tailed Tests of Significance in A/B Testing", [online] http://blog.analytics-toolkit.com/2017/one-tailed-two-tailed-tests-significance-ab-testing/ (accessed Apr 27, 2018)

[4] Hyun-Chul Cho Shuzo Abe (2013) "Is two-tailed testing for directional research hypotheses tests legitimate?", *Journal of Business Research* 66:1261-1266

#### Cite this calculator & page

If you'd like to cite this online calculator resource and information as provided on the page, you can use the following citation:

Georgiev G.Z., *"Z-score to P-value Calculator"*, [online] Available at: https://www.gigacalculator.com/calculators/z-score-to-p-value-calculator.php URL [Accessed Date: 20 Oct, 2020].