Brief Introduction to Z-Tests


In our previous article Power and Sample Size in a Nutshell we gave a broad overview of power and sample size calculations. Z-tests are among the most basic of statistical hypothesis testing methods, and are often taught at an introductory level. Nevertheless, they still hold an important position in the realm of applied statistical analysis. They offer, at the very least, a good starting point for many power analysis and sample size determination situations. In this article we go over some z-test basics that will help set up other articles dealing with z-test formula derivations.

Relevance

A z-test is used to compare the mean of a normal random variable to a specified value, $\mu_0$. But don't get hung up on the "normal random variable" part. Z-tests can be used in situations where the data are generated from other distributions, such as binomial and Poisson. This is thanks to properties of maximum likelihood estimators.

Name

The name "z-test" comes from the fact that inference is based on the standard normal distribution, and "Z" is the traditional symbol used to represent a standard normal random variable. Z-tests were originally important because they gave researchers an easy way to perform statistical hypothesis testing when computing power was more limited: p-values and quantiles are easily obtained from standard normal tables.

Notation and Assumptions

Data

Let's represent the data by $Y_1, Y_2, \dots, Y_n$, which may or may not come from a normal distribution.

  • Normal Data: If the data can be thought of as independent observations from a common normal distribution, then a z-test helps us make inference about the true mean.
  • Binomial Data: If the data represent independent binary outcomes, where each outcome has the same success probability, then a z-test helps us make inference about the proportion of success, or success probability $p$.
  • Poisson Data: If the data represent counts or rates that can reasonably be assumed to be independent observations from the same Poisson distribution, then a z-test helps us make inference about the true rate $\lambda$.

Test Statistic

Regardless of the distribution from which the data are generated, we assume there is a test statistic, $X$, that is a normal random variable with mean $\mu$ and variance $\sigma^2_n$. $$X\sim N\left(\mu,\sigma^2_n\right)$$ Here, $X$ is a function of the data rather than the data itself. For example, $X$ might be the average, or the maximum likelihood estimator of a parameter of interest. The assumption is that the probability distribution of $X$ can be reasonably approximated by the normal distribution, regardless of the distribution of the data. We include an "$n$" subscript on the variance to emphasize that it depends on sample size, $n$.
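The setup above can be sketched with a small simulation. The code below assumes normal data, so that $X$ is the sample mean and $X\sim N(\mu,\sigma^2/n)$; the particular values of $\mu_0$, $\sigma$, and $n$ are made up for illustration.

```python
import math
import random

# A minimal simulation sketch: with normal data, the test statistic X is the
# sample mean, so X ~ N(mu, sigma^2 / n). The values of mu_0, sigma, and n
# below are illustrative assumptions, not taken from the article.
random.seed(42)
mu_0, sigma, n = 10.0, 2.0, 50
data = [random.gauss(mu_0, sigma) for _ in range(n)]   # generated under H0

x = sum(data) / n                # X: the test statistic (the sample mean)
sigma_n = sigma / math.sqrt(n)   # square root of sigma^2_n = sigma^2 / n
z = (x - mu_0) / sigma_n         # standardized statistic, ~ N(0, 1) under H0
```

Since the data were generated under the null, the standardized value `z` behaves like a draw from a standard normal distribution.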

Assumptions

In addition to the assumption that $X$ is a normal random variable, we also assume that the variance $\sigma^2_n$ is known. In practice, we rarely know the true value of $\sigma^2_n$. However, if we have a good guess, perhaps from a pilot study or from subject-domain knowledge, then it is reasonable to use this value in power and sample size determinations. Or, better yet, to use a range of "best guesses" for $\sigma^2_n$ and assess the sensitivity of power and/or sample size to the different possible values of $\sigma^2_n$. Our calculators are particularly well-suited for these types of analyses, since our graphs allow quick visual interpretations of this sensitivity. If we don't know what the value of $\sigma^2_n$ should be, it is best to use a t-test, which accounts for having to estimate $\sigma^2_n$.
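The sensitivity analysis described above can be sketched as follows, using the standard one-sided z-test power formula $\Phi\!\left(\sqrt{n}\,(\mu_1-\mu_0)/\sigma - z_{1-\alpha}\right)$. All of the numbers here ($\mu_0$, $\mu_1$, $n$, $\alpha$, and the range of guesses for $\sigma$) are illustrative assumptions.

```python
from statistics import NormalDist

# A hedged sketch of a sensitivity analysis: vary the "best guess" for sigma
# and watch how power changes. Uses the standard one-sided z-test power
# formula: power = Phi(sqrt(n) * (mu_1 - mu_0) / sigma - z_{1-alpha}).
# All numeric inputs are made up for illustration.
std = NormalDist()                       # standard normal: cdf and inv_cdf
mu_0, mu_1, n, alpha = 0.0, 0.5, 30, 0.05
z_crit = std.inv_cdf(1 - alpha)          # z_{1-alpha}

for sigma in (0.8, 1.0, 1.2, 1.5):
    power = std.cdf((n ** 0.5) * (mu_1 - mu_0) / sigma - z_crit)
    print(f"sigma = {sigma:>4}: power = {power:.3f}")
```

As expected, a larger assumed $\sigma$ yields lower power at the same sample size, which is exactly the kind of sensitivity the graphs on this site make easy to see.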

Hypotheses

Several pairs of hypotheses are often used in practice, and below we list 5 of these. The one-sided alternative hypotheses are used whenever we're only interested in whether the mean is greater than or less than the null-hypothesis value. For example, we may know that the mean of an existing medical treatment or marketing approach is $\mu_0$, and we're only interested in whether the mean positive response is now greater than the old mean of $\mu_0$. On the other hand, we may know that an existing pharmaceutical or biologic product has a certain adverse event rate, and we're only interested in whether the adverse event rate of a new product is less than that of the existing product. The two-sided "equality" alternative is perhaps the simplest, and is useful when we're interested in whether the mean equals $\mu_0$.


1-Sided, greater than

$$H_0: \mu=\mu_0$$ $$H_1: \mu>\mu_0$$

1-Sided, less than

$$H_0: \mu=\mu_0$$ $$H_1: \mu<\mu_0$$

2-Sided, equality

$$H_0: \mu=\mu_0$$ $$H_1: \mu\neq\mu_0$$

Non-inferiority or Superiority

$$H_0: \mu\le\mu_0+\delta$$ $$H_1: \mu\gt\mu_0+\delta$$

Equivalence

$$H_0: |\mu-\mu_0|\ge\delta$$ $$H_1: |\mu-\mu_0|\lt\delta$$

The non-inferiority/superiority and equivalence hypotheses are a bit more complex at first sight. These are frequently found in the literature of clinical research. The non-inferiority/superiority test seeks to determine whether the mean is larger than $\mu_0$ by an amount of at least $\delta$. The reason this might be useful is that small differences may not be meaningful in a given subject-domain; for example, in a clinical research study, a small difference might be found to be "statistically significant" in a highly-powered study, yet be clinically insignificant in that it would not affect treatment or policy decisions. Whether $\delta$ represents a "non-inferiority margin" or a "superiority margin" depends on whether $\delta$ is positive or negative, and on the context in the given subject domain. In an equivalence study we're interested in whether the difference between the true and hypothesized means is smaller than a specified margin $\delta$, regardless of direction (i.e. regardless of whether $\mu$ or $\mu_0$ is larger).
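The p-values for these hypothesis pairs can be sketched in a few lines. The observed statistic, null mean, standard deviation, and margin below are made-up numbers for illustration; the equivalence case is handled with the usual two one-sided tests (TOST) construction.

```python
from statistics import NormalDist

# Illustrative sketch: p-values for the hypothesis pairs listed above.
# x, mu_0, sigma_n, and delta are made-up numbers for illustration only.
phi = NormalDist().cdf
x, mu_0, sigma_n, delta = 10.6, 10.0, 0.25, 0.3

z = (x - mu_0) / sigma_n               # standardized test statistic

p_greater = 1 - phi(z)                 # 1-sided, H1: mu > mu_0
p_less = phi(z)                        # 1-sided, H1: mu < mu_0
p_two_sided = 2 * (1 - phi(abs(z)))    # 2-sided, H1: mu != mu_0

# Non-inferiority/superiority: shift the null value by delta.
p_shifted = 1 - phi((x - mu_0 - delta) / sigma_n)   # H1: mu > mu_0 + delta

# Equivalence via two one-sided tests (TOST): both one-sided nulls must be
# rejected, so the overall p-value is the larger of the two.
p_lower = 1 - phi((x - (mu_0 - delta)) / sigma_n)   # H1: mu > mu_0 - delta
p_upper = phi((x - (mu_0 + delta)) / sigma_n)       # H1: mu < mu_0 + delta
p_equiv = max(p_lower, p_upper)
```

Note that the two one-sided p-values always sum to one, and the two-sided p-value is twice the smaller of them.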

Further Notation

In our derivations we use standard notation that could be a bit confusing for less familiar readers. Since these notations are used in several articles dealing with deriving z-test formulas, we'll explain them here:

  • $z_a$ is the quantile of the standard normal distribution such that the area under the curve to the left of $z_a$ is $a$. These quantiles are easily obtained from a standard normal distribution table. For example, if $\alpha=0.05$, then we have the familiar values of $z_{1-\alpha}=1.645$ and $z_{1-\alpha/2}=1.96$.

  • $\Phi$ is the standard normal distribution function. That is, if $X\sim N(0,1)$, then $Pr(X\le z)=\Phi(z)$.

  • $Pr(X\le z|H_0)$ means "the probability that $X$ is less than or equal to $z$ given that the null-hypothesis is true". For example, if the null hypothesis is that $X\sim N(\mu_0,\sigma^2_n)$, then $Pr\left(\frac{\displaystyle X-\mu_0}{\displaystyle\sigma_n}\le z|H_0\right)=\Phi(z)$.
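The notation above maps directly onto Python's standard library, where `NormalDist().cdf` plays the role of $\Phi$ and `inv_cdf` plays the role of the quantile function $z_a$:

```python
from statistics import NormalDist

# Sketch of the notation above using the stdlib standard normal:
# Phi is the cdf, and z_a is its inverse (the a-quantile).
std = NormalDist()                        # N(0, 1)
alpha = 0.05

z_one_sided = std.inv_cdf(1 - alpha)      # z_{1-alpha}   ~ 1.645
z_two_sided = std.inv_cdf(1 - alpha / 2)  # z_{1-alpha/2} ~ 1.96

# Phi and the quantile function are inverses of each other, so applying
# the cdf to z_{1-alpha/2} recovers 1 - alpha/2 = 0.975.
check = std.cdf(z_two_sided)
```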

Since there is no closed-form formula for $\Phi$, all software, including this site, uses an approximation. Fortunately, since the normal distribution is so central to modern statistics, a great deal of work has been put into finding very good approximations for $\Phi$ by a lot of really smart statisticians. At the time of writing, our site uses the famous formula of Abramowitz & Stegun (1964) 7.1.26, which is accurate to roughly 7 decimal places -- anyone interested can view this in our JavaScript code.
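For the curious, here is a sketch of that approximation. Abramowitz & Stegun 7.1.26 approximates $\mathrm{erf}(x)$ for $x\ge 0$ with absolute error below $1.5\times 10^{-7}$, and $\Phi$ follows from $\Phi(z)=\tfrac{1}{2}\left(1+\mathrm{erf}(z/\sqrt{2})\right)$ together with the symmetry of the normal distribution:

```python
import math

# Approximating Phi via Abramowitz & Stegun formula 7.1.26, which
# approximates erf(x) for x >= 0 with absolute error below 1.5e-7.
def erf_as(x):
    p = 0.3275911
    a = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)
    t = 1.0 / (1.0 + p * x)
    poly = sum(a_k * t ** (k + 1) for k, a_k in enumerate(a))
    return 1.0 - poly * math.exp(-x * x)

def phi(z):
    """Standard normal cdf via Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    if z < 0:
        return 1.0 - phi(-z)     # symmetry of the standard normal
    return 0.5 * (1.0 + erf_as(z / math.sqrt(2.0)))

print(round(phi(1.96), 4))       # ~ 0.975
```

Comparing against `math.erf` (which most modern languages now provide) shows the two agree to well within the stated error bound.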

Normal, Binomial, and Poisson Examples

Suppose the data are $Y_1, Y_2, \dots, Y_n \overset{iid}\sim N(\mu,\sigma^2)$. Then the test statistic is the average, $X=\bar{Y}=\frac{1}{n}\sum_{i=1}^{n}{Y_i},$ and we know that $$X=\bar{Y}\sim N(\mu,\sigma^2/n),$$ and thus $\sigma^2_n=\sigma^2/n.$ We may also use z-tests for a binomial success probability, $p$, and a Poisson rate, $\lambda$. If the data represent independent binary trials, each having the same success probability $p$, then $$X=\bar{Y}\sim N\left(p,\frac{\displaystyle p(1-p)}{\displaystyle n}\right);$$ if the data represent independent Poisson random variables, each having the same rate $\lambda$, then $$X=\bar{Y}\sim N\left(\lambda,\frac{\displaystyle\lambda}{n}\right).$$ The distribution of $\bar{Y}$ when the data are binomial and Poisson are large-sample approximations, and hold true due to the fact that $\bar{Y}$ is the maximum likelihood estimator of $p$ and $\lambda$, respectively.
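The binomial and Poisson cases above can be sketched as two-sided z-tests, plugging the null-hypothesis mean into the variance formulas $p(1-p)/n$ and $\lambda/n$. The observed counts and null values below are made-up illustration numbers.

```python
from statistics import NormalDist

# Illustrative sketch of two-sided z-tests for a binomial proportion and a
# Poisson rate, using the large-sample normal approximations above.
# The counts and null values are made up for illustration.
phi = NormalDist().cdf

def z_test_two_sided(x_bar, mu_0, var_n):
    """Two-sided z-test p-value from the statistic, null mean, and var of X."""
    z = (x_bar - mu_0) / var_n ** 0.5
    return 2 * (1 - phi(abs(z)))

n = 200

# Binomial: 120 successes out of n trials, testing H0: p = 0.5.
p_hat, p_0 = 120 / n, 0.5
p_binom = z_test_two_sided(p_hat, p_0, p_0 * (1 - p_0) / n)  # null variance

# Poisson: 230 total events over n unit intervals, testing H0: lambda = 1.
lam_hat, lam_0 = 230 / n, 1.0
p_pois = z_test_two_sided(lam_hat, lam_0, lam_0 / n)         # null variance
```

Note the design choice of evaluating the variance at the null value ($p_0$ and $\lambda_0$) rather than at the estimate; both conventions appear in practice, and the next paragraph explains why having the variance tied to the mean is convenient either way.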

Also, note that when the data are binomial or Poisson the question of whether or not we have a good guess for the variance is somewhat mitigated by the fact that the variance is a function of the mean. This is helpful because the power and sample size formulas use the null- and alternative-hypothesis values of the mean, which must be specified regardless of whether we know the variance.

Derivations for Specific Z-Tests

Concluding Remarks

This is a basic, elementary statistical topic, and there are many possible ways to present this material. We truly hope you found this content useful for your needs. We welcome all comments, suggestions, corrections, and questions. Please feel free to leave a note in the comments section below, or via email.



