P value adjustments for multiple tests in multivariate binomial models

Peter H. Westfall, S. Stanley Young

Research output: Contribution to journal › Article › peer-review

167 Scopus citations

Abstract

Data from rodent carcinogenicity (preclinical) and clinical studies involving new drugs may be modeled as having come from multivariate binomial distributions. In two-year rodent carcinogenicity studies, there are typically 20–50 tissues examined for occurrence of any of several possible lesions. For a particular treatment group, the number of occurrences of a particular lesion at a particular tissue may be modeled as binomial, and the vector of such frequencies may be considered multivariate binomial with unspecified dependence structure. The same model may also apply to clinical side-effects data; in this case the marginal frequencies may represent occurrences of events ranging from headaches to ingrown toenails. Frequently, the goal of such studies is to isolate site-specific significant differences between treatment and control groups. For example, in rodent carcinogenicity analyses it is generally not sufficient to claim that a new compound causes an increase in tumors at some unspecified site; rather, the report should identify the particular sites where unusual increases are noted. Such an analysis requires separate tests for each site. False significances may easily occur when multiple tests are performed. When a marginal significance criterion p ≤ .05 is used, experimentwise false significance rates as large as 44% have been reported (Haseman, Winbush, and O’Donnell 1986). Others have reported the experimentwise false significance rate much lower; for example, Gart, Chu, and Tarone (1979) reported 8%–10% for each sex and species combination of a two-sex, two-species experiment. In this article it is proposed that the experimentwise false significance rate be controlled by adjusting all p values for the multiplicity of testing using vector-based resampling methods. 
This analysis is an extension of the bootstrap method described by Westfall (1985) to the multisample case, with particular application to models useful in clinical and preclinical biopharmaceutical analyses; it is also similar to the methodology proposed by Brown and Fears (1981). Assuming no differences between treatment and control groups (the null case), one may estimate the multivariate binomial distribution or permutation distribution conveniently via vector resampling. Using this estimated distribution, one may easily estimate (via Monte Carlo) the probability that the smallest p value in the study is smaller than any given threshold. An adjusted p value is then defined as the probability that the smallest p value in the study is less than or equal to the observed p value for the given test. This methodology is compared to the usual Bonferroni-style adjustments, and it is demonstrated that these adjustments are grossly conservative in certain instances because of their failure to account for dependence between tests and the discreteness of the data. Results of bootstrap and permutation resampling adjustments tend to be similar, particularly for large sample sizes. The approaches are philosophically different: Bootstrap resampling is preferable if an unconditional analysis is desired [Upton (1982) demonstrated that nominal and actual Type I errors are closer and that statistical power is greater in the univariate two-sample case] whereas permutation resampling gives essentially exact results and is preferable if a conditional analysis is desired [Yates (1984) gave philosophical arguments for favoring the conditional approach].
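
The permutation variant of the adjustment defined in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which marginal test is used at each site, so a one-sided Fisher exact test is assumed here, and the function names (`fisher_upper_p`, `site_p_values`, `minp_adjusted`, `bonferroni`) and the toy data are hypothetical. Each subject's whole response vector is kept intact while group labels are shuffled, which is what preserves the unspecified dependence between sites.

```python
import random
from math import comb


def fisher_upper_p(x_t, n_t, x_c, n_c):
    """One-sided Fisher exact p value: P(treated count >= x_t | fixed margins)."""
    total = x_t + x_c
    denom = comb(n_t + n_c, total)
    hi = min(total, n_t)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(n_t, k) * comb(n_c, total - k) for k in range(x_t, hi + 1))
    return tail / denom


def site_p_values(rows, labels):
    """Marginal p value for every site; rows[i][j] is 1 if subject i has a lesion at site j."""
    n_t = sum(labels)
    n_c = len(labels) - n_t
    p_values = []
    for j in range(len(rows[0])):
        x_t = sum(r[j] for r, g in zip(rows, labels) if g == 1)
        x_c = sum(r[j] for r, g in zip(rows, labels) if g == 0)
        p_values.append(fisher_upper_p(x_t, n_t, x_c, n_c))
    return p_values


def minp_adjusted(rows, labels, n_resamples=2000, seed=0):
    """Monte Carlo estimate of P(min p* <= observed p_j) under label permutation."""
    rng = random.Random(seed)
    observed = site_p_values(rows, labels)
    counts = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(n_resamples):
        rng.shuffle(shuffled)  # reassign groups; response vectors stay whole
        min_p = min(site_p_values(rows, shuffled))
        for j, p in enumerate(observed):
            if min_p <= p + 1e-12:  # small tolerance for floating-point ties
                counts[j] += 1
    return observed, [c / n_resamples for c in counts]


def bonferroni(p_values):
    """Bonferroni adjustment, shown only for comparison."""
    k = len(p_values)
    return [min(1.0, k * p) for p in p_values]


# Hypothetical toy data: 10 treated and 10 control subjects, 3 sites.
treated = [[1, 0, 0]] * 5 + [[1, 1, 0]] + [[0, 1, 0]] + [[0, 0, 1]] + [[0, 0, 0]] * 2
control = [[1, 0, 0]] + [[0, 1, 0]] * 2 + [[0, 0, 0]] * 7
rows = treated + control
labels = [1] * 10 + [0] * 10

if __name__ == "__main__":
    raw, adjusted = minp_adjusted(rows, labels)
    for j, (p, a, b) in enumerate(zip(raw, adjusted, bonferroni(raw))):
        print(f"site {j}: raw={p:.4f}  min-p adjusted={a:.4f}  Bonferroni={b:.4f}")
```

Replacing the label shuffle with draws of whole response vectors, with replacement, from the pooled sample would give a bootstrap (unconditional) counterpart; that variant is not shown here.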

Original language: English
Pages (from-to): 780-786
Number of pages: 7
Journal: Journal of the American Statistical Association
Volume: 84
Issue number: 407
State: Published - Sep 1989

Keywords

  • Bootstrap
  • Clinical trial
  • Permutation test
  • Rodent carcinogenicity study
  • Simultaneous test procedure
