Sample Size Calculation and Statistical Analysis


Table 1. Examples of Analysis Approaches/Statistical Tests to Use Depending on Number and Type of Dependent and Independent Variables

Number and Type of Independent Variables

Type of Dependent Variable(s)

Types of Analyses/Test(s)

Power Analysis Example

One dependent variable

 0 IVs (1 group/population)

interval & normal

one-sample t-test

PA1

ordinal or interval

one-sample median

 

categorical (2 categories)

binomial test

PA2

categorical

 chi-square goodness-of-fit

PA2

 1 IV with 2 levels (independent groups)

interval & normal

2 independent sample t-test

PA3

 ordinal or interval

2 independent sample t-test

PA3

Wilcoxon-Mann-Whitney test

 

 categorical

 chi-square test

 

Fisher's exact test

 

1 IV with 2 or more levels (independent groups)

interval & normal

one-way ANOVA

 

ordinal or interval

Kruskal-Wallis

 

categorical

chi-square test

PA4

1 IV with 2 levels (dependent / matched groups)

interval & normal

paired t-test 

PA5

 ordinal or interval

Wilcoxon signed ranks test 

 

 categorical

McNemar

 

1 IV with 2 or more levels (dependent  / matched groups)

interval & normal

repeated measures ANOVA

PA9

ordinal or interval

Friedman test

 

categorical

GEE (e.g. repeated measures logistic regression)

PA9

2 or more IVs (independent groups)

interval & normal

multi-factor ANOVA

general linear model (GLM)

multilevel/hierarchical model (MLM/HLM)

structural equation model (SEM)

PA9

ordinal or interval

(treat DV as interval/normal or create categories—use analyses above or below)

 

categorical

logistic regression

PA8

2 or more IVs (dependent groups / repeated measures)

interval & normal

GLM, MLM/HLM, growth models,

growth mixture models

PA10

PA11

 

categorical

generalized estimating equations (GEE), MLM/HLM, growth/mixture models

PA10

PA11

1 interval IV

interval & normal

correlation 

PA6

1 or more interval IVs and/or
1 or more categorical IVs

ordinal or interval

linear/non-linear regression

PA7

 non-parametric correlation

 

categorical

logistic regression

PA8

interval & normal

multiple regression

PA7

1 or more interval IVs and/or
1 or more categorical IVs

categorical

analysis of covariance, GLM, MLM/HLM

PA10, PA11

logistic regression, GEE

PA8

categorical

Dichotomized (yes vs. no)

discriminant analysis

 

Survival (time to event) analysis

PA12

1 IV with 2 or more levels
(independent groups)

interval & normal

MANOVA, GLM, SEM

 

More than one dependent variable

   

2 or more

interval & normal

regression, SEM

 

0

interval & normal

canonical correlation

 

0

categorical

latent variable models for categorical data

 

Two sets of two or more dependent variables

   

0

interval & normal

factor analysis

latent variable models

 

IV: independent variable; GEE: generalized estimating equations; ANOVA: analysis of variance; MANOVA: multivariate analysis of variance; SEM: structural equation model; GLM: general linear model; MLM: multilevel model; HLM: hierarchical linear model. *This chart was adapted from http://www.ats.ucla.edu/stat/mult_pkg/whatstat/choosestat.html which, in turn, was adapted from 'Choosing the Correct Statistic', http://bama.ua.edu/~jleeper/627/choosestat.html by James D. Leeper, Ph.D.


Examples of power analysis

Each example must be specialized to your research design/analysis, e.g. with relevant changes to alpha and power, sample and group sizes, detectable effect size, references and/or software used for calculations, and references to supporting research.

 

PA1. Descriptive (means)

Example 1: A sample size of (e.g., 150) should allow description of ASI drug use severity with a 95% confidence interval of ±0.10 (e.g., mean = 0.85 ± 0.10), assuming a standard deviation of about 0.65, as found in [provide a reference related to your study here].

Example 2: The sample size of (e.g., 1,200) will be sufficient to precisely estimate ASI score differences as small as (e.g., 0.08) standard deviations.

Example 3: The mean of a continuous variable, such as drug use severity, will be precisely estimated within a range of (e.g., 0.16) standard deviations at an alpha of 0.05 and power of 0.80 for a sample size of (e.g., 240).

Example 4: From previous studies, the standard deviation of the mean ASI score may range from 0.1 to 0.2. Sample sizes for each subgroup by county can detect treatment effects of small to medium size. For example, a sample size of (e.g., 240 patients) per county will allow detection of a significant treatment effect when the mean change in ASI score is as small as (e.g., 0.16) standard deviations, given an alpha level of 0.05 and power of 80%.
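
The precision statement in PA1 Example 1 can be checked with a short calculation. The sketch below assumes a normal-approximation confidence interval for a mean (half-width = z × SD / √n); the function name `ci_half_width` is illustrative and not part of any package.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of a normal-approximation confidence interval for a mean."""
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sd / sqrt(n)


# Values from PA1 Example 1: SD of about 0.65 and n = 150.
half_width = ci_half_width(sd=0.65, n=150)
print(f"95% CI half-width: +/-{half_width:.3f}")  # about 0.10
```

Substitute the standard deviation from your own reference study and your planned sample size to obtain the half-width to report.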


PA2.  Descriptive (%)

Example 1: A sample size of 100 should allow description with a 95% confidence interval of about ±10% on questions with a response split of about 50% yes/50% no. The confidence interval will be narrower for questions with a more extreme response split.

Example 2: Rates of categorical variables (e.g., cessation rate) will be precisely estimated within a range of (e.g., 8.4%) for a sample size of (e.g., 276), given an alpha level of 0.05 and power of 80%.

Example 3: The sample (e.g., 215) will also be sufficient to precisely estimate rates (e.g., morbidity, mortality). Given an alpha level of 0.05 and power of 80%, a low rate of 10% will be estimated within a range of ±6%, a moderate rate of 30% will be estimated within a range of ±9%, and an extreme rate of 50% will be estimated within a range of ±10%.
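
The ranges quoted in PA2 Examples 2 and 3 appear consistent with a margin of (z for alpha + z for power) × √(p(1 − p)/n). That reading is inferred from the numbers and is an assumption, since the source does not state the formula. A minimal sketch, with the illustrative function name `proportion_margin`:

```python
from math import sqrt
from statistics import NormalDist


def proportion_margin(p, n, alpha=0.05, power=None):
    """Margin around a proportion p for sample size n (normal approximation).

    With power=None this is the usual confidence-interval half-width.
    With a power value, z for power is added to the multiplier
    (an assumption about how the template figures were derived).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    if power is not None:
        z += NormalDist().inv_cdf(power)
    return z * sqrt(p * (1 - p) / n)


# Example 2: n = 276, rate near 50%.
print(f"{proportion_margin(0.50, 276, power=0.80):.3f}")  # about 0.084

# Example 3: n = 215 at rates of 10%, 30% and 50%.
for rate in (0.10, 0.30, 0.50):
    print(rate, f"{proportion_margin(rate, 215, power=0.80):.2f}")
```

Calling the function without `power` gives the plain confidence-interval half-width, which is the quantity described in Example 1.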

 

PA3. Compare means (independent t-test)

Example 1: A sample size of (e.g., 75 per group) should allow detection of medium effects (of about d=0.46) with power of 0.80, when comparing the experimental and control groups on the CESD scale using a t-test with 2-tailed alpha=0.05. This effect size translates to a difference of about 1.75 points on the CESD assuming a standard deviation of 3.80 (as was found in a related intervention by [provide a reference related to your study here]).

Example 2: The sample of (e.g., 296) per group will enable us to detect group differences of small effects (d = 0.17) with 2-tailed alpha of 0.05 and power of 0.80.

Example 3: The sample of (e.g., 215 African Americans and 84 Whites) will provide moderate power in estimating differences in means of continuous variables (e.g., severity index) or differences in rates (e.g., morbidity). The detectable difference in means will be (e.g., 0.36) standard deviations or larger with an alpha level of 0.05 and power of 80%. The detectable difference in rates will be 18%, 17%, and 13% when rates in the overall population are 50%, 30%, and 10%, respectively.
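
The detectable effect sizes for two independent groups can be approximated as d = (z for alpha + z for power) × √(1/n1 + 1/n2). The sketch below uses this normal approximation rather than the exact noncentral t calculation that dedicated power software performs, so results may differ slightly in the second decimal. The function name `detectable_d` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def detectable_d(n1, n2, alpha=0.05, power=0.80):
    """Smallest standardized mean difference detectable by a 2-tailed
    two-sample comparison (normal approximation to the t-test)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sqrt(1 / n1 + 1 / n2)


# Example 1: 75 per group.
d_equal = detectable_d(75, 75)
print(f"d = {d_equal:.2f}")  # about 0.46
print(f"raw difference at SD 3.80: {d_equal * 3.80:.2f}")

# Example 3: unequal groups of 215 and 84.
print(f"d = {detectable_d(215, 84):.2f}")  # about 0.36
```

Multiplying d by the standard deviation from your reference study converts the effect size to the raw scale of your outcome measure.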


PA4. Compare %

Example 1: A sample size of 100 per group should allow detection of medium effects (e.g., h=0.4) with power of 0.80 when comparing groups on percent return to work, using a chi-square test with 2-tailed alpha=0.05; we should thus be able to detect as significant a 50% return to work in the intervention group if the control group shows 31%, as found in [provide a reference related to your study here].

Example 2: The sample of (e.g., 296) per group will enable us to detect percentage differences between groups of 8% with 2-tailed alpha of 0.05 and power of 0.80.

Example 3: With the sample of (e.g., 215 and 84 in the respective groups), the detectable difference in rates between groups will be 18%, 17%, and 13% when rates in the overall population are 50%, 30%, and 10%, respectively, with 2-tailed alpha of 0.05 and power of 0.80.
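
The effect size h in PA4 Example 1 is Cohen's h, the difference between arcsine-transformed proportions: h = 2·asin(√p1) − 2·asin(√p2). The sketch below computes h for the 50% vs. 31% comparison and the approximate per-group sample size for a given h under a normal approximation. The function names are illustrative.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def cohen_h(p1, p2):
    """Cohen's h effect size for the difference between two proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def n_per_group_for_h(h, alpha=0.05, power=0.80):
    """Approximate per-group n to detect effect size h (2-tailed test)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(2 * (z / h) ** 2)


# Example 1: 50% return to work vs. 31% in the control group.
print(f"h = {cohen_h(0.50, 0.31):.2f}")  # about 0.39

# Per-group n for h = 0.4 comes out close to the 100 per group quoted.
print(n_per_group_for_h(0.40))
```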

 

PA5. Compare means (paired t-test)

Example 1: A sample size of 75 after expected attrition should allow detection of a 1.75-point decrease in CESD scores (medium effect) with power ≥ 0.99, when comparing post-test to pre-test using a paired t-test with 2-tailed alpha=0.05 (assuming a within-group standard deviation of 3.8, as found in [provide a reference related to your study here], with a correlation of 0.5 between pre-test and post-test scores).
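
For a paired design, the effect size depends on the standard deviation of the difference scores, which follows from the within-group standard deviation and the pre-post correlation: SD_diff = SD × √(2(1 − r)). The sketch below applies this to the values in PA5 Example 1 and, as a reference point, computes the approximate number of pairs needed for power of 0.80 under a normal approximation. Variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

sd_within = 3.8      # within-group SD from the example
correlation = 0.5    # assumed pre-post correlation from the example
mean_change = 1.75   # target change in CESD points

# SD of the pre-post difference scores.
sd_diff = sd_within * sqrt(2 * (1 - correlation))

# Standardized effect size for the paired comparison.
dz = mean_change / sd_diff
print(f"SD of differences = {sd_diff:.2f}, dz = {dz:.2f}")

# Approximate number of pairs for 2-tailed alpha = 0.05 and power = 0.80.
z = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)
pairs_for_80 = ceil((z / dz) ** 2)
print(f"pairs needed for power 0.80: about {pairs_for_80}")
```

A higher pre-post correlation shrinks SD_diff and therefore raises the power available from the same number of pairs.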


PA6. Correlation

Example 1: The target sample size of 150 should allow detection of small-to-medium correlation effects of about r=.24 with power 0.80 and 2-tailed alpha=0.05.  Correlations of this magnitude were found in studies of males by [provide a reference related to your study here].

Example 2: In a simple regression analysis, the detectable correlation of 0.2 or larger requires a sample size of 195 with 2-tailed alpha of 0.05 and power of 0.80.
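
Sample sizes for detecting a correlation are commonly approximated with the Fisher z transformation: n ≈ ((z for alpha + z for power) / atanh(r))² + 3. The sketch below uses that approximation; exact t-based calculations in power software can differ by a case or two. Function names are illustrative.

```python
from math import atanh, ceil, sqrt, tanh
from statistics import NormalDist


def _z_sum(alpha, power):
    return NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect correlation r (2-tailed, Fisher z)."""
    return ceil((_z_sum(alpha, power) / atanh(r)) ** 2 + 3)


def detectable_r(n, alpha=0.05, power=0.80):
    """Approximate smallest correlation detectable with n cases."""
    return tanh(_z_sum(alpha, power) / sqrt(n - 3))


# Example 2: r = 0.2 gives an n close to the 195 quoted.
print(n_for_correlation(0.20))

# Example 1: n = 150 gives a detectable r in the low-to-mid 0.2 range.
print(f"{detectable_r(150):.2f}")
```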

 

PA7. Regression

Example 1: A sample size of (e.g., 92) should allow detection of medium effects with power of 0.80 when testing the model predicting depression from 5 independent variables at alpha=0.05; a medium effect translates to R²=.13 or 13% of variance accounted for. The simpler model examined by [provide a reference related to your study here] accounted for 11% of the variance in depression scores.

Example 2: The required sample size for a multiple regression with p covariates will increase, depending on the partial multiple correlation of p-1 covariates. A sample size of (e.g., 240 per county) will allow detection of a correlation of (e.g., 0.19) given a partial multiple correlation less than (e.g., 0.03) with 2-tailed alpha of 0.05 and power of 0.80.

Example 3: The sample of (e.g., 1184) will allow detection of small effects (e.g., R²=.02) in multiple regression analysis with up to (e.g., 25) covariates with 2-tailed alpha of 0.05 and power of 0.80.
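
The "small" and "medium" labels for regression effects correspond to Cohen's f², which is computed from the proportion of variance explained as f² = R² / (1 − R²). Cohen's conventional benchmarks are 0.02 (small), 0.15 (medium), and 0.35 (large). A minimal sketch of the conversion, with an illustrative function name:

```python
def cohen_f2(r_squared: float) -> float:
    """Convert proportion of variance explained (R squared) to Cohen's f squared."""
    return r_squared / (1 - r_squared)


# Example 1: R squared of .13 is close to the medium benchmark of 0.15.
print(f"{cohen_f2(0.13):.3f}")

# Example 3: R squared of .02 is close to the small benchmark of 0.02.
print(f"{cohen_f2(0.02):.3f}")
```

Power software for multiple regression typically asks for f² together with the number of predictors, so this conversion is usually the first step.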


PA8. Logistic Regression

Example 1: A sample size of 200 should allow detection of an odds ratio for a single predictor of approximately 1.6 with power of 0.80, assuming an outcome of [your description here, e.g. “lab values less than xxx”] at 33%, and with alpha=0.05.  The detectable effect size would increase somewhat for assessing a single predictor within a multivariate model, depending on the correlation of the target predictor with the other covariates in the model; e.g. if the target predictor was modestly correlated (about 0.20) with other covariates, then the detectable odds ratio would increase slightly to about 1.7. 

Example 2: Using logistic regression to assess predictors of drug use status (e.g., abstinence), the sample of (e.g., 1184) will allow detection of odds ratios of (e.g., 1.46-1.54) with alpha of 0.05 and power of 0.80 for a predictor controlling for other predictors, assuming moderate correlations of 0.1-0.5 among predictors and assuming a small proportion (e.g., 5%) of respondents who become abstinent. Smaller odds ratios of (e.g., 1.32-1.37) will be detectable with a higher abstinence rate (e.g., 25%).

 

PA9. General Linear Model (with repeated measures)

Example 1: For bodyweight, the analysis sample size will allow detection of an average 5% decrease in weight, a target level of clinical significance proposed by the Institute of Medicine (Committee to Develop Criteria for Evaluating the Outcomes of Approaches to Prevent and Treat Obesity, Food and Nutrition Board, 1995). Our calculation was based on the following: an expectation that about 60% of likely weight loss in the intervention group would occur during the intervention, the remainder during the next 6 months, with maintenance through the 12-month follow-up; a small, consistent weight increase (about 1 lb. per year) in the control group as commonly seen in adult women (Sternfeld, Wang, Quesenberry et al., 2004); likely baseline weight from pilot data (n=45) from the target community (mean weight=158.54, standard deviation 27.94); and correlation of .50 of weight over time as found in the pilot data. For analysis with a general linear model with repeated measures, using alpha=.05, the proposed sample size will achieve power=.80 for detecting a significant difference between groups in the weight patterns described above that include the 5% weight decrease for the intervention group (Cohen, 1988; Woodward et al., 1990).


PA10. Random effect regression for repeated measures

Example 1: The target sample size of 120 per group will allow detection of small-to-medium effects of about d=.32 for a difference in patterns over time between the maintenance group and each of the other two groups, with power=.80 and one-tailed alpha=.05, assuming a moderate correlation of .50 over time and approximately 15% attrition (Hedeker et al., 1999). A slightly larger effect size (d=.34) would be detectable when comparing outcomes at a single point in time (e.g., percentage who present for treatment post-release), allowing for attrition. This effect size translates to a difference in percent between groups of about 15% (e.g., 25% vs. 40%).

Example 2: The proposed sample size of n=75 per group will allow detection of  medium effects of about d=.40 in group differences over time, assuming a moderate correlation of .50 over time and allowing attrition of up to 20% (one-tailed alpha=.05, power=.80) (Hedeker et al., 1999). For the exploratory hypothesis on ART adherence in Aim 3 including only the subsample of HIV+ subjects (approximately 50% of the sample), somewhat larger effects (about d=.58) will be detectable.

Example 3: For exploratory hypothesis testing in this pilot study comparing the electronic monitoring/feedback group to each of the other two treatment arms, the sample size of n=33 per group (total n=99) will allow detection of medium effects (approximately d=.61) with power=.80 and one-tailed alpha of .05, using random effects regression for repeated measures and assuming a moderate correlation of .5 and up to 10% attrition (Hedeker et al., 1999). [from a proposal by D. Koniak-Griffin (PI)]

 

PA11. Cluster sampling/repeated measures

Example 1:  For assessing outcomes measured at the school level or available in aggregate school-level form, the proposed sample size of 24 schools (clusters) will allow the detection of large effects (approximately d=1.0) in testing program differences in change over time, with power=0.80 and 1-tailed alpha=0.05 assuming moderate correlation over time (Hedeker et al., 1999). Other studies have also found a large effect with similar interventions [insert ref. here]; we expect that the longer duration of the proposed intervention will produce an effect of even greater magnitude. For assessing program differences in student-level outcomes, the sample size of approximately 300 students per school will allow detection of  small effects (approximately d=0.25 to 0.35) in repeated measures analyses, accounting for cluster randomization and assuming intraclass correlations of 0.05 to 0.10 (Raudenbush, 1997; Spybrook, Raudenbush et al., 2009). With this large sample, there should be little impact of student attrition on detectable effect size.
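
The effect of cluster randomization on student-level power can be roughed out with the design effect, DEFF = 1 + (m − 1) × ICC, where m is the cluster size. The sketch below is a simplified approximation and not the multilevel method of the cited references. It assumes two equal arms (12 schools each), a single-time-point comparison, and 2-tailed alpha, none of which are specified in the example. Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def design_effect(cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def detectable_d_clustered(n_clusters, cluster_size, icc,
                           alpha=0.05, power=0.80):
    """Approximate detectable d for two equal arms of clustered data
    (assumption: arms are equal and the comparison is cross-sectional)."""
    n_effective = n_clusters * cluster_size / design_effect(cluster_size, icc)
    n_per_arm = n_effective / 2
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sqrt(2 / n_per_arm)


# 24 schools of about 300 students, ICC of 0.05 and 0.10.
for icc in (0.05, 0.10):
    d = detectable_d_clustered(24, 300, icc)
    print(f"ICC={icc}: DEFF={design_effect(300, icc):.2f}, d={d:.2f}")
```

Under these assumptions the detectable effects fall in roughly the same range as the d=0.25 to 0.35 quoted, which illustrates how strongly the intraclass correlation drives power in cluster designs.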

 

PA12. Survival Analysis (Cox Model)

Example 1: Assuming moderate partial correlations of 0.1 to 0.5 among predictors, the sample of (e.g., 5,419) will allow detection of hazard ratios of (e.g., 1.21 to 1.28) in a survival analysis with 2-tailed alpha of 0.05 and power of 0.80. 

Example 2: For the event history analysis on outcome variables (e.g. eventual cessation), the sample of (e.g., 192) would enable us to detect hazard ratios of (e.g., 1.89 to 2.84) in comparing survival curves between two groups (e.g. Whites and Hispanics), with 2-tailed alpha of 0.05, power of 0.80 and cessation rate of (e.g., 15 to 40%).
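
For a two-group survival comparison, power depends on the number of events rather than the number of subjects. Schoenfeld's approximation gives the required events as (z for alpha + z for power)² / (p1 × p2 × (ln HR)²), where p1 and p2 are the group allocation proportions. The sketch below inverts this to find the detectable hazard ratio, assuming equal allocation (an assumption; Example 2 does not state the group sizes). The function name is illustrative.

```python
from math import exp, sqrt
from statistics import NormalDist


def detectable_hazard_ratio(n, event_rate, allocation=0.5,
                            alpha=0.05, power=0.80):
    """Detectable hazard ratio from Schoenfeld's events formula.

    allocation is the proportion of subjects in the first group
    (0.5 means equal groups, assumed here).
    """
    events = n * event_rate
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    log_hr = z / sqrt(events * allocation * (1 - allocation))
    return exp(log_hr)


# Example 2: n = 192 with cessation (event) rates of 15% and 40%.
print(f"{detectable_hazard_ratio(192, 0.15):.2f}")  # about 2.84
print(f"{detectable_hazard_ratio(192, 0.40):.2f}")  # about 1.90
```

With these inputs the results are close to the 1.89 to 2.84 range quoted in Example 2.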
