A cautionary note on the finite sample behavior of maximal reliability.

Several calls have been made to replace coefficient α with more contemporary model-based reliability coefficients in psychological research. Under the assumption of unidimensional measurement scales and independent measurement errors, two leading alternatives are composite reliability and maximal reliability. Of these two, the maximal reliability statistic, or equivalently Hancock’s H, has received considerable attention in recent years. The difference between the two is that composite reliability is a reliability index for a scale mean (or unweighted sum), whereas maximal reliability estimates the reliability of a scale score in which indicators are weighted differently based on their estimated reliabilities. The formula for the maximal reliability weights has been derived using population quantities; however, the finite-sample behavior of these weights has not been extensively examined. In particular, two types of bias arise when the maximal reliability statistic is calculated from sample data: (a) the sample maximal reliability estimator is a positively biased estimator of population maximal reliability, and (b) the true reliability of composites formed with maximal reliability weights calculated from sample data is, on average, less than the population reliability. Both effects are more pronounced in small samples (e.g., sample sizes below 100). We also demonstrate that the composite reliability estimator for an equally weighted composite exhibits substantially less bias, which makes it a more appropriate choice for the small-sample case. (PsycINFO Database Record (c) 2019 APA, all rights reserved)
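
The contrast between the two coefficients can be illustrated with a minimal sketch at the population level. It assumes a one-factor model with factor variance fixed at 1 and uncorrelated errors, and uses the standard textbook expressions: composite reliability is (Σλ)² / ((Σλ)² + Σθ), and maximal reliability is S / (1 + S) with S = Σ λ²/θ, attained by weights proportional to λ/θ. The function names and the example loadings are hypothetical and are not taken from the article; the sketch does not reproduce the article's finite-sample results.

```python
def composite_reliability(loadings, error_vars):
    """Reliability of the unweighted sum under a one-factor model
    (factor variance assumed to be 1, errors assumed uncorrelated)."""
    true_var = sum(loadings) ** 2
    return true_var / (true_var + sum(error_vars))


def maximal_reliability(loadings, error_vars):
    """Hancock's H: reliability of the optimally weighted composite."""
    s = sum(lam ** 2 / theta for lam, theta in zip(loadings, error_vars))
    return s / (1.0 + s)


def weighted_reliability(weights, loadings, error_vars):
    """Reliability of an arbitrary weighted composite sum(w_i * x_i)."""
    true_var = sum(w * lam for w, lam in zip(weights, loadings)) ** 2
    err_var = sum(w ** 2 * theta for w, theta in zip(weights, error_vars))
    return true_var / (true_var + err_var)


# Hypothetical standardized loadings, so each error variance is 1 - loading**2.
loadings = [0.8, 0.7, 0.6, 0.5]
error_vars = [1.0 - lam ** 2 for lam in loadings]

omega = composite_reliability(loadings, error_vars)
h = maximal_reliability(loadings, error_vars)

# Maximal reliability weights are proportional to loading / error variance.
optimal_weights = [lam / theta for lam, theta in zip(loadings, error_vars)]
h_from_weights = weighted_reliability(optimal_weights, loadings, error_vars)

print(f"composite reliability: {omega:.4f}")
print(f"maximal reliability:   {h:.4f}")
```

With known population loadings, the optimally weighted composite is at least as reliable as the unweighted sum. The bias discussed above arises because, in practice, the loadings and error variances fed into these formulas are themselves sample estimates.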