Same Data, Different Analysts: Variation in Effect Sizes Due to Analytical Decisions in Ecology and Evolutionary Biology

Research Background and Question

In scientific research, and particularly in ecology and evolutionary biology, the replicability and reliability of results are crucial. Yet even when different researchers use the same dataset and address similar research questions, differences in their statistical analysis decisions can lead to markedly different results. This phenomenon has previously been documented in psychology and the social sciences, and it applies to ecology and evolutionary biology as well. To explore this issue, Gould et al. (2025) published a study titled “Same Data, Different Analysts: Variation in Effect Sizes Due to Analytical Decisions in Ecology and Evolutionary Biology” in the journal BMC Biology.

The study aims to evaluate how different analysts handling the same dataset can produce varying effect sizes and model predictions due to their analytical decisions. By comparing multiple analysts’ results on the same dataset, the researchers hope to uncover the causes of these variations and explore ways to enhance the reliability and consistency of studies in ecology and evolutionary biology.

Source and Author Information

This paper was co-authored by Elliot Gould, Hannah S. Fraser, Timothy H. Parker, and many other scientists from research institutions around the world. Key authors include Timothy H. Parker of Whitman College and Elliot Gould, Hannah S. Fraser, Fiona Fidler, and Peter A. Vesk of the University of Melbourne. The paper was published in the journal BMC Biology in 2025.

Research Process and Methods

Study Subjects and Datasets

Researchers selected two unpublished datasets for analysis:

  1. Blue Tit Dataset: This dataset comes from a study of the breeding behavior of wild blue tits (Cyanistes caeruleus) in Wytham Woods, UK, involving 332 nests over the years 2001-2003. The goal was to investigate the relationship between nestling growth and sibling competition.

  2. Eucalyptus Dataset: This dataset originates from a vegetation restoration project in the Goulburn Broken Catchment region of Victoria, Australia, covering 351 quadrats surveyed between 2006 and 2007. The aim was to examine the impact of grass cover on eucalyptus seedling recruitment.

Analysis Procedure

Recruiting Analysts

The researchers recruited 174 analyst teams, comprising 246 analysts, through various channels such as academic conferences, social media, and email lists. Each team chose one of the two datasets to analyze and answered predefined research questions. To ensure quality, they also recruited volunteers to peer review the methods used by other analysts.

Data Processing and Analysis

Each analyst team independently analyzed their chosen dataset according to their preferred methods and submitted detailed analysis reports. To ensure comparability, researchers required analysts to provide a standardized effect size (Zr) and out-of-sample point predictions (yi) at three specified values of their primary independent variable. Specific steps included:

  1. Calculating the Standardized Effect Size Zr: For linear or generalized linear models, convert the t-value and degrees of freedom (df) into a correlation coefficient (r), then transform it into Fisher’s Zr.
  2. Generating Predicted Values yi: Provide point estimates of the dependent variable at the 25th percentile, median, and 75th percentile of the primary independent variable (a minimal sketch of both steps follows this list).
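
Both conversions above are standard: r = t / sqrt(t² + df), Zr = arctanh(r) = 0.5 × ln((1 + r) / (1 − r)), and the sampling variance of Zr is approximately 1 / (n − 3), where n is the sample size. The Python sketch below illustrates the two steps; the function names and the example numbers are illustrative assumptions, not code or data from the study.

```python
import numpy as np

def t_to_zr(t, df, n):
    """Convert a t-statistic and its degrees of freedom to Fisher's Zr.

    r  = t / sqrt(t^2 + df)                  (correlation implied by the t-test)
    Zr = arctanh(r) = 0.5 * ln((1 + r) / (1 - r))
    The sampling variance of Zr is approximately 1 / (n - 3).
    """
    r = t / np.sqrt(t**2 + df)
    zr = np.arctanh(r)
    var_zr = 1.0 / (n - 3)
    return zr, var_zr

def prediction_scenarios(x):
    """25th percentile, median, and 75th percentile of the primary
    independent variable -- the three values at which point predictions
    (y25, y50, y75) were requested."""
    return np.percentile(x, [25, 50, 75])

# Illustrative example; the numbers below are made up, not study data.
zr, var_zr = t_to_zr(t=-2.1, df=320, n=330)
x = np.random.default_rng(1).normal(size=330)  # stand-in predictor values
print(zr, var_zr, prediction_scenarios(x))
```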

Result Analysis

Researchers used random-effects meta-analysis techniques to synthesize all submitted effect sizes and predicted values. The main analyses included:

  1. Descriptive Statistics: Calculating means, standard deviations, and ranges of the numbers of fixed effects, interaction terms, and random effects, and of the sample sizes, across the submitted models.
  2. Heterogeneity Assessment: Quantifying absolute (τ²) and relative (I²) heterogeneity among effect sizes (a simplified illustration follows this list).
  3. Deviation Explanation: Evaluating whether factors such as peer ratings of the methods, the distinctiveness of the predictor variables chosen, and the inclusion of random effects explain deviations from the meta-analytic mean effect size.
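
The study’s actual meta-analytic models (fitted to the submitted Zr values and their sampling variances) were more elaborate than a textbook estimator, but the way τ² and I² partition variation into between-analysis heterogeneity and sampling error can be illustrated with a simple DerSimonian-Laird sketch. Everything below is a minimal stand-in under that assumption, with placeholder inputs rather than the study’s data.

```python
import numpy as np

def random_effects_meta(zr, var_zr):
    """DerSimonian-Laird random-effects meta-analysis of effect sizes.

    Returns the pooled effect, the absolute heterogeneity tau^2, and the
    relative heterogeneity I^2 (percentage of total variation among effects
    not attributable to sampling error).
    """
    zr, var_zr = np.asarray(zr, float), np.asarray(var_zr, float)
    k = len(zr)
    w = 1.0 / var_zr                           # fixed-effect weights
    mu_fe = np.sum(w * zr) / np.sum(w)         # fixed-effect pooled mean
    Q = np.sum(w * (zr - mu_fe) ** 2)          # Cochran's Q statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)         # absolute heterogeneity
    i2 = max(0.0, (Q - (k - 1)) / Q) * 100.0   # relative heterogeneity (%)
    w_re = 1.0 / (var_zr + tau2)               # random-effects weights
    mu_re = np.sum(w_re * zr) / np.sum(w_re)   # pooled random-effects mean
    return mu_re, tau2, i2

# Placeholder inputs: one Zr and one sampling variance per analysis team.
rng = np.random.default_rng(0)
zr = rng.normal(-0.35, 0.3, size=60)           # made-up effect sizes
var_zr = rng.uniform(0.003, 0.02, size=60)     # made-up sampling variances
print(random_effects_meta(zr, var_zr))
```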

Research Results

Distribution of Effect Sizes

For the blue tit dataset, although most (118 of 131) usable Zr effects indicated that nestling growth decreased with increased sibling competition, there was considerable variability in both the strength and the direction of the effects. Zr ranged from -1.55 to 0.38, and roughly 93 of the 118 effects had confidence intervals excluding zero. For the eucalyptus dataset, the distribution of effects was more dispersed, with Zr ranging from -4.47 to 0.39 but mostly clustering around zero, indicating no clear relationship between grass cover and eucalyptus seedling success.

Distribution of Predicted Values

Predicted values for the blue tit dataset, after z-score standardization, varied widely, spanning well over one standard deviation: in the y25 scenario, predictions ranged from -1.84 to 0.42, and in the y75 scenario from -0.03 to 1.59. Predicted values for the eucalyptus dataset, on the original count scale, ranged from 0.04 to 26.99, 0.04 to 44.34, and 0.03 to 61.34 across the three scenarios.

Quantifying Heterogeneity

Using the τ² and I² metrics, researchers found substantial heterogeneity among the submitted effect sizes. τ² was 0.08 for the blue tit dataset and 0.27 for the eucalyptus dataset, values of a similar magnitude to, or larger than, the median total heterogeneity (0.105) reported in a previous survey of meta-analyses. This suggests that analytical decisions alone can contribute substantially to heterogeneity.

Research Conclusions

The study demonstrates that different analytical decisions can indeed lead to substantial variation in effect sizes: even when working from the same dataset, different analysts may produce markedly different results. Future research should therefore pay closer attention to how analytical choices are made, reported, and justified in order to improve reliability and consistency.

Additionally, the study underscores the importance of transparency and openness. Sharing data and analysis code invites broader participation and makes the uncertainty introduced by analytical decisions easier to assess. Future work could further explore how to optimize analysis pipelines to minimize heterogeneity and enhance replicability.

Research Highlights

  1. First Large-Scale Exploration: This is the first large-scale “many-analyst” study in ecology and evolutionary biology, revealing the significant impact of analytical decisions on effect sizes.
  2. Broad Participation: The study attracted numerous scientists worldwide, ensuring diversity and representativeness in analytical methods.
  3. Methodological Rigor: Standardized effect sizes, out-of-sample predictions, and meta-analytic synthesis were combined to make the many independent analyses directly comparable and the conclusions reliable.
  4. Important Implications: The findings offer valuable insights for future research, emphasizing the importance of transparency and openness to improve reliability and consistency.

This study not only highlights the impact of analytical decisions on effect sizes but also provides practical guidance for future research, giving it considerable scientific value and broad relevance.