St. Jude Survivorship Portal: Sharing and Analyzing Large Clinical and Genomic Datasets from Pediatric Cancer Survivors

Research Background

In the United States, the five-year survival rate for childhood cancer has risen from about 60% in the 1970s to more than 85% today. Despite this improvement, survivors face a range of adverse health outcomes caused by the cancer and its treatment, including premature death, organ dysfunction, new tumors, poor socioeconomic outcomes, psychosocial challenges, and reduced quality of life. Research on these problems focuses on identifying their underlying causes and associated risks and on pinpointing the most vulnerable patient subgroups.

Large-scale longitudinal studies such as the St. Jude Lifetime Cohort (SJLife) and the Childhood Cancer Survivor Study (CCSS) have generated comprehensive data on survivors, covering demographics, diagnosis, treatment, clinical assessments, chronic health conditions, self-reported outcomes, and whole-genome sequencing (WGS). These datasets are an invaluable resource for the survivorship research community and have contributed to hundreds of publications over the past 25 years.

Study Origin

This study, titled “St. Jude Survivorship Portal: Sharing and Analyzing Large Clinical and Genomic Datasets from Pediatric Cancer Survivors,” was conducted by Gavriel Y. Matt, Edgar Sioson, Kyla Shelton, Jian Wang, Congyu Lu, and colleagues at St. Jude Children’s Research Hospital and other research institutions. It was published in the journal “Cancer Discovery.”

Research Process

To make cancer survivor data more accessible, the research team developed the St. Jude Survivorship Portal, the first portal for sharing and exploring pediatric cancer survivor data. The portal contains extensive clinical data and uniformly processed germline WGS genotype data for more than 7700 pediatric cancer survivors.

Data Sharing and Features

  1. Types of Data: The portal integrates data from two survivor cohorts, comprising 5053 SJLife survivors and 2688 CCSS survivors (7741 in total). The variables span demographics, cancer diagnosis, treatment, clinical assessments, chronic health conditions, and self-reported outcomes, amounting to more than 1600 phenotypic variables and 400 million genetic variants.

  2. Data Visualization and Analysis: Users can browse these data through an interactive data dictionary and a gene browser within the portal. Summary statistics for the selected variables are computed on the fly and displayed in customizable interactive charts, including bar charts, violin plots, and scatter plots.

  3. Analysis Tools: The portal provides analysis tools, including cumulative incidence analysis and regression analysis, that users can run on the survivor data in real time. In addition, a controlled-access interface lets users download individual-level survivor data for offline analysis.
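
The cumulative incidence tool is described only at a high level here. As a minimal sketch, assuming competing events (for example, death from other causes) are handled with the Aalen-Johansen estimator, a common choice in survivorship studies but an assumption about this portal, the cumulative incidence of the event of interest at time t is the sum over event times t_j ≤ t of S(t_j−) × d_j / n_j, where S(t_j−) is the overall event-free survival just before t_j, d_j is the number of events of interest at t_j, and n_j is the number still at risk. The function name `cumulative_incidence` and the 0/1/2 event coding are hypothetical.

```python
from collections import Counter


def cumulative_incidence(times, events, cause=1):
    """Aalen-Johansen cumulative incidence of `cause` with competing events.

    times:  follow-up time for each survivor (e.g. years since diagnosis)
    events: 0 = censored, `cause` = event of interest, other nonzero code = competing event
    Returns (time, cumulative incidence) pairs at every time with at least one event.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    cause_counts = Counter(t for t, e in zip(times, events) if e == cause)
    any_event_counts = Counter(t for t, e in zip(times, events) if e != 0)
    exits = Counter(times)          # everyone (event or censored) leaving the risk set at each time
    at_risk = len(times)
    surv = 1.0                      # overall event-free survival just before the current time
    cif = 0.0
    steps = []
    for t in sorted(exits):
        d_any = any_event_counts[t]
        if d_any:
            # Increment = S(t-) * (events of interest at t / number at risk at t).
            cif += surv * cause_counts[t] / at_risk
            # Overall survival drops with events of every type at t.
            surv *= 1.0 - d_any / at_risk
            steps.append((t, cif))
        at_risk -= exits[t]
    return steps


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = cardiomyopathy, 2 = death from other causes, 0 = censored.
    print(cumulative_incidence([1, 2, 2, 3, 4, 5], [1, 2, 1, 0, 1, 0]))
```

On this toy input the estimate steps up at times 1, 2, and 4 and ends at 7/12 (about 0.583). Treating the competing death as ordinary censoring and reporting 1 minus the Kaplan-Meier estimate would instead give 2/3, overstating the incidence, which is why a competing-risk estimator is the usual choice.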

Detailed Workflow Introduction

  1. Data Acquisition: The relevant data is first obtained from the two main data sources (SJLife and CCSS).

    • SJLife: Includes survivors treated at St. Jude Children’s Research Hospital from 1962 to 2012.
    • CCSS: Includes survivors treated at 31 pediatric oncology institutions in the United States and Canada from 1970 to 1999.
    • Whole-genome sequencing (WGS) data: Generated from blood samples at an average coverage of more than 30x.
  2. Data Processing and Storage: The collected phenotypic and genotypic data are processed and stored.

    • Phenotypic data: Organized into hierarchical data dictionaries for easy user browsing.
    • Genotypic data: Generated through variant calling followed by joint genotyping.
  3. Portal Development and Implementation: The portal is engineered so that users can explore and analyze the data interactively, in real time.

    • Technology Stack: JavaScript, Node.js, SQLite, and other tools.
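
The storage layer is not described beyond the technology list. The sketch below illustrates one way a hierarchical data dictionary and per-variable summaries could sit on SQLite, written in Python with the standard-library `sqlite3` module rather than in the portal’s Node.js code. The table layout (`terms`, `annotations`), the function names, and the toy rows are all hypothetical; they are not the portal’s schema or data.

```python
import sqlite3

# Hypothetical schema for illustration only; not the portal's actual layout.
SCHEMA = """
CREATE TABLE terms (
    id        TEXT PRIMARY KEY,
    parent_id TEXT REFERENCES terms(id),  -- NULL for top-level categories
    name      TEXT NOT NULL,
    type      TEXT                        -- NULL for branch nodes, e.g. 'categorical' for leaves
);
CREATE TABLE annotations (
    sample  INTEGER NOT NULL,
    term_id TEXT NOT NULL REFERENCES terms(id),
    value   TEXT NOT NULL,
    PRIMARY KEY (sample, term_id)
);
"""


def build_demo_db():
    """Create an in-memory database with a tiny, made-up dictionary and annotations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO terms VALUES (?, ?, ?, ?)", [
        ("treatment", None, "Treatment", None),
        ("chemo", "treatment", "Chemotherapy", None),
        ("cisplatin", "chemo", "Cisplatin exposure", "categorical"),
        ("carboplatin", "chemo", "Carboplatin exposure", "categorical"),
        ("hearing", None, "Severe hearing loss", "categorical"),
    ])
    conn.executemany("INSERT INTO annotations VALUES (?, ?, ?)", [
        (1, "cisplatin", "yes"), (2, "cisplatin", "no"), (3, "cisplatin", "yes"),
        (1, "hearing", "yes"), (2, "hearing", "no"), (3, "hearing", "no"),
    ])
    return conn


def leaf_terms_under(conn, root_id):
    """Return ids of all leaf variables at or below a dictionary node."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(id) AS (
            SELECT id FROM terms WHERE id = ?
            UNION ALL
            SELECT t.id FROM terms AS t JOIN sub ON t.parent_id = sub.id
        )
        SELECT terms.id FROM terms JOIN sub ON terms.id = sub.id
        WHERE terms.type IS NOT NULL
        ORDER BY terms.id
        """,
        (root_id,),
    )
    return [row[0] for row in rows]


def category_counts(conn, term_id):
    """Bar-chart style summary: number of samples in each category of one variable."""
    rows = conn.execute(
        "SELECT value, COUNT(*) FROM annotations WHERE term_id = ? "
        "GROUP BY value ORDER BY value",
        (term_id,),
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = build_demo_db()
    print(leaf_terms_under(conn, "treatment"))  # ['carboplatin', 'cisplatin']
    print(category_counts(conn, "cisplatin"))   # {'no': 1, 'yes': 2}
```

The recursive common table expression walks the dictionary tree, so selecting a branch such as Treatment returns every leaf variable beneath it, while a plain `GROUP BY` query yields the counts behind a bar chart.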

Analysis and Results

  1. Phenotypic and Genetic Data Analysis: Users explore the survivor cohort data through the portal’s data dictionary and gene browser, selecting variables to view their summary statistics and visualize them in interactive charts. For example, users can examine the genetic variants found in a specific cancer diagnosis group and how they relate to patient ethnicity.

  2. Ototoxicity Analysis: The research team used the portal’s grouping and summary-plot functions to study the ototoxicity of platinum chemotherapy agents (such as cisplatin and carboplatin). The analysis showed that survivors treated with cisplatin were more likely to experience severe hearing loss, whereas carboplatin was less ototoxic.

  3. Psychological Health and Amputation Correlation: Using the regression analysis feature, the researchers identified a new interaction between age and amputation in their association with psychological health. The effect of amputation on psychological health varied with age, with younger survivors at higher risk, possibly because older patients adapt to and recover from amputation better.

  4. Cardiomyopathy Cumulative Incidence and Genetic Association Analysis: The analysis showed that the higher risk of cardiomyopathy among African-American survivors is driven mainly by males. In addition, analysis of the NRG1 locus identified variants significantly associated with cardiomyopathy in African-American survivors.
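
The age and amputation finding in item 3 rests on a regression model with an interaction term. A minimal sketch, assuming a binary psychological-health outcome and a logistic model fitted by Newton-Raphson (iteratively reweighted least squares) with NumPy, follows; the outcome coding, the function names, and the simulated effect sizes are hypothetical, and the portal may use a different model type or fitting routine. A negative coefficient on the amputation × age column means the excess risk associated with amputation shrinks as age increases.

```python
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS).

    X: (n, k) design matrix that already contains an intercept column
    y: (n,) array of 0/1 outcomes
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def design_with_interaction(amputation, age):
    """Columns: intercept, amputation (0/1), age, amputation x age."""
    amputation = np.asarray(amputation, dtype=float)
    age = np.asarray(age, dtype=float)
    return np.column_stack([np.ones_like(age), amputation, age, amputation * age])


def simulate(seed=0, n=20_000):
    """Simulated (not real) survivors in which amputation matters more at younger ages."""
    rng = np.random.default_rng(seed)
    amputation = rng.integers(0, 2, size=n)
    age = rng.uniform(20.0, 60.0, size=n) - 40.0      # age centred at 40 years
    X = design_with_interaction(amputation, age)
    true_beta = np.array([-1.0, 0.8, 0.0, -0.05])    # made-up effects for the simulation
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    beta = fit_logistic(X, y)
    names = ["intercept", "amputation", "age", "amputation:age"]
    print(dict(zip(names, np.round(beta, 3))))
```

Centering age (here at 40 years) keeps the amputation coefficient interpretable as the amputation effect at that age rather than at age 0, which would lie outside the data.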

Research Highlights

  1. Open Data Access: The St. Jude Survivorship Portal is the first portal to openly share and explore pediatric cancer survivor data, offering real-time exploration and analysis functions that give the research community powerful research tools.

  2. Scientific Value of New Discoveries: Through the portal, researchers can readily verify and extend association findings, such as those involving the ARID5B gene and the newly discovered haplotypes at the MAGI3 locus associated with cardiomyopathy risk, which promotes research transparency and reproducibility.

  3. Future Expansion Plans: The team plans to add longitudinal data, single-cell multi-omics data, and imaging data to the portal, continue to enrich its data types and functions, and expand the research cohorts, with the aims of improving diagnostic and treatment standards and driving more in-depth survivorship research.

By providing access to large-scale clinical and genomic data together with analysis tools, the St. Jude Survivorship Portal substantially advances pediatric cancer survivorship research. The planned expansions will further increase its research value and provide solid data support for long-term studies.