Multi-Task Aquatic Toxicity Prediction Model Based on Multi-Level Features Fusion
Academic Background
Organic compounds pose a growing threat to aquatic environments, so understanding how different aquatic organisms respond to them has become crucial. Such research helps assess the potential ecological impact of pollutants on aquatic ecosystems and provides a scientific foundation for environmental protection. Traditional experimental methods can supply some of the needed data, but they are costly, time-consuming, and difficult to scale to large-scale toxicity assessment of chemical substances. Deep learning techniques, which have developed rapidly, have demonstrated higher accuracy, faster data processing, and better generalization in predicting aquatic toxicity. However, existing methods still face limitations in handling high-dimensional feature data, particularly in capturing complex molecular structures and interactions. Developing a multi-task deep learning model that can predict toxicity across multiple aquatic species simultaneously has therefore become an important research focus.
Paper Source
The paper, titled “Multi-task Aquatic Toxicity Prediction Model Based on Multi-level Features Fusion,” was written by Xin Yang, Jianqiang Sun, Bingyu Jin, and colleagues from institutions including the University of Science and Technology Liaoning, the University of Chinese Academy of Sciences, and Linyi University. It was published in the Journal of Advanced Research in 2025.
Research Process
This study proposes a multi-task deep learning model named ATFPGT-Multi for simultaneously predicting the acute toxicity of organic compounds across four different fish species. The detailed research process is as follows:
1. Data Preparation
The researchers collected acute toxicity data for four fish species (Bluegill Sunfish, Rainbow Trout, Fathead Minnow, and Sheepshead Minnow) from the ECOTOX database. To ensure data quality, they standardized the chemical structures and removed inorganic compounds, salts, and outliers. The final datasets contained 988, 1246, 938, and 346 compounds, respectively.
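The summary does not state the exact cleaning rules, so the following is only a rough sketch of what such a filter might look like. It assumes compounds are given as SMILES strings, treats any multi-component SMILES (one containing ".") as a salt or mixture, and treats carbon-free structures as inorganic. The function names and the sample records are hypothetical, and a cheminformatics toolkit would normally replace this string-level heuristic.

```python
import re

# Matches the element symbol at the start of a bracket atom,
# e.g. [Na+] -> "Na", [13CH4] -> "C", [cH] -> "c".
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def is_multi_component(smiles):
    """A '.' separates disconnected components (salts, mixtures)."""
    return "." in smiles


def has_carbon(smiles):
    """Crude check for at least one carbon atom in a SMILES string."""
    # Bracket atoms carry their element symbol explicitly.
    for symbol in BRACKET_SYMBOL.findall(smiles):
        if symbol in ("C", "c"):
            return True
    # Outside brackets only the organic subset can appear, so once bracket
    # atoms and "Cl" are removed, any remaining C or c is a carbon atom.
    bare = re.sub(r"\[[^\]]*\]", "", smiles).replace("Cl", "")
    return "C" in bare or "c" in bare


def keep_compound(smiles):
    """Assumed rule: keep single-component, carbon-containing structures."""
    return has_carbon(smiles) and not is_multi_component(smiles)


# Hypothetical records, for illustration only.
records = [
    ("ethanol", "CCO"),
    ("table salt", "[Na+].[Cl-]"),
    ("water", "O"),
    ("benzene", "c1ccccc1"),
    ("sodium acetate", "CC(=O)[O-].[Na+]"),
]
kept = [name for name, smiles in records if keep_compound(smiles)]
# kept -> ["ethanol", "benzene"]
```

A rule this simple would still keep carbon-containing inorganics such as carbon dioxide, and it does not address outlier removal, so it should be read as an illustration of the filtering idea rather than the authors' procedure.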
2. Molecular Feature Extraction
The ATFPGT-Multi model integrates two molecular representation methods: Molecular Fingerprint and Molecular Graph.
- Molecular Fingerprint Features: The researchers used Morgan, MACCS, and RDKit fingerprints to encode compound information, with feature selection performed via a Multi-layer Perceptron (MLP).
- Molecular Graph Features: Molecular graph features were extracted using a combination of a Graph Neural Network (GNN) and a Transformer. The researchers designed a Local Map and a Global Map to represent molecular structures, capturing local information through graph convolutional layers and global information through Transformer layers.
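To make the local versus global distinction concrete, here is a small sketch in plain Python. It is not the paper's Local Map or Global Map, whose details are not given here. It assumes a toy ethanol graph with a hypothetical two-element atom encoding, uses mean aggregation over bonded neighbors as a stand-in for a graph convolution, and uses unparameterized scaled dot-product attention over all atoms as a stand-in for a Transformer layer. No learned weights are involved.

```python
import math

# Toy molecular graph for ethanol (C-C-O), heavy atoms only.
# Hypothetical node encoding: [is_carbon, is_oxygen].
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2)]


def neighbor_lists(n_atoms, bonds):
    """Adjacency lists with a self-loop on every atom."""
    adj = {i: [i] for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def local_step(features, bonds):
    """Local view: each atom averages itself and its bonded neighbors."""
    adj = neighbor_lists(len(features), bonds)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in adj[i]) / len(adj[i]) for d in range(dim)]
        for i in range(len(features))
    ]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_step(features):
    """Global view: each atom attends to every atom, bonded or not."""
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * features[j][d] for j in range(len(features))) for d in range(dim)]
        )
    return outputs, weights


local = local_step(features, bonds)
global_out, attention = global_step(features)
```

In the local step the first carbon never sees the oxygen, because they are not bonded; in the global step every atom receives a nonzero weight from every other atom. That difference is what combining the two views is meant to exploit.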
3. Feature Fusion and Multi-task Classification
After fusing the molecular fingerprint and molecular graph features, the model generates comprehensive features through fully connected layers and uses an independent output layer for each fish dataset to achieve multi-task classification.
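The fusion step described above can be sketched as a shared layer followed by one output per species. The sketch below is a structural illustration only: it assumes fusion by simple concatenation, a single shared hidden layer, random untrained weights, and no bias terms. The class name, layer sizes, and task keys are hypothetical, and a real implementation would use a deep learning framework.

```python
import math
import random

TASKS = ["bluegill_sunfish", "rainbow_trout", "fathead_minnow", "sheepshead_minnow"]


def dense(n_in, n_out, rng):
    """Random weight matrix of shape (n_out, n_in); biases omitted for brevity."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def matvec(weights, x):
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MultiTaskClassifier:
    """Shared fused representation with one independent output per species."""

    def __init__(self, fp_dim, graph_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.shared = dense(fp_dim + graph_dim, hidden_dim, rng)
        self.heads = {task: dense(hidden_dim, 1, rng) for task in TASKS}

    def forward(self, fp_vec, graph_vec):
        fused = list(fp_vec) + list(graph_vec)     # fusion by concatenation (assumed)
        hidden = relu(matvec(self.shared, fused))  # shared fully connected layer
        # Each head reads the same hidden vector but has its own weights.
        return {task: sigmoid(matvec(w, hidden)[0]) for task, w in self.heads.items()}


model = MultiTaskClassifier(fp_dim=4, graph_dim=3)
probs = model.forward([1.0, 0.0, 1.0, 0.0], [0.2, 0.5, 0.1])
```

Because all four heads read the same hidden vector, a gradient from any one species' labels would update the shared layer, which is the mechanism by which multi-task training shares information across tasks.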
4. Model Training and Evaluation
The researchers adopted a five-fold cross-validation method to evaluate the model’s performance, using metrics such as Accuracy (ACC), Recall (RE), Precision (PR), and AUC. Additionally, ablation experiments were conducted to study the impact of different modules on model performance.
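As a reference for the evaluation protocol, the following sketch shows a five-fold split and the four metrics in plain Python. It assumes binary labels, a 0.5 decision threshold, a shuffled unstratified split, and AUC computed as the area under the ROC curve using the pairwise-ranking definition. The paper's exact splitting and thresholding choices are not stated here, and all function names are illustrative.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k shuffled, disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, recall, and precision at a fixed threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "ACC": (tp + tn) / len(pairs),
        "RE": tp / (tp + fn) if tp + fn else 0.0,
        "PR": tp / (tp + fp) if tp + fp else 0.0,
    }


def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, for illustration only.
y_true = [1, 0, 1, 0, 1]
y_score = [0.9, 0.4, 0.6, 0.7, 0.3]
metrics = classification_metrics(y_true, y_score)  # ACC 0.6, RE 2/3, PR 2/3
auc = roc_auc(y_true, y_score)                     # 0.5
```

In practice each fold's model would be trained on `train`, scored on `test`, and the metrics averaged over the five folds.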
Key Results
- Advantages of Multi-task Learning: Compared with the single-task model ATFPGT-Single, ATFPGT-Multi improved AUC by 9.8%, 4%, 4.8%, and 8.2% on the four fish datasets, respectively. This indicates that multi-task learning improves predictive performance through feature sharing and knowledge transfer across tasks.
- Comparison with Other Methods: ATFPGT-Multi outperformed traditional machine learning methods and Graph Convolutional Neural Networks (GCN) on all evaluation metrics, and it was particularly effective at capturing global molecular information.
- Interpretability: ATFPGT-Multi can identify molecular fragments associated with toxicity through an attention mechanism, providing intuitive insights into the relationship between molecular structure and toxicity.
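One plausible way to turn attention weights into fragments, shown purely as an illustration, is to pick the atoms with the highest weights and group those that are bonded to each other. The sketch assumes one attention weight per atom is already available; the weights, the chain-shaped molecule, and the function names are hypothetical and do not come from the paper.

```python
def top_atoms(attention, k):
    """Indices of the k atoms with the largest attention weights."""
    ranked = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    return set(ranked[:k])


def fragments(selected, bonds):
    """Group selected atoms into connected fragments via bonds among them."""
    adj = {atom: set() for atom in selected}
    for a, b in bonds:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)
    seen, result = set(), []
    for start in sorted(selected):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first walk over the selected subgraph
            node = stack.pop()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        result.append(sorted(component))
    return result


# Hypothetical six-atom chain with made-up attention weights.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
attention = [0.05, 0.30, 0.25, 0.05, 0.05, 0.30]
highlighted = fragments(top_atoms(attention, k=3), bonds)
# highlighted -> [[1, 2], [5]]
```

The resulting atom groups can then be mapped back onto the structure drawing to show which substructures the model weighted most heavily.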
Conclusion and Significance
The ATFPGT-Multi model significantly improves the accuracy and reliability of aquatic toxicity prediction by integrating multi-level molecular features and multi-task learning. The model not only provides a crucial tool for assessing the potential risks of organic compounds to aquatic ecosystems but also offers scientific support for the environmental safety evaluation and design of chemicals. Moreover, its interpretability broadens its application prospects in toxicity mechanism research and chemical optimization.
Research Highlights
- Multi-task Learning: By sharing features and creating independent output layers, the model can simultaneously predict toxicity across multiple fish species, significantly enhancing generalization capabilities.
- Multi-level Feature Fusion: Combining molecular fingerprint and molecular graph features allows the model to comprehensively capture the complex structures and interactions of molecules.
- Interpretability: Through the attention mechanism, the model can identify molecular fragments related to toxicity, providing new perspectives for toxicity mechanism research.
- Broad Application Prospects: The model can be applied not only to environmental toxicity assessment but also to the safe design of chemicals.
Other Valuable Information
The researchers have publicly released the model’s code and datasets on GitHub (https://github.com/zhaoqi106/atfpgt-multi), facilitating further research. Additionally, the study received support from the Ministry of Science and Technology of China, the National Natural Science Foundation of China, and the Natural Science Foundation of Liaoning Province.
This study illustrates the potential of deep learning in aquatic toxicity prediction and offers new insights and methods for future chemical safety assessments.