Comparative Analysis of Methodologies and Approaches in Recommender Systems Utilizing Large Language Models

Academic Background

With the explosive growth of online information, recommender systems (RSs) have become indispensable to modern digital life. Whether through movie recommendations on Netflix or personalized news feeds on social media, recommender systems shape users' online experiences. Traditional recommender systems, however, face persistent challenges such as data sparsity, the cold-start problem, limited scalability, and a lack of explainability. In recent years, large language models (LLMs) have made significant strides in natural language processing (NLP), prompting researchers to explore how their powerful text representation capabilities and rich embedded knowledge can be brought to bear on these issues.

This paper provides a comparative analysis of methods that apply LLMs to recommender systems, examining their effectiveness, strengths, and limitations. Through systematic classification and evaluation, it offers reference points and insights for future research.

Source of the Paper

This paper is co-authored by Marwa A. Shouman, Hamdy K. Elminir, and Gamal Eldin I. Selim, all of whom have conducted in-depth research in the fields of recommender systems and large language models. The paper was published in 2025 in the journal Artificial Intelligence Review, with the DOI 10.1007/s10462-025-11189-8.

Main Content of the Paper

1. The Role of LLMs in Recommender Systems

The application of LLMs in recommender systems takes two main forms: feature encoding and recommendation generation. As feature encoders, LLMs extract features from textual data (e.g., reviews, descriptions) to produce representations of users and items; BERT, for example, uses bidirectional contextual understanding to build richer textual representations of users and items. As recommendation generators, LLMs produce recommendation lists or scores from a user's interaction history and context; GPT-style models, for instance, generate recommendations autoregressively.
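To make the feature-encoder role concrete, the following minimal Python sketch (our illustration, not code from the paper) embeds item descriptions and a user profile with a pretrained BERT and scores items by dot product; the model choice, example texts, and scoring rule are all assumptions for demonstration.

```python
# Minimal sketch of LLMs as feature encoders (illustrative; assumes the
# Hugging Face `transformers` and `torch` packages are installed).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode(texts):
    """Map user/item text to dense vectors via the [CLS] embedding."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    return hidden[:, 0]  # [CLS] token embedding as the representation

items = ["A space-opera thriller with political intrigue.",
         "A lighthearted romantic comedy set in Paris."]
user_profile = "Enjoys epic science-fiction sagas."

item_vecs = encode(items)          # one vector per item description
user_vec = encode([user_profile])  # one vector for the user profile
scores = item_vecs @ user_vec.T    # dot-product relevance scores
print(scores.squeeze().tolist())
```

In a production pipeline these embeddings would typically feed a downstream ranking model rather than being scored by raw dot product, but the encoding step itself is the same.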

2. Learning Paradigms

The adaptability of LLMs to recommendation tasks is primarily achieved through the following learning paradigms:

  • Pretraining: LLMs are pretrained on large-scale data to learn language structure and semantics. For example, BERT4Rec is trained with a masked (Cloze-style) item prediction objective over user behavior sequences to capture the context of user interactions.
  • Fine-tuning: Based on pretraining, LLMs are fine-tuned using task-specific data to adapt to recommendation tasks. For instance, the P5 model performs multiple recommendation tasks, such as rating prediction and review summarization, through multi-task instruction fine-tuning.
  • Tuning-free prompting: By designing suitable prompts, LLMs can perform recommendation tasks without any parameter updates. For example, the NIR model guides GPT-3 to generate movie recommendations through a multi-step prompting strategy (sketched in code after this list).
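The tuning-free paradigm is the easiest to picture in code. Below is a hedged Python sketch in the spirit of NIR's multi-step prompting: the model's parameters stay frozen, and all the recommendation logic lives in the prompts. The OpenAI client, model name, and prompt wording are our illustrative choices, not the paper's exact pipeline (NIR originally prompted GPT-3).

```python
# Tuning-free prompting sketch: two prompting steps against a frozen,
# hosted LLM. Assumes the `openai` package and an OPENAI_API_KEY
# environment variable; the model and prompts are illustrative.
from openai import OpenAI

client = OpenAI()

def complete(prompt: str) -> str:
    """Send one prompt to the hosted LLM and return its reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

watched = ["The Matrix", "Blade Runner", "Inception"]

# Step 1: elicit the user's preferences from the interaction history.
prefs = complete(
    "A user watched these movies: " + ", ".join(watched) + ". "
    "Summarize the user's preferences in one sentence."
)

# Step 2: condition the recommendation step on the elicited preferences.
recs = complete(
    f"User preferences: {prefs}\n"
    "Recommend 5 movies the user has not watched, as a numbered list "
    "of titles only."
)
print(recs)
```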

3. Datasets and Evaluation Metrics

The paper surveys datasets commonly used in recommender system research, such as Amazon product reviews, MovieLens, and Yelp, and discusses their characteristics and challenges. It also reviews common evaluation metrics: mean squared error (MSE) and mean absolute error (MAE) for rating prediction, normalized discounted cumulative gain (NDCG) for ranking, and BLEU and ROUGE scores for language generation tasks.
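As a quick reference, here is a short NumPy sketch of three of these metrics following their standard definitions; the example ratings and relevance grades are made up.

```python
# Standard definitions of MSE, MAE, and NDCG@k (toy inputs).
import numpy as np

def mse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def ndcg_at_k(relevances, k):
    """`relevances`: graded relevance of items in predicted rank order."""
    rel = np.asarray(relevances, dtype=float)
    top = rel[:k]
    dcg = float(np.sum(top / np.log2(np.arange(2, top.size + 2))))
    ideal = np.sort(rel)[::-1][:k]  # best possible ordering
    idcg = float(np.sum(ideal / np.log2(np.arange(2, ideal.size + 2))))
    return dcg / idcg if idcg > 0 else 0.0

print(mse([4, 3, 5], [3.5, 3.0, 4.0]))  # 0.4166...
print(mae([4, 3, 5], [3.5, 3.0, 4.0]))  # 0.5
print(ndcg_at_k([3, 2, 0, 1], k=4))     # ~0.985
```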

4. Research Findings and Discussion

Through a comparison of various LLM-based recommendation methods, the paper draws the following conclusions:

  • Adaptability: LLMs can adapt to downstream tasks with minimal data and perform well in cross-domain recommendation tasks. For example, the P5 model can provide effective recommendations in unseen domains.
  • Cold-start problem: LLMs can effectively mitigate the cold-start problem through textual features and large-scale pretraining knowledge. For instance, Sanner et al. demonstrated that few-shot learning can rival traditional recommendation methods in user cold-start scenarios.
  • Explainability: The interactivity and text generation capabilities of LLMs make them excel at recommendation explanation, producing coherent, contextually relevant explanations that strengthen user trust in the system (a prompting sketch follows this list).
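To illustrate the explainability finding, the sketch below reuses the `complete` helper from the tuning-free prompting example to ask for a natural-language justification; the prompt template is our assumption, not a method from the paper.

```python
# Hedged sketch of LLM-generated recommendation explanations, reusing
# the `complete` helper defined in the prompting sketch above.
history = ["The Matrix", "Blade Runner"]
recommended = "Ghost in the Shell"

explanation = complete(
    "A user recently watched: " + ", ".join(history) + ".\n"
    f"We recommended: {recommended}.\n"
    "In two sentences, explain this recommendation in terms of the "
    "user's apparent tastes."
)
print(explanation)
```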

5. Limitations

Despite the great potential of LLMs in recommender systems, there are still some limitations:

  • Disparity in learning objectives: The pretraining objectives of LLMs differ from the task objectives of recommender systems, limiting their understanding of user-item relationships.
  • Context length limitations: The fixed context window of LLMs restricts their application in long-sequence recommendation tasks.
  • Hallucination and output format issues: LLMs may generate meaningless or malformed outputs, e.g., recommending items that do not exist in the catalog; these need to be mitigated through prompt engineering and post-processing modules (see the grounding sketch after this list).
  • Computational cost: The training and fine-tuning processes of LLMs require substantial computational resources, and access to APIs for some models can be costly.
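A common lightweight post-processing step for the hallucination issue above is to ground generated titles in the item catalog before serving them. The sketch below (our illustration, not a method from the paper) uses fuzzy string matching from Python's standard library; the catalog and similarity cutoff are toy assumptions.

```python
# Grounding step: keep a generated title only if it closely matches a
# real catalog item; otherwise treat it as a likely hallucination.
import difflib

catalog = ["The Matrix", "Blade Runner", "Inception",
           "Ghost in the Shell"]

def ground(generated_titles, cutoff=0.6):
    """Map each generated title to its closest catalog item, dropping
    titles with no sufficiently close match."""
    grounded = []
    for title in generated_titles:
        match = difflib.get_close_matches(title, catalog, n=1,
                                          cutoff=cutoff)
        if match:
            grounded.append(match[0])
    return grounded

# "The Matrix Reloaded" maps to "The Matrix"; the invented title is dropped.
print(ground(["The Matrix Reloaded", "A Movie That Does Not Exist"]))
```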

Research Highlights

The paper's main highlight is its systematic comparison of LLM-based recommendation methods, organized through classification and evaluation to serve as a reference for future research. Specifically, its innovations include:

  • Classification framework: The paper proposes a classification framework based on the roles of LLMs in recommender systems, learning paradigms, and system structures, providing researchers with a clear perspective.
  • Multi-task recommendation frameworks: The paper explores how to build unified recommendation frameworks, such as P5 and M6-Rec, through multi-task learning, showcasing the potential of LLMs in diverse recommendation tasks.
  • Cold-start and explainability: The paper provides a detailed analysis of the advantages of LLMs in cold-start and recommendation explanation tasks, offering important insights for practical applications.

Conclusion and Significance

Through a comprehensive comparison of LLM-based recommendation methods, this paper reveals both the potential and the challenges of applying LLMs to recommender systems. With their powerful text representation capabilities and adaptability, LLMs can address many issues that traditional recommender systems face, such as cold start and limited explainability. However, the disparity in learning objectives, context length limitations, and computational costs still call for further research. The study provides a theoretical foundation and practical guidance for future work, promoting the application and development of LLMs in recommendation.