[Paper Review] ReAct: Synergizing Reasoning and Acting in Language Models (ICLR 2023)
Summary: The authors propose ReAct, a method that enables greater synergy between reasoning traces and task-specific actions by generating them in an interleaved manner.
1. Introduction
Background
- Human intelligence naturally integrates reasoning and acting to perform tasks.
- For example, when cooking, one might think, “Since I’ve chopped all the ingredients, I should now boil the water,” (reasoning) and then actually proceed to boil the water (acting).
- This process involves tracking the current situation, adjusting plans, and retrieving necessary information—all combining reasoning and acting.
Limitations of Existing Large Language Models (LLMs)
- Reasoning-only (Chain-of-Thought, CoT) approach:
- The model develops logical reasoning using only its internal knowledge.
- However, it cannot update information from external sources, leading to errors (hallucinations).
- Action-only approach:
- The model interacts with the external environment but lacks complex logical reasoning.
- As a result, information retrieval becomes inefficient.
Proposal: The ReAct Approach
- ReAct (Reasoning + Acting) enables LLMs to interleave reasoning traces and task-specific actions.
- This approach provides the following advantages:
- Reasoning traces: Allow the model to track its progress and adjust its plan.
- Actions: Enable external interactions to gather additional information.
- Better decision-making: The model can utilize retrieved information for more accurate judgments.
Key Experiments and Results
- ReAct outperforms existing methods in various tasks, offering improved performance and interpretability.
- Question answering (HotpotQA) and fact verification (FEVER):
- Reduces hallucination errors and generates more reliable responses using external information.
- Interactive decision-making (ALFWorld, WebShop):
- Outperforms reinforcement learning and imitation learning baselines.
- Demonstrates effective learning with only a few examples.
Contributions of the Study
- Proposing the ReAct paradigm that integrates reasoning and acting.
- Experimentally demonstrating ReAct’s superior performance and interpretability across various tasks.
- Analyzing the complementary relationship between reasoning and acting.
- Suggesting potential applications of ReAct in reinforcement learning and large-scale AI systems.
2. ReAct
Concept and Principles of ReAct
- ReAct is an approach that combines reasoning and acting to perform tasks more effectively.
- While previous methods conducted reasoning (CoT) and acting (Action) separately,
ReAct interleaves them to maximize complementary advantages.
Limitations of Existing Approaches
- Reasoning-only (CoT) approach
- The model can think logically but cannot interact with external environments to update information.
- Consequently, it is prone to errors due to hallucinations.
- Action-only approach
- The model can search for external data but lacks a systematic plan for what to look for.
- This results in inefficient retrieval and loss of important context.
Core Principles of ReAct
- Expanding the model’s action space
- Traditional AI models predict only task-specific actions based on input.
- ReAct expands this by generating both actions and natural language reasoning traces,
allowing for better decision-making.
- Interleaving reasoning and acting (sketched in code below)
- Reasoning → Acting: The model first analyzes the situation and determines the next action.
- Acting → Reasoning: After interacting with the environment, the model uses the retrieved information to refine its reasoning.
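This interleaving reduces to a simple loop. Below is a minimal sketch, assuming a hypothetical `llm` text-completion function and an `env` object with a `step()` method; the names are illustrative, not the authors' released code:

```python
# Minimal ReAct loop (illustrative sketch, not the paper's implementation).
# `llm` is a hypothetical text-completion callable; `env.step()` is a
# hypothetical environment interface that returns an observation string.

def react_episode(llm, env, question, max_steps=8):
    context = f"Question: {question}\n"
    for step in range(1, max_steps + 1):
        # Reasoning -> Acting: emit a free-form thought, then an action
        # conditioned on that thought.
        thought = llm(context + f"Thought {step}:")
        context += f"Thought {step}: {thought}\n"
        action = llm(context + f"Action {step}:")
        context += f"Action {step}: {action}\n"
        if action.startswith("finish["):
            return action[len("finish["):-1]  # final answer
        # Acting -> Reasoning: the observation is appended to the context,
        # so it informs the next thought.
        observation = env.step(action)
        context += f"Observation {step}: {observation}\n"
    return None  # no answer within the step budget
```

Note that a thought only updates the context and never touches the environment; only actions produce observations.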
Advantages of ReAct
- Better decision-making
- The model can choose more effective actions by reasoning before acting.
- Reduced hallucinations
- By retrieving external information, the model can minimize reliance on incorrect internal knowledge.
- Improved interpretability
- The model’s reasoning process is transparent and easier for humans to understand.
- Versatile applications
- Applicable to question answering, fact verification, gameplay, web navigation, and more.
3. Knowledge-Intensive Reasoning Tasks
Experiment Overview
- We evaluate ReAct in knowledge-intensive reasoning tasks.
- Experiments were conducted on two key tasks:
- HotpotQA – Multi-hop question answering task.
- FEVER – Fact verification task.
- Both tasks require external knowledge (e.g., Wikipedia) for deriving correct answers,
making the integration of reasoning and acting crucial.
Experiment Setup
- Datasets
- HotpotQA:
- A question-answering dataset requiring multi-hop reasoning across multiple Wikipedia documents.
- FEVER:
- A dataset for determining whether a given claim is true or false.
- How ReAct Operates
- Given a question, the model follows a structured process:
- Reasoning (Thought) – The model thinks about what information it needs.
- Acting (Action) – It searches for the required information.
- Final Answer – The model derives a response based on retrieved data.
- Example question: “Did A and B attend the same university?”
- ReAct process (see the code sketch after the action space below):
- Reasoning: “I need to check A’s educational background.”
- Acting: Search for A’s Wikipedia page → Retrieve results.
- Reasoning: “Now, I need to check B’s educational background.”
- Acting: Search for B’s Wikipedia page → Retrieve results.
- Reasoning: “If A and B attended the same university, answer ‘Yes’; otherwise, answer ‘No’.”
- Final Answer: Submit response.
- Defining the Action Space
- The model operates within a predefined set of possible actions:
- search[entity] – Search for an entity’s Wikipedia page.
- lookup[string] – Look up specific information within a document.
- finish[answer] – Submit the final answer.
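A minimal sketch of this three-action interface, with the example above written out as a ReAct-format trace. The `wiki_search` and `wiki_lookup` stubs are assumptions standing in for real Wikipedia lookups, and the exemplar is paraphrased rather than copied from the paper's prompts:

```python
import re

# Action strings follow the pattern name[argument], e.g. "search[A]".
ACTION_RE = re.compile(r"^(search|lookup|finish)\[(.+)\]$")

def wiki_search(entity: str) -> str:
    # A real implementation would fetch the first paragraphs of the
    # entity's Wikipedia page (e.g., via the MediaWiki API); stubbed here.
    return f"[first paragraphs of the page for {entity!r}]"

def wiki_lookup(string: str) -> str:
    # A real implementation would return the next sentence of the current
    # page containing the string, like Ctrl+F in a browser; stubbed here.
    return f"[next sentence containing {string!r}]"

def execute(action: str) -> tuple[str, bool]:
    """Dispatch one action string; return (observation, done)."""
    match = ACTION_RE.match(action.strip())
    if match is None:
        return "Invalid action.", False
    name, arg = match.groups()
    if name == "search":
        return wiki_search(arg), False
    if name == "lookup":
        return wiki_lookup(arg), False
    return arg, True  # finish[answer]: the argument is the final answer

# The example question above as a few-shot exemplar in ReAct format
# (paraphrased for illustration):
EXEMPLAR = """Question: Did A and B attend the same university?
Thought 1: I need to find where A was educated.
Action 1: search[A]
Observation 1: A studied at X University. ...
Thought 2: Now I need to find where B was educated.
Action 2: search[B]
Observation 2: B also studied at X University. ...
Thought 3: Both attended X University, so the answer is yes.
Action 3: finish[yes]
"""
```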
Experimental Results
1) Comparison of ReAct with Existing Methods
| Method | HotpotQA (EM) | FEVER (Accuracy) |
|---|---|---|
| Standard (basic LLM prompting) | 28.7% | 57.1% |
| CoT (Chain-of-Thought, reasoning-only) | 29.4% | 56.3% |
| CoT-SC (CoT + Self-Consistency) | 33.4% | 60.4% |
| Act-only | 25.7% | 58.9% |
| ReAct (Reasoning + Acting) | 27.4% | 60.9% |
| CoT-SC → ReAct (hybrid) | 34.2% | 64.6% |
- The CoT (reasoning-only) approach performed slightly better than standard prompting on HotpotQA, but it suffered from hallucination errors, reducing reliability.
- The Action-only approach used retrieval capabilities but lacked logical reasoning, leading to lower performance.
- ReAct (Reasoning + Acting) achieved higher accuracy in FEVER than CoT, providing more accurate and reliable answers.
- Notably, the combination of ReAct and CoT-SC achieved the highest accuracy in both HotpotQA and FEVER, demonstrating the effectiveness of integrating internal knowledge and external information retrieval.
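The back-off behind the hybrid row is simple to express. Below is a minimal sketch of CoT-SC → ReAct, following the paper's description: sample n CoT trajectories (n = 21 in the paper) and fall back to ReAct when the majority answer is supported by fewer than n/2 samples, i.e. when internal knowledge gives only weak consensus. `sample_cot_answer` and `run_react` are hypothetical helpers:

```python
from collections import Counter

def cot_sc_to_react(question, sample_cot_answer, run_react, n=21):
    """Answer with CoT-SC, backing off to ReAct on weak internal consensus."""
    answers = [sample_cot_answer(question) for _ in range(n)]
    majority, votes = Counter(answers).most_common(1)[0]
    if votes >= n / 2:
        return majority           # confident majority vote from internal knowledge
    return run_react(question)    # weak consensus: ground the answer externally
```

The symmetric ReAct → CoT-SC variant instead backs off to CoT-SC when ReAct fails to return an answer within its step budget.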
2) Analysis of ReAct’s Advantages
- Reduction of Hallucination Errors
- The CoT approach often generated incorrect information based on internal knowledge alone, while ReAct retrieved external knowledge to mitigate errors.
- Improved Reasoning Ability
- By retrieving information through actions, the model could engage in more logical reasoning.
- Increased Interpretability
- The clear sequencing of reasoning and actions allowed humans to easily follow the model’s decision-making process.
Conclusion and Implications
- ReAct generates more reliable answers for knowledge-based reasoning tasks.
- It is more effective than performing reasoning (CoT) or acting (Action) alone.
- Combining internal knowledge (CoT) with external knowledge retrieval (Acting) achieves the best performance.
- Future Research Directions:
- Potential integration of ReAct with reinforcement learning (RL) to enhance its action strategy.
- Further performance improvements by training the model with larger datasets.
4. Decision-Making Tasks
Experiment Overview
- We applied ReAct to language-based decision-making tasks to evaluate its performance.
- Experiments were conducted in two complex interactive environments:
- ALFWorld – A text-based game where agents complete household tasks in a virtual environment.
- WebShop – A simulated online shopping environment where the agent finds products based on user requirements.
- Unlike simple QA tasks, these tasks require multiple steps to reach a goal, making the combination of reasoning and acting crucial.
Experimental Setup
- ALFWorld (Virtual Household Task Execution; see the episode sketch after this setup list)
- The model must complete a given task (e.g., “Retrieve the paper under the sink”) through a series of actions.
- Example:
- Reasoning: “The paper might be under the sink.”
- Action: “Check the sink.”
- Observation: “There is paper under the sink.”
- Action: “Pick up the paper.”
- Final Answer: “Picked up the paper.”
- WebShop (Simulated Online Shopping)
- The model must find a product matching a shopping request (e.g., “Find a nickel-finished nightstand with drawers.”).
- Example:
- Reasoning: “I should search for ‘nightstand’.”
- Action: “Search for ‘nightstand’.”
- Observation: “Several nightstands appear.”
- Reasoning: “I need to check if any have a nickel finish.”
- Action: “Filter options to find nickel-finished models.”
- Final Answer: “Purchase the appropriate product.”
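Unlike the QA setting, thoughts in these environments are sparse, and the model itself decides when to think. Below is a minimal sketch of an ALFWorld-style episode, following the paper's convention that a “think:” action leaves the environment unchanged and simply receives an “OK.” observation; `llm` and `env` are hypothetical interfaces:

```python
# Illustrative sketch of ReAct in a text game such as ALFWorld
# (not the authors' code). `llm` is a hypothetical completion function;
# `env.step()` returns (observation, done).

def react_text_game(llm, env, task, max_steps=50):
    history = f"Your task is to: {task}\n"
    for _ in range(max_steps):
        action = llm(history + "> ").strip()
        if action.startswith("think:"):
            observation = "OK."  # reasoning step: the environment is a no-op
        else:
            observation, done = env.step(action)
            if done:
                return observation  # task finished (success or failure)
        history += f"> {action}\n{observation}\n"
    return None  # ran out of steps
```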
Experimental Results
1) Comparison of ReAct with Existing Methods
ALFWorld (Success Rate, %)
| Method | Pick | Clean | Heat | Cool | Look | Pick 2 | Overall Avg. |
|---|---|---|---|---|---|---|---|
| Act-only | 88 | 42 | 74 | 67 | 72 | 41 | 45 |
| ReAct (Reasoning + Acting) | 92 | 58 | 96 | 86 | 78 | 41 | 71 |
| BUTLER (RL-based model) | 46 | 39 | 74 | 100 | 22 | 24 | 37 |
- ReAct significantly outperformed the Act-only approach (71% vs. 45%) in ALFWorld.
- It also outperformed the reinforcement learning-based model (BUTLER).
- Key reason: ReAct analyzes the current state and determines appropriate actions through reasoning, enabling more logical decision-making.
WebShop (Average Score and Success Rate)
| Method | Avg. Score | Success Rate |
|---|---|---|
| Act-only | 62.3 | 30.1% |
| ReAct (Reasoning + Acting) | 66.6 | 40.0% |
| IL (Imitation Learning) | 59.9 | 29.1% |
| IL+RL (Imitation + RL) | 62.4 | 28.7% |
| Human (Expert Performance) | 82.1 | 59.6% |
- ReAct achieved higher performance than Act-only and IL/IL+RL models in WebShop.
- While it did not reach human expert performance (59.6%), it achieved the highest performance among automated models (40.0%).
- Key reason: Unlike simply repeating actions, ReAct filters necessary information and determines optimal actions through reasoning.
ReAct vs. Existing Models
- Existing models perform actions without internal reasoning, leading to repetitive and inefficient exploration.
- ReAct combines reasoning and acting, allowing for more structured and logical exploration.
- ReAct outperforms traditional reinforcement learning (RL) models and can be applied to general text-based environments.
Key Advantages of ReAct
- More Effective Goal Achievement
- Uses reasoning to analyze the current state, filter relevant information, and select optimal actions.
- Minimization of Unnecessary Actions
- Instead of random exploration, guides actions toward achieving the objective, improving performance.
- Increased Generalizability
- Demonstrates strong performance across various environments such as ALFWorld and WebShop,
applying a reasoning process similar to human thinking.
- High Performance Without Reinforcement Learning
- Achieves excellent results with only a few examples,
without requiring large-scale training as in RL.
Conclusion and Implications
- ReAct outperforms simple action-based models and is applicable to a wide range of tasks.
- By integrating reasoning and acting, it enables effective decision-making without reinforcement learning.
- Future Research Directions:
- Expanding experiments to more complex environments.
- Developing more powerful models by combining ReAct with RL.
- Applying ReAct to various applications, such as web search and robotics.
5. Related Work
Reasoning with Language Models
- Recent studies have shown that large language models (LLMs) can perform complex reasoning tasks.
- Representative approaches:
- Chain-of-Thought (CoT) (Wei et al., 2022)
- Guides models to explicitly express reasoning steps in natural language for solving complex problems.
- However, it relies only on internal knowledge, increasing the risk of hallucinations.
- Least-to-Most Prompting (Zhou et al., 2022)
- Breaks down complex problems into smaller, solvable subproblems.
- Faithful Reasoning (Creswell & Shanahan, 2022)
- Designs models for more faithful and reliable reasoning.
- Scratchpad (Nye et al., 2021)
- Encourages models to record intermediate calculations, improving answer accuracy.
How ReAct Differs
- Existing approaches focus only on reasoning and do not account for external interactions.
- In contrast, ReAct integrates reasoning and acting, enabling information retrieval and external actions when necessary.
Decision-Making with Language Models
- Recent studies have explored using LLMs for decision-making and task planning.
- Notable research:
- WebGPT (Nakano et al., 2021)
- Allows LLMs to browse the web and search for information to answer questions.
- However, it only performs actions without explicit reasoning, limiting logical depth.
- SayCan (Ahn et al., 2022)
- Enables robots to plan actions using an LLM and execute them in real-world environments.
- Inner Monologue (Huang et al., 2022b)
- Uses an internal feedback mechanism to adjust robotic actions.
- However, its inner monologue is largely limited to restating feedback from the environment rather than performing genuine reasoning.
How ReAct Differs
- Existing models train LLMs to perform actions but lack explicit reasoning processes or have limited external interactions.
- ReAct integrates reasoning and acting dynamically, allowing models to retrieve information and make logical decisions.
Reinforcement Learning and Interactive AI
- Studies on reinforcement learning (RL) focus on teaching LLMs how to interact with external environments.
- Representative research:
- BUTLER (Shridhar et al., 2020b)
- Trains an RL-based action model in virtual environments.
- However, it requires large-scale data and lacks generalization ability.
- Interactive Agents (Abramson et al., 2020)
- Develops AI models that interact with humans using RL.
- Generalist Agent (Reed et al., 2022)
- Designs a single AI model capable of performing multiple tasks.
How ReAct Differs
- RL-based models require large datasets and long training times.
- ReAct can perform effective decision-making with only a few examples (few-shot learning).
- Unlike RL, ReAct naturally integrates reasoning and acting, allowing for more intuitive control over model behavior.
Conclusion
- Unlike existing studies, ReAct integrates reasoning and acting, enabling more effective problem-solving.
- ReAct overcomes the limitations of CoT (reasoning-only) and WebGPT (action-only) by harmonizing both approaches.
- Future research may explore combining ReAct with reinforcement learning to develop more powerful AI systems.
6. Conclusion
Summary of the Study
- This study proposed ReAct, a novel methodology that enables large language models (LLMs) to solve problems by integrating reasoning and acting.
- Compared to existing methods (reasoning-only, action-only), ReAct allows for more reliable and interpretable task execution.
- ReAct demonstrated strong performance across various tasks:
- Question Answering (HotpotQA): Improved accuracy by retrieving external information.
- Fact Verification (FEVER): Reduced hallucination errors, increasing reliability.
- Interactive Decision-Making (ALFWorld, WebShop): Enhanced goal-oriented action planning and success rates.
Key Contributions of ReAct
- Integration of Reasoning and Acting
- Unlike prior approaches, ReAct enables LLMs to reason and interact with external environments when necessary.
- Reduction of Hallucination Errors
- Unlike CoT methods that rely solely on internal knowledge, ReAct leverages external data for increased factual accuracy.
- Strong Performance in Few-Shot Learning
- Unlike RL-based approaches that require extensive training data, ReAct makes effective decisions with only a few examples.
- Applicability to Diverse Tasks
- ReAct can be utilized in question answering, web search, robotic action planning, and other environments.
- Improved Interpretability and Reliability
- The model’s decision-making process is easier for humans to analyze and adjust.
Future Research Directions
- Expanding ReAct to More Complex Environments
- Investigating its applicability in physical robotics, real-world web browsing, and other highly interactive settings.
- Combining ReAct with Reinforcement Learning (RL)
- Integrating ReAct’s reasoning and action strategies with RL could lead to more powerful AI agents.
- Improving Prompt Design and Model Training
- Fine-tuning ReAct using more human-annotated data to further enhance its performance.
Reader Feedback
- ReAct enhances LLMs’ capabilities by incorporating feedback from the environment into their reasoning process.
- Additional verification of factual relationships in actions and observations is needed.
- A solution is required for cases where external data is sparse or when acquiring external data introduces latency and bottlenecks.
- Benchmarking ReAct under these conditions would be beneficial.