Unlocking Advanced Reasoning in Large Language Models: A Deep Dive into Innovative Prompting Techniques

John
8 min read · Jan 13, 2024


Photo by Yeshi Kangrang on Unsplash

The advent of large language models (LLMs) like GPT-3.5-Turbo and GPT-4 has revolutionized the field of natural language processing (NLP). These AI powerhouses have shown remarkable abilities in understanding and generating human-like text. However, their ability to reason, plan, and act within a given context can still be inconsistent. To improve these aspects, researchers have developed various prompting techniques that guide LLMs toward more coherent, accurate, and contextually relevant outputs.

In this blog, I will explore several advanced prompting techniques that have been developed to enhance the reasoning and acting abilities of LLMs, including:

  • Chain-of-thought (CoT) Prompting;
  • Self-consistency with CoT;
  • Tree-of-thought (ToT) Prompting;
  • Reasoning via Planning (RAP);
  • ReAct;
  • Self-refine;
  • Reflexion;
  • Language Agent Tree Search (LATS).

Chain-of-thought (CoT) Prompting [1]

Chain-of-thought (CoT) prompting involves providing an LLM with a prompt that encourages it to generate intermediate reasoning steps before arriving at a final answer. This approach can be particularly useful for complex problems where a single-step answer may not be sufficient. However, standard few-shot CoT requires hand-crafted step-by-step reasoning examples for each task. Zero-shot CoT removes this requirement: simply prompting the model with "Let's think step by step" often leads to more accurate conclusions across arithmetic, symbolic, commonsense, and other logical reasoning tasks.

Example:
For a math problem like “What is the sum of the consecutive numbers starting from 1 up to 100?”, a CoT prompt might look like this:

Q: What is the sum of the consecutive numbers starting from 1 up to 100?
A: Let's think step by step.
------
(LLMs Output) First, we need to find the last number in the sequence, which is 100.
The sum of a consecutive sequence can be found by the formula (n/2)*(first number + last number) where n is the total number of terms.
So, we have (100/2)*(1+100), which simplifies to 50*101.
Therefore, the sum is 5050.
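
In code, zero-shot CoT is little more than appending the trigger phrase to the question. Below is a minimal sketch; the llm() helper is an assumed placeholder for whatever chat-model API you use, not a specific library's interface.

def llm(prompt: str) -> str:
    """Placeholder: wrap your chat-model API call here."""
    raise NotImplementedError

def chain_of_thought(question: str) -> str:
    # Zero-shot CoT: append the trigger phrase so the model writes out its
    # intermediate reasoning before giving the final answer.
    return llm(f"Q: {question}\nA: Let's think step by step.")

# Example usage (the completion should end with 5050 for the question above):
# chain_of_thought("What is the sum of the consecutive numbers starting from 1 up to 100?")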

Self-consistency with CoT [2]

Building on CoT, the self-consistency method involves generating multiple reasoning paths and answers for the same question and then selecting the most common answer across these paths. This approach can increase the likelihood of the LLM arriving at the correct answer by reinforcing consistent reasoning.

Example:
The model is asked the same question multiple times and generates different chains of thought:

Q: If it rained 3 inches yesterday and 2 inches today, how much did it rain in total? 
------
A1: Yesterday it rained 3 inches and today 2 inches, so 3 + 2 equals 5 inches in total.
A2: 3 inches from yesterday and 2 inches from today makes 3 + 2, which is 5 inches of rain in total.
------
The most common answer, 5 inches, is selected for consistency.
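
Programmatically, self-consistency amounts to sampling several CoT completions at a non-zero temperature, extracting a final answer from each, and taking a majority vote. A minimal sketch, again assuming a generic llm() helper and a deliberately naive answer extractor:

from collections import Counter

def llm(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder: wrap your chat-model API call here, with sampling enabled."""
    raise NotImplementedError

def extract_answer(completion: str) -> str:
    # Naive heuristic: treat the last word of the last line as the answer.
    return completion.strip().splitlines()[-1].split()[-1].rstrip(".")

def self_consistent_answer(question: str, n_samples: int = 5) -> str:
    prompt = f"Q: {question}\nA: Let's think step by step."
    answers = [extract_answer(llm(prompt)) for _ in range(n_samples)]
    # Majority vote across the sampled reasoning paths.
    return Counter(answers).most_common(1)[0][0]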

Tree-of-thought (ToT) Prompting [3]

Tree-of-thought prompting extends CoT by representing the reasoning process as a tree: each node is a partial solution (a "thought"), and branches represent alternative next steps. Combined with a search strategy such as breadth-first or depth-first search, with lookahead and backtracking, this technique helps the model explore multiple solution paths and select the most plausible one.

Example:
For a multi-step logic puzzle, ToT might involve branching out possibilities like this:

Q: Start at the root: What is the color of the bear?
------
A: Branch 1: If the bear is at the North Pole, it must be white because polar bears live there.
Branch 2: If the bear is in the forest, it could be brown or black.
Conclusion: Given the additional information that the bear is at the North Pole, we follow Branch 1 and determine the bear is white.
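
A toy, breadth-first version of ToT can be written as a small beam search: at each depth, ask the model for a few candidate next thoughts per partial solution, score each candidate with a separate evaluation prompt, and keep only the best few. The llm() helper and both prompts below are illustrative assumptions, not the prompts from the paper.

def llm(prompt: str) -> str:
    """Placeholder: wrap your chat-model API call here."""
    raise NotImplementedError

def propose_thoughts(question: str, partial: str, k: int = 3) -> list[str]:
    prompt = (f"Question: {question}\nReasoning so far:\n{partial}\n"
              f"Propose {k} distinct next reasoning steps, one per line.")
    return llm(prompt).strip().splitlines()[:k]

def score_partial(question: str, partial: str) -> float:
    prompt = (f"Question: {question}\nReasoning so far:\n{partial}\n"
              "Rate how promising this line of reasoning is from 0 to 10.")
    try:
        return float(llm(prompt).strip().split()[0])
    except ValueError:
        return 0.0

def tree_of_thought(question: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [""]  # each entry is a partial chain of reasoning
    for _ in range(depth):
        candidates = [p + "\n" + t for p in frontier
                      for t in propose_thoughts(question, p)]
        # Keep the `beam` highest-scoring partial solutions (breadth-first beam search).
        candidates.sort(key=lambda p: score_partial(question, p), reverse=True)
        frontier = candidates[:beam]
    return frontier[0]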

Reasoning via Planning (RAP) [4]

Reasoning via Planning (RAP) treats reasoning as a planning problem: the LLM acts both as a reasoning agent that proposes steps and as a world model that predicts the state resulting from each step, while a Monte Carlo Tree Search-style planner explores the space of possible reasoning paths. It is particularly useful for tasks that require a sequence of actions to achieve a desired outcome.

Example:
For a cooking recipe, RAP could generate a plan like this:

Goal: Bake a chocolate cake.
Step 1: Gather ingredients - flour, sugar, cocoa powder, etc.
Step 2: Preheat the oven to 350°F.
Step 3: Mix dry ingredients.

Step n: Bake for 30 minutes and let cool.
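
The sketch below is a heavily simplified, greedy stand-in for RAP's search: the llm() helper, the prompts, and the loop structure are all illustrative assumptions, and the actual method uses Monte Carlo Tree Search rather than a single greedy rollout.

def llm(prompt: str) -> str:
    """Placeholder: wrap your chat-model API call here."""
    raise NotImplementedError

def propose_action(goal: str, state: str) -> str:
    return llm(f"Goal: {goal}\nCurrent state: {state}\n"
               "Propose the single best next action.")

def predict_next_state(state: str, action: str) -> str:
    # The LLM acts as a world model, predicting how the state changes.
    return llm(f"State: {state}\nAction: {action}\nDescribe the resulting state.")

def goal_reached(goal: str, state: str) -> bool:
    reply = llm(f"Goal: {goal}\nState: {state}\n"
                "Has the goal been achieved? Answer yes or no.")
    return reply.strip().lower().startswith("yes")

def plan(goal: str, initial_state: str, max_steps: int = 10) -> list[str]:
    state, actions = initial_state, []
    for _ in range(max_steps):
        if goal_reached(goal, state):
            break
        action = propose_action(goal, state)
        actions.append(action)
        state = predict_next_state(state, action)
    return actions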

ReAct [5]

ReAct is a prompting technique whose name stands for Reasoning and Acting. It interleaves reasoning traces with task-specific actions, such as calling a search API or querying a database, and feeds the resulting observations back into the next reasoning step, so the model not only reasons about a problem but also acts on it.

Example:
If tasked with finding the weather forecast, a ReAct prompt might look like this:

First, determine the user's location. 
Next, access the weather API with the location data.
Then, retrieve the forecast information and present it to the user in a readable format.
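
In practice, ReAct is usually implemented as a loop that alternates model-generated Thought/Action steps with tool Observations appended back into the prompt. Below is a minimal sketch of that loop; the llm() helper, the get_weather tool, and the "Action: tool[input]" / "Final Answer:" conventions are assumed placeholders rather than a fixed standard.

def llm(prompt: str) -> str:
    """Placeholder: wrap your chat-model API call here."""
    raise NotImplementedError

def get_weather(location: str) -> str:
    """Hypothetical tool: replace with a real weather-API call."""
    raise NotImplementedError

TOOLS = {"get_weather": get_weather}

def react(question: str, max_turns: int = 5) -> str:
    prompt = (
        "Answer the question by interleaving lines of the form\n"
        "Thought: ... / Action: tool_name[input] / Observation: ...\n"
        "Finish with 'Final Answer: ...'.\n"
        f"Question: {question}\n"
    )
    for _ in range(max_turns):
        step = llm(prompt)
        prompt += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        if "Action:" in step:
            # Parse a line like "Action: get_weather[Paris]" and run the tool.
            call = step.split("Action:")[-1].strip()
            name, _, arg = call.partition("[")
            observation = TOOLS[name.strip()](arg.rstrip("]").strip())
            prompt += f"Observation: {observation}\n"
    return "No final answer produced."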

Self-Refine [6]

The Self-Refine approach leverages a cyclical process of self-improvement, enabling the model to enhance its own outputs autonomously through iterative self-assessment (feedback). This process of continuous self-critique and revision is guided by task-specific instructions, for instance, "What adjustments can be made to craft this content with a more positive tone?"

Example:

An example application of the Self-refine method could be in generating a summary of a complex article:

1. Initial Generation: The model generates a basic summary that captures some key points but misses nuances and contains ambiguities.
2. Self-feedback Generation: The model identifies that certain points are not well-explained and that some important information is missing.
3. Iterative Refinement: The model adds the missing information and clarifies ambiguous points in several refinement cycles.
4. Termination Condition: The model determines that the summary is now clear, detailed, and accurately represents the article's content.
5. Final Output: The model outputs the refined summary, which is of higher quality than the initial version.
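
The loop behind Self-Refine is short: generate a draft, ask the same model for feedback, revise, and stop when the feedback says nothing more needs to change. A minimal sketch, assuming a generic llm() helper and an illustrative "DONE" stop convention:

def llm(prompt: str) -> str:
    """Placeholder: wrap your chat-model API call here."""
    raise NotImplementedError

def self_refine(task: str, max_rounds: int = 3) -> str:
    draft = llm(f"Task: {task}\nWrite an initial response.")
    for _ in range(max_rounds):
        feedback = llm(
            f"Task: {task}\nDraft:\n{draft}\n"
            "Give concrete feedback on what to improve. "
            "If nothing needs improving, reply only with 'DONE'."
        )
        if feedback.strip() == "DONE":
            break  # termination condition: the model judges the output good enough
        draft = llm(
            f"Task: {task}\nDraft:\n{draft}\nFeedback:\n{feedback}\n"
            "Rewrite the draft, addressing all of the feedback."
        )
    return draft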

Reflexion [7]

Reflexion represents an innovative approach to enhancing language agents by using linguistic feedback as a means of reinforcement, diverging from the conventional practice of adjusting neural network weights. This paradigm shift moves away from the standard reinforcement learning (RL) techniques that typically depend on numerical rewards. By processing and learning from verbal feedback, language agents are able to refine their capabilities in a manner that more closely mimics human learning processes. This method could lead to the development of more advanced and flexible AI systems.

Specifically, within the Reflexion framework, agents engage in a form of self-reflection by considering the language-based feedback they receive and then storing this reflective dialogue within an episodic memory buffer. This repository of linguistic experiences serves to inform and guide the agents’ future choices, enhancing their decision-making processes over time.
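
Stripped to its essentials, that loop can be sketched as follows: attempt the task, evaluate the attempt, and on failure store a verbal self-reflection in an episodic memory that is prepended to the next attempt. The llm() helper and the evaluate() function below are placeholders for your model call and your task-specific feedback signal (unit tests, exact-match checks, and so on).

def llm(prompt: str) -> str:
    """Placeholder: wrap your chat-model API call here."""
    raise NotImplementedError

def evaluate(task: str, attempt: str) -> bool:
    """Placeholder task evaluator: unit tests, exact match, human check, ..."""
    raise NotImplementedError

def reflexion(task: str, max_trials: int = 3) -> str:
    memory: list[str] = []  # episodic memory of verbal self-reflections
    attempt = ""
    for _ in range(max_trials):
        reflections = "\n".join(memory)
        attempt = llm(f"Past reflections:\n{reflections}\n"
                      f"Task: {task}\nAttempt a solution.")
        if evaluate(task, attempt):
            break
        # Verbal reinforcement: the model explains its own failure in natural language.
        memory.append(llm(
            f"Task: {task}\nFailed attempt:\n{attempt}\n"
            "Reflect on why this failed and what to do differently next time."
        ))
    return attempt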

Language Agent Tree Search (LATS) [8]

Language Agent Tree Search (LATS) combines an LLM-based agent with Monte Carlo Tree Search: the model proposes candidate actions, evaluates the resulting states, and uses feedback from the environment to guide the search over a large space of potential actions and outcomes. This approach can be especially powerful in game-playing scenarios or other situations with a wide range of possible decisions.

Example:
In a chess game, LATS could be used to evaluate different moves:

Root: Current board position.
Branch 1: Move pawn from e2 to e4. Evaluate board position after opponent's possible responses.
Branch 2: Move knight from g1 to f3. Evaluate board position after opponent's possible responses.

Select the branch that leads to the most advantageous position after several look-ahead moves.
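
A faithful LATS implementation runs Monte Carlo Tree Search over full agent trajectories with environment feedback; the sketch below only shows the skeleton of the idea, expanding candidate actions with the model and scoring the resulting states with the model, and should not be read as the paper's algorithm. The llm() helper and prompts are assumptions.

def llm(prompt: str) -> str:
    """Placeholder: wrap your chat-model API call here."""
    raise NotImplementedError

def expand(state: str, k: int = 3) -> list[str]:
    # Ask the model for k candidate next actions from this state.
    text = llm(f"State: {state}\nList {k} promising next actions, one per line.")
    return text.strip().splitlines()[:k]

def value(state: str) -> float:
    # Ask the model to score the state; a full system would also use environment feedback.
    try:
        return float(llm(f"State: {state}\nScore this state from 0 to 10.").split()[0])
    except ValueError:
        return 0.0

def lats_search(initial_state: str, depth: int = 2) -> str:
    best_state, best_score = initial_state, float("-inf")
    frontier = [initial_state]
    for _ in range(depth):
        next_frontier = []
        for state in frontier:
            for action in expand(state):
                new_state = f"{state}\n-> {action}"
                next_frontier.append(new_state)
                score = value(new_state)
                if score > best_score:
                    best_state, best_score = new_state, score
        frontier = next_frontier
    return best_state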

Each of these prompting techniques represents a significant leap forward in the capabilities of LLMs to perform more complex reasoning tasks and interact with their environment in meaningful ways. By incorporating structured thought processes, action plans, and decision trees, these models are becoming increasingly adept at solving real-world problems.

As research continues and these models evolve, we can expect even more sophisticated prompting methods to emerge, further closing the gap between artificial and human intelligence. These advances will undoubtedly unlock new applications and opportunities for LLMs across various domains.

References

[1] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., … & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837.

[2] Wang, X., Wei, J., Schuurmans, D., Le, Q. V., Chi, E. H., Narang, S., … & Zhou, D. (2022, September). Self-Consistency Improves Chain of Thought Reasoning in Language Models. In The Eleventh International Conference on Learning Representations.

[3] Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., & Narasimhan, K. (2023). Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601.

[4] Hao, S., Gu, Y., Ma, H., Hong, J. J., Wang, Z., Wang, D. Z., & Hu, Z. (2023). Reasoning with language model is planning with world model. arXiv preprint arXiv:2305.14992.

[5] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K. R., & Cao, Y. (2022, September). ReAct: Synergizing Reasoning and Acting in Language Models. In The Eleventh International Conference on Learning Representations.

[6] Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., … & Clark, P. (2023). Self-refine: Iterative refinement with self-feedback. arXiv preprint arXiv:2303.17651.

[7] Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K. R., & Yao, S. (2023, November). Reflexion: Language agents with verbal reinforcement learning. In Thirty-seventh Conference on Neural Information Processing Systems.

[8] Zhou, A., Yan, K., Shlapentokh-Rothman, M., Wang, H., & Wang, Y. X. (2023). Language agent tree search unifies reasoning acting and planning in language models. arXiv preprint arXiv:2310.04406.
