Collaboration is Intelligence: Mixture-of-Agents Outperforms GPT-4o
If you’ve been with me for a while, you know my enthusiasm for long inference models. I believe they are the next leap in AI’s reasoning capabilities, a view increasingly shared in academia.
Recently, researchers at Together.ai, a company specializing in Large Language Models (LLMs), released a paper on Mixture-of-Agents (MoA). This new approach demonstrates that combining different LLMs can achieve superior results on complex tasks, outperforming GPT-4o on popular benchmarks despite using less advanced models.
So, how is this possible? The key lies in long inference. Let’s explore.
The Power of Iteration
Long inference LLMs don’t give immediate answers. Instead, they iterate and self-reflect within a set timeframe to provide more thoughtful responses. This method allows even less advanced models to surpass the performance of current top-tier models by giving them “more time to think.”
For example, GPT-3.5, though generally inferior to GPT-4, can outperform it when used within agentic workflows.
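To make “more time to think” concrete, here’s a minimal sketch of an iterate-and-refine loop. Everything in it is illustrative: `complete(model, prompt)` is a hypothetical stand-in for a single chat-LLM call, and the prompts and round count aren’t from any specific paper.

```python
def complete(model: str, prompt: str) -> str:
    """Placeholder for one LLM call; wire up your own client here."""
    raise NotImplementedError

def refine(model: str, question: str, rounds: int = 3) -> str:
    """Draft an answer, then repeatedly critique and rewrite it."""
    answer = complete(model, f"Answer the question:\n{question}")
    for _ in range(rounds):
        # Self-reflection step: the model critiques its own draft...
        critique = complete(model, (
            f"Question: {question}\nDraft answer: {answer}\n"
            "List any errors or gaps in this draft."
        ))
        # ...then rewrites the draft to address the critique.
        answer = complete(model, (
            f"Question: {question}\nDraft answer: {answer}\n"
            f"Critique: {critique}\nRewrite the answer, fixing the issues."
        ))
    return answer
```

Each extra round spends more inference-time compute on the same question, which is the whole trade.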
Why Does Long Inference Work?
We don’t fully understand why long inference is so effective, but researchers like Yoshua Bengio often invoke Daniel Kahneman’s two modes of thinking: System 1, which is fast and intuitive, and System 2, which is slow and deliberate. Current LLMs operate like System 1, whereas long inference models emulate System 2 by spending more deliberate processing on complex tasks.
With today’s models, techniques like Chain-of-Thought (CoT) prompting can elicit System 2-style reasoning, but they fall short of the full potential of long inference LLMs that actively search the solution space.
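The CoT trick itself is tiny: it’s a change to the prompt, not to the model. A toy comparison, reusing the hypothetical `complete()` wrapper from the sketch above:

```python
question = "A train travels 60 km in 45 minutes. What is its speed in km/h?"

# System-1 style: demand an immediate answer.
direct = complete("any-model", f"{question}\nAnswer:")

# System-2 style: the classic CoT trigger asks for reasoning first.
stepwise = complete("any-model", f"{question}\nLet's think step by step.")
```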
Collaborative Efforts in AI
Academia has been working on collaborative frameworks that let LLMs enhance each other’s performance. Over a year ago, researchers from MIT and Google introduced the Society of Minds framework. Despite using less powerful models like Bard and ChatGPT-3.5, they observed an emergent behavior: agents that started with wrong answers corrected themselves after debating with the others.
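The debate pattern behind that finding is simple to sketch. The prompts, round count, and `complete()` wrapper below are placeholders, not the paper’s exact protocol:

```python
def debate(question: str, models: list[str], rounds: int = 2) -> list[str]:
    """Each agent answers, then revises after reading the others' answers."""
    answers = [complete(m, f"Answer the question:\n{question}") for m in models]
    for _ in range(rounds):
        revised = []
        for i, model in enumerate(models):
            # Show each agent everyone else's current answer.
            others = "\n\n".join(a for j, a in enumerate(answers) if j != i)
            revised.append(complete(model, (
                f"Question: {question}\nYour previous answer: {answers[i]}\n"
                f"Other agents answered:\n{others}\n"
                "Reconsider and give your updated answer."
            )))
        answers = revised
    return answers
```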
Later, Google and Princeton University developed the Tree-of-Thoughts (ToT) framework, which lets LLMs search the space of possible solutions and significantly improves their reasoning capabilities. More recent work on Chain of Preference Optimization uses ToT-generated reasoning paths to fine-tune standard LLMs, retaining much of the benefit without searching at inference time.
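At its core, ToT turns generation into search: expand partial solutions (“thoughts”), score them, and prune. A simplified breadth-first version, with `propose()` and `score()` as hypothetical helpers that would each be one LLM call:

```python
import heapq

def propose(thought: str, k: int) -> list[str]:
    """LLM call: extend a partial solution in k different ways."""
    raise NotImplementedError

def score(thought: str) -> float:
    """LLM call: rate how promising a partial solution looks."""
    raise NotImplementedError

def tree_of_thoughts(problem: str, depth: int = 3, breadth: int = 5, keep: int = 2) -> str:
    frontier = [problem]
    for _ in range(depth):
        # Expand every surviving thought, then prune to the best branches.
        candidates = [c for t in frontier for c in propose(t, breadth)]
        frontier = heapq.nlargest(keep, candidates, key=score)
    return frontier[0]
```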
The Mixture-of-Agents Approach
MoA is reminiscent of mixture-of-experts models, but at a larger scale. Instead of routing inputs to specialized sub-networks inside a single model, MoA builds a “grand LLM” out of complete models, or agents, organized in layers. Each agent in a layer proposes a response, agents in subsequent layers refine those proposals, and a final agent, called the aggregator, consolidates them into the final output.
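In code, the layered structure is a short loop. This is a sketch of the pattern as the paper describes it, not Together.ai’s implementation; the prompts are placeholders, and `complete()` is the same hypothetical wrapper as in the earlier sketches:

```python
def mixture_of_agents(question: str, layers: list[list[str]], aggregator: str) -> str:
    """Layered agents propose and refine; a final aggregator synthesizes."""
    responses: list[str] = []
    for layer in layers:
        if responses:
            # Later layers see the previous layer's proposals as context.
            context = "\n\n".join(responses)
            prompt = (f"{question}\n\nResponses from the previous layer:\n"
                      f"{context}\n\nPropose an improved response.")
        else:
            prompt = question
        responses = [complete(model, prompt) for model in layer]
    # The aggregator consolidates the final layer's proposals.
    return complete(aggregator, (
        f"{question}\n\nCandidate responses:\n" + "\n\n".join(responses) +
        "\n\nSynthesize these into a single, high-quality answer."
    ))
```

In the paper’s setup, `layers` would hold several open-source chat models repeated across a few layers; which models go where is a design choice, not fixed by the method.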
This self-refining process yields impressive results. A set of open-source models, each individually weaker than GPT-4o, beat it on the AlpacaEval 2.0 benchmark by 7.6 percentage points (65.1% versus 57.5%).
The Future of Long Inference Models
The demand for generative AI products is high, but poor reasoning remains a major challenge. Long inference models could address this issue. Despite generating more tokens on average, these models might be more cost-efficient because they use smaller, collaborative models rather than large, standalone ones.
This balance of higher performance and cost-efficiency could be crucial for the industry’s future, making it a sustainable and practical solution.
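A back-of-envelope calculation shows why the economics can work. The prices and token counts below are hypothetical placeholders, not real vendor pricing:

```python
BIG_MODEL_PRICE = 15.00   # hypothetical $ per 1M output tokens, large frontier model
SMALL_MODEL_PRICE = 0.90  # hypothetical $ per 1M output tokens, small open model

single_pass_tokens = 1_000   # one answer from the big model
moa_tokens = 1_000 * 8       # e.g., 8 agent calls across layers

print(f"Big model, one pass: ${single_pass_tokens / 1e6 * BIG_MODEL_PRICE:.4f}")  # $0.0150
print(f"MoA, 8 small calls:  ${moa_tokens / 1e6 * SMALL_MODEL_PRICE:.4f}")        # $0.0072
```

Even at eight times the tokens, the collaborative setup comes out cheaper under these assumed prices.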
Read more:
Mixture-of-Agents Enhances Large Language Model Capabilities: https://arxiv.org/abs/2406.04692
Tree of Thoughts: Deliberate Problem Solving with Large Language Models: https://arxiv.org/abs/2305.10601