Small Models, Big Results: The Power of Blending

Recent research has uncovered a groundbreaking technique called “Blending” that’s challenging the notion that bigger is always better in the world of AI language models. This innovative approach combines multiple smaller language models to create a system that can outperform Generative AI giants like ChatGPT, all while using fewer computational resources.

The study compared the performance of several language models, including:

  • Pygmalion 6B
  • Chai Model 6B
  • Vicuna 13B
  • OpenAI’s GPT-3.5 (175B+ parameters)
  • A Blended model combining Pygmalion, Chai Model, and Vicuna

The researchers used two key metrics to evaluate performance:

  1. User Retention: The fraction of users returning to the platform k days after joining.
  2. User Engagement: The average time spent per visiting user.
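Both metrics can be computed directly from visit logs. A minimal sketch, assuming a simple log shape (user → visit days, and user → minutes spent); the study defines the metrics only verbally, so the data layout and function names here are illustrative:

```python
def k_day_retention(visits, k):
    """Fraction of users who return to the platform k days after joining.

    `visits` maps user_id -> collection of visit days, with day 0 = join day.
    (Illustrative data shape, not from the study.)
    """
    returned = sum(1 for days in visits.values() if k in days)
    return returned / len(visits)


def mean_engagement(minutes_spent):
    """Average time spent per visiting user.

    `minutes_spent` maps user_id -> total minutes spent on the platform.
    """
    return sum(minutes_spent.values()) / len(minutes_spent)
```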

Surprisingly, the Blended model, with a total of just 25B parameters, outperformed OpenAI’s GPT-3.5 (175B+ parameters) in both retention and engagement metrics.

The Science Behind Blending: How It Works

Blending isn’t about creating one massive neural network. Instead, it’s a method of integrating multiple chat AIs by randomly selecting which model generates each response in a conversation. Here’s how it works:

  1. Start with multiple moderately-sized, specialized language models.
  2. For each response in a conversation, randomly select one model to generate the answer.
  3. Repeat this process throughout the conversation.

This approach allows for diverse, dynamic dialogues that benefit from each model’s strengths. The researchers found that this simple method led to significantly higher user engagement and retention than using any individual component model, or even a much larger model like GPT-3.5.
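The three steps above amount to a tiny wrapper around the component models. A minimal sketch, assuming each model is a callable that maps the conversation history to a reply (the interface is hypothetical; the paper does not prescribe one):

```python
import random


def blended_reply(models, history, rng=random):
    """Pick one component model uniformly at random and let it answer.

    `models` is a list of callables (history -> reply). Uniform sampling
    per turn is the core of the Blending method described above.
    """
    model = rng.choice(models)
    return model(history)


def converse(models, user_turns, rng=random):
    """Run a conversation, re-sampling the responding model every turn."""
    history = []
    for turn in user_turns:
        history.append(("user", turn))
        reply = blended_reply(models, history, rng)
        history.append(("assistant", reply))
    return history
```

Because a fresh model is drawn for every response, a single conversation can mix the styles and strengths of all component models.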

 

 


Data Speaks: Blending vs. The Giants

The results of the large-scale A/B tests on the “Blended models” are striking:

  1. Engagement Improvement:

    • Blended (13B, 6B, 6B): 120%
    • GPT-3.5 (175B): 80%
    • Vicuna+ (13B): 20%
    • ChaiLLM (6B): 40%
  2. Retention Improvement:

    • Blended (13B, 6B, 6B): 40%
    • GPT-3.5 (175B): 20%
    • Vicuna+ (13B): 10%
    • ChaiLLM (6B): 20%

These percentages represent improvements over the control model (Pygmalion 6B) after 30 days.
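These numbers are relative lifts: how much better each model did than the Pygmalion 6B control on the same metric. Assuming the raw metric values are available (the article reports only the lifts), the conversion is just:

```python
def improvement_over_control(metric, control_metric):
    """Percentage improvement of a model's metric over the control model.

    E.g. a model retaining 1.4x as many users as the control scores +40%.
    """
    return (metric / control_metric - 1.0) * 100.0
```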

The researchers also developed metrics to summarize a chat AI’s performance:

  • ∆α (initial value) and ∆γ (decay rate) for the engagement ratio
  • ∆ζ (initial value) and ∆β (decay rate) for the retention ratio

The Blended model showed the highest relative initial engagement (∆α) and the best engagement ratio decay rate (∆γ). While Vicuna had a better retention ratio decay rate (∆β), its significantly lower initial retention ratio (∆ζ) meant it would take an extended period (estimated around one year) to reach Blended’s retention score.
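The one-year catch-up estimate can be reproduced under a simplifying assumption. The article gives no functional form for the ratio curves, so the sketch below assumes simple exponential decay, ratio(t) = initial · e^(−decay·t); under that assumption, the day on which one model's retention ratio overtakes another's has a closed form. The exponential model and all names here are assumptions, not from the study.

```python
import math


def ratio(t, initial, decay):
    """Assumed exponential shape for an engagement/retention ratio curve."""
    return initial * math.exp(-decay * t)


def crossover_day(init_a, decay_a, init_b, decay_b):
    """Day t at which curve B (lower start, slower decay) overtakes curve A.

    Solves init_a * exp(-decay_a * t) = init_b * exp(-decay_b * t).
    Returns None when B never catches up.
    """
    if init_b >= init_a:
        return 0.0  # B already ahead (or tied) on day 0
    if decay_b >= decay_a:
        return None  # B starts lower and decays at least as fast
    return math.log(init_a / init_b) / (decay_a - decay_b)
```

With Blended's higher initial retention ratio and Vicuna's slower decay plugged in, this kind of model is what yields a crossover point far in the future, consistent with the roughly one-year estimate above.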

Perhaps most importantly, the Blended model achieved these results with an inference speed similar to that of the smaller models. This means it offers significant performance gains without increasing computational costs, making it a game-changer for companies and researchers working on AI applications.

In conclusion, the Blending technique presents a promising alternative to the trend of developing ever-larger language models. By cleverly combining smaller, specialized models, it’s possible to create AI systems that are not only more engaging and effective but also more efficient and accessible. This breakthrough could democratize advanced AI technology, making it available to a wider range of businesses and researchers who previously couldn’t afford the computational costs of running large models.
