HOLY SMOKES! New ‘Assembly-of-Experts’ Method Delivers 200% Faster LLMs | Sakana AI Orchestrates Multi-Model Gains & Google Embeds Custom AI in Workspace

Key Takeaways

  • German lab TNG Technology Consulting GmbH has unveiled a DeepSeek LLM variant that is 200% faster, made possible by their innovative Assembly-of-Experts (AoE) method.
  • Sakana AI introduced “TreeQuest,” a technique using Monte-Carlo Tree Search to orchestrate multi-model LLM teams that outperform individual models by 30% on complex tasks.
  • Google is integrating customizable Gemini chatbots, called “Gems,” directly into its Workspace applications (Docs, Slides, Sheets, Gmail, Drive), making personalized AI agents widely accessible to users.
  • OpenAI’s GPT-4.1 and Realtime API are enabling rapid commercial success, demonstrated by Genspark building a $36M ARR AI product in just 45 days using no-code personal agents.
  • New research highlights advancements in optimizing tool selection for LLM workflows through differentiable programming, signaling progress in building more efficient and intelligent AI systems.

Main Developments

Today’s AI landscape is buzzing with rapid advancements across efficiency, capability, and widespread accessibility, marking a pivotal moment in the technology’s evolution. Leading the charge is a groundbreaking revelation from Germany, where TNG Technology Consulting GmbH has unveiled a DeepSeek R1-0528 variant that boasts an astonishing 200% increase in speed. This leap is attributed to TNG’s novel Assembly-of-Experts (AoE) method, a technique that constructs new LLM variants by selectively merging the weight tensors of existing parent models rather than retraining from scratch. This efficiency gain promises to dramatically reduce the computational burden and latency associated with large language models, opening doors for more real-time, resource-intensive AI applications across various industries. The potential impact on operational costs and the scalability of AI solutions is substantial, signaling a major step towards making powerful AI more practical and pervasive.
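TNG has not published the full recipe in the coverage summarized here, but the core idea of assembling a child model by selectively merging weight tensors can be sketched as follows. The function name, the prefix-based tensor selection, and the linear interpolation rule are illustrative assumptions, not TNG’s actual method:

```python
# Hedged sketch of an Assembly-of-Experts-style merge. Models are
# represented as dicts mapping tensor names to flat lists of floats;
# a real implementation would operate on framework tensors.

def assemble_experts(base, donor, merge_prefixes, alpha=0.5):
    """Build a child model by selectively merging weight tensors.

    Tensors whose names match a prefix in merge_prefixes are linearly
    interpolated between base and donor; all other tensors are
    inherited unchanged from the base model.
    """
    child = {}
    for name, w in base.items():
        if any(name.startswith(p) for p in merge_prefixes):
            d = donor[name]
            child[name] = [(1 - alpha) * a + alpha * b for a, b in zip(w, d)]
        else:
            child[name] = list(w)  # inherit the base tensor as-is
    return child

# Toy example: merge only the expert tensors, keep attention from base.
base = {"attn.q": [1.0, 2.0], "expert.0.w": [0.0, 0.0]}
donor = {"attn.q": [9.0, 9.0], "expert.0.w": [2.0, 4.0]}
child = assemble_experts(base, donor, merge_prefixes=("expert.",), alpha=0.5)
```

Because the merge is pure tensor arithmetic, a child model costs no gradient updates to produce, which is what makes this family of techniques attractive for cheap model variants.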

Complementing this stride in efficiency, Sakana AI has introduced a new paradigm for enhancing LLM performance through sophisticated collaboration. Their “TreeQuest” inference-time scaling technique orchestrates multi-model teams, allowing them to collectively tackle complex tasks with a reported 30% higher success rate compared to individual LLMs. Utilizing Monte-Carlo Tree Search, this method leverages the strengths of diverse models, marking a crucial step towards more robust and versatile AI systems capable of handling nuanced and multifaceted challenges that single models might struggle with. The implication is clear: the future of high-performance AI lies not just in bigger, more powerful individual models, but increasingly in smarter orchestration of multiple specialized agents working in concert.
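The orchestration idea can be sketched with the UCB1 rule, the selection policy most commonly used inside Monte-Carlo Tree Search. Everything here is a stub standing in for Sakana AI’s published system: the team names, the fixed per-rollout scores, and the flat (single-level) search are illustrative assumptions, whereas a real deployment would call actual LLMs and score their candidate answers:

```python
import math

# Hedged sketch in the spirit of TreeQuest: use Monte-Carlo search
# statistics (UCB1) to decide which model on a multi-model team
# should attempt the next rollout.

def ucb1(total_reward, visits, t, c=1.4):
    """Score balancing exploitation (mean reward) and exploration."""
    if visits == 0:
        return float("inf")  # try every model at least once
    return total_reward / visits + c * math.sqrt(math.log(t) / visits)

def orchestrate(rollout_scores, rounds=200):
    """Allocate rollouts across models; return the most-trusted one."""
    stats = {name: [0.0, 0] for name in rollout_scores}  # -> [reward, visits]
    for t in range(1, rounds + 1):
        # Selection: the model with the highest UCB1 score gets this rollout.
        name = max(stats, key=lambda n: ucb1(stats[n][0], stats[n][1], t))
        # Simulation stub: a fixed score stands in for evaluating that
        # model's candidate answer on the task.
        stats[name][0] += rollout_scores[name]
        stats[name][1] += 1
    return max(stats, key=lambda n: stats[n][1])

# Stub per-rollout scores standing in for each model's task competence.
team = {"model_a": 0.55, "model_b": 0.75, "model_c": 0.40}
best = orchestrate(team)  # the search concentrates rollouts on model_b
```

The payoff of the bandit-style selection is that weaker models still get occasional rollouts, so the ensemble can recover when a normally strong model happens to be wrong on a given task.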

On the front of user accessibility and pervasive integration, Google is making a significant move by embedding its customizable Gemini chatbots, known as “Gems,” directly into its popular Workspace suite. Now available in the side panels of Google Docs, Slides, Sheets, Drive, and Gmail, this integration means millions of users can access their personalized AI assistants without ever leaving their core productivity applications. Whether it’s drafting emails, analyzing data in spreadsheets, or generating presentation content, “Gems” empower users to tailor AI to their specific workflows and tasks, democratizing the creation and deployment of specialized AI agents for everyday use. This strategic deployment underscores a broader industry trend toward making AI assistance an intuitive, ever-present part of digital work environments, blurring the lines between human and AI-powered tasks.

The power of readily available AI infrastructure is also evident in the commercial success stories emerging from platforms like OpenAI. Their GPT-4.1 and Realtime API are proving to be potent tools for rapid product development, as demonstrated by Genspark. This company managed to build a remarkable $36 million ARR AI product in a mere 45 days, leveraging no-code personal agents powered by OpenAI’s advanced models. This compelling case study highlights the growing potential for entrepreneurs and businesses, even those without deep technical expertise, to quickly conceptualize, develop, and scale AI-driven solutions, drastically lowering the barrier to entry for the AI economy and fostering unprecedented innovation.

Finally, beneath the surface of these headline-grabbing deployments and performance boosts, fundamental research continues to refine how AI systems operate. An article highlighted on Hacker News delves into optimizing tool selection for LLM workflows using differentiable programming. This technical advancement focuses on making LLMs more adept at choosing and utilizing external tools efficiently, a critical step towards building more autonomous and intelligent agents. Such foundational work ensures that as LLMs become faster and more collaborative, their underlying decision-making processes also become more sophisticated and resource-aware, paving the way for truly adaptive and versatile AI systems in the long run.
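The basic trick behind differentiable tool selection is to replace a hard, non-differentiable argmax over tools with a softmax, so that gradients can flow back into the selection scores. The sketch below is a minimal illustration of that idea only, with hypothetical tools and a hand-derived gradient, and is not the setup from the article discussed above:

```python
import math

# Hedged sketch: score each tool, softmax the scores, and mix the
# tools' (scalar) outputs by those weights so the whole pipeline
# stays differentiable end to end.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def grad_logits(logits, tool_outputs, target):
    """d/d logit_k of the squared error, derived by hand:
    y = sum_i w_i * y_i, so dL/dl_k = 2*(y - t) * w_k * (y_k - y)."""
    w = softmax(logits)
    y = sum(wi * yi for wi, yi in zip(w, tool_outputs))
    err = 2 * (y - target)
    return [err * w[k] * (tool_outputs[k] - y) for k in range(len(logits))]

# Three hypothetical tools whose scalar outputs differ in usefulness.
logits = [0.0, 0.0, 0.0]
outputs = [1.0, 4.0, 9.0]  # e.g. quality scores of each tool's answer
target = 9.0               # training signal: tool 2 was the right call

# Gradient descent pushes the selection weight toward the best tool.
lr = 0.1
for _ in range(100):
    g = grad_logits(logits, outputs, target)
    logits = [l - lr * gi for l, gi in zip(logits, g)]
w = softmax(logits)  # w[2] ends up dominant
```

In practice the same relaxation lets tool choice be trained jointly with the rest of an LLM workflow instead of being a hard-coded routing rule.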

Analyst’s View

Today’s news signals a critical inflection point in AI development: the simultaneous push for extreme efficiency, sophisticated multi-model collaboration, and mass user adoption. TNG’s 200% speed increase via their AoE method isn’t just a technical footnote; it’s a game-changer for the economic viability and real-world latency of large-scale AI deployment. Paired with Sakana AI’s multi-model orchestration, we’re seeing the dawn of AI systems that are not only faster but also inherently more capable and robust, moving beyond monolithic models to intelligent ensembles. Google’s pervasive integration of custom AI into Workspace, alongside OpenAI’s platform enabling rapid no-code product development, underscores a future where personalized AI is not a niche tool but a ubiquitous co-pilot. The race now accelerates on two fronts: fundamental architectural breakthroughs and strategic, user-centric deployment. Expect consolidation around platforms offering both.

