Meta Launches Muse Spark, Its Most Capable AI Yet—But Gemini 3.1 Pro Still Leads the Pack

Briefly

Meta’s new Muse Spark marks a shift to closed, natively multimodal AI with agent-based reasoning.
Meta reviews sturdy benchmark good points in well being and search, however nonetheless trails Gemini on core reasoning and coding.
In-built 9 months with far much less compute, this factors to a brand new efficiency-driven AI technique.

Meta launched Muse Spark on Wednesday, marking the primary mannequin constructed by Meta Superintelligence Labs—the crew assembled 9 months in the past underneath Chief AI Officer Alexandr Wang after Meta’s $14 billion Scale AI acquisition. It is stay now at meta.ai and the Meta AI app, with a rollout to Fb, Instagram, and WhatsApp coming within the subsequent few weeks.

This is not simply one other chatbot improve or a brand new model of Llama. Muse Spark is natively multimodal—it processes photographs, textual content, and voice from the bottom up, moderately than bolting imaginative and prescient onto an current textual content mannequin. It comes with visible chain-of-thought, tool-use assist, and one thing Meta is looking “Considering mode”: a setup that runs a number of AI brokers in parallel to sort out tougher issues. That is Meta’s reply to the prolonged pondering modes from Google’s Gemini Deep Suppose and OpenAI’s GPT Professional.

“Muse Spark is step one on our scaling ladder and the primary product of a ground-up overhaul of our AI efforts,” Meta wrote in an official announcement. “To assist additional scaling, we’re making strategic investments throughout all the stack—from analysis and mannequin coaching to infrastructure, together with the Hyperion information middle.”

The corporate labored with greater than 1,000 physicians to curate coaching information for Muse Spark’s medical reasoning. The outcomes on HealthBench Laborious—an open-ended well being queries benchmark—are putting: Muse Spark scored 42.8, in comparison with 40.1 for GPT 5.4 and simply 20.6 for Gemini 3.1 Professional. That is not a marginal distinction.

On agentic search (DeepSearchQA), Muse Spark additionally leads with 74.8, beating Gemini (69.7) and GPT 5.4 (73.6). On CharXiv Reasoning—determine understanding from scientific papers—it scored 86.4, the best throughout the fashions within the comparability.

For these into jailbreaking AI, the mannequin was cracked open inside minutes:

🚰 SYSTEM PROMPT LEAK 🚰

This is the total Muse Spark system immediate from Meta!

I seen @AIatMeta forgot to open supply it, so I’ve achieved them the courtesy 😘

PROMPT:”””Who’re you?

You’re a pleasant, clever, and agentic AI assistant. You’re heat and a bit playful.…

— Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭 (@elder_plinius) April 8, 2026

However good isn’t the identical as nice. The general benchmark image exhibits Gemini 3.1 Professional nonetheless operating forward on most classes. The hole is most seen on ARC AGI 2, the summary reasoning puzzle benchmark: Gemini scored 76.5 to Muse Spark’s 42.5.

On coding (LiveCodeBench Professional), Gemini’s 82.9 outpaces Meta’s 80.0. On MMMU Professional—multimodal understanding—Gemini scored 83.9 versus 80.4. Meta’s personal weblog acknowledges present efficiency gaps in long-horizon agentic methods and coding workflows.

There’s additionally a notable strategic shift baked into this launch. Muse Spark is a closed mannequin—its structure and weights will not be made public. That is a pointy departure from Llama, which constructed Meta’s fame in open AI circles. After Llama 4’s underwhelming reception earlier this 12 months, Meta seems to have determined the following chapter must be written in a different way.

The corporate says it hopes to open-source future variations of Muse, however for now the code stays inside Meta. The tech large’s inventory climbed almost 9% on Wednesday following the announcement, and completed the buying and selling day up 6.5% to a value of $612.42.

“Considering mode” makes use of parallel agent orchestration to push the mannequin’s ceiling increased. In that configuration, Muse Spark hit 58% on Humanity’s Final Examination and 38% on FrontierScience Analysis—territory that makes it aggressive with essentially the most succesful variations of Gemini and GPT, moderately than their customary releases.

Meta can also be rolling out a buying assistant that compares merchandise and hyperlinks on to purchases, and plans to convey Muse Spark to Fb, Instagram, and WhatsApp within the coming weeks—following the identical script applied since Llama 3, placing it in entrance of greater than 3.5 billion customers. A non-public API preview is opening to pick out builders.

The mannequin was in-built 9 months, internally codenamed Avocado, with Meta claiming that its new pretraining stack can attain the identical functionality degree as Llama 4 Maverick utilizing over 10 occasions much less compute.

Muse Spark is described internally as a “small and quick” first step within the Muse household. A extra succesful model is already in improvement.