Microsoft Releases New, High-Performance Phi-3.5 Models, Beating Google, OpenAI, and Others


Microsoft has no intention of staking its AI success solely on its partnership with OpenAI.

Quite the opposite. The company, often referred to as Redmond after its Washington state headquarters, came out swinging today, releasing three new models in its expanding Phi series of AI language and multimodal models.

The three new Phi 3.5 models are the 3.82-billion-parameter Phi-3.5-mini-instruct, the 41.9-billion-parameter Phi-3.5-MoE-instruct, and the 4.15-billion-parameter Phi-3.5-vision-instruct, designed for basic/fast reasoning, more powerful reasoning, and vision tasks (image and video analysis), respectively.

All three models are available for developers to download, use, and fine-tune on Hugging Face under a Microsoft-branded MIT License that allows commercial use and modification without restriction.

Impressively, all three models boast near-state-of-the-art performance on a number of third-party benchmarks, outperforming offerings from other AI providers including Google’s Gemini 1.5 Flash and Meta’s Llama 3.1, and in some cases even OpenAI’s GPT-4o.

Such achievements, combined with the permissive open license, have earned Microsoft praise on the social network X.

Let’s take a quick look at each of the new models, based on the release notes published on Hugging Face.

Phi-3.5 Mini Instruct: Optimized for compute-constrained environments

The Phi-3.5 Mini Instruct model is a lightweight AI model with 3.8 billion parameters, designed to follow instructions and supporting a 128k token context length.

This model is ideal for scenarios that demand strong reasoning in memory- or compute-constrained environments, including tasks such as code generation, mathematical problem solving, and logic-based reasoning.

Despite its compact size, the Phi-3.5 Mini Instruct delivers competitive performance on multilingual and multi-turn conversational tasks, a significant improvement over its predecessors.

On many benchmarks it performs close to state-of-the-art, and on the RepoQA test, which measures long-context code understanding, it outperforms other models of comparable size, such as Llama-3.1-8B-instruct and Mistral-7B-instruct.
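For developers who want to try the smallest model, here is a minimal sketch using the Hugging Face transformers library. The model ID matches the model’s Hugging Face listing, but the prompt and generation settings are illustrative, and exact API details may vary by transformers version.

```python
# Minimal sketch: loading and querying Phi-3.5-mini-instruct from Hugging Face.
# Assumptions: a recent transformers version and enough GPU/CPU memory;
# the generation settings here are illustrative, not Microsoft's defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3.5-mini-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit modest hardware
    device_map="auto",
    trust_remote_code=True,      # Phi models ship custom code on the Hub
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."},
]
output = pipe(messages, max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])
```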

Phi-3.5 MoE: Microsoft’s “Mixture of Experts”

The Phi-3.5 MoE (Mixture of Experts) model appears to be the company’s first in this model class, combining multiple model types, each specializing in different tasks.

The model’s architecture comprises 42 billion parameters in total and supports a 128k token context length, providing scalable AI performance for demanding applications. However, only 6.6 billion of those parameters are active at any one time, according to the Hugging Face documentation.
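To see why only a fraction of the parameters is active at once, here is a toy sketch of sparse mixture-of-experts routing. This is not Microsoft’s implementation, and the expert count and top-k value below are illustrative rather than Phi-3.5 MoE’s actual configuration: each token is routed through only its top-k experts, so most of the layer’s parameters sit idle on any given forward pass.

```python
# Toy sketch of sparse mixture-of-experts routing (illustrative sizes,
# not Phi-3.5 MoE's actual configuration): each token activates only
# top_k of n_experts expert networks, so active parameters << total.
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        weights, chosen = torch.topk(self.router(x), self.top_k, dim=-1)
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```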

Designed to excel at a range of reasoning tasks, Phi-3.5 MoE delivers strong performance in coding, math, and multilingual understanding, often outperforming larger models on specific benchmarks, including RepoQA.

Impressively, it even beats GPT-4o mini on the five-shot MMLU (Massive Multitask Language Understanding) benchmark, which covers subjects such as STEM, the humanities, and the social sciences at varying levels of difficulty.

The MoE model’s unique architecture allows it to maintain performance while handling complex AI tasks across multiple languages.

Phi-3.5 Vision Instruct: Advanced Multimodal Reasoning

Rounding out the trio is the Phi-3.5 Vision Instruct, which integrates text and image processing capabilities.

This multimodal model is particularly suited to tasks such as general image understanding, optical character recognition, chart and table comprehension, and video summarization.

Like the other models in the Phi-3.5 series, Vision Instruct supports a 128k token context length, enabling it to handle complex, multi-frame visual tasks.

Microsoft emphasizes that the model was trained on a combination of synthetic and filtered publicly available datasets, with a focus on high-quality, reasoning-dense data.
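For a sense of what using the vision model looks like in practice, here is a minimal sketch, again via transformers. The model ID matches the Hugging Face listing; the image path is a hypothetical local file, and the numbered image-placeholder prompt format follows the convention the Phi vision model cards describe, so exact details may differ by version.

```python
# Minimal sketch of querying Phi-3.5-vision-instruct about an image.
# Assumptions: the <|image_1|> placeholder convention matches the model
# card, and "chart.png" is a hypothetical local file.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # custom image-processing code ships on the Hub
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("chart.png")  # hypothetical input image
prompt = "<|user|>\n<|image_1|>\nSummarize this chart.<|end|>\n<|assistant|>\n"

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=200)
# Strip the prompt tokens before decoding the model's answer
answer = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```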

Training the new Phi trio

The Phi-3.5 Mini Instruct model was trained on 3.4 trillion tokens using 512 H100-80G GPUs in 10 days, while the Vision Instruct model was trained on 500 billion tokens using 256 A100-80G GPUs in 6 days.

The Phi-3.5 MoE model, with its mixture-of-experts architecture, was trained on 4.9 trillion tokens using 512 H100-80G GPUs over 23 days.

Open source software under the MIT license

All three Phi-3.5 models are available under the MIT License, reflecting Microsoft’s commitment to supporting the open source community.

This license allows developers to freely use, modify, mix, publish, distribute, sublicense, and sell copies of the software.

The license also includes a disclaimer that the software is provided “as is,” without warranty of any kind; Microsoft and other copyright holders are not liable for any claims, damages, or other liabilities that may arise from use of the software.

The launch of the Phi-3.5 series by Microsoft represents a significant step forward in the development of multilingual and multimodal AI.

By making these models open source, Microsoft enables developers to integrate cutting-edge AI capabilities into their applications, driving innovation in both commercial and research settings.
