
Even as the world witnesses the power struggle and mass resignation at OpenAI, Microsoft, the long-time backer of the AI major, is not slowing down its own AI efforts. Today, the research arm of the Satya Nadella-led company launched Orca 2, a pair of small language models that match or outperform language models five to ten times larger, including Meta’s Llama-2-Chat-70B, when tested on complex reasoning tasks in zero-shot settings.
The models come in two sizes, 7 billion and 13 billion parameters, and build on the work done on the original 13B Orca model, which demonstrated strong reasoning capabilities a few months ago by imitating the step-by-step reasoning traces of bigger, more capable models.
“With Orca 2, we continue to demonstrate that improved signals and training methods can boost smaller language models to achieve enhanced reasoning capabilities typically only found in much larger language models,” Microsoft researchers wrote in a joint blog post.
The company has opened up both new models for further research into developing and evaluating smaller models that can perform as well as larger ones. This work can give companies, particularly those with limited resources, a better option to address their specific use cases without investing too much in computing capacity.
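For teams that want to try the released checkpoints, a minimal inference sketch is shown below. It assumes the weights are pulled from Hugging Face under the repo id microsoft/Orca-2-7b and uses a ChatML-style prompt; both the repo id and the prompt template are assumptions for illustration, not details from the article.

```python
# Minimal sketch: zero-shot reasoning with an Orca 2 checkpoint via Hugging Face
# transformers. The repo id and prompt format below are assumptions, not
# confirmed details from this article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Orca-2-7b"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
    device_map="auto",
)

# Illustrative ChatML-style prompt: system message, user turn, assistant turn.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant that reasons carefully.<|im_end|>\n"
    "<|im_start|>user\nIf a train travels 60 miles in 45 minutes, "
    "what is its average speed in mph?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)

# Print only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```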
Teaching small models to reason
While large language models, such as GPT-4, have long impressed companies and individuals with their ability to reason and answer complex questions with explanations, their smaller counterparts have largely lacked that ability. Microsoft Research decided to address this gap by fine-tuning the base Llama 2 models on a highly tailored synthetic dataset.
However, instead of training the small models to replicate the behavior of more capable models (a commonly used technique known as imitation learning), the researchers trained the models to employ different solution strategies for different tasks at hand. The idea was that the strategy of a larger model may not work perfectly for a smaller one all the time. For example, GPT-4 can answer complex questions directly, but a smaller model, without that kind of capability, could benefit from breaking down the same task into a few steps.
“In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More importantly, our goal is to help the model learn to determine the most effective solution strategy for each task,” the researchers wrote in a paper published today. The training data for the project was obtained from a more capable teacher model in a way that teaches the student model to handle both aspects: how to use a reasoning strategy and when exactly to use it for a given task.
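One way to picture this setup is below: the teacher is prompted with a detailed, strategy-specific instruction, but the student's training example keeps only a generic system prompt, so the student has to learn which strategy fits the task from the task itself. This is an illustrative sketch, not Microsoft's actual data pipeline; the strategy names, generic prompt, and helper function are hypothetical.

```python
# Illustrative sketch of assembling student training examples from teacher
# outputs. Strategy names, the generic prompt, and the example are assumptions.

STRATEGY_PROMPTS = {
    "step_by_step": "Solve the task by reasoning step by step before answering.",
    "recall_then_generate": "First recall the relevant facts, then generate the answer.",
    "direct_answer": "Answer the question directly and concisely.",
}

# What the student sees at training time instead of the detailed instruction.
GENERIC_PROMPT = "You are a helpful assistant."


def build_student_example(task: str, strategy: str, teacher_answer: str) -> dict:
    """Package one training example.

    The teacher answered under the strategy-specific instruction, but that
    instruction is replaced with a generic system prompt here, so the student
    must infer *when* to apply each strategy from the task alone.
    """
    assert strategy in STRATEGY_PROMPTS
    return {
        "system": GENERIC_PROMPT,
        "user": task,
        "assistant": teacher_answer,  # contains the strategy's reasoning trace
    }


example = build_student_example(
    task="A shop sells pens at 3 for $2. How much do 12 pens cost?",
    strategy="step_by_step",
    teacher_answer="12 pens is 4 groups of 3 pens. 4 x $2 = $8. The answer is $8.",
)
```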
Orca 2 performs better than larger models
When tested on 15 diverse benchmarks (in zero-shot settings) covering aspects such as language understanding, common-sense reasoning, multi-step reasoning, math problem solving, reading comprehension, summarization and truthfulness, the Orca 2 models produced impressive results, largely matching or outperforming models five to ten times their size.
Averaged across all benchmarks, Orca 2 7B and 13B outperformed Llama-2-Chat-13B and 70B as well as WizardLM-13B and 70B. Only on the GSM8K benchmark, which consists of 8,500 high-quality grade school math problems, did WizardLM-70B perform convincingly better than the Orca and Llama models.

While the performance is good news for enterprise teams looking for a small, high-performing model for cost-effective business applications, it is important to note that these models can also inherit limitations common to other language models, as well as those of the base model they were fine-tuned on.
Microsoft added that the technique used to create the Orca 2 models can also be applied to other base models.
“While it has several limitations…, Orca 2’s potential for future advancements is evident, especially in improved reasoning, specialization, control and safety of smaller models. The use of carefully filtered synthetic data for post-training emerges as a key strategy in these improvements. As larger models continue to excel, our work with Orca 2 marks a significant step in diversifying the applications and deployment options of language models,” the research team wrote.
More small, high-performance models are emerging
With the release of the open source Orca 2 models and ongoing research in the space, it is safe to say that more high-performance small language models are likely to emerge in the near future.
Just a few weeks ago, China’s newly minted unicorn 01.AI, founded by veteran AI expert Kai-Fu Lee, took a major step in this area with the release of a 34-billion-parameter model that supports Chinese and English and outperforms the 70-billion-parameter Llama 2 and 180-billion-parameter Falcon models. The startup also offers a smaller, 6-billion-parameter option that performs respectably on widely used AI/ML benchmarks.
Mistral AI, the six-month-old Paris-based startup that made headlines with its unique Word Art logo and a record-setting $118 million seed round, also offers a 7-billion-parameter model that outperforms larger offerings, including Meta’s Llama 2 13B (one of the smaller models in Meta’s lineup).