Matt Shumer, co-founder and CEO of OthersideAI (also known by the name of its flagship AI-assisted writing product, HyperWrite), broke nearly two days of silence after being accused of fraud when independent researchers failed to replicate the supposedly best-in-class performance of a new large language model (LLM) he released on Thursday, September 5.
On his X account, Shumer apologized, saying he had gotten ahead of himself, and added: “I know many of you are excited about the potential of this technology, but are skeptical right now.”
However, his new statement doesn’t fully explain why his Reflection 70B model, which he claimed was a variant of Meta’s Llama 3.1 trained using Glaive AI’s synthetic data generation platform, failed to perform as well as he originally claimed in subsequent independent tests. Shumer also didn’t explain exactly what went wrong. Here’s the timeline:
Thursday, September 5, 2024: Initial, exaggerated claims of Reflection 70B’s superior performance in benchmarks
If you’re just catching up: last week, Shumer released Reflection 70B on the open-source AI community Hugging Face, calling it “the world’s best open-source model” in a post on X and sharing a chart of what he said were state-of-the-art results on third-party benchmarks.
Shumer attributed the impressive results to a technique called “Reflection Tuning,” which allows the model to evaluate and refine its answers for correctness before delivering them to users.
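For readers unfamiliar with the idea, the sketch below illustrates the general “reflection” pattern of having a model critique and revise its own draft before answering. It is a minimal illustration only, not Shumer’s published training pipeline; the `generate()` function is a hypothetical stand-in for any LLM completion call.

```python
# Minimal illustrative sketch of a reflection-style answer loop.
# NOTE: generate() is a hypothetical placeholder for an LLM call;
# this is not Reflection 70B's actual training or inference code.

def generate(prompt: str) -> str:
    """Placeholder for a call to an LLM completion endpoint."""
    raise NotImplementedError("Wire this up to your model of choice.")

def reflective_answer(question: str, max_rounds: int = 2) -> str:
    # First pass: draft an answer, reasoning step by step.
    draft = generate(
        f"Think step by step, then answer.\nQuestion: {question}"
    )
    for _ in range(max_rounds):
        # Reflection pass: ask the model to critique its own draft.
        critique = generate(
            f"Question: {question}\nDraft answer: {draft}\n"
            "Check the draft for errors. Reply 'OK' if it is correct; "
            "otherwise explain the mistake."
        )
        if critique.strip().upper().startswith("OK"):
            break
        # Revision pass: regenerate the answer using the critique.
        draft = generate(
            f"Question: {question}\nPrevious draft: {draft}\n"
            f"Critique: {critique}\nProduce a corrected final answer."
        )
    return draft
```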
VentureBeat interviewed Shumer and took his benchmarks as presented, attributing them to him because we don’t have the time or resources to run our own independent benchmarks — and most of the model vendors we’ve covered so far have been honest.
Friday, September 6 – Monday, September 9: Third-party evaluations fail to reproduce Reflection 70B’s impressive results – Shumer accused of fraud
However, within days of the debut and over the weekend, independent evaluators and members of the open-source AI community on Reddit and Hacker News began to question the model’s performance and were unable to reproduce it on their own. Some even found responses and data suggesting the model was connected to, and perhaps just a thin “wrapper” around, Anthropic’s Claude 3.5 Sonnet model.
Criticism grew after Artificial Analysis, an independent AI evaluation organization, posted on X that its tests of Reflection 70B produced significantly lower results than HyperWrite initially claimed.
It also emerged that Shumer was an investor in Glaive, the AI startup whose synthetic data he used to train his model, a fact he did not disclose when he released Reflection 70B.
Shumer attributed the discrepancies to problems with the process of uploading the model to Hugging Face and promised last week to correct the model weights, but has not yet done so.
One X user, Shin Megami Boson, openly accused Shumer of “fraud in the AI research community” on Sunday, September 8. Shumer did not respond directly to the accusation.
After posting and reposting various X messages related to Reflection 70B, Shumer went silent on Sunday evening and did not reply to VentureBeat’s request for comment, nor did he post publicly on X, until this evening, Tuesday, September 10.
Additionally, AI researchers such as Nvidia’s Jim Fan noted that it is relatively easy to train even less capable models (with fewer parameters or less complexity) to score well on third-party benchmarks.
Tuesday, September 10: Shumer responds and apologizes — but doesn’t explain discrepancies
Shumer finally issued a statement on X tonight at 5:30 pm ET, apologizing and stating, in part:
His statement also pointed to a separate X post by Sahil Chaudhary, founder of Glaive AI, the platform that, according to Shumer’s earlier claims, was used to generate the synthetic data used to train Reflection 70B.
Interestingly, in his post Chaudhary stated that some of Reflection 70B’s responses, in which the model claims to be a variant of Anthropic’s Claude, remain a mystery to him. He also admitted that “the benchmark results I shared with Matt have not been repeatable so far.” Read his full post below:
However, Shumer’s and Chaudhary’s responses were not enough to appease skeptics and critics, including Yuchen Jin, co-founder and chief technology officer (CTO) of Hyperbolic Labs, a provider of open-access AI cloud services.
Jin wrote a long post on X detailing how much work he had put into hosting Reflection 70B on his company’s platform and fixing the alleged bugs, noting that the episode had taken an emotional toll given the time and energy his team invested over the weekend.
He also responded to Shumer’s statement in a reply on X, writing: “Hi Matt, we’ve spent a lot of time, energy, and GPUs hosting your model, and I’m sad you haven’t responded in the last 30 hours. I think you could be more transparent about what happened (especially why your private API has so much better performance).”
Shin Megami Boson, like many others, was unconvinced tonight by the way Shumer and Chaudhary described the events, framing the saga as a series of mysterious, still unexplained errors born of enthusiasm.
“As far as I can tell, either you’re lying, or Matt Shumer is, or both,” he wrote on X, then posed a series of questions. Similarly, members of the LocalLLaMA subreddit aren’t buying Shumer’s claims:
It remains to be seen whether Shumer and Chaudhary will be able to respond satisfactorily to their critics and skeptics, who include a growing number of members of the online generative AI community.