Fairgen “tweaks” survey results using synthetic data and responses generated by artificial intelligence

Surveys have been used since time immemorial to learn about populations, products and public opinion. And while methodologies may have changed over the millennia, one thing has remained constant: the need for people, lots of people.

But what if you can’t find enough people to build a large enough sample group and generate meaningful results? Or what if you could find enough people in principle, but budget constraints limit how many you can recruit and interview?

This is where Fairgen wants to help. The Israeli startup today launched a platform that uses “statistical AI” to generate synthetic data that it says is nearly as good as the real thing. The company is also announcing a fresh $5.5 million raise from Maverick Ventures Israel, The Creator Fund, Tal Ventures, Ignia and a handful of angel investors, bringing its total funding since inception to $8 million.

“Fake” data

Data may be the lifeblood of artificial intelligence, but it has also always been the cornerstone of market research. So when those two worlds collide, as they do in Fairgen’s world, the need for high-quality data becomes all the more pronounced.

Founded in Tel Aviv, Israel in 2021, Fairgen previously focused on combating bias in AI. At the end of 2022, however, the company pivoted to a new product, Fairboost, which is now exiting beta.

Fairboost promises to “boost” a smaller dataset by up to three times, enabling more granular insight into niches that might otherwise be too difficult or expensive to reach. To do this, the company trains a machine learning model on each dataset uploaded to the Fairgen platform, learning statistical patterns across the survey’s different segments.

The concept of “synthetic data” – data created artificially rather than drawn from real-world events – is not new. Its roots go back to the early days of computer science, when it was used to test software and algorithms and to simulate processes. But synthetic data as we understand it today has taken on a life of its own, especially with the advent of machine learning, where it is increasingly used to train models. Artificially generated data that contains no sensitive information can address both data scarcity and data privacy concerns.

Fairgen is the latest startup to bet on synthetic data, and its principal target is market research. It’s worth noting that Fairgen doesn’t generate data out of thin air, nor does it throw millions of historical surveys into an AI-powered crucible – market researchers still have to survey a small sample of their target market, and from there Fairgen learns patterns to expand the sample. The company says it can guarantee at least a 2x increase over the original sample, and on average it can achieve a 3x increase.

This way, Fairgen can determine that someone in a certain age range and/or income bracket is more likely to answer a question a certain way. Researchers can combine as many data points as they want to extrapolate from the original dataset. Essentially, it’s about generating what Fairgen co-founder and CEO Samuel Cohen calls “stronger, more robust segments of data with a lower margin of error.”

“The main takeaway was that people are becoming more diverse – brands need to adapt to that and understand their customer segments,” Cohen explained to TechCrunch. “The segments are very different – representatives of Generation Z think differently than older people. And to be able to have that understanding of the market at a segment level, it costs a lot of money, it takes a lot of time and operational resources. And then I realized that that was where the pain was. We knew synthetic data had a role to play here.”

The obvious criticism – one the company admits it has wrestled with – is that this all looks like a huge shortcut around going out into the field, interviewing real people and gathering real feedback.

Surely any underrepresented group should worry about its real voices being replaced by, well, fake voices?

“Every client we’ve talked to in the research space has huge blind spots – completely hard-to-reach audiences,” Fairgen chief development officer Fernando Zatz told TechCrunch. “They don’t actually sell projects because there aren’t enough respondents available, especially in an increasingly diverse world where there is a lot of market segmentation. Sometimes they cannot go to certain countries; they can’t provide specific demographics, so they actually lose out on projects because they can’t meet their quotas. They have minimal numbers [of respondents], and if they don’t hit that number, they aren’t selling the insights.”

Fairgen is not the only company using generative AI in market research. Last year, Qualtrics said it would invest $500 million over four years to bring generative AI to its platform, albeit with a substantive emphasis on qualitative research. Still, it’s further evidence that synthetic data is here to stay.

Validating the results, however, will play an important role in convincing people that this is the real deal and not just a cost-cutting measure that delivers suboptimal results. Fairgen does this by comparing “real” sample uplift with “synthetic” uplift: it takes a small subsample of the dataset, extrapolates from it, and puts the result side by side with the real thing.

“We do the exact same type of test for every client we sign up,” Cohen said.
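That validation protocol can be sketched as a holdout test. The snippet below is a minimal illustration, not Fairgen’s actual model: it uses a plain bootstrap resample as a stand-in for the synthetic booster, and checks how close the boosted small sample’s estimate lands to a held-out set of real respondents.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: a niche segment where 60% answer "yes".
real = rng.binomial(1, 0.6, size=900)

small = real[:45]     # the small real subsample a researcher could afford
holdout = real[45:]   # held-out real respondents, used only for validation

def boost(sample, factor=3):
    """Stand-in for a synthetic-data booster: a simple bootstrap resample
    to `factor` times the original size. Fairgen's product trains a
    statistical model per dataset; this is only a placeholder."""
    return rng.choice(sample, size=factor * len(sample), replace=True)

boosted = boost(small)

# The validation question: does the boosted estimate land close to the
# estimate from the real, held-out respondents?
gap = abs(boosted.mean() - holdout.mean())
print(f"boosted n={len(boosted)}, estimate gap={gap:.3f}")
```

A real validation would repeat this across hundreds of surveys and segments, as Cohen describes, and compare margins of error rather than a single point estimate.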

Statistically speaking

Cohen holds a master’s degree in statistics from the University of Oxford and a PhD in machine learning from UCL, part of which included a nine-month stint as a research scientist at Meta.

One of the company’s co-founders is its chairman, Benny Schnaider, who previously worked in enterprise software and has four exits under his belt: Ravello to Oracle for a reported $500 million in 2016; Qumranet to Red Hat for $107 million in 2008; P-Cube to Cisco for $200 million in 2004; and Pentacom to Cisco for $118 million in 2000.

And then there is Emmanuel Candès, professor of statistics and electrical engineering at Stanford University, who serves as Fairgen’s chief scientific advisor.

This business and mathematical pedigree is a major asset for a company trying to persuade the world that fake data can be nearly as good as real data, if applied appropriately. It also lets the company clearly articulate the thresholds and limitations of its technology: how large samples must be to achieve optimal gains.

According to Cohen, a survey ideally needs at least 300 real respondents, and based on that, Fairboost can boost the size of any segment that constitutes no more than 15% of the broader survey.

“Below 15%, we can guarantee an average 3x increase after validating it in hundreds of parallel tests,” Cohen said. “Statistically, an increase above 15% is less dramatic. The data already shows a good level of confidence, and our synthetic respondents can only potentially match it or provide a marginal increase. From a business point of view, there is also no problem above 15% – brands can already learn from these groups; they are only stuck at a niche level.”

The no-LLM factor

It’s worth noting that Fairgen doesn’t use large language models (LLMs), and its platform doesn’t generate ChatGPT-style “plain English” responses. The reason is that an LLM would draw on findings from countless other data sources beyond the study’s parameters, increasing the risk of introducing bias inconsistent with quantitative research.

Fairgen relies on statistical models and tabular data, and its training is based solely on the data contained in the uploaded dataset. This effectively lets market researchers generate new, synthetic respondents by extrapolating from adjacent segments of the survey.

“We don’t use any LLMs for a very simple reason, which is that if we were to train on lots of [other] surveys, it would just amount to misinformation,” Cohen said. “Because we might learn something from another study, and we don’t want that. It’s all about reliability.”

In terms of business model, Fairgen is sold as a SaaS: companies upload their surveys in a structured format (.CSV or .SAV) to Fairgen’s cloud platform. According to Cohen, it takes up to 20 minutes to train the model on the survey data provided, depending on the number of questions. The user then selects a “segment” (a subset of respondents with specific characteristics), e.g., “Gen Z working in industry

Fairgen is used by BVA and IFOP, a French survey and market research firm, both of which have already integrated the startup’s technology into their services. IFOP, roughly France’s equivalent of Gallup in the US, is using Fairgen for polling around the European elections, though Cohen believes it may also be used in the US elections later this year.

“IFOP is basically our stamp of approval because they have been around for about 100 years,” Cohen said. “They approved the technology and were our original design partner. We are also testing or already integrating with some of the largest market research companies in the world, which I am not allowed to talk about yet.”
