Data Lakehouse Onehouse raises $35 million to capitalize on GenAI revolution

These days, you can’t go an hour without reading about generative artificial intelligence. Although we are still in the embryonic phase of what some have dubbed the “steam engine” of the fourth industrial revolution, there is little doubt that “GenAI” has the potential to transform almost every industry – from finance and healthcare to law and beyond.

Flashy user-facing apps may attract most of the fanfare, but the companies driving this revolution are currently reaping the greatest rewards. Just this month, chipmaker Nvidia briefly became the most valuable company in the world, a $3.3 trillion behemoth propelled largely by demand for AI computing power.

But beyond GPUs (graphics processing units), companies also need infrastructure to manage the flow of data – to store, process, train on, analyze, and ultimately unlock its full potential for AI.

One of the companies looking to capitalize on this is Onehouse, a three-year-old California startup founded by Vinoth Chandar, who created the open source Apache Hudi project while serving as a data architect at Uber. Hudi brings the advantages of data warehouses to data lakes, creating what has become known as a “data lakehouse,” capable of supporting features such as indexing and real-time querying of large datasets, whether structured, unstructured, or semi-structured.

For example, an e-commerce company that continuously collects customer data spanning orders, reviews, and related digital interactions will need a system that can ingest all of this data and keep it up to date, which it might use to recommend products based on a user’s activity. Hudi enables data to be pulled from various sources with minimal latency, with support for deleting, updating, and inserting records (“upserts”), which is essential for such real-time use cases.
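To make the upsert idea concrete, here is a minimal, hypothetical sketch using Hudi’s Spark datasource API; the table name, column names, and storage path are illustrative assumptions rather than details from Onehouse or this article.

```python
# Minimal, hypothetical sketch of a Hudi upsert via the Spark datasource API.
# Assumes a Spark session launched with the Apache Hudi Spark bundle on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-upsert-sketch")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

base_path = "s3://example-bucket/lake/customer_orders"  # hypothetical table location

# Incoming batch of new and changed order records from upstream sources.
updates = spark.createDataFrame(
    [
        ("o-1001", "c-42", "shipped", "2024-06-19", 1718800000),
        ("o-1002", "c-07", "created", "2024-06-19", 1718800050),
    ],
    ["order_id", "customer_id", "order_status", "order_date", "ts"],
)

hudi_options = {
    "hoodie.table.name": "customer_orders",
    "hoodie.datasource.write.recordkey.field": "order_id",        # unique record key
    "hoodie.datasource.write.partitionpath.field": "order_date",  # partition layout
    "hoodie.datasource.write.precombine.field": "ts",             # newest record wins
    "hoodie.datasource.write.operation": "upsert",                # update-or-insert
}

# Rows whose order_id already exists are updated in place; new order_ids are inserted.
updates.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```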

Onehouse builds on this by offering a fully managed data lake that helps companies deploy Hudi. Or, as Chandar puts it, it “accelerates the processing and standardization of data in open data formats” that can be used with just about all the major tools in the data analytics, AI, and machine learning ecosystems.

“Onehouse eliminates the need to build low-level data infrastructure, helping AI companies focus on their models,” Chandar told TechCrunch.

Onehouse announced today that it has raised $35 million in a Series B funding round as it launches two new products that improve Hudi’s performance and reduce cloud storage and compute costs.

Down in the lakehouse

Onehouse ad on a London billboard.
Image credits: Onehouse

Chandar created Hudi as an internal project at Uber in 2016, and the ride-hailing company donated the project to the Apache Software Foundation in 2019. Hudi has since been adopted by the likes of Amazon, Disney, and Walmart.

Chandar left Uber in 2019 and founded Onehouse after a brief stint at Confluent. The startup emerged from stealth in 2022 with $8 million in seed funding, followed shortly by a $25 million Series A round. Both rounds were co-led by Greylock Partners and Addition.

Those same VC firms have teamed up again for the Series B round, though this time it is led by David Sacks’ Craft Ventures.

“Data Lakehouse is quickly becoming the standard architecture for organizations looking to centralize their data to support new services such as real-time analytics, predictive machine learning and GenAI,” Craft Ventures partner Michael Robinson said in a statement.

Data warehouses and data lakes are similar in that both serve as a central repository for aggregating data. But they do so in different ways: a data warehouse is ideal for processing and querying historical, structured data, whereas data lakes have emerged as a more flexible alternative for storing vast amounts of raw data in its original format, with support for multiple data types and high-performance querying.

This makes data lakes well suited to AI and machine learning workloads, since it is cheaper to store raw, pre-transformed data, while also supporting more complex queries because the data can be kept in its original form.

The trade-off, however, is a whole new set of data management complexities that risk degrading data quality, given the wide variety of data types and formats involved. This is partly what Hudi aims to solve by bringing some of the key features of data warehouses to data lakes, such as ACID transactions to support data integrity and reliability, as well as improved metadata management for more diverse datasets.
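To illustrate how those warehouse-style features surface in practice, here is a hedged continuation of the earlier sketch showing a record-level delete on a Hudi table; deletes go through the same transactional write path, and all table, column, and path names remain hypothetical.

```python
# Hypothetical sketch of a record-level delete on the same Hudi table.
# The delete is committed as an atomic action on the table's timeline, so readers
# see either the previous snapshot or the new one, never a partially deleted state.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
base_path = "s3://example-bucket/lake/customer_orders"  # hypothetical table location

# Identify the rows to remove; the record key and partition path columns are required.
to_delete = (
    spark.read.format("hudi")
    .load(base_path)
    .filter("order_status = 'cancelled'")
    .select("order_id", "order_date", "ts")
)

delete_options = {
    "hoodie.table.name": "customer_orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.partitionpath.field": "order_date",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.write.operation": "delete",  # drop the matching records
}

to_delete.write.format("hudi").options(**delete_options).mode("append").save(base_path)
```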

Configuring data pipelines in Onehouse.
Image credits: Onehouse

Since it is an open source project, any company can deploy Hudi. A quick glance at the logos on the Onehouse website reveals some impressive users: AWS, Google, Tencent, Disney, Walmart, ByteDance, Uber, and Huawei, to name a few. But the fact that such high-profile companies use Hudi internally is also indicative of the effort and resources required to build it as part of an on-premises data lake setup.

“While Hudi provides rich functionality for ingesting, managing and transforming data, companies still need to integrate about half a dozen open source tools to achieve their goals of a production-quality data lake,” Chandar said.

That’s why Onehouse offers a fully managed cloud-native platform that ingests, transforms, and optimizes data in a fraction of the time.

“Users can launch an open data lake in under an hour, ensuring broad interoperability with all major cloud services, warehouses and data lake engines,” Chandar said.

The company was coy about naming its commercial customers, aside from the couple listed in its case studies, such as the Indian unicorn Apna.

“As a young company, we do not currently publicly share Onehouse’s entire commercial client list,” Chandar said.

With a fresh $35 million in the bank, Onehouse is now expanding its platform with a free tool called Onehouse LakeView, which provides observability into lakehouse functionality, offering insight into table statistics, trends, file sizes, timeline history, and more. This builds on the observability metrics already provided by the core Hudi project, adding extra context around workloads.

“Without LakeView, users have to spend a lot of time interpreting metrics and deeply understanding the entire stack to root-cause performance issues or inefficiencies in the pipeline configuration,” Chandar said. “LakeView automates this and provides email alerts on good or bad trends, flagging data management needs to improve query performance.”

Additionally, Onehouse is debuting a new product called Table Optimizer, a managed cloud service that optimizes existing tables to speed up data ingestion and transformation.

“Open and interoperable”

There are plenty of other big-name players in this space that can’t be ignored. Companies like Databricks and Snowflake are increasingly embracing the lakehouse paradigm: earlier this month, Databricks reportedly spent $1 billion to acquire a company called Tabular, with a view toward creating a common lakehouse standard.

Onehouse has certainly entered a hot space, but it hopes that its focus on an “open and interoperable” system that makes it easier to avoid vendor lock-in will help it stand the test of time. Essentially, it promises the ability to share a single copy of data from anywhere, including Databricks, Snowflake, Cloudera, and AWS, without having to build separate data silos on each of them.

As with Nvidia in the GPU space, there is no ignoring the opportunities that await any company in the data management space. Data is the cornerstone of AI development, and a lack of sufficient good-quality data is a major reason why many AI projects fail. But even when a business has the data, it still needs the infrastructure to ingest, transform, and standardize it to make it useful. That bodes well for Onehouse and others like it.

“When it comes to data management and processing, I believe that high-quality data delivered by a robust data infrastructure will play a key role in getting AI projects into real-world production applications, avoiding garbage-in, garbage-out data problems,” Chandar said. “We are starting to see this demand among data lakehouse users who are struggling to scale their data processing and querying to build these newer AI applications on enterprise-scale data.”
