LanceDB, which counts Midjourney as a customer, is building databases for multimodal AI

Chang She, formerly VP of Engineering at Tubi and a Cloudera veteran, has years of experience building data tools and infrastructure. But when he began working in the AI space, he quickly ran into problems with traditional data infrastructure – problems that prevented him from taking AI models to production.

“Machine learning engineers and artificial intelligence researchers often have poor programming experience,” Chang said in an interview with TechCrunch. “Data infrastructure companies don’t really understand the machine learning data problem at a fundamental level.”

So Chang — one of the co-creators of Pandas, the wildly popular Python data science library — teamed up with software engineer Lei Xu to launch LanceDB.

LanceDB makes the eponymous open-source database software, which is designed to support multimodal artificial intelligence models – models that train on and generate images, videos, and more in addition to text. Backed by Y Combinator, LanceDB raised $8 million this month in a seed funding round led by CRV, Essence VC and Swift Ventures, bringing its total raised to $11 million.

“If multimodal AI is critical to the future success of your company, you want your very expensive AI team to focus on the model and connecting AI to business value,” Chang said. “Unfortunately, AI teams today spend most of their time dealing with the low-level details of data infrastructure. LanceDB provides the foundation AI teams need, giving them the freedom to focus on what really matters to enterprise value and bring AI products to market much faster than otherwise possible.”

LanceDB is essentially a vector database – a database containing a series of numbers (“vectors”) that encode the meaning of unstructured data (e.g. images, text, etc.).
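To make the idea concrete, here is a minimal sketch of what a vector search does, written in plain Python. It is not LanceDB's API: the item names, the three-dimensional vectors, and the `search` helper are all made up for illustration, and real systems use embeddings with hundreds or thousands of dimensions plus an index rather than a brute-force scan.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: each item is represented by a short list of numbers.
# In practice these would come from an embedding model, not be typed by hand.
items = {
    "photo of a cat": [0.9, 0.1, 0.0],
    "photo of a dog": [0.8, 0.3, 0.1],
    "quarterly sales report": [0.0, 0.2, 0.9],
}


def search(query, k=2):
    """Return the names of the k items whose vectors are closest to the query."""
    ranked = sorted(
        items,
        key=lambda name: cosine_similarity(query, items[name]),
        reverse=True,
    )
    return ranked[:k]


# A query vector that points in roughly the same direction as the animal photos.
results = search([1.0, 0.0, 0.0])
print(results)
```

The two animal photos rank above the sales report because their vectors point in nearly the same direction as the query, which is how "similar meaning" is approximated in this kind of search.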

As my colleague Paul Sawers recently wrote, vector databases are having a moment as the AI hype cycle peaks. This is because they are useful across a wide range of AI applications, from content recommendations on e-commerce and social media platforms to reducing hallucinations.

Competition in vector databases is fierce – see Qdrant, Vespa, Weaviate, Pinecone, and Chroma, to name a few vendors (not counting the Big Tech incumbents that dominate the market). So what makes LanceDB special? According to Chang, improved flexibility, performance and scalability.

First, Chang says, LanceDB – which is built on top of Apache Arrow – is powered by a custom data format, Lance Format, optimized for multimodal AI training and analytics. The Lance Format enables LanceDB to handle up to billions of vectors and petabytes of text, images, and videos, and lets engineers manage the various types of metadata associated with this data.

“Until now, there has been no system that combines training, exploration, search and processing of data at scale,” Chang said. “Lance Format allows AI researchers and engineers to have a single source of truth and achieve lightning-fast performance across the entire AI pipeline. It’s not just about vector storage.”

LanceDB makes money by selling fully managed versions of its open source software with additional features such as hardware acceleration and management controls, and business appears to be going well. The company’s client list includes text-to-image platform Midjourney, chatbot unicorn Character.ai, autonomous vehicle startup WeRide and Airtable.

Chang insisted that LanceDB’s recent VC backing wouldn’t distract him from the open source project, which he said currently sees around 600,000 downloads per month.

“We wanted to create something that would make it 10 times easier for AI teams to work with large-scale, multimodal data,” he said. “LanceDB offers – and will continue to offer – a very rich set of ecosystem integrations to minimize deployment efforts.”
