Cohere adds vision to its RAG search capabilities

Cohere has added multimodal embedding to its search model, allowing users to include images in RAG-style enterprise search.

Embed 3, which launched last year, is a family of embedding models that transform data into numerical representations. Embeddings have become crucial to retrieval-augmented generation (RAG): enterprises embed their documents, and the model compares those embeddings against a query to retrieve the information needed for the response.
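To make that retrieval step concrete, here is a minimal sketch of embedding-based search. The `embed` function is a hypothetical stand-in for a real model call, and the documents and query are illustrative:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for an embedding model call;
    a real system would call a model such as Embed 3 here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(1024)   # Embed 3 text vectors are 1024-dimensional
    return v / np.linalg.norm(v)    # unit-normalize for cosine similarity

# Embed the document corpus once, ahead of time.
docs = ["Q3 revenue report", "Product catalog", "Design style guide"]
doc_vectors = np.stack([embed(d) for d in docs])

# At query time, embed the question and rank documents by similarity;
# the top hit becomes context in the prompt sent to the generator.
query_vector = embed("What was revenue last quarter?")
scores = doc_vectors @ query_vector   # cosine similarity of unit vectors
best_doc = docs[int(np.argmax(scores))]
```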


The latest multimodal version can generate embeddings for both images and text. Cohere says Embed 3 is “currently the most comprehensive multimodal embedding model on the market.” Aidan Gomez, co-founder and CEO of Cohere, posted a chart on X showing the improvement in image-search performance with Embed 3.

“This advancement is enabling enterprises to unlock real value from the vast amounts of data stored in images,” Cohere said in a blog post. “Companies can now create systems that accurately and quickly query important multimodal assets such as complex reports, product catalogs and design files to increase workforce productivity.”
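In practice, producing the two kinds of embeddings looks roughly like the sketch below. It assumes the Cohere Python SDK, the embed-english-v3.0 model name, and the documented pattern of sending images as base64 data URIs; exact parameter and response shapes may vary by SDK version:

```python
import base64
import cohere

co = cohere.Client(api_key="YOUR_API_KEY")

# Text is embedded with an input_type hint telling the model the
# string is a document to be searched over (vs. a search query).
text_resp = co.embed(
    model="embed-english-v3.0",
    input_type="search_document",
    texts=["Q3 revenue grew 12% year over year."],
)

# Images are sent as base64 data URIs with input_type="image".
with open("revenue_chart.png", "rb") as f:
    data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()

image_resp = co.embed(
    model="embed-english-v3.0",
    input_type="image",
    embedding_types=["float"],
    images=[data_uri],
)

# Both calls return vectors in the same latent space, so they can be
# stored in one index and searched together (see the sketch below).
```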

Cohere said a stronger multimodal focus increases the amount of information companies can access through RAG search. Many organizations limit RAG searches to structured and unstructured text, even though their data libraries contain many other file formats. Customers can now benefit from charts, product images and design templates as well.

Performance improvements

Cohere said the encoders in Embed 3 “share a unified latent space,” allowing users to store both images and text in the same database. Some image-embedding approaches require maintaining separate databases for images and text. Cohere says its unified approach leads to better search across modalities.

According to the company, “Other models tend to group text and image data into separate areas, leading to poor search results that focus solely on text data. Embed 3, on the other hand, prioritizes the importance of information without favoring a particular modality.”
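A unified latent space means a single vector index can hold both modalities. The sketch below, using hypothetical placeholder vectors in place of real Embed 3 output, shows why that matters: one nearest-neighbor pass ranks text and image entries together, rather than querying two modality-specific databases and merging the results:

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_vector(dim: int = 1024) -> np.ndarray:
    """Hypothetical placeholder for a pre-computed Embed 3 embedding."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# One index holds both modalities, because the vectors share one space.
index = [
    ("text",  "Q3 revenue summary",    fake_vector()),
    ("image", "revenue_chart.png",     fake_vector()),
    ("text",  "Product catalog intro", fake_vector()),
    ("image", "catalog_page_4.png",    fake_vector()),
]

# Embedding of a query such as "show me last quarter's revenue chart".
query = fake_vector()

# A single similarity ranking covers text and image entries alike.
for modality, name, vec in sorted(index, key=lambda e: -float(e[2] @ query)):
    print(modality, name)
```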

Embed 3 is available in over 100 languages.

Cohere said the multimodal Embed 3 is now available on its platform and Amazon SageMaker.

Playing catch-up

Many consumers are quickly becoming familiar with multimodal search thanks to the introduction of image-based search on platforms like Google and chat interfaces like ChatGPT. As individual users grow accustomed to finding information based on photos, it is natural that they would want the same experience in their professional lives.

Enterprises have also begun to see this benefit as other embedding-model providers add multimodal options. Some providers, such as Google and OpenAI, offer some form of multimodal embedding, and open-source models can also handle embedding images and other modalities. The battle now is over which multimodal embedding model can operate with the speed, accuracy and security that enterprises demand.

Cohere, founded by some of the researchers behind the Transformer architecture (Gomez is one of the authors of the seminal paper “Attention Is All You Need”), has been pushing to gain recognition in the enterprise community. In September, it updated its APIs to allow customers to easily switch from competitor models to Cohere models. At the time, Cohere said the move was about aligning with industry standards, where customers often switch between models.
