Meta made several major announcements this week for robotics and embodied artificial intelligence systems. These include releasing benchmarks and artifacts for better understanding and interacting with the physical world. Sparsh, Digit 360 and Digit Plexus, three research artifacts released by Meta, focus on touch perception, robot dexterity and human-robot interaction. Meta is also releasing PARTNR, a new benchmark for evaluating planning and reasoning in human-robot collaboration.
The release comes as advances in foundation models have renewed interest in robotics, and artificial intelligence companies are gradually expanding their race from the digital world to the physical world.
There is renewed hope in the industry that, with the help of foundation models such as large language models (LLMs) and vision-language models (VLMs), robots will be able to perform more complex tasks that require reasoning and planning.
Tactile perception
Sparsh, developed in collaboration with the University of Washington and Carnegie Mellon University, is a family of encoder models for vision-based tactile sensing. It is intended to provide robots with touch perception capabilities. Touch perception is crucial in robotics tasks, such as determining how much pressure can be applied to an object to avoid damaging it.
The classic approach to incorporating vision-based touch sensors into robotic tasks is to use labeled data to train custom models that can predict useful states. This approach does not generalize across different sensors and tasks.
Meta describes Sparsh as a general-purpose model that can be applied to different types of vision-based tactile sensors and to various tasks. To overcome the challenges of previous generations of touch perception models, the researchers trained the Sparsh models through self-supervised learning (SSL), which eliminates the need for labeled data. The model was trained on more than 460,000 tactile images consolidated from various datasets. According to the researchers’ experiments, Sparsh achieves an average 95.1% improvement over task- and sensor-specific end-to-end models under a limited labeled data budget. The researchers have created different versions of Sparsh based on different architectures, including Meta’s I-JEPA and DINO models.
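The idea of reusing a pretrained encoder under a small label budget can be illustrated with a toy sketch. This is not Sparsh or its API: the "encoder" below is a fixed average-pooling function standing in for a frozen pretrained model, the "touch images" are synthetic 16-value vectors, and the names `frozen_encoder`, `fit_centroids` and `run_demo` are hypothetical. Only the small task head (one centroid per class) is fitted, using three labeled samples per class.

```python
import random

def frozen_encoder(image):
    """Stand-in for a frozen pretrained encoder (assumption: simple pooling).
    Maps a 16-value 'touch image' to a 2-value embedding; never updated."""
    half = len(image) // 2
    return [sum(image[:half]) / half, sum(image[half:]) / (len(image) - half)]

def fit_centroids(embeddings, labels):
    """Fit the task head: the mean embedding (centroid) of each class."""
    sums, counts = {}, {}
    for emb, lab in zip(embeddings, labels):
        acc = sums.setdefault(lab, [0.0] * len(emb))
        for i, value in enumerate(emb):
            acc[i] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict(embedding, centroids):
    """Return the label of the nearest centroid (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: sq_dist(embedding, centroids[lab]))

def make_sample(rng, base):
    """Synthetic touch image: a base pressure level plus small uniform noise."""
    return [base + rng.uniform(-0.05, 0.05) for _ in range(16)]

def run_demo():
    rng = random.Random(0)
    levels = {"light": 0.2, "firm": 0.8}  # hypothetical pressure classes
    # Limited label budget: only three labeled samples per class.
    train = [(make_sample(rng, base), lab) for lab, base in levels.items() for _ in range(3)]
    centroids = fit_centroids([frozen_encoder(x) for x, _ in train], [y for _, y in train])
    # Evaluate the head on fresh samples; the encoder stays untouched.
    test = [(make_sample(rng, base), lab) for lab, base in levels.items() for _ in range(50)]
    correct = sum(predict(frozen_encoder(x), centroids) == y for x, y in test)
    return correct / len(test)

if __name__ == "__main__":
    print("accuracy:", run_demo())
```

The design point is that the expensive component is trained once without labels and shared, while each new sensor or task only needs a lightweight head fitted on a handful of labeled examples.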
Touch sensors
In addition to leveraging existing data, Meta is also releasing hardware to collect rich tactile information from the physical world. Digit 360 is a finger-shaped artificial tactile sensor with more than 18 sensing features. The sensor has over 8 million taxels for capturing omnidirectional and granular deformations on the surface of the fingertip. Digit 360 also captures several sensing modalities to provide a richer understanding of the environment and of interactions with objects.
Digit 360 also has on-device AI models to reduce reliance on cloud-based servers. This allows it to process information locally and respond to touch with minimal delay, similar to the reflex arc in humans and animals.
“In addition to enhancing robot dexterity, this breakthrough sensor has significant potential applications, from medicine and prosthetics to virtual reality and telepresence,” write the Meta researchers.
Meta is making the code and designs for Digit 360 publicly available to stimulate community-led research and innovation in touch perception. But as with its release of open-source models, it has a lot to gain from the potential adoption of its hardware and models. The researchers believe that the information captured by Digit 360 could help develop more realistic virtual environments, which could be important for Meta’s metaverse projects in the future.
Meta is also releasing Digit Plexus, a hardware and software platform aimed at facilitating the development of robotic applications. Digit Plexus can integrate various fingertip and skin tactile sensors onto a single robot hand, encode the tactile data collected from the sensors, and transmit it to a host computer over a single cable. Meta is releasing the code and design for Digit Plexus to enable researchers to build on the platform and advance robotic dexterity research.
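To make the "many sensors, one cable" idea concrete, here is a minimal sketch of multiplexing readings from several sensors into one byte stream and recovering them on the host. The frame layout (sensor id, payload length, timestamp) and the function names are assumptions invented for illustration; they do not describe the actual Digit Plexus protocol.

```python
import struct

# Hypothetical frame header, little-endian with no padding:
# sensor_id (uint8), payload length (uint16), timestamp in ms (uint32).
HEADER = struct.Struct("<BHI")

def encode_frame(sensor_id: int, timestamp_ms: int, payload: bytes) -> bytes:
    """Prefix one sensor reading with a header so frames can share a link."""
    return HEADER.pack(sensor_id, len(payload), timestamp_ms) + payload

def decode_stream(stream: bytes):
    """Split a concatenated byte stream back into (sensor_id, timestamp, payload)."""
    frames = []
    offset = 0
    while offset < len(stream):
        sensor_id, length, timestamp_ms = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        frames.append((sensor_id, timestamp_ms, stream[offset:offset + length]))
        offset += length
    return frames

if __name__ == "__main__":
    # Two fingertip sensors and one palm sensor share the same stream.
    stream = (
        encode_frame(1, 1000, b"\x10\x20\x30")
        + encode_frame(2, 1001, b"\x05")
        + encode_frame(7, 1002, b"")
    )
    for frame in decode_stream(stream):
        print(frame)
```

Length-prefixed framing is one common choice here because payload sizes can differ per sensor type; a real link would typically also need checksums and resynchronization, which this sketch omits.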
Meta will produce the Digit 360 in partnership with touch sensor manufacturer GelSight Inc. It will also partner with South Korean robotics company Wonik Robotics to develop a fully integrated robotic hand with touch sensors on the Digit Plexus platform.
Assessment of human-robot cooperation
Meta is also releasing Planning And Reasoning Tasks in humaN-Robot collaboration (PARTNR), a benchmark for evaluating the effectiveness of AI models when collaborating with humans on household tasks.
PARTNR is built on top of Habitat, Meta’s simulated environment. It includes 100,000 natural language tasks in 60 houses and involves more than 5,800 unique objects. The benchmark is designed to evaluate the performance of LLMs and VLMs in following instructions from humans.
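As a rough illustration of how an instruction-following benchmark can be scored, the sketch below runs a planner over a list of natural language tasks and reports the fraction whose goal state is reached. Everything here is hypothetical: the `Task` structure, the dictionary world state, and the `toy_planner` are stand-ins, not the PARTNR or Habitat interfaces, and a real evaluation would plug in an LLM- or VLM-based planner acting in simulation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, str]  # object name -> location (toy world model)

@dataclass
class Task:
    instruction: str      # natural language instruction given by the human
    initial_state: State  # where objects start
    goal: State           # required object locations for success

def is_success(final_state: State, goal: State) -> bool:
    """A task succeeds when every goal object ends at its required location."""
    return all(final_state.get(obj) == loc for obj, loc in goal.items())

def evaluate(planner: Callable[[str, State], State], tasks: List[Task]) -> float:
    """Return the fraction of tasks the planner completes."""
    if not tasks:
        return 0.0
    successes = sum(
        is_success(planner(t.instruction, dict(t.initial_state)), t.goal)
        for t in tasks
    )
    return successes / len(tasks)

def toy_planner(instruction: str, state: State) -> State:
    """Stand-in planner: only understands 'move the X to the Y'."""
    words = instruction.lower().rstrip(".").split()
    if len(words) == 6 and words[0] == "move" and words[3] == "to":
        state[words[2]] = words[5]
    return state

TASKS = [
    Task("Move the cup to the sink", {"cup": "table"}, {"cup": "sink"}),
    Task("Put away the dishes after dinner", {"plate": "table"}, {"plate": "cabinet"}),
]

if __name__ == "__main__":
    print("success rate:", evaluate(toy_planner, TASKS))
```

The second task shows why such benchmarks are informative: a planner that handles only literal, single-step commands fails on instructions that require inferring the goal.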
Meta’s new benchmark joins a growing number of projects exploring the use of LLMs and VLMs in robotics and embodied AI settings. Over the past year, these models have shown great promise as planning and reasoning modules for robots in complex tasks. Startups such as Figure and Covariant have developed prototypes that use foundation models for planning. At the same time, AI labs are working to create better foundation models for robotics. An example is Google DeepMind’s RT-X project, which combines datasets from various robots to train a vision-language-action (VLA) model that generalizes to different morphologies and robotics tasks.