Machine learning has been making incredible progress in recent years, with groundbreaking advancements in a wide range of areas. From models that can understand jokes and answer visual questions in multiple languages, to those that can generate images based on text descriptions, the possibilities seem endless. However, these innovations are primarily fueled by the availability of large datasets and advances in training models on this data. Despite some success in scaling robotics models, robotics has lagged behind other domains due to a shortage of comparable datasets.
Enter PaLM-E, the new generalist robotics model that aims to overcome these challenges by transferring knowledge from different visual and language domains to a robotics system. Unlike previous efforts that only relied on textual input, PaLM-E takes things up a notch by training the language model to directly ingest raw streams of sensor data from the robotic agent. The result is a powerful model that not only enables effective robot learning but is also a state-of-the-art general-purpose visual-language model, while maintaining excellent language-only task capabilities.
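The core idea of "directly ingesting raw sensor data" is that visual observations are encoded into vectors and injected into the language model's input sequence alongside ordinary text token embeddings. Here is a minimal sketch of that idea using NumPy; the encoder, projection, and all dimensions are made-up stand-ins, not PaLM-E's actual components:

```python
import numpy as np

# Hypothetical sizes -- not PaLM-E's real dimensions.
IMG_FEAT_DIM = 64    # output size of a stand-in vision encoder
LM_EMBED_DIM = 128   # token embedding size of a stand-in language model

rng = np.random.default_rng(0)

def encode_image(image: np.ndarray) -> np.ndarray:
    """Stand-in vision encoder: flatten pixels and project to a feature vector."""
    w = rng.standard_normal((image.size, IMG_FEAT_DIM)) / np.sqrt(image.size)
    return image.ravel() @ w

# A learned projection maps image features into the language model's embedding
# space, so image "tokens" can sit in the same sequence as text tokens.
projection = rng.standard_normal((IMG_FEAT_DIM, LM_EMBED_DIM)) / np.sqrt(IMG_FEAT_DIM)

def embed_text_tokens(n_tokens: int) -> np.ndarray:
    """Stand-in for looking up embeddings of text tokens."""
    return rng.standard_normal((n_tokens, LM_EMBED_DIM))

camera_frame = rng.standard_normal((8, 8, 3))           # fake sensor reading
image_token = encode_image(camera_frame) @ projection   # shape (LM_EMBED_DIM,)

# Interleave text and image: [4 text tokens] + [1 image token] + [3 text tokens]
sequence = np.vstack([embed_text_tokens(4), image_token[None, :], embed_text_tokens(3)])
print(sequence.shape)  # (8, 128)
```

The language model then processes this mixed sequence the same way it processes pure text, which is why language-only capabilities carry over.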
PaLM-E represents a significant leap forward in the field of robotics, as it not only provides a robust model for training robots but also opens up new possibilities for understanding the way that machines perceive and interact with the world. By combining language and vision with the physicality of robotics, PaLM-E has the potential to revolutionize how we think about and approach machine learning. As we continue to push the boundaries of what is possible with AI, models like PaLM-E will undoubtedly play a crucial role in shaping the future of technology.
In their demonstration, the Google team instructed a robot to fetch various items from around the kitchen, such as rice chips from a drawer. The robot was able to complete the task with ease, showcasing its ability to adapt to changes in its environment. To further prove its capabilities, one of the researchers even poked the robot mid-task, and it was able to pick up where it left off and finish the job.
To understand how PaLM-E works, imagine a robot equipped with ChatGPT and Midjourney. ChatGPT enables the robot to comprehend user commands, while its visual sensors gather information about its surroundings. PaLM-E works in the reverse direction of Midjourney, which generates images from user prompts: instead, the model generates text from images or videos of its surroundings. For example, if a user were to enter a prompt like “rice chips in a kitchen drawer,” Midjourney would generate an image from the text.
What Midjourney produces from the “rice chips in a kitchen drawer” text prompt
Considering that Midjourney can generate images from text prompts, it’s not difficult to imagine that the reverse could be accomplished. Google Lens, for example, already has a feature that can describe the contents of an image, similar to what would be required for this task.
Once these text descriptions have been generated, they are sent to a language model like ChatGPT, which is used to control the robot’s arm movement and other functions. Since PaLM-E operates in the real world where things are constantly changing, it works in a continuous loop, gathering visual information about its surroundings and sending it to its interpreters. The model then receives new commands based on this data, allowing it to adapt to its environment in real time.
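The continuous loop described above is essentially a sense-describe-plan-act cycle. The sketch below illustrates that control flow; every function here is a simplified stand-in (the scene states, commands, and helper names are invented for illustration), not PaLM-E's real interface:

```python
def get_camera_frame() -> dict:
    """Stand-in for reading the robot's visual sensors."""
    return {"drawer_open": False, "holding_object": False}

def describe_scene(frame: dict) -> str:
    """Stand-in for the vision-language step that turns pixels into text."""
    return "drawer open" if frame["drawer_open"] else "drawer closed, gripper empty"

def plan_next_command(goal: str, scene: str) -> str:
    """Stand-in for the language model choosing the next low-level action."""
    if "drawer closed" in scene:
        return "open_drawer"
    return "grasp_object"

def execute(command: str) -> None:
    """Stand-in for sending a command to the robot's actuators."""
    print(f"executing: {command}")

goal = "bring me the rice chips from the drawer"
for _ in range(3):  # the real loop would run until the goal is reached
    frame = get_camera_frame()          # sense
    scene = describe_scene(frame)       # describe
    command = plan_next_command(goal, scene)  # plan
    execute(command)                    # act
```

Because the scene is re-observed on every iteration, a disturbance mid-task (like a researcher poking the robot) simply shows up as a new scene description, and the next planned command accounts for it.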
Although PaLM-E is a sophisticated robotics model, it is still in the early stages of development. The demonstration videos are sped up, likely because the system is still slow. This can be compared to the early days of Boston Dynamics robots, which faced similar challenges. However, PaLM-E is expected to be more flexible than Boston Dynamics robots, as it can take on new tasks on the fly, whereas the latter are only semi-preprogrammed.
After watching this video, you may want to make your own robot replica in 3D. Here are some Blender material libraries I would recommend.
The Materialiq library has over 370 adjustable PBR materials, ranging from wood, concrete, and walls to metals and more. These are mostly PBR, with texture resolutions from 2K to 8K to fit your production requirements.
If PBR materials are not for you, then you are going to love the Sanctus material library, a library of exclusively procedural materials in a wide range of styles and categories.
More about PaLM-E