Grok Gets Glasses to Understand Your Visuals
X’s AI chatbot is now equipped to explain those memes, even if it lacks brevity.
X (formerly Twitter) has rolled out a new feature for its Premium subscribers that lets the Grok AI assistant describe images as well as create them. The enhancement comes from xAI, Elon Musk’s AI company, and uses the Grok-2 model, which powers the chatbot alongside its Flux-based image generation.
With this update, Grok matches the capabilities of competitors like ChatGPT and Gemini. Premium users can test this feature right now by clicking on an image post and asking Grok to analyze or describe the visual content.
Alongside this, xAI presented a new benchmark called RealWorldQA, designed to evaluate a model’s ability to describe real-world images, including the spatial relationships between objects. The company claims that Grok performs as well as or better than its competitors on this benchmark, even while the feature is still being refined.
See and Grok
As demonstrated in a screenshot, Grok can dissect intricate images and clarify their contents. It can even interpret the humor in a joke, although, as is often the case, explaining a joke tends to diminish its comedic impact. This development signals that xAI continues to enhance Grok, particularly its multimodal capabilities. It could pave the way for Grok to eventually analyze audio and video content the same way it now analyzes images.
One aspect not addressed is how Grok’s visual analysis will handle the copyright issues surrounding the AI’s image creation. Users have encountered challenges when creating images of copyrighted characters, such as Mario, especially after Nintendo’s copyright enforcement actions. It will be intriguing to see whether Grok describes AI-generated images of copyrighted characters in specific terms or opts for more generic descriptions.
Because Musk owns xAI, the feature could plausibly be integrated into his other ventures. For instance, Tesla’s semi-autonomous vehicles could greatly benefit from the ability to identify people and objects in their vicinity, including their spatial arrangements. The same goes for Tesla’s long-awaited humanoid robots, which have been under development for several years.