OpenAI president Greg Brockman posted from his X account what appears to be the first public image generated using the company’s latest GPT-4o model.
As you can see in the image below, it’s quite convincingly photorealistic: it shows a person in a black T-shirt with the OpenAI logo writing in chalk on a blackboard that reads, “Transfer between modalities. Suppose we directly model P (text, pixels, audio) with one large autoregressive transformer. What are the benefits and disadvantages?”
The latest GPT-4o model, which debuted on Monday, improves on the previous family of GPT-4 models (GPT-4, GPT-4 Vision and GPT-4 Turbo) by being faster, cheaper and better at retaining information from non-text inputs such as audio and images.
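For developers, GPT-4o is already exposed through OpenAI’s chat completions API under the model name “gpt-4o.” Below is a minimal sketch, assuming the official Python SDK, of a request that mixes text with an image input; the image URL is a placeholder:

```python
# A minimal sketch using OpenAI's Python SDK; the image URL below is a
# placeholder and exact parameters may differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe what is written on the blackboard."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/blackboard.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```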
These gains stem from a different approach than OpenAI took with its previous GPT-4 LLMs. While those stitched together multiple separate models and converted media such as audio and images into text and back again, GPT-4o was trained on multimodal tokens from the outset, allowing it to analyze and interpret images and audio directly, without first converting them to text.
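To make that architectural difference concrete, here is a deliberately simplified toy sketch; every function is a stub invented for illustration, and none corresponds to OpenAI’s actual components:

```python
# Toy illustration only: stubs stand in for real models, and nothing here
# corresponds to OpenAI's actual code. Only the data flow matters.

def speech_to_text(audio: bytes) -> str:
    """Stub ASR model: tone, emphasis and speaker identity are discarded."""
    return "transcribed words only"

def text_llm(prompt: str) -> str:
    """Stub text-only LLM."""
    return f"reply to: {prompt}"

def text_to_speech(text: str) -> bytes:
    """Stub TTS model: synthesizes a generic voice from plain text."""
    return text.encode()

def pipeline_respond(audio: bytes) -> bytes:
    # Pre-GPT-4o pipeline: three models chained through lossy text hops.
    return text_to_speech(text_llm(speech_to_text(audio)))

def tokenize(data: bytes, modality: str) -> list[tuple[str, int]]:
    """Stub tokenizer mapping any modality into one shared token space."""
    return [(modality, byte) for byte in data[:4]]

def multimodal_transformer(tokens: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Stub for a single autoregressive model over interleaved tokens.

    Its output tokens can belong to any modality (text, image patches
    or audio), so no intermediate text conversion is needed.
    """
    return tokens + [("audio", 0)]

def native_respond(audio: bytes) -> list[tuple[str, int]]:
    # GPT-4o-style: audio in, tokens out, with nonverbal signal preserved.
    return multimodal_transformer(tokenize(audio, "audio"))
```

The practical upshot is that signal a transcript would discard, such as laughter or tone of voice, stays available to the model end to end.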
Judging by the image above, the new approach is a noticeable improvement over DALL-E 3, OpenAI’s previous image generation model, which debuted in September 2023. I ran a similar prompt through DALL-E 3 in ChatGPT, and here is the result.
As you can see, the image Brockman shared, created with GPT-4o, is a significant improvement in quality, photorealism and the accuracy of rendered text.
However, GPT-4o’s native image generation capabilities are not yet publicly available. As Brockman alluded to in his X post, “The team is working hard to share them with the world.”