OpenAI Launches GPT-4 Turbo With Enhanced Vision Capabilities for API and ChatGPT

GPT-4 Turbo Vision Requests Now Support JSON Mode and Function Calling

OpenAI made a significant announcement on Tuesday, unveiling a major enhancement to its latest artificial intelligence (AI) model, GPT-4 Turbo. The updated model now integrates computer vision capabilities, enabling it to process and analyze multimedia inputs such as images and videos and to answer questions about visual content. The company also showcased several AI-powered tools built on GPT-4 Turbo with Vision, including the AI coding assistant Devin and Healthify’s Snap feature. Just last week, OpenAI introduced a feature allowing users to edit DALL-E 3 generated images directly within ChatGPT.

The announcement was made via the official OpenAI Developers account on X (formerly known as Twitter), which said in a post, “GPT-4 Turbo with Vision is now generally available in the API. Vision requests can now also use JSON mode and function calling.” OpenAI’s main X account later added that the feature is available in the API and is gradually being rolled out in ChatGPT.
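
For developers, such a request might look like the following minimal sketch, written against the official OpenAI Python SDK; the model name, prompt, and image URL are illustrative assumptions rather than details from OpenAI’s post.

```python
# Minimal sketch of a GPT-4 Turbo vision request using JSON mode.
# Assumes the official OpenAI Python SDK (`pip install openai`) and an
# OPENAI_API_KEY environment variable; the model name, prompt, and image
# URL below are illustrative placeholders, not taken from the announcement.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-turbo",
    response_format={"type": "json_object"},  # JSON mode
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    # JSON mode requires the prompt itself to mention JSON.
                    "text": "Describe this image as JSON with the keys "
                            "'subject', 'colors' and 'setting'.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)  # a JSON string
```

Parsing the returned string with json.loads then gives an application a predictable structure to work with, instead of free-form prose about the image.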

Essentially, GPT-4 Turbo with Vision builds upon the foundation of the GPT-4 model, incorporating the higher token outputs introduced in the Turbo model. Now, with enhanced computer vision capabilities, the model is equipped to analyze multimedia files more effectively. These vision capabilities can be harnessed in various ways. For instance, an end user could upload an image of the Taj Mahal to ChatGPT and inquire about the materials used in its construction. Furthermore, developers have the opportunity to further refine and tailor this capability within their tools for specific use cases.
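
Taking the developer route a step further, the announced function-calling support can be paired with an image input so the model returns structured arguments instead of free text. The sketch below shows one hypothetical way to do this with the OpenAI Python SDK; the log_landmark tool and the image URL are invented purely for illustration.

```python
# Sketch of a vision request combined with function calling. The
# log_landmark tool is hypothetical, defined here only to illustrate how
# a developer might get structured output about an uploaded image.
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "log_landmark",
            "description": "Record a landmark identified in a photo.",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "primary_material": {"type": "string"},
                },
                "required": ["name", "primary_material"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4-turbo",
    tools=tools,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What building is this, and what is it made of?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/taj-mahal.jpg"}},
            ],
        }
    ],
)

# If the model chose to call the tool, its arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```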

Similarly, the Indian calorie tracking and nutrition feedback platform Healthify has a feature called Snap, where users can click a picture of a food item or a dish and the platform estimates the calories in it. With GPT-4 Turbo with Vision’s capabilities, it can now also recommend ways to burn the extra calories or to reduce the calorie count of the meal.