OpenAI presents GPT-4: Language model now also understands images

GPT-4 has arrived. Its creative writing skills are said to be enhanced; it understands longer and more complex contexts – and images.



  • Silke Hahn

(This article is also available in German.)

GPT-4 is here: as Heise exclusively reported last week, the new version of the AI system has now been released. GPT-4 is no longer a pure language model; in addition to text input, it can also handle images. As the CTO of Microsoft Germany had indicated on 9 March 2023 at the hybrid kickoff event "AI in focus" in front of business customers, it is indeed a multimodal model that can handle different media – albeit with limitations: the OpenAI release makes no mention of text-to-video yet. According to the blog entry, GPT-4 can interpret more complex inputs than previously possible and parse text and images at the same time.

According to OpenAI, the model is supposed to be more creative than the previous GPT-3 series and more geared towards collaboration. It is said to process visual as well as textual input – however, it apparently responds only in text form, not with images. The text range has also been extended: GPT-4 can process and generate texts of up to 25,000 words, the announcement states. Known problems from ChatGPT remain unsolved, however: the model still tends to confabulate and does not always answer factually.

"What can I make with these ingredients?" – as an answer, GPT-4 suggests possible dishes that can be made from eggs, flour, butter, and milk. A combined text-and-image prompt serves as input; the answer (output) comes in text form.

(Bild: OpenAI)
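OpenAI's announcement does not spell out the request format for such combined text-and-image prompts. The sketch below assumes a chat-style message payload in which the user content is a list of parts; the field names (`"type"`, `"image_url"`) and the image URL are illustrative assumptions, not confirmed by the release.

```python
# Sketch of a combined text-and-image prompt, assuming a chat-style
# message format in which the user content is a list of parts.
# The part field names ("type", "image_url") are assumptions here,
# not confirmed by the announcement.

def build_multimodal_prompt(question: str, image_url: str) -> list[dict]:
    """Build a hypothetical messages payload mixing text and one image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

messages = build_multimodal_prompt(
    "What can I make with these ingredients?",
    "https://example.com/fridge.jpg",  # placeholder image URL
)
print(messages[0]["role"])          # user
print(len(messages[0]["content"]))  # 2
```

However the final API shapes the request, the key point from the announcement stands: the input may mix media, while the output remains text.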

According to OpenAI, the model can perform creative and technical writing tasks, compose song lyrics, write screenplays, and even imitate the style of its users. The ability to generate violent or otherwise harmful content has apparently not been fully eliminated either. GPT-4 is available in the paid offering ChatGPT Plus and as an API for developers to build their own applications and services, according to the website (there is a waiting list for API access).
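For developers coming off the waiting list, a call could look like the following minimal sketch. It assumes the existing `openai` Python package and its chat-completion endpoint; the model name `"gpt-4"` follows the announcement, while the prompt and parameters are placeholders. The network call only fires if an API key is configured.

```python
import os

# Minimal sketch of a GPT-4 API call, assuming the existing `openai`
# Python package and its ChatCompletion endpoint. The model name
# "gpt-4" follows the announcement; everything else is a placeholder.

def build_request(prompt: str) -> dict:
    """Assemble the keyword arguments for a chat-completion request."""
    return {
        "model": "gpt-4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 200,
    }

request = build_request("Summarize the GPT-4 announcement in one sentence.")

if os.environ.get("OPENAI_API_KEY"):  # only call out if a key is configured
    import openai
    openai.api_key = os.environ["OPENAI_API_KEY"]
    response = openai.ChatCompletion.create(**request)
    print(response["choices"][0]["message"]["content"])
```

Separating the request construction from the actual call keeps the sketch runnable without credentials; the response handling follows the chat-completion response shape of the existing API.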

Sam Altman, the CEO of OpenAI, said that the now-released version of GPT-4 differs only very slightly from GPT-3.5 in casual conversation. GPT-3.5 is familiar to most users as the model behind ChatGPT's chat interface. For over a year the AI scene had speculated about GPT-4's architecture, and Altman himself had tempered expectations in an interview with StrictlyVC in January 2023: after the hype, the public would inevitably be disappointed. GPT-4 is not yet an AGI – that is, not a general artificial intelligence at human level.

In OpenAI's internal tests, GPT-4 is said to be significantly less likely than its predecessors to generate unwanted content (a reduction of 82 per cent, according to OpenAI) and to answer factually 40 per cent more often than GPT-3.5, the well-known version behind ChatGPT. It apparently outperformed ChatGPT consistently in common benchmark tests: in a simulated bar examination (a final law exam), for example, GPT-4 is said to score in the top rather than the bottom ten per cent of test takers.

The OpenAI team trained GPT-4 on "Azure AI supercomputers", as they write on their blog. According to the announcement, GPT-4 underwent six months of safety training and is said to have been fine-tuned for desired behaviour through reinforcement learning from human feedback (RLHF). A technical research report is available on the OpenAI website. According to it, the model's architecture is the same as its predecessors': a pre-trained transformer model that predicts the next words according to statistical probability and thus generates its output. The model is also said to continue learning while in use. More about the research behind the model can be found in a separate blog post by the research team.
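The decoding principle described in the report – predicting the next word by statistical probability – can be illustrated with a deliberately tiny toy model. The bigram counter below is of course nothing like a transformer; it only shows, under that simplification, what "most likely next word" means.

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction by statistical probability:
# a bigram model counts which word follows which in a tiny corpus and
# always emits the most frequent successor. GPT-4 uses a large
# transformer network instead of counts, but the decoding idea –
# pick a probable continuation, word by word – is the same.

corpus = "the model predicts the next word and the next word again".split()

successors = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    successors[current][following] += 1

def predict_next(word: str) -> str:
    """Return the statistically most likely next word."""
    return successors[word].most_common(1)[0][0]

print(predict_next("the"))  # "next" follows "the" twice, "model" only once
```

In the real model the probabilities come from billions of learned parameters rather than a frequency table, which is what makes the generated continuations fluent instead of merely repetitive.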

GPT-4 is said to outperform existing language models in most NLP tasks and to be at least on par with "the vast majority of known SOTA systems". SOTA stands for state of the art, i.e. the most powerful AI systems currently available, including those from other vendors. In the course of the release, OpenAI also disclosed some pilot customers already using GPT-4: the government of Iceland (to preserve its own language, as the blog entry puts it), the language-learning app Duolingo, the payment provider Stripe, and the asset-management arm of the major bank Morgan Stanley.


Alongside the announcement, Microsoft confirmed that the new Bing already uses GPT-4. The assumption had been circulating in the AI scene for a while, as Microsoft had kept very quiet about the model version in use. Microsoft had recently had to restrict its AI-assisted search to a limited number of queries per IP address per day to keep the chatbot from going off the rails. In that sense there are already first user experiences of the new model's increased creativity, which OpenAI highlights – with Bing, it manifested itself primarily as heightened "emotionality" in longer conversations and an increased use of emojis.

In their technical report, the OpenAI team also warns that GPT-4 poses "new risks due to its increased capabilities" – exactly which ones, and how OpenAI plans to mitigate them, the conclusion does not say. According to OpenAI, there is still a lot to be done, and GPT-4 is a significant step on the way to broadly usable and safe AI systems. Further information can be found in OpenAI's release announcement.

More on this will follow here shortly.