OpenAI Launches Experimental GPT-4o Long Output Model with 16X Token Capacity

Apparently, OpenAI may be facing a cash crunch, but that isn't stopping the leading generative AI company from continuing to release new models and updates.

Yesterday the company quietly published a page announcing a new large language model (LLM): GPT-4o Long Output, a variation of its flagship GPT-4o model from May with a significantly expanded output size: up to 64,000 output tokens instead of GPT-4o's initial 4,000, a 16x increase.

Tokens, as you may recall, refer to the numerical representations of concepts, grammatical structures, and combinations of letters and numbers that LLMs organize according to their semantic meaning.

For example, the word “Hello” is one token, but so is “hi.” You can see an interactive demonstration of tokens in action via OpenAI’s Tokenizer here. Machine learning researcher Simon Willison also has a great interactive token encoder/decoder.
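To make this concrete, here is a minimal tokenization sketch using OpenAI's open-source tiktoken library, assuming a recent version that includes the GPT-4o encoding; the specific token IDs printed depend on the encoding, not on this example:

```python
# Minimal tokenization sketch with OpenAI's tiktoken library.
import tiktoken

# Look up the encoding GPT-4o uses (o200k_base in recent tiktoken releases).
enc = tiktoken.encoding_for_model("gpt-4o")

tokens = enc.encode("Hello, hi!")
print(tokens)              # a short list of integer token IDs
print(enc.decode(tokens))  # round-trips back to "Hello, hi!"
print(len(tokens))         # how many tokens the string costs
```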

By offering a 16x increase in the number of output tokens with the new GPT-4o Long Output variant, OpenAI now gives users — or more precisely, third-party developers building on its application programming interface (API) — the ability to get much longer responses from the model, up to about 200 pages long.
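For developers with access, requesting the longer output should look like an ordinary chat completion call with a larger max_tokens budget. The sketch below uses the official OpenAI Python SDK; the model identifier "gpt-4o-64k-output-alpha" is an assumption for illustration, since access is currently limited to alpha testers:

```python
# Hypothetical sketch: asking the Long Output model for a very long response.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-64k-output-alpha",  # assumed alpha identifier, not confirmed
    max_tokens=64_000,                # reserve the full extended output budget
    messages=[
        {"role": "user", "content": "Refactor this codebase and explain every change in detail."},
    ],
)
print(response.choices[0].message.content)
```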

Why is OpenAI introducing a longer-output model?

OpenAI’s decision to introduce this extended output capability comes as a result of customer feedback indicating a need for longer output contexts.

An OpenAI spokesperson told VentureBeat: “We’ve heard feedback from our customers that they want more contextual output. We’re always testing new ways to best serve our customers’ needs.”

The alpha testing phase is expected to last several weeks, allowing OpenAI to gather data on how well the extended outputs meet user needs.

This extended capability is particularly helpful for applications requiring detailed and extensive outputs, such as code editing and writing refinement.

By offering longer outputs, the GPT-4o model can provide more comprehensive and nuanced responses, which could significantly benefit these kinds of use cases.

Distinguishing between context and output

At launch, GPT-4o offered a maximum context window of 128,000 tokens – the number of tokens the model can handle in a single interaction, including both input and output tokens.

For GPT-4o Long Output, that maximum context window remains at 128,000 tokens.

So how is OpenAI able to increase the output token count 16x, from 4,000 to 64,000, while keeping the overall context window at 128,000?

It all comes down to simple math: even though the original GPT-4o from May had a total context window of 128,000 tokens, its output per message was limited to 4,000 tokens.

Similarly, for the newer GPT-4o mini, the total context window is 128,000 tokens, but the maximum output is raised to 16,000 tokens.

This means that for GPT-4o, a user can provide up to 124,000 tokens as input and receive at most 4,000 tokens of output in a single interaction. They can also provide more tokens as input and receive fewer as output, adding up to 128,000 tokens in total.

For GPT-4o mini, the user can input up to 112,000 tokens to get a maximum of 16,000 tokens of output.

For GPT-4o Long Output, the total context window is still capped at 128,000 tokens. However, the user can now provide up to 64,000 tokens of input in exchange for up to 64,000 tokens of output — that is, if the user or a developer building an application on this model wants to prioritize longer LLM responses over longer inputs.
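The trade-off described above is just a shared budget. Here is a sketch of the arithmetic as this article lays it out (an illustration, not an official API check):

```python
# Sketch of the shared input/output token budget described above.
CONTEXT_WINDOW = 128_000  # total tokens per interaction, input plus output

def max_input_tokens(output_cap: int) -> int:
    """Input tokens left once the model's output cap is fully reserved."""
    return CONTEXT_WINDOW - output_cap

print(max_input_tokens(4_000))   # GPT-4o:             124,000
print(max_input_tokens(16_000))  # GPT-4o mini:        112,000
print(max_input_tokens(64_000))  # GPT-4o Long Output:  64,000
```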

In all cases, the user or developer must make a choice, or trade-off: do they want to sacrifice some input tokens in exchange for longer outputs while staying within 128,000 tokens in total? For users who want longer responses, GPT-4o Long Output now offers that option.

Aggressive, affordable pricing

Pricing for the new GPT-4o Long Output model is as follows:

  • $6 per 1 million input tokens
  • $18 per 1 million output tokens

Compare this to standard GPT-4o pricing of $5 per million input tokens and $15 per million output tokens, or even the new GPT-4o mini pricing of $0.15 per million input tokens and $0.60 per million output tokens, and you can see that the pricing is quite competitive, continuing OpenAI’s recent push to make powerful AI affordable and accessible to a wide range of user-developers.
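As a quick sanity check on those rates, a maximal Long Output call (64,000 tokens in, 64,000 tokens out) works out to about $1.54. The sketch below hard-codes the per-million-token prices quoted above; the dictionary keys are illustrative labels, not official model IDs:

```python
# Cost comparison at the per-million-token rates quoted in this article.
PRICES = {  # model label: (input $/1M tokens, output $/1M tokens)
    "gpt-4o":             (5.00, 15.00),
    "gpt-4o-long-output": (6.00, 18.00),
    "gpt-4o-mini":        (0.15,  0.60),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# A maximal Long Output call: 64,000 tokens in, 64,000 tokens out.
print(f"${cost('gpt-4o-long-output', 64_000, 64_000):.2f}")  # $1.54
```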

Currently, access to this experimental model is limited to a small group of trusted partners. The spokesperson added: “We are running alpha tests for a few weeks with a small number of trusted partners to see if the longer outputs help their use cases.”

Depending on the results of this testing phase, OpenAI may consider expanding access to a broader customer base.

Future prospects

Ongoing alpha testing will provide valuable insight into the practical applications and potential benefits of the extended-output model.

If the response from this first group of partners is positive, OpenAI may consider rolling the feature out more broadly, allowing a wider range of users to benefit from the improved output capabilities.

It is clear that with the GPT-4o Long Output model, OpenAI hopes to address an even wider range of customer needs and power applications requiring detailed responses.
