On Thursday, OpenAI announced the launch of GPT-4o mini, a new, smaller version of its latest GPT-4o AI language model that will replace GPT-3.5 Turbo in ChatGPT, reports CNBC and Bloomberg. It will be available today for free users and those with ChatGPT Plus or Team subscriptions and will come to ChatGPT Enterprise next week.
GPT-4o mini will reportedly be multimodal like its big brother (which launched in May), interpreting images and text and also being able to use DALL-E 3 to generate images.
OpenAI told Bloomberg that GPT-4o mini will be the company’s first AI model to use a technique called “instruction hierarchy” that will make an AI model prioritize some instructions over others (such as from a company), which may make it more difficult for people to perform prompt injection attacks or jailbreaks that subvert built-in fine-tuning or directives given by a system prompt.
The value of smaller language models
OpenAI isn’t the first company to release a smaller version of an existing language model. It’s a common practice in the AI industry from vendors such as Meta, Google, and Anthropic. These smaller language models are designed to perform simpler tasks at a lower cost, such as making lists, summarizing, or suggesting words instead of performing deep analysis.
Smaller models are typically aimed at API users, which pay a fixed price per token input and output to use the models in their own applications, but in this case, offering GPT-4o mini for free as part of ChatGPT would ostensibly save money for OpenAI as well.
OpenAI’s head of API product, Olivier Godement, told Bloomberg, “In our mission to enable the bleeding edge, to build the most powerful, useful applications, we of course want to continue doing the frontier models, pushing the envelope here. But we also want to have the best small models out there.”
Smaller large language models (LLMs) usually have fewer parameters than larger models. Parameters are numerical stores of value in a neural network that store learned information. Having fewer parameters means an LLM has a smaller neural network, which typically limits the depth of an AI model’s ability to make sense of context. Larger-parameter models are typically “deeper thinkers” by virtue of the larger number of connections between concepts stored in those numerical parameters.
However, to complicate things, there isn’t always a direct correlation between parameter size and capability. The quality of training data, the efficiency of the model architecture, and the training process itself also impact a model’s performance, as we’ve seen in more capable small models like Microsoft Phi-3 recently.
Fewer parameters mean fewer calculations required to run the model, which means either less powerful (and less expensive) GPUs or fewer calculations on existing hardware are necessary, leading to cheaper energy bills and a lower end cost to the user.
It looks like CNBC and Bloomberg possibly broke an embargo and published their stories prior to OpenAI’s official blog release about GPT-4o Mini. This is a breaking news story and will be updated as details emerge.