On Sunday, Elon Musk’s AI firm xAI released the base model weights and network architecture of Grok-1, a large language model designed to compete with the models that power OpenAI’s ChatGPT. The open-weights release through GitHub and BitTorrent comes as Musk continues to criticize (and sue) rival OpenAI for not releasing its AI models in an open way.
Announced in November, Grok is an AI assistant similar to ChatGPT that is available to X Premium+ subscribers who pay $16 a month to the social media platform formerly known as Twitter. At its heart is a mixture-of-experts LLM called “Grok-1,” clocking in at 314 billion parameters. As a reference, GPT-3 included 175 billion parameters. Parameter count is a rough measure of an AI model’s size and capacity; more parameters generally give a model more room to produce useful responses, though they also raise its hardware requirements.
xAI is releasing the base model of Grok-1, which is not fine-tuned for a specific task, so it is likely not the same model that X uses to power its Grok AI assistant. “This is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023,” writes xAI on its release page. “This means that the model is not fine-tuned for any specific application, such as dialogue,” meaning it’s not necessarily shipping as a chatbot. But it will do next-token prediction, meaning it will complete a sentence (or other text prompt) with its estimation of the most relevant string of text.
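Next-token prediction is simple to sketch in miniature. In the toy example below (purely illustrative, not xAI’s code), a hand-built bigram probability table stands in for Grok-1’s 314 billion parameters: the model scores candidate next tokens given the text so far, and greedy decoding repeatedly appends the most likely one.

```python
# Toy illustration of next-token prediction. A real LLM scores every
# token in its vocabulary with a neural network; here a tiny hard-coded
# bigram table plays that role.

BIGRAM_PROBS = {
    "the": {"sky": 0.6, "cat": 0.4},
    "sky": {"is": 0.9, "falls": 0.1},
    "is": {"blue": 0.7, "clear": 0.3},
}

def next_token(context_token):
    """Return the most probable next token, or None at a dead end."""
    dist = BIGRAM_PROBS.get(context_token)
    if not dist:
        return None
    return max(dist, key=dist.get)

def complete(prompt_tokens, max_new_tokens=5):
    """Greedy decoding: repeatedly append the model's top pick."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok = next_token(tokens[-1])
        if tok is None:
            break
        tokens.append(tok)
    return tokens

print(complete(["the"]))  # ['the', 'sky', 'is', 'blue']
```

A base model like Grok-1 does exactly this kind of completion out of the box; instruction tuning is what layers a conversational format on top.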
“It’s not an instruction-tuned model,” says AI researcher Simon Willison, who spoke to Ars via text message. “Which means there’s substantial extra work needed to get it to the point where it can operate in a conversational context. Will be interesting to see if anyone from outside xAI with the skills and compute capacity puts that work in.”
Musk initially announced that Grok would be released as “open source” (more on that terminology below) in a tweet posted last Monday. The announcement came after Musk sued OpenAI and its executives, accusing them of prioritizing profits over open AI model releases. Musk co-founded OpenAI but is no longer associated with the company; even so, he regularly goads OpenAI to release its models as open source or open weights, as many believe the company’s name suggests it should do.
On March 5, OpenAI responded to Musk’s allegations by revealing old emails that seemed to suggest Musk was once OK with OpenAI’s shift to a for-profit business model through a subsidiary. OpenAI also said the “open” in its name suggests that its resulting products would be available for everyone’s benefit rather than being an open-source approach. That same day, Musk tweeted (split across two tweets), “Change your name to ClosedAI and I will drop the lawsuit.” His announcement of releasing Grok openly came five days later.
Grok-1: A hefty model
So Grok-1 is out, but can anybody run it? xAI has released the base model weights and network architecture under the Apache 2.0 license. The inference code is available for download at GitHub, and the weights can be obtained through a Torrent link listed on the GitHub page.
With a weights checkpoint size of 296GB, only datacenter-class inference hardware is likely to have the RAM and processing power necessary to load the entire model at once. (As a comparison, the largest Llama 2 weights file, a 16-bit precision 70B model, is around 140GB in size.)
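The size comparison follows from simple arithmetic: a checkpoint’s rough footprint is the parameter count times the bytes stored per parameter. A quick sketch (our own back-of-the-envelope math, not xAI’s figures):

```python
def checkpoint_gb(n_params, bytes_per_param):
    """Approximate checkpoint size in gigabytes (decimal GB)."""
    return n_params * bytes_per_param / 1e9

GROK1_PARAMS = 314e9   # Grok-1: 314 billion parameters
LLAMA2_PARAMS = 70e9   # largest Llama 2: 70 billion parameters

print(checkpoint_gb(GROK1_PARAMS, 2))   # 16-bit precision: 628.0 GB
print(checkpoint_gb(LLAMA2_PARAMS, 2))  # 16-bit precision: 140.0 GB
```

Notably, the released 296GB checkpoint is well under the roughly 628GB a full 16-bit copy of 314 billion parameters would require, which suggests the weights ship at a lower precision; xAI’s release page is the authoritative word on the exact format.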
So far, we have not seen anyone get it running locally, but we have heard reports that people are working on quantized versions that would shrink the model enough to run on consumer GPU hardware (though quantizing it that aggressively will also noticeably degrade its output quality).
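Quantization itself is conceptually simple. This minimal pure-Python sketch (illustrative only, not any particular community port of Grok-1) maps floating-point weights onto the signed 8-bit integer range, cutting storage from 4 bytes per weight (fp32) or 2 (fp16) down to 1, at the cost of rounding error:

```python
# Minimal symmetric int8 quantization sketch: store one float scale
# plus one small integer per weight, instead of a full float per weight.

def quantize_int8(weights):
    """Map floats onto [-127, 127] integers plus a shared scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integers and the scale."""
    return [x * scale for x in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
print(q)  # [42, -127, 5, 90]

restored = dequantize(q, scale)
# Rounding bounds the per-weight error by half the scale factor.
assert all(abs(a - b) <= scale / 2 + 1e-9
           for a, b in zip(weights, restored))
```

Real-world schemes, like the 4-bit formats popularized by llama.cpp, use per-block scales and lower bit widths, trading additional quality for additional compression.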
Willison confirmed our suspicions, saying, “It’s hard to evaluate [Grok-1] right now because it’s so big—a [massive] torrent file, and then you need a whole rack of expensive GPUs to run it. There may well be community-produced quantized versions in the next few weeks that are a more practical size, but if it isn’t at least quality-competitive with Mixtral, it’s hard to get too excited about it.”
Appropriately, xAI is not calling Grok-1’s GitHub debut an “open-source” release, because that term has a specific meaning in software. The industry has not yet settled on terminology for AI model releases that ship code and weights with restrictions (like Meta’s Llama 2), or that ship code and weights without the training data, which means others cannot replicate the model’s training process. So we typically call these releases “source available” or “open weights” instead.
“The most interesting thing about it is that it has an Apache 2 license,” says Willison. “Not one of the not-quite-OSI-compatible licenses used for models like Llama 2—and that it’s one of the largest open-weights models anyone has released so far.”