The OpenAI train wreck may be unfolding in slow motion before our eyes, but the company’s competition isn’t sitting back slack-jawed. Anthropic has just released Claude 2.1, an upgrade to its flagship large language model that keeps it competitive with the GPT series, and now has the useful added feature of “being developed by a company that’s not actively at war with itself.”
This new update to Claude has three major improvements: context window, precision, and extensibility.
On the context window front, that is, how much data the model can pay attention to at once, Anthropic has surpassed OpenAI: the embattled Sam Altman announced a window of 128,000 tokens at the company’s Dev Day (it seems so long ago!), and Claude 2.1 can now handle 200,000 tokens. That’s enough for “entire code bases, financial statements like S-1, or even long literary works like The Iliad,” the company wrote.
Of course, having more information does not necessarily mean the model handles it as well. GPT-4 is still the gold standard in code generation, for example, and Claude will handle requests differently from its competitors, some better, some worse. It's all a work in progress, and ultimately it's up to users to figure out the best way to exploit this new capability.
Accuracy is also supposed to get a boost, though accuracy is a notoriously difficult quality to quantify; Anthropic evaluated the model on “a large set of complex, factual questions that investigate known weaknesses of current models.” The results show that Claude 2.1 gives fewer incorrect answers, is less likely to hallucinate, and is better at recognizing when it cannot be sure: the model is “significantly more likely to demur rather than provide incorrect information.” Again, how useful this is in practice can only be judged by the users who put it to work.
Lastly, Claude 2.1 can now use tools, as crows and bonobos can, though nothing so tangible as theirs: it's more like the nascent agent functionality we're seeing emerge in models intended to interact with web interfaces. If the model determines that the best way to answer a question is not to reason about it but simply to use a calculator or a known API, it will do that instead.
For example, if the model doesn't know which car or laptop to recommend to someone asking for product advice, it can call out to a model or database better equipped to answer that question, or even perform a web search if appropriate.
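To make that dispatch pattern concrete, here is a minimal, hypothetical sketch of how an application might route a model's output to a tool. Everything here is illustrative: the JSON tool-call format, the `calculator` tool, and the function names are assumptions for the sake of the example, not Anthropic's actual API.

```python
import json

def calculator(expression: str) -> str:
    """A simple arithmetic tool the model can delegate to (hypothetical)."""
    # Restrict eval to bare arithmetic; fine for this illustrative sketch.
    return str(eval(expression, {"__builtins__": {}}))

# Registry of tools the application exposes to the model (assumed names).
TOOLS = {"calculator": calculator}

def handle_model_output(output: str) -> str:
    """If the model emits a tool call (as JSON), run the tool;
    otherwise pass the plain-text answer through unchanged."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return output  # ordinary text answer, no tool needed
    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        return output  # unknown tool: fall back to the raw text
    return tool(call["input"])

# Simulated model output choosing the calculator over free-form reasoning:
print(handle_model_output('{"tool": "calculator", "input": "1234 * 5678"}'))
# → 7006652
```

The point of the pattern is that the model only has to decide *that* a calculator (or database, or search) is the right move; the application does the actual calling, which is why answers routed this way can be exact rather than hallucinated.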
These iterative improvements are sure to be welcomed by developers who employ Claude regularly and show that every day lost at OpenAI due to power struggles is potentially one lost to the competition. Anthropic’s models may not always measure up to OpenAI’s, but this industry is moving forward fast. A few weeks off to catch up could make more of a difference than anyone expects.