In the future, Sastry added, A.I. systems might interpret whether a query requires a rigorous factual answer or something more creative. In other words, if you wanted an analytical report with citations and detailed attributions, the A.I. would know to deliver that. And if you desired a sonnet about the indictment of Donald Trump, well, it could dash that off instead.
In late June, I began to experiment with a plug-in the Wikimedia Foundation had built for ChatGPT. At the time, this software tool was being tested by several dozen Wikipedia editors and foundation staff members, but it became available in mid-July on the OpenAI website for subscribers who want augmented answers to their ChatGPT queries. The effect is similar to the “retrieval” process that Jesse Dodge surmises might be required to produce accurate answers. GPT-4’s knowledge base is currently limited to data it ingested by the end of its training period, in September 2021. A Wikipedia plug-in helps the bot access information about events up to the present day. At least in theory, the tool — lines of code that direct a search for Wikipedia articles that answer a chatbot query — gives users an improved, combinatory experience: the fluency and linguistic capabilities of an A.I. chatbot, merged with the factuality and currency of Wikipedia.
One afternoon, Chris Albon, who’s in charge of machine learning at the Wikimedia Foundation, took me through a quick training session. Albon asked ChatGPT about OceanGate’s Titan submersible, whose whereabouts during an attempt to visit the Titanic’s wreckage were still unknown. “Normally you get some response that’s like, ‘My information cutoff is from 2021,’” Albon told me. But in this case ChatGPT, recognizing that it couldn’t answer Albon’s question — What happened with OceanGate’s submersible? — directed the plug-in to search Wikipedia (and only Wikipedia) for text relating to the question. After the plug-in found the relevant Wikipedia articles, it sent them to the bot, which in turn read and summarized them, then spit out its answer. As the responses came back, after only a slight delay, it was clear that using the plug-in always forced ChatGPT to append a note, with links to Wikipedia entries, saying that its information was derived from Wikipedia, which was “made by volunteers.” And this: “As a large language model, I may not have summarized Wikipedia accurately.”
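The flow Albon demonstrated — notice a gap in the model’s knowledge, search Wikipedia, feed the retrieved text back to the model to summarize — is the basic retrieval pattern. A minimal sketch of that pattern, using Wikipedia’s public search API and OpenAI’s chat API, might look like the following. This is an illustration of the general approach, not the foundation’s actual plug-in code; the keyword search, the prompt wording and the model name are all assumptions.

```python
import requests
from openai import OpenAI

WIKI_API = "https://en.wikipedia.org/w/api.php"

def search_wikipedia(query: str, limit: int = 3) -> list[dict]:
    """Find Wikipedia articles relevant to the query and return plain-text extracts."""
    # Step 1: full-text search for matching article titles.
    search = requests.get(WIKI_API, params={
        "action": "query", "list": "search", "srsearch": query,
        "srlimit": limit, "format": "json",
    }).json()
    titles = [hit["title"] for hit in search["query"]["search"]]

    # Step 2: fetch the introductory extract of each article.
    extracts = requests.get(WIKI_API, params={
        "action": "query", "prop": "extracts", "exintro": 1,
        "explaintext": 1, "titles": "|".join(titles), "format": "json",
    }).json()
    return [
        {"title": page["title"], "text": page.get("extract", "")}
        for page in extracts["query"]["pages"].values()
    ]

def answer_with_wikipedia(question: str) -> str:
    """Summarize the retrieved Wikipedia text in response to the user's question."""
    articles = search_wikipedia(question)
    context = "\n\n".join(f"{a['title']}:\n{a['text']}" for a in articles)
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    reply = client.chat.completions.create(
        model="gpt-4",  # assumed model name
        messages=[
            {"role": "system", "content":
                "Answer using only the Wikipedia extracts provided. "
                "Cite the article titles you relied on, and say so if they don't answer the question."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return reply.choices[0].message.content

print(answer_with_wikipedia("What happened with OceanGate's submersible?"))
```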
But the summary about the submersible struck me as readable, well supported and current — a big improvement over a ChatGPT response that either mangled the facts or lacked real-time access to the internet. Albon told me, “It’s a way for us to sort of experiment with the idea of ‘What does it look like for Wikipedia to exist outside of the realm of the website,’ so you could actually engage in Wikipedia without actually being on Wikipedia.com.” Going forward, he said, his sense was that the plug-in would continue to be available, as it is now, to users who want to activate it but that “eventually, there’s a certain set of plug-ins that are just always on.”
In other words, his hope was that any ChatGPT query might automatically result in the chatbot’s checking facts against Wikipedia and citing helpful articles. Such a process would probably block many hallucinations as well: Because chatbots can be deceived by how a question is worded, false premises sometimes elicit false answers. Or, as Albon put it, “If you were to ask, ‘During the first lunar landing, who were the five people who landed on the moon?’ the chatbot wants to give you five names.” Only two people landed on the moon in 1969, however. Wikipedia would help by offering the two names, Buzz Aldrin and Neil Armstrong; and in the event the chatbot remained conflicted, it could say it didn’t know the answer and link to the article.