Navigating the Era of Advanced AI: OpenAI’s Pursuit of Controllable Superhuman Intelligence
In the aftermath of Sam Altman’s abrupt ouster from OpenAI, as investors revolted and Altman plotted his return to the company, the Superalignment team inside OpenAI remained dedicated to the problem of governing AI whose intelligence surpasses our own.
“Today, we can basically align models that are dumber than us, or maybe around human-level at most,” Burns said. “Aligning a model that’s actually smarter than us is much, much less obvious — how can we even do it?”
At the helm of the Superalignment initiative stands Ilya Sutskever, OpenAI’s co-founder and chief scientist. That detail drew little attention when the team launched in July, but it has taken on new significance given that Sutskever was among those who pushed for Sam Altman’s dismissal. While some reports suggest Sutskever is in an uncertain position following Altman’s reinstatement, OpenAI’s public relations team affirms to me that, at present, he continues to steer the Superalignment team’s work.
Superalignment is a bit of a touchy subject within the AI research community. Some argue that the subfield is premature; others imply that it’s a red herring.
While Altman has drawn parallels between OpenAI and the Manhattan Project, actively assembling a team to probe AI models for protection against “catastrophic risks” such as chemical and nuclear threats, some experts are skeptical that the startup’s technology will reach world-ending, human-surpassing capabilities anytime soon, or ever. According to these experts, claims of imminent superintelligence serve mainly to divert attention from pressing present-day AI regulatory concerns, including algorithmic bias and AI’s tendency to generate toxic output.
Interestingly, Sutskever appears genuinely convinced that AI, and not only OpenAI’s, could one day pose an existential threat. He reportedly went so far as to commission and burn a wooden effigy at a company offsite to underscore his commitment to preventing AI-related harm to humanity. And he wields substantial influence within OpenAI, commanding a full 20% of the organization’s current compute allocation for the Superalignment team’s research.
“AI progress recently has been extraordinarily rapid, and I can assure you that it’s not slowing down,” Aschenbrenner said. “I think we’re going to reach human-level systems pretty soon, but it won’t stop there — we’re going to go right through to superhuman systems … So how do we align superhuman AI systems and make them safe? It’s really a problem for all of humanity — perhaps the most important unsolved technical problem of our time.”
The Superalignment team is currently attempting to build governance and control frameworks that might apply well to future, powerful AI systems. It’s not a straightforward task, considering that the definition of “superintelligence,” and whether a particular AI system has achieved it, is the subject of robust debate. But the approach the team has settled on for now involves using a weaker, less sophisticated AI model (e.g. GPT-2) to guide a more advanced, sophisticated model (GPT-4) in desirable directions, and away from undesirable ones.
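In practice, this “weak-to-strong” setup looks roughly like ordinary supervised fine-tuning, except that the training labels come from the weaker model rather than from humans. The sketch below is a minimal illustration of that idea only, not OpenAI’s actual code; the toy models, data, and training loop are hypothetical stand-ins chosen for brevity.

```python
# Minimal, illustrative weak-to-strong supervision sketch (not OpenAI's code).
# A small "weak supervisor" labels an unlabeled pool, and a larger "strong"
# model is then fine-tuned on those imperfect labels.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_mlp(width: int) -> nn.Module:
    # Toy stand-ins for a "weak" (GPT-2-like) and a "strong" (GPT-4-like) model.
    return nn.Sequential(nn.Linear(32, width), nn.ReLU(), nn.Linear(width, 2))

weak_model = make_mlp(16)     # small supervisor, assumed already trained on ground truth
strong_model = make_mlp(256)  # larger student with more capacity

unlabeled_inputs = torch.randn(512, 32)  # unlabeled pool the weak model will label

# 1) The weak supervisor generates (imperfect) labels for the unlabeled pool.
with torch.no_grad():
    weak_labels = weak_model(unlabeled_inputs).argmax(dim=-1)

# 2) The strong model is fine-tuned on those weak labels; the hope is that it
#    generalizes beyond the supervisor's mistakes rather than merely imitating them.
optimizer = torch.optim.Adam(strong_model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(10):
    optimizer.zero_grad()
    logits = strong_model(unlabeled_inputs)
    loss = loss_fn(logits, weak_labels)
    loss.backward()
    optimizer.step()
```

The open question the team is probing is whether the stronger model ends up better than its weak teacher on held-out ground truth, or simply learns to reproduce the teacher’s errors.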