OpenAI thinks superhuman AI is coming — and wants to build tools to control it

Anton Ioffe - December 16th 2023 - 7 minutes read

As we stand at the cusp of an unprecedented technological horizon, OpenAI, a leading artificial intelligence research lab, is sounding the clarion call about the advent of superhuman AI, a leap in machine intelligence that promises to redefine our future. Their focus now shifts from mere creation to vigilant stewardship: the quest to design sophisticated tools to control a power that could soon surpass human intellect. In the ensuing narrative, we delve into OpenAI's proactive measures to tame this nascent titan of tech. From the philosophical underpinnings to the practicalities of oversight, through the thicket of ethical conundrums and onto the plains of global collaboration, we'll dissect the blueprint of a world bracing for the impact of a superintelligent ally—or adversary. Join us as we explore the imperative of harnessing this formidable force while ensuring that humanity remains firmly at the helm, guiding the course of AI ascendancy.

Defining Superintelligence and OpenAI’s Ethos

In the realm of artificial intelligence, superhuman AI denotes a theoretical form of AI that surpasses human intelligence in every domain—be it reasoning, problem-solving, creativity, or emotional intelligence. While today's AI systems, including sophisticated language models like GPT-4, have demonstrated proficiency in specific tasks, they remain narrowly focused and rely on vast data inputs from human interactions. Superhuman AI, on the other hand, is envisioned to understand and learn autonomously at an unprecedented scale, potentially developing emergent abilities unforeseen by its creators. This leap in capabilities raises pivotal questions around the feasibility of human oversight and the mechanisms required to ensure such an AI's actions would align with human values and intent.

OpenAI operates under the ethos that the development of AI should be done with a vigilant eye on safety and alignment with human values. The company's team, including visionaries like Ilya Sutskever, approaches the inevitability of superintelligence not as a mere hypothesis but as an impending reality that demands proactive measures. OpenAI’s superalignment team, in particular, is tasked with exploring and crafting strategies to maintain control over AI that might one day exhibit intellectual capacities beyond our own. This forward-thinking approach underscores OpenAI's commitment to preemptively addressing the challenges of superintelligence, which they regard as both an immense technological achievement and a potential existential risk.

The pursuit of such superhuman AI calls for an extensive ethical framework and advanced control tools to match its sophistication. OpenAI's philosophy embraces the dual-sided nature of this technological frontier; on the one hand, superhuman AI promises to unlock new levels of knowledge and problem-solving, and on the other, it poses a series of complex risks that must be judiciously managed. Their stance is clear—while they stride toward monumental advances in AI capabilities, there is an equally robust effort to ensure these advances do not outpace our ability to steer them in a direction that is beneficial, secure, and aligned with humanity's broader interests.

The Toolbox for Superintelligent Oversight

In the pursuit of creating a superintelligence that operates within safe boundaries, the superalignment team at OpenAI has been developing a toolbox equipped for superior oversight. One breakthrough approach involves a hierarchical supervision method where a less powerful large language model oversees its more powerful counterpart. This technique could be a prototype for future systems where superintelligent AIs would be monitored by built-in mechanisms designed to curb any behavior that deviates from human-aligned values. By ensuring that a checks-and-balances system is encoded into the AI itself, OpenAI intends to pre-empt the possibility of a superintelligent AI acting beyond the control of human handlers.
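The core idea behind weak-over-strong supervision can be illustrated with a toy sketch. Here a "weak supervisor" labels data with 20% error, and a "strong student" fits the best decision boundary it can to those noisy labels; despite imperfect supervision, the student recovers a boundary close to the true one. All names and numbers below are illustrative assumptions, not OpenAI's actual setup.

```python
import random

random.seed(0)

def weak_supervisor(x):
    # Weak model: the true rule is "positive if x > 0",
    # but this supervisor flips roughly 20% of its labels.
    label = x > 0
    return label if random.random() > 0.2 else not label

# Training data labeled only by the weak supervisor.
xs = [random.uniform(-1, 1) for _ in range(2000)]
labels = [weak_supervisor(x) for x in xs]

# Strong "student": pick the decision threshold that best fits the weak labels.
candidates = [i / 100 for i in range(-100, 101)]
best_t = max(candidates,
             key=lambda t: sum((x > t) == y for x, y in zip(xs, labels)))

# best_t lands near the true boundary at 0, despite the noisy supervision.
```

The point of the sketch is the asymmetry: the student can end up more accurate than its supervisor, which is the hopeful premise behind letting weaker models oversee stronger ones.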

The toolkit also emphasizes the importance of reinforcement learning from human feedback (RLHF), an existing technique that might scale to superhuman models. With this approach, human input remains a crucial part of the training process, even as the AI outpaces human cognition in certain areas. Humans would score the AI's outputs to foster behaviors that align with human ethics and objectives and discourage those that don't. This ensures that while the AI may develop superhuman processing and problem-solving capabilities, its ultimate goals and methods remain tethered to human judgments, hopefully preventing any rogue actions that could lead to adverse outcomes.
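A minimal sketch of the feedback loop looks like this: the model samples behaviors, human raters return scalar scores, and the model's preference weights drift toward the highest-rated behavior. The behavior names and scores are hypothetical stand-ins; real RLHF trains a reward model and optimizes a policy against it, which this deliberately simplifies away.

```python
import random

random.seed(1)

# Candidate behaviors the model can exhibit (illustrative stand-ins).
behaviors = ["helpful", "evasive", "harmful"]

# Human raters score each behavior; higher means more aligned (made-up values).
human_scores = {"helpful": 1.0, "evasive": 0.2, "harmful": -1.0}

# Preference weights the "policy" updates from feedback.
weights = {b: 0.0 for b in behaviors}
lr = 0.5

for _ in range(200):
    b = random.choice(behaviors)              # model tries a behavior
    reward = human_scores[b]                  # human feedback as a scalar reward
    weights[b] += lr * (reward - weights[b])  # nudge weight toward the reward

best = max(weights, key=weights.get)
```

After enough rounds, the weights mirror the human scores and the highest-weighted behavior is the one humans rated best, which is the essence of keeping the model's objectives tethered to human judgment.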

Transparency and accountability serve as foundational pillars for the oversight tools being devised. OpenAI is exploring avenues for AI systems to not only make decisions in an ethical and safe manner but also to report and justify their decision-making processes. Such self-explanatory AIs could assist in maintaining a clear record of operations for their human supervisors, which is essential for managing the immense power these systems may eventually wield. By innovating ways to embed explainability into AI, OpenAI aspires to build trust and understanding between these superhuman entities and the societies they may serve.
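One way to make such self-explanation concrete is to have every decision emit a structured audit record alongside the decision itself. The sketch below is an assumption about how that could look, not a description of any OpenAI system: a wrapper checks a request against an allowed-topic set, then logs what it decided and why for human supervisors to review.

```python
import datetime

audit_log = []

def decide_and_explain(request, allowed_topics):
    """Make a decision and record a human-readable justification."""
    topic_ok = request["topic"] in allowed_topics
    decision = "answer" if topic_ok else "refuse"
    audit_log.append({
        # Timestamped record so supervisors can reconstruct the sequence of events.
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "request": request["text"],
        "decision": decision,
        "reason": (f"topic '{request['topic']}' "
                   f"{'is' if topic_ok else 'is not'} in the allowed set"),
    })
    return decision

allowed = {"cooking", "history"}
decide_and_explain({"topic": "cooking", "text": "How do I proof dough?"}, allowed)
decide_and_explain({"topic": "weapons", "text": "How do I build X?"}, allowed)
```

The value is less in the toy policy than in the invariant: no decision is made without a justification being recorded, which is the kind of accountability property the article describes.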

Balancing the Scales: Mitigation of Risks and Unintended Consequences

In the wake of rapid developments within artificial intelligence, ethical considerations and safety concerns take center stage. The promise of AI systems that may one day exhibit superhuman capabilities implies a dual-faced potential: on one side, extraordinary problem-solving abilities, and on the other, unprecedented risks and unintended consequences. To navigate this tightrope, OpenAI is embedding mitigation strategies directly into the architecture of their AI models. This includes safety measures like fail-safes and kill switches, designed to halt AI functions if they deviate from intended behavior. Strategic transparency is also being adopted to foster a culture of open discourse, inviting public engagement and scrutiny as a form of participatory assurance. This approach aims to democratize the understanding of AI capabilities and facilitate broad-based consensus on safety norms.
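The fail-safe idea can be sketched as a monitor that wraps an agent's actions, checks each output against a safety bound, and latches into a permanently halted state on the first violation. The bound and the latching behavior are illustrative design assumptions, meant only to show the shape of a kill switch, not any production mechanism.

```python
class KillSwitchError(RuntimeError):
    """Raised when the monitor halts the agent."""

class MonitoredAgent:
    """Wraps an action function; halts permanently once a bound is violated."""

    def __init__(self, action_fn, max_value):
        self.action_fn = action_fn
        self.max_value = max_value
        self.halted = False

    def step(self, x):
        if self.halted:
            # The halt latches: a tripped agent cannot resume on its own.
            raise KillSwitchError("agent is halted")
        out = self.action_fn(x)
        if abs(out) > self.max_value:
            # Output deviated from intended behavior: trip the kill switch.
            self.halted = True
            raise KillSwitchError(
                f"output {out} exceeded bound {self.max_value}")
        return out
```

Crucially, the halt is one-way: once tripped, every subsequent call fails, modeling the requirement that a deviating system stay stopped until humans intervene.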

Another layer of OpenAI’s strategic safeguarding is the careful rollout of AI applications. Rather than sequestering AI development behind closed doors, they unveil AI systems incrementally to the public—collecting feedback and insights to inform iterative enhancements. This controlled exposure allows both the AI and its human users to synergize, adapting collectively. By encouraging society to 'toy' with AI under low-stakes conditions, OpenAI leverages the diversity of public interaction to pinpoint vulnerabilities and reinforce system integrity. It is a means of proactive risk management, where the feedback loop between human use and AI development undergirds the pursuit of a responsible deployment that honors the complexities of ethical AI usage.

Moreover, OpenAI's commitment to responsible deployment goes beyond technical safeguards, incorporating a principled stance on the use cases of their AI. Recognizing that the potential benefits of superintelligence should not cast a shadow on societal norms, there is a dedication to preventing applications that could cause harm or provoke moral objections. Teams of policy and safety experts are integral to this process, scrutinizing the inputs and outputs of AI systems like ChatGPT. They conduct rigorous audits, engage with external safety experts, and ensure that updates are responsive to the evolving landscape of AI interactions. By placing a premium on ethical deployment, OpenAI seeks to ensure that their AI systems serve as a boon to society—even as they edge closer to the horizon of superhuman capabilities.

Collaborative Futures: A Global Call to Action

The emergence of superhuman AI presents unprecedented challenges; it's clear that no single entity can address these perils alone. OpenAI, recognizing this, has embarked on an ambitious campaign to galvanize the global community towards a common goal: to ensure the safe and beneficial progression of artificial intelligence. Their collaborative approach leverages the diverse expertise of university labs, non-profit organizations, and independent researchers. Through generous grants and fellowships, OpenAI not only fosters innovation in AI safety but also underscores the significance of shared responsibility in this new frontier. Their initiatives catalyze a rich exchange of ideas, expanding the dialogue beyond the tech industry to include voices from various sectors and disciplines.

The success of this collective endeavor hinges on establishing a robust global discourse on AI governance. OpenAI's vision extends to shaping international policies that can navigate the ethical, social, and economic impacts of AI systems with capabilities that tower over human intellect. This vision can only be brought to fruition through sustained dialogue among policymakers, ethicists, technologists, and the wider public. The discourse must transcend national boundaries and foster a universal framework that can adapt to an AI landscape that is fast-evolving and unpredictable. The recognition by global leaders of the gravity and immediacy of these challenges is paramount, as is their commitment to crafting agile and informed policies.

Above all, there is a dire need for a paradigm shift towards collective oversight of superhuman AI—a journey that OpenAI cannot and should not undertake alone. As AI technology strides forward, so too must our vigilance and collaborative spirit. Ensuring that AI remains aligned with our highest human values and serves the greater good is an imperative that no one stakeholder can guarantee. It is a task that necessitates the combined efforts of the entire human race, reinforced by openness, transparency, and a steadfast commitment to navigating the unpredictable waters of superhuman intelligence with foresight and moral clarity. Only through widespread cooperation and a shared sense of purpose can we harness the transformative power of AI while safeguarding humanity's future.


OpenAI, a leading artificial intelligence research lab, is highlighting the arrival of superhuman AI and the need for tools to control it. OpenAI is focused on ensuring that AI development aligns with human values and ethics. They are developing oversight tools such as hierarchical supervision and reinforcement learning with human feedback to maintain control over superintelligent AI. OpenAI is also implementing safety measures, strategic transparency, and responsible deployment practices while fostering collaboration and global discourse on AI governance. The collective efforts of the entire human race are crucial to ensuring the safe and beneficial progression of superhuman AI.
