2023 · Model

GPT-4 Launches

OpenAI released GPT-4, a multimodal model capable of processing both text and images with significantly improved reasoning abilities compared to its predecessors. It scored in the top percentiles on professional exams including the bar exam and medical licensing tests. GPT-4 set a new benchmark for what large language models could achieve.

On March 14, 2023, OpenAI released GPT-4, a multimodal large language model that represented a significant leap in capability over its predecessors. GPT-4 could process both text and images as input, demonstrated substantially improved reasoning abilities, and scored at or above the 90th percentile on numerous professional and academic exams. It quickly became the benchmark against which all other AI models were measured.

The Capabilities

GPT-4's performance on standardized tests was striking. It scored around the 90th percentile on a simulated Uniform Bar Exam (compared to roughly the 10th percentile for GPT-3.5) and in the 99th percentile on the Biology Olympiad, and it passed the US Medical Licensing Exam with a comfortable margin. It demonstrated improved ability to handle complex multi-step reasoning, follow nuanced instructions, and produce factually accurate responses. It still made errors, but it hallucinated substantially less often than GPT-3.5.

Multimodal Input

For the first time, a GPT model could accept images as input alongside text. Given photographs, charts, diagrams, or screenshots, GPT-4 could analyze, describe, or reason about them. The model could read text in images, interpret charts, solve visual puzzles, and even explain memes. At launch, image input was limited to a research preview with a single partner, Be My Eyes; it became broadly available to users later in 2023. GPT-4 could not generate images itself (that capability came later with DALL-E integration), but its visual understanding opened new application possibilities.

The Mystery of Scale

OpenAI made the unusual decision to release almost no technical details about GPT-4's architecture, training data, or size. The technical report focused on capabilities and safety evaluations rather than methodology. This departure from traditional academic transparency was criticized by many researchers but reflected OpenAI's increasing focus on competitive advantage and safety concerns about enabling replication of the most powerful AI systems.

Safety Efforts

GPT-4 represented OpenAI's most significant investment in safety and alignment to date. OpenAI spent six months on adversarial testing and alignment work before release, with red-teaming by external experts in areas like cybersecurity, persuasion, and biosecurity. By OpenAI's own measurements, GPT-4 was 82 percent less likely than GPT-3.5 to respond to requests for disallowed content. However, creative jailbreaks continued to emerge, highlighting the ongoing challenge of making AI systems robustly safe.

Integration and Applications

GPT-4 was rapidly integrated into products across industries. Microsoft embedded it in Bing Chat, GitHub Copilot X, and Microsoft 365 Copilot. Khan Academy used it for personalized tutoring, Duolingo for language learning, and Morgan Stanley to help its financial advisors search the firm's internal research. Be My Eyes used it to help visually impaired users understand their surroundings. The breadth of applications demonstrated GPT-4's versatility as a general-purpose reasoning engine.

The Competitive Response

GPT-4's release intensified the AI race. Google accelerated its Gemini project. Anthropic scaled up Claude's capabilities. Open-source efforts like Llama 2 aimed to close the gap. The model's demonstrated capabilities raised the stakes for everyone in the industry and increased the pace of development across the board.

Legacy

GPT-4 was an inflection point in AI capability. It was the first AI model that many experts considered genuinely useful for professional-level cognitive work across a wide range of domains. Whether drafting legal briefs, analyzing medical images, tutoring students, or writing code, GPT-4 demonstrated that AI could serve as a capable assistant for knowledge work at a level that was previously the exclusive domain of trained professionals.

Key Figures

Sam Altman, Ilya Sutskever, Mira Murati, Greg Brockman

Lasting Impact

GPT-4 set a new standard for AI capability, demonstrating professional-level performance across diverse domains and becoming the first model widely considered genuinely useful for professional cognitive work. It accelerated the AI arms race and expanded the commercial applications of large language models.

Related Events

2022 · Milestone
ChatGPT Launches

OpenAI released ChatGPT on November 30, 2022, and it became the fastest-growing consumer application in history at the time, reaching an estimated 100 million users in just two months. Built on GPT-3.5 with reinforcement learning from human feedback, it made conversational AI accessible to the general public. ChatGPT fundamentally shifted public perception of AI capabilities and triggered an industry-wide race.

2020 · Model
GPT-3 Launches

OpenAI released GPT-3 with 175 billion parameters, demonstrating remarkable few-shot learning abilities across a wide range of tasks without task-specific fine-tuning. Users could prompt the model with just a few examples and get high-quality outputs for translation, code generation, creative writing, and more. GPT-3 ignited a wave of AI startups and applications built on large language models.

2023 · Product
Claude Launches (Anthropic)

Anthropic released Claude, an AI assistant designed with a focus on safety, helpfulness, and honesty using Constitutional AI techniques. Claude offered strong conversational abilities with a notably careful and nuanced approach to sensitive topics. It established Anthropic as a major competitor in the large language model space.

2024 · Model
Gemini Launches (Google)

Google launched Gemini, its most capable multimodal AI model family, designed to natively understand and reason across text, code, images, audio, and video. Gemini Ultra matched or exceeded GPT-4 on key benchmarks, signaling Google's return to the forefront of the AI race. The model was integrated across Google products from Search to Workspace.