Stanford CS25: Transformers United V6 I From Next-Token Prediction to Next-Generation Intelligence

For more information about Stanford’s graduate programs, visit: https://online.stanford.edu/graduate-education
April 30, 2026
This seminar covers:
• Recent progress in pretraining algorithm design for large language models (LLMs), emphasizing the role of data ordering, reasoning-centric data integration, and reinforcement-based objectives in shaping model capability.
• The introduction of a two-phase pretraining framework that formalizes strategies for data selection, blending, and sequencing.
• A demonstration that front-loading reasoning-rich data during pretraining yields persistent gains in reasoning accuracy that post-training alone cannot reproduce.
Follow along with the seminar schedule. Visit: https://web.stanford.edu/class/cs25/
Guest Speaker: Shrimai Prabhumoye (Mistral AI, prev. NVIDIA)
Instructors:
• Steven Feng, Stanford Computer Science PhD student and NSERC PGS-D scholar
• Karan P. Singh, Electrical Engineering PhD student and NSF Graduate Research Fellow in the Stanford Translational AI Lab
• Michael C. Frank, Benjamin Scott Crocker Professor of Human Biology and Director of the Symbolic Systems Program
• Christopher Manning, Thomas M. Siebel Professor in Machine Learning, Professor of Linguistics and of Computer Science, Co-Founder and Senior Fellow of the Stanford Institute for Human-Centered Artificial Intelligence (HAI)