Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 4 - Latent Space & Guidance
Learn more details about this course: https://online.stanford.edu/courses/cme296-diffusion-and-large-vision-models
To follow along with the course schedule and syllabus, visit: https://cme296.stanford.edu/syllabus/
Chapters:
00:00:00 Introduction
00:07:05 Pixel space
00:12:39 Semantic vs perceptual similarity
00:14:27 Autoencoder
00:22:56 Variational autoencoders
00:31:19 ELBO derivation
00:46:43 Blurriness issue of VAEs
00:47:43 Reconstruction loss
00:48:54 KL regularization loss
00:50:17 Perceptual loss
00:54:17 Adversarial loss
00:57:01 Latent diffusion models
01:00:31 Encoder vs decoder trade-off
01:05:55 Text representation with Transformers
01:12:20 Image representation with ViT
01:18:35 Contrastive learning with CLIP
01:27:44 Classifier-based guidance
01:34:34 Classifier-free guidance
For more information about Stanford’s graduate programs, visit: https://online.stanford.edu/graduate-education
Afshine Amidi is an Adjunct Lecturer at Stanford University.
Shervine Amidi is an Adjunct Lecturer at Stanford University.
View the course playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNdy8rt2rZ4T2xM0OjADnfu