Images from words
Our Take
Ever wonder how AI turns "dog with a blue beret" into an actual image? I did, and did a bit of research. Here's my breakdown of what's happening behind the scenes (as I understand it, so a biased view), minus the heavy math.
basic flow
text goes in through a frozen text encoder
gets turned into something the model can understand (embeddings)
diffusion model starts with noise and removes it bit by bit
super-resolution models upscale the small result and add detail (this is how cascaded models like Imagen do it)
final image comes out
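The flow above can be sketched as a toy in plain Python. This is only an illustration of the order of the steps, not how any real model is implemented: every function name here (`encode_text`, `denoise`, `upscale`, `generate`) is made up for this example, the "text encoder" is just a hash turned into numbers, and the "denoiser" cheats by already knowing the target instead of predicting noise with a trained neural network.

```python
import hashlib
import random


def encode_text(prompt, dim=8):
    """Toy stand-in for a frozen text encoder: same prompt, same vector."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def target_from_embedding(embedding, size):
    """Hypothetical 'clean image' implied by the embedding (tiled values)."""
    return [
        [embedding[(r * size + c) % len(embedding)] for c in range(size)]
        for r in range(size)
    ]


def denoise(embedding, size=4, steps=50, seed=0):
    """Start from pure noise and remove a little of it at every step."""
    rng = random.Random(seed)
    image = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
    target = target_from_embedding(embedding, size)
    for _ in range(steps):
        for r in range(size):
            for c in range(size):
                # A real model predicts this noise with a neural network,
                # conditioned on the text embedding. Here we cheat.
                predicted_noise = image[r][c] - target[r][c]
                image[r][c] -= 0.1 * predicted_noise
    return image


def upscale(image, factor=2):
    """Nearest-neighbour upscaling as a stand-in for super-resolution."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def generate(prompt):
    embedding = encode_text(prompt)   # text goes in
    small = denoise(embedding)        # noise is removed bit by bit
    return upscale(small)             # small image is made bigger


image = generate("dog with a blue beret")
print(len(image), len(image[0]))  # prints: 8 8
```

The point of the toy is the shape of the loop: the image starts as random numbers and each pass nudges it a little closer to something that matches the text.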
stack
text-to-image models (like Imagen, Parti, Stable Diffusion)
CLIP for connecting words and images in a shared embedding space
transformer architectures doing the heavy lifting, especially on the text side
upscalers making small images big
VQ-GAN for turning images into discrete tokens and back (Parti uses a ViT-VQGAN tokenizer)
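To make the CLIP item a bit more concrete, here is a minimal sketch of the idea behind it: text and images are both turned into vectors, and a caption "matches" an image when the two vectors point in a similar direction. The vectors below are invented by hand for illustration; a real CLIP model produces them with learned encoders, and `rank_captions` is a hypothetical helper, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Similarity of direction between two vectors, from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_embedding, caption_embeddings):
    """Sort captions from best to worst match for one image embedding."""
    scores = {
        caption: cosine_similarity(image_embedding, embedding)
        for caption, embedding in caption_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up embeddings: assume the image shows a dog wearing a beret.
image_embedding = [0.9, 0.1, 0.8]
caption_embeddings = {
    "dog with a blue beret": [1.0, 0.0, 0.9],
    "red sports car": [0.0, 1.0, 0.1],
}

ranking = rank_captions(image_embedding, caption_embeddings)
print(ranking[0][0])  # prints: dog with a blue beret
```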
resources
Jay Alammar's blog - visual explanations that actually make sense
Stanford's CS231n course - fundamentals of computer vision
Hugging Face diffusion course - hands-on with actual models
AssemblyAI YouTube series on diffusion models
Andrej Karpathy's neural nets course
Keras examples of implementing basic models
Papers:
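Imagen paper (Google): "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding"
Parti paper (scaling study): "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"
Stable Diffusion paper (for the open source angle): "High-Resolution Image Synthesis with Latent Diffusion Models"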