
About Prof. Tom Yeh
My New Year’s resolution for 2026 is simple:
I’m building the AI by Hand ✍️ Academy and opening my classroom to AI professionals around the world who care about math, algorithms, and architectures.
I will be dedicating substantial time to creating new material and organizing my existing content into a comprehensive library, from first principles to frontier systems.
Library
Seminar Series
🔥 OpenClaw - 12 Stages of Evolution from the Transformer (2/19/2026)
🔥 Transformer - Six Levels of Understanding (2/12/2026)
Meta Superintelligence Labs vs Facebook AI Research (2/5/2026)
9 AI Eval Formulas (1/29/2026)
Google Ironwood TPU: From Bits to HBM (1/22/2026)
How AWS Uses Small Models Learn Tool Use (1/15/2026)
Attention (1/15/2026)
DeepSeek’s Manifold-Constrained Hyper Connection (mHC) (1/8/2026)
Introduction to Generative AI (1/8/2026)
Gated Attention (NeurIPS 2025 Best Paper) (12/16/2025)
[Free Preview] Agent Hallucination (11/7/2025)
[Free Preview] GPU (10/30/2025)
[Free Preview] Scaling LLMs Google (8/20/2025)
[Free Preview] LLaMA 1 to 2 to 3 to 4 (5/14/2025)
[Free Preview] Drawing DeepSeek by Hand ✍️ (2/20/2025)
Course: Introduction to Agentic AI
New 2026 edition, featuring live calculations and sample code built directly in Excel.
Module 1: Foundation
Module 2: LLM to RAG to Agents
Module 3: Vector Database
Module 4: Tool Use
Drawings: Frontier AI
“Expert Choice” Mixture of Experts (MoE)
MHA, MQA, GQA, MoE-A: More Attention!
New GPT-OSS Trick to Ignore Tokens
MXFP4, FP4, FP8
LoRA, Fine-Tune, Pre-Train
QLoRA, DoRA, BitFit, NF4 vs INT4
KV Cache, Prefill, Decode
[Free Preview] EmbeddingGemma, MRL, InfoNCE, Embed vs. Decode
Inference Batching, Request-vs-Token Level
MLP Parallelism: Data, Context, Row, Column, Pipeline
RoPE vs PE in QKV Self-Attention
RMS, Group, Layer, Batch Norm
Tensor Parallelism
Qwen 3
Excel Blueprints: Frontier AI
DeepSeek Attention (DSA)
Einsum: Outer Product to GQA
DeepSeek OCR
ResNet, ViT, Qwen 3 Attention
RLHF PPO
Sharding
[Free Preview] LinearLayer+LoRA Blueprint: PyTorch, BLAS, CUDA
Excel Blueprints: Essential AI Math
Softmax
Sigmoid
Negative Log-Likelihood
Double Sum
ReLU
Leaky ReLU
[Free Preview] Sinusoidal Positional Encoding
Batch Normalization
Layer Normalization
RMS Normalization
Cross Entropy Loss
L2 Norm
L2 Loss
Binary Cross Entropy Loss
KL Divergence
ELU (Exponential Linear Unit)
Swish (Sigmoid Linear Unit, SiLU)
GELU (Gaussian Error Linear Unit)
Tanh
GLU (Gated Linear Unit)
Entropy
Workbook: Matrix Multiplication
Calculate Complexity
Shape
Identity
Scale
Shift
Combine Rows
Combine Columns
Not Commutative
Associative
Distributive
Chain
Transpose
Inverse
Linear Equations
Tiled Algorithm
“What I cannot create, I do not understand.”