AI by Hand ✍️

by Prof. Tom Yeh

About Prof. Tom Yeh

My New Year’s resolution for 2026 is simple:

I’m building the AI by Hand ✍️ Academy and opening my classroom to AI professionals around the world who care about math, algorithms, and architectures.

I will be dedicating substantial time to creating new material and organizing my existing content into a comprehensive library, from first principles to frontier systems.

Library

Seminar Series

🔥 OpenClaw - 12 Stages of Evolution from the Transformer (2/19/2026)
🔥 Transformer - Six Levels of Understanding (2/12/2026)
Meta Superintelligence Labs vs Facebook AI Research (2/5/2026)
9 AI Eval Formulas (1/29/2026)
Google Ironwood TPU: From Bits to HBM (1/22/2026)
How AWS Uses Small Models Learn Tool Use (1/15/2026)
Attention (1/15/2026)
DeepSeek’s Manifold-Constrained Hyper Connection (mHC) (1/8/2026)
Introduction to Generative AI (1/8/2026)
Gated Attention (NeurIPS 2025 Best Paper) (12/16/2025)
[Free Preview] Agent Hallucination (11/7/2025)
[Free Preview] GPU (10/30/2025)
[Free Preview] Scaling LLMs Google (8/20/2025)
[Free Preview] LLaMA 1 to 2 to 3 to 4 (5/14/2025)
[Free Preview] Drawing DeepSeek by Hand ✍️ (2/20/2025)

Course: Introduction to Agentic AI

New 2026 edition, featuring live calculations and sample code built directly in Excel.

Module 1: Foundation
Module 2: LLM to RAG to Agents
Module 3: Vector Database
Module 4: Tool Use

Drawings: Frontier AI

“Expert Choice” Mixture of Experts (MoE)
MHA, MQA, GQA, MoE-A: More Attention!
New GPT-OSS Trick to Ignore Tokens
MXFP4, FP4, FP8
LoRA, Fine-Tune, Pre-Train
QLoRA, DoRA, BitFit, NF4 vs INT4
KV Cache, Prefill, Decode
[Free Preview] EmbeddingGemma, MRL, InfoNCE, Embed vs. Decode
Inference Batching, Request-vs-Token Level
MLP Parallelism: Data, Context, Row, Column, Pipeline
RoPE vs PE in QKV Self-Attention
RMS, Group, Layer, Batch Norm, Tensor Parallelism
Qwen 3

Excel Blueprints: Frontier AI

DeepSeek Sparse Attention (DSA)
Einsum: Outer Product to GQA
DeepSeek OCR
ResNet, ViT, Qwen 3 Attention
RLHF PPO
Sharding
[Free Preview] LinearLayer+LoRA Blueprint: PyTorch, BLAS, CUDA
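The LinearLayer+LoRA entry builds on low-rank adaptation: a frozen weight matrix W plus a trainable update B·A of rank r, scaled by alpha / r (the scaling used in the original LoRA paper). The plain-Python sketch below assumes that formulation and shows why the adapter can be folded back into a single matrix after training. It is not the blueprint's PyTorch, BLAS, or CUDA code, and the function names are hypothetical.

```python
def matvec(M, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_linear(W, A, B, x, alpha, r):
    # Frozen base path: W has shape (d_out x d_in).
    base = matvec(W, x)
    # Low-rank path: A is (r x d_in), B is (d_out x r); the update is scaled by alpha / r.
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def merge_lora(W, A, B, alpha, r):
    # Fold the adapter into one weight matrix: W' = W + (alpha / r) * (B @ A).
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    return [
        [W[i][j] + scale * sum(B[i][p] * A[p][j] for p in range(r)) for j in range(d_in)]
        for i in range(d_out)
    ]


W = [[1, 0, 2], [0, 1, -1]]  # d_out = 2, d_in = 3
A = [[1, 1, 1]]              # rank r = 1
B = [[1], [0]]
x = [1, 2, 3]
print(lora_linear(W, A, B, x, alpha=2, r=1))
print(matvec(merge_lora(W, A, B, alpha=2, r=1), x))
```

Both calls print the same vector, which is the point of merging: after training, inference can use a single dense matrix with no extra latency from the adapter path.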

Excel Blueprints: Essential AI Math

Softmax
Sigmoid
Negative Log-Likelihood
Double Sum
ReLU
Leaky ReLU
[Free Preview] Sinusoidal Positional Encoding
Batch Normalization
Layer Normalization
RMS Normalization
Cross Entropy Loss
L2 Norm
L2 Loss
Binary Cross Entropy Loss
KL Divergence
ELU (Exponential Linear Unit)
Swish (Sigmoid Linear Unit, SiLU)
GELU (Gaussian Error Linear Unit)
Tanh
GLU (Gated Linear Unit)
Entropy
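For readers who want to cross-check a spreadsheet against code, here is a minimal plain-Python sketch of a few entries in this list: softmax, sigmoid, ReLU, RMS normalization, negative log-likelihood, cross entropy, entropy, and KL divergence. It uses the standard textbook definitions and is not taken from the Excel blueprints. The function names are illustrative, and `rms_norm` assumes no learned gain (real RMSNorm layers usually multiply by one).

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def rms_norm(xs, eps=1e-6):
    # Divide by the root-mean-square of the vector (no mean subtraction, no learned gain).
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [x / rms for x in xs]


def negative_log_likelihood(probs, target):
    # Loss for one example: minus the log-probability assigned to the correct class.
    return -math.log(probs[target])


def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i); terms with p_i = 0 contribute nothing.
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def entropy(p):
    # H(p) = H(p, p)
    return cross_entropy(p, p)


def kl_divergence(p, q):
    # KL(p || q) = H(p, q) - H(p)
    return cross_entropy(p, q) - entropy(p)


print(softmax([1.0, 2.0, 3.0]))  # roughly [0.090, 0.245, 0.665]
print(rms_norm([3.0, 4.0]))      # roughly [0.849, 1.131]
```

With a one-hot target, cross entropy reduces to the negative log-likelihood of the correct class, which is why the two entries usually appear side by side.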

Workbook: Matrix Multiplication

Calculate
Complexity
Shape
Identity
Scale
Shift
Combine Rows
Combine Columns
Not Commutative
Associative
Distributive
Chain
Transpose
Inverse
Linear Equations
Tiled Algorithm
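Several workbook topics (Shape, Complexity, Not Commutative, Transpose, Tiled Algorithm) can be checked in a few lines of plain Python. The sketch below assumes matrices stored as lists of rows and a square tile of hypothetical size `tile`; it illustrates the ideas and is not the workbook's own method.

```python
def matmul(A, B):
    # Naive (n x k) @ (k x m) product: n * k * m multiply-adds.
    n, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("shape mismatch: columns of A must equal rows of B")
    m = len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def transpose(M):
    # Rows become columns.
    return [list(col) for col in zip(*M)]


def matmul_tiled(A, B, tile=2):
    # Same product, computed one (tile x tile) block of C at a time.
    n, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("shape mismatch: columns of A must equal rows of B")
    m = len(B[0])
    C = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Add the partial product of one A tile and one B tile into the C tile.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for p in range(p0, min(p0 + tile, k)):
                            C[i][j] += A[i][p] * B[p][j]
    return C


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(matmul(A, B))  # [[2, 1], [4, 3]]  (B swaps the columns of A)
print(matmul(B, A))  # [[3, 4], [1, 2]]  (B swaps the rows of A): AB != BA
```

Tiling does not change the arithmetic, only the order in which it is done; on real hardware, the benefit is that each tile of A and B can be reused from fast memory many times before being evicted.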

“What I cannot create, I do not understand.”