Teaching AI, Not Using It
"Why my students implement backpropagation by hand, build neural networks from NumPy, and learn to architect systems instead of calling APIs."
In my first class each semester, I tell students: "You will not use TensorFlow or PyTorch for the first month. You will implement gradient descent. You will derive backpropagation. You will build a neural network using only NumPy."
Some look horrified. Why learn the machinery when frameworks abstract it away?
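The kind of month-one exercise described above can be sketched in a few lines. This is a hypothetical illustration, not the course's actual assignment: gradient descent on a least-squares loss, with the gradient derived by hand rather than supplied by a framework.

```python
import numpy as np

# Minimize L(w) = ||Xw - y||^2 / (2n) by gradient descent.
# The gradient dL/dw = X^T (Xw - y) / n is derived by hand, not autograd.
def gradient_descent(X, y, lr=0.1, steps=500):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n  # hand-derived gradient
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w
w = gradient_descent(X, y)
print(np.round(w, 3))  # recovers values close to [2, -1, 0.5]
```

Nothing here is hidden behind an optimizer object: the learning rate, the update rule, and the convergence behavior are all in plain view.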
The Using vs. Building Divide
Most AI education teaches usage. Import a library, load pretrained models, fine-tune on your data. Functional? Yes. Sufficient for building novel AI systems? No.
Using AI tools makes you dependent on what those tools provide. Building AI systems requires understanding the principles beneath the abstractions.
When your model fails—and it will—do you understand why? When you need capabilities the framework doesn't provide, can you implement them? When architectural choices determine success or failure, do you recognize the trade-offs?
What Implementation Teaches
Implementing backpropagation by hand forces confrontation with the chain rule, computational graphs, and gradient flow. Not as trivia, but as lived experience.
You discover why vanishing gradients happen. You feel the computational cost of deep networks. You understand why certain activation functions work better than others—not because a blog post said so, but because you watched the gradients behave.
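"Watching the gradients behave" can be made concrete. A minimal sketch, assuming a toy chain of scalar sigmoid layers (not any particular assignment): backpropagate by the chain rule and watch the gradient magnitude collapse, since sigmoid's derivative never exceeds 0.25.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Forward pass through a deep chain of scalar sigmoid "layers",
# caching activations for the backward pass.
depth = 10
rng = np.random.default_rng(1)
weights = [rng.normal(scale=0.5) for _ in range(depth)]

x = 1.0
acts = [x]
for w in weights:
    x = sigmoid(w * x)
    acts.append(x)

# Backward pass: chain rule, layer by layer. Each factor is at most
# 0.25 * |w|, so the gradient shrinks geometrically with depth.
grad = 1.0  # dL/d(output), taking L = output
for w, a in zip(reversed(weights), reversed(acts[1:])):
    local = a * (1 - a)   # sigmoid'(z) expressed via the output a
    grad *= local * w
    print(f"|grad| = {abs(grad):.2e}")
```

Ten layers in, the gradient reaching the first weight is vanishingly small, which is exactly why early deep networks with sigmoid activations were so hard to train.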
Building a neural network from scratch in NumPy teaches tensor operations, broadcasting, and vectorization. These concepts determine whether your code runs in milliseconds or hours.
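The broadcasting point is easy to demonstrate. A small sketch: the same dense-layer forward pass written as a Python loop over samples and as one broadcasted matrix expression, producing identical results.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 128))   # batch of 64 inputs
W = rng.normal(size=(128, 32))   # layer weights
b = np.zeros(32)                 # layer bias

# Loop version: one sample at a time. Clear, but slow at scale.
slow = np.stack([np.maximum(x @ W + b, 0.0) for x in X])

# Vectorized version: X @ W has shape (64, 32), and b broadcasts
# across the batch axis. One call handles the whole batch.
fast = np.maximum(X @ W + b, 0.0)

assert np.allclose(slow, fast)
```

On real batch sizes the vectorized form is orders of magnitude faster, because the work happens in optimized C loops instead of the Python interpreter.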
Architecture Before Implementation
Before writing code, students must design. What architecture suits this problem? Why convolutions for images? Why recurrence for sequences? Why attention for dependencies?
The answer "because BERT uses it" isn't acceptable. The answer must reference the structure of the data, the nature of the task, and the computational constraints.
This separates AI engineers from AI users. Engineers choose architectures based on problem structure. Users apply whatever worked in a tutorial.
The Mathematics Isn't Optional
Machine learning is applied mathematics. Linear algebra, calculus, probability, optimization—these aren't prerequisites you forget after exams. They're the language of AI.
When I teach optimization, students derive gradient descent, understand momentum, recognize why Adam works. They don't just call an optimizer—they understand what it does and why.
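The update rules in question fit in a few lines once derived. A hypothetical sketch, written directly from the equations rather than taken from any framework: momentum and Adam minimizing an ill-conditioned quadratic.

```python
import numpy as np

# Minimize f(w) = 0.5 * w^T A w with an ill-conditioned A.
A = np.diag([1.0, 10.0])
grad = lambda w: A @ w

def sgd_momentum(w, steps=200, lr=0.05, beta=0.9):
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)   # velocity accumulates past gradients
        w = w - lr * v
    return w

def adam(w, steps=200, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g        # first moment estimate
        v = b2 * v + (1 - b2) * g * g    # second moment estimate
        m_hat = m / (1 - b1 ** t)        # bias correction
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

print(sgd_momentum(np.array([5.0, 5.0])))
print(adam(np.array([5.0, 5.0])))
```

A student who has written this understands why momentum damps oscillation along steep directions, and why Adam's per-coordinate scaling helps when curvature varies across dimensions.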
When students encounter a new paper, they can read the mathematics, implement the algorithms, and evaluate whether the approach fits their problem.
Building Production Systems
Academic AI and production AI differ substantially. In academia, you run experiments on clean datasets with known solutions. In production, you handle messy data, evolving requirements, and systems that can't fail.
Students learn to:
Design training pipelines that handle data drift
Build inference systems with latency constraints
Implement monitoring for model degradation
Structure code for maintainability and testing
Understand deployment trade-offs
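One item from the list above, monitoring for model degradation, can be sketched simply. This is a hypothetical minimal design (window size and threshold are illustrative), not a production implementation: a rolling window of prediction outcomes with an alert threshold.

```python
import numpy as np
from collections import deque

# Hypothetical degradation monitor: track a rolling window of
# prediction correctness and flag when accuracy falls below a floor.
class DegradationMonitor:
    def __init__(self, window=500, min_accuracy=0.9):
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong
        self.min_accuracy = min_accuracy

    def record(self, prediction, label):
        self.outcomes.append(int(prediction == label))

    def degraded(self):
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data to judge yet
        return float(np.mean(self.outcomes)) < self.min_accuracy

monitor = DegradationMonitor(window=100, min_accuracy=0.9)
for i in range(100):
    # Simulate accuracy dropping to 80% over the window.
    monitor.record(prediction=1, label=1 if i < 80 else 0)
print(monitor.degraded())  # True
```

Real systems add delayed labels, segment-level metrics, and input-distribution checks, but the core idea is the same: instrument the model, define a baseline, and alert on deviation.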
These skills don't emerge from using high-level frameworks. They require building systems where you control every component.
The Reward
By mid-semester, students can read research papers and implement novel architectures. They understand why methods work, not just that they work. They can debug training failures, optimize inference speed, and design custom solutions.
When they encounter problems without existing solutions—which is most real problems—they can build something from first principles.
What This Means for Industry
Organizations don't need more people who can fine-tune GPT. They need people who can architect specialized AI for unique domains, understand failure modes, and build reliable production systems.
The talent gap isn't in AI usage. It's in AI engineering. People who understand the mathematics, can implement novel architectures, and make principled design decisions.
The Philosophy
I teach AI the way I build it. Not top-down from frameworks, but bottom-up from principles. Not usage patterns, but engineering fundamentals.
The goal isn't to create researchers who never use frameworks. It's to create builders who can work at any level of abstraction, from mathematical derivations to production deployments.
When you understand how something works, you can build anything. When you only know how to use it, you're limited to what others have built.
The Challenge
This approach demands more from students. It's harder than importing a library and calling fit(). It requires thinking, not just following tutorials.
But the students who push through emerge different. They don't just use AI. They build it.
And in a field evolving as rapidly as AI, that difference determines who drives innovation and who waits for the next framework update.
