AI for Dummies (me)
Not putting it into words would be a waste of the time spent learning, so this is my effort to track what I'm learning about AI. My source materials come from the papers listed in The 2025 AI Engineer Reading List.
What is AI?
Artificial Intelligence is a domain of computer science that seeks to give computers human-like abilities to perform particular tasks: to do what we do, but at a larger scale and faster. Computers are powerful tools, and if we can teach them to help us solve difficult problems, it is in our nature to do so.
Machine Learning
Teaching computers to recognize patterns and make decisions without being explicitly programmed. Instead of relying on a giant set of predefined rules, the computer is given many examples to learn from and identify patterns on its own.
Supervised & Unsupervised Learning
In a typical dataset, each row represents an individual data point with features—the known attributes or characteristics—and a label, the outcome the model is trained to predict. In other words, given features A, B, and C, the model learns to determine what label X should be.
Supervised learning relies on labeled data, identifying patterns between inputs and their corresponding outputs (e.g., an email labeled as spam or not spam). The goal is to accurately predict the output for new, unseen inputs.
Common Supervised Learning Tasks (a quick code sketch follows this list):
- Classification: Assigns data to categories (e.g., binary classification like spam detection, or multi-class classification like recognizing handwritten digits).
- Regression: Predicts continuous values (e.g., house prices, stock values, temperature, rainfall).
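To make the two tasks concrete, here is a minimal sketch using scikit-learn. The features, labels, and numbers are made-up toy data, just to show the shape of the problem:

```python
# A toy supervised-learning sketch with scikit-learn (all data is made up).
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: features -> category (1 = spam, 0 = not spam)
emails = [[8, 1], [1, 0], [7, 1], [0, 0]]   # [num_links, mentions_money]
labels = [1, 0, 1, 0]                       # the known outcomes (labels)
clf = LogisticRegression().fit(emails, labels)
print(clf.predict([[6, 1]]))                # -> most likely [1] (spam)

# Regression: features -> continuous value (a house price in dollars)
houses = [[1500, 3], [2000, 4], [4200, 5]]  # [square_feet, bedrooms]
prices = [415_000, 520_000, 780_000]
reg = LinearRegression().fit(houses, prices)
print(reg.predict([[1800, 3]]))             # -> a dollar estimate
```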
Unsupervised learning works with unlabeled data, meaning there’s no predefined output for the model to learn from. Instead, the goal is to identify patterns, structures, or relationships within the data itself (e.g., analyzing raw emails without knowing whether they are spam or not).
Goals of Unsupervised Learning (again, a sketch follows the list):
- Clustering: Grouping similar data points together (e.g., categorizing emails based on content similarity).
- Dimensionality Reduction: Reducing the number of features while preserving important information.
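Here is the matching unsupervised sketch, again with scikit-learn and made-up points. Notice that the model is never told which group each point belongs to:

```python
# A toy unsupervised-learning sketch: no labels, just raw feature vectors.
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

points = [[1.0, 1.1], [0.9, 1.0], [8.0, 8.2], [7.9, 8.1]]

# Clustering: group similar points together (here, into 2 clusters).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)        # e.g. [0 0 1 1] -- the model found the groups itself

# Dimensionality reduction: compress 2 features into 1 while keeping structure.
pca = PCA(n_components=1).fit(points)
print(pca.transform(points)) # each point is now represented by a single number
```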
Training, Validating & Testing the Model
Training begins by using a portion of the dataset, typically around 70-80%, as training data. The model takes these inputs, makes predictions, and compares them to the actual outputs to calculate the loss, or error. It then adjusts its internal parameters to improve accuracy over multiple iterations.
Validation follows, using a separate validation set, usually around 10-15% of the data. This step acts as a gut check to evaluate how well the model generalizes to unseen data. Unlike training, the model is not adjusted based on validation results. This helps detect overfitting, where the model memorizes training data instead of learning meaningful patterns.
Testing is the final evaluation, using the remaining 10-15% of the data. The model is assessed on completely new data that it has never seen before. This test set provides the best estimate of how well the model will perform in real-world scenarios.
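Putting the three phases together, here is a minimal sketch with scikit-learn. The synthetic dataset and the exact 80/10/10 proportions are just illustrative assumptions:

```python
# Splitting a synthetic dataset 80/10/10 into train/validation/test sets.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)  # stand-in data

# Carve off 20% of the data, then split that 20% in half (10% val, 10% test).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # training: parameters adjusted here
print("validation accuracy:", model.score(X_val, y_val))  # gut check; the model is not updated
print("test accuracy:", model.score(X_test, y_test))      # final evaluation on unseen data
```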
Unsupervised Pre-Training (as it relates to LLMs)
Unsupervised pre-training allows a model to learn general language patterns from a large corpus of unlabeled text. The model learns the structure, relationships, and statistical patterns in language by predicting the next word in a sequence. This process is often referred to as a language modeling task.
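As a toy illustration of that idea (nothing like a real neural network, just word counts), here is a tiny bigram model that "learns" next-word statistics from a few sentences of unlabeled text:

```python
# Toy language modeling: learn statistics from raw, unlabeled text and use
# them to predict the next word. Real pre-training does this with neural
# networks over billions of documents, not simple word counts.
from collections import Counter, defaultdict

corpus = "the dog chased the cat . the dog ate . the dog slept".split()

# Count which word tends to follow which (a "bigram" model).
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    # Return the word that most often followed `word` in the corpus.
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("the"))   # -> "dog" (it followed "the" most often above)
```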
The technical output of unsupervised pre-training is a pre-trained language model. One piece of that model is its embedding matrix; here is a toy example:
Word Embedding (Vector)
------- --------------------
"dog" [0.1, 0.5, -0.3]
"cat" [0.3, 0.7, 0.2]
"car" [-0.4, 0.2, 0.6]
Each word or token in the vocabulary is mapped to a dense vector (a high-dimensional representation).
The numbers in the embedding vectors, known as weights, are the parameters learned by the neural network during training. These weights encode semantic relationships between words, allowing the model to understand meaning and context. This is a simple example; real models often have hundreds or even thousands of weights per vector!
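Using the toy matrix above, here is a quick sketch of how those vectors get used: words with similar meanings should end up with similar vectors, which we can measure with cosine similarity (the numbers are just the made-up ones from the table):

```python
# Comparing the toy embedding vectors from the table with cosine similarity.
import numpy as np

embeddings = {
    "dog": np.array([0.1, 0.5, -0.3]),
    "cat": np.array([0.3, 0.7, 0.2]),
    "car": np.array([-0.4, 0.2, 0.6]),
}

def cosine_similarity(a, b):
    # 1.0 = pointing the same way (similar meaning), 0 = unrelated, -1.0 = opposite
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embeddings["dog"], embeddings["cat"]))  # ~0.69, relatively similar
print(cosine_similarity(embeddings["dog"], embeddings["car"]))  # ~-0.27, not similar
```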
Fine-Tuning
Fine-tuning is the process of taking a pre-trained model (like a language model) and adapting it to perform a specific task (sentiment analysis, question answering, summarization, code generation) by training it further on a smaller, labeled dataset.
During fine-tuning, the model shifts from its original general-purpose goal (predicting the next word) to a new task-specific goal (e.g., sentiment analysis or code generation).
Benefits of Fine-Tuning:
Instead of needing a huge labeled dataset to train a model from scratch, you can use a smaller one (e.g., 10,000 examples) paired with the pre-trained model. This saves training time and compute costs and reduces the need for large amounts of labeled data.
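Here is a rough sketch of what fine-tuning can look like in code, using the Hugging Face transformers library. The specific choices (distilbert-base-uncased as the pre-trained model, the IMDB movie-review dataset, a ~10,000-example slice) are my own illustrative assumptions, not something from the reading list:

```python
# A hedged sketch of fine-tuning a pre-trained model for sentiment analysis.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from a general-purpose pre-trained model...
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# ...and a smaller labeled dataset for the new task (movie-review sentiment).
dataset = load_dataset("imdb")
tokenized = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-out", num_train_epochs=1),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(10_000)),  # ~10k examples
    eval_dataset=tokenized["test"].select(range(1_000)),
    tokenizer=tokenizer,  # lets the Trainer pad each batch for us
)
trainer.train()  # the goal is now "predict sentiment", not "predict the next word"
```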
Example of Fine-Tuning
In 2021, OpenAI researchers introduced a model fine-tuned on a large dataset of publicly available code from GitHub. They found that this fine-tuned model significantly outperformed previous general-purpose models in generating useful code. One particularly effective approach was to have the model generate multiple potential solutions, allowing users to verify them either manually or by running the code against predefined tests to confirm correctness.
Neural Networks
What is a Neural Network?
The design of neural networks is inspired by the human brain. A neural network is a set of interconnected "neurons" that process data through a series of layers to produce some desired output. Each neuron receives data from the previous layer, does some math on it, and sends the result on to the next layer of neurons.
A typical neural network has three types of layers (sketched in code after this list):
- Input Layer: Receives the data
- Hidden Layers: Perform computations
- Output Layer: Produces the final output
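Here is a bare-bones sketch of data flowing through those three layer types, using numpy. The weights are random placeholders; in a real network, training is what finds their values:

```python
# A minimal forward pass: input layer -> hidden layer -> output layer.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])                  # input layer: just receives the data (3 features)

W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # hidden layer: 4 neurons (random placeholder weights)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer: 1 neuron

hidden = np.maximum(0, x @ W1 + b1)  # each neuron: weighted sum of its inputs, then a ReLU activation
output = hidden @ W2 + b2            # the final output of the network
print(output)
```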
A Tiny Neural Network with Just 3 Parameters
Let's say we want a program that can help us predict how expensive a house will be. We know two things about each house:
- Size of the house (in square ft)
- Number of bedrooms
Our program can be written as:
Price of Home = (weight1 × Size) + (weight2 × Bedrooms) + bias
Where weight1 is a value that indicates how much each square foot contributes to the final price, weight2 indicates how much each bedroom contributes to the final price, and bias is a constant value that adjusts our final output to be more accurate, similar to how a house might have a minimum base price that represents the value of the land.
This tiny neural network takes our two known inputs (size of house and number of bedrooms), multiplies them by the relevant weight, adds them together along with the bias, and outputs a prediction.
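In code, the whole "network" is one line of math. The weight and bias values below are made up purely for illustration; training is what finds good values:

```python
# The tiny 3-parameter model: price = weight1 * size + weight2 * bedrooms + bias
def predict_price(size_sqft, bedrooms, weight1, weight2, bias):
    return weight1 * size_sqft + weight2 * bedrooms + bias

# e.g. $200 per square foot, $15,000 per bedroom, $50,000 base price for the land
print(predict_price(1500, 3, weight1=200, weight2=15_000, bias=50_000))  # -> 395000
```

With those made-up numbers, a 1,500 sq ft, 3-bedroom house comes out to $395,000; training's job is to find parameter values that make predictions like this match real prices.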
How do we determine the weights and bias?
To train our model, we need some data:
Home Price Square Feet Bedrooms
---------- ----------- --------
$415,000 1500 3
$300,250 1250 3
$780,000 4200 5
During training, our model assigns random values to the 2 weights and the bias (these are called the parameters), then adjusts them until the output home price is sufficiently accurate. The weights and bias are the things we are most interested in figuring out. If a weight is too big and causing the final price to be off, it gets reduced. If the weight is too small and needs to have more influence over the final price, it gets increased. If the bias is making predictions too high or too low, it gets shifted up or down.
In simple terms, we:
- Assign random values to the weights and bias.
- Make a prediction for the final price.
- Measure the loss (how far off was the prediction from the actual price?).
- Use some fancy math (gradient descent) to figure out the best way to tweak the 3 values.
- Repeat thousands or millions of times until the loss is low.
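Here is a minimal version of that loop for our 3-parameter model, using the three houses from the table above. The learning rate and step count are arbitrary picks, and the "fancy math" is plain gradient descent:

```python
# A tiny training loop: tweak w1, w2, and b until the loss shrinks.
data = [(1500, 3, 415_000), (1250, 3, 300_250), (4200, 5, 780_000)]  # (size, bedrooms, price)

w1, w2, b = 0.0, 0.0, 0.0   # the 3 parameters, starting from nothing
lr = 1e-8                   # learning rate: how big each tweak is

for step in range(100_000):
    for size, beds, actual in data:
        pred = w1 * size + w2 * beds + b    # make a prediction
        error = pred - actual               # how far off were we?
        # nudge each parameter in the direction that shrinks the error
        w1 -= lr * error * size
        w2 -= lr * error * beds
        b  -= lr * error
    if step % 20_000 == 0:
        loss = sum((w1 * s + w2 * bd + b - y) ** 2 for s, bd, y in data) / 3
        print(f"step {step}: loss {loss:,.0f}")

print(w1, w2, b)   # the learned parameters
```

Run it and you can watch the printed loss fall as the parameters settle.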
While our house price example is simple, real-world AI models operate on a much larger scale:
- OpenAI's GPT-3: 175 billion parameters.
- OpenAI's GPT-4: Estimated to have 1.7-1.8 trillion parameters. Woah.