 ML Insights

Machine Learning · Beginner Friendly · ~8 min read

Linear Regression — Made Simple

Ever wondered how a bank predicts your loan amount, or how a website guesses your budget? It often starts with this one simple idea.

📅 May 2026 · ✍️ Ashwin Carpenter · 📖 For Everyone

What Is Linear Regression?

Think about this: the bigger a pizza is, the more it costs. The more hours you study, the better your exam score. These are straight-line relationships — and that is exactly what linear regression finds.

Linear regression is a machine learning method that draws the best possible straight line through your data and uses that line to predict future values.

💡 Real-life example: You want to predict a house's price based on its size. Linear regression looks at thousands of past house sales, draws a line showing how price goes up as size goes up, then uses that line to predict prices for new houses.

[Figure: house size (horizontal axis) plotted against price (vertical axis), showing the best-fit line and each point's error (how far off)]

Each green dot is a real house. The green line is what the model learned. The red dashes show how far off each prediction is.

The Formula (Don't Be Scared!)

It's just the straight-line equation you learned in school:

The model in plain words
Prediction = (slope × input) + starting value

The model's job is to find the right slope and starting value so the line fits the data as closely as possible. In ML, slope is called a weight and the starting value is called a bias.
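To make the formula concrete, here is a minimal sketch in Python. The function name `predict` and the weight and bias values are made up for illustration; they were picked by hand, not learned from data.

```python
def predict(size_sqft, weight, bias):
    """Prediction = (slope x input) + starting value."""
    return weight * size_sqft + bias

# Hypothetical values chosen by hand, not learned from data
weight = 250.0    # slope: extra dollars for each extra square foot
bias = 10000.0    # starting value: the price where the line crosses size zero

print(predict(1000, weight, bias))  # 250 * 1000 + 10000 = 260000.0
```

Training is simply the search for the weight and bias that make these predictions land as close to the real prices as possible.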

How Does the Model Learn?

The model starts with a random guess — a line that fits the data badly. Then it keeps making small adjustments until the line fits as well as possible. Here is how:

1. Make a prediction

Use the current line to guess the output for each example in the training data.

2. Measure the mistake

Compare each guess to the real answer and combine the errors into a single number called the cost. Lower is better.

3. Adjust the line

Nudge the slope and starting value slightly in the direction that reduces the cost. This step is called gradient descent: the model rolls downhill toward a better answer.

4. Repeat until done

Repeat steps 1–3 hundreds or even thousands of times. Each round, the line gets a little better until it stops improving.
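The four steps above can be sketched in a few lines of plain Python. This is a toy illustration, not how any particular library does it (scikit-learn's LinearRegression, for instance, computes the best line directly instead of looping). The learning rate, the number of rounds, and the starting guess of zero are all assumptions chosen so the example runs the same way every time. Sizes and prices are written in thousands to keep the numbers small.

```python
# House sizes in thousands of sqft, prices in thousands of dollars
sizes = [0.6, 0.8, 1.0, 1.2, 1.5]
prices = [150.0, 200.0, 250.0, 290.0, 370.0]

weight, bias = 0.0, 0.0   # starting guess (zero rather than random, for repeatability)
learning_rate = 0.1       # assumed size of each nudge
n = len(sizes)

for step in range(5000):  # assumed number of rounds
    # Step 1: make a prediction for every example
    predictions = [weight * x + bias for x in sizes]

    # Step 2: measure the mistake (mean squared error)
    errors = [p - y for p, y in zip(predictions, prices)]
    cost = sum(e ** 2 for e in errors) / n

    # Step 3: nudge weight and bias in the direction that lowers the cost
    grad_weight = 2 * sum(e * x for e, x in zip(errors, sizes)) / n
    grad_bias = 2 * sum(errors) / n
    weight -= learning_rate * grad_weight
    bias -= learning_rate * grad_bias
    # Step 4: the loop repeats

print(f"weight={weight:.1f}, bias={bias:.1f}, cost={cost:.2f}")
```

On this data the loop settles at a weight of roughly 241 and a bias of roughly 5.8 (in thousands of dollars). If the learning rate is set too high, the nudges overshoot and the cost grows instead of shrinking.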

🎯 The error formula: The model measures its mistakes using Mean Squared Error (MSE). It squares each mistake (so positive and negative errors don't cancel out) and takes the average. A lower MSE means a better line.

Mean Squared Error — how wrong is the model?
MSE = average of (predicted value − real value)²
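Here is the same formula as a small Python function. The function name and the three example prices are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def mean_squared_error(predicted, actual):
    """Average of (predicted value - real value) squared."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical prices in thousands of dollars
actual = [150, 200, 250]
predicted = [160, 190, 250]   # errors: +10, -10, 0

mse = mean_squared_error(predicted, actual)
print(mse)  # (100 + 100 + 0) / 3, about 66.7
```

Notice that the raw errors (+10 and -10) would add up to zero and hide the mistakes. Squaring them first is what keeps that from happening.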

Quick Python Example

from sklearn.linear_model import LinearRegression

# Training data: house sizes (sqft) and prices ($)
sizes  = [[600], [800], [1000], [1200], [1500]]
prices = [150000, 200000, 250000, 290000, 370000]

# Train the model
model = LinearRegression()
model.fit(sizes, prices)

# Predict the price of an 1100 sqft house
result = model.predict([[1100]])
print(f"Predicted price: ${result[0]:,.0f}")
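If you are curious what the model actually learned, a fitted scikit-learn LinearRegression stores the slope in `coef_` and the starting value in `intercept_`. The sketch below refits the same five houses and rebuilds the 1100 sqft prediction by hand; the variable names `weight`, `bias`, and `manual` are just for illustration.

```python
from sklearn.linear_model import LinearRegression

# Same training data as above
sizes = [[600], [800], [1000], [1200], [1500]]
prices = [150000, 200000, 250000, 290000, 370000]
model = LinearRegression().fit(sizes, prices)

weight = model.coef_[0]      # learned slope: dollars per extra sqft
bias = model.intercept_      # learned starting value

# Rebuild the prediction using Prediction = (slope x input) + starting value
manual = weight * 1100 + bias
print(f"weight={weight:.2f}, bias={bias:.2f}, manual prediction=${manual:,.0f}")
```

The hand-built number should match what `model.predict([[1100]])` returns, which shows there is nothing hidden inside the model beyond those two values.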

How Do We Know If It Worked?

After training, test the model on data it has never seen before. These three numbers tell you how well it performed:

| Metric | What it tells you | Good score |
|---|---|---|
| RMSE | Typical prediction error, in the same units as your answer (e.g. dollars). Easy to understand. | As low as possible |
| MAE | Similar to RMSE but less affected by very large mistakes. | As low as possible |
| R² | How much of the pattern the model has captured. 1 = perfect. 0 = no better than always guessing the average. | Close to 1 |

📊 Example: If your model predicts house prices and has an RMSE of $12,000, its predictions are typically off by around $12,000. Whether that is acceptable depends on the price range you are working with.
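All three metrics can be worked out by hand in a few lines. The sketch below uses made-up held-out prices and predictions (they do not come from a real model) so you can see how each number is built.

```python
import math

# Hypothetical unseen house prices and a model's predictions, in dollars
actual = [200000, 250000, 300000, 350000]
predicted = [210000, 240000, 310000, 330000]

errors = [p - a for p, a in zip(predicted, actual)]
n = len(errors)

# RMSE: square the errors, average them, then take the square root
rmse = math.sqrt(sum(e ** 2 for e in errors) / n)

# MAE: average of the error sizes, ignoring the sign
mae = sum(abs(e) for e in errors) / n

# R²: compare the model's squared error to that of always guessing the average
mean_actual = sum(actual) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(f"RMSE=${rmse:,.0f}  MAE=${mae:,.0f}  R²={r2:.3f}")
```

Here RMSE comes out a little higher than MAE because one prediction misses by $20,000 and squaring makes that single large miss count for more.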

Common Mistakes to Avoid

| Mistake | What goes wrong | Easy fix |
|---|---|---|
| Not scaling your numbers | A feature with values in the thousands, like salary, can overpower a feature like "age in years" when the model is trained with gradient descent or regularisation. The model learns badly. | Scale all features to a similar range before training. |
| Extreme outliers | One or two wild data points can pull the whole line off track. | Check your data and fix or remove obvious errors. |
| The relationship is curved, not straight | If the real pattern is a curve, a straight line will always give bad predictions. | Plot your data first! If it curves, use a different model. |
| Too many features, not enough data | The model memorises the training examples instead of learning the real pattern. | Use fewer features or add regularisation (see below). |
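As an illustration of the scaling fix, here is one common approach, standardisation, sketched in plain Python. The function name `standardise` and the salary and age values are hypothetical. scikit-learn's StandardScaler performs the same kind of rescaling if you would rather not write it yourself.

```python
def standardise(values):
    """Rescale values so they have mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5
    return [(v - mean) / std for v in values]

# Hypothetical features on very different scales
salaries = [30000, 45000, 60000, 90000]   # dollars
ages = [22, 35, 41, 58]                   # years

scaled_salaries = standardise(salaries)
scaled_ages = standardise(ages)

print([round(v, 2) for v in scaled_salaries])
print([round(v, 2) for v in scaled_ages])
```

After scaling, both features sit in a similar small range around zero, so neither one dominates just because of the units it was measured in.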

What Is Regularisation?

Sometimes the model tries too hard and memorises the training data. It gets great scores on training but fails on new data. This is called overfitting. Regularisation is a simple penalty that keeps the model from going overboard.

Ridge

Shrinks all weights a little

Great when all your features probably matter. No features get completely removed.

Lasso

Sets some weights to zero

Great when you think many features are useless — Lasso automatically removes them.
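To see the difference between the two, the sketch below fits plain linear regression, Ridge, and Lasso from scikit-learn on a tiny made-up dataset. The data is invented so that the first feature drives the target and the second is close to useless. The `alpha` values (the strength of the penalty) are arbitrary choices for the demonstration, not recommended settings.

```python
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Hypothetical data: the target is roughly 2 x (first feature);
# the second feature is only weakly related to it
X = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 6]]
y = [2.1, 3.8, 6.3, 7.7, 10.1]

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # alpha chosen for illustration
lasso = Lasso(alpha=1.0).fit(X, y)   # alpha chosen for illustration

print("Plain:", plain.coef_)
print("Ridge:", ridge.coef_)
print("Lasso:", lasso.coef_)
```

On this data Ridge keeps both weights but makes them a little smaller than the plain model's, while Lasso sets the weight of the second feature to exactly zero. In practice you would try several `alpha` values and keep the one that scores best on unseen data.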

When Should You Use It?

✅ Use it when

Good fit

You want to predict a number (not a yes/no). The data looks roughly like a straight line. You need results that are easy to explain.

❌ Avoid it when

Not the right tool

You want to predict a category (e.g. spam/not spam). The relationship is clearly curved. You have complex, messy data — try a neural network instead.

⚠️ Always try this first. Before jumping to complicated models, try linear regression. If it already gives good results, you do not need anything fancier, and simple models are easier to trust, explain, and maintain.

Key Takeaways

1. Linear regression finds the best straight line through your data.

It learns the line by reducing the average error step by step.

2. It only works well when the relationship is roughly a straight line.

Always plot your data before you start. If it curves, use a different model.

3. Scale your features and look for outliers.

These two steps alone fix most common problems before training.

4. Use R² and RMSE to check how well the model is doing.

Always test on data the model has never seen, not just the training data.

Further Reading

  • Géron, Aurélien — Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow (O'Reilly, 2022)
  • James et al. — An Introduction to Statistical Learning (Springer, 2021) — free PDF online
  • Ng, Andrew — Machine Learning Specialization on Coursera — great beginner video course
  • scikit-learn docs — sklearn.linear_model

Written by Ashwin Carpenter · May 2026 · ML Insights
