mlbeginner

Train Test Split for Beginners

Understand why machine learning needs a train/test split and what goes wrong when you skip it.

Mar 28, 20261 min read

What it is

A train/test split separates data into one portion for learning patterns and another portion for checking whether the model generalizes.

Why it matters

Without a held-out test set, a model may look impressive only because it memorized the training examples.

Example

If you have 1,000 rows, you might train on 800 rows and test on 200 rows.

Common mistakes

  • evaluating on the same data used for training
  • doing the split after leakage has already happened
  • forgetting stratification when the classes are imbalanced

Key takeaway

A good score is only meaningful if it comes from data the model did not see during training.

Related posts

mlbeginner

Precision and Recall Explained

Learn what precision and recall mean and why the right metric depends on the real-world cost of errors.

Mar 25, 20261 min read