mlbeginner

Train Test Split for Beginners

Understand why machine learning needs a train/test split and what goes wrong when you skip it.

Mar 28, 2026• 1 min read

What it is

A train/test split separates data into one portion for learning patterns and another portion for checking whether the model generalizes.

Why it matters

Without a held-out test set, a model may look impressive only because it memorized the training examples.

Example

If you have 1,000 rows, you might train on 800 rows and test on 200 rows.

Common mistakes

evaluating on the same data used for training
doing the split after leakage has already happened
forgetting stratification when the classes are imbalanced

Key takeaway

A good score is only meaningful if it comes from data the model did not see during training.

Related posts

mlbeginner

Precision and Recall Explained

Learn what precision and recall mean and why the right metric depends on the real-world cost of errors.

Mar 25, 2026•1 min read