🟣 ML + AI · Lesson 49

Train-Test Split and Data Preparation

What is Train Test Split?

A train-test split divides a dataset into two parts: training data that the model learns from, and testing data that is held back to check how the model performs.

Because the model never sees the testing data during training, the test result estimates how the model will behave on new, unseen data. Learn the idea first, then type the program yourself and compare the output.
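To make the idea concrete, here is a minimal hand-written sketch of a split that uses only the Python standard library. The function name `simple_split` and its parameters `test_fraction` and `seed` are made up for this illustration. It is not how scikit-learn implements `train_test_split`; it only shows the same idea in miniature: shuffle the row positions, then cut them into two groups.

```python
import math
import random


def simple_split(X, y, test_fraction, seed):
    """Hypothetical helper: shuffle sample positions, then cut them in two."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded, so the shuffle is repeatable

    n_test = math.ceil(test_fraction * len(X))  # round the test count up
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Use the same positions for X and y so each row keeps its own target
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[1], [2], [3], [4], [5]]
y = [40, 50, 60, 70, 80]
X_train, X_test, y_train, y_test = simple_split(X, y, test_fraction=0.4, seed=1)

print("Train:", X_train, y_train)
print("Test:", X_test, y_test)
```

Every sample ends up in exactly one of the two groups, which is the property that keeps the testing data unseen during training.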

💡 At a Glance

Point	Details
Course Area	Machine Learning + AI: concepts used for prediction, classification, clustering and AI-based projects.
Main Use	Testing a model on unseen data
Example File	`train-test-split.py`
Practice Focus	Run, change values, and explain the output line by line.

Why should you learn this?

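It lets you test a model on data it did not see during training.
It gives a fair estimate of performance, because the model is not scored on examples it has already learned.
It improves your ability to read, write and debug Python programs that prepare data for machine learning.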

Important Terms

These terms are used directly in this lesson. Understand them before memorising the code.

Term	Meaning
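training set	The part of the data the model learns from.
testing set	The part of the data held back from training and used only to check performance.
test_size	Share of the data placed in the testing set. A float between 0 and 1 is a fraction (0.4 means 40%); an integer is an absolute number of samples.
random_state	Fixed seed that makes the shuffle before the split, and so the split itself, repeatable.
generalization	How well a model performs on new data it did not see during training.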
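The effect of `random_state` and the two forms of `test_size` can be checked with a short experiment. This is a sketch on the lesson's own toy data; the variable names `first` and `second` are only for illustration.

```python
from sklearn.model_selection import train_test_split

X = [[1], [2], [3], [4], [5]]
y = [40, 50, 60, 70, 80]

# Same random_state twice: the shuffle, and so the split, is identical
first = train_test_split(X, y, test_size=0.4, random_state=1)
second = train_test_split(X, y, test_size=0.4, random_state=1)
print("Same split both times:", first == second)

# An integer test_size asks for an exact number of test samples
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=2, random_state=1)
print("Test samples:", len(X_test))
```

Try removing `random_state` and running the program a few times; the samples that land in each set may then change from run to run.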

Syntax / Basic Pattern

The basic pattern is: prepare the features X and the targets y, call train_test_split, then inspect the pieces it returns.

Basic Pattern

from sklearn.model_selection import train_test_split
X = [[1], [2], [3], [4], [5]]
y = [40, 50, 60, 70, 80]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
print("Train size:", len(X_train))
print("Test size:", len(X_test))

Complete Example Program

Python – train-test-split.py

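from sklearn.model_selection import train_test_split

# Features: five samples with one column each
X = [[1], [2], [3], [4], [5]]
# Targets: one value per sample
y = [40, 50, 60, 70, 80]

# Hold back 40% of the samples for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)
print("Train size:", len(X_train))
print("Test size:", len(X_test))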

Expected Output

Train size: 3
Test size: 2

Program Explanation

from sklearn.model_selection import train_test_split imports the train_test_split function from scikit-learn.
X = [[1], [2], [3], [4], [5]] stores the input features: five samples with one feature each.
y = [40, 50, 60, 70, 80] stores the target value for each sample.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1) shuffles the samples and splits X and y in the same way, so each feature row stays paired with its target. With test_size=0.4, 2 of the 5 samples go to the testing set and the other 3 to the training set.
print("Train size:", len(X_train)) displays the number of training samples.
print("Test size:", len(X_test)) displays the number of testing samples.
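The program above stops after the split. A possible next step, sketched below, is to train a model on the training set and score it on the testing set. The choice of `LinearRegression` is an assumption made for this illustration and is not part of the original lesson program; any scikit-learn estimator with `fit` and `score` would follow the same pattern.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X = [[1], [2], [3], [4], [5]]
y = [40, 50, 60, 70, 80]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

model = LinearRegression()
model.fit(X_train, y_train)  # learn only from the training set

test_score = model.score(X_test, y_test)  # R^2 on the held-back samples
prediction = float(model.predict([[6]])[0])

print("Test R^2:", round(test_score, 3))
print("Prediction for x=6:", round(prediction, 2))
```

The toy data follows y = 10x + 30 exactly, so the test score comes out at about 1.0 and the prediction for x=6 at about 90. Real data is noisier, and the test score is normally lower than that.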

Where will you use it?

Testing on unseen data.
Checking fair performance.
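Detecting overfitting: a model that scores well on training data but poorly on testing data has memorised the examples instead of learning the pattern.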

Common Mistakes

Training and testing the model on the same data, which makes the score look better than it really is.
Using an algorithm without understanding the input features.
Reporting only accuracy without checking actual mistakes and limitations.
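The first mistake can be illustrated without any library. In this sketch the "model" is only a dictionary that memorises its training rows; the data, the `predict` function and the `accuracy` function are all invented for the illustration.

```python
# Toy labelled data: (number, is_even). Rows are invented for this sketch.
train_rows = [(2, True), (3, False), (8, True), (11, False)]
test_rows = [(4, True), (7, False), (10, True)]

# "Training" here is just storing the answers, i.e. pure memorisation
memory = dict(train_rows)


def predict(x):
    """Return the memorised label, or guess False for anything unseen."""
    return memory.get(x, False)


def accuracy(rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(predict(x) == label for x, label in rows)
    return correct / len(rows)


train_accuracy = accuracy(train_rows)
test_accuracy = accuracy(test_rows)
print("Accuracy on training data:", train_accuracy)
print("Accuracy on testing data:", test_accuracy)
```

Scored on its own training rows, the memoriser looks perfect. Scored on held-back rows, it gets only one of three right, which is the honest picture of what it has learned.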

Practice Tasks

Type the program in train-test-split.py and run it.
Change input values or sample data and observe the new output.
Create one example related to testing on unseen data.
Write 5 lines explaining the logic in your own words.
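One way to start the second task is to vary `test_size` and watch the set sizes change. This is only a starting sketch; the `sizes` dictionary is a made-up helper for collecting the results.

```python
from sklearn.model_selection import train_test_split

X = [[1], [2], [3], [4], [5]]
y = [40, 50, 60, 70, 80]

sizes = {}  # test_size -> (train count, test count)
for test_size in (0.2, 0.4, 0.6):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=1
    )
    sizes[test_size] = (len(X_train), len(X_test))
    print(f"test_size={test_size}: train={len(X_train)}, test={len(X_test)}")
```

Whatever value you choose, the two counts always add up to the total number of samples.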

Summary

Train Test Split is not a theory-only topic. You should be able to explain the meaning, write the example, run it successfully, and use it in a small practical program.

What is Train Test Split?

Train Test Split means dividing data into training data for learning and testing data for checking performance. In simple words, this topic is used directly when writing practical Python programs.

Practise this topic with real examples such as testing on unseen data, not only to memorise the definition.

Why is it important to learn this?

It is useful for testing on unseen data.
It is also connected with checking fair performance.
It helps you understand the output and errors of your code better.

Practice Tasks

Type the program into the file train-test-split.py and run it.
Change the values and compare the output.
Build a small example on testing on unseen data.
Write the logic in your own words in 5 lines.

Summary

You have completed Train Test Split when you can clearly explain its meaning, example, output and practical use.
