🔵 Data Science  ·  Lesson 38

Data Cleaning with Pandas

What is Pandas Data Cleaning?

Pandas data cleaning means fixing missing values, duplicate rows, wrong data types and inconsistent entries in a dataset before you analyse it.

In real programs, you will use it for tasks such as removing duplicates and filling missing values. Learn the idea first, then type the program yourself and compare the output.

💡 At a Glance
Point           Details
Course Area     Data Science: tools and concepts used to analyse, clean and present data.
Main Use        Removing duplicates
Example File    pandas-data-cleaning.py
Practice Focus  Run, change values, and explain the output line by line.

Why should you learn this?

  • It is useful for removing duplicates.
  • It connects with filling missing values.
  • It improves your ability to read, write and debug Python programs.

Important Terms

These terms are used directly in this lesson. Understand them before memorising the code.

Term            Meaning
missing values  Blank or unavailable data values.
duplicates      Repeated records in data.
fillna          Replaces missing values with a value you choose.
dropna          Removes rows (or columns) that contain missing values.
astype          Converts a column to a different data type.
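
The three method names above can be tried side by side. The following is a minimal sketch, not part of the lesson file: the sample data (including the name Kabir and the marks stored as text) is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: one missing mark, and marks stored as text.
df = pd.DataFrame({
    "Name": ["Aarav", "Riya", "Kabir"],
    "Marks": ["82", None, "74"],
})

# dropna: remove rows that contain a missing value.
dropped = df.dropna()

# fillna: replace missing values with a chosen value instead of removing rows.
filled = df.fillna({"Marks": "0"})

# astype: convert a column to another data type (here text to whole numbers).
filled["Marks"] = filled["Marks"].astype(int)

print(len(dropped))            # number of rows left after dropna
print(filled["Marks"].sum())   # total marks after fillna and astype
```

dropna shrinks the table, while fillna keeps every row. Which one is appropriate depends on whether a placeholder value such as 0 makes sense for your data.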

Syntax / Basic Pattern

The simple pattern is: prepare data, apply the concept, then show the result.

Basic Pattern
import pandas as pd
df = pd.DataFrame({"Name": ["Aarav", "Riya", "Riya"], "Marks": [82, None, None]})
df = df.drop_duplicates()
df["Marks"] = df["Marks"].fillna(0)
print(df)

Complete Example Program

Python – pandas-data-cleaning.py
import pandas as pd

df = pd.DataFrame({"Name": ["Aarav", "Riya", "Riya"], "Marks": [82, None, None]})

df = df.drop_duplicates()
df["Marks"] = df["Marks"].fillna(0)

print(df)

Expected Output

    Name  Marks
0  Aarav   82.0
1   Riya    0.0

Program Explanation

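  • import pandas as pd loads the pandas library and gives it the short name pd.
  • df = pd.DataFrame({"Name": ["Aarav", "Riya", "Riya"], "Marks": [82, None, None]}) creates a DataFrame with three rows. The two Riya rows are identical, and both have a missing mark.
  • df = df.drop_duplicates() removes the repeated Riya row and stores the result back in df.
  • df["Marks"] = df["Marks"].fillna(0) replaces the remaining missing mark with 0.
  • print(df) displays the cleaned DataFrame. Marks appear as 82.0 and 0.0 because a numeric column that contained missing values is stored as floating-point numbers.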
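
To see what each step contributes, the same program can be split into intermediate results. This is a sketch for exploration only; the variable names repeat_flags, deduped and cleaned are chosen here and do not appear in the lesson file.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aarav", "Riya", "Riya"], "Marks": [82, None, None]})

# Step 1: flag rows that exactly repeat an earlier row.
# pandas treats two missing values in the same column as equal for this check.
repeat_flags = df.duplicated().tolist()

# Step 2: drop the flagged rows; the surviving rows keep their original index labels.
deduped = df.drop_duplicates()

# Step 3: fill the remaining missing mark with 0.
cleaned = deduped.fillna({"Marks": 0})

print(repeat_flags)                # which rows were repeats
print(deduped.index.tolist())      # index labels that survived
print(cleaned["Marks"].tolist())   # marks after filling
```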

Where will you use it?

  • Removing duplicates.
  • Filling missing values.
  • Fixing column data types.
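
The lesson program covers the first two uses. The third, fixing a column's data type, could look like the sketch below, which extends the same sample data. It assumes whole-number marks are what you want, and it only works after the missing values are filled, because astype(int) raises an error if any missing value remains.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aarav", "Riya", "Riya"], "Marks": [82, None, None]})

df = df.drop_duplicates()
df["Marks"] = df["Marks"].fillna(0)

# Marks is a floating-point column here because it used to hold missing values.
dtype_before = df["Marks"].dtype.kind   # "f" means float

# Convert to whole numbers now that no missing values are left.
df["Marks"] = df["Marks"].astype(int)
dtype_after = df["Marks"].dtype.kind    # "i" means integer

print(df)
```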

Common Mistakes

  • Analysing data before checking missing values, duplicates and data types.
  • Changing original data without keeping a clean copy.
  • Creating charts without title, labels or explanation.
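
The first two mistakes can be avoided with a short routine: check the data, then clean a copy. The sketch below is one possible way to do that; the names raw and clean are illustrative.

```python
import pandas as pd

raw = pd.DataFrame({"Name": ["Aarav", "Riya", "Riya"], "Marks": [82, None, None]})

# Check the data before analysing it.
missing_per_column = raw.isna().sum()           # missing values in each column
duplicate_rows = int(raw.duplicated().sum())    # rows that repeat an earlier row

# Work on a copy so the original data stays available for comparison.
clean = raw.copy()
clean["Marks"] = clean["Marks"].fillna(0)
clean = clean.drop_duplicates()

print(missing_per_column)
print(duplicate_rows)
print(clean)
```

Because clean is a separate copy, raw still contains its missing values afterwards and can be used to verify what the cleaning changed.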

Practice Tasks

  1. Type the program in pandas-data-cleaning.py and run it.
  2. Change input values or sample data and observe the new output.
  3. Create one example related to removing duplicates.
  4. Write 5 lines explaining the logic in your own words.
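
As a possible starting point for task 3, the sketch below uses made-up marks in which Riya appears twice with different values. It relies on the subset and keep parameters of drop_duplicates to compare only the Name column.

```python
import pandas as pd

# Hypothetical data: Riya appears twice with different marks.
df = pd.DataFrame({
    "Name": ["Aarav", "Riya", "Riya"],
    "Marks": [82, 65, 71],
})

# The rows differ in Marks, so a plain drop_duplicates() keeps all three.
all_columns = df.drop_duplicates()

# Compare only the Name column and keep the last entry for each name.
by_name = df.drop_duplicates(subset=["Name"], keep="last")

print(len(all_columns))
print(by_name)
```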

Summary

Pandas Data Cleaning is not a theory-only topic. You should be able to explain the meaning, write the example, run it successfully, and use it in a small practical program.

What is Pandas Data Cleaning?

Pandas data cleaning means fixing missing values, duplicate rows, wrong data types and inconsistent entries. In simple words, this topic is used directly when writing practical Python programs.

Practise this topic not only for its definition but also with real examples such as removing duplicates.

Why is it important to learn this?

  • It is useful for removing duplicates.
  • It is also connected with filling missing values.
  • It helps you understand code output and errors better.

Syntax / Basic Pattern

Basic idea: first prepare the data, then apply the Python logic, and finally display the result.

Practice Tasks

  1. Type the program into the file pandas-data-cleaning.py and run it.
  2. Change the values and compare the output.
  3. Create a small example about removing duplicates.
  4. Write the logic in your own words in 5 lines.

Summary

You can consider Pandas Data Cleaning complete when you can clearly explain its meaning, example, output and practical use.
