📘 Lesson 98

Web Scraping (BeautifulSoup)

About this Project

💡 At a Glance

BeautifulSoup parses a web page's HTML into a searchable tree and lets you extract data such as titles, links, and text.

The Program

Python

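import requests
from bs4 import BeautifulSoup

url = "https://example.com"

# Fetch the page; the timeout stops the script from hanging on a slow server
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an error on 4xx/5xx responses
soup = BeautifulSoup(response.text, "html.parser")

# soup.title is the <title> tag; .text is the string inside it
print("Title:", soup.title.text)

# all links on the page
for link in soup.find_all("a"):
    print(link.get("href"))  # None if the <a> tag has no href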

Title: Example Domain
https://www.iana.org/domains/example
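
Real pages are messier than example.com: some have no title, some links are relative, and some anchor tags carry no href at all. The following is a minimal sketch of a more defensive extraction, not a definitive implementation. The HTML string, BASE_URL, and the helper name extract_page_data are hypothetical, made up for illustration, and the page is inlined so the code runs without a network request.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page, inlined so the example runs without a network request
BASE_URL = "https://example.com/docs/"
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>Welcome</h1>
    <p>First paragraph.</p>
    <a href="/about">About</a>
    <a href="guide.html">Guide</a>
    <a name="top">Anchor without href</a>
  </body>
</html>
"""


def extract_page_data(html, base_url):
    """Return the title, absolute link URLs, and paragraph texts of a page."""
    soup = BeautifulSoup(html, "html.parser")

    # soup.title is None when the page has no <title> tag
    title = soup.title.get_text(strip=True) if soup.title else None

    # href=True skips <a> tags that have no href attribute;
    # urljoin turns relative links into absolute URLs
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]

    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return {"title": title, "links": links, "paragraphs": paragraphs}


page = extract_page_data(SAMPLE_HTML, BASE_URL)
print("Title:", page["title"])
for link in page["links"]:
    print(link)
```

To use this on a live page, pass the text of a requests response as html and the page's own URL as base_url.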

Scrape Responsibly

⚠️ Note

Always check a site's robots.txt and terms of service before scraping. Do not overload servers: pause between requests and fetch only the pages you need.
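
One way to act on this advice is Python's standard urllib.robotparser module. The sketch below is one possible approach under stated assumptions: the robots.txt rules, the user agent name lesson-scraper, and the helpers allowed_urls and fetch_politely are all hypothetical. The rules are inlined so the example runs offline; for a real site you would call parser.set_url() with the site's robots.txt address and then parser.read().

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, inlined so the example runs offline
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "lesson-scraper"  # hypothetical user agent name

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed_urls(urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]


def fetch_politely(urls, fetch, delay=None, user_agent=USER_AGENT):
    """Call fetch(url) for each permitted URL, pausing after each request."""
    if delay is None:
        # Use the site's Crawl-delay if it declares one, else wait 1 second
        declared = parser.crawl_delay(user_agent)
        delay = declared if declared is not None else 1.0
    results = {}
    for url in allowed_urls(urls, user_agent):
        results[url] = fetch(url)
        time.sleep(delay)
    return results


candidates = [
    "https://example.com/index.html",
    "https://example.com/private/data.html",
]
print(allowed_urls(candidates))
print("Crawl delay:", parser.crawl_delay(USER_AGENT))
```

The fetch argument is any function that takes a URL and returns its content, for example lambda u: requests.get(u, timeout=10).text. Note that robots.txt does not cover a site's terms of service, which still need to be read separately.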

Summary

requests fetches the HTML; BeautifulSoup parses it.
Use soup.title and soup.find_all() to extract elements.
