📘 Lesson 98
Web Scraping (BeautifulSoup)
About this Project
💡 At a Glance
BeautifulSoup parses a web page's HTML into a searchable tree, so you can extract data such as titles, links, and text.
The Program
Python
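import requests
from bs4 import BeautifulSoup

url = "https://example.com"
# Fetch the page; the timeout keeps the script from hanging on a slow server
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors such as 404 or 500
soup = BeautifulSoup(response.text, "html.parser")

print("Title:", soup.title.text)

# all links on the page
for link in soup.find_all("a"):
    print(link.get("href"))

Output
Title: Example Domain
https://www.iana.org/domains/example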
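The same calls can be tried without a network request by parsing an HTML string directly. The following is a minimal sketch, not part of the original lesson: the sample HTML, the BASE_URL value, and the extract_page_data helper are all hypothetical names made up for illustration. It also shows two small refinements that are often useful: skipping anchors that have no href, and resolving relative links with urljoin.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample page, inlined so the sketch runs offline.
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>Welcome</h1>
    <p>First paragraph.</p>
    <a href="/about">About</a>
    <a href="https://example.com/contact">Contact</a>
    <a name="top">Anchor without href</a>
  </body>
</html>
"""

# Assumed base URL, used only to resolve relative links such as "/about".
BASE_URL = "https://example.com"


def extract_page_data(html, base_url):
    """Return the title, absolute links, and paragraph text of an HTML page."""
    soup = BeautifulSoup(html, "html.parser")

    # soup.title is None when the page has no <title> tag.
    title = soup.title.get_text(strip=True) if soup.title else None

    # href=True keeps only <a> tags that actually carry an href attribute.
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]

    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return {"title": title, "links": links, "paragraphs": paragraphs}


if __name__ == "__main__":
    data = extract_page_data(SAMPLE_HTML, BASE_URL)
    print("Title:", data["title"])
    for link in data["links"]:
        print(link)
```

Keeping the parsing in a function that takes an HTML string makes it easy to test against saved pages before pointing it at a live site.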
Scrape Responsibly
⚠️ Note
Always check a site's robots.txt and terms of service before scraping. Space out your requests so you do not overload the server.
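One way to check robots.txt rules from Python is the standard library's urllib.robotparser module. The sketch below is an illustration under stated assumptions: the robots.txt content, the "lesson-scraper" user agent, and the build_parser helper are hypothetical, and the rules are parsed from a string so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

# Hypothetical user-agent name for this scraper.
USER_AGENT = "lesson-scraper"


def build_parser(robots_text):
    """Build a RobotFileParser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)

# can_fetch() reports whether the given user agent may request the URL.
allowed_home = parser.can_fetch(USER_AGENT, "https://example.com/index.html")
allowed_private = parser.can_fetch(USER_AGENT, "https://example.com/private/data")

# crawl_delay() returns the requested pause in seconds, or None if not set.
delay = parser.crawl_delay(USER_AGENT)

print("Home allowed:", allowed_home)
print("Private allowed:", allowed_private)
print("Crawl delay:", delay)
```

Against a live site, the parser can instead be pointed at the real file with set_url() followed by read(), and a time.sleep() between requests is a simple way to honor the delay. Note that robots.txt does not cover everything in a site's terms, so those still need to be read separately.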
Summary
- requests fetches the HTML; BeautifulSoup parses it.
- Use soup.title and soup.find_all() to extract elements.