

news 2025/6/30 14:17:02


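bs4 parsing

bs4 (Beautiful Soup 4) parsing is a Python-specific way to parse data: Beautiful Soup is a Python library for extracting data from HTML and XML documents.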

How bs4 data parsing works
1. Instantiate a BeautifulSoup object and load the page source into it.
2. Call the object's attributes and methods to locate tags and extract data.
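The two steps can be seen in a minimal, self-contained sketch. Instead of fetching a page, it parses a small inline HTML string, which is a made-up stand-in for downloaded page source. It also uses Python's built-in "html.parser" so it runs even where lxml is not installed. Passing "lxml" works once lxml is available.

```python
from bs4 import BeautifulSoup

# Hypothetical page source, standing in for HTML downloaded by a crawler
html = """
<html><body>
  <div class="song"><a href="https://example.com/a">First link</a></div>
  <div class="tang"><a href="https://example.com/b">Second link</a></div>
</body></html>
"""

# Step 1: instantiate a BeautifulSoup object and load the page source into it.
# "html.parser" is the built-in parser; "lxml" can be used instead once installed.
soup = BeautifulSoup(html, "html.parser")

# Step 2: use the object's attributes/methods to locate a tag and extract data.
first_link = soup.find("a")
print(first_link.get_text())  # First link
print(first_link["href"])     # https://example.com/a
```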
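Environment setup:
1. pip install bs4 (this pulls in beautifulsoup4, the library's canonical package name)
2. pip install lxml (the parser used in the examples)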
Tip: in mainland China, pip can be configured to use a domestic mirror to speed up installation.
How to instantiate a BeautifulSoup object:
from bs4 import BeautifulSoup
To load a crawled HTML page into the object:

from bs4 import BeautifulSoup
import requests
if __name__ == '__main__':
    url = "https://www.baidu.com/"
    # Send a browser-like User-Agent header with the request
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
    }
    resp = requests.get(url=url, headers=headers)
    resp.encoding = "utf-8"
    # Load the page source fetched from the internet into a BeautifulSoup object
    page_text = resp.text
    soup = BeautifulSoup(page_text, "lxml")
    print(soup)

Printing soup outputs the HTML document of the page.

Common attributes and methods

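soup.tagName: returns the first occurrence of that tag in the document, e.g. soup.a or soup.div
soup.find('tagName'): equivalent to soup.tagName, but also supports locating by attribute, e.g. soup.find('div', class_='song')
Because class is a Python keyword, the keyword argument is written class_.
soup.find_all('tagName'): returns all matching tags as a list, and accepts the same attribute filters as find()
soup.select('CSS selector'): takes a CSS selector and always returns a list, even when only one tag matches
Getting the text between tags: soup.a.text / soup.a.string / soup.a.get_text()
text and get_text() return all the text inside a tag, including the text of nested tags.
string returns only the tag's direct text: it works when the tag has a single child, and is None when the tag contains more than one child.
Getting an attribute value: soup.a['href']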
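The sketch below exercises each attribute and method listed above on a hypothetical sample page. The HTML, class names and URLs are invented for illustration. It uses the built-in "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page used only for illustration
html = """
<html><body>
  <div class="tang">
    <ul>
      <li><a href="http://example.com/1" title="one">Item one</a></li>
      <li><a href="http://example.com/2" title="two">Item <b>two</b></a></li>
    </ul>
  </div>
  <div class="song"><p>Song text</p></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.tagName: the first <a> in the document
first_a = soup.a

# find(): same as soup.a for a bare tag name, but can also filter by attribute
song_div = soup.find("div", class_="song")

# find_all(): every matching tag, as a list-like ResultSet
all_links = soup.find_all("a")

# select(): CSS selector, always returns a list (">" means direct child)
tang_links = soup.select(".tang > ul > li > a")

# Text extraction:
# .text / .get_text() collect all text inside the tag, including nested tags;
# .string is None when the tag has more than one child
second_a = all_links[1]
print(second_a.get_text())  # Item two
print(second_a.string)      # None  (a text node plus a <b> child)
print(first_a.string)       # Item one

# Attribute value
print(first_a["href"])      # http://example.com/1
```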

Usage example: crawl the Qiushibaike image-ranking pages and download every image:

from bs4 import BeautifulSoup
import requests
import os
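if __name__ == '__main__':
    # Create a folder to hold all downloaded images
    if not os.path.exists("./qiutuLibs2"):
        os.mkdir("./qiutuLibs2")
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
    }
    src_list = []
    url = "https://www.qiushibaike.com/imgrank/page/%d/"
    # Crawl pages 1 through 13 (range's upper bound is exclusive)
    for pageNum in range(1, 14):
        new_url = url % pageNum
        resp = requests.get(url=new_url, headers=headers)
        resp.encoding = "utf-8"
        # Load the page source fetched from the internet into a BeautifulSoup object
        page_text = resp.text
        soup = BeautifulSoup(page_text, "lxml")
        # The images carry class "illustration"; collect each one's src attribute
        for img in soup.select(".illustration"):
            src_list.append(img['src'])
    # Download each image once, after all pages have been parsed
    for src in src_list:
        # The src values are protocol-relative ("//..."), so prepend the scheme
        src = "https:" + src
        img_data = requests.get(url=src, headers=headers).content
        img_Name = src.split('/')[-1]
        img_Path = './qiutuLibs2/' + img_Name
        with open(img_Path, "wb") as fp:
            fp.write(img_data)
        print(img_Path + " downloaded successfully")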
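The crawler's string handling can be factored into small helpers that are easy to test without network access. That handling covers three steps: building the page URLs, prefixing "https:" to each src, and taking the last path segment as the file name. The sketch below is one possible approach and is not the original author's code. The helper names are hypothetical. It also swaps the fixed "https:" prefix for the standard library's urllib.parse.urljoin, which handles root-relative and already-absolute src values as well as protocol-relative ones. The pic.example.com host is a placeholder.

```python
from urllib.parse import urljoin, urlparse
import posixpath

# Page URL pattern taken from the crawler above
PAGE_URL = "https://www.qiushibaike.com/imgrank/page/%d/"


def page_urls(first, last):
    """Hypothetical helper: listing URLs for pages first..last inclusive."""
    return [PAGE_URL % n for n in range(first, last + 1)]


def absolute_src(page_url, src):
    """Hypothetical helper: resolve an <img> src against the page it came from.

    urljoin copes with protocol-relative ("//host/x.jpg"), root-relative
    ("/x.jpg") and already-absolute URLs, unlike a fixed "https:" + src.
    """
    return urljoin(page_url, src)


def file_name(img_url):
    """Hypothetical helper: last path segment of the URL, ignoring any query string."""
    return posixpath.basename(urlparse(img_url).path)


if __name__ == "__main__":
    urls = page_urls(1, 13)
    print(len(urls))  # 13
    src = absolute_src(urls[0], "//pic.example.com/medium/abc.jpg?x=1")
    print(src)             # https://pic.example.com/medium/abc.jpg?x=1
    print(file_name(src))  # abc.jpg
```

To use these in the crawler, each src would be stored together with the page URL it was found on, so that absolute_src can resolve it against that page.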