This task involves two skills, web scraping and file downloading. Below is a simple walkthrough of the implementation:
1. Import the required libraries
```
import requests
from bs4 import BeautifulSoup
import os
from urllib.parse import urljoin  # used to resolve relative links
```
2. Set the request headers and URL
```
url = "https://china.nba.cn/news/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
```
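An optional refinement: instead of passing `headers` to every call, a `requests.Session` can carry them automatically and also reuses the underlying connection. A minimal sketch, using the same `url` and `headers` as above:
```
# A session applies the stored headers to every request it sends
session = requests.Session()
session.headers.update(headers)
response = session.get(url)  # equivalent to requests.get(url, headers=headers)
```
The rest of the walkthrough sticks with plain `requests.get` to stay close to the original steps.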
3. Send a GET request and parse the HTML
```
# Send the GET request
response = requests.get(url, headers=headers)
# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')
```
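It is also worth confirming that the request actually succeeded before handing the body to the parser; a short sketch:
```
# Raise an HTTPError for 4xx/5xx responses instead of silently parsing an error page
response.raise_for_status()
```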
4. Find all the player data links
```
links = []
for a in soup.find_all('a', class_='nba-article-item__title-link'):
    links.append(urljoin(url, a['href']))  # resolve relative hrefs against the page URL
```
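If the same article is listed more than once on the page, the list will contain duplicate URLs; deduplicating while keeping order avoids downloading the same file twice. A small optional sketch:
```
# dict.fromkeys drops duplicates while preserving insertion order
links = list(dict.fromkeys(links))
print(f"Found {len(links)} unique article links")
```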
5. Visit each link, find the download link, and download the file
```
# Visit each link in turn
for link in links:
    # Send the GET request
    response = requests.get(link, headers=headers)
    # Parse the HTML
    soup = BeautifulSoup(response.text, 'html.parser')
    # Find the download link
    download_link = soup.find('a', class_='nba-article__download-link')
    if download_link is not None:
        # Download the file; resolve the URL in case it is relative
        file_url = urljoin(link, download_link['href'])
        file_name = file_url.split('/')[-1]
        file_path = os.path.join(os.getcwd(), file_name)
        with open(file_path, 'wb') as f:
            f.write(requests.get(file_url, headers=headers).content)
        print(f"File {file_name} downloaded!")
```
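One caveat: `requests.get(file_url).content` buffers the whole file in memory, which is fine for small files but wasteful for large ones. A sketch of the same download step using streaming, assuming the same `file_url` and `file_path` as above:
```
# Stream the body in 8 KB chunks instead of loading it all at once
with requests.get(file_url, headers=headers, stream=True) as r:
    r.raise_for_status()
    with open(file_path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
```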
The complete code is as follows:
```
import requests
from bs4 import BeautifulSoup
import os
from urllib.parse import urljoin  # used to resolve relative links

# Set the request headers and URL
url = "https://china.nba.cn/news/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# Send the GET request and parse the HTML
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')

# Find all the player data links
links = []
for a in soup.find_all('a', class_='nba-article-item__title-link'):
    links.append(urljoin(url, a['href']))  # resolve relative hrefs against the page URL

# Visit each link in turn
for link in links:
    # Send the GET request
    response = requests.get(link, headers=headers)
    # Parse the HTML
    soup = BeautifulSoup(response.text, 'html.parser')
    # Find the download link
    download_link = soup.find('a', class_='nba-article__download-link')
    if download_link is not None:
        # Download the file; resolve the URL in case it is relative
        file_url = urljoin(link, download_link['href'])
        file_name = file_url.split('/')[-1]
        file_path = os.path.join(os.getcwd(), file_name)
        with open(file_path, 'wb') as f:
            f.write(requests.get(file_url, headers=headers).content)
        print(f"File {file_name} downloaded!")
```
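A practical note: the loop fires requests back-to-back, which some sites rate-limit or block. Pausing briefly between iterations is a common courtesy; a sketch (the one-second delay is an arbitrary choice, not something the site documents):
```
import time

for link in links:
    response = requests.get(link, headers=headers)
    # ... same parsing and download logic as in the loop above ...
    time.sleep(1)  # brief pause so we do not hammer the server
```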
After running the program, all the player data files will be downloaded into the current working directory.