Web scraping is really not hard; with a little care you can pick up the technique quickly. In the previous article we covered how to download images from huaban.com (花瓣网). In this one we add a search feature and also turn the Python program into an exe file.
Goals of this article
- Search huaban.com for images matching a keyword
- Download the matching images to a local folder according to a filter rule
- Bonus: convert the project into an exe executable
Let's go straight to the source code.
UtilsRequest.py
```python
# auth: py40.com
import requests
import time


class UtilsRequest():
    # network timeout in seconds
    time_out = 30
    # how many times to retry after a failed request
    request_nums = 5

    def __init__(self):
        super().__init__()

    def requestpageText(self, url, retries_left=None):
        # keep track of the remaining retries across recursive calls
        if retries_left is None:
            retries_left = self.request_nums
        try:
            head = {
                'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) '
                              'Gecko/20091201 Firefox/3.5.6'}
            Page = requests.session().get(url, headers=head, timeout=self.time_out)
            Page.encoding = "utf-8"
            print("获取网页数据成功")
            return Page.text
        except Exception as e:
            print("联网失败了...重试中", e)
            time.sleep(5)
            print("暂停结束")
            if retries_left > 0:
                # return the retried result so the caller still gets the page text
                return self.requestpageText(url, retries_left - 1)

    def downfile(self, file, url):
        # stream the image to disk chunk by chunk
        print("开始下载:", file, url)
        try:
            r = requests.get(url, stream=True)
            with open(file, 'wb') as fd:
                for chunk in r.iter_content():
                    fd.write(chunk)
        except Exception as e:
            print("下载失败了", e)
```
huaban.py
```python
import re
import os
import UtilsRequest


class Huaban:
    file_save_path = "D:/work/python/pic/"   # root folder for saved images
    text_keyword = "航空"                     # search keyword
    page_nums = 0                             # current search page
    down_photo_num = 0                        # number of images downloaded
    ru = ""

    def __init__(self):
        super().__init__()
        self.ru = UtilsRequest.UtilsRequest()

    def gethuaban(self):
        urlhuaban = "http://huaban.com/search/?q=%s&per_page=20&wfl=1&page=%d"
        urlhuaban = urlhuaban % (self.text_keyword, self.page_nums)
        file_save_path = self.file_save_path + self.text_keyword + "/"
        print("*******************************************************************")
        print("请求网址:", urlhuaban)
        self.page_nums += 1
        if not os.path.exists(file_save_path):
            os.makedirs(file_save_path)
        text = self.ru.requestpageText(urlhuaban)
        if not text:
            # every retry failed, nothing to parse
            return
        pattern = re.compile(
            r'{"pin_id":(\d*?),.*?"key":"(.*?)",.*?"like_count":(\d*?),.*?"repin_count":(\d*?),.*?}',
            re.S)
        items = re.findall(pattern, text)
        if len(items) == 0:
            print("*******************************************************************")
            print("共下载图片%d张" % self.down_photo_num)
            print("下载资源结束~~~~~~~~~~~~~或未找到资源")
            return
        print(items)
        for item in items:
            max_pin_id = item[0]
            x_key = item[1]
            x_like_count = int(item[2])
            x_repin_count = int(item[3])
            # only keep images that are popular enough
            if (x_repin_count > 10 and x_like_count > 10) or x_repin_count > 10 or x_like_count > 1:
                print("开始下载第{0}张图片".format(self.down_photo_num))
                url_image = "http://hbimg.b0.upaiyun.com/"
                url_item = url_image + x_key
                filename = file_save_path + str(max_pin_id) + ".jpg"
                if os.path.isfile(filename):
                    print("文件存在:", filename)
                    continue
                self.ru.downfile(filename, url_item)
                self.down_photo_num += 1
        # move on to the next search page
        self.gethuaban()


h = Huaban()
h.gethuaban()
```
Here is what a run looks like:
```
E:\python\python\python.exe E:/python/python_tools.git/trunk/huanbanwang/client/huaban.py
*******************************************************************
请求网址: http://huaban.com/search/?q=航空&per_page=20&wfl=1&page=0
获取网页数据成功
[('25998529', 'ac48f50da994ae34508757cdc5a318bfbd76abad3dcd9-jF3R1F', '277', '5479'), ('64087739', '51d689b29b57592531048af52b3ecfa7943e70cc823e3-7u84U6', '175', '3871'), ('35380104', '7c5fa66176fe83d28e0cb5db2dc32d8b412ade3835bf7-skSUDn', '219', '3159'), ('3420403', 'a3ee187da0bb209dc52e56d617f057566a0a2e251af08-DLqN3F', '125', '1732'), ('164522301', 'cf8f63e3f8f5a488f56c2d15fbfc2223e6adc81bf6a1f-9RjvmR', '48', '1456'), ('148193533', '8279145ed274d8f0a6378f88fe0ca501db99c12a4c9b6-WUnHDs', '47', '1000'), ('159827954', '0f620917cb9f3cdd69680cde649c22c2e77d7b30303b3-um6CvT', '91', '907'), ('54706286', '71c80cdf6d51a84ca1e16cd89ed4a53ce74350bd1a5e0-6Zgrob', '66', '907'), ('181786894', '89468eb1a8b27df3937d409d86838c2c8dba644822969-IqTv3N', '72', '817'), ('159827812', '08fe0f28a1b7b3531eabfb22991e9447ec658b3a3126a-ajzvrx', '100', '778'), ('54706246', 'fd567e09a0a4798ff24eb235bd7ca1a806c560a317bd5-SoQtGo', '46', '827'), ('26989648', 'ef5c6200021a9fa963d3365cec18181578d9e8165f05e-SCQznH', '22', '827'), ('63097439', '990dc0dc81dfc7e8bbd43ea68cf38f8aa843092a56c1d-uuWphQ', '54', '774'), ('54706351', '7a1c075d07652ef20f4cc25c9cd33e183903e40719ea1-F9CpY3', '42', '752'), ('126871285', '3bfcf291a73a4a421092d7c55a4f0e4654c80d8929ad1-xWY6Dn', '51', '732'), ('138437559', 'a1de3b38b0b733fd3f7544392aacf2833b665ef63bd84-MOEBob', '20', '754'), ('33589865', '70a48927340e19f2bdb8f150b3cd23d6184e38bb769e1-6I8wSm', '53', '717'), ('29534347', 'db03f05d708eb085729069e79ad0a436f0706ed93f77b-sB1nyW', '16', '748'), ('754771425', '4ccc5f090bf9636d59d44fd5afcca7adfc4e69a13b2ddf-minhiB', '77', '627'), ('54706308', '5aaeed11af6652a2296ccc46a2b670a670c734b617508-aptmzS', '32', '655')]
开始下载第0张图片
```
And this is what the downloaded images look like.
Program walkthrough
The project takes an object-oriented approach and is split across two files: UtilsRequest.py and huaban.py.
huaban.py implements the business logic and the regular-expression matching, using the pattern to locate each image's details.
UtilsRequest.py is a utility class that handles fetching page source and downloading images.
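As a quick illustration, here is a minimal sketch of using UtilsRequest on its own; the file name and the `<key>` placeholder below are examples for this sketch, not values taken from the project.

```python
# Minimal sketch: fetch a search page and save one image with UtilsRequest.
import UtilsRequest

ru = UtilsRequest.UtilsRequest()
# fetch the raw HTML of a search page; returns None if every retry fails
html = ru.requestpageText("http://huaban.com/search/?q=航空&per_page=20&wfl=1&page=0")
print("page length:", len(html) if html else 0)
# replace <key> with a real key parsed from the page before running this line
ru.downfile("example.jpg", "http://hbimg.b0.upaiyun.com/<key>")
```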
huaban.py is the entry point, so we simply run huaban.py; text_keyword is the search keyword, and once the download succeeds the images are saved under the corresponding directory.
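If you want a different keyword or save location, one simple option (a sketch, with example values) is to override the class attributes before the crawl starts, for instance by editing the two lines at the bottom of huaban.py:

```python
# Sketch: override the defaults before starting the crawl (example values)
h = Huaban()
h.text_keyword = "风景"            # any search term you like
h.file_save_path = "D:/pics/"      # must be a writable folder on your machine
h.gethuaban()
```

The key piece of the parsing in huaban.py is this regular expression: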
```python
pattern = re.compile(
    r'{"pin_id":(\d*?),.*?"key":"(.*?)",.*?"like_count":(\d*?),.*?"repin_count":(\d*?),.*?}',
    re.S)
items = re.findall(pattern, text)
```
The regular expression matches out each image's detail data from the page source: the pin_id, the image key on upyun, the like count and the repin count.
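To make the pattern concrete, here is a small self-contained demo; the JSON fragment is fabricated to imitate the structure seen in the output above, not real huaban.com data.

```python
import re

# fabricated fragment imitating the pin JSON embedded in the search page
sample = ('{"pin_id":25998529, "user_id":1, "key":"ac48f50da994-jF3R1F", '
          '"type":"image/jpeg", "like_count":277, "repin_count":5479, "extra":1}')

pattern = re.compile(
    r'{"pin_id":(\d*?),.*?"key":"(.*?)",.*?"like_count":(\d*?),.*?"repin_count":(\d*?),.*?}',
    re.S)
print(re.findall(pattern, sample))
# [('25998529', 'ac48f50da994-jF3R1F', '277', '5479')]
```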
Bonus feature: converting Python to an exe file
So far, our program can only be used where a Python environment is installed. How do you run it on a Windows machine without installing Python? That is where the PyInstaller module comes in: it packages the Python program into an exe file.
For details on how to use PyInstaller, see the Python-to-exe tutorial.
Run this command:
```
E:\python\python\Scripts>pyinstaller.exe -F E:\python\python_tools.git\trunk\huanbanwang\client\Huaban.py
```
Once the command finishes you can find our exe file (PyInstaller places it in the dist folder by default). This exe runs on Windows without a Python environment installed.
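Assuming the default output layout, the single-file exe lands in the dist folder under the directory the command was run from, so it can be started straight from there (the path below just mirrors the example above):

```
E:\python\python\Scripts>dist\Huaban.exe
```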