Web Crawler: The 100-Line Comeback of 漫画喵 (cartoon-cat) - 喵耳朵 (3)

Author: H5之家 | Source: H5之家 | 2017-02-20 13:02

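from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import os
from os import path as osp
import urllib


# A simple downloader
def download(url, save_path):
    try:
        with open(save_path, 'wb') as fp:
            fp.write(urllib.urlopen(url).read())
    except Exception, et:
        print(et)


if __name__ == '__main__':
    # NOTE: the string literals (paths, URL, CSS selectors, attribute name)
    # are missing from this copy of the listing; each '' must be filled in.
    driver = ''        # path to the PhantomJS executable
    browser = webdriver.PhantomJS(driver)  # browser instance
    chapter_url = ''   # URL of the chapter's first page
    save_folder = ''   # folder to save the images into
    if not osp.exists(save_folder):
        os.mkdir(save_folder)

    image_idx = 1
    browser.get(chapter_url)
    while True:
        # Following the earlier analysis, find the image's URI
        image_url = browser.find_element_by_css_selector('').get_attribute('')
        # Assumed naming pattern: <index>.<extension of the image URL>
        save_image_name = osp.join(save_folder, ('%d' % image_idx) + '.' + osp.basename(image_url).split('.')[-1])
        download(image_url, save_image_name)
        # Click through to the next page
        browser.find_element_by_css_selector('').click()
        try:
            # Look for the popup; if it exists, the chapter has finished
            # downloading and the main loop ends
            browser.find_element_by_css_selector('')
            break
        except NoSuchElementException:
            # No end-of-chapter popup, keep downloading
            image_idx += 1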
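The listing above targets Python 2 (`urllib.urlopen`, `except Exception, et`). For readers on Python 3, the download step could look roughly like the sketch below. It is not the author's code: the helper name `build_image_name`, the `timeout` parameter, the boolean return value and the `<index><extension>` file-name pattern are all assumptions made for illustration.

```python
from os import path as osp
from urllib.parse import urlparse
from urllib.request import urlopen


def build_image_name(save_folder, image_idx, image_url):
    """Return '<save_folder>/<index><ext>' (hypothetical naming scheme)."""
    # Parse the URL first so a query string such as '?v=2' does not end up
    # in the file extension.
    ext = osp.splitext(urlparse(image_url).path)[1]
    return osp.join(save_folder, '%d%s' % (image_idx, ext))


def download(url, save_path, timeout=10):
    """Fetch url and write the raw bytes to save_path; return True on success."""
    try:
        # Read the whole response before opening the file, so a failed
        # request does not leave an empty file behind.
        with urlopen(url, timeout=timeout) as resp:
            data = resp.read()
        with open(save_path, 'wb') as fp:
            fp.write(data)
        return True
    except OSError as err:  # URLError and socket timeouts are OSError subclasses
        print(err)
        return False
```

Inside the loop, `download(image_url, build_image_name(save_folder, image_idx, image_url))` would then replace the two lines that build the name and fetch the image. Returning a flag instead of only printing the error lets the caller decide whether to retry or skip a page.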

5. Finale: Closing Notes

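That covers the design of 漫画喵 (Cartoon Cat) and its main code. The code above is only for illustration; the code Xiaomiao actually uses to download comics is a separate implementation, available on GitHub at https://github.com/miaoerduo/cartoon-cat . The project is only a little over 100 lines, but it still took Xiaomiao a fair amount of time.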

The blog post is finished~ and Xiaomiao's comics have finished downloading too~

Figure 6: The downloaded comics

If you found this article helpful, please treat Xiaomiao to a cup of tea~~O(∩_∩)O~~

Please credit the source when reposting~
