
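# [Video link](https://yun.baidu.com/share/link?shareid=3374188589&uk=4161683797&fid=613858973117419)

After the previous lessons, static pages like Qiushibaike should be easy for you to scrape. But we often run into sites whose content does not appear when you right-click and view the page source. For example: https://www.baidu.com/s?wd=美食

Clicking through to the second or third page of the food results does not change the URL at all, and the dishes are nowhere to be found in the page source. That makes this a dynamic page: the content is loaded by a separate request after the page itself.

![](https://leanote.com/api/file/getImage?fileId=57c6eb0eab644135ea068533)

![](https://leanote.com/api/file/getImage?fileId=57c6ed30ab644135ea068558)

How do we get around this? **Capture the network traffic!**

## Capturing requests with Google Chrome

![](https://leanote.com/api/file/getImage?fileId=57c6ebdeab644135ea068547)

![](https://leanote.com/api/file/getImage?fileId=57c6ed71ab644135ea06855b)

With the capture running, click page 2 again (we are currently on page 3).

![](https://leanote.com/api/file/getImage?fileId=57c6f0bfab644135ea06858e)

This is the real URL:

![This is the real URL](https://leanote.com/api/file/getImage?fileId=57c6f0bfab644135ea06858f)

Visit this URL directly and there is a pleasant surprise:

![](https://leanote.com/api/file/getImage?fileId=57c6f0bfab644135ea06858d)

It returns a JSON-formatted string (doesn't it look just like a printed Python dict?). Having found the real URL through packet capture, we can request that URL directly. The remaining steps are the same as for a static page: fetch the response, then parse the data.

## Parsing JSON data

Remember two functions!

### json.loads

Converts a JSON string into a dict.

    >>> import json
    >>> json_str = '{"a":1, "name": "b"}'
    >>> json.loads(json_str)
    {'a': 1, 'name': 'b'}

### json.dumps

Converts a dict into a JSON string.

    >>> json_dict = {"a": 1, "name": "b"}
    >>> json.dumps(json_dict)
    '{"a": 1, "name": "b"}'

## Scraping Baidu food results (Python 3)

    import json
    import re

    import requests


    def crawl(page):
        # Each page holds 8 results (rn=8); pn is the offset of the first one.
        pn = page * 8
        url = "https://sp0.baidu.com/8aQDcjqpAAV3otqbppnN2DJv/api.php?resource_id=6875&from_mid=1&&format=json&ie=utf-8&oe=utf-8&query=%E7%BE%8E%E9%A3%9F&sort_key=&sort_type=1&stat0=&stat1=&stat2=&stat3=&pn=" + str(pn) + "&rn=8&cb=jQuery110200319478991186668_1472651805605&_=1472651805613"
        res = requests.get(url)
        # Because of the cb= parameter the JSON arrives wrapped in a callback
        # call, so keep only the part from the first "{" to the last "}".
        json_str_re = re.compile(r"\{.*\}", re.S)
        json_str = json_str_re.search(res.text).group()
        food_dict = json.loads(json_str)
        # Print the name of every dish on this page.
        for food in food_dict["data"][0]["disp_data"]:
            print(food["ename"])


    if __name__ == '__main__':
        crawl(1)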
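
The regular expression in `crawl` is needed because the captured URL carries a `cb=` parameter, so the server wraps the JSON in a callback call (JSONP) rather than returning bare JSON. The parsing step can be tried without any network access. The following is a minimal sketch, not the tutorial's own code: the helper names `strip_jsonp` and `extract_names` are hypothetical, and the sample response is made up, modeled only on the `data`, `disp_data`, and `ename` fields used by `crawl`.

```python
import json


def strip_jsonp(text):
    """Return the JSON payload of a JSONP response such as `cb({...});`."""
    text = text.strip()
    # Already bare JSON: nothing to unwrap.
    if text.startswith(("{", "[")):
        return text
    # Otherwise keep what lies between the first "(" and the last ")".
    start = text.find("(")
    end = text.rfind(")")
    if start == -1 or end < start:
        raise ValueError("not a JSONP response")
    return text[start + 1:end]


def extract_names(text):
    """Collect every `ename` value from a (possibly JSONP-wrapped) response."""
    payload = json.loads(strip_jsonp(text))
    names = []
    for block in payload.get("data", []):
        for item in block.get("disp_data", []):
            names.append(item["ename"])
    return names


# Made-up sample shaped like the real response; the values are placeholders.
sample = (
    'jQuery110200319478991186668_1472651805605('
    '{"data": [{"disp_data": [{"ename": "dish-a"}, {"ename": "dish-b"}]}]});'
)
print(extract_names(sample))
```

Slicing between the outermost parentheses avoids relying on a greedy regular expression, and the early return lets the same helper accept a response that is already plain JSON.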