这篇文章主要记录scrapy中的关于request的一些知识
-
一般可以直接用request发起网络请求,request方法可以带一些常用的参数
request部分源码: class Request(object_ref): def __init__(self, url, callback=None, method='GET', headers=None, body=None, cookies=None, meta=None, encoding='utf-8', priority=0, dont_filter=False, errback=None): self._encoding = encoding # this one has to be set first self.method = str(method).upper() self._set_url(url) self._set_body(body) assert isinstance(priority, int), "Request priority not an integer: %r" % priority self.priority = priority assert callback or not errback, "Cannot use errback without a callback" self.callback = callback self.errback = errback self.cookies = cookies or {} self.headers = Headers(headers or {}, encoding=encoding) self.dont_filter = dont_filter self._meta = dict(meta) if meta else None
-
在研究scrapy_redis的时候发现源码中的去重方案里即有request又有request_fingerprint. 经过对比总结如下:
request_fingerprint叫做请求的指纹信息,存储的是请求的资源哈希标志,这个理解可能还是不准确,源码注释如下: The request fingerprint is a hash that uniquely identifies the resource the request points to. request_fingerprint能过滤掉请求统一资源的request,存储在dupefilter的redis数据库的set集合类型里 request_fingerprint实例: "18f32856cf15db66a6271f3a414bef6e4620fce3" 通过request_fingerprint过滤的request会被存储在用redis数据库的sorted set 集合类型重写的queue里,这又是一层过滤。 request中包含的信息包括url,callback等上面Request中的参数 request实例: redis 127.0.0.1:6379> zrange smzdm:requests 0 0 1) "\x80\x02}q\x01(U\x04bodyq\x02U\x00U\_encodingq\x03U\x05utf-8q\x04U\cookiesq\x05}q\x06U\x04metaq\}q\U\x05depthq\K\x04sU\heade rsq\}q\x0bU\Refererq\x0c]q\U\x1ahttp://faxian.smzdm.com/p4q\x0easU\x03urlq\x0fX\x1f\x00\x00\x00http://www.smzdm.com/p/6046409/U\ x0bdont_filterq\x10\x89U\priorityq\x11K\x00U\callbackq\x12U\x0cparse_detailq\x13U\x06methodq\x14U\x03GETq\x15U\errbackq\x16Nu." redis 127.0.0.1:6379>
参考文章:
结语:
坚持每天进步一点点…