Memo: fetching web pages with Python


1. The crude way:

curl -o '+cid+'.xml --compressed  http://comment.bilibili.tv/'+cid+'.xml

Then read the downloaded file line by line with readline.
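
The crude way above can be sketched roughly as follows. This is only a sketch under assumptions: it is written for Python 3, it assumes `cid` in the command is a Python string spliced into the file name and URL, and the helper names `build_curl_command` and `download_and_read_lines` are made up for illustration. It also assumes `curl` is on the PATH and that the downloaded XML is UTF-8.

```python
import subprocess


def build_curl_command(cid):
    """Build the curl argument list for one comment file (hypothetical helper)."""
    filename = cid + ".xml"
    url = "http://comment.bilibili.tv/" + cid + ".xml"
    # --compressed asks the server for a compressed response and lets curl decompress it
    return ["curl", "-o", filename, "--compressed", url]


def download_and_read_lines(cid):
    """Run curl, then read the saved file line by line (hypothetical helper)."""
    # check_call raises CalledProcessError if curl exits with a non-zero status
    subprocess.check_call(build_curl_command(cid))
    # Assumption: the file is UTF-8 encoded
    with open(cid + ".xml", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]
```

Passing the command as a list rather than one concatenated string avoids shell quoting problems if `cid` ever contains unusual characters.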
2. The proper way:

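# Python 2: urllib2 was split into urllib.request and urllib.error in Python 3
import urllib2
response = urllib2.urlopen('http://www.baidu.com/')
html = response.read()  # raw bytes of the response body
print html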
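
For reference, the same fetch might look like this on Python 3, where `urllib.request` takes the place of `urllib2`. This is a sketch, not the original method: the helper names `decode_body` and `fetch_html`, the 10-second timeout, and the UTF-8 fallback are all assumptions added for illustration.

```python
import urllib.request


def decode_body(raw, charset=None):
    """Decode response bytes to text (hypothetical helper).

    Assumption: fall back to UTF-8 when the server declares no charset.
    """
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_html(url, timeout=10):
    """Fetch a URL and return its body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # The charset, if any, comes from the Content-Type response header
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

A call such as `print(fetch_html('http://www.baidu.com/'))` would then correspond to the snippet above.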

3. The proper way, with gzip:

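from StringIO import StringIO
import gzip
import urllib2

request = urllib2.Request('http://outofmemory.cn/')
# Tell the server we accept a gzip-compressed body
request.add_header('Accept-encoding', 'gzip')
response = urllib2.urlopen(request)
if response.info().get('Content-Encoding') == 'gzip':
    # GzipFile needs a file-like object, so wrap the bytes in StringIO
    buf = StringIO(response.read())
    f = gzip.GzipFile(fileobj=buf)
    data = f.read()
else:
    # The server ignored the header and sent an uncompressed body
    data = response.read()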
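
A Python 3 version of the gzip approach could be sketched like this. The helper names `maybe_gunzip` and `fetch_bytes` and the timeout value are assumptions for illustration. `gzip.decompress` works directly on bytes, so no StringIO wrapper is needed.

```python
import gzip
import urllib.request


def maybe_gunzip(raw, content_encoding):
    """Decompress raw bytes only if the server says they are gzip (hypothetical helper)."""
    if content_encoding == "gzip":
        return gzip.decompress(raw)
    # Any other value (including None) is passed through unchanged
    return raw


def fetch_bytes(url, timeout=10):
    """Request a URL with gzip allowed and return the decompressed body (hypothetical helper)."""
    request = urllib.request.Request(url)
    request.add_header("Accept-Encoding", "gzip")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        encoding = response.headers.get("Content-Encoding")
        return maybe_gunzip(response.read(), encoding)
```

For example, `data = fetch_bytes('http://outofmemory.cn/')` would play the role of `data` in the snippet above, whether or not the server actually compressed the response.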

 
 
 
