I. Saving the scraped data to a JSON file
1. First, create a pipeline class dedicated to saving items to a JSON file.
import codecs, json
# codecs.open works like the built-in open(), but takes care of the encoding for us.

class JsonWithEncodingPipeline:
    def __init__(self):
        self.file = codecs.open('save_file.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # One JSON object per line; ensure_ascii=False keeps non-ASCII text readable.
        line = json.dumps(dict(item), ensure_ascii=False) + '\n'
        self.file.write(line)
        return item

    def close_spider(self, spider):  # Scrapy calls close_spider when the spider finishes
        self.file.close()
The class above is our own implementation for saving items as JSON; Scrapy also ships with dedicated JSON export support.
from scrapy.exporters import JsonItemExporter
# The scrapy.exporters module also ships exporter classes for CSV, XML, pickle, marshal, etc.

class JsonExporterPipeline:
    def __init__(self):
        self.file = open('save_file.json', 'wb')  # exporters write bytes, so open in binary mode
        self.exporter = JsonItemExporter(self.file, encoding='utf-8', ensure_ascii=False)
        self.exporter.start_exporting()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()
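As the comment above notes, scrapy.exporters also provides CsvItemExporter, XmlItemExporter, PickleItemExporter and others. As a minimal sketch (the filename save_file.csv is just an example), swapping the JSON exporter for the CSV one only changes a couple of lines:

from scrapy.exporters import CsvItemExporter

class CsvExporterPipeline:
    # Same structure as JsonExporterPipeline above, but exports items as CSV rows.
    def __init__(self):
        self.file = open('save_file.csv', 'wb')
        self.exporter = CsvItemExporter(self.file, encoding='utf-8')
        self.exporter.start_exporting()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()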
2. Register whichever of the pipelines above you decide to use in ITEM_PIPELINES in settings.py and assign it a priority.
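For reference, the entry in settings.py might look like this (the module path tutorial_project.pipelines and the priority 300 are placeholders, adjust them to your own project):

# settings.py -- placeholder path and priority
ITEM_PIPELINES = {
    'tutorial_project.pipelines.JsonExporterPipeline': 300,
}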
II. Saving data to MySQL asynchronously
Scrapy crawls asynchronously with high concurrency, so the spider and downloader can fetch pages very quickly, while plain pymysql and SQLAlchemy are synchronous and blocking. If every item is written to MySQL in a blocking call, the inserts become the bottleneck and drag down the speed of the whole project, so when that matters it is worth writing to MySQL asynchronously. Twisted's adbapi does this by running the blocking DB-API calls in a thread pool, so the crawl itself is never blocked.
from twisted.enterprise import adbapi
import MySQLdb
import MySQLdb.cursors

class MysqlTwistedPipeline:
    def __init__(self, db_pool):
        self.db_pool = db_pool

    @classmethod
    def from_settings(cls, settings):
        # The keys in db_params must match the keyword arguments that
        # adbapi.ConnectionPool forwards to MySQLdb.connect().
        db_params = dict(
            host=settings['MYSQL_HOST'],
            db=settings['MYSQL_DBNAME'],
            user=settings['MYSQL_USER'],
            passwd=settings['MYSQL_PASSWORD'],
            charset='utf8',
            cursorclass=MySQLdb.cursors.DictCursor,
            use_unicode=True,
        )
        db_pool = adbapi.ConnectionPool('MySQLdb', **db_params)
        return cls(db_pool)

    def process_item(self, item, spider):
        # runInteraction runs the blocking insert in a thread pool and returns a Deferred.
        query = self.db_pool.runInteraction(self.do_insert, item)
        query.addErrback(self.handle_error, item, spider)  # register an error callback
        return item

    def do_insert(self, cursor, item):
        # runInteraction passes in a cursor wrapped in a transaction.
        insert_sql = 'insert into table_name(name, age) values(%s, %s)'
        cursor.execute(insert_sql, (item['name'], item['age']))

    def handle_error(self, failure, item, spider):
        # Error callback: simply print the failure for now.
        print(failure)
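The pipeline reads its connection parameters from settings.py via from_settings, so the corresponding keys have to be defined there. A sketch with placeholder values (adjust the credentials and the tutorial_project.pipelines path to your own project):

# settings.py -- placeholder values
MYSQL_HOST = '127.0.0.1'
MYSQL_DBNAME = 'scrapy_db'
MYSQL_USER = 'root'
MYSQL_PASSWORD = 'secret'

ITEM_PIPELINES = {
    'tutorial_project.pipelines.MysqlTwistedPipeline': 300,
}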