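In the previous articles we crawled course data from imooc (慕课网). This article computes statistics on that data and visualizes them so the results are easier to read at a glance.
Introduction
Flask is one of the most popular Python frameworks. It is mainly used for web development, suits small and medium-sized projects, and is easy to extend.
ECharts is a pure-JavaScript, Canvas-based charting library from Baidu. It provides intuitive, vivid, interactive, and customizable data visualization charts. Features such as drag-and-drop recalculation, the data view, and value-range roaming improve the user experience and let users explore and combine data.
Setting up the Flask web project
Install the required dependencies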
pip install Flask
pip install PyMySQL
The web project has the following directory structure:
├── web
│   ├── static
│   │   └── js
│   │       ├── dark.js
│   │       └── echarts.min.js
│   ├── templates
│   │   └── index.html
│   ├── __init__.py
│   └── views.py
├── runserver.py
runserver.py is the project's startup script:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from web import app

if __name__ == '__main__':
    app.run(host='0.0.0.0', debug=True)
__init__.py is the project's main file:
# -*- coding: utf-8 -*-
from flask import Flask

app = Flask(__name__)

# Imported last so that views.py can import `app` from this module.
import web.views
views.py contains the view functions:
# -*- coding: utf-8 -*-
import contextlib

import pymysql
from flask import jsonify, make_response, render_template, request

from web import app


# Database connection.
# Context manager that commits and closes the connection automatically.
@contextlib.contextmanager
def mysql(host='127.0.0.1', port=3306, user='root', passwd='abc-123',
          db='demo_db', charset='utf8'):
    conn = pymysql.connect(
        host=host, port=port, user=user, passwd=passwd, db=db, charset=charset)
    cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
    try:
        yield cursor
    finally:
        conn.commit()
        cursor.close()
        conn.close()


# Home page
@app.route('/')
def hello_world():
    return render_template('index.html')


# Number of courses per course type
@app.route('/api/type')
def api_type():
    with mysql() as cursor:
        cursor.execute(
            "SELECT type as name,count(id) as value from imooc_courses GROUP BY type"
        )
        return json_success(cursor.fetchall())


# Number of courses per learning direction (cate)
@app.route('/api/cate')
def api_cate():
    with mysql() as cursor:
        cursor.execute(
            "SELECT cate as name,count(id) as value from imooc_courses GROUP BY cate"
        )
        cate_data = cursor.fetchall()
        cate_data_new = transform_cate(cate_data)
        return json_success(cate_data_new)


# Number of learners for every course
@app.route('/api/learn_num')
def api_learn_num():
    with mysql() as cursor:
        cursor.execute(
            "SELECT title as name,learn_num as value from imooc_courses ORDER BY learn_num ASC"
        )
        return json_success(cursor.fetchall())


# Number of learners per learning direction
@app.route('/api/learn_num_cate')
def api_learn_num_cate():
    with mysql() as cursor:
        cursor.execute(
            "SELECT cate as name,CAST(sum(learn_num) AS CHAR) as value from imooc_courses GROUP BY cate ORDER BY sum(learn_num) DESC"
        )
        cate_data = cursor.fetchall()
        cate_data_new = transform_cate(cate_data)
        return json_success(cate_data_new)


# Difficulty level
@app.route('/api/difficulty_level')
def api_difficulty_level():
    with mysql() as cursor:
        cursor.execute(
            "SELECT difficulty_level as name,count(id) as value from imooc_courses GROUP BY difficulty_level"
        )
        return json_success(cursor.fetchall())


# Course rating
@app.route('/api/overall_rating')
def api_overall_rating():
    with mysql() as cursor:
        cursor.execute(
            "SELECT overall_rating as name,count(id) as value from imooc_courses GROUP BY overall_rating order by overall_rating+0 ASC"
        )
        return json_success(cursor.fetchall())


# Course duration
@app.route('/api/duration')
def api_duration():
    with mysql() as cursor:
        cursor.execute(
            "SELECT duration as name,count(id) as value from imooc_courses GROUP BY duration order by duration+0 ASC"
        )
        return json_success(cursor.fetchall())


# Relationship between number of learners and rating
@app.route('/api/bubble_gradient')
def api_bubble_gradient():
    with mysql() as cursor:
        cursor.execute(
            "SELECT overall_rating,learn_num,0,title FROM imooc_courses")
        return json_success(cursor.fetchall())


# Search
@app.route('/api/search')
def api_search():
    keywords = request.values.get('keywords') or ''
    # Pass the LIKE pattern as a query parameter instead of concatenating
    # user input into the SQL string, which would allow SQL injection.
    pattern = '%' + keywords + '%'
    with mysql() as cursor:
        cursor.execute(
            "SELECT * FROM imooc_courses WHERE title like %s or cate like %s "
            "or type like %s or brief like %s order by learn_num desc limit 50",
            (pattern, pattern, pattern, pattern))
        return json_success(cursor.fetchall())


# A course may belong to several cate values separated by commas,
# so the grouped rows are split and regrouped here.
def transform_cate(cate_data):
    cate_data_tmp = {}
    for item in cate_data:
        if item['name'] == '':
            item['name'] = '其他'  # "Other"
        if item['name'].find(',') > 0:
            for item_sub in item['name'].split(','):
                if item_sub not in cate_data_tmp:
                    cate_data_tmp[item_sub] = item['value']
                else:
                    cate_data_tmp[item_sub] = int(
                        cate_data_tmp[item_sub]) + int(item['value'])
        else:
            if item['name'] not in cate_data_tmp:
                cate_data_tmp[item['name']] = item['value']
            else:
                cate_data_tmp[item['name']] = int(
                    cate_data_tmp[item['name']]) + int(item['value'])
    cate_data_new = []
    for key in cate_data_tmp:
        cate_data_new.append({'name': key, 'value': cate_data_tmp[key]})
    return cate_data_new


# Return JSON data
def json_success(data):
    data = {'status': 'success', 'data': data, 'info': '成功'}  # info: "success"
    response = make_response(jsonify(data))
    # Allow cross-origin requests
    response.headers['Access-Control-Allow-Origin'] = '*'
    response.headers['Access-Control-Allow-Methods'] = 'GET,POST'
    return response
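To make the regrouping step in transform_cate easier to follow, here is a minimal standalone sketch of the same idea. The function name regroup_categories and the sample rows are hypothetical and not part of the project. Unlike the original, it always converts values with int(), so treat it as an illustration rather than a drop-in replacement.

```python
from collections import defaultdict

OTHER_LABEL = "其他"  # "Other", the label views.py uses for empty categories


def regroup_categories(rows):
    """Split comma-separated category names and sum the counts per category."""
    totals = defaultdict(int)
    for row in rows:
        # Rows with an empty category name are filed under the catch-all label.
        name = row["name"] or OTHER_LABEL
        for part in name.split(","):
            # int() also accepts values returned as strings,
            # as produced by CAST(... AS CHAR) in the SQL query.
            totals[part] += int(row["value"])
    return [{"name": name, "value": value} for name, value in totals.items()]


# Hypothetical rows, shaped like the result of the GROUP BY cate query.
sample_rows = [
    {"name": "Python", "value": 3},
    {"name": "Python,MySQL", "value": "2"},
    {"name": "", "value": 1},
]
result = regroup_categories(sample_rows)
print(result)
```

A course tagged "Python,MySQL" is counted once for each of its categories, so the per-category totals can add up to more than the number of courses.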
templates\index.html is the template file. It takes the data served by the views.py endpoints and visualizes it with ECharts.
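The endpoints return rows as a list of name/value pairs wrapped in a status envelope. ECharts pie series can consume name/value pairs directly, while a bar or line chart usually wants one list of axis categories and one list of values. The page would do this reshaping in JavaScript. The Python sketch below only illustrates the transformation. The function name split_for_bar_chart and the sample payload are assumptions, not code from the project.

```python
def split_for_bar_chart(payload):
    """Turn the API envelope into parallel category and value lists."""
    if payload.get("status") != "success":
        raise ValueError("unexpected API response: %r" % (payload,))
    rows = payload["data"]
    categories = [row["name"] for row in rows]
    # Some endpoints return counts as strings, so normalize to int.
    values = [int(row["value"]) for row in rows]
    return categories, values


# Hypothetical payload, shaped like the output of json_success().
sample_payload = {
    "status": "success",
    "info": "成功",
    "data": [
        {"name": "Beginner", "value": 12},
        {"name": "Intermediate", "value": "7"},
    ],
}
categories, values = split_for_bar_chart(sample_payload)
print(categories, values)
```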
Data visualization and analysis
No further explanation is given here; please refer to the code.
Running the project
python runserver.py
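Once the server is running, the JSON endpoints can be checked from a separate Python session. The sketch below is one possible way to do that with the standard library. It assumes the server listens on Flask's default port 5000 on the local machine. The helper names parse_envelope and fetch are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: runserver.py was started locally and uses Flask's default port.
BASE_URL = "http://127.0.0.1:5000"


def parse_envelope(raw_body):
    """Decode a response body and return the rows inside the envelope."""
    payload = json.loads(raw_body)
    if payload.get("status") != "success":
        raise ValueError("API reported a failure: %r" % (payload,))
    return payload["data"]


def fetch(path, base_url=BASE_URL):
    """Request one endpoint, for example '/api/type', and return its rows."""
    with urlopen(base_url + path, timeout=5) as response:
        return parse_envelope(response.read().decode("utf-8"))
```

For example, calling fetch("/api/type") should return the list of name/value rows for the course types, provided the database is populated and reachable.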
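For production, deploy the project with uwsgi and nginx; that setup is not covered here.
Final result
Data Visualization and Analysis (imooc). Data source: imooc (慕课网). The data was crawled with python-scrapy, then parsed, preprocessed, and stored in MySQL.
The visualization uses Python's Flask framework to serve the statistics and ECharts to draw simple charts.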
Reposted from: http://smuws.baihongyu.com/