Wu Xiangyang, Chen Wankao, Zhang Zhen, Wang Cheng, Liu Yu. Visual Detection for Crawler on Booking Servers[J]. Journal of Computer-Aided Design & Computer Graphics, 2018, 30(1): 20-29. DOI: 10.3724/SP.J.1089.2018.16926

Visual Detection for Crawler on Booking Servers

  • Abstract: Malicious download crawlers cause significant losses to e-commerce. To address this, the paper presents a general visual analytics system, applicable across airlines, for detecting crawlers that target flight search and booking servers. First, the system uses several visualizations, including route maps, bar charts, and pie charts, to show crawler detection results for each time period. Second, building on an SVM classifier combined with IP address aggregation and query-volume ranking, it provides a detection algorithm that efficiently identifies several types of crawlers, dynamic-IP crawlers in particular. Finally, visual interfaces for feature-value filtering and for inspecting an IP's history let users hand-pick training samples and retrain the SVM classification model. Experiments on one airline's log data for requests to its E-Build server show that the system catches many types of crawlers, greatly reduces the volume of invalid queries, and allows the classification model to be updated easily, so the detection algorithm stays effective over the long term.
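The abstract names IP address aggregation and query-volume ranking but gives no details of either step. As a rough illustration only, and not the authors' algorithm, the sketch below groups source IPs by network prefix and ranks the groups by request count, which is one way crawlers rotating through nearby dynamic IPs could be surfaced. The /24 prefix length, the input format (one source IP per query), and the function names are all assumptions made for this example.

```python
from collections import Counter
import ipaddress


def aggregate_by_prefix(ips, prefix_len=24):
    """Count queries per IPv4 network of the given prefix length.

    The /24 default is an assumption for illustration, not a value
    taken from the paper.
    """
    counts = Counter()
    for ip in ips:
        # strict=False masks the host bits, e.g. 10.0.0.7/24 -> 10.0.0.0/24
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(net)] += 1
    return counts


def rank_by_query_volume(counts, top_n=3):
    """Return the top_n (network, query count) pairs, busiest first."""
    return counts.most_common(top_n)


# Hypothetical log extract: one source IP per query.
log_ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "192.168.1.5", "10.0.1.9"]
counts = aggregate_by_prefix(log_ips)
ranked = rank_by_query_volume(counts)
```

In a pipeline like the one the abstract describes, the highest-volume groups would presumably be the ones passed on to the classifier or shown to the analyst first; the abstract does not say how the two steps are combined.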

     
