An efficient method for extracting web news content

Jian Sun, Luyang Tang, Dan Liao, Victor Chang

Research output: Chapter in Book or Report/Conference proceeding > Conference Proceeding > peer-review

1 Citation (Scopus)

Abstract

Web news extraction is an important step in intelligent Web information processing. It underpins research on and applications of network public opinion monitoring, heterogeneous Web data source integration, and information retrieval, so methods for extracting Web news content have both research and practical value. Drawing on web information extraction based on statistics and web structure, this paper improves an existing webpage text extraction algorithm, ERBDF, and designs a web news text extraction algorithm based on statistics and DOM tree structure (EETD). The two algorithms are then tested and compared on the accuracy and speed of text extraction, and the results show that EETD has better overall performance.
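
The abstract gives no algorithmic detail for ERBDF or EETD, so the sketch below is not either algorithm. It only illustrates the general idea of combining statistics with DOM tree structure: build a small DOM tree, measure text and link-text length per subtree, and pick the container whose text is longest and densest. The scoring rule (non-link characters squared, divided by tag count), the CONTAINERS set, and every function name are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumed sets for this sketch; a real extractor would tune these.
SKIP = {"script", "style", "noscript"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}
CONTAINERS = {"article", "section", "main", "div", "td", "body"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # Node objects and text strings, in document order


class TreeBuilder(HTMLParser):
    """Builds a minimal DOM tree, tolerating unclosed or stray tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        if tag in VOID:  # void elements never hold content
            return
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray closers.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if text:
            self.cur.children.append(text)


def measure(node, in_link, candidates):
    """Return (text_chars, link_chars, tag_count) for the subtree at node."""
    if node.tag in SKIP:
        return 0, 0, 1
    in_link = in_link or node.tag == "a"
    chars = link = 0
    tags = 1
    for child in node.children:
        if isinstance(child, str):
            chars += len(child)
            link += len(child) if in_link else 0
        else:
            c, l, t = measure(child, in_link, candidates)
            chars, link, tags = chars + c, link + l, tags + t
    if node.tag in CONTAINERS:
        plain = chars - link  # text that is not inside a hyperlink
        # Illustrative score: text volume times text density per tag.
        candidates.append((plain * plain / tags, node))
    return chars, link, tags


def collect_text(node):
    """Concatenate the visible text of a subtree in document order."""
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child)
        elif child.tag not in SKIP:
            parts.append(collect_text(child))
    return " ".join(p for p in parts if p)


def extract_main_text(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    candidates = []
    measure(builder.root, False, candidates)
    if not candidates:
        return ""
    best = max(candidates, key=lambda pair: pair[0])[1]
    return collect_text(best)
```

Calling extract_main_text on a page whose story sits in one div, next to link-heavy navigation and footer blocks, should return the story text only, because link text contributes nothing to the score and extra tags lower the density of the enclosing body.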

Original language: English
Title of host publication: Proceedings of 2017 International Conference on Engineering and Technology, ICET 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-5
Number of pages: 5
ISBN (Electronic): 9781538619490
Publication status: Published - 2 Jul 2017
Event: 2017 International Conference on Engineering and Technology, ICET 2017 - Antalya, Turkey
Duration: 21 Aug 2017 to 23 Aug 2017

Publication series

Name: Proceedings of 2017 International Conference on Engineering and Technology, ICET 2017
Volume: 2018-January

Conference

Conference: 2017 International Conference on Engineering and Technology, ICET 2017
Country/Territory: Turkey
City: Antalya
Period: 21/08/17 to 23/08/17

Keywords

  • DOM Tree Structure
  • Information Extraction
  • Statistical Methods
  • Web News
