Web Mining from Interpretable Compressed Representation of Sparse Web


Research field: Databases



Conference: 2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)


Abstract

Large datasets often impose computational constraints on the non-trivial extraction of implicit, previously unknown, and potentially useful information. Such datasets are everywhere, a popular example being the World Wide Web, which acts as a mass producer and consumer of data across multiple devices distributed worldwide. Discovering knowledge on the Web requires web intelligence solutions that take advantage of data mining and data science. In web mining, web structure mining examines the incoming and outgoing links on web pages to find pages that are commonly recommended to web surfers. The web as a whole, however, is sparse: it has a large number of vertices (i.e., web pages) but comparatively few directed edges (i.e., incoming and outgoing hyperlinks between web pages). In this paper, we present a solution for mining frequent patterns from the sparse web. Exploiting this sparsity, we capture web pages in compressed bitmaps, which are then mined to discover these patterns. Our bitmap model is readable and flexible, and it captures important information across multiple 31-bit groups. We demonstrate the mining process on real-life web data to show its capacity for mining interesting patterns from an interpretable compressed representation of sparse data.
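
The abstract does not spell out the bitmap encoding, so the following is only a minimal sketch of the general idea, assuming a word-aligned scheme in which each bitmap is cut into 31-bit groups and runs of all-zero groups are collapsed into a single counter. The function names (to_groups, compress, decompress, build_bitmaps, support) and the tagged-tuple word layout are hypothetical and are not taken from the paper. Each linked-to page gets one bitmap over the linking pages, and the support of a set of link targets is the number of pages that link to all of them.

```python
from functools import reduce
from operator import and_

GROUP_BITS = 31  # group width mentioned in the abstract


def to_groups(positions, n_bits):
    """Split a bit vector, given as the set of its 1-positions, into 31-bit groups."""
    n_groups = (n_bits + GROUP_BITS - 1) // GROUP_BITS
    groups = [0] * n_groups
    for p in positions:
        groups[p // GROUP_BITS] |= 1 << (p % GROUP_BITS)
    return groups


def compress(groups):
    """Assumed encoding: ('Z', n) is a run of n all-zero groups, ('L', v) a literal group."""
    words = []
    for g in groups:
        if g == 0 and words and words[-1][0] == "Z":
            words[-1] = ("Z", words[-1][1] + 1)  # extend the current zero run
        elif g == 0:
            words.append(("Z", 1))
        else:
            words.append(("L", g))
    return words


def decompress(words):
    """Expand compressed words back into the list of 31-bit groups."""
    groups = []
    for kind, value in words:
        if kind == "Z":
            groups.extend([0] * value)
        else:
            groups.append(value)
    return groups


def build_bitmaps(out_links, n_pages):
    """One compressed bitmap per link target; bit p is set if page p links to it."""
    positions = {}
    for page, targets in out_links.items():
        for t in targets:
            positions.setdefault(t, set()).add(page)
    return {t: compress(to_groups(ps, n_pages)) for t, ps in positions.items()}


def support(bitmaps, targets):
    """Number of pages that link to every target in `targets` (bitwise AND, then popcount)."""
    decoded = [decompress(bitmaps[t]) for t in targets]
    return sum(bin(reduce(and_, column)).count("1") for column in zip(*decoded))


# Toy data (made up for illustration): 1000 pages, only four of which have out-links.
out_links = {0: [5, 7], 40: [5, 7], 41: [5], 999: [5, 7]}
bitmaps = build_bitmaps(out_links, 1000)
print(bitmaps[5])
print(support(bitmaps, [5, 7]))
```

In this toy example the 33 groups needed for 1000 pages shrink to four words for target 5, because the 30 empty groups in the middle become one zero-run word, while each literal group can still be read directly as the set of linking pages it covers. A production encoding would typically pack the run counter and the literal into fixed-width machine words and combine bitmaps without decompressing them first; the sketch decompresses only to keep the logic short.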


Author Profile
Carson K. Leung
Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada

Author Profile
Connor C.J. Hryhoruk
Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada

Paper information

Publication year: 2022
Citations: 234
Country of publication: Canada
Site: IEEE
