Whistlerlib: a distributed computing library for exploratory data analysis on large social network datasets


Research field: Networking



Venue: Multimedia Tools and Applications


Abstract

At least 350k posts are published on X, 510k comments are posted on Facebook, and 66k pictures and videos are shared on Instagram each minute. Datasets of this size require substantial processing power, even if only a fraction is collected for analysis and research. To meet this challenge, data scientists can now use computer clusters deployed on various IaaS and PaaS cloud services. However, they still have to master the design of distributed algorithms and be familiar with distributed computing frameworks. It is thus essential to develop tools whose analysis methods exploit computer clusters for processing large amounts of social network text. This paper presents Whistlerlib, a new Python library for conducting exploratory analysis on large text datasets from social networks. Whistlerlib implements distributed versions of various social media, sentiment, and social network analysis methods that can run atop computer clusters. We experimentally demonstrate the scalability of the Whistlerlib distributed methods when deployed on a public cloud platform. We also present a practical example, the analysis of posts on the social network X about the Mexico City subway, to showcase the features of Whistlerlib in scenarios where social network analysis tools are needed to address issues with a social dimension.


Author Profile
Alberto Garcia-Robledo

CONAHCYT-CentroGeo Unidad Querétaro Parque Sanfandila s/n Santiago de Querétaro 76703 Querétaro México

Author Profile
Angelina Espejel-Trujillo

CONAHCYT-CentroGeo Unidad Querétaro Parque Sanfandila s/n Santiago de Querétaro 76703 Querétaro México


Paper information

Publication year: 2024
Citations: 0
Country of publication: Germany
Site: Springer
