Exploring the impact of preprocessing and feature extraction on deep learning-based sentiment analysis for big data in Apache Spark


Research field: Databases



Venue: Progress in Artificial Intelligence


Abstract

Sentiment analysis on big data is challenging because of the volume of unstructured text involved. Traditional single-node systems struggle at this scale, which motivates distributed computing systems such as Apache Spark. This study investigates the role of large-scale data preprocessing and feature extraction in sentiment analysis tasks. We ran a comprehensive set of experiments with four preprocessing techniques and two word vectorization methods to evaluate their impact on the performance of Multi-Layer Perceptrons (MLPs) in Apache Spark. Our results indicate that the choice of preprocessing and feature extraction methods significantly influences model performance. Furthermore, our MLP architecture demonstrated both computational scalability and high accuracy in Apache Spark. These findings highlight the importance of large-scale data preprocessing and feature extraction for sentiment analysis on big data, and the effectiveness of MLPs in Apache Spark for these tasks.
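The abstract names neither the four preprocessing techniques nor the two vectorization methods, so the following is only a minimal sketch under stated assumptions: URL removal, lowercasing, punctuation stripping, and stop-word removal stand in for the preprocessing steps, and a hashed term-frequency vector stands in for feature extraction. The function names, the stop-word list, and the vector size are all hypothetical, and the code runs in plain Python rather than in Apache Spark.

```python
import re
import zlib

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "of", "to"}


def preprocess(text, lowercase=True, strip_urls=True,
               strip_punctuation=True, remove_stop_words=True):
    """Turn raw text into tokens; each step can be toggled independently."""
    if strip_urls:
        # Remove URLs first, before punctuation stripping breaks them apart.
        text = re.sub(r"https?://\S+", " ", text)
    if lowercase:
        text = text.lower()
    if strip_punctuation:
        # Replace anything that is not a word character or whitespace.
        text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    if remove_stop_words:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return tokens


def hashed_term_frequencies(tokens, num_features=16):
    """Map tokens to a fixed-length count vector using a stable hash."""
    vector = [0] * num_features
    for token in tokens:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(token.encode("utf-8")) % num_features
        vector[index] += 1
    return vector


review = "This movie is GREAT, see https://example.com and it is great!"
tokens = preprocess(review)
features = hashed_term_frequencies(tokens)
print(tokens)
print(features)
```

Toggling the keyword arguments of `preprocess` gives a simple way to compare preprocessing variants on the same input, which is the kind of comparison the study describes. The resulting fixed-length vectors are the sort of input an MLP classifier would consume.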


Author Profile
Ibtissam Youb

Department of Electronic, CCPS Laboratory, ENSAM, University of Hassan II, Casablanca, Morocco

Morocco
Author Profile
Sebastián Ventura

Department of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain

Andorra
Author Profile
Mohamed Hamlich

Department of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain

Andorra

Paper information

Publication year: 2024
Citations: 2
Publication country: Morocco, Andorra
Site: Springer