Research field: Databases
Venue: Progress in Artificial Intelligence
Sentiment analysis on big data presents unique challenges due to the volume of unstructured data. Traditional single-node systems struggle at this scale, necessitating distributed computing systems such as Apache Spark. This study investigates the role of large-scale data preprocessing and feature extraction in sentiment analysis tasks. We conducted a comprehensive set of experiments using four preprocessing techniques and two word vectorization methods to evaluate their impact on the performance of Multi-Layer Perceptrons (MLPs) in Apache Spark. Our results indicate that the choice of preprocessing and feature extraction methods significantly influences model performance. Furthermore, our MLP architecture demonstrated both computational scalability and high accuracy in Apache Spark. These findings highlight the importance of large-scale data preprocessing and feature extraction in sentiment analysis on big data, and the effectiveness of MLPs in Apache Spark for these tasks.
| Publication year | 2024 |
|---|---|
| Citations | 2 |
| Publication country | Morocco, Andorra |
| Site | Springer |
| Likes | 0 |