Leveraging Big Data for Early Detection of Depression: Developing a Machine Learning Model Using Tweets
Keywords:
Depression, social media, sentiment analysis, machine learning, natural language processing
Abstract
This study explores the use of machine learning algorithms for detecting depression in social media data. A literature review was conducted to identify the approaches and techniques used in the field. Over 100,000 tweets were collected from Twitter using keywords related to depression. Each tweet was labeled by sentiment polarity, giving 18,730 negative, 44,272 neutral, and 38,815 positive tweets. Eight machine learning models were applied to the dataset for classification: SVM, Naive Bayes, Random Forest, KNN, Decision Tree, XGBClassifier, MultinomialNB, and Logistic Regression. The performance of each model was evaluated using accuracy, precision, recall, and F1-score. Random Forest achieved the highest accuracy at 88.38%, followed by Support Vector Machine (SVM) at 86.95%. The results suggest that machine learning models can be effective in detecting depression in social media data and can help identify individuals who may be at risk of depression.
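The evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the function name `classification_report`, the macro-averaging choice, and the six toy labels are assumptions made for illustration, and the abstract does not say how the per-class scores were averaged.

```python
from collections import Counter

# The three polarity classes named in the abstract.
LABELS = ("negative", "neutral", "positive")


def classification_report(y_true, y_pred, labels=LABELS):
    """Return accuracy plus per-class and macro-averaged precision, recall, F1.

    Hypothetical helper for illustration; macro averaging is an assumption.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    per_class = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f_den = precision + recall
        f1 = 2 * precision * recall / f_den if f_den else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    # Macro average: unweighted mean over classes.
    macro = {
        metric: sum(per_class[label][metric] for label in labels) / len(labels)
        for metric in ("precision", "recall", "f1")
    }
    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    return {"accuracy": accuracy, "per_class": per_class, "macro": macro}


# Toy labels, invented for illustration only (not data from the study).
y_true = ["negative", "negative", "neutral", "neutral", "positive", "positive"]
y_pred = ["negative", "neutral", "neutral", "neutral", "positive", "negative"]

report = classification_report(y_true, y_pred)
print(f"accuracy = {report['accuracy']:.4f}")
for label in LABELS:
    scores = report["per_class"][label]
    print(label, {name: round(value, 4) for name, value in scores.items()})
```

Because the three classes are unevenly sized (18,730 negative against 44,272 neutral tweets), per-class precision and recall show whether a model handles the smaller negative class well, which accuracy alone does not.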