Mental health care needs of caregivers of people with Alzheimer’s disease from online forum analysis

Introduction

Approximately 16 million Americans are informal unpaid caregivers for family members or close friends with Alzheimer’s disease and related dementias (ADRD)¹. Caring for people with ADRD is usually long-term and demanding. Nearly 60% of caregivers of people with ADRD have had caregiver responsibilities for over four years¹, and 60% anticipate continued caregiving in the next five years. About one in three of these informal caregivers is an older adult (aged 65 or older)¹. Studies report that the level of caregiving burden for people with ADRD is higher than with other diseases^2,3, and burdensome caregiving duties hinder caregivers from caring for themselves⁴. Thus, caregivers are often at high risk of mental distress, including depression, anxiety, and poor quality of life^5,6. Although the burden of caregiving and related mental distress have been studied in this population^7,8,9, investigations of mental health using unfiltered social media data from caregivers of ADRD patients are rare. Listening to caregivers’ voices is crucial for supporting them appropriately, because their own accounts may represent their values and needs better than the priorities set by clinicians¹⁰.

Qualitative research has been employed to listen to the caregivers’ needs, yet it has some limitations. First, caregiver interviews are challenging to conduct on a large scale. In addition, recruiting caregivers to take part in research studies can be difficult even with financial incentives, as they often have busy caregiving schedules¹¹. Furthermore, caregivers may be reluctant to share struggles with investigators¹². Therefore, obtaining genuine insights into the real-life concerns and mental distress of caregivers of ADRD patients through in-depth interviews can be challenging.

Online caregiving forums could be a promising data source for investigating caregivers’ genuine struggles because of the anonymity the forums provide, particularly for private concerns that caregivers are unsure where or with whom to discuss¹². Natural language processing/machine learning (NLP/ML)-based topic modeling approaches have been increasingly used to analyze large volumes of user-generated data, including posts from online forums, to examine various health issues (e.g., suicidal ideation and cochlear implants)^{10,13,14,15,16}. The direct voices of informal caregivers obtained from unfiltered discussions would contain rich and valuable information that is not easily available through conventional means, such as clinical settings or surveys. A detailed understanding of the burdens and mental stressors of caregivers may help customize need-based mental health support for caregivers, improve caregiver mental health outcomes and quality of life¹⁷, and ultimately improve the quality of care for ADRD patients^18,19.

To our knowledge, few prior studies have used systematic text-mining methods to analyze the discussions and sentiment of a large, text-based forum of caregivers. We hypothesize that (1) the online caregiving forum provides high-quality data that contains insights into mental distress and related stressors of informal caregivers of people with ADRD, and (2) the NLP/ML method could serve as a promising technique for large-scale thematic analyses for online forum data. Thus, this study aimed to identify novel trends in the caregiver mental stressors and care needs qualitatively and validate the results of our NLP/ML analysis via both a comparison to our qualitative analysis of the online caregiving forum, and a comparison to the existing literature. The findings have the potential to improve our understanding of informal caregivers’ genuine concern and mental distress in more detail, enabling us to prepare tailored support for this vulnerable population.

Methods

Data source: online caregiving forum

ALZConnected.org is a USA-based website for ADRD patients and their caregivers that is officially supported by the Alzheimer’s Association²⁰. Its caregiver forum is where caregivers, mostly informal caregivers of people with ADRD, actively seek and provide advice and information on caring for their loved ones⁴. The forum requires registration as a member to write an original post or to reply to a post. However, all forum posts are public and can be read by anyone without registration. This public forum was previously used for content analysis of other health topics among caregivers and ADRD patients²¹. The Institutional Review Board of Stanford University deemed this study exempt from human subjects research.

Data extraction: web-scraping

Web-scraping was used to retrieve public forum posts from March 1, 2018 to February 28, 2022. We scraped only publicly available information, including the title, main body, and replies of each post, the date the post was added, the username of the poster, and the date the user joined the forum. To selectively extract posts that contained caregivers’ mental distress and related stressors, we pre-specified ten mental health keywords: depression, anxiety health, little interest, hopelessness, nervousness, worrying, loneliness, mental health, and mental distress. The keywords were adapted from the terms of a widely used mental distress screening tool, the Patient Health Questionnaire-4, as well as from terms commonly used for mental distress in online health forums¹⁶. The scraper was programmed to retain a post if it contained any of the listed keywords, and the extracted information was imported into a Microsoft Excel spreadsheet.
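The keyword-based selection described above can be sketched as follows. This is a minimal illustration, not the authors' scraper: the post structure (a dictionary with `title` and `body` fields), the helper name `contains_keyword`, and the choice of case-insensitive substring matching are all assumptions, and only a few of the keywords named in the text are used.

```python
# A few of the keywords named in the text; the full list is not reproduced here.
KEYWORDS = ["depression", "hopelessness", "loneliness", "mental health"]


def contains_keyword(post, keywords=KEYWORDS):
    """Return True if any keyword occurs in the post title or body.

    Matching is a case-insensitive substring test, which is an assumption;
    the source does not say how matches were detected.
    """
    text = f"{post.get('title', '')} {post.get('body', '')}".lower()
    return any(keyword in text for keyword in keywords)


# Hypothetical posts, invented purely for illustration.
posts = [
    {"title": "Emotionally tired", "body": "The loneliness is the hardest part."},
    {"title": "Question about day programs", "body": "Any recommendations?"},
    {"title": "My mental health is slipping", "body": "I need a break."},
]

selected = [post for post in posts if contains_keyword(post)]
print(len(selected), "of", len(posts), "posts retained")
```

Substring matching is simple but also retains posts in which a keyword appears incidentally, which is one reason a later manual screening step remains useful.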

Feasibility of studying mental health care needs of caregivers using online forum data

Feasibility was determined by whether we were able to investigate informal caregivers’ mental distress, related stressors, and needed support through the online caregiving forum data, both qualitatively and quantitatively. The titles of the posts were screened for this purpose, given that they typically encapsulate the essence of the posts (e.g., “Emotionally tired,” “Feeling frustrated,” “Caregiver recovery program?”). Three independent researchers reviewed the titles for the feasibility assessment (JK, YC, and ZRC). The primary researcher (JK) reviewed the posts and categorized them according to whether the title was intended to (1) express the caregivers’ own mental distress or negative emotions, or (2) seek advice or support for their own mental and emotional distress. If a post qualified for both categories, it was assigned to the seeking advice category. If the title did not provide enough information to determine the category, the body of the post was read as needed. The secondary researcher (YC) reviewed and confirmed the categorization, adopting a member-checking approach. To further ensure the quality of the categorization, the third researcher (ZRC) validated it by randomly selecting 100 titles from the total posts to review and categorize independently. The categorization of the third researcher (ZRC) was compared with the original categorization (JK and YC), and intercoder reliability was reported as Cohen’s kappa.
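Cohen’s kappa compares the observed agreement between two coders with the agreement expected by chance from each coder’s label frequencies: kappa = (p_o − p_e) / (1 − p_e). The sketch below computes it for two label sequences. The function name and the example labels are hypothetical and are not the study’s data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: share of items on which both coders agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(
        counts_a[label] * counts_b[label] for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    # Undefined (division by zero) if both coders use a single identical label.
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical categorizations of ten titles by two coders.
coder_a = ["express"] * 8 + ["seek"] * 2
coder_b = ["express"] * 7 + ["seek"] * 3
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 2))
```

In this toy example the coders agree on nine of ten titles (p_o = 0.90), while chance agreement is p_e = 0.62, so kappa is lower than the raw agreement rate.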

Data analysis using natural language processing (NLP)

Natural language processing (NLP) is a subarea of linguistics, computer science, and artificial intelligence and is widely used for large data analysis in health-related fields^22,23. A machine learning (ML)-based NLP pipeline was used to perform three NLP techniques: tokenization, lemmatization and stemming, and topic modeling. For text pre-processing, we used functions from NLTK (Natural Language Toolkit). First, NLTK Word-Tokenize split the texts into individual tokens (e.g., Thanks for any insight here. → ‘Thanks’, ‘for’, ‘any’, ‘insight’, ‘here’, ‘.’). Second, we removed stopwords (e.g., ‘the’, ‘of’, ‘to’) listed in the NLTK corpus, along with additional stopwords defined at a later stage. This process also involved the removal of articles and punctuation. Third, an NLTK lemmatizer (e.g., WordNetLemmatizer) was used to reduce words to their base form (e.g., being → be, walks → walk, Thank → thank). Fourth, with these pre-processed words, topic modeling was conducted using Latent Dirichlet Allocation (LDA). LDA is a three-level hierarchical Bayesian model for collections of discrete data, including text corpora²⁴. When applied to text, LDA represents each document as a mixture of topics and each topic as a probability distribution over words, so that the highest-probability words characterize a topic. The LDA model was built and visualized using the ‘gensim’ and ‘pyLDAvis’ libraries, producing a set of topics, each consisting of the words most likely to belong to it. For the topic modeling, we built bigram and trigram models to capture two- or three-word groups that commonly appear together, to aid interpretation of the topic modeling results.
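The pipeline above can be illustrated with a dependency-free sketch. It is not the authors’ code: the study used NLTK for pre-processing and gensim for LDA, whereas this example substitutes a regular-expression tokenizer, a tiny hand-written stopword set, a lookup table in place of WordNetLemmatizer, and a collapsed Gibbs sampler in place of gensim’s LDA training. All names, hyperparameters, and example posts are hypothetical.

```python
import random
import re
from collections import Counter

# Tiny illustrative subsets; the study used the NLTK stopword corpus and
# WordNetLemmatizer, which this lookup table only imitates.
STOPWORDS = {"the", "of", "to", "for", "any", "here", "is", "a", "and", "my"}
LEMMAS = {"being": "be", "walks": "walk", "moved": "move", "sleeps": "sleep"}


def preprocess(text):
    """Tokenize, lowercase, drop punctuation and stopwords, then lemmatize."""
    tokens = re.findall(r"[A-Za-z]+|[^\w\s]", text)
    words = [token.lower() for token in tokens if token.isalpha()]
    return [LEMMAS.get(word, word) for word in words if word not in STOPWORDS]


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (topic-word, doc-topic)."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    for d, doc in enumerate(docs):  # random initial topic for every token
        assignments.append([rng.randrange(num_topics) for _ in doc])
        for word, z in zip(doc, assignments[d]):
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
    norm = len(vocab) * beta
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                z = assignments[d][i]  # remove the token's current assignment
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta) / (topic_total[k] + norm)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z  # record the resampled topic
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + norm) for w in vocab}
        for k in range(num_topics)
    ]
    theta = [
        [(row[k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for row, doc in zip(doc_topic, docs)
    ]
    return phi, theta


# Hypothetical posts, invented for illustration only.
posts = [
    "Mom moved to the facility",
    "Visiting the facility and the new place",
    "He sleeps all day and walks all night",
    "No sleep for me, he walks at night",
]
docs = [preprocess(post) for post in posts]
phi, theta = lda_gibbs(docs, num_topics=2)
for k, topic in enumerate(phi):
    print(k, sorted(topic, key=topic.get, reverse=True)[:3])
```

With so few documents the resulting topics are not meaningful; the sketch only shows how pre-processed tokens feed a topic model whose output is a word distribution per topic and a topic distribution per document.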

Validity assessment of the topics generated by the NLP/ML-based topic modeling

The validity assessment had two primary objectives. The first was to validate the performance of the NLP/ML model by determining whether it could generate topics representative of the discussions in the online forum. For this purpose, the topics generated by the NLP/ML topic modeling were compared with the themes generated qualitatively by human readers to see whether the modeled topics matched the qualitative themes. As a first step, two researchers labeled the NLP-modeled topics. The primary researcher (SJR) interpreted and labeled the topics using the word clouds from the NLP model. The secondary researcher (SO) reviewed and confirmed the labels using the member-checking approach. Disagreements were resolved through discussion. As a second step, we applied Thematic Analysis, which has been widely used to identify and analyze patterns of themes, topics, or ideas in online data from individuals with ADRD and their caregivers²¹. Two trained researchers qualitatively analyzed the text data; neither was involved in labeling the NLP topics, to avoid potential bias. The primary researcher (JK) read the posts carefully to generate tentative themes and refined the themes after multiple reviews. From the posts containing at least one mental health keyword, we initially selected 100 posts at random to create the initial theme codebook, then progressively added posts to the analysis until no new topics could be discerned. The secondary researcher (YC) independently read randomly selected posts to review, check, and further refine the initial codes. The two coders met to reconcile conflicts through discussion and finalize the codebook. Then, to validate the codes, the two researchers independently read the posts (body and title) and applied the identified codes to approximately 3% of the posts. The two coders discussed conflicts for reconciliation and theme confirmation, and a meeting with a third coder was planned in case the two primary coders were unable to resolve a disagreement.

Upon completion of the thematic analysis, a trained researcher (ZRC), who was involved in neither the thematic analysis nor the NLP topic labeling, assessed the validity of the NLP topic modeling approach for extracting key information from the large text data, as done previously²⁵. The NLP-modeled topics were matched against the themes from the qualitative analysis, and the comparison was reported as a table¹⁰.

The second objective was to assess whether the online caregiving forum content aligns with existing knowledge on caregivers’ mental distress and related stressors. For this purpose, we compared the NLP/ML-modeled topics and themes with a well-developed framework for caregiving burden and strain among informal caregivers of people with ADRD⁴. Two researchers (ZRC and JK), who know the context of the online caregiving forum well, independently examined whether the online forum content (NLP/ML topics and themes) was consistent with previously reported mental distress and related stressors of informal caregivers. The assessments were compared and reconciled through discussion, and the agreed validity assessment was reported as a comparison table.

Results

Description of the data source

We extracted posts from a period spanning from March 1, 2018, to February 28, 2022 (Fig. 1). The total number of posts collected was 60,812, composed of 8244 original posts and 52,568 reply posts from 5415 unique users. Among these, 5848 posts contained one or more of the ten designated mental health keywords and were used for topic modeling and qualitative analysis. On average, each unique user made 1.52 original posts. There was an average of 6.4 replies per original post.
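The summary figures in this paragraph follow directly from the reported counts, as the short check below shows. The variable names are illustrative; only the counts come from the text.

```python
# Counts reported in the text.
original_posts = 8244
reply_posts = 52568
unique_users = 5415

total_posts = original_posts + reply_posts
original_posts_per_user = original_posts / unique_users
replies_per_original_post = reply_posts / original_posts

print(total_posts)                          # 60812
print(round(original_posts_per_user, 2))    # 1.52
print(round(replies_per_original_post, 1))  # 6.4
```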

Fig. 1: Study design and data flow chart.

Feasibility of studying mental health care needs of caregivers using online forum data

We screened the titles of all posts that contained at least one mental health keyword in either the title or the body. Of a total of 5848 posts (original or reply), 963 posts were identified by three researchers as eligible for studying mental distress, or situations that could elicit mental distress, in informal caregivers of people with ADRD. Approximately 93% (894 of 963 posts) were considered posts intended to express negative emotions, and the rest (7%, 69 of 963 posts) were categorized as posts seeking specific advice or resources to cope with caregiver distress. Cohen’s kappa was 0.90 (almost perfect agreement). Table 1 presents example titles that repeatedly appeared among the eligible titles.
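The reported shares can be reproduced from the counts in this paragraph. In the sketch below, the share of keyword-containing posts judged eligible is a derived figure that the text does not state; the variable names are illustrative.

```python
# Counts reported in the text.
keyword_posts = 5848
eligible_posts = 963
expressing_distress = 894
seeking_advice = 69

share_expressing = 100 * expressing_distress / eligible_posts
share_seeking = 100 * seeking_advice / eligible_posts
# Derived here, not reported in the text: eligible share of keyword posts.
share_eligible = 100 * eligible_posts / keyword_posts

print(round(share_expressing), round(share_seeking))  # 93 7
print(round(share_eligible, 1))                       # 16.5
```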

Table 1 Example titles that showed the mental health care needs of informal caregivers of a person with ADRD from the online discussion forum^a

Results of the NLP/ML-based topic modeling

The NLP/ML-based topic modeling produced the eight most salient topics from the original posts, with ten keywords for each topic (Table 2). The eight topics represent the most significant and frequent discussions in the online forum, including caregiving duty and burden (e.g., Topic 2: talk, call, work, sit, sleep, and eat), coping strategies and caregiver support (e.g., Topic 5: heart, love, peace, light, good, and thought), and institutionalization (e.g., Topic 7: move, place, facility, home, house, and pay). The ten keywords for each topic were visualized as word clouds in Supplementary 1.

Table 2 Representative topics of informal caregivers’ concern and distress from online discussion forum generated by NLP/ML topic modeling

The main themes of the online caregiver forum created by the qualitative analysis

Approximately 3% of the total posts were analyzed, leading to the identification of two primary themes with eight subthemes (for care recipients: symptoms, medications, relocation, care duty share, new diagnosis, and conversation strategy with a person with dementia [PWD]; for caregivers: caregiver burden and caregiver support) (Table 3).

Table 3 Representative themes of informal caregivers’ primary issues from online discussion forum generated by qualitative analysis

Validity of the NLP/ML topic modeling

All eight major topics generated by the NLP/ML topic modeling aligned with the qualitatively defined themes (Table 4). However, one theme, medication (Theme 2), was not matched with any of the eight NLP/ML topics.

Table 4 The comparisons of informal caregivers’ concern and distress between the NLP/ML generated topics and qualitatively defined themes

Validity of online discussion forum data

The online discussion forum content, which was represented by the eight NLP/ML-generated topics validated through manually defined themes, was successfully aligned with the existing framework of caregiver concerns and stress⁴. Specifically, these topics were matched with six primary categories of the framework, including physical and psychological morbidity, social isolation, lack of support, nursing home admission, predictors, and protectors of caregiver distress (disease severity and perception and experience of caregiving role) (Table 5).

Table 5 Comparison of caregiving stress and mental distress of family caregivers of a person with ADRD from the online discussion forum with existing framework⁴

Discussion

We assessed the feasibility of studying the mental health care needs of informal caregivers of people with ADRD by mining online caregiving forum data. Our findings demonstrate that this methodology, applied to public online forums, is valuable in identifying caregivers’ mental distress and related stressors. Furthermore, the NLP/ML-generated topics provided useful representative categories and were mostly consistent with our qualitatively defined themes and the existing framework for caregivers’ mental distress and stressors. However, we also observed a limitation of the NLP/ML topic modeling: it was unable to detect a topic discussed with heterogeneous words, even though human readers defined that topic as a main theme. Overall, our findings highlight that the use of public online forum data from caregivers and patients could be a promising approach for gaining insights to support family caregivers.

We examined whether the content of the online discussion forum was qualitatively and quantitatively sufficient to advance our understanding of the mental distress and related stressors of informal caregivers of people with ADRD. This rich dataset provided a comprehensive knowledge of the challenges and emotional experiences faced by caregivers. Among those, the majority of the discussions focused on simply expressing their emotional distress or sharing challenges they experienced day to day as a catharsis. Yet, some caregivers specifically asked about coping with mental distress and recommended self-care for themselves. Moreover, we found that there were repeatedly expressed emotions besides depression and anxiety, which were “venting,” “mad,” “frustrated,” and “exhausted.” These findings suggest that further examination of online caregiver forum contents could illuminate the intensity of mental and emotional distress and allow us to identify common situations related to these negative emotions. Understanding the perspectives of informal caregivers of people with ADRD can deepen our insight into their caregiving burden, enabling the design of interventions that meet their needs¹⁶. Given that caregiving for ADRD patients is typically a long-term and burdensome commitment, amplifying their voices to provide tailored support could improve their mental health outcomes and quality of life. This could, in turn, potentially enhance the quality of care for ADRD patients¹⁸.

Additionally, we identified a few posts that showed caregivers’ severe mental distress through extreme expressions (e.g., “suicide”). Although fellow community members sent supportive messages in these cases, some of the authors of these posts seemed to require immediate attention and external support. This suggests that it is important to discuss the potential role of online caregiving communities in identifying caregivers at high risk of severe mental distress and facilitating timely support, as these communities become an essential resource for caregiving^16,26.

In this study, we observed that the NLP/ML-modeled topics of the online caregiving forum were valid, meaning that they were representative of the discussion forum and consistent with existing knowledge on caregivers’ mental distress and related stressors⁷. In particular, Brodaty et al.’s systematic categorization of family caregiver distress among caregivers of people with ADRD included physical and psychological morbidity (e.g., poor sleep quality, chronic disease condition), social isolation, lack of support (instrumental, emotional, and informational), institutionalization (e.g., financial burden, guilt, and depression), disease severity (e.g., behavioral problems), and experience of the caregiving role⁴. All eight major topics generated by the NLP/ML algorithms corresponded well with these known concerns and stressors of informal caregivers. The findings highlight that NLP/ML-enabled analysis of online forum data could benefit caregiver research in assessing caregivers’ further issues and needed support, given that it is less resource-intensive than the conventional qualitative approach.

Notably, there were prominent discussions around the relocation of a loved one with ADRD in the online forum. The content revealed that family caregivers of those with ADRD often face significant challenges before and after institutionalization, both physically (e.g., sleep deprivation) and psychosocially (e.g., feelings of frustration, guilt, and depression). Caregivers sought advice on when, where, and how to facilitate the transition, and grappled with emotional distress post-institutionalization, often exacerbated by verbal abuse from loved ones blaming them for the move. These insights provide a detailed understanding of the challenges faced by family caregivers, aspects that are not frequently explored in existing literature. Our findings highlight the urgent need for considerable informational and emotional support for informal caregivers²⁷, especially those in the process of institutionalization.

One difference between the NLP/ML topics and the qualitative themes needs to be noted. The topic modeling did not produce a topic on medication use, although the qualitative analysis defined it as a major theme. One possible reason is that people mentioned specific product names instead of the general term ‘medication’, so the NLP/ML model was not able to capture this discussion. This indicates that results of NLP/ML topic modeling should be interpreted with caution when the content uses heterogeneous words. Further research is needed on the systematic validation of topic modeling as it becomes a promising tool in health research.

The limitations of the study need to be acknowledged. We used data from a public online forum, which makes it hard to guarantee the authenticity of the posts²⁸. It is possible that some users were not genuine caregivers of people with ADRD. Additionally, while our sample size was sufficient, our findings might not be generalizable to caregivers who do not use this specific online forum. Despite the rigorous validation procedure we followed, the machine learning algorithms might have limitations in fully capturing the subtleties and complexities of human emotions and feelings, which can lead to some degree of misclassification or oversimplification of the topics²⁹. Further research is required to enhance the accuracy and sensitivity of these tools to better interpret the vast and complex emotional landscapes within the caregiving community. Furthermore, as the data used in our analysis were publicly available and anonymous, we had limited demographic information about the caregivers participating in the forum. Thus, we were unable to explore possible associations between caregivers’ mental health issues and demographic factors such as age, gender, and socioeconomic status, or their relationship to the person with ADRD. In addition, we used ten predefined mental health keywords to extract posts related to caregivers’ mental distress. Hence, it is possible that not all relevant posts were included in this study. Lastly, given that a word cloud can be interpreted in different ways, precise labeling was challenging. Thus, the labels might not cover all the representative discussions.

In sum, the online caregiver forum data and the NLP/ML topic modeling enabled us to study mental distress and needed support from informal caregivers of people with ADRD. The findings from rigorous validation shed light on the potential of NLP/ML-based text analysis of the online discussion forum for informal caregiver research that can further assess needed support for this vulnerable population. This is meaningful because the online platforms provide a unique chance to access the voices and perspectives of caregivers who may not readily disclose their mental health concerns in conventional research settings¹². The approach could also be applied to other online communities for patients and caregivers³⁰, opening new opportunities for using patient and caregiver-generated data to provide need-based tailored support.
