Task-sharing and telemedicine delivery of psychotherapy to treat perinatal depression: a pragmatic, noninferiority randomized trial
Main
One in five women experience depression or anxiety during the perinatal period (pregnancy up to the year following childbirth)1,2. Treatment is essential given the negative, long-term and intergenerational impact on maternal and child developmental outcomes3. Brief psychotherapies are first-line, evidence-based treatments4,5. They are preferred by perinatal populations over pharmacotherapy6 and recommended by major clinical guidelines such as those of the UK National Institute for Health and Care Excellence7, the US Preventive Services Task Force8 and the Canadian Network for Mood and Anxiety Treatments9,10. However, access is limited, with barriers including costs, stigma and the inequitable distribution of mental health professionals11. As a result, only 10% of affected perinatal patients in high-income countries (HICs) receive psychotherapy12.
Both task-sharing and telemedicine are scalable, patient-centered solutions to improve access to psychotherapy. Scalability is defined as the potential of an intervention to reach large numbers of individuals13. In task-sharing, nonspecialist treatment providers (individuals without a specialized degree or prior experience delivering mental healthcare) are trained to deliver brief, manualized psychotherapies and have been shown to treat a range of mental health conditions worldwide14,15. A previous systematic review16 yielded 45 randomized controlled trials of nonspecialist-delivered psychological treatments for perinatal populations with common mental health conditions. The results suggested that nonspecialists, namely nurses and midwives, could be trained to deliver psychological treatments for perinatal populations with depressive and anxiety symptoms in HICs. However, in both low- and middle-income countries and HICs, most trials used an inactive control group (for example, a waitlist) and no trials, to our knowledge, evaluated whether different provider types were able to deliver the same treatments comparably. Thus, while this approach has been shown to reduce perinatal depressive and anxiety symptoms worldwide14,16, the relative effectiveness of nonspecialists compared with specialist providers remains unknown.
Using telemedicine, psychotherapies can offer patients flexibility by reducing barriers such as scheduling, transportation and childcare11,17. Telemedicine-delivered psychotherapies are generally preferred by perinatal patients and are efficacious in reducing perinatal depressive and anxiety symptoms18,19. While the use of telemedicine rose dramatically during the COVID-19 pandemic, meaningful comparisons of telemedicine-delivered psychotherapy with in-person delivery have been impeded by trials that were inadequately powered to assess effectiveness in common mental disorders. Thus, it is unclear whether telemedicine-delivered psychotherapy is a reasonable alternative to in-person psychotherapy, particularly for perinatal populations. Further, to our knowledge, no single trial has compared both task-sharing and telemedicine with specialists and in-person care in real-world healthcare settings. The combination of these two effective and patient-centered solutions has the potential to address the substantial and growing treatment gap for perinatal populations and beyond.
In the Scaling Up Maternal Mental healthcare by Increasing access to Treatment (SUMMIT) trial, we compared the effectiveness of provider (nonspecialist versus specialist) and modality (telemedicine versus in-person) in the delivery of one psychotherapy known as behavioral activation (BA) therapy. BA is a relatively brief20, evidence-based treatment that has been implemented effectively by a range of nonspecialists globally21,22,23. BA was selected because of its well-established efficacy for depressive and anxiety symptoms in both general and perinatal populations24,25, and its comparable efficacy to both antidepressant medication26,27 and longer courses of psychotherapy27,28. The SUMMIT open-access treatment manual is available online at no cost (www.thesummittrial.com). The current study focused on objectives that assessed outcomes at 3 months post-randomization.
The primary objectives examined (1) whether BA delivered by nonspecialists was noninferior to BA delivered by specialists in treating perinatal depressive symptoms and (2) whether BA delivered via telemedicine was noninferior to in-person BA in treating perinatal depressive symptoms. The secondary objectives assessed (1) whether BA delivered by nonspecialists was noninferior to BA delivered by specialists in treating perinatal anxiety symptoms and (2) the moderating effects of symptom severity (mild, moderate or severe) on the comparative effectiveness of providers and modality types on depressive and anxiety symptoms. The exploratory analyses examined (1) the noninferiority of key process variables and (2) whether participants recruited during different trial phases could be combined given recruitment occurred throughout the COVID-19 pandemic.
Results
Enrollment and participant characteristics
From 8 January 2020 to 4 October 2023, 3,629 individuals were approached from five clinical sites (Mount Sinai Hospital, Women’s College Hospital, St. Michael’s Hospital, University of North Carolina and Endeavor Health); 1,543 agreed to participate and 1,512 completed a second, more detailed screening for which consent was required. N = 1,230 participants were enrolled and randomized into the trial (Fig. 1 and Extended Data Figs. 1 and 2): 472 were assigned to the nonspecialist-telemedicine arm, 469 to specialist-telemedicine, 145 to nonspecialist in-person and 144 to specialist in-person. The 3 month follow-up occurred from March 2020 to February 2024. All analyses in this paper involved outcomes assessed at 3 months post-randomization.

Participants (N = 1,230) were randomly assigned to one of four arms, and all were offered BA. LTF, lost to follow-up; NSP, nonspecialist provider; SP, specialist provider; TM, telemedicine; IP, in-person. *Includes one participant who was a screen fail. **Due to the COVID-19 pandemic, recruitment to in-person (NSP–IP and SP–IP) occurred during the following time periods: January 2020 to March 2020, July 2021 to January 2022 and April 2022 to April 2023 (Extended Data Fig. 1). ***Reflects participants who completed the primary outcome (EPDS). ±Per protocol analyses were only conducted for modality (TM versus IP) comparisons because protocol deviations were defined as instances where participants were switched from IP to TM due to the COVID-19 pandemic. ¥Imputed analyses including the full sample (N = 1,230) were also conducted for all ITT and PP analyses (see Extended Data Fig. 2 for a flow chart by condition).
Baseline demographic and clinical characteristics are described in Table 1. Participants had a mean age of 33.27 (95% confidence interval (CI) 33.00–33.55) years and predominantly identified as cis-women (1,168/1,173, 99.57%); Black, Indigenous or Persons of Color (578/1,226, 47.15%); and nulliparous (668/1,226, 54.49%). Most participants (1,051/1,226, 85.73%) reported a history of depression or anxiety, and almost one-quarter (288/1,226, 23.49%) were taking psychotropic medications at enrollment. Baseline characteristics were balanced between the four arms; however, most participants expressed an initial preference for being treated by a specialist (n = 730, 60.68%) and via telemedicine (n = 747, 62.09%).
Participants attended a mean of 6.20 (95% CI 6.05–6.36) BA sessions, with the vast majority (982 of 1,230 or 80% across sites) reaching treatment completion status, defined as completing a minimum of four sessions over a 120 day period. While there were no statistically significant differences in treatment dosage (number of sessions attended) between provider conditions (nonspecialist 6.06 versus specialist 6.36), those randomized to telemedicine attended significantly more sessions than those randomized to in-person BA (6.55 versus 5.07, t(1,228) = −8.15, P < 0.001). Further, BA fidelity scores were significantly higher for nonspecialists than specialists, with no differences between modalities (Supplementary Table 1). Neither treatment fidelity nor treatment modality preference was statistically correlated with patient depressive or anxiety symptoms at 3 months post-randomization (Supplementary Table 2).
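The treatment completion definition above can be expressed as a simple rule. The following is a minimal sketch only, not the trial's analysis code: the function name `reached_completion` and its parameters are hypothetical, and because this section does not state when the 120 day period begins, the window start is left as an input chosen by the caller.

```python
from datetime import date, timedelta


def reached_completion(session_dates, window_start, min_sessions=4, window_days=120):
    """Return True if at least `min_sessions` sessions fall within the window.

    Assumption: the window runs from `window_start` for `window_days` days;
    the source text does not specify the reference date for the window.
    """
    window_end = window_start + timedelta(days=window_days)
    sessions_in_window = [d for d in session_dates if window_start <= d <= window_end]
    return len(sessions_in_window) >= min_sessions


# Hypothetical participant with six weekly sessions
start = date(2021, 1, 4)
weekly_sessions = [start + timedelta(weeks=i) for i in range(6)]
completed = reached_completion(weekly_sessions, window_start=start)

# Completion proportion reported in the text: 982 of 1,230 participants
completion_percent = round(982 / 1230 * 100, 1)
print(completed, completion_percent)
```

With the reported counts, the completion proportion is approximately 79.8%, which the text rounds to 80%.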
Primary outcomes
Providers: nonspecialists versus specialists
The primary outcome was the Edinburgh Postnatal Depression Scale29 (EPDS) at 3 months post-randomization. In the intention-to-treat (ITT) analyses comparing providers (EPDS: nonspecialist 9.27 (95% CI 8.85–9.70) versus specialist 8.91 (95% CI 8.49–9.33); absolute difference in EPDS means, 0.36), the upper limit of the 95% CI for the difference in EPDS means (0.86) did not exceed the 10% noninferiority margin (EPDS 0.89). Thus, noninferiority of nonspecialist- to specialist-delivered psychotherapy was met. A per protocol (PP) analysis for the comparison of nonspecialist versus specialist was not carried out because no participants switched from one type of provider to another, so the PP population was the same as the ITT population. Figure 2a illustrates the nonadjusted mean EPDS scores for both provider conditions over time.
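The noninferiority decision rule used in this comparison can be sketched in a few lines. This is an illustration under stated assumptions rather than the trial's analysis code: the function name `is_noninferior` is hypothetical, and the inputs are the ITT summary values reported in this section (higher EPDS scores indicate more severe symptoms, so only the upper limit of the CI is compared with the margin).

```python
def is_noninferior(upper_ci_limit: float, margin: float) -> bool:
    """Noninferiority is met when the upper 95% CI limit for the difference
    (new condition minus reference condition) does not exceed the margin."""
    return upper_ci_limit <= margin


# Reported ITT values for the provider comparison (EPDS at 3 months)
nonspecialist_mean = 9.27
specialist_mean = 8.91
upper_ci_limit = 0.86  # upper limit of the 95% CI for the difference in means
margin = 0.89          # reported 10% noninferiority margin in EPDS points

difference = round(nonspecialist_mean - specialist_mean, 2)
provider_noninferior = is_noninferior(upper_ci_limit, margin)
print(f"difference = {difference}, noninferior = {provider_noninferior}")
```

The point estimate of the difference does not enter the rule directly; a comparison can favor the reference condition on average and still meet noninferiority, provided the whole CI lies at or below the margin.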

a,b, The primary outcome was depressive symptom scores, as assessed by the EPDS at 3 months post-randomization, by provider (a) and modality (b). The t statistics and P values correspond to two-sample t-tests of the difference in depressive symptom scores at baseline, which showed no significant differences between conditions. We tested for a modality by provider interaction (P = 0.93) for our primary (EPDS) outcome and found it to be nonsignificant in all cases.
Modality: telemedicine versus in-person
In the ITT analyses comparing treatment modality (EPDS: telemedicine 9.15 (95% CI 8.79–9.50) versus in-person 8.92 (95% CI 8.38–9.45); absolute difference in EPDS means, 0.23), the upper limit of the 95% CI for the difference in EPDS means (0.77) did not exceed the 13% noninferiority margin (EPDS 1.16) at 3 months post-randomization. Due to institutional pandemic-related restrictions, 21 participants were switched from in-person to telemedicine and reconsidered for the PP analyses. Similar results were found for the PP analyses comparing treatment modality (EPDS: telemedicine 9.17 (95% CI 8.82–9.53) versus in-person 8.81 (95% CI 8.26–9.36); absolute difference in EPDS means, 0.36): the upper limit of the 95% CI for the difference in EPDS means (0.91) was below the noninferiority margin (EPDS 1.15; Table 2). Thus, noninferiority of telemedicine to in-person psychotherapy was met. Figure 2b illustrates the nonadjusted mean EPDS scores for both modalities over time.
In both comparisons, noninferiority was also met when analyses were conducted with imputed data, adjusted (regression) models (Table 2) and without outliers (Supplementary Table 6). Further, we tested for a modality by provider interaction and found it to be nonsignificant (EPDS: P = 0.93; Fig. 2), suggesting that results comparing provider and modality held when considering modality and provider type respectively. These results also held across sites and when considering other important covariates such as clinical baseline severity, race and ethnicity, site, treatment preference for provider or modality, and psychotropic medication use (see Supplementary Table 7 for the full list of covariates considered in the adjusted analyses).
Secondary outcomes
Perinatal anxiety symptoms
Figure 3 presents nonadjusted mean Generalized Anxiety Disorder30 (GAD-7) scores over time for providers and modalities. Noninferiority was met when comparing providers (ITT GAD-7: nonspecialist 6.44 (95% CI 6.01–6.86) versus specialist 6.36 (95% CI 5.95–6.78)) and modalities (ITT GAD-7: telemedicine 6.43 (95% CI 6.09–6.78) versus in-person 6.29 (95% CI 5.71–6.88)) on anxiety symptoms for all analyses at 3 months post-randomization (Table 2), except for the PP analyses comparing telemedicine versus in-person, in which noninferiority was not met on anxiety symptoms unless outliers were removed (Supplementary Table 6). The small absolute differences between modalities for ITT (0.14) and PP analyses (0.30) fall below the range that others have suggested is clinically meaningful (that is, GAD-7 1.5–4)31, and the modality by provider interaction was nonsignificant (GAD-7: P = 0.71; Fig. 3).

a,b, The secondary outcome was anxiety symptom scores, as assessed by the GAD-7 at 3 months post-randomization by provider (a) and modality (b). The t statistics and P values correspond to two-sample t-tests of the difference in anxiety symptom scores at baseline, which showed no significant differences between conditions. We tested for a modality by provider interaction (P = 0.71) for our secondary (GAD-7) outcome and found it to be nonsignificant in all cases.
Clinical severity
We found no statistically significant differences for symptoms of depression (Fig. 4) or anxiety (Extended Data Fig. 3) at 3 months post-randomization when comparing providers (EPDS: mild: F = 0.09, P = 0.7619; moderate: F = 2.74, P = 0.0983; severe: F = 0.66, P = 0.4175) and modalities (EPDS: mild: F = 0.53, P = 0.4657; moderate: F = 0.26, P = 0.6076; severe: F = 2.59, P = 0.1090), accounting for baseline clinical severity. Scores decreased significantly between baseline and posttreatment in all three symptom severity groups (Supplementary Tables 10 and 11).

a,b, Clinical severity was assessed using baseline EPDS scores, and the change over time within each severity group did not differ significantly between conditions by provider (a) or modality (b). Severity groups based on baseline EPDS score: mild (10–11), moderate (12–19) and severe (20–30). The F statistics and P values correspond to the time by condition interaction term from a linear mixed model.
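The severity grouping stated in the caption can be written as a small lookup. This is a minimal sketch, assuming integer EPDS total scores on the scale's 0–30 range; the function name `epds_severity_group` and the label for scores below 10 are hypothetical and are included only because the trial's eligibility cutoff was EPDS ≥10.

```python
def epds_severity_group(baseline_epds: int) -> str:
    """Map a baseline EPDS total score to the severity groups used in the caption."""
    if not 0 <= baseline_epds <= 30:
        raise ValueError("EPDS total scores range from 0 to 30")
    if baseline_epds < 10:
        # Below the trial's eligibility cutoff (EPDS >= 10); label is illustrative
        return "below eligibility cutoff"
    if baseline_epds <= 11:
        return "mild"
    if baseline_epds <= 19:
        return "moderate"
    return "severe"


for score in (10, 15, 24):
    print(score, epds_severity_group(score))
```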
Safety
Eighteen serious adverse events (SAEs) (fetal or infant death, 6; hospitalization, 5; hospitalization and life-threatening event, 3; other serious important medical event, 2; and life-threatening event, 2) and two adverse events (AEs) were identified before or at 3 months post-randomization. All were reviewed by an independent Data Safety and Monitoring Board (DSMB) and none were deemed directly related to, or a result of, the trial. We found no statistical differences in safety events between provider or modality conditions (Supplementary Table 12).
Exploratory outcomes
Noninferiority between both provider and modality comparisons was met for all exploratory outcomes including client satisfaction (nonspecialist 3.40 (95% CI 3.34–3.46) versus specialist 3.44 (95% CI 3.39–3.49); telemedicine 3.43 (95% CI 3.39–3.47) versus in-person 3.38 (95% CI 3.29–3.46)), therapeutic alliance (nonspecialist 4.20 (95% CI 4.13–4.27) versus specialist 4.12 (95% CI 4.05–4.19); telemedicine 4.17 (95% CI 4.11–4.22) versus in-person 4.13 (95% CI 4.02–4.24)), perceived social support (nonspecialist 5.53 (95% CI 5.43–5.62) versus specialist 5.46 (95% CI 5.35–5.57); telemedicine 5.48 (95% CI 5.40–5.56) versus in-person 5.54 (95% CI 5.39–5.69)) and quality of life at 3 months post-randomization (Table 2). In addition, the interaction terms between providers (nonspecialist versus specialist) and study phase (Methods) in relation to EPDS and GAD-7 scores were not significant, supporting that participants recruited across trial phases could be combined (Supplementary Tables 13 and 14).
Sensitivity analyses
A priori planned sensitivity analyses accounted for differences in data collection time points and outcome variables (for example, participants who started or had a change in psychotropic medication between baseline and subsequent treatment sessions). In all analyses, no significant differences were found between provider or modality condition.
Post hoc analyses
Post hoc sensitivity analyses included a comparison of participants who received no sessions with those who received one or more sessions, in terms of their depressive or anxiety symptom scores at 3 months post-randomization (Extended Data Figs. 4 and 5). Those who received one or more sessions had significantly lower EPDS scores than those who received no sessions (EPDS 8.98 versus 11.03, P < 0.01). In addition, we examined the intraclass correlation coefficient (ICC) among providers and found the values to be small in magnitude (for primary outcome EPDS, ICC 0.02 (95% CI 0.00–0.08) and for secondary outcome GAD-7, ICC 0.00 (95% CI 0.00–0.00)). This small ICC may be expected given the large number of providers (n = 67) delivering the treatment across sites, and because the treatment was delivered individually and therefore participants were unlikely to interact with each other. Finally, we also conducted a sensitivity analysis to compare modality (telemedicine versus in-person) using the original noninferiority margin of 10% for our primary outcome (EPDS). The upper bound of the 95% CI was EPDS 0.77 and fell below the noninferiority margin of EPDS 0.89; thus, noninferiority was met using the original noninferiority margin on the primary outcome of EPDS.
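The relationship between the percentage margins and their values in EPDS points can be illustrated with a short calculation. This sketch rests on an assumption that is not stated in this section: that each margin is the given percentage of the reference-arm (in-person) mean EPDS score at 3 months. The reported ITT values are consistent with that reading. The function name `relative_margin` is hypothetical.

```python
def relative_margin(reference_mean: float, fraction: float) -> float:
    """Margin in EPDS points, assumed to be a fraction of the reference-arm mean."""
    return round(reference_mean * fraction, 2)


# Reported ITT values for the modality comparison (EPDS at 3 months)
in_person_mean = 8.92
upper_ci_limit = 0.77  # upper limit of the 95% CI for the difference in means

# Original (10%) and revised (13%) noninferiority margins
margins = {fraction: relative_margin(in_person_mean, fraction) for fraction in (0.10, 0.13)}
results = {fraction: upper_ci_limit <= margin for fraction, margin in margins.items()}

for fraction, margin in margins.items():
    print(f"{fraction:.0%} margin = EPDS {margin:.2f}; noninferiority met: {results[fraction]}")
```

Under this assumption, the calculation reproduces the reported margins of EPDS 0.89 and 1.16, and the upper CI limit of 0.77 falls below both.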
Discussion
In this large, multisite trial, we found that nonspecialists (individuals without a specialized mental health background) were noninferior to specialists in delivering BA, whether BA was provided in-person or via telemedicine. Noninferiority was met across all primary, secondary and exploratory outcomes, with minor exceptions. Given the high prevalence of depression and anxiety in perinatal populations and their negative, intergenerational impact when left untreated3, these findings have important public health implications.
Congruent with growing, robust evidence supporting task-sharing worldwide13,14, our results in the SUMMIT trial suggest that training nonspecialists can increase the mental health workforce and improve access to relatively brief20, evidence-based psychotherapies such as BA. Globally, task-sharing has been examined as a way to scale access to a large number of services, including mental healthcare, because it decreases reliance on specialist providers who are overburdened, scarce, inequitably distributed and often work in the private sector, beyond the reach of the majority of populations in most countries32. This study extends the task-sharing literature by showing the comparable effectiveness of nonspecialists to specialists delivering the same psychotherapy to a diverse perinatal population in North America, thus suggesting that nonspecialist-delivered psychotherapy is not second rate to specialist-delivered care and can enhance access to quality-assured psychotherapy. Although the majority of participants indicated a preference for treatment by a specialist provider, the SUMMIT trial demonstrated that nonspecialist-delivered care is noninferior to specialist-delivered care, with high, and noninferior, patient satisfaction between provider conditions. These results provide encouraging and reassuring evidence for patients receiving care from nonspecialists.
In addition, we found that BA was delivered by nonspecialists with significantly higher fidelity scores than by specialists. This may be due to the structured form of supervision nonspecialists received compared with the ad hoc consultation that was available to specialists following the training period. The higher fidelity scores among nonspecialists may also be because this was their first treatment delivery experience, whereas specialists were required to have a minimum of 5 years of experience delivering psychotherapies to be eligible for the trial and may therefore have followed the treatment protocol less rigorously.
The SUMMIT trial was a pragmatic trial that was embedded in the clinical care pathways of five clinical sites, and included referrals from obstetrical and psychiatry services for patients for whom psychotherapy was indicated. Given the rapid recruitment rate, adherence to the protocols and clear integration into the existing clinical care pathways across geographically and socio-demographically diverse cities in both the United States and Canada, we believe our results demonstrate the potential for scale and spread. Across general healthcare settings, frontline healthcare workers such as nurses, midwives and doulas can be trained to deliver and increase access to psychotherapies for perinatal depressive and anxiety symptoms. The eight-session BA treatment manual was adapted from the Healthy Activity Program in India22, where mental health literacy is low, and our results reconfirm that BA is an easy and user-friendly treatment that may be suitable for different populations24 and can be delivered by a range of treatment providers21,22. We found similar treatment effects to other evaluations of BA treatments21,22,25 using inactive controls, which reduced moderate and severe depressive and anxiety symptoms to mild or minimal severity levels33. Finally, these data may also be reassuring for low- and middle-income countries, where task-sharing of nonspecialist care is routinely applied due to the severe shortage of specialist mental health providers and limited financial and infrastructure resources for mental healthcare34. Thus, the SUMMIT trial builds on the growing evidence that training nonspecialists has the potential to address the global mental health treatment gap by empowering communities to build local capacity, optimize available resources and still deliver evidence-based care.
Our findings also showed that telemedicine-delivered BA is noninferior to in-person BA when treating perinatal depression and anxiety symptoms. Telemedicine reduces barriers to care, especially for underserved populations, and was preferred over in-person psychotherapy in our sample (60.28%). This preference for telemedicine aligns with previous studies highlighting a preference for virtual care among perinatal populations19,35 and women in general36,37 because it helps to overcome barriers related to coordinating childcare and transportation. In addition, our findings of noninferiority in key process variables challenge concerns that the patient–provider relationship is compromised using telemedicine38. The use of telemedicine in healthcare has grown substantially since the start of the COVID-19 pandemic and can increase access to care; however, it is now at risk of being removed from covered services in many jurisdictions39. This is important because nationally representative samples have demonstrated that the most prominent reason for not using telemedicine was that providers fail to offer this option36.
Noninferiority between modalities supports the continued reimbursement of telemedicine-delivered psychotherapies. We also note that telemedicine was preferred by perinatal participants and this preference was consistent throughout the study regardless of when participants were enrolled. These results indicate that telemedicine-delivered psychotherapy is noninferior to the traditional in-person model, thus offering a more patient-centered and scalable approach.
Given the pragmatic nature of our trial conducted in real-world settings, our results may inform a stepped-care approach to treating depression and anxiety beyond perinatal populations. Nonspecialist-delivered psychotherapy via telemedicine may be the ideal initial step in a collaborative stepped-care model. This is important for three reasons. First, this approach can make brief and effective first-line psychotherapies accessible. Second, both the comparable effectiveness between providers and modalities and the marked symptom reduction in those with severe baseline symptoms highlight the benefit of this approach either on its own or while waiting for more specialized services. Third, this aligns with other task-sharing and stepped-care models (for example, ref. 40), whereby specialists ensure quality care through training and supervision and provide clinical care to patients who require more than first-step psychotherapies. To scale this task-sharing approach within stepped care, payers are required to reimburse nonspecialists delivering care and specialists providing quality assurance for the nonspecialists. Further research is required to examine whether self-guided or single-session interventions may be a suitable first-step for specific populations (for example, those with mild symptoms41).
The SUMMIT trial has many strengths. These include a large sample, a relatively high retention rate42,43 despite the challenges of a global pandemic, a racially and geographically diverse sample improving generalizability (Supplementary Table 5 (refs. 44,45,46,47,48,49,50,51,52,53,54,55,56)) and high patient satisfaction rates. Fewer than 1% of participants within our large sample required study-provided tablets or internet access, suggesting that telemedicine is unlikely to raise a new barrier to care. Furthermore, the greater adherence to the number of sessions favoring telemedicine supports its feasibility. Finally, the conservative and clinically meaningful noninferiority margins strengthen the interpretation of our findings.
The study also has limitations. Substantial modifications were required due to COVID-19, including a change in noninferiority margin from 10% to 13% in the telemedicine versus in-person comparison, and deviation from the assigned modality due to institutional mandates during the pandemic. Most of the participants (70.07%) had a university degree, thus potentially limiting the generalizability of our findings. While this may be typical of psychotherapy research participants, our findings need to be extended to other groups such as adolescents or those without higher education. The trial did not include an inactive control group. Omitting such a group is common and even recommended in noninferiority trials57,58. Owing to the established efficacy of BA24,25, the comparison to a nontreated group in real-world healthcare settings would be inappropriate and arguably unethical25,57. In both comparisons of modality and provider, the magnitude of symptom improvement was significantly higher among the participants who received BA than in the small number who did not (Extended Data Figs. 4 and 5) and higher than what has been reported with inactive control conditions in other BA trials25. Further, the difference between the two groups' final EPDS scores (2.05) falls within the range of a minimal clinically important difference of EPDS 1.4–6.4 (ref. 59). Future applications of personalized medicine algorithms could determine ‘what works for whom’ and whether specific subgroups may benefit more from one type of provider or modality compared with another.
The noninferiority results of the SUMMIT trial suggest the comparable effectiveness of task-sharing and telemedicine-delivered psychotherapies to treat perinatal depression. These population-based approaches are both effective and patient centered and have the potential to transform access to treatment for perinatal mental health globally.
Methods
Study design
SUMMIT was a pragmatic, multisite, four-arm, individually randomized, noninferiority trial conducted in Chapel Hill (North Carolina), Evanston/Chicago (Illinois) and Toronto (Canada) at university-affiliated healthcare settings. In North Carolina, recruitment was conducted at three clinical sites affiliated with the University of North Carolina (UNC) Women’s and Neuroscience Hospitals. In Illinois, recruitment was conducted at 14 obstetric and family medicine clinics affiliated with Endeavor Health (formerly known as Northshore University Health System) and the University of Chicago. In Toronto, recruitment was conducted at Mount Sinai Hospital, Women’s College Hospital and St. Michael’s Hospital, all of which are affiliated with the University of Toronto. The study received ethical approvals from the following three institutional review boards (IRBs): UNC Biomedical IRB (19-1786), Endeavor Health IRB (EH18-129) and Clinical Trials Ontario (1895). All participants provided written informed consent before enrollment. An independent DSMB supervised the trial (Supplementary Table 15). The trial design, protocol and oversight have been described previously66.
Participants
Full eligibility criteria and recruitment procedures are detailed in the published protocol66. In brief, inclusion criteria included being a pregnant (≤36 weeks) or postpartum (4–30 weeks) adult (≥18 years; inclusive of gender identities and birthing persons) with a score ≥10 on the EPDS29 and speaking English or Spanish. A cutoff score of EPDS ≥10 was used to encompass both minor and major depression67. This threshold has been used in antenatal, postnatal and community-based populations with high sensitivity68, particularly for specific cultural groups (for example, ref. 69). Exclusion criteria included active suicidal intent; active symptoms of psychosis, mania or substance misuse; change in psychotropic medication within 2 weeks of beginning treatment; ongoing psychotherapy; and severe fetal anomalies, stillbirth or infant death.
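The inclusion and exclusion criteria summarized above can be sketched as a screening check. This is a simplified illustration, not the trial's screening procedure, which is detailed in the published protocol: the class `ScreeningRecord`, its field names and the function `is_eligible` are hypothetical, and the exclusion criteria are collapsed into a single flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreeningRecord:
    age_years: int
    epds_score: int
    language: str                            # "English" or "Spanish"
    weeks_pregnant: Optional[int] = None     # set if currently pregnant
    weeks_postpartum: Optional[int] = None   # set if postpartum
    # True if any exclusion criterion applies (for example, active suicidal
    # intent, active psychosis, mania or substance misuse, recent change in
    # psychotropic medication, ongoing psychotherapy)
    has_exclusion: bool = False


def is_eligible(record: ScreeningRecord) -> bool:
    """Apply the summarized criteria: adult, EPDS >= 10, English or Spanish,
    pregnant (<= 36 weeks) or postpartum (4-30 weeks), and no exclusions."""
    pregnant_ok = record.weeks_pregnant is not None and record.weeks_pregnant <= 36
    postpartum_ok = (
        record.weeks_postpartum is not None and 4 <= record.weeks_postpartum <= 30
    )
    return (
        record.age_years >= 18
        and record.epds_score >= 10
        and record.language in {"English", "Spanish"}
        and (pregnant_ok or postpartum_ok)
        and not record.has_exclusion
    )


example = ScreeningRecord(age_years=33, epds_score=14, language="English", weeks_pregnant=28)
print(is_eligible(example))
```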
Between January 2020 and October 2023, participants were recruited through self, internal or external referrals. Internal referrals were elicited from clinicians from site hospitals and satellite clinics (that is, obstetrical, mental health and family departments) who sent patient contact information directly to the research team. External referrals were received from clinicians at sites that were not affiliated with the trial. Finally, recruitment materials with contact information for the research team (that is, brochures and posters) were available in clinics and interested individuals contacted the team for more information (self-referrals). The study was introduced to potential participants by either their clinical provider (for internal and external referrals) or a trained research assistant at their respective sites (for self-referrals). Before contacting potential participants, research staff were trained in obtaining informed consent to meet ethical standards. In addition, all research staff interacting directly with participants engaged in role plays where interpersonal skills such as empathy and a nonjudgemental stance were emphasized. After providing consent, participants were screened for eligibility. Informed consent was obtained in person, by telephone or via Zoom; most consent discussions occurred virtually.
Intervention
All participants were offered the same manualized BA intervention over six to eight individual weekly sessions. BA relies on increasing enjoyable or fulfilling activities that align with one’s values and targeting key mechanisms of patient activation and avoidant coping70. The intervention manual was adapted from two well-established manuals: the Healthy Activity Program from India22 and the Alma Program for perinatal populations in Colorado. As described elsewhere66,71, key strategies included psychoeducation, behavioral assessment, values-based activity monitoring and structuring, and problem solving and interpersonal effectiveness, all through a culturally sensitive lens71.
Intervention arms
Participants were randomized to one of four arms: nonspecialist telemedicine, specialist telemedicine, nonspecialist in-person or specialist in-person.
Providers
Details on provider recruitment, training and supervision have been published66,72. Briefly, providers were recruited through listservs advertising the casual position of a SUMMIT treatment provider, and by direct contact and word of mouth through stakeholders. Nonspecialists were healthcare workers (that is, registered nurses, midwives and doulas) without any formal mental health training or experience delivering psychotherapy, ascertained through resume review and the hiring interview process. In Canada and the United States, these healthcare workers are already embedded in pregnancy and postpartum care16. Specialists were professionals with formal mental health training (that is, psychiatrists, psychologists or social workers) and ≥5 years of experience delivering psychotherapies.
Training phase: both nonspecialists and specialists received the same BA training, provided by a minimum of two clinical leads (expert clinicians). The workshops utilized observation and didactics with interactive educational strategies including role play, games and homework. Nonspecialists and specialists meeting competency standards73,74 were selected for the 8 week internship phase of the trial during which they implemented BA treatment with up to two participants, with supervision from the site-level clinical lead. Only nonspecialists and specialists who achieved competence, as assessed by standardized role plays and therapy quality assessments72,73, were selected to deliver BA during the trial.
Trial phase: once the internship phase was complete, specialists attended weekly to monthly peer supervision and contacted the clinical lead on an ad hoc basis. Nonspecialists also participated in weekly to bimonthly supervision with their site clinical lead, as well as monthly measurement-based supervision where clinical leads and nonspecialists rated individual audio-recorded sessions for therapy quality. Audio-recorded treatment sessions were rated using the Q-SUMMIT, a brief, validated 20-item therapy quality checklist that examined the extent to which an individual treatment provider exhibited treatment-specific skills (for example, establishing an agenda) and general skills (for example, using collaboration and nonjudgemental stance) on a scale of zero to four. Using the same Q-SUMMIT measure, independent raters evaluated intervention fidelity on at least 5% of all treatment session audiotapes at each site, with good to excellent interrater reliability (κ = 0.75–0.85). Both nonspecialists and specialists who did not reach the cutoff for specific items received refresher training by the hub clinical leads. The Q-SUMMIT is freely available on the SUMMIT website (https://thesummittrial.com/external-resources/bamanual-and-materials/).
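As an illustration of the agreement statistic reported above, Cohen's κ compares the agreement observed between two raters with the agreement expected by chance. The following is a minimal sketch for a single item scored from zero to four. The function name and the example ratings are hypothetical, and the trial may have used a weighted variant or different software.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of sessions given the same score by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over scores.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use a single identical score")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical scores (0-4) from two raters on the same ten recorded sessions.
ratings_a = [0, 1, 2, 2, 3, 3, 4, 4, 4, 1]
ratings_b = [0, 1, 2, 3, 3, 3, 4, 4, 2, 1]
kappa = cohen_kappa(ratings_a, ratings_b)
```

Unweighted κ treats every disagreement as equally serious; for an ordinal zero-to-four scale, a weighted κ that penalizes larger gaps more heavily is a common alternative.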
Modality
Telemedicine was implemented via WebEx in Chapel Hill and Zoom in Evanston/Chicago and Toronto, in compliance with local privacy laws. Participants attended in-person BA sessions at their outpatient clinical site or telemedicine BA sessions at their preferred private location. If needed, telemedicine participants were provided with study tablets and internet access. Seven telemedicine participants (or 0.57%) required study tablets or internet access.
Involvement of stakeholders
The SUMMIT trial involved multiple and diverse stakeholders to inform the development, implementation and dissemination strategies of the study. From the study’s inception, we consulted individuals with lived experience, family members, patient advocates, clinical experts, healthcare administrators, payers and policy members, and these stakeholders were integral members of our team, alongside researchers and clinicians. In addition, we conducted qualitative research with a range of SUMMIT stakeholders, including (but not limited to) participant representation from each of the four arms, significant others (including spouses or partners of participating mothers) and a diverse range of other stakeholders, to inform key decisions about the trial, including the cadre of nonspecialist provider75, training and supervision processes76, culturally sensitive care71 and resuming in-person sessions during the COVID-19 pandemic77. Insights from our diverse stakeholder group were vital to informing patient-centered delivery and implementation of the SUMMIT trial.
Randomization and masking
Individual participants were randomized within REDCap to one of four study arms and randomization was stratified by perinatal period and site. An independent biostatistician generated the randomization sequence through computer-generated lists, with random blocks of four and eight, and stratified by perinatal period and site. The randomization sequence was concealed until all 3 month data were analyzed. Several strategies were used to reduce bias. First, all team members were masked to study arm allocation except for the data site coordinator, participants, treatment providers and clinical leads (who did not have access to aggregate data66). Second, all treatment providers introduced themselves as a ‘SUMMIT treatment provider’ to reduce potential bias of provider preference among study participants. All data were self-reported through REDCap and reviewed by independent data staff to reduce the number of staff who were unmasked. Third, differences in clinical and demographic characteristics between self-referrals compared with other referral sources were balanced. Trial recruitment began in January 2020 and participants were initially randomized 1:1:1:1 to each arm (Extended Data Fig. 1). Due to the COVID-19 pandemic, in-person sessions were paused, and participants were randomized 1:1 to the two telemedicine arms between March 2020 and July 2021. Once healthcare institutions allowed in-person visits, participants were randomized in a weighted 3:1 scheme favoring in-person conditions (July 2021 to January 2022). Finally, we returned to 1:1:1:1 randomization in January 2022 (see the Statistical Analysis Plan (SAP)78 or Extended Data Fig. 1 for more details).
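The stratified, permuted-block scheme described above can be sketched as follows. This is an illustration only: the trial's sequence was generated by an independent biostatistician and administered through REDCap, and the function names, seed and stratum size here are hypothetical. The sketch covers the 1:1:1:1 allocation only, not the modified ratios used during the pandemic.

```python
import random

ARMS = [
    "nonspecialist telemedicine",
    "specialist telemedicine",
    "nonspecialist in-person",
    "specialist in-person",
]


def permuted_block_sequence(n_participants, rng, block_sizes=(4, 8)):
    """Build a 1:1:1:1 allocation list from randomly sized, shuffled blocks."""
    sequence = []
    while len(sequence) < n_participants:
        size = rng.choice(block_sizes)       # block size chosen at random
        block = ARMS * (size // len(ARMS))   # each arm appears equally often in a block
        rng.shuffle(block)                   # random order within the block
        sequence.extend(block)
    return sequence[:n_participants]


def stratified_sequences(strata, n_per_stratum, seed):
    """Generate one independent allocation list per stratum."""
    rng = random.Random(seed)
    return {s: permuted_block_sequence(n_per_stratum, rng) for s in strata}


# Strata are site by perinatal period; the stratum size of 24 is arbitrary.
strata = [
    (site, period)
    for site in ("Toronto", "Chapel Hill", "Evanston/Chicago")
    for period in ("antenatal", "postnatal")
]
lists = stratified_sequences(strata, n_per_stratum=24, seed=2020)
```

Varying the block size makes the next allocation harder to predict than with a fixed block size, while still keeping the arms close to balanced within each stratum.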
Database and handling
Quantitative data were collected through standardized REDCap™ databases, securely hosted on institutional servers at each of the three participating hubs. Deidentified data from the United States hubs were extracted, encrypted and transferred to Toronto, where they were subsequently uploaded into the REDCap system. The data were protected by stringent measures, including restricted access, 24 h surveillance and continuous monitoring. Upon providing informed consent, participants were assigned unique study IDs and received email links to complete their baseline, session-wise and outcome assessments linked to their study ID profiles in REDCap. Personal information associated with study IDs was kept in a separate password-protected, encrypted file on secure institutional servers. Email addresses were stored separately from study data on a secure backend server within REDCap, ensuring protection from unauthorized access and data export.
Procedures
Participants completed a baseline assessment and were then randomized by trained research assistants using REDCap. All reported outcome data were collected via self-report through REDCap at 3 months post-randomization. All staff and providers were trained to follow an established and detailed safety guide. This included the monitoring of SAEs, which were defined as fatal or life-threatening events to the participant, fetus or child; disability or permanent damage; hospitalization; and other serious medical events. It also included the monitoring of AEs, which comprised active suicidal intent (regardless of hospitalization) or the development of psychosis or mania. When potential SAEs or AEs were identified, they were reviewed by the site coordinator, trial psychiatrist and site principal investigator. All SAEs and AEs were documented by the site-level principal investigator and research staff, followed by an expedited review by the DSMB.
Trial changes due to the COVID-19 pandemic
The COVID-19 pandemic required (1) changing the randomization scheme (see above), (2) treating 21 participants who were initially randomized to the in-person modality via telemedicine during the peaks of the pandemic due to institutional policies and, consequently, (3) temporarily ceasing in-person recruitment and changing the noninferiority margin for modality (in-person versus telemedicine) from 10% to 13% (see SAP), which reduced the required sample size for in-person arms and the overall sample size from 1,368 to 1,226 (see below). These modifications were approved by our DSMB and research ethics committees across sites. Research has demonstrated that the minimal clinically important difference (MCID) on our primary outcome, the EPDS, is typically four points (95% CI 1.4–6.5)59,79. In line with noninferiority guidelines, the revised noninferiority margin of 13% was chosen to take into account the lower bound of this MCID (that is, EPDS 1.4). In addition, we conducted a post hoc sensitivity analysis using the original (pre-COVID-19) lower noninferiority margin of 10% for in-person versus telemedicine. Finally, the study team simplified the study protocol in two ways. First, secondary outcomes were aligned with predefined secondary aims; in doing so, several outcomes (trauma symptoms, perceived support, activation levels and quality of life) in the protocol were redefined as exploratory outcomes. With the exception of trauma symptoms, the analyses of these exploratory outcomes at 3 months post-randomization are presented in the current study. Second, AEs and SAEs were defined according to the SAP and approved ethics protocols to ensure clarity when assessing active suicidal intent. In short, where the published trial protocol and the SAP differ, the text and analysis of the paper follow the SAP (which was completed after the published protocol and refined some analyses).
Outcomes
The predefined primary outcome was the EPDS29 total score at 3 months post-randomization. As the treatment was offered on a weekly basis and typically takes up to 3 months to complete, 3 months post-randomization reflects a post-treatment outcome and is commonly used in other psychological treatment trials42. The EPDS is a ten-item, Likert scale (range 0–30) that is freely available in multiple languages. Consistent with the pragmatic trial design of SUMMIT, the ten-item EPDS scale was selected for the primary outcome, rather than a diagnostic interview, for several reasons: it has high sensitivity, it is the most common measure of parental depressive symptoms in clinical and research settings, and it is feasible to implement at scale80. The secondary outcome was the GAD-7 (ref. 30) total score. Exploratory outcomes included patient-reported quality of life (as assessed through the World Health Organization Disability Assessment Schedule 2.0 (ref. 65) and EQ-5D 5-Level61), client satisfaction (Client Satisfaction Questionnaire-8 (ref. 60)), therapeutic alliance (Working Alliance Inventory–Short Form64), activation levels (Premium Abbreviated Activation Scale63) and perceived social support (Multidimensional Scale of Perceived Social Support62). All measures are global measures and were selected because of their robust psychometric properties across diverse settings. With regard to minor discrepancies between the listed secondary and exploratory outcomes in the published protocol and SAP, the text and analysis of the paper are congruent with the SAP (which was completed after the published protocol and refined some analyses). The current study focused on outcomes assessed at 3 months post-randomization. Data collection for all additional analyses related to sustained outcomes, child development outcomes, economic evaluation and qualitative data is ongoing, and these results will therefore be published separately.
Sample size
Using a noninferiority margin of 10%, an EPDS mean estimate of 7.93 (s.d. 4.68) (ref. 81), 80% power and alpha (α) of 0.05, the comparison of provider (nonspecialist versus specialist) required 431 participants in each of the two conditions. To account for 10% dropout, the sample size was inflated to N = 958. As described in our detailed SAP and based on recent noninferiority guidelines, we did not adjust our two primary hypotheses for multiplicity because they did not involve different endpoints (that is, both hypotheses test total EPDS scores at 3 months post-randomization). Specifically, when multiple hypotheses test a similar underlying outcome, no adjustment for multiplicity is required82,83. As described above, and due to COVID-19, we increased the noninferiority margin for the modality comparison (telemedicine versus in-person) from 10% to 13% and used the same EPDS estimate, 80% power and α = 0.05. The comparison of telemedicine to in-person delivery required an additional 268 participants, yielding a target sample size of 1,226. While the study is a multicenter one, randomization occurred within each individual site. Thus, no cluster randomization was carried out and the sample size was not inflated to account for an intracluster correlation. The sample size calculation was run using PASS version 12 (Power Analysis and Sample Size Software; NCSS LLC).
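The provider calculation above can be checked approximately with the standard normal-approximation formula for a one-sided noninferiority comparison of two means. The sketch below is not the PASS routine used in the trial. It assumes that the 10% margin is taken as a fraction of the EPDS mean estimate and that the true difference between conditions is zero, and the function name is hypothetical. Only the provider comparison is reproduced here.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_noninferiority(mean, sd, margin_fraction, alpha=0.05, power=0.80):
    """Per-group n for a one-sided noninferiority comparison of two means.

    Assumes a true difference of zero and a normal approximation.
    """
    margin = margin_fraction * mean            # margin in EPDS points (assumed definition)
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided type I error
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / margin ** 2)


n_provider = n_per_group_noninferiority(mean=7.93, sd=4.68, margin_fraction=0.10)
total_provider = ceil(2 * n_provider / (1 - 0.10))  # inflate for 10% dropout

print(n_provider, total_provider)  # 431 958
```

Under these assumptions the approximation returns 431 participants per condition and 958 after inflation for dropout, in agreement with the figures reported above.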
Statistical analysis
Baseline variables included demographic characteristics (for example, self-reported age, race/ethnicity, sex and gender identity and marital status). All baseline variables of interest were summarized using means and two-sided 95% CIs for continuous measures, and percentages for categorical measures.
The primary analysis was a noninferiority t-test, using a one-sided 95% CI around the difference in EPDS scores based on the actual primary endpoint data. Noninferiority was concluded if the upper limit of the 95% CI of the difference in mean scores was below the noninferiority margin.
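The decision rule for the primary analysis can be illustrated with a short sketch. It makes two simplifying assumptions that are not taken from the trial: a normal quantile stands in for the t quantile (the two are close for arms of several hundred participants), and the percentage margin is converted to EPDS points using the mean of the reference arm. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def upper_confidence_limit(test_scores, reference_scores, alpha=0.05):
    """Upper limit of the one-sided (1 - alpha) CI for mean(test) - mean(reference)."""
    diff = mean(test_scores) - mean(reference_scores)
    se = sqrt(
        variance(test_scores) / len(test_scores)
        + variance(reference_scores) / len(reference_scores)
    )
    z = NormalDist().inv_cdf(1 - alpha)  # normal quantile used in place of the t quantile
    return diff + z * se


def is_noninferior(test_scores, reference_scores, margin_fraction, alpha=0.05):
    """True if the upper confidence limit lies below the margin (higher score = worse)."""
    margin = margin_fraction * mean(reference_scores)  # assumed margin definition
    return upper_confidence_limit(test_scores, reference_scores, alpha) < margin


# Hypothetical 3 month EPDS scores for 400 participants per arm.
specialist = [6, 7, 8, 9, 10] * 80
nonspecialist_similar = list(specialist)
nonspecialist_worse = [score + 1 for score in specialist]

similar_ok = is_noninferior(nonspecialist_similar, specialist, margin_fraction=0.10)
worse_ok = is_noninferior(nonspecialist_worse, specialist, margin_fraction=0.10)
```

In this made-up example, identical score distributions satisfy the rule, whereas a one-point shift towards worse scores pushes the upper limit above the 0.8-point margin.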
We selected predetermined noninferiority margins of 10% for provider and 13% for modality, which were calculated based on the actual primary endpoint data58. These noninferiority margins were selected to ensure clinically relevant conclusions of noninferiority between conditions were met. Specifically, and in line with guidelines for noninferiority trials, we selected these noninferiority margins to correspond with the MCID and to ensure that they were lower than any potential superiority effect between groups. Compared with other psychotherapy trials, which typically use noninferiority margins of 24–60%, our prespecified margins were intentionally conservative to ensure that noninferiority conclusions between providers and modalities could be ascertained. The same noninferiority margins were used to examine noninferiority for secondary and exploratory outcomes.
All primary and secondary noninferiority analyses were run as ITT analyses (based on randomization arms), first without and then with imputed data (Supplementary Table 9), and as a PP analysis based on the condition (provider: nonspecialist versus specialist; or modality: telemedicine versus in-person) that the participant actually received. A sensitivity analysis omitting protocol deviations was also conducted (Supplementary Table 6). The key noninferiority analyses were based on noninferiority t-tests across the entire patient sample (under the assumption that randomization would serve to balance out potential confounders). Each t-test examined the CI around the difference in outcome based on the actual data (for example, EPDS scores between nonspecialist and specialist or between telemedicine and in-person) to see whether the upper bound fell below the noninferiority margin. The upper bound is the relevant limit because the difference was taken as nonspecialist minus specialist (or telemedicine minus in-person) and higher scores indicate a worse outcome. In addition, we tested for a modality by provider interaction for our primary and secondary outcomes.
Secondary analyses repeated the same analyses with GAD-7 scores at 3 months post-randomization. Linear regression models were used as an additional analysis to assess noninferiority while accounting for preselected conceptually and empirically driven baseline covariates (Supplementary Table 7), such as study site, perinatal period (antenatal versus postnatal), education levels, treatment preference for provider or modality and baseline severity. Linear mixed models assessed the moderating effects of severity at baseline, with participants as a random effect and a treatment-by-time interaction. Exploratory analyses examined whether noninferiority held for additional exploratory outcomes. Further, we examined a potential pandemic effect, testing for an interaction between providers (specialist and nonspecialist) and timing (1:1 randomization to nonspecialist and specialist in telemedicine only versus 3:1 randomization favoring in-person) in relation to change in EPDS and GAD-7 scores in a linear regression model. We also conducted a sensitivity analysis on our primary outcome (EPDS scores at 3 months post-randomization), which excluded participants recruited during pre-COVID-19 1:1:1:1 randomization (Supplementary Table 16). An ICC (and its associated CI) was calculated to ensure that clustering at the level of the treatment provider would not be a concern for the key outcomes of interest.
Missing data were imputed by multiple imputation using fully conditional specification, implemented with PROC MI and PROC MIANALYZE in SAS. Fifteen imputed datasets were created to align with the percent missingness, with model results averaged across the 15 imputed datasets84. Because missing data led to the use of multiple imputation, sensitivity analyses compared the results of models fitted to the imputed data with those of models fitted to the observed data without imputation. Additional sensitivity analyses (for example, psychotropic medications and participants without BA sessions) were conducted (Supplementary Table 17). Outliers were defined as scores greater than the third quartile plus 1.5× the interquartile range or lower than the first quartile minus 1.5× the interquartile range. The trial biostatistician conducted all analyses using SAS version 9.4 (SAS Institute).
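The outlier rule above corresponds to the familiar 1.5× interquartile range fences. A minimal sketch follows. The quartile method is an assumption (SAS and Python offer several definitions that can differ slightly for small samples), and the function name and scores are hypothetical.

```python
from statistics import quantiles


def flag_outliers(scores):
    """Return scores outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # "inclusive" treats the data as spanning the full range; this choice is an assumption.
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in scores if x < lower_fence or x > upper_fence]


# Hypothetical EPDS total scores (possible range 0-30).
epds_scores = [2, 4, 5, 6, 7, 8, 9, 10, 11, 29]
outliers = flag_outliers(epds_scores)

print(outliers)  # [29]
```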
The study was registered at ClinicalTrials.gov (NCT04153864) on 6 November 2019, before study recruitment began.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Responses