Innovation and application of Large Language Models (LLMs) in dentistry – a scoping review

Introduction

Generative Artificial Intelligence (AI) represents a groundbreaking advancement in machine learning, particularly through the development of Large Language Models (LLMs) [1]. These sophisticated systems are designed to generate human-like text by leveraging vast datasets and complex algorithms [2]. LLMs, utilize transformer architectures to process and predict text, enabling them to perform a wide range of tasks from text completion to translation and summarization [3]. These models operate by segmenting input data into tokens and using self-attention mechanisms to understand and generate coherent sequences of text, thereby mimicking human-like understanding and communication [4].

LLMs employing deep learning algorithms process and comprehend natural language, enabling pattern recognition, translation, or generation of text and diverse content [5]. They have revolutionized healthcare by enhancing the efficiency, accuracy, and accessibility of medical services [6, 7]. Their ability to process and analyze large volumes of clinical data, understand complex medical terminologies, and generate detailed medical reports has significantly improved clinical documentation and patient care.

LLMs which have rapidly advanced the general field of healthcare, are also poised to make significant contributions within dentistry—an area that has only begun to explore their potential. For instance, it can automate the generation of medical records and progress notes, streamlining administrative tasks for dental practitioners [8]. Additionally, it can assist in summarizing complex research papers, extracting key information to keep clinicians updated on the latest developments [9]. Moreover, LLMs are increasingly being utilized in patient query handling, with the development of chatbots and virtual assistants that can provide accurate and timely responses to patient inquiries [10]. This kind of support aligns with dentistry’s high patient-interaction environment, where timely and accurate information is essential for patient satisfaction and adherence to care protocols [11]. Through these applications, LLMs not only augment the capabilities of dental professionals but also contribute to more informed decision-making and better patient outcomes.

To enhance the performance of LLMs in domain-specific tasks compared to general-purpose models, various prompting strategies can be employed [12]. Advanced prompting techniques such as role prompting, one-shot, few-shot, or chain-of-thought prompting provide context-rich inputs that guide the model to generate more relevant and precise responses [13]. Embedding techniques, which represent words or phrases in vector space, facilitate the model’s understanding of context and relationships between terms, improving its ability to handle specialized medical vocabulary [14]. Retrieval-Augmented Generation (RAG) combines LLMs with external knowledge sources, retrieving relevant information to support the generation process, thereby increasing the reliability and specificity of the outputs [15]. By integrating these strategies, LLMs can overcome the limitations of general-purpose models, delivering more accurate and contextually appropriate responses in specialized fields such as dentistry. The operational definitions of key prompting strategies and frequently used terminologies in LLMs are presented in Supplementary Table 2.

The deployment of Large Language Models (LLMs) in healthcare and dentistry can be understood through different levels, reflecting stages of integration and maturity, as described by Zhang et al. in their study on the development maturity of clinical artificial intelligence research [16]. At Level 1, LLMs are in the experimental phase, primarily focused on algorithm development and initial testing. Level 2 involves early adoption, where models are tested in controlled environments to validate their efficacy and reliability. By Level 3, LLMs are integrated into practical applications, often referred to as the “model into device” stage, where they begin to interact with real-world data and users. Level 4 represents mature deployment, where LLMs are fully embedded within healthcare systems, continually monitored, and refined to ensure optimal performance and reliability in diverse clinical settings.

By employing these advanced techniques and progressing through the stages of deployment, LLMs hold the potential to significantly advance healthcare, offering tailored solutions that address the unique challenges and requirements of the medical field. While there has been considerable research on the applicability of LLMs in various medical domains, their integration within dentistry remains underexplored [17,18,19]. Therefore, this scoping review aims to systematically evaluate the current literature on the application of LLMs in dentistry. By synthesizing the existing evidence, this review seeks to elucidate the diverse use cases, identify research gaps, and assess the methodologies employed such as evaluation metrics used in studies utilizing LLMs within dental practice. Furthermore, the review will examine the type of LLM model used (general purpose models versus prompting strategies employed) as well as offering insight into the current state of LLM integration in dental practice. Through a meticulous review, we aim to advance knowledge in this field and guide the effective integration of LLMs into dental practice for optimal outcomes.

Materials and methods

The scoping review was carried out following the established standards and guidelines outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analysis with the associated extension for Scoping Reviews (PRISMA-SCr). The protocol can be accessed through the Open Science Framework platform (https://osf.io/vqjz3).

Search strategy

The authors, in collaboration with a medical information specialist from Aga Khan University Hospital, Pakistan, developed a comprehensive search strategy utilizing various combinations of key search terms. A pilot search was conducted by the authors to refine the search strategy. Initially, the search produced a broad range of studies, many of which were tangentially related to the main topic. Additionally, the pilot search indicated that certain databases yielded more focused results; for example, IEEE Xplore provided highly relevant technical papers, while PubMed included a mix of broader dental applications. Adjusting the inclusion criteria to emphasize empirical studies related to dental practice rather than theoretical discussions further narrowed the results, ensuring that the final search strategy was both comprehensive and directly aligned with the research objectives.

Literature search

An extensive literature search was conducted in March 2024 through three electronic databases: PubMed (NLM), Scopus, and Institute of Electrical and Electronics Engineers (IEEE) Xplore. Additionally, a manual search was performed on Google Scholar to identify any additional literature addressing the review questions.

Search terms

The following search terms were used to identify the relevant literature:

Large Language Models OR LLM OR LLMA 2 OR ChatGPT OR Generative Artificial Intelligence OR Generative AI OR Chatbots OR Natural Language Processing OR NLP OR Google Bard OR PaLM OR PaLM 2 OR Gemini AND dental OR dentistry OR restorative dentistry OR endodontics OR prosthodontics OR periodontics OR maxillofacial surgery OR oral surgery OR orthodontics

Screening process

Article citations were exported to the Endnote reference manager version 20.0 (Clarivate Analytics) where duplicate references were removed. Two authors (IB and NN) screened the titles, abstracts, and full texts of the studies according to the predetermined inclusion criteria. Any disagreement between the two was resolved through discussion with the third author (FU). The data were added to a calibrated proforma independently by all three authors. Additionally, the extracted information was rechecked by the senior author (FU).

Review questions

  1. 1.

    What are the specific applications of Large Language Models (LLMs) in various dental specialties, and how have they been utilized to date?

  2. 2.

    What evaluation metrics are employed in studies assessing the performance of LLMs in dental practice?

  3. 3.

    What evidence exists regarding the accuracy and efficiency of LLMs in dentistry?

  4. 4.

    What type of LLM models were used in the studies, the general-purpose models or with advanced prompting strategies?

  5. 5.

    What is the current state of LLM integration (level of deployment) in dental practice?

Data extraction

A customized proforma was designed by the authors to extract the following information from included studies:

  1. 1.

    Study details (title, authors, journal of publication, year of publication)

  2. 2.

    Study characteristics (specialty/field and application)

  3. 3.

    Type of LLM model/algorithm used (GPT, Bard, Llama, Bloom)

  4. 4.

    Evaluation metrics utilized in the individual studies

  5. 5.

    Prompting strategies or training used (fine-tuning, embedding, RAG)

  6. 6.

    LLM deployment level

Inclusion criteria

  • Primary studies utilizing LLMs in dental practice

  • Studies in English language

Exclusion criteria

  • Reviews, editorials, commentaries, and conference proceedings

  • Studies available as abstract only

  • Studies registered as protocols

Results

Following a detailed manual and electronic literature search, 4079 records were identified. After removal of 400 duplicates, the remaining 3679 records were screened for relevance and 79 articles were excluded. A total of 3593 articles underwent final screening for eligibility check and after excluding narrative/systematic reviews, letter to the editor, product reviews and papers with irrelevant titles and abstracts, 17 studies fulfilling the inclusion criteria were included in the analysis. The PRISMA flowchart for screening process is presented in Fig. 1.

Fig. 1: PRISMA flowchart.
figure 1

The figure illustrates the search and retrieval processes of studies via PubMed, Scopus, Google Scholar and IEEE Xplore. After comprehensive screening, 17 studies were found to be eligible and included in the analysis.

Full size image

The characteristics of included studies extracted on a customized proforma is presented in Supplementary Table 1.

ChatGPT was the predominant large language model (LLM) utilized in 15 studies [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34]. In contrast, other less frequently utilized AI tools were Bing, Google Bard, Open Evidence, and Medi Search [24, 29]. The primary objective of most of the studies (15 studies) was to address post-operative patient queries [20,21,22,23,24,25,26,27,28,29, 31, 33,34,35,36]. Additionally, one study focused on generating radiology reports, and another aimed at diagnosing conditions based on patient history and radiographic findings [30, 32]. The specialty-wise distribution of the included studies revealed that majority were within the domains of Oral and Maxillofacial Surgery and Orthodontics [21, 24,25,26, 30, 31, 33]. This was followed by studies in other domains such as Endodontics, Periodontics, General Dentistry, Maxillofacial Radiology, Prosthodontics, Dental Public Health, and Dental Radiology [20, 22, 23, 27, 29, 30, 32, 34,35,36].

Various evaluation metrics were employed across the included studies on the use of LLMs in dentistry. These included Likert Scale (9 studies), the Modified Global Quality Scale (3 studies), DISCERN tool (2 studies), Ensuring Quality Information for Patients tool (1 study), and Simple Measure of Gobbledygook (SMOG) and Similarity Index (1 study). The commonly used evaluation metrics with a brief description of each are presented in Fig. 2.

Fig. 2: Evaluation metrics utilized in the included studies.
figure 2

The figure shows a brief description of the evaluation metrics used. The size of each colored box represents the number (weightage) of studies utilizing the individual metrics.

Full size image

Interestingly, only two studies employed advanced prompting strategies such as zero-shot and chain-of-thought prompting [28, 32]. No prompting strategy was used in the remaining studies. Regarding the level of maturity according to the stage of development depicting the deployment of these LLMs, it was found that nearly all studies were at level 3 of deployment (model into device stage). Moreover, the evaluators in almost all the included studies were human dental experts. Their user experiences (positive, negative, mixed, or neutral) as reported in the individual studies are presented in Fig. 3.

Fig. 3: Graphical representation of user experience.
figure 3

The figure shows the positive, negative, neutral and mixed perspective of the human evaluators reported in the included studies.

Full size image

Discussion

The justification for conducting a scoping review on the use of LLMs in dentistry derives from the need to thoroughly comprehend the current state of research in this field, identify any gaps or limitations and offer suggestions for future research. A scoping review is particularly beneficial and preferred over a systematic review when there are open study questions and no predefined PICOs, as this allows researchers to broadly explore a topic, identify and clarify key terms, and visualize the landscape of research.

Application of LLMs across dental specialties

Through the course of our scoping review, we found extensive utilization of LLMs in domains such as Dental Public Health, Oral/Maxillofacial Surgery, Periodontology, Orthodontics, General Dentistry, Oral Surgery, Endodontics, Dental Radiology, Preventive Dentistry, and Prosthodontics. However, it is noteworthy that certain domains, such as Pediatric Dentistry, Implant Dentistry, and Oral Pathology, have not been extensively documented in the literature regarding their use of LLMs up to the time of conducting this scoping review. Moreover, while the studies focused on post-operative patient queries, generating radiology reports, and diagnosis based on patient history and radiographic findings, several critical aspects of dental healthcare were not covered. These include treatment planning, patient education, emergency dental care, integration with electronic health records (EHRs), and telehealth applications. Exploring the potential of LLMs in these areas could further enhance patient care, improve treatment outcomes, and increase efficiency in clinical practices.

Predominantly employed LLM in dental practice

In our scoping review, ChatGPT emerged as the predominant Large Language Model (LLM) utilized in various studies, as opposed to other available models like Llama 2, Gemini, Claude 2, Mixtral 8x7B, and Falcon. This popularity could be attributed to ChatGPT’s user-friendly interface, 24/7 accessibility, and the advantage of being the first LLM to enter the market [37]. While ChatGPT’s extensive usage offers advantages, it is important to recognize potential limitations in solely relying on it as this may overlook the unique features and potential advantages offered by other LLMs.

It is notable that authors sometimes omitted specifying the version of ChatGPT employed in their research. This omission could potentially pose a challenge in replicating and comparing study findings, as different versions of ChatGPT may exhibit varying performance characteristics [38]. To ensure transparency and reproducibility in LLM research, it is recommended that authors explicitly mention the versions of all LLM models utilized in their studies.

Challenges associated with general-purpose models

We observed that most studies utilized general-purpose models of ChatGPT, which are trained on a wide corpus of internet text. While these models performed well on general question-and-answer tasks, they often struggled with domain-specific technical questions. This limitation underscores the importance of employing advanced prompting techniques to enhance the performance of LLMs in specialized domains. Techniques such as role prompting, which involves adding a system message or utilizing different prompting strategies like one-shot, few-shot, or multi-shot prompts, can provide richer context and improve model understanding [13]. However, in our review, we found that only two studies incorporated advanced prompting techniques, highlighting a potential area for further exploration and development in LLM research within dentistry [28, 32].

Concerns regarding reliability of generated information

The studies reviewed in our paper indicate that the information generated by LLMs lacked references, raising concerns about its reliability. This issue can be addressed by employing retrieval-augmented generation techniques (RAG), which integrate retrieved knowledge with the model’s generation process [15]. Interestingly, none of the included studies in our review utilized any LLM modification techniques, such as fine-tuning or RAG, suggesting a potential avenue for future research to enhance the trustworthiness and accuracy of LLM-generated information in dentistry.

Maturity level of LLM deployment

The evaluation of the level of maturity in the deployment of LLMs in dental practice revealed that nearly all studies were at level 3 of deployment, which corresponds to the “model into device” stage. This stage indicates that the LLMs have moved beyond theoretical or pilot phases (levels 1 and 2, which involve initial development and early testing) and are being integrated into practical, usable applications within the healthcare setting. It demonstrates that the models have undergone sufficient development and validation to be trusted in real-world scenarios. However, achieving level 3 also highlights the need for continuous monitoring and refinement to ensure the models maintain accuracy, reliability, and relevance as they interact with actual users and encounter diverse real-world data (level 4) [16]. While most studies focused on assessing the output of LLMs against expert knowledge, there is untapped potential for further research to explore the utility of these models in real-world deployment among patients and healthcare providers. Understanding user acceptability and the practical application of LLMs beyond controlled research settings is crucial for informing their integration into clinical practice.

Lack of standardization of assessment tools

A notable shortcoming was that assessments in the included studies were conducted by subject-level experts using customized assessment tools tailored for each study, including Likert scales, modified Discern instruments, or modified Global Quality Scores (GQS). This lack of standardization precludes the homogenization of results across studies, making it challenging to compare findings effectively. Therefore, there is a pressing need for the development of a standardized assessment tool to facilitate better comparison of results and enhance the validity and reliability of evaluations across different studies. Furthermore, employing quantitative scales rather than Likert scales could provide a more objective means of quantifying outputs; however, it is important to acknowledge that quantitative measures also come with their own limitations, such as potential oversimplification of complex constructs and challenges in accurately capturing subtle variations in responses.

Need for standardized reporting

We observed a lack of standardized terminologies for assessment in the studies reviewed. Terms like accuracy, reliability, content analysis, validity, among others, were employed without clear definitions or consistent usage. This variability could potentially lead to confusion and hinder comparability across studies. It is essential for the research community and individual researchers to explicitly define these terms within the context of their studies, ensuring consistency and clarity in reporting. By adhering to accepted terminologies and valid performance metrics in a standardized manner, researchers can enhance the reliability and comprehensibility of their findings.

This is the first review of its kind that methodically explores the trends and progress of LLM related research in dental practice. However, the inclusion of only three databases in the search may have resulted in the omission of some relevant articles. Additionally, to ensure a wider inclusion of studies, the research questions posed in the review were intentionally broad. Lastly, while findings were extracted following a predefined methodology, some were added in an ad hoc manner to enhance the overall yield of our review.

Conclusion

Large Language Models have the potential to transform healthcare and dentistry by enhancing patient care and improving administrative efficiency. This includes providing accurate patient query responses, diagnostic assistance, and streamlining documentation processes. While ChatGPT was the frequently employed tool, diversifying assessments across various LLMs is essential for a comprehensive understanding of their capabilities. Moreover, to optimize the utility of LLMs, future research should focus on specific applications in dentistry and developing guidelines for effective integration. Furthermore, addressing challenges such as privacy, ethical use of the data, and training of practitioners will enable the dental profession to maximize the benefits of LLMs in clinical practice.

Related Articles

Leveraging large language models to assist philosophical counseling: prospective techniques, value, and challenges

Large language models (LLMs) have emerged as transformative tools with the potential to revolutionize philosophical counseling. By harnessing their advanced natural language processing and reasoning capabilities, LLMs offer innovative solutions to overcome limitations inherent in traditional counseling approaches—such as counselor scarcity, difficulties in identifying mental health issues, subjective outcome assessment, and cultural adaptation challenges. In this study, we explore cutting‐edge technical strategies—including prompt engineering, fine‐tuning, and retrieval‐augmented generation—to integrate LLMs into the counseling process. Our analysis demonstrates that LLM-assisted systems can provide counselor recommendations, streamline session evaluations, broaden service accessibility, and improve cultural adaptation. We also critically examine challenges related to user trust, data privacy, and the inherent inability of current AI systems to genuinely understand or empathize. Overall, this work presents both theoretical insights and practical guidelines for the responsible development and deployment of AI-assisted philosophical counseling practices.

Engineering bone/cartilage organoids: strategy, progress, and application

The concept and development of bone/cartilage organoids are rapidly gaining momentum, providing opportunities for both fundamental and translational research in bone biology. Bone/cartilage organoids, essentially miniature bone/cartilage tissues grown in vitro, enable the study of complex cellular interactions, biological processes, and disease pathology in a representative and controlled environment. This review provides a comprehensive and up-to-date overview of the field, focusing on the strategies for bone/cartilage organoid construction strategies, progresses in the research, and potential applications. We delve into the significance of selecting appropriate cells, matrix gels, cytokines/inducers, and construction techniques. Moreover, we explore the role of bone/cartilage organoids in advancing our understanding of bone/cartilage reconstruction, disease modeling, drug screening, disease prevention, and treatment strategies. While acknowledging the potential of these organoids, we discuss the inherent challenges and limitations in the field and propose potential solutions, including the use of bioprinting for organoid induction, AI for improved screening processes, and the exploration of assembloids for more complex, multicellular bone/cartilage organoids models. We believe that with continuous refinement and standardization, bone/cartilage organoids can profoundly impact patient-specific therapeutic interventions and lead the way in regenerative medicine.

Optical sorting: past, present and future

Optical sorting combines optical tweezers with diverse techniques, including optical spectrum, artificial intelligence (AI) and immunoassay, to endow unprecedented capabilities in particle sorting. In comparison to other methods such as microfluidics, acoustics and electrophoresis, optical sorting offers appreciable advantages in nanoscale precision, high resolution, non-invasiveness, and is becoming increasingly indispensable in fields of biophysics, chemistry, and materials science. This review aims to offer a comprehensive overview of the history, development, and perspectives of various optical sorting techniques, categorised as passive and active sorting methods. To begin, we elucidate the fundamental physics and attributes of both conventional and exotic optical forces. We then explore sorting capabilities of active optical sorting, which fuses optical tweezers with a diversity of techniques, including Raman spectroscopy and machine learning. Afterwards, we reveal the essential roles played by deterministic light fields, configured with lens systems or metasurfaces, in the passive sorting of particles based on their varying sizes and shapes, sorting resolutions and speeds. We conclude with our vision of the most promising and futuristic directions, including AI-facilitated ultrafast and bio-morphology-selective sorting. It can be envisioned that optical sorting will inevitably become a revolutionary tool in scientific research and practical biomedical applications.

Iron homeostasis and ferroptosis in muscle diseases and disorders: mechanisms and therapeutic prospects

The muscular system plays a critical role in the human body by governing skeletal movement, cardiovascular function, and the activities of digestive organs. Additionally, muscle tissues serve an endocrine function by secreting myogenic cytokines, thereby regulating metabolism throughout the entire body. Maintaining muscle function requires iron homeostasis. Recent studies suggest that disruptions in iron metabolism and ferroptosis, a form of iron-dependent cell death, are essential contributors to the progression of a wide range of muscle diseases and disorders, including sarcopenia, cardiomyopathy, and amyotrophic lateral sclerosis. Thus, a comprehensive overview of the mechanisms regulating iron metabolism and ferroptosis in these conditions is crucial for identifying potential therapeutic targets and developing new strategies for disease treatment and/or prevention. This review aims to summarize recent advances in understanding the molecular mechanisms underlying ferroptosis in the context of muscle injury, as well as associated muscle diseases and disorders. Moreover, we discuss potential targets within the ferroptosis pathway and possible strategies for managing muscle disorders. Finally, we shed new light on current limitations and future prospects for therapeutic interventions targeting ferroptosis.

Type 2 immunity in allergic diseases

Significant advancements have been made in understanding the cellular and molecular mechanisms of type 2 immunity in allergic diseases such as asthma, allergic rhinitis, chronic rhinosinusitis, eosinophilic esophagitis (EoE), food and drug allergies, and atopic dermatitis (AD). Type 2 immunity has evolved to protect against parasitic diseases and toxins, plays a role in the expulsion of parasites and larvae from inner tissues to the lumen and outside the body, maintains microbe-rich skin and mucosal epithelial barriers and counterbalances the type 1 immune response and its destructive effects. During the development of a type 2 immune response, an innate immune response initiates starting from epithelial cells and innate lymphoid cells (ILCs), including dendritic cells and macrophages, and translates to adaptive T and B-cell immunity, particularly IgE antibody production. Eosinophils, mast cells and basophils have effects on effector functions. Cytokines from ILC2s and CD4+ helper type 2 (Th2) cells, CD8 + T cells, and NK-T cells, along with myeloid cells, including IL-4, IL-5, IL-9, and IL-13, initiate and sustain allergic inflammation via T cell cells, eosinophils, and ILC2s; promote IgE class switching; and open the epithelial barrier. Epithelial cell activation, alarmin release and barrier dysfunction are key in the development of not only allergic diseases but also many other systemic diseases. Recent biologics targeting the pathways and effector functions of IL4/IL13, IL-5, and IgE have shown promising results for almost all ages, although some patients with severe allergic diseases do not respond to these therapies, highlighting the unmet need for a more detailed and personalized approach.

Responses

Your email address will not be published. Required fields are marked *