TCMEval-SDT: a benchmark dataset for syndrome differentiation thought of traditional Chinese medicine

AdMaPlace March 18, 2025


Background & Summary

Traditional Chinese Medicine (TCM) plays a significant role in the treatment and prevention of diseases and is an important part of the world’s traditional medicine¹,². For example, artemisinins effectively treat polycystic ovarian syndrome (PCOS) by mediating the LONP1-CYP11A1 interaction, leading to decreased androgen synthesis³. An herbal-based injection has been demonstrated to be effective in reducing 28-day mortality in patients with sepsis⁴. Bianzheng Lunzhi (Syndrome Differentiation and Treatment) is a core component of the theoretical framework of TCM. This personalized diagnostic and therapeutic approach involves a comprehensive analysis of various factors, including the patient’s specific disease, constitution, and environmental conditions, to determine the most appropriate treatment plan. Bianzheng Lunzhi represents the fundamental strategy and methodology of clinical practice in TCM⁵,⁶.

Over the last few decades, artificial intelligence (AI) has advanced rapidly in diverse industries. AI is increasingly demonstrating its potential in the medical field, with AI algorithms and models achieving significant results in disease diagnosis, drug discovery, and patient care⁷. To objectively evaluate the performance of these AI algorithms and models, several benchmark datasets are currently being used. For example, DigestPath⁸ is used to assess gastrointestinal pathology detection algorithms, and MultiMedQA serves as a benchmark for evaluating medical question answering⁹.

The diagnostic procedure in TCM clinical practice differs from that of Western medicine in that it diagnoses not only the disease but also the syndrome. Diagnosing a disease involves medical history collection, physical examination, review of medication use, and laboratory tests. For diagnosing a syndrome, however, there is no specialized benchmark that covers the process of syndrome differentiation. Existing benchmark datasets mostly focus on answering basic TCM knowledge questions, such as TCM Bench¹⁰, or on evaluating syndromes derived from case analysis, such as TCM-SD¹¹. These benchmark datasets do not cover the reasoning process of TCM syndrome diagnosis.

To address these problems, this study first summarizes TCM syndrome diagnosis into four steps: (1) clinical information extraction; (2) pathogenesis reasoning; (3) syndrome reasoning; and (4) explanatory summary. Based on this framework, we annotated and curated the TCM medical records and developed TCMEval-SDT, a benchmark dataset specifically designed to evaluate the ability of algorithms or models to perform TCM clinical diagnosis through syndrome differentiation. Our study aims to advance the development of algorithms or models capable of syndrome differentiation thinking in TCM, for example by enabling large language models (LLMs) to think or reason like TCM clinicians during syndrome differentiation using Chain-of-Thought (CoT)¹² based on TCMEval-SDT, with the ultimate goal of achieving automated diagnosis in the field of TCM. This study has three main objectives:

1. To present a large TCM syndrome diagnosis dataset with metadata that comply with the Findable, Accessible, Interoperable, and Reusable (FAIR) principles¹³. The metadata include, for example, medical record ID (DE0087751), clinical data (DE0087752), clinical information (DE0087755), TCM pathogenesis (DE0087756), TCM syndrome (DE0087757) and explanatory summary (DE0087753).

2. To establish evaluation metrics and allow users to evaluate their answers for performance assessment.

3. To invite users to submit new data to collaboratively build and reuse a benchmark dataset for syndrome diagnosis in TCM, aiming to improve the reusability of data and the overall quality of TCM assessment datasets.

Methods

In this study, the medical records were processed by TCM experts, who ensured that all medical records were anonymized. A rigorous quality assurance process was implemented to ensure the privacy, accuracy, and reliability of the collected medical records. Subsequently, 300 medical records were selected through manual screening. These records were annotated using the Baibu Knowledge Engine¹⁴,¹⁵, a corpus tool in the field of TCM that supports automatic annotation, human-machine combined annotation, and manual annotation modes for entity and relation annotation, to construct a comprehensive and systematically organised dataset for TCM syndrome diagnosis.

Data collection

The medical records were sourced from a self-built database established by our team, curated by experts from the Institute of Information on Traditional Chinese Medicine-China Academy of Chinese Medical Sciences, the Institute of Basic Theory for Chinese Medicine-China Academy of Chinese Medical Sciences, and senior TCM students. The data were collected from diverse sources, such as the China National Knowledge Infrastructure (CNKI, https://www.cnki.net), Wanfang data (https://www.wanfangdata.com.cn), classical Chinese medical texts and medical records from hospitals.

The data were first screened by TCM experts according to the following standards: (1) the medical record is complete, including information such as clinical data and clinical experience; (2) the case concerns a common disease. Cases of rare diseases and duplicate cases were excluded. To evaluate the quality of TCM medical records, we developed a TCM Medical Record Quality Assessment Scale (as shown in Table 1) based on the CARE guidelines and TCM expert opinions. This scale comprises ten sub-items, including patient information, clinical findings, timeline, and diagnostic evaluation, to systematically assess the quality of TCM case data. Each sub-item is rated as “clearly described”, “not clearly described” or “not described”, with corresponding scores of 1, 0.5, and 0, respectively¹⁶,¹⁷, so the maximum total score is 10. The TCM expert group assessed the quality of the manually screened cases using this scale, excluding cases with scores lower than 6 and including those with scores of 6 or higher.
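The inclusion rule above can be illustrated with a short sketch. It assumes each record has already been rated on the ten sub-items; only four sub-item names appear in the text, so the keys `item_5` to `item_10` are placeholders, and the function names are illustrative rather than taken from the authors' tooling.

```python
from typing import Mapping

# Score per rating, as stated in the text.
RATING_SCORES = {
    "clearly described": 1.0,
    "not clearly described": 0.5,
    "not described": 0.0,
}
INCLUSION_THRESHOLD = 6.0  # records scoring below this are excluded


def quality_score(ratings: Mapping[str, str]) -> float:
    """Sum the per-item scores for one medical record."""
    return sum(RATING_SCORES[rating] for rating in ratings.values())


def is_included(ratings: Mapping[str, str]) -> bool:
    """Apply the inclusion rule: keep records with a score of 6 or higher."""
    return quality_score(ratings) >= INCLUSION_THRESHOLD


# Hypothetical ratings for one record; item_5..item_10 are placeholder names.
example_ratings = {
    "patient information": "clearly described",
    "clinical findings": "clearly described",
    "timeline": "not clearly described",
    "diagnostic evaluation": "clearly described",
    "item_5": "clearly described",
    "item_6": "not clearly described",
    "item_7": "not clearly described",
    "item_8": "not clearly described",
    "item_9": "not described",
    "item_10": "not described",
}

print(quality_score(example_ratings), is_included(example_ratings))
```

In this example four items score 1 and four score 0.5, giving a total of 6.0, which just meets the inclusion threshold.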

Table 1 Details of TCM Medical Record Quality Assessment Scale.

Data pre-processing and anonymization

The preprocessing workflow for the medical records is shown in Fig. 1. The first step involves anonymizing each medical record by permanently removing identifiable information, such as patient ID and name, to protect patient privacy. The second step entails cleaning and organizing the data by removing duplicate or null data and standardizing the medical records. The FAIR principles serve as foundational guidelines for data sharing and reuse. To support these goals, we designed metadata for medical records in our study that comply with the FAIR principles. We shared the metadata of the TCMEval-SDT dataset on the CDE Portal (https://cdeportal.bmicc.cn), a public metadata registration and management platform, to facilitate the design and management of metadata for similar future projects (as shown in Table 2). We organized unstructured data, including TXT, PDF, Word, and HTML files, into structured data according to metadata requirements, and then assigned a unique identifier to each medical record. Finally, we constructed a benchmark database for syndrome diagnosis, named TCMEval-SDT.
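A minimal sketch of the preprocessing steps described above (anonymization, removal of duplicate or null data, and assignment of a unique identifier) might look as follows. The field names (`patient_id`, `name`, `clinical_data`, `record_id`) and the identifier format are assumptions made for illustration; the actual schema is defined by the metadata in Table 2.

```python
# Hypothetical names of fields that identify a patient.
IDENTIFYING_FIELDS = {"patient_id", "name"}


def preprocess(records):
    """Anonymize, clean, de-duplicate, and assign an identifier to each record."""
    seen_texts = set()
    cleaned = []
    for record in records:
        # Step 1: drop identifying fields.
        record = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
        # Step 2: skip null or duplicate clinical data.
        text = (record.get("clinical_data") or "").strip()
        if not text or text in seen_texts:
            continue
        seen_texts.add(text)
        record["clinical_data"] = text
        # Step 3: assign a unique identifier (illustrative format).
        record["record_id"] = f"REC-{len(cleaned) + 1:04d}"
        cleaned.append(record)
    return cleaned


raw = [
    {"patient_id": "p1", "name": "X", "clinical_data": "belching and nausea "},
    {"patient_id": "p2", "name": "Y", "clinical_data": "belching and nausea"},
    {"patient_id": "p3", "name": "Z", "clinical_data": None},
]
result = preprocess(raw)
print(result)
```

Here the second record is dropped as a duplicate and the third as null, so a single anonymized record remains.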

Table 2 FAIR-compliant metadata of medical record in TCMEval-SDT dataset.

Data selection and annotation

The diagnosis of syndromes in TCM is inherently multidimensional, involving a comprehensive evaluation of the interactions between a patient’s physiological, pathological, and environmental factors. For theoretical analysis and practical guidance, we have summarized the TCM syndrome diagnosis process into four steps, as illustrated in Fig. 2.

(1) Clinical Information Extraction: Emulating TCM clinicians in obtaining clinical information from the patient’s medical data.

(2) Pathogenesis Reasoning: Inferring TCM pathogenesis from relevant clinical information.

(3) Syndrome Reasoning: Inferring TCM syndromes from relevant TCM pathogenesis.

(4) Explanatory Summary: Summarizing clinical experiences and insights from TCM clinicians.

Entity and relation for medical record

We selected 300 medical records and employed the Baibu Knowledge Engine to annotate them according to the aforementioned steps. The annotated entities and their relations are shown in Tables 3, 4.

Table 3 Annotated entity of medical records.

Table 4 Annotated relation of entity in medical records.

Annotation guidelines

(1) We classified the clinical information into two types: relevant information and irrelevant information. Relevant information refers to critical clinical information that significantly influences the diagnostic process, while irrelevant information refers to clinical information that does not impact the diagnosis. The annotated entities include only the relevant information in the TCM syndrome diagnosis process. For example, belching (clinical information) – stomach qi upward (TCM pathogenesis) – liver and stomach disharmony (syndrome). Irrelevant information, such as “red tongue with white coating”, is excluded from the annotation scope as it does not directly influence this diagnostic process.

(2) Annotated entities must be as complete as possible. For example, in “painful distension behind the sternum and in the epigastric region”, the entire phrase must be annotated; annotating only “painful” would lose critical information.

(3) Inferential relationships exist between clinical information and TCM pathogenesis, and also between TCM pathogenesis and TCM syndromes. For example, extracting clinical information such as “belching” and “depressed state” leads to the inference of TCM pathogenesis, including “stomach qi upward” and “liver-qi stagnation”. Integrating these pathogenic indicators results in the identification of TCM syndromes such as “liver and stomach disharmony”.

(4) The annotation task follows a specific rule for long mentions in which multiple entities are connected: each entity with independent significance is annotated separately. For example, the phrase “painful distension behind the sternum and in the epigastric region, burning sensation behind the sternum, sensation of obstruction when swallowing, accompanied by belching and nausea” was annotated as five entities: “painful distension behind the sternum and in the epigastric region”, “burning sensation behind the sternum”, “sensation of obstruction when swallowing”, “belching” and “nausea”. This approach ensures that each meaningful entity is annotated based on its individual significance.
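The inference chain in these guidelines can be pictured as a small graph of entities and relations. The sketch below is only an illustration: the class names and the `supporting_evidence` helper are assumptions, not the Baibu Knowledge Engine's data model, and the mapping of each clinical finding to one pathogenesis follows the order in which the examples are listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    kind: str  # "clinical information", "TCM pathogenesis" or "TCM syndrome"


@dataclass(frozen=True)
class Relation:
    head: Entity  # the evidence
    tail: Entity  # what is inferred from it


def supporting_evidence(target, relations):
    """Collect the texts of all entities from which `target` is inferred,
    directly or through intermediate entities."""
    found = []
    stack = [target]
    while stack:
        node = stack.pop()
        for rel in relations:
            if rel.tail == node and rel.head.text not in found:
                found.append(rel.head.text)
                stack.append(rel.head)
    return found


belching = Entity("belching", "clinical information")
depressed = Entity("depressed state", "clinical information")
qi_upward = Entity("stomach qi upward", "TCM pathogenesis")
qi_stagnation = Entity("liver-qi stagnation", "TCM pathogenesis")
syndrome = Entity("liver and stomach disharmony", "TCM syndrome")

relations = [
    Relation(belching, qi_upward),
    Relation(depressed, qi_stagnation),
    Relation(qi_upward, syndrome),
    Relation(qi_stagnation, syndrome),
]

print(supporting_evidence(syndrome, relations))
```

Walking the relations backwards from the syndrome recovers both pathogeneses and both clinical findings, which mirrors the two-stage reasoning the annotators record.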

Example of clinical records annotation through the Baibu Knowledge Engine

Figure 3 illustrates an example of a TCM record annotated using the Baibu Knowledge Engine. TCM experts annotate the clinical information, TCM pathogenesis, TCM syndrome, and their relations.

Example of the thought process design in syndrome differentiation

Figure 4 illustrates the detailed design of the thought process in syndrome differentiation. TCM experts extract clinical information and infer TCM pathogenesis based on the clinical data. The inferred pathogenesis is then used to deduce the corresponding syndromes. This process emulates the specific reasoning steps employed by TCM clinicians during syndrome differentiation, providing AI algorithms and models with detailed steps to emulate this reasoning process.

Data evaluation

After the data annotation process was completed, a quality assessment was performed on the 300 medical records used in this study. Each medical record was thoroughly annotated to ensure the completeness and accuracy of the case information. Additionally, to reduce potential biases introduced by incomplete information, all data records were required to contain no missing values. Finally, to maintain the representativeness of the sample, rare medical records were excluded. The final statistics of all TCM medical records, classified according to the ICD-11 for Mortality and Morbidity Statistics (https://icd.who.int/en) are shown in Table 5. All 300 annotated medical records satisfied the aforementioned selection criteria.

Table 5 Statistics of 300 TCM medical records classified by disease according to the ICD-11.

Data Records

The TCMEval-SDT benchmark dataset is available for access and download on Figshare¹⁸ under the CC-BY 4.0 license. A total of 300 medical records were incorporated into TCMEval-SDT to create this benchmark dataset. The data were divided into training (n = 200), testing (n = 50), and validation (n = 50) sets, following a 4:1:1 ratio. To aid the algorithm or model in performing diagnosis, four subtasks were designed for each case: (1) clinical information extraction; (2) pathogenesis reasoning; (3) syndrome reasoning; and (4) explanatory summary. Pathogenesis reasoning and syndrome reasoning were formatted as multiple-choice questions with ten options to assess the model’s diagnostic reasoning ability. These multiple-choice questions are generated by a Python script (generate_multiple_choice_options.py), which is available on Figshare¹⁸. The script first collects the annotated pathogenesis and syndrome data from the TCMEval-SDT dataset, randomizes the options, and creates option lists for pathogenesis and syndrome. It then generates each ten-option multiple-choice question by combining the correct options for the medical record with randomly selected options from the pathogenesis and syndrome lists. The Python script (evaluate.py) used for technical validation is also available on Figshare¹⁸.
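The option-generation procedure can be sketched roughly as below. This is not the released generate_multiple_choice_options.py; the function name, the A to J labelling, and the placeholder distractor names are assumptions made for illustration.

```python
import random
import string


def build_question(correct, pool, n_options=10, seed=None):
    """Combine the correct answers with random distractors from `pool`,
    shuffle them, and label the options A, B, C, ..."""
    rng = random.Random(seed)
    correct = list(dict.fromkeys(correct))  # de-duplicate, keep order
    distractor_pool = sorted(set(pool) - set(correct))
    needed = n_options - len(correct)
    if needed < 0 or needed > len(distractor_pool):
        raise ValueError("cannot build a question with the requested size")
    options = correct + rng.sample(distractor_pool, needed)
    rng.shuffle(options)
    labelled = dict(zip(string.ascii_uppercase[:n_options], options))
    answers = sorted(label for label, text in labelled.items() if text in correct)
    return labelled, answers


# The two correct answers come from the annotation example in the text;
# the distractors are placeholder names, not real dataset entries.
correct_pathogenesis = ["stomach qi upward", "liver-qi stagnation"]
pool = correct_pathogenesis + [f"placeholder pathogenesis {i}" for i in range(1, 21)]

labelled, answers = build_question(correct_pathogenesis, pool, seed=0)
for label, text in labelled.items():
    print(label, text)
print("answers:", answers)
```

Sorting the distractor pool before sampling keeps the output reproducible for a given seed, which is convenient when the same questions must be regenerated later.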

The TCMEval-SDT dataset includes three JSON files: (1) Train_TCM_Data_v1.json containing 200 cases, (2) Test_TCM_Data_v1.json containing 50 cases, and (3) Validation_TCM_Data_v1.json containing 50 cases. Table 6 provides an overview of the metadata for the dataset. In TCMEval-SDT, we have designed metadata in accordance with the FAIR principles. All information can be accessed via the CDE Portal, laying the foundation for scientific data sharing in the field of TCM and supporting more researchers in using and developing TCMEval-SDT. It is important to note that the Test_TCM_Data_v1.json and Validation_TCM_Data_v1.json files do not include the pathogenesis reasoning and syndrome reasoning information or their correct answer options.

Table 6 Metadata for clinical record in the TCMEval-SDT dataset.

Technical Validation

In this section, we first introduce the criteria for evaluating answers. To validate and evaluate TCMEval-SDT, we selected four publicly available LLMs and randomly selected 50 medical records from the training set (n = 200). Using these records, we constructed zero-shot prompts to compare the TCM syndrome diagnosis capabilities of different LLMs.

Answer evaluation scheme

For the responses generated by LLMs, we have developed evaluation criteria tailored to each of the four tasks.

Task 1 Clinical information extraction:

$$S_{c}=\frac{|A\cap B|}{|A|}$$

(1)

where S_c is the score of Task 1; |A| is the number of clinical information items for the medical record extracted by the TCM expert; \(|A\cap B|\) is the number of items in the intersection of the clinical information extracted by the TCM expert and the clinical information extracted by the LLMs.
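As a rough illustration of Eq. (1), the sketch below computes S_c with plain set operations. It assumes items are compared by exact string match, which may differ from the matching used in the released evaluate.py; the function name is illustrative.

```python
def extraction_score(expert_items, model_items):
    """S_c = |A ∩ B| / |A|, with A the expert's items and B the model's items."""
    a, b = set(expert_items), set(model_items)
    if not a:
        raise ValueError("the expert annotation must contain at least one item")
    return len(a & b) / len(a)


expert = [
    "belching",
    "nausea",
    "burning sensation behind the sternum",
    "sensation of obstruction when swallowing",
]
model = ["belching", "nausea", "red tongue with white coating"]

print(extraction_score(expert, model))
```

Two of the four expert items are recovered, so the score is 0.5. The extra item returned by the model does not lower the score, because the denominator counts only the expert's items.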

Task 2 Pathogenesis reasoning:

$$S_{p}=\frac{\left|A\cap B\right|}{\left|A\right|+\left|\bar{A}\cap B\right|}$$

(2)

where S_p is the score of Task 2; A is the set of correct answers of TCM pathogenesis; B is the set of answers selected of TCM pathogenesis by LLMs; \(\left|A\cap B\right|\) is the number of options selected correctly by LLMs; |A| is the number of correct answers of TCM pathogenesis; \(\left|\bar{A}\cap B\right|\) is the number of options selected incorrectly by LLMs.

Task 3 Syndrome reasoning:

$$S_{s}=\frac{\left|A\cap B\right|}{\left|A\right|+\left|\bar{A}\cap B\right|}$$

(3)

where S_s is the score of Task 3; A is the set of correct answers of TCM syndrome; B is the set of answers selected of TCM syndrome by LLMs; \(\left|A\cap B\right|\) is the number of options selected correctly by LLMs; |A| is the number of correct answers of TCM syndrome; \(\left|\bar{A}\cap B\right|\) is the number of options selected incorrectly by LLMs.
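Tasks 2 and 3 share the same scoring formula, so one helper can serve both. The sketch below assumes answers are given as option labels; the function name is illustrative and is not taken from evaluate.py.

```python
def selection_score(correct_labels, selected_labels):
    """S = |A ∩ B| / (|A| + |Ā ∩ B|), with A the correct options
    and B the options selected by the model."""
    a, b = set(correct_labels), set(selected_labels)
    if not a:
        raise ValueError("at least one correct option is required")
    hits = len(a & b)    # options selected correctly
    wrong = len(b - a)   # options selected incorrectly
    return hits / (len(a) + wrong)


correct = {"A", "C", "F"}
print(selection_score(correct, {"A", "C", "H"}))   # two hits, one wrong pick
print(selection_score(correct, set("ABCDEFGHIJ")))  # every option selected
```

With two hits and one wrong pick the score is 2 / (3 + 1) = 0.5. Selecting all ten options gives 3 / (3 + 7) = 0.3, which shows how the penalty term in the denominator discourages over-selection.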

Task 4 Explanatory summary:

$$S_{r}=\mathrm{ROUGE\_L}(X,Y)$$

(4)

where S_r is the score of Task 4, calculated based on ROUGE-L¹⁹; X is the generated text; Y is the reference text.
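ROUGE-L is based on the longest common subsequence (LCS) between the generated and reference texts. The sketch below is a self-contained version under stated assumptions: inputs are already tokenized into lists, and the balanced F-measure (beta = 1) is reported. The text does not say which tokenization or ROUGE-L variant evaluate.py uses, so both choices are illustrative.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(generated, reference, beta=1.0):
    """ROUGE-L F-measure between a generated and a reference token list."""
    if not generated or not reference:
        return 0.0
    lcs = lcs_length(generated, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(generated)
    recall = lcs / len(reference)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


generated = "liver stomach disharmony with belching".split()
reference = "liver and stomach disharmony".split()
print(round(rouge_l(generated, reference), 4))
```

In this example the LCS is "liver stomach disharmony" (length 3), so precision is 3/5, recall is 3/4, and the F-measure is about 0.6667. For Chinese text, character-level tokens are a common choice, though that is an assumption here.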

Final score S_f for LLMs in TCM syndrome diagnosis task is:

$$S_{f}=\omega_{1}S_{c}+\omega_{2}S_{p}+\omega_{3}S_{s}+\omega_{4}S_{r}$$

(5)

where \(\omega_{1}=0.2\), \(\omega_{2}=0.3\), \(\omega_{3}=0.4\), and \(\omega_{4}=0.1\) are the weights assigned to each task score.
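A short worked example of Eq. (5) for a single medical record is given below. The weights come from the text; the four task scores are made-up inputs chosen only to show the arithmetic.

```python
# Task weights from Eq. (5).
WEIGHTS = {"S_c": 0.2, "S_p": 0.3, "S_s": 0.4, "S_r": 0.1}


def final_score(s_c, s_p, s_s, s_r):
    """Weighted sum of the four task scores for one medical record."""
    scores = {"S_c": s_c, "S_p": s_p, "S_s": s_s, "S_r": s_r}
    return sum(WEIGHTS[name] * value for name, value in scores.items())


# Hypothetical task scores for one record.
example = final_score(s_c=0.5, s_p=0.5, s_s=1.0, s_r=0.6)
print(round(example, 2))
```

The terms are 0.2 × 0.5 + 0.3 × 0.5 + 0.4 × 1.0 + 0.1 × 0.6 = 0.71. Because the weights sum to 1, a record answered perfectly on all four tasks scores exactly 1.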

Design of experiment and result analysis

In this study, we selected four publicly available LLMs: ChatGPT²⁰, Gemini 1.5-pro²¹, ChatGLM-130B²², and Tongyi Qianwen²³. For each medical record, we designed zero-shot prompts. The validation process consisted of two steps: (1) testing the selected LLMs via API calls and manual queries; (2) manually organizing the responses from the LLMs. We queried the dataset (n = 50) using both API calls and manual questioning, initially verifying whether the responses from the LLMs adhered to the required format. Subsequently, TCM experts reviewed the responses to identify any null values or formatting inconsistencies.

We employed evaluation scripts to assess the responses generated by the LLMs, as illustrated in Fig. 5. Overall, ChatGLM-130B demonstrated the best performance, achieving the highest total weighted score of 24.7378, followed by Gemini 1.5-pro and ChatGPT with weighted scores of 23.1816 and 21.4753, respectively. ChatGLM-130B performed best in Task 1 and Task 2 (see Fig. 5b,c), with weighted scores of 6.2112 and 7.98, respectively. For Task 3 (see Fig. 5d) and Task 4, Gemini 1.5-pro demonstrated superior performance, with weighted scores of 9.5067 and 1.4765, respectively, while ChatGLM-130B’s performance was slightly inferior. During the experiments, we observed that ChatGLM-130B and Gemini 1.5-pro demonstrated notable proficiency in TCM diagnostic tasks, achieving commendable scores across all four sub-tasks.

Usage Notes

The TCMEval-SDT benchmark dataset is available for download and review on Figshare¹⁸. This dataset was created to assess the capabilities of algorithms and models in the diagnosis of TCM syndromes. It has been meticulously curated and annotated by TCM experts and includes the following components: Medical Record ID, Medical Data, Explanatory Summary, Syndrome Differentiation, Clinical Information, TCM Pathogenesis, TCM Syndrome, Options of TCM Pathogenesis, Options of TCM Syndrome, Answers of TCM Pathogenesis, and Answers of TCM Syndrome.

However, the released dataset has several limitations. Currently, the dataset is relatively small; for example, it contains only four medical records related to Qi, blood and fluid disorders. In the future, we plan to include additional medical records and gradually expand the overall size of the dataset to ensure a more balanced distribution of disease types. Additionally, we aim to incorporate rare disease cases from TCM to develop a more specialized diagnostic dataset. We invite interested researchers to join our community in enhancing this syndrome diagnosis benchmark dataset and to contribute to the advancement of scientific data sharing and reuse.

Categories: Medical research, Signs and symptoms

