Evolution and impact of the science of science: from theoretical analysis to digital-AI driven research
Introduction
Since its creation, the Science of Science (SoS), which takes science as a whole as its object of study, has been closely linked with the development and evolution of science and technology (S&T), and has taken the laws of science’s development and its social functions as the core of its disciplinary evolution. The development of S&T has, in turn, driven transformations of SoS’s research paradigm. At present, a new round of scientific and technological revolution and industrial transformation is under way: digital intelligence-driven scientific discovery and the accompanying shift in research paradigms are moving science into an intelligent era. Moreover, “AI for Science” has been fully integrated into scientific research, decision-making, scientific management and governance, and development strategy. SoS therefore faces unprecedented development opportunities and challenges.
The evolution of SoS has been significantly influenced by shifts in scientific research paradigms, with roots in philosophical and scientific thought that predate SoS’s formal establishment. Attention to science’s role in societal development and human progress, especially in the early 20th century, paved the way for SoS’s formal inception. In 1925, the Polish sociologist F. Znaniecki advocated for a “science of science,” laying foundational ideas for SoS. A pivotal moment came in 1931 at the Second International Congress of the History of Science in London, where the Soviet delegation led by N. Bukharin presented Boris Hessen’s report linking Newton’s “Principia” to its social and economic context, which is considered a landmark exploration in SoS. The 1939 publication “The Social Function of Science” by the British physicist J. D. Bernal further established SoS, emphasizing analysis of the connection between science and society. By the 1960s, the emergence of scientometrics shifted SoS from sociological analysis to a quantitative phase, leading to its maturity. This conventional phase of SoS has since fostered the growth of related fields such as S&T management, innovation, and policy.
In the 21st century, the emergence of data-driven scientific research has rejuvenated SoS, with “AI for Science” leading the charge into new research paradigms. This shift has placed greater emphasis on how research is conducted than on understanding science itself, even as the scientific process and the knowledge system grow increasingly complex. This complexity demands a more profound understanding and explanation of science, facilitated by large-scale data, advanced computing, and precise algorithms, which significantly enhance our ability to explain scientific phenomena. Consequently, SoS is poised to evolve into a more integrated discipline, signaling its maturity and underscoring the need for a new, modern framework for SoS (Kuhn, 1962; Shneider, 2009).
This paper examines the evolution and history of SoS, emphasizing its pivotal role in connecting science, technology, and society. By systematically tracing SoS’s growth, the paper highlights how the field has been crucial in linking and advancing both the natural and social sciences. It acknowledges and interprets the societal roles of science, offering theoretical and practical frameworks for grasping the intricate relationship between science, technology, and society, and thereby enriching our understanding of science through the dynamic interactions among these domains.
The social function of science: Sociological research on SoS
Since the 19th century, S&T have become increasingly intertwined, forming an indissoluble bond. Unresolved scientific questions often lead to technological innovations, while scientific endeavors rely on technological advancements for experimental tools and methods. This interplay has deeply influenced modern society, which now fundamentally rests on the pillars of contemporary S&T. The impact of natural sciences on society has, in turn, become a focus of social sciences, with their methods and theories providing critical insights and approaches in social scientific inquiry. The concept of scientific labor’s socialization has also entered academic discourse, highlighted by Hessen’s (1931) exploration of the social and economic influences on Newton’s “Principia” and Merton’s works, such as “Science, Technology, and Society in Seventeenth-Century England” (Merton, 1973) and “Social Theory and Social Structure” (Merton, 1973), marking the advent of a sociological lens in science studies.
In the early 20th century, the field of science studies began to take shape, notably with the Polish sociologist Florian Witold Znaniecki coining the term in his 1925 article “The Object and Tasks of the Science of Knowledge.” The following year, N. Richevsky from the former Soviet Union expanded the concept to include the societal impacts of science. The term “science of science” was further clarified by Tadeusz Kotarbinski in 1929 and by the Polish sociologists, the Ossowskis, in 1935, broadening the research scope of the emerging field. A landmark moment came with Boris Hessen’s 1931 presentation in London on the social factors shaping natural science, which highlighted the field’s importance. The American sociologist Robert King Merton, influenced by Hessen, examined the science–society relationship in 17th-century England through a sociological approach. Merton’s identification of the “Matthew effect” played a crucial role in establishing the sociology of science and science studies, significantly advancing our understanding of the dynamics between science and societal structures.
In 1939, the British scientist J.D. Bernal, in his book “The Social Function of Science”, presented for the first time a comprehensive proposal to study the entire realm of scientific issues using historical and sociological methods. He also advocated integrating qualitative and quantitative research methods in order to examine the significant social phenomenon of “science” within the broader societal framework (Bernal, 1939). This groundbreaking approach gave rise to a new discipline, the SoS, and upon publication the book swiftly became the foundational theoretical work of the field. Bernal further authored numerous works that encapsulated the scientific achievements of his time, revealing the philosophical significance of science and its role in human history; his writings also addressed the contradictions in the development of science within class societies and the continuous progress of science under socialist systems. Owing to these prominent contributions, Bernal is acknowledged in academia as the founder of the discipline of SoS.
Since the 1960s, the study of SoS has evolved to encompass a range of research methods and theories, introducing new dimensions of analysis. The historical dimension examines the processes of scientific development, the evolution of scientific thought, and the formation and change of scientific paradigms. In addition, scientometrics, which focuses on the quantitative analysis of science, has become a central discipline within SoS, revitalizing the field through the measurement of scientific output and influence.
From qualitative analysis to large-scale quantitative analysis: Scientometrics as a branch of SoS
The established research paradigm within SoS has traditionally hinged on the compilation of past experiences and the construction of speculative theories, thereby securing its academic standing and theoretical foundation. Nevertheless, the rapid progress in S&T has posed substantial challenges to this conventional approach. In the era of “big science,” the exponential expansion of scientific knowledge necessitates that the SoS discipline refine its ability to pinpoint the leading edge of scientific and technological advancements, disclose the underlying forces driving scientific progress, and furnish actionable insights for societal utility. The adoption of quantitative analysis methods that are both calculable and interpretable, grounded in the field’s extant theories, hypotheses, and models, is of paramount importance.
Statistical analysis of the laws of scientific development
The field of science studies, often traced back to the 1920s or 1930s, actually began in the late 19th century, when natural scientists started collecting statistical data on their peers using mathematical statistical methods. The Victorian naturalist Sir Francis Galton (1822–1911) was pivotal, identifying the phenomenon of ‘regression toward the mean’ (Galton, 1886) and introducing the correlation coefficient (Lee Rodgers & Nicewander, 1988). He analyzed the traits of British scientists in ‘English Men of Science: Their Nature and Nurture’ using questionnaires (Galton, 1874). Inspired by Galton, James McKeen Cattell evaluated American scientists’ productivity based on peer review (Cattell, 1906).
Scientific literature became a focus of quantitative research. Alfred James Lotka (1880–1949) proposed Lotka’s law, stating that the number of authors who have written n papers is approximately 1/n² of the number of authors who have written one paper (Lotka, 1926). Other milestones include Zipf’s Law and Bradford’s Law, revealing the distribution of word frequency in English literature (Kingsley Zipf, 1932) and the dispersion and concentration of scientific literature across journals (Bradford, 1934).
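As a minimal sketch of the inverse-square relationship stated above, the following Python snippet computes the author counts Lotka’s law predicts for a hypothetical field; the function name and the figure of 1,000 single-paper authors are illustrative assumptions, not values from the original studies.

```python
# Minimal illustration of Lotka's inverse-square law: if A authors write
# exactly one paper, roughly A / n**2 authors are expected to write n papers.

def lotka_expected_counts(single_paper_authors: int, max_papers: int) -> dict[int, float]:
    """Expected number of authors producing n papers under Lotka's law."""
    return {n: single_paper_authors / n**2 for n in range(1, max_papers + 1)}

if __name__ == "__main__":
    expected = lotka_expected_counts(single_paper_authors=1000, max_papers=5)
    for n, count in expected.items():
        print(f"authors with {n} paper(s): ~{count:.0f}")
```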
Early studies offered only a limited understanding of science because they lacked theoretical guidance: Galton argued that ‘genius is hereditary’ in ‘Hereditary Genius’ (Galton, 1891), and Lotka focused on the applicability of Lotka’s law. Nevertheless, these studies shaped the development of SoS by highlighting the value of quantitative methods. John D. Bernal (1901–1971), the founder of SoS, used data charts in ‘The Social Function of Science’ to analyze government funding, teaching staff income, and the number of scientists (Bernal, 1939).
With the growth of scientific literature, establishing a quantitative research paradigm for SoS became crucial. Derek J. de Solla Price (1922–1983), influenced by Bernal, shifted from physics and mathematics to SoS and identified the exponential growth of scientific literature (De Solla Price, 1975). Price’s works ‘Science since Babylon’ and ‘Little Science, Big Science’ organized and developed the research of Galton, Lotka, Bradford, and Zipf, proposing Price’s Law, a further deduction from Lotka’s law (D. J. D. S. Price, 1963). These works laid the foundation for quantitative research in SoS and attracted global interest.
Citation analysis and visualization in scientific knowledge networks
The development of SoS relies on scientific statistical data. American information scientist Eugene Eli Garfield (1925-2017) made significant contributions to this field by proposing in 1955 the use of citation relationships between scientific literature to analyze the dynamics and structure of scientific development (Garfield, 1955). This pivotal concept laid the groundwork for the subsequent establishment of databases like the Science Citation Index (SCI), Social Science Citation Index (SSCI), Journal Citation Report (JCR), and Arts and Humanities Citation Index (A&HCI), which collectively gather citation data from nearly 180 fields of books and journal articles. Garfield’s work not only provided a substantial data foundation for the quantitative research paradigm of SoS but also introduced new theories and methods, namely citation analysis, which greatly expanded the potential for SoS research development.
Citation frequency and its derived indicators, such as the h-index (Hirsch, 2005), have been widely adopted to quantify concepts like academic influence, scientific productivity, and labor efficiency. Garfield’s early work analyzing citation frequency identified patterns such as the correlation between rapid citation and Nobel Prize nominations (Garfield & Malin, 1968). The citation frequency has been a crucial tool for measuring the status and influence of American science (Board, 1981). Vasiliy Vasilevich Nalimov (1910-1997), the inventor of the term ‘scientometrics’ in the Soviet Union, emphasized the importance of citation rate over publication count for measuring scientific productivity (Granovsky, 2001).
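For concreteness, the h-index mentioned above can be computed directly from a list of citation counts; the short Python sketch below uses an invented citation profile purely for illustration.

```python
# Minimal sketch of the h-index (Hirsch, 2005): the largest h such that the
# author has at least h papers cited at least h times each.

def h_index(citations: list[int]) -> int:
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

if __name__ == "__main__":
    print(h_index([25, 8, 5, 3, 3, 1, 0]))  # -> 3
```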
Moreover, the distribution of citations over time has drawn attention, with scholars studying literature aging through citation curves. Bernal’s concept of “literature half-life” in 1958 (Xiao, 2011) and Price’s “Price index” (D. d. S. Price, 1976) in 1976 were key developments in quantifying literature aging. The “Sleeping Beauty in Science” phenomenon, identified by van Raan (Van Raan, 2004), refers to papers initially ignored that later received a surge in citations, which has attracted wide attention in SoS.
Citation relationships are used to construct networks for studying the structure and evolution of scientific knowledge diffusion and cooperation. Price was the first scholar to identify the phenomenon of ‘cumulative advantage’ in the growth of citation networks and to give a mathematical explanation for their evolution (D. d. S. Price, 1976). The subsequent development of bibliographic coupling (Kessler, 1963), co-citation (H. Small, 1973), co-authorship (Glänzel & Schubert, 2004), keyword co-occurrence (Callon, Courtial, & Laville, 1991), and reference co-occurrence (Porter & Chubin, 1985) analyses has provided deeper insights into the dynamics of scientific knowledge. Social network analysis has further enriched the toolkit of scientometrics (Egghe & Rousseau, 1990).
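As an illustrative sketch of how such citation-derived networks can be constructed, the snippet below builds a tiny co-citation graph with the networkx package; the paper and reference identifiers are hypothetical.

```python
# A minimal co-citation sketch: two references are co-cited when they appear
# together in the reference list of the same citing paper; edge weights count
# how often that happens. Requires networkx; the toy data are illustrative.
from itertools import combinations
import networkx as nx

# citing paper -> references it cites (hypothetical identifiers)
reference_lists = {
    "paper_A": ["ref_1", "ref_2", "ref_3"],
    "paper_B": ["ref_2", "ref_3"],
    "paper_C": ["ref_1", "ref_3"],
}

G = nx.Graph()
for refs in reference_lists.values():
    for u, v in combinations(sorted(refs), 2):
        if G.has_edge(u, v):
            G[u][v]["weight"] += 1
        else:
            G.add_edge(u, v, weight=1)

# co-citation strength between ref_2 and ref_3
print(G["ref_2"]["ref_3"]["weight"])  # 2
```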
Theoretical foundations for applying citation analysis to scientometrics have been established, considering citation motivation (Garfield, 1965) and information systems. Nalimov’s cybernetic model of science (Nalimov & Mul’chenko, 1969) and Merton’s sociological theories of citation (Merton, 1973) have contributed to understanding the role of citations in scientific evaluation (H. G. Small, 1978).
The paradigm shift from mathematical expression to visual representation in scientometrics has been facilitated by the introduction of information visualization techniques. Tools like CiteSpace (Chen, 2006), HistCite (Garfield, 2006), BibExcel (Persson, Danell, & Schneider, 2009), VOSviewer (N. Van Eck & Waltman, 2010), and CitNetExplorer (N. J. Van Eck & Waltman, 2014) have allowed for the visualization of citation networks, knowledge domains, and other complex data structures. This shift has led to the emergence of mapping knowledge domains as a new quantitative research field in SoS, enabling a more intuitive and comprehensive understanding of scientific knowledge networks and their evolution.
Large-scale and fine-grained shifts in SoS: leveraging big data and complex networks
The evolution of SoS has witnessed a transformative shift towards large-scale and fine-grained analysis, driven by the emergence of big data and the application of complex network theory. This shift has been enabled by the accessibility of diverse data sources and the utilization of advanced computational tools, which have expanded the scope and depth of SoS research.
The availability of a wide range of data types, from social media posts to clinical trial records, has provided SoS researchers with rich and interconnected datasets such as OpenAlex, Publons, Dimensions, Semantic Scholar, and Microsoft Academic Graph. The opening of SciSciNet in 2023, with its vast collection of over 134 million scientific documents and associated records, represents a significant leap in data availability and connectivity. This data deluge has not only increased the scale of SoS research but also facilitated more nuanced analyses (Z. Lin, Yin, Liu, & Wang, 2023). For instance, the interaction between scientific papers and policies can now be studied by constructing citation relationships between the two domains, revealing the integration of scientific research and policy-making (Yin, Gao, Jones, & Wang, 2021).
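As one hedged example of how such open data sources can be queried programmatically, the sketch below retrieves a handful of works from the public OpenAlex API; the endpoint, parameters, and response fields follow the OpenAlex documentation as understood at the time of writing and should be verified against the current API reference.

```python
# Hedged sketch of pulling a small sample of works from the OpenAlex API
# (https://api.openalex.org). Endpoint and field names should be checked
# against the current documentation before relying on them.
import requests

resp = requests.get(
    "https://api.openalex.org/works",
    params={"search": "science of science", "per-page": 5},
    timeout=30,
)
resp.raise_for_status()

for work in resp.json().get("results", []):
    print(work.get("publication_year"), work.get("cited_by_count"), work.get("display_name"))
```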
The scale of data in SoS research has exploded, enabling cross-validation of complex scientific problems and societal challenges. Research on scientists’ career dynamics (Huang, Gates, Sinatra, & Barabási, 2020; Wang, Jones, & Wang, 2019), collaboration patterns (Bu, Ding, Liang, & Murray, 2018; Bu, Murray, Ding, Huang, & Zhao, 2018; Wu, Wang, & Evans, 2019), and the evolution of science (Gates, Ke, Varol, & Barabási, 2019) itself have been enriched by the analysis of millions of scientific papers, grants, and mobility data. This scale has allowed for the exploration of subtle differences in career trajectories, collaboration dynamics, and disciplinary trends.
The introduction of complex networks has driven a shift towards more refined SoS research. These networks, characterized by self-organization, self-similarity, and scale-free properties, offer new perspectives on the dynamic processes, complex relationships, and underlying mechanisms of scientific development. Complex networks provide interpretability to the dynamic evolution of citation networks (Fortunato et al., 2018; Parolo et al., 2015), recognition of the intricate relationships between knowledge creation and diffusion (Sankar, Thumba, Ramamohan, Chandra, & Satheesh Kumar, 2020; X.-h. Zhang et al., 2019), and inference of the mechanisms driving scientific activities (Jia, Wang, & Szymanski, 2017; A. Zeng et al., 2019).
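A common way to illustrate the scale-free, cumulative-advantage behaviour described above is a Barabási–Albert preferential attachment model; the toy simulation below (using networkx, with arbitrary parameters) is a stand-in for a growing citation network rather than a fitted model of real data.

```python
# Toy illustration of cumulative advantage: a Barabási-Albert preferential
# attachment graph as a stand-in for a growing citation network.
# Requires networkx; parameters are arbitrary.
import networkx as nx

G = nx.barabasi_albert_graph(n=10_000, m=3, seed=42)
degrees = [d for _, d in G.degree()]
print("max degree:", max(degrees))                     # a few highly cited hubs
print("mean degree:", sum(degrees) / len(degrees))     # roughly 2*m for large n
```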
The field of complex networks has attracted notable scholars, leading to the emergence of a new paradigm in SoS research after 2010. However, this reliance on model characterization and interpretation has also led to the “trap of numbers” (Sugimoto, 2021), highlighting the need for a balanced approach that combines quantitative data with qualitative insights to fully understand the complexities of SoS research.
From scientometrics to AI-driven research: The latest paradigm of SoS
Since 2010, the development of scientific data infrastructures and the rise of artificial intelligence (AI) large models have led to a qualitative leap in SoS, both in terms of data sources and scale. The future of SoS lies in expanding its scope, particularly by incorporating more diverse methodologies and data sources (“Broader scope is key to the future of ‘science of science’,” 2022). This shift is crucial for addressing long-standing issues such as data scarcity, methodological limitations, and biases within the scientific community. In recent years, the ongoing global scientific and technological revolution, driven by digital intelligence and big data, has profoundly impacted traditional scientific research activities. This paradigm shift, characterized by “AI for science,” leverages massive datasets to learn scientific laws and natural principles, offering cutting-edge tools for advancing research (X. Zhang et al., 2023). AI plays a pivotal role in this transformation, acting as both a surrogate and a quant to enhance data collection, curation, and analysis (Messeri & Crockett, 2024), thereby improving the efficiency and scalability of SoS research. This evolution is propelling significant progress in SoS research and reshaping the landscape of scientific inquiry.
AI as surrogate in SoS
In the face of complex research questions and the deluge of data to collect and integrate, AI serves as a powerful surrogate for scientometrics. It excels at processing vast amounts of data and generating alternative datasets, thereby reducing the burden on human researchers and enhancing the efficiency of data collection. Machine learning, deep learning, and other AI technologies support multi-dimensional, multi-modal, and multi-scenario data collection and simulation, assist researchers in running large numbers of verification experiments (Peng et al., 2023), solve complex computations, and accelerate the pace of scientific research and technology development. For example, AI plays a vital role in sentiment annotation and entity extraction.
AI in sentiment annotation
Sentiment annotation is one of the most significant applications of AI in SoS. For instance, AI algorithms can analyze textual and audiovisual data from scientific publications, conference presentations, and online discussions to automatically label sentiments expressed by researchers. This not only streamlines the time-consuming process of manual annotation but also enables the detection of nuanced emotional trends within the scientific community that might otherwise be overlooked. Citation sentiment classification differs from general sentiment annotation tasks. While the latter focuses on semantic emotions in text (e.g., positive/negative/neutral), citation sentiment specifically refers to the stance (agreement, disagreement, or neutrality) that authors hold toward the cited work.
(1) Methods for sentiment annotation
Methods for sentiment analysis using artificial intelligence encompass rule-based approaches, supervised learning, semi-supervised learning, and unsupervised learning. Rule-based approaches utilize predefined rules and dictionaries to identify sentiment words and phrases within a text, allowing the text’s sentiment to be judged; for example, a dictionary of positive and negative words can determine a text’s sentiment from the frequency of those words. Supervised learning involves training a model to recognize sentiment words within a text and learn how to classify the text’s sentiment as positive, negative, or neutral. For instance, a large dataset of labeled text (i.e., text with known sentiment) can be used to train a classification model, such as a support vector machine (SVM) or a naive Bayes classifier, to predict the sentiment of new, unseen text. Semi-supervised learning leverages a small amount of labeled data and a large amount of unlabeled data: the model automatically labels the unlabeled data and then further trains itself on this newly labeled data, an approach that is useful when labeled data are scarce and expensive to obtain. Unsupervised learning identifies sentiment patterns within a text, such as the co-occurrence of words, to infer sentiment; for example, clustering algorithms can group similar texts by sentiment, allowing different sentiment categories to be identified within a dataset.
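As a minimal sketch of the supervised approach described above, the snippet below trains a TF-IDF plus multinomial naive Bayes classifier with scikit-learn; the tiny labeled corpus is invented for illustration and far smaller than any realistic training set.

```python
# Minimal supervised sentiment classifier (TF-IDF + multinomial naive Bayes).
# The labeled examples are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "This method substantially improves accuracy.",
    "The results convincingly support the hypothesis.",
    "The approach fails to replicate earlier findings.",
    "The evidence presented is weak and inconclusive.",
    "The dataset contains 500 samples.",
    "The study was conducted over two years.",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["The proposed model clearly outperforms the baseline."]))
```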
(2) Sentiment annotation in SoS
AI has been successfully applied in analyzing the stance and sentiment within academic papers and reports to assess their scientific value and influence. For instance, citation sentiment analysis can evaluate how researchers view the contributions of cited documents—whether with agreement, disagreement, or neutrality—rather than only capturing the general semantic sentiment within the text. Using such methods, the stance in citations can be divided into positive (based on; corroboration; discovery; positive; practical; significant; standard; supply), neutral (contrast; co-citation; neutral), and negative (Aljuaid, Iftikhar, Ahmad, Asif, & Tanvir Afzal, 2021; Budi & Yaniasih, 2023; Kong et al., 2024). This distinction emphasizes that citation sentiment analysis is not just a subset of general sentiment classification, but a specialized task for academic discourse, providing a methodological foundation for the validation of classic SoS theoretical models such as social constructivist theory (Gilbert, 1977) and normative theory (Merton, 1973).
Moreover, with the rise of the “Public Understanding of Science” movement, the introduction of sentiment analysis methods helps SoS expand its research objects from the scientific community to the general public, making the field more applicable to questions about the relationship between modern society and science. AI-driven sentiment analysis can help policymakers identify and predict public opinions, providing essential support for decision-making processes. By analyzing large volumes of data from social media platforms and public reports, AI can track the emotional responses of different segments of the population toward scientific policies or projects. For example, during the COVID-19 pandemic, sentiment analysis of social media revealed key public concerns and emotional reactions, which could inform public health authorities on how to adjust their communication strategies to address fear and misinformation more effectively (Li et al., 2023). In addition, AI-driven sentiment analysis allows for the identification of emerging concerns that might not be immediately visible through traditional surveys or reports, and this real-time analysis provides decision-makers with valuable insights to anticipate potential backlash or support for upcoming policies (Xue et al., 2020).
AI in entity extraction and annotation
Another illustrative example lies in the realm of entity extraction and annotation. AI tools can swiftly identify and classify key entities such as researchers, institutions, funding agencies, and research topics within vast corpora of scientific literature. This automation facilitates the construction of complex knowledge graphs, revealing intricate relationships and patterns that would be challenging to uncover manually.
Observation units are now intelligently extracted at scales ranging from macro-units to micro-units of knowledge. In the past, scientific knowledge units were generally based on papers, patents, journals, or disciplines, so SoS research on scientific development was limited to macro-level knowledge units (Y. Liu & Rousseau, 2010). Since LDA was proposed in 2003, the efficiency of extracting scientific or technical knowledge units from scientific papers and patent texts has greatly improved. After 2013, embedding models such as Word2Vec and Top2Vec, followed by large language models (LLMs) such as BERT and GPT, with their powerful contextual understanding, language generation, and learning capabilities, have optimized full-text analysis and annotation as well as semantic recognition and prediction, strengthened the efficiency and expressiveness of topic extraction from scientific and technical texts (Reimers & Gurevych, 2019), and pushed SoS research toward a fine-grained perspective on science. In SoS, AI-based entity extraction techniques have been applied to various tasks, advancing research in scientific knowledge mapping, bibliometric analysis, and the construction of knowledge graphs.
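To make the topic-extraction step concrete, the following sketch applies LDA to a few toy abstracts using scikit-learn; the documents, the choice of two topics, and the variable names are illustrative assumptions only.

```python
# Minimal topic-extraction sketch with LDA (Blei, Ng, & Jordan, 2003) via
# scikit-learn; the documents and the number of topics are illustrative.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "citation networks reveal knowledge diffusion across disciplines",
    "deep learning models improve protein structure prediction",
    "coauthorship networks shape scientific collaboration patterns",
    "neural networks accelerate drug discovery pipelines",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {top_terms}")
```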
Researcher and institution identification: AI entity extraction techniques, such as those implemented in Sentence-BERT (Reimers & Gurevych, 2019) and SciSpaCy (Neumann, King, Beltagy, & Ammar, 2019), have been used to automatically identify and classify researchers, institutions, and research topics from vast corpora of academic papers and patents. This has facilitated the construction of large-scale knowledge graphs, which reveal the relationships and collaborations between different entities in scientific ecosystems. These graphs are then used to study collaboration networks, innovation patterns, and the evolution of research domains.
Patent and citation analysis: AI-based entity extraction models, such as BERT (Devlin, Chang, Lee, & Toutanova, 2018) and Word2Vec (Mikolov, Chen, Corrado, & Dean, 2013), are commonly employed in patent and citation databases to extract key entities like inventors, patents, journals, and disciplines. This allows researchers to analyze citation trends, measure scientific influence, and explore how knowledge flows between disciplines through patents and scholarly work.
Construction of fine-grained knowledge units: Since the introduction of models like LDA (Blei, Ng, & Jordan, 2003), and later enhanced by BERT-based models, entity extraction has been crucial for extracting fine-grained knowledge units from scientific texts. These models have improved the ability to identify technical terms, research outcomes, and domain-specific entities at a more granular level, advancing the study of micro-units of knowledge in SoS. Moreover, models specifically fine-tuned for scientific documents, such as SciBERT (Beltagy, Lo, & Cohan, 2019), SPECTER (Cohan, Feldman, Beltagy, Downey, & Weld, 2020), and SPECTER2 (Singh, D’Arcy, Cohan, Downey, & Feldman, 2022), have achieved high performance in tasks closely related to SoS. SciBERT significantly enhances the extraction and interpretation of scientific terms and named entities, making it highly effective for domain-specific knowledge tasks. SPECTER and SPECTER2, designed for document-level representation using citation-informed transformers, are particularly adept at capturing relationships between scientific documents based on citation networks, aiding in the construction of accurate, large-scale knowledge graphs that reveal complex interconnections within scientific research.
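As a hedged sketch of document-level embeddings in the spirit of SPECTER, the snippet below assumes the sentence-transformers package and the publicly released “allenai-specter” checkpoint are available (the exact model identifier should be verified); the paper titles are invented.

```python
# Hedged sketch of document-level embeddings in the spirit of SPECTER.
# Assumes the sentence-transformers package and the "allenai-specter"
# checkpoint; verify the model name before use. Titles are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("allenai-specter")

papers = [
    "Atomically precise graphene nanoribbon synthesis",
    "Bottom-up fabrication of graphene nanostructures",
    "Survey methods in labor economics",
]
embeddings = model.encode(papers, convert_to_tensor=True)
similarity = util.cos_sim(embeddings, embeddings)
print(similarity)  # related papers should score higher than unrelated ones
```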
Biomedical and clinical research: In the context of biomedical research, entity extraction tools like Med7 (Kormilitzin, Vaci, Liu, & Nevado-Holgado, 2021) have been utilized to extract clinical entities (e.g., diseases, treatments) from scientific publications and electronic health records. These extracted entities are used to map scientific advancements in medicine, track the development of treatments, and understand trends in healthcare innovation.
AI in image recognition
AI has significantly advanced the detection of image-related issues in scientific publishing, particularly in biomedical research, helping ensure data integrity and ethical compliance. Convolutional neural networks (CNNs) such as AlexNet, GoogLeNet, ResNet, R-CNN, and FCNN models are widely used to identify duplicated or manipulated images by analyzing features like contrast, brightness, and structural patterns (Yu, Yang, Zhang, Armstrong, & Deen, 2021). Similarly, generative adversarial networks (GANs) and anomaly detection models are applied to recognize subtle image manipulations, such as artificial enhancements or duplicated sections, which can lead to research misinterpretation (X. Liu et al., 2021). Further, algorithms such as PCA (Plehiers et al., 2020), t-SNE (Schubert & Gertz, 2017), DBSCAN (Çelik, Dadaşer-Çelik, & Dokuz, 2011), and LSTM (Fernando, Denman, Sridharan, & Fookes, 2018) are commonly used for anomaly detection, helping ensure data authenticity in biomedical datasets. Additionally, Imagetwin (https://imagetwin.ai/) and Proofig (https://www.proofig.com/) are examples of tools that use AI to detect potential image anomalies, supporting journals, publishers, and institutions in maintaining research integrity. In summary, the integration of AI in detecting image-related issues has become a powerful tool in SoS, reinforcing the accuracy and ethical standards of scientific research. Through advanced image recognition models and dedicated software, researchers and publishers are better equipped to monitor and verify visual data, contributing to the trustworthiness and quality of scientific knowledge.
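The dedicated tools cited above are proprietary, so as a deliberately simple stand-in the sketch below screens two figure panels for near-duplication with perceptual hashing via the imagehash package; the file paths and similarity threshold are placeholders, and this is not the method used by Imagetwin or Proofig.

```python
# A deliberately simple stand-in for duplicate-figure screening: perceptual
# hashing with the imagehash package (not the CNN/GAN pipelines cited above).
# File paths and the threshold are placeholders.
from PIL import Image
import imagehash

hash_a = imagehash.phash(Image.open("figure_panel_a.png"))
hash_b = imagehash.phash(Image.open("figure_panel_b.png"))

# A small Hamming distance suggests the panels may be duplicated or lightly edited.
distance = hash_a - hash_b
print("hamming distance:", distance)
if distance <= 5:  # illustrative threshold
    print("panels look suspiciously similar; flag for manual review")
```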
AI as quant in SoS
AI’s prowess in computational methods significantly supplements human capabilities, particularly when dealing with immense datasets and intricate relationships. It bridges the gap between human limitations and the demands of SoS research, enhancing both predictability and interpretability.
Predictive algorithms
AI has become an influential tool in predicting various aspects of scientific careers, research team performance, and research project outcomes. By utilizing machine learning algorithms and data analytics, AI provides insights that were previously difficult or impossible to obtain through traditional methods.
Predicting citation performance and citing behavior: AI applications in citing behavior research involve analyzing citation patterns through machine learning (T. Zeng & Acuna, 2020) and natural language processing to uncover insights about scholarly communication. By utilizing algorithms such as neural networks and regression models, researchers can predict citation counts based on various factors, including publication attributes and author influence (Akella, Alhoori, Kondamudi, Freeman, & Zhou, 2021). Studies have shown that these models can effectively identify biases in citation practices and the impact of collaboration on citation dynamics (Iqbal et al., 2021). Such insights enhance our understanding of knowledge dissemination and the complex networks that shape academic impact.
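As a minimal, hedged sketch of such predictive modelling, the snippet below regresses citation counts on a few simple publication features with a random forest; the features, the synthetic data, and the coefficients used to generate them are illustrative assumptions rather than findings from the cited studies.

```python
# Minimal predictive sketch: regressing (synthetic) citation counts on simple
# publication features with a random forest. Not a validated model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# hypothetical features: team size, reference count, author h-index, venue score
X = np.column_stack([
    rng.integers(1, 12, n),
    rng.integers(5, 80, n),
    rng.integers(0, 50, n),
    rng.random(n),
])
# synthetic target loosely tied to the features, plus noise
y = 0.8 * X[:, 0] + 0.1 * X[:, 1] + 0.5 * X[:, 2] + 10 * X[:, 3] + rng.normal(0, 2, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("R^2 on held-out data:", round(model.score(X_test, y_test), 3))
```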
Predicting scientists’ career trajectories: AI can model and predict the career patterns of scientists by analyzing their publications, citations, collaborations, and affiliations. Predictive algorithms such as decision trees, support vector machines (SVMs), and neural networks are employed to identify key factors influencing scientific productivity and career progression (Edwards, Acheson-Field, Rennane, & Zaber, 2023; Musso, Hernández, & Cascallar, 2020; Yang, Chawla, & Uzzi, 2019). For example, factors like the number of publications, the diversity of collaborations, and funding history can be fed into these algorithms to forecast future career success or the likelihood of achieving high-impact research outcomes (L. Liu, Dehmamy, Chown, Giles, & Wang, 2021; L. Liu et al., 2018; Wang et al., 2019).
Predicting research team performance: AI also plays a crucial role in evaluating and predicting the performance of research teams. By analyzing team composition, collaboration patterns, and historical project outcomes, predictive models can estimate the future productivity and success of a research team (Ghawi, Müller, & Pfeffer, 2021; Giannakas, Troussas, Krouska, Sgouropoulou, & Voyiatzis, 2022). Factors such as team size, team hierarchy, team distance, diversity in expertise, and the centrality of team members within their academic networks have been shown to correlate with team performance (Y. Lin, Frey, & Wu, 2023; Wu et al., 2019; Xu, Wu, & Evans, 2022).
Predicting research project outcomes: AI can significantly enhance the prediction of research project performance by analyzing multiple dimensions of project execution and outcomes. Through machine learning and data-driven models, AI can evaluate project success across various performance indicators, such as time management, resource allocation, collaboration dynamics, and the achievement of project goals (Yoo, Jung, & Jun, 2023). AI-powered predictive models often rely on historical data, such as past project performance, team expertise, funding amounts, and institutional resources, to make projections. These models can identify trends and patterns that forecast whether a project is likely to meet its objectives. The use of AI in predictive analytics offers insights into several key performance areas: timeline and milestones (Kim & Jang, 2024), budget and resource allocation (Jiang, Fan, Zhang, & Zhu, 2023), innovation and output (Gao, Wen, & Deng, 2022), team collaboration and productivity (Tohalino & Amancio, 2022; Yoo et al., 2023).
Machine learning for explainability
The construction of relationships in SoS spans from the investigation of extrinsic connections to the elucidation of intrinsic mechanisms. Examining the interconnections between elements within a scientific system has consistently been a pivotal aspect of SoS. During the sociological analysis phase, scientific inquiry typically employs deductive reasoning to establish relationships among elements. Regression and correlation analyses are frequently utilized to decipher the associations between elements in the realm of scientometrics. Since 2010, SoS research has leveraged causal inference models from econometrics, such as Difference in Differences, Regression Discontinuity Design, Granger Causality Test, Propensity Score Matching, and Instrumental Variables (Zhao et al., 2020), along with machine learning approaches like BART (Prado, Moral, & Parnell, 2021), TMLE (Schuler & Rose, 2017), and causal forest models (Wager & Athey, 2018) to delve into intrinsic causal mechanisms.
Machine learning methods play a crucial role in addressing the causal inference demands of large-scale and high-dimensional data. These methods are adept at processing extensive datasets, predicting nonlinear and complex relationships between causes and effects, and enhancing the precision of causal inferences. Furthermore, languages such as Python and R offer a wealth of packages, including Random Forest, XGBoost, and Super Learner implementations, that serve as flexible learners within causal inference workflows and facilitate intelligent inference of the intrinsic mechanisms governing scientific system elements in SoS research (Athey, Tibshirani, & Wager, 2019; Hill, Linero, & Murray, 2020). The application of machine learning-based causal inference is instrumental in unraveling complex relationships, providing deeper insights and more accurate predictions in SoS research.
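As a compact illustration of one of the causal designs listed above, the snippet below estimates a difference-in-differences effect with statsmodels on simulated data; the scenario (a hypothetical funding change), the variable names, and the true effect size are invented for demonstration.

```python
# Minimal difference-in-differences sketch: the effect of a hypothetical policy
# (e.g., a funding change) on output is the coefficient on the treated x post
# interaction. Data are simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "post": rng.integers(0, 2, n),
})
true_effect = 1.5
df["output"] = (
    2.0 + 0.5 * df["treated"] + 0.8 * df["post"]
    + true_effect * df["treated"] * df["post"]
    + rng.normal(0, 1, n)
)

model = smf.ols("output ~ treated * post", data=df).fit()
print(model.params["treated:post"])  # should be close to 1.5
```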
AI as arbiter in SoS
AI plays a significant role in the peer review process within SoS, offering the potential to enhance efficiency, reduce bias, and improve transparency. Generally speaking, the role of AI as an arbiter in peer review can be categorized into four types.
Automation of routine tasks: AI can automate many routine aspects of peer review, such as format checks, plagiarism detection, and manuscript matching. For example, tools like iThenticate help in detecting plagiarism, and Penelope.ai checks manuscript formatting (Kankanhalli, 2024). This helps to speed up the process and reduces the workload of human reviewers, allowing them to focus on more complex tasks like evaluating scientific rigor and originality. Similarly, StatReviewer and Statcheck assist with checking the statistical methods reported in papers (Nuijten & Polanin, 2020; Shanahan, 2016).
Reviewer-manuscript matching: AI tools, such as the Toronto Paper Matching System (TPMS), use machine learning algorithms to match reviewers with appropriate expertise to manuscripts (Charlin & Zemel, 2013). This process is more efficient than traditional keyword-based matching and ensures that the most suitable reviewers are selected (Kalmukov, 2020). Similarly, machine learning models have been employed by large conferences like NeurIPS to improve the accuracy of reviewer-paper matching (Kankanhalli, 2024; S. Price & Flach, 2017). By improving the quality of reviewer selection, AI helps to increase the quality and relevance of peer review feedback.
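As a deliberately simplified stand-in for expertise-based matching systems such as TPMS, the sketch below ranks hypothetical reviewers for a submission by TF-IDF cosine similarity between the submission text and reviewer publication profiles; real systems use much richer models and data.

```python
# A minimal stand-in for expertise-based reviewer-manuscript matching:
# TF-IDF cosine similarity between a submission abstract and reviewers'
# prior-work profiles. Texts are invented; production systems are richer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reviewer_profiles = {
    "reviewer_1": "citation analysis bibliometrics research evaluation",
    "reviewer_2": "graph neural networks knowledge graph embeddings",
    "reviewer_3": "clinical trials biostatistics epidemiology",
}
submission = "a knowledge graph approach to mapping citation networks"

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(reviewer_profiles.values()) + [submission])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

for (name, _), score in sorted(zip(reviewer_profiles.items(), scores),
                               key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```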
Bias mitigation: AI has the potential to reduce biases that are inherent in human peer review. Human reviewers may unintentionally introduce biases based on the author’s institution, country, or research field (Thelwall & Kousha, 2023). AI systems trained on diverse datasets can mitigate some of these biases by evaluating manuscripts based on objective criteria like novelty or methodological soundness. Recent studies have shown that AI-based tools can help assess articles more objectively, although there remain challenges in fully eliminating biases from AI algorithms (Checco, Bracciale, Loreti, Pinfield, & Bianchi, 2021).
Challenges and ethical concerns: Despite its benefits, AI in peer review also introduces concerns, particularly related to transparency and algorithmic bias. AI algorithms, especially those based on deep learning, are often “black boxes,” making it difficult for users to understand how decisions are made (Thelwall & Kousha, 2023). This raises concerns about fairness and accountability in the peer review process. Recent work has called for the development of more explainable AI models to enhance the transparency of decisions made by these systems (A & R, 2023; Vilone & Longo, 2020).
Conclusion and future work
Since the beginning of the 20th century, when the term “science of science” was introduced in Poland, SoS has witnessed over 100 years of development. From qualitative analyses of science’s social functions to large-scale quantitative approaches, such as scientometrics, the field has matured through the use of statistical methods, citation analysis, and big data. AI’s role in SoS, acting as a surrogate for data processing, a quant for data analysis, and an arbiter for peer review, marks a transformative moment in the discipline. By integrating AI into SoS, researchers can analyze vast datasets with greater precision, predict project performance, and derive actionable insights into scientific productivity and collaboration. Moving forward, AI’s capabilities will continue to reshape SoS, offering new opportunities to understand the dynamics of science and address societal challenges.
Looking to the future, after several research paradigm shifts, SoS will open a new era of AI for SoS, addressing the strategic needs and practical problems of scientific development and attending to the political, economic, cultural, societal, and other dimensions of science. First, it is necessary to strengthen the sense of disciplinary community in SoS, build disciplinary infrastructure, and consolidate the discipline’s core strengths. Second, SoS research should be problem-oriented, focusing especially on issues related to sustainable development and scientific governance. Lastly, researchers should step out of the “ivory tower” to convey valuable knowledge to the public and expand the general public’s understanding of SoS.