Research on collaborative scheduling strategies of multi-agent agricultural machinery groups

Introduction

With the continuous improvement of China’s agricultural modernization, the traditional decentralized scheduling mode for agricultural machinery has become inadequate for current agricultural production needs, as it lacks both a global perspective and effective control of scheduling costs1. The agricultural machinery industry increasingly exhibits a pattern of resource mismatch, characterized by “machine-owning households having no work to do, while households without machines have work but no machines available”2. Currently, inter-area machine harvesting has emerged as a relatively mature scheduling model, with harvesting operations expanding from wheat to corn and rice and demonstrating significant vitality. In January 2022, the Ministry of Agriculture and Rural Affairs issued the “14th Five-Year Plan” National Agricultural Mechanization Development Plan, which encourages the intelligent upgrading and transformation of agricultural machinery and accelerates the popularization and application of intelligent agricultural machinery3. As the number of agricultural machines increases and intelligentization accelerates, a key issue is how to scientifically plan and schedule the operational paths of harvesters so as to improve agricultural production efficiency, reduce costs, and maximize the utilization of harvester resources.

Currently, traditional heuristic algorithms such as the Genetic Algorithm (GA)4, Simulated Annealing (SA)5, and Ant Colony Optimization (ACO)6 have been widely employed to address the path planning challenges associated with agricultural machinery operations. Haixiang Guo et al.7 studied a single-dispatch-center path problem and used hybrid heuristic algorithms, specifically a hybrid genetic algorithm and a hybrid ant colony algorithm, to develop a reasonable and effective path planning scheme. MATEI et al.8 extended the Vehicle Routing Problem (VRP) by designing and implementing a hybrid heuristic model that integrates a genetic algorithm, a local search mechanism, and a migration policy to solve the single-dispatch-center path planning problem. Zhou Qian et al.9 introduced a genetic tabu search algorithm that combines tabu search with genetic operators to solve the regional scheduling of vehicles from a single dispatch center. Zhou Xiancheng et al.10 proposed an improved ant colony algorithm for the green vehicle routing problem across multiple dispatch centers, with experimental results indicating that the algorithm can effectively plan vehicle distribution paths while significantly reducing total logistics and distribution costs. Wei Ming et al.11 developed a model for the collaborative planning of UAV transport routes and trajectories in multi-order dispatch-center logistics and distribution, optimizing the UAV transport routes, operational times, and trajectory scheme. Chen Yanyi et al.12 proposed an improved ant colony-genetic algorithm for the path planning problem of fresh food distribution across multiple dispatch centers; their experiments demonstrate that the algorithm is more efficient than traditional algorithms, effectively alleviating the conflict between the timeliness of distribution and the uncertainty of distribution time. MEH et al.13 introduced a variable tabu neighborhood search algorithm that incorporates tabu moves during the shaking phase of variable neighborhood search to tackle the Multi-Depot Vehicle Routing Problem (MDVRP). Gu et al.14 proposed an artificial bee colony-based method for this problem: the approach reduces the MDVRP to single-dispatch-center subproblems by clustering around the dispatch centers, modifies the artificial bee colony algorithm to generate a solution for each dispatch center, and introduces a co-evolutionary strategy for assembling a comprehensive MDVRP solution. Qiu Hao et al.15 proposed a three-stage optimization algorithm based on an ant colony algorithm for a multi-dispatch-center vehicle routing problem with three-dimensional loading constraints, aiming to minimize the distance traveled by the vehicles.

With the rapid development of machine learning, Deep Reinforcement Learning (DRL)16 has been extensively applied to path planning, offering a novel perspective on the operational path planning challenges faced by multiple harvesters and dispatch centers in complex environments. DRL integrates the robust perceptual capabilities of deep learning with the efficient decision-making mechanisms of reinforcement learning, demonstrating exceptional performance on complex tasks and environments. Ge Bin et al.17 proposed an end-to-end deep reinforcement learning architecture for the single-dispatch-center vehicle routing problem, designing an edge-aggregation graph attention network encoder and a multi-head attention decoder; their experiments demonstrated the effectiveness and superiority of the framework on this class of problems. Similarly, Jiang Ming et al.18 introduced an end-to-end deep reinforcement learning method based on a multi-pointer Transformer that enhances both the encoder and the decoder; compared with existing heuristic algorithms and other deep learning methods, their algorithm retains fast solving capability within a small optimality gap, effectively addressing the single-dispatch-center vehicle routing problem. Huang Yan et al.19 applied deep reinforcement learning to vehicle path planning at a single dispatch center, identifying the shortest path through the rewards obtained from the agent’s interaction with the environment and thereby solving the vehicle routing problem in an end-to-end manner. Chen et al.20 proposed a deep reinforcement learning-based encoder-decoder framework for the hybrid delivery and pickup vehicle routing problem within a single dispatch center; the framework employs a Graph Neural Network (GNN) as the encoder to extract instance features, while a decoder with an attention mechanism translates the routes into sequences, ultimately yielding high-quality solutions. Arishi et al.21 introduced a multi-agent deep reinforcement learning model to solve the multi-dispatch-center problem and evaluated it through extensive experiments, demonstrating that the model can generate high-quality solutions in real-world environments. Lei Kun et al.22 presented an end-to-end deep reinforcement learning framework designed to improve the efficiency of solving the vehicle routing problem with multiple dispatch centers; the framework uses a Transformer-based decoder with an attention mechanism, and its feasibility and effectiveness were verified on both randomly generated instances and publicly available benchmark instances. Li et al.23 employed a multi-head attention (MHA)24 mechanism in a multi-dispatch-center scenario to integrate various embeddings of the network and the environmental state at each step, enabling accurate selection of the next node to visit and construction of the path; in addition, they introduced a dispatch-center rotation augmentation method to optimize the decoding process.
Wang Wanliang et al.25 developed a multi-agent policy network that uses both single-head attention (SHA) and multi-head attention mechanisms, trained it with a policy gradient algorithm to derive efficient solutions, and incorporated the 2-opt26 local search technique together with a sampling search technique to optimize the multi-dispatch-center vehicle routing problem. Yu et al.27 improved the pointer network by simplifying its recurrent neural network-based encoder, thereby offering a more efficient solution to vehicle path planning with multiple dispatch centers.

In summary, while deep reinforcement learning algorithms perform strongly on path planning problems, there is a notable gap in research on the multi-harvester path planning problem involving multiple scheduling centers in the agricultural sector. Therefore, this paper investigates a collaborative job scheduling scheme for multiple harvesters across multiple scheduling centers using a deep reinforcement learning approach. This is achieved by developing an encoder-decoder model integrated with a multi-head attention mechanism, analyzing various types of farmland data, and designing an enhanced REINFORCE algorithm25 for model training.

Problem and model

Problem description

This section describes the multi-harvester path planning problem with multiple dispatch centers. Specifically, there are n harvester dispatch centers located within a defined region, each of which has k harvesters to provide operation services for m pieces of farmland distributed at different locations. The dispatch centers must take into account several factors, including harvester performance, the geographic locations of the farmland, the size of the farmland plots, and other relevant considerations, in order to plan the harvesters’ operation paths effectively. The objective is to ensure that all farmland is harvested while minimizing scheduling costs. The following assumptions are made to facilitate the study of multi-dispatch-center job path planning.

  1. Each harvester has uniform performance and travels at a constant speed.

  2. The number of harvesters assigned to each dispatch center does not exceed its capacity limit.

  3. Each harvester departs from its dispatch center and returns to it upon completing its operation.

  4. The effective operating hours of each harvester must not exceed the maximum allowable effective operating hours.

A schematic diagram illustrating the multi-harvester path planning problem with multiple dispatch centers is presented in Fig. 1.

Fig. 1 Schematic diagram of the multi-harvester path planning problem with multiple dispatch centers.

Mathematical model for the multi-dispatch-center multi-harvester path planning problem

To formulate the multi-harvester path planning problem with multiple dispatch centers, the relevant variables are defined in this section as follows.

The set of dispatch centers is \(Ac=\{h_1,h_2,\ldots,h_i\}, i\in[1,n]\); the set of farm operation points is \(X=\{x_{n+1},x_{n+2},\ldots,x_j\}, j\in[n+1,n+m]\). The harvester dispatch centers and farm operation points are merged into a set of scheduling nodes denoted as \(Y=\{s_e,d_e\}\), where \(s_e=\{u_e,v_e\}, e\in[1,n+m]\); \(s_e\) denotes the coordinates of scheduling node e, and \(d_e\) denotes the operation time required for a harvester to complete the job task at scheduling node e. The total harvester fleet set is \(COHC=\{hc_{11},hc_{12},\ldots,hc_{ir}\}, i\in[1,n], r\in[1,k]\), where \(hc_{ir}\) denotes the r-th harvester at the i-th dispatch center. The set of effective operating hours of the harvester fleet is \(U=\{u_{11},u_{12},\ldots,u_{ir}\}, i\in[1,n], r\in[1,k]\), where \(u_{ir}\) denotes the effective operating time required for the r-th harvester at the i-th dispatch center to complete its tasks. The main parameters are defined in Table 1.

Table 1 Parameter definition table.

The mathematical model for the multi-harvester path planning problem involving multiple dispatch centers is developed with the objective of minimizing scheduling costs. The objective function is defined as follows.

$$\min Dcost = \sum_{i=1}^{n}\sum_{r=1}^{k}\sum_{o=1}^{n+m}\sum_{p=1}^{n+m} dis(o,p)\times D_{opr}\times T_{c'}$$
(1)

The constraints are as follows.

$$\sum_{o=1}^{n+m} D_{opr} = \sum_{p=1}^{n+m} D_{por},\quad \forall p\in (Ac\cup X),\ r\in[1,k]$$
(2)
$$\sum_{p=1}^{n+m}\sum_{r=1}^{k} D_{opr} = 1,\quad \forall o\in (Ac\cup X)$$
(3)
$$\sum_{i=1}^{n}\sum_{r=1}^{k} t_{ir} \le t_{max}$$
(4)

Equation (1) is the objective function of minimizing dispatch cost, which depends solely on the transfer distance between plots; \(D_{opr}\) takes the value 1 when the task at scheduling node p is completed (i.e., harvester r travels from node o to node p), and 0 otherwise. Equation (2) indicates that each harvester commences its operation from the dispatch center and returns to the dispatch center upon task completion. Equation (3) indicates that only one harvester is permitted to operate on each farmland plot. Equation (4) indicates that the actual operating hours of each harvester must not exceed the maximum effective operating hours.
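To make the objective concrete, the following minimal Python sketch evaluates the scheduling cost of Eq. (1) for a candidate set of routes; the toy coordinates, the Euclidean distance function, and the route representation are illustrative assumptions rather than part of the model above, while the cost rate of 14 Yuan/km is taken from the later experimental setup.

```python
import math

# Hypothetical instance: node coordinates (dispatch centers first, then farmland points)
coords = {0: (0.0, 0.0), 1: (5.0, 5.0),                 # two dispatch centers
          2: (1.0, 2.0), 3: (4.0, 1.0), 4: (6.0, 4.0)}  # three farmland points

T_c = 14.0  # transfer cost per kilometre (Yuan/km), taken from the experimental setup


def dis(o: int, p: int) -> float:
    """Euclidean distance between scheduling nodes o and p (assumed metric)."""
    (xo, yo), (xp, yp) = coords[o], coords[p]
    return math.hypot(xo - xp, yo - yp)


def scheduling_cost(routes: list) -> float:
    """Eq. (1): total transfer distance of all harvester routes times the cost rate.

    Each route starts and ends at its dispatch center (assumption 3 / constraint (2));
    D_opr equals 1 exactly for the consecutive node pairs of a route.
    """
    total = 0.0
    for route in routes:
        for o, p in zip(route, route[1:]):
            total += dis(o, p) * T_c
    return total


# One harvester per dispatch center in this toy example
routes = [[0, 2, 3, 0], [1, 4, 1]]
print(f"Dcost = {scheduling_cost(routes):.2f} Yuan")
```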

Algorithm description

This section addresses the multi-harvester path planning problem with multiple dispatch centers through the following steps: (1) modeling the problem as a Markov Decision Process (MDP)19 and constructing a deep reinforcement learning environment; (2) developing a decision network based on the self-attention mechanism, which uses the current state of the harvester to select the next farmland to operate on through a hybrid action selection strategy; (3) employing a reinforcement learning policy gradient algorithm to train the network, thereby fully capturing the scheduling process and deriving the optimal scheduling strategy for multiple scheduling centers; (4) applying a 2-opt local search strategy to refine the scheduling scheme output by the model, enhancing the overall path quality. The solution flowchart for the multi-scheduling-center multi-harvester path planning problem is illustrated in Fig. 2.

Fig. 2 Flowchart for solving the multi-harvester path planning problem with multiple dispatch centers.

Building a reinforcement learning environment with multiple dispatch centers and multiple harvesters

The multi-harvester path planning problem involving multiple dispatch centers is framed as a sequential decision-making problem. In this process, each harvester departs from its respective dispatch center, executes the designated operational task, and records the distance between the departure point and the current farmland location. Upon completion of the current farmland task, the scheduling center dynamically adjusts its strategy based on the harvester’s current state and intelligently determines the next farmland location to visit. This iterative process continues until all farmland operations are complete, with the objective of minimizing overall operational costs and reducing unnecessary travel distances and resource consumption.

The multi-harvester path planning problem with multiple dispatch centers can be framed as a Markov Decision Process (MDP). Addressing it requires the construction of a reinforcement learning environment grounded in the MDP framework. The environment comprises a state space S, an action space A, and a reward function R.

State space: the state is defined as \(S=\{ob_{ire}^{t}, u_{ir}^{t}\}, e\in[1,n+m]\), where \(ob_{ire}^{t}\) represents the position of the r-th harvester from dispatch center i at farm operation point e at time t, together with the duration required to complete the current farm operation task, and \(u_{ir}^{t}\) represents the remaining effective operating time of the r-th harvester of dispatch center i after time t.

The action space is defined as the set of dispatch nodes at which a harvester can choose to perform a job task at time t, \(A^{t}=\{A_{ire}^{t}\}, e\in[1,n+m]\). Specifically, \(A_{ire}^{t}\) represents the action of the r-th harvester, originating from the i-th dispatch center, moving to the e-th dispatch node to perform a task at time t.

In reinforcement learning, the reward function serves as an evaluation criterion for the agent’s behavior. In this paper, the reward function is defined as the negative of the total transfer distance of all harvesters, i.e.

$$R = -\sum_{i=1}^{n}\sum_{r=1}^{k}\sum_{e=1}^{n+m} dis(A_{ire}^{t},A_{ire}^{t+1})\times T_{c'}$$
(5)
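To illustrate how this MDP could be encoded as a reinforcement learning environment, a minimal single-harvester sketch is given below; the class name, method signatures, and the simplification to one harvester are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


class MultiDepotHarvesterEnv:
    """Toy MDP for multi-dispatch-center harvester scheduling.

    State: per-node visit flags plus the harvester's position and remaining
    effective operating time (cf. S = {ob, u}).  Action: index of the next
    scheduling node.  Reward: negative transfer distance (cf. Eq. (5)).
    """

    def __init__(self, coords, service_time, depots, max_hours=24.0, speed=1.0):
        self.coords = np.asarray(coords, dtype=float)   # (n+m, 2) node coordinates s_e
        self.service_time = np.asarray(service_time)    # d_e per node, 0 for depots
        self.depots = list(depots)                      # indices of dispatch centers
        self.max_hours = max_hours                      # effective operating limit
        self.speed = speed                              # constant speed (assumption 1)
        self.reset()

    def reset(self):
        self.visited = np.zeros(len(self.coords), dtype=bool)
        self.visited[self.depots] = True
        self.position = self.depots[0]                  # single harvester for brevity
        self.remaining = self.max_hours
        return self._state()

    def _state(self):
        return {"visited": self.visited.copy(),
                "position": self.position,
                "remaining": self.remaining}

    def step(self, action):
        dist = np.linalg.norm(self.coords[action] - self.coords[self.position])
        reward = -dist                                  # negative transfer distance
        self.remaining -= dist / self.speed + self.service_time[action]
        self.visited[action] = True
        self.position = action
        done = self.visited.all() or self.remaining <= 0
        return self._state(), reward, done
```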

Constructing deep learning neural network models

In this paper, an encoder-decoder architecture is employed to address the multi-harvester path planning problem involving multiple scheduling centers. The encoder embeds the set of scheduling nodes, mapping them into feature encodings that capture the relationships among scheduling centers and farm operation points. The decoder, in turn, selects the optimal farmland to operate on for each harvester in the current state, using the feature encodings generated by the encoder alongside operational information from the scheduling process, including data from other scheduling centers, the status of the farm operation points, and the remaining effective operating hours of the harvester. The encoder-decoder model is illustrated in Fig. 3.

Fig. 3 Encoder-decoder model. Note: Add&BN denotes residual connections and batch normalization; FF denotes the feed-forward layer.

Designing the encoder

The encoder converts the set of scheduling nodes into feature vectors. In this section, the encoder is composed of an embedding layer and three independent self-attention components sharing the same structure. Each self-attention component comprises a multi-head attention (MHA) layer and a feed-forward (FF) layer24. To enhance network convergence, stability, and generalization, residual connections and batch normalization are incorporated into both the MHA and FF layers.

The encoder processes the set of scheduling nodes through the embedding layer and the self-attention components to produce high-dimensional feature representations that incorporate contextual information from the graph node features. Initially, the embedding layer applies a linear transformation to the input scheduling node set \(Y_e\), generating an initial feature vector \(h_e^{(0)}\) for each scheduling node, \(e\in[1,n+m]\), i.e.

$$h_e^{(0)} = W\times Y_e + b$$
(6)

where \(W\) and \(b\) denote the network parameters of the encoder embedding layer, and the initial feature vector dimension is \(d_h=128\).

The feature vector \(h_e^{(0)}\) output by the embedding layer serves as the initial input to the self-attention component, where the multi-head attention mechanism extracts feature information from the initial feature vector across various dimensions. \(h_e^{(0)}\) is first mapped through linear projections to the query vector \(Q_z\), key vector \(K_z\), and value vector \(V_z\) used by the encoder, \(z\in[1,H]\), where z indexes the attention heads and H denotes the total number of heads (H = 8), i.e.

$$Q_z = W^{x}\times h_e^{(0)}$$
(7)
$$K_z = W^{x}\times h_e^{(0)}$$
(8)
$$V_z = W^{x}\times h_e^{(0)}$$
(9)

where \(W^{x}\) denotes the network parameters. In each attention head, the dot product of \(Q_z\) and \(K_z\) is computed and scaled by \(\sqrt{d_k}\), where \(d_k\) is the dimension of the key vector. The result is normalized with the softmax function and multiplied with \(V_z\) to derive the attention vector \(L_z\) of each attention head.

$$L_z = softmax\left(\frac{Q_z^{T}\times K_z}{\sqrt{d_{k_z}}}\right)V_z$$
(10)

The attention vectors of the various attention heads are merged to obtain the final attention matrix L.

$$L = concat(L_1, L_2, \ldots, L_z)$$
(11)

The attention matrix L is passed through the feed-forward layer and updated into a feature vector \(h_e^{(1)}\) that further represents the relationships among the scheduling nodes. These feature vectors are successively updated into new feature vectors \(h_e^{(l)}\) by the subsequent encoder components, \(l\in[1,N]\), and the average of \(h_e^{(l)}\) is taken as the graph node feature \(h_{graph}\) representing the contextual information.
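As a sketch of the encoder just described (linear embedding per Eq. (6), stacked multi-head attention components with Add&BN and feed-forward layers, and a mean-pooled graph feature), the following PyTorch code is illustrative only; the three-dimensional node input (coordinates plus operation time) follows the node definition above, d_h = 128 and H = 8 follow the text, and the feed-forward width is an assumption.

```python
import torch
import torch.nn as nn


class EncoderLayer(nn.Module):
    """One self-attention component: MHA + FF, each with Add&BN (Fig. 3)."""

    def __init__(self, d_h: int = 128, n_heads: int = 8, d_ff: int = 512):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_h, n_heads, batch_first=True)
        self.bn1 = nn.BatchNorm1d(d_h)
        self.ff = nn.Sequential(nn.Linear(d_h, d_ff), nn.ReLU(), nn.Linear(d_ff, d_h))
        self.bn2 = nn.BatchNorm1d(d_h)

    def forward(self, h):                     # h: (batch, n+m, d_h)
        a, _ = self.mha(h, h, h)              # Eqs. (7)-(11)
        h = self.bn1((h + a).transpose(1, 2)).transpose(1, 2)   # Add & BN
        f = self.ff(h)
        return self.bn2((h + f).transpose(1, 2)).transpose(1, 2)


class Encoder(nn.Module):
    """Embedding layer (Eq. (6)) followed by three stacked self-attention components."""

    def __init__(self, in_dim: int = 3, d_h: int = 128, n_layers: int = 3):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_h)   # node input: coordinates + operation time
        self.layers = nn.ModuleList(EncoderLayer(d_h) for _ in range(n_layers))

    def forward(self, nodes):                 # nodes: (batch, n+m, in_dim)
        h = self.embed(nodes)
        for layer in self.layers:
            h = layer(h)
        h_graph = h.mean(dim=1)               # graph feature: mean of node embeddings
        return h, h_graph
```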

Designing the decoder

The decoder calculates the selection probabilities over the set of scheduling nodes based on the output of the encoder. In this section, the decoder comprises an embedding layer, a multi-head attention layer, and a single-head attention layer. The input to the decoder consists of two components: the output of the encoder and the set of scheduling nodes. The output of the encoder, in turn, includes the node embeddings \(h_e^{(l)}\) and the graph node feature \(h_{graph}\).

The embedding layer of the decoder integrates the feature vector \(h_e\) of the scheduling nodes that have completed their jobs with the remaining effective job duration \(u_t\) of the harvester to form the contextual information \(s_{cnt}(t)\) of the current state.

$$s_{cnt}(t) = W^{c_1}\times concat(h_e, u_t) + b^{c_1}$$
(12)

where \(W^{c_1}\) and \(b^{c_1}\) are the network parameters of the decoder embedding layer. The feature vector \(h_e\) is linearly mapped to the key vector \(K_{fixed}\) and the value vector \(V_{fixed}\) of the decoder, while the query vector \(Q_{fixed}\) of the decoder is formed by concatenating the graph node feature \(h_{graph}\) with the current context information \(s_{cnt}(t)\) followed by a linear mapping. The attention score \(G_z\) of each attention head is calculated from the dot product of the query vector and the key vectors. The calculation masks scheduling nodes on which the harvester cannot operate: (1) farmland points that have already been processed by a harvester, and (2) farmland points where the time required to complete the operation task exceeds the harvester’s remaining effective operation time.

$$G_{z} = \begin{cases} \dfrac{(Q_{fixed}^{z})^{T}\times K_{fixed}^{z}}{\sqrt{d_{k_{z}}}}, & \text{non-masked scheduling nodes} \\ -\infty, & \text{masked scheduling nodes} \end{cases}$$
(13)

where \(G_z\) denotes the attention scores of the different attention heads.

The attention vectors for different attention heads are denoted as.

$$O_z = softmax(G_z)$$
(14)

The final attention matrix M is denoted as.

$$M = O_1 + O_2 + \cdots + O_z$$
(15)

The attention matrix M is mapped through a linear layer to the query vector \(Q_{sha}\) of the single-head attention component; the final attention score is then computed by the single-head attention mechanism, and the result is scaled to \([-C,C]\) using the tanh function.

$$a_{z} = \begin{cases} \tanh\left(\dfrac{(Q_{sha})^{T}\times K_{fixed}}{\sqrt{d_{k_{fixed}}}}\right)\times C, & \text{non-masked scheduling nodes} \\ -\infty, & \text{masked scheduling nodes} \end{cases}$$
(16)

Next, the probability distribution \(p_{i,e}\) over the scheduling nodes is obtained by normalizing with the softmax function, i.e.

$$p_{i,e} = softmax(a_z)$$
(17)

Based on the probability distribution \(p_{i,e}\), the scheduling node at which the harvester operates next is selected in accordance with the action selection policy.
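The decoding step can be sketched as follows; the code mirrors Eqs. (12)-(17) (context embedding, masked multi-head attention, and a tanh-clipped single-head scoring layer), but the concrete class, its tensor shapes, and the clipping constant C = 10 are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn


class Decoder(nn.Module):
    """Selects the next scheduling node from the encoder output (Eqs. (12)-(17))."""

    def __init__(self, d_h: int = 128, n_heads: int = 8, clip_c: float = 10.0):
        super().__init__()
        self.context = nn.Linear(d_h + 1, d_h)            # Eq. (12): node feature + u_t
        self.q_fixed = nn.Linear(2 * d_h, d_h, bias=False) # Q_fixed from [h_graph ; s_cnt]
        self.mha = nn.MultiheadAttention(d_h, n_heads, batch_first=True)
        self.q_sha = nn.Linear(d_h, d_h, bias=False)       # maps M to Q_sha
        self.clip_c = clip_c                               # tanh clipping range [-C, C]

    def forward(self, h_nodes, h_graph, h_last, u_t, mask):
        # h_nodes: (B, n+m, d_h), h_graph: (B, d_h), h_last: (B, d_h)
        # u_t: (B, 1) remaining effective hours, mask: (B, n+m) True = infeasible node
        s_cnt = self.context(torch.cat([h_last, u_t], dim=-1))          # Eq. (12)
        q = self.q_fixed(torch.cat([h_graph, s_cnt], dim=-1)).unsqueeze(1)
        m, _ = self.mha(q, h_nodes, h_nodes,
                        key_padding_mask=mask)                          # Eqs. (13)-(15)
        q_sha = self.q_sha(m)                                           # (B, 1, d_h)
        scores = (q_sha @ h_nodes.transpose(1, 2)).squeeze(1)           # single-head scores
        scores = self.clip_c * torch.tanh(scores / h_nodes.size(-1) ** 0.5)  # Eq. (16)
        scores = scores.masked_fill(mask, float("-inf"))                # mask infeasible nodes
        return torch.softmax(scores, dim=-1)                            # Eq. (17): p_{i,e}
```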

Optimizing strategy networks for training

This section employs the REINFORCE algorithm with a rollback benchmark, an enhanced policy gradient algorithm for reinforcement learning that reduces the variance of the policy gradient estimate and enhances training stability. First, the cumulative returns from the strategies executed by the individual harvesters are computed; these returns are then used to calculate the policy gradient, which updates the parameters \(\theta\) of the harvesters’ policy network.

For a given farmland state s, the policy network \(\pi_{\theta}\) outputs a vector of action probabilities \(p_{\theta}(\pi_t|s)\) for each harvester, from which actions are sampled to form the joint policy \(\pi_t = sample[p_{\theta}(\pi|s)]\); the benchmark network \(\pi_{bl}\) outputs the joint policy \(\pi_{t,bl} = greedy[p_{\theta^{bl}}(\pi|s)]\) by greedy selection from its action probability vector \(p_{\theta^{bl}}(\pi_t|s)\). The Monte Carlo method is used to estimate the expected cumulative return of the strategy, \(J(\theta|s) = E_{p_{\theta}(\pi|s)}[G(\pi)]\), where \(G(\pi)\) is the cumulative return of strategy \(\pi\). The REINFORCE algorithm with a benchmark calculates the policy gradient with the following formula:

$$\nabla_{\theta}J(\theta|s) = -E_{p_{\theta}(\pi|s)}\left[\left(G(\pi)-G(\pi_{bl})\right)\nabla_{\theta}\log p_{\theta}(\pi|s)\right]$$
(18)

Subsequently, the parameters of the policy network are updated utilizing the Adam optimization algorithm, which is a gradient descent-based method.

$$\theta = Adam(\theta, \nabla_{\theta}J(\theta|s))$$
(19)

The benchmark network is used to evaluate the current farm state s and to gauge the difficulty of the task, allowing the policy network to adjust and optimize its behavior more effectively. During training, \(\pi_{bl}\) is periodically compared with \(\pi_{\theta}\): if the output of the policy network significantly surpasses that of the benchmark network and passes a t-test at significance level \(\alpha'\) (set to 0.05), a rollback update of the benchmark network, i.e., \(\theta^{bl}\to\theta\), is executed. This process facilitates the learning and improvement of the overall policy network. The specific training process is illustrated in Fig. 4.

Fig. 4 Training flow chart.
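The training loop in Fig. 4 can be condensed into the following sketch of a single REINFORCE-with-benchmark update; the helper `rollout`, the paired t-test details, and the batch handling are illustrative assumptions based on the description above, not the authors' code.

```python
import copy
import torch
from scipy import stats


def train_step(policy, baseline, env_batch, optimizer, alpha=0.05):
    """One REINFORCE-with-benchmark update (Eqs. (18)-(19)) plus the rollback check.

    `rollout(net, envs, greedy)` is assumed to run the network over a batch of
    instances and return (total cost G, sum of log-probabilities) per instance.
    """
    cost, log_prob = rollout(policy, env_batch, greedy=False)      # sampled policy pi_t
    with torch.no_grad():
        cost_bl, _ = rollout(baseline, env_batch, greedy=True)     # greedy benchmark pi_bl

    advantage = cost - cost_bl                                     # G(pi) - G(pi_bl)
    loss = (advantage.detach() * log_prob).mean()                  # policy-gradient loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                               # Adam update, Eq. (19)

    # Rollback update of the benchmark: replace its parameters when the policy
    # is significantly better according to a one-sided t-test at level alpha.
    _, p_value = stats.ttest_rel(cost.cpu().numpy(), cost_bl.cpu().numpy())
    if cost.mean() < cost_bl.mean() and p_value / 2 < alpha:
        baseline.load_state_dict(copy.deepcopy(policy.state_dict()))
    return cost.mean().item()
```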

Action selection strategy

During training for multi-harvester path planning across multiple dispatch centers, the benchmark network \(\pi_{bl}\) and the policy network \(\pi_{\theta}\) employ distinct action selection strategies: the former uses a greedy action selection strategy, while the latter adopts a random sampling action selection strategy. The greedy strategy of the benchmark network always selects the action with the highest expected return at each decision step, ensuring the stability and efficiency of path planning. Conversely, the policy network randomly samples actions from the action probability distribution rather than being confined to the highest-probability action; this enhances decision diversity and enables the policy network to explore a broader range of potential paths, ultimately facilitating path optimization.

In this problem, the benchmark network and the policy network serve distinct roles. The benchmark network evaluates the complexity of the problem, employing a greedy action selection strategy to quickly obtain effective evaluation metrics: it always selects the action with the highest expected return in each state, providing a reliable criterion for path planning. Conversely, the policy network adopts a randomized action selection strategy to strengthen its decision-making in path planning: by randomly sampling from the action probability distribution, it can explore a broader range of potential paths, avoid local optima, continuously refine the path selection strategy, and ultimately enhance the overall planning effectiveness.
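The two selection rules can be written compactly; this snippet assumes the probability distribution p_{i,e} output by the decoder (Eq. (17)) and is purely illustrative.

```python
import torch


def select_action(probs: torch.Tensor, greedy: bool) -> torch.Tensor:
    """Pick the next scheduling node from the probability distribution p_{i,e}.

    greedy=True  -> benchmark network: always take the highest-probability node.
    greedy=False -> policy network: sample, preserving exploration diversity.
    """
    if greedy:
        return probs.argmax(dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```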

Local search strategy

To address the issue of increased path length due to crossings during harvester operation, a 2-opt local search strategy is introduced in this section.

This strategy minimizes path crossings by exchanging two edges within the path, thereby effectively reducing the distance traveled by the harvester. The path is refined iteratively until a locally optimal solution is reached or a predefined stopping condition is met. The specific operational steps are as follows (a minimal code sketch is given after the list).

  1. Iterate over each path operated by a harvester.

  2. Randomly select two edges of the current job path and swap them: the two edges are removed and the remaining ends are reconnected to form a new job path.

  3. If the length of the new path is shorter than that of the current path, update the current path, reset the iteration counter to 0, and return to step (2); otherwise, increase the iteration counter and return to step (2).

  4. If the maximum number of iterations is reached without improvement, the 2-opt local search terminates and the current path is returned as the optimal path.
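A minimal sketch of the 2-opt procedure in steps (1)-(4) above is given below; the random edge choice, the staleness counter, and the helper names are illustrative assumptions. Applying `two_opt` to each harvester route output by the model corresponds to step (1).

```python
import random


def route_length(route, dis):
    """Total transfer distance of one harvester route (a list of node indices)."""
    return sum(dis(route[i], route[i + 1]) for i in range(len(route) - 1))


def two_opt(route, dis, max_iters=100):
    """2-opt local search: repeatedly remove two edges, reverse the segment
    between them, and keep the new route whenever it is shorter (steps (2)-(4))."""
    best = list(route)
    if len(best) < 4:                      # nothing to swap on very short routes
        return best
    stale = 0
    while stale < max_iters:
        # pick two cut points, keeping the dispatch center fixed at both ends
        i, j = sorted(random.sample(range(1, len(best) - 1), 2))
        candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
        if route_length(candidate, dis) < route_length(best, dis):
            best, stale = candidate, 0     # improvement: accept and reset the counter
        else:
            stale += 1                     # no improvement: count towards the stop rule
    return best
```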

Test results and analysis

Training data

According to the formulation of Eq. (5), the reward value is defined as the negative total transfer distance of all harvesters. This value reflects the quality of the harvester’s action strategy. Consequently, as the number of training steps increases, the reward value progressively approaches an optimal level and settles into a smooth equilibrium.

In the initial stage of model training, the model’s understanding of environmental changes is insufficient, resulting in a low initial reward value that gradually increases. By the time the number of training steps reaches 150,000, the reward value begins to stabilize, indicating that the model is adapting to the environmental changes. At 200,000 training steps, the model can quickly adapt to these changes and effectively plan the optimal path with the lowest scheduling cost. The trend of the reward value is illustrated in Fig. 5.

Fig. 5 Schematic representation of the variation of reward values with the number of iterations.

Experimental data

In this section, three agricultural machinery dispatch centers, each serving 50 farmlands, were randomly selected for experiments based on the spatial distribution data (36°–41° N, 113°–117° E) of maize cultivation in Hebei Province obtained from remote sensing images. The harvester model is Ward’s semi-fed 4LB-150AA grain combine harvester, which has a fuel consumption rate of approximately 1.7 L/km. The average diesel price from April to July 2024 was around 8.2 Yuan/L, so the harvester’s transfer cost per kilometer is set at 14 Yuan. Due to the heavy workload and time constraints of the busy farming season, harvesters often operate continuously; considering this practical scenario, the effective operation length of a harvester is set at 24 h. To evaluate the advantages of the MCMPP-DRL algorithm proposed in this paper for the multi-dispatch-center harvester path planning problem, it is compared with the genetic algorithm, the simulated annealing algorithm, and the ant colony optimization algorithm. The information on the selected farmland operation points is presented in Table 2.
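The per-kilometer transfer cost follows directly from the stated fuel consumption and fuel price:

$$1.7\ \text{L/km} \times 8.2\ \text{Yuan/L} = 13.94\ \text{Yuan/km} \approx 14\ \text{Yuan/km}$$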

Table 2 Information on farm operation sites.

Farmland operation point serial numbers 0, 1, and 2 correspond to the three dispatch centers, so their operation area is 0 hectares; the remaining rows describe the farmland operation points.

Results

For simplicity, the scheduling cost results for the three scheduling centers with 20, 40, 50, 70, 100, and 120 pieces of farmland are abbreviated as \(MC_{3\text{-}20}\), \(MC_{3\text{-}40}\), \(MC_{3\text{-}50}\), \(MC_{3\text{-}70}\), \(MC_{3\text{-}100}\), and \(MC_{3\text{-}120}\). The scheduling cost results of the four algorithms are presented in Table 3.

Table 3 Performance comparison of different algorithms for \(MC_{3\text{-}50}\) cases.

As can be seen from Table 3, the harvester operating paths of the MCMPP-DRL algorithm proposed in this paper are better than the optimal paths generated using the GA, SA, and ACO algorithms, and therefore achieve the lowest scheduling cost.

Discussion

To exclude anomalous experimental data and further verify the applicability of the algorithm, farmlands of different sizes were selected for comparative analysis in the corn-growing area of Hebei Province. For the three dispatch centers, 20, 40, 50, 70, 100, and 120 farmlands were selected for the experiment, and five different groups of farm operation sites were selected for each of the six farmland sizes. Each algorithm was run 10 times, and the best result of each experiment was taken to compare harvester scheduling costs under different farmland sizes. The experimental results are presented in Table 4.

Table 4 Scheduling cost results for different algorithms with different number of farmlands.

As demonstrated in Table 4, the MCMPP-DRL algorithm outperforms the heuristic algorithms in solution quality for problems of sizes MC3–20, MC3–40, MC3–50, MC3–70, MC3–100, and MC3–120. Consequently, the MCMPP-DRL algorithm reduces harvester scheduling costs. Table 5 illustrates the improvement in harvester scheduling cost achieved by the MCMPP-DRL algorithm relative to the heuristic algorithms.

Table 5 Comparison of scheduling costs of different algorithms for different number of farmlands.

As illustrated in Table 5, the degree to which the MCMPP-DRL algorithm optimizes scheduling costs relative to the heuristic algorithms varies across problem sizes; nevertheless, the MCMPP-DRL algorithm consistently yields superior experimental results.

Table 6 Average scheduling cost results for different algorithms with different number of farmlands.

As illustrated in Table 6, the disparity between the average costs of each algorithm and the MCMPP-DRL algorithm becomes increasingly pronounced as the number of farmlands increases, while the number of scheduling centers remains constant. This observation further substantiates the effectiveness and rationality of the MCMPP-DRL algorithm. Additionally, Table 7 presents the performance improvement in the average scheduling cost of harvesters when calculated using the MCMPP-DRL algorithm in comparison to the heuristic algorithm.

Table 7 Comparison of average scheduling costs of different algorithms for different number of farmlands.

Tables 6 and 7 further analyze the performance optimization of the MCMPP-DRL algorithm in comparison to the heuristic algorithms as the number of farmlands varies with three scheduling centers. To more intuitively illustrate the advantages of the algorithm presented in this paper, the average scheduling cost of the MCMPP-DRL algorithm and of the three heuristic algorithms across different numbers of farmlands is compared and analyzed. The results of this comparison are depicted in Figs. 6 through 11.

Fig. 6 Comparison of average cost of different algorithms for 20 farmlands.

Fig. 7 Comparison of average cost of different algorithms for 40 farmlands.

Figure 6 shows that the scheduling cost of the MCMPP-DRL algorithm is significantly lower than that of the other three heuristic algorithms, demonstrating its advantage in scheduling small numbers of farmlands. Figure 7 shows that as the number of farmlands increases, the MCMPP-DRL algorithm still maintains a low scheduling cost while the costs of the other heuristics rise, reflecting the stability of the MCMPP-DRL algorithm. Figure 8 shows that the advantage of the MCMPP-DRL algorithm is more pronounced for a medium number of farmlands, with scheduling costs significantly lower than those of the other algorithms. Figure 9 shows that the MCMPP-DRL algorithm continues to exhibit lower costs, while the heuristic algorithms show a more pronounced upward cost trend. Figure 10 demonstrates that the MCMPP-DRL algorithm continues to excel at cost control in large-scale farm scheduling problems, remaining well below the other algorithms. Figure 11 shows that the MCMPP-DRL algorithm maintains the lowest scheduling cost even with a larger number of farmlands, demonstrating its superiority and applicability in complex scenarios.

Fig. 8 Comparison of average cost of different algorithms for 50 farmlands.

Fig. 9 Comparison of average cost of different algorithms for 70 farmlands.

Fig. 10 Comparison of average cost of different algorithms for 100 farmlands.

Fig. 11 Comparison of average cost of different algorithms for 120 farmlands.

In summary, the MCMPP-DRL algorithm always provides a more economical scheduling solution, regardless of the variation in the number of farmlands.

Conclusion

To effectively reduce harvester scheduling costs, a new deep reinforcement learning-based algorithm, MCMPP-DRL, is proposed. Scheduling centers with varying numbers of farmlands in the corn planting area of Hebei Province were randomly selected for simulation experiments. The performance of the MCMPP-DRL algorithm was compared with that of the genetic algorithm, the simulated annealing algorithm, and the ant colony optimization algorithm, demonstrating significant advantages in cost reduction. The experimental results indicate that, across the various problem sizes, the scheduling cost of the MCMPP-DRL algorithm decreases by at least 9.66% compared to the ant colony optimization algorithm, 14.34% compared to the simulated annealing algorithm, and 24.41% compared to the genetic algorithm. This further substantiates the effectiveness of the proposed algorithm in minimizing scheduling costs. Additionally, this paper analyzes various influencing factors in actual harvester scheduling scenarios, such as fuel consumption and diesel prices, and sets these parameters reasonably so that the experimental results have practical reference value and application prospects.

The research in this paper has the following limitations. First, owing to the complexity of the scheduling scenarios, the effectiveness of the proposed method has only been preliminarily verified on small-scale practical scenarios; experimental testing and verification on large-scale practical scenarios need to be carried out in future work. Second, this paper only investigates the collaborative scheduling of a single type of harvester and does not address the collaborative scheduling optimization of heterogeneous harvesters. Future research will therefore focus on validation in large-scale real scenarios and on optimizing the cooperative scheduling of heterogeneous agricultural machines to improve overall efficiency in complex operation scenarios.
