Policy-guided Q-learning algorithm for intelligent travel path planning
Jingya Shi
TLDR
The results indicate that this reinforcement learning-based path planning algorithm with a policy-guided mechanism can provide efficient, safe, and personalized path planning for urban travelers, with broad application prospects.
Abstract
In complex urban environments, traditional path planning methods fall short in safety assurance, multi-objective path optimization, and personalized travel recommendations. To address these issues, this paper proposes a reinforcement learning-based path planning algorithm with a policy-guided mechanism and constructs a path optimization model for multi-destination travel scenarios. The method introduces a safety-aware potential field and a composite reward mechanism that guide the agent toward a dynamic balance between path length and safety. For the experiments, a dataset combining map data and urban public safety information was constructed, and 800 rounds of path learning simulation were conducted. The results show that the convergence time of the proposed algorithm is 32% shorter than that of the greedy strategy-based method and 27% shorter than that of the policy enhancement method. In addition, the average path length is reduced by more than 100 m, and the safety score improves by more than 14%. In multi-destination travel tests with 20 target points, the total path length was 3.00% shorter than with the distance matrix method and 0.15% shorter than with the genetic algorithm, supporting the method's scalability and stability in complex scenarios. These results indicate that the method can provide efficient, safe, and personalized path planning for urban travelers, with broad application prospects.
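To make the idea of a composite reward that trades path length against safety concrete, the following is a minimal sketch of tabular Q-learning on a toy grid. It is not the paper's implementation. The 5x5 grid, the `RISK` scores, the weight `w_safety`, the goal bonus, and all hyperparameters are assumptions chosen for illustration. The shaping term uses a plain distance-based potential as a stand-in, since the abstract does not specify the form of the safety-aware potential field.

```python
import random

# Hypothetical 5x5 map: RISK[r][c] in [0, 1] is an assumed per-cell danger score.
RISK = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.9, 0.9, 0.0],
    [0.0, 0.9, 0.9, 0.9, 0.0],
    [0.0, 0.9, 0.9, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
START, GOAL = (2, 0), (2, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GAMMA = 0.95


def step(state, action):
    """Move one cell; moves that would leave the grid keep the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    return (r, c) if 0 <= r < 5 and 0 <= c < 5 else state


def potential(state):
    """Assumed potential: negative Manhattan distance to the goal."""
    return -(abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1]))


def composite_reward(state, nxt, w_safety):
    """Step cost + weighted risk penalty + goal bonus + potential-based shaping."""
    reward = -1.0 - w_safety * RISK[nxt[0]][nxt[1]]
    if nxt == GOAL:
        reward += 10.0
    shaping = GAMMA * potential(nxt) - potential(state)
    return reward + shaping


def train(w_safety, episodes=2000, alpha=0.5, epsilon=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning; returns the Q-table as a dict."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda i: q.get((state, i), 0.0))
            nxt = step(state, ACTIONS[a])
            reward = composite_reward(state, nxt, w_safety)
            # No bootstrapping from the terminal (goal) state.
            best_next = 0.0 if nxt == GOAL else max(
                q.get((nxt, i), 0.0) for i in range(4))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (reward + GAMMA * best_next - old)
            state = nxt
            if state == GOAL:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the greedy policy from START; stops at GOAL or after max_steps."""
    path = [START]
    while path[-1] != GOAL and len(path) <= max_steps:
        s = path[-1]
        a = max(range(4), key=lambda i: q.get((s, i), 0.0))
        path.append(step(s, ACTIONS[a]))
    return path
```

Under these assumptions, calling `greedy_path(train(w_safety=0.0))` should follow the direct route through the high-risk cells, while a larger weight such as `w_safety=5.0` should favor the longer detour along the zero-risk border. The weight is the single knob that sets the balance between length and safety in this toy setting.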

