Learning to Solve Orienteering Problem with Time Windows and Variable Profits

Songqun Gao^1,3, Zanxi Ruan⁴, Patrick Floor⁵, Marco Roveri^2,3, Luigi Palopoli^2,3, Daniele Fontanelli^1,3

¹Department of Industrial Engineering, Università di Trento
²Department of Information Engineering and Computer Science, Università di Trento
³Interdepartmental Robotics Labs (IDRA), Università di Trento
⁴Department of Engineering for Innovation Medicine, Università di Verona
⁵Pipple, Eindhoven, Netherlands

Paper GitHub

Poster at ICLR 2026 🎉

Overview of the DeCoST approach: It decouples routing and service-time decisions for OPTWVP with a two-stage pipeline: a parallel decoder predicts paths and initial service times, then an STO algorithm to global optimality, improving solution quality and inference speed. The upper part shows the two-stage optimization process, while the lower part details the STO algorithm.

Abstract

The orienteering problem with time windows and variable profits (OPTWVP) is common in many real-world applications and involves continuous time variables. Current approaches fail to develop an efficient solver for this orienteering problem variant with discrete and continuous variables. In this paper, we propose a learning-based two-stage DEcoupled discrete-Continuous optimization with Service-time-guided Trajectory (DeCoST), which aims to effectively decouple the discrete and continuous decision variables in the OPTWVP problem, while enabling efficient and learnable coordination between them. In the first stage, a parallel decoding structure is employed to predict the path and the initial service time allocation. The second stage optimizes the service times through a linear programming (LP) formulation and provides a long-horizon learning of structure estimation. We rigorously prove the global optimality of the second-stage solution. Experiments on OPTWVP instances demonstrate that DeCoST outperforms both state-of-the-art constructive solvers and the latest meta-heuristic algorithms in terms of solution quality and computational efficiency, achieving up to 6.6x inference speedup on instances with fewer than 500 nodes. Moreover, the proposed framework is compatible with various constructive solvers and consistently enhances the solution quality for OPTWVP.

Motivation

Real-world routing tasks such as inspection, logistics, and robotic missions often involve time windows and profits that depend on how long you serve each location. This creates a tight coupling between discrete routing choices (which nodes to visit, in what order) and continuous service-time allocation (how long to stay), making the OPTWVP far more challenging than classical orienteering.

The figure illustrates the challenge faced by NCO methods in allocating hybrid discrete-continuous decisions and capturing continuous-variable representations. Existing methods either decouple routing and service times too loosely or rely on heavy meta-heuristics, which scale poorly and can miss good solutions. Learning-based solvers speed up inference but often fix or heuristically tune service times, losing optimality. This motivates a decoupled approach that keeps routing fast while refining service times globally.

Problem Formulation

We model OPTWVP on a complete graph with a depot and tasks. Each task has a time window and earns profit proportional to its service time; travel times are given. A route starts/ends at the depot and must fit within a total time budget. Specifically, OPTWVP combines three key aspects:

(OP) Time limit on the route.
(VP) Profit increases with service time.
(TW) Each task has a feasible time window.

The decisions are the visit order and the service time at each visited node. The objective is to maximize total profit under time-window and budget constraints.

Contributions

We propose DeCoST, a learning-based two-stage framework for OPTWVP that decouples discrete routing and continuous service-time allocation while coordinating them effectively.
We introduce a parallel decoder with cross-stage feedback (pTAR) and an LP-based service-time optimization (STO) stage that yields global optimality for fixed routes and improves long-horizon structure estimation.
Extensive experiments show that DeCoST outperforms state-of-the-art NCO methods and recent meta-heuristics in both solution quality and computational efficiency.

Experimental Results

We evaluate DeCoST on extended OPTWVP against representative heuristics and NCO baselines. DeCoST delivers consistently strong solution quality across scales and time windows, achieving the best or near-best Scores and Gaps while remaining efficient. Adding STO markedly improves NCO methods; for instance, GFACS+STO reduces the gap from 13.4% to 3.38% at n=100, TW=100, albeit with higher runtime due to its non-autoregressive decoding and reduced batching. STO is applied only to NCO models for fairness, since heuristics already embed service-time optimization. Compared with ILS, DeCoST matches or exceeds quality with 20-45x faster inference.

Performance comparison on OPTWVP under different problem sizes and time window settings. **Bold** values denote the best performance in each setting, and underlined values denote the second-best. Category C refers to NCO methods, and H refers to heuristic baselines. The default unit for runtime is milliseconds (ms).
Cat.	Method	n = 50, TW = 100			n = 100, TW = 100
Cat.	Method	Score ↑	Gap ↓	Runtime ↓	Score ↑	Gap ↓	Runtime ↓
--	B&C	15.2	0.00%	200	26.1	0.00%	1023
H	Greedy-PRS	12.9	15.7%	44	21.9	16.2%	48
H	ILS	14.5	4.34%	2109	24.9	4.2%	7122
C	GFACS (Greedy)	11.1	25.7%	94.2	22.6	13.4%	98.2
	GFACS	12.3	18.6%	4310	25.2	3.38%	6760
	POMO	11.3	25.3%	46.6	11.5	55.7%	52.3
	DeCoST (Ours)	15.1	1.06%	94.0	25.6	1.97%	158
Cat.	Method	n = 50, TW = 500			n = 100, TW = 500
Cat.	Method	Score ↑	Gap ↓	Runtime ↓	Score ↑	Gap ↓	Runtime ↓
--	B&C	51.9	0.00%	1346s	--	--	>24h
H	Greedy-PRS	46.1	11.4%	46.1	79.7	--	51
H	ILS	47.8	7.82%	1262	85.0	--	8441
C	GFACS (Greedy)	35.21	31.7%	89.8	60.5	--	97.8
	GFACS	47.4	8.57%	6950	84.4	--	8750
	POMO	31.3	39.6%	40.7	54.6	--	60.6
	DeCoST (Ours)	51.4	0.83%	60.4	88.1	--	100

Acknowledgements

Co-funded by the European Union under the EU - HE Magician – Grant Agreement 101120731. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them.

Citation

            @inproceedings{gao2026decost,
              title={Learning to Solve Orienteering Problem with Time Windows and Variable Profits},
              author={Songqun Gao and Zanxi Ruan and Patrick Floor and Marco Roveri and Luigi Palopoli and Daniele Fontanelli},
              booktitle={The Fourteenth International Conference on Learning Representations},
              year={2026},
          }