Learning to Solve Orienteering Problem with Time Windows and Variable Profits
Abstract
The orienteering problem with time windows and variable profits (OPTWVP) is common in many real-world applications and involves continuous time variables. Current approaches fail to develop an efficient solver for this orienteering problem variant with discrete and continuous variables. In this paper, we propose a learning-based two-stage DEcoupled discrete-Continuous optimization with Service-time-guided Trajectory (DeCoST), which aims to effectively decouple the discrete and continuous decision variables in the OPTWVP problem, while enabling efficient and learnable coordination between them. In the first stage, a parallel decoding structure is employed to predict the path and the initial service time allocation. The second stage optimizes the service times through a linear programming (LP) formulation and provides a long-horizon learning of structure estimation. We rigorously prove the global optimality of the second-stage solution. Experiments on OPTWVP instances demonstrate that DeCoST outperforms both state-of-the-art constructive solvers and the latest meta-heuristic algorithms in terms of solution quality and computational efficiency, achieving up to 6.6x inference speedup on instances with fewer than 500 nodes. Moreover, the proposed framework is compatible with various constructive solvers and consistently enhances the solution quality for OPTWVP.
Motivation
Real-world routing tasks such as inspection, logistics, and robotic missions often involve time windows and profits that depend on how long you serve each location. This creates a tight coupling between discrete routing choices (which nodes to visit, in what order) and continuous service-time allocation (how long to stay), making the OPTWVP far more challenging than classical orienteering.
The figure illustrates the challenge faced by NCO methods in allocating hybrid discrete-continuous decisions and capturing continuous-variable representations. Existing methods either decouple routing and service times too loosely or rely on heavy meta-heuristics, which scale poorly and can miss good solutions. Learning-based solvers speed up inference but often fix or heuristically tune service times, losing optimality. This motivates a decoupled approach that keeps routing fast while refining service times globally.
Problem Formulation
We model OPTWVP on a complete graph with a depot and tasks. Each task has a time window and earns profit proportional to its service time; travel times are given. A route starts/ends at the depot and must fit within a total time budget. Specifically, OPTWVP combines three key aspects:
- (OP) Time limit on the route.
- (VP) Profit increases with service time.
- (TW) Each task has a feasible time window.
The decisions are the visit order and the service time at each visited node. The objective is to maximize total profit under time-window and budget constraints.
Contributions
- We propose DeCoST, a learning-based two-stage framework for OPTWVP that decouples discrete routing and continuous service-time allocation while coordinating them effectively.
- We introduce a parallel decoder with cross-stage feedback (pTAR) and an LP-based service-time optimization (STO) stage that yields global optimality for fixed routes and improves long-horizon structure estimation.
- Extensive experiments show that DeCoST outperforms state-of-the-art NCO methods and recent meta-heuristics in both solution quality and computational efficiency.
Experimental Results
We evaluate DeCoST on extended OPTWVP against representative heuristics and NCO baselines. DeCoST delivers consistently strong solution quality across scales and time windows, achieving the best or near-best Scores and Gaps while remaining efficient. Adding STO markedly improves NCO methods; for instance, GFACS+STO reduces the gap from 13.4% to 3.38% at n=100, TW=100, albeit with higher runtime due to its non-autoregressive decoding and reduced batching. STO is applied only to NCO models for fairness, since heuristics already embed service-time optimization. Compared with ILS, DeCoST matches or exceeds quality with 20-45x faster inference.
| Cat. | Method | n = 50, TW = 100 | n = 100, TW = 100 | ||||
|---|---|---|---|---|---|---|---|
| Score ↑ | Gap ↓ | Runtime ↓ | Score ↑ | Gap ↓ | Runtime ↓ | ||
| -- | B&C | 15.2 | 0.00% | 200 | 26.1 | 0.00% | 1023 |
| H | Greedy-PRS | 12.9 | 15.7% | 44 | 21.9 | 16.2% | 48 |
| ILS | 14.5 | 4.34% | 2109 | 24.9 | 4.2% | 7122 | |
| C | GFACS (Greedy) | 11.1 | 25.7% | 94.2 | 22.6 | 13.4% | 98.2 |
| GFACS | 12.3 | 18.6% | 4310 | 25.2 | 3.38% | 6760 | |
| POMO | 11.3 | 25.3% | 46.6 | 11.5 | 55.7% | 52.3 | |
| DeCoST (Ours) | 15.1 | 1.06% | 94.0 | 25.6 | 1.97% | 158 | |
| Cat. | Method | n = 50, TW = 500 | n = 100, TW = 500 | ||||
| Score ↑ | Gap ↓ | Runtime ↓ | Score ↑ | Gap ↓ | Runtime ↓ | ||
| -- | B&C | 51.9 | 0.00% | 1346s | -- | -- | >24h |
| H | Greedy-PRS | 46.1 | 11.4% | 46.1 | 79.7 | -- | 51 |
| ILS | 47.8 | 7.82% | 1262 | 85.0 | -- | 8441 | |
| C | GFACS (Greedy) | 35.21 | 31.7% | 89.8 | 60.5 | -- | 97.8 |
| GFACS | 47.4 | 8.57% | 6950 | 84.4 | -- | 8750 | |
| POMO | 31.3 | 39.6% | 40.7 | 54.6 | -- | 60.6 | |
| DeCoST (Ours) | 51.4 | 0.83% | 60.4 | 88.1 | -- | 100 | |
Acknowledgements
Co-funded by the European Union under the EU - HE Magician – Grant Agreement 101120731. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them.
Citation
@inproceedings{gao2026decost,
title={Learning to Solve Orienteering Problem with Time Windows and Variable Profits},
author={Songqun Gao and Zanxi Ruan and Patrick Floor and Marco Roveri and Luigi Palopoli and Daniele Fontanelli},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
}