论文标题
随机动态线性编程:用于多阶段随机线性编程的顺序采样算法
Stochastic Dynamic Linear Programming: A Sequential Sampling Algorithm for Multistage Stochastic Linear Programming
论文作者
论文摘要
多阶段随机编程涉及操作和计划问题,这些问题涉及一系列决策,同时响应不确定的实现。旨在解决多阶段随机线性编程(MSLP)问题的算法通常依赖于场景树来表示潜在的随机过程。当此过程表现出阶段独立性时,基于抽样的技术,尤其是随机双动力学编程(SDDP)算法,已经获得了广泛的接受。但是,这些基于抽样的方法仍然具有使用所谓样本平均近似值的问题的确定性表示。在这项工作中,我们为MSLP问题提供了一种顺序抽样方法,该方法使决策过程可以递归地吸收新采样的数据。我们将此方法称为随机动态线性编程(SDLP)算法。由于我们使用顺序采样,因此该算法不必通过方案树或样本平均近似值对不确定性的先验表示,这两者都需要对基础分布的知识/估计。在这方面,SDLP是解决MSLP问题的顺序采样方法。该方法构成了两阶段SLP模型的随机分解(SD)的概括。我们在非末端阶段采用二次正则化来优化问题。此外,我们介绍了提供分段式解决方案发现方案的基本可行策略的概念,该方案嵌入了优化算法中,以识别用于正则化的现有解决方案。最后,我们表明,SDLP算法沿一系列状态轨迹提供了一系列决策和相应的价值函数估计,这些轨迹渐近地收敛到其最佳对应物,并具有概率。
Multistage stochastic programming deals with operational and planning problems that involve a sequence of decisions over time while responding to realizations that are uncertain. Algorithms designed to address multistage stochastic linear programming (MSLP) problems often rely upon scenario trees to represent the underlying stochastic process. When this process exhibits stagewise independence, sampling-based techniques, particularly the stochastic dual dynamic programming (SDDP) algorithm, have received wide acceptance. However, these sampling-based methods still operate with a deterministic representation of the problem that uses the so-called sample average approximation. In this work, we present a sequential sampling approach for MSLP problems that allows the decision process to assimilate newly sampled data recursively. We refer to this method as the stochastic dynamic linear programming (SDLP) algorithm. Since we use sequential sampling, the algorithm does not necessitate a priori representation of uncertainty, either through a scenario tree or sample average approximation, both of which require a knowledge/estimation of the underlying distribution. In this regard, SDLP is a sequential sampling approach to address MSLP problems. This method constitutes a generalization of the Stochastic Decomposition (SD) for two-stage SLP models. We employ quadratic regularization for optimization problems in the non-terminal stages. Furthermore, we introduce the notion of basic feasible policies which provide a piecewise-affine solution discovery scheme, that is embedded within the optimization algorithm to identify incumbent solutions used for regularization. Finally, we show that the SDLP algorithm provides a sequence of decisions and corresponding value function estimates along a sequence of state trajectories that asymptotically converge to their optimal counterparts, with probability one.