Paper Title

Stochastic optimal well control in subsurface reservoirs using reinforcement learning

Paper Authors

Dixit, Atish, ElSheikh, Ahmed H.

Paper Abstract

We present a case study of a model-free reinforcement learning (RL) framework for solving stochastic optimal control under a predefined parameter uncertainty distribution and partial system observability. We focus on the robust optimal well control problem, which is the subject of intensive research activity in the field of subsurface reservoir management. For this problem, the system is only partially observed since data are available solely at well locations. Furthermore, the model parameters are highly uncertain due to the sparsity of available field data. In principle, RL algorithms are capable of learning optimal action policies -- maps from states to actions -- that maximize a numerical reward signal. In deep RL, this mapping from states to actions is parameterized using a deep neural network. In the RL formulation of the robust optimal well control problem, the states are represented by the saturation and pressure values at well locations, while the actions represent the valve openings controlling flow through the wells. The numerical reward is the total sweep efficiency, and the uncertain model parameter is the subsurface permeability field. The model parameter uncertainty is handled by introducing a domain randomisation scheme that exploits cluster analysis of the uncertainty distribution. We present numerical results using two state-of-the-art RL algorithms, proximal policy optimization (PPO) and advantage actor-critic (A2C), on two subsurface flow test cases representing two distinct uncertainty distributions of the permeability field. The results are benchmarked against optimisation results obtained using the differential evolution algorithm. Furthermore, we demonstrate the robustness of the proposed use of RL by evaluating the learned control policy on unseen samples drawn from the parameter uncertainty distribution that were not used during training.
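The clustering-based domain randomisation idea can be illustrated with a short sketch. The code below is not the authors' implementation: the ensemble size, grid shape, number of clusters, and the uniform-over-clusters sampling rule are all illustrative assumptions. It clusters an ensemble of permeability realisations with k-means and then, for each training episode, draws a cluster uniformly before drawing a realisation from it, so that every region of the uncertainty distribution is visited during RL training.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in ensemble of uncertain permeability fields (500 realisations of a
# 32x32 grid); in the paper these would come from a geostatistical model.
fields = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 32, 32))

# Cluster the realisations so that training can draw from representative
# regions of the uncertainty distribution rather than sampling blindly.
flat = fields.reshape(len(fields), -1)
n_clusters = 5
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(flat)

def sample_training_field():
    """Draw a cluster uniformly, then a realisation within it, so sparse
    clusters still appear regularly in RL training episodes."""
    c = rng.integers(n_clusters)
    members = np.flatnonzero(labels == c)
    return fields[rng.choice(members)]

# One permeability realisation per training episode:
field_for_episode = sample_training_field()
print(field_for_episode.shape)  # (32, 32)

In the paper's setting, the sampled field would parameterise the reservoir simulator environment that the PPO or A2C agent interacts with; this sketch covers only the sampling step, and the uniform-over-clusters rule is one plausible reading of the scheme, not a confirmed detail.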
