Partially Observable Minimum-Age Scheduling: The Greedy Policy

Title

Partially Observable Minimum-Age Scheduling: The Greedy Policy

Authors

Yulin Shao, Qi Cao, Soung Chang Liew, He Chen

Abstract

This paper studies the minimum-age scheduling problem in a wireless sensor network where an access point (AP) monitors the state of an object via a set of sensors. The freshness of the sensed state, measured by the age-of-information (AoI), varies across sensors and is not directly observable to the AP. The AP has to decide which sensor to query/sample to obtain the most up-to-date state information of the object (i.e., the state information with the minimum AoI). In this paper, we formulate the minimum-age scheduling problem as a multi-armed bandit problem with partially observable arms and explore the greedy policy to minimize the expected AoI sampled over an infinite horizon. To analyze the performance of the greedy policy, we 1) put forth a relaxed greedy policy that decouples the sampling processes of the arms, 2) formulate the sampling process of each arm as a partially observable Markov decision process (POMDP), and 3) derive the average sampled AoI under the relaxed greedy policy as a sum of the average AoI sampled from individual arms. Numerical and simulation results validate that the relaxed greedy policy is an excellent approximation to the greedy policy in terms of the expected AoI sampled over an infinite horizon.
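To make the greedy policy concrete, here is a minimal Python sketch under a hypothetical model that is our assumption, not necessarily the paper's: in every slot each sensor independently receives a fresh update with probability p_i (resetting its AoI to 1, otherwise the AoI grows by 1), and the AP samples one sensor per slot and then observes that sensor's AoI exactly. The AP summarizes its partial knowledge of each arm by the AoI it last observed and the number of slots elapsed since then. The hypothetical helper `expected_aoi` converts this into an expected current AoI, and `simulate` runs the greedy rule (sample the arm with the smallest expected AoI) next to a round-robin baseline. All function names, update probabilities, and the baseline are illustrative.

```python
import random
from typing import List


def expected_aoi(last_obs: int, elapsed: int, p: float) -> float:
    """Expected current AoI of one arm, given the AP's partial observation.

    Hypothetical model (an assumption for illustration): in every slot the
    sensor independently receives a fresh update with probability p, which
    resets its AoI to 1; otherwise its AoI grows by 1. The AP last observed
    AoI `last_obs` exactly `elapsed` slots ago.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    q = 1.0 - p
    # No update in the last `elapsed` slots: the AoI kept growing from last_obs.
    stale = q ** elapsed * (last_obs + elapsed)
    # Latest update in the j-th slot counting back (j = 1..elapsed): probability
    # p * q**(j - 1), leaving AoI j. Closed form of sum_j j * p * q**(j - 1).
    fresh = (1.0 - (elapsed + 1) * q ** elapsed + elapsed * q ** (elapsed + 1)) / p
    return stale + fresh


def simulate(p: List[float], horizon: int, policy: str = "greedy", seed: int = 0) -> float:
    """Time-average AoI sampled by the AP over `horizon` slots."""
    rng = random.Random(seed)
    n = len(p)
    age = [1] * n       # true AoI at each sensor (hidden from the AP)
    last_obs = [1] * n  # AoI the AP saw the last time it sampled each arm
    elapsed = [0] * n   # slots since each arm was last sampled
    total = 0
    for t in range(horizon):
        # 1) Hidden dynamics: each sensor may receive a fresh update.
        for i in range(n):
            age[i] = 1 if rng.random() < p[i] else age[i] + 1
            elapsed[i] += 1
        # 2) Pick one arm to sample in this slot.
        if policy == "greedy":
            # Greedy: the arm with the smallest expected AoI right now.
            arm = min(range(n), key=lambda i: expected_aoi(last_obs[i], elapsed[i], p[i]))
        elif policy == "round_robin":
            arm = t % n  # baseline that ignores the AP's beliefs
        else:
            raise ValueError(f"unknown policy: {policy}")
        # 3) Sampling reveals the chosen arm's AoI and resets the AP's belief.
        total += age[arm]
        last_obs[arm] = age[arm]
        elapsed[arm] = 0
    return total / horizon


if __name__ == "__main__":
    probs = [0.1, 0.3, 0.6]  # hypothetical per-sensor update probabilities
    for name in ("greedy", "round_robin"):
        print(name, simulate(probs, horizon=20_000, policy=name, seed=1))
```

In this toy model the AP's uncertainty about an arm grows with the number of slots since it last sampled that arm, which is the partial observability the abstract refers to. The sketch covers only the greedy rule; it does not implement the paper's relaxed greedy policy or its POMDP analysis.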
