Paper Title
Model-based actor-critic: GAN (model generator) + DRL (actor-critic) => AGI
Paper Authors
Paper Abstract
Our effort is toward unifying GAN and DRL algorithms into a single AI model (AGI, general-purpose AI, or artificial general intelligence) with general-purpose applications to: (A) offline learning (of stored data), like GAN in an (un/semi-/fully-)supervised learning setting, such as big data analytics (mining) and visualization; (B) online learning (of real or simulated devices), like DRL in an RL setting (with or without an environment reward), such as (real or simulated) robotics and control. Our core proposal is adding a (generative/predictive) environment model to the (model-free) actor-critic architecture, which results in a model-based actor-critic architecture with a temporal-difference (TD) error and an episodic memory. The proposed AI model is similar to (model-free) DDPG and is therefore called model-based DDPG. To evaluate it, we compare it with (model-free) DDPG by applying both to a wide range of independent simulated robotic and control task environments in OpenAI Gym and Unity Agents. Our initial limited experiments show that combining DRL and GAN in a model-based actor-critic results in the incremental goal-driven intelligence required to solve each task, with performance similar to (model-free) DDPG. Our future focus is to investigate the proposed AI model's potential to: (A) unify the DRL field within AI by producing performance competitive with the best model-based (PlaNet) and model-free (D4PG) approaches; (B) bridge the gap between the AI and robotics communities by solving the important problem of reward engineering through learning the reward function from demonstration.
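The abstract describes the architecture only at a high level, so the following is a minimal sketch of what a model-based DDPG-style agent could look like: a deterministic actor, a Q-critic, and a learned environment model that predicts the next state and reward, trained from an episodic replay memory. The class names, network sizes, use of a plain regression loss for the environment model (rather than an adversarial GAN loss), and the way imagined transitions enter the TD target are all illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch of a model-based actor-critic (DDPG-style) agent with a
# learned environment model; details are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Actor(nn.Module):
    """Deterministic policy: state -> action (as in DDPG)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    """Q-function: (state, action) -> scalar value."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


class EnvironmentModel(nn.Module):
    """Generative/predictive model: (state, action) -> (next_state, reward)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),
        )

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        return out[..., :-1], out[..., -1:]  # predicted next state, predicted reward


def td_update(actor, critic, env_model, batch, gamma=0.99):
    """Compute losses for one TD update on a batch from the episodic replay memory."""
    state, action, reward, next_state = batch

    # Fit the environment model to observed transitions (simple regression here;
    # the paper frames this component as GAN-like, which this sketch does not reproduce).
    pred_next, pred_reward = env_model(state, action)
    model_loss = F.mse_loss(pred_next, next_state) + F.mse_loss(pred_reward, reward)

    # Critic: build the TD target from a transition imagined by the learned model.
    with torch.no_grad():
        imagined_next, imagined_reward = env_model(state, actor(state))
        target_q = imagined_reward + gamma * critic(imagined_next, actor(imagined_next))
    critic_loss = F.mse_loss(critic(state, action), target_q)

    # Actor: ascend the critic's value estimate, as in standard DDPG.
    actor_loss = -critic(state, actor(state)).mean()
    return model_loss, critic_loss, actor_loss
```

In this reading, the learned environment model plays the role of the GAN generator, while the actor, critic, TD error, and replay memory are carried over from model-free DDPG; whether the target is computed from imagined or observed transitions is a design choice this sketch does not settle.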