Paper Title
Model-based actor-critic: GAN (model generator) + DRL (actor-critic) => AGI
Paper Authors
Paper Abstract
Our effort is toward unifying GAN and DRL algorithms into a single AI model (AGI, general-purpose AI, or artificial general intelligence) with general-purpose applications to: (A) offline learning (of stored data), like GAN in an (un/semi-/fully-)supervised learning setting, such as big data analytics (mining) and visualization; (B) online learning (of real or simulated devices), like DRL in an RL setting (with or without an environment reward), such as (real or simulated) robotics and control. Our core proposal is adding a (generative/predictive) environment model to the (model-free) actor-critic architecture, which results in a model-based actor-critic architecture with a temporal-difference (TD) error and an episodic memory. The proposed AI model is similar to (model-free) DDPG and is therefore called model-based DDPG. To evaluate it, we compare it with (model-free) DDPG by applying both to a wide range of independent simulated robotic and control task environments in OpenAI Gym and Unity Agents. Our initial limited experiments show that combining DRL and GAN in a model-based actor-critic results in the incremental goal-driven intelligence required to solve each task, with performance similar to (model-free) DDPG. Our future focus is to investigate the proposed AI model's potential to: (A) unify the DRL field within AI by producing performance competitive with the best model-based (PlaNet) and model-free (D4PG) approaches; (B) bridge the gap between the AI and robotics communities by solving the important problem of reward engineering through learning the reward function from demonstration.
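The abstract describes the architecture only at a high level, so the following is a minimal sketch of what a model-based DDPG-style agent could look like: a deterministic actor, a Q-critic, and a learned environment model that predicts the next state and reward, trained from an episodic replay memory. The class names, network sizes, use of a plain regression loss for the environment model (rather than an adversarial GAN loss), and the way imagined transitions enter the TD target are all illustrative assumptions, not the paper's exact method.

```python
# Hypothetical sketch of a model-based actor-critic (DDPG-style) agent with a
# learned environment model; details are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Actor(nn.Module):
    """Deterministic policy: state -> action (as in DDPG)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    """Q-function: (state, action) -> scalar value."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


class EnvironmentModel(nn.Module):
    """Generative/predictive model: (state, action) -> (next_state, reward)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),
        )

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        return out[..., :-1], out[..., -1:]  # predicted next state, predicted reward


def td_update(actor, critic, env_model, batch, gamma=0.99):
    """Compute losses for one TD update on a batch from the episodic replay memory."""
    state, action, reward, next_state = batch

    # Fit the environment model to observed transitions (simple regression here;
    # the paper frames this component as GAN-like, which this sketch does not reproduce).
    pred_next, pred_reward = env_model(state, action)
    model_loss = F.mse_loss(pred_next, next_state) + F.mse_loss(pred_reward, reward)

    # Critic: build the TD target from a transition imagined by the learned model.
    with torch.no_grad():
        imagined_next, imagined_reward = env_model(state, actor(state))
        target_q = imagined_reward + gamma * critic(imagined_next, actor(imagined_next))
    critic_loss = F.mse_loss(critic(state, action), target_q)

    # Actor: ascend the critic's value estimate, as in standard DDPG.
    actor_loss = -critic(state, actor(state)).mean()
    return model_loss, critic_loss, actor_loss
```

In this reading, the learned environment model plays the role of the GAN generator, while the actor, critic, TD error, and replay memory are carried over from model-free DDPG; whether the target is computed from imagined or observed transitions is a design choice this sketch does not settle.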