The AGI agent performs real-time inference by minimizing a Bayesian free-energy objective and solves the Pendulum task environment cleanly; its target-tracking ability is extremely strong.
First, a quick look at how reinforcement learning performs.
Reinforcement learning on Pendulum:
This uses the ReinforcementLearning.jl library (the JuliaReinforcementLearning project); each episode is capped at as many as 5,000 steps (max_steps = 5000).
```julia
using ReinforcementLearning
using StableRNGs
using Flux
using Flux.Losses: huber_loss

function RL.Experiment(
    ::Val{:JuliaRL},
    ::Val{:BasicDQN},
    ::Val{:PendulumDiscrete},
    ::Nothing;
    seed = 123,
)
    rng = StableRNG(seed)
    env = PendulumEnv(continuous = false, max_steps = 5000, rng = rng)
    ns, na = length(state(env)), length(action_space(env))
    agent = Agent(
        policy = QBasedPolicy(
            learner = BasicDQNLearner(
                approximator = NeuralNetworkApproximator(
                    model = Chain(
                        Dense(ns, 64, relu; init = glorot_uniform(rng)),
                        Dense(64, 64, relu; init = glorot_uniform(rng)),
                        Dense(64, na; init = glorot_uniform(rng)),
                    ) |> gpu,
                    optimizer = ADAM(),
                ),
                batch_size = 32,
                min_replay_history = 100,
                loss_func = huber_loss,
                rng = rng,
            ),
            explorer = EpsilonGreedyExplorer(
                kind = :exp,
                ϵ_stable = 0.01,
                decay_steps = 500,
                rng = rng,
            ),
        ),
        trajectory = CircularArraySARTTrajectory(
            capacity = 5_000,
            state = Vector{Float32} => (ns,),
        ),
    )
    stop_condition = StopAfterStep(50_000, is_show_progress = !haskey(ENV, "CI"))
    hook = TotalRewardPerEpisode()
    Experiment(agent, env, stop_condition, hook, "")
end
```
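Assuming the imports above, the experiment can be instantiated and run directly. A minimal usage sketch (`run(::Experiment)` and the `rewards` field of `TotalRewardPerEpisode` are standard ReinforcementLearning.jl API):

```julia
# Build and run the DQN experiment defined above; training stops after 50,000 steps.
ex = RL.Experiment(Val(:JuliaRL), Val(:BasicDQN), Val(:PendulumDiscrete), nothing; seed = 123)
run(ex)

# Per-episode returns collected by the TotalRewardPerEpisode hook.
println(ex.hook.rewards)
```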
Even so, reinforcement learning does not solve the task perfectly, and every time the goal parameters change the agent has to be retrained.
AGI agent results:
Target position, environment gravity, object mass, applied force, friction, and other parameters can all be adjusted on the fly; the agent re-infers in real time and reaches the new target immediately.
Contrast this with reinforcement learning, which must be retrained for every new goal and every new configuration, and which reaches the goal by trial and error over random actions: Bayesian inference is plainly a higher-order form of intelligence. The sketch below illustrates the idea.
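To make the mechanism concrete, here is a minimal, self-contained Julia sketch of choosing actions by minimizing a simplified expected free energy under a generative model of the pendulum. This is an illustration under stated assumptions, not the article's actual implementation: the free-energy functional is reduced to a goal-prior (risk) term plus a control cost, the epistemic information-gain term is omitted, and every name here (`PendulumModel`, `predict`, `plan`, `σ`, `λ`) is hypothetical.

```julia
# Minimal sketch: action selection by minimizing a simplified expected free
# energy under a generative model of the pendulum. The epistemic
# (information-gain) term of the full free-energy functional is omitted.

struct PendulumModel
    g::Float64   # gravity
    l::Float64   # rod length
    m::Float64   # bob mass
    γ::Float64   # friction coefficient
    dt::Float64  # integration step
end

# One-step prediction under the agent's generative model:
# θ̈ = -(g/l)·sin(θ) + u/(m·l²) - γ·θ̇   (θ = 0 is hanging down, θ = π upright)
function predict(p::PendulumModel, θ, ω, u)
    ω′ = ω + p.dt * (-(p.g / p.l) * sin(θ) + u / (p.m * p.l^2) - p.γ * ω)
    θ′ = θ + p.dt * ω′
    return θ′, ω′
end

# Wrapped angular error, so θ and θ + 2π count as the same angle.
angerr(θ, θ_goal) = atan(sin(θ - θ_goal), cos(θ - θ_goal))

# Simplified expected free energy of an action sequence: a Gaussian goal prior
# centered on θ_goal (risk term with sharpness σ) plus a quadratic control cost λ.
function expected_free_energy(p, θ, ω, us, θ_goal; σ = 0.2, λ = 1e-3)
    G = 0.0
    for u in us
        θ, ω = predict(p, θ, ω, u)
        G += angerr(θ, θ_goal)^2 / (2σ^2) + λ * u^2
    end
    return G
end

# Random-shooting search over action sequences; execute only the first action
# of the best sequence (receding-horizon control).
function plan(p, θ, ω, θ_goal; horizon = 30, n_candidates = 512, u_max = 2.0)
    best_u, best_G = 0.0, Inf
    for _ in 1:n_candidates
        us = u_max .* (2 .* rand(horizon) .- 1)
        G = expected_free_energy(p, θ, ω, us, θ_goal)
        if G < best_G
            best_G, best_u = G, us[1]
        end
    end
    return best_u
end

# Goal and physics enter only at planning time, so changing θ_goal (or g, m, γ)
# takes effect on the very next step -- no retraining, unlike the DQN above.
function simulate(p; steps = 300)
    θ, ω = 0.0, 0.0                           # start hanging down
    for t in 1:steps
        θ_goal = t ≤ steps ÷ 2 ? π : π / 2    # move the target mid-episode
        u = plan(p, θ, ω, θ_goal)
        θ, ω = predict(p, θ, ω, u)            # here the environment is the model itself
    end
    return θ
end

println("final angle: ", simulate(PendulumModel(9.81, 1.0, 1.0, 0.05, 0.05)))
```

Because the goal and the physical parameters are arguments to the planner rather than weights learned by training, changing them mid-episode (as the loop above does) takes effect on the very next action, which is exactly the retraining-free behavior described above.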
Other references:
An RL algorithm that solves the cart double pendulum in just 27 training runs