The Python API of Gym Retro, OpenAI's simulation environment

Reposted article. 2021-09-11 07:20 · Python

As the title says, this article introduces the Python API of the Gym Retro simulation environment.

Official documentation:

https://retro.readthedocs.io/en/latest/python.html

==============================================

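The Python interface of gym-retro is largely the same as gym's, or rather compatible with it. gym-retro calls into gym when it runs, so installing gym-retro also installs gym.

Because the two interfaces are so similar, the official documentation only describes where gym-retro differs from gym, i.e. the settings that exist only in gym-retro. There is really just one such difference: the entry point used to create an environment. Everything else revolves around that entry point, or exists to supply its parameters.

gym-retro has two environment entry points: the function retro.make() and the class retro.RetroEnv (strictly speaking, its constructor).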

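Parameters of retro.make:

[Figure: signature of retro.make from the official documentation]

Parameters of retro.RetroEnv:

[Figure: signature of retro.RetroEnv from the official documentation]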

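One remark: in my own use I found no difference between the two. retro.make matches gym's style more closely and is also what the official documentation recommends, so I recommend using retro.make to create environments.

All examples below use retro.make.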

==============================================

The parameters of retro.make take enum values, specifically members of the classes:

retro.State, retro.Actions, retro.Observations.
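
To make the enum-typed parameters concrete, here is a minimal sketch using Python's standard enum module. The three classes are stand-ins that only borrow the member names mentioned in this article; they are not the real retro classes, their numeric values are arbitrary, and describe_options is a made-up helper for illustration. The defaults it uses are the ones this article discusses.

```python
from enum import Enum, auto


class State(Enum):          # stand-in for retro.State
    DEFAULT = auto()
    NONE = auto()


class Actions(Enum):        # stand-in for retro.Actions
    ALL = auto()
    FILTERED = auto()
    DISCRETE = auto()
    MULTI_DISCRETE = auto()


class Observations(Enum):   # stand-in for retro.Observations
    IMAGE = auto()
    RAM = auto()


def describe_options(state=State.DEFAULT,
                     use_restricted_actions=Actions.FILTERED,
                     obs_type=Observations.IMAGE):
    """Hypothetical helper: report which enum member each option holds."""
    return {
        "state": state.name,
        "use_restricted_actions": use_restricted_actions.name,
        "obs_type": obs_type.name,
    }


print(describe_options())
print(describe_options(obs_type=Observations.RAM))
```

Each option is chosen by passing one member of the matching enum, which is the same calling pattern the retro.make examples below follow.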

[Figure: the entries for these enums in the official documentation]

=============================================

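The rest of this article uses the game Pong-Atari2600 to walk through the API.

Note: the game ROMs have to be downloaded separately. Atari 2600 ROM collection:

http://www.atarimania.com/rom_collection_archive_atari_2600_roms.html

The standard default code is: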

import retro


def main():
    env = retro.make(game='Pong-Atari2600', players=2)
    obs = env.reset()

    while True:
        # action_space will be MultiBinary(16) now instead of MultiBinary(8)
        # the bottom half of the actions will be for player 1 and the top half for player 2
        obs, rew, done, info = env.step(env.action_space.sample())
        # rew will be a list of [player_1_rew, player_2_rew]
        # done and info will remain the same
        env.render()

        if done:
            obs = env.reset()
    env.close()


if __name__ == "__main__":
    main()
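
The comments in the loop say that with players=2 the sampled action is a 16-element binary vector, the lower half for player 1 and the upper half for player 2. A minimal plain-Python sketch of that split follows. The helper split_player_actions is made up for illustration and is not part of gym-retro; the 8 buttons per player come from the MultiBinary(8) / MultiBinary(16) comment.

```python
def split_player_actions(action, players=2):
    """Split a flat MultiBinary action into one button vector per player."""
    if len(action) % players != 0:
        raise ValueError("action length must be divisible by the number of players")
    per_player = len(action) // players
    return [list(action[p * per_player:(p + 1) * per_player])
            for p in range(players)]


# A hand-written 16-element action:
# player 1 presses its button 0, player 2 presses its button 7.
action = [1, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 1]

player_1, player_2 = split_player_actions(action)
print("player 1:", player_1)
print("player 2:", player_2)
```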

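Setting the state parameter of retro.make:

The state parameter takes a value of the retro.State enum.

It can be set to

state=retro.State.DEFAULT

or

state=retro.State.NONE

state=retro.State.DEFAULT is the default. With it, the game starts from the state named in the metadata.json file found in the game's ROM folder of the installation, i.e. the .state file that metadata.json points to. state=retro.State.NONE instead initializes the game from the ROM's own raw initial state.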

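A word on states: the state of a ROM game can be saved to a .state file. Starting a game from a .state file reproduces exactly the same initial environment (every value the game holds in memory is identical). These games were not designed for computer simulation, so many of them need manual input at the start (choosing a level, a difficulty, specific options and so on). To make simulation convenient, we usually initialize the game by hand once, skipping past those manual steps, save the resulting game state, and from then on start every simulation directly from the saved state.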

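You might wonder: if every episode starts from the same state, will every episode end the same way? This worry is unfounded. Even though each episode starts from the same game state, the random seed used while the game runs differs, so the actions taken differ as well, and the episodes diverge from one another.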
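
A toy illustration of this point, using only the standard library: the "environment" here is just an integer position, START_STATE stands in for the saved .state snapshot, and rollout is a made-up helper. None of this is gym-retro code; it only shows that a fixed start state plus differently seeded action sampling gives different episodes, while the same seed reproduces the same episode.

```python
import random

START_STATE = 0  # stands in for the saved .state snapshot


def rollout(seed, steps=20):
    """Play one toy episode from START_STATE with actions drawn from a seeded RNG."""
    rng = random.Random(seed)
    position = START_STATE
    trajectory = [position]
    for _ in range(steps):
        position += rng.choice([-1, 0, 1])  # the sampled "action"
        trajectory.append(position)
    return trajectory


run_a = rollout(seed=1)
run_b = rollout(seed=2)

print("same start state:", run_a[0] == run_b[0])
print("same seed reproduces the episode:", run_a == rollout(seed=1))
print("seed 1:", run_a)
print("seed 2:", run_b)
```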

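Because we usually have to play past the opening of a game by hand, we normally do not use state=retro.State.NONE but state=retro.State.DEFAULT, which lets metadata.json point to a start state we saved ourselves.

This article runs in an Anaconda environment (named game on my machine), so the game Pong-Atari2600 is located at: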

anaconda3\envs\game\Lib\site-packages\retro\data\stable\Pong-Atari2600

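Contents of this directory:

[Figure: directory listing, including metadata.json and two .state files]

Contents of metadata.json:

[Figure: metadata.json, which names the states "Start" and "Start.2P"]

Here "Start" is the state file used to start the game in single-player mode and "Start.2P" is the one used in two-player mode. Adding the .state suffix gives Start.state and Start.2P.state, which are exactly the two .state files in the directory above.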
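
The name-to-file mapping described above can be sketched as follows. The JSON text and the key name default_player_state are my assumption about the file's layout, written out only for illustration; only the two state names "Start" and "Start.2P" come from this article. The helper start_state_file is likewise made up.

```python
import json

# Assumed layout, for illustration only: one start state per player count.
METADATA_JSON = '{"default_player_state": ["Start", "Start.2P"]}'


def start_state_file(metadata_text, players=1):
    """Return the .state file name used for the given number of players."""
    states = json.loads(metadata_text)["default_player_state"]
    return states[players - 1] + ".state"


print(start_state_file(METADATA_JSON, players=1))  # Start.state
print(start_state_file(METADATA_JSON, players=2))  # Start.2P.state
```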

Examples:

state=retro.State.DEFAULT

import retro


def main():
    env = retro.make(game='Pong-Atari2600', players=2, state=retro.State.DEFAULT)
    obs = env.reset()

    while True:
        # action_space will be MultiBinary(16) now instead of MultiBinary(8)
        # the bottom half of the actions will be for player 1 and the top half for player 2
        obs, rew, done, info = env.step(env.action_space.sample())
        # rew will be a list of [player_1_rew, player_2_rew]
        # done and info will remain the same
        env.render()

        if done:
            obs = env.reset()
    env.close()


if __name__ == "__main__":
    main()

state=retro.State.NONE

import retro


def main():
    env = retro.make(game='Pong-Atari2600', players=2, state=retro.State.NONE)
    obs = env.reset()

    while True:
        # action_space will be MultiBinary(16) now instead of MultiBinary(8)
        # the bottom half of the actions will be for player 1 and the top half for player 2
        obs, rew, done, info = env.step(env.action_space.sample())
        # rew will be a list of [player_1_rew, player_2_rew]
        # done and info will remain the same
        env.render()

        if done:
            obs = env.reset()
    env.close()


if __name__ == "__main__":
    main()

===============================================

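Setting the obs_type parameter of retro.make:

It can be set to

obs_type=retro.Observations.IMAGE

or

obs_type=retro.Observations.RAM

obs_type=retro.Observations.IMAGE is the default: the observation returned when the agent interacts with the environment is the pixel values of the screen image. With obs_type=retro.Observations.RAM the observation is the game's memory contents instead (the data currently in the game's RAM represents the state).

Examples: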

obs_type=retro.Observations.IMAGE

import retro


def main():
    env = retro.make(game='Pong-Atari2600', players=2, state=retro.State.DEFAULT, obs_type=retro.Observations.IMAGE)
    obs = env.reset()

    while True:
        # action_space will be MultiBinary(16) now instead of MultiBinary(8)
        # the bottom half of the actions will be for player 1 and the top half for player 2
        obs, rew, done, info = env.step(env.action_space.sample())
        # rew will be a list of [player_1_rew, player_2_rew]
        # done and info will remain the same
        env.render()

        print(type(obs))
        print(obs)

        if done:
            obs = env.reset()
    env.close()


if __name__ == "__main__":
    main()

obs_type=retro.Observations.RAM

import retro


def main():
    env = retro.make(game='Pong-Atari2600', players=2, state=retro.State.DEFAULT, obs_type=retro.Observations.RAM)
    obs = env.reset()

    while True:
        # action_space will be MultiBinary(16) now instead of MultiBinary(8)
        # the bottom half of the actions will be for player 1 and the top half for player 2
        obs, rew, done, info = env.step(env.action_space.sample())
        # rew will be a list of [player_1_rew, player_2_rew]
        # done and info will remain the same
        env.render()

        print(type(obs))
        print(obs)

        if done:
            obs = env.reset()
    env.close()


if __name__ == "__main__":
    main()

================================================

Setting the use_restricted_actions parameter of retro.make:

It can be set to:

use_restricted_actions=retro.Actions.ALL

or

use_restricted_actions=retro.Actions.DISCRETE

or

use_restricted_actions=retro.Actions.FILTERED

or

use_restricted_actions=retro.Actions.MULTI_DISCRETE

According to the source code of retro.RetroEnv:

https://retro.readthedocs.io/en/latest/_modules/retro/retro_env.html#RetroEnv

the default setting is:

use_restricted_actions=retro.Actions.FILTERED

use_restricted_actions=retro.Actions.ALL gives a MultiBinary action space with no filtering: the action space contains every action, including some invalid ones.

use_restricted_actions=retro.Actions.DISCRETE, use_restricted_actions=retro.Actions.FILTERED and use_restricted_actions=retro.Actions.MULTI_DISCRETE

all filter the actions: invalid actions are excluded and are not part of the action space.

Of these, use_restricted_actions=retro.Actions.DISCRETE gives an action space of Discrete type,

use_restricted_actions=retro.Actions.FILTERED gives an action space of MultiBinary type, and

use_restricted_actions=retro.Actions.MULTI_DISCRETE gives an action space of MultiDiscrete type.
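
To show what a sample from each of these three space types looks like, here is a plain-Python sketch. The three functions are stand-ins that mirror the sampling semantics of gym's Discrete, MultiBinary and MultiDiscrete spaces; they are not gym code, and the sizes passed in the usage lines (in particular [3, 3, 2]) are arbitrary.

```python
import random

rng = random.Random(0)


def sample_discrete(n):
    """Discrete(n): a single integer in [0, n)."""
    return rng.randrange(n)


def sample_multi_binary(n):
    """MultiBinary(n): n independent 0/1 flags, one per button."""
    return [rng.randint(0, 1) for _ in range(n)]


def sample_multi_discrete(nvec):
    """MultiDiscrete(nvec): one integer per entry, the i-th in [0, nvec[i])."""
    return [rng.randrange(k) for k in nvec]


print("Discrete sample:     ", sample_discrete(126))
print("MultiBinary sample:  ", sample_multi_binary(16))
print("MultiDiscrete sample:", sample_multi_discrete([3, 3, 2]))
```

The practical consequence is that an agent written for one setting cannot be reused unchanged with another: a Discrete policy outputs one index, while a MultiBinary policy outputs one flag per button.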

Note: in Pong-Atari2600, an action sampled under use_restricted_actions=retro.Actions.MULTI_DISCRETE raises an error when it is passed to the environment's step function.

Example:

use_restricted_actions=retro.Actions.MULTI_DISCRETE

import retro


def main():
    env = retro.make(game='Pong-Atari2600', players=2, state=retro.State.DEFAULT, obs_type=retro.Observations.IMAGE, use_restricted_actions=retro.Actions.MULTI_DISCRETE)
    obs = env.reset()

    while True:
        # with Actions.MULTI_DISCRETE the action space is MultiDiscrete,
        # so each sample is a vector of integers rather than a MultiBinary(16) vector

        print(env.action_space.sample())

        obs, rew, done, info = env.step(env.action_space.sample())
        # rew will be a list of [player_1_rew, player_2_rew]
        # done and info will remain the same
        env.render()

        if done:
            obs = env.reset()
    env.close()


if __name__ == "__main__":
    main()

Error output:

[1 2 0 0 0 2]
Traceback (most recent call last):
  File "C:/Users/81283/PycharmProjects/game/x.py", line 25, in <module>
    main()
  File "C:/Users/81283/PycharmProjects/game/x.py", line 14, in main
    obs, rew, done, info = env.step(env.action_space.sample())
  File "C:\Users\81283\anaconda3\envs\game\lib\site-packages\retro\retro_env.py", line 179, in step
    for p, ap in enumerate(self.action_to_array(a)):
  File "C:\Users\81283\anaconda3\envs\game\lib\site-packages\retro\retro_env.py", line 161, in action_to_array
    buttons = self.button_combos[i]
IndexError: list index out of range

Process finished with exit code 1

This shows that, in this setup, the step function of the Pong-Atari2600 environment cannot handle an action from the MultiDiscrete action space.

Example:

use_restricted_actions=retro.Actions.ALL

import retro


def main():
    env = retro.make(game='Pong-Atari2600', players=2, state=retro.State.DEFAULT, obs_type=retro.Observations.IMAGE, use_restricted_actions=retro.Actions.ALL)
    obs = env.reset()

    while True:
        # action_space will be MultiBinary(16) now instead of MultiBinary(8)
        # the bottom half of the actions will be for player 1 and the top half for player 2

        print(env.action_space.sample())

        obs, rew, done, info = env.step(env.action_space.sample())
        # rew will be a list of [player_1_rew, player_2_rew]
        # done and info will remain the same
        env.render()

        if done:
            obs = env.reset()
    env.close()


if __name__ == "__main__":
    main()

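Because the actions are not filtered, invalid actions get sampled too, which leads to many unexpected outcomes. In this example the game either never properly starts (starting usually requires pressing the fire button) or resets after running only a few steps.

Example: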

use_restricted_actions=retro.Actions.DISCRETE

import retro


def main():
    env = retro.make(game='Pong-Atari2600', players=2, state=retro.State.DEFAULT, obs_type=retro.Observations.IMAGE, use_restricted_actions=retro.Actions.DISCRETE)
    obs = env.reset()

    while True:
        # with Actions.DISCRETE the action space is Discrete,
        # so each sample is a single integer rather than a MultiBinary(16) vector

        print(env.action_space.sample())

        obs, rew, done, info = env.step(env.action_space.sample())
        # rew will be a list of [player_1_rew, player_2_rew]
        # done and info will remain the same
        env.render()

        if done:
            obs = env.reset()
    env.close()


if __name__ == "__main__":
    main()

Example (the default setting):

use_restricted_actions=retro.Actions.FILTERED

import retro


def main():
    env = retro.make(game='Pong-Atari2600', players=2, state=retro.State.DEFAULT, obs_type=retro.Observations.IMAGE, use_restricted_actions=retro.Actions.FILTERED)
    obs = env.reset()

    while True:
        # action_space will be MultiBinary(16) now instead of MultiBinary(8)
        # the bottom half of the actions will be for player 1 and the top half for player 2

        print(env.action_space.sample())

        obs, rew, done, info = env.step(env.action_space.sample())
        # rew will be a list of [player_1_rew, player_2_rew]
        # done and info will remain the same
        env.render()

        if done:
            obs = env.reset()
    env.close()


if __name__ == "__main__":
    main()

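With use_restricted_actions=retro.Actions.DISCRETE the action space is of Discrete type, and with use_restricted_actions=retro.Actions.FILTERED it is of MultiBinary type. Although the two action spaces differ, both contain only the filtered actions, so in practice the two settings behave about the same.

A note on invalid actions: as I understand them, they are actions that re-initialize the environment or otherwise interfere with its normal operation. An invalid action does not raise an error when executed; it simply leaves the environment in a state we did not want.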
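
Under that reading, filtering could be pictured as never letting certain buttons be pressed. The sketch below is only an illustration of the idea: the button layout, the choice of SELECT and RESET as the blocked buttons, and the helper filter_action are all assumptions of mine, and this is not gym-retro's actual filtering code.

```python
# Hypothetical button layout for one player, for illustration only.
BUTTONS = ["BUTTON", "SELECT", "RESET", "UP", "DOWN", "LEFT", "RIGHT"]

# Assumed to be the "invalid" buttons: they restart or reconfigure the game.
BLOCKED = {"SELECT", "RESET"}


def filter_action(action, buttons=BUTTONS, blocked=BLOCKED):
    """Return a copy of a MultiBinary action with every blocked button released."""
    return [0 if name in blocked else bit
            for name, bit in zip(buttons, action)]


raw = [1, 1, 1, 0, 0, 1, 0]   # BUTTON + SELECT + RESET + LEFT
print("raw:     ", raw)
print("filtered:", filter_action(raw))
```

With the unfiltered vector, the toy RESET bit would be pressed on roughly half of all random samples, which matches the behaviour seen with retro.Actions.ALL where the game keeps restarting.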

===============================================

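Customizing the action space. The example below restricts the 126-value Discrete(126) action space to a 7-value Discrete(7) action space: of the 126 Discrete actions we keep only the 7 most important ones and make them the new action space.

Example: discretizer.py

Modified code: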

"""
Define discrete action spaces for Gym Retro environments with a limited set of button combos
"""

import gym
import numpy as np
import retro


class Discretizer(gym.ActionWrapper):
    """
    Wrap a gym environment and make it use discrete actions.
    Args:
        combos: ordered list of lists of valid button combinations
    """

    def __init__(self, env, combos):
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.MultiBinary)
        buttons = env.unwrapped.buttons
        self._decode_discrete_action = []
        for combo in combos:
            arr = np.array([False] * env.action_space.n)
            for button in combo:
                arr[buttons.index(button)] = True
            self._decode_discrete_action.append(arr)

        self.action_space = gym.spaces.Discrete(len(self._decode_discrete_action))

    def action(self, act):
        return self._decode_discrete_action[act].copy()


class SonicDiscretizer(Discretizer):
    """
    Use Sonic-specific discrete actions
    based on https://github.com/openai/retro-baselines/blob/master/agents/sonic_util.py
    """

    def __init__(self, env):
        super().__init__(env=env,
                         combos=[['LEFT'], ['RIGHT'], ['LEFT', 'DOWN'], ['RIGHT', 'DOWN'], ['DOWN'], ['DOWN', 'B'],
                                 ['B']])


def main():
    env = retro.make(game='SonicTheHedgehog-Genesis', use_restricted_actions=retro.Actions.MULTI_DISCRETE)
    print('retro.Actions.MULTI_DISCRETE action_space', env.action_space)
    env.close()

    env = retro.make(game='SonicTheHedgehog-Genesis', use_restricted_actions=retro.Actions.ALL)
    print('retro.Actions.ALL action_space', env.action_space)
    env.close()

    env = retro.make(game='SonicTheHedgehog-Genesis', use_restricted_actions=retro.Actions.FILTERED)
    print('retro.Actions.FILTERED action_space', env.action_space)
    env.close()

    env = retro.make(game='SonicTheHedgehog-Genesis', use_restricted_actions=retro.Actions.DISCRETE)
    print('retro.Actions.DISCRETE action_space', env.action_space)
    env.close()

    env = retro.make(game='SonicTheHedgehog-Genesis')
    print(env.unwrapped.buttons)
    env = SonicDiscretizer(env)
    print('SonicDiscretizer action_space', env.action_space)
    env.close()


if __name__ == '__main__':
    main()

Output:

[Figure: console output of the run]

The filtered DISCRETE action space is Discrete(126), and we want to restrict it to Discrete(7).

The example starts from the buttons behind MultiBinary(12), namely ['B', 'A', 'MODE', 'START', 'UP', 'DOWN', 'LEFT', 'RIGHT', 'C', 'Y', 'X', 'Z'],

and selects seven button combinations from them: [['LEFT'], ['RIGHT'], ['LEFT', 'DOWN'], ['RIGHT', 'DOWN'], ['DOWN'], ['DOWN', 'B'], ['B']].

The Discrete(7) actions are 0, 1, 2, 3, 4, 5, 6, and the button combinations they stand for are, in order:

[['LEFT'], ['RIGHT'], ['LEFT', 'DOWN'], ['RIGHT', 'DOWN'], ['DOWN'], ['DOWN', 'B'], ['B']]

The Discrete(7) space is a wrapper around MultiBinary(12); the underlying MultiBinary(12) encodings of the seven actions are:

[[0 0 0 0 0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0 0 0]
[0 0 0 0 0 1 1 0 0 0 0 0]
[0 0 0 0 0 1 0 1 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0 0 0]
[1 0 0 0 0 1 0 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0 0 0]]
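
The table above can be reproduced without gym or numpy. The sketch below uses the button list and the seven combinations quoted in this article; the helper names encode_combo and decode_action are mine and only mirror what the Discretizer wrapper does, they are not part of gym-retro.

```python
BUTTONS = ['B', 'A', 'MODE', 'START', 'UP', 'DOWN',
           'LEFT', 'RIGHT', 'C', 'Y', 'X', 'Z']

COMBOS = [['LEFT'], ['RIGHT'], ['LEFT', 'DOWN'], ['RIGHT', 'DOWN'],
          ['DOWN'], ['DOWN', 'B'], ['B']]


def encode_combo(combo, buttons=BUTTONS):
    """Turn a list of button names into a 0/1 vector with one slot per button."""
    row = [0] * len(buttons)
    for name in combo:
        row[buttons.index(name)] = 1
    return row


# One MultiBinary(12) row per discrete action 0..6.
TABLE = [encode_combo(combo) for combo in COMBOS]


def decode_action(index):
    """Map a Discrete(7) action index to a copy of its MultiBinary(12) row."""
    return list(TABLE[index])


for index, row in enumerate(TABLE):
    print(index, COMBOS[index], row)
```

For example, decode_action(5) returns the row for ['DOWN', 'B'], with slots 0 (B) and 5 (DOWN) set.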

Code (the wrapper now prints the encoding table):

"""
Define discrete action spaces for Gym Retro environments with a limited set of button combos
"""

import gym
import numpy as np
import retro


class Discretizer(gym.ActionWrapper):
    """
    Wrap a gym environment and make it use discrete actions.
    Args:
        combos: ordered list of lists of valid button combinations
    """

    def __init__(self, env, combos):
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.MultiBinary)
        buttons = env.unwrapped.buttons
        self._decode_discrete_action = []
        for combo in combos:
            arr = np.array([False] * env.action_space.n)
            for button in combo:
                arr[buttons.index(button)] = True
            self._decode_discrete_action.append(arr)
        print("inside encode:")
        print(np.array(self._decode_discrete_action, dtype=np.int32))

        self.action_space = gym.spaces.Discrete(len(self._decode_discrete_action))

    def action(self, act):
        return self._decode_discrete_action[act].copy()


class SonicDiscretizer(Discretizer):
    """
    Use Sonic-specific discrete actions
    based on https://github.com/openai/retro-baselines/blob/master/agents/sonic_util.py
    """

    def __init__(self, env):
        super().__init__(env=env,
                         combos=[['LEFT'], ['RIGHT'], ['LEFT', 'DOWN'], ['RIGHT', 'DOWN'], ['DOWN'], ['DOWN', 'B'],
                                 ['B']])


def main():
    env = retro.make(game='SonicTheHedgehog-Genesis', use_restricted_actions=retro.Actions.MULTI_DISCRETE)
    print('retro.Actions.MULTI_DISCRETE action_space', env.action_space)
    env.close()

    env = retro.make(game='SonicTheHedgehog-Genesis', use_restricted_actions=retro.Actions.ALL)
    print('retro.Actions.ALL action_space', env.action_space)
    env.close()

    env = retro.make(game='SonicTheHedgehog-Genesis', use_restricted_actions=retro.Actions.FILTERED)
    print('retro.Actions.FILTERED action_space', env.action_space)
    env.close()

    env = retro.make(game='SonicTheHedgehog-Genesis', use_restricted_actions=retro.Actions.DISCRETE)
    print('retro.Actions.DISCRETE action_space', env.action_space)
    env.close()

    env = retro.make(game='SonicTheHedgehog-Genesis')
    print(env.unwrapped.buttons)
    env = SonicDiscretizer(env)
    print('SonicDiscretizer action_space', env.action_space)
    env.close()


if __name__ == '__main__':
    main()

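Note: an action chosen from a MultiBinary action space may be a combination of several basic actions. For example, a random sample from a MultiBinary(5) action space might be:

[0,1,0,1,0] or [1,0,1,1,0], where 1 means the corresponding action is selected and 0 means it is not.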
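
Read the other way round, such a vector tells you which basic actions are active at once. A small sketch, with made-up labels a0 to a4 for the five slots and a made-up helper name:

```python
LABELS = ["a0", "a1", "a2", "a3", "a4"]  # hypothetical names for the 5 slots


def selected(action, labels=LABELS):
    """List the labels whose slot is set to 1 in a MultiBinary action."""
    return [label for label, bit in zip(labels, action) if bit == 1]


print(selected([0, 1, 0, 1, 0]))  # ['a1', 'a3']
print(selected([1, 0, 1, 1, 0]))  # ['a0', 'a2', 'a3']
```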
