StarCraft II RL Tutorial 3

DeepMind's pysc2: Action Space

Posted by Chris Hoyean Song on February 7, 2019

Hi there! I’m Chris. I’m the boyfriend of reinforcement learning since reinforcement learning is my girlfriend.

In this third post, I’ll cover the action space of the pysc2 environment.

In an Atari game, the action space is very simple. For example, in some games you can control the agent with only two actions: up and down. The action space is the set of inputs we can give when we play the game.

When DeepMind published the Deep Q-Network, they used Atari games as the experiment environment. I think DeepMind adopted Atari as the training environment because Atari games have quite small action and observation spaces.

StarCraft II, the game we are going to cover, is much more complicated. In this article, we will look at its action space.

The action space of StarCraft II consists of combinations of many commands. The big difference between Atari and StarCraft II is the mouse and the camera.

For instance, when we (human players) select an SCV, we drag a rectangle around it with the mouse. pysc2 takes a similar approach to this action.

When we select an SCV, we take these three steps.

  • Press the left mouse button at the upper left of the SCV.
  • Drag the mouse to the lower right of the SCV.
  • Release the left mouse button.

In pysc2, we have to combine the commands below to do the same task in StarCraft II.

  • Base Action: select_rect (3)
  • Sub Action: (false)
  • Point1: (10, 12)
  • Point2: (20, 19)

1) Base Action: select_rect

First, we have to figure out the base action number for drawing a rectangle to select units. You can find the base actions in the pysc2 source code. Here is the link.

https://github.com/deepmind/pysc2/blob/master/pysc2/lib/actions.py#L454

# The semantic meaning of these actions can mainly be found by searching:
# http://liquipedia.net/starcraft2/ or http://starcraft.wikia.com/ .
# pylint: disable=line-too-long
_FUNCTIONS = [
    Function.ui_func(0, "no_op", no_op),
    Function.ui_func(1, "move_camera", move_camera),
    Function.ui_func(2, "select_point", select_point),
    Function.ui_func(3, "select_rect", select_rect),

As you can see, the ID of select_rect is 3.

This command needs three parameters: a sub action and two points (point1 and point2).

I’ll describe these three parameters one by one. If you check line 320 of pysc2/lib/actions.py, you can find the three parameters that the select_rect action takes.

https://github.com/deepmind/pysc2/blob/master/pysc2/lib/actions.py#L320


# Which argument types do each function need?
FUNCTION_TYPES = {
    no_op: [],
    move_camera: [TYPES.minimap],
    select_point: [TYPES.select_point_act, TYPES.screen],
    select_rect: [TYPES.select_add, TYPES.screen, TYPES.screen2],
    select_unit: [TYPES.select_unit_act, TYPES.select_unit_id],
    control_group: [TYPES.control_group_act, TYPES.control_group_id],
    select_idle_worker: [TYPES.select_worker],
    select_army: [TYPES.select_add],
    select_warp_gates: [TYPES.select_add],
    select_larva: [],
    unload: [TYPES.unload_id],
    build_queue: [TYPES.build_queue_id],
    cmd_quick: [TYPES.queued],
    cmd_screen: [TYPES.queued, TYPES.screen],
    cmd_minimap: [TYPES.queued, TYPES.minimap],
    autocast: [],
}

So, you need these three parameters: select_add, screen, and screen2.

2) Sub Action: select_add

Let’s check select_add first. This parameter is a boolean, so its value is either True or False. If you have played StarCraft II for a while, you probably know that you can hold the [Shift] key to add more units to an existing selection. The select_add parameter decides the same thing for the agent: True adds the units in the rectangle to the current selection, and False replaces the selection with only the units in the rectangle.
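To make the difference concrete, here is a minimal sketch of the two behaviors in plain Python. It does not use pysc2; the function `apply_select_rect` and the unit names are hypothetical and only illustrate "replace" versus "add".

```python
def apply_select_rect(current_selection, units_in_rect, select_add):
    """Return the new selection after a rectangle select.

    select_add=False replaces the selection with the units in the rectangle.
    select_add=True keeps the current selection and adds those units,
    which is what holding Shift does for a human player.
    """
    units_in_rect = set(units_in_rect)
    if select_add:
        return set(current_selection) | units_in_rect
    return units_in_rect


selected = {"marine_1", "marine_2"}  # hypothetical unit names
in_rect = {"scv_1"}

replaced = apply_select_rect(selected, in_rect, select_add=False)
added = apply_select_rect(selected, in_rect, select_add=True)
print(replaced)       # only the SCV is selected
print(sorted(added))  # the two marines plus the SCV
```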

https://github.com/deepmind/pysc2/blob/master/pysc2/lib/actions.py#L219

The full list of argument types.
  Take a look at TYPES and FUNCTION_TYPES for more details.
  Attributes:
    screen: A point on the screen.
    minimap: A point on the minimap.
    screen2: The second point for a rectangle. This is needed so that no
        function takes the same type twice.
    queued: Whether the action should be done now or later.
    control_group_act: What to do with the control group.
    control_group_id: Which control group to do it with.
    select_point_act: What to do with the unit at the point.
    select_add: Whether to add the unit to the selection or replace it.
    select_unit_act: What to do when selecting a unit by id.
    select_unit_id: Which unit to select by id.
    select_worker: What to do when selecting a worker.
    build_queue_id: Which build queue index to target.
    unload_id: Which unit to target in a transport/nydus/command center.

3) Screen1: Point() x,y coordinate

The screen parameter is an x,y point on the screen.

https://github.com/deepmind/pysc2/blob/master/pysc2/lib/actions.py#L211

4) Screen2: Point() x,y coordinate

The screen2 parameter is the second x,y point of the rectangle.

In summary, the parameters for this command look like this.

action:

[3, 0, [[10,12], [20,19]]]

These are the actual numeric parameters behind the action that pysc2 sends to the StarCraft II client. Let me explain them one by one.

3 : Base Action parameter, `select_rect`
0 : Sub Action parameter, `select_add` (0 = False, 1 = True)
[10,12] : Point1 parameter, `screen`, each coordinate in 0~63 for a 64x64 screen
[20,19] : Point2 parameter, `screen2`, each coordinate in 0~63 for a 64x64 screen
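The breakdown above can be sketched as a small decoder that checks each value against its range and names the fields. This is plain Python, not the pysc2 API: the helper `decode_select_rect` is hypothetical, and the 64x64 screen size is an assumption taken from the 0~63 ranges listed above.

```python
SELECT_RECT_ID = 3  # base action ID from the _FUNCTIONS listing
SCREEN_SIZE = 64    # assumption: 64x64 screen, so coordinates are 0..63


def decode_select_rect(action):
    """Turn [base, sub, [point1, point2]] into a dict of named fields."""
    base, sub, (point1, point2) = action
    if base != SELECT_RECT_ID:
        raise ValueError("not a select_rect action: %r" % (base,))
    if sub not in (0, 1):
        raise ValueError("select_add must be 0 or 1")
    for x, y in (point1, point2):
        if not (0 <= x < SCREEN_SIZE and 0 <= y < SCREEN_SIZE):
            raise ValueError("point out of screen range: %r" % ((x, y),))
    return {
        "function": "select_rect",
        "select_add": bool(sub),
        "screen": tuple(point1),
        "screen2": tuple(point2),
    }


decoded = decode_select_rect([3, 0, [[10, 12], [20, 19]]])
print(decoded)
```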

That is how a single action in the pysc2 action space is put together.

As I said at the beginning of this article, the action space of a simple Atari game can consist of just two commands: up and down. According to my rough calculation, the size of the pysc2 action space is more than 100,000,000. You can say it’s a super hard problem.
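To get a feel for where such a large number comes from, here is a back-of-the-envelope count for select_rect alone. It assumes a 64x64 screen and simply multiplies the number of choices for each parameter; it is not the calculation behind the figure above, and every other function adds its own combinations on top.

```python
SCREEN_SIZE = 64  # assumption: 64x64 screen

select_add_choices = 2                         # True or False
points_per_screen = SCREEN_SIZE * SCREEN_SIZE  # 4096 possible x,y points

# select_rect takes select_add, screen, and screen2.
select_rect_combinations = (
    select_add_choices * points_per_screen * points_per_screen
)
print(select_rect_combinations)  # 33554432
```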

5) available_actions: the available base actions that you can take in this step.

One more thing: in StarCraft II, there is the concept of available_actions. If you haven’t selected a marine, you cannot use the Stimpack skill. And you cannot build buildings unless you have selected an SCV.

That’s why pysc2 returns the available_actions parameter at every step of the game environment. In my case, I remove all the unavailable actions from the output of the policy network.
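A minimal sketch of that masking step, in plain Python, could look like this. It assumes the policy network outputs one probability per base action ID and that available_actions is a list of the IDs allowed in the current step; the helper `mask_policy` and the toy numbers are hypothetical.

```python
def mask_policy(probabilities, available_actions):
    """Zero out unavailable base actions and renormalize the rest."""
    available = set(available_actions)
    masked = [
        p if i in available else 0.0
        for i, p in enumerate(probabilities)
    ]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("no probability mass on any available action")
    return [p / total for p in masked]


policy_output = [0.1, 0.2, 0.3, 0.4]  # toy output for base actions 0..3
available_actions = [0, 3]            # only no_op and select_rect allowed

masked = mask_policy(policy_output, available_actions)
print(masked)  # [0.2, 0.0, 0.0, 0.8]
```

Renormalizing after masking keeps the result a valid probability distribution, so the agent can sample from it directly.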

It’s been a long time since I wrote the last pysc2 tutorial.

The StarCraft II RL environment is like my first love in reinforcement learning. I’m so happy to watch the research progress of DeepMind’s AlphaStar these days. I sincerely respect the researchers of the AlphaStar team for their great accomplishment.