OpenAI Gymnasium examples

Cheetah test

In this section we show how to use the example supplied with the installation. The example shows how to set up the software to train the Gymnasium MuJoCo Cheetah model to run.

 

Copy the entire cheetah-test example directory to a suitable location. Then make a copy of it for our first test run, here called ‘cheetah-test1’:

cp -r cheetah-test cheetah-test1

 

Now enter that directory and start evolve in the background using the following commands:

cd cheetah-test1
nohup evolve > results.log &

 

And watch the results using the following command:

 

tail -f results.log

 

After a few generations, type ‘Ctrl-c’ at the terminal to exit the ‘tail -f’.

 

The supplied configuration in cheetah-test is unlikely to find any solution to the Cheetah running problem but is designed to complete each iteration quickly to demonstrate how to use the tools.

 

Say that we wish to view the performance of the best trained program at generation 1. To do this we type:

 

cp pm1.hif test.hif
python3 pyModelRun.py

 

This will bring up a window in which the Cheetah model is operated by the trained program. The output of training at generation 1 is contained in pm1.hif, and pyModelRun.py expects a file named test.hif as its test program, so we simply copy the required program into place first. Given the training configuration in cheetah-test, the result is likely to be a stationary cheetah.
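The copy-then-run pattern above works for any generation whose pmN.hif file has been written. As a convenience it can be wrapped in a small shell function; ‘view_gen’ is our own name for this sketch, not part of the supplied tools:

```shell
# view_gen N -- hypothetical convenience wrapper (our naming, not part of
# the supplied tools). Copies generation N's trained program over test.hif,
# the file pyModelRun.py expects, then launches the viewer.
view_gen() {
    gen="$1"
    if [ ! -f "pm${gen}.hif" ]; then
        echo "pm${gen}.hif not found -- has training reached generation ${gen}?" >&2
        return 1
    fi
    cp "pm${gen}.hif" test.hif
    python3 pyModelRun.py
}
```

For example, ‘view_gen 5’ would show the best program from generation 5, provided pm5.hif exists in the current directory.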

 

We can go back to observing the performance of the training by typing:

 

tail -f results.log

 

And at any time we can stop the training process by typing ‘Ctrl-c’ at the terminal to exit the ‘tail -f’ command and then killing the training by issuing:

 

evolveKill

 

Say that we want to restart the training where we left off. This is done by creating a new directory for the new training run: go to the parent directory of cheetah-test1 and create cheetah-test2 from the data originally used to create cheetah-test1.

 

cd ..
cp -r cheetah-test cheetah-test2
cd cheetah-test2

 

Because cheetah-test1 and cheetah-test2 share the same configuration, they can also share the same format for trained program files. We can therefore use the previously trained program files to seed this new run. To do that we copy the desired starting point for the new training into this new directory.

 

cp ../cheetah-test1/pm5.hif seed0.hif

 

Note that any generation output can be used to seed another search. Outputs of multiple searches can also be used to seed another search, and this enables interesting characteristics to be combined.
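Only seed0.hif appears above. If further seed files follow the same numbered pattern — seed1.hif, seed2.hif and so on, which is an assumption on our part and not confirmed here — copying trained programs from several earlier searches into a new run could be scripted as:

```shell
# seed_from FILE... -- hypothetical helper (our naming). Copies each given
# trained-program file into the current run directory as seed0.hif,
# seed1.hif, ... The numbered pattern beyond seed0.hif is an assumption.
seed_from() {
    n=0
    for src in "$@"; do
        cp "$src" "seed${n}.hif"
        n=$((n + 1))
    done
}
```

For example, ‘seed_from ../cheetah-test1/pm1.hif ../cheetah-test1/pm5.hif’ would seed the new search from two generations of the earlier run.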

 

To restart the training simply type the following command and follow the same steps given previously.

 

nohup evolve > results.log &

 

Note that multiple training processes can be run simultaneously, if the user’s license allows, but each must run in its own working directory. evolveKill should always be used to terminate a training process, as it cleanly frees the license. evolveKill must be run in the working directory of the training process that is to be terminated. If a training process is not terminated cleanly the license will not be released; in that case, however, the license will be held and blocked for at most 30 minutes.
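The two rules above — one run per working directory, and always stopping a run with evolveKill from inside its directory — can be captured in a pair of helper functions. The names are ours, and we assume ‘evolve’ and ‘evolveKill’ are on the PATH:

```shell
# start_run DIR -- hypothetical helper: launch one training process in its
# own working directory, logging to results.log there (one run per directory).
start_run() {
    ( cd "$1" && nohup evolve > results.log 2>&1 & )
}

# stop_run DIR -- hypothetical helper: terminate that run cleanly.
# evolveKill must be run in the working directory of the run it stops,
# so that the license is freed.
stop_run() {
    ( cd "$1" && evolveKill )
}
```

For example, ‘start_run cheetah-test1’ followed later by ‘stop_run cheetah-test1’ starts and cleanly stops one run, while a second run can proceed independently in cheetah-test2.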

Cheetah run

A second version of the cheetah model is supplied in the directory ‘cheetah-run’. It differs from cheetah-test only in its configuration data, which requires significantly more processing time between generations. The process of running it is identical to the previous example.

 

Note that all results are unique due to the randomness involved in the search process. Therefore, it is not possible to say exactly how long a run will take to converge on a solution, and many solutions represent local optima. It is possible to monitor the trained program’s behaviour, to stop and restart training, and to seed further training from multiple programs in order to combine interesting or useful characteristics.