Action Schema Networks: Generalized Policies for Stochastic Planning Problems in the Wargaming Domain

Stochastic shortest path (SSP) problems have been of interest to the automated planning community for many years. Traditionally, policies for these problems have been computed by heuristic search guided by admissible heuristics such as LM-Cut, which estimate the cost of reaching a goal state by solving a delete relaxation of the problem. Though successful, these heuristics face scalability problems as the state spaces of stochastic problems grow. Toyer et al. (2018) addressed this by using deep neural networks to learn generalized policies: because the network's weights are shared across every grounding of each action schema, a single trained network can be applied to problems of arbitrary size. These networks are called Action Schema Networks (ASNets), since their structure is derived from the action schemas of the planning domain; given a current state, they output an appropriate action to take.

We present a case study applying this technique to the fighter jet wargaming domain. We designed a PPDDL domain and grounding files for a wide set of scenarios in which red and blue 4th- and 5th-generation fighters engage in battle, and the ASNet must decide which attack method to use in the current state to maximize the probability of reaching a goal state. We present the results of five trial experiments and discuss the degree of success we had in training the ASNets, intuition about the results, and suggestions for future work.
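The actual domain and grounding files are in the repository linked below; as a rough sketch of the shape such a PPDDL action schema takes, a hypothetical probabilistic attack might look like the following. All type, predicate, and action names here, and the outcome probabilities, are invented for illustration and are not taken from our domain files.

    (define (domain wargame-sketch)
      (:requirements :typing :probabilistic-effects)
      (:types fighter)
      (:predicates
        (alive ?f - fighter)
        (in-missile-range ?a ?t - fighter))

      ;; Hypothetical long-range missile attack: with probability 0.6 the
      ;; target is destroyed; otherwise the attack misses and the target
      ;; closes to firing range on the attacker.
      (:action missile-attack
        :parameters (?attacker ?target - fighter)
        :precondition (and (alive ?attacker)
                           (alive ?target)
                           (in-missile-range ?attacker ?target))
        :effect (probabilistic
                  0.6 (not (alive ?target))
                  0.4 (in-missile-range ?target ?attacker))))

An ASNet instantiates one action module per grounding of a schema like missile-attack (one per attacker/target pair), with weights shared across all of those groundings; this weight sharing is what lets a single trained network transfer across scenarios of different sizes.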

The wargame PPDDL domain, PPDDL grounding files, grounding-generator scripts, trial experiment files, the experiment results, and Python scripts to collate the results of the trials are available on GitHub under an MIT License.
