Copyright | (c) Sentenai 2017 |
---|---|
License | Proprietary |
Maintainer | sam@sentenai.com |
Stability | experimental |
Portability | non-portable |
Safe Haskell | None |
Language | Haskell2010 |
- CartPole by Sutton et al.
Taken from https://webdocs.cs.ualberta.ca/~sutton/book/code/pole.c with some added insights from the OpenAI gym
cart_and_pole: the cart and pole dynamics; given action and current state, estimates next state
cart_pole: Takes an action (0 or 1) and the current values of the four state variables and updates their values by estimating the state TAU seconds later.
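The update described above can be sketched as a single pure function. This is a minimal sketch that follows the constants and Euler update of Sutton's `pole.c`, not this module's actual code; `Push`, `CartState`, `stepCartPole` and the constant names are hypothetical, chosen only for illustration.

```haskell
-- Hypothetical names throughout; constants follow Sutton's pole.c.
data Push = PushLeft | PushRight
  deriving (Eq, Show)

-- The four state variables of the cart-pole system.
data CartState = CartState
  { cartX        :: Double  -- cart position
  , cartXDot     :: Double  -- cart velocity
  , poleTheta    :: Double  -- pole angle from vertical
  , poleThetaDot :: Double  -- pole angular velocity
  } deriving (Eq, Show)

gravity, massCart, massPole, totalMass :: Double
gravity   = 9.8
massCart  = 1.0
massPole  = 0.1
totalMass = massCart + massPole

halfLength, poleMassLength, forceMag, tau :: Double
halfLength     = 0.5                    -- half the pole's length
poleMassLength = massPole * halfLength
forceMag       = 10.0
tau            = 0.02                   -- seconds between state updates

-- | Estimate the state 'tau' seconds later with one Euler step.
stepCartPole :: Push -> CartState -> CartState
stepCartPole push (CartState x xDot theta thetaDot) =
  CartState
    { cartX        = x        + tau * xDot
    , cartXDot     = xDot     + tau * xAcc
    , poleTheta    = theta    + tau * thetaDot
    , poleThetaDot = thetaDot + tau * thetaAcc
    }
  where
    force    = if push == PushRight then forceMag else negate forceMag
    cosTheta = cos theta
    sinTheta = sin theta
    temp     = (force + poleMassLength * thetaDot * thetaDot * sinTheta)
             / totalMass
    thetaAcc = (gravity * sinTheta - cosTheta * temp)
             / (halfLength * (4 / 3 - massPole * cosTheta * cosTheta / totalMass))
    xAcc     = temp - poleMassLength * thetaAcc * cosTheta / totalMass
```

In GHCi, `stepCartPole PushRight (CartState 0 0 0 0)` leaves the position and angle unchanged for the first step and gives the cart a positive velocity and the pole a negative angular velocity.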
- newtype Environment a = Environment { getEnvironment :: RWST CartPoleConf (DList Event) CartPoleState IO a }
- runEnvironmentWithSeed :: Environment () -> GenIO -> IO (DList Event)
- runEnvironmentWithSeed_ :: Environment () -> GenIO -> IO ()
- runEnvironment :: Environment () -> IO (DList Event)
- runEnvironment_ :: Environment () -> IO ()
- data Event r o a = Event Integer r o a
- data Action
- data StateCP
Documentation
newtype Environment a
A cartpole environment
Environment { getEnvironment :: RWST CartPoleConf (DList Event) CartPoleState IO a }
runEnvironmentWithSeed :: Environment () -> GenIO -> IO (DList Event)
Run an environment with an explicit seed.
runEnvironmentWithSeed_ :: Environment () -> GenIO -> IO ()
Same as runEnvironmentWithSeed, but does not return the history.
runEnvironment :: Environment () -> IO (DList Event)
Run an environment, creating a new random generator for each effectful action.
runEnvironment_ :: Environment () -> IO ()
Same as runEnvironment, but does not return the history.
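The relationship between a runner and its underscore variant can be illustrated with a toy RWST stack. This is only a sketch under stated assumptions: `ToyConf`, `ToyEvent`, `ToyEnv`, `tick`, `runToyEnv` and `runToyEnv_` are hypothetical stand-ins, a plain list replaces `DList`, and the random generator is left out entirely, so it says nothing about how this module threads `GenIO`.

```haskell
import Control.Monad (when)
import Control.Monad.Trans.RWS.Strict (RWST, asks, evalRWST, get, put, tell)
import Data.Functor (void)

-- Hypothetical stand-ins for the configuration, state, and event types.
newtype ToyConf = ToyConf { maxSteps :: Int }
type ToyState = Int
type ToyEvent = String

-- Reader for configuration, Writer for history, State for the simulation.
newtype ToyEnv a = ToyEnv
  { getToyEnv :: RWST ToyConf [ToyEvent] ToyState IO a }

-- | Log one step and advance the counter until 'maxSteps' is reached.
tick :: RWST ToyConf [ToyEvent] ToyState IO ()
tick = do
  n     <- get
  limit <- asks maxSteps
  when (n < limit) $ do
    tell ["step " ++ show n]
    put (n + 1)

-- | Run the environment and return the history collected by the writer.
runToyEnv :: ToyEnv () -> IO [ToyEvent]
runToyEnv (ToyEnv m) = snd <$> evalRWST m (ToyConf 3) 0

-- | Same as 'runToyEnv', but discard the history.
runToyEnv_ :: ToyEnv () -> IO ()
runToyEnv_ = void . runToyEnv
```

With `replicateM_` from `Control.Monad`, `runToyEnv (ToyEnv (replicateM_ 5 tick))` returns three entries, because the toy configuration caps the run at three steps.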
Event is our primary datatype for an event in a trace. It contains the episode number,
reward, state, and action taken (in that order).
TODO: change the ordering to Event Integer s a r
MonadWriter (DList Event) Environment
Monad t => MonadWriter (DList (Event Reward s a)) (GymEnvironmentT s a t)
(Show a, Show o, Show r) => Show (Event r o a)
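As an illustration of the field order, the sketch below redeclares the type as given in the synopsis and reads fields back by pattern matching. The helper names `eventEpisode`, `eventReward` and `episodeReturn` are hypothetical and not part of this module; a plain list stands in for the trace.

```haskell
-- Same shape as in the synopsis: episode number, reward, state, action.
data Event r o a = Event Integer r o a
  deriving Show

-- | Episode number of an event (hypothetical helper).
eventEpisode :: Event r o a -> Integer
eventEpisode (Event ep _ _ _) = ep

-- | Reward of an event (hypothetical helper).
eventReward :: Event r o a -> r
eventReward (Event _ r _ _) = r

-- | Sum the rewards of every event that belongs to the given episode.
episodeReturn :: Num r => Integer -> [Event r o a] -> r
episodeReturn ep events = sum [ r | Event e r _ _ <- events, e == ep ]
```

For a trace with two events in episode 0, each with reward 1.0, `episodeReturn 0` yields 2.0.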
The cart can only go left or right, so Action has an action space of "discrete 2": the values {0..n-1} with n = 2.
FIXME: Migrate this to either a more generic "directions" action type (which would need things like "up" and "down" as well) or a "discrete actions" version. I'm a fan of the former.
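One way to picture a "discrete 2" action space is an enumeration whose derived `Enum` instance maps onto {0, 1}. The constructors of `Action` are not shown on this page, so `Direction`, `GoLeft`, `GoRight` and the helpers below are hypothetical; this is a sketch of the idea, not the module's definition.

```haskell
-- Hypothetical two-valued action type; derived Enum numbers it 0 and 1.
data Direction = GoLeft | GoRight
  deriving (Eq, Show, Enum, Bounded)

-- | Index of an action inside the discrete space {0..n-1}.
toIndex :: Direction -> Int
toIndex = fromEnum

-- | Decode an index, rejecting anything outside the action space.
fromIndex :: Int -> Maybe Direction
fromIndex i
  | i >= fromEnum (minBound :: Direction)
  , i <= fromEnum (maxBound :: Direction) = Just (toEnum i)
  | otherwise                             = Nothing

-- | Every action in the space, in index order.
actionSpace :: [Direction]
actionSpace = [minBound .. maxBound]
```

Deriving `Bounded` keeps the range check tied to the type, so adding constructors such as "up" or "down" would widen the space without touching `fromIndex`.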
The state of a cart on a pole in a CartPole environment