Copyright | (c) Sentenai 2017 |
---|---|
License | BSD3 |
Maintainer | sam@sentenai.com |
Stability | experimental |
Portability | non-portable |
Safe Haskell | None |
Language | Haskell2010 |
User-facing API for MonadEnv, the typeclass used to implement an environment.
Documentation
class (Num r, Monad e) => MonadEnv e s a r | e -> s a r where
The environment monad.
TODO: Think about two typeclasses: ContinuousMonadEnv and EpisodicMonadEnv
reset :: e (Initial s)
Any environment must be initialized with reset. This can also be used to reset the environment at any time. It's expected that resetting an environment begins a new episode (and that reset can only be called once in a continuous environment).
step :: a -> e (Obs r s)
Step through an environment with an action: run the action in the environment, and return a reward and the new state of the environment.
Instances
MonadEnv Environment () Action Reward
MonadEnv Environment StateCP Action Reward
MonadEnv m s a r => MonadEnv (MWCRandT m) s a r | An instance which allows an environment to hold a reference to a shared MWC-random generator
MonadEnv m s a r => MonadEnv (DebugLogger m) s a r
MonadEnv m s a r => MonadEnv (NoopLogger m) s a r
(MonadIO t, MonadThrow t) => MonadEnv (EnvironmentT t) State Action Reward
(MonadThrow t, MonadIO t) => MonadEnv (EnvironmentT t) State Action Reward
(MonadIO t, MonadThrow t) => MonadEnv (EnvironmentT t) State Action Reward
(MonadThrow t, MonadIO t) => MonadEnv (EnvironmentT t) StateFL Action Reward
MonadEnv e s a r => MonadEnv (StateT t e) s a r
(Monoid t, MonadEnv e s a r) => MonadEnv (WriterT t e) s a r
MonadEnv e s a r => MonadEnv (ReaderT * t e) s a r
(Monoid writer, MonadEnv e s a r) => MonadEnv (RWST reader writer state e) s a r
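To make the contract of reset and step concrete, here is a minimal, self-contained sketch of a toy instance. It redefines the class and the Obs and Initial types locally so that it compiles without the library. The constructors given for Initial (Initial and EmptyEpisode) are an assumption, since they are not listed on this page, and Move, goal, and episode are hypothetical names invented for the illustration.

```haskell
{-# LANGUAGE FlexibleInstances, FunctionalDependencies, MultiParamTypeClasses #-}

import Control.Monad.Trans.State.Strict (State, evalState, get, put)

-- Local stand-ins for the library types, so the sketch compiles on its own.
-- ASSUMPTION: the constructors of Initial are not shown on this page.
data Initial o = Initial !o | EmptyEpisode
  deriving (Eq, Show)

data Obs r o = Next !r !o | Done !r !(Maybe o) | Terminated
  deriving (Eq, Show)

class (Num r, Monad e) => MonadEnv e s a r | e -> s a r where
  reset :: e (Initial s)
  step  :: a -> e (Obs r s)

-- Hypothetical action type: walk along a line of integer positions.
data Move = Forward | Back
  deriving (Eq, Show)

-- Hypothetical goal position; reaching it ends the episode.
goal :: Int
goal = 3

-- The environment state is the current position, held in a State monad.
instance MonadEnv (State Int) Int Move Double where
  -- Resetting starts a new episode at position 0.
  reset = put 0 >> pure (Initial 0)

  step move = do
    pos <- get
    if pos >= goal
      then pure Terminated          -- the episode is already over
      else do
        let pos' = case move of
              Forward -> pos + 1
              Back    -> max 0 (pos - 1)
        put pos'
        pure (if pos' >= goal
                then Done 1 (Just pos')  -- reward 1 for reaching the goal
                else Next 0 pos')        -- no reward yet, keep going

-- Reset once, then replay a fixed list of actions.
episode :: State Int (Initial Int, [Obs Double Int])
episode = do
  start <- reset
  obs   <- mapM step [Forward, Back, Forward, Forward, Forward, Forward]
  pure (start, obs)

main :: IO ()
main = print (evalState episode 0)
```

The last action in episode is issued after the goal has been reached, so the final observation in the list is Terminated rather than another Done.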
data Obs r o
An observation of the environment will either show that the environment is done with the episode (yielding Done), that the environment has already Terminated, or will return the reward of the last action performed and the next state (yielding Next).
TODO: return Terminal (or return ()) on failure
Next !r !o
Done !r !(Maybe o)
Terminated
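As an illustration of how the three constructors might be consumed, the sketch below pattern matches on a locally defined copy of Obs. The helpers rewardOf and episodeReturn are hypothetical and are not part of this module.

```haskell
-- Local copy of the observation type so the sketch compiles on its own.
data Obs r o
  = Next !r !o
  | Done !r !(Maybe o)
  | Terminated
  deriving (Eq, Show)

-- | Reward carried by an observation, if any. Terminated carries none.
rewardOf :: Obs r o -> Maybe r
rewardOf (Next r _) = Just r
rewardOf (Done r _) = Just r
rewardOf Terminated = Nothing

-- | Sum rewards up to and including the first Done. Anything after the
-- first Done or Terminated is ignored.
episodeReturn :: Num r => [Obs r o] -> r
episodeReturn = go 0
  where
    go acc (Next r _ : rest) = go (acc + r) rest
    go acc (Done r _ : _)    = acc + r
    go acc _                 = acc  -- Terminated, or the trace ran out

main :: IO ()
main = do
  let trace = [Next 1 'a', Next 2 'b', Done 3 Nothing, Next 100 'c'] :: [Obs Int Char]
  print (map rewardOf trace)
  print (episodeReturn trace)
```

Note that Done carries a Maybe o, so a consumer cannot rely on a final state being present.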
When starting an episode, we want to send an indication that the environment is starting without conflating this type with future steps (in Obs r o).
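The separation between the initial value and later observations can be seen in a driver loop, sketched below under stated assumptions. The constructors Initial and EmptyEpisode are assumed, since this page does not list them, and runEpisode, demo, and the countdown instance are hypothetical. The class and types are redefined locally so the block compiles on its own.

```haskell
{-# LANGUAGE FlexibleInstances, FunctionalDependencies, MultiParamTypeClasses #-}

import Control.Monad.Trans.State.Strict (State, evalState, get, put)

-- ASSUMPTION: constructor names for Initial are not shown on this page.
data Initial o = Initial !o | EmptyEpisode

data Obs r o = Next !r !o | Done !r !(Maybe o) | Terminated

class (Num r, Monad e) => MonadEnv e s a r | e -> s a r where
  reset :: e (Initial s)
  step  :: a -> e (Obs r s)

-- | Run one episode with a fixed policy and return the summed reward.
-- The result of reset is handled once, up front; the loop only sees Obs.
runEpisode :: MonadEnv e s a r => (s -> a) -> e r
runEpisode policy = do
  start <- reset
  case start of
    EmptyEpisode -> pure 0
    Initial s0   -> go 0 s0
  where
    go total s = do
      obs <- step (policy s)
      case obs of
        Next r s'  -> go (total + r) s'
        Done r _   -> pure (total + r)
        Terminated -> pure total

-- Hypothetical toy environment: count down from 3, paying reward 1 per step.
instance MonadEnv (State Int) Int () Int where
  reset = put 3 >> pure (Initial 3)
  step () = do
    n <- get
    case n of
      0 -> pure Terminated
      1 -> put 0 >> pure (Done 1 Nothing)
      _ -> put (n - 1) >> pure (Next 1 (n - 1))

-- The policy ignores the state because the only action is ().
demo :: State Int Int
demo = runEpisode (const ())

main :: IO ()
main = print (evalState demo 99)
```

Because reset overwrites the state, the starting value passed to evalState does not affect the result.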