Copyright	(c) Sentenai 2017
License	BSD3
Maintainer	sam@sentenai.com
Stability	experimental
Portability	non-portable
Safe Haskell	None
Language	Haskell2010

Control.MonadEnv

Description

User-facing API for MonadEnv, the typeclass used to implement an environment.

Documentation

class (Num r, Monad e) => MonadEnv e s a r | e -> s a r where

The environment monad.

TODO: Think about splitting this into two typeclasses: ContinuousMonadEnv and EpisodicMonadEnv.

Minimal complete definition

reset, step

Methods

reset :: e (Initial s)

Every environment must be initialized with reset, which may also be called at any later time to reset the environment. Resetting an environment is expected to begin a new episode; in a continuous environment, reset can only be called once.

step :: a -> e (Obs r s)

Step through an environment with an action: run the action in the environment, then return a reward and the new state of the environment.
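
As an illustration of how the two methods fit together, here is a minimal, self-contained sketch of an instance. It re-declares MonadEnv, Obs and Initial locally (mirroring the signatures documented on this page) so that it compiles without the library installed. Countdown, Tick and demo are hypothetical names invented for this example, and returning Terminated when a finished episode is stepped again is an assumption about the intended semantics rather than documented behaviour.

```haskell
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses #-}

-- Local mirrors of the documented declarations (not imported from the library).
data Obs r o = Next !r !o | Done !r !(Maybe o) | Terminated
  deriving (Eq, Show)

data Initial o = Initial !o | EmptyEpisode
  deriving (Eq, Show)

class (Num r, Monad e) => MonadEnv e s a r | e -> s a r where
  reset :: e (Initial s)
  step  :: a -> e (Obs r s)

-- Hypothetical toy environment: a hand-rolled state monad over 'Maybe Int'.
-- 'Nothing' means no episode is running (never reset, or already finished).
newtype Countdown x = Countdown { runCountdown :: Maybe Int -> (x, Maybe Int) }

instance Functor Countdown where
  fmap f (Countdown g) = Countdown $ \s -> let (x, s') = g s in (f x, s')

instance Applicative Countdown where
  pure x = Countdown $ \s -> (x, s)
  Countdown f <*> Countdown g = Countdown $ \s ->
    let (h, s')  = f s
        (x, s'') = g s'
    in (h x, s'')

instance Monad Countdown where
  Countdown g >>= k = Countdown $ \s ->
    let (x, s') = g s in runCountdown (k x) s'

-- The only action in this toy environment.
data Tick = Tick

instance MonadEnv Countdown Int Tick Double where
  -- Every reset starts a fresh episode with the counter at 3.
  reset = Countdown $ \_ -> (Initial 3, Just 3)
  -- Each step decrements the counter; the step that reaches 0 ends the
  -- episode with reward 1, and any later step reports Terminated (assumed).
  step Tick = Countdown $ \s -> case s of
    Nothing -> (Terminated, Nothing)
    Just n
      | n <= 1    -> (Done 1 (Just 0), Nothing)
      | otherwise -> (Next 0 (n - 1), Just (n - 1))

-- Reset once, then step four times (one more than the episode lasts).
demo :: (Initial Int, [Obs Double Int])
demo = fst (runCountdown prog Nothing)
  where
    prog = do
      i  <- reset
      os <- mapM step [Tick, Tick, Tick, Tick]
      pure (i, os)
```

Because of the functional dependency e -> s a r, choosing the monad (here Countdown) fixes the state, action and reward types, so reset and step need no type annotations at the call site.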

Instances

MonadEnv Environment () Action Reward
Methods
  reset :: Environment (Initial ())
  step :: Action -> Environment (Obs Reward ())
MonadEnv Environment StateCP Action Reward
Methods
  reset :: Environment (Initial StateCP)
  step :: Action -> Environment (Obs Reward StateCP)
MonadEnv m s a r => MonadEnv (MWCRandT m) s a r	An instance which allows for an environment to hold a reference to a shared MWC-random generator
Methods
  reset :: MWCRandT m (Initial s)
  step :: a -> MWCRandT m (Obs r s)
MonadEnv m s a r => MonadEnv (DebugLogger m) s a r
Methods
  reset :: DebugLogger m (Initial s)
  step :: a -> DebugLogger m (Obs r s)
MonadEnv m s a r => MonadEnv (NoopLogger m) s a r
Methods
  reset :: NoopLogger m (Initial s)
  step :: a -> NoopLogger m (Obs r s)
(MonadIO t, MonadThrow t) => MonadEnv (EnvironmentT t) State Action Reward
Methods
  reset :: EnvironmentT t (Initial State)
  step :: Action -> EnvironmentT t (Obs Reward State)
(MonadThrow t, MonadIO t) => MonadEnv (EnvironmentT t) State Action Reward
Methods
  reset :: EnvironmentT t (Initial State)
  step :: Action -> EnvironmentT t (Obs Reward State)
(MonadIO t, MonadThrow t) => MonadEnv (EnvironmentT t) State Action Reward
Methods
  reset :: EnvironmentT t (Initial State)
  step :: Action -> EnvironmentT t (Obs Reward State)
(MonadThrow t, MonadIO t) => MonadEnv (EnvironmentT t) StateFL Action Reward
Methods
  reset :: EnvironmentT t (Initial StateFL)
  step :: Action -> EnvironmentT t (Obs Reward StateFL)
MonadEnv e s a r => MonadEnv (StateT t e) s a r
Methods
  reset :: StateT t e (Initial s)
  step :: a -> StateT t e (Obs r s)
(Monoid t, MonadEnv e s a r) => MonadEnv (WriterT t e) s a r
Methods
  reset :: WriterT t e (Initial s)
  step :: a -> WriterT t e (Obs r s)
MonadEnv e s a r => MonadEnv (ReaderT * t e) s a r
Methods
  reset :: ReaderT * t e (Initial s)
  step :: a -> ReaderT * t e (Obs r s)
(Monoid writer, MonadEnv e s a r) => MonadEnv (RWST reader writer state e) s a r
Methods
  reset :: RWST reader writer state e (Initial s)
  step :: a -> RWST reader writer state e (Obs r s)
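
The transformer instances listed above let an agent stack its own effects on top of an environment. The sketch below shows one plausible way such a lifting instance can be written for StateT, using lift from the transformers package; it is not necessarily how the library defines it. The class and data types are re-declared locally so the example stands alone, and the Identity instance, countedStep and demo are hypothetical names used only for illustration.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE UndecidableInstances #-}

import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, modify, runStateT)
import Data.Functor.Identity (Identity, runIdentity)

-- Local mirrors of the documented declarations (not imported from the library).
data Obs r o = Next !r !o | Done !r !(Maybe o) | Terminated
  deriving (Eq, Show)

data Initial o = Initial !o | EmptyEpisode
  deriving (Eq, Show)

class (Num r, Monad e) => MonadEnv e s a r | e -> s a r where
  reset :: e (Initial s)
  step  :: a -> e (Obs r s)

-- Assumed shape of a lifting instance: both methods simply delegate to the
-- wrapped environment. UndecidableInstances is needed because s, a and r are
-- determined only through the instance context.
instance MonadEnv e s a r => MonadEnv (StateT t e) s a r where
  reset = lift reset
  step  = lift . step

-- Hypothetical base environment: every step ends the episode with reward 1.
instance MonadEnv Identity () () Double where
  reset   = pure (Initial ())
  step () = pure (Done 1 (Just ()))

-- Count how many steps were taken, using the agent's own StateT layer.
countedStep :: MonadEnv e s a r => a -> StateT Int e (Obs r s)
countedStep a = modify (+ 1) >> step a

-- Reset, step twice, and return the last observation with the step count.
demo :: (Obs Double (), Int)
demo = runIdentity (runStateT (reset >> countedStep () >> countedStep ()) 0)
```

With an instance of this shape, code written against MonadEnv runs unchanged whether or not the environment is wrapped in additional transformer layers.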

data Obs r o

An observation of the environment either shows that the environment is done with the episode (yielding Done), shows that the environment has already Terminated, or returns the reward of the last action performed together with the next state (Next).

TODO: return Terminal (or return ()) on failure.

Constructors

Next !r !o
Done !r !(Maybe o)
Terminated

Instances

(Eq o, Eq r) => Eq (Obs r o)
Methods
  (==) :: Obs r o -> Obs r o -> Bool
  (/=) :: Obs r o -> Obs r o -> Bool
(Show o, Show r) => Show (Obs r o)
Methods
  showsPrec :: Int -> Obs r o -> ShowS
  show :: Obs r o -> String
  showList :: [Obs r o] -> ShowS
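
To make the three constructors concrete, the following sketch pattern-matches on a locally re-declared copy of Obs. The helpers obsReward and episodeReturn are hypothetical and not part of this module; treating Terminated as contributing no reward is an assumption.

```haskell
-- Local mirror of the documented data type (not imported from the library).
data Obs r o = Next !r !o | Done !r !(Maybe o) | Terminated
  deriving (Eq, Show)

-- Reward carried by a single observation, if it carries one.
obsReward :: Obs r o -> Maybe r
obsReward (Next r _) = Just r
obsReward (Done r _) = Just r
obsReward Terminated = Nothing

-- Sum rewards up to and including the first Done; anything recorded after
-- the end of the episode is ignored.
episodeReturn :: Num r => [Obs r o] -> r
episodeReturn []                = 0
episodeReturn (Next r _ : rest) = r + episodeReturn rest
episodeReturn (Done r _ : _)    = r
episodeReturn (Terminated : _)  = 0
```

Note that Done carries a Maybe o, so a consumer cannot rely on a final state being present, whereas Next always provides one.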

data Initial o

When starting an episode, we want to indicate that the environment is starting without conflating this type with future steps (which use Obs r o).

Constructors

Initial !o
EmptyEpisode
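
The sketch below shows how Initial and Obs might be consumed together in a generic episode loop. The class and data types are re-declared locally so the example compiles on its own; rollout, rolloutFrom, the Identity instance and demo are hypothetical names, and the loop assumes the environment eventually yields Done or Terminated (it does not terminate otherwise).

```haskell
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import Data.Functor.Identity (Identity, runIdentity)

-- Local mirrors of the documented declarations (not imported from the library).
data Obs r o = Next !r !o | Done !r !(Maybe o) | Terminated
  deriving (Eq, Show)

data Initial o = Initial !o | EmptyEpisode
  deriving (Eq, Show)

class (Num r, Monad e) => MonadEnv e s a r | e -> s a r where
  reset :: e (Initial s)
  step  :: a -> e (Obs r s)

-- Run one episode with a fixed policy and return the total reward.
-- An EmptyEpisode yields a return of 0 without taking any step.
rollout :: MonadEnv e s a r => (s -> a) -> e r
rollout policy = do
  start <- reset
  case start of
    EmptyEpisode -> pure 0
    Initial s    -> rolloutFrom policy 0 s

-- Keep stepping from state s, accumulating reward, until the episode ends.
rolloutFrom :: MonadEnv e s a r => (s -> a) -> r -> s -> e r
rolloutFrom policy total s = do
  obs <- step (policy s)
  case obs of
    Next r s'  -> rolloutFrom policy (total + r) s'
    Done r _   -> pure (total + r)
    Terminated -> pure total

-- Hypothetical stateless environment: the action echoes the current counter,
-- each step pays reward 1, and the episode ends when the counter reaches 1.
instance MonadEnv Identity Int Int Double where
  reset = pure (Initial 3)
  step n
    | n <= 1    = pure (Done 1 Nothing)
    | otherwise = pure (Next 1 (n - 1))

-- Three steps of reward 1 each.
demo :: Double
demo = runIdentity (rollout id)
```

Keeping Initial separate from Obs means the first case expression only has to handle the two ways an episode can start, and reward handling is confined to the step loop.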

type Reward = Double

A concrete reward signal.