reinforce-0.0.0.1: Reinforcement learning in Haskell

Copyright: (c) Sentenai 2017
License: BSD3
Maintainer: sam@sentenai.com
Stability: experimental
Portability: non-portable
Safe Haskell: None
Language: Haskell2010

Control.MonadEnv

Description

User-facing API for MonadEnv, the typeclass used to implement an environment.

Documentation

class (Num r, Monad e) => MonadEnv e s a r | e -> s a r where Source #

The environment monad. TODO: consider splitting this into two typeclasses, ContinuousMonadEnv and EpisodicMonadEnv.

Minimal complete definition

reset, step

Methods

reset :: e (Initial s) Source #

Every environment must be initialized with reset, which may also be called at any later point to reset the environment. Resetting is expected to begin a new episode; in a continuous environment, reset can only be called once.

step :: a -> e (Obs r s) Source #

Step through an environment with an action: run the action in the environment and return the reward and the new state of the environment.
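As an illustration of the minimal complete definition, the sketch below implements reset and step for a toy environment. It does not import this package: Initial, Obs and MonadEnv are re-declared locally as stand-ins that mirror the documented signatures (strictness annotations omitted), and Move, Counter, runCounter and rollout are hypothetical names invented for the example.

```haskell
{-# LANGUAGE FlexibleInstances          #-}
{-# LANGUAGE FunctionalDependencies     #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.Trans.State.Strict (State, evalState, get, put)

-- Local stand-ins mirroring the documented declarations; a real program
-- would import these from Control.MonadEnv instead.
data Initial o = Initial o | EmptyEpisode deriving (Eq, Show)
data Obs r o = Next r o | Done r (Maybe o) | Terminated deriving (Eq, Show)

class (Num r, Monad e) => MonadEnv e s a r | e -> s a r where
  reset :: e (Initial s)
  step  :: a -> e (Obs r s)

-- Hypothetical toy environment: a counter whose episode ends at 3.
data Move = Incr | Stay deriving (Eq, Show)

newtype Counter x = Counter (State Int x)
  deriving (Functor, Applicative, Monad)

-- Run a Counter computation starting from an internal state of 0.
runCounter :: Counter x -> x
runCounter (Counter m) = evalState m 0

instance MonadEnv Counter Int Move Double where
  -- Begin a new episode at 0.
  reset = Counter (put 0 >> pure (Initial 0))
  -- Apply the move; reward 1 and Done on reaching 3, otherwise reward 0.
  -- This toy does not guard against stepping after Done.
  step mv = Counter $ do
    n <- get
    let n' = if mv == Incr then n + 1 else n
    put n'
    pure $ if n' >= 3 then Done 1 (Just n') else Next 0 n'

-- Reset, then apply each move in order, collecting the observations.
rollout :: [Move] -> (Initial Int, [Obs Double Int])
rollout mvs = runCounter $ do
  i  <- reset
  os <- mapM step mvs
  pure (i, os)
```

Evaluating rollout [Incr, Stay, Incr, Incr] in GHCi yields the initial state followed by three Next observations and a final Done. The functional dependency e -> s a r is what lets reset and step be used inside rollout without extra type annotations.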

Instances

MonadEnv Environment () Action Reward Source # 
MonadEnv Environment StateCP Action Reward Source # 
MonadEnv m s a r => MonadEnv (MWCRandT m) s a r Source #

An instance that allows an environment to hold a reference to a shared MWC-random generator.

Methods

reset :: MWCRandT m (Initial s) Source #

step :: a -> MWCRandT m (Obs r s) Source #

MonadEnv m s a r => MonadEnv (DebugLogger m) s a r Source # 

Methods

reset :: DebugLogger m (Initial s) Source #

step :: a -> DebugLogger m (Obs r s) Source #

MonadEnv m s a r => MonadEnv (NoopLogger m) s a r Source # 

Methods

reset :: NoopLogger m (Initial s) Source #

step :: a -> NoopLogger m (Obs r s) Source #

(MonadIO t, MonadThrow t) => MonadEnv (EnvironmentT t) State Action Reward Source # 
(MonadThrow t, MonadIO t) => MonadEnv (EnvironmentT t) State Action Reward Source # 
(MonadIO t, MonadThrow t) => MonadEnv (EnvironmentT t) State Action Reward Source # 
(MonadThrow t, MonadIO t) => MonadEnv (EnvironmentT t) StateFL Action Reward Source # 
MonadEnv e s a r => MonadEnv (StateT t e) s a r Source # 

Methods

reset :: StateT t e (Initial s) Source #

step :: a -> StateT t e (Obs r s) Source #

(Monoid t, MonadEnv e s a r) => MonadEnv (WriterT t e) s a r Source # 

Methods

reset :: WriterT t e (Initial s) Source #

step :: a -> WriterT t e (Obs r s) Source #

MonadEnv e s a r => MonadEnv (ReaderT * t e) s a r Source # 

Methods

reset :: ReaderT * t e (Initial s) Source #

step :: a -> ReaderT * t e (Obs r s) Source #

(Monoid writer, MonadEnv e s a r) => MonadEnv (RWST reader writer state e) s a r Source # 

Methods

reset :: RWST reader writer state e (Initial s) Source #

step :: a -> RWST reader writer state e (Obs r s) Source #
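The transformer instances listed above let an agent stack its own effects on top of an environment. Their implementations are not shown on this page; the sketch below assumes they simply delegate to the underlying environment with lift, which is the usual pattern for such classes. Initial, Obs and MonadEnv are local stand-ins, and Move, Counter, runCounter and countedRollout are hypothetical names.

```haskell
{-# LANGUAGE FlexibleInstances          #-}
{-# LANGUAGE FunctionalDependencies     #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE UndecidableInstances       #-}

import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict
  (State, StateT, evalState, get, modify', put, runStateT)

-- Local stand-ins mirroring the documented declarations.
data Initial o = Initial o | EmptyEpisode deriving (Eq, Show)
data Obs r o = Next r o | Done r (Maybe o) | Terminated deriving (Eq, Show)

class (Num r, Monad e) => MonadEnv e s a r | e -> s a r where
  reset :: e (Initial s)
  step  :: a -> e (Obs r s)

-- Assumed shape of a lifting instance: delegate to the inner environment.
-- UndecidableInstances is needed because s, a and r do not appear in
-- the determining argument (StateT t e) of the functional dependency.
instance MonadEnv e s a r => MonadEnv (StateT t e) s a r where
  reset = lift reset
  step  = lift . step

-- Hypothetical base environment: a counter whose episode ends at 3.
data Move = Incr | Stay deriving (Eq, Show)

newtype Counter x = Counter (State Int x)
  deriving (Functor, Applicative, Monad)

runCounter :: Counter x -> x
runCounter (Counter m) = evalState m 0

instance MonadEnv Counter Int Move Double where
  reset = Counter (put 0 >> pure (Initial 0))
  step mv = Counter $ do
    n <- get
    let n' = if mv == Incr then n + 1 else n
    put n'
    pure $ if n' >= 3 then Done 1 (Just n') else Next 0 n'

-- Step the environment while an outer StateT layer counts the steps taken.
-- Returns the observations together with the final step count.
countedRollout :: [Move] -> ([Obs Double Int], Int)
countedRollout mvs = runCounter (runStateT go 0)
  where
    go :: StateT Int Counter [Obs Double Int]
    go = do
      _ <- reset
      mapM (\mv -> modify' (+ 1) >> step mv) mvs
```

Here the agent's step counter lives in the outer StateT layer, separate from the environment's own state, and reset and step are called directly in the stacked monad with no explicit lift at the call site.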

data Obs r o Source #

An observation of the environment either shows that the episode is done (Done), shows that the environment has already Terminated, or returns the reward of the last action performed together with the next state (Next). TODO: return Terminal (or return ()) on failure.

Constructors

Next !r !o 
Done !r !(Maybe o) 
Terminated 
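A consumer of Obs values typically pattern-matches on all three constructors. The sketch below re-declares Obs locally with the constructors listed above and sums the rewards of one episode; episodeReturn is a hypothetical helper, not part of this module.

```haskell
-- Local stand-in with the documented constructors.
data Obs r o
  = Next !r !o
  | Done !r !(Maybe o)
  | Terminated
  deriving (Eq, Show)

-- Sum rewards up to and including the first Done. Terminated carries no
-- reward, so it ends the sum without adding anything. Observations after
-- the end of the episode are ignored.
episodeReturn :: Num r => [Obs r o] -> r
episodeReturn = go 0
  where
    go acc []                = acc
    go acc (Next r _ : rest) = go (acc + r) rest
    go acc (Done r _ : _)    = acc + r
    go acc (Terminated : _)  = acc
```

Note that Done still carries a reward and an optional final state, whereas Terminated carries neither.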

Instances

(Eq o, Eq r) => Eq (Obs r o) Source # 

Methods

(==) :: Obs r o -> Obs r o -> Bool #

(/=) :: Obs r o -> Obs r o -> Bool #

(Show o, Show r) => Show (Obs r o) Source # 

Methods

showsPrec :: Int -> Obs r o -> ShowS #

show :: Obs r o -> String #

showList :: [Obs r o] -> ShowS #

data Initial o Source #

When starting an episode, we want to indicate that the environment is starting without conflating this type with future steps (in Obs r o).

Constructors

Initial !o 
EmptyEpisode 
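To show how Initial and Obs fit together, the sketch below runs one episode against any MonadEnv instance with a fixed policy: reset is inspected first, and stepping only begins if it yields an Initial state. The declarations are local stand-ins mirroring this page; Countdown, runCountdown and runEpisode are hypothetical names, and the choice to return 0 for EmptyEpisode is an assumption of the example.

```haskell
{-# LANGUAGE FlexibleInstances          #-}
{-# LANGUAGE FunctionalDependencies     #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.Trans.State.Strict (State, evalState, get, put)

-- Local stand-ins mirroring the documented declarations.
data Initial o = Initial o | EmptyEpisode deriving (Eq, Show)
data Obs r o = Next r o | Done r (Maybe o) | Terminated deriving (Eq, Show)

class (Num r, Monad e) => MonadEnv e s a r | e -> s a r where
  reset :: e (Initial s)
  step  :: a -> e (Obs r s)

-- Run one episode with a fixed policy and return the summed reward.
runEpisode :: MonadEnv e s a r => (s -> a) -> e r
runEpisode policy = do
  i <- reset
  case i of
    EmptyEpisode -> pure 0          -- nothing to step through
    Initial s0   -> go 0 s0
  where
    go acc s = do
      o <- step (policy s)
      case o of
        Next r s'  -> go (acc + r) s'
        Done r _   -> pure (acc + r)
        Terminated -> pure acc

-- Hypothetical environment: counts down from a configured start value.
-- The state pair holds (configured start, current value).
newtype Countdown x = Countdown (State (Int, Int) x)
  deriving (Functor, Applicative, Monad)

runCountdown :: Int -> Countdown x -> x
runCountdown start (Countdown m) = evalState m (start, start)

instance MonadEnv Countdown Int () Double where
  -- Restore the configured start; a non-positive start is an empty episode.
  reset = Countdown $ do
    (start, _) <- get
    put (start, start)
    pure $ if start <= 0 then EmptyEpisode else Initial start
  -- Each step decrements the counter and pays a reward of 1.
  step () = Countdown $ do
    (start, cur) <- get
    let cur' = cur - 1
    put (start, cur')
    pure $ if cur' <= 0 then Done 1 (Just cur') else Next 1 cur'
```

For example, runCountdown 3 (runEpisode (const ())) takes three steps and returns a total reward of 3, while a start value of 0 hits the EmptyEpisode branch and returns 0 without stepping.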

type Reward = Double Source #

A concrete reward signal.