reinforce-0.0.0.1: Reinforcement learning in Haskell

Copyright: (c) Sentenai 2017
License: Proprietary
Maintainer: sam@sentenai.com
Stability: experimental
Portability: non-portable
Safe Haskell: None
Language: Haskell2010

Environments.CartPole

Description

  • CartPole by Sutton et al.

Taken from https://webdocs.cs.ualberta.ca/~sutton/book/code/pole.c, with some added insights from the OpenAI Gym.

cart_and_pole: the cart and pole dynamics; given an action and the current state, estimates the next state.

cart_pole: takes an action (0 or 1) and the current values of the four state variables, and updates them by estimating the state TAU seconds later.
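For reference, the update that cart_pole performs can be sketched in Haskell. This is a minimal sketch that assumes the constants from Sutton's pole.c (gravity 9.8, cart mass 1.0, pole mass 0.1, pole half-length 0.5, force magnitude 10.0, TAU = 0.02 s) and the same explicit Euler integration; PoleState and cartPoleStep are hypothetical names, not part of this module's API.

```haskell
-- Hypothetical sketch of one cart_pole step, following pole.c's constants
-- and update order. PoleState is NOT the real StateCP type.
data PoleState = PoleState
  { x        :: Double  -- cart position (m)
  , xDot     :: Double  -- cart velocity (m/s)
  , theta    :: Double  -- pole angle from vertical (rad)
  , thetaDot :: Double  -- pole angular velocity (rad/s)
  } deriving (Show, Eq)

gravity, massCart, massPole, totalMass, poleHalfLength, poleMassLength, forceMag, tau :: Double
gravity        = 9.8
massCart       = 1.0
massPole       = 0.1
totalMass      = massPole + massCart
poleHalfLength = 0.5   -- pole.c's LENGTH is half the pole's length
poleMassLength = massPole * poleHalfLength
forceMag       = 10.0
tau            = 0.02  -- seconds between state updates (TAU)

-- | Action 0 pushes left; any positive action pushes right, as in pole.c.
-- The four state variables are advanced with one explicit Euler step.
cartPoleStep :: Int -> PoleState -> PoleState
cartPoleStep action (PoleState x0 xd th thd) =
  PoleState
    { x        = x0  + tau * xd
    , xDot     = xd  + tau * xAcc
    , theta    = th  + tau * thd
    , thetaDot = thd + tau * thetaAcc
    }
  where
    force    = if action > 0 then forceMag else negate forceMag
    cosT     = cos th
    sinT     = sin th
    temp     = (force + poleMassLength * thd * thd * sinT) / totalMass
    thetaAcc = (gravity * sinT - cosT * temp)
             / (poleHalfLength * (4 / 3 - massPole * cosT * cosT / totalMass))
    xAcc     = temp - poleMassLength * thetaAcc * cosT / totalMass
```

Explicit Euler with a fixed TAU keeps each step cheap, at the cost of only first-order accuracy; this matches pole.c rather than being a recommendation.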

Documentation

runEnvironmentWithSeed :: Environment () -> GenIO -> IO (DList Event)

Run an environment with an explicit seed, i.e. a caller-supplied random generator.

runEnvironmentWithSeed_ :: Environment () -> GenIO -> IO ()

Same as runEnvironmentWithSeed, but does not return the history.

runEnvironment :: Environment () -> IO (DList Event)

Run an environment, creating a new random generator for each effectful action.

runEnvironment_ :: Environment () -> IO ()

Same as runEnvironment, but does not return the history.
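Because runEnvironmentWithSeed takes its generator from the caller, two runs given identically seeded generators should presumably replay the same history, whereas runEnvironment draws fresh randomness. The toy model below illustrates only that property and is hypothetical throughout: Gen, nextGen and runToyWithSeed are invented names, a small linear congruential generator stands in for mwc-random's GenIO, and a plain list of (step, action) pairs stands in for DList Event.

```haskell
-- Hypothetical stand-in for a caller-supplied generator such as GenIO.
newtype Gen = Gen Integer

-- | One step of a 32-bit linear congruential generator
-- (multiplier and increment from Numerical Recipes).
nextGen :: Gen -> (Integer, Gen)
nextGen (Gen s) = (s', Gen s')
  where
    s' = (1664525 * s + 1013904223) `mod` 4294967296

-- | Toy analogue of runEnvironmentWithSeed: run a random policy for a fixed
-- number of steps and return the history as (step, action) pairs.
runToyWithSeed :: Int -> Gen -> [(Int, Int)]
runToyWithSeed steps = go 0
  where
    go n g
      | n >= steps = []
      | otherwise  =
          let (r, g') = nextGen g
              -- Take the top bit of the 32-bit state; the low bits of this
              -- LCG are poorly mixed. 0 = push left, 1 = push right.
              action  = fromInteger (r `div` 2147483648)
          in (n, action) : go (n + 1) g'
```

The `_` variants in this module correspond to running the same loop and discarding the returned history.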

data Event r o a

Our primary datatype for an event in a trace. Contains the episode number, reward, state (observation), and action taken, in that order.

TODO: change the ordering to Event Integer s a r

Constructors

Event Integer r o a 
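Since a trace is a sequence of these events, per-episode returns can be recovered by grouping on the episode number. The sketch below redeclares Event locally with the documented field order (episode, reward, observation, action); episodeReturns, Event' and reorder are hypothetical helpers, the last two illustrating the ordering proposed in the TODO.

```haskell
import qualified Data.Map.Strict as Map

-- Local copy of the documented constructor: episode, reward, observation, action.
data Event r o a = Event Integer r o a
  deriving (Show, Eq)

-- | Hypothetical helper: total reward collected in each episode of a trace.
episodeReturns :: Num r => [Event r o a] -> Map.Map Integer r
episodeReturns evs = Map.fromListWith (+) [ (ep, r) | Event ep r _ _ <- evs ]

-- Hypothetical type with the TODO's ordering: episode, state, action, reward.
data Event' s a r = Event' Integer s a r
  deriving (Show, Eq)

-- | Convert from the current ordering to the proposed one.
reorder :: Event r o a -> Event' o a r
reorder (Event ep r o a) = Event' ep o a r
```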

Instances

MonadWriter (DList Event) Environment
Monad t => MonadWriter (DList (Event Reward s a)) (GymEnvironmentT s a t)

Methods

writer :: (a, DList (Event Reward s a)) -> GymEnvironmentT s a t a

tell :: DList (Event Reward s a) -> GymEnvironmentT s a t ()

listen :: GymEnvironmentT s a t a -> GymEnvironmentT s a t (a, DList (Event Reward s a))

pass :: GymEnvironmentT s a t (a, DList (Event Reward s a) -> DList (Event Reward s a)) -> GymEnvironmentT s a t a
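These MonadWriter instances mean that events are appended to the trace with tell. A minimal sketch using mtl's Control.Monad.Writer.Strict, assuming a plain list in place of DList (a DList makes repeated appends cheaper, but the logging pattern is the same); logEvent and toyEpisode are hypothetical names.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Writer.Strict (MonadWriter (tell), execWriter)

-- Local copy of the documented constructor: episode, reward, observation, action.
data Event r o a = Event Integer r o a
  deriving (Show, Eq)

-- | Hypothetical helper: append one event to the trace. Works in any monad
-- that can write a list of events (a list replaces DList here).
logEvent :: MonadWriter [Event r o a] m => Integer -> r -> o -> a -> m ()
logEvent ep r o a = tell [Event ep r o a]

-- | Hypothetical two-step episode that only logs events.
toyEpisode :: MonadWriter [Event Double String Int] m => m ()
toyEpisode = do
  logEvent 1 1.0 "upright" 1
  logEvent 1 1.0 "leaning" 0
```

Running it with execWriter collects the logged events in order.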

(Show a, Show o, Show r) => Show (Event r o a)

Methods

showsPrec :: Int -> Event r o a -> ShowS

show :: Event r o a -> String

showList :: [Event r o a] -> ShowS

data Action

CartPole can only go left or right, so it has an action space of "discrete 2": the indices {0..n-1} with n = 2.

FIXME: Migrate this to either a more generic "directions" action type (which would also need "up" and "down" variants) or a "discrete actions" version. I'm a fan of the former.
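The two actions map onto the indices 0 and 1 through the derived Enum and Bounded instances. The sketch below redeclares Action with the constructor names GoLeft and GoRight, taken from the Generic representation listed under the instances; allActions, actionIndex and actionForce are hypothetical helpers, and the ±10.0 force follows pole.c's FORCE_MAG convention rather than anything documented for this module.

```haskell
-- Constructor names come from the Generic Rep shown for Action.
data Action = GoLeft | GoRight
  deriving (Eq, Ord, Show, Enum, Bounded)

-- | Hypothetical helper: every action, in index order 0..n-1.
allActions :: [Action]
allActions = [minBound .. maxBound]

-- | Hypothetical helper: position of an action in the "discrete 2" space.
actionIndex :: Action -> Int
actionIndex = fromEnum

-- | Hypothetical helper: horizontal force on the cart, using pole.c's
-- FORCE_MAG of 10.0 (negative pushes left, positive pushes right).
actionForce :: Action -> Double
actionForce GoLeft  = -10.0
actionForce GoRight =  10.0
```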

Instances

Bounded Action
Enum Action
Eq Action

Methods

(==) :: Action -> Action -> Bool

(/=) :: Action -> Action -> Bool

Ord Action
Show Action
Generic Action

Associated Types

type Rep Action :: * -> *

Methods

from :: Action -> Rep Action x

to :: Rep Action x -> Action

Hashable Action

Methods

hashWithSalt :: Int -> Action -> Int

hash :: Action -> Int

ToJSON Action
DiscreteActionSpace Action

Associated Types

type Size Action :: Nat

MonadEnv Environment StateCP Action Reward
MonadWriter (DList Event) Environment
type Rep Action
type Rep Action = D1 (MetaData "Action" "Data.CartPole" "reinforce-0.0.0.1-BYNakn0URySEY5wecxfdnO" False) ((:+:) (C1 (MetaCons "GoLeft" PrefixI False) U1) (C1 (MetaCons "GoRight" PrefixI False) U1))
type Size Action
type Size Action = 2
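Size Action = 2 is a type-level natural, which GHC.TypeLits.natVal can bring back to the term level. The class DiscreteSpace and its associated family SpaceSize below are hypothetical stand-ins modelled on the DiscreteActionSpace entry; the real class may declare more than this.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, ScopedTypeVariables, FlexibleContexts #-}
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

-- Hypothetical class: an action type with a statically known number of actions.
class DiscreteSpace a where
  type SpaceSize a :: Nat

data Action = GoLeft | GoRight
  deriving (Eq, Show, Enum, Bounded)

instance DiscreteSpace Action where
  type SpaceSize Action = 2

-- | Recover the type-level size as an ordinary Integer.
spaceSize :: forall a. KnownNat (SpaceSize a) => Proxy a -> Integer
spaceSize _ = natVal (Proxy :: Proxy (SpaceSize a))

-- | Check that the declared size agrees with the number of constructors.
sizeMatches :: forall a. (Bounded a, Enum a, KnownNat (SpaceSize a)) => Proxy a -> Bool
sizeMatches p = spaceSize p == toInteger (length ([minBound .. maxBound] :: [a]))
```

Keeping the size at the type level lets other types (for example, a policy's output vector) be indexed by it, while natVal still gives a runtime number when one is needed.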

data StateCP

The state of the cart and pole in a CartPole environment: the four state variables of pole.c (cart position, cart velocity, pole angle and pole angular velocity).
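A pole.c-style episode ends when the cart leaves the track or the pole tips too far. This sketch assumes a hypothetical record CartPoleState holding the four variables (StateCP's actual fields are not shown on this page) and the failure thresholds used by both pole.c and the OpenAI Gym CartPole environment: |x| > 2.4 or |θ| > 12 degrees.

```haskell
-- Hypothetical stand-in for StateCP; the real field names are not documented here.
data CartPoleState = CartPoleState
  { position        :: Double  -- cart position x (m)
  , velocity        :: Double  -- cart velocity (m/s)
  , angle           :: Double  -- pole angle theta from vertical (rad)
  , angularVelocity :: Double  -- pole angular velocity (rad/s)
  } deriving (Eq, Show)

-- Failure thresholds from pole.c and the Gym CartPole environment.
xThreshold, thetaThreshold :: Double
xThreshold     = 2.4
thetaThreshold = 12 * pi / 180  -- twelve degrees in radians

-- | Hypothetical helper: True once the cart is off the track or the pole has fallen.
isTerminal :: CartPoleState -> Bool
isTerminal s = abs (position s) > xThreshold || abs (angle s) > thetaThreshold
```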

Instances

Eq StateCP

Methods

(==) :: StateCP -> StateCP -> Bool

(/=) :: StateCP -> StateCP -> Bool

Ord StateCP
Show StateCP
Generic StateCP

Associated Types

type Rep StateCP :: * -> *

Methods

from :: StateCP -> Rep StateCP x

to :: Rep StateCP x -> StateCP

Monoid StateCP
Hashable StateCP

Methods

hashWithSalt :: Int -> StateCP -> Int

hash :: StateCP -> Int

FromJSON StateCP
StateSpace StateCP
StateSpaceStatic StateCP

Associated Types

type Size StateCP :: Nat

Methods

toR :: StateCP -> R (Size StateCP)

MonadEnv Environment StateCP Action Reward
MonadWriter (DList Event) Environment
type Rep StateCP
type Size StateCP
type Size StateCP = 4
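toR flattens a StateCP into a vector whose length, Size StateCP = 4, is fixed at the type level; R here is presumably hmatrix's statically sized vector from Numeric.LinearAlgebra.Static. To stay within base, the sketch below substitutes a hypothetical length-indexed Vec and a hypothetical CartPoleState record; the component order x, x_dot, theta, theta_dot is an assumption borrowed from pole.c.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

-- Hypothetical length-indexed vector; a base-only stand-in for R n.
newtype Vec (n :: Nat) = Vec [Double]
  deriving (Eq, Show)

-- | Smart constructor: succeeds only when the list has exactly n elements.
mkVec :: forall n. KnownNat n => [Double] -> Maybe (Vec n)
mkVec xs
  | toInteger (length xs) == natVal (Proxy :: Proxy n) = Just (Vec xs)
  | otherwise                                          = Nothing

-- | The length recorded in the type.
vecLength :: forall n. KnownNat n => Vec n -> Integer
vecLength _ = natVal (Proxy :: Proxy n)

-- Hypothetical stand-in for StateCP with the four pole.c state variables.
data CartPoleState = CartPoleState
  { position, velocity, angle, angularVelocity :: Double }
  deriving (Eq, Show)

-- | Analogue of toR: flatten in the order x, x_dot, theta, theta_dot.
-- The literal list always has four elements, so it is a valid Vec 4.
toVec :: CartPoleState -> Vec 4
toVec (CartPoleState x xd th thd) = Vec [x, xd, th, thd]
```

With the width in the type, a function that expects a different input size fails to typecheck instead of failing at run time.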