The system considered may be in one of n states at any point in time; its probability law is a Markov process that depends on the policy (control) chosen. The return to the system over a given planning horizon is the integral, over that horizon, of a return rate that depends on both the policy and the sample path of the process. A necessary and sufficient condition is obtained for a policy to maximize the expected return over the given planning horizon. A constructive proof is given that there is a piecewise constant policy that is optimal. A bound on the number of switches (points where the piecewise constant policy jumps) is obtained for the case in which there are two possible states and they communicate for one element. 26 pp. Ref.
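The setting described in the abstract can be illustrated with a small numerical sketch. This is not the paper's constructive procedure: it is a discretized approximation, assuming a finite action set, a return rate `rates[a][i]` earned in state `i` under action `a`, and a transition-rate matrix `generators[a]` whose rows sum to zero. All names, the Euler time-stepping, and the pointwise-maximization rule are assumptions made for the example. The expected return `v` is integrated backward from the end of the horizon, where it is zero, and in each state the action maximizing the instantaneous rate of return plus the expected change in `v` is chosen.

```python
def solve_backward(rates, generators, horizon, steps=2000):
    """Approximate the optimal expected return over [0, horizon].

    Euler-integrates -dv/dt = max_a [ r_i(a) + sum_j q_ij(a) v_j ]
    backward in time from v(horizon) = 0.
    Returns (v at time 0, list of per-state action choices in time order).
    """
    n_states = len(rates[0])
    n_actions = len(rates)
    dt = horizon / steps
    v = [0.0] * n_states
    policy = []  # built from the end of the horizon toward time 0
    for _ in range(steps):
        new_v = []
        choice = []
        for i in range(n_states):
            best_a, best_val = 0, float("-inf")
            for a in range(n_actions):
                drift = sum(generators[a][i][j] * v[j] for j in range(n_states))
                val = rates[a][i] + drift
                if val > best_val:
                    best_a, best_val = a, val
            new_v.append(v[i] + dt * best_val)
            choice.append(best_a)
        v = new_v
        policy.append(choice)
    policy.reverse()  # put the choices in forward time order
    return v, policy


def count_switches(policy):
    """Count the time steps at which the chosen action changes in any state."""
    return sum(1 for prev, cur in zip(policy, policy[1:]) if prev != cur)


if __name__ == "__main__":
    # Hypothetical two-state, two-action data, chosen only for illustration.
    rates = [[1.0, 0.0], [0.5, 0.2]]
    generators = [
        [[-1.0, 1.0], [1.0, -1.0]],
        [[-0.1, 0.1], [2.0, -2.0]],
    ]
    value, policy = solve_backward(rates, generators, horizon=2.0)
    print("approximate expected return by starting state:", value)
    print("number of switches on the grid:", count_switches(policy))
```

The policy produced this way is piecewise constant by construction, because it is held fixed on each grid interval. The switch count it reports is a property of the discretization and should not be read as the bound stated in the abstract.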