Fig. 4 | Intensive Care Medicine Experimental

Fig. 4

From: Reinforcement learning for intensive care medicine: actionable clinical insights from novel approaches to reward shaping and off-policy model evaluation

Histograms of the evaluation outcomes: (a) Off-Policy Evaluation (OPE) for policies with positive OPE returns; (b) cross-OPE results. The vertical axis shows density, indicating how the values are spread. The horizontal axis shows the relative performance (value) of the target policy compared with the behaviour policy under the different reward functions; it has no specific unit. The "reward function version" labels 1 through 6 correspond to the weight factors 0.25, 0.5, 1, 2, 4 and 8, respectively.
