Fig. 4

From: Reinforcement learning for intensive care medicine: actionable clinical insights from novel approaches to reward shaping and off-policy model evaluation

Histograms of the evaluation outcomes: (a) Off-Policy Evaluation (OPE) for policies with positive OPE returns, and (b) cross-OPE results. The vertical axis shows the density of the distribution, indicating the spread of the data. The horizontal axis shows the relative performance, or value, of the target policy against the behaviour policy under different reward functions; it has no specific unit. The "reward function version" refers to a series of weight factors: versions 1 through 6 correspond to the weights 0.25, 0.5, 1, 2, 4, and 8, respectively.
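For readers who want to relate the version labels in the legend to the underlying weights, the mapping stated in the caption can be written as a small lookup. This is only an illustrative sketch: the names `WEIGHT_FACTORS`, `REWARD_VERSION_WEIGHTS`, and `weight_for_version` are hypothetical and do not come from the article, and how each weight enters the reward function is not specified here.

```python
# Weight factors exactly as listed in the caption, in version order.
WEIGHT_FACTORS = [0.25, 0.5, 1, 2, 4, 8]

# Versions are numbered 1 through 6 (hypothetical variable name).
REWARD_VERSION_WEIGHTS = dict(enumerate(WEIGHT_FACTORS, start=1))


def weight_for_version(version: int) -> float:
    """Return the weight factor for a reward function version (1 to 6)."""
    if version not in REWARD_VERSION_WEIGHTS:
        raise ValueError(f"unknown reward function version: {version}")
    return REWARD_VERSION_WEIGHTS[version]
```

The listed weights double from one version to the next, so version 3 is the unweighted case (weight 1).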