Fig. 6
From: Reinforcement learning for intensive care medicine: actionable clinical insights from novel approaches to reward shaping and off-policy model evaluation

Three aggregate surface plots over the dataset are displayed: the first for physician-chosen actions, the second for policy-recommended actions, and the third for Delta-Q, which is calculated by subtracting the physician Q-values from the policy Q-values. The Delta-Q plot shows smaller differences at high PEEP (14+) and high FiO2 (80–100%) settings, suggesting that physician actions in this range align closely with the policy's guidance. In contrast, larger Delta-Q values at lower PEEP and FiO2 settings indicate greater divergence, suggesting that these ranges may have more room for optimisation towards the policy's recommendations.
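As a rough illustration of how a Delta-Q surface could be assembled, the sketch below computes policy Q-value minus physician Q-value for each state and averages the differences within each cell of a PEEP × FiO2 action grid. It is a minimal sketch under assumptions not stated in the caption: the input format (hypothetical `(peep_bin, fio2_bin, q_policy, q_physician)` tuples), the choice to bin by the physician-chosen action, and averaging as the aggregation are all illustrative, and the hypothetical function `delta_q_surface` is not the authors' method.

```python
from collections import defaultdict
from statistics import mean


def delta_q_surface(records):
    """Average Delta-Q per (PEEP bin, FiO2 bin) cell.

    Each record is a hypothetical tuple
    (peep_bin, fio2_bin, q_policy, q_physician) for one patient state.
    Assumption: the bins describe the physician-chosen action, and each
    cell is summarised by the mean difference.
    """
    groups = defaultdict(list)
    for peep_bin, fio2_bin, q_policy, q_physician in records:
        # Delta-Q: policy Q-value minus physician Q-value for the same state
        groups[(peep_bin, fio2_bin)].append(q_policy - q_physician)
    # Collapse each cell of the action grid into a single surface value
    return {cell: mean(deltas) for cell, deltas in groups.items()}


# Toy usage with made-up values (not data from the study)
surface = delta_q_surface([
    (3, 4, 2.0, 1.5),  # high PEEP / high FiO2 cell: small differences
    (3, 4, 3.0, 2.0),
    (0, 1, 5.0, 1.0),  # low PEEP / low FiO2 cell: large difference
])
print(surface)
```

In this toy input, the high-setting cell averages to a smaller Delta-Q than the low-setting cell. In the caption's reading, that pattern corresponds to closer agreement with the policy at high settings and more room for optimisation at low ones.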