Fig. 6 | Intensive Care Medicine Experimental

Fig. 6

From: Reinforcement learning for intensive care medicine: actionable clinical insights from novel approaches to reward shaping and off-policy model evaluation


Three aggregate surface plots over the dataset are shown: the first for physician-chosen actions, the second for policy-recommended actions, and the third for Delta-Q, calculated by subtracting physician Q-values from policy Q-values. The Delta-Q plot shows smaller differences at high PEEP (14+) and high FiO2 (80–100%) settings, suggesting that physician actions in this region align closely with the policy's guidance. In contrast, larger Delta-Q values in the lower PEEP and FiO2 ranges suggest greater divergence, indicating that these regions may have more room for optimisation in alignment with policy recommendations.
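
As a rough illustration of the Delta-Q calculation described in the caption, the sketch below averages Q-values per (PEEP, FiO2) cell and subtracts the physician surface from the policy surface. It is not the authors' code: the function name `aggregate_q_surfaces`, the bin indices, the toy Q-values, and the assumption that each record carries both a physician Q-value and a policy Q-value for the same cell are all hypothetical.

```python
import numpy as np

def aggregate_q_surfaces(peep_idx, fio2_idx, q_physician, q_policy, n_peep, n_fio2):
    """Mean Q-value per (PEEP bin, FiO2 bin) cell, plus Delta-Q.

    Assumption (not from the paper): every record has one physician
    Q-value and one policy Q-value, both assigned to the same cell.
    """
    peep_idx = np.asarray(peep_idx)
    fio2_idx = np.asarray(fio2_idx)
    sums_phys = np.zeros((n_peep, n_fio2))
    sums_pol = np.zeros((n_peep, n_fio2))
    counts = np.zeros((n_peep, n_fio2))

    # Unbuffered accumulation, so repeated cells are summed correctly.
    np.add.at(sums_phys, (peep_idx, fio2_idx), q_physician)
    np.add.at(sums_pol, (peep_idx, fio2_idx), q_policy)
    np.add.at(counts, (peep_idx, fio2_idx), 1)

    # Cells with no records become NaN instead of raising a warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_phys = sums_phys / counts
        mean_pol = sums_pol / counts

    # Delta-Q = policy Q minus physician Q, as stated in the caption.
    return mean_phys, mean_pol, mean_pol - mean_phys


# Toy data: two records in cell (0, 0) and two in cell (1, 1).
mean_phys, mean_pol, delta_q = aggregate_q_surfaces(
    peep_idx=[0, 0, 1, 1],
    fio2_idx=[0, 0, 1, 1],
    q_physician=[1.0, 3.0, 5.0, 5.0],
    q_policy=[2.0, 4.0, 5.5, 5.5],
    n_peep=2,
    n_fio2=2,
)
print(delta_q)
```

In this toy grid the (1, 1) cell has the smaller Delta-Q, which mirrors how the caption reads small differences as close agreement between physician and policy. Cells without any records come out as NaN and would need to be masked before plotting a surface.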
