Fig. 5 | Intensive Care Medicine Experimental

Fig. 5

From: Reinforcement learning for intensive care medicine: actionable clinical insights from novel approaches to reward shaping and off-policy model evaluation


Distribution of Q-values for all state-action pairs in the historical dataset. The x-axis shows the Q-value and the y-axis its frequency. The distribution for 'PEEP 10–14 and FiO2 80–100%' is shifted markedly to the left, implying a lower expected reward for that action. A policy derived from this model is therefore likely to recommend 'PEEP 10–14 and FiO2 80–100%' only rarely. Each curve is labelled with its 'PEEP' and 'FiO2' action pair, as shown in the legend.
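The link between a left-shifted Q-value distribution and a rarely recommended action can be illustrated with a minimal sketch. This is not the authors' code or data: the Q-table below is synthetic, the labels 'action A' and 'action B' are hypothetical placeholders, and the helper names `q_value_histograms` and `greedy_action_share` are assumptions made for illustration. It assumes a greedy policy that picks the highest-valued action in each state.

```python
import numpy as np


def q_value_histograms(q_table, action_labels, bins=30):
    """Return shared bin edges and one frequency histogram per action."""
    # Shared edges make the per-action curves directly comparable on one x-axis.
    edges = np.histogram_bin_edges(q_table, bins=bins)
    hists = {
        label: np.histogram(q_table[:, i], bins=edges)[0]
        for i, label in enumerate(action_labels)
    }
    return edges, hists


def greedy_action_share(q_table, action_labels):
    """Fraction of states in which each action has the highest Q-value."""
    best = np.argmax(q_table, axis=1)
    counts = np.bincount(best, minlength=len(action_labels))
    return dict(zip(action_labels, counts / counts.sum()))


# Synthetic Q-table (assumption): rows are states, columns are actions.
# The last column is deliberately shifted left to mimic the pattern in the figure.
labels = ["action A", "action B", "PEEP 10–14 and FiO2 80–100%"]
rng = np.random.default_rng(0)
q_table = rng.normal(loc=np.array([0.0, 0.0, -5.0]), scale=1.0, size=(1000, 3))

edges, hists = q_value_histograms(q_table, labels)
share = greedy_action_share(q_table, labels)
```

In this synthetic setting, the left-shifted action is almost never the greedy choice, which mirrors the qualitative reading of the figure; the actual frequencies in the study depend on the learned model and the policy used.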
