Calculate Reward function for Model Predictive Optimal Control
This function computes a reward value based on epidemiological simulation data, penalties for exceeding target thresholds, and the cost of non-pharmaceutical interventions (NPIs).
Arguments
- episimdata
A data frame containing epidemiological simulation data. The function expects specific columns:
"C" for cases, "Re" for reproduction number, and "Deaths".
- actions
A data frame containing the costs of non-pharmaceutical interventions (NPIs). It expects a column named
"cost_of_NPI".
- ii
An integer specifying the row index in episimdata to use for reward calculation.
- jj
An integer specifying the row index in actions to use for cost retrieval.
- alpha
A numeric value representing the weight applied to case error in the reward calculation.
- alpha_d
A numeric value representing the weight applied to death error in the reward calculation.
- ovp
A numeric value for the penalty applied when cases exceed C_target_pen.
- dovp
A numeric value for the penalty applied when deaths exceed D_target_pen.
- C_target
A numeric target for the number of cases.
- C_target_pen
A numeric threshold for the penalty on cases.
- D_target
A numeric target for the number of deaths.
- D_target_pen
A numeric threshold for the penalty on deaths.
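As a rough illustration of the inputs described above, the following sketch builds two small data frames with the expected columns. The numbers are invented for illustration; only the column names "C", "Re", "Deaths" and "cost_of_NPI" come from this page.

```r
# Illustrative inputs only: the values are invented, the column names are
# the ones this page says the function expects.
episimdata <- data.frame(
  C      = c(130, 150, 110),   # cases per time step
  Re     = c(1.2, 1.1, 0.9),   # reproduction number
  Deaths = c(8, 13, 6)         # deaths per time step
)

actions <- data.frame(
  cost_of_NPI = c(0, 10, 25)   # one row per candidate intervention
)

ii <- 1  # row of episimdata to score
jj <- 2  # row of actions whose cost is charged

cases_ii <- episimdata[ii, "C"]         # cases used in the reward
cost_jj  <- actions[jj, "cost_of_NPI"]  # NPI cost used in the reward
```

Here ii and jj index different data frames, so they do not need to match.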
Details
The reward is computed as: $$reward = -\alpha * |C - C_{target}| - penalty_{C} - NPI_{cost} - \alpha_d * |Deaths - D_{target}| - penalty_{D}$$
Where:
\(penalty_{C}\) is applied if cases exceed C_target_pen.
\(penalty_{D}\) is applied if deaths exceed D_target_pen.
\(NPI_{cost}\) is retrieved from the actions data frame based on index jj.
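The formula can be sketched in a few lines of R. The helper below, reward_sketch, is hypothetical and is not the package's reward_fun_wd(); it takes scalar inputs rather than data frames. It also assumes that the case penalty equals ovp and the death penalty equals dovp when the corresponding threshold is strictly exceeded, and that both are zero otherwise.

```r
# Hypothetical helper mirroring the formula in Details; not the package code.
# Assumption: penalty_C = ovp and penalty_D = dovp when the threshold is
# strictly exceeded, and 0 otherwise.
reward_sketch <- function(cases, deaths, npi_cost,
                          alpha, alpha_d, ovp, dovp,
                          C_target, C_target_pen, D_target, D_target_pen) {
  penalty_C <- if (cases > C_target_pen) ovp else 0
  penalty_D <- if (deaths > D_target_pen) dovp else 0
  -alpha * abs(cases - C_target) - penalty_C - npi_cost -
    alpha_d * abs(deaths - D_target) - penalty_D
}

# Worked example with invented inputs: 150 cases, 8 deaths, NPI cost 10.
r <- reward_sketch(cases = 150, deaths = 8, npi_cost = 10,
                   alpha = 0.5, alpha_d = 0.7, ovp = 50, dovp = 30,
                   C_target = 120, C_target_pen = 140,
                   D_target = 7, D_target_pen = 12)
```

With these inputs the case error contributes -0.5 * 30 = -15, the case penalty -50 (150 exceeds 140), the NPI cost -10, the death error -0.7 * 1 = -0.7, and the death penalty 0 (8 does not exceed 12), so r is -75.7.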
Examples
reward_fun_wd(episimdata, alpha = 0.5, alpha_d = 0.7, ovp = 50, dovp = 30,
              C_target = 120, C_target_pen = 140, D_target = 7, D_target_pen = 12,
              actions = actions, ii = 1, jj = 2)
#> Error in reward_fun_wd(episimdata, alpha = 0.5, alpha_d = 0.7, ovp = 50, dovp = 30, C_target = 120, C_target_pen = 140, D_target = 7, D_target_pen = 12, actions = actions, ii = 1, jj = 2): unused arguments (alpha = 0.5, alpha_d = 0.7, ovp = 50, dovp = 30, C_target = 120, C_target_pen = 140, D_target = 7, D_target_pen = 12)