Calculate Reward function for Model Predictive Optimal Control

This function computes a reward value based on epidemiological simulation data, penalties for exceeding target thresholds, and the cost of non-pharmaceutical interventions (NPIs).

Usage

reward_fun_wd(episimdata, episettings, actions, ii, jj)

Arguments

episimdata: A data frame of epidemiological simulation data. The function expects the columns "C" (cases), "Re" (reproduction number), and "Deaths".
episettings: The simulation settings object. Its structure is not documented here; judging by the usage signature, it presumably carries the weights, penalties, and targets listed below (alpha through D_target_pen), which the function does not accept as separate arguments.
actions: A data frame of costs for non-pharmaceutical interventions (NPIs). The function expects a column named "cost_of_NPI".
ii: An integer giving the row of episimdata used for the reward calculation.
jj: An integer giving the row of actions from which the NPI cost is read.
alpha: A numeric weight applied to the case error, |C - C_target|.
alpha_d: A numeric weight applied to the death error, |Deaths - D_target|.
ovp: A numeric penalty applied when cases exceed C_target_pen.
dovp: A numeric penalty applied when deaths exceed D_target_pen.
C_target: A numeric target for the number of cases.
C_target_pen: A numeric threshold above which the case penalty ovp applies.
D_target: A numeric target for the number of deaths.
D_target_pen: A numeric threshold above which the death penalty dovp applies.
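
As an illustration of the input shapes described above, the two data frames might be built as follows. This is only a sketch: the values are made up, and the reading of the rows as time steps (episimdata) and intervention options (actions) is an assumption, not something the documentation states.

```r
# Made-up values for illustration only.
# Assumption: one row per simulated time step.
episimdata <- data.frame(
  C      = c(100, 130, 150),  # cases
  Re     = c(1.1, 1.2, 1.3),  # reproduction number
  Deaths = c(5, 8, 10)        # deaths
)

# Assumption: one row per available intervention option, with its cost.
actions <- data.frame(cost_of_NPI = c(0, 5, 20))

ii <- 3  # row of episimdata used for the reward
jj <- 2  # row of actions used for the NPI cost

episimdata$C[ii]         # 150
actions$cost_of_NPI[jj]  # 5
```

With these inputs, ii = 3 selects the row with 150 cases and 10 deaths, and jj = 2 selects an NPI cost of 5.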

Value

A numeric value representing the calculated reward.

Details

The reward is computed as: $$reward = -\alpha \cdot |C - C_{target}| - penalty_{C} - NPI_{cost} - \alpha_d \cdot |Deaths - D_{target}| - penalty_{D}$$

Where:

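$C$ and $Deaths$ are the values in row ii of episimdata.
$penalty_{C}$ is the penalty ovp, applied only if cases exceed C_target_pen.
$penalty_{D}$ is the penalty dovp, applied only if deaths exceed D_target_pen.
$NPI_{cost}$ is the value of cost_of_NPI in row jj of the actions data frame.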
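
The formula above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: reward_sketch is a hypothetical helper that takes every weight, penalty, and threshold as an explicit argument, and it assumes each penalty is a flat amount added once its threshold is strictly exceeded.

```r
# Hypothetical helper illustrating the reward formula; not part of the package.
reward_sketch <- function(cases, deaths, npi_cost,
                          alpha, alpha_d, ovp, dovp,
                          C_target, C_target_pen, D_target, D_target_pen) {
  # Assumption: flat penalties, applied only when the threshold is exceeded.
  penalty_c <- if (cases > C_target_pen) ovp else 0
  penalty_d <- if (deaths > D_target_pen) dovp else 0

  -alpha * abs(cases - C_target) - penalty_c - npi_cost -
    alpha_d * abs(deaths - D_target) - penalty_d
}

# Made-up inputs: 150 cases, 10 deaths, NPI cost 5.
r <- reward_sketch(cases = 150, deaths = 10, npi_cost = 5,
                   alpha = 0.5, alpha_d = 0.7, ovp = 50, dovp = 30,
                   C_target = 120, C_target_pen = 140,
                   D_target = 7, D_target_pen = 12)
print(r)  # -72.1
```

Here the case error contributes -0.5 * 30 = -15, the case penalty -50 (150 exceeds 140), the NPI cost -5, and the death error -0.7 * 3 = -2.1; no death penalty applies because 10 does not exceed 12.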

Examples

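# Note: this call does not match the usage signature above. The weights, penalties,
# and targets are passed as named arguments, which reward_fun_wd() does not accept,
# and episettings is not supplied, so the call fails with the error shown.
reward_fun_wd(episimdata, alpha = 0.5, alpha_d = 0.7, ovp = 50, dovp = 30,
              C_target = 120, C_target_pen = 140, D_target = 7, D_target_pen = 12,
              actions = actions, ii = 1, jj = 2)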
#> Error in reward_fun_wd(episimdata, alpha = 0.5, alpha_d = 0.7, ovp = 50,     dovp = 30, C_target = 120, C_target_pen = 140, D_target = 7,     D_target_pen = 12, actions = actions, ii = 1, jj = 2): unused arguments (alpha = 0.5, alpha_d = 0.7, ovp = 50, dovp = 30, C_target = 120, C_target_pen = 140, D_target = 7, D_target_pen = 12)