TY - JOUR
T1 - Learning reward frequency over reward probability
T2 - A tale of two learning rules
AU - Don, Hilary J.
AU - Otto, A. Ross
AU - Cornwall, Astin C.
AU - Davis, T.
AU - Worthy, Darrell A.
N1 - Funding Information:
This work was supported by grant AG043425 from the National Institute on Aging (NIA), United States, to DAW. We thank research assistants Shannon Yap, Tuyet Linh Huynh, Sumedha Rao, Kirsten Downs, Ashton Wilson, Lilian Garza, Josh Philip, Mikayla Herman, Samantha Rumann, Kaila Powell, Kavyapriya Murali, Kinsey Blackburn, Shannon Pavloske, Marena De-Angelis, Catherine Lee, Melissa Hernandez, Tiffany Dobry, Xavier Jefferson, and lab manager Kaitlyn McCauley for assistance with the data collection.
Publisher Copyright:
© 2019 Elsevier B.V.
PY - 2019/12
Y1 - 2019/12
N2 - Learning about the expected value of choice alternatives associated with reward is critical for adaptive behavior. Although human choice preferences are affected by the presentation frequency of reward-related alternatives, this may not be captured by some dominant models of value learning, such as the Delta rule. In this study, we examined whether reward learning is driven more by learning the probability of reward provided by each option or by how frequently each option has been rewarded, and assessed how well models based on average reward (e.g., the Delta model) and models based on cumulative reward (e.g., the Decay model) can account for choice preferences. In a binary-outcome choice task, participants selected between pairs of options that had reward probabilities of 0.65 (A) versus 0.35 (B) or 0.75 (C) versus 0.25 (D). Crucially, during training there were twice as many AB trials as CD trials, such that option A was associated with higher cumulative reward, while option C gave higher average reward. Participants then decided between novel combinations of options (e.g., AC). Most participants preferred option A over C, a result predicted by the Decay model but not the Delta model. We also compared the Delta and Decay models to both simpler and more complex models that assumed additional mechanisms, such as representation of uncertainty. Overall, models that assume learning about cumulative reward provided the best account of the data.
AB - Learning about the expected value of choice alternatives associated with reward is critical for adaptive behavior. Although human choice preferences are affected by the presentation frequency of reward-related alternatives, this may not be captured by some dominant models of value learning, such as the Delta rule. In this study, we examined whether reward learning is driven more by learning the probability of reward provided by each option or by how frequently each option has been rewarded, and assessed how well models based on average reward (e.g., the Delta model) and models based on cumulative reward (e.g., the Decay model) can account for choice preferences. In a binary-outcome choice task, participants selected between pairs of options that had reward probabilities of 0.65 (A) versus 0.35 (B) or 0.75 (C) versus 0.25 (D). Crucially, during training there were twice as many AB trials as CD trials, such that option A was associated with higher cumulative reward, while option C gave higher average reward. Participants then decided between novel combinations of options (e.g., AC). Most participants preferred option A over C, a result predicted by the Decay model but not the Delta model. We also compared the Delta and Decay models to both simpler and more complex models that assumed additional mechanisms, such as representation of uncertainty. Overall, models that assume learning about cumulative reward provided the best account of the data.
KW - Decay rule
KW - Delta rule
KW - Prediction error
KW - Probability learning
KW - Reinforcement learning
KW - Reward frequency
UR - http://www.scopus.com/inward/record.url?scp=85070689787&partnerID=8YFLogxK
U2 - 10.1016/j.cognition.2019.104042
DO - 10.1016/j.cognition.2019.104042
M3 - Article
C2 - 31430606
AN - SCOPUS:85070689787
SN - 0010-0277
VL - 193
JO - Cognition
JF - Cognition
M1 - 104042
ER -