Learning reward frequency over reward probability: A tale of two learning rules

Hilary J. Don; A. Ross Otto; Astin C. Cornwall; T. Davis; Darrell A. Worthy

doi:10.1016/j.cognition.2019.104042

Psychological Sciences

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Learning about the expected value of choice alternatives associated with reward is critical for adaptive behavior. Although human choice preferences are affected by the presentation frequency of reward-related alternatives, this may not be captured by some dominant models of value learning, such as the delta rule. In this study, we examined whether reward learning is driven more by learning the probability of reward provided by each option or by how frequently each option has been rewarded, and assessed how well models based on average reward (e.g., the Delta model) and models based on cumulative reward (e.g., the Decay model) can account for choice preferences. In a binary-outcome choice task, participants selected between pairs of options that had reward probabilities of 0.65 (A) versus 0.35 (B) or 0.75 (C) versus 0.25 (D). Crucially, during training there were twice as many AB trials as CD trials, such that option A was associated with higher cumulative reward, while option C gave higher average reward. Participants then decided between novel combinations of options (e.g., AC). Most participants preferred option A over C, a result predicted by the Decay model but not by the Delta model. We also compared the Delta and Decay models to both simpler and more complex models that assumed additional mechanisms, such as representation of uncertainty. Overall, models that assume learning about cumulative reward provided the best account of the data.
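
The contrast between the two learning rules can be illustrated with a minimal sketch. The abstract does not give the update equations, so the forms below are assumptions based on common textbook versions: a delta rule that moves the chosen option's value toward the obtained reward, and a decay rule that shrinks every option's value on each trial and adds the reward to the chosen one. The learning rate `alpha`, the decay parameter `decay`, the trial counts, and the function names are all hypothetical. To keep the example deterministic, each binary outcome is replaced by its expected value, and the learner is assumed to always pick A on AB trials and C on CD trials.

```python
# Illustrative sketch only: the update equations, parameter values, and
# trial counts are assumptions, not taken from the paper.

REWARD_PROB = {"A": 0.65, "C": 0.75}  # reward probabilities from the abstract


def delta_update(value, reward, alpha):
    """Delta rule: move the value toward the reward by a fraction alpha."""
    return value + alpha * (reward - value)


def decay_update(value, reward, decay):
    """Decay rule: shrink the old value, then add the reward (0 if unchosen)."""
    return decay * value + reward


def simulate(n_cycles=50, alpha=0.1, decay=0.9):
    # Each cycle holds two AB trials and one CD trial (a 2:1 ratio).
    # Simplifications: the learner always picks A on AB trials and C on CD
    # trials, and each binary outcome is replaced by its expected value.
    schedule = ["A", "A", "C"] * n_cycles
    delta = {"A": 0.0, "C": 0.0}
    decayed = {"A": 0.0, "C": 0.0}
    for chosen in schedule:
        # The delta rule updates only the chosen option.
        delta[chosen] = delta_update(delta[chosen], REWARD_PROB[chosen], alpha)
        # The decay rule decays every option and rewards only the chosen one.
        for option in decayed:
            reward = REWARD_PROB[option] if option == chosen else 0.0
            decayed[option] = decay_update(decayed[option], reward, decay)
    return delta, decayed


if __name__ == "__main__":
    delta, decayed = simulate()
    print("Delta values:", delta)
    print("Decay values:", decayed)
```

Under these assumptions the delta values approach each option's reward probability, so C ends above A, whereas the decay values accumulate reward over trials, so the more frequently rewarded A ends above C. A much smaller `decay` value would weight recent trials more heavily and could reverse the second ordering, which is why the parameter choice is flagged as an assumption.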

Original language	English
Article number	104042
Journal	Cognition
Volume	193
DOIs	https://doi.org/10.1016/j.cognition.2019.104042
State	Published - Dec 2019

Keywords

Decay rule
Delta rule
Prediction error
Probability learning
Reinforcement learning
Reward frequency

Cite this

@article{5c792081bd4048c7a083cdf676de7f33,

title = "Learning reward frequency over reward probability: A tale of two learning rules",

abstract = "Learning about the expected value of choice alternatives associated with reward is critical for adaptive behavior. Although human choice preferences are affected by the presentation frequency of reward-related alternatives, this may not be captured by some dominant models of value learning, such as the delta rule. In this study, we examined whether reward learning is driven more by learning the probability of reward provided by each option, or how frequently each option has been rewarded, and assess how well models based on average reward (e.g. the delta model) and models based on cumulative reward (e.g. the decay model) can account for choice preferences. In a binary-outcome choice task, participants selected between pairs of options that had reward probabilities of 0.65 (A) versus 0.35 (B) or 0.75 (C) versus 0.25 (D). Crucially, during training there were twice the number of AB trials as CD trials, such that option A was associated with higher cumulative reward, while option C gave higher average reward. Participants then decided between novel combinations of options (e.g., AC). Most participants preferred option A over C, a result predicted by the Decay model, but not the Delta model. We also compared the Delta and Decay models to both more simplified as well as more complex models that assumed additional mechanisms, such as representation of uncertainty. Overall, models that assume learning about cumulative reward provided the best account of the data.",

keywords = "Decay rule, Delta rule, Prediction error, Probability learning, Reinforcement learning, Reward frequency",

author = "Don, {Hilary J.} and Otto, {A. Ross} and Cornwall, {Astin C.} and Davis, T. and Worthy, {Darrell A.}",

note = "Publisher Copyright: {\textcopyright} 2019 Elsevier B.V.",

year = "2019",

month = dec,

doi = "10.1016/j.cognition.2019.104042",

language = "English",

volume = "193",

journal = "Cognition",

issn = "0010-0277",

}

TY - JOUR

T1 - Learning reward frequency over reward probability

T2 - A tale of two learning rules

AU - Don, Hilary J.

AU - Otto, A. Ross

AU - Cornwall, Astin C.

AU - Davis, T.

AU - Worthy, Darrell A.

PY - 2019/12

Y1 - 2019/12

N2 - Learning about the expected value of choice alternatives associated with reward is critical for adaptive behavior. Although human choice preferences are affected by the presentation frequency of reward-related alternatives, this may not be captured by some dominant models of value learning, such as the delta rule. In this study, we examined whether reward learning is driven more by learning the probability of reward provided by each option, or how frequently each option has been rewarded, and assess how well models based on average reward (e.g. the delta model) and models based on cumulative reward (e.g. the decay model) can account for choice preferences. In a binary-outcome choice task, participants selected between pairs of options that had reward probabilities of 0.65 (A) versus 0.35 (B) or 0.75 (C) versus 0.25 (D). Crucially, during training there were twice the number of AB trials as CD trials, such that option A was associated with higher cumulative reward, while option C gave higher average reward. Participants then decided between novel combinations of options (e.g., AC). Most participants preferred option A over C, a result predicted by the Decay model, but not the Delta model. We also compared the Delta and Decay models to both more simplified as well as more complex models that assumed additional mechanisms, such as representation of uncertainty. Overall, models that assume learning about cumulative reward provided the best account of the data.

AB - Learning about the expected value of choice alternatives associated with reward is critical for adaptive behavior. Although human choice preferences are affected by the presentation frequency of reward-related alternatives, this may not be captured by some dominant models of value learning, such as the delta rule. In this study, we examined whether reward learning is driven more by learning the probability of reward provided by each option, or how frequently each option has been rewarded, and assess how well models based on average reward (e.g. the delta model) and models based on cumulative reward (e.g. the decay model) can account for choice preferences. In a binary-outcome choice task, participants selected between pairs of options that had reward probabilities of 0.65 (A) versus 0.35 (B) or 0.75 (C) versus 0.25 (D). Crucially, during training there were twice the number of AB trials as CD trials, such that option A was associated with higher cumulative reward, while option C gave higher average reward. Participants then decided between novel combinations of options (e.g., AC). Most participants preferred option A over C, a result predicted by the Decay model, but not the Delta model. We also compared the Delta and Decay models to both more simplified as well as more complex models that assumed additional mechanisms, such as representation of uncertainty. Overall, models that assume learning about cumulative reward provided the best account of the data.

KW - Decay rule

KW - Delta rule

KW - Prediction error

KW - Probability learning

KW - Reinforcement learning

KW - Reward frequency

UR - http://www.scopus.com/inward/record.url?scp=85070689787&partnerID=8YFLogxK

U2 - 10.1016/j.cognition.2019.104042

DO - 10.1016/j.cognition.2019.104042

M3 - Article

C2 - 31430606

AN - SCOPUS:85070689787

SN - 0010-0277

VL - 193

JO - Cognition

JF - Cognition

M1 - 104042

ER -
