Yuwei Wang
Title: PreFER: Interactive Robo-Advisor with Scoring Mechanism
Abstract: Instead of asking a client to specify her risk preference or learning it from her investment choices, we propose an inverse reinforcement learning (IRL) framework that learns her risk preference, or reward function, through scoring. Specifically, the robo-advisor requests the client to score unadopted investment advice for improvement, and extracts information from adopted advice during non-interactive periods. We develop the IRL through discrete-time Predictable Forward Exploratory Reward (PreFER) processes, where exploration is regularized by Tsallis entropy. By interpreting the score as the acceptance probability of a piece of advice, preference learning becomes the inverse problem of finding the client's exploratory investment distribution, given the investment distribution recommended by the robo-advisor and an acceptance probability, in the spirit of the acceptance-rejection method of von Neumann. Demonstrations are given for both the CARA and CRRA utility classes. We prove that the density function of the optimal exploratory control attains its maximum at the classical optimal strategy in the absence of exploration. We establish a one-to-one correspondence between the observed score and the inferred risk-aversion parameter. As long as the scores are consistent in ordering, bias in the scores does not affect the identification of the client's risk aversion for a sufficiently large number of interactions. The PreFER process further predicts the risk preference at the next time point from the one just learned, leading to an aggregation of learning power.
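As a rough illustration of the acceptance-rejection reading of the scores (a minimal sketch, not the authors' implementation), the snippet below uses a Gaussian stand-in for the client's Tsallis-regularized exploratory control, a hypothetical advisor proposal distribution, and the textbook CARA inversion a* = (mu - r)/(gamma sigma^2); all numerical values are assumptions made only to show how accepted advice can reveal the risk-aversion parameter.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# --- Illustrative market and preference parameters (not from the talk) ---
mu, r, sigma = 0.08, 0.02, 0.2
gamma_true = 2.0
a_star = (mu - r) / (gamma_true * sigma**2)   # classical CARA-optimal amount

# Client's exploratory control: peaked at the classical optimum a_star.
# (The talk uses Tsallis-entropy regularization; a Gaussian stands in here.)
client = norm(loc=a_star, scale=0.5)

# Robo-advisor's recommended (proposal) distribution over advice.
advisor = norm(loc=1.0, scale=2.0)

# The "score" of a piece of advice plays the role of its acceptance
# probability in the von Neumann acceptance-rejection scheme.
grid = np.linspace(-10, 10, 4001)
M = np.max(client.pdf(grid) / advisor.pdf(grid))

def score(a):
    return client.pdf(a) / (M * advisor.pdf(a))

# Advisor proposes advice; the client scores it; accepted advice is then
# distributed according to the client's (unobserved) exploratory distribution.
advice = advisor.rvs(size=200_000, random_state=rng)
accepted = advice[rng.uniform(size=advice.size) < score(advice)]

# Invert a_star = (mu - r) / (gamma * sigma^2) at the empirical mean of the
# accepted advice to recover the risk-aversion parameter.
gamma_hat = (mu - r) / (sigma**2 * accepted.mean())
print(f"true gamma = {gamma_true:.2f}, inferred gamma ~ {gamma_hat:.2f}")
```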
----
Hoi Ying Wong
Title: Reinforcement learning without a market simulator
Abstract: Reinforcement learning (RL) has found applications in portfolio selection and optimal stopping problems in finance and insurance. Unlike stochastic control problems, which rely on a given stochastic model, an RL agent strikes a balance between exploration and exploitation, where exploration involves randomized decision rules. However, the RL framework in the finance and insurance literature usually requires a reliable market simulator during the training procedure. Since the construction of such a simulator remains largely an open problem, this limitation restricts applications to high-frequency trading.
In this talk, I present two examples of RL that do not require a market simulator but use data other than market prices. The first example focuses on learning efficient investment decisions from liquidity spread data. The exploration aspect of RL enables investors to construct a statistical test for the rebalancing time that accounts for liquidity (including transaction) costs. Upon rebalancing, we construct a robust investment rule using exploration with KL-divergence regularization.
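The KL-regularized exploration step admits a familiar closed form: maximizing the expected value minus a KL penalty to a reference rule yields a Gibbs tilt of that reference, pi*(a) proportional to pi_ref(a) exp(Q(a)/lambda). The sketch below shows this on a small discrete action grid; the value numbers, reference rule, and temperature are placeholders and not quantities from the talk.

```python
import numpy as np

def kl_regularized_policy(q_values, ref_probs, temperature):
    """Maximize E_pi[Q] - temperature * KL(pi || ref) over distributions pi.

    The maximizer is the Gibbs/softmax tilt of the reference distribution:
    pi*(a) proportional to ref(a) * exp(Q(a) / temperature).
    """
    logits = np.log(ref_probs) + q_values / temperature
    logits -= logits.max()              # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy example: three candidate rebalancing actions with liquidity-adjusted values.
q = np.array([0.010, 0.014, 0.009])     # hypothetical net-of-cost values
ref = np.array([0.5, 0.3, 0.2])         # hypothetical reference (hold-heavy) rule
print(kl_regularized_policy(q, ref, temperature=0.005))
```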
The second example addresses surrendering decisions in variable annuities. The actuarial science literature presents two schools of thought: the optimal surrendering approach based on market information and the surrender intensity approach. Using RL, we demonstrate that the latter can be viewed as an exploratory version of the former. Thus, surrender data can be used to train the RL model for randomized surrender decisions consistent with optimal stopping rules.
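One way to picture the claimed link between surrender intensities and optimal stopping is a smoothed, randomized stopping rule: surrender becomes more likely as the surrender value exceeds the continuation value, and the deterministic optimal rule is recovered as the exploration temperature vanishes. The logistic form below is an assumption made purely for illustration, not the talk's specification.

```python
import numpy as np

def surrender_probability(surrender_value, continuation_value, temperature):
    """Randomized (exploratory) surrender rule.

    A logistic smoothing of an optimal stopping rule: surrender is more likely
    when the surrender value exceeds the continuation value, and the rule
    approaches 1{surrender_value > continuation_value} as temperature -> 0.
    """
    premium = surrender_value - continuation_value
    return 1.0 / (1.0 + np.exp(-premium / temperature))

# Toy example: shrinking the temperature recovers the deterministic stop/continue rule.
sv, cv = 102.0, 100.0
for temp in (5.0, 1.0, 0.1):
    print(temp, surrender_probability(sv, cv, temp))
```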