"Preference as Reward, Maximum Preference Optimization with Importance ..."

Zaifan Jiang, Xing Huang, Chao Wei (2023)

Details and statistics

DOI: 10.48550/ARXIV.2312.16430

access: open

type: Informal or Other Publication

metadata version: 2024-01-18
