Francis Rhys Ward
An assurance case pattern for the interpretability of machine learning in safety-critical systems
FR Ward, I Habli
Computer Safety, Reliability, and Security. SAFECOMP 2020 Workshops: DECSoS …, 2020
Geometric deep learning for post-menstrual age prediction based on the neonatal white matter cortical surface
V Vosylius, A Wang, C Waters, A Zakharov, F Ward, L Le Folgoc, J Cupitt, ...
Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, and …, 2020
Honesty is the best policy: defining and mitigating AI deception
F Ward, F Toni, F Belardinelli, T Everitt
Advances in Neural Information Processing Systems 36, 2024
On Agent Incentives to Manipulate Human Feedback in Multi-Agent Reward Learning Scenarios
FR Ward, F Toni, F Belardinelli
AAMAS, 1759-1761, 2022
The reasons that agents act: Intention and instrumental goals
FR Ward, M MacDermott, F Belardinelli, F Toni, T Everitt
arXiv preprint arXiv:2402.07221, 2024
Defining deception in structural causal games
FR Ward, F Toni, F Belardinelli
Proceedings of the 2023 International Conference on Autonomous Agents and …, 2023
Towards Defining Deception in Structural Causal Games
FR Ward
NeurIPS ML Safety Workshop, 2022
Tall tales at different scales: Evaluating scaling trends for deception in language models
FR Ward, F Hofstätter, LA Thomson, HM Wood, O Jaffe, P Bartak, ...
Argumentative reward learning: Reasoning about human preferences
FR Ward, F Belardinelli, F Toni
arXiv preprint arXiv:2209.14010, 2022
A Causal Perspective on AI Deception in Games
FR Ward, F Toni, F Belardinelli
AISafety@IJCAI, 2022
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
T van der Weij, F Hofstätter, O Jaffe, SF Brown, FR Ward
arXiv preprint arXiv:2406.07358, 2024
Experiments with Detecting and Mitigating AI Deception
I Sahbane, FR Ward, CH Åslund
arXiv preprint arXiv:2306.14816, 2023
AGI Alignment Coursework Ethics, Privacy, AI in Society
FR Ward