Francis Rhys Ward
Title
Cited by
Year
An assurance case pattern for the interpretability of machine learning in safety-critical systems
FR Ward, I Habli
Computer Safety, Reliability, and Security. SAFECOMP 2020 Workshops: DECSoS …, 2020
Cited by 19, 2020
Geometric deep learning for post-menstrual age prediction based on the neonatal white matter cortical surface
V Vosylius, A Wang, C Waters, A Zakharov, F Ward, L Le Folgoc, J Cupitt, ...
Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, and …, 2020
Cited by 16, 2020
Honesty is the best policy: defining and mitigating AI deception
F Ward, F Toni, F Belardinelli, T Everitt
Advances in Neural Information Processing Systems 36, 2024
Cited by 12, 2024
On Agent Incentives to Manipulate Human Feedback in Multi-Agent Reward Learning Scenarios
FR Ward, F Toni, F Belardinelli
AAMAS, 1759-1761, 2022
Cited by 6, 2022
The reasons that agents act: Intention and instrumental goals
FR Ward, M MacDermott, F Belardinelli, F Toni, T Everitt
arXiv preprint arXiv:2402.07221, 2024
Cited by 3, 2024
Defining deception in structural causal games
FR Ward, F Toni, F Belardinelli
Proceedings of the 2023 International Conference on Autonomous Agents and …, 2023
Cited by 3, 2023
Towards Defining Deception in Structural Causal Games
FR Ward
NeurIPS ML Safety Workshop, 2022
Cited by 3, 2022
Tall tales at different scales: Evaluating scaling trends for deception in language models
FR Ward, F Hofstätter, LA Thomson, HM Wood, O Jaffe, P Bartak, ...
Cited by 1, 2023
Argumentative reward learning: Reasoning about human preferences
FR Ward, F Belardinelli, F Toni
arXiv preprint arXiv:2209.14010, 2022
Cited by 1, 2022
A Causal Perspective on AI Deception in Games
FR Ward, F Toni, F Belardinelli
AISafety@IJCAI, 2022
Cited by 1, 2022
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
T van der Weij, F Hofstätter, O Jaffe, SF Brown, FR Ward
arXiv preprint arXiv:2406.07358, 2024
2024
Experiments with Detecting and Mitigating AI Deception
I Sahbane, FR Ward, CH Åslund
arXiv preprint arXiv:2306.14816, 2023
2023
AGI Alignment Coursework Ethics, Privacy, AI in Society
FR Ward
Articles 1–13