Francis Rhys Ward

Cited by

	All	Since 2019
Citations	70	70
h-index	5	5
i10-index	3	3

Citations per year: 2021: 8, 2022: 11, 2023: 19, 2024: 32

Public access

4 articles

0 articles

available

not available

Based on funding mandates

Co-authors

Ibrahim Habli, Professor of Safety-Critical Systems at the University of York. Verified email at york.ac.uk
Loic Le Folgoc, Associate Professor, Télécom Paris, France. Verified email at telecom-paris.fr
Daniel Rueckert, Technical University of Munich and Imperial College London. Verified email at tum.de
Amir Alansary, Research Associate, Biomedical Image Analysis Group (BioMedIA), Imperial College London. Verified email at imperial.ac.uk
Alexey Zakharov, University of Oxford - WhiRL. Verified email at ic.ac.uk

Francis Rhys Ward

Imperial College London

Verified email at ic.ac.uk

AI alignment, deception, causality, reward learning, manipulation


Title	Cited by	Year
An assurance case pattern for the interpretability of machine learning in safety-critical systems. FR Ward, I Habli. Computer Safety, Reliability, and Security. SAFECOMP 2020 Workshops: DECSoS …, 2020	21	2020
Geometric deep learning for post-menstrual age prediction based on the neonatal white matter cortical surface. V Vosylius, A Wang, C Waters, A Zakharov, F Ward, L Le Folgoc, J Cupitt, ... Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, and …, 2020	15	2020
Honesty is the best policy: defining and mitigating AI deception. F Ward, F Toni, F Belardinelli, T Everitt. Advances in Neural Information Processing Systems 36, 2024	12	2024
On Agent Incentives to Manipulate Human Feedback in Multi-Agent Reward Learning Scenarios. FR Ward, F Toni, F Belardinelli. AAMAS, 1759-1761, 2022	6	2022
The reasons that agents act: Intention and instrumental goals. FR Ward, M MacDermott, F Belardinelli, F Toni, T Everitt. arXiv preprint arXiv:2402.07221, 2024	5	2024
Defining deception in structural causal games. FR Ward, F Toni, F Belardinelli. Proceedings of the 2023 International Conference on Autonomous Agents and …, 2023	4	2023
Towards defining deception in structural causal games. FR Ward. NeurIPS ML Safety Workshop, 2022	3	2022
AI Sandbagging: Language Models can Strategically Underperform on Evaluations. T van der Weij, F Hofstätter, O Jaffe, SF Brown, FR Ward. arXiv preprint arXiv:2406.07358, 2024	1	2024
Argumentative reward learning: Reasoning about human preferences. FR Ward, F Belardinelli, F Toni. arXiv preprint arXiv:2209.14010, 2022	1	2022
A Causal Perspective on AI Deception in Games. FR Ward, F Toni, F Belardinelli. AISafety@IJCAI, 2022	1	2022
Tall tales at different scales: Evaluating scaling trends for deception in language models. FR Ward, F Hofstätter, LA Thomson, HM Wood, O Jaffe, P Bartak, ...	1
Experiments with Detecting and Mitigating AI Deception. I Sahbane, FR Ward, CH Åslund. arXiv preprint arXiv:2306.14816, 2023		2023
AGI Alignment Coursework Ethics, Privacy, AI in Society. FR Ward

Articles 1–13
