Automated justice: between the artificial intelligences that fake and those that persuade

Authors

  • Javier Ercilla García, Specialist Judge in the Social Jurisdiction (Magistrado Especialista en la Jurisdicción Social)

DOI:

https://doi.org/10.46661/lexsocial.11652

Keywords:

Alignment faking, Large Language Models, Compliance gap, AI ethics, Algorithmic justice

Abstract

On December 18, 2024, Anthropic researchers released a study entitled “Alignment Faking in Large Language Models,” which calls into question the effectiveness of current methods for training Artificial Intelligence systems and aligning them with ethical principles. The study’s central finding is that Large Language Models (LLMs) can “fake” adherence to certain principles or values when they perceive that they are being evaluated, while behaving differently in contexts where they believe they are not being monitored. This divergence, the so-called compliance gap, raises fundamental concerns about the reliability, legitimacy, and transparency of such systems, particularly in high-stakes social contexts such as their potential use in the administration of justice. This article examines the philosophical and legal implications of the phenomenon, situating it within the ongoing debate over whether a judge must be “good” in a moral sense or need only act in conformity with the law. It also discusses the technical and regulatory challenges posed by AI systems capable of strategically adapting their behaviour to context, drawing attention to the need for oversight mechanisms akin to those used in judicial systems to ensure proper alignment. Finally, the article addresses whether it is ethically and pragmatically feasible to require AI to embody an internal “virtue,” or whether externally correct moral and legal conduct may suffice.

References

Alexy, R. (2007). Teoría de la argumentación jurídica. Madrid: Centro de Estudios Políticos y Constitucionales.

Asís Roig, R. (2008). La motivación de las decisiones judiciales. In F. Gutiérrez-Alviz Conradi (Ed.), La justicia procesal. Cuadernos de Derecho Judicial (Vol. 6, pp. 1-18). Madrid: Consejo General del Poder Judicial.

Atienza, M. (1991). Las razones del derecho: Teorías de la argumentación jurídica. Madrid: Centro de Estudios Constitucionales.

Bode, L., & Vraga, E. K. (2018). See something, say something: Correction of global health misinformation on social media. Health Communication, 33(9), 1131-1140. https://doi.org/10.1080/10410236.2017.1331312

Calamandrei, P. (1989). Elogio de los jueces escrito por abogados (S. Melendo, M. Garijo, & C. Finzi, Trans.). Ediciones Europa América. (Original work published 1935)

Ercilla García, J. (2024). La inteligencia artificial y el futuro del razonamiento jurídico. In El impacto de la IA en el aprendizaje y en la práctica del derecho. La Ley. ISBN: 978-8419905963. https://doi.org/10.62659/FA2400206

Fernández García, E. (2008). Los jueces buenos y los buenos jueces. Algunas sencillas reflexiones y dudas sobre la ética judicial [Good judges and good-hearted judges. Some simple reflections and doubts on judicial ethics]. Derechos y Libertades, 19(II), 17-35.

Greenblatt, R., Denison, C., Wright, B., Roger, F., MacDiarmid, M., Marks, S., Treutlein, J., Belonax, T., Chen, J., Duvenaud, D., Khan, A., Michael, J., Mindermann, S., Perez, E., Petrini, L., Uesato, J., Kaplan, J., Shlegeris, B., Bowman, S. R., & Hubinger, E. (2024). Alignment faking in large language models [Preprint]. https://doi.org/10.48550/arXiv.2412.14093

Huang, G., & Wang, S. (2023). Is artificial intelligence more persuasive than humans? A meta-analysis. Journal of Communication. https://doi.org/10.1093/joc/jqad024 (preprint: https://doi.org/10.31234/osf.io/ehg7n)

Liu, B., & Wei, L. (2019). Machine authorship in situ: Effect of news organization and news genre on news credibility. Digital Journalism, 7(5), 635-657. https://doi.org/10.1080/21670811.2018.1510740

Longoni, C., Bonezzi, A., & Morewedge, C. K. (2019). Resistance to medical artificial intelligence. Journal of Consumer Research, 46(4), 629-650. https://doi.org/10.1093/jcr/ucz013

Malem Seña, J. F. (2001). ¿Pueden las malas personas ser buenos jueces? Doxa: Cuadernos de Filosofía del Derecho, 24, 379-403. https://doi.org/10.14198/DOXA2001.24.14

Moreso, J., Redondo, M. C., & Navarro, P. (1992). Argumentación jurídica, lógica y decisión judicial. Doxa: Cuadernos de Filosofía del Derecho, 11, 247-262. https://doi.org/10.14198/DOXA1992.11.10

Nieto, A. (2000). El arbitrio judicial. Barcelona: Ariel.

Salvi, F., Horta Ribeiro, M., Gallotti, R., & West, R. (2024). On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial. arXiv preprint arXiv:2403.14380. https://doi.org/10.21203/rs.3.rs-4429707/v1

Starke, C., & Lünich, M. (2020). Artificial intelligence for political decision-making in the European Union: Effects on citizens' perceptions of input, throughput, and output legitimacy. Data & Policy, 2, e16. https://doi.org/10.1017/dap.2020.19

Zuluaga Jaramillo, A. F. (2012). La justificación interna en la argumentación jurídica de la Corte Constitucional en la acción de tutela contra sentencia judicial por defecto fáctico. Revista Ratio Juris, 7(14), 89-112. https://doi.org/10.24142/raju.v7n14a3

Published

2025-02-19

How to Cite

Ercilla García, J. (2025). Automated justice: between the artificial intelligences that fake and those that persuade. Lex Social: Journal of Social Rights, 15(1), 1–39. https://doi.org/10.46661/lexsocial.11652

Issue

Vol. 15 No. 1 (2025)

Section

Articles