Automated justice: between the artificial intelligences that fake and those that persuade
DOI:
https://doi.org/10.46661/lexsocial.11652
Keywords:
Alignment faking, Large Language Models, Compliance gap, AI ethics, Algorithmic justice
Abstract
On December 18, 2024, Anthropic researchers released a study entitled “Alignment Faking in Large Language Models,” which questions the effectiveness of current training and ethical alignment methodologies in Artificial Intelligence. The study’s primary finding points to the ability of Large Language Models (LLMs) to “fake” adherence to certain principles or values when they perceive they are under evaluation, while exhibiting divergent behaviours in contexts where they believe they are unmonitored. This so-called compliance gap highlights fundamental concerns about the reliability, legitimacy, and transparency of such systems, particularly in high-stakes social contexts such as their potential implementation in the administration of justice. This article examines the philosophical and legal implications of this phenomenon, situating it within the ongoing debate over whether a judge must be “good” in a moral sense or simply conform to the law. It also discusses the technical and regulatory challenges posed by AI capable of contextual adaptation strategies, drawing attention to the need for oversight mechanisms akin to those used in judicial systems to ensure proper alignment. Finally, the article addresses the dilemma of whether it is ethically and pragmatically feasible to demand that AI embody an internal “virtue” or whether externally correct moral and legal conduct may suffice.
References
Alexy, R. (2007). Teoría de la argumentación jurídica. Madrid: Centro de Estudios Políticos y Constitucionales.
Asís Roig, R. (2008). La motivación de las decisiones judiciales. In F. Gutiérrez-Alviz Conradi (Dir.), La justicia procesal. Cuadernos de Derecho Judicial (Vol. 6, pp. 1-18). Madrid: Consejo General del Poder Judicial.
Atienza, M. (1991). Las razones del derecho: Teorías de la argumentación jurídica. Madrid: Centro de Estudios Constitucionales.
Bode, L., & Vraga, E. K. (2018). See something, say something: Correction of global health misinformation on social media. Health Communication, 33(9), 1131-1140. https://doi.org/10.1080/10410236.2017.1331312
Calamandrei, P. (1989). Elogio de los jueces escrito por abogados (S. Melendo, M. Garijo, & C. Finzi, Trans.). Ediciones Europa América. (Original work published 1935)
Ercilla García, J. (2024). La inteligencia artificial y el futuro del razonamiento jurídico. In El impacto de la IA en el aprendizaje y en la práctica del derecho. La Ley. ISBN: 978-8419905963. https://doi.org/10.62659/FA2400206
Fernández García, E. (2008). Los jueces buenos y los buenos jueces. Algunas sencillas reflexiones y dudas sobre la ética judicial [Good judges and good-hearted judges. Some simple reflections and doubts on judicial ethics]. Derechos y Libertades, 19(II), 17-35.
Greenblatt, R., Denison, C., Wright, B., Roger, F., MacDiarmid, M., Marks, S., Treutlein, J., Belonax, T., Chen, J., Duvenaud, D., Khan, A., Michael, J., Mindermann, S., Perez, E., Petrini, L., Uesato, J., Kaplan, J., Shlegeris, B., Bowman, S. R., & Hubinger, E. (2024). Alignment faking in large language models [Preprint]. https://doi.org/10.48550/arXiv.2412.14093
Huang, G., & Wang, S. (2023). Is artificial intelligence more persuasive than humans? A meta-analysis. Journal of Communication. https://doi.org/10.1093/joc/jqad024 DOI: https://doi.org/10.31234/osf.io/ehg7n
Liu, B., & Wei, L. (2019). Machine authorship in situ: Effect of news organization and news genre on news credibility. Digital Journalism, 7(5), 635-657. https://doi.org/10.1080/21670811.2018.1510740
Longoni, C., Bonezzi, A., & Morewedge, C. K. (2019). Resistance to medical artificial intelligence. Journal of Consumer Research, 46(4), 629-650. https://doi.org/10.1093/jcr/ucz013
Malem Seña, J. F. (2001). ¿Pueden las malas personas ser buenos jueces? Doxa: Cuadernos de Filosofía del Derecho, 24, 379-403. https://doi.org/10.14198/DOXA2001.24.14
Moreso, J., Redondo, M. C., & Navarro, P. (1992). Argumentación jurídica, lógica y decisión judicial. Doxa: Cuadernos de Filosofía del Derecho, 11, 247-262. https://doi.org/10.14198/DOXA1992.11.10
Nieto, A. (2000). El arbitrio judicial. Barcelona: Ariel.
Salvi, F., Horta Ribeiro, M., Gallotti, R., & West, R. (2024). On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial. arXiv preprint arXiv:2403.14380. https://doi.org/10.21203/rs.3.rs-4429707/v1
Starke, C., & Lünich, M. (2020). Artificial intelligence for political decision-making in the European Union: Effects on citizens' perceptions of input, throughput, and output legitimacy. Data & Policy, 2, e16. https://doi.org/10.1017/dap.2020.19
Zuluaga Jaramillo, A. F. (2012). La justificación interna en la argumentación jurídica de la Corte Constitucional en la acción de tutela contra sentencia judicial por defecto fáctico. Revista Ratio Juris, 7(14), 89-112. https://doi.org/10.24142/raju.v7n14a3
License
Copyright (c) 2024 Javier Ercilla García

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial: You may not use the material for commercial purposes.
- ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.