About Lázaro Observatory

What is Lázaro?

Lázaro is an observatory of anglicism usage in the Spanish press. The purpose of this project is to apply a data-driven approach to the study of anglicisms (i.e., unadapted lexical borrowings from English) in Spanish newspapers. Every day, Lázaro collects the latest news published in 22 Spanish news sources, analyzes the articles and extracts the anglicisms used in them.

This talk (in Spanish) summarizes how the project was built:

What news sources does Lázaro track?

The observatory currently monitors the following sources:

Source	Topic
El País	General news
elDiario.es	General news
ABC	General news
El Mundo	General news
La Vanguardia	General news
El Confidencial	General news
20 Minutos	General news
Agencia EFE	General news
Agencia Sinc	Science & Tech
Muy Interesante	Science & Tech
La Marea	Politics
El Salto	Politics
El Economista	Economy
Cinco Días	Economy
JotDown	Culture
El Mundo Today	Satirical news
Marca	Sports
Rolling Stone	Music
Fotogramas	Cinema
Diez Minutos	Gossip
Men's Health	Lifestyle
Elle	Lifestyle

How does Lázaro work?

The core of the project is a machine learning model that extracts unadapted lexical borrowings (especially English lexical borrowings, or anglicisms) from Spanish articles. The model is a BiLSTM-CRF fed with bilingual EN-ES embeddings along with subword embeddings (more information on the model can be found in the ACL paper). A previous version of the observatory, active from April 2020 to August 2022, ran on a Conditional Random Field (CRF) model (more information about that previous model can be found here).
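As a rough illustration of what a sequence-labeling model of this kind produces, the sketch below groups per-token tags into borrowing spans. It assumes the tagger emits BIO-style labels such as `B-ENG` and `I-ENG`; the function name `extract_borrowings`, the label names, and the sample sentence are illustrative assumptions, not taken from the observatory's code.

```python
from typing import List, Tuple


def extract_borrowings(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (span_text, label) pairs.

    Assumes tags look like "B-ENG", "I-ENG" or "O" (hypothetical label set).
    An I- tag that does not continue an open span is ignored.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts: close the open one, if any.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_label:
            # Continuation of the open span.
            current_tokens.append(token)
        else:
            # "O" tag or stray I- tag: close the open span, if any.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens = []
            current_label = None

    # Close a span that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans


# Illustrative sentence with hand-written tags (not real model output).
tokens = ["Las", "fake", "news", "dominan", "el", "prime", "time"]
tags = ["O", "B-ENG", "I-ENG", "O", "O", "B-ENG", "I-ENG"]
print(extract_borrowings(tokens, tags))
# [('fake news', 'ENG'), ('prime time', 'ENG')]
```

Multiword borrowings such as "fake news" come out as a single span, which is the main reason to decode at the span level rather than token by token.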

The code of the observatory and the training corpus are available on GitHub. The anglicism detection model can also be used through the Hugging Face model hub or via the pylazaro Python library.

More information about the model and the training corpus can be found in the following publications:

Álvarez Mellado, E., Lignos, C., Detecting Unassimilated Borrowings in Spanish: An Annotated Corpus and Approaches to Modeling, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022.
Álvarez Mellado, E., Extracting English Lexical Borrowings from Spanish Newswire, Proceedings of the Society for Computation in Linguistics: Vol. 4, Article 41, 2021.
Álvarez Mellado, E., An Annotated Corpus of Emerging Anglicisms in Spanish Newspaper Headlines, Proceedings of the 4th Workshop on Computational Approaches to Code Switching, pp. 1-8, 2020.
Álvarez Mellado, E., Lázaro: An Extractor of Emergent Anglicisms in Spanish Newswire, MS thesis, Brandeis University, 2020.

Twitter bot: `@lazarobot`

Every day, the Twitter bot @lazarobot tweets the new anglicisms extracted by the model (i.e., anglicisms that the model has never seen before), each along with its context and a link to the article where it was found.
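The "never seen before" check can be pictured as a simple set lookup. The following is a minimal sketch under stated assumptions, not the bot's actual code: the function name `find_new_anglicisms`, the tuple layout, the case-insensitive comparison, and the example.com URLs are all hypothetical.

```python
from typing import List, Set, Tuple

# (borrowing, context sentence, article URL); hypothetical record layout.
Record = Tuple[str, str, str]


def find_new_anglicisms(extracted: List[Record], seen: Set[str]) -> List[Record]:
    """Return the records whose borrowing is not yet in `seen`.

    `seen` is updated in place, so a borrowing that appears several times
    in the same batch is reported only once. Comparison is done on the
    lowercased form, which is an assumption made for this sketch.
    """
    new_items: List[Record] = []
    for borrowing, context, url in extracted:
        key = borrowing.lower()
        if key not in seen:
            seen.add(key)
            new_items.append((borrowing, context, url))
    return new_items


# Illustrative data only.
seen_so_far = {"podcast", "streaming"}
todays_batch = [
    ("Podcast", "Un nuevo podcast sobre ciencia", "https://example.com/a"),
    ("reskilling", "Planes de reskilling en la empresa", "https://example.com/b"),
    ("reskilling", "El reskilling llega al sector", "https://example.com/c"),
]
for borrowing, context, url in find_new_anglicisms(todays_batch, seen_so_far):
    print(f"{borrowing}: {context} ({url})")
```

In this sketch only the first occurrence of "reskilling" is reported, and "Podcast" is skipped because its lowercased form is already known.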

What Lázaro is not

The purpose of this project is to describe and analyze the usage of anglicisms in the Spanish press. This project by no means seeks to criticize or condemn the usage of anglicisms, or those who use them.

The motivation behind Lázaro Observatory is not to defend an alleged linguistic purity, but to study the phenomenon of lexical borrowing from a descriptive and data-driven point of view.

Why Lázaro?

The name of the project, Lázaro, is a tribute to the Spanish linguist Lázaro Carreter, whose prescriptivist columns against the usage of anglicisms in the Spanish press were extremely popular in Spain during the 1980s and the 1990s.

Recognitions & awards

The project behind Lázaro Observatory has received the following awards:

Adam Kilgarriff Prize, awarded every two years and intended to recognise outstanding work in the fields of corpus linguistics, computational linguistics, and lexicography.
Archiletras Award for research projects awarded by Archiletras magazine
Generation Google Scholarship awarded by Google
HDH 2021 Award to Best Resource awarded by the Hispanic Digital Humanities association
Outstanding Corpus Thesis Award (MS level) awarded by the Institute for Corpus Research (Incheon National University, South Korea)
Karen Spärck Jones Award for Outstanding Achievement in Natural Language Processing awarded by Brandeis University (Massachusetts)

Lázaro Observatory in the media:

Lázaro Observatory has been featured in the following Spanish media:

Interview on Un idioma sin fronteras, on Spanish National Radio (RNE).
X-ray of anglicism usage in the Spanish press, in Archiletras.
Interview for La Tarde, on the COPE radio station.
20 anglicismos nuevos cada día, article by Álex Grijelmo for El País.
Julia en la Onda, with Julia Otero at Onda Cero station [starting at 1:10:00].
Con la lengua fuera, podcast about Linguistics directed by Macarena Gil and Nerea Fernández de Gobeo.
En la punta de la lengua, at Cadena SER Burgos, directed by Raúl Urbina.

Projects using Lázaro Observatory:

Research and third-party projects that use the data provided by Lázaro Observatory:

Núñez Nogueroles, E. E., & Luján-García, C. (2022), Percepciones y uso autodeclarado de anglicismos del campo de las TIC por parte de estudiantes universitarios españoles, Miscelánea: A Journal of English and American Studies, 66, 41–67.

Credits

Lázaro Observatory is a project created and developed by Elena Álvarez Mellado. The project was originally created within the Broadening Linguistic Technologies Lab at Brandeis University (Massachusetts) under the supervision of Constantine Lignos and is currently developed within the Natural Language Processing and Information Retrieval research group at UNED University in Madrid (Spain).
