
About Lázaro Observatory

What is Lázaro?

Lázaro is an observatory of anglicism usage in the Spanish press. The purpose of this project is to apply a data-driven approach to the study of anglicisms (i.e., unadapted lexical borrowings from English) in Spanish newspapers. Every day, Lázaro collects the latest news published by 22 Spanish news sources, analyzes it, and extracts the anglicisms used in that day's articles.

This talk (in Spanish) summarizes how the project was built:



What news sources does Lázaro track?

The observatory currently monitors the following sources:

Source             Topic
El País            General news
elDiario.es        General news
ABC                General news
El Mundo           General news
La Vanguardia      General news
El Confidencial    General news
20 Minutos         General news
Agencia EFE        General news
Agencia Sinc       Science & Tech
Muy Interesante    Science & Tech
La Marea           Politics
El Salto           Politics
El Economista      Economy
Cinco Días         Economy
JotDown            Culture
El Mundo Today     Satirical news
Marca              Sports
Rolling Stones     Music
Fotogramas         Cinema
Diez Minutos       Gossip
Men's Health       Lifestyle
Elle               Lifestyle

How does Lázaro work?

The core of the project is a machine learning model that extracts unadapted lexical borrowings (especially English lexical borrowings, or anglicisms) from Spanish articles. The model is a BiLSTM-CRF fed with bilingual EN-ES embeddings along with subword embeddings (more information on the model can be found in the ACL paper). A previous version of the observatory, active from April 2020 to August 2022, ran on a Conditional Random Field (CRF) model (more information about that previous model can be found here).
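To give a rough idea of what "extracting borrowings" involves for a sequence-labeling model such as a BiLSTM-CRF: the tagger assigns one label to each token, and runs of labeled tokens are then grouped into spans. The sketch below shows only that grouping step. It assumes BIO-style labels named B-ENG/I-ENG (and B-OTHER/I-OTHER for borrowings from other languages); the label names, the function name bio_to_spans and the example sentence are illustrative assumptions, not the observatory's actual code.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (span text, label) pairs."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        # "B-ENG" -> ("B", "ENG"); "O" -> ("O", "")
        prefix, _, label = tag.partition("-")

        # An I- tag with the same label extends the span in progress.
        if prefix == "I" and label == current_label:
            current_tokens.append(token)
            continue

        # Any other tag closes the span in progress, if there is one.
        if current_tokens:
            spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None

        # B- opens a new span; a stray I- is leniently treated as a start too.
        if prefix in ("B", "I"):
            current_tokens, current_label = [token], label

    # Flush a span that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans


# Hypothetical tagger output for a made-up headline.
tokens = ["Las", "fake", "news", "llegan", "al", "podcast"]
tags = ["O", "B-ENG", "I-ENG", "O", "O", "B-ENG"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

In this made-up example, "fake news" comes out as a single two-token span and "podcast" as a separate one. This is why span grouping matters for multiword borrowings.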

The code of the observatory and the training corpus are available on GitHub. The anglicism detection model can also be used through the Hugging Face model hub or via the pylazaro Python library.

More information about the model and the training corpus can be found in the following publications:

Twitter bot: @lazarobot

Every day, the Twitter bot @lazarobot tweets the new anglicisms extracted by the model (i.e., anglicisms that the model has never seen before), each along with its context and a link to the article where it was found.
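The "never seen before" check can be pictured as a filter against a running set of previously registered anglicisms. The following is a minimal sketch under stated assumptions: the function names, the lowercase and whitespace normalization, and the in-memory set are all hypothetical choices made for illustration. How the observatory actually stores and compares its history is not described here.

```python
from typing import Iterable, List, Set


def normalize(borrowing: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(borrowing.lower().split())


def find_new_borrowings(extracted: Iterable[str], seen: Set[str]) -> List[str]:
    """Return borrowings not yet in `seen`, and register them as seen."""
    new_items: List[str] = []
    for borrowing in extracted:
        key = normalize(borrowing)
        if key and key not in seen:
            seen.add(key)  # later repeats on the same day are skipped
            new_items.append(borrowing)
    return new_items


# Hypothetical history and hypothetical output of today's extraction.
seen = {"podcast", "fake news"}
today = ["Podcast", "crowdfunding", "fake  news", "crowdfunding", "spoiler"]
new_today = find_new_borrowings(today, seen)
print(new_today)  # ['crowdfunding', 'spoiler']
```

Updating the set inside the loop means each new anglicism is reported once, even if it appears in several articles on the same day.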

What Lázaro is not

The purpose of this project is to describe and analyze the usage of anglicisms in the Spanish press. This project by no means seeks to criticize or condemn the usage of anglicisms, or those who use them.

The motivation behind Lázaro Observatory is not to defend an alleged linguistic purity, but to study the phenomenon of lexical borrowing from a descriptive and data-driven point of view.

Why Lázaro?

The name of the project, Lázaro, is a tribute to the Spanish linguist Lázaro Carreter, whose prescriptivist columns against the usage of anglicisms in the Spanish press were extremely popular in Spain during the 1980s and 1990s.

Recognitions & awards

The project behind Lázaro Observatory has received the following awards:

Lázaro Observatory in the media:

Lázaro Observatory has been featured in the following Spanish media:

Projects using Lázaro Observatory:

Research and third-party projects that use the data provided by Lázaro Observatory:

Credits

Lázaro Observatory is a project created and developed by Elena Álvarez Mellado. The project was originally created within the Broadening Linguistic Technologies Lab at Brandeis University (Massachusetts) under the supervision of Constantine Lignos and is currently developed within the Natural Language Processing and Information Retrieval research group at UNED University in Madrid (Spain).