Lázaro is an observatory of anglicism usage in the Spanish press. The purpose of this project is to apply a data-driven approach to the study of anglicisms (i.e., unadapted lexical borrowings from English) in Spanish newspapers. Every day, Lázaro collects the latest news published by eight major Spanish news outlets, analyzes it, and extracts the anglicisms used in that day's news. Lázaro currently analyzes the following outlets: elDiario.es, El País, El Mundo, ABC, La Vanguardia, El Confidencial, 20minutos and EFE.
The core of the project is a machine learning model that extracts unadapted lexical borrowings (especially English lexical borrowings, or anglicisms) from Spanish news articles. The model is a conditional random field (CRF). The code for the model and the training corpus are available in the project's GitHub repository. More information about the model and the training corpus can be found in the following publications:
Every day, the Twitter bot @lazarobot tweets the new anglicisms extracted by the model (i.e., anglicisms the model has never seen before), along with their context and a link to the article where each one was found.
The purpose of this project is to describe and analyze the use of anglicisms in the Spanish press. It by no means seeks to criticize or condemn the use of anglicisms, or the people who use them.
The motivation behind Lázaro Observatory is not to defend an alleged linguistic purity, but to study the phenomenon of lexical borrowing from a descriptive, data-driven point of view.
The name of the project, Lázaro, is a tribute to the Spanish linguist Lázaro Carreter, whose prescriptivist columns against the use of anglicisms in the Spanish press were extremely popular in Spain during the 1980s and 1990s.
The project behind Lázaro Observatory received the Outstanding Corpus Thesis Award from the Institute for Corpus Research (Incheon National University, South Korea). It has also received the Karen Spärck Jones Award for Outstanding Achievement in Natural Language Processing from Brandeis University (Massachusetts).
Lázaro Observatory has been featured in the following Spanish media:
Lázaro Observatory is a project created and developed by Elena Álvarez Mellado. The project was originally created within the Computational Structure of Language Lab at Brandeis University (Massachusetts), where it was advised by Constantine Lignos.