Dr. Javier Huertas

Assistant Professor

Short bio

Dr. Javier Huertas Tato obtained his PhD in Computer Science at Universidad Carlos III de Madrid under an FPI research grant. He currently works as a Ph.D. assistant lecturer at Universidad Politécnica de Madrid and collaborates on national and international research projects such as CIVIC, FightDIS, and IBERIFIER. His current research topics are disinformation detection, tracking, and countering; machine learning applied to environmental issues; and deep learning techniques such as convolutional networks and transformers.

Areas of research

Disinformation
Machine Learning
Deep Learning

Projects as member

Iberian Digital Media Research and Fact-Checking Hub (IBERIFIER)

Scope: International

Start date: September 1, 2021

End date: February 29, 2024

Caracterización inteligente de la veracidad de información asociada a la COVID-19 (CIVIC)

Scope: Industrial

Start date: October 10, 2020

End date: October 10, 2022

DisTrack: Tracking disinformation in Online Social Networks through Deep Natural Language Processing (DisTrack)

Scope: Industrial

Start date: November 1, 2021

End date: December 31, 2022

Fighting against Information DISorders in Online Social Networks (FightDIS)

Scope: National

Start date: January 1, 2021

End date: August 31, 2024

XAI-DisInfodemics: eXplainable AI for disinformation and conspiracy detection during infodemics (XAI-DisInfodemics)

Scope: International

Start date: January 1, 2021

End date: December 31, 2024

Malicious actors profiling and detection in Online Social Networks through Artificial Intelligence (MARTINI)

Scope: International

Start date: December 1, 2022

End date: December 31, 2025

Cátedra de Ciberseguridad INCIBE-ETSISI/UPM

Scope: Industrial

Start date: December 22, 2023

End date: December 31, 2025

Journal articles

2022

Huertas-Tato, Javier; Martín, Alejandro; Camacho, David

SILT: Efficient transformer training for inter-lingual inference Journal Article

In: Expert Systems with Applications, vol. 200, pp. 116923, 2022, ISSN: 0957-4174.

@article{huertas-tato_silt_2022,

title = {SILT: Efficient transformer training for inter-lingual inference},

author = {Javier Huertas-Tato and Alejandro Martín and David Camacho},

url = {https://www.sciencedirect.com/science/article/pii/S0957417422003578},

doi = {10.1016/j.eswa.2022.116923},

issn = {0957-4174},

year  = {2022},

date = {2022-08-01},

urldate = {2022-08-01},

journal = {Expert Systems with Applications},

volume = {200},

pages = {116923},

abstract = {The ability of transformers to perform precision tasks such as question answering, Natural Language Inference (NLI) or summarizing, has enabled them to be ranked as one of the best paradigms to address Natural Language Processing (NLP) tasks. NLI is one of the best scenarios to test these architectures, due to the knowledge required to understand complex sentences and established relationships between a hypothesis and a premise. Nevertheless, these models suffer from the incapacity to generalize to other domains or from difficulties to face multilingual and interlingual scenarios. The leading pathway in the literature to address these issues involves designing and training extremely large architectures, but this causes unpredictable behaviors and establishes barriers which impede broad access and fine tuning. In this paper, we propose a new architecture called Siamese Inter-Lingual Transformer (SILT). This architecture is able to efficiently align multilingual embeddings for Natural Language Inference, allowing for unmatched language pairs to be processed. SILT leverages siamese pre-trained multi-lingual transformers with frozen weights where the two input sentences attend to each other to later be combined through a matrix alignment method. The experimental results carried out in this paper evidence that SILT drastically reduces the number of trainable parameters while allowing for inter-lingual NLI and achieving state-of-the-art performance on common benchmarks.},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Martín, Alejandro; Huertas-Tato, Javier; Huertas-García, Álvaro; Villar-Rodríguez, Guillermo; Camacho, David

FacTeR-Check: Semi-automated fact-checking through Semantic Similarity and Natural Language Inference Journal Article

In: arXiv:2110.14532 [cs], 2022, (arXiv: 2110.14532).

@article{martin_facter-check_2022,

title = {FacTeR-Check: Semi-automated fact-checking through Semantic Similarity and Natural Language Inference},

author = {Alejandro Martín and Javier Huertas-Tato and Álvaro Huertas-García and Guillermo Villar-Rodríguez and David Camacho},

url = {http://arxiv.org/abs/2110.14532},

year  = {2022},

date = {2022-02-01},

urldate = {2022-02-01},

journal = {arXiv:2110.14532 [cs]},

abstract = {Our society produces and shares overwhelming amounts of information through Online Social Networks (OSNs). Within this environment, misinformation and disinformation have proliferated, becoming a public safety concern in most countries. Allowing the public and professionals to efficiently find reliable evidence about the factual veracity of a claim is a crucial step to mitigate this harmful spread. To this end, we propose FacTeR-Check, a multilingual architecture for semi-automated fact-checking that can be used both in applications designed for the general public and by fact-checking organisations. FacTeR-Check enables retrieving fact-checked information, unchecked claims verification and tracking dangerous information over social media. This architecture involves several modules developed to evaluate semantic similarity, to calculate natural language inference and to retrieve information from Online Social Networks. The union of all these components builds a semi-automated fact-checking tool capable of verifying new claims, extracting related evidence, and tracking the evolution of a hoax on an OSN. While individual modules are validated on related benchmarks (mainly MSTS and SICK), the complete architecture is validated using a new dataset called NLI19-SP that is publicly released with COVID-19 related hoaxes and tweets from Spanish social media. Our results show state-of-the-art performance on the individual benchmarks, as well as producing a useful analysis of the evolution over time of 61 different hoaxes.},

note = {arXiv: 2110.14532},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Publications in conferences

2021

Huertas-García, Álvaro; Huertas-Tato, Javier; Martín, Alejandro; Camacho, David

CIVIC-UPM at CheckThat! 2021: Integration of Transformers in Misinformation Detection and Topic Classification Proceedings Article

In: Conference and Labs of the Evaluation Forum (CLEF) Working Notes, pp. 520–530, 2021.

Villar-Rodríguez, Guillermo; Huertas-Tato, Javier; Martín, Alejandro; Camacho, David

A la desinformación le gusta la compañía: Representación de bulos de Twitter sobre la COVID-19 mediante embeddings Conference

XIX Conference of the Spanish Association for Artificial Intelligence, 2021.

Huertas-García, Álvaro; Huertas-Tato, Javier; Martín, Alejandro; Camacho, David

Countering Misinformation Through Semantic-Aware Multilingual Models Proceedings Article

In: Yin, Hujun; Camacho, David; Tino, Peter; Allmendinger, Richard; Tallón-Ballesteros, Antonio J.; Tang, Ke; Cho, Sung-Bae; Novais, Paulo; Nascimento, Susana (Ed.): Intelligent Data Engineering and Automated Learning – IDEAL 2021, pp. 312–323, Springer International Publishing, Cham, 2021, ISBN: 978-3-030-91608-4.

@inproceedings{huertas-garcia_countering_2021,

title = {Countering Misinformation Through Semantic-Aware Multilingual Models},

author = {Álvaro Huertas-García and Javier Huertas-Tato and Alejandro Martín and David Camacho},

editor = {Hujun Yin and David Camacho and Peter Tino and Richard Allmendinger and Antonio J. Tallón-Ballesteros and Ke Tang and Sung-Bae Cho and Paulo Novais and Susana Nascimento},

doi = {10.1007/978-3-030-91608-4_31},

isbn = {978-3-030-91608-4},

year  = {2021},

date = {2021-01-01},

urldate = {2021-01-01},

booktitle = {Intelligent Data Engineering and Automated Learning – IDEAL 2021},

pages = {312--323},

publisher = {Springer International Publishing},

address = {Cham},

abstract = {The presence of misinformation and harmful content on social networks is an emerging problem that endangers public health. One of the most successful approaches for detecting, assessing, and providing prompt responses to this misinformation problem is Natural Language Processing (NLP) techniques based on semantic similarity. However, language constitutes one of the most significant barriers to address, denoting the need to develop multilingual tools for an effective fight against misinformation. This paper presents an approach for countering misinformation through a semantic-aware multilingual architecture. Due to the specificity of the task addressed, which involves assessing the level of similarity between a pair of texts in a multilingual scenario, we built an extension of the well-known Semantic Textual Similarity Benchmark (STSb) to 15 languages. This new dataset allows to fine-tune and evaluate multilingual models based on Transformers with a siamese network topology on monolingual and cross-lingual Semantic Textual Similarity (STS) tasks, achieving a maximum average Spearman correlation coefficient of 83.60%. We validate our proposal using the Covid-19 MLIA @ Eval Multilingual Semantic Search Task. The results reported demonstrate that semantic-aware multilingual architectures are successful at measuring the degree of similarity between pairs of texts, while broadening our understanding of the multilingual capabilities of this type of models. The results and the new multilingual STS Benchmark data presented and made publicly in this study constitute an initial step towards extending methods proposed in the literature that employ semantic similarity to combat misinformation at a multilingual level.},

keywords = {},

pubstate = {published},

tppubtype = {inproceedings}

}

Other publications

2022

Huertas-García, Álvaro; Martín, Alejandro; Huertas-Tato, Javier; Camacho, David

Exploring Dimensionality Reduction Techniques in Multilingual Transformers Miscellaneous

CoRR, 2022.

@misc{nokey,

title = {Exploring Dimensionality Reduction Techniques in Multilingual Transformers},

author = {Álvaro Huertas-García and Alejandro Martín and Javier Huertas-Tato and David Camacho},

url = {https://doi.org/10.48550/arxiv.2204.08415},

doi = {10.48550/ARXIV.2204.08415},

year  = {2022},

date = {2022-04-18},

urldate = {2022-04-18},

abstract = {Both in scientific literature and in industry, semantic and context-aware Natural Language Processing-based solutions have been gaining importance in recent years. The possibilities and performance shown by these models when dealing with complex Language Understanding tasks are unquestionable, from conversational agents to the fight against disinformation in social networks. In addition, considerable attention is also being paid to developing multilingual models to tackle the language bottleneck. The growing need to provide more complex models implementing all these features has been accompanied by an increase in their size, without being conservative in the number of dimensions required. This paper aims to give a comprehensive account of the impact of a wide variety of dimensional reduction techniques on the performance of different state-of-the-art multilingual Siamese Transformers, including unsupervised dimensional reduction techniques such as linear and nonlinear feature extraction, feature selection, and manifold techniques. In order to evaluate the effects of these techniques, we considered the multilingual extended version of Semantic Textual Similarity Benchmark (mSTSb) and two different baseline approaches, one using the pre-trained version of several models and another using their fine-tuned STS version. The results evidence that it is possible to achieve an average reduction in the number of dimensions of 91.58%±2.59% and 54.65%±32.20%, respectively. This work has also considered the consequences of dimensionality reduction for visualization purposes. The results of this study will significantly contribute to the understanding of how different tuning approaches affect performance on semantic-aware tasks and how dimensional reduction techniques deal with the high-dimensional embeddings computed for the STS task and their potential for highly demanding NLP tasks.},

howpublished = {CoRR},

keywords = {},

pubstate = {published},

tppubtype = {misc}

}

Teaching

Inteligencia Artificial

Type: Theory & Practice

Location: Universidad Politécnica de Madrid

Years: 2022 –

Algorítmica y Complejidad

Type: Theory

Location: Universidad Politécnica de Madrid

Years: 2022 –
