
ABSTRACT

Medical images are an essential input for the timely diagnosis of pathologies. Despite their wide use in the field, searching for images that reveal valuable information to support decision-making remains difficult and expensive, yet making large repositories of images searchable by content would open far-reaching possibilities. We designed a content-based image retrieval system for medical imaging that narrows the gap between the need for information and the availability of useful repositories to meet it. The system operates on the principle of query-by-example: users provide a medical image, and the system displays a set of related images. Unlike metadata match-driven searches, our system performs content-based search. This allows the system to operate on repositories of medical images that do not necessarily have complete and curated metadata. We explore our system's feasibility on computed tomography (CT) slices of SARS-CoV-2 infection (COVID-19), showing that our proposal obtains promising results and compares favorably with other search methods.
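
At its core, query-by-example retrieval ranks repository images by the similarity of their feature vectors to the query's. The following is a minimal sketch of that idea, assuming (hypothetically) that each image has already been reduced to a fixed-length feature vector by some deep model; the image ids and vectors are toy data, not the actual system:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def query_by_example(query_vec, repository, k=3):
    """Return the k repository ids whose feature vectors are most
    similar to the query image's feature vector."""
    ranked = sorted(repository.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [image_id for image_id, _ in ranked[:k]]

# Toy repository of hypothetical deep-learning feature vectors.
repo = {
    "ct_001": [0.9, 0.1, 0.0],
    "ct_002": [0.8, 0.2, 0.1],
    "ct_003": [0.0, 0.9, 0.4],
}
print(query_by_example([1.0, 0.0, 0.0], repo, k=2))
```

A production system would replace the linear scan with an approximate nearest-neighbor index, but the ranking principle is the same.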


Publisher: Revista Bits de Ciencia

ABSTRACT

Bots have a pernicious effect on the spread of misleading or biased information on social networks [1]. Their goal is to amplify the reach of campaigns by artificially turning messages into trends. To this end, the accounts supporting a campaign are followed by accounts controlled by algorithms. Many of the accounts that follow high-profile public figures are bots, which back their messages with likes and retweets. When these messages show an unusual level of reactions, they become trends, which further increases their visibility. Once they become trends, their influence on the network grows, producing a snowball effect.

 

Publisher: arXiv

ABSTRACT

Current language models are usually trained using a self-supervised scheme, where the main focus is learning representations at the word or sentence level. However, there has been limited progress in generating useful discourse-level representations. In this work, we propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations. As a result, our proposed approach is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network. By experimenting with benchmarks designed to evaluate discourse-related knowledge using pre-trained sentence representations, we demonstrate that our approach improves performance in 6 out of 11 tasks by excelling in discourse relationship detection.
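
The objective of predicting future sentences from top-down connections can be illustrated, in spirit, as an auxiliary loss that compares the representation predicted for the next sentence with the one actually computed for it. This is only a conceptual sketch under that assumption, not the paper's actual training objective:

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def next_sentence_prediction_loss(predicted_next, actual_next):
    """Auxiliary loss in the predictive-coding spirit: penalize the
    distance between the representation a top-down connection predicts
    for the next sentence and the one actually computed for it."""
    return 1.0 - cosine(predicted_next, actual_next)

# A perfect prediction incurs zero loss; an orthogonal one incurs 1.0.
print(next_sentence_prediction_loss([1.0, 0.0], [1.0, 0.0]))
print(next_sentence_prediction_loss([1.0, 0.0], [0.0, 1.0]))
```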


Publisher: Diagnostics

ABSTRACT

Medical imaging is essential nowadays throughout medical education, research, and care. Accordingly, international efforts have been made to set up large-scale image repositories for these purposes. Yet, to date, browsing large-scale medical image repositories has been troublesome, time-consuming, and generally limited to text search engines. A paradigm shift, by means of a query-by-example search engine, would alleviate these constraints and beneficially impact several practical demands throughout the medical field. The current project aims to address this gap in medical imaging consumption by developing a content-based image retrieval (CBIR) system, which combines two image processing architectures based on deep learning. Furthermore, a first-of-its-kind intelligent visual browser was designed that interactively displays a set of imaging examinations with similar visual content on a similarity map, making it possible to search for and efficiently navigate through a large-scale medical imaging repository, even if it has been set up with incomplete and non-curated metadata. Users may, likewise, provide text keywords, in which case the system performs a content- and metadata-based search. The system was fashioned with an anonymizer service and designed to be fully interoperable according to international standards, to stimulate its integration within electronic healthcare systems and its adoption for medical education, research, and care. Professionals of the healthcare sector, by means of a self-administered questionnaire, underscored that this CBIR system and intelligent interactive visual browser would be highly useful for these purposes. Further studies are warranted to complete a comprehensive assessment of the performance of the system through case descriptions and protocolized evaluations by medical imaging specialists.



ABSTRACT

Valuable and timely information about crisis situations such as natural disasters can be rapidly obtained from user-generated content in social media. This has created an emergent research field that has focused mostly on the problem of filtering and classifying potentially relevant messages during emergency situations. However, we believe important insight can be gained from studying online communications during disasters at a more comprehensive level. In this sense, a higher-level analysis could allow us to understand whether there are collective patterns associated with certain characteristics of events. Following this motivation, we present a novel comparative analysis of 41 real-world crisis events, based on textual and linguistic features of the social media messages shared during these crises. For our comparison we considered hazard categories (i.e., human-induced and natural crises) as well as subcategories (i.e., intentional, accidental, and so forth). Among other things, our results show that using only a small set of textual features, we can differentiate among types of events with 75% accuracy, indicating that there are clear patterns in how people react to different extreme situations, depending on, for example, whether the event was triggered by natural causes or by human action. These findings have implications from a crisis response perspective, as they will allow experts to foresee patterns in emerging situations, even if there is no prior experience with an event of such characteristics.
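
The "small set of textual features" mentioned above can be illustrated with simple surface statistics of a message. The feature names below are illustrative choices, not the study's actual feature set:

```python
import re

def textual_features(message):
    """Extract a small set of surface textual features from a social
    media message, of the kind that could feed an event classifier."""
    tokens = message.split()
    return {
        "n_tokens": len(tokens),
        "n_hashtags": sum(t.startswith("#") for t in tokens),
        "n_mentions": sum(t.startswith("@") for t in tokens),
        "n_urls": len(re.findall(r"https?://\S+", message)),
        "n_exclamations": message.count("!"),
        "avg_word_len": sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0,
    }

feats = textual_features("Flooding downtown!! Stay safe @all #flood http://example.com")
print(feats)
```

Vectors of such features, aggregated per event, are what a standard classifier would consume to separate event types.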


Publisher: arXiv

ABSTRACT

Automatic hate speech detection in online social networks is an important open problem in Natural Language Processing (NLP). Hate speech is a multidimensional issue, strongly dependent on language and cultural factors. Despite its relevance, research on this topic has been almost exclusively devoted to English, and most supervised learning resources, such as labeled datasets and NLP tools, have been created for this same language. Considering that a large portion of users worldwide speak languages other than English, there is an important need for efficient approaches to multilingual hate speech detection. In this work we address the problem of multilingual hate speech detection from the perspective of transfer learning. Our goal is to determine whether knowledge from one particular language can be used to classify another, and to identify effective ways to achieve this. We propose a hate-specific data representation and evaluate its effectiveness against general-purpose universal representations, most of which, unlike our proposed model, have been trained on massive amounts of data. We focus on a cross-lingual setting, in which one needs to classify hate speech in one language without access to any labeled data for that language. We show that the use of our simple yet specific multilingual hate representations improves classification results, and we explain this with a qualitative analysis showing that our representation captures common patterns in how hate speech presents itself across languages. Our proposal constitutes, to the best of our knowledge, the first attempt at constructing multilingual task-specific representations. Despite its simplicity, our model outperformed previous approaches for most of the experimental setups. Our findings can orient future solutions toward the use of domain-specific representations.


Publisher: Publications

ABSTRACT

The evaluation of research proposals and academic careers is subject to indicators of scientific productivity. Citations are critical signs of impact for researchers, and many indicators are based on these data. The literature shows that citation patterns differ between areas, and the possible scope and depth of these differences motivate extending such studies to consider types of articles and age groups of researchers. In this work, we conducted an exploratory study to elucidate what evidence there is for the existence of these differences in citation patterns. To perform this study, we collected historical data from Scopus and analyzed them to evaluate whether there are measurable differences in citation patterns. This study shows that there are evident differences in citation patterns between areas, types of publications, and age groups of researchers, which may be relevant when carrying out the academic evaluation of researchers.
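
Comparing citation patterns between groups amounts to aggregating citation counts by group and summarizing each distribution. A minimal sketch, on toy data rather than the Scopus records used in the study:

```python
from statistics import mean, median

def citation_summary(records):
    """Group citation counts by research area and summarize each
    group's distribution. `records` is a list of (area, citations)
    pairs, a toy stand-in for bibliometric data."""
    by_area = {}
    for area, cites in records:
        by_area.setdefault(area, []).append(cites)
    return {area: {"mean": mean(c), "median": median(c)}
            for area, c in by_area.items()}

toy = [("Medicine", 40), ("Medicine", 10), ("Math", 4), ("Math", 2)]
print(citation_summary(toy))
```

The same grouping key can be swapped for publication type or researcher age group, which is exactly the kind of slicing the study performs.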



ABSTRACT

Social networks are used every day to report daily events, although the information published in them often corresponds to fake news. Detecting such fake news has become a research topic that can be approached using deep learning. However, most of the current research on the topic is available only for the English language. When working on fake news detection in other languages, such as Spanish, one of the barriers is the scarcity of labeled datasets available in Spanish. Hence, we explore whether it is convenient to translate an English dataset into Spanish using statistical machine translation. We use the translated dataset to evaluate the accuracy of several deep learning architectures and compare the results obtained on the translated and original datasets in fake news classification. Our results suggest that the approach is feasible, although it requires high-quality translation techniques, such as those found in neural machine translation models.


Publisher: arXiv

ABSTRACT

The field of natural language understanding has experienced exponential progress in the last few years, with impressive results in several tasks. This success has motivated researchers to study the underlying knowledge encoded by these models. Despite this, attempts to understand their semantic capabilities have not been successful, often leading to inconclusive or contradictory conclusions among different works. Via a probing classifier, we extract the underlying knowledge graph of nine of the most influential language models of recent years, including word embeddings, text generators, and context encoders. This probe is based on concept relatedness, grounded on WordNet. Our results reveal that all the models encode this knowledge, but suffer from several inaccuracies. Furthermore, we show that different architectures and training strategies lead to different model biases. We conduct a systematic evaluation to discover specific factors that explain why some concepts are challenging. We hope our insights will motivate the development of models that capture concepts more precisely.
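
A probing classifier is typically a small supervised model trained on frozen representations: here, given a feature vector for a concept pair, it predicts whether the concepts are related. The sketch below is a generic linear probe on toy vectors, not the paper's actual probe or data:

```python
import math

def train_probe(pairs, labels, lr=0.5, epochs=200):
    """Train a minimal logistic-regression probe with stochastic
    gradient descent. Each element of `pairs` is a feature vector for
    a concept pair; each label is 1 (related) or 0 (unrelated)."""
    dim = len(pairs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(pairs, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            g = p - y                        # gradient of log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Predict 1 if the probe scores the pair above the boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Toy "embedding" features: related pairs lean one way, unrelated the other.
X = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
y = [1, 1, 0, 0]
w, b = train_probe(X, y)
print([predict(w, b, x) for x in X])
```

Because the probe itself is this simple, whatever it recovers must already be encoded in the frozen representations, which is the logic behind probing studies.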


ABSTRACT
The popularity of mobile devices with GPS capabilities, along with the worldwide adoption of social media, have created a rich source of text data combined with spatio-temporal information. Text data collected from location-based social networks can be used to gain space–time insights into human behavior and provide a view of time and space from the social media lens. From a data modeling perspective, text, time, and space have different scales and representation approaches; hence, it is not trivial to jointly represent them in a unified model. Existing approaches do not capture the sequential structure present in texts or the patterns that drive how text is generated considering the spatio-temporal context at different levels of granularity. In this work, we present a neural language model architecture that allows us to represent time and space as context for text generation at different granularities. We define the task of modeling text, timestamps, and geo-coordinates as a spatio-temporal conditioned language model task. This task definition allows us to employ the same evaluation methodology used in language modeling, which is a traditional natural language processing task that considers the sequential structure of texts. We conduct experiments over two datasets collected from location-based social networks, Twitter and Foursquare. Our experimental results show that each dataset has particular patterns for language generation under spatio-temporal conditions at different granularities. In addition, we present qualitative analyses to show how the proposed model can be used to characterize urban places.
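
The evaluation methodology borrowed from language modeling is perplexity: the exponential of the mean negative log-probability the (conditioned) model assigns to each token. A minimal sketch of that metric, independent of any particular model:

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence given the probability the model
    assigned to each token: exp of the mean negative log-probability.
    Lower means the model found the sequence less surprising."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that is more confident about every token scores lower:
print(perplexity([0.5, 0.5, 0.5]))   # near 2.0
print(perplexity([0.25, 0.25]))      # near 4.0
```

Conditioning the model on time and location should lower perplexity wherever those signals genuinely constrain what people write.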
