Publisher: Advances in Neural Information Processing Systems, Link >

Abstract

When learning tasks over time, artificial neural networks suffer from a problem known as Catastrophic Forgetting (CF). This happens when the weights of a network are overwritten during the training of a new task, causing forgetting of old information. To address this issue, we propose MetA Reusable Knowledge, or MARK, a new method that fosters weight reusability instead of overwriting when learning a new task. Specifically, MARK keeps a set of shared weights among tasks. We envision these shared weights as a common Knowledge Base (KB) that is not only used to learn new tasks, but is also enriched with new knowledge as the model learns new tasks. The key components behind MARK are two-fold. On the one hand, a meta-learning approach provides the key mechanism to incrementally enrich the KB with new knowledge and to foster weight reusability among tasks. On the other hand, a set of trainable masks provides the key mechanism to selectively choose the relevant weights from the KB to solve each task. Using MARK, we achieve state-of-the-art results on several popular benchmarks, surpassing the best-performing methods in terms of average accuracy by over 10% on the 20-Split-MiniImageNet dataset, while achieving almost zero forgetting using 55% of the number of parameters. Furthermore, an ablation study provides evidence that, indeed, MARK is learning reusable knowledge that is selectively used by each task.
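To make the mask-based selection mechanism concrete, here is a minimal sketch (not the authors' implementation; names such as SharedKB and task_id are assumptions, and MARK's actual masks are produced by trainable mask functions rather than free per-task parameters) of per-task masks gating a shared knowledge-base weight matrix:

```python
# Minimal sketch: per-task trainable masks that select weights from a shared
# knowledge base (KB). All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class SharedKB(nn.Module):
    def __init__(self, in_dim, out_dim, num_tasks):
        super().__init__()
        self.kb = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)   # weights shared by all tasks
        self.masks = nn.Parameter(torch.zeros(num_tasks, out_dim, in_dim))  # one mask per task

    def forward(self, x, task_id):
        mask = torch.sigmoid(self.masks[task_id])        # soft selection in [0, 1]
        return nn.functional.linear(x, self.kb * mask)   # use only the selected KB weights

kb = SharedKB(in_dim=32, out_dim=16, num_tasks=5)
y = kb(torch.randn(4, 32), task_id=2)                    # features computed for task 2
```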

Abstract
The popularity of mobile devices with GPS capabilities, along with the worldwide adoption of social media, has created a rich source of text data combined with spatio-temporal information. Text data collected from location-based social networks can be used to gain space–time insights into human behavior and provide a view of time and space through the social media lens. From a data modeling perspective, text, time, and space have different scales and representation approaches; hence, it is not trivial to jointly represent them in a unified model. Existing approaches do not capture the sequential structure present in texts or the patterns that drive how text is generated considering the spatio-temporal context at different levels of granularity. In this work, we present a neural language model architecture that allows us to represent time and space as context for text generation at different granularities. We define the task of modeling text, timestamps, and geo-coordinates as a spatio-temporally conditioned language modeling task. This task definition allows us to employ the same evaluation methodology used in language modeling, a traditional natural language processing task that considers the sequential structure of texts. We conduct experiments on two datasets collected from location-based social networks, Twitter and Foursquare. Our experimental results show that each dataset has particular patterns for language generation under spatio-temporal conditions at different granularities. In addition, we present qualitative analyses to show how the proposed model can be used to characterize urban places.
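As a rough illustration of conditioning a language model on spatio-temporal context, the following is a minimal sketch under assumed choices (an LSTM word model, an hour-of-day bucket, and a discretized spatial cell id); it is not the architecture evaluated in the paper:

```python
# Minimal sketch: a word-level LSTM language model conditioned on a temporal
# bucket and a spatial cell id. Vocabulary sizes and dimensions are assumptions.
import torch
import torch.nn as nn

class SpatioTemporalLM(nn.Module):
    def __init__(self, vocab, n_time, n_cells, dim=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, dim)
        self.time_emb = nn.Embedding(n_time, dim)
        self.cell_emb = nn.Embedding(n_cells, dim)
        self.rnn = nn.LSTM(3 * dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens, hour, cell):
        T = tokens.size(1)
        ctx = torch.cat([self.time_emb(hour), self.cell_emb(cell)], dim=-1)  # (B, 2*dim) context
        ctx = ctx.unsqueeze(1).expand(-1, T, -1)             # repeat context at every time step
        x = torch.cat([self.word_emb(tokens), ctx], dim=-1)  # condition each word on time and space
        h, _ = self.rnn(x)
        return self.out(h)                                   # next-word logits

lm = SpatioTemporalLM(vocab=1000, n_time=24, n_cells=50)
logits = lm(torch.randint(0, 1000, (2, 12)), hour=torch.tensor([9, 22]), cell=torch.tensor([3, 7]))
```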

Publisher: , Link >

Embeddings are core components of modern model-based Collaborative Filtering (CF) methods, such as Matrix Factorization (MF) and Deep Learning variations. In essence, embeddings are mappings of the original sparse representation of categorical features (e.g., users and items) to dense low-dimensional representations. A well-known limitation of such methods is that the learned embeddings are opaque and hard to explain to users. On the other hand, a key feature of simpler KNN-based CF models (a.k.a. user/item-based CF) is that they naturally yield similarity-based explanations, i.e., similar users/items as evidence to support model recommendations. Unlike related works that try to attribute explicit meaning (via metadata) to the learned embeddings, in this paper we propose to equip the learned embeddings of MF with meaningful similarity-based explanations. First, we show that the learned user/item …
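A minimal sketch of the kind of similarity-based explanation mentioned above, assuming user_emb is a learned MF user-factor matrix (placeholder random values here); the nearest users in the embedding space are returned as candidate explanatory evidence:

```python
# Minimal sketch: nearest neighbors in a learned embedding space used as
# similarity-based evidence for a recommendation. Embeddings are placeholders.
import numpy as np

rng = np.random.default_rng(0)
user_emb = rng.normal(size=(100, 16))           # placeholder learned MF user factors

def similar_users(u, k=5):
    e = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)
    sims = e @ e[u]                              # cosine similarity to user u
    order = np.argsort(-sims)
    return [i for i in order if i != u][:k]      # top-k most similar users as explanation

print(similar_users(7))
```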


In X-ray testing, the aim is to inspect those inner parts of an object that cannot be detected by the naked eye. Typical applications are the detection of targets such as blow holes in casting inspection, cracks in welding inspection, and prohibited objects in baggage inspection. A straightforward solution today is the use of object detection methods based on deep learning models. Nevertheless, this strategy is not effective when the number of available X-ray images for training is low. Unfortunately, the databases in X-ray testing are rather limited. To overcome this problem, we propose a strategy for deep learning training that is performed with a low number of target-free X-ray images with the superimposition of many simulated targets. The simulation is based on the Beer–Lambert law, which allows us to superimpose different layers. Using this method, it is very simple to generate training data. The proposed method was used to train known object detection models (e.g., YOLO, RetinaNet, EfficientDet, and SSD) in casting inspection, welding inspection, and baggage inspection. The learned models were tested on real X-ray images. In our experiments, we show that the proposed solution is simple (the implementation of the training can be done with a few lines of code using open-source libraries), effective (average precision was 0.91, 0.60, and 0.88 for casting, welding, and baggage inspection, respectively), and fast (training was done in a couple of hours, and testing can be performed in 11 ms per image). We believe that this strategy makes a contribution to the implementation of practical solutions to the problem of target detection in X-ray testing.
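A minimal sketch of the superimposition idea, assuming the target-free X-ray image and the simulated target are both stored as normalized transmission images in [0, 1]; under the Beer–Lambert law each layer contributes a factor exp(-mu*z), so adding a layer multiplies transmissions elementwise (this is an illustrative reconstruction, not the authors' code):

```python
# Minimal sketch: superimposing a simulated target onto a target-free X-ray
# image, with both treated as normalized transmission images in [0, 1].
import numpy as np

def superimpose(xray, target_transmission, x0, y0):
    """Place a simulated target (transmission map) at (x0, y0): transmissions multiply."""
    out = xray.copy()
    h, w = target_transmission.shape
    out[y0:y0 + h, x0:x0 + w] *= target_transmission
    return out

rng = np.random.default_rng(1)
background = rng.uniform(0.7, 1.0, size=(128, 128))       # placeholder target-free image
mu, thickness = 0.5, np.linspace(0, 2, 32)                # assumed attenuation and depth profile
target = np.exp(-mu * thickness)[None, :].repeat(32, 0)   # Beer-Lambert transmission of the target
print(superimpose(background, target, x0=40, y0=40).shape)
```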

Publisher: Computers in biology and medicine, Link >

Abstract

Recent advances in medical imaging have confirmed the presence of altered hemodynamics in bicuspid aortic valve (BAV) patients. Therefore, there is a need for new hemodynamic biomarkers to refine disease monitoring and improve patient risk stratification. This research aims to analyze and extract multiple correlation patterns of hemodynamic parameters from 4D Flow MRI data and find which parameters allow an accurate classification between healthy volunteers (HV) and BAV patients with dilated and non-dilated ascending aorta using machine learning. Sixteen hemodynamic parameters were calculated in the ascending aorta (AAo) and aortic arch (AArch) at peak systole from 4D Flow MRI. We used sequential forward selection (SFS) and principal component analysis (PCA) as feature selection algorithms. Then, eleven machine-learning classifiers were implemented to separate HV and BAV patients (with non-dilated and dilated ascending aorta). Multiple correlation patterns among hemodynamic parameters were extracted using hierarchical clustering. The linear discriminant analysis and random forest are the best-performing classifiers, achieving 96.31 ± 1.76% and 96.00 ± 0.83% accuracy, respectively, using five hemodynamic parameters selected with SFS (velocity angle, forward velocity, vorticity, and backward velocity in the AAo; and helicity density in the AArch). Hierarchical clustering revealed three groups of correlated features. According to this analysis, we observed that the features selected by SFS perform better than those selected by PCA because the five selected parameters are distributed across the three different clusters. Based on the proposed method, we conclude that the feature selection method found five potential hemodynamic biomarkers related to this disease.
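A minimal sketch of an SFS-plus-LDA pipeline of the kind described above, using scikit-learn and placeholder random data in place of the 4D Flow MRI parameters (the actual feature matrix, labels, and validation protocol are not reproduced here):

```python
# Minimal sketch: sequential forward selection of 5 features followed by an
# LDA classifier, evaluated with cross-validation. Data are placeholders.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 16))        # placeholder: 60 subjects x 16 hemodynamic parameters
y = rng.integers(0, 2, size=60)      # placeholder labels (HV vs. BAV)

lda = LinearDiscriminantAnalysis()
sfs = SequentialFeatureSelector(lda, n_features_to_select=5, direction="forward", cv=5)
model = make_pipeline(sfs, lda)
print(cross_val_score(model, X, y, cv=5).mean())   # cross-validated accuracy
```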

Publisher: Computer Vision for X-Ray Testing, Link >

Abstract

Given a facial matcher, in explainable face verification the task is to answer how relevant the parts of a probe image are to establishing the match with an enrolled image. In many cases, however, the trained models cannot be manipulated and must be treated as "black boxes". In this paper, we present six different saliency maps that can be used to explain any face verification algorithm with no manipulation inside the face recognition model. The key idea of the methods is based on how the matching score of the two face images changes when the probe is perturbed. The proposed methods remove and aggregate different parts of the face, and measure the contributions of these parts both individually and in collaboration. We test and compare our proposed methods in three different scenarios: synthetic images with different qualities and occlusions; real face images with different facial expressions, poses, and occlusions; and faces from different demographic groups. In our experiments, five different face verification algorithms are used: ArcFace, Dlib, FaceNet (trained on VGGFace2 and CASIA-WebFace), and LBP. We conclude that one of the proposed methods achieves saliency maps that are stable and interpretable to humans. In addition, our method, in combination with a new visualization of saliency maps based on contours, shows promising results in comparison with other state-of-the-art methods. This paper provides useful insights into any face verification algorithm, making it clear which face areas the algorithm takes into account most when carrying out the recognition process.
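A minimal sketch of the perturbation idea behind these saliency maps: occlude one patch of the probe at a time and record how much the matching score drops. The matcher match_score is assumed to be any black-box face verification score; the patch size and the zero-fill perturbation are illustrative choices, not the paper's exact procedure:

```python
# Minimal sketch: occlusion-style saliency for a black-box face matcher.
import numpy as np

def occlusion_saliency(probe, gallery, match_score, patch=16):
    base = match_score(probe, gallery)
    h, w = probe.shape[:2]
    sal = np.zeros((h, w))
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            perturbed = probe.copy()
            perturbed[y:y + patch, x:x + patch] = 0        # remove one patch of the probe
            drop = base - match_score(perturbed, gallery)
            sal[y:y + patch, x:x + patch] = drop           # larger drop = more relevant region
    return sal

# toy usage with a stand-in matcher (negative mean absolute difference as a score)
score = lambda a, b: -np.abs(a - b).mean()
print(occlusion_saliency(np.ones((64, 64)), np.ones((64, 64)), score).shape)
```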


Publisher: Dagstuhl Reports, Link >

Abstract

The presentation revolves around the subject of explainable AI. More specifically, we deal with numerical attribution scores that are assigned to the feature values of an entity under classification, in order to identify and rank their importance for the obtained classification label. We concentrate on the popular SHAP score [2], which can be applied to both black-box and open models. We show that, in contrast to its general #P-hardness, it can be computed in polynomial time for classifiers that are based on decomposable and deterministic Boolean decision circuits. This class of classifiers includes decision trees and ordered binary decision diagrams. This result was established in [1]. The presentation illustrates how the proof heavily relies on the connection to SAT-related computational problems.
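For reference, the SHAP score discussed here is the Shapley value of a feature value with respect to the expected output of the classifier; in its standard form (with classifier M, entity e, feature set F, and expectations taken under a given distribution over entities):

```latex
% Standard Shapley-value form of the SHAP score; \phi_e(S) is the expected
% classifier output when the features in S are fixed to e's values.
\mathrm{SHAP}(M, e, i) \;=\; \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}\,
  \Bigl(\phi_e\bigl(S \cup \{i\}\bigr) - \phi_e(S)\Bigr),
\qquad
\phi_e(S) \;=\; \mathbb{E}\bigl[\,M(e') \mid e'_S = e_S\,\bigr].
```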


Publisher: arXiv, Link >

ABSTRACT:

Large-scale pre-trained language models have shown remarkable results in diverse NLP applications. Unfortunately, these performance gains have been accompanied by a significant increase in computation time and model size, stressing the need to develop new or complementary strategies to increase the efficiency of these models. In this paper, we propose DACT-BERT, a differentiable adaptive computation time strategy for BERT-like models. DACT-BERT adds an adaptive computational mechanism to BERT's regular processing pipeline, which controls the number of Transformer blocks that need to be executed at inference time. By doing this, the model learns to combine the most appropriate intermediate representations for the task at hand. Our experiments demonstrate that, compared to the baselines, our approach excels in a reduced computational regime and is competitive in other, less restrictive ones.
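One simple way to realize an adaptive-depth mixture of intermediate outputs is sketched below (stand-in linear blocks and assumed names; this is not the exact DACT-BERT update rule): each block emits a halting probability, its prediction is blended into a running answer, and later blocks can be skipped once little probability mass remains:

```python
# Minimal sketch: adaptive depth via per-block halting probabilities and a
# running mixture of intermediate predictions. Blocks are stand-in layers.
import torch
import torch.nn as nn

class AdaptiveDepth(nn.Module):
    def __init__(self, dim=32, n_classes=3, n_blocks=6):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_blocks))
        self.heads = nn.ModuleList(nn.Linear(dim, n_classes) for _ in range(n_blocks))
        self.halt = nn.ModuleList(nn.Linear(dim, 1) for _ in range(n_blocks))

    def forward(self, x, eps=1e-2):
        answer, remaining = 0.0, 1.0
        for block, head, halt in zip(self.blocks, self.heads, self.halt):
            x = torch.relu(block(x))
            p = torch.sigmoid(halt(x))                 # probability of stopping at this block
            answer = answer + remaining * p * head(x)  # blend this block's prediction
            remaining = remaining * (1 - p)            # probability mass still "running"
            if float(remaining.max()) < eps:           # skip later blocks once mass is negligible
                break
        return answer

print(AdaptiveDepth()(torch.randn(4, 32)).shape)
```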


Publisher: arXiv, Link >

ABSTRACT:

Current language models are usually trained using a self-supervised scheme, where the main focus is learning representations at the word or sentence level. However, there has been limited progress in generating useful discourse-level representations. In this work, we propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations. As a result, our proposed approach is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network. By experimenting with benchmarks designed to evaluate discourse-related knowledge using pre-trained sentence representations, we demonstrate that our approach improves performance in 6 out of 11 tasks by excelling in discourse relationship detection.
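A generic predictive-coding-style auxiliary objective, sketched under assumptions (a stand-in encoder, a single top-down head, a cosine loss); the paper's actual architecture operates on BERT's intermediate layers and is not reproduced here:

```python
# Minimal sketch: from the representation of sentence t, a top-down head
# predicts the representation of sentence t+1; a cosine loss aligns them.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 64
encoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))  # stand-in encoder
top_down = nn.Linear(dim, dim)                                                # predicts the next sentence

sent = torch.randn(8, dim)                  # placeholder sentence inputs, in document order
h = encoder(sent)                           # intermediate sentence representations
pred_next = top_down(h[:-1])                # predict sentence t+1 from sentence t
loss = 1 - F.cosine_similarity(pred_next, h[1:].detach(), dim=-1).mean()
loss.backward()
print(float(loss))
```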


Publisher: arXiv, Link >

ABSTRACT:

Recently, few-shot video classification has received increasing interest. Current approaches mostly focus on effectively exploiting the temporal dimension in videos to improve learning under low-data regimes. However, most works have largely ignored that videos are often accompanied by rich textual descriptions that can also be an essential source of information to handle few-shot recognition cases. In this paper, we propose to leverage these human-provided textual descriptions as privileged information when training a few-shot video classification model. Specifically, we formulate a text-based task conditioner to adapt video features to the few-shot learning task. Furthermore, our model follows a transductive setting to improve the task-adaptation ability of the model by using the support textual descriptions and query instances to update a set of class prototypes. Our model achieves state-of-the-art performance on four challenging benchmarks commonly used to evaluate few-shot video action classification models.
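A minimal sketch of text-conditioned prototypes with a transductive refinement step, under assumed shapes and a simple soft-assignment update (not the paper's exact model):

```python
# Minimal sketch: class prototypes from support video features, shifted by
# class text features, then refined transductively with soft query assignments.
import torch
import torch.nn.functional as F

def prototypes(support_feats, support_labels, text_feats, query_feats, n_way, steps=3):
    # initial prototypes: per-class mean of support features plus the class text feature
    proto = torch.stack([support_feats[support_labels == c].mean(0) for c in range(n_way)])
    proto = proto + text_feats
    for _ in range(steps):                                        # transductive refinement
        w = F.softmax(-torch.cdist(query_feats, proto), dim=1)    # soft query-to-class assignment
        proto = (proto + w.t() @ query_feats) / (1 + w.sum(0, keepdim=True).t())
    return proto

n_way, dim = 5, 32
sup = torch.randn(25, dim); lab = torch.arange(5).repeat_interleave(5)
txt = torch.randn(n_way, dim); qry = torch.randn(40, dim)
print(prototypes(sup, lab, txt, qry, n_way).shape)                # (5, dim)
```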

