Publisher: Elsevier, Data in Brief


The COVID-19 pandemic has underlined the need for reliable information for clinical decision-making and public health policies. As such, evidence-based medicine (EBM) is essential in identifying and evaluating scientific documents pertinent to novel diseases, and the accurate classification of biomedical text is integral to this process. Given this context, we introduce a comprehensive, curated dataset composed of COVID-19-related documents.

This dataset includes 20,047 labeled documents classified into five distinct categories: systematic reviews (SR), primary study randomized controlled trials (PS-RCT), primary study non-randomized controlled trials (PS-NRCT), broad synthesis (BS), and excluded (EXC). The documents, labeled by collaborators from the Epistemonikos Foundation, incorporate information such as document type, title, abstract, and metadata, including PubMed ID, authors, journal, and publication date.

This dataset has been curated by the Epistemonikos Foundation and is not readily accessible through conventional web-scraping methods, which gives it distinctive value in this field of research. The dataset also includes a large evidence repository comprising 427,870 non-COVID-19 documents, likewise categorized into SR, PS-RCT, PS-NRCT, BS, and EXC. This additional collection can serve as a valuable benchmark for subsequent research. The comprehensive nature of this open-access dataset and its accompanying resources is poised to advance evidence-based medicine and facilitate further research in the domain.

Publisher: IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)


Temporal video grounding is a fundamental task in computer vision, aiming to localize a natural language query in a long, untrimmed video. It has a key role in the scientific community, in part due to the large amount of video generated every day. Although we find extensive work in this task, we note that research remains focused on a small selection of video representations, which may lead to architectural overfitting in the long run. To address this issue, we propose an empirical study to investigate the impact of different video features on a classical architecture. We extract features for three well-known benchmarks, Charades-STA, ActivityNet-Captions and YouCookII, using video encoders based on CNNs, temporal reasoning and transformers. Our results show significant differences in the performance of our model by simply changing the video encoder, while also revealing clear patterns and errors derived from the use of certain features, ultimately indicating potential feature complementarity.

Publisher: Frontiers in Neural Circuits


While external stimulation can reliably trigger neuronal activity, cerebral processes can operate independently from the environment. In this study, we conceptualize autogenous cerebral processes (ACPs) as intrinsic operations of the brain that exist on multiple scales and can influence or shape stimulus responses, behavior, homeostasis, and the physiological state of an organism. We further propose that the field should consider exploring to what extent perception, arousal, behavior, or movement, as well as other cognitive functions previously investigated mainly regarding their stimulus–response dynamics, are ACP-driven.

Publisher: CEUR-WS


The extraction and classification of important information from Spanish Electronic Clinical Narratives (ECNs) can be challenging due to the complexity of the clinical text and the limited availability of labeled data. In this paper, we introduce a chunked Named Entity Recognition model designed to parse and classify sections of ECNs into predefined categories. The model aims to improve section identification and classification accuracy within ECNs in the context of the IberLEF ClinAIS Task. Our system achieves promising performance, obtaining a weighted B2 score of 0.6958, which demonstrates its capability to accurately distinguish the boundaries between sections. The paper concludes with a comprehensive analysis of the results, discussing potential implications and suggesting directions for further improvements in clinical text analysis.

Publisher: IEEE Xplore


Autism Spectrum Disorder (ASD) is a neurodevelopmental condition that affects social communication and behavior. Early diagnosis is crucial to enhance the patient’s quality of life through treatments and therapies. In this research, two white matter (WM) fiber bundle segmentation methods are analyzed and compared in terms of their performance and their impact on the results of analyses applied to a database of 37 adolescents: 19 subjects with autism and 18 controls. To achieve this, we segmented deep white matter tracts and computed average diffusion-based indices for each tract, such as Apparent Diffusion Coefficient (ADC), Fractional Anisotropy (FA), and Generalized Fractional Anisotropy (GFA). We applied statistical tests to identify features with significant differences between groups based on the results of the two segmentation methods. Significant differences in diffusion-based indices were found in certain cingulate, thalamic, corticospinal, and corpus callosum fascicles. Furthermore, we performed classification between patients and controls using each fascicle feature independently with the Support Vector Machine (SVM) and Decision Trees (DT) algorithms. Finally, we applied the classifiers to the most relevant features for each segmentation method. Overall, even with the limitations of our small database, we demonstrated that the segmentation algorithm has a high impact on WM tract-based analyses and prediction, with the autoencoder-based algorithm showing better results than a distance-based method.

Publisher: Elsevier, SoftwareX


CoTranslate is a web-based platform designed to efficiently label and review translations from language experts, with the aim of creating high-quality sentence-pair corpora for training neural machine translation models. Built with a Django backend and a ReactJS frontend, the platform fosters collaboration among experts in translating and validating sentences. Focused on developing quality corpora, particularly for low-resource languages, CoTranslate addresses linguistic barriers and enhances translation quality. By streamlining the creation of robust training datasets, CoTranslate holds significant potential to impact the field of machine translation.

Publisher: IEEE Xplore


There is ongoing interest in the dynamics of resting state brain networks (RSNs) as potential predictors of cognitive and behavioural states. Multivariate Autoregressors (MAR) are used to model regional brain activity as a linear combination of past activity in other regions. The coefficients of the MAR are taken as estimates of effective brain connectivity. However, the assumption of stationarity and the large number of coefficients render the MAR impractical for estimating brain networks from standard neuroimaging time-series of limited duration. We propose HsMM-MAR-AC, a novel sparse hybrid discrete-continuous model for the efficient estimation of time-dependent effective brain networks from non-stationary brain activity time-series. Discrete quasi-stationary Brain States, and the fast switching between them, are modelled by a Hidden semi-Markov Model whose continuous emissions are drawn from a sparse MAR. The coefficients of the MAR are restricted by Anatomical Brain Connectivity information in two ways: 1) effective direct connectivity between two brain regions is only considered if the corresponding anatomical connection exists; and 2) the autoregressive lag associated with each connection is based on the fiber length between the two regions, such that only one lag per connection is estimated. We test the accuracy of HsMM-MAR-AC in recovering simulated resting state networks of various durations, and at different thresholds of anatomical restrictions. We demonstrate that HsMM-MAR-AC recovers the RSNs more accurately than the benchmark sliding-window method, with as little as 4 minutes of data. We also show that when the anatomical restrictions are relaxed, longer time-series are needed to estimate the networks, and that estimation becomes computationally infeasible without anatomical restrictions.
HsMM-MAR-AC offers an efficient model for estimating time-dependent Effective Connectivity from neuroimaging data that exploits the advantages of Hidden Markov and MAR models without identifiability problems, excessive demand on data collection, or unnecessary computational effort.
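
The two anatomical restrictions described above can be illustrated with a minimal sketch of a single prediction step of a sparse MAR. This is not the authors' implementation: the function name `mar_step`, the toy matrices, and the choice to pass the lags in directly (rather than derive them from fiber lengths) are all assumptions made for illustration, and the noise term and the Hidden semi-Markov state switching are left out.

```python
def mar_step(history, coeffs, mask, lags):
    """Predict x_i(t) = sum_j mask[i][j] * coeffs[i][j] * x_j(t - lags[i][j]).

    history: list of past activity vectors, history[-1] being time t-1.
    mask:    1 where an anatomical connection j -> i exists, else 0.
    lags:    one integer lag (>= 1) per connection; in the source these
             are based on fiber length, here they are simply given.
    """
    n = len(coeffs)
    t = len(history)
    prediction = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if mask[i][j]:  # restriction 1: only existing anatomical connections
                # restriction 2: a single lag per connection
                total += coeffs[i][j] * history[t - lags[i][j]][j]
        prediction.append(total)
    return prediction


# Hypothetical toy example with three regions and two past time points.
history = [[1.0, 2.0, 3.0],   # activity at t-2
           [4.0, 5.0, 6.0]]   # activity at t-1
coeffs = [[0.5, 0.1, 0.9],
          [0.0, 0.5, 0.2],
          [0.3, 0.0, 0.5]]
mask = [[1, 1, 0],
        [0, 1, 1],
        [1, 0, 1]]
lags = [[1, 2, 1],
        [1, 1, 2],
        [2, 1, 1]]

prediction = mar_step(history, coeffs, mask, lags)
```

Because masked-out connections are skipped entirely, the coefficient 0.9 for the missing connection into the first region never contributes, which is how the mask reduces the number of parameters to estimate.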

Publisher: Elsevier, Expert Systems with Applications


We present a study of an artificial neural architecture that predicts human ocular scanpaths while subjects free-view different image types. The analysis compares several metrics that capture scanpath patterns and measure spatial and temporal errors, such as the MSE, ScanMatch, cross-correlogram peaks, and MultiMatch. Our methodology begins by choosing one architecture and training different parametric models per subject and image type, which allows the models to be adjusted to each person and a given set of images. We find a clear difference in prediction when people free-view images with high visual content (high-frequency contents) and low visual content (no-frequency contents). The input features selected for predicting the scanpath are saliency maps calculated from foveated images together with the past ocular scanpath of each subject, modeled by our architecture called FovSOS-FSD (Foveated Saliency and Ocular Scanpath with Feature Selection and Direct Prediction).

The results of this study could be used to improve the design of gaze-controlled interfaces and virtual reality, to better understand how humans visually explore their surroundings, and to pave the way for future research.

Publisher: Elsevier, Artificial Intelligence


Reinforcement Learning (RL) is a machine learning paradigm wherein an artificial agent interacts with an environment with the purpose of learning behaviour that maximizes the expected cumulative reward it receives from the environment. Reward machines (RMs) provide a structured, automata-based representation of a reward function that enables an RL agent to decompose an RL problem into structured subproblems that can be efficiently learned via off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.
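
To make the automata-based representation concrete, the following is a minimal sketch of a hand-specified reward machine. It does not show the learning procedure proposed in the paper; the class name `RewardMachine`, the event names, and the key-and-door task are hypothetical choices for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions fire on high-level events and emit rewards."""
    initial_state: str
    # Maps (state, event) -> (next_state, reward).
    transitions: dict = field(default_factory=dict)

    def step(self, state, event):
        # Assumption: unlisted (state, event) pairs keep the state and give zero reward.
        return self.transitions.get((state, event), (state, 0.0))

    def run(self, events):
        """Feed a sequence of events and return the final state and summed reward."""
        state, total = self.initial_state, 0.0
        for event in events:
            state, reward = self.step(state, event)
            total += reward
        return state, total


# Hypothetical task: pick up a key, then open a door.
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "key"): ("u1", 0.0),
        ("u1", "door"): ("u_done", 1.0),
    },
)

final_state, total_reward = rm.run(["door", "key", "none", "door"])
```

Each machine state corresponds to one subproblem (here, "find the key" in `u0` and "reach the door" in `u1`), which is the decomposition the abstract refers to. The first `door` event is ignored because the key has not yet been collected.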

Publisher: 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)


Catastrophic forgetting, the phenomenon of forgetting previously learned tasks when learning a new one, is a major hurdle in developing continual learning algorithms. A popular method to alleviate forgetting is to use a memory buffer, which stores a subset of previously learned task examples for use during training on new tasks. The de facto method of filling memory is by randomly selecting previous examples. However, this process could introduce outliers or noisy samples that could hurt the generalization of the model. This paper introduces Memory Outlier Elimination (MOE), a method for identifying and eliminating outliers in the memory buffer by choosing samples from label-homogeneous subpopulations. We show that a space with a high homogeneity is related to a feature space that is more representative of the class distribution. In practice, MOE removes a sample if it is surrounded by samples from different labels. We demonstrate the effectiveness of MOE on CIFAR-10, CIFAR-100, and CORe50, outperforming previous well-known memory population methods.
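
The removal rule stated above (drop a sample if it is surrounded by samples from different labels) can be sketched as a nearest-neighbour filter. This is only an illustration under stated assumptions, not the MOE algorithm as published: the function name `filter_memory`, the use of Euclidean distance, and the parameters `k` and `min_same` are hypothetical.

```python
import math


def filter_memory(samples, k=3, min_same=1):
    """Keep samples with at least `min_same` same-label points among their k nearest neighbours.

    samples: list of (feature_vector, label) pairs.
    """
    kept = []
    for i, (x, y) in enumerate(samples):
        # Distances from sample i to every other sample, with their labels.
        others = [
            (math.dist(x, x2), y2)
            for j, (x2, y2) in enumerate(samples)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        same = sum(1 for _, y2 in others[:k] if y2 == y)
        if same >= min_same:
            kept.append((x, y))
    return kept


# Hypothetical 2-D features: two clean clusters plus one "a" sample inside the "b" cluster.
samples = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
    ((5.5, 5.5), "a"),  # surrounded only by "b" samples
]

kept = filter_memory(samples, k=3, min_same=1)
```

In this toy data the sample at `(5.5, 5.5)` has no same-label point among its three nearest neighbours, so it is the only one dropped from the buffer.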
