Felipe Bravo-Márquez

Specialty: Natural language processing, machine learning.
Felipe is an assistant professor at the Universidad de Chile. He has worked on opinion and emotion analysis in social media, and his work has been published in leading conferences and journals. He has also served on the program committees of major conferences in artificial intelligence and natural language processing. https://felipebravom.com/

PUBLICATIONS

Published in: IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

ABSTRACT

Temporal video grounding is a fundamental task in computer vision that aims to localize a natural language query in a long, untrimmed video. It plays a key role in the scientific community, partly because of the large amount of video generated every day. Although there is extensive work on this task, we note that research remains focused on a small selection of video representations, which may lead to architectural overfitting in the long run. To address this issue, we propose an empirical study of the impact of different video features on a classical architecture. We extract features for three well-known benchmarks, Charades-STA, ActivityNet-Captions, and YouCookII, using video encoders based on CNNs, temporal reasoning, and transformers. Our results show significant differences in the performance of our model from simply changing the video encoder. They also reveal clear patterns and errors derived from the use of certain features, ultimately indicating potential feature complementarity.

Agencia Nacional de Investigación y Desarrollo
Edificio de Innovación UC, Floor 2
Vicuña Mackenna 4860
Macul, Chile