A Comprehensive Overview of Causal Reasoning for Visual Representation Learning
Category: Computer Science | November 25, 2023, 07:46 UTC

This paper provides a comprehensive overview of causal reasoning for visual representation learning, selectively citing related works, datasets, and insights, and surveying the core components of representation learning. Its main contributions are this overview and a call to bring to the forefront the urgency of developing novel causality-guided visual representation learning methods.
With the emergence of huge amounts of heterogeneous multi-modal data, including images, videos, text/language, audio, and multi-sensor data, deep learning-based methods have shown promising performance on various computer vision and machine learning tasks, such as visual comprehension, video understanding, visual-linguistic analysis, and multi-modal fusion. However, existing methods rely heavily on fitting data distributions and tend to capture spurious correlations across modalities; they thus fail to learn the essential causal relations behind the multi-modal knowledge, which would confer strong generalization and cognitive abilities.
Because most data in the computer vision community are assumed to be independent and identically distributed (i.i.d.), a substantial body of literature has adopted data augmentation, pre-training, self-supervision, and novel architectures to improve the robustness of state-of-the-art deep neural networks. However, it has been argued that such strategies only learn correlation-based patterns (statistical dependencies) from data and may not generalize well once the i.i.d. assumption is violated.
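The failure mode described above can be made concrete with a toy simulation. The setup below is purely illustrative (the feature, labels, and numbers are not from the paper): a model that fits a spurious correlation in i.i.d. training data collapses when the test distribution shifts and that correlation reverses.

```python
import random

random.seed(1)

def make_data(n, corr):
    """Binary label y; a spurious feature agrees with y with probability `corr`."""
    data = []
    for _ in range(n):
        y = random.randint(0, 1)
        spurious = y if random.random() < corr else 1 - y
        data.append((spurious, y))
    return data

def accuracy(data):
    # "Model": predict the label directly from the spurious feature,
    # which is what pure correlation fitting converges to in this setup.
    return sum(int(s == y) for s, y in data) / len(data)

train = make_data(10000, corr=0.95)   # i.i.d. training: correlation holds
shift = make_data(10000, corr=0.05)   # shifted test: correlation reversed
print(accuracy(train))  # high on the training distribution
print(accuracy(shift))  # collapses under distribution shift
```

The point is not the specific classifier but the mechanism: any predictor that latches onto the statistical dependency alone inherits its fragility, which is exactly the gap causal reasoning aims to close.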
Causal reasoning offers a promising alternative to correlation learning: it can uncover the underlying structural knowledge about the data-generating process, allowing models to generalize well under interventions and across different tasks and environments.
Recently, causal reasoning has attracted increasing attention in myriad high-impact domains within computer vision and machine learning, such as interpretable deep learning, causal feature selection, visual comprehension, visual robustness, visual question answering, and video understanding. A common challenge of these causal methods is how to build a strong cognitive model that can fully discover causality and spatial-temporal relations.
In their paper, the researchers aim to provide a comprehensive overview of causal reasoning for visual representation learning, attracting attention, encouraging discussions, and bringing to the forefront the urgency of developing novel causality-guided visual representation learning methods.
Although there are some surveys about causal reasoning, these works are intended for general representation learning tasks such as deconfounding, out-of-distribution (OOD) generalization, and debiasing.
The work is published in the journal Machine Intelligence Research.
Uniquely, this paper focuses on the systematic and comprehensive survey of related works, datasets, insights, future challenges and opportunities for causal reasoning, visual representation learning, and their integration. To present the review more concisely and clearly, this paper selects and cites related works by considering their sources, publication years, impact, and the coverage of different aspects of the topic surveyed in this paper.
Overall, the main contributions of this work are as follows. Firstly, the paper presents the basic concepts of causality, the structural causal model (SCM), the independent causal mechanism (ICM) principle, causal inference, and causal intervention. Then, based on their technical details, related works are reviewed and summarized across three core components of representation learning: pre-training, data augmentation, and post-training.
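The distinction between observing and intervening, which underlies the SCM and causal-intervention concepts listed above, can be sketched with a minimal simulation. The model below is a generic textbook example, not one from the paper: a hidden confounder U drives both a feature X and a label Y, so X and Y are correlated even though X has no causal effect on Y.

```python
import random

random.seed(0)

# Toy SCM: U -> X and U -> Y, with no edge X -> Y.
def sample_observational(n):
    data = []
    for _ in range(n):
        u = random.gauss(0, 1)           # hidden confounder
        x = u + random.gauss(0, 0.1)     # feature tracks U
        y = u + random.gauss(0, 0.1)     # label also tracks U
        data.append((x, y))
    return data

def sample_interventional(n, x_value):
    # do(X = x_value): cut the U -> X edge and fix X by hand.
    data = []
    for _ in range(n):
        u = random.gauss(0, 1)
        y = u + random.gauss(0, 0.1)     # Y still depends only on U
        data.append((x_value, y))
    return data

def mean_y(pairs):
    return sum(y for _, y in pairs) / len(pairs)

obs = sample_observational(10000)
high_x = [p for p in obs if p[0] > 1.0]
print(mean_y(high_x))                          # conditioning: E[Y | X > 1] is far from 0
print(mean_y(sample_interventional(10000, 2.0)))  # intervening: E[Y | do(X=2)] stays near 0
```

Conditioning on a large X raises the expected Y (both are symptoms of U), while intervening on X leaves Y untouched; correlation-based learners capture only the first quantity, whereas SCM-based methods target the second.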