In IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society), by Yi Tian, Xiyun Wang, Sihui Zhang, Wanru Xu, Yi Jin, Yaping Huang
Gaze estimation aims to predict a 3D gaze direction or a 2D gaze point from a face or eye image. To improve the generalization of gaze estimation models to unseen users, existing methods either disentangle the personalized information of all subjects from their gaze features or integrate unrefined personalized information into blended embeddings. Neither methodology handles personalized information rigorously, and their performance remains unsatisfactory. In this paper, we put forward a comprehensive perspective, termed 'Disengage AND Integrate', for handling personalized information: for a specified user, irrelevant personalized information should be discarded while relevant information should be retained. Accordingly, we propose a novel Personalized Causal Network (PCNet) for generalizable gaze estimation. PCNet adopts a two-branch framework consisting of a subject-deconfounded appearance sub-network (SdeANet) and a prototypical personalization sub-network (ProPNet). The SdeANet explores the causalities among facial images, gazes, and personalized information, and extracts a subject-invariant appearance-aware feature from each image by means of causal intervention. The ProPNet characterizes customized personalization-aware features of arbitrary users with the help of a prototype-based subject-identification task. Furthermore, the whole PCNet is optimized in a hybrid episodic training paradigm, which further improves its adaptability to new users. Experiments on three challenging datasets, covering both within-domain and cross-domain gaze estimation tasks, demonstrate the effectiveness of our method.