TimeWalker: Personalized Neural Space for Lifelong Head Avatars

Video

Abstract

We present TimeWalker, a new framework that models realistic, full-scale 3D head avatars of a person on a lifelong scale. Unlike current human head avatar pipelines that capture a person's identity only at the momentary level (i.e., a single photograph or a short video), TimeWalker constructs a person's comprehensive identity from unstructured data collected over his/her various life stages, offering a paradigm for full reconstruction and animation of that person at different moments of life. At the heart of TimeWalker is a novel neural parametric model that learns a personalized representation with shape, expression, and appearance disentangled across ages. Our methodology rests on two key ideas: (1) We return to the principle of modeling a person's identity as an additive combination of his/her average head representation in the canonical space and moment-specific head attribute representations derived from a set of neural head bases. To learn a set of head bases that compactly represents the comprehensive head variations of the target person, we propose a Dynamic Neural Basis-Blending Module (Dynamo). It dynamically adjusts the number and blend weights of the neural head bases according to both the shared and the specific traits of the target person across ages. (2) We introduce Dynamic 2D Gaussian Splatting (DNA-2DGS), an extension of the Gaussian splatting representation, to model head motion deformations such as facial expressions without losing realism in rendering and full-head reconstruction. DNA-2DGS comprises a set of controllable, oriented 2D planar Gaussian disks that use priors from a parametric morphable face model and move and rotate as the expression changes.
Through extensive experimental evaluations, we show TimeWalker's ability to reconstruct and animate avatars across decoupled dimensions with realistic rendering effects, demonstrating a way to achieve personalized time traveling in a breeze.
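The additive formulation in (1) can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax normalization of blend weights, the fixed number of bases, and the names `blend_head`, `mean_head`, `bases`, and `scores` are assumptions made for illustration. Dynamo as described above also adjusts the number of bases, which this sketch does not model.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def blend_head(mean_head, bases, scores):
    """Return mean_head + sum_k w_k * bases[k], with w = softmax(scores).

    mean_head: average head representation (hypothetical flat feature vector)
    bases:     list of head bases, each the same length as mean_head
    scores:    one unnormalized blend score per basis (assumed parameterization)
    """
    weights = softmax(scores)
    out = list(mean_head)
    for w, basis in zip(weights, bases):
        for i, b in enumerate(basis):
            out[i] += w * b
    return out

# Toy example: two bases blended with equal weight for one moment of life.
mean_head = [0.0, 1.0, 2.0]
bases = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
moment = blend_head(mean_head, bases, scores=[0.0, 0.0])
```

Changing `scores` while keeping `mean_head` and `bases` fixed yields a different moment-specific representation of the same person, which is the sense in which the shared and the moment-specific parts are separated.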

Overview

TimeWalker constructs a comprehensive identity and achieves full reconstruction and animation of that identity across different life stages. In (a) Neural Head Basis, we obtain a deformation value from Dynamic Neural Basis-Blending and a neural deformation field consisting of an MLP network and a residual embedding. This deformation is applied to the Gaussian surfels, which are defined and initialized in canonical space, to produce moment-specific head avatars. We further introduce a motion warping field that warps the Gaussian kernels according to an expression or shape signal. For mesh reconstruction, in (b) Dynamic 2D Gaussian Splatting, we extract features from the moment-specific rendering result and apply Poisson meshing to reconstruct a static mesh; the motion warping field is then applied with shape or expression conditions to drive the mesh through different motions.
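To make the idea of planar Gaussian disks that move and rotate with the face model concrete, the sketch below attaches one disk to one triangle of a morphable-model mesh and recomputes its frame after the triangle moves. Binding a disk to a single triangle is an assumption borrowed from common mesh-rigged splatting setups, not a confirmed detail of DNA-2DGS, and the helper `disk_frame` is hypothetical.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def disk_frame(v0, v1, v2):
    """Center and orthonormal frame of a planar disk lying on triangle (v0, v1, v2).

    Returns (center, tangent, bitangent, normal). The disk spans the
    tangent/bitangent plane, and the normal is its orientation.
    """
    center = [(a + b + c) / 3.0 for a, b, c in zip(v0, v1, v2)]
    tangent = normalize(sub(v1, v0))
    normal = normalize(cross(sub(v1, v0), sub(v2, v0)))
    bitangent = cross(normal, tangent)
    return center, tangent, bitangent, normal

# Neutral-expression triangle, then the same triangle rotated 90 degrees about z.
neutral = disk_frame([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
moved = disk_frame([0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0])
```

Because the frame is recomputed from the deformed vertices, the disk follows the surface rigidly; any learned per-disk offsets or the motion warping field described above would be applied on top of this.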

Life Stage

The model's ability to walk through a long age span of a person without losing rendering realism, and to represent multiple appearances with large differences in skin color.

Unseen Expression

The model's ability to reenact unseen expressions.

Personalized Space

Multidimensional disentangled animation of a single ID, covering life stage, shape, expression, and novel view.

Cross Lifestage Reenactment

Expression reenactment results across different life stages of the same ID.

Disentangled Animation

Life Stage:
Expression:
Shape:
View:

Cross ID Reenactment

The model's ability to reenact novel and extrapolated expressions.

Source IDs are from the RenderMe-360 and NeRSemble datasets.

Comparison with SOTA Methods

"1" means: One model for each identity, encompassing multiple appearances

"N" means: N models for each identity, one model for each appearance

#Protocol-1 (1 vs 1): For all methods, we train one separate model for each identity.

#Protocol-2 (1 vs N): For our method, we train one model for each identity; for the SOTA models, we train N models for each identity (one for each appearance).
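The difference between the "1" and "N" settings comes down to how many models are trained per identity. The helper below is a hypothetical illustration; the identity and appearance counts are made-up numbers, not figures from the evaluation.

```python
def models_to_train(num_identities, appearances_per_identity, one_model_per_identity):
    """Number of models trained under the "1" or "N" setting.

    "1": one model per identity, covering all of its appearances.
    "N": one model per appearance, so N models per identity.
    """
    if one_model_per_identity:
        return num_identities
    return num_identities * appearances_per_identity

# Hypothetical setup: 3 identities with 4 appearances each.
ours = models_to_train(3, 4, one_model_per_identity=True)       # 3 models
baseline = models_to_train(3, 4, one_model_per_identity=False)  # 12 models
```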

Downstream Application

3D edited results produced by DGE (Direct Gaussian 3D Editing by Consistent Multi-view Editing)

Original Rendering

Text Prompt: Make the man wear one fashion sunglass

Text Prompt: Make the man wear white beard

Text Prompt: Add smile

Text Prompt: Add curly short red hair

BibTeX

If you find our work helpful, please consider citing:

      @misc{pan2024timewalkerpersonalizedneuralspace,
        title={TimeWalker: Personalized Neural Space for Lifelong Head Avatars}, 
        author={Dongwei Pan and Yang Li and Hongsheng Li and Kwan-Yee Lin},
        year={2024},
        eprint={2412.02421},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2412.02421}, 
      }