Human Mesh Recovery (HMR) aims to reconstruct 3D human pose and shape from 2D observations and is fundamental to human-centric understanding in real-world scenarios. While recent image-based HMR methods such as SAM 3D Body achieve strong robustness on in-the-wild images, they rely on per-frame inference when applied to videos, leading to temporal inconsistency and degraded performance under occlusions. We address these issues without extra training by leveraging the inherent human continuity in videos.
We propose SAM-Body4D, a training-free framework for temporally consistent and occlusion-robust HMR from videos. We first generate identity-consistent masklets using a promptable video segmentation model, then refine them with an Occlusion-Aware module to recover missing regions. The refined masklets guide SAM 3D Body to produce consistent full-body mesh trajectories, while a padding-based parallel strategy enables efficient multi-human inference. Experimental results demonstrate that SAM-Body4D achieves improved temporal stability and robustness in challenging in-the-wild videos, without any retraining.
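The padding-based parallel strategy mentioned above can be illustrated with a minimal sketch: frames containing different numbers of people are padded to a fixed per-frame human count so that the whole clip can be stacked into a single batch. The function name `pad_human_batch` and the array layout below are our own illustrative assumptions, not the released API.

```python
import numpy as np

def pad_human_batch(masklets, max_humans):
    """Pad per-frame human masks to a fixed count so every frame can be
    stacked into one (T, max_humans, H, W) batch for parallel inference.

    masklets: list of (n_t, H, W) boolean arrays, one entry per frame,
    where n_t is the number of people visible in frame t.
    Returns the padded batch plus a validity mask flagging real entries.
    (Illustrative sketch; names and layout are assumptions.)
    """
    H, W = masklets[0].shape[1:]
    batch = np.zeros((len(masklets), max_humans, H, W), dtype=bool)
    valid = np.zeros((len(masklets), max_humans), dtype=bool)
    for t, frame_masks in enumerate(masklets):
        n = min(len(frame_masks), max_humans)
        batch[t, :n] = frame_masks[:n]   # copy real masks
        valid[t, :n] = True              # mark padded slots as invalid
    return batch, valid
```

Downstream inference can then run on `batch` in parallel and use `valid` to discard outputs produced for padding slots.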
Given an input video with human prompts, SAM-Body4D operates through three main modules in a training-free manner. The Masklet Generator derives identity-consistent temporal masklets from the video to provide spatio-temporal tracking cues. The Occlusion-Aware Masklet Refiner enriches these masklets by recovering invisible body regions and stabilizing temporal alignment. Finally, the Mask-Guided HMR module uses the refined masklets as spatial prompts to predict accurate and temporally coherent human meshes across the entire sequence.
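The three-module flow above can be sketched as a simple training-free composition. The function names (`segment`, `refine`, `hmr`) are hypothetical stand-ins for the promptable video segmenter, the occlusion-aware refiner, and the mask-guided HMR model; the sketch only captures how the modules are chained, not the real interfaces.

```python
from typing import Callable, List, Sequence

def sam_body4d_pipeline(
    frames: Sequence,
    prompts: Sequence,
    segment: Callable,  # promptable video segmenter -> per-frame masklets
    refine: Callable,   # occlusion-aware amodal refiner
    hmr: Callable,      # mask-guided single-frame HMR model
) -> List:
    """Training-free composition of the three modules described above.
    All three callables are illustrative stand-ins for the actual models."""
    masklets = segment(frames, prompts)  # identity-consistent masklets
    refined = refine(masklets)           # recover occluded body regions
    # Refined masklets act as spatial prompts for per-frame mesh recovery.
    return [hmr(frame, mask) for frame, mask in zip(frames, refined)]
```

Because no stage is trained or fine-tuned, temporal consistency comes entirely from the masklets threading identity through the per-frame HMR calls.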
Compared to per-frame HMR (Left), occlusion-aware refinement (Right) prevents hallucinated poses and preserves consistent body structure under severe occlusions.
This project is built upon several outstanding works in the community.
In particular, our framework builds directly on SAM 3, Diffusion-VAS, and SAM 3D Body, which provide the foundations for robust segmentation, occlusion-aware amodal completion, and human body mesh recovery, respectively.
We sincerely thank the authors of these works for making their code and ideas publicly available, which greatly facilitated this project.
@article{gao2025sambody4d,
  title   = {SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos},
  author  = {Gao, Mingqi and Miao, Yunqi and Han, Jungong},
  journal = {arXiv preprint arXiv:2512.08406},
  year    = {2025},
  url     = {https://arxiv.org/abs/2512.08406}
}