VT-Intrinsic: Physics-based Decomposition of Reflectance and Shading using
a Single Visible-Thermal Image Pair

Carnegie Mellon University
CVPR 2026

"Absence, the highest form of presence."

— James Joyce

VT-Intrinsic Method Overview

TL;DR: Decomposing scene appearance into reflectance (material albedo) and shading (incident illumination) is a fundamental challenge in vision. What is absent in the visible image—light not reflected by the scene—is absorbed as heat and manifests in the thermal spectrum, which presents disambiguating cues. This work proposes an ordinality-based theory for intrinsic image decomposition to exploit an auxiliary thermal image.

Qualitative Results

Key differences are highlighted with bounding boxes.

Interactive Ordinality Demo

Albedo/shading ordinalities are derived purely from the ordinalities of visible and thermal intensities.

(Interactive demo: click points A and B on the Visible and Thermal images to see their albedo/shading ordinality.)

Ordinality Theory

How do light and heat transport reveal reflectance and shading cues?

[Figure: theory_0]

Given the absorbed heat (𝑆), intrinsic image decomposition becomes well-posed [JoLHT-Video].
However, measuring the heat absorbed from light requires capturing thermal transients with a calibrated thermal video camera under controlled illumination, which is impractical for general use.
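To see why knowing the absorbed heat makes the problem well-posed, consider a toy model (my notation, not the paper's exact formulation): if an opaque point has albedo a and shading s, the visible intensity is V = a·s (reflected light) and the absorbed heat is H = (1 − a)·s, so both unknowns admit a closed-form solution.

```python
# Toy illustration (hypothetical notation, not the paper's model):
# visible intensity V = a * s and absorbed heat H = (1 - a) * s
# together determine (a, s) in closed form, i.e. the decomposition
# is well-posed once the absorbed heat is known.
def decompose(V, H):
    s = V + H   # total incident light = reflected + absorbed
    a = V / s   # albedo = fraction of incident light reflected
    return a, s

a, s = decompose(V=0.6, H=0.4)
print(a, s)  # 0.6 1.0
```

The catch, as noted above, is that H is not directly observable without expensive transient thermal capture.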


What can we do with the absorbed heat?

[Figure: theory_1]

The light and heat transport equations reveal albedo/shading ordinalities of arbitrary point pairs.
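A minimal sketch of the resulting ordinality rules under the toy model above (V = a·s, H = (1 − a)·s; my symbols, not the paper's): when the visible and absorbed-heat orderings of a point pair agree, the shading ordinality is determined (s = V + H), and when they disagree, the albedo ordinality is determined (a = 1 / (1 + H/V)).

```python
def ordinality(VA, HA, VB, HB):
    """Return the intrinsic orderings implied by a point pair (A, B)
    under the toy model V = a*s, H = (1-a)*s (hypothetical notation)."""
    hints = []
    if VA > VB and HA > HB:           # brighter in both channels at A
        hints.append("s_A > s_B")     # s = V + H, so shading order is fixed
    if VA < VB and HA < HB:
        hints.append("s_A < s_B")
    if VA > VB and HA < HB:           # more reflected, less absorbed at A
        hints.append("a_A > a_B")     # a = 1/(1 + H/V), so albedo order is fixed
    if VA < VB and HA > HB:
        hints.append("a_A < a_B")
    return hints

print(ordinality(VA=0.8, HA=0.1, VB=0.3, HB=0.5))  # ['a_A > a_B']
```

Note that each pair constrains only one of the two intrinsic layers; the other remains ambiguous for that pair, which is why dense sampling is needed later.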


How do these ordinalities bridge a thermal image and absorbed heat?

[Figure: theory_2]

Hence, albedo/shading ordinalities are preserved when substituting absorbed heat with thermal image intensity. This enables extracting rich ordinality information from a single thermal image, eliminating the need for transient thermal video under controlled illumination. (Details in paper.)
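The fact being exploited here is order preservation under a monotone map: if thermal intensity is any strictly increasing function of absorbed heat, comparing thermal intensities gives the same answer as comparing absorbed heats. A quick numerical check with an arbitrary increasing response (illustrative only; not the paper's radiometric model):

```python
import random

# Any strictly increasing map H -> T preserves pairwise ordering, so
# ordinality rules stated for absorbed heat H transfer directly to raw
# thermal intensities T. The response below is an arbitrary example.
def thermal_of(H):
    return 2.0 * H ** 0.5 + 0.3  # some strictly increasing sensor response

random.seed(0)
for _ in range(1000):
    HA, HB = random.random(), random.random()
    assert (HA > HB) == (thermal_of(HA) > thermal_of(HB))
print("ordering preserved")
```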


Optimization Method

How do these ordinalities enable intrinsic image decomposition?

[Figure: pipeline]

Our theory enables albedo/shading ordinality queries between any two points in the image. We densely sample these point-pair ordinalities to classify edges as albedo- or shading-dominant, then use these constraints as supervision to optimize randomly initialized CNNs that parameterize albedo and shading (as in Double-DIP, whose network architecture provides a low-level image prior). Note that this is a per-instance optimization pipeline, rather than one based on a diffusion model.
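One way to picture how sampled ordinalities supervise the networks (a NumPy sketch, not the authors' code; the CNN output is replaced here by a plain predicted-albedo array, and the hinge-style ranking loss is my assumption about the constraint form):

```python
import numpy as np

def ranking_loss(pred, pairs, margin=0.05):
    """Hinge ranking loss over sampled ordinality constraints.

    pred  : (H, W) predicted albedo map (stand-in for a CNN output)
    pairs : list of ((yA, xA), (yB, xB), sign), sign = +1 if a_A > a_B
    """
    loss = 0.0
    for (yA, xA), (yB, xB), sign in pairs:
        diff = sign * (pred[yA, xA] - pred[yB, xB])
        loss += max(0.0, margin - diff)  # penalize violated orderings
    return loss / len(pairs)

pred = np.array([[0.9, 0.2],
                 [0.5, 0.5]])
pairs = [((0, 0), (0, 1), +1)]           # constraint: a(0,0) > a(0,1)
print(ranking_loss(pred, pairs))         # 0.0 -- constraint satisfied by margin
```

In the actual pipeline, gradients of such constraint losses would flow back into the network weights at each optimization step; a symmetric set of constraints supervises the shading branch.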

Abstract

Decomposing a scene into its reflectance and shading is challenging due to the lack of extensive ground-truth data for real-world scenes. We introduce a novel physics-based approach for intrinsic image decomposition using a pair of visible and thermal images. We leverage the principle that light not reflected from an opaque surface is absorbed and detected as heat by a thermal camera. This allows us to relate the ordinalities (or relative magnitudes) of visible and thermal image intensities to the ordinalities of shading and reflectance. These ordinalities enable dense self-supervision of an optimizing neural network to recover shading and reflectance. We perform extensive quantitative evaluations with known reflectance and shading under natural and artificial lighting, and qualitative experiments across diverse scenes. The results demonstrate superior performance over both classical physics-based and recent learning-based methods, providing a path toward scalable real-world data curation with supervision.

Citation

@inproceedings{yuan2025vt-intrinsic,
  title     = {VT-Intrinsic: Physics-Based Decomposition of Reflectance and Shading using a Single Visible-Thermal Image Pair},
  author    = {Zeqing Yuan and Mani Ramanagopal and Aswin C. Sankaranarayanan and Srinivasa G. Narasimhan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}

Acknowledgments

This work was partly supported by NSF grant IIS210723 and the NSF-NIFA AI Institute for Resilient Agriculture. We are sincerely grateful to Akihiko Oharazawa for his help with expert annotation, and to Sriram Narayanan and Gaurav Parmar for their insightful discussions.