X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction

2503.21779

Weihao Yu et al.

🤗 2025-03-31

TL;DR
#

Four-dimensional computed tomography (4D CT) captures anatomical changes over time, which is crucial for medical imaging. However, current methods discretize time into phases using respiratory gating devices, causing misalignment and limiting practicality. These methods rely on external hardware, are uncomfortable for patients, and often produce artifacts that obscure fine tissue details. Recent advances such as Neural Radiance Fields (NeRF) and Gaussian Splatting (GS) improve 3D reconstruction but still struggle with continuous motion and hardware dependency.

This paper introduces X2-Gaussian, a novel framework for continuous-time 4D CT reconstruction. It integrates dynamic radiative Gaussian splatting with self-supervised respiratory motion learning. The method models dynamics via a spatiotemporal encoder-decoder predicting time-varying Gaussian deformations, eliminating phase discretization. A physiology-driven consistency loss learns breathing cycles directly from projections. X2-Gaussian achieves state-of-the-art performance and advances high-fidelity 4D CT for clinical imaging.

Key Takeaways
#

Why does it matter?
#

This research addresses the critical need for accurate 4D CT reconstruction, improving clinical workflows and patient comfort by removing hardware dependencies and enabling continuous motion modeling. It opens new avenues for dynamic medical imaging and radiomic feature extraction.

Visual Insights
#

🔼 This figure illustrates the X2-Gaussian framework, which is a novel method for 4D CT reconstruction. It highlights two key components: 1) Dynamic Gaussian motion modeling, which captures continuous anatomical changes over time using a time-dependent deformation field applied to Gaussian splatting. This allows for continuous-time reconstruction rather than discrete phase binning. 2) Self-supervised respiratory motion learning, which automatically learns the patient’s breathing cycle directly from the projection data, eliminating the need for external gating devices. The framework is presented as a flowchart showing the data flow and interactions between these components.
Figure 1: Framework of our X2-Gaussian, which consists of two innovative components: (1) Dynamic Gaussian motion modeling for continuous-time reconstruction; (2) Self-Supervised respiratory motion learning for estimating breathing cycle autonomously.

Method	Patient1		Patient2		Patient3		Patient4		Patient5		Average
Method	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM
FDK [42]	$34.47$	$0.836$	$25.05$	$0.624$	$34.23$	$0.826$	$28.05$	$0.709$	$25.25$	$0.638$	$29.41$	$0.727$
IntraTomo [58]	$40.04$	$0.965$	$30.62$	$0.889$	$33.55$	$0.888$	$33.00$	$0.910$	$32.8$	$0.935$	$34.00$	$0.917$
NeRF [37]	$40.85$	$0.964$	$32.87$	$0.917$	$33.43$	$0.897$	$33.66$	$0.922$	$34.29$	$0.955$	$35.02$	$0.931$
TensoRF [10]	$33.21$	$0.907$	$30.32$	$0.864$	$33.47$	$0.881$	$33.64$	$0.813$	$32.40$	$0.928$	$32.61$	$0.898$
NAF [59]	$38.21$	$0.945$	$31.73$	$0.875$	$34.11$	$0.900$	$33.95$	$0.911$	$31.74$	$0.927$	$33.95$	$0.912$
SAX-NeRF [7]	$37.21$	$0.961$	$31.53$	$0.938$	$36.71$	$0.929$	$34.30$	$0.944$	$33.14$	$0.947$	$34.58$	$0.942$
3D-GS [29]	$34.19$	$0.847$	$22.96$	$0.713$	$32.53$	$0.840$	$26.32$	$0.793$	$29.89$	$0.812$	$29.18$	$0.801$
X-GS [6]	$38.00$	$0.903$	$25.32$	$0.739$	$33.54$	$0.854$	$28.69$	$0.807$	$28.77$	$0.793$	$30.86$	$0.819$
R²-GS [60]	$40.51$	$0.966$	$33.75$	$0.921$	$39.66$	$0.956$	$36.45$	$0.938$	$35.09$	$0.937$	$37.09$	$0.943$
Ours	$\bm{44.6}$	$\bm{0.978}$	$\bm{35.32}$	$\bm{0.935}$	$\bm{43.22}$	$\bm{0.972}$	$\bm{37.18}$	$\bm{0.942}$	$\bm{36.36}$	$\bm{0.947}$	$\bm{39.34}$	$\bm{0.955}$

🔼 Table 1 presents a quantitative comparison of the proposed X2-Gaussian method against several state-of-the-art techniques for 4D CT reconstruction. The comparison is performed on the DIR public dataset, a collection of 4D CT scans from patients with malignant thoracic tumors. The table shows the performance of each method in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) for each of five patients, as well as an average across all patients. Higher PSNR and SSIM values indicate better reconstruction quality.
Table 1: Comparison of our X2-Gaussian with different methods on the DIR dataset.
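As a reminder of what the PSNR columns measure, here is a minimal sketch of the standard PSNR formula in plain Python. The function name `psnr`, the flat-list input, and the `data_range=1.0` default are assumptions made for illustration; this summary does not state the paper's exact evaluation protocol (for example, how volumes are normalized), so the numbers in the table cannot be reproduced from this snippet alone.

```python
import math

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat sequences.

    data_range is the assumed span of valid intensities (1.0 for data in [0, 1]).
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    # Mean squared error over all voxels/pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(data_range ** 2 / mse)

# Hypothetical toy data: a constant error of 0.1 on a [0, 1] scale gives 20 dB.
example_db = psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1])
```

Because PSNR is logarithmic in the mean squared error, a gain of about 2 dB corresponds to roughly a 37% reduction in MSE.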

In-depth insights
#

Continuous 4D-CT
#

Continuous 4D-CT aims to overcome the limitations of traditional phase-binning methods in dynamic medical imaging. Current 4D-CT methods discretize time into a fixed number of phases, which introduces motion misalignment and limits clinical practicality. By modeling continuous anatomical motion directly, without discrete phase binning, the continuous approach enables motion analysis at arbitrary temporal resolution. The core idea is to extend static radiative Gaussian splatting into the temporal domain, capturing the continuous deformation of anatomical structures over time with a spatiotemporal encoder-decoder architecture. The method also eliminates external gating devices: it estimates the breathing period directly from projection data through a physiology-driven periodic consistency constraint, automatically discovering and adapting to patient-specific breathing patterns.

Motion Learning
#

Motion learning is critical in 4D CT reconstruction for accurately capturing dynamic anatomical changes. Traditional methods rely on external gating devices and phase binning, which discretize time and introduce motion artifacts. Newer methods attempt to predict patient-specific motion patterns and improve reconstruction quality, but they often still treat time as discrete. The ideal motion learning approach would be self-supervised, learning directly from projection data without external devices, and capable of modeling continuous anatomical motion throughout the respiratory cycle. It should capture the periodic nature of respiratory motion while adapting to patient-specific breathing patterns. The goal is high-fidelity 4D CT reconstruction for dynamic clinical imaging, enabling better monitoring of respiratory-induced changes and more accurate radiotherapy planning. In practice, this means learning to predict time-varying deformations from the data.

Gaussian Splatting
#

Gaussian Splatting, and 3DGS in particular, was originally conceived for novel view synthesis. It represents scenes and objects with up to millions of 3D Gaussians, and it has spread quickly to applications including scene modeling, SLAM, 3D generation, and medical imaging. Dynamic CT volume reconstruction remains a challenge, however, and existing algorithms struggle with it. Techniques such as X-GS and R²-GS build on Gaussian splatting but have inherent limitations. This study builds on Gaussian Splatting to address those limitations and advance state-of-the-art performance in reconstructing CT volumes.

Periodic Loss
#

A periodic loss in 4D CT reconstruction enforces temporal consistency by exploiting the periodic nature of respiratory motion: images reconstructed at time t should be similar to those at time t + nT, where T is the respiratory period and n is an integer. Penalizing deviations from this periodic behavior regularizes the reconstruction, reduces noise, and improves overall quality, especially in dynamic regions. The effectiveness of the loss hinges on an accurate value of T, which can come from external gating or be learned with self-supervised techniques; the latter can make the model more robust in dynamic scenarios.
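The idea can be sketched in a few lines of plain Python. Everything here is illustrative: `toy_projection` is a hypothetical stand-in for a rendered projection, the mean absolute difference is an assumed choice of distance, and none of the names come from the paper.

```python
import math

def toy_projection(t, true_period=3.0, num_pixels=8):
    """Hypothetical stand-in for a projection rendered at time t (periodic in t)."""
    phase = 2.0 * math.pi * t / true_period
    return [math.sin(phase + 0.3 * k) for k in range(num_pixels)]

def periodic_consistency_loss(render, t, period, n=1):
    """Mean absolute difference between render(t) and render(t + n * period)."""
    now = render(t)
    shifted = render(t + n * period)
    return sum(abs(a - b) for a, b in zip(now, shifted)) / len(now)

# With the correct period the two renderings coincide (up to rounding);
# with a wrong period they differ and the loss is clearly positive.
loss_true = periodic_consistency_loss(toy_projection, t=0.7, period=3.0)
loss_wrong = periodic_consistency_loss(toy_projection, t=0.7, period=2.0)
```

In a self-supervised setting the period would itself be a learnable quantity, so that minimizing this kind of loss pulls the estimate toward the true breathing cycle.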

Clinical Params
#

The paper touches on the potential extraction of clinically relevant parameters, which is a significant step towards practical application in medicine. The ability to quantify Tidal Volume (TV), Minute Ventilation (MV), and I:E Ratio directly from the reconstructed 4D CT volumes opens up avenues for enhanced patient monitoring and treatment personalization. This suggests the method can serve as a powerful tool for radiomic feature analysis, offering insights into disease progression and treatment response. While the paper showcases preliminary results, the integration of such quantitative measures holds promise for improved clinical decision-making and personalized radiotherapy planning, paving the way for a more tailored approach to patient care. This is a crucial area for future research and development.
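To make the three parameters concrete, the sketch below computes them from a sampled lung-volume curve over one breathing cycle, using their standard definitions: tidal volume is the difference between peak and trough volume, minute ventilation is tidal volume times breaths per minute, and the I:E ratio is inspiration time divided by expiration time. The function name, the sample values, and the assumption that the samples cover exactly one cycle are hypothetical; the paper's own extraction procedure is not described in this summary.

```python
def respiratory_parameters(times_s, volumes_ml, period_s):
    """Return (tidal volume in mL, minute ventilation in mL/min, I:E ratio).

    Assumes the samples cover one breathing cycle of length period_s.
    """
    v_min, v_max = min(volumes_ml), max(volumes_ml)
    tidal_volume = v_max - v_min
    # Breaths per minute is 60 / period; multiply by volume per breath.
    minute_ventilation = tidal_volume * (60.0 / period_s)
    # Inspiration runs from the volume trough to the volume peak.
    t_trough = times_s[volumes_ml.index(v_min)]
    t_peak = times_s[volumes_ml.index(v_max)]
    inspiration = (t_peak - t_trough) % period_s
    expiration = period_s - inspiration
    return tidal_volume, minute_ventilation, inspiration / expiration

# Hypothetical lung-volume samples over a 3-second cycle.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
volumes = [3000.0, 3250.0, 3500.0, 3400.0, 3200.0, 3050.0]
tv, mv, ie = respiratory_parameters(times, volumes, period_s=3.0)
```

For these toy samples the cycle has a 500 mL tidal volume, 20 breaths per minute, and an I:E ratio of 1:2.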

More visual insights
#

More on figures

🔼 Figure 2 visualizes the periodic nature of respiratory motion. It shows snapshots of a 3-second respiratory cycle (T=3s), highlighting the cyclical movement of specific anatomical structures. The colored boxes track the same anatomical region across different time points (t) within the cycle and at corresponding points in subsequent cycles (t+nT), demonstrating the repetitive pattern of breathing.
Figure 2: Periodic display of respiratory motion ($T=3$ s). A specific anatomical structure (framed by boxes of the same color) at time $t$ has the same position at time $t+nT$.

🔼 This figure shows the impact of two key techniques, Bounded Cycle Shifts and Log-Space Parameterization, on the learning process of the respiratory cycle period (T). The x-axis represents the number of iterations during training, and the y-axis represents the estimated period (T). The blue line (‘Ours’) demonstrates the stable and accurate convergence to the true respiratory period when both techniques are used. The orange line (‘w/o Bounded Cycle Shifts’) shows the instability where the estimated period oscillates and approaches half the true value. The green line (‘w/o Log-Space Parameterization’) shows significant oscillations in the estimated period. This visualization clearly illustrates the importance of these two methods in ensuring stable and accurate learning of the breathing cycle.
Figure 3: Convergence behavior of the learnable period $\hat{T}$. Without Bounded Cycle Shifts, $\hat{T}$ undergoes wide-ranging oscillations approaching half the true period. Without Log-Space Parameterization, the optimization curve exhibits large oscillations. With both techniques implemented, $\hat{T}$ converges stably and accurately to the correct breathing cycle.
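This summary names the two stabilization techniques without defining them. The sketch below shows one plausible reading, stated as an assumption: log-space parameterization stores the period as `tau` with period `exp(tau)`, so any gradient step keeps the period positive, and bounded cycle shifts restrict the shift multiplier `n` to -1 or +1 while keeping the shifted time inside the scan. The function names and the `scan_duration` argument are hypothetical.

```python
import math
import random

def period_from_log(tau):
    """Map an unconstrained parameter tau to a strictly positive period."""
    return math.exp(tau)

def bounded_shift(t, period, scan_duration, rng):
    """Pick n in {-1, +1} so that t + n * period stays within [0, scan_duration].

    Returns None when neither shift lands inside the scan.
    """
    candidates = [n for n in (-1, 1) if 0.0 <= t + n * period <= scan_duration]
    return rng.choice(candidates) if candidates else None

# Even an overly large update to tau leaves the period positive.
tau = math.log(2.0)
tau -= 5.0
period = period_from_log(tau)

rng = random.Random(0)
early = bounded_shift(0.5, 3.0, 10.0, rng)  # only a forward shift fits
late = bounded_shift(9.5, 3.0, 10.0, rng)   # only a backward shift fits
```

Under this reading, limiting `n` to a single cycle in either direction removes the freedom to satisfy the periodicity constraint with large multiples of a wrong period, which is consistent with the half-period drift shown in the figure when the technique is disabled.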

🔼 Figure 4 presents a qualitative comparison of 4D CT reconstruction results obtained using different methods across coronal, sagittal, and axial views. The images showcase the ability of each method to model dynamic anatomical structures, such as the diaphragm and airways, during respiration. X2-Gaussian, the proposed method, captures the continuous movement of these structures better than existing methods, resulting in clearer and more detailed representations. The difference is particularly evident in finer anatomical details, which are often blurred or lost in other reconstruction approaches.
Figure 4: Qualitative comparison of reconstruction results across coronal, sagittal, and axial planes. Our method shows superior performance in modeling dynamic regions (e.g. diaphragmatic motion and airway deformation) while preserving finer anatomical details compared to existing approaches.

🔼 Figure 5 presents a two-part visualization of the X2-Gaussian model’s performance. Subfigure (a) shows a graph illustrating the relationship between the number of X-ray projections used for reconstruction and the resulting 3D PSNR (Peak Signal-to-Noise Ratio) and 3D SSIM (Structural Similarity Index). The graph demonstrates the improvement in reconstruction quality as the number of projections increases. Subfigure (b) displays the temporal changes in lung volume over a respiratory cycle, as obtained from the 4D CT scan reconstructed by X2-Gaussian. The plot shows the volume variations of both the right and left lungs over time, highlighting the model’s ability to capture the dynamic nature of respiratory motion.
Figure 5: (a) Reconstruction results of X2-Gaussian using different numbers of projections. (b) Temporal variations of lung volume in 4D CT reconstructed by X2-Gaussian.

More on tables

Method	4DLung		SPARE
Method	PSNR	SSIM	PSNR	SSIM
FDK [42]	$27.03$	$0.611$	$14.25$	$0.359$
IntraTomo [58]	$34.28$	$0.939$	$27.29$	$0.871$
TensoRF [10]	$34.55$	$0.937$	$26.91$	$0.857$
NAF [59]	$34.94$	$0.936$	$28.44$	$0.893$
X-GS [6]	$29.62$	$0.705$	$18.20$	$0.442$
R²-GS [60]	$37.31$	$0.952$	$31.12$	$0.908$
Ours	$\bm{38.61}$	$\bm{0.957}$	$\bm{32.24}$	$\bm{0.922}$

🔼 Table 2 presents a quantitative comparison of the proposed X2-Gaussian method against several state-of-the-art 4D CT reconstruction techniques on two publicly available datasets: 4DLung and SPARE. The table shows the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) metrics, which are common image quality assessment measures. Higher PSNR and SSIM values indicate better reconstruction quality. By comparing the performance of X2-Gaussian to other methods across these datasets, the table demonstrates the superior reconstruction capabilities of the proposed approach in terms of both quantitative metrics.
Table 2: Comparison of our X2-Gaussian with different methods on the 4DLung and SPARE datasets.

Method	PSNR	SSIM	Est. error of $T$ (ms)
Ours	$\bm{39.34}$	$\bm{0.955}$	$\bm{5.2}$
- Log-sp. param.	$39.32$	$0.954$	$12.0$
- B. cyc. shifts	$39.28$	$0.954$	$216.8$
- Both	$39.23$	$0.953$	$914.0$

🔼 This table presents an ablation study of how different optimization techniques affect the accuracy of respiratory cycle estimation on the DIR dataset. It compares the proposed method with and without log-space parameterization and bounded cycle shifts, highlighting their importance for accurate and stable period estimation.
Table 3: Results of respiratory cycle estimation and different optimization techniques used on DIR dataset.

Method	PSNR	SSIM
Baseline	$37.09$	$0.943$
+ DGMM	$38.56$	$0.947$
+ DGMM + SSRML	$\bm{39.34}$	$\bm{0.955}$
$\alpha$ = 0.1	$38.86$	$0.952$
$\alpha$ = 0.5	$39.14$	$0.954$
$\alpha$ = 1.0	$\bm{39.34}$	$\bm{0.955}$
$\alpha$ = 2.0	$38.41$	$0.949$

🔼 This table presents the results of ablation studies conducted to analyze the impact of different components and hyperparameters on the performance of the X2-Gaussian model. It specifically examines the contributions of dynamic Gaussian motion modeling (DGMM), self-supervised respiratory motion learning (SSRML), and the weight (α) assigned to the periodic consistency loss (Equation 12). By comparing the performance metrics (PSNR and SSIM) across various configurations, the table helps to quantify the individual contributions of each component and the optimal setting for the hyperparameter α.
Table 4: Ablation studies on components and hyperparameters. DGMM denotes dynamic gaussian motion modeling in Sec. 4.2, and SSRML is self-supervised respiratory motion learning in Sec. 4.3. $\alpha$ is the weight of periodic consistency loss in Eq. 12.

Method	Patient1		Patient2		Patient3		Patient4		Patient5		Average
Method	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM
FDK [42]	$27.36$	$0.646$	$22.98$	$0.410$	$28.48$	$0.662$	$28.76$	$0.654$	$27.59$	$0.684$	$27.03$	$0.611$
IntraTomo [58]	$30.39$	$0.926$	$35.73$	$0.930$	$34.99$	$0.938$	$35.29$	$0.941$	$35.02$	$0.960$	$34.28$	$0.939$
TensoRF [10]	$30.42$	$0.907$	$36.67$	$0.931$	$34.64$	$0.933$	$35.14$	$0.944$	$35.86$	$0.969$	$34.55$	$0.937$
NAF [59]	$30.76$	$0.901$	$37.46$	$0.932$	$34.69$	$0.934$	$35.47$	$0.947$	$36.30$	$0.964$	$34.94$	$0.936$
X-GS [6]	$30.62$	$0.709$	$25.16$	$0.526$	$31.45$	$0.722$	$30.88$	$0.773$	$29.98$	$0.792$	$29.62$	$0.705$
R²-GS [60]	$33.19$	$0.918$	$39.22$	$0.972$	$37.90$	$0.960$	$37.29$	$0.939$	$38.96$	$0.970$	$37.31$	$0.952$
Ours	$34.49$	$0.929$	$\bm{40.44}$	$\bm{0.957}$	$\bm{39.94}$	$\bm{0.966}$	$\bm{38.10}$	$0.943$	$\bm{40.06}$	$\bm{0.973}$	$\bm{38.61}$	$\bm{0.957}$

🔼 This table presents a quantitative comparison of the proposed X2-Gaussian method against several state-of-the-art techniques for 4D CT reconstruction. The evaluation is performed on the 4DLung dataset, a publicly available dataset containing 4D CT scans of patients with lung cancer. The comparison metrics used are PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index), both common measures of image quality. Results are provided for each of five patients in the dataset, as well as an overall average, allowing for a comprehensive assessment of the performance differences between the methods. The methods compared encompass traditional analytical approaches, novel neural radiance field (NeRF)-based methods, and other Gaussian splatting techniques.
Table 5: Comparison of our X2-Gaussian with different methods on the 4DLung dataset.

Method	Patient1		Patient2		Patient3		Average
Method	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM
FDK [42]	$9.85$	$0.232$	$11.85$	$0.229$	$21.04$	$0.616$	$14.25$	$0.359$
IntraTomo [58]	$27.55$	$0.889$	$27.83$	$0.864$	$26.48$	$0.860$	$27.29$	$0.871$
TensoRF [10]	$26.88$	$0.863$	$27.21$	$0.832$	$26.64$	$0.877$	$26.91$	$0.857$
NAF [59]	$28.67$	$0.908$	$29.25$	$0.880$	$27.39$	$0.892$	$28.44$	$0.893$
X-GS [6]	$14.16$	$0.328$	$17.37$	$0.356$	$23.06$	$0.652$	$18.20$	$0.442$
R²-GS [60]	$30.04$	$0.907$	$32.06$	$0.901$	$31.26$	$0.916$	$31.12$	$0.908$
Ours	$\bm{31.38}$	$\bm{0.920}$	$\bm{32.47}$	$\bm{0.907}$	$\bm{32.87}$	$\bm{0.939}$	$\bm{32.24}$	$\bm{0.922}$

🔼 This table presents a quantitative comparison of the proposed X2-Gaussian method against several state-of-the-art techniques for 4D CT reconstruction. The comparison is performed on the SPARE dataset, a publicly available benchmark dataset for 4D lung CT scans. The metrics used for comparison are PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index), which measure the image quality and structural similarity respectively. The results for each patient in the dataset are shown, alongside average values across all patients. This allows for a comprehensive evaluation of the relative performance of different methods in reconstructing 4D CT data from sparse projections.
Table 6: Comparison of our X2-Gaussian with different methods on the SPARE dataset.
