🏢 Tsinghua University

Skinned Motion Retargeting with Dense Geometric Interaction Perception

26 September 2024·2892 words·14 mins

AI Applications Gaming 🏢 Tsinghua University

MeshRet: A novel retargeting framework that uses dense geometric interaction modeling for realistic, artifact-free skinned character animation.

ShowMaker: Creating High-Fidelity 2D Human Video via Fine-Grained Diffusion Modeling

26 September 2024·2221 words·11 mins

Computer Vision Image Generation 🏢 Tsinghua University

ShowMaker: Generating high-fidelity 2D human conversational videos using fine-grained diffusion modeling and 2D key points.

SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation

26 September 2024·2121 words·10 mins

Multimodal Learning Embodied AI 🏢 Tsinghua University

SG-Nav achieves state-of-the-art zero-shot object navigation by leveraging a novel 3D scene graph to provide rich context for LLM-based reasoning.

Semi-Open 3D Object Retrieval via Hierarchical Equilibrium on Hypergraph

26 September 2024·2346 words·12 mins

AI Generated Computer Vision 3D Vision 🏢 Tsinghua University

HERT: a novel framework for semi-open 3D object retrieval using hierarchical hypergraph equilibrium, achieving state-of-the-art performance on four new benchmark datasets.

Scaling Law for Time Series Forecasting

26 September 2024·2211 words·11 mins

Machine Learning Deep Learning 🏢 Tsinghua University

Unlocking the potential of deep learning for time series forecasting: this study reveals a scaling law influenced by dataset size, model complexity, and the crucial look-back horizon, leading to impro…

Scalable Bayesian Optimization via Focalized Sparse Gaussian Processes

26 September 2024·2563 words·13 mins

AI Generated Machine Learning Optimization 🏢 Tsinghua University

FOCALBO, a hierarchical Bayesian optimization algorithm using focalized sparse Gaussian processes, efficiently tackles high-dimensional problems with massive datasets, achieving state-of-the-art perfo…

S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training

26 September 2024·2718 words·13 mins

AI Generated Natural Language Processing Large Language Models 🏢 Tsinghua University

S-STE achieves efficient 2:4 sparse pre-training by introducing a novel continuous pruning function, overcoming the limitations of previous methods and leading to improved accuracy and speed.

RoPINN: Region Optimized Physics-Informed Neural Networks

26 September 2024·2557 words·13 mins

AI Theory Optimization 🏢 Tsinghua University

RoPINN: Revolutionizing Physics-Informed Neural Networks with Region Optimization

Revisiting motion information for RGB-Event tracking with MOT philosophy

26 September 2024·2713 words·13 mins

AI Generated Computer Vision Object Detection 🏢 Tsinghua University

RGB-Event tracker CSAM leverages MOT philosophy for enhanced robustness by integrating appearance and motion information from RGB and event streams, achieving state-of-the-art performance.

ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search

26 September 2024·2799 words·14 mins

AI Generated Natural Language Processing Large Language Models 🏢 Tsinghua University

ReST-MCTS*: A novel LLM self-training method using process reward guided tree search, outperforming existing methods by generating higher-quality reasoning traces for improved model accuracy.

Reprogramming Pretrained Target-Specific Diffusion Models for Dual-Target Drug Design

26 September 2024·1732 words·9 mins

AI Applications Healthcare 🏢 Tsinghua University

Repurposes pretrained target-specific diffusion models for dual-target drug design, achieving zero-shot transfer learning and outperforming existing methods.

ReFIR: Grounding Large Restoration Models with Retrieval Augmentation

26 September 2024·3091 words·15 mins

Computer Vision Image Generation 🏢 Tsinghua University

ReFIR enhances Large Restoration Models’ accuracy by incorporating retrieved images as external knowledge, mitigating hallucination without retraining.

Recovering Complete Actions for Cross-dataset Skeleton Action Recognition

26 September 2024·2959 words·14 mins

Computer Vision Action Recognition 🏢 Tsinghua University

Boosts skeleton action recognition accuracy across datasets by recovering complete actions and resampling, outperforming existing methods.

RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models

26 September 2024·2585 words·13 mins

Computer Vision Image Generation 🏢 Tsinghua University

RealCompo: A novel training-free framework dynamically balances realism and compositionality in text-to-image generation, achieving state-of-the-art results.

Real-world Image Dehazing with Coherence-based Pseudo Labeling and Cooperative Unfolding Network

26 September 2024·2135 words·11 mins

Image Generation 🏢 Tsinghua University

CORUN-Colabator: a novel cooperative unfolding network and coherence-based label generator achieve state-of-the-art real-world image dehazing by effectively integrating physical knowledge and generat…

Q-VLM: Post-training Quantization for Large Vision-Language Models

26 September 2024·2070 words·10 mins

Multimodal Learning Vision-Language Models 🏢 Tsinghua University

Q-VLM: A novel post-training quantization framework significantly compresses large vision-language models, boosting inference speed without sacrificing accuracy.

Prediction with Action: Visual Policy Learning via Joint Denoising Process

26 September 2024·2466 words·12 mins

AI Applications Robotics 🏢 Tsinghua University

PAD, a novel visual policy learning framework, unifies image prediction and robot action in a joint denoising process, achieving significant performance improvements in robotic manipulation tasks.

PhyRecon: Physically Plausible Neural Scene Reconstruction

26 September 2024·2451 words·12 mins

Computer Vision 3D Vision 🏢 Tsinghua University

PhyRecon: A novel neural scene reconstruction method that uses differentiable rendering and physics simulation to produce physically plausible 3D models.

PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning

26 September 2024·3037 words·15 mins

Machine Learning Reinforcement Learning 🏢 Tsinghua University

PEAC: a novel unsupervised pre-training method significantly improves cross-embodiment generalization in reinforcement learning, enabling faster adaptation to diverse robots and tasks.

Parameter-Inverted Image Pyramid Networks

26 September 2024·2381 words·12 mins

Object Detection 🏢 Tsinghua University

Parameter-Inverted Image Pyramid Networks (PIIP) boost image pyramid efficiency by using smaller models for higher-resolution images and larger models for lower-resolution ones, achieving superior per…