🏢 Zhejiang University

RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy

31 March 2025·3587 words·17 mins

AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Zhejiang University

RIG: Synergizes reasoning and imagination in an end-to-end generalist policy for embodied agents, improving sample efficiency and generalization.

ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging

27 March 2025·1815 words·9 mins

AI Generated 🤗 Daily Papers AI Theory Privacy 🏢 Zhejiang University

Model Merging: An unlearning system that merges specialized models achieves top results in SemEval-2025 Task 4 by selectively erasing sensitive knowledge.

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

27 March 2025·3498 words·17 mins

AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Zhejiang University

Embodied-Reasoner: Integrates visual search, reasoning, and action for interactive tasks, outperforming existing models in embodied environments.

ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems

26 March 2025·2349 words·12 mins

AI Generated 🤗 Daily Papers AI Applications Autonomous Vehicles 🏢 Zhejiang University

ADS-Edit: Empowering autonomous driving with multimodal knowledge editing!

LookAhead Tuning: Safer Language Models via Partial Answer Previews

24 March 2025·2175 words·11 mins

AI Generated 🤗 Daily Papers AI Theory Safety 🏢 Zhejiang University

LookAhead Tuning: Safer LLMs through partial answer previews that preserve initial token distributions.

Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion

20 March 2025·2987 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University

Zero-1-to-A: Animatable avatars from a single image using video diffusion, robust to spatial & temporal inconsistencies!

MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space

19 March 2025·3386 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision Action Recognition 🏢 Zhejiang University

MotionStreamer: Streaming motion generation with a diffusion-based autoregressive model in a causal latent space.

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM

18 March 2025·5843 words·28 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Zhejiang University

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs

DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models

17 March 2025·3109 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Zhejiang University

DreamRenderer: Taming attribute control in large-scale text-to-image models with a plug-and-play, training-free approach for enhanced content creation.

MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization

16 March 2025·2743 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Zhejiang University

MagicID: ID-consistent & dynamic-preserved video customization via hybrid preference optimization.

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

14 March 2025·2617 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Zhejiang University

ReCamMaster: Re-shoots videos via generative rendering, controlling camera movement from a single source, for novel perspectives and enhanced video creation.

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

24 February 2025·3965 words·19 mins

AI Generated 🤗 Daily Papers Computer Vision Image Segmentation 🏢 Zhejiang University

DICEPTION: A generalist diffusion model for visual perceptual tasks.

Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models

19 February 2025·3075 words·15 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Zhejiang University

LoRAM: Train small, infer large with memory-efficient LoRA training for LLMs. Enables training a 70B-parameter model on a GPU with 20G of HBM instead of an A100-80G, and reduces parameter storage cost by 15.81x.

How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training

16 February 2025·7040 words·34 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Zhejiang University

LLMs’ knowledge acquisition is unveiled through the lens of evolving knowledge circuits, revealing how new knowledge integration depends on relevance to existing knowledge, exhibiting distinct phases …

DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization

5 February 2025·2709 words·13 mins

AI Generated 🤗 Daily Papers Natural Language Processing Text Generation 🏢 Zhejiang University

DreamDPO: Aligns text-to-3D generation with human preferences via direct preference optimization.

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

8 January 2025·2599 words·13 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Zhejiang University

InfiGUIAgent, a novel multimodal GUI agent, leverages a two-stage training pipeline to achieve advanced reasoning and GUI interaction capabilities, outperforming existing models in benchmarks.

OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System

28 December 2024·379 words·2 mins

AI Generated 🤗 Daily Papers Natural Language Processing Information Extraction 🏢 Zhejiang University

OneKE: A dockerized, schema-guided LLM agent system that efficiently extracts knowledge from diverse sources, offering adaptability and robust error handling.

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

24 December 2024·3014 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University

Orient Anything: Learning robust object orientation estimation directly from rendered 3D models, achieving state-of-the-art accuracy on real images.

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

18 December 2024·4162 words·20 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University

Prompting unlocks 4K metric depth from low-cost LiDAR.

ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality

5 December 2024·2050 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Zhejiang University

ZipAR accelerates autoregressive image generation through parallel decoding that exploits spatial locality in images, cutting the number of model forward passes by up to 91% and making high-resolution image generation significantly faster.