🏢 Zhejiang University
ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging
·1815 words·9 mins·
AI Generated
🤗 Daily Papers
AI Theory
Privacy
🏢 Zhejiang University
Model Merging: An unlearning system that combines specialized models achieves top results in SemEval-2025 Task 4 by selectively erasing sensitive knowledge.
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
·3498 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Embodied AI
🏢 Zhejiang University
Embodied-Reasoner: Integrates visual search, reasoning, and action for interactive tasks, outperforming existing models in embodied environments.
ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems
·2349 words·12 mins·
AI Generated
🤗 Daily Papers
AI Applications
Autonomous Vehicles
🏢 Zhejiang University
ADS-Edit: Empowering autonomous driving with multimodal knowledge editing!
LookAhead Tuning: Safer Language Models via Partial Answer Previews
·2175 words·11 mins·
AI Generated
🤗 Daily Papers
AI Theory
Safety
🏢 Zhejiang University
LookAhead Tuning: Makes LLMs safer by training with partial answer previews, which preserves the model's initial token distributions.
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
·2987 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Zhejiang University
Zero-1-to-A: Animatable avatars from a single image using video diffusion, robust to spatial & temporal inconsistencies!
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
·3386 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Action Recognition
🏢 Zhejiang University
MotionStreamer: Streaming motion generation with a diffusion-based autoregressive model in a causal latent space.
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM
·5843 words·28 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Zhejiang University
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
·3109 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Zhejiang University
DreamRenderer: Taming attribute control in large-scale text-to-image models with a plug-and-play, training-free approach for enhanced content creation.
MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization
·2743 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Zhejiang University
MagicID: ID-consistent & dynamic-preserved video customization via hybrid preference optimization.
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
·2617 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Zhejiang University
ReCamMaster: Re-shoots videos via generative rendering, controlling camera movement from a single source, for novel perspectives and enhanced video creation.
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
·3965 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Segmentation
🏢 Zhejiang University
DICEPTION: A generalist diffusion model for visual perceptual tasks.
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
·3075 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Zhejiang University
LoRAM: Train small, infer large. Memory-efficient LoRA training enables training a 70B-parameter model on a GPU with 20G HBM instead of an A100-80G, and reduces parameter storage cost by 15.81x.
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
·7040 words·34 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Zhejiang University
LLMs’ knowledge acquisition is unveiled through the lens of evolving knowledge circuits, revealing how new knowledge integration depends on relevance to existing knowledge, exhibiting distinct phases …
DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization
·2709 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Text Generation
🏢 Zhejiang University
DreamDPO: Revolutionizing text-to-3D generation by directly aligning outputs with human preferences via innovative preference optimization.
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
·2599 words·13 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Zhejiang University
InfiGUIAgent, a novel multimodal GUI agent, leverages a two-stage training pipeline to achieve advanced reasoning and GUI interaction capabilities, outperforming existing models in benchmarks.
OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System
·379 words·2 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Information Extraction
🏢 Zhejiang University
OneKE: A dockerized, schema-guided LLM agent system that efficiently extracts knowledge from diverse sources, offering adaptability and robust error handling.
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
·3014 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Zhejiang University
Orient Anything: Learning robust object orientation estimation directly from rendered 3D models, achieving state-of-the-art accuracy on real images.
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
·4162 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Zhejiang University
Prompting unlocks 4K metric depth from low-cost LiDAR.
ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality
·2050 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Zhejiang University
ZipAR accelerates autoregressive image generation by up to 91% through parallel decoding leveraging spatial locality in images, making high-resolution image generation significantly faster.
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
·4118 words·20 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Zhejiang University
ScoreLiDAR: Distilling diffusion models for 5x faster, higher-quality 3D LiDAR scene completion!