
🏢 Fudan University

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation
·3389 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Fudan University
VidCRAFT3 enables high-quality image-to-video generation with precise control over camera movement, object motion, and lighting.
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
·3961 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Fudan University
VideoRoPE enhances video processing in Transformer models by introducing a novel 3D rotary position embedding that preserves spatio-temporal relationships, resulting in superior performance across var…
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
·4105 words·20 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Fudan University
Agent-R: A novel self-training framework enables language model agents to learn from errors by dynamically constructing training data that corrects erroneous actions, resulting in significantly improv…
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback
·3489 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Fudan University
DOLPHIN: AI automates scientific research from idea generation to experimental validation.
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
·276 words·2 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Fudan University
LiFT leverages human feedback, including reasoning, to effectively align text-to-video models with human preferences, significantly improving video quality.
BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments
·6027 words·29 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Fudan University
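BitStack provides fine-grained size control for compressed large language models, adjusting model size dynamically to fit variable memory environments.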