🏢 Tencent AI Lab
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
·3101 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tencent AI Lab
Hunyuan3D 2.0: An open-source system that generates high-resolution, textured 3D assets with scalable diffusion models and reports performance exceeding the prior state of the art.
XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework
·3087 words·15 mins·
AI Generated
🤗 Daily Papers
Speech and Audio
Music Generation
🏢 Tencent AI Lab
XMusic: A new framework generates high-quality, emotionally controllable symbolic music from various prompts (images, videos, text, tags, humming).
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities
·3972 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tencent AI Lab
CityDreamer4D generates realistic, unbounded 4D city models by cleverly separating dynamic objects (like vehicles) from static elements (buildings, roads), using multiple neural fields for enhanced re…
Scaling Laws for Floating Point Quantization Training
·6363 words·30 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
New scaling laws for efficient floating-point quantization training in LLMs are presented, showing optimal bit allocation and critical data size.
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models
·4442 words·21 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tencent AI Lab
VideoMaker achieves high-fidelity zero-shot customized video generation by cleverly harnessing the inherent power of video diffusion models, eliminating the need for extra feature extraction and injec…
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
·402 words·2 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Machine Translation
🏢 Tencent AI Lab
DRT-o1 leverages long chain-of-thought reasoning to significantly boost machine translation quality, particularly for complex sentences with metaphors and similes, achieving substantial improvements o…
A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression
·4375 words·21 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
This study reveals that gist token-based context compression in LLMs, while effective for some tasks, suffers from key failure patterns. The authors propose fine-grained autoencoding and segment-wise…
FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
·4390 words·21 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tencent AI Lab
FreeSplatter: a novel feed-forward framework reconstructs high-quality 3D scenes from uncalibrated sparse-view images, estimating camera poses in seconds.
EMOv2: Pushing 5M Vision Model Frontier
·6258 words·30 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Classification
🏢 Tencent AI Lab
EMOv2 achieves state-of-the-art performance in various vision tasks using a novel Meta Mobile Block, pushing the 5M parameter lightweight model frontier.
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
·2671 words·13 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tencent AI Lab
Divot: A novel diffusion-powered video tokenizer enables unified video comprehension and generation with LLMs, surpassing existing methods.
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
·3265 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tencent AI Lab
NVComposer: A novel generative NVS model boosts synthesis quality by implicitly inferring spatial relationships from multiple sparse, unposed images, eliminating reliance on external alignment.
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
·2134 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
Boosting LLMs’ reasoning: A novel token-level contrastive estimation method automatically identifies and penalizes critical tokens leading to errors, significantly enhancing reasoning accuracy.
Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding
·2920 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
Self-VerIfication length Policy (SVIP) dynamically adjusts speculative decoding draft lengths based on token difficulty, achieving up to 20% faster large language model inference.
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
·3397 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
New scaling laws show that low-bit quantization works well for undertrained LLMs but struggles with fully trained ones, a finding that can guide future research.
AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation
·2812 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tencent AI Lab
AnchorCrafter animates cyber-anchors selling products via human-object interacting video generation, achieving high visual fidelity and controllable interactions.
Morph: A Motion-free Physics Optimization Framework for Human Motion Generation
·2160 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tencent AI Lab
Morph: a novel motion-free physics optimization framework drastically enhances human motion generation’s physical plausibility using synthetic data, achieving state-of-the-art quality.
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
·2697 words·13 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tencent AI Lab
Insight-V: A multi-agent system enhances multi-modal LLMs’ visual reasoning by generating high-quality long-chain reasoning data and employing a two-stage training pipeline, achieving significant perf…
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
·2454 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tencent AI Lab
StdGEN: Generate high-quality, semantically decomposed 3D characters from a single image in minutes, enabling flexible customization for various applications.
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
·1756 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
Tencent unveils Hunyuan-Large, an open-source MoE LLM with 389B total parameters, 52B of them activated, reported to surpass existing models across various benchmarks.