🏢 Tencent AI Lab

Expanding RL with Verifiable Rewards Across Diverse Domains

31 March 2025·3127 words·15 mins

AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Tencent AI Lab

RL with verifiable rewards is extended to diverse domains such as medicine.

The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding

13 February 2025·2201 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab

LLMs often fail to demonstrate true understanding of concepts, acting as ‘stochastic parrots’, a phenomenon quantified by the PHYSICO benchmark.

Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

30 January 2025·2085 words·10 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab

Large language models (LLMs) often prematurely abandon promising reasoning paths, a phenomenon called ‘underthinking’. This paper introduces a novel metric to quantify this issue and proposes a decodi…

Autonomy-of-Experts Models

22 January 2025·2476 words·12 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab

Autonomy-of-Experts (AoE) lets individual expert modules select their own inputs, eliminating routers and improving both efficiency and accuracy.

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

21 January 2025·3101 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent AI Lab

Hunyuan3D 2.0: A groundbreaking open-source system generating high-resolution, textured 3D assets using scalable diffusion models, exceeding state-of-the-art performance.

XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework

15 January 2025·3087 words·15 mins

AI Generated 🤗 Daily Papers Speech and Audio Music Generation 🏢 Tencent AI Lab

XMusic: A new framework generates high-quality, emotionally controllable symbolic music from various prompts (images, videos, text, tags, humming).

CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities

15 January 2025·3972 words·19 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent AI Lab

CityDreamer4D generates realistic, unbounded 4D city models by cleverly separating dynamic objects (like vehicles) from static elements (buildings, roads), using multiple neural fields for enhanced re…

Scaling Laws for Floating Point Quantization Training

5 January 2025·6363 words·30 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab

New scaling laws for efficient floating-point quantization training in LLMs are presented, showing optimal bit allocation and critical data size.

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

27 December 2024·4442 words·21 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tencent AI Lab

VideoMaker achieves high-fidelity zero-shot customized video generation by cleverly harnessing the inherent power of video diffusion models, eliminating the need for extra feature extraction and injec…

DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought

23 December 2024·402 words·2 mins

AI Generated 🤗 Daily Papers Natural Language Processing Machine Translation 🏢 Tencent AI Lab

DRT-o1 leverages long chain-of-thought reasoning to significantly boost machine translation quality, particularly for complex sentences with metaphors and similes, achieving substantial improvements o…

A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression

23 December 2024·4375 words·21 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab

This study reveals that gist token-based context compression in LLMs, while effective for some tasks, suffers from key failure patterns. The authors propose fine-grained autoencoding and segment-wise…

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

12 December 2024·4390 words·21 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent AI Lab

FreeSplatter: a novel feed-forward framework reconstructs high-quality 3D scenes from uncalibrated sparse-view images, estimating camera poses in seconds.

EMOv2: Pushing 5M Vision Model Frontier

9 December 2024·6258 words·30 mins

AI Generated 🤗 Daily Papers Computer Vision Image Classification 🏢 Tencent AI Lab

EMOv2 achieves state-of-the-art performance in various vision tasks using a novel Meta Mobile Block, pushing the 5M parameter lightweight model frontier.

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

5 December 2024·2671 words·13 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tencent AI Lab

Divot: A novel diffusion-powered video tokenizer enables unified video comprehension & generation with LLMs, surpassing existing methods.

NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images

4 December 2024·3265 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tencent AI Lab

NVComposer: A novel generative NVS model boosts synthesis quality by implicitly inferring spatial relationships from multiple sparse, unposed images, eliminating reliance on external alignment.

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

29 November 2024·2134 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab

Boosting LLMs’ reasoning: A novel token-level contrastive estimation method automatically identifies and penalizes critical tokens leading to errors, significantly enhancing reasoning accuracy.

Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding

27 November 2024·2920 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab

Self-VerIfication length Policy (SVIP) dynamically adjusts speculative decoding draft lengths based on token difficulty, achieving up to 20% faster large language model inference.

Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens

26 November 2024·3397 words·16 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab

Low-bit quantization works well for undertrained LLMs but struggles with fully trained ones, as the new scaling laws presented here show.

AnchorCrafter: Animate CyberAnchors Selling Your Products via Human-Object Interacting Video Generation

26 November 2024·2812 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tencent AI Lab

AnchorCrafter animates cyber-anchors selling products via human-object interacting video generation, achieving high visual fidelity and controllable interactions.

Morph: A Motion-free Physics Optimization Framework for Human Motion Generation

22 November 2024·2160 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tencent AI Lab

Morph: a novel motion-free physics optimization framework substantially improves the physical plausibility of generated human motion using synthetic data, achieving state-of-the-art quality.