🏢 Alibaba Group

ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model

27 March 2025·1950 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Alibaba Group

ChatAnyone: Stylized real-time portrait video generation with hierarchical motion diffusion model.

TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

21 March 2025·3002 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Alibaba Group

TaoAvatar: Lifelike talking avatars in AR, using 3D Gaussian Splatting for real-time rendering and high fidelity.

LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds

13 March 2025·2424 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Alibaba Group

LHM: Animatable 3D avatars from a single image in seconds.

WritingBench: A Comprehensive Benchmark for Generative Writing

7 March 2025·4038 words·19 mins

AI Generated 🤗 Daily Papers Natural Language Processing Text Generation 🏢 Alibaba Group

WritingBench: A new benchmark for evaluating the generative writing of LLMs across diverse domains.

R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning

7 March 2025·1187 words·6 mins

AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 Alibaba Group

R1-Omni: Reinforcement Learning with Verifiable Reward (RLVR) improves multimodal emotion recognition, boosting reasoning and generalization.

EMO2: End-Effector Guided Audio-Driven Avatar Video Generation

18 January 2025·2205 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Alibaba Group

EMO2 achieves realistic audio-driven avatar video generation by employing a two-stage framework: first generating hand poses directly from audio and then using a diffusion model to synthesize full-bod…

Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions

17 January 2025·2057 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Alibaba Group

Textoon: Generates vivid 2D cartoon characters from text descriptions in under a minute, streamlining the animation workflow.

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution

17 January 2025·1883 words·9 mins

AI Generated 🤗 Daily Papers Speech and Audio Audio Generation 🏢 Alibaba Group

HiFi-SR: A unified generative network achieves high-fidelity speech super-resolution, outperforming existing methods by seamlessly integrating transformer and convolutional components for end-to-end a…

DiffuEraser: A Diffusion Model for Video Inpainting

17 January 2025·2356 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Alibaba Group

DiffuEraser: A video inpainting model based on Stable Diffusion that surpasses existing methods by injecting priors and improving temporal consistency.

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

2 January 2025·2397 words·12 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group

CodeElo: A benchmark that uses CodeForces to fairly evaluate LLMs' coding abilities, providing human-comparable Elo ratings and addressing limitations of existing benchmarks.

Evaluating and Aligning CodeLLMs on Human Preference

6 December 2024·3535 words·17 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group

CodeArena, a novel benchmark, evaluates code LLMs based on human preferences, revealing performance gaps between open-source and proprietary models, and a large-scale synthetic instruction corpus impr…

A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models

29 November 2024·1730 words·9 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group

A two-stage algorithm with provable scaling laws for test-time compute: generate multiple candidate solutions, then compare them in a knockout tournament, so that the failure probability decays exponentially as compute grows.

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

28 November 2024·271 words·2 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Alibaba Group

TeaCache: A training-free method that speeds up video diffusion models by up to 4.41x with minimal quality loss by caching intermediate outputs.

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

28 October 2024·4787 words·23 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group

M2RC-EVAL: A new massively multilingual benchmark for repository-level code completion, featuring fine-grained annotations and a large instruction dataset, enabling better evaluation of code LLMs acro…