🏢 Alibaba Group
ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model
·1950 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Alibaba Group
ChatAnyone: Stylized real-time portrait video generation with hierarchical motion diffusion model.
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
·3002 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Alibaba Group
TaoAvatar: Lifelike talking avatars in AR, using 3D Gaussian Splatting for real-time rendering and high fidelity.
LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
·2424 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Alibaba Group
LHM: Animatable 3D avatars from a single image in seconds.
WritingBench: A Comprehensive Benchmark for Generative Writing
·4038 words·19 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Text Generation
🏢 Alibaba Group
WritingBench: A new benchmark for generative writing evaluation, enhancing LLMs across diverse domains.
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
·1187 words·6 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Alibaba Group
R1-Omni: Reinforcement learning with verifiable rewards (RLVR) improves multimodal emotion recognition, strengthening reasoning and generalization.
EMO2: End-Effector Guided Audio-Driven Avatar Video Generation
·2205 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Alibaba Group
EMO2 achieves realistic audio-driven avatar video generation by employing a two-stage framework: first generating hand poses directly from audio and then using a diffusion model to synthesize full-bod…
Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions
·2057 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Alibaba Group
Textoon: Generates vivid 2D cartoon characters from text descriptions in under a minute, streamlining animation workflows.
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution
·1883 words·9 mins·
AI Generated
🤗 Daily Papers
Speech and Audio
Audio Generation
🏢 Alibaba Group
HiFi-SR: A unified generative network achieves high-fidelity speech super-resolution, outperforming existing methods by seamlessly integrating transformer and convolutional components for end-to-end a…
DiffuEraser: A Diffusion Model for Video Inpainting
·2356 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Alibaba Group
DiffuEraser: A video inpainting model based on Stable Diffusion that surpasses existing methods by injecting priors and improving temporal consistency.
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
·2397 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Alibaba Group
CodeElo: A benchmark that uses CodeForces to fairly evaluate LLMs’ coding abilities, providing human-comparable Elo ratings and addressing limitations of existing benchmarks.
Evaluating and Aligning CodeLLMs on Human Preference
·3535 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Alibaba Group
CodeArena, a novel benchmark, evaluates code LLMs based on human preferences, revealing performance gaps between open-source and proprietary models, and a large-scale synthetic instruction corpus impr…
A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models
·1730 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Alibaba Group
A two-stage algorithm with a provable scaling law for test-time compute: generate multiple candidate solutions, then compare them in a knockout tournament, so that the failure probability decays exponentially as compute grows.
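The generate-then-knockout idea in this summary can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `knockout`, `solve`, `generate`, and `compare` are hypothetical names, the two callables stand in for LLM calls, and any repeated comparisons per pair are omitted for brevity.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def knockout(candidates: List[T], compare: Callable[[T, T], T]) -> T:
    """Return the winner of a single-elimination tournament.

    `compare(a, b)` is a hypothetical judge that returns whichever of the
    two candidates it prefers (in practice this might be an LLM call).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    pool = list(candidates)
    while len(pool) > 1:
        next_round = []
        # Pair up neighbours; the preferred candidate of each pair advances.
        for i in range(0, len(pool) - 1, 2):
            next_round.append(compare(pool[i], pool[i + 1]))
        # With an odd pool size, the last candidate advances unchallenged.
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])
        pool = next_round
    return pool[0]


def solve(generate: Callable[[], T], compare: Callable[[T, T], T], n: int) -> T:
    """Stage 1: sample n candidates. Stage 2: pick one by knockout."""
    candidates = [generate() for _ in range(n)]
    return knockout(candidates, compare)


if __name__ == "__main__":
    # Toy stand-ins: candidates are numbers, and the judge prefers the larger.
    guesses = iter([3, 1, 4, 1, 5, 9, 2, 6])
    winner = solve(lambda: next(guesses), max, n=8)
    print(winner)
```

With a perfectly reliable judge, as in the toy example, the best candidate always wins; with a noisy judge, more candidates and more comparisons per pair are the knobs that trade compute for accuracy.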
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
·271 words·2 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Alibaba Group
TeaCache: A training-free method that speeds up video diffusion models by up to 4.41x with minimal quality loss by caching intermediate outputs.
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
·4787 words·23 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Alibaba Group
M2RC-EVAL: A new massively multilingual benchmark for repository-level code completion, featuring fine-grained annotations and a large instruction dataset, enabling better evaluation of code LLMs acro…