
🏢 Alibaba Group

EMO2: End-Effector Guided Audio-Driven Avatar Video Generation
·2205 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Alibaba Group
EMO2 achieves realistic audio-driven avatar video generation by employing a two-stage framework: first generating hand poses directly from audio and then using a diffusion model to synthesize full-bod…
Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions
·2057 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Alibaba Group
Textoon generates vivid 2D cartoon characters from text descriptions in under a minute, streamlining animation workflows.
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution
·1883 words·9 mins
AI Generated 🤗 Daily Papers Speech and Audio Audio Generation 🏢 Alibaba Group
HiFi-SR: A unified generative network achieves high-fidelity speech super-resolution, outperforming existing methods by seamlessly integrating transformer and convolutional components for end-to-end a…
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
·2397 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group
The CodeElo benchmark uses CodeForces to fairly evaluate LLMs’ coding abilities, providing human-comparable Elo ratings and addressing limitations of existing benchmarks.
Evaluating and Aligning CodeLLMs on Human Preference
·3535 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group
CodeArena, a novel benchmark, evaluates code LLMs based on human preferences, revealing performance gaps between open-source and proprietary models, and a large-scale synthetic instruction corpus impr…
A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models
·1730 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group
A two-stage algorithm with a provable scaling law for test-time compute: generate multiple candidate solutions, then compare them in a knockout tournament, so that the failure probability decays exponentially as the number of candidates and comparisons grows.
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
·271 words·2 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Alibaba Group
TeaCache, a training-free method, speeds up video diffusion models by up to 4.41x with minimal quality loss by caching intermediate outputs.
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
·4787 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group
M2RC-EVAL: A new massively multilingual benchmark for repository-level code completion, featuring fine-grained annotations and a large instruction dataset, enabling better evaluation of code LLMs acro…