🏢 ByteDance

Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model

10 March 2025·3772 words·18 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance

Seedream 2.0: A native Chinese-English bilingual image generation model that understands cultural nuances and excels in text rendering.

MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion

6 February 2025·2840 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 ByteDance

MAGA reformulates existing corpora to massively expand LLM pretraining data, boosting performance across various model sizes while maintaining quality.

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

3 February 2025·2129 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance

OmniHuman-1: Scaling up one-stage conditioned human animation through novel mixed-condition training.

EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion

23 January 2025·2578 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 ByteDance

EchoVideo generates high-fidelity, identity-preserving videos by fusing text and image features, overcoming limitations of prior methods.

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

21 January 2025·4089 words·20 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 ByteDance

Video Depth Anything achieves consistent depth estimation for super-long videos by enhancing Depth Anything V2 with a spatial-temporal head and a novel temporal consistency loss, setting a new state-o…

ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use

5 January 2025·3646 words·18 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 ByteDance

ToolHop: A new benchmark dataset that rigorously evaluates LLMs’ multi-hop tool use, revealing significant challenges and variations across different LLM families.

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

5 December 2024·5538 words·26 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance

Infinity, a novel bitwise autoregressive model, sets new records in high-resolution image synthesis, outperforming top diffusion models in speed and quality.

AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models

5 December 2024·3014 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance

AnyDressing: Customizable multi-garment virtual dressing via a novel latent diffusion model!

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

4 December 2024·5178 words·25 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 ByteDance

TokenFlow: One image tokenizer, mastering both visual understanding & generation!

Ultra-Sparse Memory Network

19 November 2024·5103 words·24 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 ByteDance

UltraMem, a novel ultra-sparse memory network, speeds up LLM inference by 6x compared to MoE while maintaining performance, paving the way for efficient large-scale model deployment.

Randomized Autoregressive Visual Generation

1 November 2024·4145 words·20 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance

Randomized Autoregressive Modeling (RAR) sets a new state-of-the-art in image generation by cleverly introducing randomness during training to improve the model’s ability to learn from bidirectional c…