🏢 ByteDance

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
·4089 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 ByteDance
Video Depth Anything achieves consistent depth estimation for super-long videos by enhancing Depth Anything V2 with a spatial-temporal head and a novel temporal consistency loss, setting a new state-o…
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use
·3646 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 ByteDance
ToolHop, a new benchmark dataset, rigorously evaluates LLMs’ multi-hop tool use, revealing significant challenges and variation across LLM families.
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
·5538 words·26 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance
Infinity, a novel bitwise autoregressive model, sets new records in high-resolution image synthesis, outperforming top diffusion models in speed and quality.
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
·3014 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance
AnyDressing: Customizable multi-garment virtual dressing via a novel latent diffusion model!
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
·5178 words·25 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 ByteDance
TokenFlow: One image tokenizer, mastering both visual understanding & generation!
Ultra-Sparse Memory Network
·5103 words·24 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 ByteDance
UltraMem, a novel ultra-sparse memory network, drastically speeds up LLM inference by 6x compared to MoE while maintaining performance, paving the way for efficient large-scale model deployment.
Randomized Autoregressive Visual Generation
·4145 words·20 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance
Randomized Autoregressive Modeling (RAR) sets a new state-of-the-art in image generation by cleverly introducing randomness during training to improve the model’s ability to learn from bidirectional c…