🏢 Microsoft Research
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
·3910 words·19 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
Small language models can master complex math reasoning using self-evolved deep thinking via Monte Carlo Tree Search, surpassing larger models in performance.
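The summary above mentions deep thinking via Monte Carlo Tree Search over reasoning steps. As a generic, hedged sketch of that search loop (not rStar-Math's actual algorithm; `expand` and `reward` are placeholders standing in for its policy and reward models):

```python
import math
import random

# Generic MCTS over reasoning "steps" -- an illustrative sketch only.
# `expand` and `reward` are hypothetical stand-ins for a policy model
# (proposes next steps) and a reward model (scores a trace).

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # partial reasoning trace (list of steps)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb(node, c=1.4):
    # Upper Confidence Bound: balances exploitation vs. exploration.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def mcts(root, expand, reward, iters=100):
    for _ in range(iters):
        # 1. Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=ucb)
        # 2. Expansion: generate candidate next steps.
        for step in expand(node.state):
            node.children.append(Node(node.state + [step], parent=node))
        leaf = random.choice(node.children) if node.children else node
        # 3. Simulation: score the (partial) trace.
        r = reward(leaf.state)
        # 4. Backpropagation: update statistics up to the root.
        while leaf is not None:
            leaf.visits += 1
            leaf.value += r
            leaf = leaf.parent
    # Return the most-visited first step.
    return max(root.children, key=lambda n: n.visits)

# Toy usage: "steps" are numbers; the reward prefers traces summing to 10.
best = mcts(Node([]),
            expand=lambda s: [1, 2, 3] if len(s) < 4 else [],
            reward=lambda s: 1.0 if sum(s) == 10 else 0.0,
            iters=200)
print(best.state)
```

The same selection/expansion/simulation/backpropagation skeleton applies when the steps are model-generated reasoning text rather than numbers.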
GeAR: Generation Augmented Retrieval
·1952 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Microsoft Research
GeAR, a new retrieval model, boosts accuracy by combining document retrieval with fine-grained information generation, leading to better understanding and improved localization.
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design
·2482 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
MixLLM achieves state-of-the-art LLM compression by using mixed-precision quantization between output features, improving accuracy and system efficiency.
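The idea of mixing precisions across output features can be illustrated with a small sketch (a generic illustration, not MixLLM's actual algorithm; here each output channel's L2 norm is a hypothetical salience proxy, with salient channels kept at 8 bits and the rest at 4 bits):

```python
import numpy as np

def quantize_channel(w, bits):
    # Symmetric per-channel quantization to `bits` signed levels,
    # returned in dequantized form for easy error inspection.
    qmax = 2 ** (bits - 1) - 1
    wmax = np.abs(w).max()
    scale = wmax / qmax if wmax > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def mixed_precision_quantize(W, frac_high=0.1):
    # W: [out_features, in_features]. Keep the top `frac_high` most
    # salient output channels at 8 bits, quantize the rest to 4 bits.
    salience = np.linalg.norm(W, axis=1)
    k = max(1, int(frac_high * W.shape[0]))
    high = set(np.argsort(salience)[-k:].tolist())
    Wq = np.empty_like(W)
    for i in range(W.shape[0]):
        Wq[i] = quantize_channel(W[i], 8 if i in high else 4)
    return Wq

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))
Wq = mixed_precision_quantize(W)
err = np.abs(W - Wq).mean()
print(f"mean abs quantization error: {err:.4f}")
```

Spending extra bits only on the few channels that dominate the output is what lets this style of scheme approach 8-bit accuracy at close to 4-bit storage cost.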
VidTok: A Versatile and Open-Source Video Tokenizer
·2918 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Microsoft Research
VidTok: a versatile, open-source, top-performing video tokenizer.
Phi-4 Technical Report
·2630 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
Phi-4: a 14B-parameter LLM that surpasses its teacher model (GPT-4) on STEM-focused QA through innovative synthetic-data generation and post-training techniques.
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation
·5249 words·25 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Microsoft Research
OLA-VLM boosts multimodal LLMs’ visual understanding by distilling knowledge from specialized visual encoders into the LLM’s internal representations during pretraining, achieving significant performance gains.
Multimodal Latent Language Modeling with Next-Token Diffusion
·4442 words·21 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Generation
🏢 Microsoft Research
LatentLM: a novel multimodal model unifying discrete & continuous data via next-token diffusion, surpassing existing methods in performance & scalability across various tasks.
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
·3412 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Microsoft Research
Florence-VL enhances vision-language models by incorporating a generative vision encoder and a novel depth-breadth fusion architecture, achieving state-of-the-art results on various benchmarks.
MH-MoE: Multi-Head Mixture-of-Experts
·1899 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
MH-MoE: a novel implementation of Multi-Head Mixture-of-Experts that achieves superior performance in large language models while matching the parameter count and computational cost of sparse Mixture-of-Experts baselines.
LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation
·2445 words·12 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Microsoft Research
LLM2CLIP boosts CLIP’s performance by cleverly integrating LLMs, enabling it to understand longer, more complex image captions and achieving state-of-the-art results across various benchmarks.
BitNet a4.8: 4-bit Activations for 1-bit LLMs
·2844 words·14 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Microsoft Research
BitNet a4.8 matches the performance of existing 1-bit LLMs while delivering significantly faster inference, using a hybrid quantization and sparsification strategy for 4-bit activations.
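The general idea behind combining quantization with sparsification can be sketched as follows (a hedged, generic illustration, not the BitNet a4.8 kernel): outlier activations are kept in full precision as a sparse residual, and the dense remainder is quantized to 4 bits.

```python
import numpy as np

def quant4_with_sparse_outliers(x, outlier_frac=0.02):
    # Split off the largest-magnitude `outlier_frac` of values as a
    # sparse full-precision residual; 4-bit quantize the dense rest.
    k = max(1, int(outlier_frac * x.size))
    thresh = np.partition(np.abs(x).ravel(), -k)[-k]   # magnitude cutoff
    outlier_mask = np.abs(x) >= thresh
    dense = np.where(outlier_mask, 0.0, x)             # outliers zeroed
    qmax = 7                                           # signed 4-bit: [-8, 7]
    dmax = np.abs(dense).max()
    scale = dmax / qmax if dmax > 0 else 1.0
    q = np.clip(np.round(dense / scale), -8, qmax).astype(np.int8)
    sparse = np.where(outlier_mask, x, 0.0)            # fp residual
    return q, scale, sparse

def dequantize(q, scale, sparse):
    return q.astype(np.float32) * scale + sparse

rng = np.random.default_rng(1)
x = rng.normal(size=4096).astype(np.float32)
x[::512] *= 20.0                       # inject a few activation outliers
q, scale, sparse = quant4_with_sparse_outliers(x)
xh = dequantize(q, scale, sparse)
print(f"max abs error: {np.abs(x - xh).max():.4f}")
```

Removing the outliers first keeps the 4-bit scale small, so the bulk of the activations quantize with low error while the rare large values survive exactly.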