🏢 Google DeepMind
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
4915 words · 24 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Google DeepMind
SigLIP 2: multilingual vision-language encoders with improved semantic understanding, localization, and dense features.
Eager Updates For Overlapped Communication and Computation in DiLoCo
3815 words · 18 mins
AI Generated
🤗 Daily Papers
Machine Learning
Federated Learning
🏢 Google DeepMind
Eager updates speed up training of large language models by overlapping communication with computation in DiLoCo, achieving near-optimal performance even at low bandwidth.
We Can't Understand AI Using our Existing Vocabulary
3226 words · 16 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Google DeepMind
To understand AI, we need new words! This paper argues that developing neologisms (new words for human and machine concepts) is key to bridging the communication gap and achieving better AI control.
Matryoshka Quantization
9741 words · 46 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Google DeepMind
Matryoshka Quantization (MatQuant) boosts low-precision model accuracy by up to 10% through a novel multi-scale training approach. It leverages the nested structure of integer data types, allowing a …
Agency Is Frame-Dependent
400 words · 2 mins
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Google DeepMind
Agency, a key concept in AI, is shown to be relative to the observer’s perspective (frame-dependent), challenging traditional binary definitions and necessitating a more nuanced approach for AI system…
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
4637 words · 22 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Google DeepMind
AlphaGeometry2 surpasses average IMO gold medalists in solving geometry problems!
On Teacher Hacking in Language Model Distillation
2783 words · 14 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Google DeepMind
Language model distillation suffers from ‘teacher hacking’, where student models over-optimize flawed teacher models, degrading true performance. This paper identifies this issue and offers effective…
Improving Transformer World Models for Data-Efficient RL
2775 words · 14 mins
AI Generated
🤗 Daily Papers
Machine Learning
Reinforcement Learning
🏢 Google DeepMind
AI agents now master complex tasks with improved Transformer World Models, achieving a new state-of-the-art in data-efficient reinforcement learning.
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
5509 words · 26 mins
AI Generated
🤗 Daily Papers
Machine Learning
Federated Learning
🏢 Google DeepMind
Streaming DiLoCo achieves two orders of magnitude bandwidth reduction in billion-scale parameter LLM training by synchronizing parameter subsets sequentially, overlapping communication with computatio…
TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space
4649 words · 22 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Google DeepMind
TokenVerse: Extract & combine visual concepts from multiple images for creative image generation!
MSTS: A Multimodal Safety Test Suite for Vision-Language Models
3786 words · 18 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Google DeepMind
New multimodal safety test suite (MSTS) reveals vision-language models’ vulnerabilities and underscores the unique challenges of multimodal inputs.
Evolving Deeper LLM Thinking
7089 words · 34 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Google DeepMind
Mind Evolution, a novel evolutionary search strategy, significantly boosts Large Language Model (LLM) problem-solving by generating, recombining, and refining candidate solutions via an LLM, outperfor…
Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography
1464 words · 7 mins
AI Generated
🤗 Daily Papers
AI Theory
Privacy
🏢 Google DeepMind
Machine learning models can enable secure computations previously impossible with cryptography, achieving privacy and efficiency in Trusted Capable Model Environments (TCMEs).
Do generative video models learn physical principles from watching videos?
3121 words · 15 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Google DeepMind
Generative video models struggle to understand physics despite producing visually realistic videos; Physics-IQ benchmark reveals this critical limitation, highlighting the need for improved physical r…
Deliberation in Latent Space via Differentiable Cache Augmentation
3569 words · 17 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Google DeepMind
Frozen LLMs get a performance boost by augmenting their key-value cache with latent embeddings generated by a differentiable offline coprocessor.
Revisiting In-Context Learning with Long Context Language Models
4377 words · 21 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Google DeepMind
Surprisingly, with long-context models, simple random sampling of examples is as effective as sophisticated selection methods for in-context learning, shifting the focus to efficient context utilization.
LearnLM: Improving Gemini for Learning
4335 words · 21 mins
AI Generated
🤗 Daily Papers
AI Applications
Education
🏢 Google DeepMind
LearnLM enhances Gemini for education by training it to follow pedagogical instructions, leading to significant preference improvements over GPT-4o, Claude 3.5, and Gemini 1.5 Pro in diverse learning …
PaliGemma 2: A Family of Versatile VLMs for Transfer
6035 words · 29 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Google DeepMind
PaliGemma 2: A family of versatile, open-weight VLMs achieving state-of-the-art results on various transfer tasks by scaling model size and resolution.
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
3896 words · 19 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Google DeepMind
CAT4D: Create realistic 4D scenes from single-view videos using a novel multi-view video diffusion model.