🏢 Google DeepMind

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

20 February 2025·4915 words·24 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Google DeepMind

SigLIP 2: Multilingual Vision-Language Encoders with Semantic Understanding, Localization, and Dense Features.

Eager Updates For Overlapped Communication and Computation in DiLoCo

18 February 2025·3815 words·18 mins

AI Generated 🤗 Daily Papers Machine Learning Federated Learning 🏢 Google DeepMind

Eager updates drastically speed up training massive language models by cleverly overlapping communication and computation in DiLoCo, achieving near-optimal performance even with low bandwidth.

We Can't Understand AI Using our Existing Vocabulary

11 February 2025·3226 words·16 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Google DeepMind

To understand AI, we need new words! This paper argues that developing neologisms—new words for human & machine concepts—is key to bridging the communication gap and achieving better AI control.

Matryoshka Quantization

10 February 2025·9741 words·46 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Google DeepMind

Matryoshka Quantization (MatQuant) boosts low-precision model accuracy by up to 10% through a novel multi-scale training approach. It leverages the nested structure of integer data types, allowing a …

Agency Is Frame-Dependent

6 February 2025·400 words·2 mins

AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Google DeepMind

Agency, a key concept in AI, is shown to be relative to the observer’s perspective (frame-dependent), challenging traditional binary definitions and necessitating a more nuanced approach for AI system…

Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2

5 February 2025·4637 words·22 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Google DeepMind

AlphaGeometry2 surpasses average IMO gold medalists in solving geometry problems!

On Teacher Hacking in Language Model Distillation

4 February 2025·2783 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Google DeepMind

Language model distillation suffers from ‘teacher hacking’, where student models over-optimize flawed teacher models, degrading true performance. This paper identifies this issue and offers effective…

Improving Transformer World Models for Data-Efficient RL

3 February 2025·2775 words·14 mins

AI Generated 🤗 Daily Papers Machine Learning Reinforcement Learning 🏢 Google DeepMind

AI agents now master complex tasks with improved Transformer World Models, achieving a new state-of-the-art in data-efficient reinforcement learning.

Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

30 January 2025·5509 words·26 mins

AI Generated 🤗 Daily Papers Machine Learning Federated Learning 🏢 Google DeepMind

Streaming DiLoCo achieves two orders of magnitude bandwidth reduction in billion-scale parameter LLM training by synchronizing parameter subsets sequentially, overlapping communication with computatio…

TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space

21 January 2025·4649 words·22 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Google DeepMind

TokenVerse: Extract & combine visual concepts from multiple images for creative image generation!

MSTS: A Multimodal Safety Test Suite for Vision-Language Models

17 January 2025·3786 words·18 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Google DeepMind

New multimodal safety test suite (MSTS) reveals vision-language models’ vulnerabilities and underscores the unique challenges of multimodal inputs.

Evolving Deeper LLM Thinking

17 January 2025·7089 words·34 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Google DeepMind

Mind Evolution, a novel evolutionary search strategy, significantly boosts Large Language Model (LLM) problem-solving by generating, recombining, and refining candidate solutions via an LLM, outperfor…

Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography

15 January 2025·1464 words·7 mins

AI Generated 🤗 Daily Papers AI Theory Privacy 🏢 Google DeepMind

Machine learning models can enable secure computations previously impossible with cryptography, achieving privacy and efficiency in Trusted Capable Model Environments (TCMEs).

Do generative video models learn physical principles from watching videos?

14 January 2025·3121 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Google DeepMind

Generative video models struggle to understand physics despite producing visually realistic videos; Physics-IQ benchmark reveals this critical limitation, highlighting the need for improved physical r…

Deliberation in Latent Space via Differentiable Cache Augmentation

23 December 2024·3569 words·17 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Google DeepMind

Frozen LLMs get a performance boost by augmenting their key-value cache with latent embeddings generated by a differentiable offline coprocessor.

Revisiting In-Context Learning with Long Context Language Models

22 December 2024·4377 words·21 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Google DeepMind

With long-context models, simple random sampling of examples proves as effective as sophisticated selection methods for in-context learning, shifting the focus to efficient context utilization.

LearnLM: Improving Gemini for Learning

21 December 2024·4335 words·21 mins

AI Generated 🤗 Daily Papers AI Applications Education 🏢 Google DeepMind

LearnLM enhances Gemini for education by training it to follow pedagogical instructions, leading to significant preference improvements over GPT-4o, Claude 3.5, and Gemini 1.5 Pro in diverse learning …

PaliGemma 2: A Family of Versatile VLMs for Transfer

4 December 2024·6035 words·29 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Google DeepMind

PaliGemma 2: A family of versatile, open-weight VLMs achieving state-of-the-art results on various transfer tasks by scaling model size and resolution.

CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

27 November 2024·3896 words·19 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Google DeepMind

CAT4D: Create realistic 4D scenes from single-view videos using a novel multi-view video diffusion model.