Human-AI Interaction
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
·4964 words·24 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 ByteDance Seed, Tsinghua University
UI-TARS, a novel native GUI agent, achieves state-of-the-art performance using only screenshots as input, eliminating the need for complex agent frameworks and expert-designed workflows.
A3: Android Agent Arena for Mobile GUI Agents
·2276 words·11 mins·
AI Generated
🤗 Daily Papers
AI Applications
Human-AI Interaction
🏢 Hong Kong University of Science and Technology
Android Agent Arena (A3): A novel evaluation platform for mobile GUI agents offering diverse tasks, a flexible action space, and automated LLM-based evaluation, advancing real-world AI agent research.
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
·3633 words·18 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 Shanghai Jiao Tong University
PC Agent: While you sleep, AI works! This AI system uses human cognition transfer to perform complex digital tasks, exceeding the capabilities of existing digital agents by efficiently learning from h…
GUI Agents: A Survey
·360 words·2 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 University of Maryland
A comprehensive survey of GUI agents, categorizing benchmarks, architectures, training methods, and open challenges, providing a unified framework for researchers.
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
·3277 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 SenseTime Research
SOLAMI: enabling immersive, natural interactions with 3D characters via a unified social vision-language-action model and a novel synthetic multimodal dataset.
SketchAgent: Language-Driven Sequential Sketch Generation
·5526 words·26 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 MIT
SketchAgent uses a multimodal LLM to generate dynamic, sequential sketches from textual prompts, enabling collaborative drawing and chat-based editing.
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
·614 words·3 mins·
AI Generated
🤗 Daily Papers
AI Applications
Human-AI Interaction
🏢 Show Lab, National University of Singapore
Claude 3.5 Computer Use: a groundbreaking AI model offering a public-beta graphical user interface (GUI) agent for computer use is comprehensively analyzed in this research. This study provides an out-o…
GazeGen: Gaze-Driven User Interaction for Visual Content Generation
·2843 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Human-AI Interaction
🏢 Harvard University
GazeGen uses real-time gaze tracking to enable intuitive hands-free visual content creation and editing, setting a new standard for accessible AR/VR interaction.
Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks
·6756 words·32 mins·
AI Generated
🤗 Daily Papers
AI Applications
Human-AI Interaction
🏢 Southeast University
Collaborative Assistant for Personalized Exploration (CARE) enhances LLM chatbots for exploratory tasks by combining a multi-agent framework with a structured interface, delivering tailored solutions …
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
·3766 words·18 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 Tsinghua University
AndroidLab, a novel framework, systematically benchmarks Android autonomous agents, improving LLM and LMM success rates on 138 tasks via a unified environment and open-source dataset.
Survey of User Interface Design and Interaction Techniques in Generative AI Applications
·3567 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 UC San Diego
This study provides a comprehensive taxonomy of user interface design and interaction techniques in generative AI, offering valuable insights for developers and researchers aiming to enhance user expe…