Human-AI Interaction
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
·4964 words·24 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 ByteDance Seed, Tsinghua University
UI-TARS, a novel native GUI agent, achieves state-of-the-art performance using only screenshots as input, eliminating the need for complex agent frameworks and expert-designed workflows.
A3: Android Agent Arena for Mobile GUI Agents
·2276 words·11 mins·
AI Generated
🤗 Daily Papers
AI Applications
Human-AI Interaction
🏢 Hong Kong University of Science and Technology
Android Agent Arena (A3): A novel evaluation platform for mobile GUI agents offering diverse tasks, a flexible action space, and automated LLM-based evaluation, advancing real-world AI agent research.
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
·3633 words·18 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 Shanghai Jiao Tong University
PC Agent: While you sleep, AI works! This AI system uses human cognition transfer to perform complex digital tasks, exceeding the capabilities of existing digital agents by efficiently learning from h…
GUI Agents: A Survey
·360 words·2 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 University of Maryland
A comprehensive survey of GUI agents, categorizing benchmarks, architectures, training methods, and open challenges, providing a unified framework for researchers.
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
·3277 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 SenseTime Research
SOLAMI: enabling immersive, natural interactions with 3D characters via a unified social vision-language-action model and a novel synthetic multimodal dataset.
SketchAgent: Language-Driven Sequential Sketch Generation
·5526 words·26 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 MIT
SketchAgent uses a multimodal LLM to generate dynamic, sequential sketches from textual prompts, enabling collaborative drawing and chat-based editing.
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
·614 words·3 mins·
AI Generated
🤗 Daily Papers
AI Applications
Human-AI Interaction
🏢 Show Lab, National University of Singapore
Claude 3.5 Computer Use: a groundbreaking AI model offering a public-beta graphical user interface (GUI) agent for computer use is comprehensively analyzed in this research. This study provides an out-o…
GazeGen: Gaze-Driven User Interaction for Visual Content Generation
·2843 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Human-AI Interaction
🏢 Harvard University
GazeGen uses real-time gaze tracking to enable intuitive hands-free visual content creation and editing, setting a new standard for accessible AR/VR interaction.
Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks
·6756 words·32 mins·
AI Generated
🤗 Daily Papers
AI Applications
Human-AI Interaction
🏢 Southeast University
Collaborative Assistant for Personalized Exploration (CARE) enhances LLM chatbots for exploratory tasks by combining a multi-agent framework with a structured interface, delivering tailored solutions …
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
·3766 words·18 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 Tsinghua University
AndroidLab, a novel framework, systematically benchmarks Android autonomous agents, improving LLM and LMM success rates on 138 tasks via a unified environment and open-source dataset.
Survey of User Interface Design and Interaction Techniques in Generative AI Applications
·3567 words·17 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 UC San Diego
This study provides a comprehensive taxonomy of user interface design and interaction techniques in generative AI, offering valuable insights for developers and researchers aiming to enhance user expe…