🏢 Harvard University

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

13 March 2025·2631 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Harvard University

4D LangSplat learns 4D language fields for dynamic scenes using multimodal large language models, enabling time-sensitive open-vocabulary queries.

FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading

17 February 2025·2535 words·12 mins

AI Generated 🤗 Daily Papers AI Applications Finance 🏢 Harvard University

FLAG-Trader fuses LLMs with gradient-based reinforcement learning for financial trading, outperforming traditional methods by efficiently integrating multimodal data and adapting to market dynamics.

Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion

19 December 2024·3907 words·19 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Harvard University

Affordance-Aware Object Insertion uses a novel Mask-Aware Dual Diffusion model and the SAM-FB dataset to place objects realistically in scenes, taking contextual relationships into account.

GazeGen: Gaze-Driven User Interaction for Visual Content Generation

7 November 2024·2843 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Human-AI Interaction 🏢 Harvard University

GazeGen uses real-time gaze tracking to enable intuitive hands-free visual content creation and editing, setting a new standard for accessible AR/VR interaction.