🏢 Harvard University

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
·2631 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Harvard University
4D LangSplat learns 4D language fields for dynamic scenes using multimodal large language models, enabling time-sensitive open-vocabulary queries.
FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading
·2535 words·12 mins
AI Generated 🤗 Daily Papers AI Applications Finance 🏢 Harvard University
FLAG-Trader fuses LLMs with gradient-based reinforcement learning for financial trading, outperforming traditional methods by integrating multimodal data efficiently and adapting to market dynamics.
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
·3907 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Harvard University
Affordance-Aware Object Insertion uses a Mask-Aware Dual Diffusion model and the SAM-FB dataset to place objects realistically in scenes, taking contextual relationships into account.
GazeGen: Gaze-Driven User Interaction for Visual Content Generation
·2843 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Human-AI Interaction 🏢 Harvard University
GazeGen uses real-time gaze tracking to enable intuitive hands-free visual content creation and editing, setting a new standard for accessible AR/VR interaction.