🏢 Harvard University
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
·2631 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Harvard University
4D LangSplat learns 4D language fields for dynamic scenes using multimodal large language models, enabling time-sensitive open-vocabulary queries.
FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading
·2535 words·12 mins·
AI Generated
🤗 Daily Papers
AI Applications
Finance
🏢 Harvard University
FLAG-Trader fuses LLMs and reinforcement learning for financial trading, outperforming traditional methods by efficiently integrating multimodal data and adapting to market dynamics.
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
·3907 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Harvard University
Affordance-Aware Object Insertion uses a novel Mask-Aware Dual Diffusion model and the SAM-FB dataset to place objects realistically in scenes while accounting for contextual relationships.
GazeGen: Gaze-Driven User Interaction for Visual Content Generation
·2843 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Human-AI Interaction
🏢 Harvard University
GazeGen uses real-time gaze tracking to enable intuitive hands-free visual content creation and editing, setting a new standard for accessible AR/VR interaction.