Skip to main content

🏢 Zhejiang University

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
·2599 words·13 mins· loading · loading
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Zhejiang University
InfiGUIAgent, a novel multimodal GUI agent, leverages a two-stage training pipeline to achieve advanced reasoning and GUI interaction capabilities, outperforming existing models in benchmarks.
OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System
·379 words·2 mins· loading · loading
AI Generated 🤗 Daily Papers Natural Language Processing Information Extraction 🏢 Zhejiang University
OneKE: a dockerized, schema-guided LLM agent system efficiently extracts knowledge from diverse sources, offering adaptability and robust error handling.
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
·3014 words·15 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University
Orient Anything: Learning robust object orientation estimation directly from rendered 3D models, achieving state-of-the-art accuracy on real images.
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
·4162 words·20 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University
Prompting unlocks 4K metric depth from low-cost LiDAR.
ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality
·2050 words·10 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Zhejiang University
ZipAR accelerates autoregressive image generation by up to 91% through parallel decoding leveraging spatial locality in images, making high-resolution image generation significantly faster.
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
·4118 words·20 mins· loading · loading
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University
ScoreLiDAR: Distilling diffusion models for 5x faster, higher-quality 3D LiDAR scene completion!