🏢 University of Hong Kong

Sonata: Self-Supervised Learning of Reliable Point Representations

20 March 2025·2429 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Hong Kong

Sonata: reliable self-supervised learning for 3D point clouds through self-distillation, achieving SOTA with less data.

Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation

20 March 2025·3405 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Hong Kong

TokenBridge bridges continuous and discrete tokens for autoregressive visual generation, achieving high-quality synthesis with simple autoregressive modeling.

UniTok: A Unified Tokenizer for Visual Generation and Understanding

27 February 2025·3043 words·15 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 University of Hong Kong

UniTok: a unified tokenizer that bridges the gap between visual generation and understanding via multi-codebook quantization, achieving SOTA in MLLMs.

Goku: Flow Based Video Generative Foundation Models

7 February 2025·3430 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Hong Kong

Goku: a family of joint image-and-video generation models built on rectified flow Transformers, achieving industry-leading performance with a robust data pipeline and training infrastructure.

Teaching Language Models to Critique via Reinforcement Learning

5 February 2025·4328 words·21 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Hong Kong

LLMs learn to critique and refine their outputs via reinforcement learning, significantly improving code generation.

GameFactory: Creating New Games with Generative Interactive Videos

14 January 2025·3286 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 University of Hong Kong

GameFactory generates entirely new games in diverse, open-domain scenes by learning action controls from a small dataset and transferring them to pre-trained video models.

FashionComposer: Compositional Fashion Image Generation

18 December 2024·2265 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Hong Kong

FashionComposer generates fashion images through flexible composition of garments, faces, and poses.

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

10 December 2024·3117 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Hong Kong

UniReal: a universal framework for image generation and editing that unifies diverse tasks by learning real-world dynamics from video data, achieving highly realistic and versatile results.

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

5 December 2024·3555 words·17 mins

AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 University of Hong Kong

Moto: uses latent motion tokens as a bridging language for robot manipulation, achieving superior performance with limited data.

TEXGen: a Generative Diffusion Model for Mesh Textures

22 November 2024·3720 words·18 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Hong Kong

TEXGen: a generative diffusion model that creates high-resolution 3D mesh textures directly from text and image prompts, surpassing prior methods in quality and efficiency.

SAMPart3D: Segment Any Part in 3D Objects

11 November 2024·3136 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Hong Kong

SAMPart3D: zero-shot 3D part segmentation across granularities, scaling to large datasets and handling part ambiguity.