
🏢 Institute of Information Engineering, Chinese Academy of Sciences

TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control

26 September 2024·2216 words·11 mins

Image Generation 🏢 Institute of Information Engineering, Chinese Academy of Sciences

TextCtrl is a novel diffusion-based scene text editing method that uses prior guidance control to achieve superior style fidelity and accuracy; the work also introduces ScenePair, a new real-world benchmark dataset.

SpeechForensics: Audio-Visual Speech Representation Learning for Face Forgery Detection

26 September 2024·2151 words·11 mins

Computer Vision Face Recognition 🏢 Institute of Information Engineering, Chinese Academy of Sciences

SpeechForensics leverages audio-visual speech representation learning to achieve superior face forgery detection, outperforming state-of-the-art methods in cross-dataset generalization and robustness.

Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features

26 September 2024·3306 words·16 mins

Image Generation 🏢 Institute of Information Engineering, Chinese Academy of Sciences

This research reveals key properties of diffusion model activations that enable effective feature selection, unlocking discriminative features that surpass state-of-the-art methods.