🏢 Singapore University of Technology and Design

Shifting Long-Context LLMs Research from Input to Output

6 March 2025·1724 words·9 mins

AI Generated 🤗 Daily Papers Natural Language Processing Text Generation 🏢 Singapore University of Technology and Design

Time to focus on LLMs' long-form outputs! This paper advocates for research on generating high-quality, long, and coherent text.

MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

4 February 2025·4621 words·22 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Singapore University of Technology and Design

MotionLab: One framework to rule them all! Unifying human motion generation & editing via a novel Motion-Condition-Motion paradigm, boosting efficiency and generalization.

The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles

3 February 2025·3250 words·16 mins

AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 Singapore University of Technology and Design

The multimodal reasoning abilities of GPT-[n] and o-[n] models are tracked over time on challenging visual puzzles, revealing surprisingly steady improvement along with cost trade-offs.

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

30 December 2024·3050 words·15 mins

AI Generated 🤗 Daily Papers Natural Language Processing Text Generation 🏢 Singapore University of Technology and Design

TANGOFLUX: Blazing-fast, high-fidelity text-to-audio generation using novel CLAP-Ranked Preference Optimization.

M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework

9 November 2024·2696 words·13 mins

AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Singapore University of Technology and Design

M-LongDoc: a new benchmark and retrieval-aware tuning framework revolutionizes multimodal long document understanding, improving model accuracy by 4.6%.