TL;DR#
Active learning aims to minimize the number of labeled examples needed for accurate classification. However, existing methods often struggle with noisy labels, leading to high query complexity. This paper tackles that challenge by focusing on learning halfspaces, a fundamental problem in machine learning for which existing algorithms are sensitive to label noise and scale poorly to larger datasets.
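For concreteness, a halfspace labels each point by the sign of a linear function of its features. The minimal NumPy sketch below illustrates the hypothesis class being learned; the weight vector and data are hypothetical, purely for illustration:

```python
import numpy as np

def halfspace_label(X, w):
    """Label each point by which side of the hyperplane {x : w.x = 0} it lies on."""
    return np.sign(X @ w)

# Hypothetical example: a 2-D halfspace with normal vector w.
rng = np.random.default_rng(0)
w = np.array([1.0, -2.0])
X = rng.standard_normal((5, 2))
print(halfspace_label(X, w))  # +1 / -1 labels
```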
The researchers introduce a new type of query, called Threshold Statistical Queries (TSQs), designed to be robust to noisy labels. They develop a novel algorithm that uses TSQs to efficiently learn halfspaces under the Massart noise model, a common type of noise in real-world data in which each label is flipped independently with probability at most some bound η < 1/2. Their algorithm uses significantly fewer queries than previous approaches while achieving similar accuracy. Importantly, the paper also proves that under agnostic noise, high query complexity is unavoidable, even for simpler classification tasks.
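The paper itself does not ship code, so the sketch below is only a plausible illustration of the two ingredients named above: a Massart noise process (each true label flipped independently with a point-dependent probability bounded by η < 1/2) and a threshold statistical query oracle that reports whether the expectation of a bounded statistic exceeds a threshold θ, with answers only guaranteed when the true expectation is at least a tolerance τ away from θ. All function names, the empirical estimation, and the example statistic are assumptions for illustration, not the authors' construction.

```python
import numpy as np

rng = np.random.default_rng(1)

def massart_labels(X, w, eta=0.2):
    """True halfspace labels, each flipped independently with a
    point-dependent probability eta_x <= eta (Massart noise)."""
    clean = np.sign(X @ w)
    flip_prob = eta * rng.uniform(size=len(X))   # eta_x in [0, eta]
    flips = rng.uniform(size=len(X)) < flip_prob
    return np.where(flips, -clean, clean)

def threshold_statistical_query(phi, X, y, theta, tau):
    """Hypothetical TSQ oracle: answer whether E[phi(x, y)] > theta.
    Answers are only guaranteed correct when |E[phi] - theta| > tau;
    within the tolerance band the answer may be arbitrary."""
    estimate = np.mean([phi(x, yi) for x, yi in zip(X, y)])
    if abs(estimate - theta) <= tau:
        return rng.choice([True, False])  # arbitrary within tolerance
    return estimate > theta

# Usage: query whether a candidate direction u correlates with the labels.
w_true = np.array([1.0, -2.0])
X = rng.standard_normal((10_000, 2))
y = massart_labels(X, w_true, eta=0.2)
u = np.array([1.0, 0.0])
phi = lambda x, yi: yi * np.sign(x @ u)  # bounded statistic in [-1, 1]
print(threshold_statistical_query(phi, X, y, theta=0.0, tau=0.05))
```

Intuitively, a bounded flip rate can shift the expectation of a bounded statistic only by a controlled amount, which is the usual reason statistical-query-style algorithms tolerate bounded label noise; the thresholded answer therefore stays informative under the Massart model.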
Key Takeaways#
Why does it matter?#
This paper is important for researchers in active learning and, more broadly, for machine learning practitioners dealing with noisy data. It presents the first query-efficient algorithm for learning halfspaces under the Massart noise model, a significant advance. This opens avenues for improving the robustness and efficiency of active learning algorithms in real-world applications where noisy labels are common. The negative results for agnostic noise highlight fundamental limitations and help guide future research directions.