PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models

2503.12545

Zhaopan Xu et al.

🤗 2025-03-19

TL;DR

Multimodal Large Language Models (MLLMs) have improved rapidly. However, their dependence on vast amounts of internet data raises privacy concerns. Machine unlearning (MU) offers a solution: it removes specific knowledge from a trained model without retraining from scratch. Existing MU evaluations for MLLMs are incomplete and poorly defined, which hinders the development of secure systems. Prior benchmarks are limited to discrete entities and overlook the coupling of concepts within images.

This paper introduces PEBench, a benchmark designed to evaluate machine unlearning (MU) in Multimodal Large Language Models (MLLMs). The benchmark assesses both personal-entity and general-event unlearning. Benchmarking existing MU methods on it reveals their strengths and weaknesses, providing guidance for future improvements and for enhancing the security of multimodal models.

Key Takeaways

Why does it matter?

This research introduces PEBench, a new benchmark for assessing machine unlearning in multimodal models. By providing a comprehensive dataset, it addresses gaps in current evaluations. The work supports the development of secure multimodal models and opens avenues for further investigation into the challenges and opportunities of machine unlearning.

Visual Insights

🔼 Figure 1 illustrates the concept of machine unlearning (MU) in multimodal large language models (MLLMs) using an example image of Joe Biden speaking at the White House. Panel (a) shows that before unlearning, the MLLM correctly identifies both the person (Joe Biden) and the event (speaking at the White House). The goal of MU is to selectively remove specific information from the model without retraining. Panel (b) demonstrates the result when the unlearning target is the ‘Identity’ of Joe Biden; the model incorrectly identifies him as someone else. Panel (c) shows the outcome when the unlearning target is the ‘Event’; the model misinterprets the event as a concert. This figure highlights the challenge of MU in MLLMs, where removing specific information can unintentionally affect related concepts.
Figure 1: Example of an image of Joe Biden speaking at the White House. Before unlearning (a), MLLMs can generate responses related to various visual concepts (Identity and Event). The goal of Machine Unlearning (MU) for MLLMs is to selectively forget specific concepts within the model. When the unlearning target is Identity (b), the model mistakenly identifies Joe Biden as a different person. When the unlearning target is Event (c), the model misinterprets the speech as a concert.

Method	Person Unlearning						Events Unlearning
	Efficacy	Generality	Retain	Scope	Real	World Fact	Efficacy	Generality	Retain	Scope	Real	World Fact
	Precision	Precision	Precision	ROUGE-L	Precision	POPE	G-Eval	G-Eval	G-Eval	Precision	ROUGE-L	POPE
Finetune (Base)	0.0	2.24	97.53	0.98	100.0	85.88	0.18	0.20	0.99	100.00	0.56	85.88
PO [30]	100.00	100.00	4.12	0.89	86.64	78.52	0.21	0.22	0.98	98.86	0.44	77.23
GA [38]	100.00	100.00	3.89	0.91	71.64	78.01	0.51	0.49	0.62	78.50	0.24	78.82
GD [24]	98.89	98.89	21.48	0.86	76.87	77.08	0.58	0.56	0.88	81.50	0.30	79.07
KL [37]	100.00	99.70	5.00	0.81	73.88	78.73	0.55	0.51	0.84	80.75	0.25	78.75
SIU [21]	100.00	100.00	10.36	0.90	80.43	79.02	0.48	0.46	0.74	84.50	0.48	80.07
DPO [33]	100.00	100.00	8.64	0.92	82.63	78.38	0.43	0.41	0.80	83.10	0.35	79.28
Goal (Upper Bound)	100.00	100.00	96.38	0.99	100.00	87.52	0.97	0.98	0.99	100.00	0.55	87.52

🔼 This table presents an evaluation of six machine unlearning (MU) methods on the PEBench benchmark dataset. The evaluation focuses on removing specific personal entities and event information from a multimodal large language model (MLLM). For each method, the table reports Efficacy (how well the model forgets the targeted information), Generality (how well the forgetting generalizes to unseen data), Retain (how well the model retains knowledge of untargeted information), Real (performance on real-world data), and World Fact (how well the model performs on general world knowledge). The ‘Finetune’ row provides the baseline performance of the model without unlearning, and the ‘Goal’ row represents the ideal performance if the unwanted data could be perfectly removed without retraining.
Table 1: Performance overview of different MU methods evaluated on PEBench. The performance metrics include Efficacy, Generality, Retain, Real, and World Fact. A higher score represents better performance. Finetune represents the baseline performance (lower bound for unlearning), and Goal represents the ideal unlearning model (upper bound).
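
The trade-off in Table 1 is easier to see as gaps to the Goal row. The sketch below copies three Person Unlearning columns (Efficacy, Retain, World Fact) from the table and reports how far each method falls short of Goal. The dictionary layout and the helper name `gap_to_goal` are illustrative choices, not part of PEBench.

```python
# Person Unlearning scores copied from Table 1: (Efficacy, Retain, World Fact).
PERSON_SCORES = {
    "PO": (100.00, 4.12, 78.52),
    "GA": (100.00, 3.89, 78.01),
    "GD": (98.89, 21.48, 77.08),
    "KL": (100.00, 5.00, 78.73),
    "SIU": (100.00, 10.36, 79.02),
    "DPO": (100.00, 8.64, 78.38),
}
GOAL = (100.00, 96.38, 87.52)


def gap_to_goal(scores, goal=GOAL):
    """Return goal minus score for each metric, rounded to two decimals."""
    return tuple(round(g - s, 2) for g, s in zip(goal, scores))


gaps = {name: gap_to_goal(s) for name, s in PERSON_SCORES.items()}
# Method whose Retain score is closest to the Goal row.
best_retain = min(gaps, key=lambda name: gaps[name][1])

for name, (eff, ret, fact) in gaps.items():
    print(f"{name:>3}: efficacy gap {eff:5.2f}, retain gap {ret:5.2f}, "
          f"world-fact gap {fact:5.2f}")
print("smallest retain gap:", best_retain)
```

With these numbers, every method nearly matches Goal on Efficacy, while Retain is where the methods fall furthest short; GD comes closest on that column.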

In-depth insights

MU for MLLMs

MU for MLLMs presents unique challenges. Erasing knowledge from these models requires careful consideration due to their multimodal nature. Current benchmarks may not fully capture the complexity of real-world scenarios, especially the intricate relationships between entities and events. Selective forgetting, without impacting related concepts, is crucial for practical applications like privacy protection and content moderation. Further research is needed to develop more robust and nuanced MU techniques tailored to MLLMs.

PEBench Intro

PEBench, as introduced in the abstract, is a new benchmark designed to rigorously assess machine unlearning (MU) techniques specifically within Multimodal Large Language Models (MLLMs). The need for PEBench arises from the limitations of current MU evaluations, which often lack comprehensiveness and a clear problem definition, hindering advances in secure and trustworthy AI systems. The dataset comprises personal entities and event scenes, and it aims to provide a standardized framework for MU research in MLLMs, which should make it easier to advance privacy-preserving multimodal models. The experiments reveal the strengths and limitations of existing MU methods, as well as key areas for progress in MLLM unlearning.

SynthData+MU

Synthetic data offers a controlled environment for machine unlearning (MU) research, allowing researchers to systematically manipulate data characteristics and assess MU methods’ effectiveness. This approach addresses the challenge of data dependencies, ensuring reliable evaluation. By focusing on data absent from pre-training, benchmarks can establish an ‘unlearned’ state, facilitating comparisons. Synthetic data also enables targeted generation of specific scenarios, like harmful content. This aids in stress-testing MU algorithms. Challenges include bridging the gap between synthetic and real-world data, ensuring that lessons learned from synthetic datasets generalize effectively. Further work might focus on transfer learning techniques or domain adaptation methods to improve the applicability of synthetic data to real-world MU scenarios.

G-Eval: Event MU

G-Eval for Event MU is a key metric for assessing the effectiveness of machine unlearning, specifically focusing on how well a model “forgets” or removes specific events. This evaluation likely employs GPT-4 to assess the similarity between the unlearned model’s output, a ‘ground truth,’ and an ideal ‘goal’ model’s output. The G-Eval score likely ranges from 0 to 1. A score closer to 1 could signify the unlearned output closely matches the ideal model, indicating effective event removal, while a lower score suggests the unlearned model retains undesirable information or leans towards the original state. It’s crucial in multimodal scenarios as it considers how unlearning affects the overall context.
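
In the original G-Eval formulation, the judge's rating is a probability-weighted average over a discrete rating scale rather than a single sampled number. The sketch below shows that computation and then rescales the result to [0, 1]. The 1-to-5 scale, the min-max rescaling, and the example probabilities are assumptions made for illustration, since this summary does not state how PEBench maps judge outputs to its reported scores.

```python
def expected_rating(rating_probs):
    """Probability-weighted rating: sum of p(s) * s over the rating scale.

    rating_probs maps each integer rating to the judge's probability for it.
    Probabilities are renormalized, so they need not sum to exactly 1.
    """
    total = sum(rating_probs.values())
    if total <= 0:
        raise ValueError("rating probabilities must have positive mass")
    return sum(r * p for r, p in rating_probs.items()) / total


def to_unit_interval(score, low=1, high=5):
    """Min-max rescale a rating from [low, high] to [0, 1] (assumed scale)."""
    return (score - low) / (high - low)


# Hypothetical judge distribution over ratings 1..5 for one response.
probs = {1: 0.0, 2: 0.1, 3: 0.2, 4: 0.3, 5: 0.4}
rating = expected_rating(probs)
unit_score = to_unit_interval(rating)
print(round(rating, 3), round(unit_score, 3))
```

Weighting by probabilities yields finer-grained scores than taking the single most likely rating, which helps when many responses would otherwise tie.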

BGD+Balancing

While ‘BGD+Balancing’ isn’t explicitly a heading in the paper, the concept is present, likely referring to a Balanced Gradient Difference (BGD) approach that incorporates data and task balancing, as introduced in the paper. BGD aims to enhance machine unlearning by addressing data imbalance. Data balancing dynamically adjusts the sampling ratio between event and individual data so that neither dominates the learning process. Multi-task balancing applies separate loss functions to individual and event unlearning, which helps mitigate interference when unlearning both targets. Applying BGD on top of a base method such as Gradient Difference aims to improve unlearning effectiveness for both personal entities and event scenes. The approach is also meant to prevent a potential ‘collapse’ of performance by carefully balancing the learning signals.
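
The ideas above can be sketched in a few lines. Gradient Difference is commonly written as ascending the loss on forget data while descending it on retain data; the rule in `balanced_ratio` for adjusting the sampling ratio and the weights in `combined_loss` are assumptions chosen for illustration, not the update rules used in the paper.

```python
import random


def gradient_difference_loss(forget_loss, retain_loss, retain_weight=1.0):
    """Gradient Difference objective: raise loss on forget data, lower it on retain data."""
    return -forget_loss + retain_weight * retain_loss


def combined_loss(person_forget, event_forget, retain, person_w=0.5, event_w=0.5):
    """Task balancing (assumed form): weight the two forget losses separately."""
    forget = person_w * person_forget + event_w * event_forget
    return gradient_difference_loss(forget, retain)


def balanced_ratio(person_loss, event_loss, floor=0.1):
    """Fraction of the next batch to draw from person data.

    Assumed rule: the target whose forget loss is still lower (less forgotten)
    receives more samples. `floor` keeps either source from vanishing.
    """
    total = person_loss + event_loss
    if total <= 0:
        return 0.5
    ratio = event_loss / total  # low person loss -> ratio near 1 -> more person data
    return min(1.0 - floor, max(floor, ratio))


def sample_batch(person_pool, event_pool, ratio, batch_size, rng=random):
    """Draw a mixed batch with roughly `ratio` of its items from person_pool."""
    n_person = round(ratio * batch_size)
    batch = [rng.choice(person_pool) for _ in range(n_person)]
    batch += [rng.choice(event_pool) for _ in range(batch_size - n_person)]
    return batch


ratio = balanced_ratio(person_loss=1.0, event_loss=3.0)
batch = sample_batch(["person"], ["event"], ratio, batch_size=8)
print(ratio, batch.count("person"), batch.count("event"))
```

The floor on the ratio reflects the stated goal of keeping either data source from dominating; any real schedule would need tuning against the Retain and Efficacy metrics.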

More visual insights

More on tables

Method	Person Unlearning					Events Unlearning
	Efficacy	Generality	Retain	Real	World Fact	Efficacy	Generality	Retain	Real	World Fact
Finetune (Base)	0.0	2.24	97.53	100.0	85.88	0.18	0.20	0.99	0.56	85.88
GD [24]	55.00	55.00	39.72	95.80	77.08	0.36	0.34	0.88	0.37	77.08
GD + BGD	63.50 (+8.5)	62.58 (+7.6)	28.32 (-11.4)	88.65 (-7.2)	78.56 (+1.5)	0.47 (+0.1)	0.50 (+0.2)	0.73 (-0.2)	0.45 (+0.1)	78.36 (+1.3)
KL [37]	36.36	36.36	22.41	58.88	70.23	0.34	0.32	0.82	0.40	66.54
KL + BGD	48.10 (+11.7)	48.10 (+11.7)	18.67 (-3.7)	55.34 (-3.5)	68.62 (-1.6)	0.42 (+0.1)	0.41 (+0.1)	0.76 (-0.1)	0.42 (+0.02)	67.04 (+0.5)
Goal (Upper Bound)	100.00	100.00	96.38	100.00	87.52	0.97	0.98	0.99	0.55	87.52

🔼 Table 2 compares two machine unlearning methods, GD and KL, each with and without BGD, when applied to remove both personal entities and event information simultaneously from a multimodal large language model. It shows efficacy, generality (how well the unlearning generalizes to unseen data), retention (how well the model retains knowledge of other, untargeted data), real-world performance (on a separate, real-world dataset), and World Fact for each method. The ‘+’ symbol indicates a gain over the base method, while ‘-’ indicates a decrease for a given metric. This table highlights the challenges of simultaneous unlearning and the need for better-performing methods.
Table 2: Performance overview of simultaneously unlearning people and events. + (or −) indicates the performance gain (or decrease) compared to the base method.

Num Outputs	Num Inference Steps	Guidance	True Gs	Width	Height
1	40	2.5	3.5	512	512

🔼 This table lists the hyperparameters used in the Flux image generation model. These parameters control various aspects of the image generation process, including the number of images generated, the number of inference steps used, and the dimensions (width and height) of the output images. Understanding these settings is crucial for interpreting the quality and characteristics of the generated images within the PEBench dataset.
Table 3: Flux hyper-parameters.
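
For reproducibility, the settings in Table 3 can be captured in a small configuration object. The field names below are snake_case renderings of the column headers and are assumptions; they are not guaranteed to match the argument names of any particular Flux interface.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FluxSettings:
    """Values from Table 3; field names are illustrative, not an official API."""
    num_outputs: int = 1
    num_inference_steps: int = 40
    guidance: float = 2.5
    true_gs: float = 3.5
    width: int = 512
    height: int = 512


settings = FluxSettings()
print(asdict(settings))
```

Freezing the dataclass keeps the recorded settings from being changed by accident between generation runs.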

Category	Jobs
Science & Research	Biologist, Physicist, Archaeologist, Ecologist
Healthcare & Medicine	Doctor, Nurse, Physical Therapist, Psychologist
Technology & Engineering	Software Developer, Electrical Engineer, Mechanical Engineer, Cybersecurity Specialist
Environmental & Agriculture	Environmental Scientist, Agronomist, Forester, Soil Scientist
Arts & Creative Fields	Painter, Musician, Writer, Graphic Designer
Business & Finance	Accountant, Market Analyst, Financial Advisor, Project Manager
Public Service & Community Support	Police Officer, Firefighter, Social Worker, Nonprofit Coordinator
Education & Culture	Teacher, Trainer, Librarian, Museum Curator
Media & Communications	Journalist, Broadcaster, Content Creator, Public Relations Specialist
Architecture & Construction	Architect, Civil Engineer, Construction Worker, Surveyor
Law & Policy	Lawyer, Judge, Policy Analyst, Legislative Assistant
Retail & Services	Retail Manager, Customer Service Representative, Hotel Concierge, Sales Associate
Sports & Fitness	Athlete, Fitness Coach, Physical Trainer, Yoga Instructor
Logistics & Transportation	Logistics Manager, Truck Driver, Pilot, Shipping Coordinator
Energy & Natural Resources	Petroleum Engineer, Geologist, Renewable Energy Consultant, Miner
Unemployed	Job Seeker, Stay-at-home Parent, Retired, Freelancer, Entrepreneur, Consultant, Artist
Students	Primary School Student, Junior High Student, High School Junior, Undergraduate Student, Community College Student, Master’s Student, Doctoral Student, Research Assistant, Apprentice, Technical School Student

🔼 Table 4 presents a detailed categorization of occupations across various sectors, including science, healthcare, technology, arts, and public services. Each category includes several specific job examples, providing a comprehensive illustration of the diverse range of professions represented in the dataset. This ensures a realistic and representative portrayal of the occupational landscape.
Table 4: The categorization of jobs across various domains, including science, healthcare, technology, arts, and public services. The second column provides specific examples of jobs within each category, offering a comprehensive overview of the dataset’s occupational diversity.

Region	Cities
North America	New York City, USA; Toronto, Canada; Mexico City, Mexico; Vancouver, Canada; San Juan, Puerto Rico
South America	São Paulo, Brazil; Buenos Aires, Argentina; Caracas, Venezuela; Quito, Ecuador; Lima, Peru
Europe	Paris, France; Berlin, Germany; Stockholm, Sweden; Helsinki, Finland; Zurich, Switzerland; Lisbon, Portugal; Dublin, Ireland; Warsaw, Poland; Vienna, Austria; Reykjavik, Iceland; Bucharest, Romania
Africa	Cairo, Egypt; Cape Town, South Africa; Lagos, Nigeria; Nairobi, Kenya; Accra, Ghana; Dakar, Senegal; Addis Ababa, Ethiopia; Casablanca, Morocco; Kigali, Rwanda
Asia	Tokyo, Japan; Mumbai, India; Seoul, South Korea; Bangkok, Thailand; Istanbul, Turkey; Dubai, United Arab Emirates; Jakarta, Indonesia; Hanoi, Vietnam; Amman, Jordan; Doha, Qatar; Ulaanbaatar, Mongolia; Male, Maldives; Phnom Penh, Cambodia; Beijing, China; Shanghai, China
Australia & Oceania	Sydney, Australia; Wellington, New Zealand; Brisbane, Australia; Suva, Fiji; Port Moresby, Papua New Guinea
Middle East	Riyadh, Saudi Arabia; Tehran, Iran; Baghdad, Iraq; Beirut, Lebanon; Muscat, Oman

🔼 Table 5 presents a list of cities categorized by their respective continents and regions. This categorization is designed to showcase the geographic diversity encompassed within the PEBench dataset. The inclusion of a wide range of cities from different continents and regions emphasizes the global nature of the data and its representation of diverse geographic locations.
Table 5: Cities categorized by their respective regions, highlighting geographical diversity.
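
To illustrate how attributes such as the occupations in Table 4 and the cities in Table 5 could be combined into a fictitious profile, the sketch below samples one of each with a fixed seed. The profile fields, the `make_profile` helper, and the short excerpts of each table are assumptions for illustration; this summary does not describe the authors' actual generation pipeline.

```python
import random

# Short excerpts from Tables 4 and 5; the full tables have more entries.
JOBS = {
    "Science & Research": ["Biologist", "Physicist", "Archaeologist", "Ecologist"],
    "Arts & Creative Fields": ["Painter", "Musician", "Writer", "Graphic Designer"],
}
CITIES = {
    "Europe": ["Paris, France", "Berlin, Germany"],
    "Africa": ["Cairo, Egypt", "Nairobi, Kenya"],
}


def make_profile(rng):
    """Sample one (category, job, region, city) combination."""
    category = rng.choice(sorted(JOBS))
    region = rng.choice(sorted(CITIES))
    return {
        "category": category,
        "job": rng.choice(JOBS[category]),
        "region": region,
        "city": rng.choice(CITIES[region]),
    }


rng = random.Random(0)  # fixed seed so the sampled profiles are repeatable
profiles = [make_profile(rng) for _ in range(3)]
for p in profiles:
    print(f"A {p['job']} living in {p['city']}")
```

Sampling the category before the job keeps each job consistent with its category, mirroring the two-column layout of Table 4.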

Event	Description	Keywords
media interview	Participating in an interview with a local media outlet. The setting is a well-lit studio or casual setup, depending on the person’s profession. The conversation is captured by a small crew with minimal background distractions.	”interview”, ”local media”, ”studio”, ”casual setup”, ”conversation”, ”crew”, ”minimal distractions”, ”well-lit”, ”profession”, ”professional”
park jogging	Exercising or relaxing in a nearby park. The park is peaceful with trees and walking paths, a serene backdrop for professionals, students, or retirees enjoying nature.	”park”, ”jogging”, ”exercising”, ”relaxing”, ”nature”, ”trees”, ”walking paths”, ”serene”, ”peaceful”, ”outdoors”
farm visit	Visiting a local farm, surrounded by green fields and farm animals. The atmosphere is peaceful and natural, perfect for relaxing or learning about agriculture.	”farm”, ”visit”, ”green fields”, ”farm animals”, ”peaceful”, ”natural”, ”agriculture”, ”learning”, ”outdoors”, ”relaxing”
dinner with friends	Enjoying a meal with friends or family at a local restaurant. The restaurant has a cozy, informal setting, suitable for unwinding after a busy day.	”dinner”, ”friends”, ”family”, ”restaurant”, ”cozy”, ”informal”, ”meal”, ”unwinding”, ”relaxed”, ”evening”
landmark visit	Visiting a notable city landmark, adding a cultural aspect to their day. The clear weather and bustling tourist atmosphere offer a nice break from their routine.	”landmark”, ”city”, ”tourist”, ”cultural”, ”visit”, ”weather”, ”atmosphere”, ”bustling”, ”break”, ”routine”
zoo visit	Exploring a local zoo, observing animals in their habitats. The setting is educational and family-friendly, perfect for learning about wildlife.	”zoo”, ”animals”, ”habitats”, ”education”, ”family-friendly”, ”wildlife”, ”exploring”, ”local”, ”learning”, ”nature”
shopping mall	Walking through a busy shopping mall, either to relax or purchase essentials. The mall is brightly lit, with various stores and other people enjoying a bustling atmosphere.	”shopping”, ”mall”, ”bustling”, ”stores”, ”shopping experience”, ”brightly lit”, ”people”, ”relaxing”, ”purchasing”, ”atmosphere”
public lecture	Attending or presenting a lecture at a university or community center. The atmosphere is formal, with people attentively listening, suitable for professionals, students, or anyone interested in continuous learning.	”lecture”, ”public”, ”university”, ”community center”, ”formal”, ”attendees”, ”presentation”, ”education”, ”learning”, ”professional”
gym workout	Engaging in a workout at a local gym. The gym has spacious areas for various exercises and equipment, creating a focused and energetic environment for fitness enthusiasts of all ages.	”gym”, ”workout”, ”exercises”, ”fitness”, ”spacious”, ”equipment”, ”energetic”, ”environment”, ”focus”, ”physical”
dance event	Dancing or socializing at a club or festive event. The atmosphere is vibrant, with colorful lights and music setting a lively mood.	”dance”, ”event”, ”club”, ”music”, ”socializing”, ”vibrant”, ”colorful”, ”lights”, ”festive”, ”lively”
coffee shop reading	Enjoying a coffee break in a cozy café. The ambiance is quiet and relaxed, perfect for reading, working on a laptop, or chatting with friends.	”coffee shop”, ”reading”, ”cozy”, ”relaxed”, ”ambient”, ”quiet”, ”laptop”, ”break”, ”friends”, ”work”
airport waiting	Waiting at an airport terminal for a flight, surrounded by other travelers. The modern, glass-walled terminal offers views of the runway, creating a calm and organized atmosphere.	”airport”, ”waiting”, ”travel”, ”terminal”, ”flight”, ”runway”, ”modern”, ”organized”, ”passengers”, ”calm”
concert attendance	Attending a live concert in an open-air or indoor venue. The crowd is lively, cheering and enjoying the music in a spirited environment.	”concert”, ”live music”, ”crowd”, ”lively”, ”spirited”, ”performance”, ”audience”, ”indoor”, ”outdoor”, ”energy”
beach relaxing	Relaxing by the seaside, with gentle waves and a clear sky. This peaceful setting is ideal for a break from their routine, whether alone or with family.	”beach”, ”relaxing”, ”seaside”, ”waves”, ”clear sky”, ”peaceful”, ”break”, ”family”, ”serene”, ”outdoors”
business meeting	Participating in a business or professional meeting in a modern conference room. The background shows large windows with a city view, creating a productive atmosphere.	”business”, ”meeting”, ”conference room”, ”professional”, ”city view”, ”windows”, ”productive”, ”discussion”, ”corporate”, ”formal”
museum tour	Exploring a museum filled with historical or artistic exhibits. The lighting is dim with spotlights on displays, creating a reflective environment for visitors.	”museum”, ”tour”, ”historical”, ”artistic”, ”exhibits”, ”spotlights”, ”dim lighting”, ”reflective”, ”atmosphere”, ”culture”
car driving	Driving through a scenic area, either in the city or countryside, during sunset. The road is lined with buildings or natural landscapes, creating a calm and picturesque atmosphere.	”car”, ”driving”, ”scenic”, ”sunset”, ”road”, ”landscapes”, ”city”, ”countryside”, ”picturesque”, ”travel”
grocery shopping	Picking up essentials at a well-organized grocery store. The bright lighting and neatly stocked shelves create a comfortable and efficient shopping experience.	”grocery”, ”shopping”, ”store”, ”essentials”, ”organized”, ”bright lighting”, ”efficient”, ”comfortable”, ”experience”, ”shopping”

🔼 Table 6 presents the first part of a list of 40 event scenarios (continued in Tables 7 and 8), each described in detail. For each scenario, a set of keywords has been extracted to concisely summarize its key features and characteristics. These keywords are not simply descriptive; they are selected to be relevant for evaluating the effectiveness of the machine unlearning process in the context of the PEBench framework. The table thus serves as a component of the evaluation methodology, providing a structured and standardized way to assess the model’s ability to forget specific concepts while retaining other knowledge.
Table 6: Event Descriptions with Corresponding Keywords (part one). Each event description provides a detailed explanation of the scenario and is associated with a list of extracted keywords that capture the essence of the scene. These keywords are used for evaluation purposes in our framework.
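
One simple way keywords like these could be used for scoring is to measure what fraction of an event's keywords appear in a model's description. This is only a sketch under that assumption: the summary does not specify how PEBench consumes the keywords, and `keyword_coverage` and the example response are hypothetical.

```python
def keyword_coverage(response, keywords):
    """Fraction of keywords found in the response (case-insensitive substring match)."""
    if not keywords:
        return 0.0
    text = response.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


# Keywords for the "park jogging" event, copied from Table 6.
PARK_JOGGING = ["park", "jogging", "exercising", "relaxing", "nature",
                "trees", "walking paths", "serene", "peaceful", "outdoors"]

response = "A person is jogging in a peaceful park lined with trees."
coverage = keyword_coverage(response, PARK_JOGGING)
print(coverage)
```

Under this reading, a drop in coverage after unlearning an event would suggest the model no longer describes that scene, though substring matching is crude and would miss paraphrases.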

Event	Description	Keywords
marathon running	Running in a local marathon event. The streets are lined with cheering crowds, and the weather is clear, creating an energetic and community-oriented environment.	”marathon”, ”running”, ”event”, ”streets”, ”cheering”, ”crowds”, ”clear weather”, ”community”, ”energy”, ”fitness”
art gallery visit	Strolling through an art gallery or exhibition. The gallery has soft lighting and showcases various artworks, allowing for a calm, introspective experience.	”art gallery”, ”visit”, ”exhibits”, ”artwork”, ”soft lighting”, ”calm”, ”introspective”, ”atmosphere”, ”culture”, ”reflection”
family gathering	Spending time with family at a comfortable home setting. The room is warmly lit with family mementos and a friendly, welcoming atmosphere.	”family”, ”gathering”, ”home”, ”warmly lit”, ”mementos”, ”friendly”, ”welcoming”, ”atmosphere”, ”comfort”, ”together”
bookstore browsing	Browsing through books in a quaint bookstore. The small, quiet setting is filled with shelves of books, perfect for leisurely exploration.	”bookstore”, ”browsing”, ”books”, ”quaint”, ”quiet”, ”shelves”, ”exploration”, ”reading”, ”leisure”, ”relaxed”
mountain cabin retreat	Relaxing at a cabin in the mountains. The area is peaceful, surrounded by trees and distant mountain views, creating a tranquil and refreshing setting.	”mountain”, ”cabin”, ”retreat”, ”peaceful”, ”trees”, ”views”, ”tranquil”, ”refreshing”, ”nature”, ”serene”
office working	Working or studying at a desk in a modern office. The room has large windows with natural light, creating a productive and quiet atmosphere for focused tasks.	”office”, ”working”, ”desk”, ”modern”, ”conference room”, ”windows”, ”natural light”, ”focused”, ”quiet”, ”productive”
train commute	Traveling on a busy train, either standing or seated, surrounded by passengers absorbed in various activities. The setting is organized, creating a routine commute experience.	”train”, ”commute”, ”busy”, ”seated”, ”standing”, ”passengers”, ”routine”, ”travel”, ”organized”, ”routine”
mountain hiking	Hiking along a scenic mountain trail. The view of mountains and clear sky adds a refreshing and peaceful ambiance to the experience.	”mountain”, ”hiking”, ”trail”, ”scenic”, ”view”, ”clear sky”, ”peaceful”, ”refreshing”, ”nature”, ”outdoors”
school presentation	Delivering or observing a presentation in a classroom. The students are attentive, creating an academic atmosphere suited for sharing knowledge.	”school”, ”presentation”, ”classroom”, ”students”, ”attentive”, ”academic”, ”learning”, ”sharing knowledge”, ”formal”, ”education”
restaurant dining	Dining at an upscale restaurant. The lighting is dim, and the decor is elegant, creating an intimate and refined ambiance.	”restaurant”, ”dining”, ”upscale”, ”dim lighting”, ”elegant”, ”refined”, ”intimate”, ”ambiance”, ”meal”, ”gourmet”
night sky stargazing	Observing the night sky at an outdoor stargazing event. Telescopes are set up, and the setting is quiet with a clear view of the stars, creating a magical atmosphere.	”night sky”, ”stargazing”, ”outdoors”, ”telescopes”, ”quiet”, ”clear view”, ”stars”, ”magical”, ”peaceful”, ”event”
snowshoeing	Exploring a snowy forest on a snowshoeing trail. The setting is quiet, with only the sound of footsteps in the snow, creating a peaceful winter atmosphere.	”snowshoeing”, ”forest”, ”snow”, ”winter”, ”trail”, ”quiet”, ”footsteps”, ”peaceful”, ”nature”, ”serene”
city bike ride	Riding a bike along city streets or designated trails. The background showcases tall buildings or park areas, creating a blend of urban and natural scenery.	”bike ride”, ”city”, ”streets”, ”trails”, ”urban”, ”scenery”, ”buildings”, ”park”, ”nature”, ”dynamic”
fashion show	Attending a fashion show. The atmosphere is glamorous, with a runway spotlighting models and guests observing the latest trends in fashion.	”fashion”, ”show”, ”runway”, ”models”, ”glamorous”, ”spotlight”, ”trends”, ”observation”, ”fashionable”, ”elegant”
fishing trip	Fishing by a serene lake. The landscape is surrounded by greenery, and the atmosphere is peaceful with only nature’s sounds in the background.	”fishing”, ”trip”, ”lake”, ”serene”, ”greenery”, ”nature”, ”outdoors”, ”peaceful”, ”relaxing”, ”scenic”
train station waiting	Waiting at a quiet train station platform, with schedules displayed on an electronic board. The atmosphere is calm, with passengers nearby preparing for their commute.	”train station”, ”waiting”, ”platform”, ”calm”, ”passengers”, ”quiet”, ”departure”, ”travel”, ”routine”, ”organized”
charity event	Participating in a community charity event in a large hall. The room is decorated for the occasion, with guests mingling and the mood warm and friendly.	”charity”, ”event”, ”community”, ”hall”, ”guests”, ”mingling”, ”decorated”, ”mood”, ”warm”, ”friendly”

🔼 Table 7 continues the list of events and their corresponding keywords. Each event is described in detail, providing context and setting. The associated keywords capture the key aspects of the event’s visual and thematic elements. These keywords are used to evaluate the model’s performance in the PEBench framework.
Table 7: Event Descriptions with Corresponding Keywords (part two). Each event description provides a detailed explanation of the scenario and is associated with a list of extracted keywords that capture the essence of the scene. These keywords are used for evaluation purposes in our framework.

Event	Description	Keywords
nature photography	Taking photographs in a scenic forest or park. The atmosphere is quiet and filled with the sounds of nature, perfect for capturing the beauty of the outdoors.	”photography”, ”nature”, ”forest”, ”park”, ”outdoors”, ”quiet”, ”scenic”, ”capturing”, ”beauty”, ”peaceful”
library studying	Studying or reading in a quiet library. The tall bookshelves and soft lighting create an ideal setting for focused learning.	”library”, ”studying”, ”bookshelves”, ”quiet”, ”focused”, ”reading”, ”learning”, ”atmosphere”, ”soft lighting”, ”introspective”
boat trip	Taking a relaxing boat trip along a calm river or lake. The sky is clear, and the scenic landscape adds to the peacefulness of the outing.	”boat trip”, ”river”, ”lake”, ”relaxing”, ”scenic”, ”peaceful”, ”water”, ”landscape”, ”clear sky”, ”nature”
biking trail	Riding a bike along a nature trail, with trees lining the path. The refreshing environment and dappled sunlight create a peaceful atmosphere.	”bike”, ”trail”, ”nature”, ”trees”, ”path”, ”outdoors”, ”scenic”, ”sunlight”, ”peaceful”, ”refreshing”
city walk	Walking through a lively city center. The street is lined with shops and bustling with people, providing a vibrant and dynamic urban experience.	”city walk”, ”lively”, ”shops”, ”bustling”, ”urban”, ”dynamic”, ”streets”, ”people”, ”downtown”, ”exploring”

🔼 Table 8 presents event descriptions and their corresponding keywords. Each description details a specific event scenario (e.g., library studying, nature photography, boat trip). Associated with each description is a list of keywords that concisely summarize the scene’s key elements. These keywords are used during the evaluation phase of the PEBench framework to assess the performance of various machine unlearning methods on multimodal data.
Table 8: Event Descriptions with Corresponding Keywords (part three). Each event description provides a detailed explanation of the scenario and is associated with a list of extracted keywords that capture the essence of the scene. These keywords are used for evaluation purposes in our framework.
