TL;DR#
Multimodal Large Language Models have shown great improvements. However, their dependence on vast amounts of internet data raises privacy concerns. Machine unlearning(MU) is a solution, allowing removal of knowledge from trained models without retraining. Existing MU evaluations are incomplete and poorly defined, hindering secure system development. Prior benchmarks are limited to discrete entities and overlook the coupling of concepts within images.
This paper introduces a new benchmark designed to evaluate machine unlearning(MU) performance in Multimodal Large Language Models(MLLMs). The benchmark assesses both personal entity and general event unlearning, revealing limitations of current MU methods. It benchmarks MU methods, revealing strengths and weaknesses, providing guidance for future improvements and enhances the security of multimodal models.
Key Takeaways#
Why does it matter?#
This research introduces PEBench, a new benchmark for assessing machine unlearning in multimodal models. By providing a comprehensive dataset, it addresses gaps in current evaluations. This work will advance secure multimodal models and opens avenues for further investigation into the challenges and opportunities of machine unlearning.
Visual Insights#
🔼 Figure 1 illustrates the concept of machine unlearning (MU) in multimodal large language models (MLLMs) using an example image of Joe Biden speaking at the White House. Panel (a) shows that before unlearning, the MLLM correctly identifies both the person (Joe Biden) and the event (speaking at the White House). The goal of MU is to selectively remove specific information from the model without retraining. Panel (b) demonstrates the result when the unlearning target is the ‘Identity’ of Joe Biden; the model incorrectly identifies him as someone else. Panel (c) shows the outcome when the unlearning target is the ‘Event’; the model misinterprets the event as a concert. This figure highlights the challenge of MU in MLLMs, where removing specific information can unintentionally affect related concepts.
read the caption
Figure 1: Example of an image of Joe Biden speaking at the White House. Before unlearning (a) MLLMs have the ability to generate responses related to various visual concepts (Identify and Event). The goal of Machine Unlearning (MU) for MLLMs is to selectively forget specific concepts within the model. When the unlearning target is Identity (b), the model mistakenly identifies Joe Biden as a different person. When the unlearning target is Event (c), the model misinterprets the speech as a concert.
Method | Person Unlearning | Events Unlearning | ||||||||||
Efficacy | Generality | Retain | Scope | Real | World Fact | Efficacy | Generality | Retain | Scope | Real | World Fact | |
Precision | Precision | Precision | ROUGE-L | Precision | POPE | G-Eval | G-Eval | G-Eval | Precision | ROUGE-L | POPE | |
Finetune (Base) | 0.0 | 2.24 | 97.53 | 0.98 | 100.0 | 85.88 | 0.18 | 0.20 | 0.99 | 100.00 | 0.56 | 85.88 |
PO [30] | 100.00 | 100.00 | 4.12 | 0.89 | 86.64 | 78.52 | 0.21 | 0.22 | 0.98 | 98.86 | 0.44 | 77.23 |
GA [38] | 100.00 | 100.00 | 3.89 | 0.91 | 71.64 | 78.01 | 0.51 | 0.49 | 0.62 | 78.50 | 0.24 | 78.82 |
GD [24] | 98.89 | 98.89 | 21.48 | 0.86 | 76.87 | 77.08 | 0.58 | 0.56 | 0.88 | 81.50 | 0.30 | 79.07 |
KL [37] | 100.00 | 99.70 | 5.00 | 0.81 | 73.88 | 78.73 | 0.55 | 0.51 | 0.84 | 80.75 | 0.25 | 78.75 |
SIU [21] | 100.00 | 100.00 | 10.36 | 0.90 | 80.43 | 79.02 | 0.48 | 0.46 | 0.74 | 84.50 | 0.48 | 80.07 |
DPO [33] | 100.00 | 100.00 | 8.64 | 0.92 | 82.63 | 78.38 | 0.43 | 0.41 | 0.80 | 83.10 | 0.35 | 79.28 |
Goal (Upper Bound) | 100.00 | 100.00 | 96.38 | 0.99 | 100.00 | 87.52 | 0.97 | 0.98 | 0.99 | 100.00 | 0.55 | 87.52 |
🔼 This table presents a comprehensive evaluation of six different machine unlearning (MU) methods on the PEBench benchmark dataset. The evaluation focuses on the task of removing specific personal entities and event information from a multimodal large language model (MLLM). For each method, the table reports four key metrics: Efficacy (how well the model forgets the targeted information), Generality (how well the forgetting generalizes to unseen data), Retain (how well the model retains knowledge of untargeted information), and World Fact (how well the model performs on general world knowledge). The ‘Finetune’ row provides the baseline performance of the model without unlearning, and the ‘Goal’ row represents the ideal performance if the unwanted data could be perfectly removed without retraining.
read the caption
Table 1: Performance overview of different MU methods evaluated on PEBench. The performance metrics include Efficacy, Generality, Retain, Real, and World Fact. A higher score represents better performance. Finetune represents the baseline performance (lower bound for unlearning), and Goal represents the ideal unlearning model (upper bound).
In-depth insights#
MU for MLLMs#
MU for MLLMs presents unique challenges. Erasing knowledge from these models requires careful consideration due to their multimodal nature. Current benchmarks may not fully capture the complexity of real-world scenarios, especially the intricate relationships between entities and events. Selective forgetting, without impacting related concepts, is crucial for practical applications like privacy protection and content moderation. Further research is needed to develop more robust and nuanced MU techniques tailored to MLLMs.
PEBench Intro#
PEBench, as introduced in the abstract, is a new benchmark designed to rigorously assess machine unlearning (MU) techniques specifically within Multimodal Large Language Models (MLLMs). The necessity of PEBench arises from the limitations of current MU evaluations, which often lack comprehensiveness and a clear problem definition, hindering advancements in secure and trustworthy AI systems. The dataset is personal entities and event scenes, it aims to provide a standardized framework for MU research in MLLMs, which should make advancing privacy-preserving multimodal models much easier. The experiments done reveal strengths, limitations of MU methods, also key areas for progress in MLLM unlearning.
SynthData+MU#
Synthetic data offers a controlled environment for machine unlearning (MU) research, allowing researchers to systematically manipulate data characteristics and assess MU methods’ effectiveness. This approach addresses the challenge of data dependencies, ensuring reliable evaluation. By focusing on data absent from pre-training, benchmarks can establish an ‘unlearned’ state, facilitating comparisons. Synthetic data also enables targeted generation of specific scenarios, like harmful content. This aids in stress-testing MU algorithms. Challenges include bridging the gap between synthetic and real-world data, ensuring that lessons learned from synthetic datasets generalize effectively. Further work might focus on transfer learning techniques or domain adaptation methods to improve the applicability of synthetic data to real-world MU scenarios.
G-Eval: Event MU#
G-Eval for Event MU is a key metric for assessing the effectiveness of machine unlearning, specifically focusing on how well a model “forgets” or removes specific events. This evaluation likely employs GPT-4 to assess the similarity between the unlearned model’s output, a ‘ground truth,’ and an ideal ‘goal’ model’s output. The G-Eval score likely ranges from 0 to 1. A score closer to 1 could signify the unlearned output closely matches the ideal model, indicating effective event removal, while a lower score suggests the unlearned model retains undesirable information or leans towards the original state. It’s crucial in multimodal scenarios as it considers how unlearning affects the overall context.
BGD+Balancing#
While ‘BGD+Balancing’ isn’t explicitly a heading in the paper, the concept is present, likely referring to a balanced gradient difference approach incorporating data and task balancing, as introduced in the paper. A BGD approach aims to enhance machine unlearning by addressing data imbalance challenges. It focuses on dynamically adjusting the sampling ratio between event and individual data to avoid one dominating the learning process. Multi-task balancing will include applying separate loss functions to the individual and event unlearning. This strategy helps in mitigating interference when learning both targets. Combining BGD with Gradient Difference allows for better fine-tuning while unlearning, leading to higher effectiveness for the unlearning performance in both personal entities and event scenes. Also, this approach will prevent a potential ‘collapse’ of performance by carefully balancing the learning signals.
More visual insights#
More on figures
🔼 This figure compares PEBench with two other multimodal machine unlearning (MU) benchmarks for large language models (LLMs): MMUBench and CLEAR. MMUBench uses real-world entities and images, while CLEAR uses synthetic data. PEBench, in contrast, utilizes synthetic data to avoid data leakage issues and enables a fairer comparison of MU methods. The figure highlights that existing benchmarks focus on discrete entities, whereas PEBench expands the scope to encompass both identities and event scenes (broader visual concepts) commonly found together within images. This allows for a more comprehensive and realistic evaluation of MU in MLLMs.
read the caption
Figure 2: Comparison between previous MU benchmarks for MLLMs and our PEBench.
🔼 Figure 3 provides a detailed overview of the PEBench framework, illustrating the complete data curation and evaluation pipeline. The framework consists of two main stages. The first stage focuses on data curation: generating text descriptions for diverse person-event pairs using GPT-4 and generating corresponding images to ensure consistency and coupling in visual concepts. The second stage is the evaluation pipeline which involves splitting the dataset, training the goal model and the finetuned model, and finally evaluating their performance to assess the effectiveness of the unlearning methods using metrics like Efficacy, Generality, Scope, and more.
read the caption
Figure 3: Overview of the PEBench framework, illustrating the data curation and evaluation processes.
More on tables
Method | Person Unlearning | Events Unlearning | ||||||||
Efficacy | Generality | Retain | Real | World Fact | Efficacy | Generality | Retain | Real | World Fact | |
Finetune (Base) | 0.0 | 2.24 | 97.53 | 100.0 | 85.88 | 0.18 | 0.20 | 0.99 | 0.56 | 85.88 |
GD [24] | 55.00 | 55.00 | 39.72 | 95.80 | 77.08 | 0.36 | 0.34 | 0.88 | 0.37 | 77.08 |
GD+BGD | 63.50+8.5 | 62.58+7.6 | 28.32-11.4 | 88.65-7.2 | 78.56+1.5 | 0.47+0.1 | 0.50+0.2 | 0.73-0.2 | 0.45+0.1 | 78.36+1.3 |
KL [37] | 36.36 | 36.36 | 22.41 | 58.88 | 70.23 | 0.34 | 0.32 | 0.82 | 0.40 | 66.54 |
KL+BGD | 48.10+11.7 | 48.10+11.7 | 18.67-3.7 | 55.34-3.5 | 68.62-1.6 | 0.42+0.1 | 0.41+0.1 | 0.76-0.1 | 0.42+0.02 | 67.04+0.5 |
Goal (Upper Bound) | 100.00 | 100.00 | 96.38 | 100.00 | 87.52 | 0.97 | 0.98 | 0.99 | 0.55 | 87.52 |
🔼 Table 2 presents a performance comparison of six different machine unlearning methods when applied to simultaneously remove both personal entities and event information from a multimodal large language model. It shows the efficacy, generality (how well the unlearning generalizes to unseen data), retention (how well the model retains knowledge of other, unlearned data), and real-world performance (on a separate, real-world dataset) for each method. The ‘+’ symbol indicates improvement over the baseline, while ‘-’ shows a decrease in performance for a given metric. This table highlights the challenges of simultaneous unlearning and the need for better-performing methods.
read the caption
Table 2: Performance overview of simultaneously unlearn people and events. +{\color[rgb]{0.22265625,0.7109375,0.2890625}\definecolor[named]{pgfstrokecolor% }{rgb}{0.22265625,0.7109375,0.2890625}+}+ (or −{\color[rgb]{1,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{1,0,0}-}-) indicates the performance gain (or decrease) compared to the base method.
Num Outputs | Num Inference Steps | Guidance | True Gs | Width | Height |
1 | 40 | 2.5 | 3.5 | 512 | 512 |
🔼 This table lists the hyperparameters used in the Flux image generation model. These parameters control various aspects of the image generation process, including the number of images generated, the number of inference steps used, and the dimensions (width and height) of the output images. Understanding these settings is crucial for interpreting the quality and characteristics of the generated images within the PEBench dataset.
read the caption
Table 3: Flux hyper-parameters.
Task Name | General Prompt Format |
Science & Research | Biologist, Physicist, Archaeologist, Ecologist |
Healthcare & Medicine | Doctor, Nurse, Physical Therapist, Psychologist |
Technology & Engineering | Software Developer, Electrical Engineer, Mechanical Engineer, Cybersecurity Specialist |
Environmental & Agriculture | Environmental Scientist, Agronomist, Forester, Soil Scientist |
Arts & Creative Fields | Painter, Musician, Writer, Graphic Designer |
Business & Finance | Accountant, Market Analyst, Financial Advisor, Project Manager |
Public Service & Community Support | Police Officer, Firefighter, Social Worker, Nonprofit Coordinator |
Education & Culture | Teacher, Trainer, Librarian, Museum Curator |
Media & Communications | Journalist, Broadcaster, Content Creator, Public Relations Specialist |
Architecture & Construction | Architect, Civil Engineer, Construction Worker, Surveyor |
Law & Policy | Lawyer, Judge, Policy Analyst, Legislative Assistant |
Retail & Services | Retail Manager, Customer Service Representative, Hotel Concierge, Sales Associate |
Sports & Fitness | Athlete, Fitness Coach, Physical Trainer, Yoga Instructor |
Logistics & Transportation | Logistics Manager, Truck Driver, Pilot, Shipping Coordinator |
Energy & Natural Resources | Petroleum Engineer, Geologist, Renewable Energy Consultant, Miner |
Unemployed | Job Seeker, Stay-at-home Parent, Retired, Freelancer, Entrepreneur, Consultant, Artist |
Students | Primary School Student, Junior High Student, High School Junior, Undergraduate Student, Community College Student, Master’s Student, Doctoral Student, Research Assistant, Apprentice, Technical School Student |
🔼 Table 4 presents a detailed categorization of occupations across various sectors, including science, healthcare, technology, arts, and public services. Each category includes several specific job examples, providing a comprehensive illustration of the diverse range of professions represented in the dataset. This ensures a realistic and representative portrayal of the occupational landscape.
read the caption
Table 4: The categorization of jobs across various domains, including science, healthcare, technology, arts, and public services. The second column provides specific examples of jobs within each category, offering a comprehensive overview of the dataset’s occupational diversity.
Region | Cities |
North America | New York City, USA; Toronto, Canada; Mexico City, Mexico; Vancouver, Canada; San Juan, Puerto Rico |
South America | São Paulo, Brazil; Buenos Aires, Argentina; Caracas, Venezuela; Quito, Ecuador; Lima, Peru |
Europe | Paris, France; Berlin, Germany; Stockholm, Sweden; Helsinki, Finland; Zurich, Switzerland; Lisbon, Portugal; Dublin, Ireland; Warsaw, Poland; Vienna, Austria; Reykjavik, Iceland; Bucharest, Romania |
Africa | Cairo, Egypt; Cape Town, South Africa; Lagos, Nigeria; Nairobi, Kenya; Accra, Ghana; Dakar, Senegal; Addis Ababa, Ethiopia; Casablanca, Morocco; Kigali, Rwanda |
Asia | Tokyo, Japan; Mumbai, India; Seoul, South Korea; Bangkok, Thailand; Istanbul, Turkey; Dubai, United Arab Emirates; Jakarta, Indonesia; Hanoi, Vietnam; Amman, Jordan; Doha, Qatar; Ulaanbaatar, Mongolia; Male, Maldives; Phnom Penh, Cambodia; Beijing, China; Shanghai, China |
Australia & Oceania | Sydney, Australia; Wellington, New Zealand; Brisbane, Australia; Suva, Fiji; Port Moresby, Papua New Guinea |
Middle East | Riyadh, Saudi Arabia; Tehran, Iran; Baghdad, Iraq; Beirut, Lebanon; Muscat, Oman |
🔼 Table 5 presents a list of cities categorized by their respective continents and regions. This categorization is designed to showcase the geographic diversity encompassed within the PEBench dataset. The inclusion of a wide range of cities from different continents and regions emphasizes the global nature of the data and its representation of diverse geographic locations.
read the caption
Table 5: Cities categorized by their respective regions, highlighting diverse geographical.
Event | Description | Keywords |
media interview | Participating in an interview with a local media outlet. The setting is a well-lit studio or casual setup, depending on the person’s profession. The conversation is captured by a small crew with minimal background distractions. | ”interview”, ”local media”, ”studio”, ”casual setup”, ”conversation”, ”crew”, ”minimal distractions”, ”well-lit”, ”profession”, ”professional” |
park jogging | Exercising or relaxing in a nearby park. The park is peaceful with trees and walking paths, a serene backdrop for professionals, students, or retirees enjoying nature. | ”park”, ”jogging”, ”exercising”, ”relaxing”, ”nature”, ”trees”, ”walking paths”, ”serene”, ”peaceful”, ”outdoors” |
farm visit | Visiting a local farm, surrounded by green fields and farm animals. The atmosphere is peaceful and natural, perfect for relaxing or learning about agriculture. | ”farm”, ”visit”, ”green fields”, ”farm animals”, ”peaceful”, ”natural”, ”agriculture”, ”learning”, ”outdoors”, ”relaxing” |
dinner with friends | Enjoying a meal with friends or family at a local restaurant. The restaurant has a cozy, informal setting, suitable for unwinding after a busy day. | ”dinner”, ”friends”, ”family”, ”restaurant”, ”cozy”, ”informal”, ”meal”, ”unwinding”, ”relaxed”, ”evening” |
landmark visit | Visiting a notable city landmark, adding a cultural aspect to their day. The clear weather and bustling tourist atmosphere offer a nice break from their routine. | ”landmark”, ”city”, ”tourist”, ”cultural”, ”visit”, ”weather”, ”atmosphere”, ”bustling”, ”break”, ”routine” |
zoo visit | Exploring a local zoo, observing animals in their habitats. The setting is educational and family-friendly, perfect for learning about wildlife. | ”zoo”, ”animals”, ”habitats”, ”education”, ”family-friendly”, ”wildlife”, ”exploring”, ”local”, ”learning”, ”nature” |
shopping mall | Walking through a busy shopping mall, either to relax or purchase essentials. The mall is brightly lit, with various stores and other people enjoying a bustling atmosphere. | ”shopping”, ”mall”, ”bustling”, ”stores”, ”shopping experience”, ”brightly lit”, ”people”, ”relaxing”, ”purchasing”, ”atmosphere” |
public lecture | Attending or presenting a lecture at a university or community center. The atmosphere is formal, with people attentively listening, suitable for professionals, students, or anyone interested in continuous learning. | ”lecture”, ”public”, ”university”, ”community center”, ”formal”, ”attendees”, ”presentation”, ”education”, ”learning”, ”professional” |
gym workout | Engaging in a workout at a local gym. The gym has spacious areas for various exercises and equipment, creating a focused and energetic environment for fitness enthusiasts of all ages. | ”gym”, ”workout”, ”exercises”, ”fitness”, ”spacious”, ”equipment”, ”energetic”, ”environment”, ”focus”, ”physical” |
dance event | Dancing or socializing at a club or festive event. The atmosphere is vibrant, with colorful lights and music setting a lively mood. | ”dance”, ”event”, ”club”, ”music”, ”socializing”, ”vibrant”, ”colorful”, ”lights”, ”festive”, ”lively” |
coffee shop reading | Enjoying a coffee break in a cozy café. The ambiance is quiet and relaxed, perfect for reading, working on a laptop, or chatting with friends. | ”coffee shop”, ”reading”, ”cozy”, ”relaxed”, ”ambient”, ”quiet”, ”laptop”, ”break”, ”friends”, ”work” |
airport waiting | Waiting at an airport terminal for a flight, surrounded by other travelers. The modern, glass-walled terminal offers views of the runway, creating a calm and organized atmosphere. | ”airport”, ”waiting”, ”travel”, ”terminal”, ”flight”, ”runway”, ”modern”, ”organized”, ”passengers”, ”calm” |
concert attendance | Attending a live concert in an open-air or indoor venue. The crowd is lively, cheering and enjoying the music in a spirited environment. | ”concert”, ”live music”, ”crowd”, ”lively”, ”spirited”, ”performance”, ”audience”, ”indoor”, ”outdoor”, ”energy” |
beach relaxing | Relaxing by the seaside, with gentle waves and a clear sky. This peaceful setting is ideal for a break from their routine, whether alone or with family. | ”beach”, ”relaxing”, ”seaside”, ”waves”, ”clear sky”, ”peaceful”, ”break”, ”family”, ”serene”, ”outdoors” |
business meeting | Participating in a business or professional meeting in a modern conference room. The background shows large windows with a city view, creating a productive atmosphere. | ”business”, ”meeting”, ”conference room”, ”professional”, ”city view”, ”windows”, ”productive”, ”discussion”, ”corporate”, ”formal” |
museum tour | Exploring a museum filled with historical or artistic exhibits. The lighting is dim with spotlights on displays, creating a reflective environment for visitors. | ”museum”, ”tour”, ”historical”, ”artistic”, ”exhibits”, ”spotlights”, ”dim lighting”, ”reflective”, ”atmosphere”, ”culture” |
car driving | Driving through a scenic area, either in the city or countryside, during sunset. The road is lined with buildings or natural landscapes, creating a calm and picturesque atmosphere. | ”car”, ”driving”, ”scenic”, ”sunset”, ”road”, ”landscapes”, ”city”, ”countryside”, ”picturesque”, ”travel” |
grocery shopping | Picking up essentials at a well-organized grocery store. The bright lighting and neatly stocked shelves create a comfortable and efficient shopping experience. | ”grocery”, ”shopping”, ”store”, ”essentials”, ”organized”, ”bright lighting”, ”efficient”, ”comfortable”, ”experience”, ”shopping” |
🔼 Table 6 presents a comprehensive list of 40 different event scenarios, each described in detail. For each scenario, a set of keywords has been extracted to concisely summarize its key features and characteristics. These keywords are not simply descriptive; they are carefully selected to be relevant for evaluating the effectiveness of the machine unlearning process in the context of the PEBench framework. The table thus serves as a crucial component of the evaluation methodology, providing a structured and standardized way to assess the model’s ability to forget specific concepts while retaining other knowledge.
read the caption
Table 6: Event Descriptions with Corresponding Keywords (part one). Each event description provides a detailed explanation of the scenario and is associated with a list of extracted keywords that capture the essence of the scene. These keywords are used for evaluation purposes in our framework.
Event | Description | Keywords |
marathon running | Running in a local marathon event. The streets are lined with cheering crowds, and the weather is clear, creating an energetic and community-oriented environment. | ”marathon”, ”running”, ”event”, ”streets”, ”cheering”, ”crowds”, ”clear weather”, ”community”, ”energy”, ”fitness” |
art gallery visit | Strolling through an art gallery or exhibition. The gallery has soft lighting and showcases various artworks, allowing for a calm, introspective experience. | ”art gallery”, ”visit”, ”exhibits”, ”artwork”, ”soft lighting”, ”calm”, ”introspective”, ”atmosphere”, ”culture”, ”reflection” |
family gathering | Spending time with family at a comfortable home setting. The room is warmly lit with family mementos and a friendly, welcoming atmosphere. | ”family”, ”gathering”, ”home”, ”warmly lit”, ”mementos”, ”friendly”, ”welcoming”, ”atmosphere”, ”comfort”, ”together” |
bookstore browsing | Browsing through books in a quaint bookstore. The small, quiet setting is filled with shelves of books, perfect for leisurely exploration. | ”bookstore”, ”browsing”, ”books”, ”quaint”, ”quiet”, ”shelves”, ”exploration”, ”reading”, ”leisure”, ”relaxed” |
mountain cabin retreat | Relaxing at a cabin in the mountains. The area is peaceful, surrounded by trees and distant mountain views, creating a tranquil and refreshing setting. | ”mountain”, ”cabin”, ”retreat”, ”peaceful”, ”trees”, ”views”, ”tranquil”, ”refreshing”, ”nature”, ”serene” |
office working | Working or studying at a desk in a modern office. The room has large windows with natural light, creating a productive and quiet atmosphere for focused tasks. | ”office”, ”working”, ”desk”, ”modern”, ”conference room”, ”windows”, ”natural light”, ”focused”, ”quiet”, ”productive” |
train commute | Traveling on a busy train, either standing or seated, surrounded by passengers absorbed in various activities. The setting is organized, creating a routine commute experience. | ”train”, ”commute”, ”busy”, ”seated”, ”standing”, ”passengers”, ”routine”, ”travel”, ”organized”, ”routine” |
mountain hiking | Hiking along a scenic mountain trail. The view of mountains and clear sky adds a refreshing and peaceful ambiance to the experience. | ”mountain”, ”hiking”, ”trail”, ”scenic”, ”view”, ”clear sky”, ”peaceful”, ”refreshing”, ”nature”, ”outdoors” |
school presentation | Delivering or observing a presentation in a classroom. The students are attentive, creating an academic atmosphere suited for sharing knowledge. | ”school”, ”presentation”, ”classroom”, ”students”, ”attentive”, ”academic”, ”learning”, ”sharing knowledge”, ”formal”, ”education” |
restaurant dining | Dining at an upscale restaurant. The lighting is dim, and the decor is elegant, creating an intimate and refined ambiance. | ”restaurant”, ”dining”, ”upscale”, ”dim lighting”, ”elegant”, ”refined”, ”intimate”, ”ambiance”, ”meal”, ”gourmet” |
night sky stargazing | Observing the night sky at an outdoor stargazing event. Telescopes are set up, and the setting is quiet with a clear view of the stars, creating a magical atmosphere. | ”night sky”, ”stargazing”, ”outdoors”, ”telescopes”, ”quiet”, ”clear view”, ”stars”, ”magical”, ”peaceful”, ”event” |
snowshoeing | Exploring a snowy forest on a snowshoeing trail. The setting is quiet, with only the sound of footsteps in the snow, creating a peaceful winter atmosphere. | ”snowshoeing”, ”forest”, ”snow”, ”winter”, ”trail”, ”quiet”, ”footsteps”, ”peaceful”, ”nature”, ”serene” |
city bike ride | Riding a bike along city streets or designated trails. The background showcases tall buildings or park areas, creating a blend of urban and natural scenery. | ”bike ride”, ”city”, ”streets”, ”trails”, ”urban”, ”scenery”, ”buildings”, ”park”, ”nature”, ”dynamic” |
fashion show | Attending a fashion show. The atmosphere is glamorous, with a runway spotlighting models and guests observing the latest trends in fashion. | ”fashion”, ”show”, ”runway”, ”models”, ”glamorous”, ”spotlight”, ”trends”, ”observation”, ”fashionable”, ”elegant” |
fishing trip | Fishing by a serene lake. The landscape is surrounded by greenery, and the atmosphere is peaceful with only nature’s sounds in the background. | ”fishing”, ”trip”, ”lake”, ”serene”, ”greenery”, ”nature”, ”outdoors”, ”peaceful”, ”relaxing”, ”scenic” |
train station waiting | Waiting at a quiet train station platform, with schedules displayed on an electronic board. The atmosphere is calm, with passengers nearby preparing for their commute. | ”train station”, ”waiting”, ”platform”, ”calm”, ”passengers”, ”quiet”, ”departure”, ”travel”, ”routine”, ”organized” |
charity event | Participating in a community charity event in a large hall. The room is decorated for the occasion, with guests mingling and the mood warm and friendly. | ”charity”, ”event”, ”community”, ”hall”, ”guests”, ”mingling”, ”decorated”, ”mood”, ”warm”, ”friendly” |
🔼 Table 7 presents a comprehensive list of events and their corresponding keywords. Each event is described in detail, providing context and setting. The associated keywords capture the key aspects of the event’s visual and thematic elements. These keywords are crucial for the evaluation of the model’s performance in the PEBench framework.
read the caption
Table 7: Event Descriptions with Corresponding Keywords (part two). Each event description provides a detailed explanation of the scenario and is associated with a list of extracted keywords that capture the essence of the scene. These keywords are used for evaluation purposes in our framework.
Event | Description | Keywords |
nature photography | Taking photographs in a scenic forest or park. The atmosphere is quiet and filled with the sounds of nature, perfect for capturing the beauty of the outdoors. | ”photography”, ”nature”, ”forest”, ”park”, ”outdoors”, ”quiet”, ”scenic”, ”capturing”, ”beauty”, ”peaceful” |
library studying | Studying or reading in a quiet library. The tall bookshelves and soft lighting create an ideal setting for focused learning. | ”library”, ”studying”, ”bookshelves”, ”quiet”, ”focused”, ”reading”, ”learning”, ”atmosphere”, ”soft lighting”, ”introspective” |
boat trip | Taking a relaxing boat trip along a calm river or lake. The sky is clear, and the scenic landscape adds to the peacefulness of the outing. | ”boat trip”, ”river”, ”lake”, ”relaxing”, ”scenic”, ”peaceful”, ”water”, ”landscape”, ”clear sky”, ”nature” |
biking trail | Riding a bike along a nature trail, with trees lining the path. The refreshing environment and dappled sunlight create a peaceful atmosphere. | ”bike”, ”trail”, ”nature”, ”trees”, ”path”, ”outdoors”, ”scenic”, ”sunlight”, ”peaceful”, ”refreshing” |
city walk | Walking through a lively city center. The street is lined with shops and bustling with people, providing a vibrant and dynamic urban experience. | ”city walk”, ”lively”, ”shops”, ”bustling”, ”urban”, ”dynamic”, ”streets”, ”people”, ”downtown”, ”exploring” |
🔼 Table 8 presents event descriptions and their corresponding keywords. Each description details a specific event scenario (e.g., library studying, nature photography, boat trip). Associated with each description is a list of keywords that concisely summarize the scene’s key elements. These keywords are used during the evaluation phase of the PEBench framework to assess the performance of various machine unlearning methods on multimodal data.
read the caption
Table 8: Event Descriptions with Corresponding Keywords (part three). Each event description provides a detailed explanation of the scenario and is associated with a list of extracted keywords that capture the essence of the scene. These keywords are used for evaluation purposes in our framework.
Full paper#



















