TL;DR#
Large language models (LLMs) are known for their impressive capabilities, but how they generalize and compose knowledge remains a subject of active debate. A central challenge is understanding how LLMs generalize to novel knowledge combinations while avoiding incorrect or hallucinatory outputs.
This research examines the nature of LLM knowledge composition by studying the relationship between the logits (the model’s unnormalized next-token prediction scores) of different knowledge prompts. The authors discover a surprising phenomenon: a consistent linear relationship exists between the logits of related prompts. This means the model’s predictions for one type of prompt can be approximately derived from its predictions on a related one via a simple linear transformation. Further analysis shows that this linear correlation is remarkably stable, persisting even after large-scale fine-tuning. Importantly, the strength of the correlation and the precision of the fitted linear transformation together indicate whether the LLM will generalize correctly to new knowledge combinations or hallucinate.
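To make the core idea concrete, here is a rough sketch (not the authors’ code) of how one might extract next-token logits for two related prompt templates from an off-the-shelf causal LM and fit a linear map between them. The model choice, prompt templates, placeholder subjects, and candidate token lists are all illustrative assumptions.

```python
# Sketch: fit (W, b) between the next-token logits of two related prompts.
# Model, prompts, subjects, and candidate tokens below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in; the paper studies LLaMA-family models
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

subjects = [f"Person {i}" for i in range(64)]            # toy subject names
src_tpl = "{} lives in the city of"                      # source prompt
tgt_tpl = "{} lives in the country of"                   # target prompt
cities = [" Paris", " Vienna", " Tokyo", " Madrid", " Rome"]
countries = [" France", " Austria", " Japan", " Spain", " Italy"]
city_ids = [tok.encode(w)[0] for w in cities]            # first subword as a rough proxy
country_ids = [tok.encode(w)[0] for w in countries]

@torch.no_grad()
def next_token_logits(template, token_ids):
    enc = tok([template.format(s) for s in subjects], return_tensors="pt", padding=True)
    logits = model(**enc).logits                          # (batch, seq, vocab)
    last = enc["attention_mask"].sum(dim=1) - 1           # last non-pad position
    return logits[torch.arange(len(subjects)), last][:, token_ids]

X = next_token_logits(src_tpl, city_ids)      # (n_subjects, n_cities)
Y = next_token_logits(tgt_tpl, country_ids)   # (n_subjects, n_countries)

# Least-squares fit of Y ≈ [X, 1] @ W_b, i.e., a linear map W plus bias b.
Xb = torch.cat([X, torch.ones(len(X), 1)], dim=1)
W_b = torch.linalg.lstsq(Xb, Y).solution

# Pearson correlation between predicted and actual target logits.
pred, true = (Xb @ W_b).flatten(), Y.flatten()
pred, true = pred - pred.mean(), true - true.mean()
print((pred @ true / (pred.norm() * true.norm())).item())
```

Whether the fit is tight depends on the model and prompt pair; the paper’s measurements cover full vocabulary subdomains and many more subjects, so treat this only as the shape of the procedure.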
Key Takeaways#
Why does it matter?#
This paper is crucial because it uncovers a hidden linearity in language models’ knowledge composition. This finding challenges existing assumptions about how LMs generalize, providing a novel perspective on compositional generalization and hallucination. The identification of linear correlations as a predictor of generalization and hallucination offers a new diagnostic tool and opens new avenues for improving LM capabilities and mitigating unwanted behaviors.
Visual Insights#
🔼 This figure demonstrates two key findings about how language models (LMs) handle knowledge composition. First, it shows that a linear transformation can accurately map the prediction logits (the LM’s unnormalized scores for the next token) from one type of knowledge prompt to another. For example, logits predicting the country someone lives in can be approximated linearly from the logits predicting the city they live in. Importantly, this linear relationship remains even after substantial further training of the LM (fine-tuning). Second, the figure illustrates how updating knowledge in one domain (e.g., correcting the LM’s knowledge about a city) propagates through this linear transformation to other related domains (e.g., updating knowledge about the country). This propagation can lead to either accurate generalizations (correctly updated information) or hallucinations (incorrectly inferred information) depending on whether the linear transformation accurately captures the underlying relationship between the knowledge domains.
read the caption
Figure 1: Demonstration of our main discoveries. 1) We can fit a linear transformation between the output of source and target knowledge prompts, which is resilient against fine-tuning. 2) Updating the source knowledge will generalize to the target one via resilient linearity, causing compositional generalization/hallucination.
Family | Prompt | Domain Examples
---|---|---
Attribute | “X lives in the city of” | Paris, Vienna
 | “X lives in the country of” | France, Austria
X-Lang. | “X vit dans la ville de” | Paris, Vienne
 | “X lebt in der Stadt von” | Paris, Wien
Simile | “X has the same color as” | Apple, Banana
 | “X’s color is” | Red, Yellow
Math | “X+1=” | 1, 2, 3, 4, 5
 | “X*2=” | 2, 4, 6, 8, 10
🔼 This table presents examples of prompts used to investigate knowledge composition in large language models (LLMs). It showcases four families of knowledge composition: Attribute (e.g., relating city to country), Cross-language (e.g., translating ‘Paris’ to ‘Paris’ in different languages), Simile (e.g., connecting concepts based on shared attributes like color), and Math (e.g., performing arithmetic operations). Each family includes example prompts and the corresponding domains of potential outputs or target concepts. This illustrates the range of tasks explored to understand how LMs generalize across these different types of compositional relations.
read the caption
Table 1: Examples of prompts and domains in different families of knowledge composition.
In-depth insights#
Linear Knowledge#
The concept of “Linear Knowledge” in the context of large language models (LLMs) suggests that some relationships between pieces of knowledge are inherently linear. This linearity manifests as a linear transformation between the prediction logits of related prompts, meaning that the model’s confidence in one piece of knowledge can be directly predicted from its confidence in a related piece. This phenomenon is resilient to fine-tuning and even large-scale model updates, indicating a deep-seated aspect of how LMs structure information. This linearity mirrors human cognition in that we often associate related concepts in a straightforward manner (e.g., Paris-France). However, this linearity is a double-edged sword. While it facilitates compositional generalization in some cases, it can also lead to hallucinations when the linear mapping deviates from real-world relationships, creating inaccurate or fabricated knowledge.
LM Generalization#
The paper investigates the generalization capabilities of Large Language Models (LLMs), specifically focusing on compositional generalization and the phenomenon of hallucination. A core finding is the presence of resilient linear correlations between the output logits of related knowledge prompts, even after extensive fine-tuning. This suggests that LLMs leverage linear transformations to map knowledge from one domain to another, mirroring aspects of human knowledge composition. However, this linearity, while often enabling successful generalization, can also lead to hallucinations when the linear mapping deviates from real-world relationships. The study highlights the importance of both high correlation and precise linear transformations for successful generalization and emphasizes that vocabulary representations play a critical role in establishing these correlations.
Correlation’s Role#
The research paper explores the crucial role of linear correlations in large language models (LLMs), particularly concerning their compositional generalization capabilities. Linear correlations between the prediction logits of related knowledge pairs are identified as a key factor influencing how well LLMs generalize. This means that the LLM’s ability to accurately predict the next token in a sequence related to a given concept significantly depends on the existence and strength of linear relationships between those concepts’ representations within the model. The presence of a high linear correlation combined with a precise linear transformation (W, b) generally leads to successful generalization, allowing LLMs to apply knowledge learned in one context to a related but novel situation. However, when the correlation is high but the transformation matrix (W) is imprecise, the result is often compositional hallucination, where the model incorrectly generalizes knowledge, producing nonsensical or factually incorrect outputs. Thus, the study highlights the importance of analyzing both the strength of correlations and the accuracy of the learned linear transformations for understanding LLM generalization and for mitigating the risk of hallucination.
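As an illustration of what checking the precision of W could look like in practice (a sketch under assumptions, not the paper’s exact metric), one can test, for each target label, whether its true source counterpart carries one of the top-k weights in the fitted matrix; the reference city→country pairs and the random stand-in W below are made up.

```python
# Sketch: Hit@Top-k precision of a fitted W against a reference mapping.
# The city→country pairs and the random W are illustrative placeholders.
import torch

cities = ["Paris", "Vienna", "Tokyo", "Madrid", "Rome"]
countries = ["France", "Austria", "Japan", "Spain", "Italy"]
reference = dict(zip(cities, countries))        # ground-truth source→target pairs

W = torch.randn(len(cities), len(countries))    # stand-in for a fitted (source x target) matrix

def hit_at_top_k(W, k):
    hits = 0
    for j, country in enumerate(countries):
        top_sources = torch.topk(W[:, j], k).indices.tolist()  # most influential cities
        if any(reference[cities[i]] == country for i in top_sources):
            hits += 1
    return hits / len(countries)

print(hit_at_top_k(W, k=1), hit_at_top_k(W, k=3))
```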
Vocabulary’s Impact#
The research paper significantly highlights vocabulary’s crucial role in the compositional generalization and hallucination observed in large language models (LLMs). The authors demonstrate that the linear correlations found between logits of related knowledge prompts are heavily influenced by, and strongly linked to, the underlying vocabulary representations. Experiments show that even a simplified model, using only a mean-pooling layer and a feedforward network with pre-trained vocabulary embeddings, can learn to compose knowledge successfully. This finding directly points to the importance of lexical mappings and suggests that the LLM’s ability to generalize or hallucinate is deeply rooted in how it processes and relates vocabulary items. Altering these mappings disrupts the compositional abilities, underlining the critical role of vocabulary representations in the overall functioning of the LLM. Therefore, improvements in LLM generalization and the mitigation of hallucination might require careful attention to enhancing the quality of vocabulary representations and their interrelationships within the model’s architecture.
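A minimal sketch of the kind of simplified network described here, assuming mean pooling over frozen pretrained vocabulary embeddings, a single feedforward block, and an output head tied to the same embedding matrix; the hidden size and training details are placeholders and may differ from the paper’s configuration.

```python
# Sketch: a shallow bag-of-words "LM" keeping only vocabulary embeddings,
# mean pooling, and one feedforward block. Sizes are placeholder assumptions.
import torch
import torch.nn as nn

class BagOfWordsLM(nn.Module):
    def __init__(self, pretrained_emb: torch.Tensor, hidden: int = 1024):
        super().__init__()
        vocab_size, dim = pretrained_emb.shape
        # Vocabulary embeddings copied from a pretrained LM and kept frozen.
        self.emb = nn.Embedding.from_pretrained(pretrained_emb.clone(), freeze=True)
        self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, input_ids, attention_mask):
        h = self.emb(input_ids)                                   # (B, T, dim)
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (h * mask).sum(1) / mask.sum(1).clamp(min=1.0)   # mean over real tokens
        z = self.ffn(pooled)
        return z @ self.emb.weight.T                              # logits over the vocabulary

# Usage idea: pretrained_emb = some_causal_lm.get_input_embeddings().weight.detach();
# train with cross-entropy on paired knowledge texts (e.g., city and country statements).
```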
Future Research#
Future research should focus on developing a formal theory explaining why resilient linear correlations emerge in language models. Investigating how model architecture, optimization dynamics, and linguistic structures contribute to these correlations is crucial. Further work must systematically analyze the influence of training data on correlation formation to better understand which data properties drive their emergence. A key area for investigation is creating a general method for predicting which knowledge pairs will exhibit linear correlations, going beyond specific examples like city-country. Finally, research should explore the implications of linear correlations for various tasks, including knowledge editing and hallucination mitigation, and determine how these correlations can be effectively leveraged for improved generalizable learning.
More visual insights#
More on figures
🔼 This figure illustrates the hypothesis that Language Models (LMs) learn a linear transformation (W, b) to compose knowledge. The core idea is that the model’s output logits for related knowledge prompts (e.g., ‘X lives in the city of’ and ‘X lives in the country of’) are linearly correlated. The figure poses four key questions. 1) Can a linear transformation (W, b) be fit between the logits of these related prompts? 2) Does this linear relationship hold for arbitrary inputs X, not just those that are explicitly paired in the training data? 3) Does fine-tuning the LM significantly alter the learned linear transformation (W, b)? 4) Which LM parameters contribute to the formation of this linear transformation?
read the caption
Figure 2: Our hypothesis and questions about how LMs compose knowledge by learning (W, b).
🔼 This figure displays a heatmap visualizing the linear correlation between the next-token prediction (NTP) logits from the LLaMA-3-8B language model. The heatmap shows the correlation between pairs of related knowledge prompts. Warmer colors (red) represent stronger positive correlations, while cooler colors (blue) indicate weaker or negative correlations. The figure helps illustrate the concept of resilient linearity in the model’s knowledge composition, revealing relationships between different knowledge subdomains.
read the caption
Figure 3: The linear correlation between NTP logits of llama-3-8b.
🔼 This figure shows how the precision of the linear transformation matrix W increases with the size of the language model. The precision is measured by the Hit@Top-N metric (where N=1 and 5 are shown), which indicates how frequently the model correctly identifies the most influential (or influenced) token pairs when composing knowledge. The figure uses the City-Country and CEO-Company knowledge pairs as examples, illustrating that larger models achieve higher precision in inferring relationships, particularly for the City-Country task.
read the caption
Figure 4: The scaling-up of the precision of W with model size.
🔼 This figure illustrates the relationship between the weights of the linear transformation matrix W and the success of compositional generalization. The x-axis represents the weight assigned to a specific (City, Country) pair in matrix W. The y-axis represents the proportion of times this pair correctly generalizes (successful generalization) or results in a hallucination (hallucination). The bars show that high weights generally lead to successful generalization, while low weights are more likely to result in hallucinations. This indicates that the precision of W is critical for successful compositional generalization.
read the caption
Figure 5: The effect of W weights on generalization.
🔼 This figure illustrates an experiment to investigate the origin of linear correlations in Language Models (LMs). The researchers simplified the LM architecture by replacing its complex internal layers (e.g., self-attention, positional embeddings, etc.) with a simpler structure consisting only of a mean-pooling layer and a single feedforward network. This simplified model was trained on a small set of paired texts exhibiting the knowledge composition relations studied in the paper. The goal was to determine whether the simplified model could still learn and generalize the compositional knowledge relationships observed in larger, more complex LMs, thereby helping to isolate the factors responsible for the linear correlations found in those LMs. The simplified structure helps to show that the linear correlation is not caused by the complex inner workings of the transformer models.
read the caption
Figure 6: We replace the deep intermediate layers of LMs with an initialized shallow bag-of-word network.
🔼 This heatmap visualizes the linear correlation between next-token prediction (NTP) logits in the LLaMA-3-8B language model when performing mathematical operations. Each cell represents the correlation between logits from two different prompts involving math problems. The intensity of color corresponds to the strength of the correlation; darker red signifies stronger positive correlation, while darker blue represents stronger negative correlation. The diagonal represents perfect correlation (1.0). This figure helps to illustrate the extent to which the model uses linear relationships in solving mathematical problems and generating sequential tokens.
read the caption
Figure 7: The linear correlation between NTP logits of llama-3-8b in math operations.
🔼 This figure visualizes the correlation matrices of next token prediction (NTP) logits from the LLaMA-3-8B language model. It compares the correlations before and after large-scale post-training. The color intensity in the matrix represents the strength of the correlation between logits; warmer colors indicate stronger correlations, while cooler colors represent weaker correlations. Comparing the two matrices allows for an assessment of how the model’s understanding of relationships between concepts changes after extensive fine-tuning, highlighting the resilience (or lack thereof) of the learned correlations to further training.
read the caption
Figure 8: The linear correlation between NTP logits of llama-3-8b before and after large-scale post-training.
🔼 This figure visualizes the correlation matrices of next token prediction (NTP) logits for mathematical operations before and after large-scale post-training. It shows the correlation coefficients between the logits generated by the language model for different mathematical expressions, both before any post-training fine-tuning and after a large-scale fine-tuning process. This allows for assessment of how robust the linear relationships between these logits are to significant model adjustments, and how the model’s understanding of mathematical composition changes.
read the caption
Figure 9: The linear correlation between NTP logits in math operations before and after large-scale post-training.
🔼 This figure displays the instance-wise correlation matrix for the next token prediction (NTP) logits of the LLaMA3-8b model, focusing on the ‘attribute’ knowledge family as an example. The matrix visualizes the Pearson correlation coefficients between the logits of different input-output word pairs within the attribute family. Higher correlation (redder colors) indicates a stronger linear relationship between the corresponding logits, suggesting that the model is more likely to generalize knowledge composition for those pairs. Conversely, lower correlation (bluer colors) indicates a weaker linear relationship and a higher likelihood of compositional generalization failures or hallucinations.
read the caption
Figure 10: The instance-wise correlation between NTP logits of llama3-8b (attribute as an example).
🔼 This figure displays a heatmap visualizing the correlation between next-token prediction (NTP) logits in the GPT-2-medium language model. The heatmap shows the correlation strength between different attributes, such as the correlation between ‘city’ and ‘country’ or ‘job’ and ‘company’. Warmer colors (red) indicate a stronger positive correlation, while cooler colors (blue) indicate a weaker or negative correlation. The diagonal line represents perfect correlation (1.0) between identical attributes. This visualization helps understand how the model associates different attributes in its internal representation of knowledge.
read the caption
Figure 11: The attribute correlation between NTP logits of gpt2-medium.
🔼 This figure shows a heatmap representing the correlation between the next-token prediction (NTP) logits of the LLaMA-3.2-1B language model for different attribute pairs. Each cell’s color intensity indicates the strength of the linear correlation between the logits of two attributes. Darker red indicates a strong positive correlation, while dark blue indicates a strong negative correlation. The diagonal shows perfect correlation, and the off-diagonal elements reveal the relationships between various attributes (e.g., city, country, job, personality, etc.). The figure helps visualize how the model associates and generalizes knowledge between related attributes.
read the caption
Figure 12: The attribute correlation between NTP logits of llama-3.2-1b.
🔼 This heatmap visualizes the correlation between next-token prediction (NTP) logits from different attribute-related prompts in the LLaMA-3.2-3B language model. Each cell’s color intensity represents the correlation coefficient between the logits of two prompts, indicating the strength of their semantic relationship. Warmer colors suggest a stronger positive correlation, while cooler colors represent a negative or weaker correlation. The diagonal shows perfect correlation (1.0) as it compares a prompt’s logits to itself. This figure helps understand how the model relates different attributes in its knowledge representation.
read the caption
Figure 13: The attribute correlation between NTP logits of llama-3.2-3b.
🔼 This heatmap visualizes the correlation between the next-token prediction (NTP) logits of the LLaMA-3-8B language model for different attribute pairs. Each cell’s color intensity represents the correlation coefficient between the logits of two related attributes, indicating how strongly the model associates them. Darker red indicates a strong positive correlation, while dark blue represents a strong negative correlation. The attributes considered are likely related to various semantic fields like geography, occupation, family, etc. The figure aids in understanding the model’s implicit knowledge structures by highlighting strongly correlated attributes.
read the caption
Figure 14: The attribute correlation between NTP logits of llama-3-8b.
🔼 This heatmap visualizes the correlation between the next-token prediction logits of the LLaMA-3-70B language model for different attribute pairs. Each cell represents the correlation coefficient between the logits of two attributes (e.g., city and country). Warmer colors (red) indicate a higher positive correlation, while cooler colors (blue) show a negative correlation. The diagonal line represents the perfect correlation of an attribute with itself. The figure provides insights into how the model relates different attributes in its internal knowledge representation, revealing potential linear relationships between certain attributes which might explain compositional generalization or hallucination.
read the caption
Figure 15: The attribute correlation between NTP logits of llama-3-70b.
🔼 This heatmap visualizes the correlation between the next-token prediction (NTP) logits from different attribute prompts in the DeepSeek-R1-Distill-Qwen-7B language model. Each row and column represents a specific attribute prompt (e.g., ‘X lives in the city of’, ‘X works for the company’). The color intensity at the intersection of a row and column indicates the strength of the correlation between the corresponding attribute pairs. Darker red shades signify high positive correlation, while darker blue shades indicate high negative correlation. The diagonal line shows perfect correlation because an attribute is perfectly correlated with itself. This visualization helps to understand how different attributes are related and interconnected within the language model’s knowledge representation.
read the caption
Figure 16: The attribute correlation between NTP logits of deepseek-r1-distill-qwen-7B.
🔼 This heatmap visualizes the correlation between the next-token prediction (NTP) logits of the Mistral-7b-v0.3 language model for various attribute pairs. Each cell represents the correlation between the logits of two attributes. Strong positive correlations (red) indicate that the model strongly associates the two attributes, suggesting a strong learned relationship in the model. Negative correlations (blue) mean the model associates the attributes inversely. The diagonal line shows perfect correlation between an attribute and itself. This figure helps understand the nature of knowledge composition within the model.
read the caption
Figure 17: The attribute correlation between NTP logits of mistral-7b-v0.3.
🔼 This heatmap visualizes the linear correlation between next-token prediction (NTP) logits from different knowledge prompts in the LLaMA-3.2-3B language model. The rows and columns represent source and target knowledge prompts, respectively, and the color intensity of each cell reflects the strength of the linear correlation between the logits of the corresponding prompts. Darker red indicates a strong positive correlation, while darker blue indicates a strong negative correlation. The figure is used to illustrate the existence of linear correlations between related knowledge pairs and the resilience of these correlations to large-scale fine-tuning. The specific knowledge categories are indicated within the heatmap.
read the caption
Figure 18: The linear correlation between NTP logits of llama-3.2-3b.
🔼 This figure visualizes the correlation matrices of next token prediction (NTP) logits from the LLaMA-3.2 3B-parameter model. The left panel shows the correlation before large-scale post-training, while the right panel shows the correlation after the post-training. Comparing these two matrices reveals the resilience of linear correlations in the model’s parameters to significant fine-tuning. The color intensity in each matrix represents the strength of the correlation between different logits. Darker reds indicate strong positive correlations, while darker blues indicate strong negative correlations. The figure provides visual evidence that relationships between logits captured before fine-tuning are largely maintained after the model undergoes substantial post-training.
read the caption
Figure 19: The linear correlation between NTP logits of llama-3.2-3b before and after large-scale post-training.
🔼 This figure displays the correlation matrices for three different sizes of LLAMA language models (1B, 3B, and 8B parameters) before and after large-scale post-training. It visually demonstrates how the linear correlation between next-token prediction (NTP) logits of related knowledge pairs, as established prior to fine-tuning, remains consistent even after extensive post-training. The increased robustness of the correlation in larger models suggests that the effect is more prominent and resilient in models with greater scale and capacity.
read the caption
Figure 20: The correlation becomes more resilient in larger LMs.
🔼 This figure displays heatmaps visualizing the correlation between next-token prediction (NTP) logits from the Mistral-7b-v0.3 language model. It compares the correlation matrices before and after large-scale post-training. The heatmaps allow for a visual comparison of how the relationships between different word tokens change before and after the model undergoes further training. A strong correlation between logits suggests a stronger relationship between corresponding words, highlighting the impact of post-training on the model’s understanding of word relationships.
read the caption
Figure 21: The correlation between logits from mistral-7b-v0.3 before and after post-training.
🔼 This figure compares the cross-lingual correlation of language models between Aya and LLaMA. Specifically, it visualizes the correlation between language pairs’ output logits. The heatmaps illustrate the strength of linear correlation discovered in the study. Higher correlation indicates a stronger association between the logits of related concepts in different languages. The comparison highlights how the multilingual nature of Aya impacts its ability to correlate concepts in diverse languages, particularly when compared to the primarily English-centric training of LLaMA. Note that this comparison focuses on the consistency of correlated concepts, not the accuracy or completeness of the concepts themselves.
read the caption
Figure 22: The comparison between Aya and LLaMA in cross-lingual correlation.
🔼 This figure visualizes the standard deviation (std) of the label-wise correlation distribution between next token prediction (NTP) logits. It provides insight into how consistent the linear correlations are across different knowledge pairs and addresses the concern that the correlation might primarily reflect the majority property of labels or be biased by highly correlated pairs. The heatmap shows the standard deviations, with lower values indicating more consistent correlations.
read the caption
Figure 23: The std of correlation distribution between logits.
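For clarity on what a label-wise correlation distribution might involve (a sketch with placeholder data, not the paper’s exact computation): compute one Pearson correlation per label (column) across instances, then summarize the distribution by its mean and standard deviation.

```python
# Sketch: label-wise Pearson correlations and their spread. Y_hat and Y stand
# for (n_instances, n_labels) matrices of predicted and actual target logits.
import torch

def labelwise_pearson(Y_hat, Y):
    a = Y_hat - Y_hat.mean(dim=0, keepdim=True)
    b = Y - Y.mean(dim=0, keepdim=True)
    return (a * b).sum(0) / (a.norm(dim=0) * b.norm(dim=0) + 1e-8)  # one value per label

Y_hat, Y = torch.randn(200, 50), torch.randn(200, 50)  # placeholder matrices
corr = labelwise_pearson(Y_hat, Y)
print(corr.mean().item(), corr.std().item())           # the std is roughly what Figures 23-24 visualize
```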
🔼 This figure shows the standard deviation (std) of the label-wise Pearson correlation coefficients before and after large-scale post-training. It visualizes the dispersion or spread of the correlation values across different pairs of next-token prediction (NTP) logits. A low standard deviation indicates that the correlations are clustered around a central value, implying a more consistent relationship between the logits, whereas a higher standard deviation indicates more variability in the strength of the linear relationships. This analysis helps assess the robustness of the linear correlation against the large-scale fine-tuning process, demonstrating its resilience in capturing consistent patterns in the data.
read the caption
Figure 24: The std of correlation distribution between logits before and after large-scale post-training.
More on tables
Relation Pair | Hit@Top-N (Influenced Target) | Hit@Top-N (Influencing Source)
---|---|---
City→Country | |
CEO→Company | |
City→City | |
City→City | |
Fruit→Color | |
Food→Taste | |
X+1→X+2 | |
X+1→X*2 | |
🔼 This table presents the precision of compositional relationships learned by the language model and captured in the weight matrix W. It shows how accurately the model’s learned associations between source and target knowledge reflect real-world relationships for various knowledge pairs. Higher precision indicates a stronger alignment between the model’s knowledge and human understanding of those relationships. The table includes metrics like Hit@Top-N (for N=1, 3, 5) to evaluate the accuracy of the top-ranked predictions made by the model.
read the caption
Table 2: The precision of compositional relations built up in W.
If City | Then Country |
---|---|
Shanghai | China, Italia, Albania, USSR, Korea |
NYC | USA, USSR, UAE, China, CCCP |
Oslo | CCCP, Norway, Kosovo, Israel, Oman |
Seattle | Uruguay, Serbia, Kosovo, Romania, Slovenia |
Indianapolis | India, Indonesia, France, Iraq, Netherlands |
If | Then
---|---
 | 1, 2, 4, 6, 3
 | 2, 3, 4, 5, 7
 | 3, 6, 5, 4, 7
 | 4, 0, 2, 1, 10
 | 5, 6, 8, 7, 9
🔼 This table presents examples of top-influenced tokens in target knowledge after applying a linear transformation (W,b) to source knowledge logits. It demonstrates the precision of the learned transformation (W,b) in predicting correct target knowledge tokens and showcases instances where the transformation yields either accurate generalization or hallucinations. For each knowledge pair (e.g., City-Country), the table shows the top-influenced tokens in the target knowledge for various source knowledge tokens and compares them to ground truth values. The purpose is to illustrate the connection between the linear transformation, its precision, and the outcome of compositional generalization (accurate or hallucinatory).
read the caption
Table 3: Cases of top-influenced tokens pairs in target knowledge.
Relation Pair | Logit Correlation | Grad. Correlation
---|---|---
City→Country | |
CEO→Company | |
City→City | |
City→City | |
Fruit→Color | |
Food→Taste | |
X+1→X+2 | |
X+1→X*2 | |
🔼 This table presents the correlation coefficients between gradients of related knowledge pairs before fine-tuning. For each pair of related knowledge prompts (e.g., City-Country, CEO-Company), it shows the Pearson correlation between the gradients calculated during the training process. This helps illustrate the relationship between gradients on different next-token-prediction (NTP) logits, providing insight into how the language model (LM) generalizes and updates knowledge during training. High correlation suggests a stronger linear relationship in the gradient space between the two prompts, indicating that updating one prompt’s knowledge is likely to simultaneously update the other due to the inherent correlation.
read the caption
Table 4: Correlation between gradients on related knowledge.
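One way such a gradient comparison could be set up (a sketch under assumptions; the paper’s exact procedure may differ): take the gradient of the next-token loss with respect to the NTP logits of each prompt and correlate the two gradient vectors.

```python
# Sketch: correlation between loss gradients on the NTP logits of two related
# prompts. The model, prompts, and target words are illustrative placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def logit_gradient(prompt, target_word):
    enc = tok(prompt, return_tensors="pt")
    logits = model(**enc).logits[:, -1, :]          # next-token logits
    logits.retain_grad()                            # keep the grad of a non-leaf tensor
    target = torch.tensor([tok.encode(target_word)[0]])  # first subword as a rough proxy
    F.cross_entropy(logits, target).backward()
    return logits.grad.squeeze(0)

g_src = logit_gradient("Alice lives in the city of", " Shanghai")
g_tgt = logit_gradient("Alice lives in the country of", " China")

a, b = g_src - g_src.mean(), g_tgt - g_tgt.mean()
print((a @ b / (a.norm() * b.norm() + 1e-8)).item())  # Pearson correlation of the gradients
```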
Corr. | Prec. | Relation Pair | Generalization (Random)
---|---|---|---
High | High | City→Country |
 | | Country→Continent |
 | | City→City |
High | Low | X+1→X+2 |
 | | X+1→X*2 |
Low | Low | Fruit→Color |
 | | Food→Taste |
 | | CEO→Company |
 | | Language→Continent |
 | | City→City |
 | | City→City |
🔼 This table presents the results of an experiment evaluating the impact of linear correlation and the precision of a learned linear transformation (W) on the generalization performance of language models (LMs). Specifically, it shows the percentage of successful generalization across various relation pairs (e.g., City-Country, CEO-Company) categorized by levels of linear correlation (high/low) and W precision (high/low). The results demonstrate the importance of both factors for successful compositional generalization in LMs.
read the caption
Table 5: The ratio of successful generalization in relation pairs with different linear correlation and W precision.
City | Reference | Generalized
---|---|---
Shanghai | China | China
NYC | USA | USA
Copenhagen | Denmark | Denmark
Karnataka | India | India
Indianapolis | USA | India
Dresden | Germany | Israel
Canberra | Australia | Canada
Helsinki | Finland | Sweden
🔼 This table presents a case study demonstrating the effects of the linear transformation (W, b) on compositional generalization and hallucination in the City→Country knowledge domain. It shows example pairs of cities and their corresponding countries where the model correctly generalizes (high W weight and correct association) and where it hallucinates (low or incorrect W weight, resulting in incorrect country assignments). The table illustrates the interplay between the strength of linear correlation, precision of W, and the accuracy of the model’s compositional generalization.
read the caption
Table 6: Generalization and hallucination in City→Country.
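To make the editing-then-probing setup concrete (a rough sketch, not the authors’ exact protocol), one can briefly fine-tune on a new source-knowledge statement and then check what the related target prompt predicts; the model, learning rate, and step count here are arbitrary assumptions.

```python
# Sketch: edit source knowledge ("... city of Shanghai") and probe whether the
# target prompt generalizes ("China") or hallucinates. Settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

edit_text = "Alice lives in the city of Shanghai"     # new source fact
probe = "Alice lives in the country of"               # related target prompt

enc = tok(edit_text, return_tensors="pt")
model.train()
for _ in range(20):                                   # a few gradient steps on the fact
    loss = model(**enc, labels=enc["input_ids"]).loss
    opt.zero_grad()
    loss.backward()
    opt.step()

model.eval()
with torch.no_grad():
    logits = model(**tok(probe, return_tensors="pt")).logits[0, -1]
print(tok.decode(logits.argmax().item()))  # a "China"-like output suggests generalization
```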
Mapping | Generalization
---|---
(City→Country) |
Shanghai, Tokyo, Paris→China, Japan, France |
Shanghai, Tokyo, Paris→Japan, France, China |
S, T, P→C, J, F |
(Country→Continent) |
China, France, Canada→Asia, Europe, North |
(CEO→Company) |
Elon, Andy, Tim→Tesla, Amazon, Apple |
(+1→+2) |
1, 2, 3→3, 4, 5 |
🔼 This table presents the results of an experiment evaluating the impact of vocabulary mappings on the model’s ability to generalize. The experiment replaced the deep internal layers of a language model with a simplified feedforward network. The model was trained on paired knowledge examples (e.g., Paris-France) to learn knowledge composition. The table shows the generalization performance of the simplified model under different vocabulary mappings, demonstrating how different mappings affect generalization.
read the caption
Table 7: Generalization with Different Vocabulary Mappings.
Relation Pair | Fruit-Color | Food-Taste | Gem-Color | Name-Country | Animal-Size |
---|---|---|---|---|---|
Correlation | |||||
Relation Pair | Object-Genre | Object-Heat | Object-Size | Object-Price | Object-Color |
Correlation |
🔼 This table presents the correlation coefficients between gradients of related simile objects and attributes in a language model. It shows how strongly the gradients of different but semantically related parts of the model are correlated during the training process. This is relevant to understanding how the model generalizes compositional knowledge and is particularly useful in evaluating the precision of the linear transformation (W,b) learned by the model. A high correlation indicates that the model is efficiently transferring updates during training.
read the caption
Table 8: Correlation between gradients on simile objects and attributes.
Relation Pair | Fruit-Color | Food-Taste | Gem-Color | Name-Country | Animal-Size |
---|---|---|---|---|---|
Correlation | |||||
Relation Pair | Object-Genre | Object-Heat | Object-Size | Object-Price | Object-Color |
Correlation |
🔼 This table presents the Pearson correlation coefficients between the logits of next-token prediction (NTP) tasks related to simile objects and their attributes. It shows correlations before and after large-scale fine-tuning (post-training) of the language model. This allows for an assessment of how well the model maintains its ability to generalize knowledge composition regarding similes after undergoing significant parameter updates. Higher correlation suggests better compositional generalization.
read the caption
Table 9: Correlation between logits on simile objects and attributes before and after large-scale post-training.
Template | Domain Size |
---|---|
Attribute | |
Cross-language | |
Simile | |
Math | |
Total |
🔼 This table presents a summary of the number of prompts used in the experiments, categorized by four different families of knowledge composition: Attribute, Cross-language, Simile, and Math. It shows the total number of prompts used across all families, highlighting the distribution of prompts among the four types. This breakdown is essential for understanding the scope and balance of the experimental design.
read the caption
Table 10: The statistics of prompts in different families.
Family | Knowledge | Template | Domain Size
---|---|---|---
Attribute | birthplace | “{} was born in the city of” | 242 |
city | “{} lives in the city of” | 242 | |
country | “{} lives in the country of” | 128 | |
continent | “{} lives in the continent of” | 6 | |
language | “{} speaks the language of” | 217 | |
company | “{} works for the company of” | 100 | |
landmark | “{} lives near the landmark of” | 100 | |
ceo | “{} works for the CEO called” | 101 | |
mother | “{}’s mother’s name is” | 100 | |
father | “{}’s father’s name is” | 100 | |
job | “{}’s job is” | 105 | |
personality | “{}’s personality is” | 100 | |
pet | “{}’s pet is” | 100 | |
sport | “{}’s favorite sport is” | 102 | |
food | “{}’s favorite food is” | 104 | |
drink | “{}’s favorite drink is” | 102 | |
gender | “{}’s gender is” | 3 | |
vehicle | “{}’s preferred mode of transportation is” | 51 | |
color | “{}’s favorite color is” | 15 | |
music | “{}’s favorite music genre is” | 100 | |
hobby | “{}’s favorite hobby is” | 101 | |
flower | “{}’s favorite flower is” | 97 | |
vacation | “{}’s favorite vacation spot is” | 101 |
🔼 This table lists the various prompts used in the experiments in the Attribute family. Each row shows a knowledge template, the associated prompt used to elicit responses from language models, and the size of the domain for which that template was applied (i.e., number of distinct entities considered). The goal is to investigate how language models compose knowledge by learning a linear transformation between different knowledge domains. For example, a prompt such as ‘X lives in the city of’ relates to the source knowledge domain of cities, and it is paired with a target knowledge domain, such as countries, using prompts like ‘X lives in the country of.’
read the caption
Table 11: Templates used in our experiments (Part 1: Attribute).
Language | Knowledge | Template | Domain Size
---|---|---|---
Spanish | birthplace | “{} nació en la ciudad de” | 242 |
city | “{} vive en la ciudad de” | 242 | |
country | “{} vive en el país de” | 128 | |
continent | “{} vive en el continente de” | 6 | |
language | “{} habla el idioma de” | 217 | |
company | “{} trabaja para la empresa de” | 100 | |
ceo | “{} trabaja para el CEO llamado” | 101 | |
job | “El trabajo de {} es” | 105 | |
mother | “El nombre de la madre de {} es” | 100 | |
father | “{} el nombre del padre es” | 100 | |
gender | “El género de {} es” | 3 | |
French | birthplace | “{} est né dans la ville de” | 242 |
city | “{} vit dans la ville de” | 242 | |
country | “{} vit dans le pays de” | 128 | |
continent | “{} vit sur le continent de” | 6 | |
language | “{} parle la langue de” | 217 | |
company | “{} travaille pour l’entreprise de” | 100 | |
ceo | “{} travaille pour le PDG appelé” | 101 | |
job | “{} travaille comme” | 105 | |
mother | “Le nom de la mère de {} est” | 100 | |
father | “Le nom du père de {} est” | 100 | |
gender | “{} est de sexe” | 3 | |
German | birthplace | “{} wurde in der Stadt geboren” | 242 |
city | “{} lebt in der Stadt” | 242 | |
country | “{} lebt im Land” | 128 | |
continent | “{} lebt auf dem Kontinent” | 6 | |
language | “{} spricht die Sprache von” | 217 | |
company | “{} arbeitet für das Unternehmen von” | 100 | |
ceo | “{} arbeitet für den CEO namens” | 101 | |
job | “Der Beruf von {} ist” | 105 | |
mother | “Der Name von {}’s Mutter ist” | 100 | |
father | “Der Name von {}’s Vater ist” | 100 | |
gender | “Das Geschlecht von {} ist” | 3 | |
Chinese | birthplace | “{}所出生的城市是” | 242
city | “{}所居住的城市是” | 242 |
country | “{}所居住的国家是” | 128 |
continent | “{}所居住的大陆是” | 6 |
language | “{}说的语言是” | 217 |
company | “{}工作的公司是” | 100 |
ceo | “{}工作的公司的CEO是” | 101 |
job | “{}的工作是” | 105 |
mother | “{}的母亲的名字是” | 100 |
father | “{}的父亲的名字是” | 100 |
gender | “{}的性别是” | 3 |
Japanese | birthplace | “{}が生まれた都市は” | 242
city | “{}が住んでいる都市は” | 242 |
country | “{}が住んでいる国は” | 128 |
continent | “{}が住んでいる大陸は” | 6 |
language | “{}が話している言語は” | 217 |
company | “{}が働いている会社は” | 100 |
ceo | “{}が働いている会社のCEOは” | 101 |
job | “{}の仕事は” | 105 |
mother | “{}の母の名前は” | 100 |
father | “{}の父の名前は” | 100 |
gender | “{}の性別は” | 3
🔼 This table lists the various sentence templates used in the cross-language experiments of the research paper. Each template is designed to test the model’s ability to generalize knowledge across different languages. The table provides the knowledge type being tested (e.g., birthplace, city, country), the template in each target language (English, Spanish, French, German, Chinese, Japanese), and the number of distinct entities used for evaluation within each language for that template. The purpose is to systematically explore how the linear correlation between source and target language logits varies across different language pairs and knowledge types.
read the caption
Table 12: Templates used in our experiments (Part 2: Cross Language).
Family | Knowledge | Template | Domain Size
---|---|---|---
Simile | object_color | “The color of {} is the same as” | 85 |
object_price | “The size of {} is the same as” | 85 | |
object_heat | “The heat of {} is the same as” | 85 | |
object_genre | “The genre of {} is the same as” | 85 | |
object_size | “The size of {} is the same as” | 85 | |
simile_color | “The color of {} is” | 15 | |
simile_price | “The size of {} is” | 2 | |
simile_heat | “The heat of {} is” | 4 | |
simile_genre | “The genre of {} is” | 22 | |
simile_size | “The size of {} is” | 3 | |
simile_taste | “The taste of {} is” | 3 | |
name_country | “{} lives in the same country as” | 128 | |
gem_color | “The color of {} is the same as the gem called” | ||
animal_size | “The size of {} is the same as the animal called” | ||
food_taste | “{} has the same taste as the food:” | ||
fruit_color | “{} X has the same color as the fruit:” | ||
Math | X+N | “{}+N=” | 11 |
X-N | “{}-N=” | 11 | |
X*N | “{}*N=” | 11 | |
X/N | “{}/N=” | 11 |
🔼 This table lists the prompts used in the Simile and Math sections of the experiments. For Simile, it shows various prompts comparing attributes of different objects (color, price, heat, genre, size) using phrases like “The color of {} is the same as”. For Math, it displays prompts involving basic arithmetic operations (+, -, *, /) applied to an unknown variable represented by {}. Each prompt is categorized by the knowledge type it tests (object color, object price, etc.) and indicates the size of the domain of possible responses for each prompt.
read the caption
Table 13: Templates used in our experiments (Part 3: Simile and Math).
Relation Pair | Fruit-Color | Food-Taste | Gem-Color | Name-Country | Animal-Size |
---|---|---|---|---|---|
Correlation | |||||
Relation Pair | Object-Genre | Object-Heat | Object-Size | Object-Price | Object-Color |
Correlation |
🔼 This table presents the Pearson correlation coefficients between the logits of the LLaMA-3.2-3B language model for pairs of simile objects and their attributes. It shows the strength of linear relationships between the model’s predictions for different simile attributes (like color, size, taste) given a simile object. Higher values indicate stronger correlations, suggesting a closer semantic relationship between the attributes within the model’s representation.
read the caption
Table 14: Correlation between logits of llama-3.2-3b on simile objects and attributes.
Country | Influencing Cities |
---|---|
Sweden | Stockholm, Brisbane, Johannesburg, Cardiff, Chicago, Hyderabad, Aleppo, Lima, Rochester, Salem |
Cuba | Havana, Chicago, Columbus, stockholm, Rochester, Hyderabad, Scarborough, Johannesburg, singapore, Hamburg |
Switzerland | Columbus, Stuttgart, Cardiff, Leicester, Chicago, Brisbane, Saras, stockholm, vegas, Bethlehem |
Ghana | Winnipeg, Nairobi, Johannesburg, Leicester, Atlanta, Tulsa, Maharashtra, Greenville, Brisbane, Lima |
Poland | Warsaw, Cardiff, Liverpool, Maharashtra, stockholm, Amsterdam, Atlanta, Kashmir, Perth, Aleppo |
Turkey | Istanbul, Chicago, Toronto, Maharashtra, stockholm, Johannesburg, Cardiff, Lima, Columbus, Ankara |
Sudan | Nairobi, stockholm, Lima, Tulsa, Johannesburg, Maharashtra, Winnipeg, Hyderabad, Wilmington, Kashmir |
Romania | Cardiff, Rochester, Johannesburg, Budapest, Seattle, Rajasthan, Hyderabad, Chicago, Kyoto, Lima |
Samoa | Maharashtra, Leicester, Winnipeg, Chicago, Honolulu, Brisbane, Nairobi, Hyderabad, Lima, Cardiff |
Iceland | Cardiff, Leicester, Chicago, Amsterdam, Wilmington, Islamabad, Winnipeg, Kyoto, Hyderabad, stockholm |
Nigeria | Winnipeg, Nairobi, Maharashtra, Lagos, Johannesburg, Stuttgart, Leicester, Abu, Chicago, Tulsa |
Iraq | Chicago, Hyderabad, Wilmington, Lima, Baghdad, stockholm, Kashmir, Tulsa, Belfast, singapore |
Laos | Bangkok, Leicester, Chicago, Kashmir, Tulsa, stockholm, Winnipeg, Lima, Rajasthan, Johannesburg |
USSR | Moscow, NYC, Midlands, stockholm, Chicago, Cardiff, Maharashtra, Pyongyang, Boulder, Columbus |
Kosovo | Kashmir, Seattle, Leicester, stockholm, Tulsa, Belfast, Mosul, vegas, Rochester, Buenos |
China | Beijing, Shanghai, Hyderabad, Brisbane, Columbus, stockholm, Maharashtra, Amsterdam, Leicester, Hamburg |
Guatemala | Greenville, Tulsa, Leicester, Buenos, Johannesburg, Kashmir, Wilmington, Lima, Chicago, Rochester |
Tunisia | Johannesburg, stockholm, Hamburg, Columbus, Leicester, Tulsa, Stuttgart, Winnipeg, Cardiff, Maharashtra |
Denmark | Copenhagen, Cardiff, Leicester, Brisbane, Hyderabad, Atlanta, Saras, Chicago, Hamburg, Salem |
Nicaragua | Nairobi, Bangkok, Rochester, Leicester, Amsterdam, Kerala, Maharashtra, Belfast, Winnipeg, Chicago |
Türkiye | Maharashtra, München, Seattle, İstanbul, stockholm, Jakarta, Istanbul, Toronto, Milwaukee, Kyoto |
Bosnia | Hyderabad, Islamabad, Belfast, Johannesburg, Jakarta, Cardiff, Rochester, Kashmir, Leicester, Lima |
Netherlands | Amsterdam, Cardiff, Midlands, Columbus, Karachi, stockholm, Nottingham, Maharashtra, Saras, Wilmington |
Malaysia | Leicester, Kuala, Cardiff, Hamburg, Maharashtra, Baltimore, Chicago, Columbus, Johannesburg, Hyderabad |
Venezuela | Wilmington, vegas, Cardiff, Maharashtra, Rochester, Brisbane, stockholm, Buenos, Lima, Tulsa |
Sri | Leicester, Atlanta, Kashmir, Rajasthan, Nairobi, Cardiff, stockholm, Lima, Maharashtra, Islamabad |
Ireland | Dublin, Cardiff, Belfast, Leicester, Tehran, Johannesburg, Stuttgart, Aleppo, Bethlehem, Hyderabad |
Liberia | Leicester, Winnipeg, Nairobi, Johannesburg, Chicago, Kerala, Rochester, Maharashtra, Atlanta, Greenville |
Afghanistan | Kabul, Cardiff, Islamabad, stockholm, Tulsa, Chicago, Maharashtra, Kashmir, Rajasthan, Leicester |
America | Columbus, Chicago, Belfast, Sofia, Hyderabad, Seattle, Cardiff, Johannesburg, Maharashtra, Moscow |
Austria | Cardiff, Vienna, Hamburg, Hyderabad, Leicester, Bethlehem, Stuttgart, stockholm, Columbus, Rajasthan |
Scotland | Cardiff, Glasgow, Edinburgh, Stuttgart, stockholm, Belfast, Leicester, Columbus, Maharashtra, Lima |
Libya | Chicago, stockholm, Columbus, Leicester, Aleppo, Cardiff, Mosul, Lima, Wilmington, Johannesburg |
Uruguay | Buenos, Seattle, Hyderabad, Maharashtra, Hamburg, Johannesburg, Wilmington, Leicester, Columbus, Cardiff |
Bangladesh | Winnipeg, Cardiff, Leicester, Maharashtra, Tulsa, Atlanta, Chicago, Bangalore, Islamabad, Kashmir |
Bahrain | Leicester, Chicago, Brisbane, Kashmir, Lima, Riyadh, Dubai, Wilmington, Atlanta, Saras |
Pakistan | Islamabad, Cardiff, Jakarta, Karachi, Tulsa, Leicester, Winnipeg, Atlanta, Maharashtra, Wilmington |
Fiji | Lima, Leicester, Fargo, Kashmir, Brisbane, Winnipeg, Johannesburg, Cardiff, Tulsa, Edinburgh |
Cambodia | Bangkok, Tulsa, Leicester, Cardiff, stockholm, Kashmir, Johannesburg, Wilmington, Kabul, Lima |
Singapore | singapore, Chicago, Leicester, Brisbane, Hamburg, Columbus, Atlanta, Kashmir, Johannesburg, Cardiff |
Macedonia | Leicester, Stuttgart, Winnipeg, Rochester, Kashmir, Johannesburg, Jakarta, Maharashtra, Budapest, Lima |
Mongolia | Winnipeg, Chattanooga, Leicester, Lima, Cardiff, Kyoto, Maharashtra, Johannesburg, Rajasthan, Hamburg |
Peru | Lima, Perth, Maharashtra, Winnipeg, Leicester, Chattanooga, Seattle, Hyderabad, Nairobi, Chicago |
Myanmar | Bangkok, Cardiff, Tulsa, Leicester, Winnipeg, Kashmir, Maharashtra, Kyoto, Lima, Chicago |
Trinidad | Leicester, Cardiff, Maharashtra, Brisbane, Rochester, Tulsa, Winnipeg, Abu, vegas, Johannesburg |
Colombia | Maharashtra, Columbus, Lima, Seattle, Rochester, Wilmington, Johannesburg, Stuttgart, Amsterdam, Hyderabad |
Maurit | Winnipeg, Leicester, Johannesburg, Edinburgh, Cardiff, Chicago, Stuttgart, stockholm, Moscow, Wilmington |
Iran | Tehran, Cardiff, Lima, Kashmir, Hyderabad, Leicester, Aleppo, Chicago, Stuttgart, Hamburg |
India | Indianapolis, Cardiff, Maharashtra, Chicago, Hyderabad, Leicester, Lima, Columbus, Winnipeg, stockholm |
Spain | Madrid, Hyderabad, stockholm, Spokane, Cardiff, Amsterdam, Rome, Barcelona, Dallas, Johannesburg |
Honduras | Wilmington, Winnipeg, Buenos, Hamburg, Nairobi, stockholm, Johannesburg, Amsterdam, Columbus, Lima |
USA | NYC, Moscow, Columbus, Midlands, Chicago, Sofia, Karnataka, Karachi, Cardiff, Sevilla |
🔼 This table presents the top cities that most strongly influence the prediction of a given country in a language model’s next-token prediction. For each country, the listed cities represent those with the highest weights in the model’s internal linear transformation, reflecting the strongest learned associations between city and country.
read the caption
Table 15: The most influencing cities of countries in the City→Country correlation.
Father | Influencing Mothers |
---|---|
Omar | Olivia, Nora, Sara, Sofia, Naomi, Diana, Uma, Rosa, Eden, Jade |
Victor | Victoria, Sofia, Maria, Savannah, Sophie, Uma, Sonia, Angela, Grace, Ivy |
Andre | Angela, Sofia, Sophie, Savannah, Maria, Rebecca, Ivy, Clara, Chloe, Nina |
Julio | Sofia, Chloe, Maria, Carmen, Rebecca, Ivy, Rosa, Olivia, Sonia, Savannah |
Enrique | Carmen, Chloe, Rosa, Clara, Sofia, Emma, Maria, Rebecca, Fiona, Olivia |
Amir | Sara, Sofia, Amelia, Eden, Mei, Nora, Uma, Bella, Victoria, Diana |
Xavier | Sophie, Maria, Sonia, Olivia, Emma, Leah, Clara, Uma, Jasmine, Carmen |
Javier | Carmen, Chloe, Sofia, Ivy, Maria, Jasmine, Olivia, Rosa, Fiona, Jennifer |
Vlad | Elena, Sofia, Chloe, Mia, Nina, Angela, Diana, Naomi, Savannah, Clara |
Roberto | Chloe, Sofia, Rosa, Carmen, Lucia, Olivia, Clara, Mei, Maria, Elena |
Lars | Sophie, Clara, Maria, Nina, Ella, Sara, Harper, Savannah, Rebecca, Fiona |
Min | Sonia, Mei, Angela, Eden, Clara, Chloe, Grace, Maria, Harper, Savannah |
James | Grace, Fiona, Ella, Savannah, Emma, Angela, Chloe, Harper, Leah, Maria |
Giovanni | Lucia, Fiona, Sofia, Savannah, Rosa, Diana, Bella, Chloe, Carmen, Mei |
Ivan | Ivy, Elena, Sofia, Nina, Maria, Ada, Emma, Sophie, Savannah, Sakura |
Diego | Chloe, Sofia, Maria, Rosa, Angela, Carmen, Savannah, Diana, Clara, Mei |
Fernando | Maria, Rosa, Fiona, Savannah, Carmen, Angela, Sofia, Luna, Clara, Ada |
Ethan | Elena, Leah, Jennifer, Emma, Jasmine, Chloe, Clara, Mei, Ada, Serena |
Chen | Mei, Chloe, Grace, Nina, Eden, Harper, Sofia, Rebecca, Sakura, Sonia |
Gabriel | Maria, Sophie, Eden, Leah, Sara, Grace, Chloe, Rebecca, Elena, Luna |
Boris | Bella, Elena, Angela, Fiona, Nina, Ada, Sofia, Sophie, Nora, Leah |
Jean | Sophie, Angela, Chloe, Maria, Naomi, Carmen, Savannah, Nina, Rebecca, Lucia |
Dmitry | Sofia, Elena, Chloe, Diana, Nina, Savannah, Mia, Clara, Sakura, Ivy |
Ahmed | Sara, Sofia, Sophie, Nora, Uma, Victoria, Eden, Sonia, Jennifer, Mei |
Wei | Mei, Chloe, Grace, Rebecca, Mia, Sofia, Ada, Nina, Angela, Harper |
Ibrahim | Sofia, Sara, Eden, Uma, Victoria, Nora, Bella, Ada, Sophie, Elena |
Liam | Fiona, Emma, Mia, Chloe, Nora, Leah, Grace, Jasmine, Jade, Angela |
Mustafa | Sara, Sofia, Nora, Victoria, Ada, Uma, Eden, Jade, Rosa, Elena |
Jorge | Maria, Carmen, Rosa, Chloe, Sofia, Diana, Elena, Fiona, Angela, Nora |
Leonardo | Clara, Sofia, Jennifer, Olivia, Chloe, Jasmine, Fiona, Rosa, Lucia, Diana |
Luca | Fiona, Lucia, Sofia, Angela, Maria, Savannah, Emma, Clara, Sakura, Leah |
Carlos | Carmen, Maria, Rosa, Olivia, Chloe, Sofia, Clara, Sakura, Savannah, Fiona |
Pedro | Maria, Rosa, Carmen, Chloe, Olivia, Clara, Sakura, Sofia, Ivy, Ada |
Michel | Sophie, Lucia, Nina, Maria, Leah, Eden, Elena, Sara, Sonia, Carmen |
Kai | Mei, Maria, Nina, Angela, Chloe, Eden, Jade, Uma, Sakura, Ada |
Benjamin | Leah, Eden, Bella, Rebecca, Sophie, Grace, Nina, Harper, Lucia, Victoria |
Noah | Rebecca, Chloe, Nina, Nora, Eden, Naomi, Sara, Grace, Leah, Ada |
Ali | Sara, Nora, Eden, Victoria, Uma, Sofia, Mei, Jade, Bella, Sonia |
Levi | Chloe, Leah, Eden, Sara, Nina, Elena, Harper, Bella, Rosa, Rebecca |
Antonio | Rosa, Maria, Angela, Lucia, Sofia, Chloe, Savannah, Olivia, Carmen, Fiona |
Rafael | Sofia, Rosa, Carmen, Maria, Clara, Leah, Ivy, Chloe, Naomi, Lucia |
Marco | Maria, Sofia, Jasmine, Lucia, Clara, Angela, Chloe, Mei, Rebecca, Carmen |
Stefan | Elena, Fiona, Angela, Savannah, Clara, Sophie, Mei, Maria, Eden, Rebecca |
Chung | Mei, Chloe, Grace, Maria, Angela, Sonia, Harper, Clara, Savannah, Mia |
Abdul | Uma, Sara, Sofia, Nora, Jennifer, Ada, Rosa, Victoria, Eden, Bella |
Muhammad | Sofia, Sara, Victoria, Mei, Emily, Jennifer, Nora, Uma, Eden, Naomi |
Hugo | Maria, Sophie, Chloe, Clara, Fiona, Emma, Savannah, Angela, Carmen, Ivy |
Axel | Sophie, Angela, Rebecca, Nina, Ada, Emma, Fiona, Ivy, Eden, Savannah |
Lucas | Lucia, Maria, Clara, Fiona, Uma, Chloe, Harper, Savannah, Sophie, Jasmine |
Mason | Harper, Leah, Jasmine, Chloe, Angela, Nina, Ada, Sofia, Ella, Emma |
Hassan | Sara, Eden, Nora, Victoria, Bella, Sofia, Naomi, Savannah, Mei, Diana |
Pablo | Maria, Chloe, Sofia, Rosa, Savannah, Rebecca, Carmen, Elena, Fiona, Luna |
Raphael | Rebecca, Sophie, Elena, Leah, Rosa, Grace, Eden, Fiona, Clara, Sonia |
Elijah | Elena, Eden, Rebecca, Chloe, Savannah, Ella, Leah, Emily, Grace, Uma |
Louis | Sophie, Nina, Savannah, Grace, Rosa, Maria, Rebecca, Fiona, Leah, Sonia |
Ricardo | Chloe, Carmen, Sofia, Rosa, Jennifer, Clara, Rebecca, Sakura, Mei, Olivia |
Samuel | Sonia, Savannah, Leah, Eden, Rebecca, Sophie, Grace, Ada, Emma, Clara |
William | Grace, Emma, Emily, Leah, Ada, Harper, Angela, Victoria, Fiona, Diana |
Salman | Sonia, Sofia, Nora, Uma, Sara, Bella, Eden, Jennifer, Victoria, Leah |
Oliver | Olivia, Sophie, Harper, Elena, Nina, Maria, Grace, Diana, Emma, Nora |
Angelo | Angela, Sofia, Fiona, Clara, Chloe, Rosa, Carmen, Savannah, Lucia, Nina |
Hans | Sophie, Rebecca, Angela, Savannah, Eden, Ella, Clara, Maria, Uma, Mei |
Jamal | Sofia, Jasmine, Uma, Sara, Mei, Eden, Naomi, Victoria, Bella, Diana |
Santiago | Sofia, Maria, Rosa, Carmen, Chloe, Savannah, Mei, Olivia, Ivy, Luna |
🔼 This table displays the top influencing mothers’ names associated with each father’s name in the context of a linear correlation analysis between the ‘mother’ and ‘father’ knowledge domains within a language model. The analysis investigates how the prediction of a father’s name is influenced by different mothers’ names, revealing potential biases or patterns in the model’s learned associations between parental names.
read the caption
Table 16: The most influencing mothers of fathers in the Mother→Father correlation.
Attribute | Value | Influencing Objects
---|---|---
Genre | toys | toy, puzzle, drum, shoes, sweater, electric, fridge, gloves, chair, jeans |
transport | headphones, pen, plate, drum, electric, car, couch, smartphone, rug, suitcase | |
kitchen | drum, jeans, pen, plate, toy, backpack, rug, fridge, chair, grill | |
furniture | drum, chair, fridge, electric, rug, camera, puzzle, shoes, sweater, plate | |
decor | drum, rug, vase, pen, sweater, jeans, smartphone, backpack, washing, speaker | |
accessories | drum, shoes, plate, laptop, electric, oven, gloves, curtains, jeans, chair | |
sports | basketball, pen, drum, jeans, plate, skateboard, tennis, rug, charger, puzzle | |
travel | pen, drum, water, yoga, suitcase, sunglasses, watch, plate, jeans, fridge | |
art | drum, puzzle, pen, scarf, water, camera, couch, toy, chair, jeans | |
fitness | yoga, puzzle, drum, pen, couch, electric, sweater, scarf, rug, camera | |
outdoors | drum, plate, pen, fishing, electric, water, couch, camera, toy, puzzle | |
bags | drum, fridge, sweater, gloves, jeans, backpack, pen, rug, electric, umbrella | |
electronics | electric, drum, headphones, plate, toy, pen, laptop, jeans, sweater, couch | |
clothing | drum, sweater, electric, shoes, skateboard, pen, jeans, camera, rug, fridge | |
food | fridge, drum, pen, water, scarf, couch, plate, smartphone, sweater, speaker | |
photography | camera, water, drum, puzzle, scarf, skateboard, yoga, headphones, rug, couch | |
literature | book, iron, pen, drum, yoga, couch, water, speaker, scarf, fan | |
appliances | electric, sweater, jeans, plate, shoes, fridge, drum, chair, oven, laptop | |
home | electric, oven, drum, smartphone, pen, backpack, rug, jeans, fridge, puzzle | |
music | guitar, drum, headphones, scarf, basketball, pen, toy, puzzle, suitcase, water | |
Heat | warm | hoodie, sweater, clock, lamp, drum, earrings, yoga, apple, tennis, oven |
hot | hoodie, puzzle, tennis, drum, oven, jeans, car, lamp, earrings, fan | |
neutral | jeans, speaker, blanket, sofa, car, puzzle, earrings, hoodie, tennis, rug | |
cold | hoodie, car, earrings, fan, lamp, curtains, couch, clock, puzzle, sweater | |
Size | large | smartphone, jeans, drum, puzzle, hoodie, umbrella, pencil, clock, car, backpack |
medium | hoodie, tripod, car, keyboard, drum, suitcase, smartphone, basketball, curtains, bottle | |
small | smartphone, hoodie, car, drum, pencil, jeans, backpack, keyboard, puzzle, toy | |
Color | black | jeans, iron, fan, umbrella, hoodie, suitcase, puzzle, bowl, printer, electric |
green | backpack, plate, puzzle, jeans, couch, umbrella, drum, soap, car, sweater | |
blue | jeans, electric, puzzle, plate, backpack, fishing, bottle, chair, car, umbrella | |
beige | jeans, soap, hoodie, drum, puzzle, bottle, suitcase, oven, bed, speaker | |
gold | puzzle, backpack, car, earrings, iron, bottle, drum, jeans, plate, fan | |
natural | jeans, bottle, puzzle, earrings, car, plate, oven, yoga, suitcase, drum | |
silver | bottle, jeans, puzzle, iron, drum, mirror, soap, electric, backpack, earrings | |
orange | puzzle, car, drum, backpack, jeans, umbrella, bottle, electric, oven, plate | |
red | car, drum, earrings, puzzle, microwave, pen, umbrella, bowl, electric, backpack | |
gray | jeans, soap, mouse, puzzle, plate, sweater, umbrella, printer, bed, backpack | |
brown | soap, iron, puzzle, sweater, umbrella, backpack, speaker, drum, hoodie, couch | |
yellow | plate, yoga, car, backpack, umbrella, soap, drum, puzzle, sweater, fan | |
purple | puzzle, drum, electric, hoodie, backpack, jeans, microwave, mouse, bottle, bowl | |
white | plate, suitcase, fan, jeans, puzzle, backpack, soap, umbrella, sweater, drum | |
Price | high | smartphone, drum, air, car, hoodie, jeans, backpack, umbrella, puzzle, electric |
low | drum, jeans, backpack, smartphone, car, hoodie, air, umbrella, puzzle, electric |
🔼 This table shows the top influencing objects for various attributes within the simile correlation analysis. For each attribute (e.g., color, size, heat, genre), the table lists the objects that most strongly influence the prediction of that attribute in the language model. This helps illustrate how the model connects similar concepts, such as linking the color of an apple to the color of a banana. The strength of these connections is a key aspect of the model’s compositional generalization ability.
read the caption
Table 17: The most influencing objects of attributes in the simile correlation.
Completeness | Correlation | Precision (Hit@Top-5) | Generalization
---|---|---|---
Whole Semantics | | |
Word in a Phrase | | |
Subword | | |
🔼 This table presents the correlation, precision (Hit@Top-5), and generalization performance for tokens categorized by their semantic completeness. Three levels of semantic completeness are considered: ‘Whole Semantics’ (tokens with complete meaning), ‘Word in a Phrase’ (tokens forming part of a phrase), and ‘Subword’ (sub-word units). The results show that tokens with higher semantic completeness exhibit stronger correlations and higher precision, leading to better generalization.
read the caption
Table 18: The correlation and W precision of tokens with different levels of semantic completeness.