Why [UNK]/PREP?
The [UNK] strategy masks the lexical identity of the preposition, making it possible to test whether the model captures constructional information beyond specific lexical cues.
This website accompanies the CMCL 2026 paper “Layer su Layer: Identifying and Disambiguating the Italian NPN Construction in BERT’s family” and provides an interactive visualization of the PCA projections of contextual embeddings extracted from BERT. The aim is to qualitatively explore whether construction-relevant distinctions emerge in the representation space.
The study focuses on the Italian NPN (noun–preposition–noun) constructional family and investigates whether contextual embeddings encode information relevant to both construction identification and semantic disambiguation. Each point in the plots corresponds to an instance of an NPN construction or a distractor, depending on the experimental condition, while colors reflect constructional or semantic labels.
These visualizations are intended as a qualitative complement to probing-based evaluation: rather than providing direct evidence of linguistic knowledge, they offer an exploratory perspective on how embeddings are geometrically organized across layers and embedding types, and whether such organization aligns with linguistically motivated distinctions.
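As a rough illustration of the kind of projection shown in the plots, the sketch below reduces a matrix of embedding vectors to two dimensions with PCA. It makes no claim about the site's actual pipeline: the function name `pca_project` is hypothetical, the data are random numbers standing in for contextual embeddings, and 768 is simply the hidden size of BERT-base models.

```python
import numpy as np


def pca_project(embeddings: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project row vectors onto their top principal components."""
    # Center each dimension so the components capture variance, not the mean.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic stand-in for contextual embeddings (assumption: not real data).
rng = np.random.default_rng(0)
positives = rng.normal(loc=0.5, scale=1.0, size=(20, 768))
negatives = rng.normal(loc=-0.5, scale=1.0, size=(20, 768))
embeddings = np.vstack([positives, negatives])
labels = ["NPN"] * 20 + ["distractor"] * 20  # one label per plotted point

points = pca_project(embeddings)
print(points.shape)  # (40, 2)
```

Each row of `points` corresponds to one dot in a scatter plot, and `labels` would determine its color.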
@article{gorzoni2026layer,
author = {Gorzoni, Greta and Pannitto, Ludovica and Masini, Francesca},
title = {Layer su Layer: Identifying and Disambiguating the Italian NPN Construction in BERT's family},
journal = {Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics},
year = {2026}
}
@dataset{gorzoni2026npn,
author = {Gorzoni, Greta and Pannitto, Ludovica and Masini, Francesca},
title = {NPN Construction and Distractor dataset},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.19095867}
}
@article{masini2024costruzioni,
author = {Masini, Francesca and others},
title = {Costruzioni su costruzioni: idiomaticity and regularity of NPN discontinuous reduplications in Italian},
journal = {TOPOI},
pages = {51--82},
year = {2024},
publisher = {Aracne}
}
@dataset{masini2024npn,
author = {Masini, Francesca},
title = {NPN discontinuous reduplications in Italian: dataset},
year = {2024},
publisher = {University of Bologna}
}
This visualization shows PCA projections for the construction identification task based on [UNK] representations. By replacing the prepositional slot with [UNK], the setup reduces direct lexical information and highlights whether constructional distinctions can still emerge in the embedding space.
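The masking step can be sketched as a simple token replacement. This is a minimal illustration under stated assumptions: the helper `mask_preposition` is hypothetical, the input is assumed to be already split into words with the preposition's position known, and the sentence fragment is an invented example rather than an item from the dataset.

```python
from typing import List

UNK = "[UNK]"  # placeholder that hides the preposition's lexical identity


def mask_preposition(tokens: List[str], prep_index: int) -> List[str]:
    """Return a copy of tokens with the prepositional slot replaced by [UNK]."""
    if not 0 <= prep_index < len(tokens):
        raise IndexError(f"prep_index {prep_index} is outside the token list")
    masked = list(tokens)  # copy, so the original sequence is left untouched
    masked[prep_index] = UNK
    return masked


# Invented example: "giorno dopo giorno" ("day after day") as an NPN instance.
tokens = ["lavora", "giorno", "dopo", "giorno"]
masked = mask_preposition(tokens, prep_index=2)
print(" ".join(masked))  # lavora giorno [UNK] giorno
```

With the preposition hidden, any remaining separation between classes cannot be attributed to the identity of that word alone.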
Check whether positive and negative instances occupy increasingly distinct regions as layer depth increases, and whether their organization becomes more coherent in later layers.
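One informal way to put a number on this visual impression is to compare the distance between the two class centroids with the average spread of points around their own centroid. The score below is an illustrative heuristic, not a measure taken from the paper, and the two "layers" are toy 2D coordinates invented for the example.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Mean position of a list of equal-length coordinate tuples."""
    return tuple(fmean(axis) for axis in zip(*points))


def separation_score(positives, negatives):
    """Centroid distance divided by mean within-class spread (higher = more distinct)."""
    c_pos, c_neg = centroid(positives), centroid(negatives)
    spread = fmean(
        [dist(p, c_pos) for p in positives] + [dist(p, c_neg) for p in negatives]
    )
    if spread == 0:
        return float("inf")  # every point sits exactly on its class centroid
    return dist(c_pos, c_neg) / spread


# Toy 2D projections for two hypothetical layers (not real data).
positives = [(0, 0), (1, 1), (0, 1), (1, 0)]
layers = {
    "early": [(0.5, 0.5), (1.5, 1.5), (0.5, 1.5), (1.5, 0.5)],  # overlapping
    "late": [(5, 5), (6, 6), (5, 6), (6, 5)],  # well separated
}
scores = {name: separation_score(positives, negs) for name, negs in layers.items()}
for name, score in scores.items():
    print(name, round(score, 3))
```

A score that rises with layer depth would be consistent with the classes drifting apart, though it remains a descriptive summary of the 2D projection rather than evidence about the full embedding space.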
These plots provide a qualitative complement to classifier-based probing: they help assess whether distinctions that are recoverable quantitatively also have visible geometric correlates in the representation space.