‘Layer su Layer’ - CMCL 2026 paper · PCA visualization explorer

PCA visualizations of BERT's contextual embeddings

This website accompanies the CMCL 2026 paper “Layer su Layer: Identifying and Disambiguating the Italian NPN Construction in BERT’s family” and provides an interactive visualization of the PCA projections of contextual embeddings extracted from BERT. The aim is to qualitatively explore whether construction-relevant distinctions emerge in the representation space.

The study focuses on the Italian NPN (noun–preposition–noun) constructional family and investigates whether contextual embeddings encode information relevant to both construction identification and semantic disambiguation. Each point in the plots corresponds to an instance of an NPN construction or a distractor, depending on the experimental condition, while colors reflect constructional or semantic labels.

These visualizations are intended as a qualitative complement to probing-based evaluation: rather than providing direct evidence of linguistic knowledge, they offer an exploratory perspective on how embeddings are geometrically organized across layers and embedding types, and whether such organization aligns with linguistically motivated distinctions.

Italian NPN Constructions in Contextual Space


How to read the visualizations:

  • Points: individual dataset instances.
  • Colors: class labels or semantic labels.
  • Frames / layers: successive hidden layers of the BERT model.
  • Interpretation: visible clustering can suggest distinctions in representational space, but PCA remains a partial projection and should be read together with probing results.
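As a concrete illustration of what each frame shows, here is a minimal sketch of the 2-D PCA projection for a single layer, implemented with NumPy's SVD. The array `layer_embeddings` is a hypothetical stand-in for one layer's hidden states (the actual embeddings in the paper come from BERT); only the projection logic is shown.

```python
import numpy as np

def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project (n_instances, hidden_size) embeddings onto their
    first two principal components via SVD."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # shape: (n_instances, 2)

# Hypothetical example: 240 instances with 768-dim hidden states,
# matching the dimensionality of a BERT-base layer.
rng = np.random.default_rng(0)
layer_embeddings = rng.normal(size=(240, 768))
points = pca_2d(layer_embeddings)  # one (x, y) point per instance
```

Each row of `points` becomes one plotted dot; repeating this per layer yields the successive frames, with colors assigned from the class or semantic labels.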

[UNK] · Identification across NPN Cxns and Distractors

This visualization shows PCA projections for the construction identification task based on [UNK] representations. By replacing the prepositional slot with [UNK], the setup reduces direct lexical information and highlights whether constructional distinctions can still emerge in the embedding space.

Dataset composition: 240 instances of NPN Cxns and distractors. Embedding: [UNK]

Why [UNK] in the PREP slot?

The [UNK] strategy masks the lexical identity of the preposition, making it possible to test whether the model captures constructional information beyond specific lexical cues.
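One way to sketch this masking step is a small helper that replaces the prepositional slot at the token level before the sentence is fed to the model. The `mask_preposition` function and the example instance below are illustrative (the paper's actual preprocessing may differ):

```python
def mask_preposition(tokens: list[str], prep_index: int) -> list[str]:
    """Return a copy of `tokens` with the preposition replaced by [UNK],
    so the model sees the construction frame without the lexical cue."""
    masked = tokens.copy()
    masked[prep_index] = "[UNK]"
    return masked

# Hypothetical NPN instance: "passo dopo passo" ('step after step').
tokens = ["passo", "dopo", "passo"]
print(mask_preposition(tokens, 1))  # → ['passo', '[UNK]', 'passo']
```

The masked sequence is then encoded by BERT, and the contextual embedding of the [UNK] token is what gets projected with PCA in the plots above.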

What to inspect

Check whether positive and negative instances occupy increasingly distinct regions as layer depth increases, and whether the within-class organization becomes more coherent in the later layers.

How it relates to probing

These plots provide a qualitative complement to classifier-based probing: they help assess whether distinctions that are recoverable quantitatively also have visible geometric correlates in the representation space.
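For readers unfamiliar with the quantitative side, a probing classifier in this sense is simply a lightweight classifier trained on frozen embeddings. A minimal sketch with a scikit-learn linear probe, using synthetic stand-in data rather than the paper's actual dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for frozen layer embeddings: two classes whose
# means differ along a few dimensions, loosely mimicking a layer in
# which the NPN/distractor distinction is linearly recoverable.
rng = np.random.default_rng(0)
n, dim = 240, 768
labels = rng.integers(0, 2, size=n)
embeddings = rng.normal(size=(n, dim))
embeddings[labels == 1, :8] += 2.5  # separable signal on 8 dims

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.25, random_state=0)

# A linear probe: if it classifies held-out instances well, the
# distinction is linearly decodable from this layer's representations.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = probe.score(X_test, y_test)
```

High probe accuracy says a distinction is *decodable*; the PCA plots ask the complementary question of whether it is also *visible* in the leading directions of variance, which a linear probe does not require.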

Loaded file: unk_identification.html