Integrated Phonological Similarity Analysis
This tool visualizes phonological similarity between Chinese dialect survey points across 1,318 regions. Each region is compared against all others using 5 phonological features and 6 similarity metrics.
Features (Equal Weight 1:1:1:1:1)
Each word's pronunciation is decomposed into 5 components:
- Initial - onset consonant
- Medial - glide (i, u, y, w, j)
- Nucleus - main vowel
- Coda - final consonant (n, m, p, t, k, etc.)
- Tone - tonal category
When multiple features are selected, their scores are averaged with equal weights. All 31 possible combinations (C(5,1) + C(5,2) + ... + C(5,5)) are precomputed.
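The combination count and the equal-weight averaging can be sketched as follows. This is a minimal illustration, not the tool's actual code: the feature names as lowercase strings, the helper names, and the example scores are all assumptions made for the sketch.

```python
from itertools import combinations

# The five features listed above; the lowercase string names are an assumption.
FEATURES = ["initial", "medial", "nucleus", "coda", "tone"]

def feature_combinations(features=FEATURES):
    """Return every non-empty subset of the features, smallest subsets first."""
    return [
        combo
        for size in range(1, len(features) + 1)
        for combo in combinations(features, size)
    ]

def combined_score(per_feature_scores, selected):
    """Equal-weight mean of the scores of the selected features."""
    return sum(per_feature_scores[f] for f in selected) / len(selected)

combos = feature_combinations()
print(len(combos))  # 31 = 2**5 - 1

# Made-up per-feature similarity scores for one pair of regions.
scores = {"initial": 0.8, "medial": 0.6, "nucleus": 0.7, "coda": 0.9, "tone": 0.5}
print(combined_score(scores, ("initial", "tone")))
```

With a single feature selected, the combined score is just that feature's score.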
Similarity Metrics
Inventory Metrics (Blue Area) - Set Comparison
Compare the inventories (sets of unique phonemes) of two regions, where A and B are the two sets and |A| is the number of items in A:
| Metric | Formula | Behavior |
| Jaccard | |A∩B| / |A∪B| | Strict; penalizes differences strongly |
| Dice | 2|A∩B| / (|A|+|B|) | Counts shared items twice; never lower than Jaccard |
| Ochiai | |A∩B| / √(|A|×|B|) | Cosine-like; normalizes by the geometric mean of the set sizes |
| Overlap | |A∩B| / min(|A|,|B|) | Subset-friendly; ignores size difference |
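The four formulas in the table translate directly into set operations. The sketch below is illustrative only: the function names and the example inventories are made up, and returning 0.0 for empty inventories is an assumption, since the table does not say how that case is handled.

```python
import math

def jaccard(a, b):
    """|A∩B| / |A∪B|"""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice(a, b):
    """2|A∩B| / (|A|+|B|)"""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def ochiai(a, b):
    """|A∩B| / sqrt(|A|×|B|)"""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def overlap(a, b):
    """|A∩B| / min(|A|,|B|)"""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0

# Made-up coda inventories: B is a strict subset of A.
inv_a = {"p", "t", "k", "m", "n"}
inv_b = {"p", "t", "k"}

for metric in (jaccard, dice, ochiai, overlap):
    print(metric.__name__, round(metric(inv_a, inv_b), 3))
```

Because `inv_b` is a subset of `inv_a`, Overlap reaches 1.0 while Jaccard stays at 0.6, which is the difference in behavior the table describes.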
Character Metrics (Yellow Area) - Pairwise Comparison
Compare the pronunciation of each word shared by two regions, where N is the number of shared words and Ai, Bi are the two regions' pronunciations of word i:
| Metric | Formula | Behavior |
| Exact Match | (1/N) × Σ Match(Ai,Bi) | Binary: 1 if identical, 0 otherwise |
| Levenshtein Ratio | 1 - Dist(A,B)/MaxLen | Partial credit for similar strings |
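A rough sketch of both character metrics follows. Several points are assumptions rather than facts about the tool: MaxLen is taken to be the length of the longer of the two strings, the Levenshtein ratio is averaged over shared words the same way Exact Match is, pronunciations are stored as plain strings keyed by word, and the example transcriptions are invented.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute = 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def levenshtein_ratio(a, b):
    """1 - Dist(A,B)/MaxLen, with MaxLen assumed to be max(len(A), len(B))."""
    max_len = max(len(a), len(b))
    if max_len == 0:
        return 1.0  # two empty strings are treated as identical
    return 1 - levenshtein(a, b) / max_len

def exact_match(region_a, region_b):
    """Fraction of shared words whose pronunciations are identical."""
    shared = region_a.keys() & region_b.keys()
    if not shared:
        return 0.0
    return sum(region_a[w] == region_b[w] for w in shared) / len(shared)

def mean_levenshtein_ratio(region_a, region_b):
    """Average Levenshtein ratio over shared words (averaging is an assumption)."""
    shared = region_a.keys() & region_b.keys()
    if not shared:
        return 0.0
    return sum(levenshtein_ratio(region_a[w], region_b[w]) for w in shared) / len(shared)

# Invented transcriptions, for illustration only.
region_a = {"山": "san", "水": "sui", "人": "ren"}
region_b = {"山": "san", "水": "sei", "人": "nin", "天": "tin"}

print(exact_match(region_a, region_b))
print(mean_levenshtein_ratio(region_a, region_b))
```

Only one of the three shared words matches exactly, but the other two differ by one or two symbols, so the Levenshtein ratio gives partial credit and comes out higher than Exact Match.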
Controls
- Source Region - Select by Province → City → Region code (county). The selected region is shown as a blue dot.
- Speaker Group - Filter by speaker demographics.
- Features Combination - Toggle which features contribute to the similarity score.
- Rank Percentage / Count - The two controls are linked: adjusting one updates the other. Regions ranked within the top X% by similarity are highlighted in red.
- Hover Tooltip Fields - Toggle which fields appear when hovering over a data point.
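One way the linked Rank Percentage and Count controls could relate is sketched below. The rounding rule (round up), the helper names, the region codes, and the use of 1,317 comparison regions (1,318 minus the source region) are all assumptions; the tool may compute this differently.

```python
import math

def count_from_percent(percent, total):
    """Number of regions in the top `percent`, rounded up (assumed rule)."""
    return min(total, max(0, math.ceil(total * percent / 100)))

def percent_from_count(count, total):
    """Percentage of all comparison regions that `count` represents."""
    return 100 * count / total

def top_ranked(scores, count):
    """Region codes with the `count` highest similarity scores."""
    return sorted(scores, key=scores.get, reverse=True)[:count]

TOTAL = 1317  # assumed: 1,318 regions minus the selected source region

print(count_from_percent(10, TOTAL))              # 132
print(round(percent_from_count(132, TOTAL), 2))   # 10.02

# Hypothetical region codes and similarity scores.
example = {"R001": 0.91, "R002": 0.42, "R003": 0.77, "R004": 0.65}
print(top_ranked(example, 2))                     # ['R001', 'R003']
```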
Interactions
- Zoom - Use the scroll wheel to zoom. Points rescale automatically.
- Pan - Click and drag.
- Hover - Shows region details and similarity scores.
- Map Titles - Click a map title to see that metric's formula and description.