Movie S2.

Languages achieve near-optimal compression. Left panel: The red dot traces the optimal systems along the IB curve (the theoretical limit), while the blue dot follows nearby, indicating the position of selected languages just below the curve in the information plane. A total of 23 representative languages are shown, selected to demonstrate the range of empirical variation accommodated by the IB model and the relation of that variation to languages’ positions near the IB curve. Right panel: Contour plots of the language’s naming distribution (top) and the IB encoder (bottom), which correspond to the blue and red dots on the left panel, respectively. The IB systems capture much of the structural variability in the data, and even languages that are less similar to the IB systems are still highly efficient, as seen on the left panel.

Efficient compression in color naming and its evolution

Noga Zaslavsky, Charles Kemp, Terry Regier, and Naftali Tishby

PNAS. 2018. 115:7937-7942 DOI: 10.1073/pnas.1800521115