The ubiquity of audio communication technologies, particularly telephone, radio, and TV, has had a significant effect on language. They further spread English around the world, making it more accessible and more necessary for lower social and economic classes. They led to the blending of dialects and the death of some smaller regional dialects. They enabled the rapid adoption of new words and concepts.
How will LLMs affect language? Will they further cement English as the world’s dominant language or lead to the adoption of a new lingua franca? Will they be able to adapt to differences in dialects or will they force us to further consolidate how we speak? What about programming languages? Will the model best able to generate usable code determine what language or languages will be used in the future? Thoughts and beliefs generally follow language, at least on the social scale. How will LLMs’ effects on language affect how we think and act? What we believe?
[shameless ad] This sort of question fits well [email protected] [/shameless ad]
What causes the loss of a local variety (dialect or language) is not simply exposure to other varieties, but the loss of the identity associated with said variety. In other words, what led to the blending and death of those dialects wasn’t the audio communication technology - it was economic, social, and ideological pressures, such as nationalism.
I’ll exemplify this using rhoticity in England. If telephone, radio and TV led to blending and death of dialects, you’d expect rhoticity in England to increase, due to exposure to American media. It didn’t - it’s decreasing:
Source for the map: it’s a collation of both maps in this article. The reason for the shift, however, becomes obvious when you look at matters of identity: “you’re a Brit, speak like a Brit”.
The exact same reasoning applies to other languages, by the way. Caipira Portuguese features aren’t being replaced with the ones from that weird Globo TV accent, but with the ones spoken in São Paulo city; sheísmo in Argentina seems to be spreading, regardless of media from other countries; Occitan was not killed in France by simply exposing kids to French, but by making them feel ashamed of speaking Occitan.
With that out of the way, it’s hard to predict the future impact of machine text generation, be it through LLMs or better models. It’s perfectly possible that this sort of tech helps the preservation of local varieties, as LLMs are kind of good at translation; for example, I’ve noticed that Gemini is able to parse Venetian, even if unable to answer in the language.