With LLMs all the rage, here are some weekend thoughts on linguistics, computation, compression and storage, written in an LLM conversation style. They raise more questions, ideas and open explorations than they answer: https://lnkd.in/ee79eEzJ
Here are a few quick shares:
– Not all languages are equal in information (orthographic) density, especially in their scripts (the written language). Chinese and English are examples of the diversity here.
– Characters, character sets and encodings all have a role to play.
– More importantly, are there encoding optimization opportunities in storing information with languages of higher orthographic density, say Hindi or Sanskrit instead of English? For example, Bharat in English is 6 characters (6 bytes in ASCII), while in Hindi / Telugu it is 3 written characters (syllable clusters). Today's UTF-8 actually spends 12 bytes on the Hindi form, but we could technically create an encoding scheme that stores this information in 3 bytes.
– Are we constrained in current computer architectures by the 2 states that a bit can hold?
– Do quantum computers have a role to play? Linguistics and quantum computation aren’t that far apart. There could be very interesting possibilities.
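
To put rough numbers on the Bharat example above, here is a minimal Python sketch. It rests on a few assumptions: the `clusters` helper only approximates written characters by attaching combining marks to the preceding letter (it is not full Unicode grapheme segmentation), and `toy_encode` with its `codebook` is a hypothetical one-byte-per-cluster scheme invented purely for illustration, not an existing standard.

```python
import unicodedata

def clusters(text):
    """Roughly split text into written characters: each base code point
    plus any combining marks (Unicode category M*) that follow it.
    Approximation only, not full Unicode grapheme segmentation."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch  # attach vowel signs etc. to the previous letter
        else:
            out.append(ch)
    return out

def sizes(text):
    """Count the same text in several units."""
    return {
        "clusters": len(clusters(text)),
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),
    }

def toy_encode(text, codebook):
    """Hypothetical scheme: one byte per cluster, looked up in a shared codebook.
    Only works while the codebook has at most 256 entries."""
    return bytes(codebook[c] for c in clusters(text))

english = "Bharat"
hindi = "\u092d\u093e\u0930\u0924"  # the Hindi spelling of Bharat

print(sizes(english))
print(sizes(hindi))

# Build a tiny codebook from the clusters in this one word (assumption:
# reader and writer both hold the same table).
codebook = {c: i for i, c in enumerate(sorted(set(clusters(hindi))))}
packed = toy_encode(hindi, codebook)
print(len(packed), "bytes in the toy encoding")
```

With these definitions the English spelling comes out as 6 clusters and 6 UTF-8 bytes, while the Hindi spelling is 3 clusters but 4 code points, 12 UTF-8 bytes and 8 UTF-16 bytes. The toy encoding gets it down to 3 bytes, but only because the codebook is tiny and shared out of band; a real scheme would have to budget for the much larger set of possible syllable clusters.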
We are on the edge.
A few last thoughts: does higher information density force humans to apply more pattern recognition effort per character, and so to read and write more slowly? Can this in turn lead to more meticulous and possibly stricter behavior? And how do language and culture intertwine?