The distributional hypothesis puts forward the idea that words that occur in similar contexts within texts tend to have similar meanings. Under this idea, a word's meaning is studied through its distribution throughout a text, which is then compared with the distributions of other words. Words found to share the same contexts are taken to have similar or related meanings.
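To make the idea concrete, the minimal sketch below counts the words that appear near two target words in an invented four-sentence corpus; the corpus, window size, and function name are illustrative assumptions, not part of any standard tool. Overlap in the resulting context counts is the kind of distributional evidence the hypothesis appeals to.

```python
from collections import Counter

# A toy corpus; the sentences are invented purely for illustration.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "the kitten drinks milk",
]

def context_counts(target, window=2):
    """Count words appearing within `window` positions of `target`."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            if w == target:
                lo, hi = max(0, i - window), min(len(words), i + window + 1)
                counts.update(words[lo:i] + words[i + 1:hi])
    return counts

# "cat" and "kitten" share contexts ("drinks", "milk"), hinting at
# related meaning, which is the core claim of the hypothesis.
print(context_counts("cat"))
print(context_counts("kitten"))
```

Here "cat" and "kitten" never appear next to each other, yet their shared neighbors suggest a semantic relationship.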
The distributional hypothesis was first suggested by British linguist J.R. Firth, who is known for the most famous formulation of the idea: "You shall know a word by the company it keeps." Firth, who is also well known for his studies of prosody, believed that no single system could ever explain how a language works; instead, he believed several overlapping systems would be needed.
American linguist Zellig Harris built on Firth's work, seeking to use mathematical methods to study and analyze linguistic data. His ideas on mathematics' contribution to such studies are important, but he is also known for covering a wide range of linguistic topics during his lifetime.
Research on the distributional hypothesis belongs to linguistics, but it uses mathematical and statistical methods, rather than traditional linguistic ones, to sift through large amounts of language data. For this reason, the distributional hypothesis is part of computational linguistics and statistical semantics. It is also related to ideas from linguists and philosophers of language about how children develop their native languages, a process known as language acquisition.
Statistical semantics uses mathematical algorithms to study word distribution. The results are then filtered by meaning and examined further to determine how words related in meaning are distributed. There are two main approaches in statistical semantics: studying distribution by word clusters and by text region.
Studying word distribution by clusters of related meanings is done with the Hyperspace Analogue to Language (HAL) model. HAL examines the relationships of words clustered together in a text, typically within a sentence or paragraph rather than across larger spans. The semantic relatedness of words is inferred from how often they occur near one another.
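A minimal sketch of a HAL-style co-occurrence matrix follows. The toy text, window size, and weighting scheme are illustrative assumptions; actual HAL implementations slide a larger window (around ten words) over a large corpus.

```python
import numpy as np

# Hypothetical toy text; real HAL corpora contain millions of words.
text = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(text))
index = {w: i for i, w in enumerate(vocab)}

window = 4
M = np.zeros((len(vocab), len(vocab)))

# Slide a window across the text; closer neighbors get higher weights,
# in the spirit of HAL's inverse-distance weighting.
for i, w in enumerate(text):
    for d in range(1, window + 1):
        if i + d < len(text):
            M[index[w], index[text[i + d]]] += window - d + 1

# A word's row (following words) and column (preceding words) together
# form its co-occurrence vector; similar vectors suggest words used in
# similar contexts.
print(vocab)
print(M[index["fox"]])
```

Comparing rows (and columns) of such a matrix, for example with cosine similarity, gives a numerical measure of how distributionally alike two words are.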
Whole-text studies use Latent Semantic Analysis (LSA), a natural language processing method. It rests on the observation that words with close meanings tend to occur in the same texts. Collections of texts are examined for such clusters using a mathematical method called Singular Value Decomposition (SVD).
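The sketch below illustrates the general LSA recipe under the assumption of a small invented term-document count matrix; real systems build the matrix from large corpora and often apply weighting such as tf-idf before the SVD.

```python
import numpy as np

# Hypothetical term-document count matrix: rows are terms, columns are
# documents. The terms and counts here are invented for illustration.
terms = ["cat", "kitten", "dog", "stock", "market"]
X = np.array([
    [2, 3, 0, 0],   # cat
    [1, 2, 0, 0],   # kitten
    [0, 1, 1, 0],   # dog
    [0, 0, 2, 3],   # stock
    [0, 0, 3, 2],   # market
], dtype=float)

# SVD factors X into U * S * Vt; keeping only the top k singular values
# projects the terms into a low-dimensional "latent semantic" space.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vectors = U[:, :k] * S[:k]

# Terms with similar document distributions (e.g., "cat" and "kitten")
# end up close together in the reduced space.
for t, v in zip(terms, term_vectors):
    print(f"{t:8s} {v.round(2)}")
```

The truncation step is what makes the semantics "latent": it smooths away noise in the raw counts so that words which never co-occur directly can still land near each other if they appear in similar documents.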
Data gleaned from studies of the distributional hypothesis are being used to study the building blocks of semantics and word relationships. Moving beyond a structuralist approach, the hypothesis can be applied in artificial intelligence (AI), helping computer programs better model the relationships and distribution of words. It also has implications for how children process words and form word associations and sentences.