<?xml version="1.0"?><?xml-stylesheet type="text/xsl"  href="../template.xsl"?><!DOCTYPE nsuarticle PUBLIC "-//NPG//DTD NSU//EN" "../nsu_article.dtd"><nsuarticle type="news">   <articleidlist> 	 <articleid type="uid">010927</articleid><storyno>-1</storyno> 	 <articleid type="doi">10.1038/nsu010927</articleid><storyno>-1</storyno>   </articleidlist>   <pubfm> 	 <pubdate> 		<dayofweek name="Friday"/> 		  <day>21</day> 		  <month>September</month> 		  <year>2001</year> 	 </pubdate> 	 <category>physics</category>   </pubfm>   <fm> 	 <title>Physicists decode the Bard</title> 	 <aug> 		<prefix></prefix> 		<fnm>Philip</fnm> 		<snm>Ball</snm> 		<suffix></suffix> 	 </aug> 	 <keywdgrp> 		<keyword>language</keyword> 	 <keyword>text</keyword><keyword>deconstruction</keyword><keyword>verb</keyword><keyword>noun</keyword><keyword>Shakespeare</keyword><keyword>linguistics</keyword><keyword>statistical physics</keyword><keyword>literature</keyword></keywdgrp> 	 <standfirst>The distribution of a word in a text can show whether it is a noun or a verb.</standfirst>   </fm>   <body> 	 <p> <figure align="left" filename="hamlet_160.jpg"><caption>Hamlet has been helping physicists with their enquiries.</caption></figure></p><p>Two physicists may have found a hidden structure in the English language<bibr rid="b1">1</bibr>. 
One can guess the syntactic or grammatical character of a word without knowing its meaning, they say, because different types of word are scattered differently through written texts, rather as certain wild flowers grow in certain parts of a meadow.</p><p>The new analysis might help to unravel how language arose, and could aid in the deciphering of coded messages, say the duo: Marcelo Montemurro of the National University of Cordoba, Argentina, and Damian Zanette of the Centre for Atomic Science in Bariloche, also in Argentina.</p><p>In the 1940s, US sociologist George Kingsley Zipf observed that words in English texts occur in an apparently universal pattern. Zipf's patient students counted all the words in Shakespeare's Hamlet and ranked them in order of decreasing frequency. The most common word, with 1,087 appearances, was 'the', followed by 'and'.</p><p>Zipf found that the distribution obeyed a mathematical relationship called a power law. In other words, a plot of the logarithm of a word's rank against the logarithm of its frequency is a straight line - for all the words in the play.</p><p>This curious result seemed to hold true for other texts. But it doesn't say much about what each word signifies. Word number 45, for example, is 'or'; word 47 is 'Hamlet'. All Shakespeare's plays probably contain 'or', but only one of them contains 'Hamlet'. So some words are highly context-specific, whereas others are more general. 'Hamlet' looks like a relatively common word if you analyse Hamlet, but not if you analyse Twelfth Night.</p><p>Montemurro and Zanette wanted to link this kind of statistical analysis to the meanings of words. They assigned to each word a 'Shannon entropy', named after Claude Shannon, a mathematician who worked in the 1940s on information flow in communications systems.</p><p>The Shannon entropy measures how evenly a word is spread through a text, and so how context-specific it is. A word confined to only one part of a text has low entropy, even if it is very common in that section. 
The entropy of a word that is evenly distributed throughout the text is high.</p><p>The researchers calculated the frequency and Shannon entropy of each of the 23,150 different words that make up the 885,535 words of Shakespeare's 36 plays. Unsurprisingly, the entropy of a word increased as its frequency increased - common words tended to be evenly distributed. </p><p>But there were big fluctuations in this relationship - some words are common but have very low entropy, perhaps appearing in only a few plays, or just one.</p><p>Then, reversing the 'monkeys with typewriters' scenario, Montemurro and Zanette produced 36 nonsensical 'Shakespearean' plays by scrambling all the words at random. This absurd oeuvre had a similar relationship between frequency and entropy, but without the fluctuations.</p><p>The difference between a word's entropy in the real plays and in the randomized plays depends on that word's grammatical and even its narrative nature. For example, names, status nouns - such as 'duke' and 'king', which are common in Shakespeare - and verbs cluster into different regions of a plot of entropy difference versus frequency. </p><p>In other words, the distribution of a word through the text can indicate whether it is a noun or a verb, for example.</p><p>The researchers observed the same effect in the works of Charles Dickens and Robert Louis Stevenson. It remains to be seen how the findings translate to other languages.</p>   </body>   <bm> 	 <refgrp> 		<bib id="b1"><refau> 		  <snm>Montemurro</snm>, 		  <inits>M. A.</inits></refau> &amp; <refau> 		  <snm>Zanette</snm>, 		  <inits>D. H.</inits></refau> <atl>Entropic analysis of the role of words in literary texts</atl>. Preprint available at <weblink arturl="http://xxx.lanl.gov/abs/cond-mat/0109218">http://xxx.lanl.gov/abs/cond-mat/0109218</weblink> (<pubyear>2001</pubyear>).		  
</bib></refgrp> <features><related_stories url="010913/010913-2"><title>Firm size matters</title><pubdate><dayofweek name="Friday"/><day>7</day><month>September</month><year>2001</year></pubdate></related_stories><related_stories url="010906/010906-16"><title>Babies' hands babble</title><pubdate><dayofweek name="Thursday"/><day>6</day><month>September</month><year>2001</year></pubdate></related_stories><related_stories url="991223/991223-6"><title>Say what you see</title><pubdate><dayofweek name="Tuesday"/><day>21</day><month>December</month><year>1999</year></pubdate></related_stories><related_stories url="991118/991118-2"><title>Making sense of sentences</title><pubdate><dayofweek name="Friday"/><day>12</day><month>November</month><year>1999</year></pubdate></related_stories><related_stories url="990617/990617-6"><title>The bilingual brain</title></related_stories><related_stories url="990603/990603-7"><title>A sound basis for Dyslexia</title></related_stories></features><pic_idea>The works of Shakespeare, e.g. still from a play or film of Hamlet.</pic_idea>   </bm> </nsuarticle> 
