ing out specific roles throughout the process. Every act of communication involves an “information source” which “selects a message out of a set of possible messages”
(Weaver 1949, 11), a “transmitter” which encodes it into a suitable “signal”, and a “receiver”
which decodes the message back into intelligible form. During the transmission, certain things which were not intended by the information source are added to the signal and may cause errors in the transmission; these things are called “noise”. To illustrate this scheme, one may imagine a conversation between agent A and agent B happening on a street. Oversimplifying, agent A’s brain represents the information source and her vocal system the transmitter, while agent B’s auditory system is the receiver and his brain the destination (1949, 12). Because the conversation is happening on the street, the sound of passing cars and people represents the noise threatening the clarity of the conversation.
From a technical standpoint, code may be defined as the “mapping of a finite set of symbols of an alphabet onto a suitable signal sequence” (cited by Kittler 2008, 40). Codes may be intelligible, such as a written language (including those used in programming), or unintelligible. Unintelligible codes may be so for practical or technical reasons (as is the case with machine code and barcodes, respectively), out of ignorance (due to illiteracy or lack of knowledge of a given language), or because they have been deliberately made obscure. Such is the case with cyphers.⁷⁰
A cypher is essentially a method or algorithm for concealing information by deliberately transforming an otherwise intelligible message into apparently random gibberish in the eyes of unwanted and potentially prying third parties. To recover the hidden information, one must know the key to translate the gibberish back into something intelligible. Without it, decryption can only be carried out by reverse-engineering the cypher through some form of statistical analysis, through plain guessing (which is commonly known as a “brute force attack”), or through a combination of both.⁷¹ The first step towards reverse-engineering or “cracking” (not brute-forcing) a cypher is
one mind may affect another” (Shannon and Weaver [1949] 1980, 3). This, of course, applies to any medium, from spoken words to audiovisual content. However, MTC is solely concerned with the technical/quantitative region of communication; hence, while it acknowledges the existence and importance of the other two, it essentially ignores them. That is why Shannon (1948) explicitly states that the “psychological” aspects of communication are irrelevant to his theory.
⁷⁰The term cypher has its roots in the Arabic word sifr, which meant “emptiness” and was also the name for “zero”.
⁷¹In our post-Snowden era, where encryption systems are growing in popularity and becoming consumer products and expected features in communication services, codebreaking has also become pervasive. Because reverse-engineering encryption requires considerable computing power and time, most malicious codebreaking attempts resort to brute force attacks, man-in-the-middle attacks, “social engineering”, or a combination of these techniques.
4.5 Understanding MTC through the basics of cryptography
to look for some pattern hiding underneath the encrypted message. Luckily for code-breakers, patterns are more persistent than we would generally think (Gleick [1992] 2011, 179).
Patterns involve structure and, more importantly, orderly repetition. Superfluous or unnecessary repetition is said to be redundant. While redundancy is frequently equated with needless excess, it is intrinsic to languages. Shannon (1949) calculated that English has a redundancy of roughly 50%, which means that about half the words in a message could be eliminated without rendering it completely unintelligible (Weaver 1949). But redundancy also operates at a more granular level. For example, in an English (or Spanish) text, virtually every instance of the letter “q” makes the following “u” redundant, simply because there are very few words where “u” is omitted from the “qu” pair or “digraph”. Redundancy is also responsible for the fact that we can write something like “if u cn rd ths” and still be understood (Gleick [1992] 2011).
In everyday communication, redundancy is a desirable feature⁷² because it increases the chances of a message being correctly interpreted (Floridi 2004; Gleick [1992] 2011).
Repetition counters equivocation and misunderstandings caused by the inevitable noise that accompanies all instances of communication. That is why, as Floridi (2004, 15) notes, “in a crowded pub, you shout your orders twice and add some gestures”. In everyday language, it is easier to make ourselves understood by increasing our “verbosity” (Gleick [1992] 2011). Thus, in this context, redundancy works as a method of error correction. Repetition does not make normal communication more efficient from a purely quantitative standpoint, but it certainly makes it clearer.
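The error-correcting role of repetition described above can be illustrated with a toy simulation. The following is only a minimal sketch under stated assumptions: the message is a list of bits, "noise" is modelled as independent random bit flips, and the function names (`add_redundancy`, `noisy_channel`, `majority_decode`, `count_errors`) are hypothetical, introduced here for illustration and not taken from Shannon, Weaver, or Floridi.

```python
import random

def add_redundancy(bits, repeats=3):
    """Transmitter: say every bit `repeats` times (use an odd number)."""
    return [b for b in bits for _ in range(repeats)]

def noisy_channel(bits, flip_probability, rng):
    """Channel: flip each bit independently with the given probability."""
    return [b ^ 1 if rng.random() < flip_probability else b for b in bits]

def majority_decode(bits, repeats=3):
    """Receiver: recover each bit by majority vote over its repetitions."""
    decoded = []
    for i in range(0, len(bits), repeats):
        group = bits[i:i + repeats]
        decoded.append(1 if sum(group) * 2 > len(group) else 0)
    return decoded

def count_errors(sent, received):
    """Number of positions where the received message differs from the sent one."""
    return sum(s != r for s, r in zip(sent, received))

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    message = [rng.randint(0, 1) for _ in range(10_000)]

    # Same noise level (an assumed 10% flip rate), with and without repetition.
    plain = noisy_channel(message, 0.1, rng)
    repeated = majority_decode(noisy_channel(add_redundancy(message), 0.1, rng))

    print("errors without repetition:", count_errors(message, plain))
    print("errors with threefold repetition:", count_errors(message, repeated))
```

The repeated message is three times longer, so it is less efficient in purely quantitative terms, yet far fewer errors should survive the majority vote. This is the trade-off between efficiency and clarity that the paragraph above describes.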
Inefficiency and clarity are precisely the opposite of what cryptographers attempt to achieve. Like all effective and “elegant” codes, good cyphers allow their users to encode as much information in as little space as possible, all the while making it extremely difficult to access without the appropriate decoding mechanism. Redundancy that leads to the recognition of a pattern is a cryptographer’s nightmare
⁷²Redundancy is also welcomed in the context of art. As Russian polymath Mikhail Volkenstein noted:
Unlike non-artistic texts – in newspapers, for example –, in artistic ones repetitions are far from being always redundant, that is, far from being devoid of fresh information. In ornamentation – of tiles or wallpaper, for example – a repeating pattern may have an emotional impact precisely because of the repetition. And this holds not only for applied art. A repeated refrain in a poem or a passage of music has artistic significance. This shows again the importance of the integrity of an artistic work – the impossibility of delineating from it a rational content where repetition would indeed be redundant. (Volkenstein [1986] 2009, 188)
and a cryptanalyst’s dream. Weak encryption is usually bad at concealing its structure.
For example, simple substitution methods⁷³ such as the “Caesar cypher”,⁷⁴ which merely shift the letters in the alphabet a fixed number of places, are extremely vulnerable to frequency analysis.
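A shift cypher of the kind just mentioned can be sketched in a few lines. This is a minimal illustration, assuming a 26-letter uppercase Latin alphabet and a key of three places chosen purely as an example; the function names are hypothetical. The `brute_force` helper simply tries every possible key, which shows how small the key space of such a cypher is.

```python
import string

ALPHABET = string.ascii_uppercase  # assumption: plain 26-letter Latin alphabet

def caesar_shift(text, shift):
    """Shift every letter by `shift` places, wrapping around the alphabet.

    Characters outside the alphabet (spaces, punctuation) are left untouched.
    """
    shifted = []
    for ch in text.upper():
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + shift) % len(ALPHABET)])
        else:
            shifted.append(ch)
    return "".join(shifted)

def encrypt(plaintext, key):
    """Encrypt by shifting forward `key` places."""
    return caesar_shift(plaintext, key)

def decrypt(cyphertext, key):
    """Decrypt by shifting back `key` places."""
    return caesar_shift(cyphertext, -key)

def brute_force(cyphertext):
    """Return the candidate plaintext for every possible key (0 to 25)."""
    return [decrypt(cyphertext, key) for key in range(len(ALPHABET))]

if __name__ == "__main__":
    secret = encrypt("ATTACK AT DAWN", 3)
    print(secret)              # DWWDFN DW GDZQ
    print(decrypt(secret, 3))  # ATTACK AT DAWN
```

Because the structure of the plaintext (word lengths, repeated letters) survives the shift intact, the cyphertext conceals very little.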
Frequency analysis takes advantage of the statistical structure (Gleick [1992] 2011, 180), and hence of the redundancy, of languages.⁷⁵ Statistical structure refers to the fact that in every language some phonemes are more frequent than others, which in turn means that, in writing, some symbols and letters are more common than others. For example, in English the letter “e” has the highest frequency, whereas in other languages, such as Spanish or Portuguese, “a” rivals or overtakes it depending on the corpus consulted. Knowing this, a code breaker would crack a substitution cypher by matching the character with the highest frequency in the encrypted message with the letter with the highest frequency in the language used. She would then repeat the procedure with all subsequent letters in order of frequency until a recognisable pattern emerged. While frequency analysis was initially done by humans, after WWII the task was passed on to computers, which can be programmed to carry out the process much faster.⁷⁶
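The procedure can be sketched in code for the simplest case. The example below is a deliberately reduced illustration, not a general substitution-cypher solver: it assumes the cyphertext was produced by a fixed alphabetic shift, that the underlying language is English, and that the sample is long enough for “E” to be its most frequent letter. Only the first matching step is performed; all function names and the sample sentence are invented for the example.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase

def shift_text(text, shift):
    """Shift every letter by `shift` places; leave other characters alone."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) + shift) % 26] if ch in ALPHABET else ch
        for ch in text.upper()
    )

def letter_frequencies(text):
    """Count how often each letter occurs, ignoring spaces and punctuation."""
    return Counter(ch for ch in text.upper() if ch in ALPHABET)

def guess_shift(cyphertext, most_common_plain="E"):
    """Assume the most frequent cyphertext letter stands for `most_common_plain`."""
    top_letter, _count = letter_frequencies(cyphertext).most_common(1)[0]
    return (ALPHABET.index(top_letter) - ALPHABET.index(most_common_plain)) % 26

def crack(cyphertext):
    """Undo the guessed shift. Fails on short or unusual texts."""
    return shift_text(cyphertext, -guess_shift(cyphertext))

if __name__ == "__main__":
    plaintext = "MEET ME WHERE THE RIVER BENDS BEFORE THE EVENING BELL"
    cyphertext = shift_text(plaintext, 3)
    print(cyphertext)
    print(guess_shift(cyphertext))  # 3
    print(crack(cyphertext))
```

For a general substitution cypher the same counting step applies, but each cyphertext symbol has to be matched separately, in order of frequency, and the guesses usually need manual correction.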
Statistical structure is not limited to syntactics: as the level of abstraction grows more complex, frequencies become influenced by semantics (meaning), pragmatics (usage), and context. Thus, depending on the circumstances, some words are more likely to appear than others. That is why MTC regards communication as a stochastic
⁷³There are stronger substitution methods, such as the one developed by Renaissance polymath Leon Battista Alberti, which remained unbroken for centuries.
⁷⁴The “Caesar cypher”, named after one of its most illustrious users, worked by shifting the letters in the alphabet three places, so that each letter was replaced by the fourth letter counting from itself; thus all “A’s” became “D’s”, all “B’s” became “E’s”, etc. (Kittler 2008, 40). Needless to say, under contemporary standards its efficacy as an encryption system is virtually null, and it is no stronger than any other fixed rotation, such as ROT23 (a shift of 23 places, which simply undoes a shift of three).
⁷⁵By most accounts, the first person to leave a written reference to the fact that in every language some phonemes and letters are more common, and that their frequency could be used to crack encrypted messages, was the ninth-century polymath al-Kindi (ca. 800–870 CE), “The philosopher of the Arabs” (Singh 2001). The first Westerner to arrive at the same conclusion was another polymath, the Italian architect and mathematician Leon Battista Alberti (1404–1472), the inventor of linear perspective (Kittler 2008) and the “Father of Western Cryptology” (Kahn 1996).
⁷⁶Frequency analysis, as described above, is not the only code-breaking method; in fact, it is only useful against the weakest substitution cyphers. Since the Renaissance, when Leon Battista Alberti described a polyalphabetic cypher (which is essentially a cluster of Caesar cyphers, but remained unbroken until the Victorian Era), cryptography has been evolving. Particularly after the two World Wars, both cyphers and cryptanalysis have become extremely sophisticated thanks to computational technologies. Nonetheless, statistical analysis and probability remain at the core of cryptography since, at a fundamental level, all encryption systems comprise “a finite (though possibly vast) number of possible messages, a finite number of possible cryptograms, and in between, transforming one to the other, a finite number of keys, each with an associated probability” (Gleick [1992] 2011, 181).
system, meaning that it is neither deterministic⁷⁷ nor random, but probabilistic (Gleick [1992] 2011, 187).⁷⁸ But language, it turns out, is not only representative of a stochastic system, but also of a Markov process and, at least at the syntactic level, of an ergodic process.
A Markov chain⁷⁹ is a sequence of events in which the probability of each new event is determined by the outcome of the previous event or, in more complex cases, by the outcomes of various preceding events (Volkenstein [1986] 2009, 148). Ergodic processes are a subset of Markov processes, the difference being that any reasonably large sample taken from an ergodic process is representative of the whole system (Shannon and Weaver [1949] 1980). Human languages are Markov systems because certain words are more likely to appear when and if others have been uttered before; for example, in English, the article “the” is more likely to be followed by a noun or an adjective than by any other type of grammatical unit. Language may be considered ergodic since analysing a regular book or even a newspaper can yield an accurate picture of the statistical structure of the language in which it has been written.⁸⁰ In short, for MTC, messages are chosen from a (finite) set of possible messages. The more unexpected the message, the higher the amount of information it carries; and vice versa, the more expected, and hence redundant, the less informative. Following this logic, it is clear that the higher the randomness in a given dataset, the higher its informativeness.
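The idea of a Markov chain over words can be made concrete with a first-order model, in which the next word depends only on the current one. The sketch below is illustrative only: the toy corpus is invented, the function names are hypothetical, and a real estimate of a language’s statistical structure would need a far larger sample.

```python
import random
from collections import Counter, defaultdict

def build_transitions(words):
    """Count, for every word, which words follow it and how often."""
    transitions = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        transitions[current][following] += 1
    return transitions

def transition_probabilities(transitions, word):
    """Turn the follower counts of `word` into probabilities."""
    counts = transitions.get(word, Counter())
    total = sum(counts.values())
    return {follower: n / total for follower, n in counts.items()}

def generate(transitions, start, length, rng):
    """Walk the chain: pick each next word according to the observed counts."""
    word = start
    output = [word]
    for _ in range(length - 1):
        counts = transitions.get(word)
        if not counts:  # the word was never followed by anything in the sample
            break
        word = rng.choices(list(counts), weights=list(counts.values()))[0]
        output.append(word)
    return output

if __name__ == "__main__":
    corpus = "the cat saw the dog and the dog saw the cat and the cat ran".split()
    chain = build_transitions(corpus)
    print(transition_probabilities(chain, "the"))  # {'cat': 0.6, 'dog': 0.4}
    print(" ".join(generate(chain, "the", 8, random.Random(1))))
```

In this toy sample “the” is always followed by a noun, and “cat” is a somewhat more expected continuation than “dog”. In the terms used above, the more expected continuation carries less information.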
⁷⁷In the non-pejorative, mathematical sense, deterministic means that a system’s states are caused by prior states with absolute certainty, rather than probabilistically (Pinker 2003, 112).
⁷⁸A pair of fair dice is an example of a stochastic system, since it is possible to calculate the probability of getting any number between two and twelve at any given throw (seven being the most likely to appear), and each throw is subject to a certain amount of randomness or entropy. Two extremely biased dice, by contrast, represent a deterministic system since, after a series of throws, one can be reasonably sure of what number will come next; thus, it is virtually devoid of randomness. Finally, a random system, i.e., one in a maximum state of entropy, is one in which the events carry no discernible pattern on which to base future predictions, for there is just no way to calculate the likelihood of any of its outputs.
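The three cases in this note can be compared numerically. The following is a minimal sketch, assuming Shannon’s entropy measure in bits, H = −Σ p log₂ p, and modelling the “extremely biased” dice as dice that always land on six, which is a limiting case chosen for illustration. The names `sum_distribution`, `entropy`, `FAIR_DIE` and `LOADED_DIE` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import log2

def sum_distribution(die_a, die_b):
    """Probability of each total when throwing two dice.

    Each die is a dict mapping a face to its probability.
    """
    totals = Counter()
    for (face_a, p_a), (face_b, p_b) in product(die_a.items(), die_b.items()):
        totals[face_a + face_b] += p_a * p_b
    return dict(totals)

def entropy(distribution):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(float(p) * log2(float(p)) for p in distribution.values() if p > 0)

FAIR_DIE = {face: Fraction(1, 6) for face in range(1, 7)}
LOADED_DIE = {6: Fraction(1)}  # assumption: a die that always shows six

if __name__ == "__main__":
    fair = sum_distribution(FAIR_DIE, FAIR_DIE)
    loaded = sum_distribution(LOADED_DIE, LOADED_DIE)
    uniform = {total: Fraction(1, 11) for total in range(2, 13)}

    print(max(fair, key=fair.get), fair[7])  # 7 1/6
    print(round(entropy(fair), 2))           # 3.27 bits: probabilistic
    print(entropy(loaded) == 0)              # True: fully predictable
    print(round(entropy(uniform), 2))        # 3.46 bits: maximum for 11 outcomes
```

The fair pair sits between the two extremes: its totals are uncertain, but not maximally so, because some totals (seven above all) are more expected than others.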
⁷⁹So named after Russian mathematician Andrey Markov (1856–1922), who proved this probabilistic phenomenon by analysing, amongst other works, Pushkin’s Eugene Onegin. While literary in origin, Markov’s theory is successfully applied in the physical sciences and economics (Volkenstein [1986] 2009, 148).
⁸⁰This was precisely what Samuel Morse did when conceiving the code that bears his name. Reportedly, he counted all the letters in a Philadelphia newspaper to find out which were the most frequent ones and thus assign them the shortest symbols. Having found 12,000 “E’s”, followed by 9,000 “T’s”, he decided to assign these letters a single dot and a single dash, respectively (Kahn 1996).