Archive-name: temp-sci-crypt Version: $Id: temp-sci-crypt,v 1.05 1993/03/15 09:30:15 lwl Exp $ "temporary, independent" sci.crypt Frequently Asked Questions There a group is putting together a fine FAQ for this topic. However, in my brief tenure on Internet, there seems to be a consistent cry for some form of FAQ and postings that also cry for one. After a discussion with a former member of the FAQ group, I've decided to post this from time to time. This is an attempt to answer many basic questions in hope of providing a lot of the benefit of a FAQ without the burden of being a complete answer to all relevant questions. There is no desire or attempt to replace the other group's work; this is more of a stopgap. However, beginners should find this very helpful. Note to constant readers: The only changes from last time is to update question 17 to reflect the current status of the RFD. Note: References to a "Megabarfoocorp" are intended to be fictional. Question Summary (search Q1 for question 1 below, etc.). 1. What is cryptography, generally? 2. I have invented this great algorithm. . . 3. What is a cryptographic "attack"? 4. What if I keep my new algorithm a secret? 5. What are the basic cryptographic "attacks"? 6. What makes an encryption system good? 7. I hereby challenge the world. . . 8. Are there legal restrictions on crypto? 9. What is a public key system? 10. What is key distribution? 11. What is the 'one time pad'? 12. What is the NSA? 13. What is the ACA? 14. What are some good books on cryptography? 15. What are the Beale ciphers? Are they a hoax? 16. What kinds of things resemble cryptography, but are not? Can I bring them up here? 17. Can I/Should I post here about politics and privacy concerns related to cryptography? Q1: What is cryptography? How, basically, does it work? What are the basic terms used to describe cryptography? Cryptography is the art and science of hiding data in plain sight. It is also the art and science of stealing data hidden in plain sight. There are at least three players. The first is the one who has the original data, which is presumed to have high value to others. This data is presumed to reside in a safe place that no one but the originator and his/her friends can see. (If the originator cannot physically secure the original data, cryptography is a waste of time). Now, for cryptography to be necessary, the data must, for some reason, have to be transmitted over some public means such as a telephone line, a letter through the mail; any means that cannot be physically secured by the owner to a legitimate receiver of the data. The receiver is the second party. Cryptography is any transformation of the data into a form it is hoped that cannot be recovered in a timely manner by an unknown party, who is called here 'the opponent'. The transformation is not some physical means, such as hiding the data on microfilm or some such; it is some mathematical transformation of the original data that the receiver on the other end knows how to undo. The process of scrambling (transforming) the data is called 'encryption'. Unscrambling is called decryption. An encryption system has two basic parts. 1) A general transformation process called the encryption algorithm. 2) A customization of that algorithm called a cipher key. The sender and the receiver must find a secure means to exchange the cipher key. That is, the same public means used to send the encrypted data cannot be used. This may be a difficult problem, and has a variety of solutions, but will be assumed solved for now. Once the key is successfully exchanged, the two parties can separately implement the encryption algorithm and its inverse, the decryption algorithm. At this point, the data can be transmitted in its encrypted form using the agreed-to key to customize the general algorithm to a particular version that transforms the data. Since the encrypted data is sent over some insecure medium, it is assumed that an opponent (the third party) may intercept the data, possibly without being detected, and analyze the encrypted text, also called cipher text. In theory, any encryption system can be defeated, given enough time. The amount of time it takes cannot always be predicted. This is because the opponent can supply extra information that might reduce the computation time a great deal. For one thing, the sender and receiver may make a very poor choice of cipher key. If the opponent has a list of poor keys, a computer may permit a large list of such keys to be tried; if the poor key actually used is on the list, the opponent wins even if the encryption system is otherwise secure. All methods the opponent dreams up have one thing in common. It is an attempt to recover the original data without advance knowledge of the particular cipher key. There are a wide variety of means available and some will be described later on. The name for any of these methods is called 'cryptanalysis' and the person who does the penetration is called the cryptanalyst. In diagram form (the boxes indicated physically secure areas)-- -------------| -------------- Sender | | Receiver data "x" | | cipher key cipher key |-------> y ----->| y=Encrypt( | | | x=Decrypt(y,key) x,key) | | | received data "x" -------------| | |------------- V Opponent z = Cryptanalysis(y,Encrypt Algorithm, general knowledge of x, guesses about secret key, statistical analysis of y) The opponent uses Cryptanalysis of message y until the value z is either equal to x or z is "enough" like x to accomplish the illicit purpose. The sender and receiver win whenever recovery of z takes too much time. Q2: I have invented this wonderful, fast-running encryption algorithm. How do I find out if it is as great as I think it is? It is one thousand times easier to invent an encryption algorithm than it is to discover if it is worthwhile. Most designers who have not learned cryptography are used to dealing with mathematics that discusses the general case. But, successful cryptanalysis often relies on any number of fortuitous special cases that the designer must anticipate lest a given key and data stream create even one of them. Many practical illicit decryptions astonish the newcomer; they seem like cheating, but they do work. It is easy to get superficial reassurance that a poor encryption algorithm seems good. Most people reading this file have probably attempted the kinds of cryptograms one finds in newspapers and puzzle books (usually called 'cryptograms'). That encryption algorithm is simple -- one replaces each letter of the alphabet with exactly one other letter of the alphabet. In less than an hour, sixth graders have been taught to successfully solve this kind of cipher. Yet, it has 26 factorial possible keys (about 2 to the 88th power), which is much more than the 2 to the 56th keys of the well known commercial algorithm, DES. A large number of keys is important, but is not by itself secure. Obviously, the successful sixth graders do not attempt all possible keys. They use their general knowledge of English to shortcut the process and eliminate all but a few possible keys. Since the gross mathematical properties of an encryption system prove nothing, only cryptanalytic attacks matter and these require some study. Q3: What is an "attack"? An attack is a particular form of cryptanalysis. There are generic types of attacks, and some very specific attacks. In the end, cryptography is a war of specifics. The opponent will either invent a very clever and unique attack or will customize a general attack to suit the needs at hand. Some attacks might even exploit the properties of a particular message or settle for a partial illicit decryption. However, in sci.crypt, there is a great deal of discussion about attacks, both general and specific, and an assumption that the reader can fill in missing details at times. Since those who post here usually have other things to do, to-the-bit details are often omitted. Q4: Hmm. In spite of questions 2 and 3, I still think I have a pretty good system. But, it seems pretty hard to know if it can really defeat an expert. Still, I know it will work if I can keep my method secret, right? Good luck. There are documented cases aplenty where an encryption system was deduced based entirely on clever analysis of the encrypted text alone. There are also known cases where encryption systems were deduced because the encrypted text was later published verbatim somewhere (for instance, a press release) and so the system was figured out, eliminating the presumed secrecy advantage for the next cipher text. Q5: What are the principal cryptanalytic attacks? The first is "cipher text only". This is recovering the text or the key by analysis of the text (using statistical means, for instance) and by knowing broad details such as whether it is the text of someone's speech, a PC-DOS EXE file, whether text is in English or French. For non-puzzle examples, such broad information is almost always reliably known. People have some idea of what they wish to steal. The weakest systems fall to this form of attack. The next attacks (which blur together in practice) are "probable word" and "known plaintext". If one works with a newspaper cryptogram, one may have little idea of what is in the text. However, if one was illicitly trying to decrypt the source code of Megabarfoocorp's C++ compiler, one would be much better off. There would be lots of things one could confidently expect, such as long strings of the space character, strings like "if (" and "if(" and the like. There would even be whole phrases like "Copyright 1990, Megabarfoocorp International" or some such. With imagination, surprisingly long strings can be predicted. Computers can tirelessly try a number of trivial variations of such known text at a great many locations. Knowledge of the encryption system may reveal the correct placement outright or a small number of places to try. A wide variety of systems can be broken if enough plaintext can be successfully matched to its ciphertext. "Known plaintext" means that the text can be exactly placed, or placed with trivial analysis of the text; "probable word" implies trying out a likely plain text in many places and determining which, if any, work. The next attack is "chosen plaintext". In some situations, it is possible for the opponent to pose as the legitimate user of the encryption system. This is especially true in "public key" systems (described later). In this case, the opponent can present fairly arbitrary unencrypted data of his/her choosing, cause it to be encrypted, and study the results. Very few systems ever invented pass this test, but it should be seriously considered for any significant use. Why? No designer can dream up all attacks. Moreover, at some point, a form of known plaintext attack may well enough approximate a chosen plaintext attack to make it worthwhle for the designer to allow for chosen plaintext to begin with, especially as the known plaintext attack that works might not be dreamed up by the designer. Planning for chosen plaintext is a rough and ready way of plugging that hole. There are other attack strategies. One worth mentioning for telecommunications applications is the "replay" attack. Suppose one has an Automatic Teller Machine (ATM) which uses a substitution cipher. Since one assumes the telephone line to the ATM can be tapped (why encrypt if it cannot?), one can also assume the opponent can _inject_ false ciphertext. Thus, without even being aware of the actual system, an opponent may be able to simply replay old ciphertext and get the cash drawer to give him/her $50 from your account. There are encryption systems which avoid this difficulty. Another general form of attack often not considered by newcomers is comparing multiple messages using the same key. It is impractical to use a different key for each cipher text (with one important exception called the 'one time pad'). Therefore, an opponent will have several different texts encrypted in the same key and may be able to exploit this fact. All transpositions algorithms (those which merely scramble the order of the bytes) are vulnerable to this attack. More sophisticated systems are also vulnerable to this attack in some cases as well. This is a vast area and one of the things that is difficult, even for experienced designers, is anticipation of all possible attacks. Moreover, there is no obligation on the attacker's part to be less mathematically sophisticated than the designer. A system that survives the attacks the designer invents may fall easily to a mathematical approach the designer of the system is unfamiliar with. And, one even has to worry about items like a rare bug in the program that injects the cipher key rather than the cipher text into the output stream. It is amazing how often trifling errors in the implementation make theory irrelevant. Q6: What does make a system 'good'? What makes a system 'good' relies on many details specific to a given situation. Only after a lot of ingenious attacks are tried can a system be released for use. There never can be any absolute guarantees. All that can be said is that it defeated the best experts available. The opponent may be smarter. However, there are some agreed-to minimums that a good system must have to even be worth serious analysis -- 1) It must be suitable for computer use. Ordinary byte streams (as arbitrary as possible) would be the input "plain" text and byte streams would be the output "cipher" text. There should be few cases where some kinds of input text produces poor results and these, if they exist, should be easily known, described, and avoided. 2) To be cost-competitive, it must produce about the same number of output cipher bytes as input plain bytes. Relaxing this restriction is not as helpful as one might think; the best historical systems meet this restriction, so a new system must meet it also. 3) Output bytes should have a complex relationship between the key, the plain text being encrypted, and possibly some amount of text previously encrypted. "Simple", general methods, such as creation/construction of minterm sums and matrix inversions should not produce the cipher key or an equivalent, usable illicit decryption method. 4) Trying all keys must not be practical. Trying a cleverly ordered subset of the keys must not work. This is hard to achieve; it means that the failure of a particular key X to illicitly decrypt must not also allow an opponent to conclude that some large class of keys "similar" to X need not be tried. For many systems, all potential keys are equally strong. For some systems, some keys may be stronger than others. In either case, there must be a very large number of keys that give the encryption system its full strength and such keys must be easily discovered and easily proven to be fully strong. Weaker keys must be readily identified and avoided. 5) Assume all details about the encryption algorithm are known. Relying on a secret method has failed repeatedly. It is prudent to assume only the variable part of the system, called the cipher key, that is selected by the customer, is unknown. 6) Classical attacks must be tried in great variety and ingenuity. Details of this are beyond the scope of this file. For a summary of the principal attacks, see Question 5, "What are the principal cryptanalytic attacks?". It is easy to do this particular step incompletely. Remember, there may be little effective limit to the resources or the brainpower of one's opponent. The data may be very valuable and it make take only a couple of days rental of the right kind of consultant and a supercomputer to get the job done. The legitimate user will, by contrast, have a smaller budget that is begrudged as "overhead". 7) Performance must be competitive with existing solutions. A practical problem is that every moment spent encrypting is regarded as "overhead". Therefore, the method must not be uncompetitive with existing algorithms regarded as secure. Inventing one's own system is an interesting pastime and quite educational. However, any hope of really securing data requires, at the very minimum, mastery of illicit decipherment. It is very easy to scramble data impressively and please oneself. This is not the same as keeping it actually secure. Q7: Ok, you guys seem to know a lot about this stuff. I think I have a neat system. Here's a message I encrypted. Tell me if the system is any good. Oh, and can I keep my algorithm secret? I think I want to patent it, so Q5 does not apply in my case. Most people read and understand questions 5 and 6 and their implications. But, a few individuals, because of what they invented, believe they have an exception of some kind. If that is you, you're setting yourself up for disappointment. Even if you are stone cold right about what you have invented, you are not going about proving it properly. The main issue is a mindset issue. Anyalysing a cipher is not like a proposition in geometry or the denoument of a mystery novel. One's intuition about proofs may not hold for cryptography. Finding out if a cipher is any good is perhaps most like debugging a program. Just as you can never be sure you got all of the bugs out, you can never be sure you have a cipher that will withstand all attacks an opponent might come up with. So, even if you do publish some form of challenge, and even if the posters of sci.crypt attack it vigorously, it may prove nothing. Second, you may not be in a position to form a good, sound test. Beginners who get this far are peeved that they are asked to reveal their encryption algorithm. They may also be asked to reveal whole paragraphs of cipher text or to encrypt certain texts in a secret key and give the answer back. All these things seem like cheating. The answer is: real opponents _do_ cheat. Unlike those who post here, they are not above burglarlizing your home to get a copy of your source code, if that is cheaper than hiring experts by the hour, to give a relevant example. Whatever we ask for will represent a close analogy of a real-world attack. If you are still convinced you have a good system, by all means go out and try to patent it; you are not legally obliged to ask our help, after all. But history is against you; against you so much that you will find few people here that are willing to trust your definition of a good test of a cipher. There is no 'royal road' to cryptography. The best thing to do is take a couple of months and seriously study cryptanalysis. Q8: What are the legal restrictions on cryptography? You'd have to ask a lawyer. Most governments consider cryptography a sensitive topic for one reason or another. There are a variety of restrictions possible. Most governments don't give their reasons publically, so one may not know why there are restrictions, just that there are restrictions to follow. One can expect to find laws about sending encrypted data over national borders and may often expect to find laws about the import and export of encryption systems. Since sending data over Internet, Bitnet, Usenet, Fidonet, etc. may cross national borders (even if the originator does not realize it), a basic familiarity with these laws is recommended before sending out encryption systems or encrypted data. Additionally, some networks (e.g. Fidonet) do not currently wish encrypted data to be sent over their networks. It is (at least) common courtesy to find out your network host's attitude toward encrypted data and also where it flows. Your host may not know the law; apathy in this area is common. But you may be lgeally called to account for their ignornace. Q9: What is a public key system? A public key system is an encryption method with a unique property -- encryption and decryption use different keys and one of those keys can be published freely. Being able to pass around the 'decrypt' key part of one's encryption algorithm allows some very interesting things to happen. For one thing, messages can be exchanged by people who had not planned on doing so in advance. One merely encrypts in one's private key, decrypts using the receiver's public key and passes on the result to the receiver. The receiver encrypts in his/her own private key, then decrypts using the sender's public key, recovering the message. Q10: What is key distribution? Key distribution is the practical problem of exchanging keys between the parties that wish to set up an encryption system between the two of them. Especially in a network environment, passing keys one can trust back and forth, can be difficult. How can one be sure a cipher key was not sent by an imposter? Unless the keys are exchanged in a secret, secure place, face-to-face, getting keys securely exchanged and with knowledge of the fact that the key was sent authentically can be difficult. Yet, any practical system must permit reasonably convenient, but very secure exchange of cipher keys. Once a few special keys are securely exchanged, it may be possible to send new cipher keys in encrypted form between the sender and receiver that have a known lifetime. That is, the cipher key is sent in a special encrypted message using a special key used only for exchanging keys. In telecommunications environments, this allows frequent change of keys between the parties 'safely' over the same insecure medium used to send the cipher text. While this idea is at the heart of much commercial use of cryptography, it is not easily accomplished and less easily summarized. Q11: What is the 'one time pad'? The 'one time pad' is an encryption method that is known to be absolutely, provably secure. How it works is as follows -- 1. Generate a huge number of bits using a naturally random process. 2. Both sides exchange this data, which is as much data as they are going to exchange later on. 3. Exclusive OR the original text, bit by bit, with the 'one time pad' data, never reusing the 'one time pad' data. 4. Have elaborate rules to keep the two sides in synch so that the data can be recovered reliably by the receiver. (Both sides must know where they are in terms of how much 'one time pad' has been consumed). Note that only genuine, naturally random processes will do. There must be no relationship between any prior bit of the 'one time pad' and a future bit of the key. "Random number generators", in particular, may not substitute and still be a 'one time pad'. The reason the one time pad works is precisely because there is no relationship between any bits of the key stream. All cipher keys are equally probable. All original data messages are equally probable. There is no 'hat' to hang analysis upon. Even if one can inject as much text into a one time pad as one wishes, recovering the key stream tells nothing about the next message. Unfortunately, one time pads are very ungainly, so they are not typically used. The requirement to have a genuinely random process, with the right kind of statistical probability, is not easy to to set up. The ability to exchange vast amounts of data, securely, in advance, is not easy to achieve in environments when encryption is needed in the first place. There are a variety of cipher systems which generate "pseudo one time pad" streams of cipher key, but all have the same theoretical vulnerability; any algorithmic process introduces relationships between some old key bit(s) and the new key bit and so permits cryptanalysis. "Random number generators" are frequently dreamed up by newcomers as a "pseudo one time pad", but they are notoriously vulnerable to analysis, all independent of whether the pseudo-random stream satisfies randomness tests or not. The favorite example is a "standard" pseudo-random number generator of the form x = ((A*x) +C) mod M where A, C, and M are fixed and x is the most recent value used to form the last "random" number. Thus, the key of the cipher is the initial value of x. For this example, set M to 2 to the 32nd. Now, if one can predict or simply find the word "the " (the word "the" followed by a space character) on a even four byte boundary in the file, one can recover an old value of "x" and predict the rest of the keystream from that point, which may be enough of a "break" to accomplish the purpose. This is true even if the particular A, C, and M perfectly satisfy any randomness test that anyone ever devised. Naturally, this example can be improved upon, but it illustrates the potential problem all such methods have. Q12: What is the NSA (National Security Agency)? The NSA has several tasks. The most relevant here is that it employs the United States' government's cryptographers. Most nations have some department that handles cryptography for it, but the US' NSA tends to draw the most attention. It is considered equal to or superior to any such department in the world. It keeps an extremely low public profile, and has a "large", but secret budget. Because of these and other factors, there will be many posts speculating about the activities and motives of the NSA. Q13: What is the American Cryptogram Association? American Cryptogram Association Information, Sept 1992 The American Cryptogram Association is an international group of individuals who study cryptography together and publish puzzle ciphers to challenge each other and get practical experience in solving ciphers. It is a nonprofit group. The American Cryptogram Association (ACA) publishes the bi-monthly magazine, "The Cryptogram", which contains a wide variety of simple substitution ciphers ("cryptograms") in English and other languages as well as cryptograms using cipher systems of historical interest (such as Playfair). The level of difficulty varies from easy to difficult. Except for "foreign language" cryptograms, all text is in English. The magazine also features "how to" articles at the hobbyist level and other features of interest to members. A "Computer Supplement" is also available which features articles on computerizing various phases of cryptogram solving; the level of the articles varies from simple programming examples to automatic algorithmic solutions of various cipher systems. The supplement is available as a separate subscription, and is published when material permits; usually two or three times per year. Where to write for subscription or other information -- ACA Treasurer 18789 West Hickory St Mundelein IL 60060 Q14: What are some good books on cryptography? Good books on this topic that weren't government classified used to be rare. There are now a host of good books. One informal test of a library's quality is how many of these are on the shelves. These are widely available, but few libraries have them all. David Kahn, The Codebreakers, Macmillan, 1967 [history; excellent] H. F. Gaines, Cryptanalysis, Dover, 1956 [originally 1939, as Elementary Cryptanalysis] Abraham Sinkov, Elementary Cryptanalysis, Math. Assoc. of Amer., 1966 D Denning, Cryptography and Data Security, Addison-Wesley, 1983 Alan G. Konheim, Cryptography: A Primer, Wiley-Interscience, 1981 Meyer and Matyas, Cryptography: A New Dimension in Computer Data Security, John Wiley & Sons, 1982. Books can be ordered from Aegan Park Press. They are not inexpensive, but they are also the only known public source for most of these and other books of historical and analytical interest. From the Aegean Park Press P.O. Box 2837, Laguna Hills, CA 92654-0837 [write for current catalog]. The following is a quality, scholarly journal. Libraries may carry it if they are into high technology or computer science. Cryptologia: a cryptology journal, quarterly since Jan 1977. Cryptologia; Rose-Hulman Institute of Technology; Terre Haute Indiana 47803 [general: systems, analysis, history, ...] Thanks to cme@ellisun.sw.stratus.com (Carl Ellison) Gwyn@BRL.MIL (Doug Gwyn) smb@ulysses.att.com (Steven Bellovin) for assembling this list of books in bibliography form. I knew of each here, but getting all the details is a lot of work. Thanks again. Q15: What are the Beale Ciphers, and are they a hoax? edited article from Jim Gillogly (jim@rand.org): The story in a pamphlet by J. B. Ward (1885) goes: Thomas Jefferson Beale and a party of adventurers accumulated a huge mass of treasure and buried it in Bedford County, Virginia, leaving three ciphers with an innkeeper; the ciphers describe the location, contents, and intended beneficiaries of the treasure. Ward gives a decryption of the second cipher (contents) called B2; it was encrypted as a book cipher using the initial letters of the Declaration of Independence (DOI) as key. A book cipher is described in A. C. Doyle's "The Valley of Fear". The sender and receiver, each sharing a common book, counts the words on a given page. Usually, the first letter of any word on that page is a substitute for that letter. It is represented in the cipher text by its word number on the page. If this paragraph was used, the word "Beale" could be 2 18 1 33 18. It could also be 22 18 7 33 18. Thus, a "book" cipher is a form of "multiple substitution cipher". With enough material, they can be solved by known methods, but book ciphers are not easy for short enough messages, especially if the constructor of the cipher is careful to use all of the possible substitutes available. Holmes, in his successful analysis in "The Valley of Fear", reveals both the strength and weakness of such ciphers. Once the correct book is guessed, the cipher falls apart. However, if the book cannot be guessed, analysis falls to more cumbersome means that are uncertain for short enough texts. B1 and B3 are unsolved; many documents have been tried as the key to B1, which is widely assumed to also be a book cipher. Afficionados can join a group that attempts to solve B1 by various means with an eye toward splitting the treasure: The Beale Cypher Association P.O. Box 236 Warrington, PA 18976 You can get the ciphers from the rec.puzzles FAQL by including the line send index in a message to netlib@peregrine.com and following the directions. Some believe the story is a hoax. Kruh (Cryptologia 12,4 Oct 88) gives a long list of problems with the story. Gillogly (Cryptologia 4,2 Apr 80) decrypted B1 with the DOI and found some unexpected strings, including ABFDEFGHIIJKLMMNOHPP. Hammer (president of the Beale Cypher Association) agrees that this string couldn't appear by chance, but feels there must be an explanation; Gwyn (sci.crypt expert) is unimpressed with this string. Q16. Are there things that are kind of like cryptography, but are not? Should they be discussed here anyway? There are a few things the probably shouldn't be discussed, except in passing: 1) spying and spy paraphenalia. 2) microdots and other things that physically conceal messages; sci.crypt is about algorithms. These things are interesting, but are not the central focus and there are probably other places to discuss them. There are also topics that are not quite cryptography, in the sense of Q1 that are very welcome. These include: 1. using cryptographic algorithms for "digital signatures". 2. using cryptographic algorithms for computer virus detection schemes. 3. using cryptographic algorithms for computer password protection or generation. Q17 What about the political and privacy issues related to cryptography? Political/privacy issues are getting controversial as to whether they belong in sci.crypt. Some who post here are beginning to complain about political content; they wish to keep the focus more on cryptography as a science. Others see no harm in a certain amount of it. You might consider whether your posting better belongs in alt.privacy or some other newsgroup. Cross-postings from such places are increasingly not appreciated. Those you wish to reach probably already monitor the other groups; those who are indifferent will be merely annoyed. There was an active RFD (request for discussion) on splitting sci.crypt into two groups in order to remove the "political" content. The names of the two new groups were under active discussion. For the moment, this seems to have died down as the political content has also died down.