Speed Coding 2016 – Q3
20 April 2016, by Chris Arnott
Here’s the third of our speed coding questions.
We have intercepted a transmission from the evil organisation S.P.O.O.K. We need you to decipher it.
We know that they use a Caesar cipher based on RC4:
- the encrypted message and plain text both consist of letters A-Z only, no punctuation or spaces
- they encrypt this with a stream cipher consisting of a sequence of numbers 0-25
- for stream value X and letter Y the output is the letter (X + Y) mod 26
e.g. to encipher letters HELLO with stream 1,2,3,4,5 we would get IGOPT (I=H+1, G=E+2, etc.). We’ll represent the cipher stream as letters too, where A=0, B=1 etc., and note that to decipher we must subtract the corresponding stream, i.e. to decipher IGOPT with 1,2,3,4,5 we subtract one letter from the ‘I’ to get ‘H’, two letters from the ‘G’ to get ‘E’, etc.
|Encipher examples||Decipher examples|
|A||0||A ⇨ A, B ⇨ B, C ⇨ C||A ⇨ A, B ⇨ B, C ⇨ C|
|B||1||A ⇨ B, B ⇨ C, C ⇨ D||A ⇨ Z, B ⇨ A, C ⇨ B|
|C||2||A ⇨ C, B ⇨ D, C ⇨ E||A ⇨ Y, B ⇨ Z, C ⇨ A|
|D||3||A ⇨ D, B ⇨ E, C ⇨ F||A ⇨ X, B ⇨ Y, C ⇨ Z|
|E||4||A ⇨ E, B ⇨ F, C ⇨ G||A ⇨ W, B ⇨ X, C ⇨ Y|
Fortunately S.P.O.O.K.’s encryption has a bug in its key scheduling function and we can guess roughly where in the huge (26! x 26^2) possible cipher stream they’ll be using today. We’ve provided
- the range of cipher stream we believe they’re using today, as a million characters A-Z where A = 0 (i.e. A enciphers to A, B to B), B = 1 (i.e. A enciphers to B, B to C) etc.
- the encrypted message we’ve intercepted, 74 characters long
- a table of English letter frequencies (A-Z) from ‘Cryptological Mathematics’ via Wikipedia
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015, 0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150, 0.01974, 0.00074
We know the message is encrypted with a 74-character section of the cipher stream, but we don’t know the starting point within the stream. We need to decode the message find out the evil S.P.O.O.K. plans.
You’ll need to try every possible starting point in the cipher stream and identify one that produces deciphered text most like English. For that, we’ll use the letter frequencies as an estimate of the entropy provided by each letter
letter entropy = – log (letter frequency)
(ignoring the constants that would be in that) and seek to minimise the total entropy of the decoded message, i.e. the sum of these entropies for every decoded letter. (In practice the decoded text may not be the actual minimum entropy, particularly if it’s short and terse – you may need to manually inspect the best few dozen entropy totals to see which one makes sense. However in this case it is: you only need track the best answer to solve this problem.) Once again note that for decoding you’ll need to subtract the cipher stream from the encrypted text, not add it. The stream does not wrap, i.e. the starting point you’re looking for will be at least 74 characters from the end.
Worked example to help debugging
If we have message “VMPFVPLSSKL” and try to decrypt it with the first eleven characters of the cipher stream “CFHNNXLZOSS” we’ll get plain text “THISISATEST”:
V – C = 21 – 2 = 19 = T
M – F = 12 – 5 = 7 = H
P – H = 15 – 7 = 8 = I
We’ll then sum up the entropies:
T: -ln(0.09056) = 2.402
H: -ln(0.06094) = 2.798; total so far = 5.200
I: -ln(0.06966) = 2.664; total so far = 7.864
for a total entropy estimation of 28.18086 in natural logarithms for this string, or 12.23879 if you’re using logs base 10. (This is the lowest score you’ll find in the first 1000 elements of the cipher stream for this input, but not if you search the first million because this message is relatively short of ‘E’s.)
We’ve recovered a second message but we don’t have a cipher stream to give you for that. You’ll need to implement RC4mod26 yourself. Here’s the algorithm:
- Initialise i = 0, j = 0, S an array of numbers 0-25 in order
- To generate the next element in the stream
- i = i + 1
- j = j + S[i]
- Swap S[i] and S[j]
- Output is S[S[i] + S[j]]
where all addition is done mod 26.
For reference the cipher stream provided for the first part is the first million values generated by this stream. The following message was encrypted with values somewhere in the second million:
(Once again there’s no need to track extra candidate solutions: this is the one that minimises entropy.)