Logical Cryptanalysis (Or on Econding Crypto Problems into SAT)
The basic idea of the approach is very simple:
- Encode
the input/output relation of a cryptographic algorithm
into a propositional formula. (bit values
of plaintext, ciphertext, and possibly the key (if used)
are encoded into propositional variables and
the formula is true iff the ciphertext
corresponds to the plaintext encrypted with the key).
- Use a SAT solver to find out the solution.
This approach easily captures to symmetric and public key cryptography,
hash function, stream cipher and the like. The difficult part is finding a good encoding that a modern SAT solver can actually crack. We invented logical cryptanalysis in 1999 (See also the full list of papers farther below)
And this was the start of a long path in community
as the typical comment from crypto people I got was "for ever useless"...
well, in this case it is true that nothing is for ever...
New Development - Logical Cryptanalysis a key component of algorithm for findng SHA-1 collisions
A landmark paper reported a way to generate full collisions with SHA-1, one ofthe most used hash algorithm:.
In the paper they used logical cryptanalysis applied to hash function as a key
component of the differential path search (my italic):
[...] but
all these
attempts led to results that not only were unsatisfactory but that even
threatened the feasibility of the second near-collision attack.[...]
Our final solution was to encode this problem into a satisfiability (SAT)
problem and use a SAT solver to find a drop-in replacement differential path
over the first eight steps that is solvable.
More specifically, we adapted the SHA-1 SAT system
generator from Nossum.
Vegard Nossum was a student at the University of Oslo I co-supervised
and the work of his at the basis of the collision is well described
in his thesis:
Chapeau to Marc, Elie, their team and, last but not least, Vegard.
If you are interested in the history of the field you might continue reading here.
We had some follow-up in SAT Community (where CryptoSAT is a well
known benchmark - See also
Mate Soos' Blog) but the symmetric and
public key approach didn't make much progress.
However, some researchers started using it for finding collisions on
the MD4 and MD5 hash functions
My Old, Initial Encoding of Crypto into SAT
In what now looks the beginning of a field
(but not long ago looked pretty much a dead end)
I worked on two problems:
- Symmetric crypto: namely the encoding of the (former) Data Encryption Standard
- Asymmetric crypto: namely the generation of signatures of RSA with
a fixed public exponent 3 (recommended in a former version of the PKCS
signature standard).
Main References
- F. MASSACCI Using walk-SAT and rel-sat for cryptographic key search. In Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI’99) (1999), T. Dean, Ed., Morgan Kaufmann.
http://www.ing.unitn.it/~massacci/papers/mass-99-IJCAI.pdf
- F. MASSACCI and L. MARRARO. Logical cryptanalysis as a SAT-problem: Encoding and analysis of the U.SS. Data Encryption Standard. Journal of Automated Reasoning 24(1-2), (2000). (Preliminary versions in Third LICS Workshop on Formal Methods and Security Protocols (July 1999), NMR-2000, SAT-2000: Highlights of Satisfiability Research at the Year 2000)
http://www.ing.unitn.it/~massacci/papers/mass-marr-00-JAR.pdf
- C. FIORINI, E. MARTINELLI, and F. MASSACCI. How to fake an RSA signature by encoding modular root finding as a SAT problem. Discrete Applied Mathematics 130(2), (2003). (Preliminary version at MFPS-99)
http://www.ing.unitn.it/~massacci/papers/fior-mart-mass-03-DAM.pdf
Data Encryption Standard
The U.S.A. Data Encryption Standard, a Feistel type block-cipher, has been encoded..
If you are not familiar on the work of DES, you can find background
information in any good handbook of cryptography. If you are more interested
just in the encoding, then our papers give you
enough background information on the problem. The main thing you need to know
is that this cipher works by ``repeating'' an operation (called round) a
suitable number of times. The task of a round is to mix and combine the
input bits with the secret key. Loosely speaking, the higher the number of
rounds, the better. For instance, in the case of DES, after eight rounds all
input bits affect all output bits.
Thus, from the view point of the propositional encoding (for key search),
the higher the number of rounds, the harder the formula. In contrast, the
higher the number of plaintext and ciphertext blocks which are conjoned the
more constrained is the search: after conjoining a suitable number of pairs
there is only one model.
The problems can be of two kind:
- Key Search with a known or unknown plaintext
- The formula encode the input/output relation of one or more blocks of
ciphertexts and known plaintext. If you find a model the first N bits will
give you the value of the secret key used for encryption
- Verification of cryptographic properties
- The formula encodes some good property of a cryptographic algorithm
(e.g. there is no universal key which can decrypt any message, or there are no
two keys that can produce the same plaintext/ciphetext pair). You must prove
that the formula is valid.
The original encoder is available as a tar gzipped folder version 0.9. It comes equipped with scripts for automatic data gathering (the script we used for IJCAI/JAR/NMR papers) and supports lots of formats (Human readable, CNF, SMODELs,FOF, TPTP) BUT no longer works with current gcc.
Sami Liedes sliedes AT cc dot hut dot fi a student from Ilkka Niemela HUT TCSlaboratory has provided a patch that also fixes the scripts (patch of version 0.9).
Faking RSA Signatures with Public exponent 3
The RSA algorithm, a cryptographic algorithm based on modular exponentiation, has been encoded.
If you are not familiar with RSA, you can find background
information in any good handbook of cryptography. If you are more interested
just in the encoding, then our papers give you
enough background information on the problem. The main thing you need to know
is that this cipher works by suitably generating a triple of numbers $n,d,e$ such that the following equation holds
(MD)E = M (modulo N)
The pair (E,N) is the public key and the pair (D,N) is the private signing key. Basically to sign a message M≤N you elevate M to the D-th power (modulo N) and that is the signature F (from the italian Firma). when you receive the message signature pair (M,F) one simply checks that the e-th power of f is equal to m (modulo n). A PKCS standard recommended 3 as a possible value for E.
Thus, from the view point of the propositional encoding, we have encoded a relation
(FE) = M (modulo N)
where M is given and E=3. So any propositional solution of the encoding yields a fake (but valid) signature of M. Mathematically speaking this is called a cubic-residuosity problem.
The problems can be of two kind:
- signature falsification
- When the triple (E,D,N) is generated according the RSA algorithm there is always a solution to the equation given above and so we have the problem of finding the solution to problem whose satisfiability status is known (yes)
- solution of cubic-residuosity
- If the triple (E,D,N) is arbitrary the equation below may not have a solution for all values of M. Then the problem may be satisfiable (M is a cubic residue modulo N) or unsatisfiable (M is not a cubic residue). For example for N=35, M=2 is not a cubic residue, whereas M=6 is a cubic residue. Smodels takes few seconds to prove or disprove the cubic residuosity of all numbers from M<35.
The encoder is available as a tar gzipped folder version 1.0 (also in the version used for the DAM paper here). The good news is that it comes equipped with scripts for automatic data gathering (These are the script we used for the DAM) and (in comparison with DES) that it still works! The bad news is that it the 1.0 (which has some basic English Instructions) only supports the Heerhugo format (basically a human readable formula). Translating it to other formats is easily done via scripts.