Some Notes on Integer Factorization
A Factorization From “The Art of Computer Programming”
We aim to factor the integer \(N = 2^{214} + 1\). Using the identity \(4x^4+1 = (2x^2+2x+1)(2x^2-2x+1)\), we can break \(N\) up into two factors as
\[N = \underbrace{ (2^{107} - 2^{54} + 1)}_{A}\underbrace{(2^{107} + 2^{54} + 1)}_{B}\]Factoring A
By applying trial division for all primes \(p < 1000\), we have that
\[A = 162259276829213345377179500806145 = 5 \cdot 857 \cdot p_{29}\]where \(p_{29} = 37866809061660057264219253397\) is a 29-digit prime confirmed by the Miller-Rabin test.
Factoring B
We first try to divide out small primes from \(B\) - we have that \(B = 162259276829213381405976519770113 = 843589 \cdot c_{27}\) where \(c_{27} = 192343993140277293096491917\). We know that \(c_{27}\) is composite as we have that \(3^{c_{27}-1} \not\equiv 1 \bmod(c_{27})\) which contradicts Fermat’s little theorem. The Pollard Rho algorithm discovers that \(c_{27}\) factors into two primes: \(c_{27} = 8174912477117 \cdot 23528569104401\)
A Modern Sized Factorization A La Knuth
In a similar approach, we attempt to factor \(N = 2^{432}+1\), a composite with 131 digits.
We can factor the polynomial \(p(x) = x^{432} + 1\) to get an algebraic factorization of \(N\). We have that:
\[N = \underbrace{ (2^{16} + 1) }_{65537}\cdot \underbrace{ (2^{32} - 2^{16} + 1)}_{C_1} \; \cdot \; \underbrace{ (2^{96} - 2^{48} + 1) }_{C_2} \cdot \underbrace{ (2^{288} - 2^{144} + 1)}_{C_3}\]where \(C_1 = 4294901761 = 193 \cdot 22253377\) factors via trial division.
\(C_2 = 79228162514264056118567239681\) is a 29-digit composite with relatively small factors; it can be factored with the GNU factor
utility:
$ factor 79228162514264056118567239681
79228162514264056118567239681: 1153 6337 38941695937 278452876033
and therefore, we have that \(C_2 = 1153\cdot 6337 \cdot 38941695937 \cdot 278452876033\)
With this luck, we hope that \(C_3\) also has a small factor - we use the utility gmp-ecm
, an implementation of the elliptic curve factoring method which is useful for finding small to medium sized prime factors)
$ python3 -c "print(2**288 - 2**144 + 1)" | ecm 100000
GMP-ECM 7.0.5 [configured with GMP 6.3.0, --enable-asm-redc] [ECM]
Input number is 497323236409786642155382248146820840100456128496602518909840835357441224364171869552641 (87 digits)
Using B1=100000, B2=40868532, polynomial x^2, sigma=1:2036516883
Step 1 took 115ms
********** Factor found in step 1: 68016300334849
Found prime factor of 14 digits: 68016300334849
Prime cofactor 7311824282729722035859309520826138827918372038863677678267962656996811009 has 73 digits
While we broke up the factorization into stages, the entire factorization is small enough that it can be done in software packages such as gp
and Sagemath:
$ echo "factorint(2^432+1)" | gp -q
[ 193 1]
[ 1153 1]
[ 6337 1]
[ 65537 1]
[ 22253377 1]
[ 38941695937 1]
[ 278452876033 1]
[ 68016300334849 1]
[7311824282729722035859309520826138827918372038863677678267962656996811009 1]
An Unbreakable Cipher
In the August 1977 issue of Scientific American, Martin Gardener presented an instance oF RSA that was prediced to “take millions of years to break” that involved factoring a 129 digit number.
Installing CADO-NFS
CADO-NFS is an open source implementation of the number field sieve algorithm ; we can install and build it from source:
sudo apt install make cmake g++ gcc libgmp3-dev python3 python3-flask git # Install necessary libraries
git clone https://gitlab.inria.fr/cado-nfs/cado-nfs.git && cd cado-nfs # Get source code for CADO-NFS
make -j8 # Change to however many cores you have
./cado-nfs.py $(python3 -c "print(10**57+1)") # Run a test factorization
Breaking the Cipher
After installing, we can use CADO-NFS to factor the 129-digit composite on a 16 core AWS machine in around 2.5 hours:
$ time ./cado-nfs.py 114381625757888867669235779976146612010218296721242362562561842935706935245733897830597123563958705058989075147599290026879543541
[...]
Info:Complete Factorization / Discrete logarithm: Total cpu/elapsed time for entire Complete Factorization 243582/9509.24 [02:38:29]
3490529510847650949147849619903898133417764638493387843990820577 32769132993266709549961988190834461413177642967992942539798288533
real 149m9.555s
user 346m20.767s
sys 21m6.648s
We can then use these primes to decrypt the original ciphertext given in the original issue of Scientific American:
N = 114381625757888867669235779976146612010218296721242362562561842935706935245733897830597123563958705058989075147599290026879543541
p = 3490529510847650949147849619903898133417764638493387843990820577
assert N % p == 0
q = N // p
c = int("""
9686 9613 7546 2206
1477 1409 2225 4355
8829 0575 9991 1245
7431 9874 6951 2093
0816 2982 2514 5708
3569 3147 6622 8839
8962 8013 3919 9055
1829 9451 5781 5154
""".replace("\n", "").replace(" ", ""))
e = 9007
from string import ascii_lowercase
ALPHABET = " " + ascii_lowercase
pt = str( pow(c, pow(e, -1, (p-1)*(q-1)), N) )
print( "".join( [ ALPHABET[ int( pt[i:i+2] ) ] for i in range(0, len(pt), 2) ] ) )
$ python3 decrypt_message.py
the magic words are squeamish ossifrage
A Historical Note
The original factorization was completed in 1993-1994 via a team lead by Atkins et al
- the factoriztaion used the quadratic sieve algorithm instead of the number field sieve used above and took 825 mips years
- the majority of this time was spent on the sieving step (~220 days over 1600 machines), as in our calculation
- linear algebra took around 100 hours
Time to Factor 512-Bit Number
The technical paper mentioned above mentions that factoring a 512-bit number would take “a couple months” at 500,000 mips - we therefore have a total instruction count of
\[500\cdot10^9 \text{ ops per second} \cdot 4 \text{ months} \approx 5.26 \cdot 10^{18} \text{ ops}\]We can compare this instruction count to a factorization of CADO-NFS on a 512-bit modulus on modern hardware:
\[\frac{3.6 \cdot 10^9 \text{ ops/second}}{\text{thread}} \cdot 12 \text{ threads } \cdot 4 \text{ days} \approx 1.5 \cdot 10^{16} \text{ ops}\]