4

I realise this question is somewhere in between Math StackExchange and StackOverflow. So forgive me if this is too much of a practical question, it would probably too theoretical elsewhere.

I am looking for the quickest way to compute the square of a number, modulo a medium-sized Mersenne prime. More specifically, let $P = 2^p - 1$ be a Mersenne prime. For the sake of argument, we can set, e.g., $p = 521$. Let $x$ be an arbitrary number between $1$ and $P$. I want to compute $x^2 \bmod P$.

I need to do this using only 32-bit integers. So, $x$ could for example be encoded in an array of $17$ 32-bit integers, and I'd need another array as output.

How would I go about doing this? I know I could use, e.g., FFT, but for not-that-big numbers it feels like an overkill maybe? I'm thinking Karatsuba, but I'm wondering whether there are shortcuts that either leverage the Mersenne modulo or the fact that it's a square I'm doing, and not a generic multiplication?

Matteo Monti
  • 341
  • 2
  • 9
  • 1
    I think this might be better suited on http://scicomp.stackexchange.com/ or https://cs.stackexchange.com/ (then again, this is definitely not off-topic here...). This would be off-topic on SO though. – Prasun Biswas Jan 28 '21 at 23:40
  • 1
    Might help: https://rosettacode.org/wiki/Talk:Lucas-Lehmer_test#Speeding_things_up If it does, I will make it a full answer if you wish. Old stuff that I wrote on Rosetta Code for a computing task (Lucas-Lehmer test). – Jean-Claude Arbaut Jan 28 '21 at 23:57
  • I assume this is for Lucas-Lehmer. You could make life easy by using Python, which has arbitrary precision integers. ;) https://stackoverflow.com/a/40631767/4014959 Python also has a built-in `pow` function that takes a modulus as an optional 3rd arg, but I didn't bother using it in that code, since computing plain squares is fast enough, unless the numbers have 10,000 (decimal) digits or so. – PM 2Ring Jan 29 '21 at 00:08
  • I *think* there are some relatively simple but useful optimisations here (but I'm getting sleepy, so feel free to ignore me). It's easiest to do arithmetic mod some exact power of 2. Let $$x^2=y+kP$$ for some $k$. Then $$x^2=(y-k)+k(P+1)$$ and $$k=floor(x^2/(P+1))$$ I hope. :) – PM 2Ring Jan 29 '21 at 00:18
  • So that is where I heard about this operation last time!! I had forgotten completely! :) No, I'm not working on an LL test implementation, I'm trying to develop a GPU-friendly Verifiable Delay Function. But I'm happy about the overlap, the folks at Mersenne.org will have made insane optimisations on the topic! – Matteo Monti Jan 29 '21 at 08:10
  • @Jean-ClaudeArbaut thank you so much for the pointer! I was sure that the modulo at least could be heavily optimised! :) I'm wondering what algorithm would be the most suited for the squaring part? – Matteo Monti Jan 29 '21 at 08:14
  • @SSpring thank you! The link you gave me seems to use longhand multiplication, correct? Do you believe that will be the fastest way to perform my multiplication? – Matteo Monti Jan 29 '21 at 08:15
  • @PM2Ring thank you! Indeed, the modulo part can be sped up quite significantly! Now I'm wondering if there's consensus over the fastest way of carrying out the squaring part! – Matteo Monti Jan 29 '21 at 15:17

1 Answers1

1

Here are a few things :

  1. $(n+1)k+r\equiv k+r\pmod n$ so we can take all bits above the $p$-th bit and shift them down by $p$ bits and add them, then repeat until we only have p bits to modularly reduce.
  2. $m=2^i+2^j+2^l+\cdots\implies m^2= m2^i+m2^j+m2^l+\cdots$ so we can shift the value of $m$ to every bit that's 1 and sum those to square. shifting each to modularly reduce like in the first thing from the ( p-th bit of the sum).
Roddy MacPhee
  • 794
  • 1
  • 3
  • 16