Don't bother with assembly; something is wrong with the algorithms you're using. If I can write Ruby code that performs calculations on the order of 50^(2^512) mod 2^1024 almost instantaneously, then assembly won't help; at best it would speed your program up by a constant factor.
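For contrast, here is a hypothetical sketch of the kind of approach that would be hopeless at this scale. The original question's code isn't shown here, so this is only an assumption about what a "wrong algorithm" might look like: multiplying by `b` once per unit of the exponent. The name `naive_powmod` is made up for illustration.

```ruby
# Hypothetical naive modular exponentiation: multiplies by b exactly x times.
# The result is correct, but it takes O(x) multiplications, so it is only
# usable for small exponents.
def naive_powmod(b, x, m)
  result = 1 % m
  x.times { result = (result * b) % m }  # one multiplication per unit of x
  result
end

p naive_powmod(3, 5, 7)  # 3**5 = 243, and 243 % 7 == 5
```

With x near 2^512 (about 1.3 × 10^154) the loop would never finish, and no amount of hand-tuned assembly changes that.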

For example, I can write:

Code:

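# Modular exponentiation by repeated squaring: computes (b ** x) % m
# with O(log x) multiplications, never letting intermediates exceed m**2.
def powmod(b, x, m)
  return 1 % m if x == 0        # b**0 == 1 (reduced mod m, in case m == 1)
  tmp = powmod(b, x / 2, m)     # b**(x / 2) mod m, using integer division
  if x.odd?
    (tmp * tmp % m) * b % m     # odd x:  b**x == (b**(x / 2))**2 * b
  else
    tmp * tmp % m               # even x: b**x == (b**(x / 2))**2
  end
end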
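The same square-and-multiply idea can also be written as a loop over the exponent's bits, which avoids recursion. Below is a minimal sketch; the name `powmod_iter` is hypothetical. Ruby 2.5 and later also provide the built-in `Integer#pow(exponent, modulus)`, which is handy for checking the result.

```ruby
# Iterative right-to-left binary exponentiation: computes (b ** x) % m.
# Handles one bit of x per iteration, so it needs about log2(x) squarings.
def powmod_iter(b, x, m)
  result = 1 % m
  base = b % m
  while x > 0
    result = (result * base) % m if x.odd?  # include this bit's power of b
    base = (base * base) % m                # b**(2**k) -> b**(2**(k + 1))
    x >>= 1                                 # move on to the next bit
  end
  result
end

b, x, m = 123, 2**512 + 12344535, 2**2048 + 1234
puts powmod_iter(b, x, m) == b.pow(x, m)  # compare against Ruby's built-in
```

Both versions do the same amount of work; the loop form simply trades the call stack for a shifting exponent.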

This Ruby function computed powmod(123, 2 ** 512 + 12344535, 2 ** 2048 + 1234) and produced

Code:

22813624981075065545135850558173269231510448307866093786909898690881909635625
28026233653165635689972639687049986456799967595172328506910777276889008060872302
35464631832286336920331501492018763317107266447462587900392315073114991959333059
51421717310407289760318099206126989926255863894581434267272318153188897013984781
84378062834515623677409012707914256130537336028576225644897284752852800312488149
65148676160157595427552224160566504933248486744939428838549003370569396501573929
63204417521093027295195029595437349786705582067498498129131203940551923664015160
678987418146095882019961441850108090812140059913087917725917

in less than a second. It's your algorithm that's the problem, and jumping into assembly language would not help.
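To check the "less than a second" claim on your own machine, a sketch like the following times the call with Ruby's standard `Benchmark.realtime` and counts the modular multiplications. The counting wrapper `powmod_counted` is hypothetical instrumentation added for illustration, and timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

# Recursive square-and-multiply, instrumented to count modular multiplications.
# counter is a one-element array so the count is shared across recursive calls.
def powmod_counted(b, x, m, counter)
  return 1 % m if x == 0
  tmp = powmod_counted(b, x / 2, m, counter)
  counter[0] += 1                 # the squaring done at every level
  result = tmp * tmp % m
  if x.odd?
    counter[0] += 1               # the extra multiplication by b for a set bit
    result = result * b % m
  end
  result
end

b, x, m = 123, 2**512 + 12344535, 2**2048 + 1234
counter = [0]
seconds = Benchmark.realtime { powmod_counted(b, x, m, counter) }
puts "multiplications: #{counter[0]} (exponent has #{x.bit_length} bits)"
puts format("elapsed: %.4f s", seconds)
```

For this exponent the count is 529: one squaring for each of its 513 bits plus one multiplication by `b` for each of its 16 set bits, compared with roughly 2^512 multiplications for a loop that multiplies once per unit of the exponent.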