I would use the window ("shared memory") approach. Its performance should be close to what you could achieve by implementing the same thing "by hand", so hand-rolling it is unlikely to buy you much. If that performance still isn't where it needs to be, you will probably need a fundamentally different approach.
