You could probably do it even without special OS support, simply by opening /proc/self/mem and mmapping it straight back into your own address space. Yeah, it kinda makes my head hurt.
I guess it's a kernel, not a "process," but it's still something I have to do.
Also, there's a little catch you have to deal with when calling or gating between TSS segments -- the processor fills in the link field of the switched-to TSS using the page tables of the switched-from task. But when the TSS call returns, it restores the link field using the page tables of the switched-to task. This means that the TSS segments themselves have to be placed somewhere in memory that is mapped at the same virtual address in both of the tasks involved.
It's all so "fun..."
Yes, startup code is kind of a special case too. And kernels sometimes map user-space memory into kernel space as well, which is another special case. But it's a good idea not to do that unless there is a GOOD reason for it. [E.g. the kernel needs to keep a reference to the user process's memory after the system call has finished. If we just kept the user-space mapping around, the user could free that memory back to the kernel and then do other things with it, so there is a reason to map it "again".]
TSS belongs in the kernel - and by the way, most of the time, you don't want to use the x86 task-switching mechanisms. [We're getting away from the subject, but ...]. I have ported/written/worked on several different kernels, and none of them use Task-Gates or Task Switches using the x86 mechanisms except for two special cases:
1. Stack fault.
2. Double fault.
Since both of these tend to happen when "things have gone really wrong", you may need a fresh stack, CR3, etc. to be able to do anything useful at all. A stack fault can of course happen in user mode, so you detect that case and continue to a kernel function that grows the stack or kills the user-mode app, whichever makes most sense. But in kernel mode, if you ran out of stack, it's bad news - and normal fault handling doesn't work well when you have "no workable stack", so a task gate is the only viable option. A double fault is either a consequence of completely blowing the stack, or of something else going horribly wrong, so the same thing applies there.
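To make the task-gate idea a bit more concrete, here is a minimal sketch of building the IDT entry for the double-fault vector in 32-bit protected mode. The struct and function names, and the selector value used in the example, are my own placeholders and not taken from any particular kernel. Setting up the dedicated TSS itself (fresh stack, CR3, entry point) is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One 8-byte IDT entry in 32-bit protected mode. */
struct idt_entry {
    uint16_t offset_low;   /* unused for a task gate */
    uint16_t selector;     /* GDT selector of the TSS to switch to */
    uint8_t  reserved;     /* must be zero */
    uint8_t  type_attr;    /* present bit | DPL | gate type */
    uint16_t offset_high;  /* unused for a task gate */
};

_Static_assert(sizeof(struct idt_entry) == 8, "IDT entries are 8 bytes");

#define GATE_PRESENT     0x80u  /* P bit */
#define GATE_TYPE_TASK   0x05u  /* task gate */
#define VEC_DOUBLE_FAULT 8      /* #DF */
#define VEC_STACK_FAULT  12     /* #SS */

/* Build a DPL-0 task gate that points at the given TSS selector.
 * The handler's entry point and stack come from that TSS, not from
 * the gate, which is why the offset fields stay zero. */
static struct idt_entry make_task_gate(uint16_t tss_selector)
{
    struct idt_entry e;
    memset(&e, 0, sizeof e);
    e.selector  = tss_selector;
    e.type_attr = GATE_PRESENT | GATE_TYPE_TASK;  /* 0x85 */
    return e;
}
```

Something like `idt[VEC_DOUBLE_FAULT] = make_task_gate(0x28);` would then install it, assuming `idt` is your IDT array and 0x28 happens to be the GDT selector of the double-fault TSS.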
You may need a TSS for switching between user and kernel stacks, but that's all it does in the normal case. And modern processors have "syscall" and "sysenter" instructions (which is which I can never remember) that avoid that for system calls, which leaves interrupts and traps/exceptions. It may be that you have to support multiple TSSs so that multiple threads can be in the kernel at the same time.
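For the "TSS only for the stack switch" case, a rough sketch of the 32-bit TSS layout and the one field pair you actually keep updating might look like this. The struct and function names are made up for illustration; the kernel stack values are whatever your kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit TSS layout. The upper halves of the 16-bit selector slots
 * are reserved, hence the *_rsvd fields. */
struct tss32 {
    uint16_t link, link_rsvd;
    uint32_t esp0;                 /* stack pointer loaded on entry to ring 0 */
    uint16_t ss0, ss0_rsvd;        /* stack segment loaded on entry to ring 0 */
    uint32_t esp1;
    uint16_t ss1, ss1_rsvd;
    uint32_t esp2;
    uint16_t ss2, ss2_rsvd;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, es_rsvd, cs, cs_rsvd, ss, ss_rsvd;
    uint16_t ds, ds_rsvd, fs, fs_rsvd, gs, gs_rsvd;
    uint16_t ldtr, ldtr_rsvd;
    uint16_t trap;
    uint16_t iomap_base;
};

_Static_assert(sizeof(struct tss32) == 104, "32-bit TSS is 104 bytes");
_Static_assert(offsetof(struct tss32, esp0) == 4, "esp0 at offset 4");
_Static_assert(offsetof(struct tss32, cr3) == 28, "cr3 at offset 28");

/* With software task switching, this is typically the only part of
 * the TSS that changes: point ss0:esp0 at the kernel stack of the
 * thread that is about to run, so that an interrupt or exception
 * arriving in user mode lands on the right stack. */
static void tss_set_kernel_stack(struct tss32 *tss,
                                 uint16_t kernel_ss, uint32_t stack_top)
{
    tss->ss0  = kernel_ss;
    tss->esp0 = stack_top;
}
```

The rest of the structure (saved registers, CR3, link) only matters if you actually use hardware task switches, e.g. for the fault cases above.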
For general task-switching, you are much better off just writing the code to switch from one process to another (save all registers on the stack, save the stack pointer in the process control block, fetch the page-table map from the new process into CR3, then load the new process's stack pointer and reverse the save process to restore its registers).
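The real thing has to be written in assembly, but the order of the steps can be shown with a small user-space model. This is only a simulation under my own assumptions: `struct cpu`, `struct pcb`, the four-register set and the tiny per-process stack are invented for illustration, and `cr3` is just a number here.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS       4
#define STACK_WORDS 64

/* Toy CPU: a few general registers, a stack pointer and CR3. */
struct cpu {
    uint32_t  regs[NREGS];
    uint32_t *sp;
    uint32_t  cr3;
};

/* Toy process control block. */
struct pcb {
    uint32_t *saved_sp;            /* stack pointer saved at switch-out */
    uint32_t  page_dir;            /* value to load into CR3 */
    uint32_t  stack[STACK_WORDS];  /* per-process kernel stack */
};

static void     push(struct cpu *c, uint32_t v) { *--c->sp = v; }
static uint32_t pop(struct cpu *c)              { return *c->sp++; }

/* Prepare a never-run process so that the first switch to it can
 * "restore" an initial register set (regs[i] = seed + i). */
static void pcb_init(struct pcb *p, uint32_t page_dir, uint32_t seed)
{
    uint32_t *sp = p->stack + STACK_WORDS;
    for (int i = 0; i < NREGS; i++)
        *--sp = seed + (uint32_t)i;
    p->saved_sp = sp;
    p->page_dir = page_dir;
}

static void switch_to(struct cpu *c, struct pcb *prev, struct pcb *next)
{
    /* 1. Save all registers on the current stack. */
    for (int i = 0; i < NREGS; i++)
        push(c, c->regs[i]);
    /* 2. Save the stack pointer in the outgoing PCB. */
    prev->saved_sp = c->sp;
    /* 3. Load the new process's page tables. */
    c->cr3 = next->page_dir;
    /* 4. Switch to the new stack and pop in reverse order. */
    c->sp = next->saved_sp;
    for (int i = NREGS - 1; i >= 0; i--)
        c->regs[i] = pop(c);
}
```

In a real kernel the stacks live in kernel memory that is mapped in every address space, which is what makes it safe to reload CR3 in the middle of the switch.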
FPU state is lazy-saved "on demand".
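"On demand" here usually means the CR0.TS trick: set TS on every switch, and only save/restore the FPU when the new task actually touches it and takes the device-not-available (#NM) trap. Below is a small user-space model of that bookkeeping. All of the names are invented, the "FPU" is a single double, and whether a given kernel does it exactly this way is an assumption on my part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fpu_state { double st0; };       /* stand-in for the real FPU state */
struct task      { struct fpu_state fpu; };

struct cpu_model {
    bool             ts;         /* models CR0.TS */
    struct fpu_state fpu;        /* the "hardware" FPU registers */
    struct task     *fpu_owner;  /* whose state is in the hardware */
    struct task     *current;
    int              saves;      /* how many times we really saved */
};

/* The context switch does no FPU work; it just arms the trap. */
static void on_task_switch(struct cpu_model *c, struct task *next)
{
    c->current = next;
    c->ts = true;
}

/* Models the #NM handler: runs on the first FPU use after a switch. */
static void device_not_available(struct cpu_model *c)
{
    c->ts = false;                       /* like clts */
    if (c->fpu_owner == c->current)
        return;                          /* state is already ours */
    if (c->fpu_owner != NULL) {
        c->fpu_owner->fpu = c->fpu;      /* save the previous owner */
        c->saves++;
    }
    c->fpu = c->current->fpu;            /* restore ours */
    c->fpu_owner = c->current;
}

/* Models FPU instructions, which trap while TS is set. */
static void fpu_store(struct cpu_model *c, double v)
{
    if (c->ts)
        device_not_available(c);
    c->fpu.st0 = v;
}

static double fpu_load(struct cpu_model *c)
{
    if (c->ts)
        device_not_available(c);
    return c->fpu.st0;
}
```

The point is that a task which never uses the FPU never costs a save or restore, no matter how often it is scheduled.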