Any reason you can't use a newer version of the compiler? They work with old code, you know.
None of the newer ones is a compiler I am used to (GCC is experimental, for example), so I consider my current one more portable for my purposes at the moment.

1) Use pointers with exclusive ownership and store them in the map. Wait until all threads terminate before clearing the map. If you populate the map before launching the threads, it should need neither locks nor atomics. This works if, and only if, access is read-only and ownership never changes.
The map lives in a DLL that is used exclusively by other DLLs (if that matters).
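
For reference, option 1 could look roughly like the sketch below. It assumes a C++11 standard library (std::thread, std::unique_ptr), which the compiler discussed here may not have; on an older toolchain the same shape would use the platform's thread API. Resource, ResourceMap, sum_values and run_readers are hypothetical names made up for the illustration.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical payload; the thread does not say what the map stores.
struct Resource {
    int value;
};

typedef std::map<std::string, std::unique_ptr<Resource> > ResourceMap;

// Reads the map through const access only, so concurrent calls are safe
// as long as nobody modifies the map at the same time.
int sum_values(const ResourceMap& m) {
    int total = 0;
    for (ResourceMap::const_iterator it = m.begin(); it != m.end(); ++it)
        total += it->second->value;
    return total;
}

std::vector<int> run_readers(int reader_count) {
    ResourceMap resources;

    // Step 1: populate the map before any thread exists.
    resources["a"].reset(new Resource{1});
    resources["b"].reset(new Resource{2});
    resources["c"].reset(new Resource{3});

    // Step 2: launch readers. Each one writes only its own result slot.
    std::vector<int> results(reader_count, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < reader_count; ++i) {
        threads.push_back(std::thread([&resources, &results, i] {
            results[i] = sum_values(resources);
        }));
    }

    // Step 3: wait for every thread before touching the map again.
    for (std::size_t i = 0; i < threads.size(); ++i)
        threads[i].join();

    // Step 4: only now is it safe to clear the map.
    resources.clear();
    return results;
}
```

No lock or atomic appears here because thread creation and join already order the writes (populate, clear) against the reads. The moment any thread needs to insert, erase, or transfer ownership, this scheme no longer applies.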

2) Use shared_ptr with atomic ref counting, and take a lock when reading and writing data. You still need to take care of the race condition when creating a thread. If access is read-only, you can skip the lock; you still need atomic ref counting, however.
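
A minimal sketch of what option 2 might look like, assuming C++11's std::shared_ptr (whose ref count is updated atomically) and std::mutex. SharedRegistry and Resource are hypothetical names, and replacing a whole entry instead of mutating it in place is a design choice of this sketch, not something the quoted suggestion prescribes.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical payload type for the illustration.
struct Resource {
    int value;
};

class SharedRegistry {
public:
    // Writer: builds a fresh object and swaps it in under the lock.
    // Readers that still hold the old pointer keep a valid object.
    void put(const std::string& key, int value) {
        std::shared_ptr<Resource> fresh = std::make_shared<Resource>();
        fresh->value = value;
        std::lock_guard<std::mutex> guard(m_lock);
        m_items[key] = fresh;
    }

    // Reader: copies the pointer under the lock, then works on the copy
    // without holding the lock. Returns an empty pointer if key is absent.
    std::shared_ptr<const Resource> get(const std::string& key) const {
        std::lock_guard<std::mutex> guard(m_lock);
        std::map<std::string, std::shared_ptr<const Resource> >::const_iterator
            it = m_items.find(key);
        if (it == m_items.end())
            return std::shared_ptr<const Resource>();
        return it->second;
    }

private:
    mutable std::mutex m_lock;
    std::map<std::string, std::shared_ptr<const Resource> > m_items;
};
```

The lock protects the map itself; the atomic ref count protects the lifetime of each object once a reader has its own copy of the pointer.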
I decided to stick with this idea, but with a separate base class that carries the ref count plus a lock that can also be used when modifying the object, so that shared_ptr stays fast and is used only for thread-local data:

Code:
class make_threaded {
private:
    mutable refcount_t m_refcount;
    mutable win_critical_section m_lock;
// constructors & destructor
public:
    make_threaded()
        : m_refcount(0)
    {
    }
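    // A copy is a new object: it starts with its own zero ref count
    // and its own lock, so nothing is taken from a.
    make_threaded(const make_threaded& a)
        : m_refcount(0)
    {
    }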
    virtual ~make_threaded()
    {
    }
// operators
public:
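    // Assignment deliberately leaves the ref count and lock untouched;
    // they belong to this object, not to its value.
    make_threaded& operator = (const make_threaded& a)
    {
        return *this;
    }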
// inline methods
public:
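    // Thread safety here relies on refcount_t providing atomic ++ and --;
    // with a plain integer these two methods would race.
    inline void add_ref() const
    {
        m_refcount++;
    }
    // Returns true when the last reference is gone; the caller is then
    // responsible for deleting the object.
    inline bool release() const
    {
        return (--m_refcount) == 0;
    }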
    inline win_critical_section& get_lock() const
    {
        return m_lock;
    }
};
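
To show how a class like this might be consumed, here is a rough, self-contained sketch. Since refcount_t and win_critical_section are not shown in this post, the sketch substitutes std::atomic<long> and std::mutex (an assumption, C++11 only) in a stand-in base called threaded_base. threaded_ptr, Counter and bump are hypothetical names added for the illustration.

```cpp
#include <atomic>
#include <mutex>
#include <utility>

// Stand-in for make_threaded, with standard types replacing the
// poster's refcount_t and win_critical_section (assumption).
class threaded_base {
public:
    threaded_base() : m_refcount(0) {}
    threaded_base(const threaded_base&) : m_refcount(0) {}
    virtual ~threaded_base() {}
    threaded_base& operator=(const threaded_base&) { return *this; }

    void add_ref() const { ++m_refcount; }
    bool release() const { return --m_refcount == 0; }
    std::mutex& get_lock() const { return m_lock; }

private:
    mutable std::atomic<long> m_refcount;
    mutable std::mutex m_lock;
};

// Hypothetical intrusive handle: the count lives inside the object,
// and the handle deletes the object when release() reports zero.
template <typename T>
class threaded_ptr {
public:
    explicit threaded_ptr(T* p = 0) : m_p(p) { if (m_p) m_p->add_ref(); }
    threaded_ptr(const threaded_ptr& o) : m_p(o.m_p) { if (m_p) m_p->add_ref(); }
    ~threaded_ptr() { if (m_p && m_p->release()) delete m_p; }
    // Copy-and-swap: the by-value parameter releases the old pointee.
    threaded_ptr& operator=(threaded_ptr o) { std::swap(m_p, o.m_p); return *this; }
    T* operator->() const { return m_p; }
    T* get() const { return m_p; }

private:
    T* m_p;
};

// Hypothetical shared object that is modified under its own lock.
struct Counter : threaded_base {
    Counter() : hits(0) {}
    int hits;
};

void bump(const threaded_ptr<Counter>& c) {
    std::lock_guard<std::mutex> guard(c->get_lock());
    ++c->hits;
}
```

The ref count only governs lifetime. Any read or write of the object's data (hits here) still has to go through get_lock(), which is what bump does.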