I want to write a program in which random numbers are generated and I keep track of the largest one. Three threads run in parallel.

I do it with two methods. In the first, I create a variable in main() and pass it by reference to every thread. At the end, this variable holds the maximum value generated. When the variable is updated, I use a mutex (do I really have to?).

The second method uses std::atomic and produces the same results (as far as I have tested).

This is a small example that I plan to adapt for my project, where it is critical that every thread can see the current best value found across all threads.

Here is the code:

#include <iostream>       // std::cout
#include <thread>         // std::thread
#include <mutex>          // std::mutex
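#include <atomic>         // std::atomic
#include <random>         // std::default_random_engine, std::uniform_int_distribution
#include <ctime>          // time
#include <functional>     // std::ref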

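// One engine per thread: a single engine shared by all threads would be a
// data race, because std::default_random_engine is not thread-safe.
thread_local std::default_random_engine generator(std::random_device{}());

// Returns a uniformly distributed integer in [0, n].
int random(int n) {
  std::uniform_int_distribution<int> distribution(0, n);
  return distribution(generator);
}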

std::mutex mtx;           // mutex for critical section
std::atomic<int> at_best(0);

void update_cur_best(int& cur_best, int a, int b) {
  // early exit if a stored best already beats both candidates
  // (note: cur_best is read here without holding mtx)
  if(cur_best > a && cur_best > b)
    return;
  if(at_best > a && at_best > b)
    return;
  int best;
  if(a > b)
    best = a;
  else
    best = b;
  mtx.lock();
  cur_best = best;
  mtx.unlock();


  // or, using the atomic (method 2):

  if(a > b)
    at_best = a;
  else
    at_best = b;
}

void run(int max, int& best) {
  for(int i = 0; i < 15; ++i) {
    update_cur_best(best, random(max), random(max));
  }
}

//g++ -std=c++0x -pthread px.cpp -o px
int main ()
{
  int best = 0;
  std::thread th1 (run, 100, std::ref(best));
  std::thread th2 (run, 100, std::ref(best));
  std::thread th3 (run, 100, std::ref(best));

  th1.join();
  th2.join();
  th3.join();

  std::cout << "best = " << best << std::endl;
  std::cout << "at_best = " << at_best << std::endl;

  return 0;
}

The questions:

1) Are the two methods equivalent?

From the reference: "Atomic types are types that encapsulate a value whose access is guaranteed to not cause data races and can be used to synchronize memory accesses among different threads."

Are they equivalent in terms of the results they produce and their efficiency?

2) If they are, then why was std::atomic introduced? Which method should I use? Speed is what I care about most.

3) Is there any faster method to achieve this functionality?

Remember that in my actual project, the point is that best always holds the current best value found by all the threads, so that every thread can compare against it.
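
To make that requirement concrete, here is a minimal sketch of one direction I am considering: a compare-and-swap loop on the atomic, so that the comparison and the store happen as a single step. The helper atomic_update_max and the function shared_best_of_three_threads are hypothetical names, not part of the code above, and the fixed value ranges stand in for the random numbers.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: raise `target` to `candidate` if `candidate` is
// larger, as one atomic read-modify-write.
void atomic_update_max(std::atomic<int>& target, int candidate) {
  int seen = target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak reloads `seen` with the value another
  // thread stored, so the loop re-checks against the latest best.
  while (candidate > seen &&
         !target.compare_exchange_weak(seen, candidate)) {
  }
}

// Three threads each offer the values t*1000 .. t*1000+999 (illustrative
// data only) and the function returns the shared best.
int shared_best_of_three_threads() {
  std::atomic<int> best(0);
  std::vector<std::thread> workers;
  for (int t = 0; t < 3; ++t) {
    workers.emplace_back([&best, t] {
      for (int i = 0; i < 1000; ++i) {
        const int value = t * 1000 + i;
        atomic_update_max(best, value);
        // The shared best only grows, so it can never be below what this
        // thread has just offered.
        assert(best.load() >= value);
      }
    });
  }
  for (auto& w : workers) w.join();
  return best.load();
}
```

With these ranges the largest value offered is 2999, so that is what shared_best_of_three_threads() returns. Any thread can read the current best at any time with best.load().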

They are probably not equivalent, and I feel that I am not on the right track. How should I approach the problem, then?
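
If the mutex route is kept instead, one approach might be to hold the lock across both the comparison and the write, and to read through the same lock. This is only a sketch under that assumption: the class GuardedBest and the function guarded_best_of_three_threads are hypothetical names, and the fixed value ranges again stand in for the random numbers.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper: a plain int whose every read and write goes through
// one mutex.
class GuardedBest {
 public:
  // Raise the stored value to `candidate` if it is larger.
  void offer(int candidate) {
    std::lock_guard<std::mutex> lock(mtx_);  // held for the compare AND the write
    if (candidate > best_) best_ = candidate;
  }

  // Reading also takes the lock; an unlocked read would race with offer().
  int get() const {
    std::lock_guard<std::mutex> lock(mtx_);
    return best_;
  }

 private:
  mutable std::mutex mtx_;
  int best_ = 0;
};

// Three threads each offer the values t*1000 .. t*1000+999 (illustrative
// data only) and the function returns the shared best.
int guarded_best_of_three_threads() {
  GuardedBest best;
  std::vector<std::thread> workers;
  for (int t = 0; t < 3; ++t) {
    workers.emplace_back([&best, t] {
      for (int i = 0; i < 1000; ++i) {
        const int value = t * 1000 + i;
        best.offer(value);
        // The stored best only grows, so it is at least what was just offered.
        assert(best.get() >= value);
      }
    });
  }
  for (auto& w : workers) w.join();
  return best.get();
}
```

Here std::lock_guard releases the mutex automatically when it goes out of scope, so there is no separate unlock() call to forget on an early return.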