Originally Posted by
MutantJohn
I have one important question. Are things only on the stack put into the cache? Or are allocations on the heap also put in there as well?
Anything in memory can end up in the cache once it is accessed. On a PC that means your stack, heap, BSS, data segment, whatever. The cache works on memory addresses and does not care which segment an address belongs to. It is managed by the hardware. It is weird you do not know this since you are familiar with C. Oh well, guess one doesn't know everything, huh?
Two other things to consider:
- You don't have direct control over the cache (the hardware manages it, period), and
- Your program is not the only one whose contents will reside in the cache, so don't be greedy.
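You can't control the cache directly, but the access pattern of your code decides how well the hardware can use it, and that holds whether the data lives on the stack or on the heap. Here is a minimal sketch, assuming a row-major N x N matrix stored in a heap-allocated std::vector (the names SumRowMajor and SumColumnMajor are made up for illustration). Both functions compute the same sum, but the first walks memory sequentially while the second jumps N elements at a time, which typically causes far more cache misses on large matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a row-major N x N matrix in the order it is laid out in memory,
// so consecutive accesses fall on the same or the next cache line.
long long SumRowMajor(const std::vector<int>& Matrix, std::size_t N)
{
    assert(Matrix.size() == N * N);
    long long Sum = 0;
    for (std::size_t Row = 0; Row < N; Row++)
        for (std::size_t Col = 0; Col < N; Col++)
            Sum += Matrix[Row * N + Col];
    return Sum;
}

// Computes the same sum, but consecutive accesses are N elements apart,
// so for large N almost every access touches a different cache line.
long long SumColumnMajor(const std::vector<int>& Matrix, std::size_t N)
{
    assert(Matrix.size() == N * N);
    long long Sum = 0;
    for (std::size_t Col = 0; Col < N; Col++)
        for (std::size_t Row = 0; Row < N; Row++)
            Sum += Matrix[Row * N + Col];
    return Sum;
}
```

If you time both calls with std::chrono::steady_clock on a matrix that is much larger than your cache, the second one should be noticeably slower. How much slower depends on the CPU, so measure on your own machine.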
Originally Posted by
MutantJohn
I have one more very important question. Are graphics cards limited to only having one memory bus? Or are they different?
They are sort of different. All graphics cards have onboard high-speed memory. It is built specifically for the graphics card, which requires very high bandwidth.
But graphics cards also need to access system memory, and they do so via the (nowadays) PCI Express bus, which they (obviously) have to share with other devices. The main memory is nowhere near as fast as graphics memory, of course.
Originally Posted by
MutantJohn
Elysia, I'm bad at multithreading. Would you be willing to post the code you mentioned? Or perhaps you have an example of code where multithreading is actually a significant speed-up?
Here is my code, if you still want it:
Code:
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <iostream>
#include <random>
#include <sstream>
#include <thread>
#include <type_traits>
#include <vector>

const int CacheLineSize = 64;

struct Buffer
{
    Buffer(unsigned int Threads, unsigned int MatrixSize):
        BufferPos(0)
    {
        // A thread writes at most (MatrixSize / Threads + 1) rows. Each value needs up to
        // 7 characters ("-32000" plus a space); reserve 10 per value plus slack for the
        // row label, the header and the 30 bytes ConvertStrInPlace asserts on.
        const std::size_t Rows = MatrixSize / Threads + 1;
        StreamBuffer.resize(Rows * (static_cast<std::size_t>(MatrixSize) * 10 + 16) + 64);
    }

    std::vector<char> StreamBuffer;
    std::size_t BufferPos;
};

// Appends the decimal representation of Number to Out.
template<typename T>
void ConvertStrInPlace(Buffer& Out, T Number)
{
    static_assert(sizeof(Number) <= 8, "Integer size must be at most 8 bytes. Algorithm not designed to handle bigger integers.");
    static_assert(std::is_integral<T>::value, "Number must be an integer.");

    bool IsNegative = Number < 0;
    if (IsNegative)
        Number = -Number;

    assert(Out.StreamBuffer.size() - Out.BufferPos >= 30); // Make sure there is room for at least 30 characters
    char* Dest = &Out.StreamBuffer[Out.BufferPos];
    int CurrentLength = 0;
    if (Number == 0)
    {
        Dest[0] = '0';
        CurrentLength++;
    }
    while (Number != 0)
    {
        // Prepend the lowest digit, shifting the digits written so far one step to the right.
        char c = static_cast<char>((Number % 10) + '0');
        std::memmove(&Dest[1], Dest, CurrentLength);
        Dest[0] = c;
        Number /= 10;
        CurrentLength++;
    }
    if (IsNegative)
    {
        std::memmove(&Dest[1], Dest, CurrentLength);
        Dest[0] = '-';
        CurrentLength++;
    }
    Out.BufferPos += CurrentLength;
}

// Appends a null-terminated string to Out (without the terminator).
void CopyStr(Buffer& Out, const char* String)
{
    while (*String)
        Out.StreamBuffer[Out.BufferPos++] = *String++;
}

int main(int argc, char** argv)
{
    if (argc < 2)
    {
        std::cout << "Syntax: <Program> <Integer>\nWhere <Integer> is an integer whose value is the NxN size of the matrix to write to the output file.\n";
        return 1;
    }

    std::ofstream OutFile("Output.txt");
    if (!OutFile)
    {
        std::cout << "Unable to open outfile Output.txt. Make sure you have the appropriate permissions to write to the current directory.\n";
        return 1;
    }

    int Tmp2;
    std::stringstream Tmp(argv[1]);
    if (!(Tmp >> Tmp2) || Tmp2 <= 0)
    {
        std::cout << "Invalid matrix size. The matrix size must be an integer and greater than 0.\n";
        return 1;
    }

    const unsigned int MatrixSize = static_cast<unsigned int>(Tmp2);
    // hardware_concurrency() may return 0 if the value cannot be determined.
    const unsigned int NumThreads = std::max(1u, std::thread::hardware_concurrency());
    // Round up so the rows left over when MatrixSize is not divisible by NumThreads are not lost.
    const unsigned int ThreadSlice = (MatrixSize + NumThreads - 1) / NumThreads;
    const int RangeMax = 32000;
    const int RangeMin = -32000;

    std::vector<std::thread> Threads;
    std::vector<Buffer> StringBuffer;
    for (unsigned int i = 0; i < NumThreads; i++)
        StringBuffer.emplace_back(NumThreads, MatrixSize);

    // Wall-clock time; std::clock may report CPU time summed over all threads.
    using Clock = std::chrono::steady_clock;
    auto Time1 = Clock::now();
    ConvertStrInPlace(StringBuffer[0], MatrixSize);
    CopyStr(StringBuffer[0], "\n");
    for (unsigned int Thread = 0; Thread < NumThreads; Thread++)
    {
        Threads.emplace_back([Thread, MatrixSize, ThreadSlice, RangeMin, RangeMax, &StringBuffer]
        {
            // Every thread gets its own engine and distribution. Sharing one engine between threads is a data race.
            std::mt19937 engine(std::random_device{}());
            std::uniform_int_distribution<int> dist(RangeMin, RangeMax);
            Buffer& Out = StringBuffer[Thread];
            const auto Start = std::min(ThreadSlice * Thread, MatrixSize);
            const auto End = std::min(Start + ThreadSlice, MatrixSize);
            for (auto i = Start; i < End; i++)
            {
                ConvertStrInPlace(Out, i);
                CopyStr(Out, ": ");
                for (unsigned int j = 0; j < MatrixSize; j++)
                {
                    ConvertStrInPlace(Out, dist(engine));
                    CopyStr(Out, " ");
                }
                CopyStr(Out, "\n");
            }
        });
    }
    for (auto & Thread : Threads)
        Thread.join();
    auto Time2 = Clock::now();
    std::cout << "Generating took " << std::chrono::duration_cast<std::chrono::milliseconds>(Time2 - Time1).count() << " ms.\n";
    Threads.clear();

    Time1 = Clock::now();
    for (const auto & Buf : StringBuffer)
        OutFile.write(Buf.StreamBuffer.data(), static_cast<std::streamsize>(Buf.BufferPos));
    Time2 = Clock::now();
    std::cout << "Writing took " << std::chrono::duration_cast<std::chrono::milliseconds>(Time2 - Time1).count() << " ms.\n";
}
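The part of a program like this that is easiest to get wrong is splitting the rows between the threads, since the row count rarely divides evenly by the thread count. Below is a rough sketch of one way to do it. The helper name SliceRange is made up for illustration, and it assumes the row count is small enough that Rows + NumThreads does not overflow an unsigned int. The slice size is rounded up, so the last threads may get fewer rows (or none), but every row is assigned to exactly one thread.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Returns the half-open row range [first, second) that thread number Thread should process.
std::pair<unsigned, unsigned> SliceRange(unsigned Rows, unsigned NumThreads, unsigned Thread)
{
    assert(NumThreads > 0 && Thread < NumThreads);
    // Rows per thread, rounded up so leftover rows are not dropped.
    const unsigned Slice = Rows / NumThreads + (Rows % NumThreads != 0 ? 1 : 0);
    // Clamp both ends so threads past the end of the matrix get an empty range.
    const unsigned Start = std::min(Slice * Thread, Rows);
    const unsigned End = std::min(Start + Slice, Rows);
    return {Start, End};
}
```

With 10 rows and 4 threads, this gives the ranges [0, 3), [3, 6), [6, 9) and [9, 10).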
I had to make my own string functions since the standard library's string slowed things down considerably.
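If you want to roll your own integer formatting too, here is a minimal sketch of another way to do it (not the routine I used above; WriteInt is a name made up for this example). It fills a small scratch array from the end and copies the result once, instead of shifting the digits for every new digit. It assumes the caller guarantees room for 20 characters at Dest. If you can use C++17, std::to_chars from <charconv> does the same job and is probably the better choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Writes the decimal form of Number to Dest and returns the number of characters written.
// No terminating '\0' is added. Dest must have room for at least 20 characters.
template<typename T>
std::size_t WriteInt(char* Dest, T Number)
{
    static_assert(std::is_integral<T>::value, "Number must be an integer.");
    static_assert(!std::is_same<T, bool>::value, "bool is not supported.");
    static_assert(sizeof(T) <= 8, "Integer size must be at most 8 bytes.");
    assert(Dest != nullptr);

    using U = typename std::make_unsigned<T>::type;
    const bool IsNegative = Number < 0;
    // Negate in unsigned arithmetic so the most negative value does not overflow.
    U Value = IsNegative ? U(0) - static_cast<U>(Number) : static_cast<U>(Number);

    char Scratch[20]; // Enough for 20 digits, or 19 digits plus a sign.
    char* const ScratchEnd = Scratch + sizeof(Scratch);
    char* Pos = ScratchEnd;
    do
    {
        // Store the lowest digit and move one step towards the front.
        *--Pos = static_cast<char>('0' + Value % 10);
        Value /= 10;
    } while (Value != 0);
    if (IsNegative)
        *--Pos = '-';

    const std::size_t Length = static_cast<std::size_t>(ScratchEnd - Pos);
    std::memcpy(Dest, Pos, Length);
    return Length;
}
```

The return value is what you would add to the buffer position after each call, just like CurrentLength in my code.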