I ran across an interesting article on Multi-Threaded File I/O in Dr. Dobb’s today. You can read the article at http://www.ddj.com/hpc-high-performance-computing/220300055
I was particularly intrigued by the statements on variability:
"I repeated the entire test suite three times. The values I present here are the average of the three runs. The standard deviation in most cases did not exceed 10-20%. All tests have been also run three times with reboots after every run, so that no file was accessed from cache."
Initially, I thought 10-20% was a bit much; this seemed like a relatively straightforward test and variability should be low. Then I looked at the source code for the test and I’m now even more puzzled about the variability.
Get a copy of the sources here. It is a single source file, and the only randomization in it is a call to rand() to pick a location in the file.
The code to do the random seek is below:
if(RandomCount)
{
    // Seek new position for Random access
    if(i >= maxCount)
        break;
    long pos = (rand() * fileSize) / RAND_MAX - BlockSize;
    fseek(file, pos, SEEK_SET);
}
While this is a multi-threaded program, I see no calls to srand() anywhere in the program. Just to be sure, I modified Stefan’s program as attached here. (My apologies, the file has an extension of .jpg because I can’t upload a .cpp or .zip onto this free wordpress blog. The file is a Windows ZIP file, just rename it).
///////////////////////////////////////////////////////////////////////////////
// mtRandom.cpp Amrith Kumar 2009 (amrith (dot) kumar (at) gmail (dot) com
// This program is adapted from the program FileReadThreads.cpp by Stefan Woerthmueller
// No rights reserved. Feel Free to do what ever you like with this code
// but don't blame me if the world comes to an end.
#include "Windows.h"
#include "stdio.h"
#include "conio.h"
#include <stdlib.h> // rand()
///////////////////////////////////////////////////////////////////////////////
// Worker Thread Function
///////////////////////////////////////////////////////////////////////////////
DWORD WINAPI threadEntry(LPVOID lpThreadParameter)
{
    // The thread index travels by value inside the pointer-sized parameter.
    int index = (int)(INT_PTR)lpThreadParameter;
    FILE * fp;
    char filename[32];
    sprintf ( filename, "file-%d.txt", index );
    fprintf ( stderr, "Thread %d started\n", index );
    if ((fp = fopen ( filename, "w" )) == (FILE * ) NULL)
    {
        fprintf (stderr, "Error opening file %s\n", filename );
    }
    else
    {
        // Deliberately no srand() call: record the first ten values from rand().
        for (int i = 0; i < 10; i ++)
        {
            fprintf ( fp, "%d\n", rand());
        }
        fclose (fp);
    }
    fprintf ( stderr, "Thread %d done\n", index );
    return 0;
}
#define MAX_THREADS (5)
int main(int argc, char* argv[])
{
    HANDLE h_workThread[MAX_THREADS];
    for(int i = 0; i < MAX_THREADS; i++)
    {
        // Start one worker per second, passing the loop index as the thread parameter.
        h_workThread[i] = CreateThread(NULL, 0, threadEntry, (LPVOID)(INT_PTR) i, 0, NULL );
        Sleep(1000);
    }
    // Wait until every worker has written its file.
    WaitForMultipleObjects(MAX_THREADS, h_workThread, TRUE, INFINITE);
    printf ( "All done. Good bye\n" );
    return 0;
}
So, I confirmed that Stefan will be getting the same sequence of values from rand() over and over again, across reboots.
Why then is he still seeing 10-20% variability? Beats me, something smells here … I would assume that from run to run, there should be very little variability.
Thoughts?