I ran across an interesting article on Multi-Threaded File I/O in Dr. Dobb’s today. You can read the article at http://www.ddj.com/hpc-high-performance-computing/220300055
I was particularly intrigued by the author's statements on variability:
"I repeated the entire test suite three times. The values I present here are the average of the three runs. The standard deviation in most cases did not exceed 10-20%. All tests have been also run three times with reboots after every run, so that no file was accessed from cache."
Initially, I thought 10-20% was a bit much; this seemed like a relatively straightforward test and variability should be low. Then I looked at the source code for the test and I’m now even more puzzled about the variability.
Get a copy of the sources here. It is a single source file, and the only randomization in it is a call to rand() to pick a location in the file.
The code that does the random seek is below:
    if(RandomCount)
    {
        // Seek new position for Random access
        if(i >= maxCount)
            break;
        long pos = (rand() * fileSize) / RAND_MAX - BlockSize;
        fseek(file, pos, SEEK_SET);
    }
While this is a multi-threaded program, I see no calls to srand() anywhere in the program. Just to be sure, I modified Stefan’s program as attached here. (My apologies, the file has an extension of .jpg because I can’t upload a .cpp or .zip onto this free wordpress blog. The file is a Windows ZIP file, just rename it).
    ///////////////////////////////////////////////////////////////////////////////
    // mtRandom.cpp     Amrith Kumar 2009 (amrith (dot) kumar (at) gmail (dot) com)
    // This program is adapted from the program FileReadThreads.cpp by Stefan Woerthmueller
    // No rights reserved. Feel Free to do what ever you like with this code
    // but don't blame me if the world comes to an end.
    #include "Windows.h"
    #include "stdio.h"
    #include "conio.h"
    #include <stdlib.h>   // rand()

    ///////////////////////////////////////////////////////////////////////////////
    // Worker Thread Function
    ///////////////////////////////////////////////////////////////////////////////
    DWORD WINAPI threadEntry(LPVOID lpThreadParameter)
    {
        // The thread index is passed by value in the pointer-sized parameter.
        int index = (int)(INT_PTR)lpThreadParameter;
        FILE * fp;
        char filename[32];

        sprintf ( filename, "file-%d.txt", index );
        fprintf ( stderr, "Thread %d started\n", index );

        if ((fp = fopen ( filename, "w" )) == (FILE * ) NULL)
        {
            fprintf (stderr, "Error opening file %s\n", filename );
        }
        else
        {
            // Write the first ten values rand() returns in this thread;
            // srand() is deliberately never called.
            for (int i = 0; i < 10; i ++)
            {
                fprintf ( fp, "%d\n", rand());
            }
            fclose (fp);
        }

        fprintf ( stderr, "Thread %d done\n", index );
        return 0;
    }

    #define MAX_THREADS (5)

    int main(int argc, char* argv[])
    {
        HANDLE h_workThread[MAX_THREADS];

        // Start the workers one second apart, each with its own index.
        for(int i = 0; i < MAX_THREADS; i++)
        {
            h_workThread[i] = CreateThread(NULL, 0, threadEntry, (LPVOID)(INT_PTR) i, 0, NULL );
            Sleep(1000);
        }

        // Wait until every worker has finished.
        WaitForMultipleObjects(MAX_THREADS, h_workThread, TRUE, INFINITE);
        printf ( "All done. Good bye\n" );
        return 0;
    }
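So, I confirmed that Stefan will be getting the same sequence of values from rand() over and over again, across reboots. With no call to srand(), rand() behaves as if it had been seeded with 1 every time the program starts.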
Why then is he still seeing 10-20% variability? Beats me, something smells here … I would assume that from run to run, there should be very little variability.
Thoughts?