Multithreaded File I/O (Reflections on Dr. Dobb’s article by Stefan Wörthmüller)

Thoughts on the results that Stefan Wörthmüller reports in his article on Dr. Dobb’s Journal.

I ran across an interesting article on Multi-Threaded File I/O in Dr. Dobb’s today. You can read the article at http://www.ddj.com/hpc-high-performance-computing/220300055

I was particularly intrigued by his statement on variability:

I repeated the entire test suite three times. The values I present here are the average of the three runs. The standard deviation in most cases did not exceed 10-20%. All tests have been also run three times with reboots after every run, so that no file was accessed from cache.

Initially, I thought 10-20% was a bit much; this seemed like a relatively straightforward test, and I would expect the variability to be low. Then I looked at the source code for the test, and now I'm even more puzzled.

Get a copy of the sources here. It is a single source file, and its only source of randomness is a call to rand() that picks a location in the file.

The code that performs the random seek is below:

   if(RandomCount)
   {
      // Seek new position for Random access
      if(i >= maxCount)
         break;
      long pos = (rand() * fileSize) / RAND_MAX - BlockSize;
      fseek(file, pos, SEEK_SET);
   }

While this is a multi-threaded program, I see no calls to srand() anywhere in it. Just to be sure, I modified Stefan’s program; my version is attached here. (My apologies: the file has a .jpg extension because I can’t upload a .cpp or .zip file to this free WordPress blog. It is really a Windows ZIP file; just rename it.)

///////////////////////////////////////////////////////////////////////////////
// mtRandom.cpp   Amrith Kumar 2009 (amrith (dot) kumar (at) gmail (dot) com)
// This program is adapted from the program FileReadThreads.cpp by Stefan Woerthmueller
// No rights reserved. Feel free to do whatever you like with this code,
// but don't blame me if the world comes to an end.

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>   // rand()

///////////////////////////////////////////////////////////////////////////////
// Worker Thread Function
// Each thread writes the first 10 values it gets from rand() to its own
// file, file-<index>.txt, so the sequences can be compared afterwards.
///////////////////////////////////////////////////////////////////////////////

DWORD WINAPI threadEntry(LPVOID lpThreadParameter)
{
    int index = (int)(INT_PTR)lpThreadParameter;   // thread number passed by main()
    FILE * fp;
    char filename[32];

    sprintf(filename, "file-%d.txt", index);

    fprintf(stderr, "Thread %d started\n", index);
    if ((fp = fopen(filename, "w")) == NULL)
    {
        fprintf(stderr, "Error opening file %s\n", filename);
    }
    else
    {
        // No srand() call anywhere, just like the original program
        for (int i = 0; i < 10; i++)
        {
            fprintf(fp, "%d\n", rand());
        }

        fclose(fp);
    }

    fprintf(stderr, "Thread %d done\n", index);

    return 0;
}

#define MAX_THREADS (5)

int main(int argc, char* argv[])
{
    HANDLE h_workThread[MAX_THREADS];

    // Start the threads one second apart
    for (int i = 0; i < MAX_THREADS; i++)
    {
        h_workThread[i] = CreateThread(NULL, 0, threadEntry, (LPVOID)(INT_PTR)i, 0, NULL);
        Sleep(1000);
    }

    // Wait for every thread to finish, then release the handles
    WaitForMultipleObjects(MAX_THREADS, h_workThread, TRUE, INFINITE);
    for (int i = 0; i < MAX_THREADS; i++)
    {
        CloseHandle(h_workThread[i]);
    }

    printf("All done. Good bye\n");
    return 0;
}

So I confirmed that, with no call to srand(), Stefan's program gets the same sequence of values from rand() on every run, even across reboots.

Why, then, is he still seeing 10-20% variability? Beats me; something smells here. I would expect very little variability from run to run.

Thoughts?