This post shows how to time C code accurately on x86 processors. The result is reported in clock cycles.
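1. The timing is measured with the 'RDTSC' instruction, which returns the value of the processor's time-stamp counter. This 64-bit value is kept in a register and incremented on every clock cycle (on many newer processors it advances at a constant rate, independent of the current core frequency). RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX.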
2. However, because the processor executes instructions out of order, it is not certain when the RDTSC instruction actually executes relative to the surrounding code. Therefore, we flush the processor's pipeline before and after invoking RDTSC, so that every measurement begins in the same state. The flush is done with the CPUID instruction, which is serializing: all earlier instructions must complete before execution continues past it.
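The two steps above can be sketched as a single helper that keeps the full 64-bit counter. This is only a sketch under stated assumptions: it assumes GCC or Clang inline assembly on an x86 target, and the name timestamp64 is hypothetical (it is not part of this post's own code, which returns only the low 32 bits).

```c
#include <stdint.h>

/* Hypothetical 64-bit helper; assumes GCC or Clang on an x86 target. */
uint64_t timestamp64(void)
{
    uint32_t lo, hi;

    /* CPUID (leaf 0) is serializing: earlier instructions finish first. */
    __asm__ volatile("xorl %%eax,%%eax\n\tcpuid"
                     ::: "eax", "ebx", "ecx", "edx", "memory");
    /* RDTSC puts the low half in EAX and the high half in EDX. */
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    /* Serialize again so that later code does not start before the read. */
    __asm__ volatile("xorl %%eax,%%eax\n\tcpuid"
                     ::: "eax", "ebx", "ecx", "edx", "memory");

    /* Combine the two 32-bit halves into one 64-bit count. */
    return ((uint64_t)hi << 32) | lo;
}
```

Keeping both halves matters mainly for long measurements: the low 32 bits alone wrap around after 2^32 cycles, which is a little over a second at 3 GHz.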
The code snippet:
#include <stdio.h>

unsigned int timestamp(void)
{
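    unsigned int bottom; // low 32 bits of the counter (EAX)
    unsigned int top;    // high 32 bits of the counter (EDX); read but not returned
    asm volatile("xorl %%eax,%%eax\n cpuid \n" ::: "%eax", "%ebx", "%ecx", "%edx"); // flush pipeline
    asm volatile("rdtsc\n" : "=a" (bottom), "=d" (top) ); // read the time-stamp counter
    asm volatile("xorl %%eax,%%eax\n cpuid \n" ::: "%eax", "%ebx", "%ecx", "%edx"); // flush pipeline again
    return bottom; // low 32 bits only: differences are correct for intervals shorter than 2^32 cycles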
}
int main(void)
{
    unsigned int t1, t2, to1, to2, tover;

    /* Measure the overhead of timestamp() itself first */
    to1 = timestamp();
    to2 = timestamp();
    tover = to2 - to1;

    t1 = timestamp();
    /**** Code to be measured goes here *****/
    t2 = timestamp();

    /* %u is the printf conversion for unsigned int */
    printf("Time Taken (in clock cycles) : %u\n", (t2 - t1 - tover));
    return 0;
}
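A single overhead sample and a single measurement can both be disturbed by interrupts or cache misses. One common way to reduce that noise is to repeat both and keep the smallest value seen. The following is a minimal sketch of that idea, not the method used above: timestamp64, busy_loop and measure_min_cycles are hypothetical names, and the code assumes GCC or Clang on an x86 target.

```c
#include <assert.h>
#include <stdint.h>

/* Serialized 64-bit read of the time-stamp counter (GCC/Clang, x86 only). */
uint64_t timestamp64(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xorl %%eax,%%eax\n\tcpuid"
                     ::: "eax", "ebx", "ecx", "edx", "memory");
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    __asm__ volatile("xorl %%eax,%%eax\n\tcpuid"
                     ::: "eax", "ebx", "ecx", "edx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical example workload; replace with the code to be measured. */
void busy_loop(void)
{
    volatile unsigned int sink = 0;
    for (unsigned int i = 0; i < 10000; i++)
        sink += i;
}

/* Smallest count over `runs` timings of fn, minus the smallest observed
   cost of two back-to-back timestamp64() calls. */
uint64_t measure_min_cycles(void (*fn)(void), int runs)
{
    uint64_t min_over = UINT64_MAX;
    uint64_t min_work = UINT64_MAX;

    assert(runs > 0);
    for (int i = 0; i < runs; i++) {
        /* Overhead: nothing between the two reads. */
        uint64_t a = timestamp64();
        uint64_t b = timestamp64();
        if (b - a < min_over)
            min_over = b - a;

        /* Workload: the function under test between the two reads. */
        a = timestamp64();
        fn();
        b = timestamp64();
        if (b - a < min_work)
            min_work = b - a;
    }
    /* Clamp at zero in case the workload is cheaper than the noise. */
    return min_work > min_over ? min_work - min_over : 0;
}
```

A call such as measure_min_cycles(busy_loop, 1000) returns the estimate, which can be printed with printf("%llu\n", (unsigned long long)result). Taking the minimum rather than the mean is a design choice: disturbances can only add cycles, so the smallest sample is usually closest to the undisturbed cost.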