Thursday, March 26, 2015

Measurement resolution when using RDTSC, CPUID and RDTSCP to measure performance

Descriptions

  • The measurements follow the method described in the Intel white paper "How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures".
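
The measurement sequence from that paper can be sketched in user-space C as below. This is only a minimal sketch, assuming an x86-64 CPU with RDTSCP support and GCC or Clang inline assembly. The names tsc_begin, tsc_end and min_overhead are hypothetical and are not taken from the module used for the results in this post. The results here appear to come from a kernel module ("Loading hello module..."), so numbers from a user-space sketch like this one will be noisier because of preemption and interrupts.

```c
#include <stdint.h>

/* Start of the measured region: CPUID serializes the pipeline so that
 * earlier instructions cannot drift past the RDTSC that follows it. */
static inline uint64_t tsc_begin(void)
{
    uint32_t hi, lo;
    __asm__ volatile("cpuid\n\t"
                     "rdtsc\n\t"
                     "mov %%edx, %0\n\t"
                     "mov %%eax, %1\n\t"
                     : "=r"(hi), "=r"(lo)
                     :
                     : "%rax", "%rbx", "%rcx", "%rdx");
    return ((uint64_t)hi << 32) | lo;
}

/* End of the measured region: RDTSCP waits for the measured code to
 * finish, and the trailing CPUID keeps later code from moving up. */
static inline uint64_t tsc_end(void)
{
    uint32_t hi, lo;
    __asm__ volatile("rdtscp\n\t"
                     "mov %%edx, %0\n\t"
                     "mov %%eax, %1\n\t"
                     "cpuid\n\t"
                     : "=r"(hi), "=r"(lo)
                     :
                     : "%rax", "%rbx", "%rcx", "%rdx");
    return ((uint64_t)hi << 32) | lo;
}

/* Smallest observed cost, in cycles, of an empty measured region.
 * Hypothetical helper: real code would go between the two reads. */
static uint64_t min_overhead(int runs)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t start = tsc_begin();
        /* code under test would go here */
        uint64_t end = tsc_end();
        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

Calling min_overhead(1000) gives a rough figure for the fixed cost of the two timestamp reads, which is comparable in spirit to the "min time" reported for loop_size 0 below.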

Downloads

Results

  • Each loop size is executed 1000 times, with most features that affect measurement turned off.
    Loading hello module...
    loop_size:0 >>>> variance(cycles): 3; max_deviation: 8 ;min time: 44
    loop_size:1 >>>> variance(cycles): 3; max_deviation: 28 ;min time: 44
    loop_size:2 >>>> variance(cycles): 3; max_deviation: 12 ;min time: 44
    loop_size:3 >>>> variance(cycles): 5; max_deviation: 40 ;min time: 44
    loop_size:4 >>>> variance(cycles): 4; max_deviation: 32 ;min time: 44
    loop_size:5 >>>> variance(cycles): 5; max_deviation: 32 ;min time: 44
    loop_size:6 >>>> variance(cycles): 6; max_deviation: 48 ;min time: 44
    loop_size:7 >>>> variance(cycles): 1; max_deviation: 32 ;min time: 48
    loop_size:8 >>>> variance(cycles): 4; max_deviation: 20 ;min time: 48
    loop_size:9 >>>> variance(cycles): 7; max_deviation: 48 ;min time: 48
    loop_size:10 >>>> variance(cycles): 5; max_deviation: 32 ;min time: 48
    loop_size:11 >>>> variance(cycles): 10; max_deviation: 84 ;min time: 48
    .........
    .........
    loop_size:994 >>>> variance(cycles): 1922; max_deviation: 1388 ;min time: 2028
    loop_size:995 >>>> variance(cycles): 0; max_deviation: 0 ;min time: 2032
    loop_size:996 >>>> variance(cycles): 1923; max_deviation: 1388 ;min time: 2032
    loop_size:997 >>>> variance(cycles): 0; max_deviation: 0 ;min time: 2036
    loop_size:998 >>>> variance(cycles): 3; max_deviation: 4 ;min time: 2036
    loop_size:999 >>>> variance(cycles): 1815; max_deviation: 1348 ;min time: 2040

    total number of spurious min values = 0
    total variance = 2520492
    absolute max deviation = 1144364
    variance of variances = 17554753199565
    variance of minimum values = 335594

  • Each loop size is executed 1000000 times, with most features that affect measurement turned off.
    Loading hello module...
    loop_size:0 >>>> variance(cycles): 809; max_deviation: 23816 ;min time: 44
    loop_size:1 >>>> variance(cycles): 405; max_deviation: 19300 ;min time: 44
    loop_size:2 >>>> variance(cycles): 41; max_deviation: 4992 ;min time: 44
    loop_size:3 >>>> variance(cycles): 13; max_deviation: 1920 ;min time: 44
    loop_size:4 >>>> variance(cycles): 6300; max_deviation: 65320 ;min time: 44
    loop_size:5 >>>> variance(cycles): 378; max_deviation: 19012 ;min time: 44
    loop_size:6 >>>> variance(cycles): 2512; max_deviation: 46956 ;min time: 44
    loop_size:7 >>>> variance(cycles): 14308; max_deviation: 109424 ;min time: 48
    loop_size:8 >>>> variance(cycles): 128449; max_deviation: 357728 ;min time: 48
    loop_size:9 >>>> variance(cycles): 1696; max_deviation: 40980 ;min time: 48
    loop_size:10 >>>> variance(cycles): 834; max_deviation: 22336 ;min time: 48
    loop_size:11 >>>> variance(cycles): 4143; max_deviation: 63780 ;min time: 48
    .........
    .........
    loop_size:994 >>>> variance(cycles): 914214; max_deviation: 668016 ;min time: 2028
    loop_size:995 >>>> variance(cycles): 1596810; max_deviation: 728892 ;min time: 2032
    loop_size:996 >>>> variance(cycles): 1775690; max_deviation: 866988 ;min time: 2032
    loop_size:997 >>>> variance(cycles): 2589904; max_deviation: 984516 ;min time: 2036
    loop_size:998 >>>> variance(cycles): 957907; max_deviation: 677884 ;min time: 2036
    loop_size:999 >>>> variance(cycles): 1254143; max_deviation: 748936 ;min time: 2040

    total number of spurious min values = 4
    total variance = 2631291
    absolute max deviation = 246593400
    variance of variances = 17487031211352
    variance of minimum values = 335929

  • Each loop size is executed 1000000 times, with most features that affect measurement turned on.
    Loading hello module...
    loop_size:0 >>>> variance(cycles): 2425; max_deviation: 49056 ;min time: 42
    loop_size:1 >>>> variance(cycles): 20; max_deviation: 3444 ;min time: 42
    loop_size:2 >>>> variance(cycles): 26; max_deviation: 2697 ;min time: 42
    loop_size:3 >>>> variance(cycles): 97; max_deviation: 4395 ;min time: 42
    loop_size:4 >>>> variance(cycles): 40; max_deviation: 2826 ;min time: 42
    loop_size:5 >>>> variance(cycles): 1437; max_deviation: 27309 ;min time: 42
    loop_size:6 >>>> variance(cycles): 30; max_deviation: 2802 ;min time: 42
    loop_size:7 >>>> variance(cycles): 6; max_deviation: 2541 ;min time: 42
    loop_size:8 >>>> variance(cycles): 13; max_deviation: 2433 ;min time: 45
    loop_size:9 >>>> variance(cycles): 60; max_deviation: 3594 ;min time: 42
    loop_size:10 >>>> variance(cycles): 35; max_deviation: 2661 ;min time: 45
    loop_size:11 >>>> variance(cycles): 31; max_deviation: 3534 ;min time: 45
    .........
    .........
    loop_size:994 >>>> variance(cycles): 32588; max_deviation: 46620 ;min time: 1935
    loop_size:995 >>>> variance(cycles): 11208; max_deviation: 22932 ;min time: 1935
    loop_size:996 >>>> variance(cycles): 9178; max_deviation: 15753 ;min time: 1938
    loop_size:997 >>>> variance(cycles): 11525; max_deviation: 55938 ;min time: 1938
    loop_size:998 >>>> variance(cycles): 62386; max_deviation: 229224 ;min time: 1941
    loop_size:999 >>>> variance(cycles): 7847; max_deviation: 6255 ;min time: 1944

    total number of spurious min values = 4
    total variance = 103398
    absolute max deviation = 1191852
    variance of variances = 132114077117
    variance of minimum values = 306145
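
The per-loop-size columns above can be reproduced from raw cycle samples with something like the sketch below. It assumes that variance is the population variance of the samples, that max_deviation is the largest sample minus the smallest, and that a minimum counts as "spurious" when it is lower than the minimum of the previous, smaller loop size. The post does not state these definitions, although the output is consistent with them (for example, loop_size 9 in the last run has a lower min time than loop_size 8). The names sample_stats, summarize and count_spurious_mins are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct sample_stats {
    uint64_t min;           /* smallest sample ("min time") */
    uint64_t max_deviation; /* largest sample minus smallest sample */
    double variance;        /* population variance of the samples */
};

/* Summarize n >= 1 cycle counts measured for one loop size. */
static struct sample_stats summarize(const uint64_t *samples, size_t n)
{
    struct sample_stats s = { samples[0], 0, 0.0 };
    uint64_t max = samples[0];
    double mean = 0.0, m2 = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (samples[i] < s.min)
            s.min = samples[i];
        if (samples[i] > max)
            max = samples[i];
        /* Welford's running update of the mean and squared deviations */
        double delta = (double)samples[i] - mean;
        mean += delta / (double)(i + 1);
        m2 += delta * ((double)samples[i] - mean);
    }
    s.max_deviation = max - s.min;
    s.variance = m2 / (double)n;
    return s;
}

/* Count loop sizes whose minimum is lower than the previous one's. */
static size_t count_spurious_mins(const uint64_t *mins, size_t n)
{
    size_t spurious = 0;
    for (size_t i = 1; i < n; i++)
        if (mins[i] < mins[i - 1])
            spurious++;
    return spurious;
}
```

A running update is used for the variance so that large cycle counts do not overflow an integer sum of squares; a plain two-pass computation would work just as well for sample sets of this size.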

Comments

  • The results for 1000 and 1000000 executions per loop size are mostly equal, which means a large number of repetitions isn't needed to get acceptable results. The results with every feature turned on are a little smaller than with them turned off, maybe because of multiple cores or turbo mode. I don't know whether there are problems using RDTSC on a multi-core processor, but the results look normal: the more instructions you run, the more time it takes. The measurements were made right after a reboot, so the results are better. My impression is that the total number of spurious min values was about 100 before the reboot; the good news is that the deviations were less than 20. It won't affect me, since I only need relative performance, not absolute numbers.
  •  0000000000000000 <measured_loop>:
       0: 31 c0                 xor    eax,eax
       2: 85 c9                 test   ecx,ecx
       4: 74 17                 je     1d <measured_loop+0x1d>
       6: 66 2e 0f 1f 84 00 00  nop    WORD PTR cs:[rax+rax*1+0x0]
       d: 00 00 00 
      10: 83 c0 01              add    eax,0x1
      13: c7 02 01 00 00 00     mov    DWORD PTR [rdx],0x1
      19: 39 c8                 cmp    eax,ecx
      1b: 75 f3                 jne    10 <measured_loop+0x10>
      1d: f3 c3                 repz ret 
      1f: 90                    nop
    
  • According to the assembly code, the instructions in the loop are add, mov, cmp and jne. The loop cost can be calculated from the output above: 4 cycles per 2 loop iterations after the 68th cycle with the features turned off, and 114 cycles per 65 loop iterations after the 112th cycle with them turned on. Both are very stable.
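
The per-iteration cost can be worked out from any two rows of the output as the slope between them. The helper below is a small sketch of that arithmetic; the name cycles_per_iteration is hypothetical, and it assumes the two loop sizes differ.

```c
#include <stdint.h>

/* Slope between two (loop_size, min time) rows, in cycles per iteration.
 * The two loop sizes must be different. */
static double cycles_per_iteration(unsigned size_a, uint64_t min_a,
                                   unsigned size_b, uint64_t min_b)
{
    return ((double)min_b - (double)min_a) /
           ((double)size_b - (double)size_a);
}
```

For example, with the rows for loop_size 997 (min time 2036) and loop_size 999 (min time 2040) from the runs with the features turned off, cycles_per_iteration(997, 2036, 999, 2040) gives 2 cycles per iteration, which matches the "4 cycles per 2 loops" figure. The figure of 114 cycles per 65 loops works out to roughly 1.75 cycles per iteration.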

Resources
