Here are the results of porting the STREAM memory-bandwidth benchmark to the Nallatech H101 hardware using the DIME-C compiler tools. The compilation process itself is described on FPGAsMaxwellDIMEC.

The timings for the hardware running at 100MHz are:

Function      Rate (MB/s)   Avg time (s) Min time (s) Max time (s)
Copy:        2948.7014       0.0055       0.0054       0.0055
Scale:       2954.2701       0.0055       0.0054       0.0056
Add:         4420.7964       0.0055       0.0054       0.0055
Triad:       4420.7297       0.0055       0.0054       0.0055

which reaches roughly 63% (Copy) to 83% (Add) of the rates of the software version running purely on the 2.8GHz Xeons:

Function      Rate (MB/s)   Avg time (s) Min time (s) Max time (s)
Copy:        4691.9527       0.0034       0.0034       0.0034
Scale:       4412.6311       0.0036       0.0036       0.0037
Add:         5321.5477       0.0045       0.0045       0.0045
Triad:       5388.3006       0.0045       0.0045       0.0045
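STREAM computes each rate from the bytes a kernel moves and its minimum time. Copy (c = a) and Scale (b = q*c) touch two arrays per element; Add (c = a + b) and Triad (a = b + q*c) touch three. In the formula below, N (array length) and w (bytes per element) are symbols introduced here; neither value is stated on this page.

```latex
\text{Rate}_k = \frac{n_k \, N \, w}{10^{6} \, t_{\min,k}} \ \text{MB/s},
\qquad n_{\text{Copy}} = n_{\text{Scale}} = 2, \quad n_{\text{Add}} = n_{\text{Triad}} = 3

% H101: t_min = 0.0054 s for all four kernels, so
\frac{\text{Rate}_{\text{Add}}}{\text{Rate}_{\text{Copy}}} = \frac{n_{\text{Add}}}{n_{\text{Copy}}} = \frac{3}{2},
\qquad \frac{4420.7964}{2948.7014} \approx 1.499
```

On the Xeons, Add and Triad take longer (0.0045 s) than Copy and Scale (0.0034 to 0.0036 s), so the Add/Copy ratio is only about 1.13. The equal H101 times suggest that on the FPGA the loop itself, rather than the number of arrays touched, sets the run time. This is an inference from the tables, not a separate measurement.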

For comparison, the Nallatech hardware results from Phase I, on mini, looked like this:

Function      Rate (MB/s)   Avg time (s) Min time (s) Max time (s)
Copy:        2319.4962       0.0069       0.0069       0.0069
Scale:         67.1885       0.2382       0.2381       0.2382
Add:           79.8295       0.3007       0.3006       0.3007
Triad:         62.8453       0.3819       0.3819       0.3820

Compared with Phase I, the Scale, Add and Triad rates have risen by factors of roughly 44 to 70, and the Copy rate by about 27%. Some of this increase is probably due to the H101 having four banks of SRAM: we were able to put each of the three STREAM arrays in its own memory bank, which increases bandwidth and has also simplified the inner loop, perhaps allowing DIME-C to pipeline it better.
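The bandwidth argument for separate banks can be illustrated with a deliberately simplified, hypothetical model. It assumes each SRAM bank services one read or write per clock cycle, so one loop iteration costs as many cycles as its busiest bank has accesses. This does not model the H101, the Phase I hardware on mini, or the code DIME-C generates. In particular, it predicts at most a 3x difference, far short of the Phase I gap, which fits the suggestion above that pipelining also played a part.

```python
from collections import Counter

# Arrays touched per element by each STREAM kernel (STREAM's kernel definitions).
KERNELS = {
    "Copy":  ["a", "c"],        # c[j] = a[j]
    "Scale": ["c", "b"],        # b[j] = q * c[j]
    "Add":   ["a", "b", "c"],   # c[j] = a[j] + b[j]
    "Triad": ["b", "c", "a"],   # a[j] = b[j] + q * c[j]
}

# Hypothetical bank layouts: bank index for each array.
one_bank = {"a": 0, "b": 0, "c": 0}    # all three arrays share one bank
own_banks = {"a": 0, "b": 1, "c": 2}   # one array per bank, as on the H101

def cycles_per_iteration(arrays, bank_of):
    """Toy model: cycles per element = accesses to the most heavily used bank,
    assuming (hypothetically) one access per bank per cycle."""
    load = Counter(bank_of[name] for name in arrays)
    return max(load.values())

if __name__ == "__main__":
    for kernel, arrays in KERNELS.items():
        print(kernel, cycles_per_iteration(arrays, one_bank),
              cycles_per_iteration(arrays, own_banks))
```

Run as a script, this prints two cycles per element for Copy and Scale and three for Add and Triad when the arrays share a bank, and one cycle for every kernel when each array has its own bank.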

STREAMSinDIME-C (last edited 2007-10-16 09:41:26 by RobBaxter)