Flow Simulation Tech Tip: Hardware Benchmarks

This past month I started a transition to new hardware, which gave me a chance to compare four different machines side by side. Because I am commonly asked what hardware is best for flow simulation, I thought I would try a few benchmark experiments to see how different combinations of memory, CPU and disk speed affect solve times. I also wanted to get a feel for scaling by running with different numbers of cores.

Hardware: 

Taking a quick look at the hardware, there are two i7 based laptops. The older machine (m6600) has an SSD drive and twice the memory (32Gb). There are also two Xeon based workstations, the older having only 12Gb of memory, both using 7200 rpm drives.

I started out by benchmarking the machines to see the strengths and weaknesses of each. I used a code called "Passmark" that I found here.

Looking at how the basic hardware setups stack up against one another, the SSD on the older m6600 laptop actually gave it a greater overall performance score than the new m6800 with a standard 7200rpm drive.

Benchmark Model: Because some of the machines are low on memory, I thought it would be good to test them with three differently sized models, since the memory needs of the larger models will surely "choke" the older xeon and the newer i7 laptop (12 and 16Gb, respectively).

The basic physics of the model include:

- Internal Flow with Natural Convection, Conduction and Radiation. Essentially a hot oven being allowed to cool.

 

I'm using the same model at three different mesh resolutions. I monitored overall system memory during the solve to get a feel for model size/memory needed.

If you want to do performance monitoring and logging, I ran across a useful link during this exercise showing how to use Windows Performance Monitor.
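As a rough illustration of what can be done with such a log, here is a minimal Python sketch that reads a Performance Monitor CSV export and reports the peak memory in use during a solve. It assumes the log contains the "Memory\Available MBytes" counter; the machine name, timestamps and values in SAMPLE_LOG are made up for illustration, and the function name is hypothetical. It is not the procedure used for the numbers in this article.

```python
import csv
import io

# Hypothetical sample of a Performance Monitor CSV export. The machine name,
# timestamps and values are invented for illustration only.
SAMPLE_LOG = r'''"(PDH-CSV 4.0) (Eastern Standard Time)(300)","\\WORKSTATION\Memory\Available MBytes"
"03/01/2014 10:00:00.000","27500"
"03/01/2014 10:00:15.000","21250"
"03/01/2014 10:00:30.000","12800"
"03/01/2014 10:00:45.000","13100"
'''


def peak_memory_used_mb(log_file, total_mb,
                        counter_suffix=r"Memory\Available MBytes"):
    """Return peak memory in use (MB), given total installed memory (MB).

    Peak use is taken as total memory minus the lowest "available" sample.
    """
    reader = csv.reader(log_file)
    header = next(reader)
    # Find the column whose counter path ends with the requested counter.
    col = next(i for i, name in enumerate(header)
               if name.endswith(counter_suffix))

    lowest_available = None
    for row in reader:
        try:
            value = float(row[col])
        except (ValueError, IndexError):
            continue  # skip blank or malformed samples
        if lowest_available is None or value < lowest_available:
            lowest_available = value

    if lowest_available is None:
        raise ValueError("no usable samples found in the log")
    return total_mb - lowest_available


# Example: a machine with 32Gb (32 * 1024 MB) of installed memory.
peak_mb = peak_memory_used_mb(io.StringIO(SAMPLE_LOG), total_mb=32 * 1024)
print(f"Peak memory in use: {peak_mb / 1024:.1f} GB")
```

To use it on a real log, replace the io.StringIO(SAMPLE_LOG) argument with an open file handle for the exported CSV.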

Looking at time/iteration and memory needs versus model size, both relationships are pretty linear:

Benchmark Results Largest Model with and without Hyperthreading: For all of our plots I'm using the average time (s)/iteration over the first 58 iterations. The champ in this case is the 6 core xeon with 32Gb of RAM and a 3.5 GHz clock speed. The runner-up is the older i7 laptop, whose extra memory helps it best the newer i7 laptop even though the new machine has the higher clock speed (2.69 vs. 2.3 GHz).

Below is the same plot with the older dual quad core xeon removed since it was hindered by a lack of memory.

Observations:

  • The 6 core xeon with hyperthreading still showed a slight performance increase when solving with 12 cores versus 8.
  • The m6600 i7 showed only a slight performance increase when solving with 8 cores hyperthreaded versus 4 with hyperthreading off.
  • With 32Gb of RAM, I suspect the newer m6800 would have been faster even without the SSD.

Benchmark Results Medium Model No Hyperthreading: When running the medium-sized model, none of the machines should be slowed by memory limitations, so it's a better comparison of scaling and clock speed.

Observations:

  • Even with 8 cores, the older dual quad xeon, with its low clock speed and slower disk, was consistently the slowest.
  • With this smaller model, the newer i7 laptop can take advantage of its faster clock speed, and beyond 2 cores it is faster than the older i7.
  • The 6 core xeon, with the highest clock speed, is consistently the fastest.

Taking a look at scaling (x times faster than 1 CPU alone), we see that the 6 core xeon machine is still getting decent benefits when going from 4 cores to 6 cores: from ~2.4x to ~3.1x faster than a single core alone.
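For anyone who wants to reproduce these scaling numbers from their own solver timings, here is a minimal Python sketch of the arithmetic. The function names are my own, and the parallel efficiency figure (speedup divided by cores) is an extra metric not plotted in this article; only the 2.4x and 3.1x speedups come from the results above.

```python
def average_time_per_iteration(iteration_times, first_n=58):
    """Average time (s) per iteration over the first `first_n` iterations."""
    sample = iteration_times[:first_n]
    if not sample:
        raise ValueError("no iteration times supplied")
    return sum(sample) / len(sample)


def speedup(time_single_core, time_multi_core):
    """How many times faster the multi-core run is than the 1-core run."""
    return time_single_core / time_multi_core


def parallel_efficiency(speedup_value, cores):
    """Fraction of ideal (linear) scaling achieved; 1.0 means perfect."""
    return speedup_value / cores


# Speedups quoted above for the 6 core xeon on the medium model.
for cores, s in [(4, 2.4), (6, 3.1)]:
    eff = parallel_efficiency(s, cores)
    print(f"{cores} cores: {s:.1f}x faster, {eff:.0%} of ideal scaling")
```

By this measure the two extra cores still help, but each added core contributes less than the one before it, which is typical of this kind of solver scaling.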

Benchmark Results Small Model: Lastly, a smaller model was run to compare machines and scaling with hyperthreading enabled. We are looking at time/iteration versus CPUs enabled during solve with hyperthreading. We will break them up by laptops with i7 processors and the workstations with the xeon processors.

Observations Small Model and Hyperthreading:

  • Speed improvement beyond the number of "actual" cores is minimal.
  • For example, 6 cores on the 6 core xeon is not appreciably slower than 12 cores.

General Comments: The clearest observation I can make from this exercise is that your machine needs to be balanced. There's no sense in having many cores and a fast clock speed if you are choking your system by not having enough memory or fast enough access to the memory. We saw that with only 12Gb of memory, the dual quad core xeon is orders of magnitude slower than all the other machines in the group.

To get the biggest bang for the buck and based on some of the data here, I would spec out a dream machine as follows.

1. With Flow, you are not charged per CPU and can run up to two jobs at a time. I would start by ensuring that I have enough memory to run 2 large jobs at a time. With 3.7M cells, we were using about 20Gb of memory; assuming I might run two slightly larger jobs, I would go with 64Gb of memory.

2. The scaling on the 6 core machine was still good at 6 cores, so I would want a dual 6 core machine to run both jobs at once.

3. To ensure writing to disk does not impede speed, I would recommend an SSD drive.
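The memory estimate in item 1 can be sketched in a few lines of Python. This assumes memory use is proportional to cell count, scaled from the single data point above (3.7M cells using about 20Gb); the 5M cell job size, the 4Gb of headroom for the operating system and the list of standard RAM sizes are hypothetical values chosen for illustration.

```python
# Data point from the benchmarks above: ~20Gb of memory for 3.7M cells.
GB_PER_MILLION_CELLS = 20.0 / 3.7


def estimate_job_memory_gb(cells_millions):
    """Estimate memory for one job, assuming memory scales with cell count."""
    return cells_millions * GB_PER_MILLION_CELLS


def recommend_ram_gb(job_sizes_millions, headroom_gb=4.0,
                     standard_sizes=(16, 32, 64, 128, 256)):
    """Smallest standard RAM size that fits all jobs plus some headroom.

    headroom_gb and standard_sizes are illustrative assumptions.
    """
    needed = sum(estimate_job_memory_gb(c) for c in job_sizes_millions)
    needed += headroom_gb
    for size in standard_sizes:
        if size >= needed:
            return size
    raise ValueError(f"need {needed:.0f} GB, more than any listed size")


# Two simultaneous jobs, each slightly larger than the 3.7M cell benchmark
# (5M cells is a hypothetical size).
jobs = [5.0, 5.0]
per_job = estimate_job_memory_gb(jobs[0])
print(f"Each job: ~{per_job:.0f} GB, recommended RAM: {recommend_ram_gb(jobs)} GB")
```

With those assumptions, two jobs need roughly 54Gb plus headroom, which lands on the same 64Gb figure.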

Based on those estimates, if I were building a new machine I would choose one of these basic configurations, in order of preference:

Gold: Dual 6 core xeons, 64Gb of RAM and an SSD drive.

Silver: Dual 4 core xeons, 64Gb of RAM and an SSD drive.

Bronze/Mobile: i7, 64Gb of RAM and an SSD drive.

Note that I'm not specifying any single 8 or 12 core xeons. This is partly due to cost, and partly because I'm speculating that cache performance would be better with two CPUs, each with its own cache, than with a single CPU using one cache. If you are a hardware guru, feel free to set me straight!

Always be sure the graphics card you are selecting is certified. It's not going to be crucial for solving the problems; however, with large models it will help in reviewing results. Note that all the machines still have a minimum of 64Gb, since RAM is relatively cheap these days and really contributes to computing quality of life.

In a nutshell remember:

RAM = size of the model

CPU/cores = speed of solution

Lastly, all of the configurations include an SSD, since that will also prevent any read/write bottleneck with the faster CPUs.

This should give some general guidance along with some real world numbers on what affects your flow solve times.


About the Author:

David Roccaforte earned a BS and MS in Mechanical Engineering from the University of Michigan-Dearborn. He has been working with Computer-Aided Engineering (CAE) tools since the mid-1990s, when he was an engineering co-op and later a product engineer with Automotive System Laboratory. Seeing the value that CAE brings to the engineering process inspired him to concentrate on CAE during his graduate studies. While finishing his graduate studies, he worked for Mechanical Dynamics as an engineering intern running vehicle dynamics analysis.

After finishing his graduate studies, David worked in the automotive industry as an Engineering Analyst with Karmann Technical Development, supporting the design of convertible roof systems for North American OEMs. From there Roccaforte joined MSC Software, one of the top companies in engineering simulation, where he worked as a Senior Application Engineer until he joined Fisher Unitech in 2010.