[osg-users] CPU Performance issues with AMD 2700 vs Intel Corei7 4770S

Robert Osfield robert.osfield at gmail.com
Wed Apr 3 05:12:08 PDT 2019

Hi All,

I've now installed 18.04 on my new AMD2700+Geforce2060 system, ran a
further range of tests and learnt a few things along the way.

First up I tried out the open source graphics drivers that come with
18.04 and they do a really poor job of supporting the 2060: screen
resolution was pegged at 1024x768, and while the OSG compiled and ran
just fine, I only got 39fps on my small city test model.  I couldn't
work out how to get the Vulkan drivers working so didn't do any Vulkan
tests.

Second I installed the NVidia drivers.  Ubuntu/Kubuntu now requires a
few more steps than it used to a few years back; it seems like they
have a strong preference for the open source drivers, but as their
performance and support for modern cards really sucks I don't feel this
is a great move.  Once I installed the NVidia drivers the frame rate for
the small city scene and standard path jumped to 368fps at 1920x1020,
so nearly an order of magnitude better, and my dual monitors now work
fine.

While exploring the different options in the GUI for the displays I
came across the toggle for switching off the compositor.  This used to
be alongside the desktop effects settings GUI, but has now moved to the
display settings.  Switching off the compositor suddenly took the hand
brake off and my new system started pushing frame rates higher than my
older Intel+Geforce 1060 system.  Curiously the old system had the
compositor switched on and didn't see the same capping of frame rates
with the VSG/Vulkan.  I don't know why this is happening as they now
both have 18.04 installed; perhaps it's hardware, perhaps it's the
later NVidia drivers.  I'll look at upgrading the NVidia drivers on the
old system next.  Switching off the compositor on the Intel system
helps the max performance as well, but only by 25% rather than the 200%
I saw on the new system.

Now that I've switched off the compositor on the AMD2700+Geforce 2060
system I'm seeing more predictable results between the two systems and
patterns are starting to emerge.

                     Intel Core i7 4770S    AMD Ryzen7 2700
                     Geforce 1060           Geforce 2060
OSG at 1920x1080     484fps                 369fps  (28% slower)
VSG at 1920x1080     2168fps                2697fps (23% faster)
VSG at 192x108       2712fps                2842fps (4% faster)

So here we finally see the Geforce 2060 stretch its legs and beat the
1060 thanks to its better fill rate.

The OSG's slow performance on the AMD chip, though, more than
outweighs the faster GPU, as the overall result is significantly
slower.  For users who rely on OSG applications and are considering
whether to go for Intel vs AMD or to invest in a new GPU, choosing the
Intel CPU is going to be the far more critical change.

The results show that with the different approaches I've used in the
VSG for reducing node size and the complexity of traversal, along with
Vulkan, there just isn't the same AMD penalty that we see with the OSG.
Instead we see the scaling we should expect from upgrading the
graphics hardware.

The difference between Intel and AMD isn't just down to OpenGL vs
Vulkan.  In developing the VSG I wrote two test programs, osggroups
and vsggroups, that both create a quad tree graph (11 levels deep by
default), traverse it 10x and then destruct it.  Here we can see like
for like on pure CPU scene graph operations.

                     Intel Core i7 4770S    AMD Ryzen7 2700
osggroups            3.77 secs              4.91 secs (30% slower)
vsggroups            0.55 secs              0.55 secs - almost identical!
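
As a rough illustration of what a test of this shape does, here is a
minimal standalone C++ sketch.  It is not the osggroups or vsggroups
source and uses neither library; QuadNode, build_quad_tree, count_nodes,
GroupsTimings and run_groups_benchmark are hypothetical names, and
counting "levels" with the root as level 1 is an assumption.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>

// Hypothetical minimal node: four owned children and nothing else.
struct QuadNode
{
    std::array<std::unique_ptr<QuadNode>, 4> children;
};

// Build a complete quad tree; levels == 1 means a root with no children.
std::unique_ptr<QuadNode> build_quad_tree(unsigned levels)
{
    assert(levels >= 1);
    auto node = std::make_unique<QuadNode>();
    if (levels > 1)
    {
        for (auto& child : node->children) child = build_quad_tree(levels - 1);
    }
    return node;
}

// Depth-first traversal that simply counts the nodes it visits.
std::size_t count_nodes(const QuadNode& node)
{
    std::size_t count = 1;
    for (const auto& child : node.children)
    {
        if (child) count += count_nodes(*child);
    }
    return count;
}

struct GroupsTimings
{
    std::size_t nodes_visited = 0;
    double build_secs = 0.0;
    double traverse_secs = 0.0;
    double destroy_secs = 0.0;
};

// Time the three phases separately: create, traverse N times, destruct.
GroupsTimings run_groups_benchmark(unsigned levels, unsigned traversals)
{
    using Clock = std::chrono::steady_clock;
    auto secs = [](Clock::time_point a, Clock::time_point b) {
        return std::chrono::duration<double>(b - a).count();
    };

    GroupsTimings timings;
    auto t0 = Clock::now();
    auto root = build_quad_tree(levels);
    auto t1 = Clock::now();
    for (unsigned i = 0; i < traversals; ++i)
    {
        timings.nodes_visited += count_nodes(*root);
    }
    auto t2 = Clock::now();
    root.reset(); // destroys the whole tree recursively
    auto t3 = Clock::now();

    timings.build_secs = secs(t0, t1);
    timings.traverse_secs = secs(t1, t2);
    timings.destroy_secs = secs(t2, t3);
    return timings;
}
```

Calling run_groups_benchmark(11, 10) from a main() and printing the three
timings gives a per-phase breakdown, which is useful because creation,
traversal and destruction can each behave differently from one CPU to
the next.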

The results with the osggroups CPU test mirror the speed difference in
the osgviewer test with the small city model I've been using, so this
indicates that the differences in performance we see are not just down
to OpenGL vs Vulkan.

The vsggroups results being nearly identical doesn't quite tell the
full story.  I've run more VSG related tests comparing double dispatch
visitor & traversal against single dispatch visitor & traversal and
find that the Intel chip sees more penalty with double dispatch than
the AMD chip.  The AMD tests though show that the cost of destroying
the scene graph is higher than on the Intel chip.  Things tend to
balance out for the vsggroups test though, which is more fluke than
anything important.  The key takeaway is that when you use the CPUs
more efficiently, like the VSG does compared to the OSG, the two chips
perform in a similar way overall w.r.t. work per cycle.
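
For readers unfamiliar with the terms, the sketch below shows one
common reading of double vs single dispatch traversal in standalone
C++.  It is an assumption about what is being compared, not the VSG
test code: Node, Group, Leaf, Visitor, CountVisitor and make_quad_tree
are hypothetical names.  The double dispatch path costs two virtual
calls per node (accept, then apply); the single dispatch path costs
one (count).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Group;
struct Leaf;

// Visitor interface: one virtual apply() overload per concrete node type.
struct Visitor
{
    virtual ~Visitor() = default;
    virtual void apply(Group& group) = 0;
    virtual void apply(Leaf& leaf) = 0;
};

struct Node
{
    virtual ~Node() = default;
    // Double dispatch entry point: virtual call #1, which then makes
    // virtual call #2 on the visitor.
    virtual void accept(Visitor& visitor) = 0;
    // Single dispatch alternative: the node does the work itself.
    virtual std::size_t count() const = 0;
};

struct Leaf : Node
{
    void accept(Visitor& visitor) override { visitor.apply(*this); }
    std::size_t count() const override { return 1; }
};

struct Group : Node
{
    std::vector<std::unique_ptr<Node>> children;

    void accept(Visitor& visitor) override { visitor.apply(*this); }

    std::size_t count() const override
    {
        std::size_t total = 1;
        for (const auto& child : children) total += child->count();
        return total;
    }
};

// Counts nodes via the double dispatch path.
struct CountVisitor : Visitor
{
    std::size_t total = 0;

    void apply(Group& group) override
    {
        ++total;
        for (auto& child : group.children) child->accept(*this);
    }

    void apply(Leaf&) override { ++total; }
};

// Complete quad tree; the bottom level is made of Leaf nodes.
std::unique_ptr<Node> make_quad_tree(unsigned levels)
{
    assert(levels >= 1);
    if (levels == 1) return std::make_unique<Leaf>();
    auto group = std::make_unique<Group>();
    for (int i = 0; i < 4; ++i)
    {
        group->children.push_back(make_quad_tree(levels - 1));
    }
    return std::unique_ptr<Node>(std::move(group));
}
```

Both paths visit exactly the same nodes, so timing tree->accept(visitor)
against tree->count() on the same tree isolates the cost of the extra
indirect call per node.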

Are the efficiencies that I've achieved with the VSG possible with
the OSG?  Unfortunately not without breaking key features.
osg::Nodes are significantly bigger than their vsg::Node counterparts
as the OSG nodes hold more optional data.  The OSG traversal also
checks more settings, like the NodeMask or the presence of an optional
StateSet, that can be stored with all Nodes.  The osg::NodeVisitor has
more options that control its behavior, which adds more work to the
traversal through the scene graph.  All these extra checks and memory
usage cause more cache misses, more branch prediction failures and
less parallelism.  To illustrate the difference between OSG and VSG,
when I run perf stat I see that the OSG osggroups run achieves ~0.7
instructions per cycle while the VSG's vsggroups run achieves ~2.2
instructions per cycle.
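
The effect of per-node optional data and per-node checks can be
sketched in standalone C++.  The layouts below are hypothetical
(LeanNode, FatNode, FatStats and the helper functions are invented for
illustration) and are not the real osg::Node or vsg::Node definitions;
the point is only that the fat node is larger in memory and that its
traversal executes extra data-dependent branches on every visit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Lean node: children only.
struct LeanNode
{
    std::vector<std::unique_ptr<LeanNode>> children;
};

// Fat node: the same children plus optional per-node data.
struct FatNode
{
    std::uint32_t node_mask = 0xffffffffu; // stand-in for a node mask
    std::string name;                      // stand-in for optional metadata
    std::shared_ptr<int> state_set;        // stand-in for optional state
    std::vector<std::unique_ptr<FatNode>> children;
};

void add_lean_children(LeanNode& parent, std::size_t n)
{
    assert(parent.children.empty());
    for (std::size_t i = 0; i < n; ++i)
    {
        parent.children.push_back(std::make_unique<LeanNode>());
    }
}

void add_fat_children(FatNode& parent, std::size_t n)
{
    assert(parent.children.empty());
    for (std::size_t i = 0; i < n; ++i)
    {
        parent.children.push_back(std::make_unique<FatNode>());
    }
}

// Lean traversal: no per-node checks beyond iterating the children.
std::size_t count_lean(const LeanNode& node)
{
    std::size_t total = 1;
    for (const auto& child : node.children) total += count_lean(*child);
    return total;
}

struct FatStats
{
    std::size_t nodes = 0;
    std::size_t state_sets = 0;
};

// Fat traversal: a mask test and an optional-state test on every node.
void visit_fat(const FatNode& node, std::uint32_t traversal_mask, FatStats& stats)
{
    if ((node.node_mask & traversal_mask) == 0) return; // subgraph skipped
    if (node.state_set) ++stats.state_sets;             // optional state present
    ++stats.nodes;
    for (const auto& child : node.children)
    {
        visit_fat(*child, traversal_mask, stats);
    }
}
```

Comparing sizeof(FatNode) with sizeof(LeanNode) on a given platform
shows how many fewer nodes fit in each cache line, and running both
traversals under perf stat is one way to see how the extra branches
show up in the instructions per cycle figure.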

To close the gap we'd need to look at getting rid of the NodeMask on
all Nodes and changing NodeVisitor to be less flexible, moving more
responsibility onto the subclasses.  Such changes would break a lot of
end user applications.

