[osg-users] CPU Performance issues with AMD 2700 vs Intel Corei7 4770S

François Cami fcami at fedoraproject.org
Wed Apr 3 05:54:55 PDT 2019


On Wed, Apr 3, 2019 at 2:12 PM Robert Osfield <robert.osfield at gmail.com> wrote:
>
> Hi All,
>
> I've now installed 18.04 on my new AMD 2700 + GeForce 2060 system, run
> a range of further tests and learnt a few things along the way.
>
> First up I tried the open source graphics drivers that come with
> 18.04, and they do a really poor job of supporting the 2060: the
> screen resolution was pegged at 1024x768, and while the OSG compiled
> and ran just fine with my small city test model, I only got 39fps on
> it.  I couldn't work out how to get the Vulkan drivers working, so I
> didn't do any Vulkan tests.

Just to set expectations there:
* there is no reclocking support in Nouveau for Maxwell v2 and newer GPUs
* Turing support in Nouveau requires fairly recent builds (early 2019)
* there is no hardware acceleration, only llvmpipe (software rendering).

As all GTX 9xx and newer cards always run at their slowest clock under
Nouveau, your best bet with these is always the NVIDIA drivers.

> Second, I installed the NVIDIA drivers.  Ubuntu/Kubuntu now requires
> a few more steps than it did a few years back; it seems they have a
> strong preference for the open source drivers, but as performance
> and support for modern cards really sucks, I don't feel this is a
> great move.

On the AMD GPU side of things the open source stack is in much better shape.

> Once I installed the NVIDIA drivers, the frame rate for the small
> city scene with the standard path jumped to 368fps at 1920x1020, way
> more than an order of magnitude better, and my dual monitors now work
> fine too.
>
> While exploring the different display options in the GUI I came
> across the toggle for switching off the compositor.  This used to sit
> alongside the desktop effects settings, but has now moved to the
> display settings.  Switching off the compositor suddenly took the
> handbrake off, and my new system started pushing frame rates higher
> than my older Intel + GeForce 1060 system.  Curiously, the old system
> had the compositor switched on and didn't see the same capping of
> frame rates with the VSG/Vulkan.  I don't know why this is happening,
> as they now both have 18.04 installed; perhaps it's the hardware,
> perhaps it's the later NVIDIA drivers.  I'll look at upgrading the
> NVIDIA drivers on the old system next.  Switching off the compositor
> on the Intel system helps the max performance as well, but only by
> 25% rather than the 200% I saw on the new system.
>
> Now that I've switched off the compositor on the AMD 2700 + GeForce
> 2060 system I'm seeing more predictable results between the two
> systems and can see patterns emerge.
>
>                       Intel Core i7 4770S   AMD Ryzen 7 2700
>                       GeForce 1060          GeForce 2060
> OSG at 1920x1080      484fps                369fps  (28% slower)
> VSG at 1920x1080      2168fps               2697fps (23% faster)
> VSG at 192x108        2712fps               2842fps (4% faster)
>
> So here we finally see the GeForce 2060 stretch its legs and beat the
> 1060 thanks to its better fill rate.
>
> The OSG's poor performance on the AMD chip, though, more than
> overwhelms the GPU gains, so overall it is significantly slower.  For
> users who rely on OSG applications and are weighing up Intel vs AMD
> against investing in a new GPU, the choice of an Intel CPU is going to
> be the far more critical one.
>
> The results show that with the approaches I've used in the VSG for
> reducing node size and traversal complexity, along with Vulkan, there
> just isn't the same AMD penalty that we see with the OSG; instead we
> see the scaling we should expect from upgrading the graphics
> hardware.
>
> The difference between Intel and AMD isn't just down to OpenGL vs
> Vulkan.  While developing the VSG I wrote two test programs,
> osggroups and vsggroups, that both create a quad tree graph (11
> levels deep by default), traverse it 10 times and then destruct it.
> Here we can compare like for like on pure CPU scene graph operations.
>
>                       Intel Core i7 4770S   AMD Ryzen 7 2700
> osggroups             3.77 secs             4.91 secs (30% slower)
> vsggroups             0.55 secs             0.55 secs (almost identical!)
>
> The osggroups CPU test results mirror the speed difference in the
> osgviewer test with the small city model I've been using, which
> indicates that the differences in performance aren't just down to
> OpenGL vs Vulkan.
>
> The VSG results being nearly identical doesn't quite tell the full
> story.  I've run more VSG-related tests comparing double dispatch
> visitor & traversal against single dispatch visitor & traversal, and
> find that the Intel chip sees a bigger penalty from double dispatch
> than the AMD chip.  The AMD tests, though, show that destruction of
> the scene graph takes longer than on the Intel chip.  Things happen
> to balance out for the vsggroups test, but that's more fluke than
> anything important.  The key takeaway is that when you use the CPU
> more efficiently, as the VSG does compared to the OSG, the two chips
> perform in an overall similar way w.r.t. work per cycle.
>
> Are the efficiencies I've achieved with the VSG possible with the
> OSG?  Unfortunately not without breaking key features.  osg::Node
> objects are significantly bigger than their vsg::Node counterparts as
> OSG nodes hold more optional data.  The OSG traversal also checks
> more settings, like the NodeMask or the presence of the optional
> StateSet that can be stored with all Nodes.  The osg::NodeVisitor has
> more options that control its behavior, so it adds more work to the
> traversal through the scene graph.  All these extra checks and the
> extra memory usage cause more cache misses, more branch prediction
> failures and less instruction-level parallelism.  To illustrate the
> difference between OSG and VSG: when I run perf stat I see that the
> OSG's osggroups run achieves ~0.7 instructions per cycle while the
> VSG's vsggroups run achieves ~2.2 instructions per cycle.
>
> To close the gap we'd need to look at getting rid of the NodeMask on
> all Nodes and changing NodeVisitor to be less flexible, moving more
> responsibility onto subclasses.  Such changes would break a lot of
> end user applications.
>
> Cheers,
> Robert
> _______________________________________________
> osg-users mailing list
> osg-users at lists.openscenegraph.org
> http://lists.openscenegraph.org/listinfo.cgi/osg-users-openscenegraph.org

