[osg-users] [vpb] vpbmaster appears to slow down after thousands of tasks completed

Knut Karlsen knut.karlsen at gmail.com
Wed Aug 31 02:35:57 PDT 2016


We have set up vpbmaster to use a machine pool to distribute the work of building a large terrain database. This works fairly well, and we get the expected results after about 40 hours of work. However, after a few thousand of the 32000 tasks have been completed, it appears that the master can't provide tasks fast enough to the workers. The number of running tasks drops from ~100 to 3-4 and stays there for a long time. We have ~100 processes configured across 6 machines (40-40-8-8-8-8), and when few tasks are running the other machines have almost no load.

The htop snapshot from the master shows the situation.

After some investigation it appears (and I'm just speculating) that the main vpbmaster process writes an enormous amount of data to the "terrainname.ive.0.added" file in the output folder.

This file gets an increasing number of task names written to it, at a rate of 600 MB in a few seconds.

The lines are like this:

It appears to add 600 MB worth of these lines every few seconds, which saturates the disk I/O and keeps one of the processes at 100%.

At some point this line is added:
PlanetSAT150m_Mexico15m_vpbmaster.ive.0.added: file truncated

The file is then reset to 0 MB and the process starts writing to it again.
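
To put numbers on the growth and the truncation described above, a small polling script could be run on the master. This is only a minimal sketch, assuming Python 3 is available there. The function names summarize and poll are made up for illustration and are not part of VirtualPlanetBuilder, and the file name is just the placeholder used earlier in this message.

```python
import os
import time


def summarize(samples):
    """Turn (timestamp, size_in_bytes) samples into per-interval reports.

    Each report is (mb_per_second, truncated). An interval in which the
    file shrank is flagged as truncated and its rate is reported as 0.0.
    """
    reports = []
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        if s1 < s0:
            reports.append((0.0, True))
        else:
            elapsed = t1 - t0
            rate = (s1 - s0) / elapsed / 1e6 if elapsed > 0 else 0.0
            reports.append((rate, False))
    return reports


def poll(path, interval=1.0, count=10):
    """Sample the size of `path` `count` times, `interval` seconds apart."""
    samples = []
    for _ in range(count):
        try:
            size = os.path.getsize(path)
        except OSError:
            size = 0  # treat a missing or unreadable file as empty
        samples.append((time.monotonic(), size))
        time.sleep(interval)
    return samples
```

Calling summarize(poll("terrainname.ive.0.added")) from the output folder would return one (MB/s, truncated) pair per interval, which makes it possible to check whether the slowdown in running tasks coincides with the bursts of writes.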

If I cancel the vpbmaster run and resubmit the tasks, it starts with normal efficiency, but after a few thousand tasks this behaviour starts again.

Has anyone seen this behaviour before?


Thank you!


