[osg-users] Deleting still referenced object

Richard Harrison rjh at zaretto.com
Mon Jan 7 16:15:59 PST 2019


Using osg master; self built; application FlightGear; Win32 x64; MSVC 2015.

In FlightGear when OSG warning Warning: deleting still referenced object 
... the final reference count was" is detected in our NotifyHandler we 
throw an exception (probably the only reliably thing to do, as this 
message generally comes before some sort of segfault).

I've been trying to figure out what's the root cause of this for a 
months; I managed to reduce the incidence of these errors by changing 
ObjectCache::removeExpiredObjectsInCache to use < expiry time rather 
than <= expiry time, however this might be a specific instance of a more 
general problem; as since I did this I think I've figured out what is 
actually going on.

The more general problem is that the ObjectCache is removing an expired 
object however at the same time and inside the DatabasePager thread the 
object becomes active again; i.e. after the check for expiry time where 
_refCount == 1 passes and the deletion has started when at the same 
point a ref_ptr to the newly loaded object is created.

I think my analysis is right; I've had probably the longest debug 
session ever (since 31/12/2018) whilst I investigate (it took over 
40hours of flight before the problem happened).

------------------------
See https://i.imgur.com/DTPEpmh.png

So to unravel the two threads; firstly the stack dump of the main thread.
1.    fgfs.exe!NotifyLogger::notify(osg::NotifySeverity severity, const 
char * message) Line 26    C++
2.      osg160-osg.dll!osg::NotifyStreamBuffer::sync() Line 92 C++
3.      msvcp140.dll!00007ff8ed0927f2()    Unknown
4. osg160-osg.dll!std::endl<char,std::char_traits<char> 
 >(std::basic_ostream<char,std::char_traits<char> > & _Ostr) Line 994    C++
5.      osg160-osg.dll!osg::Referenced::~Referenced() Line 216 C++
6.      osgdb_ac.dll!osg::Group::`scalar deleting destructor'(unsigned 
int)    C++
7. osg160-osg.dll!osg::Referenced::signalObserversAndDelete(bool 
signalDelete, bool doDelete) Line 306    C++
8.      osg160-osg.dll!osg::Referenced::unref() Line 194    C++
9. 
osg160-osgDB.dll!std::_Tree<std::_Tmap_traits<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> 
 >,osg::ref_ptr<osgDB::Options const > 
 >,std::pair<osg::ref_ptr<osg::Object>,double>,osgDB::ObjectCache::ClassComp,std::allocator<std::pair<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,osg::ref_ptr<osgDB::Options const > > const ,std::pair<osg::ref_ptr<osg::Object>,double> > >,0> >::erase(std::_Tree_const_iterator<std::_Tree_val<std::_Tree_simple_types<std::pair<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,osg::ref_ptr<osgDB::Options const > > const ,std::pair<osg::ref_ptr<osg::Object>,double> > > > > _Where) Line 1431    C++
10. 
osg160-osgDB.dll!osgDB::ObjectCache::removeExpiredObjectsInCache(double 
expiryTime) Line 171    C++
11.      osg160-osgViewer.dll!osgViewer::Viewer::updateTraversal() Line 
1161    C++
12.      osg160-osgViewer.dll!osgViewer::ViewerBase::frame(double 
simulationTime) Line 748    C++
13.      fgfs.exe!fgOSMainLoop() Line 339    C++
14.      fgfs.exe!fgMainInit(int argc, char * * argv) Line 619 C++
15.      fgfs.exe!main(int argc, char * * argv) Line 339    C++
16.      fgfs.exe!__scrt_common_main_seh() Line 253    C++
17.      kernel32.dll!00007ff96a477e94()    Unknown
18.      ntdll.dll!00007ff96c81a251()    Unknown

The problem starts in (10) above, referring to the code; it is calling 
_objectCache.erase(oitr) which the reference count must be 1, following 
this through it is Referenced::unref that decrements the count, and 
finally this is checked in Referenced.cpp:213 (I've added extra code as 
part of the diagnostic process).


(10)
155. void ObjectCache::removeExpiredObjectsInCache(double expiryTime)
156. {
157.     OpenThreads::ScopedLock<OpenThreads::Mutex> 
lock(_objectCacheMutex);
158.
159.     // Remove expired entries from object cache
160.     ObjectCacheMap::iterator oitr = _objectCache.begin();
161.     while (oitr != _objectCache.end())
162.     {
163.         if (oitr->second.second <= expiryTime)
164.         {
165.             if (oitr->second.first->referenceCount() == 1) {
166. #if __cplusplus > 199711L
167.                 oitr = _objectCache.erase(oitr);
168. #else
169.                 _objectCache.erase(oitr++);
170. #endif
171.                 continue;
172.             }
173.             else
174.                 //RJHDO: consider whether or not _refCount>0 prior 
to removal
175.                 // first is FileNameOptionsPair
176.                 // second is ObjectTimeStampPair
177.                 OSG_WARN << 
"ObjectCache::removeExpiredObjectsInCache expired object has references: 
" << oitr->first.first << " refcount != 1 " << 
oitr->second.first->referenceCount() << std::endl;
178.
179.         }
180.         ++oitr;
181.     }
182. }
183.

(8)

169. inline int Referenced::unref() const
170. {
171.     int newRef;
172. #if defined(_OSG_REFERENCED_USE_ATOMIC_OPERATIONS)
173.     newRef = --_refCount;
174.     bool needDelete = (newRef == 0);
175. #else
176.     bool needDelete = false;
177.     if (_refMutex)
178.     {
179.         OpenThreads::ScopedLock<OpenThreads::Mutex> lock(*_refMutex);
180.         newRef = --_refCount;
181.         needDelete = newRef==0;
182.     }
183.     else
184.     {
185.         newRef = --_refCount;
186.         needDelete = newRef==0;
187.     }
188. #endif
189.
190.     if (needDelete)
191.     {
192.         signalObserversAndDelete(true,true);
193.     }
194.     return newRef;
195. }
196.


(5)

193. Referenced::~Referenced()
194. {
195. #ifdef DEBUG_OBJECT_ALLOCATION_DESTRUCTION
196.     {
197.         OpenThreads::ScopedLock<OpenThreads::Mutex> 
lock(getNumObjectMutex());
198.         --s_numObjects;
199.         printf("Object created, total num=%d\n",s_numObjects);
200.     }
201. #endif
202.
203.     if (_refCount > 0)
204.     {
205.         int retry_count = 100;
206.         while (_refCount > 0 && retry_count > 0)
207.         {
208.             OSG_WARN << "Wait for object references to be deleted 
(" << _refCount << ") " << this << " of type '" << typeid(this).name() 
<< "'" << std::endl;
209.             OpenThreads::Thread::microSleep(100000); // 100ms wait 
for other threads
210.             retry_count--;
211.
212.         }
213.         if (_refCount > 0)
214.         {
215.             OSG_WARN << "Warning: deleting still referenced object 
" << this << " of type '" << typeid(this).name() << "'" << std::endl;
216.             OSG_WARN << "         the final reference count was " 
<< _refCount << ", memory corruption possible." << std::endl;
217.         }
218.     }
219.
220.     // signal observers that we are being deleted.
221.     signalObserversAndDelete(true, false);
222.
223.     // delete the ObserverSet
224. #if defined(_OSG_REFERENCED_USE_ATOMIC_OPERATIONS)
225.     if (_observerSet.get()) 
static_cast<ObserverSet*>(_observerSet.get())->unref();
226. #else
227.     if (_observerSet) static_cast<ObserverSet*>(_observerSet)->unref();
228. #endif
229.
230. #if !defined(_OSG_REFERENCED_USE_ATOMIC_OPERATIONS)
231.     if (_refMutex) delete _refMutex;
232. #endif
233. }


---------------------------------

At the same time as the main thread (above) the DatabasePager thread is 
busily we have a tile being loaded.

After much analysis I think I've figured out what's happening and it's 
going to be tricky to fix; but as always finding out what's going wrong 
inside is the first step to fixing it.

Reference the stack trace from the pager thread (below).

There is a mutex to control access to the object cache. This locks the 
object cache for the duration of the getFromObjectCache method; however 
once this mutex lock was released by the DatabasePager thread it can 
then be taken almost immediately by a waiting 
removeExpiredObjectsInCache in the main thread.

As the mutex is only kept inside (2) in the stack trace 
(getFromObjectCache) once this method returns a valid pointer that 
pointer is then referenced in (4) and thus the reference count will be 
incremented at some point after object cache expiry check begins, ref 
ObjectCache::getFromObjectCache:99 and ModelRegistryCallback..hxx:79 
(where it calls _cachePolicy.find).

So the locking isn't sufficient to cover the case where an object that 
has expired and is about to be deleted then becomes active in the 
database pager thread

Stack trace of database pager thread.
1.      ot21-OpenThreads.dll!OpenThreads::Mutex::lock() Line 115    C++
2. osg160-osgDB.dll!osgDB::ObjectCache::getFromObjectCache(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
fileName, const osgDB::Options * options) Line 99 C++
3.      fgfs.exe!simgear::DefaultCachePolicy::find(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
fileName, const osgDB::Options * opt) Line 724    C++
4. 
fgfs.exe!simgear::ModelRegistryCallback<ACProcessPolicy,simgear::DefaultCachePolicy,ACOptimizePolicy,simgear::OSGSubstitutePolicy,simgear::BuildLeafBVHPolicy>::readNode(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
fileName, const osgDB::Options * opt) Line 79    C++
5.      fgfs.exe!simgear::ModelRegistry::readNode(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
fileName, const osgDB::Options * opt) Line 842    C++
6.      osg160-osgDB.dll!osgDB::Registry::readNode(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
fileName, const osgDB::Options * options, bool buildKdTreeIfRequired) 
Line 238    C++
7.      osg160-osgDB.dll!osgDB::readNodeFile(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
filename, const osgDB::Options * options) Line 70 C++
8.      fgfs.exe!sgLoad3DModel_internal(const SGPath & path, const 
osgDB::Options * dbOptions, SGPropertyNode * overlay) Line 338    C++
9.      fgfs.exe!simgear::SGReaderWriterXML::readNode(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
name, const osgDB::Options * options) Line 92    C++
10. 
fgfs.exe!simgear::ModelRegistryCallback<simgear::DefaultProcessPolicy,simgear::NoCachePolicy,simgear::NoOptimizePolicy,simgear::NoSubstitutePolicy,simgear::BuildLeafBVHPolicy>::loadUsingReaderWriter(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
fileName, const osgDB::Options * opt) Line 114    C++
11. 
fgfs.exe!simgear::ModelRegistryCallback<simgear::DefaultProcessPolicy,simgear::NoCachePolicy,simgear::NoOptimizePolicy,simgear::NoSubstitutePolicy,simgear::BuildLeafBVHPolicy>::readNode(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
fileName, const osgDB::Options * opt) Line 90    C++
12.      fgfs.exe!simgear::ModelRegistry::readNode(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
fileName, const osgDB::Options * opt) Line 842    C++
13.      osg160-osgDB.dll!osgDB::Registry::readNode(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
fileName, const osgDB::Options * options, bool buildKdTreeIfRequired) 
Line 238    C++
14.      osg160-osgDB.dll!osgDB::readRefNodeFile(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
filename, const osgDB::Options * options) Line 130 C++
15. 
fgfs.exe!simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::AddModelLOD::operator()(osg::LOD 
* leaf, simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic o) Line 
159    C++
16.      fgfs.exe!simgear::QuadTreeBuilder<osg::LOD * 
__ptr64,simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic,simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::MakeQuadLeaf,simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::AddModelLOD,simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::GetModelLODCoord>::addNode(simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic 
& obj) Line 96    C++
17.      fgfs.exe!simgear::QuadTreeBuilder<osg::LOD * 
__ptr64,simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic,simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::MakeQuadLeaf,simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::AddModelLOD,simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::GetModelLODCoord>::buildQuadTree<std::_List_iterator<std::_List_val<std::_List_simple_types<simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic> 
 > > >(const 
std::_List_iterator<std::_List_val<std::_List_simple_types<simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic> 
 > > & begin, const 
std::_List_iterator<std::_List_val<std::_List_simple_types<simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic> 
 > > & end) Line 118    C++
18. 
fgfs.exe!simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::readNode(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
__formal, const osgDB::Options * __formal) Line 203 C++
19.      osg160-osgDB.dll!osgDB::Registry::readNode(const 
std::basic_string<char,std::char_traits<char>,std::allocator<char> > & 
fileName, const osgDB::Options * options, bool buildKdTreeIfRequired) 
Line 237    C++
20. osg160-osgDB.dll!osgDB::DatabasePager::DatabaseThread::run() Line 
909    C++
21. 
ot21-OpenThreads.dll!OpenThreads::ThreadPrivateActions::StartThread(void 
* data) Line 112    C++

(3)

97. osg::Object* ObjectCache::getFromObjectCache(const std::string& 
fileName, const Options *options)
98. {
99.     OpenThreads::ScopedLock<OpenThreads::Mutex> lock(_objectCacheMutex);
100.     ObjectCacheMap::iterator itr = find(fileName, options);
101.     if (itr!=_objectCache.end())
102.     {
103.         osg::ref_ptr<const osgDB::Options> o = itr->first.second;
104.         if (o.valid())
105.         {
106.             OSG_DEBUG<<"Found "<<fileName<<" with options '"<< 
o->getOptionString()<< "' in ObjectCache "<<this<<std::endl;
107.         }
108.         else
109.         {
110.             OSG_DEBUG<<"Found "<<fileName<<" in ObjectCache 
"<<this<<std::endl;
111.         }
112.         return itr->second.first.get();
113.     }
114.     else return 0;
115. }

(4)

63. class ModelRegistryCallback : public osgDB::Registry::ReadFileCallback {
64. public:
65.     ModelRegistryCallback(const std::string& extension) :
66.         _processPolicy(extension), _cachePolicy(extension),
67.         _optimizePolicy(extension),
68.         _substitutePolicy(extension), _bvhPolicy(extension)
69.     {
70.     }
71.     virtual osgDB::ReaderWriter::ReadResult
72.     readNode(const std::string& fileName,
73.              const osgDB::Options* opt)
74.     {
75.         using namespace osg;
76.         using namespace osgDB;
77.         using osgDB::ReaderWriter;
78. //        Registry* registry = Registry::instance();
79.         ref_ptr<osg::Node> optimizedNode = 
_cachePolicy.find(fileName, opt);
80.         if (!optimizedNode.valid()) {


----------------------------------------------------

After much thinking I believe that a possibly fix is to use a 
ReadWriteMutex in the registry instance; and to readLock this in the 
DatabasePager and to tryWriteLock in 
ObjectCache::removeExpiredObjectsInCache (tryWriteLock is a new method 
that I added as I don't think we want to wait for the readers to finish 
inside the rendering loop). If the object cache can't get the write lock 
it'll do no expiry until it does get the write lock. This could possibly 
be never; but that remains to be seen.







More information about the osg-users mailing list