[osg-users] Deleting still referenced object
Richard Harrison
rjh at zaretto.com
Mon Jan 7 16:15:59 PST 2019
Using osg master; self built; application FlightGear; Win32 x64; MSVC 2015.
In FlightGear when OSG warning Warning: deleting still referenced object
... the final reference count was" is detected in our NotifyHandler we
throw an exception (probably the only reliably thing to do, as this
message generally comes before some sort of segfault).
I've been trying to figure out what's the root cause of this for a
months; I managed to reduce the incidence of these errors by changing
ObjectCache::removeExpiredObjectsInCache to use < expiry time rather
than <= expiry time, however this might be a specific instance of a more
general problem; as since I did this I think I've figured out what is
actually going on.
The more general problem is that the ObjectCache is removing an expired
object however at the same time and inside the DatabasePager thread the
object becomes active again; i.e. after the check for expiry time where
_refCount == 1 passes and the deletion has started when at the same
point a ref_ptr to the newly loaded object is created.
I think my analysis is right; I've had probably the longest debug
session ever (since 31/12/2018) whilst I investigate (it took over
40hours of flight before the problem happened).
------------------------
See https://i.imgur.com/DTPEpmh.png
So to unravel the two threads; firstly the stack dump of the main thread.
1. fgfs.exe!NotifyLogger::notify(osg::NotifySeverity severity, const
char * message) Line 26 C++
2. osg160-osg.dll!osg::NotifyStreamBuffer::sync() Line 92 C++
3. msvcp140.dll!00007ff8ed0927f2() Unknown
4. osg160-osg.dll!std::endl<char,std::char_traits<char>
>(std::basic_ostream<char,std::char_traits<char> > & _Ostr) Line 994 C++
5. osg160-osg.dll!osg::Referenced::~Referenced() Line 216 C++
6. osgdb_ac.dll!osg::Group::`scalar deleting destructor'(unsigned
int) C++
7. osg160-osg.dll!osg::Referenced::signalObserversAndDelete(bool
signalDelete, bool doDelete) Line 306 C++
8. osg160-osg.dll!osg::Referenced::unref() Line 194 C++
9.
osg160-osgDB.dll!std::_Tree<std::_Tmap_traits<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char>
>,osg::ref_ptr<osgDB::Options const >
>,std::pair<osg::ref_ptr<osg::Object>,double>,osgDB::ObjectCache::ClassComp,std::allocator<std::pair<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,osg::ref_ptr<osgDB::Options const > > const ,std::pair<osg::ref_ptr<osg::Object>,double> > >,0> >::erase(std::_Tree_const_iterator<std::_Tree_val<std::_Tree_simple_types<std::pair<std::pair<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,osg::ref_ptr<osgDB::Options const > > const ,std::pair<osg::ref_ptr<osg::Object>,double> > > > > _Where) Line 1431 C++
10.
osg160-osgDB.dll!osgDB::ObjectCache::removeExpiredObjectsInCache(double
expiryTime) Line 171 C++
11. osg160-osgViewer.dll!osgViewer::Viewer::updateTraversal() Line
1161 C++
12. osg160-osgViewer.dll!osgViewer::ViewerBase::frame(double
simulationTime) Line 748 C++
13. fgfs.exe!fgOSMainLoop() Line 339 C++
14. fgfs.exe!fgMainInit(int argc, char * * argv) Line 619 C++
15. fgfs.exe!main(int argc, char * * argv) Line 339 C++
16. fgfs.exe!__scrt_common_main_seh() Line 253 C++
17. kernel32.dll!00007ff96a477e94() Unknown
18. ntdll.dll!00007ff96c81a251() Unknown
The problem starts in (10) above, referring to the code; it is calling
_objectCache.erase(oitr) which the reference count must be 1, following
this through it is Referenced::unref that decrements the count, and
finally this is checked in Referenced.cpp:213 (I've added extra code as
part of the diagnostic process).
(10)
155. void ObjectCache::removeExpiredObjectsInCache(double expiryTime)
156. {
157. OpenThreads::ScopedLock<OpenThreads::Mutex>
lock(_objectCacheMutex);
158.
159. // Remove expired entries from object cache
160. ObjectCacheMap::iterator oitr = _objectCache.begin();
161. while (oitr != _objectCache.end())
162. {
163. if (oitr->second.second <= expiryTime)
164. {
165. if (oitr->second.first->referenceCount() == 1) {
166. #if __cplusplus > 199711L
167. oitr = _objectCache.erase(oitr);
168. #else
169. _objectCache.erase(oitr++);
170. #endif
171. continue;
172. }
173. else
174. //RJHDO: consider whether or not _refCount>0 prior
to removal
175. // first is FileNameOptionsPair
176. // second is ObjectTimeStampPair
177. OSG_WARN <<
"ObjectCache::removeExpiredObjectsInCache expired object has references:
" << oitr->first.first << " refcount != 1 " <<
oitr->second.first->referenceCount() << std::endl;
178.
179. }
180. ++oitr;
181. }
182. }
183.
(8)
169. inline int Referenced::unref() const
170. {
171. int newRef;
172. #if defined(_OSG_REFERENCED_USE_ATOMIC_OPERATIONS)
173. newRef = --_refCount;
174. bool needDelete = (newRef == 0);
175. #else
176. bool needDelete = false;
177. if (_refMutex)
178. {
179. OpenThreads::ScopedLock<OpenThreads::Mutex> lock(*_refMutex);
180. newRef = --_refCount;
181. needDelete = newRef==0;
182. }
183. else
184. {
185. newRef = --_refCount;
186. needDelete = newRef==0;
187. }
188. #endif
189.
190. if (needDelete)
191. {
192. signalObserversAndDelete(true,true);
193. }
194. return newRef;
195. }
196.
(5)
193. Referenced::~Referenced()
194. {
195. #ifdef DEBUG_OBJECT_ALLOCATION_DESTRUCTION
196. {
197. OpenThreads::ScopedLock<OpenThreads::Mutex>
lock(getNumObjectMutex());
198. --s_numObjects;
199. printf("Object created, total num=%d\n",s_numObjects);
200. }
201. #endif
202.
203. if (_refCount > 0)
204. {
205. int retry_count = 100;
206. while (_refCount > 0 && retry_count > 0)
207. {
208. OSG_WARN << "Wait for object references to be deleted
(" << _refCount << ") " << this << " of type '" << typeid(this).name()
<< "'" << std::endl;
209. OpenThreads::Thread::microSleep(100000); // 100ms wait
for other threads
210. retry_count--;
211.
212. }
213. if (_refCount > 0)
214. {
215. OSG_WARN << "Warning: deleting still referenced object
" << this << " of type '" << typeid(this).name() << "'" << std::endl;
216. OSG_WARN << " the final reference count was "
<< _refCount << ", memory corruption possible." << std::endl;
217. }
218. }
219.
220. // signal observers that we are being deleted.
221. signalObserversAndDelete(true, false);
222.
223. // delete the ObserverSet
224. #if defined(_OSG_REFERENCED_USE_ATOMIC_OPERATIONS)
225. if (_observerSet.get())
static_cast<ObserverSet*>(_observerSet.get())->unref();
226. #else
227. if (_observerSet) static_cast<ObserverSet*>(_observerSet)->unref();
228. #endif
229.
230. #if !defined(_OSG_REFERENCED_USE_ATOMIC_OPERATIONS)
231. if (_refMutex) delete _refMutex;
232. #endif
233. }
---------------------------------
At the same time as the main thread (above) the DatabasePager thread is
busily we have a tile being loaded.
After much analysis I think I've figured out what's happening and it's
going to be tricky to fix; but as always finding out what's going wrong
inside is the first step to fixing it.
Reference the stack trace from the pager thread (below).
There is a mutex to control access to the object cache. This locks the
object cache for the duration of the getFromObjectCache method; however
once this mutex lock was released by the DatabasePager thread it can
then be taken almost immediately by a waiting
removeExpiredObjectsInCache in the main thread.
As the mutex is only kept inside (2) in the stack trace
(getFromObjectCache) once this method returns a valid pointer that
pointer is then referenced in (4) and thus the reference count will be
incremented at some point after object cache expiry check begins, ref
ObjectCache::getFromObjectCache:99 and ModelRegistryCallback..hxx:79
(where it calls _cachePolicy.find).
So the locking isn't sufficient to cover the case where an object that
has expired and is about to be deleted then becomes active in the
database pager thread
Stack trace of database pager thread.
1. ot21-OpenThreads.dll!OpenThreads::Mutex::lock() Line 115 C++
2. osg160-osgDB.dll!osgDB::ObjectCache::getFromObjectCache(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
fileName, const osgDB::Options * options) Line 99 C++
3. fgfs.exe!simgear::DefaultCachePolicy::find(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
fileName, const osgDB::Options * opt) Line 724 C++
4.
fgfs.exe!simgear::ModelRegistryCallback<ACProcessPolicy,simgear::DefaultCachePolicy,ACOptimizePolicy,simgear::OSGSubstitutePolicy,simgear::BuildLeafBVHPolicy>::readNode(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
fileName, const osgDB::Options * opt) Line 79 C++
5. fgfs.exe!simgear::ModelRegistry::readNode(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
fileName, const osgDB::Options * opt) Line 842 C++
6. osg160-osgDB.dll!osgDB::Registry::readNode(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
fileName, const osgDB::Options * options, bool buildKdTreeIfRequired)
Line 238 C++
7. osg160-osgDB.dll!osgDB::readNodeFile(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
filename, const osgDB::Options * options) Line 70 C++
8. fgfs.exe!sgLoad3DModel_internal(const SGPath & path, const
osgDB::Options * dbOptions, SGPropertyNode * overlay) Line 338 C++
9. fgfs.exe!simgear::SGReaderWriterXML::readNode(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
name, const osgDB::Options * options) Line 92 C++
10.
fgfs.exe!simgear::ModelRegistryCallback<simgear::DefaultProcessPolicy,simgear::NoCachePolicy,simgear::NoOptimizePolicy,simgear::NoSubstitutePolicy,simgear::BuildLeafBVHPolicy>::loadUsingReaderWriter(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
fileName, const osgDB::Options * opt) Line 114 C++
11.
fgfs.exe!simgear::ModelRegistryCallback<simgear::DefaultProcessPolicy,simgear::NoCachePolicy,simgear::NoOptimizePolicy,simgear::NoSubstitutePolicy,simgear::BuildLeafBVHPolicy>::readNode(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
fileName, const osgDB::Options * opt) Line 90 C++
12. fgfs.exe!simgear::ModelRegistry::readNode(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
fileName, const osgDB::Options * opt) Line 842 C++
13. osg160-osgDB.dll!osgDB::Registry::readNode(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
fileName, const osgDB::Options * options, bool buildKdTreeIfRequired)
Line 238 C++
14. osg160-osgDB.dll!osgDB::readRefNodeFile(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
filename, const osgDB::Options * options) Line 130 C++
15.
fgfs.exe!simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::AddModelLOD::operator()(osg::LOD
* leaf, simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic o) Line
159 C++
16. fgfs.exe!simgear::QuadTreeBuilder<osg::LOD *
__ptr64,simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic,simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::MakeQuadLeaf,simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::AddModelLOD,simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::GetModelLODCoord>::addNode(simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic
& obj) Line 96 C++
17. fgfs.exe!simgear::QuadTreeBuilder<osg::LOD *
__ptr64,simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic,simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::MakeQuadLeaf,simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::AddModelLOD,simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::GetModelLODCoord>::buildQuadTree<std::_List_iterator<std::_List_val<std::_List_simple_types<simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic>
> > >(const
std::_List_iterator<std::_List_val<std::_List_simple_types<simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic>
> > & begin, const
std::_List_iterator<std::_List_val<std::_List_simple_types<simgear::ReaderWriterSTG::_ModelBin::_ObjectStatic>
> > & end) Line 118 C++
18.
fgfs.exe!simgear::ReaderWriterSTG::_ModelBin::DelayLoadReadFileCallback::readNode(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
__formal, const osgDB::Options * __formal) Line 203 C++
19. osg160-osgDB.dll!osgDB::Registry::readNode(const
std::basic_string<char,std::char_traits<char>,std::allocator<char> > &
fileName, const osgDB::Options * options, bool buildKdTreeIfRequired)
Line 237 C++
20. osg160-osgDB.dll!osgDB::DatabasePager::DatabaseThread::run() Line
909 C++
21.
ot21-OpenThreads.dll!OpenThreads::ThreadPrivateActions::StartThread(void
* data) Line 112 C++
(3)
97. osg::Object* ObjectCache::getFromObjectCache(const std::string&
fileName, const Options *options)
98. {
99. OpenThreads::ScopedLock<OpenThreads::Mutex> lock(_objectCacheMutex);
100. ObjectCacheMap::iterator itr = find(fileName, options);
101. if (itr!=_objectCache.end())
102. {
103. osg::ref_ptr<const osgDB::Options> o = itr->first.second;
104. if (o.valid())
105. {
106. OSG_DEBUG<<"Found "<<fileName<<" with options '"<<
o->getOptionString()<< "' in ObjectCache "<<this<<std::endl;
107. }
108. else
109. {
110. OSG_DEBUG<<"Found "<<fileName<<" in ObjectCache
"<<this<<std::endl;
111. }
112. return itr->second.first.get();
113. }
114. else return 0;
115. }
(4)
63. class ModelRegistryCallback : public osgDB::Registry::ReadFileCallback {
64. public:
65. ModelRegistryCallback(const std::string& extension) :
66. _processPolicy(extension), _cachePolicy(extension),
67. _optimizePolicy(extension),
68. _substitutePolicy(extension), _bvhPolicy(extension)
69. {
70. }
71. virtual osgDB::ReaderWriter::ReadResult
72. readNode(const std::string& fileName,
73. const osgDB::Options* opt)
74. {
75. using namespace osg;
76. using namespace osgDB;
77. using osgDB::ReaderWriter;
78. // Registry* registry = Registry::instance();
79. ref_ptr<osg::Node> optimizedNode =
_cachePolicy.find(fileName, opt);
80. if (!optimizedNode.valid()) {
----------------------------------------------------
After much thinking I believe that a possibly fix is to use a
ReadWriteMutex in the registry instance; and to readLock this in the
DatabasePager and to tryWriteLock in
ObjectCache::removeExpiredObjectsInCache (tryWriteLock is a new method
that I added as I don't think we want to wait for the readers to finish
inside the rendering loop). If the object cache can't get the write lock
it'll do no expiry until it does get the write lock. This could possibly
be never; but that remains to be seen.
More information about the osg-users
mailing list