Diffstat (limited to '')
-rw-r--r-- | gc/doc/debugging.html | 291 |
1 files changed, 0 insertions, 291 deletions
diff --git a/gc/doc/debugging.html b/gc/doc/debugging.html
deleted file mode 100644
index 04773fa..0000000
--- a/gc/doc/debugging.html
+++ /dev/null
@@ -1,291 +0,0 @@
-<HTML>
-<HEAD>
-<TITLE>Debugging Garbage Collector Related Problems</title>
-</head>
-<BODY>
-<H1>Debugging Garbage Collector Related Problems</h1>
-This page contains some hints on
-debugging issues specific to
-the Boehm-Demers-Weiser conservative garbage collector.
-It applies both to debugging issues in client code that manifest themselves
-as collector misbehavior, and to debugging the collector itself.
-<P>
-If you suspect a bug in the collector itself, it is strongly recommended
-that you try the latest collector release, even if it is labelled as "alpha",
-before proceeding.
-<H2>Bus Errors and Segmentation Violations</h2>
-<P>
-If the fault occurred in GC_find_limit, or with incremental collection enabled,
-this is probably normal. The collector installs handlers to take care of
-these. You will not see these unless you are using a debugger.
-Your debugger <I>should</i> allow you to continue.
-It's often preferable to tell the debugger to ignore SIGBUS and SIGSEGV
-("<TT>handle SIGSEGV SIGBUS nostop noprint</tt>" in gdb,
-"<TT>ignore SIGSEGV SIGBUS</tt>" in most versions of dbx)
-and set a breakpoint in <TT>abort</tt>.
-The collector will call abort if the signal had another cause,
-and there was no other handler previously installed.
-<P>
-We recommend debugging without incremental collection if possible.
-(This applies directly to UNIX systems.
-Debugging with incremental collection under win32 is worse. See README.win32.)
-<P>
-If the application generates an unhandled SIGSEGV or equivalent, it may
-often be easiest to set the environment variable GC_LOOP_ON_ABORT. On many
-platforms, this will cause the collector to loop in a handler when the
-SIGSEGV is encountered (or when the collector aborts for some other reason),
-and a debugger can then be attached to the looping
-process. This sidesteps common operating system problems related
-to incomplete core files for multithreaded applications, etc.
-<H2>Other Signals</h2>
-On most platforms, the multithreaded version of the collector needs one or
-two other signals for internal use by the collector in stopping threads.
-It is normally wise to tell the debugger to ignore these. On Linux,
-the collector currently uses SIGPWR and SIGXCPU by default.
-<H2>Warning Messages About Needing to Allocate Blacklisted Blocks</h2>
-The garbage collector generates warning messages of the form
-<PRE>
-Needed to allocate blacklisted block at 0x...
-</pre>
-when it needs to allocate a block at a location that it knows to be
-referenced by a false pointer. These false pointers can be either permanent
-(<I>e.g.</i> a static integer variable that never changes) or temporary.
-In the latter case, the warning is largely spurious, and the block will
-eventually be reclaimed normally.
-In the former case, the program will still run correctly, but the block
-will never be reclaimed. Unless the block is intended to be
-permanent, the warning indicates a memory leak.
-<OL>
-<LI>Ignore these warnings while you are using GC_DEBUG. Some of the routines
-mentioned below don't have debugging equivalents. (Alternatively, write
-the missing routines and send them to me.)
-<LI>Replace allocator calls that request large blocks with calls to
-<TT>GC_malloc_ignore_off_page</tt> or
-<TT>GC_malloc_atomic_ignore_off_page</tt>. You may want to set a
-breakpoint in <TT>GC_default_warn_proc</tt> to help you identify such calls.
-Make sure that a pointer to somewhere near the beginning of the resulting block
-is maintained in a (preferably volatile) variable as long as
-the block is needed.
-<LI>
-If the large blocks are allocated with realloc, we suggest instead allocating
-them with something like the following. Note that the realloc size increment
-should be fairly large (e.g. a factor of 3/2) for this to exhibit reasonable
-performance. But we all know we should do that anyway.
-<PRE>
-void * big_realloc(void *p, size_t new_size)
-{
-    size_t old_size = GC_size(p);
-    void * result;
-
-    if (new_size <= 10000) return(GC_realloc(p, new_size));
-    if (new_size <= old_size) return(p);
-    result = GC_malloc_ignore_off_page(new_size);
-    if (result == 0) return(0);
-    memcpy(result,p,old_size);
-    GC_free(p);
-    return(result);
-}
-</pre>
-
-<LI> In the unlikely case that even relatively small object
-(<20KB) allocations are triggering these warnings, then your address
-space contains lots of "bogus pointers", i.e. values that appear to
-be pointers but aren't. Usually this can be solved by using GC_malloc_atomic
-or the routines in gc_typed.h to allocate large pointer-free regions of bitmaps, etc. Sometimes the problem can be solved with trivial changes of encoding
-in certain values. It is possible to identify the source of the bogus
-pointers by building the collector with <TT>-DPRINT_BLACK_LIST</tt>,
-which will cause it to print the "bogus pointers", along with their location.
-
-<LI> If you get only a fixed number of these warnings, you are probably only
-introducing a bounded leak by ignoring them. If the data structures being
-allocated are intended to be permanent, then it is also safe to ignore them.
-The warnings can be turned off by calling GC_set_warn_proc with a procedure
-that ignores these warnings (e.g. by doing absolutely nothing).
-</ol>
-
-<H2>The Collector References a Bad Address in <TT>GC_malloc</tt></h2>
-
-This typically happens while the collector is trying to remove an entry from
-its free list, and the free list pointer is bad because the free list link
-in the last allocated object was bad.
-<P>
-With > 99% probability, you wrote past the end of an allocated object.
-Try setting <TT>GC_DEBUG</tt> before including <TT>gc.h</tt> and
-allocating with <TT>GC_MALLOC</tt>. This will try to detect such
-overwrite errors.
-
-<H2>Unexpectedly Large Heap</h2>
-
-Unexpected heap growth can be due to one of the following:
-<OL>
-<LI> Data structures that are being unintentionally retained. This
-is commonly caused by data structures that are no longer being used,
-but were not cleared, or by caches growing without bounds.
-<LI> Pointer misidentification. The garbage collector is interpreting
-integers or other data as pointers and retaining the "referenced"
-objects.
-<LI> Heap fragmentation. This should never result in unbounded growth,
-but it may account for larger heaps. This is most commonly caused
-by allocation of large objects. On some platforms it can be reduced
-by building with -DUSE_MUNMAP, which will cause the collector to unmap
-memory corresponding to pages that have not been recently used.
-<LI> Per object overhead. This is usually a relatively minor effect, but
-it may be worth considering. If the collector recognizes interior
-pointers, object sizes are increased, so that one-past-the-end pointers
-are correctly recognized. The collector can be configured not to do this
-(<TT>-DDONT_ADD_BYTE_AT_END</tt>).
-<P>
-The collector rounds up object sizes so the result fits well into the
-chunk size (<TT>HBLKSIZE</tt>, normally 4K on 32 bit machines, 8K
-on 64 bit machines) used by the collector. Thus it may be worth avoiding
-objects of size 2K + 1 (or 2K if a byte is being added at the end.)
-</ol>
-The last two cases can often be identified by looking at the output
-of a call to <TT>GC_dump()</tt>. Among other things, it will print the
-list of free heap blocks, and a very brief description of all chunks in
-the heap, the object sizes they correspond to, and how many live objects
-were found in the chunk at the last collection.
-<P>
-Growing data structures can usually be identified by
-<OL>
-<LI> Building the collector with <TT>-DKEEP_BACK_PTRS</tt>,
-<LI> Preferably using debugging allocation (defining <TT>GC_DEBUG</tt>
-before including <TT>gc.h</tt> and allocating with <TT>GC_MALLOC</tt>),
-so that objects will be identified by their allocation site,
-<LI> Running the application long enough so
-that most of the heap is composed of "leaked" memory, and
-<LI> Then calling <TT>GC_generate_random_backtrace()</tt> from backptr.h
-a few times to determine why some randomly sampled objects in the heap are
-being retained.
-</ol>
-<P>
-The same technique can often be used to identify problems with false
-pointers, by noting whether the reference chains printed by
-<TT>GC_generate_random_backtrace()</tt> involve any misidentified pointers.
-An alternate technique is to build the collector with
-<TT>-DPRINT_BLACK_LIST</tt> which will cause it to report values that
-almost, but not quite, look like heap pointers. It is very likely that
-actual false pointers will come from similar sources.
-<P>
-In the unlikely case that false pointers are an issue, it can usually
-be resolved using one or more of the following techniques:
-<OL>
-<LI> Use <TT>GC_malloc_atomic</tt> for objects containing no pointers.
-This is especially important for large arrays containing compressed data,
-pseudo-random numbers, and the like. It is also likely to improve GC
-performance, perhaps drastically so if the application is paging.
-<LI> If you allocate large objects containing only
-one or two pointers at the beginning, either try the typed allocation
-primitives in <TT>gc_typed.h</tt>, or separate out the pointer-free component.
-<LI> Consider using <TT>GC_malloc_ignore_off_page()</tt>
-to allocate large objects. (See <TT>gc.h</tt> and above for details.
-Large means > 100K in most environments.)
-</ol>
-<H2>Prematurely Reclaimed Objects</h2>
-The usual symptom of this is a segmentation fault, or an obviously overwritten
-value in a heap object. This should, of course, be impossible. In practice,
-it may happen for reasons like the following:
-<OL>
-<LI> The collector did not intercept the creation of threads correctly in
-a multithreaded application, <I>e.g.</i> because the client called
-<TT>pthread_create</tt> without including <TT>gc.h</tt>, which redefines it.
-<LI> The last pointer to an object in the garbage collected heap was stored
-somewhere where the collector couldn't see it, <I>e.g.</i> in an
-object allocated with system <TT>malloc</tt>, in certain types of
-<TT>mmap</tt>ed files,
-or in some data structure visible only to the OS. (On some platforms,
-thread-local storage is one of these.)
-<LI> The last pointer to an object was somehow disguised, <I>e.g.</i> by
-XORing it with another pointer.
-<LI> Incorrect use of <TT>GC_malloc_atomic</tt> or typed allocation.
-<LI> An incorrect <TT>GC_free</tt> call.
-<LI> The client program overwrote an internal garbage collector data structure.
-<LI> A garbage collector bug.
-<LI> (Empirically less likely than any of the above.) A compiler optimization
-that disguised the last pointer.
-</ol>
-The following relatively simple techniques should be tried first to narrow
-down the problem:
-<OL>
-<LI> If you are using the incremental collector try turning it off for
-debugging.
-<LI> If you are using shared libraries, try linking statically. If that works,
-ensure that DYNAMIC_LOADING is defined on your platform.
-<LI> Try to reproduce the problem with fully debuggable unoptimized code.
-This will eliminate the last possibility, as well as making debugging easier.
-<LI> Try replacing any suspect typed allocation and <TT>GC_malloc_atomic</tt>
-calls with calls to <TT>GC_malloc</tt>.
-<LI> Try removing any GC_free calls (<I>e.g.</i> with a suitable
-<TT>#define</tt>).
-<LI> Rebuild the collector with <TT>-DGC_ASSERTIONS</tt>.
-<LI> If the following works on your platform (i.e. if gctest still works
-if you do this), try building the collector with
-<TT>-DREDIRECT_MALLOC=GC_malloc_uncollectable</tt>. This will cause
-the collector to scan memory allocated with malloc.
-</ol>
-If all else fails, you will have to attack this with a debugger.
-Suggested steps:
-<OL>
-<LI> Call <TT>GC_dump()</tt> from the debugger around the time of the failure. Verify
-that the collector's idea of the root set (i.e. static data regions which
-it should scan for pointers) looks plausible. If not, i.e. if it doesn't
-include some static variables, report this as
-a collector bug. Be sure to describe your platform precisely, since this sort
-of problem is nearly always very platform dependent.
-<LI> Especially if the failure is not deterministic, try to isolate it to
-a relatively small test case.
-<LI> Set a breakpoint in <TT>GC_finish_collection</tt>. This is a good
-point to examine what has been marked, i.e. found reachable, by the
-collector.
-<LI> If the failure is deterministic, run the process
-up to the last collection before the failure.
-Note that the variable <TT>GC_gc_no</tt> counts collections and can be used
-to set a conditional breakpoint in the right one. It is incremented just
-before the call to GC_finish_collection.
-If object <TT>p</tt> was prematurely recycled, it may be helpful to
-look at <TT>*GC_find_header(p)</tt> at the failure point.
-The <TT>hb_last_reclaimed</tt> field will identify the collection number
-during which its block was last swept.
-<LI> Verify that the offending object still has its correct contents at
-this point.
-Then call <TT>GC_is_marked(p)</tt> from the debugger to verify that the
-object has not been marked, and is about to be reclaimed.
-<LI> Determine a path from a root, i.e. static variable, stack, or
-register variable,
-to the reclaimed object. Call <TT>GC_is_marked(q)</tt> for each object
-<TT>q</tt> along the path, trying to locate the first unmarked object, say
-<TT>r</tt>.
-<LI> If <TT>r</tt> is pointed to by a static root,
-verify that the location
-pointing to it is part of the root set printed by <TT>GC_dump()</tt>. If it
-is on the stack in the main (or only) thread, verify that
-<TT>GC_stackbottom</tt> is set correctly to the base of the stack. If it is
-in another thread stack, check the collector's thread data structure
-(<TT>GC_thread[]</tt> on several platforms) to make sure that stack bounds
-are set correctly.
-<LI> If <TT>r</tt> is pointed to by heap object <TT>s</tt>, check that the
-collector's layout description for <TT>s</tt> is such that the pointer field
-will be scanned. Call <TT>*GC_find_header(s)</tt> to look at the descriptor
-for the heap chunk. The <TT>hb_descr</tt> field specifies the layout
-of objects in that chunk. See gc_mark.h for the meaning of the descriptor.
-(If its low-order 2 bits are zero, then it is just the length of the
-object prefix to be scanned. This form is always used for objects allocated
-with <TT>GC_malloc</tt> or <TT>GC_malloc_atomic</tt>.)
-<LI> If the failure is not deterministic, you may still be able to apply some
-of the above techniques at the point of failure. But remember that objects
-allocated since the last collection will not have been marked, even if the
-collector is functioning properly. On some platforms, the collector
-can be configured to save call chains in objects for debugging.
-Enabling this feature will also cause it to save the call stack at the
-point of the last GC in GC_arrays._last_stack.
-<LI> When looking at GC internal data structures remember that a number
-of <TT>GC_</tt><I>xxx</i> variables are really macro defined to
-<TT>GC_arrays._</tt><I>xxx</i>, so that
-the collector can avoid scanning them.
-</ol>
-</body>
-</html>
-
-
-
-
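
The "Prematurely Reclaimed Objects" section of the removed page lists a disguised last pointer (for example, one XORed with another value) as a cause of premature reclamation. The C sketch below is not collector code and is not part of the deleted file. It is a minimal illustration, assuming a simplified range test in place of the collector's real pointer identification, of why a disguised value no longer looks like a reference to the block. The names `looks_like_pointer_into`, `top_bit_mask`, `disguise`, and `reveal` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a conservative scan test: does `word`, read as
   an address, fall inside the block [base, base + len)?  The real
   collector's checks are more involved; this only illustrates the idea. */
static int looks_like_pointer_into(uintptr_t word, const void *base, size_t len)
{
    uintptr_t start = (uintptr_t)base;
    return word >= start && word - start < len;
}

/* A mask that flips only the top address bit, so the disguised value lands
   far outside any small block that contains the original address. */
static uintptr_t top_bit_mask(void)
{
    return (uintptr_t)1 << (sizeof(uintptr_t) * CHAR_BIT - 1);
}

/* Disguise a pointer by XORing it with a nonzero mask. */
static uintptr_t disguise(const void *p, uintptr_t mask)
{
    assert(mask != 0);  /* a zero mask would leave the pointer visible */
    return (uintptr_t)p ^ mask;
}

/* Undo the disguise; XOR with the same mask restores the address. */
static void *reveal(uintptr_t disguised, uintptr_t mask)
{
    return (void *)(disguised ^ mask);
}
```

If the disguised word is the only remaining reference, a scan based on a test like this finds nothing pointing into the block, even though the program can still recover the address with `reveal`. That matches the symptom the page describes: the object is reclaimed while the program still intends to use it.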