
Powerful OS/400 cleanup mechanisms can lead to application deadlock (cancel_handler and C++ automatic destructors)

The AS/400 provides a set of very powerful and robust cleanup mechanisms. In OS/400, an application can register a cancel handler. An application written in C or C++ can enable a cancel handler with the #pragma cancel_handler preprocessor directive; alternatively, an application can register one with the CEERTX() API.

A cancel handler is somewhat similar to a pthread cancellation cleanup handler. However, a cancel handler runs whenever the stack frame (function) that it was registered for ends in any way other than a normal return. Pthread cancellation cleanup handlers run only when the thread is terminated using pthread_exit(), pthread_cancel(), or a return from the thread's start routine.
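For comparison, the following is a minimal portable sketch of the POSIX side of this contrast; it does not use the OS/400 cancel handler interface, whose #pragma cancel_handler syntax is ILE C specific and is not shown here. The names cleanup(), worker(), and run_worker() are hypothetical. The thread registers a pthread cleanup handler with pthread_cleanup_push(); the handler runs when the thread ends with pthread_exit(). POSIX requires each pthread_cleanup_push() to be paired with a pthread_cleanup_pop() in the same lexical scope, so on the normal path the handler is removed with pthread_cleanup_pop(0), which does not run it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Number of times cleanup() has run; reset by run_worker(). */
static int cleanup_runs = 0;

/* Hypothetical pthread cancellation cleanup handler. */
static void cleanup(void *arg)
{
    (void)arg;
    cleanup_runs++;
}

/* Hypothetical thread start routine: registers cleanup(), then either
   ends the thread with pthread_exit() or finishes its work normally. */
static void *worker(void *arg)
{
    int use_exit = *(int *)arg;

    pthread_cleanup_push(cleanup, NULL);
    if (use_exit)
        pthread_exit(NULL);     /* pops and runs cleanup() */
    pthread_cleanup_pop(0);     /* normal path: remove without running */
    return NULL;
}

/* Runs worker() in a new thread and reports how many times cleanup()
   ran, or -1 if the thread could not be created. */
static int run_worker(int use_exit)
{
    pthread_t tid;

    cleanup_runs = 0;
    if (pthread_create(&tid, NULL, worker, &use_exit) != 0)
        return -1;
    pthread_join(tid, NULL);
    return cleanup_runs;
}
```

Calling run_worker(1) should report that cleanup() ran once, while run_worker(0) should report zero runs. An OS/400 cancel handler, by contrast, is tied to the stack frame rather than to the thread, and would run for any non-return ending of the function that registered it.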

Also, the cancel handler is guaranteed to run for every condition that causes the stack frame to end (other than a return), such as thread termination, job termination, calls to exit() or abort(), and exceptions that percolate up the stack and cancel stack frames. Similarly, C++ destructors for automatic C++ objects are guaranteed to run when the stack frame (function) or scope in which the object was declared ends in any way.

These mechanisms provide a very powerful and guaranteed method to ensure that your application can always clean up its resources. However, this added power also makes it easy for your application to cause a deadlock.

For example

Assume your application has a function foo() that registers a cancel handler called cleanup(). The function foo() is called by multiple threads in your application. Your application is ended abnormally, either with a call to abort() or by system operator intervention (the ENDJOB *IMMED CL command).
When this job end condition occurs, every thread is immediately terminated. The system terminates a thread by terminating each call stack entry in the thread, and eventually it reaches the function foo() in that thread. At that point, the system recognizes that it must not remove foo() from the call stack without first running cleanup(), so it runs cleanup().
Since your application is multi-threaded, all of the job-ending and cleanup processing proceeds in parallel in each thread. Also, because abort() or ENDJOB *IMMED was used, the current state and location of each thread in your application is indeterminate. When your cleanup() function runs, it is very difficult for your application to know whether any specific cleanup can be done safely. Any resources that cleanup() attempts to acquire may be held by other threads in the process, by other jobs in the system, or even by the same thread that is running cleanup(). Application variables and resources may be in an inconsistent state, because the call to abort() or ENDJOB *IMMED asynchronously interrupted every thread in the process at the same time. It is very easy for your application to deadlock when running its cancel handlers or C++ destructors.
Do not attempt to acquire locks or resources in cancel handlers or C++ automatic object destructors without preparing for the possibility that the resources cannot be acquired.
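One way to prepare for that possibility is to try for a lock without blocking, and to skip the cleanup if the lock is unavailable. The following is a minimal sketch assuming a POSIX mutex protects the resource; the names resource_lock, open_handles, and cleanup_resource() are hypothetical, and the code shows only the body such a handler might call, not how it is registered with #pragma cancel_handler or CEERTX().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared resource protected by a mutex. */
static pthread_mutex_t resource_lock = PTHREAD_MUTEX_INITIALIZER;
static int open_handles = 1;

/* Cleanup routine intended to be called from a cancel handler or a
   destructor. It never blocks waiting for the lock: it returns true
   if the cleanup was performed, false if the lock was unavailable. */
static bool cleanup_resource(void)
{
    /* pthread_mutex_trylock() returns immediately (EBUSY) if the mutex
       is held by any thread, including the calling thread. */
    if (pthread_mutex_trylock(&resource_lock) != 0) {
        /* The protected state may be inconsistent; skip the cleanup
           rather than risk a deadlock. */
        return false;
    }
    if (open_handles > 0)
        open_handles--;
    pthread_mutex_unlock(&resource_lock);
    return true;
}
```

The design choice here is to accept that some cleanup may be skipped: when the job is ending abnormally, leaving a resource unreleased is usually far less harmful than a handler that never completes.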

Important

Neither a cancel handler nor a destructor for a C++ object can prevent the call stack entry from being terminated, but the termination of the call stack entry (and therefore the job or thread) will be delayed until the cancel handler or destructor completes.

If the cancel handler or destructor does not complete, the system will not continue terminating the call stack entry (and possibly the job or thread). The only alternatives at this point are to end the thread with the WRKJOB CL command (option 20), or to use the ENDJOB *IMMED CL command. Because any remaining cancel handlers are still guaranteed to run, if ENDJOB *IMMED is what caused the cancel handlers to run in the first place, the only option left is the ENDJOBABN CL command.

The ENDJOBABN CL command is not recommended: it causes the job to be terminated with no further cleanup allowed (application or operating system). If the application is hung trying to access certain operating system resources, those resources could be damaged. If operating system resources are damaged, you may be required to take various reclaim, deletion, or recovery steps and, in extreme conditions, to restart the system.

Recommendations

If you want to clean up your job or application, you could use one of the following mechanisms:



Copyright © 1998, IBM Corporation. All rights reserved.
Comments? Contact rchthrds@us.ibm.com