Suppose that we have an SBY application, which includes AppMaster and Container. When the container falls due to an exception, we need to send exception text to the AppMaster.
When the container falls, the AppMaster’s onContainerFailed()
method is automatically called:
protected boolean onContainerFailed(ContainerStatus status)
The method is in the StaticEventingAppmaster
class. Your AppMaster must be inherited from it. We override the onContainerFailed()
method. It accepts the ContainerStatus
as input, where the diagnostics message is:
public class AppMaster extends StaticEventingAppmaster {
...
@Override
protected boolean onContainerFailed(ContainerStatus status) {
String id = status.getContainerId().toString();
String diagnostics = status.getDiagnostics();
log.error("Container {} failed. Diagnostics: {}", id, diagnostics);
}
}
After that, in the log we will see:
Container e424_1253242942153_5421_02_000003 failed. Diagnostics: Exception from container-launch.
Diagnostics message is set by hadoop itself, not even Spring Boot Yarn. And there are no options to set your text there. Therefore, it will be easiest to use the following scheme to send exception text from Container to AppMaster:
- The code on the container is placed in a try-catch block;
- When an exception occurs in the catch block, the text of the exception and the container ID are sent using ZeroMQ (and similar tools) to AppMaster;
- The AppMaster runs a service that expects an exception text from containers. When an exception text appears, it is placed in a Map, where the key is the container ID and the value is the exception text;
- When the
onContainerFailed()
method is triggered, the exception text extracting from Map from the previous step.
Schematically:
@YarnComponent
class YourContainer {
@OnContainerStart
public void onContainerStart(@YarnEnvironment("CONTAINER_ID") containerId) {
try {
// your code
} catch (Exception e) {
ZMQ.Socket socket = ...;
socket.send(containerId + e.getClass().getSimpleName() + e.getMessage());
socket.close();
}
}
public class AppMaster extends StaticEventingAppmaster {
...
Map<String, String> exceptions = new HashMap<>();
@Override
protected void onInit() throws Exception() {
new Thread(() -> {
while(!Thread.currentThread().isInterrupted()) {
ZMQ.Socket socket = ...;
String exception = new String(socket.recv(), StandartCharsets.UTF_8);
String containerId = exception.substring(0, 20); // 0-20 – it is example containerId length. change it to yours
exception = exception.substring(20, exception.length());
exceptions.put(containerId, exception);
}
}).start;
}
@Override
protected boolean onContainerFailed(ContainerStatus status) {
String id = status.getContainerId.toString();
String diagnostics = exceptions.get(id);
log.error("Container {} failed. Diagnostics: {}", id, diagnostics);
}
}
If you still have any questions, feel free to ask me in the comments under this article, or write me on promark33@gmail.com.