Spring Boot Yarn onContainerFailed method diagnostics message

Suppose we have a Spring Boot Yarn (SBY) application that consists of an AppMaster and a Container. When the container fails because of an exception, we need to send the exception text to the AppMaster.

When the container fails, the AppMaster’s onContainerFailed() method is called automatically:

protected boolean onContainerFailed(ContainerStatus status)

The method is defined in the StaticEventingAppmaster class, so your AppMaster must extend it. We override onContainerFailed(); it receives a ContainerStatus, which carries the diagnostics message:

public class AppMaster extends StaticEventingAppmaster {
  ...
  @Override
  protected boolean onContainerFailed(ContainerStatus status) {
    String id = status.getContainerId().toString();
    String diagnostics = status.getDiagnostics();
    log.error("Container {} failed. Diagnostics: {}", id, diagnostics);
    // the method must return a boolean; here we keep the parent's decision
    return super.onContainerFailed(status);
  }
}

After that, in the log we will see:

Container e424_1253242942153_5421_02_000003 failed. Diagnostics: Exception from container-launch.

The diagnostics message is set by Hadoop itself, not by Spring Boot Yarn, and there is no way to put your own text there. The easiest way to pass the exception text from the Container to the AppMaster is therefore the following scheme:

  • The code that runs in the container is wrapped in a try-catch block;
  • When an exception occurs, the catch block sends the container ID and the exception text to the AppMaster using ZeroMQ (or a similar tool);
  • The AppMaster runs a service that waits for exception text from containers. When a message arrives, it is stored in a Map, where the key is the container ID and the value is the exception text;
  • When the onContainerFailed() method is triggered, the exception text is looked up in the Map from the previous step.
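
The last two steps boil down to a thread-safe lookup table. Below is a minimal plain-Java sketch of that piece alone, independent of Spring Boot Yarn and of the transport. The class name ContainerErrorRegistry and its methods are hypothetical, and the fallback to the Hadoop diagnostics is my assumption about what you would want when a container dies without reporting anything:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: exception text reported by containers, keyed by container ID. */
class ContainerErrorRegistry {

    // ConcurrentHashMap because the receiving service writes from its own thread
    // while the failure callback reads from another one.
    private final Map<String, String> errors = new ConcurrentHashMap<>();

    /** Called by the receiving service when a container reports an exception. */
    void report(String containerId, String exceptionText) {
        errors.put(containerId, exceptionText);
    }

    /**
     * Called from the failure callback. Removes and returns the reported text,
     * or returns the diagnostics supplied by YARN if nothing was reported.
     */
    String resolve(String containerId, String yarnDiagnostics) {
        String reported = errors.remove(containerId);
        return reported != null ? reported : yarnDiagnostics;
    }
}
```

Removing the entry on lookup keeps the map from growing when containers are restarted many times; if you need the text later, use get() instead.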

Schematically:

@YarnComponent
class YourContainer {
  @OnContainerStart
  public void onContainerStart(@YarnEnvironment("CONTAINER_ID") String containerId) {
    try {
      // your code
    } catch (Exception e) {
      ZMQ.Socket socket = ...;
      // the container ID goes first so that the AppMaster can cut it off by length
      socket.send(containerId + e.getClass().getSimpleName() + ": " + e.getMessage());
      socket.close();
    }
  }
}

public class AppMaster extends StaticEventingAppmaster {
  ...

  // written by the receiver thread and read by the failure callback, so it must be thread-safe
  private final Map<String, String> exceptions = new ConcurrentHashMap<>();

  @Override
  protected void onInit() throws Exception {
    super.onInit();
    new Thread(() -> {
      ZMQ.Socket socket = ...; // create the socket once, not on every iteration
      while (!Thread.currentThread().isInterrupted()) {
        String message = new String(socket.recv(), StandardCharsets.UTF_8);
        // 20 is an example container ID length; change it to match yours
        String containerId = message.substring(0, 20);
        exceptions.put(containerId, message.substring(20));
      }
      socket.close();
    }).start();
  }

  @Override
  protected boolean onContainerFailed(ContainerStatus status) {
    String id = status.getContainerId().toString();
    String diagnostics = exceptions.get(id);
    log.error("Container {} failed. Diagnostics: {}", id, diagnostics);
    // the method must return a boolean; here we keep the parent's decision
    return super.onContainerFailed(status);
  }
}
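
Cutting the container ID off by a fixed length breaks as soon as the ID length changes. A possible alternative, sketched below under my own assumptions, is to put a separator between the ID and the text. The FailureMessage class and the newline separator are hypothetical and are not part of Spring Boot Yarn or ZeroMQ; the sketch also assumes that a container ID never contains a newline:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical wire format: "<containerId>\n<exception class>: <message>". */
final class FailureMessage {
    private static final char SEPARATOR = '\n';

    final String containerId;
    final String text;

    FailureMessage(String containerId, String text) {
        this.containerId = containerId;
        this.text = text;
    }

    /** Container side: builds the bytes to hand to the socket. */
    static byte[] encode(String containerId, Throwable error) {
        String payload = containerId + SEPARATOR
                + error.getClass().getSimpleName() + ": " + error.getMessage();
        return payload.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * AppMaster side: splits on the first separator only,
     * so the exception text itself may contain newlines.
     */
    static FailureMessage decode(byte[] raw) {
        String payload = new String(raw, StandardCharsets.UTF_8);
        int split = payload.indexOf(SEPARATOR);
        if (split < 0) {
            throw new IllegalArgumentException("Missing separator in failure message");
        }
        return new FailureMessage(payload.substring(0, split), payload.substring(split + 1));
    }
}
```

With this in place the container would send FailureMessage.encode(containerId, e), and the receiver thread would call FailureMessage.decode(socket.recv()) instead of the two substring() calls.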