HDFS_DELEGATION_TOKEN can't be found in cache
This error can appear in Hadoop’s NodeManager logs.
It usually means that the NodeManager is trying to use an HDFS delegation token that has expired or was not renewed.
For example, you can hit this error during application log aggregation. The timeline is:
- Your application passes an HDFS delegation token to the NodeManager through the ContainerLaunchContext class, because the NodeManager needs to localize container resources.
- The NodeManager uses the same HDFS delegation token to aggregate the logs: it transfers all application log files from the nodes to HDFS.
- You have not started containers on some nodes for a long time, so the tokens held by the corresponding NodeManagers expire.
- When you kill your application, log aggregation is triggered, but some NodeManagers report the HDFS_DELEGATION_TOKEN error because they no longer hold a valid token.
To solve this error, your application must itself renew the token that it previously passed to the NodeManager through the ContainerLaunchContext.
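A renewal pass of this kind could look roughly like the sketch below. It is deliberately free of Hadoop dependencies: `RenewableToken` and `TokenRenewalPass` are hypothetical names invented for illustration. In a real application, the `renew()` implementation would presumably delegate to Hadoop's `Token.renew(Configuration)`, which returns the new expiration time, and wrap its checked exceptions.

```java
import java.util.List;

/** Hypothetical abstraction over a renewable delegation token (not a Hadoop type). */
interface RenewableToken {
    /** Renews the token and returns its new expiration time in epoch milliseconds. */
    long renew();
}

/** Sketch of one renewal pass that the application could run periodically. */
final class TokenRenewalPass {
    private TokenRenewalPass() {}

    /**
     * Renews every token and returns the earliest new expiration time,
     * i.e. the latest moment by which the next pass has to run.
     * Returns Long.MAX_VALUE when the list is empty.
     */
    static long renewAll(List<RenewableToken> tokens) {
        long earliestExpiry = Long.MAX_VALUE;
        for (RenewableToken token : tokens) {
            earliestExpiry = Math.min(earliestExpiry, token.renew());
        }
        return earliestExpiry;
    }
}
```

The application would call `renewAll` on a timer and schedule the next pass comfortably before the returned expiration time, so that idle NodeManagers never end up holding an expired token.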
Important notice! About 7 days after it was issued, the token expires for good and can no longer be renewed. You don’t need to do anything at that point, because the NodeManager is going to create its own token.
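The arithmetic behind that limit can be sketched as follows. The constants assume Hadoop's default settings, `dfs.namenode.delegation.token.renew-interval` (24 hours) and `dfs.namenode.delegation.token.max-lifetime` (7 days); your cluster may be configured differently. The class and method names are hypothetical and only model the rule that a renewal never pushes the expiration time past the maximum lifetime.

```java
import java.time.Duration;

/** Hypothetical model of delegation token lifetime arithmetic (not a Hadoop type). */
final class TokenLifetime {
    // Assumed default of dfs.namenode.delegation.token.renew-interval: 24 hours.
    static final long RENEW_INTERVAL_MS = Duration.ofDays(1).toMillis();
    // Assumed default of dfs.namenode.delegation.token.max-lifetime: 7 days.
    static final long MAX_LIFETIME_MS = Duration.ofDays(7).toMillis();

    private TokenLifetime() {}

    /** Expiration time after a renewal, capped at issue time plus max lifetime. */
    static long expiryAfterRenewal(long issuedAtMs, long renewedAtMs) {
        long maxDate = issuedAtMs + MAX_LIFETIME_MS;
        return Math.min(renewedAtMs + RENEW_INTERVAL_MS, maxDate);
    }

    /** A token can only be renewed while its maximum lifetime has not passed. */
    static boolean canRenew(long issuedAtMs, long nowMs) {
        return nowMs < issuedAtMs + MAX_LIFETIME_MS;
    }
}
```

Under these assumptions, a token renewed six and a half days after issue is only valid until the seven-day mark, and after that point `canRenew` returns `false`.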
By the way, make sure that you are using Hadoop version >= 2.6.0. Older versions have a bug because of which the NodeManager does not create its own token: https://issues.apache.org/jira/browse/YARN-2704.
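If you want to guard against this at startup, a minimal version check might look like the sketch below. `HadoopVersionCheck` is a hypothetical helper. The version string is passed in as a parameter; in practice it could come from `org.apache.hadoop.util.VersionInfo.getVersion()` or from the output of `hadoop version`. The sketch assumes the string starts with dot-separated numbers, optionally followed by a `-suffix`.

```java
/** Hypothetical helper that compares a Hadoop version string with a minimum version. */
final class HadoopVersionCheck {
    private HadoopVersionCheck() {}

    /** Returns true if the version is at least major.minor.patch. */
    static boolean isAtLeast(String version, int major, int minor, int patch) {
        // Keep only the leading numeric part, e.g. "2.7.3-SNAPSHOT" -> "2.7.3".
        String[] parts = version.split("-", 2)[0].split("\\.");
        int[] required = {major, minor, patch};
        for (int i = 0; i < required.length; i++) {
            // Missing components count as 0, so "2.6" is treated as "2.6.0".
            int actual = i < parts.length ? Integer.parseInt(parts[i]) : 0;
            if (actual != required[i]) {
                return actual > required[i];
            }
        }
        return true; // all three components are equal
    }
}
```

For example, `HadoopVersionCheck.isAtLeast("2.5.2", 2, 6, 0)` is `false`, so an application could log a warning that log aggregation may fail with the HDFS_DELEGATION_TOKEN error on that cluster.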
If you still have any questions, feel free to ask me in the comments under this article or write to me at promark33@gmail.com.
If I saved your day, you can support me 🤝