Spark Java access remote HDFS

Suppose we need to work with different HDFS (clusterB, for instance) from our Spark Java application, running on clusterA. Firstly, you need to add –conf key to your run command. Depends on Spark version: Secondly, when you creating Spark’s Java context, add that: You need to go to clusterB and gather core-site.xml and hdfs-site.xml from there (default location for Cloudera is /etc/hadoop/conf) […]

READ MORE

Java access remote HDFS from current Hadoop cluster

Suppose we have our Java app running on Hadoop clusterA, and we want to access remote HDFS based on Hadoop clusterB. Let’s see how we can do it: You need to go to clusterB and gather core-site.xml and hdfs-site.xml from there (default location for Cloudera is /etc/hadoop/conf) and put near your app running in clusterA. […]

READ MORE

HDFS DELEGATION TOKEN can’t be found in cache

The problem can be appears in Hadoop’s NodeManager logs. Usually it means that NodeManager is trying to use an expired / not renewed HDFS delegation token. For example, you can face this error while app log aggregation process. The timeline is: Your application pass HDFS delegation token to the NodeManager through the ContainerLaunchContext class, because […]

READ MORE

Resource changed on src filesystem

Full exception text: This can happen when some process overwrites application files in HDFS application directory while app is running. An example of the situation: You start app instance_1, which stores the distribution files in the hdfs://tmp/app folder. After a while you start the second instance_2 which stores the distribution files in the same HDFS […]

READ MORE

Spark concurrent write to same HDFS path

The problem Sometimes you need to run such a scenario when several Spark tasks write data along the same path to HDFS. During the execution of tasks, you may encounter some errors: Suppose we have one Spark task, that writes to the hdfs://data/test directory. At runtime, Spark will make a temporary directory: hdfs://data/test/_temporary/0. There is […]

READ MORE