Spark Java access remote HDFS

Suppose we need to work with a different HDFS cluster (clusterB, for instance) from our Spark Java application running on clusterA.

Firstly, you need to add a --conf key to your run command. It depends on your Spark version:

  • (Spark 1.x-2.1) spark.yarn.access.namenodes=hdfs://clusterA,hdfs://clusterB
  • (Spark 2.2+) spark.yarn.access.hadoopFileSystems=hdfs://clusterA,hdfs://clusterB
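
For example, a spark-submit command for Spark 2.2+ could look roughly like this (the application class, jar name and deploy options here are just placeholders for illustration):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.access.hadoopFileSystems=hdfs://clusterA,hdfs://clusterB \
  --class com.example.MyApp \
  my-app.jar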

Secondly, when you create Spark’s Java context, add this:

javaSparkContext.hadoopConfiguration().addResource(new Path("core-site-clusterB.xml"));
javaSparkContext.hadoopConfiguration().addResource(new Path("hdfs-site-clusterB.xml"));

You need to go to clusterB, grab core-site.xml and hdfs-site.xml from there (the default location for Cloudera is /etc/hadoop/conf) and put them next to your app running on clusterA.
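
Putting it together, a minimal sketch of the whole setup might look like this (the remote file path and the app name are made up for illustration; the XML file names are the ones you copied from clusterB):

import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RemoteHdfsExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("remote-hdfs-example");
        JavaSparkContext javaSparkContext = new JavaSparkContext(conf);

        // register clusterB's client configs (note: Path, not String)
        javaSparkContext.hadoopConfiguration().addResource(new Path("core-site-clusterB.xml"));
        javaSparkContext.hadoopConfiguration().addResource(new Path("hdfs-site-clusterB.xml"));

        // now paths on the remote HDFS resolve by clusterB's nameservice name
        JavaRDD<String> lines = javaSparkContext.textFile("hdfs://clusterB/some/remote/file.txt");
        System.out.println("Lines read from clusterB: " + lines.count());

        javaSparkContext.stop();
    }
}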

Pay attention to these points:

  • we specify both core-site.xml and hdfs-site.xml, not just one of them
  • we pass a Path object to the addResource() method, not just an ordinary String!
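
The second point matters because Hadoop’s Configuration treats a String argument as a classpath resource name, while a Path argument is read directly from the local filesystem. So these two calls are not equivalent:

javaSparkContext.hadoopConfiguration().addResource("hdfs-site-clusterB.xml");           // looked up on the classpath
javaSparkContext.hadoopConfiguration().addResource(new Path("hdfs-site-clusterB.xml")); // read from the local file system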


If you are facing issues resolving clusterB, try putting the full namenode address of your remote HDFS with its port (instead of the hdfs/cluster short name) into the --conf of your run command:

  • (Spark 1.x-2.1) spark.yarn.access.namenodes=hdfs://clusterA,hdfs://namenode.fqdn:port
  • (Spark 2.2+) spark.yarn.access.hadoopFileSystems=hdfs://clusterA,hdfs://namenode.fqdn:port
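
For instance, with a hypothetical NameNode host and the common default RPC port 8020 (check dfs.namenode.rpc-address in clusterB’s hdfs-site.xml for the real value), the option would become:

--conf spark.yarn.access.hadoopFileSystems=hdfs://clusterA,hdfs://nn01.clusterb.example.com:8020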

If you are facing an AccessControlException error, check this:

If it’s not working for you, or you are facing another error, check the “Troubleshooting” section in my recent article:

If you still have any questions, feel free to ask me in the comments under this article or write me at

If I saved your day, you can support me 🤝