Java: accessing a remote HDFS from the current Hadoop cluster

Suppose our Java app is running on Hadoop clusterA and we want to access the remote HDFS of Hadoop clusterB. Let’s see how we can do it:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.addResource(new Path("/path/to/core-site-clusterB.xml"));
conf.addResource(new Path("/path/to/hdfs-site-clusterB.xml"));
FileSystem fileSystem = FileSystem.get(conf); // this handle now points at clusterB
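
Once you have the handle, you use it like any other FileSystem. A minimal usage sketch (the /user/example directory is just a placeholder, not from the original snippet):

import org.apache.hadoop.fs.FileStatus;

// List a directory on clusterB through the handle obtained above.
for (FileStatus status : fileSystem.listStatus(new Path("/user/example"))) {
    System.out.println(status.getPath());
}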

You need to go to clusterB, grab core-site.xml and hdfs-site.xml from there (the default location on Cloudera is /etc/hadoop/conf), and put them next to your app running on clusterA.

Pay attention to these points:

  • we are specifying both core-site.xml and hdfs-site.xml, not just one of them
  • we are passing a Path object to the addResource() method, not an ordinary String! (see the sketch below)
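
The String/Path distinction matters because Configuration.addResource(String) resolves its argument against the classpath, while addResource(Path) reads a file from the local filesystem. A small sketch of the difference:

// Resolved against the classpath: a filesystem path passed as a plain
// String will simply not be found there, and the settings are not loaded.
conf.addResource("core-site-clusterB.xml"); // only works if the file is on the classpath

// Read directly from the local filesystem -- this is what we want here:
conf.addResource(new Path("/path/to/core-site-clusterB.xml"));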

Troubleshooting

java.lang.IllegalArgumentException: Wrong FS: hdfs://clusterA/your/path, expected: hdfs://clusterB

See “pay attention” section above.

java.lang.IllegalArgumentException: java.net.UnknownHostException: clusterA

See “pay attention” section above.
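
If fixing the configuration files is not enough, a common alternative (not covered in the snippet above) is to pass the target filesystem URI explicitly, so fs.defaultFS is taken out of the picture:

import java.net.URI;

// Sketch: pin the target filesystem explicitly. "hdfs://clusterB" is a
// placeholder for your real NameNode address or nameservice.
FileSystem remoteFs = FileSystem.get(new URI("hdfs://clusterB"), conf);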

java.lang.IllegalArgumentException: Wrong FS: hdfs://clustername:8030/your/path, expected: hdfs://clustername

Just remove the port from the path you want to access: use hdfs://clustername/your/path instead of hdfs://clustername:8030/your/path.
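
To avoid hardcoding the host and port at all, you can let the FileSystem qualify a plain path with its own scheme and authority; a small sketch, reusing the fileSystem handle from the first snippet:

// makeQualified() prepends the filesystem's own scheme and authority,
// so the resulting path never carries a stray port.
Path qualified = fileSystem.makeQualified(new Path("/your/path"));
System.out.println(qualified); // e.g. hdfs://clustername/your/path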

Additional thanks

Doug Turnbull, https://opensourceconnections.com/blog/2013/03/24/hdfs-debugging-wrong-fs-expected-file-exception


If you still have any questions, feel free to ask me in the comments under this article or write me at promark33@gmail.com.

If I saved your day, you can support me 🤝
