Suppose we have a Java app running on Hadoop clusterA, and we want to access the remote HDFS of Hadoop clusterB. Let’s see how we can do it:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
// Load clusterB's client configs so fs.defaultFS points at clusterB.
Configuration conf = new Configuration();
conf.addResource(new Path("/path/to/core-site-clusterB.xml"));
conf.addResource(new Path("/path/to/hdfs-site-clusterB.xml"));
FileSystem fileSystem = FileSystem.get(conf);
You need to go to clusterB, grab core-site.xml and hdfs-site.xml from there (the default location on Cloudera is /etc/hadoop/conf), and place them next to your app running on clusterA.
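Once the FileSystem is created this way, reading from the remote cluster works like any other HDFS access. Here is a minimal sketch (the /user/demo/data.txt path is just an illustration; FSDataInputStream and IOUtils come from org.apache.hadoop.fs and org.apache.hadoop.io):
// Read a file from clusterB through the FileSystem created above.
Path remoteFile = new Path("/user/demo/data.txt"); // hypothetical file on clusterB
if (fileSystem.exists(remoteFile)) {
    try (FSDataInputStream in = fileSystem.open(remoteFile)) {
        IOUtils.copyBytes(in, System.out, 4096, false); // dump contents to stdout
    }
}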
Pay attention to these points:
- we specify both core-site.xml and hdfs-site.xml, not just one of them;
- we pass a Path object to the addResource() method, not an ordinary String! (See the sketch below.)
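To see why the Path object matters: the String overload of addResource() treats its argument as a classpath resource name, and (in Configuration’s default quiet mode) a name that is not found on the classpath is silently skipped, so you keep reading clusterA’s defaults. A quick sketch of the difference:
// Wrong: the String overload looks the name up on the classpath;
// an absolute file path is silently ignored if it is not there.
conf.addResource("/path/to/core-site-clusterB.xml");

// Right: the Path overload reads the file from the local filesystem.
conf.addResource(new Path("/path/to/core-site-clusterB.xml"));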
Troubleshooting
java.lang.IllegalArgumentException: Wrong FS: hdfs://clusterA/your/path, expected: hdfs://clusterB
See the “pay attention” section above.
java.lang.IllegalArgumentException: java.net.UnknownHostException: clusterA
See the “pay attention” section above.
java.lang.IllegalArgumentException: Wrong FS: hdfs://clustername:8030/your/path, expected: hdfs://clustername
Just remove the port from the path you want to access: use hdfs://clustername/your/path instead of hdfs://clustername:8030/your/path.
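In code that means building the Path with an authority that matches fs.defaultFS exactly (clustername and the path below are placeholders):
// Wrong: the port makes the authority differ from fs.defaultFS.
Path withPort = new Path("hdfs://clustername:8030/your/path");
// Right: no port, so the URI matches what the client expects.
Path noPort = new Path("hdfs://clustername/your/path");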
Additional thanks
Doug Turnbull, https://opensourceconnections.com/blog/2013/03/24/hdfs-debugging-wrong-fs-expected-file-exception
If you still have any questions, feel free to ask me in the comments under this article or write me at promark33@gmail.com.
If I saved your day, you can support me 🤝