Using only the Hadoop libraries, I can list all the files in a subdirectory like this:
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // List all sites we have data for.
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus[] status = fs.listStatus(new Path("hdfs:///dir/subdir/"));
    for (FileStatus s : status) {
        try {
            // Check whether this site directory contains a metric file.
            FileStatus[] metricFile = fs.listStatus(new Path(s.getPath(), "file.json"));
            logger.info("File: " + metricFile[0].getPath().toString());
        } catch (IOException e) {
            // There is no metric file in this directory.
        }
    }
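Alternatively, the per-directory loop and try/catch can be avoided with a glob. This is a minimal sketch, assuming the same hdfs:///dir/subdir/ layout, using FileSystem.globStatus:

    // Sketch: match every site's metric file in a single call.
    // globStatus returns an empty array (or null) when nothing matches,
    // so no exception handling is needed for missing files.
    FileStatus[] metricFiles = fs.globStatus(new Path("hdfs:///dir/subdir/*/file.json"));
    if (metricFiles != null) {
        for (FileStatus metricFile : metricFiles) {
            logger.info("File: " + metricFile.getPath().toString());
        }
    }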
Since I use Spark for most of my applications, I prefer to handle it this way:
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import scala.Tuple2;

    SparkConf conf = new SparkConf().setAppName("Learning");
    JavaSparkContext jsc = new JavaSparkContext(conf);

    // The glob matches file.json in every site subdirectory in one pass.
    JavaPairRDD<String, String> allMetricFiles =
        jsc.wholeTextFiles("hdfs:///dir/subdir/*/file.json");

    // collect() replaces the deprecated toArray(); each._1 is the file path.
    for (Tuple2<String, String> each : allMetricFiles.collect()) {
        logger.info("Only metric file: " + each._1);
    }
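Note that wholeTextFiles pairs each matched file's path (each._1) with its entire contents (each._2), so every file is read into memory on the executors. That is fine for small metric files, but if you only need the paths, the Hadoop glob above is cheaper.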