HDFS file names without the rest of the file details

How to get just the absolute file paths from an HDFS listing, without the other details:


A typical hdfs listing looks like this:
$ hdfs dfs -ls /user/foo/data
-rw-r--r--   3 foo bar    6346268 2016-12-28 02:52 /user/foo/data/file007.csv
-rw-r--r--   3 foo bar    4397850 2016-12-28 02:52 /user/foo/data/file014.csv
-rw-r--r--   3 foo bar   13297361 2016-12-28 02:52 /user/foo/data/file020.csv
-rw-r--r--   3 foo bar   10400852 2016-12-28 02:53 /user/foo/data/file118.csv
-rw-r--r--   3 foo bar   10184639 2016-12-28 02:52 /user/foo/data/file205.csv
-rw-r--r--   3 foo bar    5542293 2016-12-28 02:53 /user/foo/data/file214.csv
-rw-r--r--   3 foo bar    6085128 2016-12-28 02:53 /user/foo/data/file307.csv

But in shell scripts that perform CRUD operations, we usually need just the absolute HDFS file paths, like:
/user/foo/data/file007.csv
/user/foo/data/file014.csv
/user/foo/data/file020.csv
/user/foo/data/file118.csv
/user/foo/data/file205.csv
/user/foo/data/file214.csv
/user/foo/data/file307.csv

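One way to achieve this is with awk, keeping only the entry lines (they start with a permission string, so the "Found N items" header that hdfs prints for a directory is skipped) and printing the last whitespace-separated field:

$ hdfs dfs -ls /user/foo/data | awk '/^[-d]/ {print $NF}'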
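
Taking the last field breaks as soon as a path contains a space. A slightly more defensive variant is sketched below. It assumes the usual eight-column listing layout shown above (permissions, replication, owner, group, size, date, time, path), and the function name hdfs_paths is just a hypothetical helper for illustration. Instead of printing the last field, it strips the first seven columns and prints whatever remains.

```shell
# Hypothetical helper: read `hdfs dfs -ls` output on stdin, print one path per line.
# Assumes the eight-column layout: perms, replication, owner, group, size, date, time, path.
hdfs_paths() {
  # Keep only entry lines (they start with '-' for files or 'd' for directories);
  # this drops the "Found N items" header. Then remove the first seven columns
  # so that a path containing spaces is printed whole.
  awk '/^[-d]/ {
    for (i = 1; i <= 7; i++) sub(/^[^ ]+ +/, "")
    print
  }'
}

# Example usage (needs a working hdfs client):
#   hdfs dfs -ls /user/foo/data | hdfs_paths
```

In a script, the result can be consumed with a loop such as: hdfs dfs -ls /user/foo/data | hdfs_paths | while IFS= read -r path; do ...; done. Newer Hadoop releases (2.8 onward, as far as I know) also accept hdfs dfs -ls -C, which prints paths only. If "hdfs dfs -help ls" lists that option on your cluster, it may make the awk filter unnecessary.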
