Showing posts from December, 2016

HDFS filenames without the rest of the file details

To get only the file names, with their absolute paths, from HDFS:

A plain HDFS listing looks like this:

    $ hdfs dfs -ls
    -rw-r--r--   3 foo bar    6346268 2016-12-28 02:52 /user/foo/data/file007.csv
    -rw-r--r--   3 foo bar    4397850 2016-12-28 02:52 /user/foo/data/file014.csv
    -rw-r--r--   3 foo bar   13297361 2016-12-28 02:52 /user/foo/data/file020.csv
    -rw-r--r--   3 foo bar   10400852 2016-12-28 02:53 /user/foo/data/file118.csv
    -rw-r--r--   3 foo bar   10184639 2016-12-28 02:52 /user/foo/data/file205.csv
    -rw-r--r--   3 foo bar    5542293 2016-12-28 02:53 /user/foo/data/file214.csv
    -rw-r--r--   3 foo bar    6085128 2016-12-28 02:53 /user/foo/data/file307.csv

In shell scripts that perform CRUD operations on these files, however, we usually need just the absolute HDFS paths, like:

    /user/foo/data/file007.csv
    /user/foo/data/file014.csv
    /user/foo/data/file020.csv
    /user/foo/data/file118.csv
    /user/foo/data/file205.csv
    /user/foo/data/file214.csv

One way to achieve this is to use awk, like: $
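The awk approach can be sketched as follows. This is a minimal sketch, not necessarily the exact command intended here: it assumes that the path is the last whitespace-separated field of each listing line and that paths contain no spaces. The function name `hdfs_paths` is a hypothetical helper, and the directory in the usage comment is only an example.

```shell
#!/bin/sh
# Read `hdfs dfs -ls` output on stdin and print only the path column.
# Assumption: the path is the last field and contains no spaces.
# Only lines whose first field starts with "-" (regular files) are kept,
# so summary lines such as "Found 7 items" and directory entries are skipped.
hdfs_paths() {
  awk '$1 ~ /^-/ { print $NF }'
}

# Example usage (the directory is an assumption for illustration):
#   hdfs dfs -ls /user/foo/data | hdfs_paths
```

Filtering on the permission field rather than on a line count keeps the helper safe to use whether or not the listing begins with a summary line. To include directories as well, the pattern could be loosened to `/^[-d]/`.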