Amazon S3 - distcp from S3 to Hadoop - file not found
I am getting the error below: file not found. Well... the file exists. I'm a newbie to distcp. I'm using Cloudera, FYI.
https://s3.amazonaws.com/test-development/test/201305031003_0_ubuntu.gz

ubuntu@ubuntu:~$ hadoop distcp -i 201305031003_0_ubuntu.gz s3://id:key@test-development/test/201305031003_0_ubuntu.gz
13/05/04 14:54:29 INFO tools.DistCp: srcPaths=[201305031003_0_ubuntu.gz]
13/05/04 14:54:29 INFO tools.DistCp: destPath=s3://id:key@test-development/test/201305031003_0_ubuntu.gz
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.mapred.InvalidInputException: Input source 201305031003_0_ubuntu.gz does not exist.
    at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:641)
    at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
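The first parameter is the source, so it should be the S3 path, not the bare file name. As written, distcp looks for 201305031003_0_ubuntu.gz on the default file system and does not find it. Also, the path should use s3n:// (native S3), not s3://, unless you've written the data to S3 using s3:// (the block file system).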
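As a rough sketch of the corrected invocation, the snippet below puts the S3 object first and uses the s3n:// scheme. The bucket and object names come from the question; ID and KEY are placeholders for real AWS credentials, and the HDFS destination /user/ubuntu/ is only an assumed path, so adjust it to your cluster. The command is printed rather than executed so it can be reviewed first.

```shell
# Bucket and object key taken from the question.
BUCKET="test-development"
OBJECT="test/201305031003_0_ubuntu.gz"

# The source goes first and uses the s3n:// (native S3) scheme.
# ID and KEY are placeholders, not real credentials.
SRC="s3n://ID:KEY@${BUCKET}/${OBJECT}"

# The destination is an HDFS directory; /user/ubuntu/ is an assumed path.
DEST="hdfs:///user/ubuntu/"

# Print the command for review; remove the leading "echo" to actually run it.
echo hadoop distcp "$SRC" "$DEST"
```

If embedding credentials in the URI is undesirable, they can instead be set through the fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey configuration properties, leaving the source as s3n://test-development/test/201305031003_0_ubuntu.gz.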