When we run a job in the cloud, we need to specify a storage location for both input and output, one that supports storing data as well as retrieving it. In this tutorial we will learn how to specify S3 for input and output.
What is S3: Amazon S3 (Simple Storage Service) is a data storage service, often described as storage for the Internet. It is designed to make web-scale computing easier for developers.
There are two ways to use S3 with Hadoop's Map/Reduce: as a replacement for HDFS, using the S3 block filesystem, or as a convenient repository for data input to and output from MapReduce, using either of the S3 filesystems (block or native).
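As a rough illustration of the two usage modes, the sketch below builds the kind of core-site.xml properties that older Hadoop releases read for the `s3` (block) and `s3n` (native) schemes. The property names `fs.s3.awsAccessKeyId`, `fs.s3n.awsAccessKeyId`, their secret-key counterparts, and `fs.default.name` come from those legacy connectors; newer releases use the `s3a` connector with different property names, so check the documentation for your version. The helper names `build_core_site` and `s3_path`, the bucket name, and the credentials are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def build_core_site(scheme, access_key, secret_key, default_bucket=None):
    """Build a <configuration> element for a legacy Hadoop S3 filesystem.

    scheme: "s3" for the block filesystem, "s3n" for the native filesystem.
    default_bucket: if given, the bucket is also set as the default
    filesystem, which corresponds to using S3 in place of HDFS.
    """
    if scheme not in ("s3", "s3n"):
        raise ValueError("scheme must be 's3' (block) or 's3n' (native)")

    props = {
        "fs.%s.awsAccessKeyId" % scheme: access_key,
        "fs.%s.awsSecretAccessKey" % scheme: secret_key,
    }
    if default_bucket is not None:
        # Legacy property name for the default filesystem URI.
        props["fs.default.name"] = "%s://%s" % (scheme, default_bucket)

    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


def s3_path(scheme, bucket, key):
    """Return a URI usable as a MapReduce input or output path."""
    return "%s://%s/%s" % (scheme, bucket, key.lstrip("/"))


if __name__ == "__main__":
    # Placeholder credentials and bucket; replace with your own values.
    config = build_core_site("s3n", "YOUR_ACCESS_KEY", "YOUR_SECRET_KEY")
    print(ET.tostring(config, encoding="unicode"))
    print(s3_path("s3n", "example-bucket", "input/"))
    print(s3_path("s3n", "example-bucket", "output/"))
```

In the second mode (S3 as a data repository only), the job keeps HDFS as its default filesystem and simply receives the two URIs as its input and output paths. Passing `default_bucket` with the `s3` scheme sketches the first mode, where S3 stands in for HDFS.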
To configure S3 as I/O for a Hadoop MapReduce job, visit: