Hadoop is a framework for distributed storage and processing of large datasets across clusters of computers. It utilizes HDFS for storage, which distributes data across nodes and replicates files for fault tolerance. HDFS uses a master/slave architecture, with a NameNode managing the file system namespace and DataNodes storing file data in blocks. The Hadoop API provides access to HDFS through interfaces like FileSystem and FSDataInputStream, allowing applications to read, write, and manipulate data in a distributed manner.