File & I/O Management Interview Questions - Operating System
File and I/O management in an operating system ensures organized data storage and efficient device communication. The file system handles creation, deletion, access control, and directory structures, while I/O management coordinates device drivers, buffering, caching, and scheduling.
1. Explain the concept of I/O scheduling in Operating Systems.
- I/O scheduling decides the order in which pending I/O requests are served.
Algorithms include:
- FCFS – Fair, but may cause long waits.
- SSTF – Shortest Seek Time First, reduces average latency but risks starvation.
- SCAN/LOOK – Elevator-style, balances fairness and efficiency.
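As a quick illustration, here is a minimal C sketch comparing total head movement under FCFS and SSTF; the request queue and starting head position are the classic textbook values and are purely illustrative:

```c
#include <stdio.h>
#include <stdlib.h>

/* Total head movement if requests are served in arrival order (FCFS). */
static int fcfs_distance(const int *req, int n, int head) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += abs(req[i] - head);
        head = req[i];
    }
    return total;
}

/* Total head movement if the closest pending request is always served
 * next (SSTF). */
static int sstf_distance(const int *req, int n, int head) {
    int done[16] = {0}, total = 0;
    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (!done[i] && (best < 0 || abs(req[i] - head) < abs(req[best] - head)))
                best = i;
        done[best] = 1;
        total += abs(req[best] - head);
        head = req[best];
    }
    return total;
}

int main(void) {
    int req[] = {98, 183, 37, 122, 14, 124, 65, 67};
    int n = sizeof req / sizeof req[0];
    printf("FCFS total seek: %d\n", fcfs_distance(req, n, 53)); /* 640 */
    printf("SSTF total seek: %d\n", sstf_distance(req, n, 53)); /* 236 */
    return 0;
}
```

SSTF cuts total seek distance sharply here, but note how the far request at cylinder 183 is served last: with a steady stream of nearby requests it could starve indefinitely.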
2. How does the OS handle simultaneous read/write requests to the same file without causing data corruption?
When multiple processes try to access the same file concurrently, the OS must prevent race conditions and data corruption. It does this through the following mechanisms:
- Advisory Locking: Processes voluntarily use system calls (e.g., fcntl, flock in UNIX) to coordinate access. Other processes can ignore the lock if not programmed to check (see the sketch after this list).
- Mandatory Locking: Enforced by the OS. If a file is locked, read/write system calls are blocked until the lock is released, ensuring strict mutual exclusion.
- Buffer Cache & Synchronization: The OS maintains a buffer cache to temporarily hold file data. Synchronization techniques (mutexes, semaphores) prevent multiple processes from modifying the same cached block simultaneously. Policies like write-through (immediate disk update) and write-back (delayed update with periodic flush) maintain consistency between cache and disk.
- Serialization of Writes: If two processes attempt concurrent writes, the OS may serialize (queue) them, ensuring one completes before the other begins.
- Journaling / Log-based File Systems: Modern file systems (e.g., ext4, NTFS) use journaling. Before making changes, the intent is logged. If a crash occurs mid-write, the OS replays the journal to recover consistent state, preventing corruption.
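A minimal sketch of advisory locking with fcntl on a UNIX-like system (shared.dat is a placeholder file name):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("shared.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Request an exclusive (write) lock over the whole file.
     * F_SETLKW blocks until the lock can be granted. */
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "to end of file" */

    if (fcntl(fd, F_SETLKW, &fl) == -1) { perror("fcntl"); return 1; }

    /* ... critical section: read-modify-write the file safely ... */
    write(fd, "update\n", 7);

    fl.l_type = F_UNLCK;        /* release the lock */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}
```

Because the lock is advisory, it protects the file only against other processes that also call fcntl before accessing it.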
3. Differentiate between synchronous and asynchronous I/O, and explain when asynchronous I/O gives significant advantages.
- Synchronous I/O blocks the process until the operation completes. Asynchronous I/O, by contrast, allows the process to continue execution while the I/O occurs in the background, with completion signaled via interrupts or callbacks.
- Asynchronous I/O excels in high-throughput, low-latency applications like web servers, where I/O wait times can be overlapped with computation. Modern OSes provide POSIX AIO system calls, IOCP (Windows), or epoll/kqueue (Linux/BSD) for efficiency.
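A minimal sketch using the POSIX AIO interface (input.dat is a placeholder; older glibc versions require linking with -lrt):

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("input.dat", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof buf;
    cb.aio_offset = 0;

    if (aio_read(&cb) == -1) { perror("aio_read"); return 1; }

    /* The read proceeds in the background; the process can do useful
     * computation here instead of blocking on the disk. */
    while (aio_error(&cb) == EINPROGRESS) {
        /* ... overlap computation with I/O ... */
    }

    ssize_t n = aio_return(&cb);   /* collect the result */
    printf("read %zd bytes asynchronously\n", n);
    close(fd);
    return 0;
}
```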
4. Explain the role of buffer cache in file I/O and why write-back caching can lead to data loss in case of system crashes.
Buffer cache temporarily stores disk blocks in RAM to reduce I/O latency.
- Write-through caching writes immediately to disk (safe but slower).
- Write-back caching delays writes for performance, but risks data loss if the system crashes before flushing buffers.
To mitigate risks, journaling file systems (e.g., ext4, NTFS) maintain logs of pending changes, allowing recovery after crashes.
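A minimal sketch showing how an application can force write-back-cached data to stable storage with fsync (log.txt is a placeholder name):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("log.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* write() normally lands in the write-back cache and reaches
     * the disk later, at the kernel's discretion. */
    write(fd, "critical record\n", 16);

    /* fsync() forces the cached data (and metadata) to disk before
     * returning -- the latency cost is the price of durability. */
    if (fsync(fd) == -1) perror("fsync");

    close(fd);
    return 0;
}
```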
5. How does a journaling file system differ from a non-journaling one in terms of recovery?
- Non-journaling FS: Relies on full file system checks (e.g., fsck) after crashes, which can be slow for large disks.
- Journaling FS: Records metadata and/or data changes in a log (journal) before committing them. Upon recovery, the OS replays committed transactions, skipping incomplete ones. This reduces recovery time and ensures metadata integrity.
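A toy write-ahead-logging sketch in C, assuming hypothetical file names journal.log and data.db. Real journals use checksummed binary records, but the ordering of the three steps is the essential idea:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Journal protocol: (1) append the intended change to the journal and
 * force it to disk; (2) apply the change to the data file; (3) write a
 * commit mark. After a crash, recovery replays entries that reached
 * step (1) and committed, and discards those without a commit mark. */
static int journaled_write(const char *record) {
    int jfd = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    int dfd = open("data.db",     O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (jfd < 0 || dfd < 0) return -1;

    /* Step 1: log the intent and make it durable. */
    dprintf(jfd, "BEGIN %s\n", record);
    fsync(jfd);

    /* Step 2: apply to the real data file. */
    dprintf(dfd, "%s\n", record);
    fsync(dfd);

    /* Step 3: commit mark -- recovery skips transactions without it. */
    dprintf(jfd, "COMMIT\n");
    fsync(jfd);

    close(jfd); close(dfd);
    return 0;
}

int main(void) { return journaled_write("balance=100"); }
```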
6. What is Direct I/O, and why is it sometimes preferred over buffered I/O?
Direct I/O is a file I/O mechanism where data is transferred directly between user space and the storage device, bypassing the OS buffer cache. Unlike buffered I/O, which temporarily stores data in the kernel’s page cache, Direct I/O avoids this extra layer.
Advantages of Direct I/O:
- Avoids Double Buffering: Data is not copied both to the OS cache and the application’s cache, saving memory.
- Better Performance for Certain Applications: High-performance applications like databases already implement their own caching and prefer direct control over disk access.
- Reduced Cache Pollution: Prevents the OS page cache from being filled with large sequential reads/writes that might evict frequently used pages.
Drawbacks:
- Applications must manage caching explicitly. Poor caching strategies can lead to inefficient reads/writes and degraded performance.
- I/O alignment restrictions (e.g., block size multiples) make programming more complex.
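A minimal Linux-specific sketch of Direct I/O with O_DIRECT (data.bin is a placeholder; the 4 KiB alignment shown is typical but ultimately device-dependent):

```c
#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    /* O_DIRECT transfers bypass the page cache, but buffer address,
     * file offset, and length must all satisfy alignment rules. */
    int fd = open("data.bin", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0) return 1;

    ssize_t n = read(fd, buf, 4096);   /* device straight to user buffer */
    printf("read %zd bytes, bypassing the page cache\n", n);

    free(buf);
    close(fd);
    return 0;
}
```

The posix_memalign call is exactly the extra complexity the drawbacks above refer to: an ordinary malloc'd buffer would make the read fail with EINVAL on most devices.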
7. How do file systems implement sparse files, and why are they useful?
Sparse files allow unallocated disk regions (holes) to represent large blocks of zeroes without physically storing them.
- Implemented by skipping allocation for zero-filled regions in the inode’s block map.
- Useful for virtual disk images, database files, and large scientific datasets.
- They save disk space but require the OS and applications to handle read/write offsets correctly.
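A minimal sketch that creates a sparse file on a UNIX-like system (sparse.img is a placeholder name):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("sparse.img", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Seek 1 GiB past the start without writing anything: the skipped
     * region becomes a "hole" that occupies no disk blocks. */
    lseek(fd, (off_t)1 << 30, SEEK_SET);
    write(fd, "end", 3);    /* one real block at the far end */

    close(fd);
    /* `ls -l sparse.img` now reports ~1 GiB of logical size, while
     * `du -h sparse.img` shows only the few KiB actually allocated. */
    return 0;
}
```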
8. Explain the layered architecture of I/O management in an OS.
I/O management is typically organized into the following layers:
- User-level I/O library (e.g., stdio)
- System call interface (e.g., read, write)
- Device-independent I/O (buffering, naming, access control)
- Device drivers (hardware-specific control)
- Interrupt handlers (handle I/O completion events)
This abstraction allows portability and modularity in device management.
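A small sketch tracing one logical write through these layers on a UNIX-like system (demo.txt is a placeholder):

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Layer 1: user-level library. fputs() only fills stdio's
     * in-process buffer -- no kernel involvement yet. */
    FILE *f = fopen("demo.txt", "w");
    if (!f) { perror("fopen"); return 1; }
    fputs("hello\n", f);

    /* Layer 2: system call interface. fflush() hands the buffered
     * bytes to the kernel via write(2). */
    fflush(f);

    /* Layers 3-5 happen inside the kernel: device-independent code
     * caches and schedules the request, the device driver programs
     * the disk, and the interrupt handler reports completion.
     * fsync() waits until that whole path reaches stable storage. */
    fsync(fileno(f));

    fclose(f);
    return 0;
}
```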
9. What is RAID, and how does the OS interact with RAID configurations?
RAID (Redundant Array of Independent Disks) improves performance and/or reliability via striping, mirroring, and parity.
- RAID 0: Stripes data across disks for speed (no redundancy).
- RAID 1: Mirrors data for reliability (slower writes).
- RAID 5/6: Uses parity for fault tolerance and space efficiency.
The OS interacts with RAID as a single logical volume, but driver and filesystem optimizations ensure proper I/O scheduling.
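A tiny sketch of the RAID 5 parity idea (strip size and contents are arbitrary): parity is the bytewise XOR of the data strips, so any single lost strip is recoverable as the XOR of the survivors:

```c
#include <stdio.h>

#define STRIPE 4   /* bytes per strip, tiny for illustration */

int main(void) {
    /* Three data strips plus one parity strip (the RAID 5 idea). */
    unsigned char d0[STRIPE] = "AAA", d1[STRIPE] = "BBB", d2[STRIPE] = "CCC";
    unsigned char parity[STRIPE];

    /* Parity is the bytewise XOR of the data strips. */
    for (int i = 0; i < STRIPE; i++)
        parity[i] = d0[i] ^ d1[i] ^ d2[i];

    /* If the disk holding d1 fails, its strip is rebuilt by XORing
     * the surviving strips with the parity strip. */
    unsigned char rebuilt[STRIPE];
    for (int i = 0; i < STRIPE; i++)
        rebuilt[i] = d0[i] ^ d2[i] ^ parity[i];

    printf("recovered strip: %s\n", (char *)rebuilt);   /* prints "BBB" */
    return 0;
}
```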
10. How does the OS manage I/O scheduling for devices with very different speeds, such as SSDs and HDDs?
The OS uses different scheduling strategies depending on the device type:
- HDDs: Algorithms like SSTF, SCAN, and C-SCAN to minimize seek times.
- SSDs: Focus on wear leveling and avoiding write amplification rather than seek optimization.
Modern OSes use multi-queue schedulers (e.g., Linux MQ-Deadline, BFQ) to handle multiple devices with tailored strategies.
11. How does an OS handle file system mounting, and what challenges arise in mounting distributed file systems?
File system mounting is the process of making a file system accessible at a specific point in the directory hierarchy. The OS maps the physical storage structure to a logical path for user access. In local systems, mounting involves reading the superblock, validating file system integrity, and integrating directory structures.
In distributed file systems (DFS) like NFS, challenges include:
- Network latency affecting performance.
- Cache consistency between client and server.
- Authentication and access control across systems.
- Partial failures where the file system becomes unresponsive.
Techniques like stateless protocols, write-back caching, and failover replication are used to mitigate these issues.
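A minimal Linux sketch of the mount(2) system call (the device path and mount point are illustrative, and the program must run as root):

```c
#include <stdio.h>
#include <sys/mount.h>

int main(void) {
    /* Attach the ext4 file system on /dev/sdb1 at /mnt/data. The kernel
     * reads the superblock, validates it, and grafts the file system's
     * root directory onto the mount point. */
    if (mount("/dev/sdb1", "/mnt/data", "ext4", MS_RDONLY, "") == -1) {
        perror("mount");
        return 1;
    }
    printf("mounted read-only\n");

    /* Detach it again when done. */
    umount("/mnt/data");
    return 0;
}
```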
12. Explain the working of journaling in file systems and its role in crash recovery.
Journaling is a fault-tolerance technique where file system changes are first recorded in a dedicated journal (log) before being committed to the main storage. In case of a crash, the system replays or discards incomplete transactions from the journal, preventing corruption. Journaling modes include:
- Write-back: Only metadata is journaled; data blocks may reach disk before or after the journal commit, so stale data can appear after a crash (fastest, weakest guarantee).
- Ordered: Only metadata is journaled, but data blocks are forced to disk before the corresponding metadata commits (the ext4 default).
- Data journaling: Both data and metadata are logged for maximum safety.
While journaling reduces corruption risks, it may introduce write overhead. File systems like ext4, NTFS, and XFS use journaling for fast recovery.
13. What is the difference between buffered and unbuffered I/O?
- Buffered I/O: Uses OS buffers to temporarily store data before writing to disk or after reading. Improves performance by reducing system calls.
- Unbuffered I/O: Directly transfers data between user process and disk. Provides more control but can be slower due to frequent I/O operations.
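A small sketch contrasting the two access patterns on a UNIX-like system (large.bin is a placeholder): one system call per byte versus one per 64 KiB block:

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    char tiny, big[65536];

    /* Unbuffered style: one system call per byte. Each read(2) is a
     * full user/kernel transition -- enormous overhead per byte. */
    int fd = open("large.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    while (read(fd, &tiny, 1) == 1) { /* ... process one byte ... */ }
    close(fd);

    /* Buffered style: each system call moves 64 KiB. stdio's fread()
     * goes further, batching many small requests into one read(2). */
    fd = open("large.bin", O_RDONLY);
    while (read(fd, big, sizeof big) > 0) { /* ... process block ... */ }
    close(fd);
    return 0;
}
```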
14. How does the OS implement file locking mechanisms, and what are the types of locks used?
File locking ensures controlled access to files in multi-process environments. Types include:
- Advisory Locking: Processes voluntarily check and honor locks (e.g., UNIX flock).
- Mandatory Locking: The OS enforces lock restrictions at the kernel level.
Locks can be:
- Shared (Read) Lock: Multiple readers allowed, no writers.
- Exclusive (Write) Lock: Only one writer, no readers.
Challenges include deadlocks, starvation, and lock contention. Advanced systems use byte-range locking to lock only parts of a file.
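A minimal sketch of shared and exclusive locks with flock on a UNIX-like system (config.dat is a placeholder name):

```c
#include <stdio.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

int main(void) {
    int fd = open("config.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Shared lock: many readers may hold it simultaneously. */
    flock(fd, LOCK_SH);
    /* ... read the file ... */

    /* Upgrade to an exclusive lock: blocks until every other
     * reader and writer has released its lock. */
    flock(fd, LOCK_EX);
    /* ... write the file ... */

    flock(fd, LOCK_UN);     /* release */
    close(fd);
    return 0;
}
```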
15. What is the difference between block devices and character devices, and how does the OS manage I/O for each?
Block Devices:
- Store and transfer data in fixed-size blocks (e.g., HDDs, SSDs, USB drives).
- Support random access, meaning data can be read/written in any order.
- The OS manages them using block device drivers, buffer caches, and I/O schedulers (e.g., elevator algorithm) to optimize throughput.
Character Devices:
- Handle data as a continuous stream of characters (e.g., keyboards, mice, serial ports).
- Support only sequential access, with little or no buffering.
- The OS uses character device drivers and simple interrupt-driven I/O to handle real-time data input/output.
In short, the OS uses different device drivers and I/O request queues for each class, optimizing access patterns and caching policies accordingly.
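A small sketch that asks the kernel which class a device node belongs to (the /dev paths are typical Linux examples):

```c
#include <stdio.h>
#include <sys/stat.h>

static void classify(const char *path) {
    struct stat st;
    if (stat(path, &st) == -1) { perror(path); return; }
    if (S_ISBLK(st.st_mode))       printf("%s: block device\n", path);
    else if (S_ISCHR(st.st_mode))  printf("%s: character device\n", path);
    else                           printf("%s: not a device node\n", path);
}

int main(void) {
    classify("/dev/sda");    /* typically a block device (disk)     */
    classify("/dev/tty");    /* a character device (terminal)       */
    return 0;
}
```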
16. What are the common disk I/O scheduling algorithms, and how do they compare?
I/O scheduling determines the order in which I/O requests are serviced. Common algorithms:
- FCFS: Simple, fair, but may cause long delays.
- SSTF (Shortest Seek Time First): Reduces seek time but may starve distant requests.
- SCAN & C-SCAN: Move disk head in a sweeping motion, reducing variance in response time.
- Deadline Scheduling: Prioritizes requests with time constraints.
Choosing the right algorithm depends on the workload: databases may benefit from SSTF or deadline scheduling, while general-purpose systems may prefer SCAN for fairness.
17. How does the OS implement buffer cache and page cache in file systems?
The buffer cache stores disk blocks in RAM for faster access, while the page cache keeps entire file pages. Modern OSes integrate these caches for unified management, reducing redundant copies. The write-back policy improves performance by delaying writes, while write-through ensures data integrity. Cache replacement uses algorithms like LRU (Least Recently Used) to evict old entries.
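A toy LRU eviction sketch for a three-slot cache of block numbers (slot count and reference string are arbitrary):

```c
#include <stdio.h>

#define SLOTS 3

/* Tiny LRU cache of disk block numbers: each slot remembers when it
 * was last touched; the stalest slot is evicted on a miss. */
static int block[SLOTS], stamp[SLOTS], clock_tick = 0;

static void access_block(int b) {
    int victim = 0;
    for (int i = 0; i < SLOTS; i++) {
        if (block[i] == b) {            /* cache hit: refresh timestamp */
            stamp[i] = ++clock_tick;
            printf("hit  block %d\n", b);
            return;
        }
        if (stamp[i] < stamp[victim]) victim = i;   /* track LRU slot */
    }
    printf("miss block %d, evicting block %d\n", b, block[victim]);
    block[victim] = b;                  /* replace least recently used */
    stamp[victim] = ++clock_tick;
}

int main(void) {
    for (int i = 0; i < SLOTS; i++) block[i] = -1;  /* empty cache */
    int refs[] = {1, 2, 3, 1, 4, 2};
    for (int i = 0; i < 6; i++) access_block(refs[i]);
    return 0;
}
```

Note how block 1 survives the eviction at the fifth reference because its recent hit refreshed its timestamp, while block 2, untouched since the start, is evicted.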
18. Discuss the concept of sparse files and how file systems store them efficiently.
Sparse files contain empty regions that do not consume physical disk space. Instead, the file system maintains metadata indicating which parts are unallocated. When read, the OS fills gaps with zeros. This technique saves space for large but partially filled files (e.g., VM disk images, database backups). However, improper handling can lead to data loss during compression or backup if tools do not preserve sparsity.
19. Explain Direct Memory Access (DMA) and its advantages in I/O operations.
Direct Memory Access (DMA) is an I/O mechanism where a device can transfer data directly between main memory and itself without continuous CPU involvement. The CPU only initializes the DMA controller (start address, size, direction), after which the controller handles the transfer. An interrupt notifies the CPU when the operation finishes.
Advantages of DMA
- CPU Offloading: The CPU is free for computation instead of copying data.
- High Throughput: Essential for high-speed devices (e.g., SSDs, GPUs, network cards).
- Reduced Latency & Overhead: Minimizes multiple data copies between device, memory, and CPU.
- Parallelism: I/O and computation can overlap.
Challenges
- Contention: Multiple DMA devices competing for memory bandwidth.
- Virtual Memory Issues: DMA works with physical addresses; OS must translate from virtual to physical.
- Cache Coherence: Ensuring CPU cache and memory stay consistent when DMA bypasses caches.
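A heavily hedged sketch of how a driver might program a DMA controller; the register layout below is entirely hypothetical and stands in for what a real device's datasheet would specify:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical DMA controller register layout -- real layouts come
 * from the device datasheet; this is only an illustration. */
struct dma_regs {
    volatile uint64_t src_addr;   /* physical source address */
    volatile uint64_t dst_addr;   /* physical destination    */
    volatile uint32_t length;     /* bytes to transfer       */
    volatile uint32_t control;    /* bit 0 = start           */
};

/* The CPU's entire job: program the controller and kick it off.
 * The controller then moves the data by itself and raises an
 * interrupt on completion, so no CPU cycles are spent copying. */
static void start_dma(struct dma_regs *dev, uint64_t src, uint64_t dst,
                      uint32_t len) {
    dev->src_addr = src;   /* the OS must supply *physical* addresses, */
    dev->dst_addr = dst;   /* translated from the process's virtual    */
    dev->length   = len;   /* addresses and pinned in RAM              */
    dev->control  = 1;     /* start bit: transfer now runs without the CPU */
}

int main(void) {
    struct dma_regs fake = {0};   /* stand-in for memory-mapped registers */
    start_dma(&fake, 0x100000, 0x200000, 4096);
    printf("transfer of %u bytes programmed\n", fake.length);
    return 0;
}
```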
20. How does the OS ensure data consistency in a multi-user environment with simultaneous I/O requests?
The OS ensures data consistency through:
- File locking mechanisms (advisory, mandatory, byte-range).
- Atomic writes for small data updates.
- Journaling and write-ahead logging to recover from crashes.
- I/O scheduling to prioritize conflicting requests.
In distributed file systems, additional cache coherency protocols and lease-based locking are implemented to avoid stale reads and overwrites.
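A minimal sketch of the classic atomic-update pattern underlying the "atomic writes" point above, assuming a UNIX-like system (config and config.tmp are placeholder names):

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write-temp-then-rename: readers of "config" always see either the
 * complete old version or the complete new one, never a half-written
 * file, because rename() replaces the name atomically. */
int main(void) {
    int fd = open("config.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    write(fd, "new contents\n", 13);
    fsync(fd);                      /* make data durable before the switch */
    close(fd);

    if (rename("config.tmp", "config") == -1) { perror("rename"); return 1; }
    return 0;
}
```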