The Inner Workings of File Handling: Exploring Open, Read, Write and Close Operations
Ever wonder why we have to open a file before reading or writing to it, and why we close it afterward?
Why can't we read or write directly?
Let us learn what happens when you perform these file operations.
File Handling in Python
# open a file in read and write mode
f = open('test.txt', 'r+')
# read the contents of the file
file_contents = f.read()
# write to the test.txt
f.write("This is test file")
print(file_contents)
# close the file
f.close()
These file operations look much the same in most programming languages because the language's file functions are wrappers that ultimately execute the operating system's system calls under the hood.
So, let's dig into the open, read, write and close system calls of the operating system.
Open
int open(const char *pathname, int flags, mode_t mode );
pathname: Path to the file that you want to open
flags: File access mode (Eg: O_RDWR: open for reading and writing, O_RDONLY: open for reading only, etc.)
mode: Permissions given to the file if open creates it, that is, when O_CREAT is set in flags (Eg: 0744)
On success, the open system call returns a non-negative integer called a file descriptor. On failure it returns -1.
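To see the descriptor itself from Python, here is a minimal sketch using os.open, the thin wrapper around the open system call. It assumes a POSIX-like system; the temporary directory and the helper name open_and_report are only for illustration and are not part of the original example.

```python
import os
import tempfile


def open_and_report(path):
    """Open `path` with os.open and return the descriptor number it got."""
    # O_CREAT makes the third argument (the mode, 0o644 here) meaningful:
    # it is applied only if the call has to create the file.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        return fd
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    fd = open_and_report(os.path.join(tmp, "test.txt"))
    print(type(fd).__name__, fd >= 0)  # int True

    # On failure the C call returns -1; Python raises OSError instead.
    open_failed = False
    try:
        os.open(os.path.join(tmp, "missing.txt"), os.O_RDONLY)
    except FileNotFoundError as exc:
        open_failed = True
        print("open failed with errno", exc.errno)
```

The built-in open() used earlier goes through the same system call; f.fileno() returns the descriptor it is holding.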
WHAT is a file descriptor?
A file descriptor is an index into the file descriptor table (an array) that the operating system keeps for each process. Each element of this table holds a pointer to an entry in the open file table.
Before understanding why a file descriptor is needed, we need to learn about the open file table and inode table.
Open file table
The Open File Table is a system-wide data structure maintained by the operating system to keep track of all files currently open by any process. Each entry in this table contains crucial information about an open file, such as the file offset, the access mode it was opened with and a pointer to the file's inode.
The file offset is an integer value that represents the number of bytes from the beginning of the file to the current position.
When a file is first opened, the file offset is typically set to zero. When data is read or written, the file offset is incremented by the number of bytes transferred.
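The offset can be observed directly. The sketch below, assuming a POSIX-like system, asks the kernel for the current position with os.lseek(fd, 0, os.SEEK_CUR) before and after two reads. The helper name offsets_after_reads and the temporary file are illustrative only.

```python
import os
import tempfile


def offsets_after_reads(path):
    """Return the file offset before any read, after 4 bytes, after 6 more."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking 0 bytes from the current position just reports the offset.
        offsets = [os.lseek(fd, 0, os.SEEK_CUR)]
        os.read(fd, 4)
        offsets.append(os.lseek(fd, 0, os.SEEK_CUR))
        os.read(fd, 6)
        offsets.append(os.lseek(fd, 0, os.SEEK_CUR))
        return offsets
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:
        f.write(b"This is test file")
    offsets = offsets_after_reads(path)
    print(offsets)  # [0, 4, 10]
```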
Inode table
The index node (inode) data structure contains the metadata of a file or directory.
An inode holds information such as size, timestamps and the locations of the file's data blocks on disk. A file's data is not necessarily stored in consecutive blocks, so the inode maintains multiple data block pointers to locate the file's data on the disk.
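Part of the inode's metadata is visible from user space through the stat family of calls. The following is a small sketch using os.stat; the helper name inode_summary is hypothetical, and the block pointers themselves are not exposed this way.

```python
import os
import tempfile


def inode_summary(path):
    """Collect a few inode fields that os.stat exposes for `path`."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,    # inode number within the filesystem
        "size": info.st_size,    # file size in bytes
        "mtime": info.st_mtime,  # last modification time (seconds since epoch)
    }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:
        f.write(b"This is test file")
    summary = inode_summary(path)
    print(summary)
```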
WHY a file descriptor?
When the open operation is performed, an index into the file descriptor table is returned. If the value returned is 3, then file_descriptor[3] holds the location of the corresponding open file table entry.
The open file table entry in turn points to the inode, and the inode contains the pointers to the file's data blocks.
So, starting from a file descriptor, the kernel can locate the data blocks on the disk.
A file descriptor therefore identifies an open file within a process. It must be passed to the read and write system calls so that they can locate the file's data.
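One way to make the descriptor -> open file table -> inode chain visible is to compare two descriptors that share an open file table entry with one that has its own entry. The sketch below assumes a POSIX-like system and uses os.dup, which is not part of the original example: a duplicated descriptor points at the same entry (and so the same offset), while a second open of the same file gets a fresh entry with its own offset.

```python
import os
import tempfile


def shared_vs_independent(path):
    """Read 5 bytes via one descriptor, then report the offsets seen by a
    duplicate of it and by a separately opened descriptor."""
    fd_a = os.open(path, os.O_RDONLY)
    fd_dup = os.dup(fd_a)              # same open file table entry as fd_a
    fd_b = os.open(path, os.O_RDONLY)  # new entry, same inode
    try:
        os.read(fd_a, 5)
        dup_offset = os.lseek(fd_dup, 0, os.SEEK_CUR)
        other_offset = os.lseek(fd_b, 0, os.SEEK_CUR)
        return dup_offset, other_offset
    finally:
        for fd in (fd_a, fd_dup, fd_b):
            os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:
        f.write(b"This is test file")
    result = shared_vs_independent(path)
    print(result)  # (5, 0)
```

All three descriptors lead to the same inode, which is why they see the same file contents even though only two of them share an offset.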
Read
ssize_t read(int fd, void buf[.count], size_t count);
The read system call reads up to count bytes from the file into the buffer buf and returns the number of bytes it actually read.
It needs the file descriptor (fd) to identify the file on which to perform the operation.
f = open('test.txt', 'r+')
# file_contents is the buffer
file_contents = f.read()
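For comparison, the same read can be done one level lower with os.read, which takes the descriptor and a byte count just like the system call. This is a minimal sketch assuming a POSIX-like system; read_all and the chunk size of 8 are illustrative choices.

```python
import os
import tempfile


def read_all(path, chunk_size=8):
    """Read a whole file with repeated os.read calls on a raw descriptor."""
    fd = os.open(path, os.O_RDONLY)
    chunks = []
    try:
        while True:
            # Returns at most chunk_size bytes; b"" signals end of file.
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:
        f.write(b"This is test file")
    contents = read_all(path)
    print(contents)  # b'This is test file'
```

The loop matters because a single read may return fewer bytes than requested.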
Write
ssize_t write(int fd, const void buf[.count], size_t count);
The write system call writes up to count bytes from the buffer buf to the file referred to by the file descriptor (fd).
It returns the number of bytes it has written.
# Buffer is memory location of "This is test file"
f.write("This is test file")
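The return value of write is easy to inspect with os.write, which operates on a raw descriptor. The sketch below assumes a POSIX-like system; write_message is a hypothetical helper, and for a short write to a regular file the whole buffer is normally written in one call, although the system call is allowed to write less.

```python
import os
import tempfile


def write_message(path, message):
    """Write `message` to `path` via os.write and return the byte count."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        data = message.encode("utf-8")  # the system call works on bytes
        return os.write(fd, data)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    written = write_message(path, "This is test file")
    print(written)  # 17
    with open(path, "rb") as f:
        stored = f.read()
```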
Close
int close(int fd);
The close system call removes the file descriptor's entry from the process's file descriptor table. The open file table entry is freed only when no other descriptor still refers to it, and the in-memory inode is released once nothing references it. The file and its data on disk are not removed by close.
The fd number can be reused by a later open after the close system call.
If we don't close files, the process can reach its limit on open file descriptors, and further open calls then fail until some descriptor is closed.
# close the file
f.close()
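Both effects of close, the descriptor becoming invalid and its number becoming available again, can be checked with a short sketch. It assumes a POSIX-like system and a single-threaded process (another thread opening a file in between could take the freed number); the helper names are illustrative.

```python
import errno
import os
import tempfile


def use_after_close(path):
    """Return the errno raised when reading from a closed descriptor."""
    fd = os.open(path, os.O_RDONLY)
    os.close(fd)
    try:
        os.read(fd, 1)
    except OSError as exc:
        return exc.errno
    return None


def reopened_descriptor(path):
    """Open, close, and open again; return both descriptor numbers."""
    first = os.open(path, os.O_RDONLY)
    os.close(first)
    # open hands out the lowest unused number, which is now `first` again.
    second = os.open(path, os.O_RDONLY)
    os.close(second)
    return first, second


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:
        f.write(b"This is test file")
    err = use_after_close(path)
    first, second = reopened_descriptor(path)
    print(err == errno.EBADF, first == second)  # True True
```

In everyday Python, writing `with open(...) as f:` closes the file automatically when the block ends, so the descriptor is released even if an exception occurs.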
Predefined File Descriptors
By convention, file descriptors 0, 1 and 2 are already open when a program starts and refer to stdin, stdout and stderr respectively.
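A quick way to confirm the numbering is to write to descriptors 1 and 2 directly and capture each stream separately. The sketch below runs a small child Python process for that purpose; the CHILD snippet and the run_child helper are illustrative, and it assumes the standard streams are wired up in the usual way.

```python
import subprocess
import sys

# Child program: write straight to descriptor 1 and descriptor 2,
# bypassing print() and the sys.stdout / sys.stderr objects.
CHILD = "import os; os.write(1, b'to stdout'); os.write(2, b'to stderr')"


def run_child():
    """Run the child and return what arrived on its stdout and stderr."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        check=True,
    )
    return result.stdout, result.stderr


out, err_stream = run_child()
print(out)         # bytes written to fd 1
print(err_stream)  # bytes written to fd 2
```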
References
open(2) - Linux manual page (man7.org)
close(2) - Linux manual page (man7.org)
Thanks for reading.
I would love to hear your thoughts and suggestions.