Let's say a client requests a page so the server begins to send the HTML. Does the server lock the file down?
Let's say there's a script attempting to modify an HTML file that is currently being sent. What happens in this scenario?
I would think a server would load the entire HTML file into memory before sending it (that's how I do it anyway, so that I can cache the files in a map). I don't think your question is actually server-specific, though, is it? This could apply to any file reading or writing, right?
With Win32, I know you specify the sharing mode you want as the third parameter (dwShareMode) of CreateFile when opening a file to read from. I assume most APIs do something similar in their lower layers automatically for the user?
None of the web servers I know of in POSIXy systems do.
There are actually two separate scenarios:
- A script modifies an HTML file (or any other file, for that matter) in place, while that file is being sent by the server
It is basically a race. It is completely plausible that the server sends an intermediate form of the file. Apache, for example, reads the file as it is being sent. This is good, because it allows clients like VLC to effectively stream media over plain HTTP, and byte-range (HTTP Range request) support even allows skipping and jumping to specific parts of the media without downloading the intervening data. However, it also means that the server's reads can be spread over a long time, and if the changes are slow too, the intermediate form is basically a mix of the old and the new data.
It should be noted that on non-Windows systems, it is typically the kernel that caches the file content, not the service itself. This way all available memory is fully utilized; if not for userspace programs, then for caching the files they use.
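To make the race concrete, here is a minimal sketch (not how Apache is actually implemented) that forces the unlucky ordering by hand: a "server" reads the first half of a file, a "script" overwrites the file in place, and the server then reads the rest. The file name, the contents, and the function name read_during_in_place_edit are all made up for illustration.

```python
import os
import tempfile


def read_during_in_place_edit():
    """Return what a slow reader sees when the file is overwritten mid-read."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "page.html")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"old-old-")

        # The "server" opens the file and sends the first half.
        fd = os.open(path, os.O_RDONLY)
        try:
            first = os.read(fd, 4)

            # The "script" now overwrites the same file in place
            # (same inode, same length, different bytes).
            with open(path, "r+b") as f:
                f.write(b"NEW-NEW-")

            # The server carries on from its current offset.
            second = os.read(fd, 4)
        finally:
            os.close(fd)
        return first + second


if __name__ == "__main__":
    print(read_during_in_place_edit())
```

The reader ends up with the first four bytes of the old contents followed by the last four bytes of the new contents, which is exactly the kind of mix described above. In real life the interleaving depends on timing, so the result is not predictable.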
- A script modifies a file by creating a new file first, then renames it over the old one
There are two sub-cases for this.
- If the underlying file system is Unix-like (EXT2, EXT3, EXT4, XFS, ZFS, BTRFS, etc.), based on the concept of inodes, a server that already has the file open keeps sending the old version of the file. This is because the open file descriptor refers to the inode; when the file is deleted or replaced, only the file name is removed, so the old contents can no longer be opened by name, but as long as there are open file descriptors, they can read (and even write) the contents normally. Assuming the script is sane and only replaces the old file after the new file has been fully written, the server will always send either the old version or the new version, never a mix of the two.
- There may be other filesystems where replacing the old file with the new one changes the content that already open file descriptors refer to. I am not sure if Linux supports any; the rename could be worked around in the virtual filesystem layer (basically postponing the file name change until there are no more open file descriptors), or even rejected (returning an EBUSY or ETXTBSY error). However, nobody sane uses these filesystems for serving files, as they tend to not support even rudimentary user/group file access controls.
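The first sub-case is easy to check for yourself. The sketch below assumes a POSIX system with an inode-based filesystem; the file names and the function name read_across_replace are hypothetical. It opens a file, replaces it by renaming a new file over it, and then reads both through the old descriptor and through a fresh open.

```python
import os
import tempfile


def read_across_replace():
    """Return (bytes seen via the old descriptor, bytes seen via a fresh open)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "page.html")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"old version")

        # The "server" opens the file before the replacement happens.
        fd = os.open(path, os.O_RDONLY)
        try:
            # The "script" writes a complete new file, then renames it
            # over the old name. os.replace() maps to rename() on POSIX.
            new_path = path + ".new"
            with open(new_path, "wb") as f:
                f.write(b"new version")
            os.replace(new_path, path)

            # The descriptor still refers to the old inode.
            from_open_fd = os.read(fd, 100)
        finally:
            os.close(fd)

        # Anyone opening the name now gets the new inode.
        with open(path, "rb") as f:
            from_fresh_open = f.read()
        return from_open_fd, from_fresh_open


if __name__ == "__main__":
    print(read_across_replace())
```

On Linux with one of the filesystems listed above, the old descriptor yields the old contents and the fresh open yields the new contents. I would not expect this to work unchanged on Windows, where replacing a file that is still open typically fails.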
This means that if your script or program modifies a file, it should always create a new file first with the modified contents, then replace the old file with the new one (renaming the new one over the old one).
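A minimal sketch of that write-then-rename pattern, assuming a POSIX system. The helper name atomic_write and the ".tmp-" prefix are my own choices for illustration, not anything standard. The temporary file is created in the same directory as the target so that the rename stays within a single filesystem.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) via write-then-rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Note: mkstemp() creates the file with mode 0600; a real tool would
    # probably copy the old file's permissions and ownership as well.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # readers see old or new, never a mix
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "page.html")  # hypothetical file name
        atomic_write(target, b"<p>first</p>")
        atomic_write(target, b"<p>second</p>")
        with open(target, "rb") as f:
            print(f.read())
```

Whether the fsync() is worth its cost depends on how much you care about surviving a crash or power loss; the rename alone is enough to keep concurrent readers from seeing a half-written file.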
This applies not just to HTTP servers, but everywhere on Linux and Unix. For example, if your script or application replaces a file this way, and the file happens to be open in nano, gedit, vim, emacs, or some other interactive editor, the editor can (and AFAIK does by default) detect the change, and will ask the user what to do about it. (Personally, I save using a new name, check the differences using diff -u, then re-edit.) This is also why typical utilities like sed -i et cetera edit files using exactly this approach.
This is fully compatible with userspace caching scenarios, too. The change detection is trivial: you simply compare the file properties obtained using fstat() when the file was read, to those of the target before overwrite, obtained using stat(). There is a small time window where a race is possible if there are two concurrent changers, but in Linux it is easy to close up using file leases. Leases and fanotify together allow e.g. transparent versioning management for configuration files where multiple admins (often as root) modify files, and do not bother to use the versioning tools. That goes way beyond your original question, though.
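Here is a rough sketch of that comparison, without the leases. Which properties to compare is my own choice (device, inode, size, and modification time); the function names signature, load_with_signature, and changed_since are hypothetical.

```python
import os
import tempfile


def signature(st):
    """Reduce a stat result to the fields used for change detection."""
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


def load_with_signature(path):
    """Read a file and remember the properties of what was actually read."""
    with open(path, "rb") as f:
        st = os.fstat(f.fileno())  # fstat() on the descriptor we read from
        return f.read(), signature(st)


def changed_since(path, sig):
    """True if the file now at `path` differs from the one read earlier."""
    return signature(os.stat(path)) != sig


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "page.html")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"old")
        data, sig = load_with_signature(path)
        print(changed_since(path, sig))

        # Someone else replaces the file by renaming a new one over it.
        with open(path + ".new", "wb") as f:
            f.write(b"something newer")
        os.replace(path + ".new", path)
        print(changed_since(path, sig))
```

A cache can call changed_since() before serving a stored copy, and a script can call it just before renaming its new file over the target, to notice that someone else got there first. As noted, this check alone still leaves a small window between the stat() and the rename.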
Interesting. Thank you for the replies you guys. I'll have to keep this topic as a reference.