
Using mmap means that you need to be able to handle memory access exceptions when a disk read or write fails. Examples of disk accesses that fail include reading from a file on a Wifi network drive, a USB device whose cable suddenly loses its connection when it is jiggled, or even a removable USB drive where all disk reads fail after it sees one bad sector. If you're not prepared to handle a memory access exception when you access the mapped file, don't use mmap.



Ah, reminds me of 'Are You Sure You Want to Use MMAP in Your Database Management System? (2022)' https://db.cs.cmu.edu/mmap-cidr2022/

Ah yes, the ever popular "mongoDB's developers were incompetent therefore mmap is bad" paper.

Pure tripe. https://www.symas.com/post/are-you-sure-you-want-to-use-mmap...


You can even mmap a socket on some systems (iOS and macOS via GCD). But doing that is super fragile. Socket errors are swallowed.

My interpretation was always that mmap should only be used for immutable, local files. You may still run into issues with those types of files, but it’s very unlikely.


mmap is also good for passing shared memory around.

(You still need to be careful, of course.)
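As a rough illustration of the shared-memory use, here is a minimal sketch assuming a POSIX system that supports MAP_ANONYMOUS (Linux, the BSDs, macOS). The helper name roundtrip_through_child is made up for the example. A child process writes into an anonymous shared mapping and the parent reads the value back after waitpid().

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a forked child stores `value` in shared memory.
 * Returns what the parent then reads, or -1 on error. */
static int roundtrip_through_child(int value) {
    assert(value >= 0);               /* -1 is reserved for errors */

    /* MAP_SHARED | MAP_ANONYMOUS: no file behind it, but the pages
     * stay shared across fork() instead of being copy-on-write. */
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {                   /* child: write and leave */
        *shared = value;
        _exit(0);
    }

    int status;
    int seen = -1;
    /* waitpid() doubles as the synchronization point here; concurrent
     * access would need a real lock or atomics. */
    if (waitpid(pid, &status, 0) == pid)
        seen = *shared;
    munmap(shared, sizeof *shared);
    return seen;
}
```

The "be careful" part is mostly about synchronization: nothing in the mapping itself orders the child's write against the parent's read.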


It’s also great for when you have a lot of data on local storage, and a lot of different processes that need to access the same subset of that data concurrently.

Without mmap, every process ends up caching its own private copy of that data in memory (think fopen, fread, etc). With mmap, every process accesses the same cached copy of that data directly from the FS cache.

Granted this is a rather specific use case, but for this case it makes a huge difference.
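A minimal sketch of what each of those processes would do, assuming POSIX mmap. The function name map_file_readonly is invented for the example. Every process that maps the same file this way reads from the same page-cache pages rather than from a private buffer.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps a whole file read-only.
 * Returns NULL on error or if the file is empty. */
static const void *map_file_readonly(const char *path, size_t *len_out) {
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    /* Read-only file mapping: the pages come from the kernel's cache
     * of the file, not from a copy owned by this process. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping outlives the fd */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

The caller is expected to munmap() the region when done. Any access through the returned pointer can still fault if the underlying read fails, which is the caveat the rest of this thread is about.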


C doesn't have exceptions, do you mean signals? If not, I don't see how that is any different from having to handle I/O errors from write() and/or open() calls.

Yes, it’s the SIGBUS signal.

It's very different, since your signal handler is called asynchronously at random points in your program, you can only do a very limited set of async-signal-safe things there, and the flow of control in your I/O and logic code has no idea it's happening.

tldr; it's very different.
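To make the difference concrete, here is a minimal sketch of one common way to turn SIGBUS into an error return: wrap the access in sigsetjmp() and siglongjmp() out of the handler. All names (bus_jmp, bus_armed, on_sigbus, install_sigbus_handler, guarded_copy) are invented for the example, and it assumes a single-threaded program, since there is only one global jump buffer.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_jmp;                 /* where to resume after a fault */
static volatile sig_atomic_t bus_armed;    /* set only around guarded access */

static void on_sigbus(int sig) {
    (void)sig;
    if (bus_armed)
        siglongjmp(bus_jmp, 1);            /* unwind into guarded_copy() */
    abort();                               /* a SIGBUS we did not expect */
}

static int install_sigbus_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Copies n bytes out of a mapped region.
 * Returns 0 on success, -1 if the access raised SIGBUS. */
static int guarded_copy(void *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    /* savemask = 1 so the jump also unblocks SIGBUS again */
    if (sigsetjmp(bus_jmp, 1) != 0) {
        bus_armed = 0;
        return -1;
    }
    bus_armed = 1;
    memcpy(dst, src, n);                   /* may fault on a failed read */
    bus_armed = 0;
    return 0;
}
```

The handler jumps out rather than returning because a handler that simply returns typically resumes at the faulting instruction, which then faults again. Even with this in place, only accesses that go through guarded_copy() are protected; any other code that was handed the raw pointer is not.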


Well, at least in this case the timing won't be arbitrary. Execution will have blocked waiting on the read, and you will (AFAIK) receive the signal promptly. Since the code in question was doing IO that you knew could fail, handling the situation can be as simple as setting a flag from within the signal handler.

I'm unclear what would happen in the event you had configured the mask to force SIGBUS to a different thread. Presumably undefined behavior.

> If multiple standard signals are pending for a process, the order in which the signals are delivered is unspecified.

That could create the mother of all edge cases if a different signal handler assumed the variable you just failed to read into was in a valid state. More fun footguns, I guess.


> Since the code in question was doing IO that you knew could fail, handling the situation can be as simple as setting a flag from within the signal handler.

If you are using mmap like malloc (as the article does) you don't necessarily know that you are "reading" from disk. You may have passed the disk-backed pointers to other code. The fact that malloc and mmap return the same type of values is what makes mmap in C so powerful AND so prone to issues.


Yes, and for writing (the example is read-write) it's of course yet another kettle of fish. The error might never get reported at all. Or you might get a SIGBUS (at least with sparse files).
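For the write side, the closest thing to an explicit error check is msync() with MS_SYNC, which returns -1 if the kernel reports a write-back failure. A minimal sketch, assuming POSIX. The name store_and_sync is made up, and whether every failure is actually reported this way depends on the OS and filesystem.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: writes n bytes at offset `off` of an existing
 * file through a shared mapping, then forces write-back.
 * Returns 0 on success, -1 with errno set on failure. */
static int store_and_sync(const char *path, size_t off,
                          const void *src, size_t n) {
    assert(path != NULL && src != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (off > len || n > len - off) {      /* would run past the mapping */
        close(fd);
        errno = EINVAL;
        return -1;
    }

    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    memcpy(map + off, src, n);             /* this store can still SIGBUS */

    /* MS_SYNC blocks until write-back completes or fails. */
    int rc = msync(map, len, MS_SYNC);
    int saved = errno;
    munmap(map, len);
    errno = saved;
    return rc;
}
```

Without the msync() call, the dirty pages are written back whenever the kernel gets to it, and the program has no return value to inspect at all.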

Signals are extremely bad to work with. I would rather do error handling in JavaScript. It feels like trying to write low-level primitives in Rust or trying to learn C++. There are so many edge cases that I start questioning what I am doing with my life.

> file on a Wifi network drive,

I would simply not mmap this.

> If you're not prepared to handle a memory access exception when you access the mapped file, don't use mmap.

fread can fail too. I don't know why you would be prepared for one and not the other.
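For comparison, this is roughly what being prepared for an fread failure looks like: the error comes back as a return value at the call site. A minimal sketch; the helper name read_block and its return convention are made up for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads exactly n bytes from f into dst.
 * Returns 0 on success, 1 on a short read at end of file,
 * and -1 if the stream reports an I/O error. */
static int read_block(FILE *f, void *dst, size_t n) {
    assert(f != NULL && dst != NULL);
    size_t got = fread(dst, 1, n, f);
    if (got == n)
        return 0;
    return ferror(f) ? -1 : 1;
}
```

The caller decides what to do with -1 right where the read happens, with the normal control flow intact.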


Because you're way deep down the call stack in some function that happened to take in a pointer, far far away from the code that opened the file.

If that's your program design then fread is not a substitute. Because you would need to pass in the FILE* pointer to all those calls.

And what are you hoping to do in those call stacks when you find an error? Can any of that logic hope to do anything useful if it can't access this data? Let the OS handle this: crash your program and restart.


Do these really ever result in access failures instead of just hangs? How are they surfaced to processes?

In my experience, all these things just cause whatever process is memory mapping to freeze up horribly and make me regret ever using a network file system or external hard drive.


Depends on the implementation.

Most I/O calls return errors when reads or writes fail, but NFS, for example, would traditionally block on network errors by default — you probably don't want your entire lab full of diskless workstations to kernel panic every time there's a transient network glitch.

You also have the issue of multiple levels of caching and when and how to report delayed errors to programs that don't explicitly use mechanisms like fsync.



