Thread: Are there any reasons not to use "mmap" to read files?

  1. #1
    Registered User
    Join Date
    Oct 2021
    Posts
    138

    Are there any reasons not to use "mmap" to read files?

    I'm planning to change my program to use "mmap" to read the contents of a file rather than "fgetc", because I learned that "mmap" can be faster. The thing is, are there any problems that can occur when using "mmap"? I need to know now, because this change means changing the design of the program, which is not a pleasant thing to do, so I want to be sure that I won't have to change it back in the future (when the project will be even bigger).

  2. #2
    Registered User
    Join Date
    Sep 2020
    Posts
    150
    One problem with mmap is that it is not standard, it's Linux only.
    If you don't bother about compatibility then go ahead.

    BTW. Have you decided to use C instead of C++ ?
    I noticed you also have some posts about C++

  3. #3
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by thmm View Post
    One problem with mmap is that it is not standard, it's Linux only.
    If you don't bother about compatibility then go ahead.
    Thank you! You are actually right, I totally forgot about that! I want my program to be cross-platform, and I suppose other platforms let you do the same thing with their own memory-mapping functions, but I would have to learn about those too. In any case, I could probably use a different method on each platform, to at least gain the extra speed wherever it is supported. I suppose other Unix platforms support this as well (I checked FreeBSD and it does), so it comes down to Windows and macOS. I'm going to write a compiler and speed is of highest importance!

    Quote Originally Posted by thmm View Post
    BTW. Have you decided to use C instead of C++ ?
    I noticed you also have some posts about C++
    Thank you for bothering to check! If I remember correctly, I asked about modules in C++, but it turned out that I couldn't find a way to do what I wanted. I started my project in C, then converted it to D, and then after some development tried to go back to C because of TCC's compilation times. It turns out that converting is a pain in the ass at this point, not because I'm using advanced D features that C doesn't have, but because of the header files.

    So at this point I'm using D and I will keep it this way probably unless someone else can suggest something (truly) better. Of course when I finish the compiler and it becomes stable, I will try to bootstrap as soon as possible but this will take years....

  4. #4
    Kiss the monkey. CodeMonkey's Avatar
    Join Date
    Sep 2001
    Posts
    937
    mmap() is standardized by POSIX: mmap

    So, it's not included in the C standard library, but it is available on Linux, MacOS, AIX, etc.

    mmap() tells the system to map a file to the process's virtual memory. read() and its derivatives (fgetc(), etc.) get bytes from a file and copy them into a user-supplied buffer. In the ideal case, mmap() can allow for faster access because you don't need to copy into an intermediate buffer. Also, the C standard functions fgetc() and friends operate on FILE streams, not directly on files. In that case you're copying from the file into a FILE buffer and then possibly into a user-supplied buffer.
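
    A minimal sketch of the difference described above, assuming a POSIX system. The function names count_newlines_stdio and count_newlines_mmap are made up for illustration, and counting newlines is just a stand-in for whatever the program does with each byte. The first version goes through a FILE buffer with fgetc(); the second reads the mapped pages in place.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count newline bytes via stdio.
 * Each byte travels from the kernel into the FILE buffer, then into c. */
long count_newlines_stdio(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long count = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            count++;

    fclose(fp);
    return count;
}

/* Hypothetical helper: count newline bytes via mmap().
 * The file's pages are accessed directly, with no intermediate buffer. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            count++;

    munmap((void *)data, len);
    return count;
}
```

    Both functions return the same count for the same file, or -1 on error. Note how much more error handling the mmap() version needs (the empty-file case, MAP_FAILED, munmap()), and that it won't compile on a platform without <sys/mman.h>.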

    mmap() isn't certain to make your program faster, but it will make your program more complicated and less portable.

    If your program is heavy on I/O and a profiler tells you that you're spending a lot of time in read/write syscalls, then you might consider switching to mmap() and see whether performance improves. Otherwise, I'd stick with the standard functions.
    "If you tell the truth, you don't have to remember anything"
    -Mark Twain

  5. #5
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,675
    > Of course when I finish the compiler and it becomes stable
    If you continue to chase the performance rabbit through the warren of twists, turns and dead-ends, then you're never going to finish (forget about stable).

    I mean, you've already bounced through C++ -> C -> D -> ? (oh, I've lost track).

    You have to make it 'right' before you can even think of making it 'fast'.
    Oh and before you start trying to make it 'fast', make sure you have a solid test suite to check you didn't break anything.
    Not to mention having your source code in the likes of git, so you can have stable, development and test branches.

    > I'm going to write a compiler and speed is of highest importance!
    Speed ain't worth spit if it doesn't exist, or it doesn't work.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  6. #6
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    Quote Originally Posted by CodeMonkey View Post
    mmap() is standardized by POSIX: mmap

    So, it's not included in the C standard library, but it is available on Linux, MacOS, AIX, etc.

    mmap() tells the system to map a file to the process's virtual memory. read() and its derivatives (fgetc(), etc.) get bytes from a file and copy them into a user-supplied buffer. In the ideal case, mmap() can allow for faster access because you don't need to copy into an intermediate buffer. Also, the C standard functions fgetc() and friends operate on FILE streams, not directly on files. In that case you're copying from the file into a FILE buffer and then possibly into a user-supplied buffer.

    mmap() isn't certain to make your program faster, but it will make your program more complicated and less portable.

    If your program is heavy on I/O and a profiler tells you that you're spending a lot of time in read/write syscalls, then you might consider switching to mmap() and see whether performance improves. Otherwise, I'd stick with the standard functions.

    Thank you! I understand your points and I agree! I will keep using "read"! Have a nice day my friend!

  7. #7
    Registered User
    Join Date
    Oct 2021
    Posts
    138
    > If you continue to chase the performance rabbit through the warren of twists, turns and dead-ends, then you're never going to finish (forget about stable).
    Yeah, I have this problem! I tried to write the frontend, but I have no experience, and I tried to figure out how to do things by myself like people did in the "good ol' days"! However, it seems that my code is very annoying to work with. My design is probably bad, and I have to properly read something and learn how to implement a frontend that I will be able to work with. I will ask a question to see if people can suggest any good sources to learn from, because at this point my code makes me wanna give up: there are bugs, things don't work as expected, and fixing them every time is very annoying and takes a lot of work. If I keep it this way, I can't imagine how much worse it will be when the parser gets bigger and more complete and I add the backend. So yeah, I need to change this ASAP!

    > I mean, you've already bounced through C++ -> C -> D -> ? (oh, I've lost track).
    Actually, I first wrote the source code in C (not C++), then changed it to D so I wouldn't have to mess with header files (and, well, at that time also because of the OOP, but I don't care about that anymore), because the source code was getting bigger and dealing with the header files was getting out of hand. Then, some days ago, I thought about switching back to C, because it turns out that I don't use or need anything from D other than its module system, and because I'm OBSESSED with TCC and its super fast compilation times (which is what inspired me to make my own compiler). I spent some time converting the project and still didn't finish, because I realized that dealing with header files is HELL, so I stayed with D and, like I said, will probably stay here unless someone can offer something better. But again, I only use a LIMITED set of features, which will also be supported in my own language, so the bootstrap becomes easy when the time comes.

    > You have to make it 'right' before you can even think of making it 'fast'.
    > Speed ain't worth spit if it doesn't exist, or it doesn't work.
    Thank you! Love the blackpill truths right there! I realized that my question was stupid shortly after posting it, so I continued with "read" anyway, and now I'm just replying to everyone because of course I will not just ignore you.

    > Oh and before you start trying to make it 'fast', make sure you have a solid test suite to check you didn't break anything.
    > Not to mention having your source code in the likes of git, so you can have stable, development and test branches.
    You found another weak point of mine! I also need to learn how to create good test suites, because I'm a n00b on that one (and from what I've heard, creating good tests is not an easy task for anyone, and each program is different, so there are no rules that work everywhere). When it comes to git, at least I can use it for the most basic stuff (and I keep learning more), so I'm fine with this. And for now, even tho I have nothing serious, I uploaded the code to a private online repo on GitLab just to be safe. So thank you for the advice!

    Tbh, the problem is that I want to make something so big and complex without having a lot of experience first. The problem is that in the whole 2 years that I've been "playing" (because let's be honest, that's what I did and am still doing), I have found NOTHING that excites me about programming and that I PERSONALLY want to do. I'm talking about systems programming, web dev, game dev, desktop applications, mobile applications, AIs, hacking etc. Like... NOTHING! Do you believe this? Only making my own compiler excites me as a project!

    Maybe I haven't found anything yet, or, as I'm starting to think, the problem isn't that I don't like anything but that I don't like the tools (programming languages) used to make them. I really don't know. I have been dreaming about making my own language since I started (well, some months later, but close enough), and now I'm kinda excited that I'm getting closer and closer and that at least now I understand the theory behind it. I just have to first learn:

    how to create a parser.
    how to create a lexer.
    Assembly (x86-64, ARM for mobile, and RISC-V, as I believe it will take over in 1 or 2 decades) to get an "easy" introduction to how instructions work.
    Machine language (for all the platforms that I mentioned).

    And this is just the general picture; the real small problems that I will face will be huge! I mean, we can talk about them, but you don't want to know, trust me! And on top of all those (few and super easy) things, I will also have to actually design a language that people will find useful and enjoyable to use (because if the language is not enjoyable to use, then people can keep using C++, lol!) and that will offer something different. Which is itself a whole other separate and complex task!

    So yeah, I'm putting myself into very deep waters, so wish me luck on that one! I don't see myself finishing it in this decade, and I don't see myself getting interested in anything else either. I just wish C supported modules and I would stick with it! But tbh, a module system is very, very complicated to implement, so I can see why they didn't do it. Sorry for this huge messy reply! It represents my programming skills, LMAO!
