Thread: How would I go about trying to write my own cache manager?

  1. #16
    Registered User | Join Date: Dec 2006 | Location: Canada | Posts: 3,229
    Speaking of latency and network protocols, a few months ago I started working on a new network file system that I thought had a lot of potential, but I just haven't had time to work on it since (I believe I finished read-only support). It was already much faster (as in more responsive) than FTP/SSHFS/SMB.

    The idea is to do server-assisted pre-fetching.

    The overwhelming majority of human use cases look like this:
    1. open a directory, get the listing
    2. stat every single file in that directory
    3. wait a second or two
    4. open a subdirectory and go back to step 1, until we find the file we are looking for
    5. open the file and read the entire file (because most of the time the file will be small)

    That is a lot of roundtrips, which is bad on high-latency networks (like the internet). That's why opening a directory on an FTP server over the internet usually takes a couple of seconds.

    By changing only the client side, we can try to pre-fetch and cache directory trees and metadata when the user opens a directory, but that still requires a minimum of 2 roundtrips per level (get the listing, then stat the files), assuming perfect parallelism (all stats retrieved at the same time, e.g. as in optimized SMB). Doing better than this requires server support.
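The roundtrip counting above can be made concrete with a small cost model. This is a sketch under simplifying assumptions: the naive client issues stats strictly one after another, the pipelined client issues all stats of a level in parallel, and the server-assisted first reply is assumed to carry metadata for the whole subtree. The function and strategy names are hypothetical, and bandwidth, server time, and the final file read are ignored.

```python
def roundtrips_to_descend(depth: int, entries_per_level: int, strategy: str) -> int:
    """Estimate roundtrips to open `depth` nested directories and stat their entries.

    Hypothetical cost model, not measurements:
      naive           - LIST, then one STAT per entry, each waiting on the last
      pipelined       - LIST, then all STATs issued in parallel (2 per level)
      server_assisted - one LIST-with-prefetch whose reply is assumed to
                        cover every level below it
    """
    if depth <= 0:
        return 0
    if strategy == "naive":
        return depth * (1 + entries_per_level)
    if strategy == "pipelined":
        return depth * (2 if entries_per_level > 0 else 1)
    if strategy == "server_assisted":
        return 1
    raise ValueError(f"unknown strategy: {strategy}")


def browse_latency(depth: int, entries_per_level: int, strategy: str, rtt_s: float) -> float:
    """Wall-clock wait in seconds, ignoring bandwidth and server processing time."""
    return roundtrips_to_descend(depth, entries_per_level, strategy) * rtt_s


if __name__ == "__main__":
    for s in ("naive", "pipelined", "server_assisted"):
        print(s, roundtrips_to_descend(3, 15, s), round(browse_latency(3, 15, s, 0.1), 2))
```

For example, descending 3 levels with 15 entries per level costs 48 roundtrips naively, 6 with parallel stats, and 1 with server-assisted prefetch; at a 100 ms roundtrip that is roughly 4.8 s, 0.6 s and 0.1 s.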

    What if the client could make a request like this: "give me a directory listing of this directory, and 64KB of anything (metadata and data) under the directory that you think I may need next"?
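One way to put such a request on the wire is sketched below. The layout (a 1-byte opcode, a 4-byte prefetch budget, and a length-prefixed UTF-8 path, all in network byte order) and the names `OP_LIST_WITH_PREFETCH`, `encode_list_request` and `decode_list_request` are hypothetical; the post does not specify a message format.

```python
import struct

# Hypothetical wire format (not part of any existing protocol):
#   u8  opcode           (1 = LIST_WITH_PREFETCH)
#   u32 prefetch budget  in bytes (e.g. 64 KiB)
#   u16 path length, followed by that many UTF-8 path bytes
OP_LIST_WITH_PREFETCH = 1
_HEADER = struct.Struct("!BIH")  # network byte order, no padding: 7 bytes


def encode_list_request(path: str, budget: int = 64 * 1024) -> bytes:
    raw = path.encode("utf-8")
    if len(raw) > 0xFFFF:
        raise ValueError("path too long for a u16 length field")
    return _HEADER.pack(OP_LIST_WITH_PREFETCH, budget, len(raw)) + raw


def decode_list_request(msg: bytes) -> tuple:
    """Return (path, budget) from an encoded request."""
    opcode, budget, plen = _HEADER.unpack_from(msg, 0)
    if opcode != OP_LIST_WITH_PREFETCH:
        raise ValueError(f"unexpected opcode {opcode}")
    start = _HEADER.size
    if len(msg) < start + plen:
        raise ValueError("truncated request")
    return msg[start:start + plen].decode("utf-8"), budget
```

Putting the budget in the request lets the client, not the server, decide how much bandwidth it is willing to trade for latency.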

    In most cases (directory trees with just 10-20 files/subdirectories under each directory), metadata (directory listings and stats of all files) of everything (recursive) underneath the directory can be returned in the initial call to get directory listing.

    In some cases (directory with small files), the entire directory, including file contents, can be returned in the initial get directory listing request.
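The server side of this idea can be sketched as a budgeted breadth-first walk: first ship metadata for the entries closest to the requested directory, then, if budget remains, the contents of small files. The in-memory tree (nested dicts for directories, `bytes` for files), the assumed per-entry `META_OVERHEAD` of 32 bytes, and the rule of filling leftover budget with files in breadth-first order are all assumptions for illustration, not the original implementation's policy.

```python
from collections import deque

META_OVERHEAD = 32  # assumed bytes per metadata record (size, mtime, mode, ...)


def pack_prefetch(tree: dict, root: str, budget: int):
    """Choose what to send along with the listing of `root`, within `budget` bytes.

    `tree` maps names to a nested dict (directory) or bytes (file contents).
    Returns (metadata, contents): metadata is a list of (path, kind, size)
    in breadth-first order; contents maps path -> bytes for files that fit.
    """
    metadata, files, used = [], [], 0
    queue = deque([(root.rstrip("/") or "/", tree)])
    # Pass 1: metadata, nearest entries first.
    while queue:
        dir_path, node = queue.popleft()
        for name in sorted(node):
            child = node[name]
            path = f"{dir_path.rstrip('/')}/{name}"
            cost = META_OVERHEAD + len(path.encode("utf-8"))
            if used + cost > budget:
                return metadata, {}  # budget spent on metadata alone
            used += cost
            if isinstance(child, dict):
                metadata.append((path, "dir", 0))
                queue.append((path, child))
            else:
                metadata.append((path, "file", len(child)))
                files.append((path, child))
    # Pass 2: file contents in the same order, skipping any that don't fit.
    contents = {}
    for path, data in files:
        if used + len(data) <= budget:
            contents[path] = data
            used += len(data)
    return metadata, contents
```

Returning partial metadata when the budget runs out is deliberate here: the client can still fall back to ordinary LIST/STAT calls for anything that was not prefetched.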

    I'm surprised something like this (a network file system that wastes bandwidth to provide very good latency) doesn't already exist. I searched very hard before starting.

  2. #17
    MutantJohn (Registered User) | Join Date: Feb 2013 | Posts: 2,665
    Man, I don't know what I'm looking for next.

  3. #18
    laserlight (C++ Witch) | Join Date: Oct 2003 | Location: Singapore | Posts: 28,413
    Try writing your own cash manager (personal accounting tool)?

  4. #19
    Hodor (misoturbutc) | Join Date: Nov 2013 | Posts: 1,787
    Quote originally posted by laserlight:
    Try writing your own cash manager (personal accounting tool)?
    I can't think of anything more boring.

    Besides, who pronounces cache the same as cash?

  5. #20
    phantomotap (Master Apprentice) | Join Date: Jan 2008 | Posts: 5,108
    Quote (post #16):
    I'm surprised something like this (a network file system that wastes bandwidth to provide very good latency) doesn't already exist. I searched very hard before starting.
    O_o

    Are you a member of ACM (or otherwise have access to the archives)? If so, you can find papers on a multitude of approaches to caching and prefetch with some specifically addressing latency and predictive methods.

    [Edit]
    To be clear, you may also find these articles outside of ACM. I have no idea and am not advocating for ACM membership or anything.

    I'm just saying, I know ACM has some relevant resources in the event you wish to read more about the possibilities.
    [/Edit]

    That said, most implementations, in practice, do something remarkably similar "behind the scenes" in order to disguise high latency networks.

    For example, the NFS browser I have ([Removed: I already feel like I'm advocating an expensive tool.]; I don't mean "NFS for Windows Services" or whatever it is called) starts building up a cache using normal commands as you navigate the remote store. Sure, the first few seconds of navigation can be noticeably slow, but by the time you have read the `ls' output of an average directory, the cache is already being populated with more directories.
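The client-only approach described above (filling a cache with ordinary listing commands while the user navigates) might look roughly like this. `list_dir` is a hypothetical stand-in for a normal remote LIST call, and the `max_dirs` cap and level-by-level concurrent listing are assumptions, not how any particular NFS browser works. Even with perfect concurrency, each new level still costs at least one roundtrip.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Listing = List[Tuple[str, bool]]  # (name, is_dir) pairs


def warm_cache(list_dir: Callable[[str], Listing], start: str,
               cache: Dict[str, Listing], max_dirs: int = 64,
               workers: int = 8) -> None:
    """Fill `cache` (path -> listing) breadth-first below `start`.

    All uncached directories on one level are listed concurrently, so each
    level costs roughly one roundtrip instead of one per directory.
    Stops once `max_dirs` listings are cached.
    """
    frontier = [start]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(cache) < max_dirs:
            todo = [p for p in frontier if p not in cache]
            todo = todo[: max_dirs - len(cache)]
            # pool.map preserves input order, so results line up with `todo`.
            for path, listing in zip(todo, pool.map(list_dir, todo)):
                cache[path] = listing
            next_frontier = []
            for path in frontier:
                listing = cache.get(path)
                if listing is None:
                    continue  # dropped by the max_dirs cap
                base = path.rstrip("/")
                next_frontier.extend(f"{base}/{name}" for name, is_dir in listing if is_dir)
            frontier = next_frontier
```

In a real client this would run in the background after the user's own listing returns, so navigation stays responsive while the cache fills.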

    Soma

  6. #21
    tabstop (and the Hat of Guessing) | Join Date: Nov 2007 | Posts: 14,336
    Quote originally posted by Hodor:
    Besides, who pronounces cache the same as cash?
    I am unaware of any dialect of English that pronounces the final e in cache. (Now, the final syllable in cachet....)

  7. #22
    MutantJohn (Registered User) | Join Date: Feb 2013 | Posts: 2,665
    I used to say the final e because I think there's an accent mark on it, right? I used to think it was cash-ay but now I'm like everyone else around me and I just think it's cache. We lost the accent mark over here.

  8. #23
    tabstop (and the Hat of Guessing) | Join Date: Nov 2007 | Posts: 14,336
    Quote originally posted by MutantJohn:
    I used to say the final e because I think there's an accent mark on it, right? I used to think it was cash-ay but now I'm like everyone else around me and I just think it's cache. We lost the accent mark over here.
    Not even the French put an accent on it.

  9. #24
    Registered User | Join Date: Dec 2006 | Location: Canada | Posts: 3,229
    Quote originally posted by phantomotap:
    Are you a member of ACM (or otherwise have access to the archives)? If so, you can find papers on a multitude of approaches to caching and prefetch with some specifically addressing latency and predictive methods.

    [Edit]
    To be clear, you may also find these articles outside of ACM. I have no idea and am not advocating for ACM membership or anything.

    I'm just saying, I know ACM has some relevant resources in the event you wish to read more about the possibilities.
    [/Edit]

    That said, most implementations, in practice, do something remarkably similar "behind the scenes" in order to disguise high latency networks.

    For example, the NFS browser I have ([Removed: I already feel like I'm advocating an expensive tool.]; I don't mean "NFS for Windows Services" or whatever it is called) starts building up a cache using normal commands as you navigate the remote store. Sure, the first few seconds of navigation can be noticeably slow, but by the time you have read the `ls' output of an average directory, the cache is already being populated with more directories.
    I do not have access to ACM. I am aware that some clients already do caching using normal commands, but that's not good enough if latency is on the order of 250ms (which is typical across the Pacific Ocean). There is a limit to how much can be done purely on the client.

  10. #25
    Hodor (misoturbutc) | Join Date: Nov 2013 | Posts: 1,787
    Quote originally posted by tabstop:
    Not even the French put an accent on it.
    Hmm. In Australia kaysh (the 'a' long) is common.

    BTW, my comment was meant to be taken tongue-in-cheek anyway.

  11. #26
    Registered User | Join Date: Dec 2006 | Location: Canada | Posts: 3,229
    From what I have played with, SMB and NFS are both way too chatty to be used over high latency networks (they are designed for LAN). Recent versions of SMB are better, but still pretty terrible.

    SSHFS/SFTP is better, but still not nearly as responsive as what I had.

  12. #27
    Mario F. ((?<!re)tired) | Join Date: May 2006 | Location: Ireland | Posts: 8,446
    Quote originally posted by MutantJohn:
    How do any of those sound interesting?
    Quote originally posted by MutantJohn:
    Databases? Ew...
    Me thinks you have no idea...

    Quote originally posted by MutantJohn:
    Lexical analyzer and parser? Don't even try. Those'll just make the writers try harder! We will never be replaced with automated speech! Or maybe you work for an advertising firm and want to track buzzwords/phrases...
    Me thinks you don't know what a lexical analyser and parser are.

    Quote originally posted by MutantJohn:
    Okay, the compression algorithm is cool though. I did some light independent reading about that.
    Congratulations! You just chose the least interesting and least challenging of my suggestions.

    Quote originally posted by MutantJohn:
    FTP client/server, that doesn't sound like anything a natural scientist would like at all! I mean, I'll use one if I need one but ew to this one as well...
    Yeah. Eww to learning how to do network programming...

    Quote originally posted by MutantJohn:
    And building a file system? That might be kind of neat, actually. Like, how Arch actually resolves package dependencies, etc. That would be really neat because I use it on such a day-to-day basis, I can't help but be curious.
    Me thinks you don't know what a file system is.

    Quote originally posted by MutantJohn:
    the rest of your post
    You sort of proved cache programming is a little over your head. Stick to more basic stuff until you get a hold of some important computer concepts.

  13. #28
    MutantJohn (Registered User) | Join Date: Feb 2013 | Posts: 2,665
    The more you describe these things (or choose not to!), Mario, the more I realize, ewwwwwwwwww, actual computer science.

    I've often wondered whether physics was the right choice for me, in hindsight. Now I know that CS was not the way to go. I like being a programmer, not a computer scientist.

    I want something to program.

    Why did I even choose meshing in the first place? As I learned more cosmology, I learned that most of the physics that's done is through simulations so I read up on those and they were pretty cool. But then I read about how they worked and once I started taking some programming, I thought it was really cool and then there's moving mesh codes? Oh man, then I read about Voronoi and then Delaunay triangulations and I was like, "Ooh, girl, you best be gettin' over here."

    I want to expand my programming repertoire, but maybe something a little less computer-science-y would be good for me. I could try learning Dart, actually, and try to make some small web applications. I know Dart is supposed to compile to JavaScript, and JavaScript is a thing on the internet, right? So then wouldn't making JavaScript binaries be the same thing as making web applications? Literally, I am this ignorant... about JavaScript and web coding.
