How would I go about trying to write my own cache manager?

**cyberfish** · 12-09-2013

Speaking of latency and network protocols, a few months ago I started working on a new network file system that I thought had a lot of potential, but I ran out of time to keep working on it (I believe I finished read-only support). It was already much more responsive than FTP/SSHFS/SMB.

The idea is to do server-assisted pre-fetching.

The overwhelming majority of human use cases look like this:
1. open directory, get listing
2. stat every single file in that directory
3. wait a second or two
4. open a subdirectory and go back to step 1, repeating until we find the file we are looking for
5. open file, read entire file (because most of the time the file will be small)

That is a lot of roundtrips, which is bad on high-latency networks (like the internet). That's why opening a directory on an FTP server over the internet usually takes a couple of seconds.

By changing only the client side, we can try to pre-fetch and cache directory trees and metadata when the user opens a directory, but that still requires a minimum of 2 roundtrips (get listing, stat files) per level, assuming perfect parallelism (all stats can be retrieved at the same time, e.g. as in optimized SMB). Doing better than this requires server support.
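To put rough numbers on the roundtrip argument, here is a minimal sketch that only counts roundtrips and multiplies by the round-trip time. The function names, the 4-level tree, the 15 entries per directory and the 250 ms round-trip time are all illustrative assumptions, and the model ignores bandwidth and server processing time.

```python
def sequential_stat_round_trips(levels: int, entries_per_dir: int) -> int:
    """One listing plus one stat per entry, issued one after another, at each level."""
    return levels * (1 + entries_per_dir)


def parallel_stat_round_trips(levels: int) -> int:
    """One listing, then all stats in flight at once: two roundtrips per level."""
    return levels * 2


def wait_seconds(round_trips: int, rtt_ms: float) -> float:
    """Time spent purely waiting on the network for the given number of roundtrips."""
    return round_trips * rtt_ms / 1000.0


if __name__ == "__main__":
    # Assumed workload: walk 4 levels deep, 15 entries per directory, 250 ms RTT.
    levels, entries, rtt_ms = 4, 15, 250.0
    cases = [
        ("sequential stats", sequential_stat_round_trips(levels, entries)),
        ("parallel stats", parallel_stat_round_trips(levels)),
    ]
    for label, trips in cases:
        print(f"{label}: {trips} roundtrips, {wait_seconds(trips, rtt_ms):.1f} s waiting")
```

Under these assumed numbers the sequential client spends 16 s waiting and the perfectly parallel client 2 s, which is the floor a client-only change can reach in this model.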

What if the client could make a request like this: "give me a directory listing of this directory, and 64KB of anything (metadata and data) under the directory that you think I may need next"?

In most cases (directory trees with just 10-20 files or subdirectories under each directory), the metadata (directory listings and stats of all files) for everything underneath the directory, recursively, can be returned in the initial directory-listing call.

In some cases (directory with small files), the entire directory, including file contents, can be returned in the initial get directory listing request.
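One way the server side of such a request could look is sketched below. This is not the actual implementation described in the post; it is a minimal illustration under stated assumptions: `prefetch_bundle`, the `ENTRY_OVERHEAD` per-entry size estimate, and the policy (breadth-first metadata first, then smallest files first) are all hypothetical, and error handling for unreadable directories is omitted.

```python
import os
from collections import deque

ENTRY_OVERHEAD = 32  # assumed bytes of stat data per entry on the wire (hypothetical)


def prefetch_bundle(root: str, budget: int = 64 * 1024) -> dict:
    """Return the listing of `root` plus as much descendant metadata and
    small-file content as fits in roughly `budget` bytes."""
    bundle = {"listings": {}, "contents": {}}
    small_files = []  # (size, path) candidates for content prefetch
    queue = deque([root])
    used = 0

    # Pass 1: breadth-first metadata, nearest directories first.
    while queue:
        directory = queue.popleft()
        entries, subdirs, files = [], [], []
        with os.scandir(directory) as it:
            for entry in sorted(it, key=lambda e: e.name):
                info = entry.stat(follow_symlinks=False)
                is_dir = entry.is_dir(follow_symlinks=False)
                entries.append((entry.name, is_dir, info.st_size, info.st_mtime))
                if is_dir:
                    subdirs.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    files.append((info.st_size, entry.path))
        cost = sum(len(name.encode()) + ENTRY_OVERHEAD for name, *_ in entries)
        # The listing the client asked for is always sent; extras must fit.
        if directory != root and used + cost > budget:
            break
        bundle["listings"][directory] = entries
        used += cost
        queue.extend(subdirs)
        small_files.extend(files)

    # Pass 2: spend what is left of the budget on file contents, smallest first.
    for size, path in sorted(small_files):
        if used + size > budget:
            break
        with open(path, "rb") as f:
            data = f.read(size)
        bundle["contents"][path] = data
        used += len(data)
    return bundle
```

A client would merge `listings` and `contents` into its local cache and only go back to the server on a miss. A smarter server could replace the breadth-first and smallest-first choices with a prediction of what the user is likely to open next.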

I'm surprised something like this (a network file system that wastes bandwidth to provide very good latency) doesn't already exist. I searched very hard before starting.

**MutantJohn** · 12-09-2013

Man, I don't know what I'm looking for next.

**laserlight** · 12-10-2013

Try writing your own cash manager (personal accounting tool)?

**Hodor** · 12-10-2013

Originally Posted by laserlight

Try writing your own cash manager (personal accounting tool)?

I can't think of anything more boring

Besides, who pronounces cache the same as cash?

**phantomotap** · 12-10-2013

Originally Posted by cyberfish

I'm surprised something like this (a network file system that wastes bandwidth to provide very good latency) doesn't already exist. I searched very hard before starting.

O_o

Are you a member of ACM (or otherwise have access to the archives)? If so, you can find papers on a multitude of approaches to caching and prefetch with some specifically addressing latency and predictive methods.

[Edit]
To be clear, you may also find these articles outside of ACM. I have no idea and am not advocating for ACM membership or anything.

I'm just saying, I know ACM has some relevant resources in the event you wish to read more about the possibilities.
[/Edit]

That said, most implementations, in practice, do something remarkably similar "behind the scenes" in order to disguise high latency networks.

For example, the NFS browser, [Removed: I already feel like I'm advocating an expensive tool.], I have (I don't "NFS for Windows Services" or whatever it is called.) starts building up a cache using normal commands as you navigate the remote store. Sure, the first few seconds of navigation can be noticeable, but by the time you can read the `ls' from average directories the cache is already getting populated with more directories.

Soma

**tabstop** · 12-10-2013

Originally Posted by Hodor

Besides, who pronounces cache the same as cash?

I am unaware of any dialect of English that pronounces the final e in cache. (Now, the final syllable in cachet....)

**MutantJohn** · 12-10-2013

I used to say the final e because I think there's an accent mark on it, right? I used to think it was cash-ay but now I'm like everyone else around me and I just think it's cache. We lost the accent mark over here.

**tabstop** · 12-10-2013

Originally Posted by MutantJohn

I used to say the final e because I think there's an accent mark on it, right? I used to think it was cash-ay but now I'm like everyone else around me and I just think it's cache. We lost the accent mark over here.

Not even the French put an accent on it.

**cyberfish** · 12-10-2013

Originally Posted by phantomotap

Are you a member of ACM (or otherwise have access to the archives)? If so, you can find papers on a multitude of approaches to caching and prefetch with some specifically addressing latency and predictive methods.

[Edit]
To be clear, you may also find these articles outside of ACM. I have no idea and am not advocating for ACM membership or anything.

I'm just saying, I know ACM has some relevant resources in the event you wish to read more about the possibilities.
[/Edit]

That said, most implementations, in practice, do something remarkably similar "behind the scenes" in order to disguise high latency networks.

For example, the NFS browser, [Removed: I already feel like I'm advocating an expensive tool.], I have (I don't "NFS for Windows Services" or whatever it is called.) starts building up a cache using normal commands as you navigate the remote store. Sure, the first few seconds of navigation can be noticeable, but by the time you can read the `ls' from average directories the cache is already getting populated with more directories.

I do not have access to ACM. I am aware that some clients already do caching using normal commands, but that's not good enough if latency is on the order of 250ms (which is typical across the Pacific Ocean). There is a limit to how much can be done purely on the client.

**Hodor** · 12-10-2013

Originally Posted by tabstop

Not even the French put an accent on it.

Hmm. In Australia kaysh (the 'a' long) is common.

BTW my comment was meant to be taken tongue-in-cheek anyway

**cyberfish** · 12-10-2013

From what I have played with, SMB and NFS are both far too chatty to be used over high-latency networks (they are designed for LANs). Recent versions of SMB are better, but still pretty terrible.

SSHFS/SFTP is better, but still not nearly as responsive as what I had.

**Mario F.** · 12-11-2013

Originally Posted by MutantJohn

How do any of those sound interesting?

Originally Posted by MutantJohn

Databases? Ew...

Me thinks you have no idea...

Originally Posted by MutantJohn

Lexical analyzer and parser? Don't even try. Those'll just make the writers try harder! We will never be replaced with automated speech! Or maybe you work for an advertising firm and want to track buzzwords/phrases...

Me thinks you don't know what a lexical analyser and parser are.

Originally Posted by MutantJohn

Okay, the compression algorithm is cool though. I did some light independent reading about that.

Congratulations! You just chose the least interesting and least challenging of my suggestions.

Originally Posted by MutantJohn

FTP client/server, that doesn't sound like anything a natural scientist would like at all! I mean, I'll use one if I need one but ew to this one as well...

Yeah. Eww to learn how to do network programming...

Originally Posted by MutantJohn

And building a file system? That might be kind of neat, actually. Like, how Arch actually resolves package dependencies, etc. That would be really neat because I use it on such a day-to-day basis, I can't help but be curious.

Me thinks you don't know what a file system is.

Originally Posted by MutantJohn

the rest of your post

You sort of proved cache programming is a little over your head. Stick to more basic stuff until you get a hold of some important computer concepts.

**MutantJohn** · 12-14-2013

The more you describe these things (or choose not to!), Mario, the more I realize, ewwwwwwwwww, actual computer science.

I've often wondered whether physics was the right choice for me, in hindsight. Now I know that CS was not the way to go. I like being a programmer, not a computer scientist.

I want something to program.

Why did I even choose meshing in the first place? As I learned more cosmology, I learned that most of the physics that's done is through simulations so I read up on those and they were pretty cool. But then I read about how they worked and once I started taking some programming, I thought it was really cool and then there's moving mesh codes? Oh man, then I read about Voronoi and then Delaunay triangulations and I was like, "Ooh, girl, you best be gettin' over here."

I want to expand my programming repertoire but maybe something a little less computer science-y is good for me. I could try learning Dart, actually, and try to make some small web applications. I know Dart is supposed to compile to JavaScript and JavaScript is a thing on the internet, right? So then wouldn't making JavaScript binaries be the same thing as making web applications? Literally, I am this ignorant... about JavaScript and web coding.
