

How can I get read-ahead bytes?



Operating systems read more from disk than a program actually requests, because the program is likely to need nearby data soon. In my application, when I fetch an item from disk, I would like to show an interval of information around that item. There's a trade-off between how much information I request and show, and speed. However, since the OS already reads more than I requested, accessing the bytes that are already in memory is free. What API can I use to find out what's in the OS caches?

Alternatively, I could use memory-mapped files. In that case, the problem reduces to finding out whether a page is resident in memory or still on disk. Can this be done in any common OS?

EDIT: Related paper http://www.azulsystems.com/events/mspc_2008/2008_MSPC.pdf



1:

You can indeed use your second method, at least on Linux. mmap() the file, then use the mincore() function to determine which pages are resident. From the man page:

    int mincore(void *addr, size_t length, unsigned char *vec);

mincore() returns a vector that indicates whether pages of the calling process's virtual memory are resident in core (RAM), and so will not cause a disk access (page fault) if referenced. The kernel returns residency information about the pages starting at the address addr, and continuing for length bytes.

There's of course a race condition here - mincore() may tell you that a page is resident, but it might then be evicted just before you access it. C'est la vie.
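To make the mmap() plus mincore() approach concrete, here is a minimal sketch for Linux. The helper name count_resident_pages is made up for illustration, and the error handling is deliberately thin. It maps a file read-only and counts the pages that mincore() reports as resident. As far as I know, newer kernels may limit what mincore() reports for files the caller does not own or cannot write, so treat the result as a hint.

```c
#define _DEFAULT_SOURCE   /* exposes mincore() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not a library function): counts how many pages of
 * the file at `path` are resident in memory and stores the file's total
 * page count in *total_pages. Returns -1 on error, including an empty
 * file, which cannot be mapped. */
long count_resident_pages(const char *path, long *total_pages)
{
    assert(path != NULL && total_pages != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t length = (size_t)st.st_size;
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t pages = (length + page_size - 1) / page_size;

    void *addr = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (addr == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(pages);   /* one status byte per page */
    long resident = -1;
    if (vec != NULL && mincore(addr, length, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < pages; i++)
            resident += vec[i] & 1;       /* low bit set: page is resident */
    }

    free(vec);
    munmap(addr, length);
    *total_pages = (long)pages;
    return resident;
}
```

A caller would pass a path and compare the return value with *total_pages. Because of the race mentioned above, the count is only a snapshot: a page reported as resident can be gone by the time you touch it.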

2:

You're starting out from a wrong presumption. At least on Linux, the OS will try to figure out the program's access patterns. If you read a file sequentially, the kernel will prefetch sequentially. If you jump around the file a lot, the kernel will probably be confused at first, but then it will stop prefetching.

So if you actually are accessing your file sequentially, you know what's probably prefetched: the next data block. If you are randomly seeking, probably nothing else in the vicinity is prefetched.

Try to approach this a different way. Before calling read() to get the information you need, call fadvise() (posix_fadvise() on POSIX systems) to let the OS know what you want it to start loading.

I'm also curious to know what kind of application you're writing that can run correctly by only operating on data that happens to be in the file cache by chance. I feel like I could find a better way to address your need if you posted a little more info.
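A rough sketch of the fadvise() suggestion, using posix_fadvise() with POSIX_FADV_WILLNEED. The helper name read_with_hint and the margin parameter are assumptions for illustration: the idea is to ask for the neighbourhood of the item up front and then read the item itself. The hint is advisory, so the kernel is free to ignore it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: hints that the range from (offset - margin) to
 * (offset + len + margin) will be needed soon, then reads the requested
 * item. Returns the number of bytes read, or -1 on error. */
ssize_t read_with_hint(int fd, void *buf, size_t len, off_t offset, off_t margin)
{
    assert(offset >= 0 && margin >= 0);

    /* Clamp the start of the hinted range at the beginning of the file. */
    off_t start = offset > margin ? offset - margin : 0;
    off_t span = (offset - start) + (off_t)len + margin;

    /* Advisory only. posix_fadvise() returns an error number directly
     * rather than setting errno; a failed hint is not fatal here, so the
     * result is deliberately ignored. */
    (void)posix_fadvise(fd, start, span, POSIX_FADV_WILLNEED);

    return pread(fd, buf, len, offset);
}
```

If the surrounding interval is shown a moment later, a second pread() over the hinted range then has a better chance of being served from the cache, though nothing guarantees it.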

3:

It certainly can't be done on Windows. On Windows the read-ahead behaviour is up to the OS, and even if it could tell you how much it had read ahead, it wouldn't do you any good, because as soon as you'd found out, the in-memory pages used for caching could have been reclaimed for some other use.

The same thing goes for determining whether a page is resident or not. As soon as you've found out, the answer might change when some other thread needs the memory for something else.

If you really wanted to do this kind of thing on Windows, you can turn off buffering and manage the buffers yourself. This is the fastest IO path, but it is also the most complex - you have to be very careful, and often the OS can still do it better.

4:

"What API can I use to find out what's in the OS caches?"

There's certainly no standard way to do this for any POSIX system, and I'm not aware of any non-standard way specific to Linux. The only thing you can know (almost) for sure is that the file system will have read in a multiple of the page size, usually 4kB. So, if your reads are small, you can know with high probability (although not for sure) that the data in the surrounding page is in memory.

You could, I suppose, do tricksy things like timing how long it took a read system call to complete. If it's fast, that is 100s of microseconds or less, it was probably a cache hit. Once it gets up to a millisecond or so, it was probably a cache miss. Of course, this doesn't actually help you very much, and it's very very fragile.

Please note that once the file system has copied the data to user buffers, it is free to immediately discard the buffers holding the data from disk. It probably doesn't do this right away, but you can't tell for sure.

Finally, I second @Karmastan's suggestion: explain the broader end you're trying to achieve. There's likely a way to do it, but the one you've suggested isn't it.
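For what it's worth, the two ideas in this answer can be sketched as follows. Both helper names are made up for illustration. surrounding_pages expands a small read to the page boundaries around it, which is the region that was probably read in along with it. timed_pread_ns measures a single pread(), to be compared against the rough thresholds mentioned above. The timing part is exactly as fragile as described, so take it as a curiosity and not as something to build on.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: expands the byte range starting at `offset` with
 * length `len` to page boundaries. *start is rounded down and *end
 * (exclusive) is rounded up to a multiple of the page size. */
void surrounding_pages(off_t offset, size_t len, off_t *start, off_t *end)
{
    assert(offset >= 0 && start != NULL && end != NULL);

    off_t page = (off_t)sysconf(_SC_PAGESIZE);   /* usually 4096 */
    off_t last = offset + (off_t)len;

    *start = offset - (offset % page);
    *end = ((last + page - 1) / page) * page;
}

/* Hypothetical helper: times one pread() with a monotonic clock.
 * Returns the elapsed time in nanoseconds, or -1 if the read failed. */
long long timed_pread_ns(int fd, void *buf, size_t len, off_t offset)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    ssize_t n = pread(fd, buf, len, offset);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (n < 0)
        return -1;
    return (t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
}
```

With a 4kB page, an item at offset 5000 of length 100 gives the range 4096 to 8192, so showing up to that much around the item costs little extra if the page really is still cached.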

