Faster File Access With File Mapping, by H. Hernán Moraldo (31 January 2001)
Introduction
All of us have to access files in our programs. Sometimes it's infrequent, and happens only at the beginning or the end of the program. You use only a very small config file and you don't need it to be any faster. But sometimes you have big databases, large files, and you have to look at them all the time while the program is executing. If you use these files with the common file handling functions, you soon go mad because the program runs very s-l-o-w-l-y. And if you copy the whole contents of your files into memory, the problem gets worse, because you have to do all the memory handling yourself.

Because of that, the Microsoft team developed a faster and even easier way to work with files. It's called File Mapping, and it's one of the things your mother should've told you but never did. This tutorial will explain how to discover the secret your mother hid from you. It also covers normal file handling, so if you have no idea about files under Win32 and still use the C / C++ file libraries, this doc is for you.

I recommend that you download the following file to see fully working functions that use the Win32 file handling APIs: FILE.CPP. I used VC++ to test all the examples.

And finally, excuse me if some words or phrases are wrong; my English isn't perfect yet. I'm from Argentina, where people speak Spanish. If you find any mistakes in this tutorial, please let me know.
The CreateFile Function
The first thing you have to do to use a file is to open it, just as in good old DOS. But that's done with an API called CreateFile. Curious, eh? Nah, there is also an OpenFile function, but this is the one I like. The prototype of this API is:
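The params mean:

lpFileName: the name of the file you want to open / create.

dwDesiredAccess: a value that can be 0, GENERIC_READ (you need to read from the file), GENERIC_WRITE (you want to write to the file), or a combination (use GENERIC_READ | GENERIC_WRITE for read-write access).

dwShareMode: specifies what happens if somebody else tries to open the same file while you are using it. If it's 0, the file is not shared at all. If it's FILE_SHARE_READ, Windows lets other programs open the file as long as they ask only for read access. If it's FILE_SHARE_WRITE, sharing is allowed only if they are trying to open the file for write access. Of course you can combine the values, but I think there is rarely a reason to use anything other than 0 or FILE_SHARE_READ.

lpSecurityAttributes: it has some use in NT. Keep it as NULL.

dwCreationDisposition: has to be one of the following values:
  CREATE_NEW: creates a new file, and fails if the file already exists.
  CREATE_ALWAYS: creates a new file, overwriting the file if it already exists.
  OPEN_EXISTING: opens the file, and fails if the file doesn't exist.
  OPEN_ALWAYS: opens the file if it exists, and creates it if it doesn't.
  TRUNCATE_EXISTING: opens the file and truncates it to zero bytes. Fails if the file doesn't exist.

dwFlagsAndAttributes: the attributes and flags of the file. FILE_ATTRIBUTE_NORMAL is the usual choice.

hTemplateFile: keep it as NULL.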
In most cases I omitted some possible values or details that aren't important if you are using CreateFile just to open files. If you are interested in more info about the function, contact me.

You should save the handle returned by CreateFile, because it's needed to work with the file you opened. If the value returned is INVALID_HANDLE_VALUE, something failed and there is no handle to close. Otherwise, you can work with the handle, and when you are done, you close it this way: CloseHandle(hndFile); where hndFile is the handle CreateFile returned to you.

Look at the way the Windows API works. You have a function that needs one or two pages to be described because of all the params it has, but you are only going to use two or three of them. That's the way it is, and that's the main reason to wrap the APIs in a class or something.

And if you are afraid of the Windows file system after seeing the CreateFile function, hear me out. It looks a lot easier when you program it. Look at the CPP file that comes with this doc and see that most of the params are NULL or zero in most cases. Do not continue using the C++ file functions, because that way you lose the power of the Win32 API. Decided to go on? Welcome to the world of the File APIs!
Reading / Writing The File
To read data from a file, you can use the ReadFile function, and to write to a file you use WriteFile. Their prototypes are:
Both have similar parameters:

hFile: the handle that CreateFile gave you when it opened the file.

lpBuffer: a pointer to the buffer where the data read will be put, or where the data to write is taken from.

nNumberOfBytesToRead, nNumberOfBytesToWrite: the number of bytes to use in this operation. If this number is zero, WriteFile won't write any data but will change the date and time of the file.

lpNumberOfBytesRead, lpNumberOfBytesWritten: a pointer to a DWORD where Windows returns how many bytes it really read or wrote. With plain files this normally matches what you asked for, except when a read reaches the end of the file; it matters more for the other things you can open with CreateFile.

lpOverlapped: we don't need that, set it to NULL.

If either function fails, the return value is zero.

Both WriteFile and ReadFile work at the byte the File Pointer points to. When you open the file, the pointer is at the first byte, and after each operation it advances. If you read the first 10 bytes, the next operation will start at the 11th byte. There is a function that lets you change the position of the File Pointer. Its prototype is:
A description of each param follows:

hFile: the same handle to the file you got before.

lDistanceToMove: the number of bytes Windows has to move the pointer. A positive number moves forward and a negative one moves backward.

lpDistanceToMoveHigh: the people at Microsoft didn't want to limit the size of the file to a LONG, so they created this param to allow programmers with kilometric files to use a 64-bit value as the distance. You will need it only if you have files bigger than 4,294,967,294 bytes (2^32 - 2), which is a bit less than 4 gigabytes. If you aren't so crazy, keep this param as NULL.

dwMoveMethod: can be one of the following values:
  FILE_BEGIN: the distance is counted from the beginning of the file.
  FILE_CURRENT: the distance is counted from the current position of the file pointer.
  FILE_END: the distance is counted from the end of the file.
If you have GENERIC_WRITE access to the file, you may need the FlushFileBuffers function. Its only parameter is the handle of the file, and it returns 0 if it fails. What this function does is flush the contents of Windows' internal buffers to disk. Most of the time you aren't going to need it, because the Windows caching works fine, but if you are doing strange things, maybe. Just keep in mind that the changes you make to a file aren't written to disk immediately. That's normally no problem, because Windows writes them out on its own and they are not lost when you close the handle.
Another useful function is SetEndOfFile. It has only one parameter, the file handle, and returns 0 if it fails. It makes the current position of the file pointer the End Of File.
Another API function that you'll need, especially if you want to use the filemapping feature, is GetFileSize. Its prototype is:
where hFile is the same handle as ever, and lpFileSizeHigh is a pointer to a DWORD where Windows can save the high DWORD of the size. It may be NULL. You need a file opened with GENERIC_WRITE or GENERIC_READ to use this function. The return value is the low DWORD of the file size; if lpFileSizeHigh isn't NULL, the high DWORD is stored there as well. If the return value is 0xFFFFFFFF and lpFileSizeHigh is NULL, the function failed and you can get the error code with GetLastError. If the return value is 0xFFFFFFFF but lpFileSizeHigh isn't NULL, you have to call GetLastError to know whether it was an error or the real file size. If GetLastError returns NO_ERROR, all's fine and the value returned really is the size of the file. Otherwise, an error happened.
Warning About SetEndOfFile
Be careful with SetEndOfFile. BE EXTREMELY CAREFUL!!! Read the following with full attention.

Some time (years) ago I was programming something that needed a database of a couple of megabytes. The database had to be in my own format, but since it was there to store the user's configuration, it was almost empty the first time the program was executed. To generate that almost-empty database I used WriteFile to write some data, and SetEndOfFile to give the file the size it needed. Luckily, I have always been a bit paranoid, and before distributing the database I opened it with a hexadecimal editor. You HAD to see my face!!! A big part of the files, source code and emails I had been writing and reading before creating the database were there, in the config file that was about to be distributed to lots of people. Windows had filled the empty part of the database with things it found in memory. Including private ones.

So, be extremely careful. Even if you are an Open Source guy, you do NOT want people reading your schedule, or things like the mails you received from your friends. Keep in mind the way Windows fills the empty parts of a file before using this function to grow one.
File Mapping: What Why When?
File Mapping is an easier and faster way to access files, once you know how to do it. It's one of the capabilities of Windows that is so good, you can't understand why so few people use it. With File Mapping you don't get a handle to use with other APIs, you get a pointer to the raw data in memory. And Windows is the one that has to worry about what part of the file to copy to memory and so on.

The only disadvantage of this system is that you can't change the size of the file while it's mapped. With the normal file handling functions, you can execute a WriteFile at the EOF and the file will grow. You can't do that with file mapping. You have to unmap the file and change the size with any known method (for example, SetEndOfFile). But the ease of use and the speed increase are so big that you will rarely miss the normal file handling functions. Wrap it in a class, and use files as memory!
File Mapping: How?
The first thing you have to do to filemap a file is to open it
with CreateFile. Then, you have to use the following function:
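The params mean the following:

hFile: guess what. Good! The handle to the file you got with CreateFile.

lpFileMappingAttributes: security? Set it as NULL.

flProtect: says the way you want to use the file mapping. It can be one of the following values:
  PAGE_READONLY: the views can only be read. The file must have been opened with GENERIC_READ.
  PAGE_READWRITE: the views can be read and written. The file must have been opened with GENERIC_READ | GENERIC_WRITE.
  PAGE_WRITECOPY: copy-on-write access. What you write into a view stays private and never reaches the file.

dwMaximumSizeHigh, dwMaximumSizeLow: the high and low DWORDs of the maximum size of the file mapping object. Set both to zero and the object gets the current size of the file.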
lpName: keep it as NULL. Giving the FileMapping object a name means you want to share it. And that's not what we want to do... yet.

When I wrote this tutorial I kept in mind the phrase KISS (Keep It Short and Simple), so I intentionally "forgot" all the params that are rarely useful. Trust me, this way is better. If you want more info, mail me and let me know. If a lot of guys ask about it, I could write another article covering the difficult or rarely used params / things about file handling.

Once you create a file mapping object from a file handle, you mustn't use that handle with the normal file handling functions until you close the file mapping object. CreateFileMapping returns another handle that you have to save, or NULL if it failed. When you finish using the FileMapping object, you must close it. Close it with CloseHandle, before closing the handle to the file on disk.

The last step to get the file mapping working is to map a view of the file. To do that you use the next API:
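whose params are:

hFileMappingObject: the handle returned by CreateFileMapping.

dwDesiredAccess: look how many times you have to say how you want to access the file. That's the main reason to wrap it all in a class. It can be one of the following values:
  FILE_MAP_READ: read-only access to the view.
  FILE_MAP_WRITE: read-write access to the view. The mapping object must have been created with PAGE_READWRITE.
  FILE_MAP_ALL_ACCESS: for MapViewOfFile, the same as FILE_MAP_WRITE.
  FILE_MAP_COPY: copy-on-write access to the view.

dwFileOffsetHigh, dwFileOffsetLow: the high and low DWORDs of the offset in the file where the view begins. Keep both as zero to start at the first byte. The offset has to be a multiple of the system's allocation granularity, which you can get with GetSystemInfo.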
dwNumberOfBytesToMap: how many bytes you want to map. Set it to zero and the entire file is mapped. Only the file offset has to be a multiple of the allocation granularity; the number of bytes does not.

I always map the whole file, because it's easier, and you don't have to worry about Windows memory if your file is not so big. As far as I know, Windows works with File Mapping the same way it works with Virtual Memory and the swap file: there is always some part of the file in memory, and if you try to access a byte that isn't there, Win32 will get it from the file. So, if the OS needs some memory, it can send your mapped pages back to the disk and bring them in again when needed.

MapViewOfFile returns a memory pointer, or NULL if it fails. The wonderful thing is that whatever you modify in that memory ends up the same way on disk. Just remember that you can't change the size of the file while it's mapped, so you must NOT access memory beyond the last byte of the file, or Windows will crash your program. Use GetFileSize before mapping the file to know where the EOF is.

You can get as many views from a FileMapping object as you want, but generally you won't need more than one. Finally, when you are done with the file, you should destroy all the views with UnmapViewOfFile and only then close the handles of the FileMapping object and of the file. The prototype of UnmapViewOfFile is:
where lpBaseAddress is the same pointer MapViewOfFile gave you. This function returns zero if something went wrong.

Windows doesn't flush the modifications you make to the view of the file immediately. If at any time you want to flush them, use:
where lpBaseAddress says from what byte the API has to begin flushing, and dwNumberOfBytesToFlush says how many bytes you need to flush. If that number is zero, everything from lpBaseAddress to the end of the view will be flushed. When you unmap the view and close the FileMapping object, the changes end up in the file anyway, so you don't have to worry about this function most of the time.
Handles And Processes
Using named FileMapping objects you can share the memory between processes. But don't try to use the same handles in different processes with the default security descriptor. You can't. In general, handles under Win32 are only valid in the process that created them. To use the same handle in a different process, you have to do some strange things, such as duplicating the handle. But you won't need it most of the time anyway.
End Of File
Well, it's time to go. You know, send me a mail at asciimail@yahoo.com to say what you think about this article, or if you have any info that you think might be useful for me. And happy millennium (in the last days of January it sounds ridiculous, but that doesn't matter)!

Best regards,
H. Hernán Moraldo aka DoctorK.