flipcode - Programming a Virtual File System

Programming a Virtual File System - Part I
by (05 July 2001)

Introduction

When I started designing my new 3D Engine, I realized that I needed some kind of file system. Not only some file classes or so, but something I call a Virtual File System (VFS), my own file system that supports nice features like compression, encryption, fast access times, and so on.

So I decided to publish my efforts so that you don't have to re-invent the wheel (as we say in Germany...;-) This Article Series will consist of 2 Parts. The first one is the one you're reading at the moment, and it will handle the Design and Interface of the VFS. The second Article will discuss the entire implementation (this will be the larger one). I'd really appreciate comments and constructive critics from you at michiwalter@gmx.de to improve the VFS.

What Is The VFS And The VFS Library?

The VFS is a File System, similar to the ones Windows uses (like FAT32, NTFS, ...).The main difference between the VFS and a real FS is that the VFS uses the real FS within its implementation. Our VFS Library is a collection of Functions, Structures and Tools, with which you can create and work with VFS.

The Features

Now we'll try to name some features of the VFS:

Fast Access Times - The VFS should provide performance comparable with the native Win32 API Functions. By using few, huge files instead of lots of small ones, we could even improve the performance further.

Few Archives Instead Of Lots Of small Files - The VFS should be able to handle so-called Archives, i.e. files that contain lots of other files. This comes in handy, because if we use only one file, we need only one File Handle, and so we won't exhaust too many System Resources. Another advantage is - especially for professional Game Projects - that it's not so easy to parse our file format.

Debugging Features - Although Archives are a big pro for us, they are not the perfect solution for the development and debugging process since you can't quickly access files Imagine an image isn't display correctly on the screen and you want to check if the artist didn't save the file properly or if your code doesn't work: You would have to decompress the Archive File, look at the Image, perhaps fix the Image and compress everything again.

Pluggable Encryption and Compression - Pluggable Encryption and Compression (PEC) is a really cool feature of our VFS. The VFS user can not only choose between several PEC-Plugins, he can even select more than one and determine the order of execution. For instance, you could create an Archive File for the BMP-Images with the PEC-Plugin "RLE-Compression" and another Archive File for TXT-Files with the PEC-Plugins "ZIP-Compression" and "RSA-Encoding".

Security - We will store a MD5 key within each Archive file so that manipulation of the Archive Files will be detected instantly.

Multiple Root Paths - This is important especially for professional games, because you could for instance set the Root Paths to both the Installation Directory of the Game and the CD-ROM/DVD.

Now since we have set up our Features Table, we can advance to the Design Phase.

Basic Design

Ok, let's start playing a little bit around with the design. At first, I designed a full-OOP, Interfaces- and Class-Factory using approach, but I realized that it overcomplicated the interface ( anyway if you're interested in this approach just drop me an e-mail at michiwalter@gmx.de). The other idea is the following which may look similar to you if you already used FMOD ( because of the function naming conventions ).

Let's start now: First, we have the main Interface of the VFS with 16 Functions. I'll present you these Functions first, then I'll discuss what they are for and in the next article we'll discuss the implementation, okay ?


#define VFS_VERSION           0x0100
#define VFS_PATH_SEPARATOR    '\\'

void VFS_Init();
void VFS_Shutdown();

These Functions don't do really more than setting up / freeing some internal Structures we will need in the Implementation.

Filters


typedef BOOL (* VFS_FilterProc )( LPCBYTE pIn, DWORD dwInCount, LPBYTE* ppOut, DWORD* pOutCount );

struct VFS_Filter
{
    string strName;
    string strDescription;
    VFS_FilterProc pfnEncodeProc;
    VFS_FilterProc pfnDecodeProc;
};

void VFS_RegisterFilter( VFS_Filter* pFilter );
void VFS_UnregisterFilter( VFS_Filter* pFilter );
void VFS_UnregisterFilter( DWORD dwIndex );
DWORD VFS_GetNumFilters();
const VFS_Filter* VFS_GetFilter( DWORD dwIndex );

Well, that's a little bit harder now. a VFS_Filter is a kind of subprogram that processes data. You could for instance write a VFS_CryptFilter which encrypts / decrypts data. You got it? A Filter consists out of something like a pre-processor ( the pfnEncodeProc procedure ) for the Data written to an archive and a post-processor ( the pfnDecodeProc procedure ) for the data read from an archive. These filters implement the concept of the pluggable compression and encryption mentioned above, because you can assign one or more Filters to each Archive File you use. If you are little confused now, then look at Figure 1 which provides a scheme diagram of the Filters.

You see, the encode / decode procedures manipulate the data stream in both ways: from the archive to the memory and from the memory to the archive.

For a better explanation of the Filters, let's write a simple Filter (anyway keep in mind that we won't be able test the filter, because we'll implement the VFS next week, not today). Our Filter should add 1 to each Byte.


VFS_Filter ONEADD_Filter =
{
    "ONEADD",
    "This Filter adds 1 to each Byte of the Data. It doesn't really make sense, "
        " but anyway, this is just a test, you know",
    ONEADD_EncodeProc,
    ONEADD_DecodeProc
};

BOOL ONEADD_EncodeProc( LPCBYTE pIn, DWORD dwInCount, LPBYTE* ppOut, DWORD* pOutCount )
{
    assert( ppOut );
    assert( pOutCount );

    // Allocate the Memory.
    *ppOut = new BYTE[ dwInCount ];

    // Perform a For-Loop through each Byte.
    for( DWORD dwIndex = 0; dwIndex < dwInCount; dwIndex++ )
    {
        ( *ppOut )[ dwIndex ] = ( BYTE )( pIn[ dwIndex ] + 1 );
    }

    // Set the Output Count.
    *pOutCount = dwInCount;
}

BOOL ONEADD_DecodeProc( LPCBYTE pIn, DWORD dwInCount, LPBYTE* ppOut, DWORD* pOutCount )
{
    assert( ppOut );
    assert( pOutCount );

    // Allocate the Memory.
    *ppOut = new BYTE[ dwInCount ];

    // Perform a For-Loop through each Byte.
    for( DWORD dwIndex = 0; dwIndex < dwInCount; dwIndex++ )
    {
        ( *ppOut )[ dwIndex ] = ( BYTE )( pIn[ dwIndex ] - 1 );
    }

    // Set the Output Count.
    *pOutCount = dwInCount;
}

The only sense this filter makes is to make manipulating / ripping the archive uneasier (think about how we could improve this filter, guys, perhaps with some kind of encryption algorithm or so...).

The Root Path Functions

We discussed the root path features above, do you remember? If you don't - no problem. I said that we want the feature to have multiple root paths, i.e. multiple search paths like for instance the program's installation directory, the DVD Drive and a shared Network Drive with levels. The following functions will be created to perform this stuff:


void VFS_AddRootPath( LPCTSTR pszRootPath );
void VFS_RemoveRootPath( LPCTSTR pszRootPath );
void VFS_RemoveRootPath( DWORD dwIndex );
DWORD VFS_GetNumRootPaths();
LPCTSTR VFS_GetRootPath( DWORD dwIndex );

Everything pretty basic, don't you think so?

Some Other Basic Stuff

The next 4 functions I'll present are rather basic.


void VFS_Flush();

This Function will close all open archives whose reference count is 0. You may wonder why the archive files whose reference count becomes 0 aren't closed automatically, but you see, if we'd do so, we had to re-parse the Archive Files every time we load + close a File. Look at this code for a further explanation:


// Reference Count is 1. -> Load + Parse!!!
DWORD dwHandle = VFS_File_Open( "Bla\\Bla.Txt" );  

// Reference Count is 0. -> Close!!!
VFS_File_Close( dwHandle );     

// Reference Count is 1. -> Load + Parse!!!
dwHandle = VFS_File_Open( "Bla\\Bla.Txt" );

// Reference Count is 0. -> Close!!!
VFS_File_Close( dwHandle );

You see, we would have to parse the Archive File twice. With the VFS_Flush()-using solution, we have only to reparse once. A good place to call VFS_Flush() in a Game might be when all Level Data is loaded. But now the last 3 Basic Functions:


struct VFS_EntityInfo
{
    BOOL bIsDir;          // Is the Entity a Directory.
    BOOL bArchived;       // True if the Entity is located in an Archive.
    string strDir;        // like Models/Sarge/Textures
    string strPath;       // like Models/Sarge/Textures/Texture1.Jpg
    string strName;       // like Texture1.Jpg
    DWORD dwSize;         // The Number of Files and Subdirectories for a
Directory.
};

BOOL VFS_Exists( LPCTSTR pszPath );
void VFS_GetEntityInfo( LPCTSTR pszPath, VFS_EntityInfo* pInfo );
DWORD VFS_GetVersion();

OK. First, we have to define the expression entity. For me (and for you, if you want to use the VFS), the expression entity means something like an object of the file system, like a file or a directory.

The first function now checks whether an entity with the path pszPath exists. You see, that's fairly basic Stuff, but AFAIK the C(++) Standard Library doesn't contain such a function (I know, guys, they have functions like stat() but I want just something like exists() or so...). The second function is something like stat(); it returns an entity information record about the entity.

The last function doesn't really have anything to do with the entity info stuff, it's just a function to return the current version of the VFS. No big deal. Nothing special (in fact, it just returns the VFS_VERSION Constant ;-)

The File Interface

Now we've covered the easy stuff. But, don't worry, some more easy stuff is following. In fact, everything covered in this part of the article is easy. Sorry, if you want it harder, you have to wait for the next part of the article... ;-)

Well, here they are, the File Interface Functions:


#define VFS_INVALID_HANDLE    ( ( DWORD ) -1 )

// The VFS_File_Open/Create() Flags.
#define VFS_READ 0x01
#define VFS_WRITE 0x02

// The VFS_File_Seek() Flags.
#define VFS_SET 0x00
#define VFS_CURRENT 0x01
#define VFS_END 0x02


// Create / Open / Close a File.
DWORD VFS_File_Create( LPCTSTR pszFile, DWORD dwFlags );
DWORD VFS_File_Open( LPCTSTR pszFile, DWORD dwFlags );
void VFS_File_Close( DWORD dwHandle );

// Read / Write from / to the File.
void VFS_File_Read( DWORD dwHandle, LPBYTE pBuffer, DWORD dwToRead, DWORD* pRead = NULL );
void VFS_File_Write( DWORD dwHandle, LPCBYTE pBuffer, DWORD dwToWrite, DWORD* pWritten = NULL );

// Direct Data Access.
LPCBYTE VFS_File_GetData( DWORD dwHandle );

// Positioning.
void VFS_File_Seek( DWORD dwHandle, LONG dwPos, DWORD dwOrigin = VFS_SET );
LONG VFS_File_Tell( DWORD dwHandle );
DWORD VFS_File_GetSize( DWORD dwHandle );

// Information.
BOOL VFS_File_Exists( LPCTSTR pszFile );
void VFS_File_GetInfo( LPCTSTR pszFile, VFS_EntityInfo* pInfo );
void VFS_File_GetInfo( DWORD dwHandle, VFS_EntityInfo* pInfo );

There are only a few things to mention. First, the dwFlags parameter of the VFS_File_Create() and VFS_File_Open() functions can be either VFS_READ or VFS_WRITE or both, which means read access, write access or read/write access. Second, these two functions return a handle which is used by almost all the other functions as a kind of pointer. We won't use pointers but handles because they provide another layer of abstraction. If we'd use pointers we would have to specify our implementation in the interface; using handles which are just numbers provides another level of abstraction. Another thing I want to mention is the fact that our file functions will load the entire file into memory. This is necessary because of the filters feature (since they need memory to process). You can access this memory directly by using the VFS_File_GetData() function. Well, the rest is mainly stuff you'll know from the Standard I/O Library.

The Library Interface

This might be the point you were expecting since some lines or better pages (and when we talk about expecting: something I didn't expect is the fact that this is already Page 7 or so; WOW).

Anyway, let's stop babbling, here we go:


// Create / Open / Close an Archive.
DWORD VFS_Archive_Create( LPCTSTR pszArchive, const VFS_FilterNameList& Filters, DWORD dwFlags );
DWORD VFS_Archive_CreateFromDirectory( LPCTSTR pszArchive, LPCTSTR pszSrcDir,
const VFS_FilterNameList& Filters, DWORD dwFlags );
DWORD VFS_Archive_Open( LPCTSTR pszArchive, DWORD dwFlags );
void VFS_Archive_Close( DWORD dwHandle );

// Set the Filters used by this Archive.
void VFS_Archive_SetUsedFilters( DWORD dwHandle, const VFS_FilterNameList& Filters );
void VFS_Archive_GetUsedFilters( DWORD dwHandle, VFS_FilterNameList& Filters );

// Add / Remove Files to / from the Archive.
void VFS_Archive_AddFile( DWORD dwHandle, LPCTSTR pszFile );
void VFS_Archive_RemoveFile( DWORD dwHandle, LPCTSTR pszFile );

// Extract the Archive.
void VFS_Archive_Extract( DWORD dwHandle, LPCTSTR pszTarget );

// Information.
void VFS_Archive_GetInfo( DWORD dwHandle, VFS_EntityInfo* pInfo );
void VFS_Archive_GetInfo( LPCTSTR pszArchive, VFS_EntityInfo* pInfo );

Pretty easy interface, don't you think? Just the usual stuff for an archive file. And now you finally see the application for the filter functions we saw before. You can apply filters to each archive with the VFS_Archive_Set/GetUsedFilters() functions.

The Directory Interface

This is the last Interface of the VFS. It contains only 3 functions which are self-explanatory, I think.


// Information.
BOOL VFS_Dir_Exists( LPCTSTR pszDir );
BOOL VFS_Dir_GetInfo( LPCTSTR pszDir, VFS_EntityInfo* pInfo );

// Get the Contents of a Directory.
vector< VFS_EntityInfo > VFS_Dir_GetContents( LPCTSTR pszDir, BOOL bRecursive = FALSE );

Functions one and two are easy (they are like the ones for the files and the common ones in the basic interface). Number three is like the DOS dir command. Got it ;-) ?

The Babbling Must Go On

That's it. We've finally finished Part I of the tutorial. I can't believe it. But the really hard work is in front of us:

We have to IMPLEMENT the VFS!!!

But before I can do this, I need your ideas about the VFS. Your ideas, criticisms, recommendations, and so on... if there will be major changes to the VFS, I'll perhaps add a small Part 1b or so of the tutorial which will introduce the updates / changes of the interface (so that we'll be able to focus completely on the implementation *hehe* ) :-]

The VFS header file is available for download here: article_vfs_header.h.

Bye until next time,
Michael Walter

PS: I want FEEDBACK, Guys!!!

Article Series:

Programming a Virtual File System - Part I
Programming a Virtual File System - Part II
Programming a Virtual File System - Part III