Enhancing PC Disk Performance

Bruce Schafer
Multisoft Corporation
May 22, 1986

Introduction

From the beginning, the main benefit of personal computers has been increased productivity. Yet just as we become familiar with the advantages of personal computers, we begin to notice that we spend an increasing amount of time waiting for the computer to finish its task. This is partly due to the natural human tendency to always want more. But there is a more objective reason as well. Usually, when we begin an activity on a computer, we begin small -- a small document, a small spreadsheet, a small database, or a few weeks of accounting information. Inevitably the data grows, resulting in slower responses from the computer, especially when disk access is involved.

This article discusses a way of improving the speed of personal computers using a software technique called disk caching. Put simply, a disk cache is a collection of copies of recently used disk sectors. By keeping these copies in extra RAM memory, a disk cache can reduce the number of times an application actually accesses the disk. This, in turn, can dramatically speed up the performance of the application.
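The fragment below sketches this read-through idea in C. It is only an illustration, not code from any of the products mentioned later in this article: the sector size, the number of cache slots, the names read_sector and read_from_disk, and the simple slot assignment are all assumptions chosen to keep the example short, and the physical read is simulated so the fragment can be compiled and run on its own.

    /* Illustrative sketch of a read-through sector cache.  The sector
       size, slot count, and simulated physical read are assumptions,
       not details of any particular cache product. */
    #include <stdio.h>
    #include <string.h>

    #define SECTOR_SIZE 512
    #define CACHE_SLOTS 128               /* 128 x 512 bytes = 64K of cache */

    struct slot {
        long sector;                      /* which disk sector is held here */
        int  valid;                       /* nonzero if the slot holds a copy */
        char data[SECTOR_SIZE];
    };

    static struct slot cache[CACHE_SLOTS];

    /* Stand-in for a physical disk read; a resident cache on a real PC
       would pass the request on to the BIOS or DOS instead. */
    static void read_from_disk(long sector, char *buf)
    {
        memset(buf, (int)(sector & 0xFF), SECTOR_SIZE);
        printf("physical read of sector %ld\n", sector);
    }

    /* Satisfy a read from the in-memory copy whenever possible. */
    void read_sector(long sector, char *buf)
    {
        int i = (int)(sector % CACHE_SLOTS);      /* simplest possible placement */

        if (!cache[i].valid || cache[i].sector != sector) {
            read_from_disk(sector, cache[i].data);    /* miss: go to the disk */
            cache[i].sector = sector;
            cache[i].valid  = 1;
        }
        memcpy(buf, cache[i].data, SECTOR_SIZE);      /* hit or freshly loaded */
    }

    int main(void)
    {
        char buf[SECTOR_SIZE];
        read_sector(7L, buf);    /* first access causes a physical read */
        read_sector(7L, buf);    /* second access is satisfied from memory */
        return 0;
    }

A real cache program is typically installed as a memory-resident driver that intercepts disk requests before they reach the drive, and it replaces the simple slot assignment above with a more careful policy for deciding which copies to keep; that question is taken up under Disk Cache Theory below.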
History

In the early days of computing, memory was a very expensive resource. Large rooms were filled with many hundreds of thousands of dollars of memory, and still there never seemed to be enough. Much research money went into developing the concept of virtual memory. (See "Virtual Memory" by Peter J. Denning in Computing Surveys, September 1970.) Much of this research amounted to answering the question, "How can I get my programs to think I have more memory than I actually do without losing too much in performance?" This work on minimizing the performance cost of virtual memory applies equally well to increasing the performance of disk transfers.

Since the time large virtual memory systems were developed, the price of memory has dropped dramatically. Just in the short time since the IBM PC was introduced, we have seen memory boards go from 64K bytes per board to 256K/384K, and more recently to 2048K. The price of these boards at the time they were introduced was about the same in each case. In short, the price of personal computer memory has dropped by at least 10-to-1 in four years. This dramatic drop in the cost of memory provides an opportunity to use memory in ways that would previously have been too expensive. In particular, RAM memory can be used to create a disk cache that dramatically improves the performance of applications that access disk storage.

Related Technologies

You might ask why disk caching isn't built into MS-DOS. In fact, we might anticipate that it will be added in some future version of DOS. At the time DOS was originally designed, however, it was clearly out of the question; memory was still too expensive to seriously consider it. On the other hand, DOS did provide for a related technique, namely buffering. Starting with Version 2.0, DOS even gave the user the option of specifying the number of buffers to be used for disk access. (If you are not familiar with this command, look up the BUFFERS command in your DOS manual.) While buffering provides some performance improvement, it falls far short of what can be accomplished by a disk cache.

Another technology related to disk caching is the RAM disk. The idea of a RAM disk is similar in that it takes advantage of extra memory to increase disk performance. The major difference is that a RAM disk creates a completely separate virtual disk. A typical use of a RAM disk involves copying frequently used programs to the RAM disk at the beginning of a session. Later, when it is time to run one of these programs, it is loaded from the RAM disk rather than the conventional disk. As a result, it begins executing in a fraction of the normal time. As you can see, this approach is particularly beneficial when the same programs are loaded several times in a session.

The case for using a RAM disk for data is not nearly so clear-cut, however. First of all, not all of your data may fit in the RAM disk. Secondly, if you make any changes to the data during a session, you must remember to copy it to a conventional disk when you are done. Thirdly, you run the risk that the power may fail before you get around to copying your data back to a real disk. Because of these disadvantages, very few people use RAM disks for data.

In contrast, a disk cache suffers from none of these disadvantages. By its nature, a disk cache only requires enough memory to hold the most frequently used parts of the data. Changes to the data are automatically written to the physical disk, eliminating the inconvenience and risk inherent in postponing this step when using a RAM disk. Even in the case of program loading, a disk cache is often the superior solution, because it eliminates the requirement of anticipating which programs will be used in a session and copying them to a RAM disk. In summary, a disk cache provides almost all of the advantages of a RAM disk, and it does so automatically.

Disk Cache Theory

As mentioned earlier, a disk cache works by accumulating copies of frequently used sectors in memory. As new sectors are accessed by applications, it automatically keeps copies of them as well. Obviously, there won't always be room for a copy of a new sector as well as all previously used sectors. In this case a replacement algorithm is needed. A replacement algorithm is the computer science term for the answer to the question, "Which sector copy gets thrown away?" (Keep in mind that any changes to the sector have already been written out to the physical disk.) Early research into virtual memory and other types of hierarchical storage showed that the vast majority of the time, a Least Recently Used (LRU) approach was best. In other words, replace the copy of the least recently used sector with a copy of the most recently used sector.
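To show what a Least Recently Used policy looks like in practice, here is a small sketch in C. The counter-based bookkeeping, the tiny eight-slot table, and the names init_cache and find_slot are assumptions made for brevity; they are not drawn from any of the products listed below.

    /* Illustrative sketch of Least Recently Used (LRU) replacement.
       The counter-based "clock" and the small table are assumptions
       made to keep the example short. */
    #include <stdio.h>

    #define CACHE_SLOTS 8

    struct slot {
        long sector;      /* which sector is cached; -1 means the slot is empty */
        long last_used;   /* value of the access counter at the last reference */
    };

    static struct slot cache[CACHE_SLOTS];
    static long access_count = 0;

    void init_cache(void)
    {
        int i;
        for (i = 0; i < CACHE_SLOTS; i++) {
            cache[i].sector = -1L;
            cache[i].last_used = 0L;
        }
    }

    /* Return the slot holding 'sector', claiming the least recently
       used slot when the sector is not already in the cache. */
    int find_slot(long sector)
    {
        int i, victim = 0;

        access_count++;
        for (i = 0; i < CACHE_SLOTS; i++) {
            if (cache[i].sector == sector) {            /* cache hit */
                cache[i].last_used = access_count;
                return i;
            }
            if (cache[i].last_used < cache[victim].last_used)
                victim = i;                             /* oldest seen so far */
        }

        /* Cache miss: the least recently used copy is discarded.  Any
           changes to it were already written through to the physical
           disk, so nothing is lost by throwing the copy away. */
        if (cache[victim].sector != -1L)
            printf("sector %ld replaced by sector %ld\n",
                   cache[victim].sector, sector);
        cache[victim].sector = sector;
        cache[victim].last_used = access_count;
        return victim;
    }

    int main(void)
    {
        long s;
        init_cache();
        for (s = 0; s < 12; s++)   /* touch more sectors than there are slots */
            find_slot(s);
        find_slot(11L);            /* recently used: no replacement occurs */
        return 0;
    }

A production cache would combine this bookkeeping with the actual transfer of sector data, as in the earlier sketch, and would likely keep its slots on hash chains or linked lists so that the search does not slow down as the cache grows.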
Disk Cache Practice

So how does disk caching affect actual applications? The answer, of course, varies with the application and with the kind of disk you are using in your system. Disk caches provide the most dramatic improvement in diskette-based systems. Retrieval times in a database can be improved by a factor of six or more, and sorts can be improved by a factor of two. Even in hard disk systems the improvement is significant -- retrievals can be improved by a factor of two and sorts by almost a factor of two. You may be wondering about the impact on the faster hard disks typical of AT configurations. Even with these disks, the performance boost of a disk cache can still be as much as a factor of three-to-one. This is partly because the disk cache function is being provided by the faster 80286 processor in the AT.

Are performance improvements restricted to databases? Certainly not -- a broad range of applications benefit: DOS batch files run faster; document and spreadsheet retrievals improve; and compilers and accounting packages speed up. The exact improvement varies according to the nature of the application's use of the disk, the amount of data being used, and how often that data is accessed.

Another question that comes up with a disk cache is how big it should be. The easy answer is the bigger the better. A slightly better answer is to give the disk cache whatever memory your applications don't require. You should also find that you need fewer DOS buffers when using a disk cache; the memory taken up by DOS buffers is probably better allocated to the cache. To be more specific about disk cache size, 64K will usually provide a noticeable improvement and 128K an impressive improvement. If you are using a large database or a large word-processing file, an even bigger disk cache may be desirable. This is also true if you frequently switch from one application to another.

Products on the Market

Over the last year, several disk cache products have appeared on the market. They all provide the same basic function: a variable-size disk cache. While each of them has several optional features, it is beyond the scope of this article to compare and contrast them. Following is a list of disk cache products and prices readily available at the time this article was written. Except as noted, the products are not copy protected.

1. Flash. $49.95. Software Masters, 6223 Carrollton Ave., Indianapolis, IN 46220, (317) 253-8088.
2. Personal PC-Kwik. $39.95. Multisoft Corporation, 18220 S.W. Monte Verdi Blvd., Beaverton, OR 97007, (503) 642-7108.
3. Lightning. $49.95 with copy protection, $89.95 without. Personal Computer Support Group, 11035 Harry Hines Blvd., #206, Dallas, TX 75229, (214) 351-0564.
4. Shareware PC-Kwik. Registration charge, $19.95. Distributed by Multisoft Corporation through personal-computer clubs and bulletin boards.
5. VCache. $65.00. Golden Bow Systems, 2870 Fifth Avenue, Suite 201, San Diego, CA 92013, (619) 298-9349.

Conclusion

The obvious way to increase the performance of your personal computer is to purchase additional hardware options. You may also want to consider upgrading to a more expensive model sometime in the future. In the meantime, you should consider using a disk cache. In so doing, you can save yourself both time and money.

Note to newsletter editors: Permission is hereby granted to publish this article for commercial or non-commercial purposes, provided that changes are made only by permission of the author.