NetNews Usenet Archive 1992 #27


Text File | 1992-11-15 | 20.2 KB | 433 lines

Newsgroups: comp.groupware
Path: sparky!uunet!paladin.american.edu!darwin.sura.net!wupost!emory!sol.ctr.columbia.edu!eff!world!sss
From: sss@world.std.com (Sergiu S Simmel)
Subject: Groupware Infrastructure -- Join the Kala Forum
Message-ID: <SSS.92Nov15210108@world.std.com>
Sender: sss@world.std.com (Sergiu S Simmel)
Organization: Penobscot Development Corporation, Arlington MA
Date: Mon, 16 Nov 1992 02:01:08 GMT
Lines: 422

---------------------------------------------------------------------
PENOBSCOT DEVELOPMENT CORPORATION
announces
A NEW FORUM FOR INFORMATION EXCHANGE
on
THE KALA(tm) TECHNOLOGY AND PRODUCT
---------------------------------------------------------------------
= third announcement =

Welcome to Kala -- The Persistent Data Server.

Now, you can ...

o keep yourself up-to-date with Kala's new developments,
o share your experience in using Kala with others,
o hear what others have to say about it,
o ask us questions and benefit from the answers we provide to others,
o participate in discussions on Kala-related technical topics, such as data/object persistence, visibility management, databases and file systems, etc.,
o learn more about the Kala technology, including as yet undocumented details,
o give us important technical and business feedback on our products,
o and more ...

[for a sample posting, see attachment to this announcement]

... by subscribing to our new Kala Forum.

This forum, organized as a mailing list, will be moderated, so we will try to keep its focus and orientation straight, and the junk mail out.

If you'd like to see past issues, you can access them via anonymous FTP from world.std.com. The entire Kala Forum archive is located in the pub/kala/KalaForum directory. All files are ASCII text.

To subscribe (and subsequently for any other requests regarding your subscription, such as change of address, unsubscription, etc.), direct your request to:

--------------------------
kala-request@world.std.com
--------------------------

To contribute to the on-going discussions, address your messages to:

------------------
kala@world.std.com
------------------

Since we most value advertising by word of mouth and personal reference, please forward this message to whomever you believe could also benefit from subscribing.

We are looking forward to your subscription request and your participation in

### ============== ###
### The Kala Forum ###
### ============== ###

[Kala(tm) logo]
No more than you need !!!
No less than you want !!!
........................................................................
Penobscot Development Corp.
50 Princeton Road
Arlington Mass. 02174-8253
voice: +1-617-646-3951
fax: +1-617-646-5753
email: kala@world.std.com

For BYTE Magazine, December 1992 Issue

Persistent Data Servers
--Maintain the structure of data
--Perform like file systems
--Offer database features

=========================================================================
Objects of Substance
Persistent data servers provide a new way to store object-based data.
Sergiu S. Simmel and Ivan Godard
=========================================================================

One knock often made against software objects is their transient nature. Traditionally, an object is ephemeral; it is defined, manipulated, and destroyed by the program that creates it. It has no existence beyond the program's execution. Unlike real-world objects, and unlike computer-generated data that exists outside of a program in a file system or database system, software objects are usually not persistent.

The only way one program can share an object it creates with another program is for the two programs to be executing at the same time. This requirement puts a crimp in any plans for developing distributed object systems.

Object-oriented database management systems provide one means of giving objects the characteristic of persistence; file systems provide another. Neither solution, however, is ideal for all applications, situations, and implementations. That's the rationale behind a new class of storage software called persistent data servers.

---------------
Hobson's Choice
-----------------------------------------------------------------------

The simplest persistent data storage available to you is the file system on your disk drive. File systems have some attractive characteristics: their performance is good, they can hold any data, they're easy to use, and, of course, the price is right.

Conversely, files are unreliable. They provide no mechanism for maintaining data consistency and only primitive data-sharing facilities. Few file systems offer version control, and all require that you transform data between "internal" and "external" forms all the time.

Unlike a file system, a true database management system provides mechanisms for sharing data and for ensuring the integrity of the data. It supports transactions and version control, although the specifics of these functions may not be exactly what your application needs. Finally, a database system is scalable, and much more robust than a file system when your hardware or software fails.

The downside to a database system is that, compared to a file system, it is slower by an order of magnitude or more. Also, a database system generally confines you to dealing only with the kind of data that it can handle. In addition, a database system is usually very complicated, difficult to learn and use, and expensive, both in its cost of operation and in the amount of system resources it consumes.

Whether you choose a file system or a database manager, then, you have to sacrifice either economy or performance. Is there a happy medium? Something with the speed and flexibility of files, the reliability, shareability, and robustness of databases, and at a cost that won't break your wallet or the available hardware? A new breed of products, persistent data servers, aims squarely at the yawning gap between DBMSs and file systems.

--------------
An Alternative
-----------------------------------------------------------------------

Kala is a persistent data server from Penobscot Development Corporation (Arlington, MA). It is a software subassembly, available to applications and database managers, that manages both the state and visibility of persistent data. It takes care of the how and the where (how data is stored and retrieved, and where it is stored), and also copes with the who, which, and when of data management -- who can store and retrieve which data and when.

Kala is similar to a file system in its simplicity, high performance, low semantic level (although it also supports pointers, not just bits), and low-cost operation. It is similar to a DBMS in its robustness, support for transactions, security features, access control, configuration ability, reliability, scalability, and so forth. At the same time, it is different from either of these environments. Kala combines the benefits of both these worlds while avoiding the drawbacks of each. This type of storage software can provide low-level persistent data services. No more, no less.

--------------
Managing State
-----------------------------------------------------------------------

Like file systems, a persistent data server offers a get/put interface to the storage subsystem and can store any kind of data. Unlike file systems or the BLOBs (Binary Large OBjects) used by some database systems, a persistent data server lets the stored data retain its internal structure, no matter how complex.
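
To make "retaining internal structure" concrete, here is a minimal sketch in Python. It is not Kala's API: the Node class and the flatten() and restore() helpers are hypothetical names invented for this illustration, and Python object references stand in for machine pointers. The idea shown is only that links can be stored as positions within the stored image, so the structure survives even though the restored nodes live at different addresses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One list cell; 'next' plays the role of a machine pointer."""
    value: int
    next: Optional["Node"] = None


def flatten(head):
    """Turn a linked list into address-independent (value, next_index) records."""
    index_of = {}
    nodes = []
    node = head
    # Walk the list once; stop at the end or when a cycle leads back
    # to a node that has already been recorded.
    while node is not None and id(node) not in index_of:
        index_of[id(node)] = len(nodes)
        nodes.append(node)
        node = node.next
    return [
        (n.value, index_of[id(n.next)] if n.next is not None else None)
        for n in nodes
    ]


def restore(image):
    """Build fresh nodes and re-link them from the stored indices."""
    nodes = [Node(value) for value, _ in image]
    for node, (_, next_index) in zip(nodes, image):
        node.next = nodes[next_index] if next_index is not None else None
    return nodes[0] if nodes else None


original = Node(1, Node(2, Node(3)))
image = flatten(original)
copy = restore(image)

# Collect the values of the restored list to compare with the original.
values = []
node = copy
while node is not None:
    values.append(node.value)
    node = node.next
```

The restored list has the same shape and contents as the original, but none of its nodes is the same in-memory object.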

Suppose your application builds a linked list in memory and saves the list to the persistent store. When you retrieve that data it will still be a linked list -- topologically the same as the original even though the memory addresses of the nodes are completely different (see the figure). Of course, object databases can also store references, but the links used by the persistent data server are regular machine pointers, not performance-costly object-oriented pointers.

Your stored data can have any representation, including packed structures and executable code. You aren't restricted to a few primitive data types or the type of structures offered by a specific access language. Kala is as happy storing C++ or COBOL data as it is Lisp, assembler, or Smalltalk.

-----------------
Development Steps
-----------------------------------------------------------------------

The type of persistent data storage Kala provides lets you forget the distinction between in-memory and on-disk data or object "formats." You can program using Kala as if your code never had to remember anything across executions or applications. Write your application as a demo, with dummy data and no storage I/O. You can lay out your data or objects in memory in the way best suited for in-memory-only processing and fastest execution of your algorithms.

Once you are satisfied with the execution of your new "demo" application, you can think about a production-level persistent store for your objects. You first decide what the "unit of transfer" is, that is, which data should go to the store and come back together as a unit. The ability to choose the transfer unit improves performance because you can bring in at once all the different pieces of data your application requires. These pieces may be many different objects or parts of objects -- the application doesn't care.

For example, if the data you're using is a linked graph structure, you can either transfer the entire graph at once or just each node as you need it. Or, you can load in the entire graph except the contents of a single large but rarely referenced field in each node. You can even bundle the graph with other data, or choose some other unit of transfer.

The transfer unit you select consists of hunks of bits and pointers possibly spread all over memory. Using convenient calls to the API, you tell the software where the data is and where, within that data, the machine pointers are. The persistent data server takes care of the rest. It copies that data (no more and no less) onto the persistent store, and gives you a "claim check" in return. When you present that claim check, the server will promptly retrieve that same data and lay it out in the application memory.

-------------------
Types Without Limit
-----------------------------------------------------------------------

Persistent data servers can handle anything that's made out of bits and pointers, including objects, source code, records, images, executable code, noise, video, and so forth. This "model neutrality" makes a persistent data server an ideal interoperability point in the storage domain. It can reside "below" all other subassemblies and components that support only one or at most a few data organizations. In this respect, the role of a persistent data server in the storage domain resembles that of the X Window System in the display domain, or PostScript in the printing domain.

For example, an object management system can interpret data as object slots and methods. Because a persistent data server isn't bound to any particular notion of object, it can simultaneously support several types of objects. The access to and visibility of these objects is guaranteed to remain the same for different language systems, different hardware platforms, and different object management systems.
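
The claim-check exchange and the choice of transfer unit described under Development Steps can be sketched roughly as follows. This is a toy under stated assumptions, not Kala's interface: ToyStore, put(), get(), and stored_size() are hypothetical names, the store lives in memory only, and Python's pickle module stands in for the server's handling of bits and pointers. The sketch stores a small graph twice, once as a single unit and once with each node's bulky, rarely used field split off into its own unit.

```python
import itertools
import pickle


class ToyStore:
    """In-memory stand-in for a persistent store (hypothetical, not Kala)."""

    def __init__(self):
        self._units = {}
        self._claims = itertools.count(1)

    def put(self, unit):
        """Store one unit of transfer and hand back a claim check."""
        claim = next(self._claims)
        self._units[claim] = pickle.dumps(unit)
        return claim

    def get(self, claim):
        """Present a claim check and get the same data back."""
        return pickle.loads(self._units[claim])

    def stored_size(self, claim):
        """Bytes held for one unit, to compare transfer costs."""
        return len(self._units[claim])


# A two-node graph; 'notes' is the large, rarely referenced field.
graph = {
    "a": {"edges": ["b"], "notes": "x" * 10_000},
    "b": {"edges": ["a"], "notes": "y" * 10_000},
}
store = ToyStore()

# Choice 1: the whole graph is a single unit of transfer.
whole_claim = store.put(graph)

# Choice 2: the topology is one unit, and each bulky field is its own
# unit, reachable through a claim check kept in the node.
notes_claims = {name: store.put(node["notes"]) for name, node in graph.items()}
topology = {
    name: {"edges": node["edges"], "notes_claim": notes_claims[name]}
    for name, node in graph.items()
}
topology_claim = store.put(topology)

# Load only the topology, then fetch one bulky field on demand.
loaded = store.get(topology_claim)
a_notes = store.get(loaded["a"]["notes_claim"])
```

With the second choice, an application that only walks the edges transfers a much smaller unit and pays for a notes field only when it actually asks for one.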

-------------------
Managing Visibility
-----------------------------------------------------------------------

Conventional DBMSs and file systems deal with transactions, access control, security, licensing, version control, and configuration control as separate services. This practice has led to a proliferation of transaction managers, security managers, configuration managers, etc. The net result is unnecessarily complex, large, and overhead-burdened products.

A persistent data server works differently. It recognizes that all the services offered by traditional DBMSs are simply facets of the same basic problem: controlling the visibility of data.

If you analyze the nature of a transaction commit in a conventional database, you find that it is a means of making new values visible to the rest of the world by replacing the old values. Look at security grants. A security grant is simply a means of making data accessible (visible) to qualified agents until the access is revoked. You can think of a license as a means of making a dataset available (visible) to someone on the basis of pre-paid rights. A configuration is simply the bundling of a collection of data so that the collection is always visible as a unit.

Each DBMS has its own idea of how to implement the semantics of these services. Take transactions, for example. Many useful transaction models exist, because the needs of automated teller machines are different from those of CASE repositories, which, in turn, are different from those of Personal Information Managers. Several useful access control schemes also exist. Security is treated differently in each organization, while all information vendors have different needs for their licensing models.

Mathematically, all models are equivalent because each can be used to implement any of the others. But, in practice, trying to do so leads to unwarranted complications, overhead, and bulkiness. Persistent data software should be different.

A persistent data server like Kala doesn't provide a single model, or a "one-size-fits-all" solution for each service. Instead, it provides a handful of primitives that you can use to build the right model for the application. Simple models typical of conventional DBMSs can be supplied prebuilt for you to use.

--------------------
Managing Performance
-----------------------------------------------------------------------

The performance of a persistent data server for a single user is equal to the performance of a good file system when reading and writing the same data. Perhaps surprisingly, its relative performance actually improves with multiple users in a client/server configuration. This phenomenon occurs partly because of the seek optimization and shared buffering of common data used by Kala, and partly because it is no longer necessary for each application to individually open and close files.

Kala is algorithmically faster than equivalent conventional technology exactly when you need it most: at peak server loads. It uses a non-write-in-place strategy, never overwriting a prior value. This feature gives it an effective 50 percent update performance advantage in transaction contexts such as OLTP (On Line Transaction Processing) applications.

Kala requires only 1 + 1/n disk accesses per update (one to write the new data to free storage, and a fraction to record the commit where the commit record is shared with other transactions). A high-performance conventional DBMS needs 2 + 1/n disk accesses for the same task (one to write the former value in case of crash, one to write the new data back over the former value, and again, a fraction for the commit). This performance gain is not at the expense of data reliability and recoverability.
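
The access counts above can be worked through numerically. The sketch below assumes that n is the number of transactions sharing one commit record, which is one reading of the text; the function names are made up for this illustration.

```python
from fractions import Fraction


def non_overwriting_accesses(n):
    """Disk accesses per update when new data goes to free storage: 1 + 1/n."""
    return 1 + Fraction(1, n)


def write_in_place_accesses(n):
    """Disk accesses per update when the old value is logged first: 2 + 1/n."""
    return 2 + Fraction(1, n)


def access_ratio(n):
    """Non-overwriting cost as a fraction of the write-in-place cost."""
    return non_overwriting_accesses(n) / write_in_place_accesses(n)


# Print both costs and their ratio for a few group-commit sizes.
for n in (1, 10, 100):
    print(n, non_overwriting_accesses(n), write_in_place_accesses(n),
          float(access_ratio(n)))
```

With n = 10 the costs are 11/10 and 21/10 accesses, a ratio of 11/21. As n grows the ratio approaches 1/2, which is consistent with the "effective 50 percent" figure quoted above.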

-----------------------------------------------
Persistent Data Servers Versus Object Databases
-----------------------------------------------------------------------

Any quality ODBMS can recover all transactions that have been committed, even if they were committed only milliseconds before the crash. Persistent data servers can do the same, while performing as fast as less reliable systems such as file systems.

Many conventional ODBMSs have good performance as single-client applications with systematic access patterns, but degrade badly in multiple-client applications such as groupware, or when used concurrently by different applications that randomly access large pools of data. Many ODBMSs are tuned to display quick response to predictable access patterns. Thus they often achieve local (per-client) optima at the expense of global (across clients) slow-down.

For example, some ODBMSs improve object faulting performance by page-mapping databases using the file-mapping facilities of the OS. In this instance, the unit of transfer is the fixed-size virtual memory page (or a multiple of it). These ODBMSs show no sensitivity to the actual access patterns of the application.

If the data is a payroll database, an application may need pay records scattered throughout the database. The result in a page-based ODBMS is that it may bring a 4- or 8-KB page into memory to get an object that may be a few hundred bytes at most. The remainder -- perhaps 80 or 90 percent of the total space and access time -- is wasted. The ODBMS may be performing well but the application grinds to a halt due to thrashing in the operating system's page manager.

By contrast, Kala's user-specified units of transfer eliminate internal fragmentation. You get only what you requested -- that is, as little as one byte and as much as the size of the virtual memory, or more.
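
The 80-to-90-percent figure for page-based transfer follows from simple arithmetic. The sketch below assumes that each wanted record sits on a page with nothing else the application needs, as with the scattered pay records above; wasted_fraction() is a name invented for this illustration.

```python
def wasted_fraction(object_bytes, page_bytes):
    """Share of the transferred bytes that the application did not ask for."""
    pages = -(-object_bytes // page_bytes)  # whole pages needed, rounded up
    transferred = pages * page_bytes
    return (transferred - object_bytes) / transferred


# A few record sizes against 4-KB and 8-KB pages.
for object_bytes, page_bytes in ((800, 4096), (400, 4096), (400, 8192)):
    print(object_bytes, page_bytes,
          round(100 * wasted_fraction(object_bytes, page_bytes), 1))
```

An 800-byte record on a 4-KB page wastes about 80 percent of the transfer, and a 400-byte record about 90 percent. When the unit of transfer is exactly the record, the same calculation gives zero.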

In a multi-user environment, user-specified units of transfer also take care of the severe security loopholes introduced by page-mapping-based approaches -- another acute real-world problem.

In conventional systems with single users, you can overcome thrashing and other performance problems by having the user manually cluster the data, relying on the programmer's ability to predict the access patterns of a single application and thus optimize the database for that application alone. However, this traditional technique breaks down badly when one application needs one selection of data from the database, and a second application, perhaps running concurrently for other users, needs different selections. The result is less-than-optimal global behavior.

Kala doesn't employ such local optimizations. Instead, it uses actual access history to dynamically rearrange the store, so that a global optimum emerges. If there is only a single user application, this type of software should be able to achieve clustering as good as the best packing performed by hand. It also should give globally optimum performance across multiple applications, without requiring the services of an expensive database administrator to tune the clustering.

--------------
Moving Forward
-----------------------------------------------------------------------

Persistent data servers such as Kala provide a new and exciting middle ground between the performance and simplicity of file systems and the capabilities of database managers. They are particularly useful as the underpinnings of object stores because they maintain the structure of the data on the disk, making it independent of the application that created it.

More and more, applications need access to complex data types. More and more, applications must support multiple users in a distributed environment. From flat files to objects, persistent data servers can handle them all.

-----------------------------------------------------------------------
Ivan Godard and Sergiu S.
Simmel are co-founders of Penobscot Development Corporation of Arlington, Massachusetts. Simmel holds an MS in Computer and Information Sciences from the University of Minnesota. His areas of expertise include CASE, hypermedia, and object-oriented databases. Godard has contributed to the development of Algol68, Ada, and Mary, the first wide-spectrum language, and has taught post-graduate courses at Carnegie-Mellon and the University of Maine. You can reach them at +1-617-646-3951 or on the Internet as kala@world.std.com. -----------------------------------------------------------------------