Position statement: Infrastructure and cost models for digital libraries

Ouri Wolfson
Univ. of Illinois, Chicago and NASA/CESDIS

ABSTRACT: In this short paper I claim that the infrastructure and cost models (and their interdependence) are two key research areas to be explored in order to make digital libraries a reality.

BACKGROUND: We are witnessing the emergence of a new form of computing/communication environment, called information-at-the-consumer's-fingertips. In such an environment, tens of millions of consumers will access a large number of databases, digital libraries, and other online services. The type of access will vary from WWW-type browsing of long duration, to quick retrieval of a data item (e.g., the price of a stock), to triggers that inform the consumer when an event occurs (e.g., a peace treaty has been signed). Some of the accesses will occur from portable computers equipped with wireless data communicators.

One problem that everybody recognizes is searching for information in this worldwide information system. In this short position paper I would like to discuss another problem, namely determining the infrastructure and cost of information.

There will be two types of charges incurred in this information-at-your-fingertips environment, namely access and communication charges. Access charges will be paid to the information provider, and communication charges will be paid to the network provider. For example, currently RAM Mobile Data Corp. charges on average $0.08 per data message to or from the mobile computer (the actual charge depends on the size of the message), and Data Broadcasting Corp. charges for providing the prices of financial instruments. Today, electronic and other services use cost models that are time-based, request-based, or a combination of the two. This holds for both access and communication.
In a time-based model the customer pays for unlimited access to the service for a time unit (say, a month); cable television uses this model. In a request-based model the customer pays per request (e.g., news-stand purchase of a newspaper). Once a cost model is selected by an information provider, it remains fixed until it is changed manually. Given limited resources, and the marketing opportunities presented to information providers by the new medium, this solution is unsatisfactory. Similarly, information consumers have unprecedented capabilities for comparison shopping, which can be used to minimize cost. Therefore, we need to develop algorithms and systems that will enable cost minimization for information consumers, and optimal resource utilization and exploitation of new marketing opportunities for information providers.

RESEARCH ISSUES: We need to develop algorithms and systems for automatically determining the price of information and communication services. These systems should also dynamically adapt the prices, and maybe even the cost models, to supply and demand. I believe that the information infrastructure on one hand, and the access and communication cost models on the other, will be driving forces that determine how fast the new environment emerges, and its final shape (access modes and protocols).

One issue is to study the infrastructure and cost models that will best facilitate the evolution of digital libraries from the service providers' point of view. For example, how can some or all intellectual property rights be transferred automatically from the owner to subcontractors and consumers? When should this be done, and how should these rights be charged for? How can intellectual property rights be enforced? In terms of the physical architecture, how should the information be allocated and replicated? What is the relationship between property-rights allocation and physical data allocation?
What infrastructure and cost models will minimize the overhead for consumers and providers? What are the primitives and parameters of an information-pricing system?

Another issue is to study cost models from the consumers' point of view. Consumers will receive monthly bills from multiple information and communication providers. In many cases these charges can be optimized, and how to do so depends on the cost models and the architecture. We need to develop systems that will assist the consumer in cost management. For example, consider a consumer reading the New York Times. She can subscribe to receive the newspaper every day, and in this case the price per copy is lower than the (electronic) news-stand price. However, the consumer may pay for issues that she does not have the time to read. On the other hand, the news-stand price may be higher, but the consumer buys access only when she is certain to read the paper. Which is the better access protocol? The answer can be computed from the subscription price, the news-stand price, and the probability that the consumer reads the paper on a given day. This probability can in turn be estimated from past experience, i.e., a cost optimization system can "learn" the access pattern of the user.

Given that the consumer may access many data sources, the research problem is to abstract the cost management problem by determining the primitives and parameters of a cost management system. For example, in addition to the cost model, optimization also depends on the access pattern, namely the frequency of retrieval versus the frequency with which the information is updated. Communication cost also depends on the availability of a broadcast option: it may be cheaper to broadcast an update than to transmit it point-to-point to multiple clients. In [HSiW-94, SWDNR-96] we analyzed several algorithms that select a provider, cost model, and access protocol for cost optimization.
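The subscription versus news-stand comparison in the New York Times example can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not one of the algorithms analyzed in the cited papers: the reading probability is assumed to be estimated as the plain fraction of past days on which the paper was read, prices are per day, and all function names and prices are hypothetical.

```python
def estimate_read_probability(history):
    """Estimate the probability of reading on a given day.

    history: non-empty list of booleans, one per past day
    (True if the consumer read the paper that day).
    Assumption: a simple relative frequency stands in for "learning".
    """
    if not history:
        raise ValueError("history must contain at least one day")
    return sum(1 for read in history if read) / len(history)


def expected_daily_costs(subscription_price, newsstand_price, read_probability):
    """Return (subscription cost per day, expected news-stand cost per day)."""
    if not 0.0 <= read_probability <= 1.0:
        raise ValueError("read_probability must lie in [0, 1]")
    # A subscriber pays every day; a news-stand buyer pays only on reading days.
    return subscription_price, read_probability * newsstand_price


def choose_access_protocol(subscription_price, newsstand_price, history):
    """Pick the access protocol with the lower expected daily cost."""
    p = estimate_read_probability(history)
    sub_cost, stand_cost = expected_daily_costs(
        subscription_price, newsstand_price, p
    )
    # On a tie, prefer paying per request (no commitment).
    return "subscription" if sub_cost < stand_cost else "news-stand"


# Hypothetical prices: 0.40 per day by subscription, 1.00 per copy at the stand.
occasional_reader = [True] * 3 + [False] * 7   # reads on 3 of 10 days
regular_reader = [True] * 7 + [False] * 3      # reads on 7 of 10 days
print(choose_access_protocol(0.40, 1.00, occasional_reader))
print(choose_access_protocol(0.40, 1.00, regular_reader))
```

With these illustrative numbers the occasional reader is better served by the news-stand and the regular reader by the subscription. The break-even point is a reading probability equal to the ratio of the subscription price to the news-stand price.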
Another factor that determines the cost to the consumer is the needed timeliness of the information (e.g., if the consumer can tolerate data that is 10 minutes out-of-date, then both the communication cost and the access cost may be reduced). In [HSlW-94] we analyzed some aspects of this problem.

Finally, for both consumers and information providers it is important to develop algorithms and systems that retrospectively examine pricing and purchase decisions; they should be able to make or recommend strategy changes.

Bibliography

[HSiW-94] Y. Huang, P. Sistla, O. Wolfson, "Data Replication for Mobile Computers", Proceedings of the ACM-SIGMOD 1994 International Conference on Management of Data, Minneapolis, MN, May 1994, pp. 13-24.

[HSlW-94] Y. Huang, R. Sloan, O. Wolfson, "Divergence Caching in Client-Server Architectures", Proceedings of the Third International Conference on Parallel and Distributed Information Systems (PDIS), Austin, TX, Sept. 1994, pp. 131-139.

[SWDNR-96] P. Sistla, O. Wolfson, S. Dao, K. Narayanan, R. Raj, "An Architecture for Consumer-Oriented Online Database Services", Proceedings of the 6th International Workshop on Research Issues in Data Engineering: Interoperability of Nontraditional Database Systems (RIDE-NDS'96), New Orleans, LA, Feb. 1996, pp. 50-60.