Editors: Nabil Adam and Yelena Yesha
October 17, 1996
This paper identifies strategic directions in the evolution of electronic commerce and digital libraries by highlighting common issues as well as unique requirements for robust EC and DL systems. EC and DL are two major components of the emerging global, open marketplace (digital agora) for information, goods, and services. They will provide electronic analogs of current practices and financial instruments, protect the rights of individuals and enterprises, and stimulate new uses of information technologies. The digital agora will combine research and technologies from many subfields of computer science and information systems as well as interdisciplinary expertise from many other disciplines.
Electronic commerce (EC) and digital libraries (DL) are two increasingly important areas of computer and information sciences with different user requirements but similar infrastructure requirements. In exploring strategic directions, we examine both requirements of the global information infrastructure that are a necessary prerequisite for EC and DL [2], and specific requirements of EC and DL within the global infrastructure.
Both EC and DL are concerned with systems that support the creation of information sources and with the movement of information across global networks. EC supports effective and efficient business interactions and transactions that take place on behalf of consumers, sellers, intermediaries, and producers, while DL supports effective and efficient interaction among knowledge producers, librarians, and information and knowledge seekers. A digital library may require the transactional aspects of EC to manage the purchasing and distribution of its content while a digital library can be used as a resource in electronic commerce to manage products, services, providers and consumers. EC and DL share a common infrastructure in the networking, security, searching and advertising, negotiating and matchmaking, contracting and ordering, billing, payment, production, distribution, accounting, and customer service mechanisms that support such distributed information systems [31].
In a generic EC/DL model, providers (information providers, merchants, retailers, wholesalers) make multimedia objects available to consumers (customers, information seekers, users) in exchange for payment. An EC/DL system itself is characterized as a collection of distributed autonomous sites (servers) that work together to give the consumer the appearance of a single cohesive collection. Each site may store a large number of multimedia objects (documents, images, video, audio, software, structured data). This content may be stored in a variety of formats and on a variety of media such as disk, tape or CD-ROM and typically originates from a variety of providers who may wish to control its use (retrieval or modification) or to add value. Consumers are assumed to have a wide variety of domain expertise and computer proficiency which must be taken into account by designers of EC/DL systems.
Section 2 examines EC and DL research requirements in six key subareas, while Section 3 provides case studies that describe three electronic commerce research projects (USC-ISI, CommerceNet, First Virtual) and six digital library projects sponsored by an NSF/ARPA/NASA initiative.
This report focuses on the following common areas of EC and DL research: acquiring and storing information, finding and filtering information, securing information and auditing access, universal access, cost management and financial instruments, and socio-economic impact.
Both EC and DL benefit from general research on acquiring and storing information, but have specific requirements due to differences in processes of information acquisition and the kinds of information being stored.
EC must handle parts lists, price lists, trading partner lists, catalogs, advertisements and other commerce-related materials, as well as content itself (e.g., books, magazines and movies). In such a scenario, the digital library is used as an information repository to facilitate electronic commerce.
Digital libraries are also dynamic in that materials can be added and updated by many authors. Any individual with access to the Internet can play the role of author and content provider and may wish to charge patrons for the content. For electronic commerce, price lists, advertisements and the like will change over time. Thus an EC/DL system will require mechanisms for updating and adding to the content. The challenges here are:
The vast majority of potential EC/DL content is not currently in digital form. Millions of books, articles, films, audio recordings, maps and other sources are presently stored in their physical, non-digitized form. Materials such as data sets may be stored in a legacy format requiring conversion. Bringing such artifacts on-line is not trivial and many researchers have addressed the basic problem of digitizing existing media [26]. In addition, many forms of media such as newspapers and television broadcasts are now produced in digital form and must be captured in real time.
These challenges include:
The database working group provides additional material on the storage of large volumes of information (see http://www.cs.brown.edu/people/sbz/cra/).
As demonstrated on today's World Wide Web, any individual with access to the Internet has the potential to be an information provider by making content available to consumers. A key issue for EC/DL is whether and how to incorporate (e.g., capture and index) each individual's contributions to the global market. How do we establish criteria for what we do include? How do we archive it if our libraries acquire only pointers to someone else's content, and not the objects themselves?
An effective means of accomplishing this, although limited in scope, exists in the form of on-line malls (e.g., Marketplace MCI http://www2.pcy.mci.net/marketplace/) consisting of several electronic storefronts. However, these fall short of the massive scale requirements dictated by a truly global economy.
Feature selection and extraction is the process of automatically recognizing features in images, scanned pages of text and other media. Information about the features (called metadata) can then be used to form indexes to satisfy queries or for further processing. User requirements dictate the (possibly multiple) types of indexes created for objects. Identifying and efficiently extracting effective and robust features from multimedia data is a problem whose solution will enable highly sophisticated functions. This includes extracting features from the whole range of data in EC/DL systems including text, audio, images, and video. However, the queries over the resulting metadata are not limited to traditional exact match queries, but include spatial, temporal, and other logical relationships (usually domain dependent), while providing for approximate matches [14, 12, 29, 11].
Even though there is a very large body of work in image and speech processing, image analysis, and pattern recognition, incorporating such work into an EC/DL system has thus far not been adequately addressed. The search for efficiently computable, effective, and robust features, both global and local with respect to image, video or audio, is far from over. Furthermore, considering that a multimedia object may contain text, still images, video, and audio segments, the task of utilizing the features for each individual component to characterize that object for retrieval purposes is both challenging and promising. The objective is to select and extract features from a multimedia object that can be used to retrieve similar (approximately equivalent) multimedia objects.
Assume that a suitable set of features has been selected, and automatic extraction methods are available. Designing a query language and indexing schemes to process queries efficiently is the next important step [14, 12]. The query language is highly influenced by the notion of approximate similarity adopted in selecting the features. For example, while point-access methods are well studied, they are suitable only for indexing numerical features. New indexing schemes are needed for non-numerical features such as graph-based representations.
Consumers will vary in their tolerance of the quality level of the information they receive in formats such as images, video or structured data. DL systems must cater to individual needs by providing efficient and effective representations of information. An opportunity for EC systems is to provide an array of choices (multiple levels of quality) with associated proportional prices.
A consumer may be willing to tolerate a low quality image or just the titles and abstracts of a collection of papers if the price is significantly lower. From an information provider's standpoint, the cost of delivering such an object will be lower as fewer bytes are transmitted. However, the acquisition of objects must then be done at various quality levels or methods provided to convert the original object into the requisite version.
An interesting new direction is measuring the provider's costs in terms of the opportunity costs of providing the service, rather than the actual amount of resources that are being consumed. The opportunity cost is determined by the relationship between the supply and the demand for a given resource, so that the opportunity cost of an idle resource is close to zero, and the cost of an over-utilized resource is so high that it is basically unaffordable.
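One way to make the opportunity-cost idea concrete is a price curve driven by utilization. The following is a minimal sketch only: the function name `opportunity_cost`, the `base_price` parameter, and the u/(1-u) shape are assumptions chosen for illustration, not a model taken from this report. The curve is zero for an idle resource and grows without bound as demand approaches supply.

```python
import math


def opportunity_cost(demand: float, supply: float, base_price: float = 1.0) -> float:
    """Hypothetical opportunity-cost curve for a shared resource.

    Returns 0 for an idle resource and infinity once demand reaches supply.
    The u / (1 - u) form is an illustrative assumption.
    """
    if supply <= 0:
        raise ValueError("supply must be positive")
    if demand < 0:
        raise ValueError("demand must be non-negative")
    utilization = demand / supply
    if utilization >= 1.0:
        # Over-utilized: the resource is treated as unaffordable.
        return math.inf
    return base_price * utilization / (1.0 - utilization)


# Price rises slowly at low load and steeply near saturation.
for load in (0, 50, 90, 99):
    print(load, opportunity_cost(load, 100))
```

Any monotone curve with the same two limits would serve equally well; the choice here simply keeps the arithmetic easy to follow.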
Another consideration is the perceptual measurement of image quality. It is well known that the widely used mean squared error measure is not adequate for assessing the quality of images as perceived by humans. A quality measure that can be efficiently computed and at the same time correlates well with the opinions of human viewers is needed for the design and evaluation of image-processing systems such as image-compression systems. Such perceptual measures, including measures based on models of the human visual system, have been introduced by researchers. One example of such work is a paper by Heeger and Teo [19]. We believe that, due to the complexity of the human visual system, there is a need for additional interdisciplinary research that will involve researchers in the areas of image processing, optics, neurobiology, and psychology, and will result in improved models of the human visual system that will enable the introduction of improved perceptual image quality measures.
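The limitation of mean squared error can be illustrated with a small sketch. The tiny 4x4 grayscale "images" and the two distortions below are made-up examples, not data from [19]: a uniform brightness shift and a checkerboard pattern of the same amplitude would typically look quite different to a viewer, yet they receive the same MSE.

```python
def mean_squared_error(a, b):
    """MSE between two equally sized grayscale images given as lists of rows."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += (pixel_a - pixel_b) ** 2
            count += 1
    return total / count


# Hypothetical flat gray test image.
original = [[100, 100, 100, 100] for _ in range(4)]

# Distortion 1: every pixel brightened by 10.
brighter = [[p + 10 for p in row] for row in original]

# Distortion 2: pixels alternately raised and lowered by 10 (checkerboard).
checker = [
    [p + (10 if (r + c) % 2 == 0 else -10) for c, p in enumerate(row)]
    for r, row in enumerate(original)
]

print(mean_squared_error(original, brighter))
print(mean_squared_error(original, checker))
```

Both calls report the same error because MSE depends only on the magnitude of each pixel difference, not on its spatial arrangement, which is one reason perceptual measures are sought.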
In an EC/DL system, consumers will need on-line facilities to help them retrieve information and to locate resources that match certain expectations and desires. Matchmaking programs are required to bring together producers and consumers. Examples of such services include:
The following are challenges for EC/DL systems:
From an EC perspective, consumers seek to find products and services at low cost using language and terminology they are most familiar with. The unique challenges for EC include:
DL research in agents has focused on bringing users in contact with information items of interest. From an EC perspective, agent technologies would be useful to make providers aware of consumer needs and to make consumers aware of a provider's offerings. EC/DL systems will take this one step further by having agents negotiate the terms for a transaction.
Many definitions have been put forth to describe what a software agent is. These range anywhere from the adaptable information filter to autonomous programs that work in conjunction with, or on behalf of, a human user. A taxonomy of agents usually involves the three dimensions of agency (degree of autonomy), intelligence (degree of reasoning behavior) and mobility (in an internetwork context). Software agents also embody the notion of improving over time as they record additional user actions and reactions [28].
Examples of existing agents are the robots (spiders, crawlers, etc.) that traverse the World Wide Web to build indexes of its content and news and stock quote filtering systems that build a customized news feed (via e-mail or web page) based on user preferences. Other examples can be found at UMBC's AgentWeb site (http://www.cs.umbc.edu/agents/).
From a consumer's perspective, EC/DL systems will require decision agents that can learn an individual consumer's preferences, seek out appropriate providers and negotiate requests for further information (e.g., to bring to the user's attention) or initiate purchases. For example, Andersen Consulting CSTaR's BargainFinder Agent (http://bf.cstar.ac.com/bf/) scans multiple price lists of compact disks and returns a list of retailers with the lowest price. These tasks must be carried out in a secure and safe fashion such that a consumer's privacy is maintained and the potential for fraud is minimized.
A more difficult problem arises from the provider's perspective: how can a provider identify potential consumers and their preferences? Consumers routinely expect access to a provider's product, price and company information. However, few consumers would voluntarily make information about themselves, such as preferences and buying habits, available to providers. A provider may employ demand agents that can provide product, price and availability information to consumers' decision agents. A demand agent may simply reply to requests for information on behalf of the provider or it may actively seek out consumers' decision agents.
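The price-comparison task performed by such a decision agent can be sketched in a few lines. This is not a description of how BargainFinder itself works; the function `find_bargains`, the retailer names, and the prices are hypothetical, and the sketch assumes the price lists have already been retrieved into memory.

```python
def find_bargains(price_lists, item):
    """Return (retailer, price) pairs for `item`, cheapest first.

    `price_lists` maps a retailer name to a dict of item -> price.
    Retailers that do not carry the item are skipped.
    """
    offers = [
        (prices[item], retailer)
        for retailer, prices in price_lists.items()
        if item in prices
    ]
    return [(retailer, price) for price, retailer in sorted(offers)]


# Hypothetical price lists gathered from three retailers.
price_lists = {
    "StoreA": {"Kind of Blue": 13.99, "Blue Train": 11.49},
    "StoreB": {"Kind of Blue": 12.49},
    "StoreC": {"Blue Train": 10.99},
}

print(find_bargains(price_lists, "Kind of Blue"))
```

A real agent would also have to fetch and parse each retailer's list and protect the consumer's identity while doing so, which this sketch leaves out.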
Underlying all discussion of agents is the need to standardize on agent communications and interfaces. The Artificial Intelligence working group provides additional information on the topic of intelligent agents (see http://www.medg.lcs.mit.edu/doyle/sdcr/ai/).
Matchmaking is a process whereby consumers seeking goods and services with given specifications are put in contact with providers whose goods and services match the specifications. Providers may also seek consumers in a similar fashion. A reasonable assumption for the matchmaking task is that the representation of the consumer's specifications (typically formed as a query) will differ from the representation of the provider's goods and services. For example, one provider may maintain a structured database of products while another provider may maintain a collection of text documents describing the features of each product. The schema integration and data integration problems for structured data are well known in the federated databases and heterogeneous systems literature. The challenge is to offer matching services across both structured and unstructured collections of data.
Exact and/or approximate similarity query processing methods can be effectively used to perform matchmaking. For example, for every item for which a match is sought, one could effectively construct an ideal match and then resort to approximate similarity methods to find the best matches, those that are closer to the ideal match.
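The ideal-match approach can be sketched as a nearest-neighbor ranking. In this minimal example the feature names, the per-feature `scales` used to make features comparable, and the scaled Euclidean distance are all illustrative assumptions; any other approximate similarity measure could be substituted.

```python
import math


def distance(ideal, offer, scales):
    """Scaled Euclidean distance between an ideal match and one offer."""
    return math.sqrt(
        sum(((ideal[k] - offer[k]) / scales[k]) ** 2 for k in ideal)
    )


def best_matches(ideal, offers, scales, k=2):
    """Return the names of the k offers closest to the ideal match."""
    ranked = sorted(offers, key=lambda name: distance(ideal, offers[name], scales))
    return ranked[:k]


# Hypothetical consumer specification and provider offers.
ideal = {"price": 1500, "weight_kg": 2.0, "battery_hours": 8}
scales = {"price": 500, "weight_kg": 1.0, "battery_hours": 4}
offers = {
    "A": {"price": 1600, "weight_kg": 2.2, "battery_hours": 7},
    "B": {"price": 900, "weight_kg": 3.5, "battery_hours": 3},
    "C": {"price": 2500, "weight_kg": 1.5, "battery_hours": 10},
}

print(best_matches(ideal, offers, scales))
```

The sketch assumes every offer already exposes the same numeric features as the query, which is exactly the representation gap that matchmaking across structured and unstructured sources has to close first.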
In the Tsimmis project at Stanford (http://www-db.stanford.edu/tsimmis/), translator agents perform query and result set conversions between the native source and a common format understood by other agents. For queries, this can mean conversion to a specific query language, invocation of a keyword search mechanism or some other conversion.
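The idea of a translator that converts one common query into source-specific forms can be illustrated as follows. This is a hypothetical sketch and does not reproduce the Tsimmis data model or query languages: the common query format (a dict with `collection` and `equals` keys) and the functions `to_sql` and `to_keywords` are invented for the example.

```python
def to_sql(common_query):
    """Translate a common-format query into a parameterized SQL string.

    The collection name is assumed to come from a trusted catalog;
    values are passed separately as parameters.
    """
    fields = sorted(common_query["equals"])
    sql = f"SELECT * FROM {common_query['collection']}"
    if fields:
        sql += " WHERE " + " AND ".join(f"{field} = ?" for field in fields)
    params = [common_query["equals"][field] for field in fields]
    return sql, params


def to_keywords(common_query):
    """Translate the same query into a keyword string for a text search engine."""
    fields = sorted(common_query["equals"])
    return " ".join(str(common_query["equals"][field]) for field in fields)


# One query, two native forms: a structured database and a text collection.
query = {"collection": "products", "equals": {"category": "cd", "artist": "Miles Davis"}}
print(to_sql(query))
print(to_keywords(query))
```

A matching pair of functions would be needed in the other direction to convert each source's result set back into the common format.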
To attain even more flexibility, we can imagine the case where there is no globally understood query language and data exchange format. Rather, we have facilitator ``agents'' sprinkled throughout the system which know how to perform translations from one data type to another and from one query language to another. These facilitators must be able to communicate the capacities they possess and can make available to information requesters and sources. The Knowledge Query and Manipulation Language (KQML) is becoming a common means for exchanging this type of information with intent (i.e., the sender intends for the receiver to store the content, forward it on, validate it, etc.).
An ontology is defined as the set of terms and relationships used in a domain, denoting concepts and objects, often ambiguous among domains [18]. Presently, ontologies are being constructed in a number of domains for a wide variety of purposes. Some, such as Gio Wiederhold's glossary of terms supporting the I3 initiative [39], are used as a domain specific dictionary or thesaurus while others are used as a basis for exchanging information between information systems designed around differing domains. In digital libraries, ontologies are employed to facilitate information retrieval by directing users to related domains. Ontology support for bidding and negotiation in electronic commerce is described in [24]. At Stanford University's Knowledge Systems Laboratory, a project is currently underway to build ontologies that will allow buyers in the U.S. government's Defense Logistics Agency to locate items of interest in the vast Federal Supply Catalog (http://www-ksl.stanford.edu/kst/dealmaker.html).
EC and DL systems emphasize different aspects of ontologies but common principles underlie the development of ontologies for information systems. The two examples of ontologies used in EC reflect a structural view in that specific items, part numbers and descriptions all have distinct relationships captured by the ontology. Digital libraries must cater to users with diverse backgrounds by offering broader, more general ontologies that are inter-linked to cover many domains. For example, a physicist should be able to search for information in the chemistry domain using the physics terminology that is more familiar to him or her. EC/DL systems require both aspects of ontologies to allow a broad range of users to locate products, services and information.
Ontologies are described in some type of representation language such as LOOM, Epikit, Algernon or the Knowledge Interchange Format (KIF). A means to translate (map) between these representations is desirable. [18] describes a portable ontology specification called Ontolingua that can be used as a translation tool between ontologies described in several representation languages. With such tools, the mechanics of mapping between ontologies is made easier.
Data mining is concerned with extracting patterns, associations, and anomalies from large databases and data sets. Here again there are general data mining principles common to EC and DL, and domain-specific data mining problems. The DL has within it not just documents, but many databases. Data mining techniques are applied to some set of these to create new information or insight.
As the use of the digital agora grows, there will be an increasing amount of information collected. Some of the most interesting information is: 1) the access patterns to information in the agora by consumers and 2) the purchase patterns of goods and services in the agora by consumers. A fundamental challenge is to develop algorithms and software for the indirect support of the agora by understanding the information in these patterns and exploiting this information as a basis for better decision making. The problem is important because of the value of the information that can be uncovered.
The types of questions providers ask based on access and purchase patterns are: What information is accessed or purchased and by whom? How likely is it that a new object will be accessed/purchased and by whom? What objects are likely to be accessed/purchased together? Which subjects are becoming more important as indicated by trends in the number of objects available and the number of requests for them? Which objects are naturally grouped together on the basis of patrons' requests, or sequences of requests? Are there local and regional variations in the purchases of goods and services? Which transactions are fraudulent? How are the purchases of a consumer likely to change over time?
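The question of which objects are accessed or purchased together can be approached, in its simplest form, by counting co-occurrences across transactions. The sketch below is a minimal illustration with made-up transaction data and a hypothetical function name; it is not one of the data mining algorithms cited in this report and would not scale to the data volumes discussed here.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(transactions):
    """Count how often each unordered pair of objects appears in one transaction."""
    counts = Counter()
    for basket in transactions:
        # Sort so each pair has one canonical order; set() drops repeats.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical purchase log: each inner list is one consumer transaction.
transactions = [
    ["map", "guidebook"],
    ["map", "guidebook", "film"],
    ["film", "map"],
]

for pair, count in sorted(co_purchase_counts(transactions).items()):
    print(pair, count)
```

Pairs with high counts are candidates for joint presentation or bundling; distinguishing meaningful associations from coincidental ones requires the statistical techniques surveyed in the data mining literature.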
From a consumer's perspective, data mining techniques can also be put to use to discover underlying trends and patterns. For example, before investing in a company's stock, an investor may want to query across multiple EC/DLs to find associations between the board of directors and management and any prior concerns they were involved with.
There are a variety of techniques that have been developed for data mining, including the use of classification and regression trees, clustering algorithms, neural nets, and Bayesian nets for retrieving objects managed by the agora and for the transactions by the agora's consumers.
Current data mining and data warehousing algorithms cannot adequately handle complex data, such as time series associated with purchase decisions, heterogeneous and distributed data, such as arise in a typical EC or DL system, or the volumes of data that will soon be generated by EC and DL systems (cf. [16] and [32]).
EC/DL systems have unique requirements with respect to processing consumer queries. Queries may span multiple EC/DL systems and, if not filtered, may return excessive amounts of result data that can overwhelm networks and systems as well as the consumer's cognitive abilities. Thus query processing must balance the issues of query expansion, to retrieve similar objects of interest on distributed servers, and filtering or query refinement to reduce the size and complexity of result sets and to help pinpoint information of interest. Query expansion is a language semantics issue while filtering and query refinement are most often addressed in conjunction with user interface issues [10].
EC queries are typically directed to locate specific products and services. The task is one of narrowing down a potentially large number of sources to one as in the case of locating the least expensive but most conveniently scheduled airline transportation. Query expansion may be used to include additional air carriers or to vary the dates of travel within some range. Query refinement might then be used to narrow down the choices based on a cost/convenience trade-off.
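The airline example can be sketched as two steps: expansion widens the set of acceptable travel dates, and refinement then filters and ranks the candidates. The function names, the flight records, and the use of price alone as the ranking criterion are assumptions made for this illustration; a fuller cost/convenience trade-off would weigh schedule as well.

```python
from datetime import date, timedelta


def expand_dates(travel_date, slack_days):
    """Query expansion: accept dates within +/- slack_days of the requested date."""
    return [travel_date + timedelta(days=d) for d in range(-slack_days, slack_days + 1)]


def refine(flights, acceptable_dates, max_price, top_n=3):
    """Query refinement: keep affordable flights on acceptable dates, cheapest first."""
    candidates = [
        f for f in flights
        if f["date"] in acceptable_dates and f["price"] <= max_price
    ]
    return sorted(candidates, key=lambda f: f["price"])[:top_n]


# Hypothetical flight offers from several carriers.
flights = [
    {"carrier": "A", "date": date(1996, 10, 17), "price": 420},
    {"carrier": "B", "date": date(1996, 10, 18), "price": 310},
    {"carrier": "C", "date": date(1996, 10, 25), "price": 150},
    {"carrier": "D", "date": date(1996, 10, 16), "price": 600},
]

dates = expand_dates(date(1996, 10, 17), slack_days=1)
for flight in refine(flights, dates, max_price=500):
    print(flight["carrier"], flight["date"], flight["price"])
```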
While exploring digital libraries may consist of a directed search for a specific piece of information, open ended queries such as ``Tell me about the Civil War'' may also be asked. In such cases, query expansion may be employed to include additional information such as factors leading up to the war or to include additional forms of media such as text, maps and images in the reply. Such queries are typical examples of the need for query refinement. There is a whole spectrum of other types of queries that lie between the EC and DL query examples given here.
Though security lapses in EC can be more spectacular, both EC and DL will be a part of an information infrastructure with common integrated security requirements. A private EC/DL system must protect its information from unauthorized use and abuse, ensure the privacy of its users, and protect the intellectual property of the providers and authors. At the same time, legitimate users should be able to add and update information and buy and sell products and services with minimal impediments. The following are challenges for EC/DL systems:
The greatest design challenges for secure EC/DL systems lie in the formulation, specification, and enforcement of comprehensive data protection policies. These policies must take into account the differing concerns of authors, publishers, merchants, librarians, and users, and they must offer three varieties of security:
The secure EC/DL system, then, must address the concerns of its constituents with respect to confidentiality, authenticity, and integrity. Given the volume and complexity of data, the system design must balance security concerns against responsiveness and performance.
While these challenges are great, perhaps the thorniest issue is the need to deal with policy interactions across distributed providers. Providers may have different concerns and motivations, and they may use variants of access operations. Providers' security policies certainly will differ and conflicts among policies must be handled in reasonable ways, without unduly denying or compromising service. Security policies and enforcement mechanisms must be explicitly considered as one aspect of heterogeneity to be managed in digital libraries [7].
Several existing research areas and emerging technologies appear promising for at least partially addressing the security needs of EC/DL, though they will need adaptation or extension for the unique characteristics of this environment. Digital signatures, checksums, and encryption (see, for example, [23] or [4]) will provide support for some critical aspects of authenticity and confidentiality. Security features that will be refined as part of the maturation of object database technology will help with protecting composite and multimedia objects in terms of extensible operation sets. Work being done in security for federated database systems will help with issues of the interaction of autonomous policies (see, for example, [6, 13]).
Recent advances in holographic storage and retrieval technology offer a new and promising security verification system for identification and/or credit cards. By utilizing a pseudo-randomly generated phase mask which stores, in a nearly invisible fashion, biometrics information and secret code on ID cards, an all-optical correlation can quickly and reliably separate authentic IDs from counterfeits.
Intellectual property is yet another topic that does not respect the boundaries between EC and DL. Commercial transactions involving EC/DL objects (e.g., books, articles, movies) will necessitate the transfer and enforcement of some/all intellectual property rights. For example, the sale of a book may or may not allow the buyer to reproduce parts of the book for profit. Books, movies and music are all different with respect to the application of copyright law, fair use and performance rights. We will need to develop schemes for the enforcement of intellectual property rights, and the detection of violations. Some progress on this front is already being made. For example, several recent works propose methods of efficient automatic detection of ``suspicious'' similarity between objects.
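How a checksum and a keyed authentication tag support integrity and authenticity can be sketched with Python's standard `hashlib` and `hmac` modules. Two assumptions should be noted: SHA-256 is used only as a convenient present-day hash, and the HMAC shown is a shared-key message authentication code standing in for a public-key digital signature, which would let anyone verify without holding the secret. The function names and the sample document are hypothetical.

```python
import hashlib
import hmac


def checksum(data: bytes) -> str:
    """Integrity check: any change to the data changes the digest."""
    return hashlib.sha256(data).hexdigest()


def sign(data: bytes, key: bytes) -> str:
    """Authenticity tag computed with a secret key shared by provider and verifier."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(data, key), tag)


# Hypothetical document and key.
document = b"Price list, effective October 17, 1996"
key = b"shared-secret-key"
tag = sign(document, key)

print(verify(document, key, tag))
print(verify(document + b" (altered)", key, tag))
```

Confidentiality is a separate concern and would require encryption in addition to the tag.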
Detecting violations is only part of the story, however. The enforcement of intellectual property rights requires pursuit of the offenders by legal means which may be out of the financial reach of an individual provider or may not be cost effective if the abuse is spread out over millions or billions of individuals. Additional insights into these issues as well as a method for preventing copying of documents, called text flickering, are presented in [25].
In an EC/DL environment today, there is a multiplicity of information appliances. Consumers may use TV sets, radios, PCs, PDAs, laptops, and cellular phones to access information. The goal of the National Information Infrastructure (NII) project is to allow universal access to distributed stores of information (Digital Libraries) and to provide this access at a reasonable cost to every citizen [15]. Widespread use of EC/DL is predicated on the universal availability of access for a large number of users [1]. EC/DL systems will have to contend with many different information appliances. In such a heterogeneous world of objects, user interfaces, networks, clients and servers, the issue of interoperability assumes paramount importance. Challenges to EC/DLs are:
Interoperability is an important research area in software engineering [38]. An EC/DL system should interoperate with legacy and other systems such as database management systems and workflow systems. For example, EC/DL clients and servers will need to retrieve from, and update information in, corporate databases. Workflow systems will be responsible for monitoring and controlling work assignments in organizations. While flowing through an organization, a work-assignment may involve accessing the global digital library, or buying/selling a product. Thus EC/DL clients and servers may need to trigger and be triggered by workflow systems; data will also have to be exchanged between the two types of systems. Mediation and heterogeneous databases are two promising research directions currently being explored to address interoperability. [17] presents an overview of workflow systems including the application of Distributed Object Management (DOM) to support interoperability and integration.
The complex interaction among the heterogeneous systems will need to be managed by facilities for schema integration and standard protocols. These facilities will provide users and programmers with some form of a global schema of available information, albeit in a very restricted domain. Standard protocols will enable a client to search for the best provider of a service or product, and will enable the provider to maximize its profit. HTTP is an example of such a protocol, but higher layers will have to be built to give the infrastructure more intelligence, and elevate the abstraction it provides. International commerce will be greatly facilitated by some form of automatic translation between languages, at least in the restricted domain of electronic commerce.
User interfaces are an important component of an EC/DL system. They must incorporate a wide variety of techniques to afford rich interaction between users and the information they seek. In an EC/DL environment, a large amount of data spread through a number of resources necessitates intuitive interfaces for consumers to query and retrieve information. The ability to smoothly change the consumer's perspective from high-level summarization information down to a specific paragraph of a document or scene from a film remains a challenge to user interface researchers.
The Human Computer Interaction working group provides additional information on this topic (see http://www.cs.brown.edu/people/ifc/hci.html).
The success of EC and DL is critically dependent on progress in networking. In this area there is little difference between EC and DL requirements. Both require increased networking bandwidth, a need driven on two main fronts.
First, the number of consumers will undoubtedly increase. If the Internet is any indication, exponential growth in the number of users will be the rule for at least the next few years. Second, as the delivery of multimedia data becomes the norm, the demands for high bandwidth increase. However, high bandwidth, in and of itself, is not enough to support EC/DL systems. The intelligent use of bandwidth and the ability to guarantee bandwidth for a given time period are also required.
In an EC/DL system, the cost of communication resources must be taken into account. As is the case with varying qualities of DL objects, some consumers are willing to tolerate the delivery of lower quality objects (e.g., a video displayed at fewer frames per second) in exchange for lower cost. This exchange would be negotiated between a consumer and the provider. The reservation of bandwidth, in turn, would be negotiated between the EC/DL provider and the network provider.
The telecommunications working group provides additional information on these topics (see http://ana-www.lcs.mit.edu/projects/sdcr-net/).
Cost management appears to be more directly related to EC than to DL, but the infrastructure of cost management of network services applies equally to DL.
Society has grown accustomed to a wide variety of cost models and financial instruments. EC/DLs, however, present new challenges that are not adequately addressed by these models and instruments. For example, on-line services presently follow fixed cost models that are insensitive to changes in data contents and wholesale costs. With respect to these issues, EC/DLs should:
Today, information-service providers and consumers use cost models that are time based, or request based, or a combination of the two. In a time-based model the consumer pays for unlimited access to the service for a time-unit (say a month); cable television uses this model. In a request based model the consumer pays per request (e.g. news-stand purchase of a newspaper). Once a cost model is selected by an information provider, it remains fixed until it is changed manually. Given limited resources and the marketing opportunities presented to information providers by the new electronic medium, this solution is unsatisfactory. Similarly, information consumers have unprecedented capabilities for comparison shopping that results in cost minimization. Therefore, we need to develop algorithms and systems that will enable cost minimization for information consumers, and exploitation of new marketing opportunities for information providers.
For information providers, we need to develop algorithms and systems for automatically determining the price of information. These systems should dynamically adapt the prices, and maybe even the cost models, to supply and demand. For example, at the system architecture level, there are tradeoffs between the cost of storing replicated data and higher communications costs, and between the cost of processing expanded queries and returning approximate results at lower cost. What are the parameters and primitive functions of an information-pricing system? A related issue is how some or all intellectual property rights can be transferred automatically from the owner to sub-contractors and consumers. When should this be done, and how should these rights be charged for? How can intellectual property rights be enforced?
Another issue is to study cost models from the consumers' point of view. Consumers will receive bills from multiple information providers. In many cases these charges can be optimized, and how to do so depends on the available providers of a piece of information, their cost models, and the access patterns and needs. Therefore, we need to develop systems that will assist the consumer in cost management. For example, consider a consumer reading the New York Times. She can subscribe to receive the newspaper every day, and in this case the price per copy is lower than the (electronic) news-stand price. However, the consumer may pay for issues that she does not have the time to read. On the other hand, the news-stand price may be higher, but the consumer buys the access only when she is certain to read the paper. Which is the better access protocol? Obviously, the answer can be computed based on the subscription versus news-stand price, and based on the probability that the consumer reads the paper on a given day; and this probability can be computed based on past experience, i.e., a cost optimization system can ``learn'' the access pattern of the consumer.
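The subscription versus news-stand comparison above can be sketched in a few lines of Python. This is a minimal illustration, not one of the algorithms analyzed in the cited work: it assumes per-issue prices, estimates the read probability as the simple fraction of past days on which the paper was read, and uses hypothetical function names throughout.

```python
def estimate_read_probability(history):
    """Estimate the chance of reading on a given day from past behavior.

    `history` is a list of booleans, one per past day (True = paper was read).
    Using the plain observed frequency is an assumption made for illustration.
    """
    if not history:
        raise ValueError("history must contain at least one day")
    return sum(history) / len(history)


def expected_daily_costs(subscription_price, newsstand_price, read_probability):
    """Return (subscription_cost, newsstand_cost) per day, in expectation."""
    # A subscriber pays for every issue, whether or not it is read.
    subscription_cost = subscription_price
    # A news-stand buyer pays only on the days the paper is actually read.
    newsstand_cost = read_probability * newsstand_price
    return subscription_cost, newsstand_cost


def choose_protocol(subscription_price, newsstand_price, history):
    """Pick the access protocol with the lower expected daily cost."""
    p = estimate_read_probability(history)
    sub, stand = expected_daily_costs(subscription_price, newsstand_price, p)
    return "subscribe" if sub <= stand else "news-stand"


# A frequent reader is better off subscribing; an occasional reader is not.
print(choose_protocol(0.60, 1.00, [True, False, True, True]))
print(choose_protocol(0.60, 1.00, [True, False, False, False]))
```

Under these assumptions, subscribing wins exactly when the per-issue subscription price is no greater than the read probability times the news-stand price.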
Therefore, from the consumer's point of view, the optimal provider and/or access protocol depends on the access pattern, and on the various cost models and providers of a particular piece of information. [20, 33] analyzed several algorithms that select a provider, cost model, and access protocol for cost optimization. Another factor that determines the consumer cost is the needed timeliness of the information (e.g., if the consumer can tolerate data that is 10 minutes out-of-date, then both the communication cost and the access cost may be reduced). [21] analyzed some aspects of this problem.
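One simple way to picture the effect of a timeliness tolerance is a client-side cache that reuses a stored copy as long as it is no older than the consumer allows. The sketch below is an assumption made for illustration and is not the approach analyzed in [21]; the class name, the `fetch` callback, and the injected `clock` are all hypothetical.

```python
class TolerantCache:
    """Reuse a cached answer while it is within the consumer's staleness bound."""

    def __init__(self, fetch, clock):
        self._fetch = fetch      # hypothetical callback: pays the provider for fresh data
        self._clock = clock      # returns the current time in seconds
        self._store = {}         # key -> (time fetched, value)
        self.paid_requests = 0   # number of requests that incurred a charge

    def get(self, key, max_age_seconds):
        now = self._clock()
        entry = self._store.get(key)
        # A sufficiently recent copy costs nothing to reuse.
        if entry is not None and now - entry[0] <= max_age_seconds:
            return entry[1]
        # Otherwise pay for a fresh copy and remember when it was obtained.
        value = self._fetch(key)
        self.paid_requests += 1
        self._store[key] = (now, value)
        return value


# Simulated clock so the example is deterministic.
now = [0]
cache = TolerantCache(fetch=lambda key: f"{key}@{now[0]}", clock=lambda: now[0])
cache.get("stock-quote", 600)   # first access: paid
now[0] = 300
cache.get("stock-quote", 600)   # 5 minutes old, within the 10-minute tolerance: free
now[0] = 700
cache.get("stock-quote", 600)   # too old: paid again
print(cache.paid_requests)
```

The larger the tolerated age, the fewer requests reach the provider, which is where the savings in communication and access cost would come from in this simplified picture.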
Finally, for both consumers and information providers, it is important to develop algorithms and systems that retrospectively examine pricing and purchase decisions; such systems should be able to make or recommend strategy changes.
Examples of existing financial instruments include:
Both the CyberCash Wallet and Ecash can provide retail-oriented EC where consumers need not have pre-existing relationships with merchants. The reliance on cryptography, however, raises some questions in regard to the global acceptance of such products in light of (primarily U.S.) export restrictions on encryption technology.
The above bidding scenario can be handled today using standard electronic mail or, in a more structured fashion, using EDI transactions. A novel research area is to carry out the bidding on, and the subsequent negotiation and purchase of, products in an automated fashion.
It is worth mentioning that the awarding of the 1996 Nobel Prize in economics to James Mirrlees of Cambridge University and William Vickrey of Columbia University for models of contract negotiation underscores the growing academic importance of theories about commerce and the maturing of electronic commerce as a topic with conceptual as well as practical substance.
The second challenge is the development of efficient algorithms and optimization techniques for evaluating answers to complex electronic trade queries. This includes indexing and filtering, optimization-level (constraint) algebra, and global optimization. The practical feasibility of electronic trade systems relies heavily on the ability to deeply optimize queries. The algorithms for operators of optimization-level algebra for EC/DL systems are significantly different from those used in relational databases, although many ideas are still applicable.
Traditional database optimization, both compile-time algebraic simplification and run-time cost-based approaches, is not adequate in the framework of electronic trade when dealing with information over the Internet. The main reason is that both the constraint manipulation required to analyze complex trade objectives and access to data over the Internet are computationally expensive. Thus a fine balance should be found between the cost of each. For filtering queries, which are triggered every time new relevant information is available from the electronic marketplace, dynamic view maintenance in the presence of large constraint data sets presents a particular difficulty.
The third challenge is to develop a prototype electronic trade system with complex objectives and to perform a case study using it. This can be done on top of a constraint object-oriented database system, such as C-cube, which has been developed at George Mason University.
Public policies, regulations, laws and just plain public opinion can stop the development of technology in its tracks. The social and economic issues must be addressed either by new technology or modified technology. For these technologies to be effective, there is a need to first be aware of such problems and concerns and second to establish dialogue with researchers from different disciplines.
Six major issue areas are identified that flow directly from the impact of information technology on the world's economy, society and political communities. Our goal in raising these for continued discussion is to ensure the beneficial implementation of EC/DL systems.
In the following sections, case studies from the fields of Electronic Commerce (in addition to those cited in Section 2) and Digital Libraries are presented.
A series of projects at the Information Sciences Institute (ISI) of the University of Southern California (USC) demonstrates how collaborative work tools and high-speed communications facilities make it possible to acquire goods and services more cost-effectively [37]. This series of projects, called The Broker, consists of the FAST electronic broker for standard parts acquisition, the MOSIS service for custom VLSI fabrication, the MIDAS ASEM brokerage service for multi-chip module fabrication and the EZFAB service for systems assembly. The FAST system is described here.
Under sponsorship of the Defense Department's Advanced Research Projects Agency, ISI has developed a system based on the use of electronic mail for purchasing. FAST is a rapid and reliable purchasing agent accessing many distributors and manufacturers. Customers send quote requests and orders via e-mail. FAST obtains quotes from its vendors, and returns them via e-mail. The customer may then order through FAST, but the vendor ships directly to the customer. FAST pays the vendor directly and also bills the customer.
FAST also provides a ``quote-and-order'' capability when the decision to purchase an item can be made according to simple rules. If the quote obtained by FAST meets criteria provided by the customer, FAST will order directly without returning the quote to the customer. When quotes do not meet the customer's criteria they are forwarded in the usual way. FAST does not support a negotiation between buyer and seller, but because FAST deals with vendors on behalf of many customers, volume discounts can be passed on. FAST has a service charge which is currently 8%.
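The ``quote-and-order'' rule can be illustrated with a short Python sketch. It is not FAST's actual implementation: the criteria fields (a price ceiling and a delivery deadline), the choice of the cheapest acceptable quote, and the way the 8% service charge is added to the vendor price are all assumptions made for the example.

```python
from dataclasses import dataclass

SERVICE_CHARGE = 0.08  # the 8% service charge quoted in the text


@dataclass
class Quote:
    vendor: str
    unit_price: float
    delivery_days: int


@dataclass
class Criteria:
    # Hypothetical customer-supplied rules; real criteria may differ.
    max_unit_price: float
    max_delivery_days: int


def meets(quote, criteria):
    """True when a quote satisfies every customer rule."""
    return (quote.unit_price <= criteria.max_unit_price
            and quote.delivery_days <= criteria.max_delivery_days)


def quote_and_order(quotes, criteria, quantity):
    """Order directly if some quote meets the criteria, else forward all quotes."""
    acceptable = [q for q in quotes if meets(q, criteria)]
    if not acceptable:
        # No quote qualifies: return the quotes to the customer as usual.
        return ("forward", quotes)
    # Assumption: among acceptable quotes, pick the cheapest.
    best = min(acceptable, key=lambda q: q.unit_price)
    # Assumption: the service charge is a percentage added to the vendor price.
    billed = round(best.unit_price * quantity * (1 + SERVICE_CHARGE), 2)
    return ("order", best, billed)


quotes = [Quote("vendor-a", 2.00, 5), Quote("vendor-b", 1.50, 10)]
print(quote_and_order(quotes, Criteria(2.50, 7), quantity=100))
print(quote_and_order(quotes, Criteria(1.00, 7), quantity=100))
```

In the first call only vendor-a meets the delivery limit, so the order is placed without returning the quote; in the second call no quote meets the price ceiling, so the quotes are forwarded.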
CommerceNet (http://www.commerce.net/) is an industry association advocating and coordinating the development of technologies used for Internet Commerce. There are presently over 200 members from industry and academia. Current projects and activities include:
Internet user demographics is another project currently under way at Vanderbilt University, Owen Graduate School of Management. This project will develop techniques and software to collect and analyze survey data on CommerceNet users to determine preliminary information on behavior, attitudes, opinions, and demographics. [8] provides additional references to CommerceNet projects.
First Virtual (http://www.fv.com/) is an Internet payment system (financial instrument) that follows a ``Green Commerce'' model. In this model, the merchant bears the risk of non-payment in that a purchaser has the option to refuse payment if the goods or services received are deemed unsatisfactory [36]. Other aspects of the model, such as buyer confirmation, are designed to reduce the risk to merchants. Some features of the model are:
The green commerce model is well suited to small purchases made in high quantities because of its low overhead and its use of electronic mail, which represents a low common denominator for Internet communications. First Virtual reports more than 147,177 customers and 1,973 merchants in 144 countries using its payment system.
In September of 1994, the joint NSF/ARPA/NASA Digital Library Initiative awarded grants to six university and industry consortia headed by Carnegie Mellon University; the University of California, Berkeley; the University of Michigan; the University of Illinois; the University of California, Santa Barbara; and Stanford University.
The Informedia project at Carnegie Mellon (http://www.informedia.cs.cmu.edu/) focuses on the storage and retrieval of video materials for science and mathematics subjects and targets users in the K-12 educational community. The project combines speech recognition, image understanding and natural language understanding both to catalog video and to retrieve relevant video segments. The NetBill system is being developed to address authentication, security and privacy, access control, and auditing functions in a highly scalable and fault-tolerant environment. Additional EC research issues include adding support for various pricing models and studying users' reactions and behavior when faced with various pricing models.
The focus of the U.C. Berkeley project (http://elib.cs.berkeley.edu/) is to develop digital library technologies to manage a wide variety of environmental data for the state of California including reports, computer models, maps, plans, aerial and ground photographs, videos and structured databases. Their overall approach is to adapt database technologies to meet the demands of digital libraries. Other components of the digital library include powerful interfaces that allow users to manipulate multimedia documents (not simply retrieve and view) and special network protocols adapted for distributed information retrieval.
The University of Michigan digital libraries (UMDL) project (http://http2.sils.umich.edu/UMDL/HomePage.html) is concentrating on making earth and space science materials available to a wide range of users via a single interface. Agent technology is a key component in achieving this goal. Content for UMDL presently comes from existing collections of science journals. Additional efforts include storing and accessing spatial data such as social science and earth science data. A digital library is viewed as a distributed collection of digital documents, each represented by different query interfaces and storage and retrieval mechanisms. UMDL differs from other DL projects in that the goal is not to bring all documents into a single system. Software agents are used as a means of carrying out a user's requests in an open market fashion on these heterogeneous collections. The main areas of research in UMDL are agent-based technologies and user interface issues.
The digital libraries project at the University of Illinois at Urbana-Champaign (http://www.grainger.uiuc.edu/dli/) focuses on facilitating access to engineering and science journals. Articles are obtained directly from the publishers in SGML format and are made available for searching and full display including text, figures, tables and equations. Research is generally aimed at the scalability (hundreds of thousands of documents with thousands of users) and functionality of digital libraries. Building effective document indexes is also a prime initiative. The UIUC effort, however, treats heterogeneous collections of documents as a federated system with a central repository for indexes and a search interface consisting of multiple views of the source collections. Users should be able to query across collections in a consistent fashion and retrieve and view documents in their entirety.
The focus of the University of California - Santa Barbara project (http://alexandria.sdc.ucsb.edu/) is on the management and retrieval of geographic and spatially indexed information. This approach differs from other DL projects in that short-term goals focus on developing a rapid prototype system using existing components as a basis for comparison with future research. Within the first year of the project, a prototype system was constructed using commercial database management systems (Sybase) and geographic information systems (ArcView). This prototype was then augmented with a WWW interface that has recently been deployed for testing. The content for the prototype system includes aerial photographs from the NASA/AMES Earth Resource Air Photo Library and air photos over time of the California and Santa Barbara areas. Additional holdings include materials from the Sierra Nevada Ecosystem Project (SNEP), which consist of physical, biological, cultural and political features from surrounding areas stored as ARC/INFO coverages.
The goal of the Stanford digital libraries project (http://www-diglib.stanford.edu/) is to tie together a broad range of information sources spanning traditional stores of document collections to personal information stores. All resources are to be made accessible under an information bus (the InfoBus) that will tie together users, services and resources. Content will be targeted at computer science related literature. The research in the Stanford project centers on the InfoBus, a conceptual architecture that encompasses low level networking protocols, agent based communications protocols, information sources and information clients. The InfoBus facilitates the exchange of information on a variety of levels and is used to support digital library services including searching, translation, publishing, authentication, copy detection, financial, mediator and personal information services. Initial research for the project is concerned with developing the specification for the InfoBus to accommodate the variety of modalities present in a digital library.
The field of EC/DLs draws upon almost every sub-discipline of computer science and information systems as well as library science, management sciences and other disciplines. In this review, we have presented research issues that span these disciplines yet must be brought together in a cohesive fashion to build truly robust and scalable systems. From a provider's perspective, these open issues include building globally distributed collaborative environments that combine legacy data and digitized materials in secure repositories that can be made universally available to consumers at low cost to providers.
We wish to thank Professor Alfred V. Aho, Columbia University, Dr. Shamim Naqvi, Bellcore, and the anonymous reviewers for their invaluable comments on this paper. Nabil Adam would like to acknowledge support from the NASA Center of Excellence in Space Data and Information Sciences. Yelena Yesha would like to acknowledge support from IBM CAS and Johns Hopkins University.