Digital Library Bibliography

  Kenneth R. Abbott and Sunil K. Sarin. Experiences with workflow management: Issues for the next generation. In Richard Furuta and Christine Neuwirth, editors, CSCW '94, New York, 1994. ACM.
Workflow management is a technology that is considered strategically important by many businesses, and its market growth shows no signs of abating. It is, however, often viewed with skepticism by the research community, conjuring up visions of oppressed workers performing rigidly-defined tasks on an assembly line. Although the potential for abuse no doubt exists, workflow management can instead be used to help individuals manage their work and to provide a clear context for performing that work. A key challenge in the realization of this ideal is the reconciliation of workflow process models and software with the rich variety of activities and behaviors that comprise ``real'' work. Our experiences with the InConcert workflow management system are used as a basis for outlining several issues that will need to be addressed in meeting this challenge. This is intended as an invitation to CSCW researchers to influence this important technology in a constructive manner by drawing on research and experience.
  Tarek F. Abdelzaher and Nina Bhatti. Web content adaptation to improve server overload behavior. In Proceedings of the Eighth International World-Wide Web Conference, 1999.
This paper presents a study of Web content adaptation to improve server overload performance, as well as an implementation of a Web content adaptation software prototype. When the request rate on a Web server increases beyond server capacity, the server becomes overloaded and unresponsive. The TCP listen queue of the server's socket overflows, exhibiting a drop-tail behavior. As a result, clients experience service outages. Since clients typically issue multiple requests over the duration of a session with the server, and since requests are dropped indiscriminately, all clients connecting to the server at overload are likely to experience connection failures, even though there may be enough capacity on the server to deliver all responses properly for a subset of clients. In this paper, we propose to resolve the overload problem by adapting delivered content to load conditions to alleviate overload. The premise is that successful delivery of less resource-intensive content under overload is more desirable to clients than connection rejection or failures.
  Serge Abiteboul, Sophie Cluet, and Tova Milo. Querying and updating the file. In Proceedings of the Nineteenth International Conference on Very Large Databases, pages 73-84, Dublin, Ireland, 1993. VLDB Endowment, Saratoga, Calif.
  Serge Abiteboul, Sophie Cluet, and Tova Milo. Correspondence and translation for heterogeneous data. In Proceedings of the 6th International Conference on Database Theory, Delphi, Greece, 1997. Springer, Berlin.
  Marc Abrams, Constantinos Phanouriou, Alan L. Batongbacal, Stephen M. Williams, and Jonathan E. Shuster. UIML: An appliance-independent XML user interface language. In Proceedings of the Eighth International World-Wide Web Conference, 1999.
Today's Internet appliances feature user interface technologies almost unknown a few years ago: touch screens, styli, handwriting and voice recognition, speech synthesis, tiny screens, and more. This richness creates problems. First, different appliances use different languages: WML for cell phones; SpeechML, JSML, and VoxML for voice enabled devices such as phones; HTML and XUL for desktop computers, and so on. Thus, developers must maintain multiple source code families to deploy interfaces to one information system on multiple appliances. Second, user interfaces differ dramatically in complexity (e.g., PC versus cell phone interfaces). Thus, developers must also manage interface content. Third, developers risk writing appliance-specific interfaces for an appliance that might not be on the market tomorrow. A solution is to build interfaces with a single, universal language free of assumptions about appliances and interface technology. This paper introduces such a language, the User Interface Markup Language (UIML), an XML-compliant language. UIML insulates the interface designer from the peculiarities of different appliances through style sheets. A measure of the power of UIML is that it can replace hand-coding of Java AWT or Swing user interfaces.
  Mark S. Ackerman. Providing social interaction in the digital library. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994.
Format: HTML Document(12K) . Audience: Non-technical, digital library researchers/funders. References: 13. Links: 2. Relevance: Low-medium. Abstract: Argues that social aspects of collaboration must be included in a Digital Library for the informal, organizational things that aren't always available in information sources. Mentions a TCL based system called CAFE that adds functionality of messages, bulletin boards, and talk.
  Mark S. Ackerman and Roy T. Fielding. Collection maintenance in the digital library. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995.
Format: HTML Document(39K + pictures) .

Audience: Librarians, web masters. References: 27. Links: 2. Relevance: Low. Abstract: Discusses the problem of collection maintenance in the digital domain, and argues that while some traditional practices will carry over, new methods will have to be created, esp. for dynamic and informal resources. Suggests that some maintenance can be done automatically by agents, and gives 2 examples: MOMSpider, which checks to make sure links are still current, and Web:Lookout, which notifies the user when interesting changes are made to a watched page.

  Michael J. Ackerman. Accessing the visible human project. D-Lib Magazine, October 1995.
Format: HTML Document(11K). Audience: Medical professionals. References: 1. Links: 5. Relevance: None. Abstract: Describes the Visible Human Project (1 mm cross sections of two cadavers), how to obtain the images, how large they are, and what IP agreements need to be signed.
  R. Acuff, L. Fagan, T. Rindfleisch, B. Levitt, and P. Ford. Lightweight, mobile e-mail for intra-clinic communication. In Proceedings of the 1997 AMIA Annual Fall Symposium, pages 729-33, Oct 1997.
  N. Adam, Y. Yesha, B. Awerbuch, K. Bennet, B. Blaustein, A. Brodsky, R. Chen, O. Dogramaci, B. Grossman, R. Holowczak, J. Johnson, K. Kalpakis, C. McCollum, A.-L. Neches, B. Neches, A. Rosenthal, J. Slonim, H. Wactlar, and O. Wolfson. Strategic directions in electronic commerce and digital libraries: towards a digital agora. ACM Computing Surveys, 28(4):818-35, December 1996.
The paper examines the research requirements of electronic commerce and digital libraries in six key areas. It provides case studies that describe three electronic commerce research projects (USC-ISI, CommerceNet, First Virtual) and six digital libraries projects sponsored by an NSF/ARPA/NASA initiative. The paper focuses on the following common areas of EC and DL research: acquiring and storing information; finding and filtering information; securing information and auditing access; universal access; cost management and financial instruments; and socio-economic impact.
  Anne Adams and Ann Blandford. Digital libraries’ support for the user’s ‘information journey’. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005.
The temporal elements of users’ information requirements are a continually confounding aspect of digital library design. No sooner have users’ needs been identified and supported than they change. This paper evaluates the changing information requirements of users through their ‘information journey’ in two different domains (health and academia). In-depth analysis of findings from interviews, focus groups and observations of 150 users have identified three stages to this journey: information initiation, facilitation (or gathering) and interpretation. The study shows that, although digital libraries are supporting aspects of users’ information facilitation, there are still requirements for them to better support users’ overall information work in context. Users are poorly supported in the initiation phase, as they recognize their information needs, especially with regard to resource awareness; in this context, interactive press-alerts are discussed. Some users (especially clinicians and patients) also required support in the interpretation of information, both satisfying themselves that the information is trustworthy and understanding what it means for a particular individual.
  Eytan Adar and Jeremy Hylton. On-the-fly hyperlink creation for page images. In Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995.
Format: HTML Document () .

Audience: Digital library researchers. References: 9. Links: 0. Relevance: Low. Abstract: Store pages as bitmaps, and retrieve a cite when user clicks on it, by doing OCR, then passing relevant line to library catalog, as 12 queries of 3 words each (randomly selected from the line) and returning the best scoring results. Somewhat robust to typos in cites, but not too slow.

  Paul S. Adler and Terry Winograd, editors. Usability: turning technologies into tools. Oxford University Press, 1992.
 
  Eugene Agichtein and Luis Gravano. Snowball: Extracting relations from large plain-text collections. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000.
Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or for running data mining tasks. We explore a technique for extracting such tables from document collections that requires only a handful of training examples from users. These examples are used to generate extraction patterns, that in turn result in new tuples being extracted from the document collection. We build on this idea and present our Snowball system. Snowball introduces novel strategies for generating patterns and extracting tuples from plain-text documents. At each iteration of the extraction process, Snowball evaluates the quality of these patterns and tuples without human intervention, and keeps only the most reliable ones for the next iteration. In this paper we also develop a scalable evaluation methodology and metrics for our task, and present a thorough experimental evaluation of Snowball and comparable techniques over a collection of more than 300,000 newspaper documents.
  Maristella Agosti, Nicola Ferro, and Nicola Orio. Annotating illuminated manuscripts: an effective tool for research and education. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005.
The aim of this paper is to report the research results of an ongoing project that deals with the exploitation of a digital archive of drawings and illustrations of historic documents for research and education purposes. According to the results of a study of user requirements, we designed tools to provide researchers with novel ways for accessing the digital manuscripts, sharing, and transferring knowledge in a collaborative environment. Annotations are proposed for making explicit the results of scientific research on the relationships between images belonging to manuscripts produced in a time span of centuries. For this purpose, a taxonomy for linking annotation is proposed, together with a conceptual schema for representing annotations and for linking them to digital objects.
  Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. In Proceedings of the International Conference on Management of Data, pages 207-216. ACM Press, 1993.
  Alfred Aho, John Hopcroft, and Jeffrey Ullman. Data Structures and Algorithms. Addison-Wesley, 1983.
  T. Alanko, M. Kojo, M. Liljeberg, and K. Raatikainen. Mowgli: improvements for internet applications using slow wireless links. In Waves of the Year 2000+ PIMRC '97. The 8th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications. Technical Program, Proceedings (Cat. No.97TH8271), volume 3, pages 1038-42, 1997.
Modern cellular telephone systems extend the usability of portable personal computers enormously. A nomadic user can be given ubiquitous access to remote information stores and computing services. However, the behavior of wireless links creates severe inconveniences within the traditional data communication paradigm. We give an overview of the problems related to wireless mobility. We also present a new software architecture for mastering the problems and discuss a new paradigm for designing mobile distributed applications. The key idea in the architecture is to place a mediator, a distributed intelligent agent, between the mobile node and the wireline network.
  Reka Albert, Albert-Laszlo Barabasi, and Hawoong Jeong. Diameter of the World Wide Web. Nature, 401(6749), September 1999.
  Alexa internet inc. http://www.alexa.com.
  R. B. Allen. Interface issues for interactive multimedia documents. In Advances in Digital Libraries '95, 1995.
Format: Not Yet Online.
  Robert B. Allen. Navigating and searching in hierarchical digital library catalogs. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994.
Format: HTML Document (21K) .

Audience: non technical, users. References: 15. Links: 2. Relevance: Low. Abstract: Describes a particular user interface based on a book shelf metaphor. Tries to use an a priori classification (Dewey Decimal System) as an organization tool (in addition to results of electronic searches).

  Robert B. Allen. Two digital library interfaces which exploit hierarchical structure. In DAGS '95, 1995.
Format: HTML Document(33K + pictures) .

Audience: General computer scientists, HCI. References: 22. Links: 1. Relevance: Low-Medium. Abstract: Uses metaphor of hierarchical Dewey Decimal system or faceted (implying a DAG) ACM literature categories to aid UI. Shows graphically where in the hierarchy hits were found for a search.

  Robert B. Allen. A query interface for an event gazetteer. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004.
We introduce the idea of an ``event gazetteer'' that stores and presents locations in time. Each event is coded as a schema with attributes of event type, location, actor, and beginning and ending times. Sets of events can be collected as timelines and the events on these timelines can be linked by annotations. The system has been built with JSP and Oracle. Systematic metadata is essential for effective interaction with this system. For instance, the actors may be described by the roles in which they participate. In this paper, we focus on the construction of queries for this complex metadata. Ultimately, we envision a flexible, broad-based service that is a resource for users ranging from students to genealogists interested in events.
  Robert B. Allen. A multi-timeline interface for historical newspapers. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005.
Events may be best understood in the context of other events. Because of the temporal ordering, we can call a set of related events a timeline. Even such timelines are best understood in the context of other timelines. To facilitate the exploration of a collection of timelines and events, a visualization tool has been developed that structures the user's browsing. In this model, each event is accompanied by a text description and links to related resources. In particular, this system can provide a browsing interface of digitized historical newspapers.
  Robert B. Allen and Jane Acheson. Browsing the structure of multimedia stories. In Proceedings of the Fifth ACM International Conference on Digital Libraries, 2000.
Stories may be analyzed as sequences of causally-related events and reactions to those events by the characters. We employ a notation of plot elements, similar to one developed by Lehnert, and we extend that by forming higher level story threads. This notation requires that events and reactions be linked and that the chains of links be terminated back to the beginning of the story. Furthermore, we have built a browser for the plot elements, the story threads, and associated multimedia. We apply the browser to Corduroy, a children's short feature which was analyzed in detail. We provide additional illustrations with analysis of Kiss of Death, a Film Noir classic. Effectively, the browser provides a framework for interactive summaries of the narrative.
  Open Mobile Alliance. Wireless application protocol. http://www.openmobilealliance.org/tech/affiliates/wap/wapindex.html#wap20, 2001.
The WAP Web site, where the specs are available.
  Virgilio Almeida, Azer Bestavros, Mark Crovella, and Adriana de Oliveira. Characterizing reference locality in the www. In Proceedings of PDIS'96: The IEEE Conference on Parallel and Distributed Information Systems, 1996.
  Virgilio A.F. Almeida, Wagner Meira Jr., Victor F. Ribeiro, and Nivio Ziviani. Efficiency analysis of brokers in the electronic marketplace. In Proceedings of the Eighth International World-Wide Web Conference, 1999.
In this paper we analyze the behavior of e-commerce users based on actual logs from two large non-English e-brokers. We start by presenting a quantitative study of the behavior of e-brokers and discuss the influence of regional and cultural issues on them. We then discuss a model that quantifies the efficiency of the results provided by brokers in the electronic marketplace. This model is a function of factors such as server response time and regional factors. Our findings clearly indicate that e-commerce is strongly tied to local language, national customs and regulations, currency conversion and logistics, and Internet infrastructure. We found that the behavior of customers of online bookstores is strongly affected by brand and regional factors. Music CD shoppers show a different behavior that might stem from the fact that music is universal and not so language dependent.
  Altavista incorporated. http://www.altavista.com.
  Amazon inc. http://www.amazon.com.
  Jose-Luis Ambite and Craig A. Knoblock. Reconciling distributed information sources. In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
  B. Amento, L. Terveen, and W. Hill. Does authority mean quality? Predicting expert quality ratings of web documents. In Proceedings of the Twenty-Third Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2000.
Evaluates different link-based ranking techniques.
  Einat Amitay, Nadav Har'El, Ron Sivan, and Aya Soffer. Web-a-where: geotagging web content. In SIGIR '04: Proceedings of the 27th annual international conference on Research and development in information retrieval, pages 273-280. ACM Press, 2004.
  E. Amoroso. Fundamentals of Computer Security Technology. Prentice Hall, Englewood Cliffs, NJ., 1994.
  H. Anan, X. Liu, K. Maly, M. Nelson, M. Zubair, J. C. French, E. Fox, and P. Shivakumar. Preservation and transition of NCSTRL using an OAI-based architecture. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002.
NCSTRL (Networked Computer Science Technical Reference Library) is a federation of digital libraries providing computer science materials. The architecture of the original NCSTRL was based largely on the Dienst software. It was implemented and maintained by the digital library group at Cornell University until September 2001. At that time, we had an immediate goal of preserving the existing NCSTRL collection and a long-term goal of providing a framework where participating organizations could continue to disseminate technical publications. Moreover, we wanted the new NCSTRL to be based on OAI (Open Archives Initiative) principles that provide a framework to facilitate the discovery of content in distributed archives. In this paper, we describe our experience in moving towards an OAI-based NCSTRL.
  Dan Ancona, Jim Frew, Greg Janée, and Dave Valentine. Accessing the Alexandria Digital Library from geographic information systems. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004.
We describe two experimental desktop library clients that offer improved access to geospatial data via the Alexandria Digital Library (ADL): ArcADL, an extension to ESRI's ArcView GIS, and vtADL, an extension to the Virtual Terrain Project's Enviro terrain visualization package. ArcADL provides a simplified user interface to ADL's powerful underlying distributed geospatial search technology. Both clients use the ADL Access Framework to access library data that is available in multiple formats and retrievable by multiple methods. Issues common to both clients and future scenarios are also considered.
  Kenneth M. Anderson, Aaron Andersen, Neet Wadhwani, and Laura M. Bartolo. Metis: Lightweight, flexible, and web-based workflow services for digital libraries. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003.
The Metis project is developing workflow technology designed for use in digital libraries by avoiding the assumptions made by traditional workflow systems. In particular, digital libraries have highly distributed sets of stakeholders who nevertheless must work together to perform shared activities. Hence, traditional assumptions that all members of a workflow belong to the same organization, work in the same fashion, or have access to similar computing platforms are invalid. The Metis approach makes use of event-based workflows to support the distributed nature of digital library workflow and employs techniques to make the resulting technology lightweight, flexible, and integrated with the Web. This paper describes the conceptual framework behind the Metis approach as well as a prototype which implements the framework. The prototype is evaluated based on its ability to model and execute a workflow drawn from a real-world digital library. After describing related work, the paper concludes with a discussion of future research opportunities in the area of digital library workflow and outlines how Metis is being deployed to a small set of digital libraries for additional evaluation.
  R. Anderson and M. Kuhn. Tamper resistance-a cautionary note. In Proceedings of the Second USENIX Workshop on Electronic Commerce, Berkeley, CA, USA, 1996. USENIX Assoc.
An increasing number of systems, from pay-TV to electronic purses, rely on the tamper resistance of smartcards and other security processors. We describe a number of attacks on such systems, some old, some new, and some that are simply little known outside the chip testing community. We conclude that trusting tamper resistance is problematic; smartcards are broken routinely, and even a device that was described by a government signals agency as `the most secure processor generally available' turns out to be vulnerable. Designers of secure systems should consider the consequences with care.
  R. Anderson, C. Manifavas, and C. Sutherland. Netcard - a practical electronic cash system. In Fourth Cambridge Workshop on Security Protocols, 1996.
  R.C. Angell, G.E. Freund, and P. Willett. Automatic spelling correction using a trigram similarity measure. Information Processing and Management, 19(4):255-261, 1983.
 
  ANSI/NISO. Information Retrieval: Application Service Definition and Protocol Specification, April 1995. Available at http://lcweb.loc.gov/z3950/agency/document.html.
 
  Vinod Anupam, Alain Mayer, Kobbi Nissim, Benny Pinkas, and Michael K. Reiter. On the security of pay-per-click and other web advertising schemes. In Proceedings of the Eighth International World-Wide Web Conference, 1999.
We present a hit inflation attack on pay-per-click Web advertising schemes. Our attack is virtually impossible for the program provider to detect conclusively, regardless of whether the provider is a third-party `ad network' or the target of the click itself. If practiced widely, this attack could accelerate a move away from pay-per-click programs, and toward programs in which referrers are paid only if the referred user subsequently makes a purchase (pay-per-sale) or engages in other substantial activity at the target site (pay-per-lead). We also briefly discuss the lack of auditability inherent in these schemes.
  Kyoichi Arai, Teruo Yokoyama, and Yutaka Matsushita. A window system with leafing through mode: Bookwindow. In Proceedings of the Conference on Human Factors in Computing Systems CHI'92, 1992.
 
  Avi Arampatzis, Marc van Kreveld, Iris Reinbacher, Paul Clough, Hideo Joho, Mark Sanderson, Christopher B. Jones, Subodh Vaid, Marc Benkert, and Alexander Wolff. Web-based delineation of imprecise regions. In Proceedings of the Workshop on Geographic Information Retrieval, 2004.
  Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, and Sriram Raghavan. Searching the web. ACM Transactions on Internet Technology, 2001. Submitted for publication. Available at http://dbpubs.stanford.edu/pub/2000-37.
We offer an overview of current Web search engine design. After introducing a generic search engine architecture, we examine each engine component in turn. We cover crawling, local Web page storage, indexing, and the use of link analysis for boosting search performance. The most common design and implementation techniques for each of these components are presented. We draw for this presentation from the literature, and from our own experimental search engine testbed. Emphasis is on introducing the fundamental concepts, and the results of several performance analyses we conducted to compare different designs.
  William Y. Arms. Key concepts in the architecture of the digital library. D-Lib Magazine, Jul 1995.
Format: HTML Document(18K + pictures). Audience: computer scientists, digital library researchers. References: 1. Links: 3. Relevance: Medium-low. Abstract: Outlines 8 principles that are important to DLs, a combination of social/economic issues (avoid using words like ``copy'' and ``publish'') and technical ones (basically a sales pitch for the Kahn/Wilensky model of handles, maintenance, and access control.)
  R. Armstrong, D. Freitag, T. Joachims, and T. Mitchell. Webwatcher: A learning apprentice for the world wide web. In AAAI Spring Symposium on Information Gathering, 1995.
We describe an information seeking assistant for the world wide web. This agent, called WebWatcher, interactively helps users locate desired information by employing learned knowledge about which hyperlinks are likely to lead to the target information.
  Robert Armstrong, Dayne Freitag, Thorsten Joachims, and Tom Mitchell. Webwatcher: A learning apprentice for the world wide web. In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
  Kenneth Arnold. The body in the virtual library: Rethinking scholarly communication. In JEP.
Format: HTML Document (41K). Audience: Scholars, publishers (esp. university press), librarians. References: 10. Links: 1. Relevance: Low-Medium. Abstract: Discusses the future of university presses, in pretty grim terms. Suggests that they lack the capital, staff, and quick reaction time to survive in an electronic world. Considers the Mellon report on scholarly communication (which suggests universities get copyrights on books their faculty produce) unreasonable. Thinks that relying on commercial network providers (esp. cable, telecom) would be disastrous. Advocates a non-profit distribution network for scholarly publication.

  Kenneth Arnold. The electronic librarian is a verb/the electronic library is not a sentence. In JEP, 1994.
Format: HTML Document (49K) . Audience: Librarians, policy makers. References: 10. Links: 1. Relevance: low. Abstract: A vision of the networked library. Sees the real value of librarians as creating attention structures which anticipate the way clients search.
  Dennis S. Arnon. Scrimshaw: a language for document queries and transformations. Electronic Publishing: Origination, Dissemination and Design, 6(4):361-372, December 1993.
  J. Ashley, M. Flickner, J. Hafner, D. Lee, W. Niblack, and D. Petkovic. The query by image content (QBIC) system. In Proceedings of the International Conference on Management of Data (SIGMOD). ACM Press, 1995.
  N. Asokan, P.A. Janson, M. Steiner, and M. Waidner. The state of the art in electronic payment systems. Computer, 30(9):28-35, September 1997.
The exchange of goods conducted face-to-face between two parties dates back to before the beginning of recorded history. Traditional means of payment have always had security problems, but now electronic payments retain the same drawbacks and add some risks. Unlike paper, digital documents can be copied perfectly and arbitrarily often, digital signatures can be produced by anybody who knows the secret cryptographic key, and a buyer's name can be associated with every payment, eliminating the anonymity of cash. Without new security measures, widespread electronic commerce is not viable. On the other hand, properly designed electronic payment systems can actually provide better security than traditional means of payments, in addition to flexibility. This article provides an overview of electronic payment systems, focusing on issues related to security.
  Active Server Pages technology. http://msdn.microsoft.com/workshop/server/asp/aspfeat.asp.
  R. Atkinson, A. Demers, C. Hauser, C. Jacobi, P. Kessler, and M. Weiser. Experiences creating a portable Cedar. SIGPLAN Notices, 24(7):322-8, 1989.
The authors have recently re-implemented the Cedar language to make it portable across many different architectures. The strategy was, first, to use machine-dependent C code as an intermediate language, second, to create a language-independent layer known as the Portable Common Runtime, and third, to write a relatively large amount of Cedar-specific runtime code in a subset of Cedar itself. The paper presents a brief description of the Cedar language, the portability strategy for the compiler and runtime, the manner of making connections to other languages and the Unix operating system, and some performance measures of the Portable Cedar.
  Neal Audenaert, Richard Furuta, Eduardo Urbina, Jie Deng, Carlos Monroy, Rosy Sáenz, and Doris Careaga. Integrating collections at the cervantes project. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005.
Unlike many efforts that focus on supporting scholarly research by developing large-scale, general resources for a wide range of audiences, we at the Cervantes Project have chosen to focus more narrowly on developing resources in support of ongoing research about the life and works of a single author, Miguel de Cervantes Saavedra (1547-1616). This has led to a group of hypertextual archives, tightly integrated around the narrative and thematic structure of Don Quixote. This project is typical of many humanities research efforts and we discuss how our experiences inform the broader challenge of developing resources to support humanities research.
  Cyrus Azarbod and William Perrizo. Building concept hierarchies for schema integration in hddbs using incremental concept formation. In B. Bhargava, T. Finin, and Y. Yesha, editors, CIKM 93. Proceedings of the Second International Conference on Information and Knowledge Management, pages 732-734, Washington, D.C., November 1993. ACM.
  Sulin Ba, Aimo Hinkkanen, and Andre B. Whinston. Digital library as a foundation for decision support systems. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994.
Format: HTML Document (43K). Audience: Semi-technical, business slant, funding proposal. References: 14. Links: 1. Relevance: Low. Abstract: Sees a DL as an enterprise-wide collection of *executable* documents. SGML and Mathematica suggested as integration tools. Search for data representation which will allow automatic combination of separate documents to solve problems.
  D. Bachiochi, M. Berstene, E. Chouinard, N. Conlan, M. Danchak, T. Furey, C. Neligon, and D. Way. Usability studies and designing navigational aids for the world wide web. In Proceedings of the Sixth International World-Wide Web Conference, 1997.
 
  B. R. Badrinath. Distributed computing in mobile environments. Computers & Graphics, 20(5):615-17, 1996.
Rapid progress in hardware has led to the availability of portable personal computers ranging from laptops to hand-held computers (PDAs and Internet terminals). The presence of wireless connectivity gives these hand-held units the capability of accessing information anywhere, at any time. These mobile units can be considered to be part of a worldwide distributed information system. Distributed computing in mobile environments faces new challenges as more and more mobile hosts become an integral part of a distributed system. Problems in distributed computing in mobile environments are due to: (1) mobility, (2) wireless and (3) resource constraints at the mobile host. In this paper, we discuss the impact of these factors and research issues that need to be addressed in mobile distributed systems.
  Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley-Longman, May 1999.
The chapters of the book are: Introduction; Modeling; Retrieval Evaluation; Query Languages (with Gonzalo Navarro); Query Operations; Text and Multimedia Languages and Properties; Text Operations (with Nivio Ziviani); Indexing and Searching (with Gonzalo Navarro); Parallel and Distributed IR (by Eric Brown); User Interfaces and Visualization (by Marti Hearst); Multimedia IR: Models and Languages (by Elisa Bertino, Barbara Catania and Elena Ferrari); Multimedia IR: Indexing and Searching (by Christos Faloutsos); Searching the Web; Libraries and Bibliographic Systems (by Edie Rasmussen); Digital Libraries (by Edward Fox and Ohm Sornil); Appendix: Porter's Algorithm; Glossary; References (more than 800); Index.

More information can be found at http://www.sims.berkeley.edu/~hearst/irbook

  David Bainbridge, Craig G. Nevill-Manning, Ian H. Witten, Lloyd A. Smith, and Rodger J. McNab. Towards a digital library of popular music. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999.
Digital libraries of music have the potential to capture popular imagination in ways that more scholarly libraries cannot. We are working towards a comprehensive digital library of musical material, including popular music. We have developed new ways of collecting musical material, accessing it through searching and browsing, and presenting the results to the user. We work with different representations of music: facsimile images of scores, the internal representation of a music editing program, page images typeset by a music editor, MIDI files, audio files representing sung user input, and textual metadata such as title, composer and arranger, and lyrics. This paper describes a comprehensive suite of tools that we have built for this project. These tools gather musical material, convert between many of these representations, allow searching based on combined musical and textual criteria, and help present the results of searching and browsing. Although we do not yet have a single fully-blown digital music library, we have built several exploratory prototype collections of music, some of them very large (100,000 tunes), and critical components of the system have been evaluated.
  David Bainbridge, John Thompson, and Ian H. Witten. Assembling and enriching digital library collections. In Proceedings of the Third ACM/IEEE-CS Joint Conference on Digital Libraries, 2003.
People who create digital libraries need to gather together the raw material, add metadata as necessary, and design and build new collections. This paper sets out the requirements for these tasks and describes a new tool that supports them interactively, making it easy for users to create their own collections from electronic files of all types. The process involves selecting documents for inclusion, coming up with a suitable metadata set, assigning metadata to each document or group of documents, designing the form of the collection in terms of document formats, searchable indexes, and browsing facilities, building the necessary indexes and data structures, and putting the collection in place for others to use. All these tasks are supported within a modern point-and-click interaction paradigm. Although the tool is specific to the Greenstone digital library software, the underlying ideas should prove useful in more general contexts.
  M. Baker. Changing communication environments in MosquitoNet. In Proceedings of the IEEE Workshop on Mobile Computing Systems and Applications, Dec 1994.
  M. Baker, X. Zhao, S. Cheshire, and J. Stone. Supporting mobility in MosquitoNet. In Proceedings of the 1996 USENIX Conference, Jan 1996.
  Scott Baker and John H. Hartman. The Gecko NFS Web proxy. In Proceedings of the Eighth International World-Wide Web Conference, 1999.
The World-Wide Web provides remote access to pages using its own naming scheme (URLs), transfer protocol (HTTP), and cache algorithms. Not only does using these special-purpose mechanisms have performance implications, but they make it impossible for standard Unix applications to access the Web. Gecko is a system that provides access to the Web via the NFS protocol. URLs are mapped to Unix file names, providing unmodified applications access to Web pages; pages are transferred from the Gecko server to the clients using NFS instead of HTTP, significantly improving performance; and NFS's cache consistency mechanism ensures that all clients have the same version of a page. Applications access pages as they would Unix files. A client-side proxy translates HTTP requests into file accesses, allowing existing Web applications to use Gecko. Experiments performed on our prototype show that Gecko is able to provide this additional functionality at a performance level that exceeds that of HTTP.
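As a rough illustration of the URL-to-file-name mapping this abstract describes, the Python sketch below maps a URL onto a path under a mount point. The mount point /gecko, the index.html default for directory URLs, and the function name url_to_path are assumptions made for this example; the abstract does not specify Gecko's actual naming rules.

```python
from pathlib import PurePosixPath
from urllib.parse import quote, urlsplit


def url_to_path(url, mount="/gecko"):
    """Map a URL to a Unix-style file name (hypothetical scheme, not Gecko's own)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()          # host names are case-insensitive
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"             # assumed default document for directories
    # Escape each segment so it is a single safe file-name component;
    # "." and ".." are dropped so the result stays under the mount point.
    segments = [quote(s, safe="") for s in path.split("/") if s not in ("", ".", "..")]
    # The query string and fragment are ignored in this sketch.
    return str(PurePosixPath(mount, host, *segments))


print(url_to_path("http://www.example.com/docs/a.html"))
```

With a mapping of this kind, an unmodified tool such as cat or grep could open the resulting path as an ordinary file once the mount point is served over NFS.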
  Scott M. Baker and Bongki Moon. Distributed cooperative web servers. In Proceedings of the Eighth International World-Wide Web Conference, 1999.
Traditional techniques for a distributed web server design rely on manipulation of central resources, such as routers or DNS services, to distribute requests designated for a single IP address to multiple web servers. The goal of the distributed cooperative Web server (DCWS) system development is to explore application-level techniques for distributing web content. We achieve this by dynamically manipulating the hyperlinks stored within the web documents themselves. The DCWS system effectively eliminates the bottleneck of centralized resources, while balancing the load among distributed web servers. DCWS servers may be located in different networks, or even different continents and still balance load effectively. DCWS system design is fully compatible with existing HTTP protocol semantics and existing web client software products.
  M. Balabanovic and Y. Shoham. Learning information retrieval agents: Experiments with automated web browsing. In AAAI spring symposium on Information Gathering, 1995.
The current exponential growth of the Internet precipitates a need for new tools to help people cope with the volume of information. To complement recent work on creating searchable indexes of the World-Wide Web and systems for filtering incoming e-mail and Usenet news articles, we describe a system which helps users keep abreast of new and interesting information. Every day it presents a selection of interesting web pages. The user evaluates each page, and given this feedback the system adapts and attempts to produce better pages the following day. We present some early results from an AI programming class to whom this was set as a project, and then describe our current implementation. Over the course of 24 days the output of our system was compared to both randomly-selected and human-selected pages. It consistently performed better than the random pages, and was better than the human-selected pages half of the time.
  M. Balabanovic and Y. Shoham. Fab: content-based collaborative recommendation. Communications of the ACM, 40(3):66-72, March 1997.
Online readers are in need of tools to help them cope with the mass of content that is available on the World Wide Web. In traditional media, readers are provided assistance in making selections. This includes both implicit assistance in the form of editorial oversight and explicit assistance in the form of recommendation services such as movie reviews and restaurant guides. The electronic medium offers new opportunities to create recommendation services, ones that adapt over time to track users' evolving interests. Fab is such a recommendation system for the Web, and has been operational in several versions since December 1994. By combining both collaborative and content-based filtering systems, Fab may eliminate many of the weaknesses found in each approach.
  M. Balabanovic, Y. Shoham, and Y. Yun. An adaptive agent for automated web browsing. Journal of Visual Communication and Image Representation, 6(4), December 1995.
 
  Marko Balabanovic. An adaptive web page recommendation service. In Proceedings of the First International Conference on Autonomous Agents, pages 378-385, February 1997.
 
  Marko Balabanovic. Exploring versus exploiting when learning user models for text recommendation. User Modeling and User-Adapted Interaction (to appear), 8(1), 1998.
  Marko Balabanovic. An interface for learning multi-topic user profiles from implicit feedback. Technical Report SIDL-WP-1998-0089, Stanford University, 1998.
 
  Marko Balabanovic. The ``slider'' interface. IBM interVisions, 11, February 1998.
  Marko Balabanovic, Lonny L. Chu, and Gregory J. Wolff. Storytelling with digital photographs. In CHI '00: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 564-571, New York, NY, USA, 2000. ACM Press.
  Marko Balabanovic and Yoav Shoham. Learning information retrieval agents: Experiments with automated web browsing. In Proceedings of the AAAI Spring Symposium on Information Gathering from Heterogeneous, Distributed Resources, 1995.
Format: Compressed PostScript
  Marko Balabanovic and Yoav Shoham. Combining content-based and collaborative recommendation. Communications of the ACM, 40(3), March 1997.
 
  Marko Balabanovic, Yoav Shoham, and Yeogirl Yun. An adaptive agent for automated web browsing. Journal of Visual Communication and Image Representation, 6(4), December 1995.
You give the agent a profile. It searches the Web for items of interest and reports back. You then give feedback.
  Michelle Baldonado. Searching, browsing, and metasearching with SenseMaker. Web Techniques Magazine, May 1997.
 
  Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke. Metadata for digital libraries: Architecture and design rationale. Technical Report SIDL-WP-1997-0055; 1997-26, Stanford University, 1997. Accessible at http://dbpubs.stanford.edu/pub/1997-26.
In a distributed, heterogeneous, proxy-based digital library, autonomous services and collections are accessed indirectly via proxies. To facilitate metadata compatibility and interoperability in such a digital library, we have designed a metadata architecture that includes four basic component classes: attribute model proxies, attribute model translators, metadata facilities for search proxies, and metadata repositories. Attribute model proxies elevate both attribute sets and the attributes they define to first-class objects. They also allow relationships among attributes to be captured. Attribute model translators map attributes and attribute values from one attribute model to another (where possible). Metadata facilities for search proxies provide structured descriptions both of the collections to which the search proxies provide access and of the search capabilities of the proxies. Finally, metadata repositories accumulate selected metadata from local instances of the other three component classes in order to facilitate global metadata queries and local metadata caching. In this paper, we outline further the roles of these component classes, discuss our design rationale, and analyze related work.
  Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke. Metadata for digital libraries: Architecture and design rationale. In Proceedings of the Second ACM International Conference on Digital Libraries, pages 47-56, 1997. At http://dbpubs.stanford.edu/pub/1997-26.
In a distributed, heterogeneous, proxy-based digital library, autonomous services and collections are accessed indirectly via proxies. To facilitate metadata compatibility and interoperability in such a digital library, we have designed a metadata architecture that includes four basic component classes: attribute model proxies, attribute model translators, metadata facilities for search proxies, and metadata repositories. Attribute model proxies elevate both attribute sets and the attributes they define to first-class objects. They also allow relationships among attributes to be captured. Attribute model translators map attributes and attribute values from one attribute model to another (where possible). Metadata facilities for search proxies provide structured descriptions both of the collections to which the search proxies provide access and of the search capabilities of the proxies. Finally, metadata repositories accumulate selected metadata from local instances of the other three component classes in order to facilitate global metadata queries and local metadata caching. In this paper, we outline further the roles of these component classes, discuss our design rationale, and analyze related work.
  Michelle Baldonado, Chen-Chuan K. Chang, Luis Gravano, and Andreas Paepcke. The Stanford Digital Library metadata architecture. International Journal of Digital Libraries, 1(2), February 1997. See also http://dbpubs.stanford.edu/pub/1997-56.
 
  Michelle Baldonado, Steve Cousins, B. Lee, and Andreas Paepcke. Notable: An annotation system for networked handheld devices. In Proceedings of the Conference on Human Factors in Computing Systems CHI'99, pages 210-211, 1999.
 
  Michelle Baldonado, Seth Katz, Andreas Paepcke, Chen-Chuan K. Chang, Hector Garcia-Molina, and Terry Winograd. An extensible constructor tool for the rapid, interactive design of query synthesizers. In Proceedings of the Third ACM International Conference on Digital Libraries, 1998. Accessible at http://dbpubs.stanford.edu/pub/1998-48.
We describe an extensible constructor tool that helps information experts (e.g., librarians) create specialized query synthesizers for heterogeneous digital-library environments. A query synthesizer provides a graphical user interface in which a digital-library patron can specify a high-level, fielded, multi-source query. Furthermore, a query synthesizer interacts with a query translator and an attribute translator to transform high-level queries into sets of source-specific queries. We discuss how the constructor can facilitate discovery of available attributes (e.g., title), collation of schemas from different sources, selection of input widgets for a synthesizer (e.g., a text box or a drop-down list widget to support input of controlled vocabulary), and other design aspects. We also describe a prototype constructor we implemented, based on the Stanford InfoBus and metadata architecture.
  Michelle Q Wang Baldonado and Steve B. Cousins. Addressing heterogeneity in the networked information environment. New Review of Information Networking, 2:83-102, 1996.
Several ongoing Stanford University Digital Library projects address the issue of heterogeneity in networked information environments. A networked information environment has the following components: users, information repositories, information services, and payment mechanisms. This paper describes three of the heterogeneity-focused Stanford projects: InfoBus, REACH, and DLITE. The InfoBus project is at the protocol level, while the REACH and DLITE projects are both at the conceptual model level. The InfoBus project provides the infrastructure necessary for accessing heterogeneous services and utilizing heterogeneous payment mechanisms. The REACH project sets forth a uniform conceptual model for finding information in networked information repositories. The DLITE project presents a general task-based strategy for building user interfaces to heterogeneous networked information services.
  Michelle Q Wang Baldonado and Terry Winograd. Techniques and tools for making sense out of heterogeneous search service results. Technical Report SIDL-WP-1995-0019; 1995-59, Stanford University, 1995.
 
  Michelle Q Wang Baldonado and Terry Winograd. A user interaction model for browsing based on category-level operations. Technical Report SIDL-WP-1996-0029; 1996-75, Stanford University, 1996.
We propose a user interaction model for browsing based on iterative category-level operations. The motivation comes from two observations: 1) people naturally think in terms of categories, and 2) in browsing, the types of categories that are salient to users change as they browse. We define a set of category-level operations that lets users iteratively view and find results in terms of these changing category types. We also show that we can express some standard IR operations as iteratively applied sequences of a fundamental category-level operation (thus unifying them). Finally, we describe SenseMaker, a prototype interface for browsing heterogeneous sources.
  Michelle Q Wang Baldonado and Terry Winograd. SenseMaker: An information-exploration interface supporting the contextual evolution of a user's interests. In Proceedings of the Conference on Human Factors in Computing Systems CHI'97, pages 11-18, Atlanta, Ga., March 1997. ACM Press, New York.
  Sujata Banerjee and Vibhu O. Mittal. On the use of linguistic ontologies for accessing and indexing distributed digital libraries. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994.
Format: HTML Document. Audience: Non-technical, on-line searchers. References: 16. Links: 1. Relevance: Low. Abstract: Addresses problem of finding correct keywords to search for by using WordNet. If a search doesn't turn up the hits needed, it modifies query by using synonyms, generalizing, or replacing with a set of more specific words. Searcher is asked to approve modified queries, which are then re-sent to content providers.
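The query-modification loop summarized in this annotation can be sketched roughly as follows. This is only an illustration: the three small dictionaries are hypothetical stand-ins for WordNet lookups, and the function names, the "OR" syntax for a set of more specific words, and the approval callback are all assumptions rather than details taken from the paper.

```python
# Hypothetical toy thesaurus standing in for WordNet relations.
SYNONYMS = {"car": ["automobile", "auto"]}
HYPERNYMS = {"car": ["vehicle"]}             # more general terms
HYPONYMS = {"car": ["sedan", "hatchback"]}   # more specific terms


def modified_queries(terms):
    """Return (strategy, query) candidates, each changing exactly one term."""
    candidates = []
    for i, term in enumerate(terms):
        before, after = terms[:i], terms[i + 1:]
        for strategy, table in (("synonym", SYNONYMS), ("generalize", HYPERNYMS)):
            for alt in table.get(term, []):
                candidates.append((strategy, before + [alt] + after))
        specific = HYPONYMS.get(term)
        if specific:
            # Replace the term with the whole set of more specific words.
            group = "(" + " OR ".join(specific) + ")"
            candidates.append(("specialize", before + [group] + after))
    return candidates


def approved_queries(terms, approve):
    """Keep only the modified queries the searcher approves for re-sending."""
    return [query for strategy, query in modified_queries(terms) if approve(strategy, query)]


for strategy, query in modified_queries(["car", "repair"]):
    print(strategy, query)
```

In an interactive system the approve callback would prompt the searcher; here it is any function that takes the strategy name and the candidate query and returns True or False.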
  Gaurav Banga, Fred Douglis, and Michael Rabinovich. Optimistic deltas for WWW latency reduction. In Proceedings of USENIX Technical Conference, pages 289-303, 1997.
  Ziv Bar-Yossef, Alexander Berg, Steve Chien, Jittat Fakcharoenphol, and Dror Weitz. Approximating aggregate queries about web pages via random walks. In Proceedings of the Twenty-sixth International Conference on Very Large Databases, 2000.
  Ziv Bar-Yossef, Andrei Z. Broder, Ravi Kumar, and Andrew Tomkins. Sic transit gloria telae: towards an understanding of the web's decay. In WWW '04: Proceedings of the 13th international conference on World Wide Web, pages 328-337, New York, NY, USA, 2004. ACM Press.
The rapid growth of the web has been noted and tracked extensively. Recent studies have however documented the dual phenomenon: web pages have small half lives, and thus the web exhibits rapid death as well. Consequently, page creators are faced with an increasingly burdensome task of keeping links up-to-date, and many are falling behind. In addition to just individual pages, collections of pages or even entire neighborhoods of the web exhibit significant decay, rendering them less effective as information resources. Such neighborhoods are identified only by frustrated searchers, seeking a way out of these stale neighborhoods, back to more up-to-date sections of the web; measuring the decay of a page purely on the basis of dead links on the page is too naive to reflect this frustration. In this paper we formalize a strong notion of a decay measure and present algorithms for computing it efficiently. We explore this measure by presenting a number of validations, and use it to identify interesting artifacts on today's web. We then describe a number of applications of such a measure to search engines, web page maintainers, ontologists, and individual users.
  Albert-Laszlo Barabasi and Reka Albert. Emergence of scaling in random networks. Science, 286(5439):509-512, October 1999.
  David Bargeron, Anoop Gupta, Jonathan Grudin, and Elizabeth Sanocki. Annotations for streaming video on the web: System design and usage studies. In Proceedings of the Eighth International World-Wide Web Conference, 1999.
Streaming video on the World Wide Web is being widely deployed, and workplace training and distance education are key applications. The ability to annotate video on the Web can provide significant added value in these and other areas. Written and spoken annotations can provide `in context' personal notes and can enable asynchronous collaboration among groups of users. With annotations, users are no longer limited to viewing content passively on the Web, but are free to add and share commentary and links, thus transforming the Web into an interactive medium. We discuss design considerations in constructing a collaborative video annotation system, and we introduce our prototype, called MRAS. We present preliminary data on the use of Web-based annotations for personal note-taking and for sharing notes in a distance education scenario. Users showed a strong preference for MRAS over pen-and-paper for taking notes, despite taking longer to do so. They also indicated that they would make more comments and questions with MRAS than in a `live' situation, and that sharing added substantial value.
  Bruce R. Barkstrom, Melinda Finch, Michelle Ferebee, and Calvin Mackey. Adapting digital libraries to continual evolution. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002.
In this paper, we describe five investment streams (data storage infrastructure, knowledge management, data production control, data transport and security, and personnel skill mix) that need to be balanced against short-term operating demands in order to maximize the probability of long-term viability of a digital library. Because of the rapid pace of information technology change, a digital library cannot be a static institution. Rather, it has to become a flexible organization adapted to continuous evolution of its infrastructure.
  Kobus Barnard, Pinar Duygulu, David Forsyth, Nando de Freitas, David M. Blei, and Michael I. Jordan. Matching words and pictures. J. Mach. Learn. Res., 3:1107-1135, 2003.
We present a new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text. Learning the joint distribution of image regions and words has many applications. We consider in detail predicting words associated with whole images (auto-annotation) and corresponding to particular image regions (region naming). Auto-annotation might help organize and access large collections of images. Region naming is a model of object recognition as a process of translating image regions to words, much as one might translate from one language to another. Learning the relationships between image regions and semantic correlates (words) is an interesting example of multi-modal data mining, particularly because it is typically hard to apply data mining techniques to collections of images. We develop a number of models for the joint distribution of image regions and words, including several which explicitly learn the correspondence between regions and words. We study multi-modal and correspondence extensions to Hofmann's hierarchical clustering/aspect model, a translation model adapted from statistical machine translation (Brown et al.), and a multi-modal extension to mixture of latent Dirichlet allocation (MoM-LDA). All models are assessed using a large collection of annotated images of real scenes. We study in depth the difficult problem of measuring performance. For the annotation task, we look at prediction performance on held out data. We present three alternative measures, oriented toward different types of task. Measuring the performance of correspondence methods is harder, because one must determine whether a word has been placed on the right region of an image. We can use annotation performance as a proxy measure, but accurate measurement requires hand labeled data, and thus must occur on a smaller scale. We show results using both an annotation proxy, and manually labeled data.
  Kobus Barnard and David A. Forsyth. Learning the semantics of words and pictures. In Proceedings of the IEEE International Conference on Computer Vision, July 2001.
  Rob Barrett, Paul P. Maglio, and Daniel C. Kellem. How to personalize the web. In Proceedings of the Conference on Human Factors in Computing Systems CHI'97, 1997.
 
  Laura M. Bartolo, Cathy S. Lowe, Adam C. Powell IV, Donald R. Sadoway, Jorges Vieyra, and Kyle Stemen. Use of MatML with software applications for e-learning. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004.
This pilot project investigates facilitating the development of the Semantic Web for e-learning through a practical example, using Materials Property Data Markup Language (MatML) to provide materials property data to a web-based application program. Property data for 100 materials is marked up with MatML and used as an input format for an application program. Students use the program to generate graphs showing selected properties for different materials. Selected graphs are submitted to the Materials Digital Library (MatDL) so that successive classes may be informed by earlier work to encourage new discoveries.
  C. Batini, M. Lenzerini, and S. Navathe. A comparative analysis of methodologies for database schema integration. ACM Computing Surveys, 18(4), 1986.
 
  Patrick Baudisch and Ruth Rosenholtz. Halo: a technique for visualizing off-screen objects. In CHI '03: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 481-488, New York, NY, USA, 2003. ACM Press.
  E. Bauer, D. Koller, and Y. Singer. Update rules for parameter estimation in Bayesian networks. In Proceedings of the 13th Annual Conference on Uncertainty in AI (UAI), 1997.
 
  M. Bearman. ODP-Trader. Open Distributed Processing, 2:19-33, 1994.
  Herb Becker. The role of the Library of Congress in the National Digital Library. In Proceedings of DL'96, 1996.
Format: Not yet online.
  Benjamin B. Bederson. PhotoMesa: a zoomable image browser using quantum treemaps and bubblemaps. In Proceedings of the 14th annual ACM symposium on User interface software and technology, pages 71-80. ACM Press, 2001.
  Benjamin B. Bederson, Ben Shneiderman, and Martin Wattenberg. Ordered and quantum treemaps: Making effective use of 2D space to display hierarchies. ACM Transactions on Graphics, 21(4):833-854, 2002.
  Doug Beeferman, Adam Berger, and John D. Lafferty. Statistical models for text segmentation. Machine Learning, 34(1-3):177-210, 1999.
  Alireza Behreman. Generic electronic payment services. In The Second USENIX Workshop on Electronic Commerce Proceedings, 1996.
  Alireza Behreman and Rajkumar Narayanaswamy. Payment method negotiation service. In The Second USENIX Workshop on Electronic Commerce Proceedings, 1996.
  M. Beigl and R. Rudisch. System support for mobile computing. Computers & Graphics, 20(5):619-625, 1996.
Today a mobile user wants to connect his portable computer: remotely to the central database at home, locally to the printer on the spot and globally to the world-wide-web. To achieve this, different connection lines are available: wireless networks for connecting out in the fields, ISDN or analogue telephone lines when residing in a hotel, Ethernet access at the customer's site. But this connectivity raises a lot of questions, about technical, security or accounting issues. This paper presents the architecture of an environment aiming to support mobile users and dealing with the given problems.
  N.J. Belkin and W. Bruce Croft. Information filtering and information retrieval: two sides of same coin? Communications of the ACM, 35(12):29-38, December 1992.
A comparison is made between information retrieval and information filtering. The authors determine that information filtering is a well defined process. By examining its foundations and comparing it to the foundations of the IR enterprise, the authors find there is very little difference between filtering and retrieval at an abstract level. They conclude that the two enterprises have the same goal; namely they are both concerned with getting information to people who need it. However, the authors emphasize that IR research has ignored some aspects of the general problem which both IR and information filtering address, and that these aspects are precisely those which are especially relevant to the specific contexts of filtering.
  Timothy C. Bell, Alistair Moffat, and Ian H. Witten. Compressing the digital library. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994.
Format: HTML Document (32K). Audience: Semi-technical, general computer scientists. References: 8. Links: 1. Relevance: Medium (but not mainstream DL). Abstract: Discusses the interaction of compression and indexing. Suggests a Huffman encoding applied to words & non-words. Inverted bitmap for indexing, enhanced with Golomb encoding. Compressed 266 Mb Wall Street Journal article database by 50% including creating the index. Queries were processed in less than .1 sec.
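The annotation mentions Golomb encoding for the index. As a hedged illustration only, the Python sketch below Golomb-codes the gaps between sorted document numbers in a postings list, a common way to compress inverted indexes. Whether the paper applies the code in exactly this way is an assumption, as are the function names, the use of bit strings instead of packed bytes, and the parameter value m.

```python
def golomb_encode(n, m):
    """Golomb code of a non-negative integer n with parameter m >= 1, as a bit string."""
    q, r = divmod(n, m)
    bits = "1" * q + "0"                 # quotient in unary, terminated by 0
    if m == 1:
        return bits                      # remainder is always 0
    b = (m - 1).bit_length()             # ceil(log2(m))
    cutoff = (1 << b) - m                # remainders below cutoff use b-1 bits
    if r < cutoff:
        return bits + format(r, "b").zfill(b - 1)
    return bits + format(r + cutoff, "b").zfill(b)


def encode_postings(doc_ids, m):
    """Encode a sorted list of positive document numbers as Golomb-coded gaps."""
    out, prev = [], 0
    for d in doc_ids:
        out.append(golomb_encode(d - prev - 1, m))   # gaps are >= 1, so store gap-1
        prev = d
    return "".join(out)


def decode_postings(bits, m):
    """Invert encode_postings, returning the list of document numbers."""
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    ids, prev, i = [], 0, 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":            # read the unary quotient
            q += 1
            i += 1
        i += 1                           # skip the terminating 0
        r = 0
        if m > 1:
            r = int(bits[i:i + b - 1], 2) if b > 1 else 0
            i += b - 1
            if r >= cutoff:              # long codeword: read one more bit
                r = ((r << 1) | int(bits[i])) - cutoff
                i += 1
        prev += q * m + r + 1
        ids.append(prev)
    return ids


postings = [3, 7, 8, 15]
encoded = encode_postings(postings, 3)
print(encoded, decode_postings(encoded, 3) == postings)
```

Small gaps get short codewords, so frequent terms, whose document numbers are densely packed, compress well when m is chosen close to the typical gap size.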

  M. Bellare, J.A. Garay, R. Hauser, A. Herzberg, H. Krawczyk, M. Steiner, G. Tsudik, and M. Waidner. iKP - a family of secure electronic payment protocols. In Proceedings of the First USENIX Workshop of Electronic Commerce, Berkeley, CA, USA, 1995. USENIX Assoc.
This paper proposes a family of protocols, iKP (i=1,2,3), for secure electronic payments over the Internet. The protocols implement credit card-based transactions between the customer and the merchant while using the existing financial network for clearing and authorization. The protocols can be extended to apply to other payment models, such as debit cards and electronic checks. They are based on public-key cryptography and can be implemented in either software or hardware. Individual protocols differ in key management complexity and degree of security. It is intended that their deployment be gradual and incremental. The iKP protocols are presented herein with the intention to serve as a starting point for eventual standards on secure electronic payment.
  Jezekiel Ben-Arie, Purvin Pandit, and ShyamSundar Rajaram. Design of a digital library for human movement. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001.
This paper is focused on a central aspect in the design of our planned digital library for human movement, i.e. on the aspect of representation and recognition of human activity from video data. The method of representation is important since it has a major impact on the design of all the other building blocks of our system such as the user interface/query block or the activity recognition/storage block. In this paper we evaluate a representation method for human movement that is based on sequences of angular poses and angular velocities of the human skeletal joints, for storage and retrieval of human actions in video databases. The choice of a representation method plays an important role in the database structure, search methods, storage efficiency etc. For this representation, we develop a novel approach for complex human activity recognition by employing multidimensional indexing combined with temporal or sequential correlation. This scheme is then evaluated with respect to its efficiency in storage and retrieval. For the indexing we use postures of humans in videos that are decomposed into a set of multidimensional tuples which represent the poses/velocities of human body parts such as arms, legs and torso. Three novel methods for human activity recognition are theoretically and experimentally compared. The methods require only a few sparsely sampled human postures. We also achieve speed invariant recognition of activities by eliminating the time factor and replacing it with sequence information. The indexing approach also provides robust recognition and an efficient storage/retrieval of all the activities in a small set of hash tables.
  Israel Ben-Shaul, Michael Herscovici, Michal Jacovi, Yoelle S. Maarek, Dan Pelleg, Menachem Shtalhaim, Vladimir Soroka, and Sigalit Ur. Adding support for dynamic and focused search with Fetuccino. In Proceedings of the Eighth International World-Wide Web Conference, 1999.
This paper proposes two enhancements to existing search services over the Web. One enhancement is the addition of limited dynamic search around results provided by regular Web search services, in order to correct part of the discrepancy between the actual Web and its static image as stored in search repositories. The second enhancement is an experimental two-phase paradigm that allows the user to distinguish between a domain query and a focused query within the dynamically identified domain. We present Fetuccino, an extension of the Mapuccino system that implements these two enhancements. Fetuccino provides an enhanced user-interface for visualization of search results, including advanced graph layout, display of structural information and support for standards (such as XML). While Fetuccino has been implemented on top of existing search services, its features could easily be integrated into any search engine for better performance. A light version of Fetuccino is available on the Internet at http://www.ibm.com/java/fetuccino.
  Tamara L. Berg, Alexander C. Berg, Jaety Edwards, Michael Maire, Ryan White, Yee-Whye Teh, Erik Learned-Miller, and D.A. Forsyth. Names and faces in the news. In CVPR 2004: Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2004.
  Donna Bergmark. Collection synthesis. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002.
The invention of the hyperlink and the HTTP transmission protocol caused an amazing new structure to appear on the Internet - the World Wide Web. With the Web, there came spiders, robots, and Web crawlers, which go from one link to the next checking Web health, ferreting out information and resources, and imposing organization on the huge collection of information (and dross) residing on the net. This paper reports on the use of one such crawler to synthesize document collections on various topics in science, mathematics, engineering and technology. Such collections could be part of a digital library.
  Howard Besser. Mesl project description. In Proceedings of DL'96, 1996.
Format: Not yet online.
  Krishna Bharat and Andrei Broder. Mirror, mirror on the web: A study of host pairs with replicated content. In Proceedings of the Eighth International World-Wide Web Conference, 1999.
Two previous studies, one done at Stanford in 1997 based on data collected by the Google search engine, and one done at Digital in 1996 based on AltaVista data, revealed that almost a third of the Web consists of duplicate pages. Both studies identified mirroring, that is, the systematic replication of content over a pair of hosts, as the principal cause of duplication, but did not further investigate this phenomenon. The main aim of this paper is to present a clearer picture of mirroring on the Web. As input we used a set of 179 million URLs found during a Web crawl done in the summer of 1998. We looked at all hosts with more than 100 URLs in our input (about 238,000), and discovered that about 10% were mirrored to various degrees. This paper computes the prevalence of mirroring based on a mirroring classification scheme that we define. There are numerous reasons for mirroring: technical (e.g., to improve access time), commercial (e.g., different intermediaries offering the same products), cultural (e.g., same content in two languages), social (e.g., sharing of research data), and so forth. Although we have not done an exhaustive study of the causes of replication, we discuss and provide examples for several representative cases. Our technique for detecting mirrored hosts from large sets of collected URLs depends mostly on the syntactic analysis of URL strings, and requires retrieval and content analysis only for a small number of pages. We are able to detect both partial and total mirroring, and handle cases where the content is not byte-wise identical. Furthermore, our technique is computationally very efficient and does not assume that the initial set of URLs gathered from each host is comprehensive. Hence, this approach has practical uses beyond our study, and can be applied in other settings. For instance, for Web crawlers and caching proxies, detecting mirrors can be valuable to avoid redundant fetching, and knowledge of mirroring can be used to compensate for broken links.
  Krishna Bharat, Andrei Broder, Monika Henzinger, Puneet Kumar, and Suresh Venkatasubramanian. The connectivity server: Fast access to linkage information on the web. In Proceedings of the Seventh International World-Wide Web Conference, April 1998.
  B. Bhushan et al. Managing heterogeneous networks-integrator-based approach. In IFIP Transactions C (Communication Systems), 1993.
The authors discuss an object oriented approach to network management. Their goal is to briefly explain a real example of an integrated network management (INM) system. One of the major requirements when looking at information transfer between the managed network and the management system is to mask the heterogeneity of the underlying resources. As an example of the unification of heterogeneous networks, a software called the Integrator has been designed and implemented. The Integrator is a mechanism that provides an object oriented interface to the user (human or network management application programs) to offer a homogeneous view of a world (set of heterogeneous domains) through a model (depicting a formal information view). The Integrator uses two agents to communicate with underlying network elements: an SNMP agent accessing TCP/IP parameters for an Ethernet network through an SNMP agent, and an X.25 interface program doing the same for X.25 parameters through proprietary management software. The concepts of the Integrator have been applied in the EC project PEMMON.
  Timothy W. Bickmore and Bill N. Schilit. Digestor: Device-independent access to the world wide web. In Proceedings of the Sixth International World-Wide Web Conference, 1997.
 
  Eric Bier, Lance Good, Kris Popat, and Alan Newberger. A document corpus browser for in-depth reading. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004.
Software tools, including Web browsers, e-books, electronic document formats, search engines, and digital libraries are changing the way that people read, making it easier for them to find and view documents. However, while these tools provide significant help with short-term reading projects involving small numbers of documents, they fall short of supporting readers engaged in longer-term reading projects, in which a topic is to be understood in-depth by reading many documents. Such readers need to find and manage many documents and citations, remember what they have read, and prioritize what to read next. In this paper, we describe three integrated software tools that facilitate in-depth reading. A first tool extracts citation information from documents. A second finds on-line documents from their citations. The last is a document corpus browser that uses a zoomable user interface to show a corpus at multiple granularities while supporting reading tasks that take days, weeks, or longer. We describe these tools and the design principles that motivated them.
  Eric A. Bier and Adam Perer. Icon abacus and ghost icons. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005.
We present two techniques that make document collection visualizations more informative. Icon abacus uses the horizontal position of icon groups to communicate document attributes. Ghost icons show linked documents by adding temporary icons and by highlighting or dimming existing ones.
  William P. Birmingham. An agent-based architecture for digital libraries. D-Lib Magazine, July 1995.
Format: HTML Document().
  William P. Birmingham, Karen M. Drabenstott, Carolyn O. Frost, Amy J. Warner, and Katherine Willis. The university of michigan digital library: This is not your father's library. In Proceedings of the First Annual Conference on the Theory and Practice of Digital Libraries, 1994.
Format: HTML Document (36K). Audience: slightly technical, generalist comfortable with technology, funders. References: 13. Links: 1. Relevance: Medium-High. Abstract: Describes the UMichigan Digital Libraries proposal, including some detail about their agent architecture. User agents, Collection-interface agents, and mediators all play a role. Network resources are allocated on a market-based mechanism, and proposal mentions need to protect intellectual property & handle payment issues.
  William P. Birmingham, Edmund H. Durfee, Tracy Mullen, and Michael P. Wellman. The distributed agent architecture of the university of michigan digital library (extended abstract). In AAAI Spring Symposium on Information Gathering, 1995.
Format: Compressed PostScript().
  Ann Peterson Bishop. Working towards an understanding of digital library use: A report on the user research efforts of the nsf/arpa/nasa dli projects. D-Lib Magazine, October 1995.
Format: HTML Document().
  Ann Peterson Bishop. Making digital libraries go: Comparing use across genres. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999.
A new federal initiative called Information Technology for the Twenty-First Century (IT2) recognizes the need to bridge research across domains in order to bring computing benefits to society at large. One implication for digital library (DL) research is that we should start looking at projects that span the spectrum from basic computer science to the implementation of working systems and consider links among findings on information system use from a variety of arenas in life. In this paper, I integrate findings from my research on people's encounters with DLs in two different arenas: academia and low-income neighborhoods. The point is to see how concepts and conclusions related to use do, in fact, cross these arenas. The paper also aims to help bring results from studies of local community information practices into the realm of DLs, since community networking represents one particular genre and audience that has not yet received a great deal of attention from those engaged in DL research. Beginning with a discussion of DL use as an assemblage of infrastructure, norms, knowledge, and practice, the paper explores a number of insights gleaned from user studies associated with two separate research projects: 1) the recently completed University of Illinois Digital Libraries Initiative (DLI) project; and 2) the Community Networking Initiative (CNI) currently in progress under the auspices of the University of Illinois, the Urban League of Champaign County and Prairienet, the community network serving East Central Illinois. Insights about DL use discussed in this paper include: the way in which trivial barriers are magnified until they effectively cut off use on a large scale; the difficulties faced by outsiders whose information worlds are impoverished; the primacy of comfort and relevant content in encouraging use; and the importance of informal social networks for providing help related to system use.
  Barclay Blair and John Boyer. Xfdl: Creating electronic commerce transaction records using xml. In Proceedings of the Eighth International World-Wide Web Conference, 1999.
In the race to transform the World Wide Web from a medium for information presentation to a medium for information exchange, the development of practices for ensuring the security, auditability, and non-repudiation of transactions that are well established in the paper-based world has not kept pace in the digital world. Existing Internet technology provides no easy way to create a valid `digital receipt' that meets the requirements of both complex distributed networks and the business community. In addition, an improved articulation of digital signatures is needed. Extensible Forms Description Language (XFDL), developed by UWI.Com and Tim Bray, is an application of XML that allows organizations to move their paper-based forms systems to the Internet while maintaining the necessary attributes of paper-based transaction records. XFDL was designed for implementation in business-to-business electronic commerce and intra-organizational information transactions.
  Catherine Blake. Information synthesis: A new approach to explore secondary information in scientific literature. In Proceedings of the Fifth ACM/IEEE-CS Joint Conference on Digital Libraries, 2005.
Advances in both technology and publishing practices continue to increase the quantity of scientific literature that is available electronically. In this paper, we introduce the Information Synthesis process, a new approach that enables scientists to visualize, explore, and resolve contradictory findings that are inevitable when multiple empirical studies explore the same natural phenomena. Central to the Information Synthesis approach is a cyber-infrastructure that provides a scientist with both secondary information from an article and structured information resources. To demonstrate this approach, we have developed the Multi-User, Information Extraction for Information Synthesis (METIS) System. METIS is an interactive system that automates critical tasks within the Information Synthesis process. We provide two case-studies that demonstrate the utility of the Information Synthesis approach.
  J.A. Blakeley, W.J. McKenna, and G. Graefe. Experiences building the open oodb query optimizer. In Proceedings of the International Conference on Management of Data, 1993.
The authors report their experiences building the query optimizer for TI's Open OODB system. It is probably the first working object query optimizer to be based on a complete extensible optimization framework including logical algebra, execution algorithms, property enforcers, logical transformation rules, implementation rules, and selectivity and cost estimation. Their algebra incorporates a new materialize operator with its corresponding logical transformation and implementation rules that enable the optimization of path expressions. The Open OODB query optimizer was constructed using the Volcano Optimizer Generator, demonstrating that this second-generation optimizer generator enables rapid development of efficient and effective query optimizers for non-standard data models and systems.
  Ann Blandford, Suzette Keith, Iain Connell, and Helen Edwards. Analytical usability evaluation for digital libraries: a case study. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004.
There are two main kinds of approach to considering usability of any system: empirical and analytical. Empirical techniques involve testing systems with users, whereas analytical techniques involve usability personnel assessing systems using established theories and methods. We report here on a set of studies in which four different techniques were applied to various digital libraries, focusing on the strengths, limitations and scope of each approach. Two of the techniques, Heuristic Evaluation and Cognitive Walkthrough, were applied in text-book fashion, because there was no obvious way to contextualize them to the Digital Libraries (DL) domain. For the third, Claims Analysis, it was possible to develop a set of re-usable scenarios and personas that relate the approach specifically to DL development. The fourth technique, CASSM, relates explicitly to the DL domain by combining empirical data with an analytical approach. We have found that Heuristic Evaluation and Cognitive Walkthrough only address superficial aspects of interface design (but are good for that), whereas Claims Analysis and CASSM can help identify deeper conceptual difficulties (but demand greater skill of the analyst). However, none fit seamlessly within the fragmented function-oriented design practices that typify much digital library development, highlighting an important area for further work to support improved usability.
  Ann Blandford, Hanna Stelmaszewska, and Nick Bryan-Kinns. Use of multiple digital libraries: A case study. In Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries, 2001.
The aim of the work reported here was to better understand the usability issues raised when digital libraries are used in a natural setting. The method used was a protocol analysis of users working on a task of their own choosing to retrieve documents from publicly available digital libraries. Various classes of usability difficulties were found. Here, we focus on use in context - that is, usability concerns that arise from the fact that libraries are accessed in particular ways, under technically and organisationally imposed constraints, and that use of any particular resource is discretionary. The concepts from an Interaction Framework, which provides support for reasoning about patterns of interaction between users and systems, are applied to understand interaction issues.
  R. Boisvert, S. Browne, J. Dongarra, and E. Grosse. Digital software and data repositories for support of scientific computing. In Advances in Digital Libraries '95, 1995.
Format: Not yet online.
  Kurt D. Bollacker, Steve Lawrence, and C. Lee Giles. A system for automatic personalized tracking of scientific literature on the web. In Proceedings of the Fourth ACM International Conference on Digital Libraries, 1999.
We introduce a system as part of the CiteSeer digital library project for automatic tracking of scientific literature that is relevant to a user's research interests. Unlike previous systems that use simple keyword matching, CiteSeer is able to track and recommend topically relevant papers even when keyword based query profiles fail. This is made possible through the use of a heterogeneous profile to represent user interests. These profiles include several representations, including content based relatedness measures. The CiteSeer tracking system is well integrated into the search and browsing facilities of CiteSeer, and provides the user with great flexibility in tuning a profile to better match his or her interests. The software for this system is available, and a sample database is online as a public service.
  Leslie Bondaryk. Calculus modules online: An internet multimedia application. In DAGS'95, 1995.
Format: HTML Document (21K + pictures). Audience: Calculus Instructors. References: 13. Links: 16. Abstract: Discusses an architecture for a system that aids in the teaching of calculus.
  J. Bonigk and A. Lubinski. A basic architecture for mobile information access. Computers & Graphics, 20(5):683-91, 1996.
As the development of pen computing continues, more and more of today's computers are likely gradually to move away from people's desktops and into their pockets. The development of personal digital assistants (PDAs) has initiated this move. As these devices move into people's pockets, they need the ability to access information on the move. This article describes a generic view of a client server mobile computing architecture. It also sheds some light on the basic network topologies that have been considered previously for such systems. The scenario used is a hospital ward. Each doctor is equipped with a PDA and each ward or a group of wards with a server providing patient records. As a doctor visits a patient in a ward, the patient's record is accessed from the server onto the PDA. The doctor updates the record and sends the update back to the server.
  José Borbinha, Nuno Freire, and João Neves. Bnd: A national digital library as a jigsaw. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004.
This paper describes the architecture and components of the infrastructure under construction for the National Digital Library in Portugal. The requirements emerged from the definition of the services to support, with a special focus on scalability, and from the decision to give special attention to community building standards, open solutions, and reusable and cost effective components. The generic bibliographic metadata format in this project is UNIMARC, and the structural metadata is METS. The URN identifiers are processed and resolved as simple but very effective PURL identifiers, and the storage is provided by the new emerging LUSTRE file system, for immediate access, and by a locally developed GRID architecture, ARCO, for long term preservation. All these components run on Linux servers, as does the middleware for access, which is based on the FEDORA framework.
  N. Borenstein and N. Freed. MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for specifying and describing the format of Internet message bodies, September 1993. Internet RFC 1521.
  Nathaniel Borenstein. Cooperative work in the andrew message system. In Proceedings of the Conference on Computer-Supported Cooperative Work, CSCW'88, 1988.
Describes collab-related aspects of Andrew.
  Christine L. Borgman, Gregory Leazer, Anne Gilliland-Swetland, Kelli Millwood, Leslie Champeny, Jason Finley, and Laura J. Smart. How geography professors select materials for classroom lectures: Implications for the design of digital libraries. In Proceedings of the Fourth ACM/IEEE-CS Joint Conference on Digital Libraries, 2004.
A goal of the Alexandria Digital Earth Prototype (ADEPT) project is to make primary resources in geography useful for undergraduate instruction in ways that will promote inquiry learning. The ADEPT education and evaluation team interviewed professors about their use of geography information as they prepare for class lectures, as compared to their research activities. We found that professors desired the ability to search by concept (erosion, continental drift, etc.) as well as geographic location, and that personal research collections were an important source of instructional materials. Resources in geo-spatial digital libraries are typically described by location, but are rarely described by concept or educational application. This paper presents implications for the design of an educational digital library from our observations of the lecture preparation process. Findings include functionality requirements for digital libraries and implications for the notion of digital libraries as a shared information environment. The functional requirements include definitions and enhancements of searching capabilities, the ability to contribute and to share personal collections of resources, and the capability to manipulate data and images.
  Katy Borner, Ying Feng, and Tamara McMahon. Collaborative visual interfaces to digital libraries. In Proceedings of the Second ACM/IEEE-CS Joint Conference on Digital Libraries, 2002.
This paper argues for the design of collaborative visual interfaces to digital libraries that support social navigation. As an illustrative example we present work in progress on the design of a three-dimensional document space for a scholarly community - namely faculty, staff, and students at the School of Library and Information Science, Indiana University. We conclude with a set of research challenges.
  C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, Michael F. Schwartz, and Duane P. Wessels. Harvest: A scalable, customizable discovery and access system. Technical Report CU-CS-732-94, Dept. of Computer Science, Univ. of Colorado, Boulder, Colo., August 1994. Accessible at http://harvest.transarc.com/.
  C.M. Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, and Michael F. Schwartz. The harvest information discovery and access system. Computer Networks and ISDN Systems, 28(1-2):119-125, December 1995.
It is increasingly difficult to make effective use of Internet information, given the rapid growth in data volume, user base, and data diversity. We introduce Harvest, a system that provides a scalable, customizable architecture for gathering, indexing, caching, replicating, and accessing Internet information.
  Claus Brabrand, Anders Moller, Anders Sandholm, and Michael I. Schwartzbach. A runtime system for interactive web services. In Proceedings of the Eighth International World-Wide Web Conference, 1999.
Interactive Web services are increasingly replacing traditional static Web pages. Producing Web services seems to require a tremendous amount of laborious low-level coding due to the primitive nature of CGI programming. We present ideas for an improved runtime system for interactive Web services built on top of CGI running on virtually every combination of browser and HTTP/CGI server. The runtime system has been implemented and used extensively in <bigwig>, a tool for producing interactive Web services.
  Onn Brandman, Junghoo Cho, Hector Garcia-Molina, and Narayanan Shivakumar. Crawler-friendly web servers. In Proceedings of the Workshop on Performance and Architecture of Web Servers (PAWS), Santa Clara, California, June 2000. Held in conjunction with ACM SIGMETRICS 2000. Available at http://dbpubs.stanford.edu/pub/2000-25.
In this paper we study how to make web servers (e.g., Apache) more crawler friendly. Current web servers offer the same interface to crawlers and regular web surfers, even though crawlers and surfers have very different performance requirements. We evaluate simple and easy-to-incorporate modifications to web servers so that there are significant bandwidth savings. Specifically, we propose that web servers export meta-data archives describing their content.
  Onn Brandman, Hector Garcia-Molina, and Andreas Paepcke. Where have you been? a comparison of three web tracking technologies. Submitted for publication, 1999. Available at http://dbpubs.stanford.edu/pub/1999-61.
Web searching and browsing can be improved if browsers and search engines know which pages users frequently visit. 'Web tracking' is the process of gathering that information. The goal for Web tracking is to obtain a database describing Web page download times and users' page traversal patterns. The database can then be used for data mining or for suggesting popular or relevant pages to other users. We implemented three Web tracking systems, and compared their performance. In the first system, rather than connecting directly to Web sites, a client issues URL requests to a proxy. The proxy connects to the remote server and returns the data to the client, keeping a log of all transactions. The second system uses sniffers to log all HTTP traffic on a subnet. The third system periodically collects browser log files and sends them to a central repository for processing. Each of the systems differs in its advantages and pitfalls. We present a comparison of these techniques.
  Jack Brassil. September - secure electronic publishing trial. In Proceedings of DL'96, 1996.
Format: Not yet online.
  Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker. Web caching and zipf-like distributions: Evidence and implications. In Proceedings of Infocom, 1999.
  Allen Brewer, Wei Ding, Karla Hahn, and Anita Komlodi. The role of intermediary services in emerging digital libraries. In Proceedings of DL'96, 1996.
Format: Not yet online.
  M.W. Bright, A.R. Hurson, and S. Pakzad. Automated resolution of semantic heterogeneity in multidatabases. ACM Transactions on Database Systems, 19(2):212-253, June 1994.
  M.W. Bright, A.R. Hurson, and Simin H. Pakzad. A taxonomy and current issues in multidatabase systems. IEEE Computer, 25(3):51-60, March 1992.
This article presents a taxonomy of global information-sharing systems and discusses where multidatabase systems fit in the spectrum of solutions. The authors use this taxonomy as a basis for defining multidatabase systems, then discuss the issues associated with them. In particular, the paper focuses on two major design approaches: global schema systems and multidatabase language systems.
  Brightplanet.com. http://www.brightplanet.com.
  The Deep Web: Surfacing Hidden Value. http://www.completeplanet.com/Tutorials/DeepWeb/.
  S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of 7th World Wide Web Conference, 1998.
In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine - the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
  Sergey Brin, James Davis, and Hector Garcia-Molina. Copy detection mechanisms for digital documents. SIGMOD, pages 398-409, 1995.
In a digital library system, documents are available in digital form and therefore are more easily copied and their copyrights are more easily violated. This is a very serious problem, as it discourages owners of valuable information from sharing it with authorized users. There are two main philosophies for addressing this problem: prevention and detection. The former actually makes unauthorized use of documents difficult or impossible while the latter makes it easier to discover such activity. We propose a system for registering documents and then detecting copies, either complete copies or partial copies. We describe algorithms for such detection, and metrics required for evaluating detection mechanisms (covering accuracy, efficiency, and security). We also describe a working prototype, called COPS, describe implementation issues, and present experimental results that suggest the proper settings for copy detection parameters.
  Sergey Brin. Extracting patterns and relations from the world wide web. In WebDB Workshop at 6th International Conference on Extending Database Technology, EDBT'98, 1998. Available at http://www-db.stanford.edu/~sergey/extract.ps.
Seed a search with examples of a pattern, such as citations to books. Let the engine run over Web pages and learn. Get back more books.
  Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the Seventh International World-Wide Web Conference, 1998.
Shows architecture of Google.
  Sergey Brin and Lawrence Page. Dynamic data mining: A new architecture for data with high dimensionality. Technical report, Stanford University, 1998.
Describes a new architecture for data mining. It makes use of some of the dynamic itemset counting technology
  Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. Graph structure in the web: experiments and models. In Proceedings of the Ninth International World-Wide Web Conference, 2