Online Aggregation

Interactive Display and Control of Long-Running Queries

A sample Online Aggregation interface.


Background

Aggregation is an increasingly important operation in today's relational database systems. As data sets grow larger and both users and user interfaces become more sophisticated, there is a growing emphasis on extracting not just specific data items, but also general characterizations of large subsets of the data. Users want this aggregate information right away, even though producing it may involve accessing and condensing enormous amounts of information.

Unfortunately, aggregation processing in today's database systems closely resembles the batch processing of the 1960s. When users submit an aggregation query to the system, they are forced to wait without feedback while the system churns through millions of records or more. Only after a significant period of time does the system respond with the (usually small) final answer. A particularly frustrating aspect of this problem is that aggregation queries are typically used to get a "rough picture" of a large body of information, yet they are computed with painstaking precision, even in situations where an acceptably precise approximation might be available very quickly.

In the Online Aggregation project, we are changing the interface to aggregation processing and, by extension, changing aggregation processing itself. The idea is to perform aggregation online, so that users can both observe the progress of their queries and control execution on the fly. This enhancement requires changes not only to the user interface, but also to the techniques used for query optimization and execution. In addition, we are using new and existing statistical estimation techniques, such as confidence intervals, to help users judge how close the running aggregate is to the final result; the proposed interface makes these techniques accessible even to users with little or no statistical background. Online aggregation interfaces can go well beyond merely providing a platform for such statistical estimation techniques, permitting an interactive approach to both formal and informal data exploration and analysis.
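To make the idea of a running aggregate with a statistical accuracy estimate concrete, the sketch below computes a running AVG and reports a confidence-interval half-width as rows are consumed. It is illustrative only and is not the project's implementation. It assumes that rows arrive in random order, so that the running average is an unbiased estimate of the final answer. It also uses a large-sample normal approximation (z = 1.96 for roughly 95% confidence) with an optional finite-population correction when the table size is known. The names `online_avg`, `total_rows`, and `report_every` are hypothetical.

```python
import math
import random
from typing import Iterable, Iterator, Optional, Tuple


def online_avg(
    rows: Iterable[float],
    total_rows: Optional[int] = None,
    z: float = 1.96,
    report_every: int = 1000,
) -> Iterator[Tuple[int, float, float]]:
    """Yield (rows_seen, running_avg, half_width) while scanning rows.

    Hypothetical sketch: assumes rows arrive in random order, so the
    running average estimates the final AVG. half_width is a
    large-sample (central limit theorem) confidence-interval half-width.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in rows:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n % report_every == 0:
            yield n, mean, _half_width(n, m2, total_rows, z)
    if n % report_every != 0:
        # Final report for a partial batch at the end of the scan.
        yield n, mean, _half_width(n, m2, total_rows, z)


def _half_width(n: int, m2: float, total_rows: Optional[int], z: float) -> float:
    if n < 2:
        return math.inf  # not enough rows to estimate variance
    variance = m2 / (n - 1)  # unbiased sample variance
    # Finite-population correction: the interval shrinks to zero once
    # every row of a table of known size has been seen.
    fpc = 1.0 if total_rows is None else max(0.0, 1.0 - n / total_rows)
    return z * math.sqrt(variance * fpc / n)


if __name__ == "__main__":
    random.seed(0)
    table = [random.gauss(50.0, 10.0) for _ in range(100_000)]
    random.shuffle(table)  # emulate a random-order scan
    for n, avg, hw in online_avg(table, total_rows=len(table), report_every=20_000):
        print(f"{n:>7} rows: AVG ~ {avg:.2f} +/- {hw:.2f}")
```

The interval narrows as more rows are seen, which is what lets a user stop a query early once the estimate is precise enough. Welford's update is used here because it keeps the variance numerically stable in a single pass.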

Papers and Talks

Joseph M. Hellerstein. Online Processing Redux. To appear in Data Engineering Bulletin, September 1997. Available in PostScript format.
Joseph M. Hellerstein, Peter J. Haas, and Helen J. Wang. Online Aggregation. SIGMOD '97. Available in PostScript and Adobe PDF formats.
Peter J. Haas. Large-Sample and Deterministic Confidence Intervals for Online Aggregation. To appear in the Ninth International Conference on Scientific and Statistical Database Management (SSDBM '97). Available in PostScript format.
Joseph M. Hellerstein. The Case for Online Aggregation. UC Berkeley Computer Sciences Technical Report. An earlier proposal of the ideas described on this page. Available in Microsoft Word, PostScript, and Adobe PDF formats.
Online Aggregation. Talk given at Maryland, IBM Almaden, Informix, and Berkeley. Slides available in HTML and text-only formats.

This work is funded by a grant from Informix Corporation.


Last modified: 1997/08/05 23:35:34 by Joe Hellerstein, jmh@cs.berkeley.edu.