Seeking the truth about ad hoc join costs

Cited by: 40
Authors
Haas L.M. [1]
Carey M.J. [1]
Livny M. [2]
Shukla A. [2]
Affiliations
[1] IBM Almaden Research Center, K55/B1, 650 Harry Road, San Jose
[2] Computer Sciences Dept., University of Wisconsin-Madison, 1210 West Dayton Street, Madison
Keywords
Buffer allocation; Cost models; Join methods; Optimization; Performance
DOI
10.1007/s007780050043
Abstract
In this paper, we re-examine the results of prior work on methods for computing ad hoc joins. We develop a detailed cost model for predicting join algorithm performance, and we use the model to develop cost formulas for the major ad hoc join methods found in the relational database literature. We show that various pieces of "common wisdom" about join algorithm performance fail to hold up when analyzed carefully, and we use our detailed cost model to derive optimal buffer allocation schemes for each of the join methods examined here. We show that optimizing their buffer allocations can lead to large performance improvements, e.g., as much as a 400% improvement in some cases. We also validate our cost model's predictions by measuring an actual implementation of each join algorithm considered. The results of this work should be directly useful to implementors of relational query optimizers and query processing systems.
Pages: 241-256
Page count: 15