進階搜尋


 
系統識別號 U0026-0812200915353806
論文名稱(中文) 處理前 k 個在門檻值以及基數範圍內的組合式天際線查詢
論文名稱(英文) Top-k Combinatorial Skyline with Threshold and Cardinality Constraints
校院名稱 成功大學
系所名稱(中) 資訊工程學系碩博士班
系所名稱(英) Institute of Computer Science and Information Engineering
學年度 97
學期 2
出版年 98
研究生(中文) 張文欣
研究生(英文) Wen-Hsin Chang
學號 P7696448
學位類別 碩士
語文別 中文
論文頁數 70頁
口試委員 指導教授-李強
口試委員-陳耀輝
口試委員-盧文祥
指導教授-鍾毓驥
口試委員-謝孫源
中文關鍵字 組合式天際線  top-k  基數限制  天際線  門檻值限制 
英文關鍵字 cardinality constraint  top-k  combinatorial skyline  skyline  threshold constraint 
學科別分類
中文摘要 近幾年來, skyline 查詢非常熱門。因為 skyline 查詢可以幫助使用者在大量具有矛盾條件的資料庫內容中有效過濾不必要的資料,
同時挑選出符合使用者要求的資料結果,
以讓使用者作進一步的決策判斷。然而目前的 skyline 查詢只是將資料一一的比對,找出資料中符合使用者需求的答案,
卻忽略了使用者可能希望從這些資料中找出一個組合({ f combination})(由一到多筆資料經由一個特定的函式構成),
使得這個組合能夠滿足使用者查詢的條件。
在本論文中,我們提出一個新的 skyline 查詢,叫做 {em { f T}op-k { f C}ombinatorial { f S}kyline with { f T}hreshold and { f C}ardinality constraints { f Q}uery} ({em Top-k CSTCQ}) 。
Top-k CSTCQ 可以解決多筆資料產生的組合的 skyline 查詢。然而多筆資料產生的組合計算複雜度高,
因此我們針對 Top-k CSTCQ 提出一個有效率的演算法 {em { f T}op-k { f T}hreshold and { f C}ardinality-based { f C}ombinatorial { f S}kyline} ({em TTCCS}) 演算法。
TTCCS 演算法先採取 threshold 限制轉換成 cardinality 限制的慨念,藉此降低 combinations 個數,接著利用二項式係數以及動態規劃兩個慨念提出表格式的演算法設計原則,
並利用 4 個 pruning strategies 改良 TTCCS 演算法(TTCCS-Extension),
可以過濾不符合 threshold 限制以及 top- k 限制的 combinations 。
在最後,我們用實驗結果証明我們的 TTCCS-EX 演算法在 synthetic datasets 上很有效率。
英文摘要 In recent years, the skyline operator has been popular.
This is mainly due to the importance of various skyline results in applications involving multi-criteria decision making.
However, skyline queries are considered in comparing each tuple one by one so as to provide useful information for the user
but ignoring that user may wish to find information from a combination (constituted through a specific function from one to more tuples)
which meets the conditions of the isuued query.
%Then, the user wishes to make this combination to meet the conditions of the issued query.
In this paper, we exploit a new skyline query, {em { f T}op-k { f C}ombinatorial { f S}kyline with { f T}hreshold and { f C}ardinality constraints { f Q}uery} ({em Top-k CSTCQ}).
Top-k CSTCQ can tackle the problem of a combination which is constituted from one to more tuples.
However, the computation complexity of top-k CSTCQ is quite high. We design an efficient algorithm, named {em { f T}op-k { f T}hreshold and { f C}ardinality-based { f C}ombinatorial { f S}kyline} ({em TTCCS}) algorithm, to solve Top-k CSTCQ.
First, TTCCS algorithm adopts a concept of transforming threshold constraints to cardinality constraint to reduce the number of combinations,
and then uses the concept of binomial coefficient and dynamic programming to construct a table-based algorithm.
We propose four pruning strategies to improve TTCCS algorithm (named {em TTCCS -Extension})
to prune the non-qualifying combinations progressively.
Comprehensive experiments show that our algorithms can answer Top-k CSTCQ on various synthetic datasets efficiently.
論文目次 Abstract i
Acknowledgements iii
Table of Contents iv
Table of Figures vi
Table of Tables viii
Table of Algorithms ix
1 Introduction 1
2 Related Work 8
2.0.1 Traditional skyline query . . . . . . . . . . . . . . . 8
2.0.2 Skyline Variants . . . . . . . . . . . . . . . . . . . . 9
3 Preliminaries 18
3.0.3 Problem Definition . . . . . . . . . . . . . . . . . . 18
4 Top-k Threshold and Cardinality-based Combinatorial Sky-
line Algorithm (TTCCS) 24
4.0.4 Relationship between threshold and cardinality . . 24
4.0.5 TTCCS Algorithm . . . . . . . . . . . . . . . . . . 27
4.0.6 Improving the TTCCS Algorithmwith Pruning Strategies
(TTCCS-EX Algorithm) . . . . . . . . . . . . 32
5 TTCCS-EX - Average 45
6 Experimental Evaluation 53
6.0.7 Experimental Setup . . . . . . . . . . . . . . . . . . 53
6.0.8 Synthetic datasets . . . . . . . . . . . . . . . . . . 54
6.0.9 Performance on Synthetic datasets . . . . . . . . . 57
7 Conclusions and Future Work 61
Biography 70
參考文獻 [1] Stephan Borzsonyi, Donald Kossmann, and Konrad Stocker, “The
Skyline Operator,” in Proceedings of the 17th International Conference
on Data Engineering (ICDE), pp. 421-430, 2001.
[2] Jan Chomicki, Parke Godfrey, Jarek Gryz and Dongming Liang,
“Skyline with Presorting,” in Proceedings of the 17th International
Conference on Data Engineering (ICDE), pp. 717-719, 2003.
[3] Ken C. K. Lee, Baihua Zheng, Huajing Li, and Wang-Chien Lee,
“Approaching the Skyline in Z Order,” in Proceedings of the 33th
International Conference on Very Large Databases (VLDB), pp. 279-
290, 2007.
[4] Donald Kossmann, Frank Ramsak, and Steffen Rost, “Shooting Stars
in the Sky: An Online Algorithm for Skyline Queries,” in Proceedings
of the 28th International Conference on Very Large Data Bases
(VLDB), pp. 275-286, 2002.
[5] Kian-Lee, Tan Pin-Kwang Eng, and Beng Chin Ooi, “Efficient Progressive
Skyline Computation,” in Proceedings of the 27th International
Conference on Very Large Databases (VLDB), pp. 301-310,
2001.
[6] Dimitris Papadias, Yufei Tao, Greg Fu, and Bernhard Seeger, “An
Optimal and Progressive Algorithm for Skyline Queries,” in Pro-
62
BIBLIOGRAPHY 63
ceedings of the 2003 ACM Special Interest Group on Management
Of Data international conference (SIGMOD), pp. 467-478, 2003.
[7] Cuiping Li, Anthony K. H. Tung, Wen Jin, and Martin Ester, “On
Dominating Your Neighborhood Profitably,” in Proceedings of the
33rd international Conference on Very Large Databases (VLDB), pp.
818-829, 2007.
[8] Xuemin, Lin Yidong, Yuan Wei Wang, and Hongjun Lu, “Stabbing
the sky: Efficient skyline computation over sliding windows,” in Proc.
2005 Int. Conf. Data Engineering (ICDE), pp. 502- 513, 2005.
[9] Cuiping Li, Beng Chin Ooi, Anthony K. H. Tung, and Shan Wang,
“Dada: a data cube for dominant relationship analysis,” in Proceedings
of the 2006 ACM SIGMOD international conference on Management
of data conference (SIGMOD), pp. 659-670, 2006.
[10] Milton Friedman and Leonard J. Savage, “The Utility Analysis of
Choices Involvin Risk,” in Journal of Political Economy 56, pp. 279-
304, 1948.
[11] Chien Ping Hsieh, “Fundamentals of Investments,” BEST-WISE.
[12] Stocks, Bonds, Bills, and Inflation Yearbook, “Ibbotson Associates,”
Inc., Chicago.1983.
[13] Frank K. Reilly and Keith C. Brown “Essentials of Investment Analysis
and Portfolio Management,” THOMSON.
[14] NYSE Euronext, http://www.nyse.com/
[15] http://www.angelibrary.com/economic/gpjb/
63
BIBLIOGRAPHY 64
[16] Parke Godfrey, Ryan Shipley, and Jarek Gryz, “Maximal vector computation
in large data sets,” in Proceedings of the 31st international
conference on Very large data bases (VLDB), pp. 229 - 240, 2005.
[17] Wen Jin, Jiawei Han, andMartin Ester, “Mining thick skylines over
large databases,” in Proceedings of the 8th European Conference
on Principles and Practice of Knowledge Discovery in Databases
(PKDD), pp. 255 - 266, 2004.
[18] Tian Xia and Donghui Zhang, “Refreshing the sky: the compressed
skycube with efficient support for frequent updates,” in Proceedings
of the 2006 ACM SIGMOD international conference on Management
of data (SIGMOD), pp. 491-502, 2006.
[19] Yidong Yuany, Xuemin Liny, Qing Liuy, Wei Wangy, Jeffrey Xu
Yux,and Qing Zhangy, “Efficient computation of the skyline cube,”
in Proceedings of the 31th International Conference on Very Large
Data Bases (VLDB), pp.241-252, 2005.
[20] Jian Pei, Wen Jin, Martin Ester,and Yufei Tao, “Catching the best
views in skyline: A semantic approach,” in Proceedings of the 31th
International Conference on Very Large Data Bases (VLDB), pp.
253-264, 2005.
[21] Jian Pei, Yidong Yuan, Xuemin Lin, Wen Jin, Martin Ester, Qing
Liu, Wei Wang, Yufei Tao, Jeffrey Xu Yu and Qing Zhang, “Towards
multidimensional subspace skyline analysis,” in ACM Transactions
on Database Systems (TODS), pp. 1335-1381, 2006.
[22] Jian Pei, Ada Wai-Chee Fu, Xuemin Lin,and Haixun Wang, “Computing
compressed skyline cubes efficiently,” in Proceedings of the
64
BIBLIOGRAPHY 65
23nd International Conference on Data Engineering (ICDE), pp. 96-
105, 2007.
[23] Yufei Tao, Xiaokui Xiao, and Jian Pei, “Subsky: Efficient computation
of skylines in subspaces,” in Proceedings of the 22nd International
Conference on Data Engineering (ICDE), pp. 65-65, 2006.
[24] Yufei Tao and Dimitris Papadias, “Maintaining sliding window skylines
on data streams,” in IEEE Transactions on Knowledge and
Data Engineering (TKDE), pp. 377-391, 2006.
[25] Michael Morse, Jignesh M. Patel, and William I. Grosky, “Efficient
continuous skyline computation,” in Proceedings of the 22nd International
Conference on Data Engineering (ICDE), pp. 3411-3437,
2006.
[26] Jon Louis Bentley, HT T Kung, Mario Schkolnick, and Clark David
Thomborson, “On the average number of maxima in a set of vectors
and applications,” in Journal of the ACM (JACM), pp. 536-543,
1978.
[27] Wolf-tilo Balke , Ulrich Guntzer,and Jason Xin Zheng, “Efficient
distributed skylining for web information systems.” ln Extending
Database Technology (EDBT), pp. 256-273, 2004.
[28] Zhiyong Huang, Jensen, C.S., Hua Lu,and Beng Chin Ooi, “Skyline
queries against mobile lightweight devices in manets,” in Proceedings
of the 22nd International Conference on Data Engineering (ICDE),
pp. 66-66,2006.
[29] CheeYong Chan, PinKwang Eng,and KianLee Tan, “Stratified computation
of skylines with partially-ordered domains,” in Proceedings
65
BIBLIOGRAPHY 66
of the 2005 ACM SIGMOD international conference (SIGMOD), pp.
203-214, 2005.
[30] Vladlen Koltun and Christos H. Papadimitriou, “Approximately
dominating representatives,” in Theoretical Computer Science, pp.
148-154, 2007.
[31] Chee-Yong Chan, H.V. Jagadish, Kian-Lee Tan, Anthony K.H. Tung,
and Zhenjie Zhang, “On high dimensional skylines,” in Extending
Database Technology (EDBT), pp. 478-495, 2006,
[32] Chee-Yong Chan, H.V. Jagadish, Kian-Lee Tan, Anthony K.H.
Tung,and Zhenjie Zhang, “Finding k-dominant skylines in high dimensional
space,” in Proceedings of the 2006 ACM SIGMOD international
conference on Management of data (SIGMOD), pp. 503-514,
2006.
[33] Raymond Chi-WingWong, Jian Pei, AdaWai-Chee Fu,and KeWang
“Mining favorable facets,” in Proceedings of the 13th ACM SIGKDD
international conference on Knowledge discovery and data mining
(KDD), pp. 804-813 , 2007.
[34] Bin Jiang, Jian Pei, Xuemin Lin, David W. Cheung,and Jiawei
Han, “Mining preferences from superior and inferior examples,” in
Proceedings of the 14th ACM SIGKDD international conference on
Knowledge discovery and data mining(KDD). New York, NY, USA:
ACM, 2008.
[35] Raymond ChiWing Wong, Ada WaiChee Fu, Jian Pei, Yip Sing Ho,
Tai Wong,and Yubao Liu, “Efficient skyline querying with variable
user preferences on nominal attributes,” in VLDB: Proceedings of the
66
BIBLIOGRAPHY 67
34th International Conference on Very Large Databases (VLDB), pp.
1032-1043, 2008.
[36] Jie Liu, Liang Feng, and Yunpeng Xing, “A pruning-based approach
for supporting Top-K join queries,” in Proceedings of the 15th international
conference on World Wide Web (WWW), pp. 891-892,
2006.
[37] Xuemin Lin, Yidong Yuan, Qing Zhang, and Ying Zhang, “Selecting
Stars: The k Most Representative Skyline Operator,” in 3rd International
Conference on Data Engineering (ICDE), pp. 86-95, 2007.
[38] Evangelos Dellis, Akrivi Vlachou, and Ilya Vladimirskiy, “Constrained
Subspace Skyline Computation,” in Proceedings of the 15th
ACM international conference on Information and knowledge management
(CIKM), pp. 415-424, 2006.
[39] Dimitris Sacharidis, Stavros Papadopoulos, Dimitris Papadias,
“Topologically-sorted Skylines for Partially-ordered Domains,” in
25th International Conference on Data Engineering (ICDE), pp.
1072-1083, 2009
[40] http://www.derf.net/knapsack/ NP-Completeness, Cryptology, and
Knapsacks
[41] http://www.cse.ohio-state.edu/ gurari/theory-bk/theory-bkfivese4.
html More NP-Complete Problems
[42] Silvano Martello, and Paolo Toth, KNAPSACK PROBLEMS Algorithms
and Computer Implementations.
[43] Wen Jin, Anthony K. H. Tung, Martin Ester, and Jiawei Han, “On
Efficient Processing of Subspace Skyline Queries on High Dimen-
67
BIBLIOGRAPHY 68
sional Data,” in 19th International Conference on Scientific and Statistical
Database Management (SSBDM), pp. 12-12, 2007.
[44] Bin Jiang, and Jian Pei, “Online Interval Skyline Queries on Time
Series,” in Proceedings of the 25th International Conference on Data
Engineering (ICDE), pp. 1036-104, 2009.
[45] Edward M McCreight, “Priority search trees,” SIAM J. Comput.,
vol. 14, no. 2, pp. 257-276, 1985.
[46] Chia-Whey Chen, Yu-Whey Kao, and Yu-Jane Liu, “Investor’s Preferences
and Assets Allocation,”
[47] Bin Cui , Hua Lu , Quanqing Xu , Lijiang Chen , Yafei Dai , and
Yongluan Zhou, “Parallel Distributed Processing of Constrained Skyline
Queries by Filtering,” in Proceedings of the 24th International
Conference on Data Engineering (ICDE), pp. 546-555, 2008.
[48] Hakan Ferhatosmanoglu, Ioanna Stanoi, Divyakant Agrawal, and
Amr El Abbadi, “Constrained Nearest Neighbor Queries,” International
Symposium on Spatial and Temporal Databases (SSTD), pp.
257-278, 2001.
[49] Philippe Flajolet, and G Nigel Martin, “Probabilistic counting algorithms
for data base applications,” Journal of Computer and System
Sciences, pp. 182-209, 1985.
[50] Michael Morse, Jignesh M. Patel, and H.V. Jagadish, “Efficient Skyline
Computation over Low-Cardinality Domains,” in Proceedings of
the 33rd international conference on Very large data bases (VLDB),
pp. 267-278, 2007.
68
BIBLIOGRAPHY 69
[51] Gang Gou,and Rada Chirkova, “Efficiently querying large xml data
repositories: A survey,” in IEEE Transactions on Knowlenge and
Data Engineering (TKDE), pp. 1381-1403, 2007.
[52] Rakesh Agrawal, Alexander Tiberiu Borgida, and Hosagrahar V Jagadish,
“Efficient management of transitive relationships in large
data and knowledge bases,” in SIGMOD, 1989, pp. 253-262.
[53] PingWu, Caijie Zhang, Ying Feng, Ben Y. Zhao, Divyakant Agrawal,
and Amr El Abbadi, “Parallelizing Skyline Queries for Scalable Distribution,”
in Advances In Database Technology (EDBT) pp. 112-
130, 2006.
[54] Gautam Das, Dimitrios Gunopulos, and Nick Koudas, “Answering
Top-k Queries Using Views,” in Proceedings of the 32nd international
conference on Very large data bases (VLDB), pp. 451-462, 2006.
[55] Gautam Das, and Dimitrios Gunopulos, “Adhoc Top-k Query Answering
for Data Streams,” in Proceedings of the 33rd international
conference on Very large data bases (VLDB), pp. 183-194, 2007.
[56] Panayiotis Tsaparas, Themistoklis Palpanas, and Yannis Kotidis
“Ranked Join Indices,” in Proceedings of the 19th International Conference
on Data Engineering (ICDE), pp. 277-288, 2003.
[57] Philippe Rigaux, M. Scholl, and Agnes Voisard. “An introduction to
spatial database systems,” The International Journal on Very Large
Data Base, pp. 357-399, 2000.
[58] Xuemin Lin,Qing Liu, Yidong Yuan,Xiao Fang Zhou,and Hong Jun
Lu, “Summarizing Level-Two Topological Relations in Large Spatial
69
BIBLIOGRAPHY 70
Datasets,” in ACM Transactions on Database Systems (TODS), pp.
584-630, 2006.
[59] Junchang Xin, Guoren Wang, Lei Chen, Xiaoyi Zhang, and ZhenhuaWang,
“Continuously Maintaining Sliding Window Skylines in a
Sensor Network,” in Proceedings of the 12th International Conference
on Database Systems for Advanced Applications (DASFAA),
pp. 509-521, 2007.
[60] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, “INTRODUCTION
TO ALGORITHMS,” MIT Press, McGraw-Hill
Book Company.
論文全文使用權限
  • 同意授權校內瀏覽/列印電子全文服務,於2010-08-25起公開。
  • 同意授權校外瀏覽/列印電子全文服務,於2011-08-25起公開。


  • 如您有疑問,請聯絡圖書館
    聯絡電話:(06)2757575#65773
    聯絡E-mail:etds@email.ncku.edu.tw