System ID U0026-0408201710540100
Title (Chinese) 群眾智慧的真實性推論與最佳化的利潤回報:從宏觀分配到微觀最佳化
Title (English) Truth Inference from Crowdsourcing with Optimal Return of Interest: A Strategic Macro Assignment and Micro Optimization Paradigm
University National Cheng Kung University
Department (Chinese) 資訊工程學系
Department (English) Institute of Computer Science and Information Engineering
Academic Year 105
Semester 2
Publication Year 106
Author (Chinese) 楊博安
Author (English) Po-An Yang
Email yangpoan@gmail.com
Student ID P76044164
Degree Master's
Language English
Pages 33
Committee Advisor - 莊坤達
Keywords (Chinese) 群眾外包, 最佳化, 動態規劃
Keywords (English) Crowdsourcing, Optimization, Dynamic Programming
Abstract (Chinese) In today's big data era, users generate massive amounts of data every day across a wide variety of applications, yet most of this new information cannot be systematically retrieved from existing knowledge bases. For instance, many new hashtags appear on social networks every day; machine learning does not yet know how to classify them, even though they carry valuable information, so a reliable category-labeling method is needed. Crowdsourcing platforms offer an alternative, allowing new hashtags to be classified through the crowd. In this thesis, we incorporate task importance, called Return of Interest (RoI), into the problem of allocating resources in crowdsourcing. RoI is meaningful for business applications, but it also introduces new challenges: a common resource-allocation method that optimizes only a single objective cannot be applied directly to our problem. We propose a two-phase framework, Macro-Assignment and Micro-Optimization (MAMO), which simultaneously considers resource allocation and the importance (RoI) of each task. For greater realism, we adopt the actual practice of crowdsourcing platforms and divide workers into different pools, each with its own quality and cost. Under a fixed budget, we prove that assigning tasks to different worker pools so as to maximize the expected RoI is NP-hard, and we propose a dynamic-programming strategy that solves the problem effectively. Our experimental results also show that the dynamic-programming strategy outperforms greedy algorithms.
Abstract (English) In the big data era, the flourishing development of Internet services brings a lot of user-generated data, in which most new information cannot be systematically retrieved by current knowledge bases. For example, a dramatic number of new hashtags appear in social media every day, resulting in much unknown but valuable knowledge that requires reliable category/attribute labeling strategies. The crowdsourcing platform provides an effective tool to leverage opinions from the Internet crowd. In this thesis, we propose incorporating varied task importance, called Return of Interest (RoI), into resource allocation in crowdsourcing. The awareness of RoI is important in the business sense, but it introduces new challenges. For instance, a commonly applied strategy, single-objective optimization, cannot be naively applied to achieve desirable results. In this thesis, we propose a two-phase framework, called Macro-Assignment and Micro-Optimization (MAMO), to simultaneously consider the issue of budget allocation and the chance of iteratively obtaining RoI. For better practicality, we follow the real practice of crowdsourcing platforms, where workers are divided into diverse pools corresponding to their quality and cost. With a fixed budget, we prove that worker allocation to diverse pools for the best expectation of RoI in return is an NP-hard challenge. We propose a dynamic-programming strategy to resolve the issue effectively. Our experimental results demonstrate that the DP-based strategy can significantly outperform the baseline greedy approaches, also indicating its feasibility to be deployed as a standard component for budget allocation in crowdsourcing.
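The budget-allocation problem described in the abstract (worker pools of differing quality and cost, a fixed budget, and per-task RoI) resembles a knapsack-style optimization. The thesis's exact algorithm is not reproduced in this record, so the following is only an illustrative sketch under assumed simplifications: each task is posted to at most one pool, and a task's expected RoI is its RoI weighted by the chosen pool's quality. All names (`allocate`, `costs`, `qualities`, `rois`) are hypothetical.

```python
def allocate(budget, costs, qualities, rois):
    """Knapsack-style DP sketch: pick at most one worker pool per task.

    budget    -- total integer budget
    costs     -- per-task cost of posting to pool i
    qualities -- probability pool i labels a task correctly
    rois      -- importance (RoI) of each task
    Returns (max expected RoI, per-task pool index, -1 = task skipped).
    """
    n = len(rois)
    # dp[j][b] = best expected RoI over the first j tasks with budget b
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    choice = [[-1] * (budget + 1) for _ in range(n)]
    for j in range(1, n + 1):
        for b in range(budget + 1):
            dp[j][b] = dp[j - 1][b]          # option: skip task j-1
            choice[j - 1][b] = -1
            for i, (c, q) in enumerate(zip(costs, qualities)):
                if c <= b and dp[j - 1][b - c] + rois[j - 1] * q > dp[j][b]:
                    dp[j][b] = dp[j - 1][b - c] + rois[j - 1] * q
                    choice[j - 1][b] = i     # option: post task j-1 to pool i
    # Backtrack to recover which pool each task was assigned to.
    b, picks = budget, [0] * n
    for j in range(n - 1, -1, -1):
        picks[j] = choice[j][b]
        if picks[j] >= 0:
            b -= costs[picks[j]]
    return dp[n][budget], picks
```

The table has `n × budget` cells and each cell scans every pool, so the sketch runs in pseudo-polynomial O(n × budget × #pools) time, consistent with the abstract's claim that the underlying problem is NP-hard yet tractable via dynamic programming for modest budgets.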
Table of Contents Chinese Abstract (i)
Abstract (ii)
Contents (iii)
List of Tables (iv)
List of Figures (v)
1 Introduction (1)
2 Related Work (5)
3 Macro-Assignment-Micro-Optimization Framework (7)
3.1 Framework Overview (7)
3.2 Problem Definition (8)
3.3 Macro Assignment (MA) Strategies (12)
4 Micro Optimization Algorithms (16)
4.1 Greedy Algorithm based on RoI (Greedy-R) (16)
4.2 Greedy Algorithm based on Pre-confidence and RoI (Greedy-PR) (16)
4.3 Dynamic Programming (17)
5 Experimental Results (20)
5.1 Dataset Description and Experimental Setup (20)
5.2 Comparison between Different Scenarios (21)
6 Conclusions (29)
Bibliography (30)
  • Authorized for on-campus browsing/printing of the electronic full text, publicly available from 2018-07-07.
  • Authorized for off-campus browsing/printing of the electronic full text, publicly available from 2018-07-07.
