Validating large-scale lexical color resources

Authors
Moroney, Nathan [1]
Beretta, Giordano [1]
Affiliation
[1] Hewlett-Packard Company, HP Laboratories, M/S 1161, 1501 Page Mill Road, Palo Alto, CA 94304, United States
Keywords
Visual languages - Automation - Inspection - Rasterization - Statistics - Color - Natural language processing systems
DOI
Not available
Abstract
The use of the Web for crowd-sourcing lexical color resources has succeeded in creating databases consisting of millions of color terms. Various researchers have demonstrated the value of this data, but questions about its quality and reliability remain, because each large survey is tainted by a small number of disruptive subjects. The challenge is to cull the resource by identifying and eliminating the data contributed by these disruptive subjects. With a million color terms, it is no longer possible to inspect color terms individually, and an automated process is needed. Machine evaluation through natural language processing is possible, but it introduces the added complexity of pre-defining properties and criteria for data validity, which could improperly cloud the data. Color terms are terms associated with colors, so instead of examining the terms, we can examine their colors. Our visual system can process purely visual information at a much higher bandwidth, because the language system and its complex cognitive processes can be bypassed. In this contribution we propose a graphical approach in which the associated colors of large-scale lexical resources are first machine-sorted by color appearance so that human experts can efficiently identify outliers or questionable entries by simply looking at a graphical rendering. A recent test with the R. Munroe and E. Ellis Color Survey Data has allowed us to process over a million color terms. The methodology is as follows. First, the color terms are binned categorically, where each bin corresponds to a monolexemic color term. Second, for each term the associated red, green and blue sRGB values are further quantized, and these device values are then sorted in lexicographical order. Third, the sorted device values are displayed as raster images in which each term is represented by a pixel drawn in the associated color. Finally, observers visually identify the outliers for each color term by inspecting the raster image. Using this procedure, the relatively rare disruptive subjects are efficiently identified and tagged. This process can be extended to multiple experts, and a weight can be derived for the entries in the lexical resource. Experiments show that even with appearance attributes as crude as the sRGB values, the methodology is very effective, and it is not crucial to use more sophisticated representations such as correlates of hue, lightness and chroma. Based on this methodology, we show that the Munroe and Ellis Color Survey Data correlates well with data obtained in a controlled laboratory experiment. This is a surprising result given the informal nature of this resource. It is also a testament to the validity of crowd-sourcing for scientific experimentation. © Copyright 2011 Hewlett-Packard Development Company, L.P.
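The first three steps of the methodology (binning, quantizing and sorting, rasterizing) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the choice of 16 quantization levels per channel, the normalization of terms by lowercasing, and the row-by-row raster layout are all assumptions made for the example. The abstract does not specify any of them.

```python
from collections import defaultdict


def quantize(value, levels=16):
    """Map an 8-bit channel value (0-255) to one of `levels` buckets.
    The number of levels is an assumption; the abstract gives none."""
    return value * levels // 256


def sort_terms(entries, levels=16):
    """Bin (term, (r, g, b)) entries by term, then sort each bin's colors
    lexicographically by their quantized (r, g, b) values."""
    bins = defaultdict(list)
    for term, rgb in entries:
        # Assumed normalization: trim whitespace and lowercase the term.
        bins[term.strip().lower()].append(rgb)
    return {
        term: sorted(colors, key=lambda c: tuple(quantize(v, levels) for v in c))
        for term, colors in bins.items()
    }


def to_raster(colors, width):
    """Lay sorted colors out row by row, one pixel per entry.
    The last row is padded with None so every row has `width` cells."""
    rows = [list(colors[i:i + width]) for i in range(0, len(colors), width)]
    if rows and len(rows[-1]) < width:
        rows[-1] += [None] * (width - len(rows[-1]))
    return rows


def write_ppm(rows, path, pad=(255, 255, 255)):
    """Write the raster as a plain-text PPM (P3) image; None cells use `pad`."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    with open(path, "w") as f:
        f.write(f"P3\n{width} {height}\n255\n")
        for row in rows:
            cells = (pad if c is None else c for c in row)
            f.write(" ".join(f"{r} {g} {b}" for r, g, b in cells) + "\n")


# Hypothetical survey entries: one magenta response mislabeled as "green".
entries = [
    ("green", (0, 200, 0)),
    ("Green", (250, 0, 250)),
    ("green", (10, 180, 20)),
    ("blue", (0, 0, 255)),
]
sorted_bins = sort_terms(entries)
green_raster = to_raster(sorted_bins["green"], width=2)
```

In this toy input the magenta entry sorts to the end of the "green" bin, away from the two genuine greens, which is the kind of visual separation an observer would look for in the rendered image. Calling `write_ppm(green_raster, "green.ppm")` would produce a file viewable in most image tools.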