On issues concerning the assessment of information contained in aggregate data using the F-statistic

Cited by: 0
Authors
Cheema, S. A. [1]
Beh, E. J. [1]
Hudson, I. L. [1]
Affiliations
[1] Univ Newcastle, Sch Math & Phys Sci, Callaghan, NSW 2308, Australia
Keywords
Aggregate data; Aggregate Association Index; Selikoff's data; WORLD OCCUPATIONAL EPIDEMIOLOGY; ECOLOGICAL INFERENCE; MARGINAL TOTALS
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The analysis of aggregate data has been gaining momentum in statistics and allied disciplines (including public policy, political science and epidemiology) for more than 20 years. As a result, the issue has received an increasing amount of attention from categorical data analysts. Aggregate data analysis is quickly becoming unavoidable in many situations, especially when individual-level data are unavailable. For example, the U.S. Justice Department uses aggregate data to formulate public policies against racial discrimination, political scientists explore the political or ideological preferences of different demographic groups, and social scientists use aggregate data to study the relationship between crime and unemployment. The availability of aggregate data has increased due to strict confidentiality restrictions imposed by government and corporate organisations that are reluctant to release individual-level information. There is a wealth of contributions on this issue in the ecological inference (EI) literature, which considers the association structure between categorical variables (at the individual level) given only the aggregate information. The main difficulty in EI arises from the loss of information during aggregation, which results in aggregation bias. It is also a matter of concern for aggregate data analysts that the interpretation of the parameters from EI models might be entirely different from that of analogous parameters in the study of individual-level data. An alternative strategy to EI is to consider the recently proposed Aggregate Association Index (AAI), which allows the analyst to quantify the overall extent of association between two dichotomous variables given only the aggregate, or marginal, information of a 2x2 table.
Unlike EI, the AAI does not estimate, or model, the conditional proportions but focuses instead on gauging the extent of association between the variables. The AAI can also be further partitioned into positive and negative association terms, thus enabling the analyst to understand the more likely direction of the association given only the aggregate data. However, the major issue with the performance of the AAI is the impact that the sample size has on its magnitude. In this paper we investigate the informativeness of the aggregate data for inferring that an association exists between the variables of a 2x2 table. This article introduces an F-test to determine the statistical significance of the information contained in the aggregate data for inferring a statistically significant association between the variables. Unlike Pearson's chi-squared statistic, the F-statistic is robust to any change in the sample size and depends only on the aggregate information in the contingency table. Thus this statistic provides an opportunity to understand the structure of a 2x2 table without being influenced by sample size. The applicability of this test is demonstrated using Selikoff's (1981) asbestosis data, which were collected from 1117 insulation workers in New York City in 1963 to explore the links between asbestosis and occupational exposure to asbestos fibres. Such work was the key to establishing the link between asbestosis and mesothelioma. As a result of findings of this nature, many international government organisations have now banned the production, and importation, of goods that contain asbestos fibres.
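The sample-size sensitivity of Pearson's chi-squared statistic mentioned in the abstract can be illustrated with a small sketch. This is not the paper's F-statistic or the AAI, neither of which is defined in this record. It only shows two standard facts: multiplying every cell of a 2x2 table by a constant multiplies the chi-squared statistic by the same constant, and the marginal totals alone bound the unknown (1,1) cell. The table counts and the function names `pearson_chi2` and `cell_bounds` are hypothetical, chosen for illustration, and are not Selikoff's data.

```python
def pearson_chi2(table):
    """Pearson's chi-squared statistic for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def cell_bounds(row1, col1, n):
    """Lower and upper bounds on the (1,1) cell count given only the
    first row total, first column total and sample size."""
    return max(0, row1 + col1 - n), min(row1, col1)


# Hypothetical counts, for illustration only.
base = [[30, 20], [10, 40]]

for k in (1, 2, 10):
    # Same cell proportions, sample size multiplied by k.
    scaled = [[k * x for x in row] for row in base]
    print(f"n = {100 * k:5d}  chi-squared = {pearson_chi2(scaled):.3f}")

# With only the marginals (row 1 total 50, column 1 total 40, n = 100),
# the (1,1) cell is known only up to an interval.
print(cell_bounds(50, 40, 100))
```

The association structure (the cell proportions) is identical in all three tables, yet the chi-squared value grows in proportion to the sample size, which is the behaviour a statistic based only on the aggregate information is intended to avoid.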
Pages: 1966-1972
Number of pages: 7