ITC-MNP: a diverse dataset for image file fragment classification

Times cited: 0
Authors
Tavassoli, Behnam [1]
Naghshbandi, Zhino [1]
Teimouri, Mehdi [1]
Affiliation
[1] University of Tehran, Information Theory & Coding (ITC) Lab, Tehran, Iran
Keywords
File fragment classification; File type identification; Image file fragment; Dataset
DOI
10.1186/s13104-024-07034-w
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Objectives: Image file fragment classification is a critical area of study in digital forensics. However, many publicly available datasets in this field are derived from a single source and often do not account for diversity in image settings and content. A method's effectiveness can only be demonstrated by evaluating it on datasets sampled from varied data sources, so a sufficiently diverse dataset is crucial for a realistic assessment of any proposed method.

Data description: The dataset includes 4096-byte image file fragments in five formats (JPG, BMP, GIF, PNG, and TIFF), each produced with different conversion settings. The source images fall into three content categories: Nature, People, and Medical. In total, the dataset contains 501,000 fragments. These fragments consist of file headers and incomplete end-of-file fragments; the latter are padded with random bytes to approximate how operating systems handle data when a file's size is not a multiple of the sector size. This approach aims to simulate typical scenarios in which fragments are recovered from a hard drive, although it may not capture all real-world complexities, such as data corruption and complex file structures.
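The fragmentation and padding step described in the abstract can be illustrated with a short sketch. This is not the authors' pipeline; it is a minimal example under stated assumptions: the helper name `fragment_file` is hypothetical, each file is assumed to be cut into 4096-byte pieces starting at offset 0, and `os.urandom` is assumed as the source of random padding bytes. The sketch splits a whole file; selecting which fragments to keep (for example, header or end-of-file fragments) is a separate step not shown here.

```python
import os
from typing import Callable, List

FRAGMENT_SIZE = 4096  # fragment size stated in the dataset description


def fragment_file(
    data: bytes,
    size: int = FRAGMENT_SIZE,
    rng: Callable[[int], bytes] = os.urandom,  # assumption: OS randomness for padding
) -> List[bytes]:
    """Split a file's bytes into fixed-size fragments (hypothetical helper).

    The first fragment carries the file header. If the file length is not a
    multiple of `size`, the final incomplete fragment is padded with random
    bytes up to `size`, mimicking leftover slack when a file does not fill
    its last sector.
    """
    fragments = []
    for offset in range(0, len(data), size):
        chunk = data[offset:offset + size]
        if len(chunk) < size:
            # Pad only the incomplete end-of-file fragment.
            chunk += rng(size - len(chunk))
        fragments.append(chunk)
    return fragments


if __name__ == "__main__":
    # Toy input: JPEG SOI + APP0 marker bytes followed by 10,000 zero bytes.
    sample = b"\xff\xd8\xff\xe0" + bytes(10_000)
    frags = fragment_file(sample)
    print(len(frags))  # 10,004 bytes -> 3 fragments
```

Passing a deterministic `rng` (for example, `lambda n: b"\x00" * n`) makes the padding reproducible for testing; with the default, padding bytes differ on every run, which is closer to the random-byte completion the dataset description mentions.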
Pages: 3