ITC-MNP: a diverse dataset for image file fragment classification

Authors
Tavassoli, Behnam [1]
Naghshbandi, Zhino [1]
Teimouri, Mehdi [1]
Affiliations
[1] University of Tehran, Information Theory and Coding (ITC) Lab, Tehran, Iran
Keywords
File fragment classification; File type identification; Image file fragment; Dataset
DOI
10.1186/s13104-024-07034-w
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Objectives: Image file fragment classification is a critical area of study in digital forensics. However, many publicly available datasets in this field are derived from a single source, often lacking consideration of the diversity in image settings and content. To demonstrate the effectiveness of a given methodology, it is essential to evaluate it using datasets that are sampled from varied data sources. Therefore, providing a sufficiently diverse dataset is crucial to enable a realistic assessment of any proposed method.

Data description: The dataset includes image file fragments of 4096 bytes from five formats (JPG, BMP, GIF, PNG, and TIFF), each processed with different conversion settings. The source images are categorized into three content types: Nature, People, and Medical. In total, the dataset contains 501,000 fragments. These fragments consist of file headers and incomplete end-of-file fragments, completed with random bytes to approximate how operating systems handle data when file sizes are not multiples of the sector size. This approach aims to simulate typical scenarios where fragments are recovered from a hard drive, though it may not capture all real-world complexities such as data corruption and complex file structures.
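
The padding step in the data description can be illustrated with a minimal sketch. It is not the authors' tooling: the function name `split_into_fragments` and the use of `os.urandom` as the source of random bytes are assumptions made for illustration. Only the 4096-byte fragment size and the idea of completing a short final fragment with random bytes come from the abstract.

```python
import os

# Fragment length in bytes, as stated in the data description.
FRAGMENT_SIZE = 4096


def split_into_fragments(data: bytes, size: int = FRAGMENT_SIZE) -> list:
    """Split a file's bytes into fixed-size fragments.

    A final fragment shorter than `size` is completed with random bytes
    (os.urandom is an assumed choice of random source).
    """
    fragments = []
    for start in range(0, len(data), size):
        chunk = data[start:start + size]
        if len(chunk) < size:
            # Pad the incomplete end-of-file fragment up to the full size.
            chunk += os.urandom(size - len(chunk))
        fragments.append(chunk)
    return fragments
```

For a 10,244-byte input this yields three fragments of 4096 bytes each, where the first fragment carries the file header and the last holds 2052 original bytes followed by 2044 random bytes.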
Pages: 3