ITC-MNP: a diverse dataset for image file fragment classification

Authors
Tavassoli, Behnam [1]
Naghshbandi, Zhino [1]
Teimouri, Mehdi [1]
Affiliations
[1] University of Tehran, Information Theory and Coding (ITC) Lab, Tehran, Iran
Keywords
File fragment classification; File type identification; Image file fragment; Dataset
DOI
10.1186/s13104-024-07034-w
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Objectives: Image file fragment classification is a critical area of study in digital forensics. However, many publicly available datasets in this field are derived from a single source, often lacking consideration of the diversity in image settings and content. To demonstrate the effectiveness of a given methodology, it is essential to evaluate it using datasets that are sampled from varied data sources. Therefore, providing a sufficiently diverse dataset is crucial to enable a realistic assessment of any proposed method.
Data description: The dataset includes image file fragments of 4096 bytes from five formats (JPG, BMP, GIF, PNG, and TIFF), each processed with different conversion settings. The source images are categorized into three content types: Nature, People, and Medical. In total, the dataset contains 501,000 fragments. These fragments consist of file headers and incomplete end-of-file fragments, completed with random bytes to approximate how operating systems handle data when file sizes are not multiples of the sector size. This approach aims to simulate typical scenarios where fragments are recovered from a hard drive, though it may not capture all real-world complexities such as data corruption and complex file structures.
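The fragment construction described in the data description can be illustrated with a minimal sketch. This is not the authors' tooling. The function name `split_into_fragments` is hypothetical, `os.urandom` is an assumed stand-in for whatever random source was actually used, and the abstract does not say whether every fragment of each file is kept. Only the 4096-byte fragment size and the random-byte padding of the incomplete final fragment come from the abstract.

```python
import os

FRAGMENT_SIZE = 4096  # fragment length in bytes, as stated in the abstract


def split_into_fragments(data: bytes, size: int = FRAGMENT_SIZE) -> list[bytes]:
    """Split a file's bytes into fixed-size fragments.

    The final fragment, if shorter than `size`, is completed with random
    bytes so that every returned fragment has exactly `size` bytes.
    """
    fragments = []
    for offset in range(0, len(data), size):
        chunk = data[offset:offset + size]
        if len(chunk) < size:
            # Assumption: os.urandom is an illustrative random source only.
            chunk += os.urandom(size - len(chunk))
        fragments.append(chunk)
    return fragments


if __name__ == "__main__":
    # A 10,000-byte stand-in for an image file: two full fragments plus a
    # 1,808-byte tail that gets padded to 4096 bytes.
    fake_file = bytes(10_000)
    frags = split_into_fragments(fake_file)
    print(len(frags), [len(f) for f in frags])
```

In this sketch the first fragment always begins with the file header, and only the last fragment can contain padding, which matches the two special fragment kinds the abstract mentions.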
Pages: 3