Diatom communities preserved in sediment samples are valuable indicators of the past and present dynamics of phytoplankton communities and of their response to environmental change. Such studies traditionally rely on counting under an optical microscope, a time-consuming process that requires taxonomic expertise. Automated image acquisition workflows now make it possible to acquire large image data sets, but these require efficient preprocessing methods. Detecting diatom frustules in microscope images is challenging because of their low relief, diverse shapes, and tendency to aggregate, which prevent the use of traditional thresholding techniques. Deep learning algorithms have the potential to resolve these challenges, particularly for the task of object detection. Here we explore the use of a Faster Region-based Convolutional Neural Network (Faster R-CNN) model to detect siliceous biominerals, including diatoms, in microscope images of a sediment trap series from the Mediterranean Sea. Our workflow demonstrates promising results, achieving a precision of 0.72 and a recall of 0.74 on a test set of Mediterranean diatom images. Model performance decreases when detecting fragments of these microfossils, when particles are aggregated, or when images are out of focus. Microfossil detection remains high when the model is applied to a microscope image set of sediments from a different oceanic basin, demonstrating its potential for application in a wide range of contemporary and paleoenvironmental studies. This automated method provides a valuable tool for analyzing complex samples, particularly for rare species under-represented in training data sets.

Microfossils preserved in ocean sediments are studied to explore the impact of climate change on planktonic communities. The usual way to count these microfossils is slow and requires an expert to identify them on microscope images.
In this study, we explore how artificial intelligence can be used on microscope images to detect the microfossils produced by one particular group, diatoms. Our results show that models can be trained to identify these objects, including ones that were not specifically shown to the model during training. However, the quality of the microscope image and of the prior sample preparation can affect how well the model works. This new protocol has good potential for use on diatom images of differing ages and geographical origins. Adopting this method could rapidly increase the temporal resolution and spatial extent of existing data on diatom diversity, and thereby improve our knowledge of plankton resilience to climate change.

Faster Region-based Convolutional Neural Network models are efficient at detecting marine diatom frustules on microscope slide images.
These models can be applied to detect diatoms on images from a diverse array of environmental conditions.
Adequate sample preparation and image quality enhance model results.
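
To make the reported precision and recall concrete, the following minimal sketch shows one common way such scores are computed for an object detector: each predicted bounding box is matched to at most one annotated box, and a match counts as a true positive when the two boxes overlap sufficiently. The intersection-over-union (IoU) threshold of 0.5, the greedy matching order, and the box coordinates are all assumptions chosen for illustration; they are not taken from the study, whose exact matching criterion is not stated here.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    predicted: List[Box], truth: List[Box], iou_threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedily match each predicted box to at most one annotated box.

    iou_threshold = 0.5 is an assumed, conventional value.
    """
    unmatched = list(truth)
    true_positives = 0
    for box in predicted:
        # Pick the remaining annotation that overlaps this prediction most.
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical annotations and detections for a single image.
truth = [(10, 10, 50, 50), (100, 100, 160, 150), (200, 20, 240, 80)]
predicted = [(12, 11, 49, 52), (105, 98, 158, 149), (300, 300, 340, 340), (60, 60, 70, 70)]

precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the four detections match an annotation and one of the three annotated objects is missed, so precision is 2/4 and recall is 2/3. A fuller evaluation would typically process detections in order of decreasing confidence score, which this sketch omits for brevity.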