In this work, we investigate neural networks as visual relationship classifiers for precision-constrained applications on partially annotated datasets. The classifier is a convolutional neural network, which we benchmark on three visual relationship datasets. We discuss the effect of partial annotation on precision and why precision-based metrics are inadequate under partial annotation, a topic that has not yet been explored in the context of visual relationship classification. We introduce a threshold tuning method that imposes a soft constraint on precision while being less sensitive to the degree of annotation than a standard precision-recall trade-off. Performance can then be measured as the recall of predictions computed with thresholds tuned by the proposed method. Our previously introduced negative sample mining method is extended to partially annotated datasets (namely Visual Relationship Detection, VRD, and Visual Genome, VG) by sampling from unlabeled pairs instead of unrelated pairs. When thresholds are tuned with our method, negative sample mining improves recall from 24.1% to 30.6% on VRD and from 36.7% to 41.3% on VG. The neural networks also retain the ability to discriminate between predicates. When only ground-truth relationships are considered for threshold tuning, there is only a small decrease in recall (from 45.1% to 43.8% on VRD, and from 60.5% to 58.7% on VG) compared to when the neural networks are trained only on ground-truth samples.
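Since this summary only names the two ingredients of the approach, the sketch below gives a rough illustration of them: mining negative training samples from unlabeled object pairs, and choosing a per-predicate score threshold subject to a precision constraint. It is a minimal sketch under stated assumptions, not the paper's actual procedure; all function names are hypothetical, and the hard precision floor used here is a stand-in for the soft constraint, whose exact formulation is not given in this section.

```python
import numpy as np

def sample_negatives_from_unlabeled(pair_ids, labeled_ids, num_samples, rng=None):
    """Illustrative negative mining for a partially annotated dataset:
    object pairs with no annotated predicate are treated as candidate
    negatives, and a random subset of them is drawn for training.
    `pair_ids` are integer indices of candidate object pairs;
    `labeled_ids` is the set of indices with annotated relationships."""
    rng = rng or np.random.default_rng()
    unlabeled = np.array([p for p in pair_ids if p not in labeled_ids])
    k = min(num_samples, len(unlabeled))
    return rng.choice(unlabeled, size=k, replace=False)

def tune_threshold(scores, labels, min_precision=0.7):
    """Pick, for a single predicate, the most permissive score threshold
    whose precision on a tuning split still meets `min_precision`
    (a hard precision floor, used here only to illustrate the idea of a
    precision-constrained threshold).

    scores -- confidence scores for candidate object pairs (1-D array)
    labels -- 1 for annotated relationships, 0 for unlabeled pairs
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    order = np.argsort(-scores)           # rank candidates by descending score
    hits = labels[order]
    tp = np.cumsum(hits)                  # true positives if we cut after rank i
    prec = tp / np.arange(1, len(hits) + 1)
    ok = np.nonzero(prec >= min_precision)[0]
    if ok.size == 0:
        return np.inf                     # no cut satisfies the precision floor
    return scores[order][ok[-1]]          # deepest cut that still satisfies it
```

With per-predicate thresholds chosen this way on a tuning split, recall can then be reported over the predictions that clear their thresholds, which is the kind of measurement the abstract refers to.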