We have examined patterns of sequence variability for evidence of linked sequence changes in HIV-1 subtype B protease, using translated sequences from protease inhibitor (PI) treated and untreated subjects downloaded from the Stanford HIV RT and Protease Sequence Database (http://hivdb.stanford.edu). The final data set comprised 648 sequences from untreated subjects (notx) and 531 from PI-treated subjects (tx). Each subject was represented by a single sequence. Mutual information was calculated for all pairs of positions at which at least 5% of sequences carried a nonconsensus amino acid, and the significance of each pairwise association was assessed using permutation tests. In addition, pairs of positions were assessed for linkage by comparing the observed occurrences of amino acid combinations with their expected values. The mutual information statistic indicated linkage between nine pairs of sites in the untreated data set (10:93, 12:19, 35:38, 37:41, 62:71, 63:64, 71:77, 71:93, 77:93). Strong statistical support for linkage in the treated data set was seen for 32 pairs: eight involving position 10, seven involving position 71, and the remainder being 12:19, 15:77, 20:36, 30:88, 35:36, 35:37, 36:62, 36:77, 46:82, 46:84, 48:54, 48:82, 54:82, 63:64, 63:90, 73:90, 77:93, and 84:90. Most associations were positive, although five pairs showed negative associations. Structural proximity suggests that many of these pairs may interact within a local environment, including two distinct clusters centered on positions 36/77 and 71/93. While some of these interactions may reflect fortuitous linkage in heavily treated subjects carrying many resistance mutations, others likely represent important cooperative interactions that are amenable to experimental validation. (C) 2003 Elsevier Inc. All rights reserved.
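The pairwise analysis summarized above (filtering variable positions, computing mutual information between two positions, testing significance by permutation, and comparing observed with expected counts of an amino acid combination) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes aligned protein sequences of equal length given as strings, mutual information measured in bits, a label-shuffling permutation test with an add-one p-value correction, and expected counts computed under independence. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def variable_positions(sequences, consensus, threshold=0.05):
    """Hypothetical helper: indices where >= threshold of sequences differ from consensus."""
    n = len(sequences)
    return [i for i, cons in enumerate(consensus)
            if sum(seq[i] != cons for seq in sequences) / n >= threshold]


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two alignment columns of residues."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def permutation_p_value(col_a, col_b, n_perm=1000, seed=0):
    """Assumed scheme: shuffle col_b to break linkage, count MI values >= observed."""
    rng = random.Random(seed)
    observed = mutual_information(col_a, col_b)
    shuffled = list(col_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # small tolerance guards against floating-point summation-order noise
        if mutual_information(col_a, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def observed_vs_expected(col_a, col_b, res_a, res_b):
    """Observed count of the (res_a, res_b) pair and its expected count under independence."""
    n = len(col_a)
    observed = sum(1 for a, b in zip(col_a, col_b) if a == res_a and b == res_b)
    expected = list(col_a).count(res_a) * list(col_b).count(res_b) / n
    return observed, expected


if __name__ == "__main__":
    # Toy, made-up columns: residue "I" at one site always co-occurs with "V" at another.
    col_a = ["L"] * 6 + ["I"] * 4
    col_b = ["A"] * 6 + ["V"] * 4
    print(mutual_information(col_a, col_b))
    print(permutation_p_value(col_a, col_b))
    print(observed_vs_expected(col_a, col_b, "I", "V"))
```

An observed count well above the expected count corresponds to a positive association and one well below it to a negative association. With real data, the filter would be applied first and the remaining functions run over every pair of retained positions.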