Bibliographic coupling is a term coined by Kessler (doi:10.1002/asi.5090140103) in 1963 as a measure of similarity between documents. If two documents, A and B, cite a third, C, then A and B are coupled.
I'm interested in extending this to data, such as DNA sequences and specimens. In part this is because within the challenge dataset I'm finding cases where authors cite data, but not the paper publishing the data. For example, a paper may list all the DNA sequences in uses (thus citing the original data), but not the paper providing the data.
To make this concrete, the paper "Towards a phylogenetic framework for the evolution of shakes, rattles, and rolls in Myiarchus tyrant-flycatchers (Aves: Passeriformes: Tyrannidae)" doi:10.1016/S1055-7903(03)00259-8 lists the sequences used, but does not cite the source of three of these (which is the Science paper "Nonequilibrium Diversity Dynamics of the Lesser Antillean Avifauna" (doi:10.1126/science.1065005). As a result, if I was reading "Nonequilibrium Diversity Dynamics of the Lesser Antillean Avifauna" and wanted to learn who had cited it I would miss the fact that paper "Towards a phylogenetic framework for the evolution of shakes, rattles, and rolls..." had used the data (and hence, in effect, "cited" the paper). In some cases, data citation may be more relevant than bibliographic citation because it relates to people using the data, which seems a more significant action than simply reading the paper.
Note that I'm not interested in the issue of credit as such. In the above example, the authors of the Science paper are also coauthors of the "shakes, rattles, and rolls" paper, and hence show commendable restrain in not citing themselves. I'm interested in the fate of the data. Who has used it? What have they done with it? Has anybody challenged the data (for example, suggesting a sequence was misindentified)? These are the things that a true "web of data" could tell us.