Number of intersecting elements
For this we need the number of intersecting elements for every possible combination of sets. In my example it looks like this:
numbers.txt
A 13644
B 14729
C 14690
D 13725
E 13742
AB 3689
AC 3616
AD 3523
AE 3496
BC 14281
BD 12852
BE 12694
CD 13215
CE 13060
DE 13563
ABC 3609
ABD 3507
ABE 3480
ACD 3513
ACE 3487
ADE 3496
BCD 12849
BCE 12694
BDE 12694
CDE 13056
ABCD 3506
ABCE 3480
ABDE 3480
ACDE 3487
BCDE 12694
ABCDE 3480
To get these numbers I used a small R script and then cleaned up its output with a little bit of TextWrangler and grep, so that it looks like the file numbers.txt shown above.
Here is the short R script:
A <- read.delim('Info1.csv')$name
B <- read.delim('Info2.csv')$name
C <- read.delim('Info3.csv')$name
D <- read.delim('Info4.csv')$name
E <- read.delim('Info5.csv')$name
# all sets
print(paste("A",length(A)))
print(paste("B",length(B)))
print(paste("C",length(C)))
print(paste("D",length(D)))
print(paste("E",length(E)))
# all combinations of two sets
print(paste("A, B", length(intersect(A, B))))
print(paste("A, C", length(intersect(A, C))))
print(paste("A, D", length(intersect(A, D))))
print(paste("A, E", length(intersect(A, E))))
print(paste("B, C", length(intersect(B, C))))
print(paste("B, D", length(intersect(B, D))))
print(paste("B, E", length(intersect(B, E))))
print(paste("C, D", length(intersect(C, D))))
print(paste("C, E", length(intersect(C, E))))
print(paste("D, E", length(intersect(D, E))))
# all combinations of three sets
print(paste("A, B, C", length(intersect(A, intersect(B, C)))))
print(paste("A, B, D", length(intersect(A, intersect(B, D)))))
print(paste("A, B, E", length(intersect(A, intersect(B, E)))))
print(paste("A, C, D", length(intersect(A, intersect(C, D)))))
print(paste("A, C, E", length(intersect(A, intersect(C, E)))))
print(paste("A, D, E", length(intersect(A, intersect(D, E)))))
print(paste("B, C, D", length(intersect(B, intersect(C, D)))))
print(paste("B, C, E", length(intersect(B, intersect(C, E)))))
print(paste("B, D, E", length(intersect(B, intersect(D, E)))))
print(paste("C, D, E", length(intersect(C, intersect(D, E)))))
# all combinations of four sets
print(paste("A, B, C, D", length(intersect(A, intersect(B, intersect(C, D))))))
print(paste("A, B, C, E", length(intersect(A, intersect(B, intersect(C, E))))))
print(paste("A, B, D, E", length(intersect(A, intersect(B, intersect(D, E))))))
print(paste("A, C, D, E", length(intersect(A, intersect(C, intersect(D, E))))))
print(paste("B, C, D, E", length(intersect(B, intersect(C, intersect(D, E))))))
# combination of five sets
print(paste("ABCDE", length(intersect(A, intersect(B, intersect(C, intersect(D, E)))))))
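The manual clean-up step could also be scripted. The following is only a sketch, assuming the script is run non-interactively so that each `print` call produces a line such as `[1] "A, B 3689"`. The function name `format_counts` and the file names `intersections.R` and `raw.txt` are hypothetical.

```shell
# Hypothetical file names: intersections.R holds the R script above,
# raw.txt receives its unformatted output.
#   Rscript intersections.R > raw.txt

# Turn lines like   [1] "A, B 3689"   into   AB 3689
format_counts() {
  # $1: file with the raw R output
  # 1) strip the leading  [1] "   2) strip the trailing  "   3) drop every ", "
  sed -E -e 's/^\[1\] "//' -e 's/"$//' -e 's/, //g' "$1"
}

# Usage:
#   format_counts raw.txt > numbers.txt
```

Lines that already have the compact form, such as the final `ABCDE` line, only lose the `[1]` prefix and the quotes.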
Getting these numbers into the diagram
Since the SVG file consists of only 51 lines, I display it here:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="746" height="742" viewBox="-362 -388 746 742">
<title>Radially-symmetrical Five-set Venn Diagram</title>
<desc>Devised by Branko Gruenbaum and rendered by CMG Lee.</desc>
<defs>
<ellipse id="ellipse" cx="36" cy="-56" rx="160" ry="320" />
<g id="ellipses">
<use xlink:href="#ellipse" fill="#0000ff" />
<use xlink:href="#ellipse" fill="#0099ff" transform="rotate(72)" />
<use xlink:href="#ellipse" fill="#00cc00" transform="rotate(144)" />
<use xlink:href="#ellipse" fill="#cc9900" transform="rotate(216)" />
<use xlink:href="#ellipse" fill="#ff0000" transform="rotate(288)" />
</g>
</defs>
<use xlink:href="#ellipses" fill-opacity="0.3" />
<use xlink:href="#ellipses" fill-opacity="0" stroke="#000" stroke-width="2" />
<g text-anchor="middle" font-family="sans-serif" font-size="16">
<text x="30" y="-300" dy="0.7ex" font-size="64">A</text>
<text x="300" y="-60" dy="0.7ex" font-size="64">B</text>
<text x="160" y="280" dy="0.7ex" font-size="64">C</text>
<text x="-220" y="220" dy="0.7ex" font-size="64">D</text>
<text x="-280" y="-130" dy="0.7ex" font-size="64">E</text>
<text x="180" y="-130" dy="0.7ex">AB</text>
<text x="40" y="230" dy="0.7ex">AC</text>
<text x="100" y="-200" dy="0.7ex">AD</text>
<text x="-80" y="-215" dy="0.7ex">AE</text>
<text x="190" y="125" dy="0.7ex">BC</text>
<text x="-190" y="120" dy="0.7ex">BD</text>
<text x="230" y="40" dy="0.7ex">BE</text>
<text x="-60" y="220" dy="0.7ex">CD</text>
<text x="-170" y="-150" dy="0.7ex">CE</text>
<text x="-222" y="0" dy="0.7ex">DE</text>
<text x="90" y="150" dy="0.7ex">ABC</text>
<text x="148" y="-153" dy="0.7ex" font-size="14">ABD</text>
<text x="170" y="-20" dy="0.7ex">ABE</text>
<text x="-33" y="208" dy="0.7ex" font-size="14">ACD</text>
<text x="-93" y="-193" dy="0.7ex" font-size="14">ACE</text>
<text x="20" y="-180" dy="0.7ex">ADE</text>
<text x="-120" y="120" dy="0.7ex">BCD</text>
<text x="190" y="100" dy="0.7ex" font-size="14">BCE</text>
<text x="-211" y="32" dy="0.7ex" font-size="14">BDE</text>
<text x="-150" y="-80" dy="0.7ex">CDE</text>
<text x="-30" y="160" dy="0.7ex">ABCD</text>
<text x="140" y="80" dy="0.7ex">ABCE</text>
<text x="120" y="-100" dy="0.7ex">ABDE</text>
<text x="-60" y="-140" dy="0.7ex">ACDE</text>
<text x="-160" y="20" dy="0.7ex">BCDE</text>
<text x="0" y="0" dy="0.7ex">ABCDE</text>
</g>
</svg>
You can see that the overlapping regions are labelled "A", "B", "AB", and so forth. Hence, I can simply replace these labels with sed to display my own data!
I do this with the following bash/sed loop:
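# read the label and its count directly from each line of numbers.txt
while read -r NAME NUMB
do
    echo "$NAME $NUMB"
    # Match the label only between ">" and "<", e.g. >AB</text>, so that "A" does not hit "AB".
    # Plain > and < need no escaping; in GNU sed, \> and \< would mean word boundaries instead.
    # -i.tmp edits venn.svg in place and keeps a backup copy in venn.svg.tmp
    sed -i.tmp "s/>${NAME}</>${NUMB}</g" venn.svg
    echo "sed -i.tmp \"s/>${NAME}</>${NUMB}</g\" venn.svg"
done < 'numbers.txt'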
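As an alternative to calling sed once per label, the substitutions can be collected into one sed script and applied in a single pass. This is a sketch, not the method used above: the function names `make_sed_script` and `fill_venn` are hypothetical, and it assumes every line of the counts file has exactly two whitespace-separated fields.

```shell
# Emit one sed command per line of the counts file,
# e.g. "AB 3689" becomes  s/>AB</>3689</g
make_sed_script() {
  # $1: counts file (label and number per line)
  awk 'NF == 2 { printf "s/>%s</>%s</g\n", $1, $2 }' "$1"
}

# Apply all substitutions at once and print the result to stdout.
# The SVG template itself is not modified.
fill_venn() {
  # $1: counts file, $2: SVG template
  script=$(mktemp)
  make_sed_script "$1" > "$script"
  sed -f "$script" "$2"
  rm -f "$script"
}

# Usage:
#   fill_venn numbers.txt venn.svg > venn_filled.svg
```

Writing to a new file leaves the original diagram available as a template for other data sets.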
The altered Venn diagram
The output is indeed exactly what I wanted:
Of course, I still need to adapt the font size, but thanks to sed, this won't be a big issue.
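One possible way to do that, sketched under assumptions: in the SVG above, the single-set labels carry `font-size="64"`, which is large for five-digit numbers. The function name `shrink_labels`, the file names and the new size are hypothetical values to experiment with.

```shell
# Replace the font size of the large single-set labels (64 in the template)
# with a smaller value and print the result to stdout.
shrink_labels() {
  # $1: new font size, $2: SVG file
  sed "s/font-size=\"64\"/font-size=\"$1\"/g" "$2"
}

# Usage:
#   shrink_labels 28 venn_filled.svg > venn_final.svg
```

Labels with other sizes, such as the ones set to 14, are left unchanged.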