2013-07-19

How to automatically create a Venn diagram for five sets

Wikipedia has a very nice Venn diagram for five sets. I use this SVG file to automatically create a Venn diagram that displays the number of intersecting elements of five sets.

Number of intersecting elements

For this, we need the number of intersecting elements for every possible combination of sets. In my example, it looks like this:

numbers.txt
A 13644
B 14729
C 14690
D 13725
E 13742
AB 3689
AC 3616
AD 3523
AE 3496
BC 14281
BD 12852
BE 12694
CD 13215
CE 13060
DE 13563
ABC 3609
ABD 3507
ABE 3480
ACD 3513
ACE 3487
ADE 3496
BCD 12849
BCE 12694
BDE 12694
CDE 13056
ABCD 3506
ABCE 3480
ABDE 3480
ACDE 3487
BCDE 12694
ABCDE 3480

To get these numbers, I used a small R script and then reformatted its output with a little bit of TextWrangler and grep so that it looks like the file numbers.txt.

Here is the short R script:

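# read the "name" column from each of the five tab-delimited files
A <- read.delim('Info1.csv')$name
B <- read.delim('Info2.csv')$name
C <- read.delim('Info3.csv')$name
D <- read.delim('Info4.csv')$name
E <- read.delim('Info5.csv')$name

# all sets (note: length() counts duplicate names, whereas intersect() below drops them)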
print(paste("A",length(A)))
print(paste("B",length(B)))
print(paste("C",length(C)))
print(paste("D",length(D)))
print(paste("E",length(E)))

# all combinations of two sets
print(paste("A, B", length(intersect(A, B))))
print(paste("A, C", length(intersect(A, C))))
print(paste("A, D", length(intersect(A, D))))
print(paste("A, E", length(intersect(A, E))))
print(paste("B, C", length(intersect(B, C))))
print(paste("B, D", length(intersect(B, D))))
print(paste("B, E", length(intersect(B, E))))
print(paste("C, D", length(intersect(C, D))))
print(paste("C, E", length(intersect(C, E))))
print(paste("D, E", length(intersect(D, E))))
# all combinations of three sets
print(paste("A, B, C", length(intersect(A, intersect(B, C)))))
print(paste("A, B, D", length(intersect(A, intersect(B, D)))))
print(paste("A, B, E", length(intersect(A, intersect(B, E)))))
print(paste("A, C, D", length(intersect(A, intersect(C, D)))))
print(paste("A, C, E", length(intersect(A, intersect(C, E)))))
print(paste("A, D, E", length(intersect(A, intersect(D, E)))))
print(paste("B, C, D", length(intersect(B, intersect(C, D)))))
print(paste("B, C, E", length(intersect(B, intersect(C, E)))))
print(paste("B, D, E", length(intersect(B, intersect(D, E)))))
print(paste("C, D, E", length(intersect(C, intersect(D, E)))))
# all combinations of four sets
print(paste("A, B, C, D", length(intersect(A, intersect(B, intersect(C, D))))))
print(paste("A, B, C, E", length(intersect(A, intersect(B, intersect(C, E))))))
print(paste("A, B, D, E", length(intersect(A, intersect(B, intersect(D, E))))))
print(paste("A, C, D, E", length(intersect(A, intersect(C, intersect(D, E))))))
print(paste("B, C, D, E", length(intersect(B, intersect(C, intersect(D, E))))))
# combination of five sets
print(paste("ABCDE", length(intersect(A, intersect(B, intersect(C, intersect(D, E)))))))
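
The same counts can also be produced directly in the numbers.txt layout, without the manual reformatting step. The following is only a minimal sketch, not the R script above: it swaps in sort and comm for R's intersect(), and it assumes (hypothetically) that each set has been exported to a plain file A.txt ... E.txt with one name per line. The function name venn_counts is made up for this illustration.

```shell
# Sketch: print "<LABEL> <COUNT>" for every combination of the five sets.
# Assumption: the directory given as $1 holds A.txt ... E.txt, one name per line.
venn_counts() {
  dir=$1
  work=$(mktemp -d)
  # comm needs sorted input; sort -u also drops duplicate names.
  for s in A B C D E; do
    LC_ALL=C sort -u "$dir/$s.txt" > "$work/$s"
  done
  # Each bitmask from 1 to 31 selects one non-empty combination of sets.
  mask=1
  while [ "$mask" -le 31 ]; do
    label=""
    bit=1
    for s in A B C D E; do
      if [ $(( mask & bit )) -ne 0 ]; then
        if [ -z "$label" ]; then
          # First set of this combination: start the running intersection.
          cp "$work/$s" "$work/acc"
        else
          # Keep only the lines common to the running intersection and this set.
          LC_ALL=C comm -12 "$work/acc" "$work/$s" > "$work/next"
          mv "$work/next" "$work/acc"
        fi
        label="$label$s"
      fi
      bit=$(( bit * 2 ))
    done
    echo "$label $(wc -l < "$work/acc" | tr -d ' ')"
    mask=$(( mask + 1 ))
  done
  rm -rf "$work"
}
```

Called as `venn_counts . > numbers.txt`, this would write all 31 lines. They come out in bitmask order (A, B, AB, C, ...) rather than grouped by size, which makes no difference for the replacement step later on.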

Getting these numbers into the diagram

Since the SVG file consists of only 51 lines, I show it in full here:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="746" height="742" viewBox="-362 -388 746 742">
 <title>Radially-symmetrical Five-set Venn Diagram</title>
 <desc>Devised by Branko Gruenbaum and rendered by CMG Lee.</desc>
 <defs>
  <ellipse id="ellipse" cx="36" cy="-56" rx="160" ry="320" />
  <g id="ellipses">
   <use xlink:href="#ellipse" fill="#0000ff" />
   <use xlink:href="#ellipse" fill="#0099ff" transform="rotate(72)" />
   <use xlink:href="#ellipse" fill="#00cc00" transform="rotate(144)" />
   <use xlink:href="#ellipse" fill="#cc9900" transform="rotate(216)" />
   <use xlink:href="#ellipse" fill="#ff0000" transform="rotate(288)" />
  </g>
 </defs>
 <use xlink:href="#ellipses" fill-opacity="0.3" />
 <use xlink:href="#ellipses" fill-opacity="0" stroke="#000" stroke-width="2" />
 <g text-anchor="middle" font-family="sans-serif" font-size="16">
  <text x="30"   y="-300" dy="0.7ex" font-size="64">A</text>
  <text x="300"  y="-60"  dy="0.7ex" font-size="64">B</text>
  <text x="160"  y="280"  dy="0.7ex" font-size="64">C</text>
  <text x="-220" y="220"  dy="0.7ex" font-size="64">D</text>
  <text x="-280" y="-130" dy="0.7ex" font-size="64">E</text>
  <text x="180"  y="-130" dy="0.7ex">AB</text>
  <text x="40"   y="230"  dy="0.7ex">AC</text>
  <text x="100"  y="-200" dy="0.7ex">AD</text>
  <text x="-80"  y="-215" dy="0.7ex">AE</text>
  <text x="190"  y="125"  dy="0.7ex">BC</text>
  <text x="-190" y="120"  dy="0.7ex">BD</text>
  <text x="230"  y="40"   dy="0.7ex">BE</text>
  <text x="-60"  y="220"  dy="0.7ex">CD</text>
  <text x="-170" y="-150" dy="0.7ex">CE</text>
  <text x="-222" y="0"    dy="0.7ex">DE</text>
  <text x="90"   y="150"  dy="0.7ex">ABC</text>
  <text x="148"  y="-153" dy="0.7ex" font-size="14">ABD</text>
  <text x="170"  y="-20"  dy="0.7ex">ABE</text>
  <text x="-33"  y="208"  dy="0.7ex" font-size="14">ACD</text>
  <text x="-93"  y="-193" dy="0.7ex" font-size="14">ACE</text>
  <text x="20"   y="-180" dy="0.7ex">ADE</text>
  <text x="-120" y="120"  dy="0.7ex">BCD</text>
  <text x="190"  y="100"  dy="0.7ex" font-size="14">BCE</text>
  <text x="-211" y="32"   dy="0.7ex" font-size="14">BDE</text>
  <text x="-150" y="-80"  dy="0.7ex">CDE</text>
  <text x="-30"  y="160"  dy="0.7ex">ABCD</text>
  <text x="140"  y="80"   dy="0.7ex">ABCE</text>
  <text x="120"  y="-100" dy="0.7ex">ABDE</text>
  <text x="-60"  y="-140" dy="0.7ex">ACDE</text>
  <text x="-160" y="20"   dy="0.7ex">BCDE</text>
  <text x="0"    y="0"    dy="0.7ex">ABCDE</text>
 </g>
</svg>

As you can see, the different overlapping fields are labelled "A", "B", "AB", and so forth. Hence, I can simply replace these labels with sed to display my own data!

I do this with the following bash/sed loop:

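# Replace each label in venn.svg with its count from numbers.txt
while read -r NAME NUMB
do
 echo "$NAME $NUMB"
 # Match the label together with its enclosing brackets (">AB<"), so that "A" does not also hit "AB".
 # Plain > and < are used because GNU sed treats \> and \< as word boundaries.
 # -i.tmp keeps a backup as venn.svg.tmp and is accepted by both BSD and GNU sed.
 sed -i.tmp "s/>${NAME}</>${NUMB}</g" venn.svg
done < numbers.txt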
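
If you would rather not edit venn.svg in place, the replacements can also be collected into one sed script and applied in a single pass. This is just a sketch of that variant under the same assumptions as above (a numbers.txt with "NAME COUNT" lines, and labels that appear as >NAME< in the SVG). The function name fill_venn is made up for this example.

```shell
# Sketch: fill the labels of an SVG template without modifying the template.
# Usage: fill_venn numbers.txt venn.svg > venn_filled.svg
fill_venn() {
  numbers=$1
  svg=$2
  script=$(mktemp)
  # One substitution per "NAME COUNT" line, e.g. s/>AB</>3689</
  awk 'NF == 2 { printf "s/>%s</>%s</\n", $1, $2 }' "$numbers" > "$script"
  # Apply all substitutions in one pass and write the result to stdout.
  sed -f "$script" "$svg"
  rm -f "$script"
}
```

Because the result goes to stdout, the original template stays untouched and can be reused for another set of numbers.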

The altered Venn diagram

The output is indeed exactly what I wanted:

Of course, I still need to adapt the font size, but thanks to sed, this won't be a big issue.
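
As a rough sketch of what such a font-size change could look like: the SVG above sets the sizes through font-size="..." attributes (16 on the group, 64 on the single-set labels, 14 on a few tight regions), so one substitution per value is enough. The function name set_font_size and the target sizes in the usage line are only illustrative assumptions, not values I have tuned.

```shell
# Sketch: change every font-size="OLD" attribute to font-size="NEW".
# Usage: set_font_size OLD NEW file.svg > resized.svg
set_font_size() {
  old=$1
  new=$2
  svg=$3
  sed "s/font-size=\"$old\"/font-size=\"$new\"/g" "$svg"
}
```

For example, `set_font_size 64 24 venn.svg > venn_small.svg` would shrink only the five single-set labels and leave all other text at its current size.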

