Visualizing the network of Hillary Clinton’s emails

Courtesy of the US State Department we have access to part of Hillary Clinton’s emails. Using graph visualization we will explore this data, focusing not on the content of the emails but on their metadata. Let’s see what kind of information we can uncover about Hillary Clinton’s professional network.

Building a graph of Clinton’s emails with Neo4j

Clinton, then Secretary of State of the United States, used a personal email server to exchange professional emails. When this was revealed, it caused a public controversy.

The data later became the object of a public records request. The State Department reviewed the emails to decide which were too sensitive to be turned over. The rest were published on a monthly basis as PDF documents. WSJ journalists, Ben Hamner and others have taken on the task of turning them into a more usable format. For the purposes of this article we will use a cleaned version of the data prepared by Ryan Boyd. It consists of a single CSV file.

The script below (courtesy of Boyd) turns the CSV file into a Neo4j graph database:

// Creating the graph
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://s3-us-west-2.amazonaws.com/neo4j-datasets-public/Emails-refined.csv" AS line
// One Person node per sender and recipient alias (empty alias when both fields are missing)
MERGE (fr:Person {alias: COALESCE(line.MetadataFrom, line.ExtractedFrom, '')})
MERGE (to:Person {alias: COALESCE(line.MetadataTo, line.ExtractedTo, '')})
// One Email node per CSV row
MERGE (em:Email { id: line.Id })
ON CREATE SET em.foia_doc=line.DocNumber, em.subject=line.MetadataSubject, em.to=line.MetadataTo, em.from=line.MetadataFrom, em.text=line.RawText, em.ex_to=line.ExtractedTo, em.ex_from=line.ExtractedFrom
// Link the email to its recipient and its sender
MERGE (to)<-[:TO]-(em)-[:FROM]->(fr)
// Direct person-to-person relationship, counting the emails exchanged
MERGE (fr)-[r:HAS_EMAILED]->(to)
ON CREATE SET r.count = 1
ON MATCH SET r.count = r.count + 1;

// Store on each person the number of email relationships they take part in
MATCH (a:Person)-[r]-(b:Email) WITH a, count(r) AS count SET a.count = count;

The result is a graph of 8,278 nodes and 16,335 edges.

In our graph we have 2 types of nodes: persons and emails. Persons are linked to emails by “from” and “to” relationships. In addition, persons are directly linked by a relationship when they have exchanged emails.
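As a quick sanity check on this model, the two queries below count nodes per label and relationships per type. This is a minimal sketch that assumes the import script was run unchanged against an otherwise empty database; the column names `labels`, `nodes`, `rel_type` and `relationships` are only illustrative.

```cypher
// Count nodes per label (the import creates Person and Email nodes)
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes;

// Count relationships per type (the import creates TO, FROM and HAS_EMAILED)
MATCH ()-[r]->()
RETURN type(r) AS rel_type, count(*) AS relationships;
```

If the published CSV has not changed, the rows of each result should add up to the node and edge totals quoted above.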

Visualizing Clinton’s emails and her professional network

Now that we have prepared the data, we can use Linkurious to start investigating it. First let’s look up Hillary Clinton.

Hillary Clinton.

Time to explore what Clinton is connected to. Instead of visualizing all the 7,945 emails she has sent or received, let’s focus on the people she is connected to.

Who has Hillary Clinton exchanged emails with?
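The same question can be asked directly in Cypher. The sketch below does not hard-code Clinton's alias, since the exact value stored in the CSV is not shown here; instead it assumes that the `Person` with the highest `count` property (set by the import script) is Clinton. The names `hub`, `contact`, `hub_alias` and `contacts` are illustrative.

```cypher
// Assumption: the Person with the highest stored count is Clinton
MATCH (hub:Person)
WHERE hub.count IS NOT NULL
WITH hub ORDER BY hub.count DESC LIMIT 1
// People directly linked to her by HAS_EMAILED, in either direction
MATCH (hub)-[:HAS_EMAILED]-(contact:Person)
WHERE contact <> hub
RETURN hub.alias AS hub_alias, count(DISTINCT contact) AS contacts;
```

Checking `hub_alias` in the result is a cheap way to confirm that the assumption holds before trusting the count.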

Clinton has exchanged emails with 210 people. Already there are some interesting things to notice. In the top right corner of the screen we have a lot of isolated nodes (nodes which are only connected to Clinton). At the bottom we have a group of highly interconnected nodes. Among them are Cheryl Mills, former Counselor and Chief of Staff, and Lona Valmoro, Special Assistant. The people in this group are in contact with one another and form a community. These are Clinton’s closest professional contacts.

Hillary Clinton’s closest contacts.
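One rough way to approximate this split between isolated and interconnected contacts without a layout is to count, for each of Clinton's contacts, how many of her other contacts they have also exchanged emails with. This is only a sketch and not the method Linkurious uses to draw the graph; it again assumes that the highest-`count` `Person` is Clinton, and `shared_contacts` is an illustrative name.

```cypher
// Assumption: the Person with the highest stored count is Clinton
MATCH (hub:Person)
WHERE hub.count IS NOT NULL
WITH hub ORDER BY hub.count DESC LIMIT 1
// Her direct contacts
MATCH (hub)-[:HAS_EMAILED]-(c:Person)
WHERE c <> hub
WITH DISTINCT hub, c
// Other contacts of hers that c has also exchanged emails with
OPTIONAL MATCH (c)-[:HAS_EMAILED]-(d:Person)-[:HAS_EMAILED]-(hub)
WHERE d <> hub AND d <> c
RETURN c.alias AS contact, count(DISTINCT d) AS shared_contacts
ORDER BY shared_contacts DESC;
```

Contacts with a high `shared_contacts` value belong to the dense group, while those with a value of 0 correspond to the isolated nodes.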

In this network, who are the most active persons? Let’s map the size of the nodes to the number of emails sent and received.

Who are the most active participants in the network?

We can see that Cheryl Mills, Huma Abedin and Jake Sullivan are the biggest nodes and thus the most active participants (after Clinton) in the network.
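The ranking behind the node sizes can be reproduced with a query. The sketch below splits each person's activity into emails sent and received, using the direction of the `FROM` and `TO` relationships created by the import script, and sorts by the stored `count` property. The column names are illustrative.

```cypher
MATCH (p:Person)
WHERE p.count IS NOT NULL
// Emails sent: (Email)-[:FROM]->(Person)
OPTIONAL MATCH (p)<-[:FROM]-(s:Email)
WITH p, count(DISTINCT s) AS sent
// Emails received: (Email)-[:TO]->(Person)
OPTIONAL MATCH (p)<-[:TO]-(r:Email)
WITH p, sent, count(DISTINCT r) AS received
RETURN p.alias AS person, sent, received, p.count AS total
ORDER BY total DESC
LIMIT 10;
```

Note that `total` counts relationships rather than emails, so an email a person sent to themselves contributes twice.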

Let’s shift our focus to the isolated nodes. They represent people who exchanged emails with Clinton but were not involved in her day-to-day work. For example, Cherie Blair, wife of former British PM Tony Blair, is one of these isolated nodes connected to Clinton.

Clinton and Blair are connected.

When we expand Blair’s connections, we see that she received four emails from Clinton with the subjects “Confidential”, “Get well soon”, “Sorry to miss you” and “Get well soon”.

Clinton and Blair exchanged 4 emails.
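Expanding a node in the interface corresponds roughly to following its `TO` and `FROM` relationships. The sketch below lists the emails addressed to Blair along with their sender, subject and text. How her alias is spelled in the data is not shown here, so the case-insensitive match on "blair" is an assumption, and `toLower` requires a reasonably recent Neo4j version.

```cypher
// Assumption: Cherie Blair's alias contains "blair" (case-insensitive)
MATCH (b:Person)
WHERE toLower(b.alias) CONTAINS 'blair'
// Emails addressed to her, together with their sender
MATCH (b)<-[:TO]-(em:Email)-[:FROM]->(sender:Person)
RETURN sender.alias AS from_alias, b.alias AS to_alias,
       em.subject AS subject, em.foia_doc AS foia_doc, em.text AS text;
```

If the match returns more than one person, tightening the `WHERE` clause to the exact alias keeps the result limited to her.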

We can select the “Confidential” email and read the content.

A “confidential” email.

We don’t have to read the content of Clinton’s emails, though, to learn more about her activity at the State Department. Graph visualization helps us turn the emails’ metadata into a clear view of Clinton’s network. We can identify key people and communities quickly and easily.

Want to explore and understand your graph data? Try the Linkurious demo!
