2019-10-29

一般化Freeman指数の計算

はじめに：Freeman指数とは

Freeman指数は社会ネットワークにおけるセグリゲーションの程度を評価するための指標であり、とりわけエッジが対称化された無向ネットワークに対して用いられる。Freeman(1978)によって提案された。

この指数を使用している研究事例としては、例えば、学年によってエスニシティとジェンダーに関するセグリゲーションの程度を比較したShrum et al (1988)等が挙げられる。

基本的な発想は、「ランダムに紐帯が生成された時と比較して、観測されたネットワークにおいてグループを横断する紐帯がどれほど少ないと言えるか」によってセグリゲーションを測るというものである。Freeman指数は0から1までの値を取り、1に近いほど、グループ間のつながりが期待値と比べて少ないことになる。

しかしながら、Freeman(1978)で提出されたオリジナルの指数は、カテゴリを2つのみ持つ社会的次元(e.g. 「男性」と「女性」)でしか使用できないという問題がある。検討したい社会的次元が、「野球部」「サッカー部」「テニス部」のような3つ以上の幅広いカテゴリを持つ場合、この指数はこのままでは使うことができない。

一般化Freeman指数

Bojanowski and Corten(2014)は、Freeman指数を3つ以上の社会的カテゴリを含む社会的次元に関するセグリゲーションに対しても適用できるように改良した一般化Freeman指数(Generalized Freeman’s Index)を開発している．

下式は一般化Freeman指数を定式化したものである．なお，詳しい導出の過程は元論文を参照されたい．

$\displaystyle S = \frac{\pi - p}{\pi} = 1 - \frac{p N (N-1)}{(\sum _ {k=1}^ {K} n _ k)^ 2 - \sum _ {k=1}^ {K} {n _ k}^ 2}$

ここでπは，観測されたネットワークと同じノードの集合を所与とし、紐帯が完全にランダムに生成された仮定した時に発生するグループ間紐帯の全体に占める割合を示し，pは，実際に観測されたネットワークが含むグループ間紐帯の割合を，Nは全てのノード数を，n_kはグループkを構成するノードの数を示している．通常のFreeman指数と異なり、グループ数が多すぎる場合は Sが0を下回る場合もある。

実装

簡単に実装してみたのが以下のコード。属性付きの network オブジェクトを引数に取る。どの属性でFreeman指数を計算するかは、引数 mode で指定する。 igraph オブジェクトが使いたい場合は、 intergraph::asIgraph() などで変換すると良い。

library(network)

Freeman <- function(network,mode){
  matrix <- mixingmatrix(network,mode)$matrix #混合行列(属性間のエッジ数を成分とする行列)
  matrix[upper.tri(matrix)] <- 0 #ダブルカウントになるので上三角成分を0に
  N <- network.size(network) #ノード数
  attribute <- network %v% mode #所望の属性をvectorで取得
  nk_sum_sq <- N^2
  nk_sq_sum <- sum(table(attribute)^2)
  sum_of_mgh <- sum(matrix) - sum(diag(matrix)) #異なるグループ間のエッジ数
  p <- sum_of_mgh / sum(matrix) #異なるグループ間のエッジの割合(観測値)
  freeman <- 1 - p*N*(N-1)/(nk_sum_sq - nk_sq_sum)
  return(freeman)
}

例

Freeman(net, mode = "gender")

注意点

ここで注意しなければならないのは，確かにFreeman指数を用いることでそれぞれ社会的次元に関して発生するセグリゲーションの度合いを個々に評価することはできるが，異なる社会的次元同士の関連（例えばジェンダーと所属部活の関連）や，構造的な紐帯形成プロセス（transitive closureなど）を考慮に入れられないということである。そのような場合は、ERGMなどの多変量解析の手法を用いるのがよい。

2019-06-22

何もかも忘れて浮間公園に行く

公園に行く生活

最近は現実がつらすぎるので、公園で探鳥ばかりしている。

先週、板橋区の都立浮間公園に足を運んだので記録を残しておく。

↓東京都公園協会のページ www.tokyo-park.or.jp

午前7時30分頃。JR埼京線の浮間舟渡駅を降りると、すぐ目の前が公園。

f:id:meana0:20190622175536j:plain

入り口の広場では、ハトの大群が日光浴を楽しんでいた。

f:id:meana0:20190622180022j:plain

「浮間ヶ池」という名の大きな池を囲むようにして遊歩道が整備されていた。

この日池にいたのはカルガモ、カイツブリ、そしてゴイサギ。 ※クロップしまくってるので画質が粗い

写真こそ撮影していないものの、ツバメの群れもいた。

f:id:meana0:20190622180745j:plain

f:id:meana0:20190622180747j:plain

f:id:meana0:20190622180751j:plain

ゴイサギを生で見たのは初めて。今月の月刊BIRDERによればこの20年間で分布が半減したらしい。

公園の奥の方は柵で囲われたバードサンクチュアリになっており、その近辺にシジュウカラもいた。

f:id:meana0:20190622180902j:plain

サンクチュアリ付近に置かれたいた看板。三日月湖なんですね。

f:id:meana0:20190622180944j:plain

有志の方が野鳥の写真も提供しているようだ。

f:id:meana0:20190622180948j:plain

池の脇にはちょっとした水生植物園もあり、睡蓮（ヒツジグサ）の花も見られて、とてもよい。

f:id:meana0:20190622181200j:plain

超望遠を買ったらまたゴイサギを撮りに来ようと思う。

2019-04-15

How to Weight Edges with Number of Common Acquaintances

※ This article is translated version of 共通の知人数でエッジを重み付けする - SNAGeek

Introduction : the concept of "structural embeddedness"

The concept of structural embeddedness refers to how a certain tie is embedded in the social structure. This is measured by "how many common acquaintances exist between the two nodes."

The degree of structural embeddedness can be used as strength of a tie, when it is not possible to directly observe the depth of the relationship between node pairs, such as attachment or number of interactions. The structural features of the network are convoluted as edge weights.

The research that dealt with this, classically, has the following papers. In this paper, it is revealed that the number of common acquaintances is more time stable than the likes and dislikes and the time spent.

www.sciencedirect.com

This concept is derived from Granovetter's "social embeddedness" concept. This is a concept that covers the type of ties and social situations such as "family relations" and "marital relations", but structural embeddedness focuses on structural aspects, especially.

The problem is how to calculate the number of people in common. You can implement it by the double loop, but if you exploit of the properties of the adjacency matrix, you can reduce it into matrix calculation, so it is possible to calculate very fast.

For undirected graphs without loops, if you calculate the power of the adjacency matrix, each element of the matrix is equal to the number of n-paths between i and j . When n = 2, the number of n-paths is the same as the number of nodes to which node i and node j are connected in common.

Implementation

Load libraries

library(igraph)
library(ggnetwork)
library(tidyverse)

Generate a random graph and visualize it

set.seed(111)
network <- erdos.renyi.game(15,0.3)

original_layout <- layout.fruchterman.reingold(network)

gnet <- ggnetwork(network,layout = original_layout)
g <- ggplot(gnet,aes(x=x,y=y,xend=xend,yend=yend))
g <- g + geom_edges(size=0.5)
g <- g + geom_nodes(size=3)
g <- g + geom_nodelabel(aes(label = vertex.names))
g <- g + theme_blank()
g <- g + ggtitle("ORIGINAL NETWORK")
g

f:id:meana0:20190414141758p:plain

The original adjacency matrix

adjmat <- as_adjacency_matrix(network)
adjmat

15 x 15 sparse Matrix of class "dgCMatrix"
                                   
 [1,] . . 1 . . . . . . 1 1 1 . . .
 [2,] . . . . . . . . . . . . 1 1 1
 [3,] 1 . . 1 1 . . . . 1 . 1 1 1 1
 [4,] . . 1 . . . . . . . . . . 1 .
 [5,] . . 1 . . . 1 . . . 1 . 1 . .
 [6,] . . . . . . . 1 1 . . 1 1 . .
 [7,] . . . . 1 . . 1 . . . . . . .
 [8,] . . . . . 1 1 . 1 . . . . . .
 [9,] . . . . . 1 . 1 . . 1 1 . 1 1
[10,] 1 . 1 . . . . . . . 1 . . 1 .
[11,] 1 . . . 1 . . . 1 1 . . . . 1
[12,] 1 . 1 . . 1 . . 1 . . . . . .
[13,] . 1 1 . 1 1 . . . . . . . . .
[14,] . 1 1 1 . . . . 1 1 . . . . .
[15,] . 1 1 . . . . . 1 . 1 . . . .

Next, let's square the adjacency matrix. To calculate matrix product in R, you can use %*% operator.

squared_adjmat <- adjmat %*% adjmat
squared_adjmat

15 x 15 sparse Matrix of class "dgCMatrix"
                                   
 [1,] 4 . 2 1 2 1 . . 2 2 1 1 1 2 2
 [2,] . 3 3 1 1 1 . . 2 1 1 . . . .
 [3,] 2 3 8 1 1 2 1 . 3 2 4 1 1 2 .
 [4,] 1 1 1 2 1 . . . 1 2 . 1 1 1 1
 [5,] 2 1 1 1 4 1 . 1 1 2 . 1 1 1 2
 [6,] 1 1 2 . 1 4 1 1 2 . 1 1 . 1 1
 [7,] . . 1 . . 1 2 . 1 . 1 . 1 . .
 [8,] . . . . 1 1 . 3 1 . 1 2 1 1 1
 [9,] 2 2 3 1 1 2 1 1 6 2 1 1 1 . 1
[10,] 2 1 2 2 2 . . . 2 4 1 2 1 1 2
[11,] 1 1 4 . . 1 1 1 1 1 5 2 1 2 1
[12,] 1 . 1 1 1 1 . 2 1 2 2 4 2 2 2
[13,] 1 . 1 1 1 . 1 1 1 1 1 2 4 2 2
[14,] 2 . 2 1 1 1 . 1 . 1 2 2 2 5 3
[15,] 2 . . 1 2 1 . 1 1 2 1 2 2 3 4

The elements of resulting matrix equal to the number of 2-paths for each component between i and j. Of course, the diagonal is equal to the degree of each node.

Finally, to weight only edges where there was an edge originally, calculate the element product with the adjacency matrix and finally calculate the element sum with the adjacency matrix.

weighted_adjmat <- squared_adjmat * adjmat + adjmat
weighted_adjmat

> weighted_adjmat
15 x 15 sparse Matrix of class "dgCMatrix"
                                   
 [1,] . . 3 . . . . . . 3 2 2 . . .
 [2,] . . . . . . . . . . . . 1 1 1
 [3,] 3 . . 2 2 . . . . 3 . 2 2 3 1
 [4,] . . 2 . . . . . . . . . . 2 .
 [5,] . . 2 . . . 1 . . . 1 . 2 . .
 [6,] . . . . . . . 2 3 . . 2 1 . .
 [7,] . . . . 1 . . 1 . . . . . . .
 [8,] . . . . . 2 1 . 2 . . . . . .
 [9,] . . . . . 3 . 2 . . 2 2 . 1 2
[10,] 3 . 3 . . . . . . . 2 . . 2 .
[11,] 2 . . . 1 . . . 2 2 . . . . 2
[12,] 2 . 2 . . 2 . . 2 . . . . . .
[13,] . 1 2 . 2 1 . . . . . . . . .
[14,] . 1 3 2 . . . . 1 2 . . . . .
[15,] . 1 1 . . . . . 2 . 2 . . . .

Let's visualize the weighted adjacency matrix. After creating a network object in igraph, subtract 1 from the edge weight. Edges that do not have any common acquaintances will be displayed with a weight of 0.

weighted_network <- graph.adjacency(weighted_adjmat,
                                    mode = "undirected",
                                    diag = FALSE,
                                    weighted = TRUE)
E(weighted_network)$weight <- E(weighted_network)$weight - 1

gnet <- ggnetwork(weighted_network,layout = original_layout)
g <- ggplot(gnet,aes(x=x,y=y,xend=xend,yend=yend))
g <- g + geom_edges(size=0.5)
g <- g + geom_nodelabel(aes(label=vertex.names))
g <- g + geom_edgelabel(aes(label=weight))
g <- g + theme_blank()
g <- g + ggtitle("WEIGHTED NETWORK")
g

f:id:meana0:20190414141755p:plain

2019-04-14

共通の知人数でエッジを重み付けする

R ネットワーク分析

はじめに：構造的埋め込みについて

構造的埋め込み(structural embeddedness) とは、「ある紐帯がどれほど社会構造の中に埋め込まれているか」を表す概念です。これは「両者の間で共通の知人がどれだけいるか」によって測定されます。

愛着や相互作用の数など、直接的にノードペアの関係性の深さを観測できない時、構造的埋め込みの度合いが紐帯の強度として使用されることがあります。ネットワークの構造的特徴が、エッジの重みとして畳み込まれるわけです。

これを扱った研究としては、古典的には以下のような論文があります。こちらの論文では、共通の知人の数が、好き嫌いや過ごす時間よりも時間安定的であることが明らかにされています。

www.sciencedirect.com

この概念はグラノヴェッターの社会的埋め込み(social embeddedness)概念に由来しています。社会的埋め込みは「家族関係」「婚姻関係」などの紐帯の性質や社会的状況までカバーする概念ですが、構造的埋め込みは、社会的埋め込み概念のなかでも、特にその構造的側面に着目したものと言えます。

問題は、共通の知人数をどう計算するかです。 愚直に二重ループを回してもいいのですが、隣接行列の性質をうまく利用すると、行列計算に持ち込めるため、たいへん高速に計算することが可能です。

それは、隣接行列をn乗した行列の各成分はi-j間のn-pathの数に等しいという性質です(ループなしの無向グラフの場合)。ことn=2の時、n-pathの数はノードiとノードjが共通してつながっているノードの数と同じです。

実際にやってみる

ライブラリ読み込み

library(igraph)
library(ggnetwork)
library(tidyverse)

ランダムグラフ生成して可視化

set.seed(111)
network <- erdos.renyi.game(15,0.3)

original_layout <- layout.fruchterman.reingold(network)

gnet <- ggnetwork(network,layout = original_layout)
g <- ggplot(gnet,aes(x=x,y=y,xend=xend,yend=yend))
g <- g + geom_edges(size=0.5)
g <- g + geom_nodes(size=3)
g <- g + geom_nodelabel(aes(label = vertex.names))
g <- g + theme_blank()
g <- g + ggtitle("ORIGINAL NETWORK")
g

f:id:meana0:20190414141758p:plain

もともとの隣接行列

adjmat <- as_adjacency_matrix(network)
adjmat

15 x 15 sparse Matrix of class "dgCMatrix"
                                   
 [1,] . . 1 . . . . . . 1 1 1 . . .
 [2,] . . . . . . . . . . . . 1 1 1
 [3,] 1 . . 1 1 . . . . 1 . 1 1 1 1
 [4,] . . 1 . . . . . . . . . . 1 .
 [5,] . . 1 . . . 1 . . . 1 . 1 . .
 [6,] . . . . . . . 1 1 . . 1 1 . .
 [7,] . . . . 1 . . 1 . . . . . . .
 [8,] . . . . . 1 1 . 1 . . . . . .
 [9,] . . . . . 1 . 1 . . 1 1 . 1 1
[10,] 1 . 1 . . . . . . . 1 . . 1 .
[11,] 1 . . . 1 . . . 1 1 . . . . 1
[12,] 1 . 1 . . 1 . . 1 . . . . . .
[13,] . 1 1 . 1 1 . . . . . . . . .
[14,] . 1 1 1 . . . . 1 1 . . . . .
[15,] . 1 1 . . . . . 1 . 1 . . . .

続いてこれを自乗します。Rで行列積を計算する時は %*% 演算子を使います。

squared_adjmat <- adjmat %*% adjmat
squared_adjmat

15 x 15 sparse Matrix of class "dgCMatrix"
                                   
 [1,] 4 . 2 1 2 1 . . 2 2 1 1 1 2 2
 [2,] . 3 3 1 1 1 . . 2 1 1 . . . .
 [3,] 2 3 8 1 1 2 1 . 3 2 4 1 1 2 .
 [4,] 1 1 1 2 1 . . . 1 2 . 1 1 1 1
 [5,] 2 1 1 1 4 1 . 1 1 2 . 1 1 1 2
 [6,] 1 1 2 . 1 4 1 1 2 . 1 1 . 1 1
 [7,] . . 1 . . 1 2 . 1 . 1 . 1 . .
 [8,] . . . . 1 1 . 3 1 . 1 2 1 1 1
 [9,] 2 2 3 1 1 2 1 1 6 2 1 1 1 . 1
[10,] 2 1 2 2 2 . . . 2 4 1 2 1 1 2
[11,] 1 1 4 . . 1 1 1 1 1 5 2 1 2 1
[12,] 1 . 1 1 1 1 . 2 1 2 2 4 2 2 2
[13,] 1 . 1 1 1 . 1 1 1 1 1 2 4 2 2
[14,] 2 . 2 1 1 1 . 1 . 1 2 2 2 5 3
[15,] 2 . . 1 2 1 . 1 1 2 1 2 2 3 4

こうしてできた行列は、各成分がi-j間の相異なる2-pathの数になっています。当然ですが、対角成分は、各ノードの次数と等しくなります。

最後に、もともとエッジがあったところにだけ重み付けするため、隣接行列との要素積を計算し、最後に隣接行列との要素和を計算します。

weighted_adjmat <- squared_adjmat * adjmat + adjmat
weighted_adjmat

> weighted_adjmat
15 x 15 sparse Matrix of class "dgCMatrix"
                                   
 [1,] . . 3 . . . . . . 3 2 2 . . .
 [2,] . . . . . . . . . . . . 1 1 1
 [3,] 3 . . 2 2 . . . . 3 . 2 2 3 1
 [4,] . . 2 . . . . . . . . . . 2 .
 [5,] . . 2 . . . 1 . . . 1 . 2 . .
 [6,] . . . . . . . 2 3 . . 2 1 . .
 [7,] . . . . 1 . . 1 . . . . . . .
 [8,] . . . . . 2 1 . 2 . . . . . .
 [9,] . . . . . 3 . 2 . . 2 2 . 1 2
[10,] 3 . 3 . . . . . . . 2 . . 2 .
[11,] 2 . . . 1 . . . 2 2 . . . . 2
[12,] 2 . 2 . . 2 . . 2 . . . . . .
[13,] . 1 2 . 2 1 . . . . . . . . .
[14,] . 1 3 2 . . . . 1 2 . . . . .
[15,] . 1 1 . . . . . 2 . 2 . . . .

こうして作成した重み付き隣接行列を実際に可視化してみます。igraphでネットワークオブジェクトを作成した後に、重みから1を引きます。共通の知人がいないエッジは重みが0と表示されるようになります。

weighted_network <- graph.adjacency(weighted_adjmat,
                                    mode = "undirected",
                                    diag = FALSE,
                                    weighted = TRUE)
E(weighted_network)$weight <- E(weighted_network)$weight - 1

gnet <- ggnetwork(weighted_network,layout = original_layout)
g <- ggplot(gnet,aes(x=x,y=y,xend=xend,yend=yend))
g <- g + geom_edges(size=0.5)
g <- g + geom_nodelabel(aes(label=vertex.names))
g <- g + geom_edgelabel(aes(label=weight))
g <- g + theme_blank()
g <- g + ggtitle("WEIGHTED NETWORK")
g

f:id:meana0:20190414141755p:plain

2019-03-09

タコのレモン和え

料理

f:id:meana0:20190309112447j:plain

昨日の宅飲みに持っていったもの。

晩冬ですが、春を飛び級して夏を感じさせる一品です。簡単に作れる割に見た目が鮮やかなので、宅飲みにはとても重宝します。香りのキーとなる生のイタリアンパセリは手に入らない場合も多いと思いますが、その時はドライタイプのものを死ぬほど入れることで代用できます（もちろん風味は落ちます）。

元ネタはこちらのレシピですが、より手軽に作れるようにアレンジしています。 abebeeno.blogspot.com

必要なもの

ジップロック（あるいはそれに準ずる袋）
タコ（生食用、茹でてあるもの）300gくらい？
- 一口大に切っておく
イタリアンパセリ（生）5,6枝
- 葉っぱの部分だけをちぎり、ざっくりと刻んでおく
レモン 1個
- 皮ごと使うので、表面はよく洗っておく
- 縦に4等分する
  - 1/4個は薄くいちょう切りにしておく
  - 2/4個は漬ける用の絞り汁に使う
  - 1/4個は食べる直前の絞り汁用
にんにく 1片
- 細かくみじん切りにしておく
オリーブオイル
塩・こしょう

工程

ジップロックににんにく、イタリアンパセリ、タコ、いちょう切りにしたレモンを入れる
オリーブオイルと、レモン半個分の絞り汁を入れる
塩・こしょうを入れる
ジップロックの中で揉み込んで、空気を抜いたら冷暗所で1,2時間ほど寝かせ、味を染み込ませる。
- 味が染み込むのは温度ではなく時間オーダーの現象(参考：冷めるときに味が染み込む理由 | エンジニアのメソッド)なので、冷やす必要はないですが、よく冷やしたほうがお酒に合います。
- 寝かせている間、適当なタイミングで1個つまんで味見して、塩気が足りなかったら塩を足す
皿に盛り付けて、最後に1/4個分のレモン果汁を回しかけて完成。

タコはもちろんのこと、オイルソースにバゲットをつけて食べると最高です。

2019-03-06

Generate Random Graphs with Fixed Degree Distribution (R igraph)

※This article is translated version of 次数分布を固定してランダムグラフを生成する - SNAGeek

Introduction

There are situations when you want to evaluate statistically the whole network statistics, such as global clustering coefficient, average path length.

In such a case, you can use a method of comparing network statistics with that generated from a null model to find out how it statistically deviates . What is commonly used is to compare it with a random graph having an degree distribution same as the original graph.

In this article, we introduce a method to generate random graph by fixing the degree distribution, using R igraph package.

Loading graphs

For this case, we use karate club network data attached to igraph by default.

library(igraph)
library(tidyverse)
library(ggnetwork)

net <- graph("Zachary")

Visualization

Let's visualize the graph with ggnetwork. We will visualize the random graph later, but we must first acquire the coordinates of the node in the original graph so that we can use it again later.

hruchterman_reingold = layout.fruchterman.reingold(net)
gnet <- ggnetwork(net, layout = hruchterman_reingold)
g <- ggplot(gnet, aes(x = x, y = y, xend = xend, yend = yend))
g <- g + geom_edges(size = 0.5)
g <- g + geom_nodes(size = 3, col = "blue")
g <- g + geom_nodelabel_repel(aes(label = vertex.names))
g <- g + ggtitle("original karate club network")
g <- g + theme_blank()
g

Here is the image. Somehow, it seems that the clustering coefficient is high and the average path length is large, at first glance.

f:id:meana0:20190302005735p:plain — Zachary Karate Club Network

When actually calculating it

> cc_original <- transitivity(net) # global clustering factor
> pl_original <- mean_distance(net) # average path length
>
> cc_original
[1] 0.2556818
> pl_original
[1] 2.4082

Well, how far is these values characteristic?

Random graph generation

In igraph it can be generated with sample_degseq(in.deg). in.deg is a vector of degree of each node. In the case of an undirected graph, only in.deg., In case of directed graph, in.deg the distribution of in-degree, out.deg specifying the distribution of out-degree.

Acquire the degree distribution of the original network.

degree_dist <- igraph::degree(net)
g <- ggplot(data = tibble(degree = degree_dist), aes(x = degree))
g <- g + geom_histogram()
g <- g + ggtitle("degree distribution of karate club network")
g

f:id:meana0:20190302183725p:plain — Degree distribution of original network

The method of generating the graph is specified by the optional argument. There are three kinds of igraph, simple(default),simple.no.multiple, vl.

simple

A method of generating n edges appropriately from the node and combining them. Although it is a simple method, multiple edges and self loops occur. This is because the possibility of duplication of the edge partners of the two nodes to be joined together is not excluded.

degree_dist <- igraph::degree(net)
simple_random_net <- sample_degseq(degree_dist)

In the graph obtained in this way, it can be confirmed that if the simplify() `and multiple edges and self-loops are eliminated, the number of edges is slightly reduced.

> simple_random_net
IGRAPH 3c731f6 U --- 34 78 - Degree sequence random graph
+ attr: name(g / c), method(g / c)
+ edges from 3c731f6:
 [1] 2 - 6 1- 1-4 3--17 - - 28 1- - 34 19 - - 28 12 - 33 7 - - 18 9 - - 29 21 - 33 5 - 34 1 - 14 3--33 1 - 28 8 - 8 22 - - 27 25 - 31
[18] 7 - 32 9 - 33 1 - 22 19 19 - 20 11 - 34 1 - 20 2- - 7 4- - 34 1 - 30 18 - 34 4 - 32 2- 5 10 - 34 1 - 1 - 6 - 1 14 1 - 3 9 - 17
[35] 6 - 33 8 - - 24 5 - 34 16 - 30 4 - 2 3 1 - 31 33 - 34 3 - 24 4 25 - 27 33 - 34 2 - 26 1- 3 3-33 33 28 - 34 26 - 33 1 - 3 23 - 34
[52] 4--29 14 - 32 10 - 34 2 - 29 9 - 32 11 - 20 34 - 34 3 - 2 4 11 - 25 5 - 1 5 3 - 1 15 1- 14 2- 1-31 1 - 1 6 7 - - 24 2- - 34 3 - 13
[69] 21 - 32 6 - 32 24 - 30 1 - 2 - 2 - 34 13 - 33 8 - 30 33 - 33 9 - 34 26 - 31
> simplify(simple_random_net, remove.multiple = TRUE, remove.loops = TRUE)
IGRAPH 40 d 3 c 01 U --- 34 65 - Degree sequence random graph
+ attr: name(g / c), method(g / c)
+ edges from 40 d 3 c 01:
 [1] 1 - 2 - 1 - - 3 1 - 4 - 1 - 1 14 1 - 1 6 1 - 20 - 1 - 22 22 1 - 28 1- 1-30 1-31 1-34 2- 5 2-- 6 2-- 7 2--26 2--29 2--31
[18] 2 - 34 - 3 - 3 - 3 - 3 - 3 - 3 - 3 - 2 - 3 - 33 4 - 1 5 4 - 2 3 4 - 2 9 4 - 32 4- - 34 5 - 34 6 - - 14 6 - 32 6 - 33 7 - - 18 7 - 24
[35] 7 - 32 8 - - 24 8 - 30 9 - - 17 9 - 29 9 - 32 9 - 33 9 - 34 10 - 34 11 - 20 11 - 25 11 - 34 12 - 33 13 - 33 14 - 32 16 - 30 17 - 28
[52] 18 - 34 19 - 20 19 - 28 21 - 32 21 - 33 22 - 27 23 - 34 24 - 30 25 - 27 25 - 31 26 - 31 26 - 33 28 - 34 33 - 34

simple.no.multiple

The generation logic is the same as simple, but when multiple edges and self loops occur, the graph is discarded and generation continues until a graph without multiple graphs or loops is generated. Therefore, the larger the average degree is , the more calculation time is required, and it is not uniform sampling from the possible graph set.

Visualize the random graph generated by this method.

gnet <- ggnetwork(simple_no_multiple_random_net, layout = hruchterman_reingold)
g <- ggplot(gnet, aes(x = x, y = y, xend = xend, yend = yend))
g <- g + geom_edges(size = 0.5)
g <- g + geom_nodes(size = 3, col = "blue")
g <- g + geom_nodelabel_repel(aes(label = vertex.names))
g <- g + ggtitle("sampled karate club network(method: simple.no.multiple)")
g <- g + theme_blank()
g

f:id:meana0:20190302005738p:plain — Generated random graph

Compared to the original network, the clustering coefficient is small and the mean distance between nodes seems to be close.

Unlike the graph generated by simple, the number of edges does not change before and aftersimplify().

> ecount(simple_no_multiple_random_net)
[1] 78
> ecount(igraph::simplify(simple_no_multiple_random_net, remove.multiple = TRUE, remove.loop = TRUE))
[1] 78

vl

Because it is difficult, explanation is omitted.
It seems to be uniform sampling here
Official document of igraph says
The "vl" method is a more sophisticated generator. This generator is generated unidentified, connected simple graphs, it is an error to pass the in.deg argument to it. Then some rewiring is done to make the graph connected. Finally a Monte-Carlo algorithm is used to randomize The graph. The "vl" samples from the undirected, connected simple graphs unformly. See http://www-rp.lip6.fr/~latapy/FV/generation.html for details.

Significance Test

In fact, we evaluate the clustering coefficient and the average path length by the deviation from the null model. We generate 1,000 random graphs by the vl method and examine where the value of the original network are located in the distribution of values calculated from them.

sample_cc <- c()
sample_pl <- c()
for(i in 1: 1000) {
  vl_random_net <- sample_degseq(degree_dist, method = "vl")
  sample_cc <- c(sample_cc, transitivity(vl_random_net))
  sample_pl <- c(sample_pl, mean_distance(vl_random_net))
}

The distribution of each indicator is as follows.

# cc hist
g <- ggplot(data = tibble(cc = sample_cc), aes(x = cc))
g <- g + geom_histogram(binwidth =. 01)
g <- g + geom_vline(xintercept = cc_original, col = "red")
g

# pl hist
g <- ggplot(data = tibble(pl = sample_pl), aes(x = pl))
g <- g + geom_histogram(binwidth =. 01)
g <- g + geom_vline(xintercept = pl_original, col = "red")
g

f:id:meana0:20190302183611p:plain — Distribution of clustering coefficients Red line is original value)

f:id:meana0:20190302183615p:plain — Distribution of average path length(red line is original value)

We assume that these distributions follow a normal distribution(though it is distorted to the right with respect to the average path length...), calculate the mean, standard deviation, and z value, and calculate the p value.

cc_mean = mean(sample_cc)
pl_mean = mean(sample_pl)
cc_sd = sd(sample_cc)
pl_sd = sd(sample_pl)
cc_zscore <-(cc_original - cc_mean) / cc_sd
pl_zscore <-(pl_original - pl_mean) / pl_sd

> 1-pnorm(cc_zscore)
[1] 0.1035129
> 1-pnorm(pl_zscore)
[1] 0.0003767265

Compared with the distribution of indices obtained from the random graph, it can be seen that the original network is a statistically deviated value for both network statistics.

Finally

It may be better to keep the series of methods introduced here to the extent that it is used for searching what kind of point is characteristic in a certain graph.

Because according to the paper Comparing Brain Networks of Different Size and Connectivity Density Using Graph Theory which examined the comparison method between graphs with different clustering coefficients and average path length Since these indices are greatly influenced by the node size N and the average degree k, it is difficult to compare graphs of different sizes and densities by raw numerical values based on the original size(this bias is N-independent or k- independent, etc.).

It seems that the null model-based metrics introduced here are more influenced by N and k, if you use it, please be careful.

2019-03-03

春菊とサラミのパスタ

料理

f:id:meana0:20190303233321j:plain

オイルソースのパスタの具として緑色の野菜が欲しい時は大体ほうれん草を使っていたのですが、以下のまとめを見てから、春菊にリプレースしました。

togetter.com

個人的には、春菊を使うメリットは以下の通りかと思っています。

下茹での必要がなく、生のままでも食べられる。
芳香成分であるα-ピネンとペリルアルデヒドは脂溶性のため、オイルソースとの相性が良い。
ほうれん草と比べてシュウ酸の含有量が少ないため、歯のざらつきが少ない。

普段はベーコンと春菊を炒めてアーリオ・オーリオを作るのですが、今回はフエ・カセーロという、表面に白カビのついたサラミを使いました。先日、ヨーロッパ留学から帰ってきた友人からもらったもので、チーズのような風味がたまらない一品です。

フエ・カセーロ・カタルーニャ・サラミ/エルポソブランド

出版社/メーカー: エルポソ社
メディア: その他
この商品を含むブログを見る

レシピ

必要なもの

春菊
- 包丁を使うのはダルいので手でちぎってしまいます。
- 炒めると思いのほか小さくなるので、多めに使うくらいがちょうどよいです。
- トッピング用に葉の部分を取っておくのもよいですね。
- 根っこの部分を食べるかどうかは自由ですが、茎は食感のアクセントになるので捨てることなきよう。
サラミ
- 食べやすいように輪切りにしておきます。
- こちらもトッピング用に2,3切れほど炒めずに取っておいてもよいですね。
にんにく
鷹の爪(optional)
白ワイン
塩・コショウ
スパゲッティ
- 今回は1.9mmを使っています。

分量はすべて適当です。適当にやっても美味しくできるのがパスタのいいところです。強いて言うならば、サラミの塩気があるので、塩は少なめで構いません。

工程

フライパンにオリーブオイル、刻みニンニク、種を取った鷹の爪を入れてから、弱火で熱します。
- このタイミングでパスタも茹で始めてしまいましょう。
にんにくの香りが立ってきたら、輪切りにしたサラミを炒めます。ちょっとカリカリになるくらいがちょうどいいですね。
中火にして、春菊を投入してサッと炒めたら、火を強めて白ワインを振ります。アルコールが飛ぶまでフライパンを適当に揺らしましょう。
パスタの茹で汁を少し加え、かき混ぜて乳化させます。パスタが茹で上がるのを待ちます。
パスタが茹で上がったら、ソースと絡めます。
皿に盛り付けたら、春菊の葉やサラミを乗せて完成です。

あとはビールをプシュッとして、いただきましょう。