IT in Industry, vol. 6, 2018 Published online 09-Feb-2018
Copyright © Dehmer, Klimenko, Shokin, Rychkova, Dobrynin, Konstantinova, Medvedev 2018
ISSN (Print): 2204-0595, ISSN (Online): 2203-1731
Analysis of Webspaces of the Siberian Branch of the
Russian Academy of Sciences and the
Fraunhofer-Gesellschaft
Matthias Dehmer
UMIT
Hall in Tyrol, Austria
Andrey A. Dobrynin, Elena V. Konstantinova,
Andrei Yu. Vesnin
Sobolev Institute of Mathematics SB RAS
Novosibirsk, Russia
Olga A. Klimenko, Yuri I. Shokin, Elena V. Rychkova
Institute of Computational Technologies SB RAS
Novosibirsk, Russia
Alexey N. Medvedev
Central European University
Budapest, Hungary
Abstract—In this paper, two webspaces of academic
institutions, that of the Siberian Branch of the Russian Academy of
Sciences (SB RAS) and that of the Fraunhofer-Gesellschaft (FG),
Germany, are investigated. The webspaces are represented by
directed graphs whose vertices correspond to websites. An
arc connects two vertices if there exists at least one hyperlink
between the corresponding websites. Webometrics is used for
ranking the websites of SB RAS and FG. We discuss numerical
results obtained from a structural study of the websites. In particular,
we examine the community structure of the directed graphs that
represent these websites.
Keywords—network; webometrics; quantitative measure;
communities
I. INTRODUCTION
In this paper, a webspace is a structural object (graph)
formed by a set of websites and the hyperlinks between them [1]. To
investigate a webspace structurally, we use methods from
webometrics, i.e., the contemporary approach to studying
information resources, structure and technological features of the
web. The development of webometrics started in 1997 with
the seminal paper of Almind and Ingwersen [2]. Webometric
methods are statistical in nature and do not fully describe
the diverse information processes that occur in a
webspace. Therefore, to analyze the structure of webspaces, we
also use graph-theoretical methods [1, 3].
A large number of contributions in the literature study
webspaces formed by the websites of universities and
academic institutions [4-12]. Since the number of webspaces
that could be studied is practically unlimited, there is still room
for research on this topic. In this paper, we contribute to it by
studying the websites of academic institutions in Russia and
Germany.
It is well known that the structure of real-life networks is not
random [13]. When dealing with non-random topologies, the
structural heterogeneity (or complexity) may be captured by
calculating various measures [14, 15]. Another problem in this
context is to determine the community structure of networks.
Community detection has been one of the hot topics in network
science and has received considerable attention in the last
decade [16]. The concept of a community in a network is usually
derived from the common understanding of communities in
social networks [17]. Graph-theoretically, the problem is to
identify sets of vertices that are more tightly connected to each
other than to the rest of the network. To make the community
problem precise, a quantity Q, called modularity, has been
introduced, and the task is to partition the vertex set of a network
into a union of subsets that maximizes Q [18]. However, this
problem has been proven to be NP-hard, so only heuristic
algorithms are available for maximizing Q in practice [19].
Here we consider the webspaces generated by the websites of
academic institutions of the Siberian Branch of the Russian
Academy of Sciences (SB RAS) and of the
Fraunhofer-Gesellschaft (FG), Germany. The structure of these
webspaces is formed by the websites of the scientific institutions
and the hyperlinks between them. The websites of SB RAS and FG
are ranked by using methods from webometrics, and numerical
scores for examining the community structure of the webspaces
are determined. Since Russian webspaces have been little
investigated so far, we believe that our work will have an impact
on the web science community.
II. REPRESENTATION OF WEBSPACES
A simple model for representing the structure of a webspace
is a directed weighted graph G = (V, E) with vertex set V and arc
set E. We assume that webgraphs [14] do not possess any
self-loops or multi-arcs. In this paper, the vertices of V correspond
to websites. Suppose that the vertices v and u of G correspond to
sites X and Y; then an arc (v,u) connects v to u if
there exists at least one hyperlink in X referring to site Y. The
number of hyperlinks from X to Y is represented by the weight w
of the arc (v,u). The distance d(v,u) between vertices v and u in a
graph is the number of arcs in a shortest directed path
from v to u.
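The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_webgraph` and `distance`, the dictionary-of-dictionaries layout and the `*.example` site names are hypothetical and not taken from the paper.

```python
from collections import Counter, deque

def build_webgraph(hyperlinks):
    """Build {source: {target: weight}} from (source_site, target_site) pairs."""
    # Count parallel hyperlinks; links from a site to itself (self-loops) are dropped.
    weights = Counter((x, y) for x, y in hyperlinks if x != y)
    graph = {}
    for x, y in hyperlinks:
        graph.setdefault(x, {})
        graph.setdefault(y, {})
    for (x, y), w in weights.items():
        graph[x][y] = w  # one arc per ordered pair, weight = number of hyperlinks
    return graph

def distance(graph, v, u):
    """Number of arcs on a shortest directed path from v to u (None if unreachable)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if x == u:
            return dist[x]
        for y in graph[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return None

# Made-up hyperlink list: two links a -> b, one link b -> c, one self-link on c.
links = [("a.example", "b.example"), ("a.example", "b.example"),
         ("b.example", "c.example"), ("c.example", "c.example")]
g = build_webgraph(links)
```

Here the two hyperlinks from `a.example` to `b.example` collapse into a single arc of weight 2, and the breadth-first search returns distances measured in arcs, ignoring the weights.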
Fig. 1. Webgraph R of institutes of Siberian Branch of RAS.
Fig. 2. Webgraph F of institutes of Fraunhofer-Gesellschaft.
Let R and F be the webgraphs of SB RAS and FG, respectively.
Their structures are shown in Fig. 1 and Fig. 2. Graph R has 95
vertices and 949 arcs, while F consists of 72 vertices and 321
arcs. Additional information on these graphs can be found in
[20, 21]. The number of arcs going from (to) a vertex v is
the out-degree deg+(v) (in-degree deg-(v)). The pair
(deg+(v), deg-(v)) gives information on the size of the vicinity
of v. The weighted out-degree wdeg+(v) (weighted in-degree
wdeg-(v)) is the sum of the weights of the arcs
leaving (reaching) the vertex v. Total degrees are defined as
deg(v) = deg+(v) + deg-(v) and wdeg(v) = wdeg+(v) + wdeg-(v).
A vertex v is called isolated if deg(v) = 0. The degree
distributions of the graphs R and F are shown in Fig. 3 to Fig. 6.
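As a small illustration of these definitions, the following sketch computes the four degree values of every vertex and lists the isolated vertices. The graph layout (`{vertex: {out_neighbour: weight}}`), the function name `degrees` and the toy data are assumptions made for this example only.

```python
def degrees(graph):
    """Return {v: (deg+, deg-, wdeg+, wdeg-)}; every vertex must be a key of graph."""
    out_deg = {v: len(nbrs) for v, nbrs in graph.items()}
    w_out = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    in_deg = {v: 0 for v in graph}
    w_in = {v: 0 for v in graph}
    for v, nbrs in graph.items():
        for u, w in nbrs.items():
            in_deg[u] += 1   # arc (v, u) reaches u
            w_in[u] += w
    return {v: (out_deg[v], in_deg[v], w_out[v], w_in[v]) for v in graph}

# Toy webgraph with made-up weights; "iso" has no arcs at all.
toy = {"x": {"y": 3, "z": 1}, "y": {"z": 2}, "z": {}, "iso": {}}
deg = degrees(toy)
isolated = [v for v, (d_out, d_in, _, _) in deg.items() if d_out + d_in == 0]
```

For the toy graph, vertex `y` has one outgoing arc of weight 2 and one incoming arc of weight 3, and `iso` is the only isolated vertex.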
III. METHODS
A. Ranking academic institutions by using webometrics
The “Ranking Web of World Research Centers” is an
initiative of the Cybermetrics Lab, a research group belonging to
the Consejo Superior de Investigaciones Científicas (CSIC),
Spain. Quantitative methods have been designed to measure
scientific activity on the Web and to determine ratings of
universities and research centers of various countries [22]. The
cybermetric indicators have been useful for evaluating science and
technology, and they complement the results obtained by
bibliometric methods in scientometric studies.
Fig. 3. Degree distribution of graph R.
Fig. 4. Degree distribution of graph F.
Fig. 5. Weighted degree distribution of graph R.
Since 2008, the Institute of Computational
Technologies of the Siberian Branch of the Russian Academy of
Sciences (SB RAS) has generated ratings of the websites of scientific
institutions of SB RAS [6, 8, 21]. The ranking method is
presented in [22]. In this paper, we extract statistics
from three major search engines: Yandex [23], Google [24], and
Bing [25]. To evaluate websites, the method uses the following
parameters:
• V – visibility. The parameter equals the number of
external links from other websites to the considered one.
Since the data from different engines differ, the
average value is taken: V = (V_Yandex + V_Google + V_Bing)/3.
• S – size. The parameter equals the number of webpages of
the website determined by the search engines. Again, we
use the average value: S = (S_Yandex + S_Google + S_Bing)/3.
• R – richness. The parameter equals the number of
documents on the website with the file extensions of Adobe
Acrobat (.pdf), Microsoft Word (.doc) and PowerPoint
(.ppt). The quantity is determined by search engine
queries, so we again average: R = (R_Yandex + R_Google)/2.
• Sc – citation index obtained from the citation system Google
Scholar [26]. This parameter reflects the academic
importance of the website.
The overall rating evaluation includes the following steps.
1. Evaluation of the visibility V, size S and richness R
parameters for all websites in the network.
2. Ranking the values of V, S and R. The parameter array,
say V, is ranked in decreasing order. The website with maximal
V receives rank Vr = 1. The websites with identical values of V
get equal ranks. Similarly, we compute the ranks Sr and Rr by
using the parameters S and R for each website in the network.
3. Evaluation of the rank of the citation index Sc. We
compute the values Sc. The rank Scr is obtained by ordering
these values. The website with the minimal value receives the
rank-value Scr = 1.
4. Computing the sum of the obtained ranks for each
website: W = Vr + Sr + Rr + Scr.
5. The final rating is obtained by sorting the list of W scores
in increasing order. Therefore, the lower the value of W is, the
higher is the rank (rating position) of the website.
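The five steps can be sketched as follows. This is a minimal illustration, not the implementation used for the published ratings. The function names, the `best_is_max` flag and all numbers are made up for the example, and ties are resolved by dense ranking (equal values share a rank and the next distinct value gets the next integer), which is an assumption because the text only states that identical values receive equal ranks. Step 3 is implemented as worded, so the minimal Sc receives rank 1.

```python
def rank(values, best_is_max=True):
    """Dense ranks: the best value gets rank 1 and identical values share a rank."""
    distinct = sorted(set(values.values()), reverse=best_is_max)
    position = {val: r for r, val in enumerate(distinct, start=1)}
    return {site: position[val] for site, val in values.items()}

def rating(V, S, R, Sc):
    """Return [(site, W), ...] sorted by increasing rank sum W = Vr + Sr + Rr + Scr."""
    Vr, Sr, Rr = rank(V), rank(S), rank(R)      # step 2: maximal value gets rank 1
    Scr = rank(Sc, best_is_max=False)           # step 3 as worded: minimal Sc gets rank 1
    W = {site: Vr[site] + Sr[site] + Rr[site] + Scr[site] for site in V}  # step 4
    return sorted(W.items(), key=lambda item: item[1])                    # step 5

# Step 1 with invented, already averaged parameter values for three sites.
V = {"a": 1200, "b": 300, "c": 300}
S = {"a": 5000, "b": 7000, "c": 100}
R = {"a": 40, "b": 40, "c": 10}
Sc = {"a": 5, "b": 1, "c": 9}
final = rating(V, S, R, Sc)
```

With these invented values the rank sums are W = 5 for `b`, W = 6 for `a` and W = 10 for `c`, so `b` takes the first rating position.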
B. Quantitative measures for webgraphs
One of the common approaches when studying web
structures is based on quantifying structural information by
using various quantitative measures [14, 15]. Usually, a
quantitative graph measure is a graph invariant that maps a set of
graphs to a set of numbers such that invariant values coincide for
isomorphic graphs [27]. Such invariants can quantify either local
or global properties of graphs. Local measures, as a rule,
describe a graph structure near particular vertices. In contrast,
global measures encode structural information of the entire
graph. Some global invariants may be regarded as a complexity
measure of a graph [28, 29]. We consider the following graph
invariants.
The average degree, adeg(G), of an n-vertex graph G is the
average value:
\[ \mathrm{adeg}(G) = \frac{1}{n}\sum_{v \in V} \deg^{+}(v) = \frac{1}{n}\sum_{v \in V} \deg^{-}(v). \]
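The weighted analogue, awdeg(G), is given by the formula:
\[ \mathrm{awdeg}(G) = \frac{1}{n}\sum_{v \in V} \mathrm{wdeg}^{+}(v) = \frac{1}{n}\sum_{v \in V} \mathrm{wdeg}^{-}(v). \]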
The diameter, diam(G), of a graph G is the largest distance
between two vertices:
\[ \mathrm{diam}(G) = \max\{\, d(v,u) \mid v, u \in V \,\}. \]
It says how far one can travel in a webspace without any
repetitions of websites.
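A possible way to compute the diameter is to run a breadth-first search from every vertex, as in the sketch below. The text does not say how pairs without a connecting directed path are handled. Here the maximum is taken over reachable ordered pairs only, which is an assumption of this sketch. The function names and the toy graph are hypothetical as well.

```python
from collections import deque

def bfs_distances(graph, source):
    """Distances (in arcs) from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in graph[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def diameter(graph):
    """Largest finite directed distance; unreachable pairs are skipped (assumption)."""
    return max(d for v in graph for d in bfs_distances(graph, v).values())

# Directed 3-cycle plus one isolated vertex.
cycle = {"a": ["b"], "b": ["c"], "c": ["a"], "d": []}
diam = diameter(cycle)
```

In the directed 3-cycle the farthest vertex is always two arcs away, so the sketch yields a diameter of 2, and the isolated vertex `d` does not affect the result.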
The vertex index, cv(G), of a graph G. This invariant
indicates which part of the websites is involved in information
relationships (every website of this part has at least one arc). Let
G be an n-vertex graph with k isolated vertices. Then
\[ c_v(G) = 1 - \frac{k}{n}. \]
The quantity cv(G) reflects the stages of webspace growth.
Namely, cv(G) is close to 0 in the initial stage of forming the
webspace; the value cv(G) = 1 indicates that all websites are
contained in the network.
The arc index, ca(G), of a graph G. The maximal number of
arcs in a directed n-vertex graph is equal to n(n–1), n > 1. Let
G have t arcs. Then the arc index is defined as
\[ c_a(G) = \frac{t}{n(n-1)}. \]
This graph invariant is also referred to as network density
[30]. The quantity ca(G) shows which part of the possible arcs
takes part in exchanges between websites. The maximal value
ca(G) = 1 expresses that one can reach any other website by one
click starting from an arbitrary website.
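The three global invariants defined so far can be checked against the figures reported in the paper: R has 95 vertices, 949 arcs and 2 isolated vertices, and F has 72 vertices and 321 arcs. The number of isolated vertices of F is not stated explicitly, so the value 0 used below is an assumption inferred from cv(F) = 1.00 in Table III. The function and variable names are illustrative.

```python
def average_degree(n, arcs):
    # Every arc contributes exactly one out-degree, so the mean out-degree is arcs / n.
    return arcs / n

def vertex_index(n, isolated):
    return 1 - isolated / n

def arc_index(n, arcs):
    return arcs / (n * (n - 1))

graphs = {
    "R": {"n": 95, "arcs": 949, "isolated": 2},
    "F": {"n": 72, "arcs": 321, "isolated": 0},  # assumption: inferred from cv(F) = 1.00
}
summary = {
    name: (round(average_degree(g["n"], g["arcs"]), 2),
           round(vertex_index(g["n"], g["isolated"]), 2),
           round(arc_index(g["n"], g["arcs"]), 2))
    for name, g in graphs.items()
}
print(summary)
```

The rounded values (9.99, 0.98, 0.11) for R and (4.46, 1.0, 0.06) for F agree with the adeg, cv and ca rows of Table III.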
The betweenness centrality, betw(v), of a vertex v shows the
importance of a vertex in terms of routing and connectivity.
This quantity [31] is a local graph invariant defined as:
\[ \mathrm{betw}(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}, \]
where σst is the total number of directed shortest paths from
vertex s to vertex t and σst(v) is the number of those paths that
pass through v.
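The paper does not say how betw(v) was computed. One standard option for unweighted directed graphs is Brandes' accumulation scheme, sketched below under that assumption. The function name and the toy graph are hypothetical, and the scores are left unnormalised, exactly as in the sum above.

```python
from collections import deque

def betweenness(graph):
    """graph: {v: list of out-neighbours}. Returns unnormalised betw(v) for all v."""
    betw = {v: 0.0 for v in graph}
    for s in graph:
        sigma = {v: 0 for v in graph}   # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in graph}  # predecessors on shortest paths from s
        order = []                      # vertices in order of non-decreasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for u in graph[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
                if dist[u] == dist[v] + 1:
                    sigma[u] += sigma[v]
                    preds[u].append(v)
        delta = {v: 0.0 for v in graph}  # dependency of s on each vertex
        for u in reversed(order):
            for v in preds[u]:
                delta[v] += sigma[v] / sigma[u] * (1 + delta[u])
            if u != s:
                betw[u] += delta[u]
    return betw

# Diamond a -> {b, c} -> d followed by d -> e.
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
scores = betweenness(diamond)
```

In the toy graph the two shortest paths from `a` to `d` (and to `e`) split evenly between `b` and `c`, giving each a score of 1, while `d` lies on every shortest path into `e` and scores 3.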
The clustering coefficient, cc(G), of a graph G. By the
neighborhood of a vertex v, we mean the set of all vertices that are
adjacent to v (without regard to the orientation of arcs). Let V2 be
the set of all vertices of a directed graph G with deg(v) ≥ 2. Let Gv
be the directed subgraph induced by the neighborhood of v. The
clustering coefficient for a vertex v is defined by ca(Gv), i.e., it is
the arc index of Gv [17, 32]. Then the clustering coefficient of G
is the average value of the clustering coefficients over all vertices
of V2, namely:
\[ \mathrm{cc}(G) = \frac{1}{|V_2|}\sum_{v \in V_2} c_a(G_v). \]
The introduced numerical graph invariants are applied to the
webgraphs R and F.
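The clustering coefficient can be sketched as follows. Two points are assumptions of this sketch rather than statements of the paper: V2 is taken as the set of vertices with total degree at least 2, and a vertex of V2 whose neighborhood consists of a single vertex (for which the arc index is undefined) contributes 0. The function name and the toy graph are hypothetical.

```python
def clustering_coefficient(graph):
    """graph: {v: set of out-neighbours}; every vertex must be a key of graph."""
    in_nbrs = {v: set() for v in graph}
    for v, outs in graph.items():
        for u in outs:
            in_nbrs[u].add(v)
    local = []
    for v in graph:
        if len(graph[v]) + len(in_nbrs[v]) < 2:
            continue                      # deg(v) < 2, so v is not in V2
        nbhd = (set(graph[v]) | in_nbrs[v]) - {v}   # orientation of arcs ignored
        n = len(nbhd)
        if n < 2:
            local.append(0.0)             # assumption: single-neighbour case scores 0
            continue
        arcs = sum(1 for x in nbhd for y in graph[x] if y in nbhd)
        local.append(arcs / (n * (n - 1)))  # arc index of the induced subgraph Gv
    return sum(local) / len(local) if local else 0.0

toy = {"a": {"b", "c"}, "b": {"c"}, "c": set(), "d": {"a"}}
cc = clustering_coefficient(toy)
```

In the toy graph the vertices a, b and c belong to V2 with local values 1/6, 1/2 and 1/2, so cc = 7/18 ≈ 0.39, and vertex d (total degree 1) is skipped.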
C. Communities in graphs
We study the community structure in terms of splitting the
vertex set V into non-intersecting subsets (or communities) that
maximize the directed and weighted modification of the modularity
coefficient [18]. Denote by wij the weight of an arc (i,j) of a graph
G with vertex set V = {1, 2, …, n}, and put wij = 0 if there is no
arc (i,j). For the weighted degrees of vertices and the total weight
w, we get wdeg+(i) = Σj wij, wdeg-(i) = Σj wji and
w = Σi wdeg+(i) = Σi wdeg-(i). Then the
modularity Q(G) can be defined as
\[ Q(G) = \frac{1}{w}\sum_{i,j \in V}\left[ w_{ij} - \frac{\mathrm{wdeg}^{+}(i)\,\mathrm{wdeg}^{-}(j)}{w} \right]\delta(C_i, C_j), \]
where Ci is the community (cluster) of vertex i and δ(Ci,Cj) is the
Kronecker symbol. It equals 1 if the vertices i and j are in the same
community; otherwise it equals 0. The unweighted version of
modularity, Qun(G), is obtained from Q(G) by omitting the
weight from every arc. That is, for every arc (i,j) we assign a new
weight wij’ = 1 if wij ≠ 0 and wij’ = 0 otherwise. If a graph G has
q arcs, then Qun(G) can be written as follows [18, 33]:
\[ Q_{un}(G) = \frac{1}{q}\sum_{i,j \in V}\left[ w'_{ij} - \frac{\deg^{+}(i)\,\deg^{-}(j)}{q} \right]\delta(C_i, C_j). \]
The quantities Q and Qun are applied to the graphs R and F.
IV. RESULTS AND DISCUSSION
A. Ranking academic institutions
The final rankings of the websites of academic
institutions of the Siberian Branch of RAS and the
Fraunhofer-Gesellschaft are presented in [20, 21]. The first parts
of the rankings are shown in Table I and Table II. The comparison
of the four webometric quantities for these websites is presented
in Fig. 7 and Fig. 8. From these rankings and the computations
involved (V, S, R and Sc), we can make several observations:
TABLE I. RATING POSITIONS P AND SCORES W FOR INSTITUTIONS OF SB RAS

| P | Name of organization | Website address | W |
|---|---|---|---|
| 1 | Portal of Siberian Branch of Russian Academy of Sciences | www.sbras.ru | 7 |
| 2 | Institute of Computational Technologies | www.ict.nsc.ru | 22 |
| 3 | Institute of Cytology and Genetics | www.bionet.nsc.ru | 22 |
| 4 | Budker Institute of Nuclear Physics | www.inp.nsk.su | 27 |
| 5 | Sobolev Institute of Mathematics | www.math.nsc.ru | 35 |
| 6 | Institute of Computational Modelling | icm.krasn.ru | 35 |
| 7 | State Public Scientific Technological Library | www.spsl.nsc.ru | 42 |
| 8 | A.P. Ershov Institute of Informatics Systems | www.iis.nsk.su | 44 |
| 9 | Branch of SPSL SB RAS | www.prometeus.nsc.ru | 51 |
| 10 | Institute of Automation and Electrometry | www.iae.nsk.su | 53 |
| 11 | Institute of Problems of Development of the North | www.ipdn.ru | 56 |
| 12 | Novosibirsk Institute of Organic Chemistry | www.nioch.nsc.ru | 60 |
| 13 | Boreskov Institute of Catalysis | www.catalysis.ru | 61 |
| 14 | Presidium of SB RAS | www.sbras.nsc.ru | 68 |
| 15 | Kirensky Institute of Physics | www.kirensky.ru | 70 |
TABLE II. RATING POSITIONS P AND SCORES W FOR INSTITUTIONS OF FG

| P | Name of organization | Website address | W |
|---|---|---|---|
| 1 | Fraunhofer Headquarters | www.fraunhofer.de | 6 |
| 2 | Institute for Systems and Innovation Research | www.isi.fraunhofer.de | 26 |
| 3 | Institute for Open Communication Systems | www.fokus.fraunhofer.de | 30 |
| 4 | Institute for Manufacturing Engineering and Automation | www.ipa.fraunhofer.de | 34 |
| 5 | Institute for Industrial Mathematics | www.itwm.fraunhofer.de | 37 |
| 6 | Institute for Solar Energy Systems | www.ise.fraunhofer.de | 42 |
| 7 | Institute for Industrial Engineering | www.iao.fraunhofer.de | 43 |
| 8 | Institute for Laser Technology | www.ilt.fraunhofer.de | 43 |
| 9 | Institute for Integrated Circuits | www.iis.fraunhofer.de | 46 |
| 10 | Institute for Information Center for Planning and Building | www.irb.fraunhofer.de | 59 |
| 11 | Institute for Factory Operation and Automation | www.iff.fraunhofer.de | 62 |
| 12 | Institute for Algorithms and Scientific Computing | www.scai.fraunhofer.de | 72 |
| 13 | Institute for Building Physics | www.ibp.fraunhofer.de | 75 |
| 14 | Institute for Intelligent Analysis and Information Systems | www.iais.fraunhofer.de | 78 |
| 15 | Institute for Wind Energy and Energy System Technology | www.iwes.fraunhofer.de | 82 |
Fig. 7. Final rating scores of the first 15 institutions of SB RAS.
Fig. 8. Final rating scores of the first 15 institutions of FG.
• 27 websites of the SB RAS network and 18 websites of the
FG network have more than 1000 external links.
Therefore, 28% of the websites of the SB RAS network and
25% of the websites of the FG network have sufficiently many
external links;
• 88% of the websites of the SB RAS network and 95% of
the websites of the FG network have more than 100
webpages. The composition of the websites of the SB RAS
and FG networks is similar: the number of websites with
R > 100 is 47 (45%) for SB RAS and 48 (35%) for FG;
• the Google Scholar citation index is greater for the FG
websites than for the websites of the SB RAS network: the
number of websites with parameter Sc exceeding 10 is 42
(44%) for SB RAS and 66 (92%) for FG.
B. Quantitative graph measures
Global quantitative properties of the considered graphs are
presented in Table III. After deleting the administrative hubs, the
average vertex degrees of the graphs decrease by a factor of two,
while the weighted average degrees decrease by a factor of ten.
The diameter of the graphs ranges from 2, for both R and F, to 7.
The vertex index cv indicates that three graphs contain
isolated vertices, i.e., the corresponding websites are not involved
in any communication: within the network, one can neither reach
these websites by a hyperlink nor leave them. In particular, graph
R has 2 isolated vertices. The other invariants of Table III show
that almost all global and local arc saturations of the webgraphs
are very small.
TABLE III. QUANTITATIVE INVARIANTS OF WEBGRAPHS

| Invariant | R (SB RAS) | F (FG) |
|---|---|---|
| average degree, adeg(G) | 9.99 | 4.46 |
| average weighted degree, awdeg(G) | 743.21 | 763.88 |
| graph diameter, diam(G) | 2 | 2 |
| vertex index, cv(G) | 0.98 | 1.00 |
| arc index, ca(G) | 0.11 | 0.06 |
| clustering coefficient, cc(G) | 0.07 | 0.09 |
A local invariant, the betweenness centrality, shows that the
webgraphs are highly centralized. Approximately 8% of the vertices
possess significantly higher betweenness centrality scores and
degrees than the rest. The betweenness centrality
scores are shown in Fig. 9 and Fig. 10. Some structural
properties of webgraphs R and F have been studied in [9].
V. DISCUSSION AND CONCLUSION
The search for modularity maxima has been performed by
using a combination of heuristic algorithms, mainly based on the
tabu search algorithm [33]. The observed heterogeneity
expressed by vertex degrees and betweenness centrality
scores makes it difficult to reveal communities in the network.
The best obtained modularity score was ≈ 0.15 for R and ≈ 0.13
for F (see Table IV). Whenever the algorithm assigned a
community to one of the most influential vertices, the neighbors
of that vertex showed a tendency to fall into the same community.
The numbers of the corresponding communities and their sizes
are presented in Table V.
Fig. 9. Betweenness centrality distribution of graph R.
Fig. 10. Betweenness centrality distribution of graph F.

TABLE IV. MODULARITY RANKS

| | R (SB RAS) | F (FG) |
|---|---|---|
| modularity Q | 0.153 | 0.131 |
| modularity Qun | 0.155 | 0.252 |

TABLE V. NUMBER OF COMMUNITIES AND THEIR SIZES

| | R (SB RAS) | F (FG) |
|---|---|---|
| Q | 5 (46, 41, 6, 1, 1) | 5 (62, 4, 2, 2, 2) |
| Qun | 8 (30, 18, 14, 11, 10, 10, 1, 1) | 7 (21, 10, 10, 9, 9, 9, 4) |
Along with the weighted modularity, we also computed the
unweighted version. The unweighted graph showed a weaker
community structure for R than for F. We
emphasize that omitting the weights gave better partitions
(see Table IV and Table V).
It is natural to assume that the communities in these
academic networks should reflect scientific collaborations
between the corresponding institutes. This hypothesis has been
checked for the SB RAS graph R, for which we grouped the
institutes into communities according to their subject areas.
The resulting subject partition has modularity
Q = 0.115 for R, which is well below the best partitions
found.
ACKNOWLEDGMENT
This work was supported in part by the Austrian Science
Funds (project P26142), the Russian
Foundation for Basic Research (grant 16-01-00499), the Presidium
of RAS (grant 0314-2015-0011), and the Leading Scientific Schools
of the Russian Federation (grant 7214.2016.9).
REFERENCES
[1] S. Chakrabarti, Mining the Web: Discovering Knowledge from Hypertext
Data, San Francisco: Morgan Kaufmann, 2002.
[2] T. Almind and P. Ingwersen, “Informetric analyses on the world wide web:
methodological approaches to ‘webometrics’”, J. Document., vol. 53(4),
1997, pp. 404-426.
[3] M. Dehmer, Strukturelle Analyse web-basierter Dokumente. Multimedia
und Telekooperation, Wiesbaden: Deutscher Universitäts-Verlag, 2006.
[4] M.L. Arslan and S.E. Seker, “Web based reputation index of Turkish
universities”, 2014. http://arxiv.org/ftp/arxiv/papers/1401/1401.7547.pdf
[5] A.A. Pechnikov and A.M. Nwohiri, “Webometric analysis of Nigerian
university websites”, Webology, vol. 9(1), 2012.
[6] Y.I. Shokin, O.A. Klimenko, E.V. Rychkova, and I.V. Shabalnikov,
“Website rating for scientific and research organizations of
the Siberian Branch of Russian Academy of Sciences”, Computational
Technologies, vol. 13(3), 2008, pp. 128-135, (in Russian).
[7] Y.I. Shokin, A.Y. Vesnin, A.A. Dobrynin, O.A. Klimenko,
E.V. Konstantinova, I.S. Petrov, and E.V. Rychkova, “Investigation of the
academic webspace of the Republic of Serbia”, In:
Zbornik radova konferencije MIT 2013, Belgrade, Serbia, 2014, pp. 601-607,
(in Russian). http://www.mit.rs/2013/zbornik-2013.pdf
[8] Y.I. Shokin, A.Y. Vesnin, A.A. Dobrynin, O.A. Klimenko, E.V. Rychkova,
and I.S. Petrov, “Investigation of the academic webspace of the Siberian
Branch of the Russian Academy of Sciences”, Computational
Technologies, vol. 17(6), 2012, pp. 86-98, (in Russian).
[9] Y.I. Shokin, A.Y. Vesnin, A.A. Dobrynin, O.A. Klimenko, and
E.V. Rychkova, “Analysis of a web-space of academic communities by
method of webometrics and graph theory”, Information Technologies, vol.
12, 2014, pp. 31-40, (in Russian).
[10] D. Stuart, M. Thelwall, and G. Harries, “UK academic web links and
collaboration – an exploratory study”, J. Inf. Sci., vol. 33(2), 2007, pp.
231-246.
[11] M. Thelwall and D. Wilkinson, “Graph structure in three national academic
webs: power laws with anomalies”, J. Am. Soc. Inf. Sci. Technol., vol. 54(8),
2003, pp. 706-712.
[12] A.Y. Vesnin, E.V. Konstantinova, and M.Y. Savin, “On scenarios of
attaching new websites to the webspace of SB RAS”, Vestnik NSU:
Information Technologies, vol. 11(4), 2013, pp. 28-37, (in Russian).
[13] M.E.J. Newman, A.L. Barabási, and D.J. Watts, The Structure and
Dynamics of Networks, Princeton Studies in Complexity, Princeton
University Press, 2006.
[14] M. Dehmer and F. Emmert-Streib, “Mining graph patterns in web-based
systems: a conceptual view”, In: A. Mehler, S. Sharoff, G. Rehm, and
M. Santini, Eds., Genres on the Web: Computational Models and
Empirical Studies, Berlin/New York: Springer, 2010, pp. 237-253.
[15] M. Dehmer and F. Emmert-Streib, Eds., Quantitative Graph Theory:
Mathematical Foundations and Applications, Discrete Mathematics and
Its Applications, Chapman and Hall/CRC, 2014.
[16] S. Fortunato, “Community detection in graphs”, Physics Reports, vol.
486(3), 2010, pp. 75-174.
[17] S. Wasserman and K. Faust, Social Network Analysis: Methods and
Applications, Cambridge, UK: Cambridge University Press, 1994.
[18] M.E.J. Newman, “Analysis of weighted networks”, Phys. Rev. E, vol.
70(5), 2004, 056131.
[19] U. Brandes, D. Delling, M. Gaertler, R. Goerke, M. Hoefer, Z. Nikoloski,
and D. Wagner, “On modularity clustering”, IEEE Transactions on Knowledge
and Data Engineering, vol. 20(2), 2008, pp. 172-188.
[20] Hyperlinks between websites of institutes of scientific society
Fraunhofer-Gesellschaft, http://w.ict.nsc.ru/sitepage.php?PageID=1000
[21] Rating of websites of scientific organizations of the Siberian Branch of
RAS, http://w.ict.nsc.ru/ranking/indexen.php?s_InfoID=15
[22] Project Ranking Web of World Research Centers,
http://research.webometrics.info
[23] Web search engine Yandex, http://www.yandex.ru
[24] Web search engine Google, http://www.google.ru
[25] Web search engine Bing, http://www.bing.com
[26] Indexing service of Google Scholar, http://scholar.google.com
[27] F. Harary, Graph Theory, Addison Wesley Publishing Company, 1969.
[28] F. Emmert-Streib and M. Dehmer, “Networks for systems biology:
conceptual connection of data and function”, IET Systems Biology, vol. 5,
2011, pp. 185-207.
[29] A. Mowshowitz and M. Dehmer, “Entropy and the complexity of graphs
revisited”, Entropy, vol. 14(3), 2012, pp. 559-570.
[30] P. Hage and F. Harary, Structural Models in Anthropology,
Cambridge, UK: Cambridge University Press, 1983.
[31] S. Gago, J.C. Hurajová, and T. Madaras, “Betweenness centrality of
graphs”, In: M. Dehmer and F. Emmert-Streib, Eds., Quantitative Graph
Theory: Mathematical Foundations and Applications, Discrete
Mathematics and Its Applications, Chapman and Hall/CRC,
2014, pp. 233-257.
[32] D. Watts and S. Strogatz, “Collective dynamics of ‘small-world’ networks”,
Nature, vol. 393, 1998, pp. 440-442.
[33] A. Arenas, A. Fernández, and S. Gómez, “Analysis of the structure of
complex networks at different resolution levels”, New Journal of
Physics, vol. 10(5), 2008, 053039.

 
10th International Conference on Software Engineering and Applications (SOFEA...
ITIIIndustries
 
10th International Conference on Fuzzy Logic Systems (Fuzzy 2024)
ITIIIndustries
 
10th International Conference on Natural Language Computing (NATL 2024)
ITIIIndustries
 
10th International Conference on Fuzzy Logic Systems (Fuzzy 2024)
ITIIIndustries
 
2nd International Conference on Computer Science and Information Technology A...
ITIIIndustries
 
10th International Conference on Fuzzy Logic Systems (Fuzzy 2024)
ITIIIndustries
 
Call For Papers -10th International Conference on Natural Language Computing ...
ITIIIndustries
 
2nd International Conference on Semantic Technology (SEMTEC 2024)
ITIIIndustries
 
12th International Conference on Artificial Intelligence, Soft Computing (AIS...
ITIIIndustries
 
9th International Conference on Education (EDU 2024)
ITIIIndustries
 
Securing Cloud Computing Through IT Governance
ITIIIndustries
 
Information Technology in Industry(ITII) - November Issue 2018
ITIIIndustries
 
Design of an IT Capstone Subject - Cloud Robotics
ITIIIndustries
 
Dimensionality Reduction and Feature Selection Methods for Script Identificat...
ITIIIndustries
 
Image Matting via LLE/iLLE Manifold Learning
ITIIIndustries
 
Annotating Retina Fundus Images for Teaching and Learning Diabetic Retinopath...
ITIIIndustries
 

Recently uploaded (20)

PPTX
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 
PDF
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
PPTX
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
PPTX
The-Ethical-Hackers-Imperative-Safeguarding-the-Digital-Frontier.pptx
sujalchauhan1305
 
PDF
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
PDF
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
PPTX
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
PDF
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
PDF
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
PDF
How-Cloud-Computing-Impacts-Businesses-in-2025-and-Beyond.pdf
Artjoker Software Development Company
 
PPTX
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
PDF
Cloud-Migration-Best-Practices-A-Practical-Guide-to-AWS-Azure-and-Google-Clou...
Artjoker Software Development Company
 
PDF
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
PDF
The Future of Mobile Is Context-Aware—Are You Ready?
iProgrammer Solutions Private Limited
 
PDF
The Evolution of KM Roles (Presented at Knowledge Summit Dublin 2025)
Enterprise Knowledge
 
PDF
The Future of Artificial Intelligence (AI)
Mukul
 
PDF
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
PDF
Automating ArcGIS Content Discovery with FME: A Real World Use Case
Safe Software
 
PDF
BLW VOCATIONAL TRAINING SUMMER INTERNSHIP REPORT
codernjn73
 
PDF
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
The-Ethical-Hackers-Imperative-Safeguarding-the-Digital-Frontier.pptx
sujalchauhan1305
 
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
How-Cloud-Computing-Impacts-Businesses-in-2025-and-Beyond.pdf
Artjoker Software Development Company
 
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
Cloud-Migration-Best-Practices-A-Practical-Guide-to-AWS-Azure-and-Google-Clou...
Artjoker Software Development Company
 
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
The Future of Mobile Is Context-Aware—Are You Ready?
iProgrammer Solutions Private Limited
 
The Evolution of KM Roles (Presented at Knowledge Summit Dublin 2025)
Enterprise Knowledge
 
The Future of Artificial Intelligence (AI)
Mukul
 
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
Automating ArcGIS Content Discovery with FME: A Real World Use Case
Safe Software
 
BLW VOCATIONAL TRAINING SUMMER INTERNSHIP REPORT
codernjn73
 
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 

Analysis of Webspaces of the Siberian Branch of the Russian Academy of Sciences and the Fraunhofer-Gesellschaft

IT in Industry, vol. 6, 2018. Published online 09-Feb-2018
Copyright © Dehmer, Klimenko, Shokin, Rychkova, Dobrynin, Konstantinova, Medvedev 2018
ISSN (Print): 2204-0595, ISSN (Online): 2203-1731

Matthias Dehmer
UMIT, Hall in Tyrol, Austria

Andrey A. Dobrynin, Elena V. Konstantinova, Andrei Yu. Vesnin
Sobolev Institute of Mathematics SB RAS, Novosibirsk, Russia

Olga A. Klimenko, Yuri I. Shokin, Elena V. Rychkova
Institute of Computational Technologies SB RAS, Novosibirsk, Russia

Alexey N. Medvedev
Central European University, Budapest, Hungary

Abstract: In this paper, two webspaces are investigated: that of the academic institutions of the Siberian Branch of the Russian Academy of Sciences (SB RAS) and that of the Fraunhofer-Gesellschaft (FG), Germany. The webspaces are represented by directed graphs whose vertices correspond to websites. An arc connects two vertices if there exists at least one hyperlink between the corresponding websites. Webometrics is used for ranking the websites of SB RAS and FG. We discuss numerical results obtained by studying the websites structurally. In particular, we examine the communities in the directed graphs that represent the underlying websites and draw conclusions from them.

Keywords: network; webometrics; quantitative measure; communities

I. INTRODUCTION

In this paper, a webspace is a structural object (graph) formed by a set of websites and the hyperlinks between them [1]. To investigate a webspace structurally, we use methods from webometrics, i.e., the contemporary approach to studying the information resources, structure and technological features of the web. The development of webometrics started in 1997 with the seminal paper of Almind and Ingwersen [2].
Methods from webometrics are statistical in nature and do not give a full description of the diverse information processes that occur in a webspace. Therefore, to analyze the structure of webspaces, we use graph-theoretical methods [1, 3]. A large number of contributions in the literature study webspaces formed by the websites of universities and academic institutions [4-12]. Since the number of webspaces that can be studied is practically unlimited, there is still room for research on this topic. In this paper, we contribute to it by studying websites from Russia in order to investigate the underlying institutions specifically.

It is well known that the structure of real-life networks is not random [13]. When dealing with non-random topologies, the structural heterogeneity (or complexity) may be captured by calculating various measures [14, 15]. Another problem in this context is to determine the community structure of networks. Community detection has been one of the hot topics in network science and has received considerable attention in the last decade [16]. The concept of a community in a network is usually derived from the common understanding of communities in social networks [17]. Graph-theoretically, the problem is to identify sets of vertices that are more tightly connected to each other than to the rest of the network. To make the community problem precise, a mathematical quantity Q has been introduced, and the task is to partition the vertex set of a network into subsets so that Q is maximized [18]. However, this problem has been proven to be NP-hard, and only heuristic algorithms are available for maximizing Q [19].

Here we consider webspaces generated by the websites of academic institutions of the Siberian Branch of the Russian Academy of Sciences (SB RAS) and of academic institutions of the Fraunhofer-Gesellschaft (FG), Germany.
The structure of these webspaces is formed by the websites of the scientific institutions and the hyperlinks between them. The websites of SB RAS and FG are ranked by using methods from webometrics, and numerical scores for examining the community structure of the webspaces are determined. Since Russian webspaces have been investigated only a little, we believe that our work will have an impact on the web science community.

II. REPRESENTATION OF WEBSPACES

A simple model for representing the structure of a webspace is a directed weighted graph G = (V, E) with vertex set V and arc set E. We assume that webgraphs [14] do not possess self-loops or multiple arcs. In this paper, the vertices of V correspond to websites. Suppose that the vertices v and u of G correspond to sites X and Y; then an arc (v,u) connects the vertices v and u if there exists at least one hyperlink in X referring to site Y. The number of hyperlinks from X to Y is represented by the weight w of the arc (v,u). The distance d(v,u) between vertices v and u in a graph is the number of arcs in a shortest directed path connecting them.

Fig. 1. Webgraph R of institutes of Siberian Branch of RAS.
Fig. 2. Webgraph F of institutes of Fraunhofer-Gesellschaft.

Let R and F be the webgraphs of SB RAS and FG, respectively. Their structures are shown in Fig. 1 and Fig. 2. Graph R has 95 vertices and 949 arcs, while F consists of 72 vertices and 321 arcs. Additional information on these graphs can be found in [20, 21].

The number of arcs leaving (entering) a vertex v is called its out-degree deg+(v) (in-degree deg-(v)). The pair (deg+(v), deg-(v)) describes the size of the neighborhood of v. The weighted out-degree wdeg+(v) (in-degree wdeg-(v)) is the sum of the weights of the arcs leaving (entering) a vertex v. Total degrees are defined as deg(v) = deg+(v) + deg-(v) and wdeg(v) = wdeg+(v) + wdeg-(v). A vertex v is called isolated if deg(v) = 0. The degree distributions of the graphs R and F are shown in Fig. 3 to Fig. 6.

III. METHODS

A. Ranking academic institutions by using webometrics

The "Ranking Web of World Research Centers" is an initiative of the Cybermetrics Lab, a research group belonging to the Consejo Superior de Investigaciones Cientificas (CSIC), Spain. Quantitative methods have been designed to measure scientific activity on the Web in order to determine ratings of universities and research centers of various countries [22]. The cybermetric indicators have been useful for evaluating science and technology, and they complement the results obtained by the bibliometric methods used in scientometric studies.

Fig. 3. Degree distribution of graph R.
Fig. 4. Degree distribution of graph F.
Fig. 5. Weighted degree distribution of graph R.

Since 2008, the Institute of Computational Technologies of the Siberian Branch of the Russian Academy of Sciences (SB RAS) has generated ratings of the websites of scientific institutions of SB RAS [6, 8, 21]. The ranking method is presented in [22]. In this paper, we extract statistics from three major search engines: Yandex [23], Google [24], and Bing [25]. To evaluate websites, the method uses the following parameters:
• V – visibility. The parameter equals the number of external links from other websites to the considered one. Since the data from different engines differ, the average value is taken: V = (V_Yandex + V_Google + V_Bing)/3.
• S – size. The parameter equals the number of webpages of the website as determined by the search engines. Again, we use the average value: S = (S_Yandex + S_Google + S_Bing)/3.
• R – richness. The parameter equals the number of documents on the website with the file extensions of Adobe Acrobat (.pdf), Microsoft Word (.doc) and PowerPoint (.ppt). The quantity is determined by search engine queries, so we again average: R = (R_Yandex + R_Google)/2.
• Sc – citation index obtained from the citation system Google Scholar [26]. This parameter reflects the academic importance of the website.

The overall rating evaluation includes the following steps.
1. Evaluation of the visibility V, size S and richness R parameters for all websites in the network.
2. Ranking the values of V, S and R. The parameter array, say V, is ranked in decreasing order. The website with maximal V receives rank Vr = 1. Websites with identical values of V get equal ranks. Similarly, we compute the ranks Sr and Rr from the parameters S and R for each website in the network.
3. Evaluation of the rank of the citation index Sc. We compute the values Sc. The rank Scr is obtained by ordering these values. The website with the minimal value receives the rank Scr = 1.
4. Computing the sum of the obtained ranks for each website: W = Vr + Sr + Rr + Scr.
5. The final rating is obtained by sorting the list of W scores in increasing order. Therefore, the lower the value of W, the higher the rank (rating position) of the website.

B. Quantitative measures for webgraphs

One of the common approaches to studying web structures is based on quantifying structural information by using various quantitative measures [14, 15]. Usually, a quantitative graph measure is a graph invariant that maps a set of graphs to a set of numbers such that the invariant values coincide for isomorphic graphs [27]. Such invariants can quantify either local or global properties of graphs. Local measures, as a rule, describe the graph structure near particular vertices. In contrast, global measures encode structural information about the entire graph. Some global invariants may be regarded as complexity measures of a graph [28, 29]. We consider the following graph invariants.

The average degree, adeg(G), of an n-vertex graph G is the average value

$$\operatorname{adeg}(G) = \frac{1}{n}\sum_{v \in V} \deg^{+}(v) = \frac{1}{n}\sum_{v \in V} \deg^{-}(v).$$

The weighted analogue, awdeg(G), is given by the formula

$$\operatorname{awdeg}(G) = \frac{1}{n}\sum_{v \in V} \operatorname{wdeg}^{+}(v) = \frac{1}{n}\sum_{v \in V} \operatorname{wdeg}^{-}(v).$$

The diameter, diam(G), of a graph G is the largest distance between two vertices:

$$\operatorname{diam}(G) = \max\{\, d(u,v) \mid u, v \in V \,\}.$$

It says how far one can travel in a webspace without any repetition of websites.

The vertex index, cv(G), of a graph G. This invariant indicates which fraction of the websites is involved in information relationships (every such website has at least one arc). Let G be an n-vertex graph with k isolated vertices. Then

$$c_v(G) = 1 - \frac{k}{n}.$$

The quantity cv(G) reflects the stages of webspace growth. Namely, cv(G) is close to 0 in the initial stage of forming the webspace; the value cv(G) = 1 indicates that all websites are contained in the network.

The arc index, ca(G), of a graph G. The maximal number of arcs in a directed n-vertex graph is equal to n(n-1), n > 1. Let G have t arcs. Then the arc index is defined as

$$c_a(G) = \frac{t}{n(n-1)}.$$

This graph invariant is also referred to as network density [30]. The quantity ca(G) shows which fraction of the possible arcs is present in the webgraph. The maximal value ca(G) = 1 expresses that one can reach any other website by one click starting from an arbitrary website.
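The global invariants defined so far can be computed directly from an arc list. The following is a minimal sketch, not the authors' implementation: the function name `global_invariants`, the dictionary representation of arcs, and the toy graph are illustrative assumptions. It also assumes that the diameter is taken over ordered pairs that are actually connected by a directed path, since d(u,v) is otherwise undefined.

```python
from collections import deque


def global_invariants(vertices, arcs):
    """Compute adeg, awdeg, diam, cv and ca for a directed weighted graph.

    vertices: list of vertex labels (n > 1 is assumed).
    arcs: dict mapping an ordered pair (u, v) to the arc weight,
          with no self-loops and no multiple arcs.
    """
    n = len(vertices)
    t = len(arcs)  # every arc contributes 1 to exactly one out-degree

    out_nbrs = {v: [] for v in vertices}
    non_isolated = set()
    for u, v in arcs:
        out_nbrs[u].append(v)
        non_isolated.update((u, v))
    k = n - len(non_isolated)  # number of isolated vertices

    # Assumption: only reachable ordered pairs contribute to the diameter.
    diam = 0
    for s in vertices:
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search along arc directions
            x = queue.popleft()
            for y in out_nbrs[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        diam = max(diam, max(dist.values()))

    return {
        "adeg": t / n,
        "awdeg": sum(arcs.values()) / n,
        "diam": diam,
        "cv": 1 - k / n,
        "ca": t / (n * (n - 1)),
    }


# Hypothetical toy webgraph: three linked sites and one isolated site.
toy_vertices = ["a", "b", "c", "d"]
toy_arcs = {("a", "b"): 3, ("b", "c"): 1, ("c", "a"): 2, ("a", "c"): 4}
toy_invariants = global_invariants(toy_vertices, toy_arcs)
```

As a plausibility check of the formulas, n = 95, t = 949 and k = 2 give adeg ≈ 9.99, cv ≈ 0.98 and ca ≈ 0.11, in line with the values reported for R in Table III.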
The betweenness centrality, betw(v), of a vertex v shows the importance of the vertex in terms of routing and connectivity. This quantity [31] is a local graph invariant defined as

$$\operatorname{betw}(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}},$$

where σst is the total number of directed shortest paths from vertex s to vertex t and σst(v) is the number of those paths that pass through v.

The clustering coefficient, cc(G), of a graph G. By the neighborhood of a vertex v, we mean all vertices that are adjacent to v (disregarding the orientation of arcs). Let V2 be the set of all vertices of a directed graph G with deg(v) = 2. Let Gv be the directed subgraph induced by the neighborhood of v. The clustering coefficient of a vertex v is defined as ca(Gv), i.e., it is the arc index of Gv [17, 32]. Then the clustering coefficient of G is the average value of the clustering coefficients over all vertices of V2, namely

$$cc(G) = \frac{1}{|V_2|}\sum_{v \in V_2} c_a(G_v).$$

The introduced numerical graph invariants are applied to the webgraphs R and F.
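To illustrate the betweenness formula, the sketch below counts directed shortest paths with breadth-first search and accumulates the ratios σst(v)/σst using Brandes' well-known accumulation scheme. The paper does not state which algorithm was used, so this choice, the function name `betweenness`, and the input format are assumptions made for illustration. Arc weights are ignored because distances are measured in numbers of arcs.

```python
from collections import deque


def betweenness(vertices, arcs):
    """Betweenness centrality betw(v) of every vertex of a directed graph.

    vertices: list of vertex labels.
    arcs: iterable of ordered pairs (u, v); a dict keyed by (u, v) also works.
    """
    out_nbrs = {v: [] for v in vertices}
    for u, v in arcs:
        out_nbrs[u].append(v)

    betw = {v: 0.0 for v in vertices}
    for s in vertices:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in vertices}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in vertices}
        order = []
        queue = deque([s])
        while queue:
            x = queue.popleft()
            order.append(x)
            for y in out_nbrs[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
                if dist[y] == dist[x] + 1:  # x precedes y on a shortest path
                    sigma[y] += sigma[x]
                    preds[y].append(x)

        # Accumulate dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in vertices}
        for y in reversed(order):
            for x in preds[y]:
                delta[x] += sigma[x] / sigma[y] * (1 + delta[y])
            if y != s:
                betw[y] += delta[y]
    return betw
```

For example, in a hypothetical hub-and-spoke webgraph where three sites link only to and from a central portal, all six ordered pairs of outer sites route through the portal, so the portal has betweenness 6 and every outer site has betweenness 0.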
C. Communities in graphs

We study the community structure in terms of splitting the vertex set V into non-intersecting subsets (communities) that maximize the directed and weighted modification of the modularity coefficient [18]. Denote by wij the weight of an arc (i,j) of a graph G with vertex set V = {1, 2, …, n}. For the weighted degrees of the vertices and the total weight w, we have wdeg+(i) = Σ(i,j) wij, wdeg-(i) = Σ(j,i) wji and w = Σi wdeg+(i) = Σi wdeg-(i). Then the modularity Q(G) can be defined as

$$Q(G) = \frac{1}{w}\sum_{i,j \in V}\left[ w_{ij} - \frac{\operatorname{wdeg}^{+}(i)\,\operatorname{wdeg}^{-}(j)}{w} \right]\delta(C_i, C_j),$$

where Ci is the cluster of vertex i and δ(Ci,Cj) is the Kronecker symbol. It equals 1 if the vertices i and j are in the same community; otherwise it equals 0. The unweighted version of modularity, Qun(G), is obtained from Q(G) by omitting the weight of every arc. That is, to every arc (i,j) we assign a new weight wij' = 1 if wij ≠ 0 and wij' = 0 otherwise. If a graph G has q arcs, then Qun(G) can be written as follows [18, 33]:

$$Q_{un}(G) = \frac{1}{q}\sum_{i,j \in V}\left[ w'_{ij} - \frac{\deg^{+}(i)\,\deg^{-}(j)}{q} \right]\delta(C_i, C_j).$$

The quantities Q and Qun are applied to the graphs R and F.

IV. RESULTS AND DISCUSSION

A. Ranking academic institutions

The final rankings of the websites of the academic institutions of the Siberian Branch of RAS and of the Fraunhofer-Gesellschaft are presented in [20, 21]. The first parts of the rankings are shown in Table I and Table II. A comparison of the four webometric quantities for these websites is presented in Fig. 7 and Fig. 8. From these rankings and the computations involved (V, S, R and Sc), we can make several observations:

TABLE I. RATING SCORES P FOR INSTITUTIONS OF SB RAS

| P | Name of organization | Website address | W |
|---|---|---|---|
| 1 | Portal of Siberian Branch of Russian Academy of Sciences | www.sbras.ru | 7 |
| 2 | Institute of Computational Technologies | www.ict.nsc.ru | 22 |
| 3 | Institute of Cytology and Genetics | www.bionet.nsc.ru | 22 |
| 4 | Budker Institute of Nuclear Physics | www.inp.nsk.su | 27 |
| 5 | Sobolev Institute of Mathematics | www.inp.nsk.su | 35 |
| 6 | Institute of Computing Simulation | icm.krasn.ru | 35 |
| 7 | State Public Scientific Technological Library | www.spsl.nsc.ru | 42 |
| 8 | A.P. Ershov Institute of Informatics Systems | www.iis.nsk.su | 44 |
| 9 | Branch of SPSL SB RAS | www.prometeus.nsc.ru | 51 |
| 10 | Institute of Automation and Electrometry | www.iae.nsk.su | 53 |
| 11 | Institute of Problems of Development of the North | www.ipdn.ru | 56 |
| 12 | Novosibirsk Institute of Organic Chemistry | www.nioch.nsc.ru | 60 |
| 13 | Boreskov Institute of Catalysis | www.catalysis.ru | 61 |
| 14 | Presidium of SB RAS | www.sbras.nsc.ru | 68 |
| 15 | Kirensky Institute of Physics | www.kirensky.ru | 70 |

TABLE II. RATING SCORES P FOR INSTITUTIONS OF FG

| P | Name of organization | Website address | W |
|---|---|---|---|
| 1 | Fraunhofer Headquarters | www.fraunhofer.de | 6 |
| 2 | Institute for Systems and Innovation Research | www.isi.fraunhofer.de | 26 |
| 3 | Institute for Open Communication Systems | www.fokus.fraunhofer.de | 30 |
| 4 | Institute for Manufacturing Engineering and Automation | www.ipa.fraunhofer.de | 34 |
| 5 | Institute for Industrial Mathematics | www.itwm.fraunhofer.de | 37 |
| 6 | Institute for Solar Energy Systems | www.ise.fraunhofer.de | 42 |
| 7 | Institute for Industrial Engineering | www.iao.fraunhofer.de | 43 |
| 8 | Institute for Laser Technology | www.ilt.fraunhofer.de | 43 |
| 9 | Institute for Integrated Circuits | www.iis.fraunhofer.de | 46 |
| 10 | Institute for Information Center for Planning and Building | www.irb.fraunhofer.de | 59 |
| 11 | Institute for Factory Operation and Automation | www.iff.fraunhofer.de | 62 |
| 12 | Institute for Algorithms and Scientific Computing | www.scai.fraunhofer.de | 72 |
| 13 | Institute for Building Physics | www.ibp.fraunhofer.de | 75 |
| 14 | Institute for Intelligent Analysis and Information Systems | www.iais.fraunhofer.de | 78 |
| 15 | Institute for Wind Energy and Energy System Technology | www.iwes.fraunhofer.de | 82 |

Fig. 7. Final rating scores of the first 15 institutions of SB RAS.
Fig. 8. Final rating scores of the first 15 institutions of FG.

• 27 websites of the SB RAS network and 18 websites of the FG network have more than 1000 external links.
Therefore, 28% of the websites of the SB RAS network and 25% of the websites of the FG network have sufficiently many external links;
• 88% of the websites of the SB RAS network and 95% of the websites of the FG network have more than 100 webpages. The composition of the websites of the SB RAS and FG networks is similar: the number of websites for SB RAS with R > 100 is 47 (45%); for FG we obtain 48 (35%);
• the Google Scholar citation index is greater for FG websites than for websites of the SB RAS network: the number of websites with parameter Sc exceeding 10 is 42 (44%) for SB RAS and 66 (92%) for FG.

B. Quantitative graph measures

Global quantitative properties of the considered graphs are presented in Table III. The average vertex degrees of the graphs decrease by a factor of two, while the weighted average degrees decrease by a factor of ten, after the administrative hubs are deleted. The diameter of the graphs ranges from 2, for R and F, to 7. The vertex index cv indicates that three graphs contain isolated vertices, i.e., the corresponding websites are not involved in any communication. That means one can neither reach these websites from other websites of the network nor leave them. Namely, graph R has 2 isolated vertices. The other invariants of Table III show that almost all global and local arc saturations of the webgraphs are very small.

TABLE III. QUANTITATIVE INVARIANTS OF WEBGRAPHS

| Invariant | R (SB RAS) | F (FG) |
|---|---|---|
| average degree, adeg(G) | 9.99 | 4.46 |
| average weighted degree, awdeg(G) | 743.21 | 763.88 |
| graph diameter, diam(G) | 2 | 2 |
| vertex index, cv(G) | 0.98 | 1.00 |
| arc index, ca(G) | 0.11 | 0.06 |
| clustering coefficient, cc(G) | 0.07 | 0.09 |

A local invariant, the betweenness centrality, shows that the webgraphs are very centralized.
Approximately 8% of the vertices have a significantly higher betweenness centrality score and degree than the rest. The betweenness centrality scores are shown in Fig. 9 and Fig. 10. Some structural properties of the webgraphs R and F have been studied in [9].

Fig. 9. Betweenness centrality distribution of graph R.
Fig. 10. Betweenness centrality distribution of graph F.

V. DISCUSSION AND CONCLUSION

The search for modularity maxima has been performed by using a combination of heuristic algorithms, mainly based on the tabu search algorithm [33]. The observed heterogeneity expressed by the vertex degrees and the betweenness centrality scores makes it difficult to reveal communities in the network. The best modularity score obtained was ≈ 0.15 for R and ≈ 0.13 for F (see Table IV). Whenever the algorithm assigned one of the most influential vertices to a community, its neighbors showed a tendency to fall into the same community. The numbers of the corresponding communities and their sizes are presented in Table V.

TABLE IV. MODULARITY RANKS

| | R (SB RAS) | F (FG) |
|---|---|---|
| modularity Q | 0.153 | 0.131 |
| modularity Qun | 0.155 | 0.252 |

TABLE V. NUMBER OF COMMUNITIES AND THEIR SIZES

| | R (SB RAS) | F (FG) |
|---|---|---|
| Q | 5 (46, 41, 6, 1, 1) | 5 (62, 4, 2, 2, 2) |
| Qun | 8 (30, 18, 14, 11, 10, 10, 1, 1) | 7 (21, 10, 10, 9, 9, 9, 4) |

Along with the weighted modularity, we also computed the unweighted version. The unweighted graph showed a weaker community structure for R than for F. We emphasize that by omitting the weights, we obtained better partitions (see Table IV and Table V). It is natural to assume that the communities in these academic networks reflect scientific collaborations between the corresponding institutes. This hypothesis has been checked for the SB RAS graph R, for which we composed a partition into communities based on the subject areas of the institutes. The resulting subject partition has modularity Q = 0.115 for R, which is well below the best value obtained (0.153).
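The modularity values above depend on evaluating Q for a candidate partition, which is the inner step of any heuristic search. The sketch below evaluates the directed weighted modularity of Section III-C (and, with `weighted=False`, the unweighted version Qun) for a fixed partition. It is an illustration under stated assumptions, not the tabu search of [33]: the function name `modularity` and the input format are hypothetical, and the search over partitions is deliberately left out.

```python
def modularity(arcs, community, weighted=True):
    """Evaluate Q (weighted=True) or Q_un (weighted=False) for a fixed partition.

    arcs: dict mapping an ordered pair (i, j) to the weight w_ij.
    community: dict mapping every vertex to its community label.
    """
    out_sum = {}  # community -> sum of (weighted) out-degrees of its vertices
    in_sum = {}   # community -> sum of (weighted) in-degrees of its vertices
    internal = 0.0  # total weight of arcs with both ends in one community
    total = 0.0     # w in the weighted case, q in the unweighted case

    for (i, j), w_ij in arcs.items():
        # Unweighted version: every arc with non-zero weight counts as 1.
        x = w_ij if weighted else (1.0 if w_ij != 0 else 0.0)
        total += x
        out_sum[community[i]] = out_sum.get(community[i], 0.0) + x
        in_sum[community[j]] = in_sum.get(community[j], 0.0) + x
        if community[i] == community[j]:
            internal += x

    # Summing wdeg+(i) * wdeg-(j) over pairs in the same community
    # factorizes into a product of per-community sums.
    expected = sum(out_sum[c] * in_sum.get(c, 0.0) for c in out_sum) / total
    return (internal - expected) / total
```

Grouping the null-model term by community avoids the double sum over all vertex pairs, so one evaluation costs time proportional to the number of arcs. For two directed 3-cycles joined by a single arc, with each cycle taken as one community, the function returns 18/49 ≈ 0.367, while placing all vertices in one community gives 0.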
ACKNOWLEDGMENT

This work was supported in part by the Austrian Science Fund (project P26142), the Russian Foundation for Basic Research (grant 16-01-00499), the Presidium of RAS (grant 0314-2015-0011), and the Leading Scientific Schools of the Russian Federation (grant 7214.2016.9).
REFERENCES

[1] S. Chakrabarti, Mining the Web: Discovering Knowledge from Hypertext Data, San Francisco: Morgan Kaufmann, 2002.
[2] T. Almind and P. Ingwersen, "Informetric analyses on the world wide web: methodological approaches to 'webometrics'", J. Document., vol. 53(4), 1997, pp. 404-426.
[3] M. Dehmer, Strukturelle Analyse web-basierter Dokumente, Multimedia und Telekooperation, Wiesbaden: Deutscher Universitäts-Verlag, 2006.
[4] M.L. Arslan and S.E. Seker, "Web based reputation index of Turkish universities", 2014. http://arxiv.org/ftp/arxiv/papers/1401/1401.7547.pdf
[5] A.A. Pechnikov and A.M. Nwohiri, "Webometric analysis of Nigerian university websites", Webology, vol. 9(1), 2012.
[6] Y.I. Shokin, O.A. Klimenko, E.V. Rychkova, and I.V. Shabalnikov, "Website rating for scientific and research organizations of the Siberian Branch of Russian Academy of Sciences", Computational Technologies, vol. 13(3), 2008, pp. 128-135 (in Russian).
[7] Y.I. Shokin, A.Y. Vesnin, A.A. Dobrynin, O.A. Klimenko, E.V. Konstantinova, I.S. Petrov, and E.V. Rychkova, "Investigation of the academic webspace of the Republic of Serbia", In: Zbornik radova konferencije MIT 2013, Belgrade, Serbia, 2014, pp. 601-607 (in Russian). http://www.mit.rs/2013/zbornik-2013.pdf
[8] Y.I. Shokin, A.Y. Vesnin, A.A. Dobrynin, O.A. Klimenko, E.V. Rychkova, and I.S. Petrov, "Investigation of the academic webspace of the Siberian Branch of the Russian Academy of Sciences", Computational Technologies, vol. 17(6), 2012, pp. 86-98 (in Russian).
[9] Y.I. Shokin, A.Y. Vesnin, A.A. Dobrynin, O.A. Klimenko, and E.V. Rychkova, "Analysis of a web-space of academic communities by method of webometrics and graph theory", Information Technologies, vol. 12, 2014, pp. 31-40 (in Russian).
[10] D. Stuart, M. Thelwall, and G. Harries, "UK academic web links and collaboration – an exploratory study", J. Inf. Sci., vol. 33(2), 2007, pp. 231-246.
[11] M. Thelwall and D. Wilkinson, "Graph structure in three national academic webs: power laws with anomalies", J. Am. Soc. Inf. Sci. Technol., vol. 54(8), 2003, pp. 706-712.
[12] A.Y. Vesnin, E.V. Konstantinova, and M.Y. Savin, "On scenarios of attaching new websites to the webspace of SB RAS", Vestnik NSU: Information Technologies, vol. 11(4), 2013, pp. 28-37 (in Russian).
[13] M.E.J. Newman, A.L. Barabási, and D.J. Watts, The Structure and Dynamics of Networks, Princeton Studies in Complexity, Princeton University Press, 2006.
[14] M. Dehmer and F. Emmert-Streib, "Mining graph patterns in web-based systems: a conceptual view", In: A. Mehler, S. Sharoff, G. Rehm, and M. Santini, Eds., Genres on the Web: Computational Models and Empirical Studies, Berlin/New York: Springer, 2010, pp. 237-253.
[15] M. Dehmer and F. Emmert-Streib, Eds., Quantitative Graph Theory: Mathematical Foundations and Applications, Discrete Mathematics and Its Applications, Chapman and Hall/CRC, 2014.
[16] S. Fortunato, "Community detection in graphs", Physics Reports, vol. 486(3), 2010, pp. 75-174.
[17] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge, UK: Cambridge University Press, 1994.
[18] M.E.J. Newman, "Analysis of weighted networks", Phys. Rev. E, vol. 70(5), 2004, 056131.
[19] U. Brandes, D. Delling, M. Gaertler, R. Goerke, M. Hoefer, Z. Nikoloski, and D. Wagner, "On modularity clustering", IEEE Transactions on Knowledge and Data Engineering, vol. 20(2), 2008, pp. 172-188.
[20] Hyperlinks between websites of institutes of scientific society Fraunhofer-Gesellschaft, http://w.ict.nsc.ru/sitepage.php?PageID=1000.
[21] Rating of websites of scientific organizations of the Siberian Branch of RAS, http://w.ict.nsc.ru/ranking/indexen.php?s_InfoID=15
[22] Project Ranking Web of World Research Centers, http://research.webometrics.info
[23] Web search engine Yandex, http://www.yandex.ru
[24] Web search engine Google, http://www.google.ru
[25] Web search engine Bing, http://www.bing.com
[26] Indexing service of Google Scholar, http://scholar.google.com
[27] F. Harary, Graph Theory, Addison Wesley Publishing Company, 1969.
[28] F. Emmert-Streib and M. Dehmer, "Networks for systems biology: conceptual connection of data and function", IET Systems Biology, vol. 5, 2011, pp. 185-207.
[29] A. Mowshowitz and M. Dehmer, "Entropy and the complexity of graphs revisited", Entropy, vol. 14(3), 2012, pp. 559-570.
[30] P. Hage and F. Harary, Structural Models in Anthropology, Cambridge, UK: Cambridge University Press, 1983.
[31] S. Gago, J.C. Hurajová, and T. Madaras, "Betweenness centrality of graphs", In: M. Dehmer and F. Emmert-Streib, Eds., Quantitative Graph Theory: Mathematical Foundations and Applications, Discrete Mathematics and Its Applications, Chapman and Hall/CRC, 2014, pp. 233-257.
[32] D. Watts and S. Strogatz, "Collective dynamics of 'small-world' networks", Nature, vol. 393, 1998, pp. 440-442.
[33] A. Arenas, A. Fernández, and S. Gómez, "Analysis of the structure of complex networks at different resolution levels", New Journal of Physics, vol. 10(5), 2008, 053039.