Focus on Domain
4 Levels of analysis
Data analysis
An overview of the collection
Main information about the collection
Dataset Box
Main Information
The most important information about the collection is shown in the boxes
Main information about the collection
Dataset Tab
Main Information
Authors Collaboration measures (Elango and Rajendran, 2012; Koseoglu, 2016):
International co-authorships % = (Multiple Countries Publications / Total Publications) * 100
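As a rough illustration of the percentage above, the sketch below counts the documents whose authors belong to more than one country. The input layout (one list of author countries per document) and the function name are assumptions made for this example; they are not the bibliometrix data model.

```python
def international_coauthorship_pct(doc_countries):
    """Share (in %) of documents whose authors come from more than one country.

    doc_countries: one list of author countries per document (hypothetical layout).
    """
    if not doc_countries:
        return 0.0
    multi_country = sum(1 for countries in doc_countries if len(set(countries)) > 1)
    return multi_country / len(doc_countries) * 100


# Toy collection: 2 of the 4 documents involve more than one country.
docs = [
    ["ITALY", "ITALY"],
    ["ITALY", "SPAIN"],
    ["USA"],
    ["CHINA", "USA", "CHINA"],
]
pct = international_coauthorship_pct(docs)
print(f"International co-authorships: {pct:.1f}%")
```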
Annual scientific production
Dataset Tab
Annual Scientific Production
All the graphs generated with plotly are dynamic (hovering the mouse over a graph shows more information)
This information is contained in the TABLE menu
Average Citations per Year
In our collection, the documents published in 2010 collect the highest average number of total citations per year
Dataset Tab
Average Citations per Year
Three-Fields Plot:
Focus on the Top Keywords
Relationship among Top Keywords, Top Authors and Top Journals summarized by a Sankey Plot
Dataset Tab
Three-Fields Plot
The graph can be created by selecting 3 of the main meta-data fields
Three-Fields Plot:
Focus on the Top Authors
Dataset Tab
Three-Fields Plot
Intellectual
roots
Research
contents
Data analysis
Level: Sources
Most Relevant Sources
Sources Tab
Most Relevant Sources
A Source is a journal, book, conference proceedings series, etc. that published one or more documents included in
our collection. E.g., our collection contains 151 sources
Most Cited Sources (from Reference Lists)
Sources Tab
Most Cited Sources
A Cited Source is a journal, book, conference proceedings series, etc. included in at least one of the reference lists of the
document set. E.g., our collection contains 35,603 cited sources in the 4,441 document bibliographies.
«a cited source is a source cited by one or more documents»
Bradford’s law states that:
“if the journals are arranged in descending order of the number of
articles they carried on the subject, then successive zones of
periodicals containing the same number of articles on the subject form
the simple geometric series 1 : n : n² : …”
Bradford called the first zone the nucleus of
journals particularly devoted to the given subject
Sources Tab
Bradford’s law
Source clustering through Bradford's Law
Core
zone
Zone 3
Zone 2
Sources Tab
Bradford’s law
One journal publishes about 42% of the documents of the entire collection. The Core Zone is composed of just 1 journal out of 151
Source clustering through Bradford's Law
SCIENTOMETRICS published 1,856 documents
Sources Tab
Bradford’s law
Zone          Journals  Articles  Share of documents
Core zone     1         1,856     42% (Scientometrics)
Middle zone   8         1,154     26%
Minor zone    142       1,431     32%
Bradford’s law can be used to identify “core” journals in a discipline and to focus the
analysis on the core zone documents
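The zoning idea can be sketched in a few lines. The version below ranks sources by number of documents and assigns a source to zone 1, 2 or 3 according to the cumulative share of documents reached before that source is added, with cut points at one third and two thirds of the collection. The cut points and the function name are assumptions of this sketch, not a statement of how bibliometrix computes the zones.

```python
def bradford_zones(source_counts):
    """Assign each source to zone 1 (core), 2 or 3.

    source_counts: dict mapping source name -> number of documents.
    Assumption: zones are cut at 1/3 and 2/3 of the cumulative document count.
    """
    ranked = sorted(source_counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(source_counts.values())
    zones = {}
    cumulative = 0  # documents published by the sources ranked above the current one
    for source, n_docs in ranked:
        if cumulative * 3 < total:
            zones[source] = 1
        elif cumulative * 3 < total * 2:
            zones[source] = 2
        else:
            zones[source] = 3
        cumulative += n_docs
    return zones


# Toy data (not the tutorial collection): one dominant source forms the core zone.
toy_sources = {"A": 50, "B": 20, "C": 15, "D": 5, "E": 5, "F": 5}
print(bradford_zones(toy_sources))
```

With a single source holding more than a third of the documents, as Scientometrics does here, this rule yields a one-journal core zone.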
Source clustering through Bradford's Law
Source impact
by H-index and its generalizations
Sources Tab
Source Impact
The H-index (Hirsch index) of an author (or journal) is the
largest number h such that h of its published articles
have each been cited in other papers at least h times
H-index and its generalizations
Sources Tab
Source Impact
The m-index is defined as H/n, where H is the H-index and n is the number of years since
the first published paper of the scientist (journal).
The g-index was introduced by Egghe in 2006 as an improvement of the h-index, in
order to measure the global citation performance of a set of articles. If the set is ranked in
decreasing order of the number of citations received, the g-index is the (unique)
largest number such that the top g articles received (together) at least g² citations.
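The three indices can be computed directly from a list of citation counts. The following is a minimal sketch under the definitions given above; the function names are chosen for this example, and the g-index is capped at the number of articles in the set, which is one common convention.

```python
def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def g_index(citations):
    """Largest g such that the top g articles together have at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    running_total, g = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        running_total += cites
        if running_total >= rank * rank:
            g = rank
    return g


def m_index(h, years_since_first_paper):
    """H-index divided by the number of years since the first published paper."""
    return h / years_since_first_paper


cites = [10, 8, 5, 4, 3]          # citations of five articles (toy data)
h = h_index(cites)                # 4 articles have at least 4 citations each
g = g_index(cites)                # top 5 articles total 30 >= 25 citations
print(h, g, m_index(h, 8))
```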
Source Dynamics
Sources Tab
Source Dynamics
Data analysis
Level: Authors
(and their affiliations and countries)
Most Relevant Authors
per n. of Authored Documents
Authors Tab
Most Relevant Authors
Most Relevant Authors
per Fractionalized N. of Authored Documents
Authors Tab
Most Relevant Authors
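Fractional authorship quantifies an individual author's contributions to a published set of papers.
Each document with n co-authors counts 1/n for each of its authors; an author's fractionalized frequency is the sum of these shares over all of the author's documents.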
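A minimal sketch of fractional counting, assuming every co-author of a document receives an equal share 1/n of that document. The input layout (a list of author lists) and the function name are illustrative only.

```python
from collections import defaultdict


def fractionalized_frequency(docs):
    """Sum, for each author, the share 1/n of every document with n co-authors.

    docs: one list of author names per document (hypothetical layout).
    """
    freq = defaultdict(float)
    for authors in docs:
        unique_authors = set(authors)
        for author in unique_authors:
            freq[author] += 1 / len(unique_authors)
    return dict(freq)


docs = [["A", "B"], ["A"], ["A", "B", "C", "D"]]
frac = fractionalized_frequency(docs)
# Author A: 1/2 + 1 + 1/4 = 1.75 fractionalized documents (3 documents in full counting)
print(frac["A"])
```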
Most Relevant Authors
Authors Tab
Most Relevant Authors
Most Local Cited Authors
Authors Tab
Most Local Cited Authors
The bubble size is proportional to the n. of documents (Bornmann authored 19 documents in 2016)
Authors Production over Time
Authors Tab
Authors Production over Time
The line represents an author's timeline (Moed has the longest timeline, from 1985 to 2021)
The color intensity is proportional to the total citations per year (Bornmann's 2015 documents have collected 106 total citations per year)
Author Productivity through Lotka's Law
Authors Tab
Lotka’s law
Lotka's law describes the frequency of publication by authors in any given field.
Lotka’s law is an approximate inverse-square law: the number of authors
publishing a certain number of articles is a fixed ratio to the number of authors
publishing a single article.
Lotka’s law states that, as the number of articles published increases, authors
producing that many publications become less frequent.
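Under the inverse-square form, the expected number of authors with n documents is the number of single-document authors divided by n². The sketch below encodes this theoretical curve; the exponent is kept as a parameter because empirical collections often deviate from exactly 2, and the function name is an assumption of this example.

```python
def lotka_expected_authors(single_doc_authors, n_docs, beta=2.0):
    """Expected number of authors with n_docs documents under Lotka's law.

    single_doc_authors: observed number of authors with exactly one document.
    beta: exponent of the law (2 in the classic inverse-square form).
    """
    return single_doc_authors / n_docs ** beta


# With 100 single-document authors, the inverse-square law predicts
# 25 authors with two documents and 4 authors with five documents.
for n in (1, 2, 5):
    print(n, lotka_expected_authors(100, n))
```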
“Core” authors: authors who have published many documents (in this case at least 5 documents)
Author Productivity through Lotka's Law
Authors Tab
Lotka’s law
--- Theoretical distribution
“Occasional” authors: 4,862 authors (75.4%) have written just one document
Author impact
by H-index and its generalizations
Authors Tab
Author Impact
Leiden Univ
Original Affiliation strings need to be cleaned!
Most Relevant Affiliations
by number of documents
Authors Tab
Most Relevant Affiliation
Most Relevant Affiliations
by number of documents
Authors Tab
Most Relevant Affiliation
Disambiguated Affiliation items
Affiliations' Production over Time
Authors Tab
Affiliations' Production over Time
Corresponding Author's Country
Authors Tab
Corresponding Author's Country
MCP = Multiple Countries Publication
SCP = Single Country Publication
MCP indicates, for each country, the number of documents in which there is at least one co-author from a different country.
MCP measures the international collaboration intensity of a country.
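The SCP/MCP split can be sketched as follows. Each document is represented by the corresponding author's country plus the list of all author countries; this layout and the function name are assumptions made for the example. The MCP_Ratio is MCP divided by the country's total number of articles.

```python
from collections import defaultdict


def country_collaboration(docs):
    """Count SCP and MCP per corresponding author's country and derive the MCP ratio.

    docs: iterable of (corresponding_country, [countries of all authors]) pairs
    (hypothetical layout).
    """
    counts = defaultdict(lambda: {"SCP": 0, "MCP": 0})
    for corresponding_country, author_countries in docs:
        kind = "MCP" if len(set(author_countries)) > 1 else "SCP"
        counts[corresponding_country][kind] += 1

    table = {}
    for country, c in counts.items():
        articles = c["SCP"] + c["MCP"]
        table[country] = {
            "Articles": articles,
            "SCP": c["SCP"],
            "MCP": c["MCP"],
            "MCP_Ratio": c["MCP"] / articles,
        }
    return table


docs = [
    ("ITALY", ["ITALY", "ITALY"]),
    ("ITALY", ["ITALY", "SPAIN"]),
    ("SPAIN", ["SPAIN"]),
    ("ITALY", ["ITALY"]),
]
print(country_collaboration(docs)["ITALY"])

# The same ratio for the CHINA row of the table: 123 MCP out of 544 articles.
print(round(123 / 544, 4))
```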
Country         Articles  Freq      SCP  MCP  MCP_Ratio
CHINA           544       0.123496  421  123  0.2261
USA             510       0.115778  414  96   0.1882
INDIA           353       0.080136  337  16   0.0453
SPAIN           316       0.071737  247  69   0.2184
GERMANY         240       0.054484  159  81   0.3375
UNITED KINGDOM  223       0.050624  166  57   0.2556
ITALY           212       0.048127  184  28   0.1321
NETHERLANDS     209       0.047446  152  57   0.2727
BRAZIL          128       0.029058  105  23   0.1797
CANADA          128       0.029058  90   38   0.2969
BELGIUM         117       0.026561  50   67   0.5726
AUSTRALIA       87        0.01975   69   18   0.2069
RUSSIA          87        0.01975   74   13   0.1494
SWEDEN          84        0.019069  63   21   0.25
FRANCE          80        0.018161  55   25   0.3125
DENMARK         72        0.016345  60   12   0.1667
SOUTH AFRICA    65        0.014756  57   8    0.1231
KOREA           58        0.013167  43   15   0.2586
IRAN            55        0.012486  50   5    0.0909
MALAYSIA        52        0.011805  41   11   0.2115
Corresponding Author's Country
Authors Tab
Corresponding Author's Country
High International Collaboration
Low International Collaboration
Number of Documents per Country
Authors Tab
Country Scientific Production
This map considers the countries of all the authors of the documents in the collection
The color intensity is proportional to the number of publications
Countries Production over Time
Authors Tab
Countries Production over Time
Total Citations per Country
Authors Tab
Most Cited Countries
Average Citations per Year per Country
Authors Tab
Most Cited Countries
Data analysis
Level: Documents
(and their contents and bibliographies)
Documents and References
Document (or citing document)
It refers to a scientific document (article, review, conference proceedings, etc.) included in a
bibliographic collection
e.g. we have 4,441 documents
Reference (or cited reference)
It refers to a scientific document included in at least one of the reference lists (bibliography) of
the document set. Thus, “a reference is cited by one or more documents”
e.g. we have 94,843 references included in the 4,441 document bibliographies
Cited Document
It refers to a scientific document included in a bibliographic collection, and at the same time, it is
cited in at least one other document of the collection
Cited documents are a subset of the reference set
References
Documents
Cited Documents
A cited document is a document cited by another document of the same collection.
It belongs both to the Document Set and to the Reference Set.
Documents and References
In our collection (Bibliometric Approaches in Information Science & Library Science disciplines):
Documents: 4,441 items
References: 94,843 items
Cited Documents: 2,996 items
Documents and References
Global and Local Citations
Global citations
It measures the number of citations a document has received from documents contained in
the entire database (e.g. WoS or Scopus)
This data is provided by WoS/Scopus and is included in the meta-data record
Global citations measure the impact of a document in the whole bibliographic
database
For many documents, a large part of global citations could come from other
disciplines!
Local citations
It measures the number of citations a document has received from documents
included in the analyzed collection
It is calculated by bibliometrix by analyzing the whole reference set.
Local citations measure the impact of a document in the analyzed collection
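Counting local citations amounts to scanning every reference list and keeping only the cited items that are themselves documents of the collection. The sketch below assumes documents and references share the same identifiers, which in practice requires matching reference strings to records; the function names are illustrative.

```python
from collections import Counter


def local_citations(collection):
    """Count, for each document, the citations received from the same collection.

    collection: dict mapping document id -> list of cited ids (hypothetical layout;
    cited ids outside the collection are plain references and are ignored).
    """
    counts = Counter()
    for doc_id, references in collection.items():
        for ref in set(references):              # count each citing document once
            if ref in collection and ref != doc_id:
                counts[ref] += 1
    return {doc_id: counts[doc_id] for doc_id in collection}


def lc_gc_ratio(local, global_):
    """Local citations as a percentage of global citations."""
    return local / global_ * 100


collection = {
    "D1": ["R1", "R2"],
    "D2": ["D1", "R1"],
    "D3": ["D1", "D2", "R9"],
}
print(local_citations(collection))

# First row of the Most Local Cited Documents table: 234 local vs 3191 global citations.
print(round(lc_gc_ratio(234, 3191), 2))
```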
Top Documents by Global Citations
Documents Tab
Most Global Cited Documents
Top Documents by Local Citations
Documents Tab
Most Local Cited Documents
The paper most cited globally is also the one most cited locally
LC = Local Citations; GC = Global Citations; LC/GC = LC/GC Ratio (%); Norm. = Normalized
Document                                      Year  LC   GC    LC/GC (%)  Norm. LC  Norm. GC
VAN ECK NJ, 2010, SCIENTOMETRICS -009-0146    2010  234  3191  7.33       21.16     43.96
VAN RAAN AFJ, 2006, SCIENTOMETRICS            2006  146  405   36.05      11.16     7.71
BORNMANN L, 2008, J DOC                       2008  131  712   18.40      13.08     16.00
MEHO LI, 2007, J AM SOC INF SCI TEC           2007  121  652   18.56      13.82     16.63
MOED HF, 1995, SCIENTOMETRICS                 1995  114  355   32.11      14.85     12.87
NEDERHOF AJ, 2006, SCIENTOMETRICS -006-0007   2006  105  481   21.83      8.03      9.15
WALTMAN L, 2016, J INFORMETR                  2016  103  443   23.25      25.50     18.67
VAN RAAN AFJ, 2005, SCIENTOMETRICS -005-0008  2005  97   450   21.56      12.15     11.30
ALONSO S, 2009, J INFORMETR                   2009  97   476   20.38      11.36     11.70
WALTMAN L, 2011, J INFORMETR                  2011  90   262   34.35      10.22     6.38
COSTAS R, 2007, J INFORMETR                   2007  86   288   29.86      9.82      7.35
VANRAAN AFJ, 1996, SCIENTOMETRICS             1996  83   259   32.05      9.99      6.49
BOYACK KW, 2010, J AM SOC INF SCI TEC         2010  82   534   15.36      7.41      7.36
MONGEON P, 2016, SCIENTOMETRICS -015-1765     2016  81   890   9.10       20.05     37.52
BORGMAN CL, 2002, ANNU REV INFORM SCI         2002  78   378   20.63      6.23      5.46
WALTMAN L, 2010, J INFORMETR                  2010  78   635   12.28      7.05      8.75
D'ANGELO CA, 2011, J AM SOC INF SCI TEC       2011  74   129   57.36      8.41      3.14
WALTMAN L, 2012, J AM SOC INF SCI TEC -a      2012  72   243   29.63      11.09     6.12
GLANZEL W, 2002, SCIENTOMETRICS               2002  70   358   19.55      5.59      5.17
WALTMAN L, 2012, J AM SOC INF SCI TEC         2012  66   302   21.85      10.17     7.61
COBO MJ, 2011, J AM SOC INF SCI TEC           2011  64   745   8.59       7.27      18.14
Top Documents by Local Citations
Documents Tab
Most Local Cited Documents
Top References by Local Citations
Documents Tab
Most Local Cited References
Documents Tab
Reference Publication Year Spectroscopy (RPYS) is a quantitative
method for identifying the historical origins of research fields and
topics.
RPYS creates a temporal profile of cited references for a set of papers
that emphasizes years where relatively significant findings were
published.
RPYS makes it possible to identify the temporal roots of a discipline.
Marx, W., Bornmann, L., Barth, A., & Leydesdorff, L. (2014). Detecting the historical roots of research fields by
reference publication year spectroscopy (RPYS). Journal of the Association for Information Science and
Technology, 65(4), 751-764.
Reference Publication Year Spectroscopy
Reference Spectroscopy
Deviation from the 5-year median
Documents Tab
Reference Publication Year Spectroscopy
Reference Spectroscopy
N. of cited reference per year
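The two curves of the spectroscopy (number of cited references per year and deviation from the 5-year median) can be sketched as below. The window is assumed to be the two preceding years, the year itself and the two following years; the function name and the input layout are illustrative.

```python
from collections import Counter
from statistics import median


def rpys(reference_years):
    """Return {year: (n_cited_references, deviation_from_5_year_median)}.

    reference_years: publication year of every cited reference, one entry per citation.
    Assumption: the median is taken over the window year-2 .. year+2.
    """
    counts = Counter(reference_years)
    spectrum = {}
    for year in range(min(counts), max(counts) + 1):
        window = [counts.get(y, 0) for y in range(year - 2, year + 3)]
        spectrum[year] = (counts.get(year, 0), counts.get(year, 0) - median(window))
    return spectrum


# Toy data with an artificial peak in 1963.
years = [1960] * 2 + [1961] * 3 + [1962] * 2 + [1963] * 10 + [1964] * 3 + [1965] * 2
for year, (n_refs, deviation) in rpys(years).items():
    print(year, n_refs, deviation)
```

A pronounced positive deviation, such as the one in 1963 in this toy series, marks a year worth inspecting for highly cited early works.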
What happened in 1963?
Documents Tab
Reference Spectroscopy
Reference Publication Year Spectroscopy
Top Author Keywords by occurrences
Documents Tab
Most Frequent Words
Author Keywords consist of a list of terms that authors believe best represent the content of their paper
They are often selected prudently and need to be cleaned (plurals, conjugations, etc.)
These 3 words are trivial because they are among the terms used to build the query!
(This is also true for title and abstract words)
Text Editing: how to remove terms?
Documents Tab
Most Frequent Words
bibliometrix can remove trivial terms in two steps:
1) Upload a text file or a CSV file listing the words that you want to delete from the plot, separated by “,” or “;” or in tabular form
2) Indicate the chosen separator
The loaded list is then shown on the screen.
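The effect of the two steps can be imitated outside the interface. The sketch below parses a separator-delimited list of terms and drops them from a word frequency table; the helper names and the counts of the removed terms are hypothetical, and this is not the code bibliometrix runs.

```python
def parse_term_list(text, separator=","):
    """Split the content of the uploaded file into clean, lower-case terms."""
    flat = text.replace("\n", separator)
    return [term.strip().lower() for term in flat.split(separator) if term.strip()]


def remove_terms(frequencies, terms_to_remove):
    """Return a copy of the frequency table without the listed terms."""
    blocked = set(terms_to_remove)
    return {word: n for word, n in frequencies.items() if word.lower() not in blocked}


# Content of a hypothetical file that uses ";" as separator.
file_content = "bibliometrics; bibliometric; bibliometric analysis"
stop_terms = parse_term_list(file_content, separator=";")

# Illustrative counts for the removed terms; only "citation analysis" (363) is from the example.
keywords = {
    "bibliometrics": 900,
    "bibliometric": 150,
    "bibliometric analysis": 300,
    "citation analysis": 363,
}
print(remove_terms(keywords, stop_terms))
```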
Top Author Keywords by occurrences
Documents Tab
Most Frequent Words
Terms removed: bibliometrics, bibliometric, bibliometric analysis
These two words are synonymous with citation analysis, so they must be combined
Text Editing:
how to combine synonyms?
Documents Tab
Most Frequent Words
bibliometrix can combine synonyms in two steps:
1) Upload a text file or a CSV file listing the synonyms that you want to combine, separated by “,” or “;” or
in tabular form. The first word of the list is the one to which
bibliometrix will associate all the following ones
2) Indicate the chosen separator
The loaded list is then shown on the screen.
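A sketch of the merging logic, assuming one synonym list per line of the file, with the first term acting as the label that absorbs the frequencies of the others. The helper names are hypothetical and the file layout is an assumption of this example.

```python
def parse_synonyms(text, separator=";"):
    """Map every synonym to the first term of its line (the label to keep)."""
    mapping = {}
    for line in text.splitlines():
        terms = [term.strip().lower() for term in line.split(separator) if term.strip()]
        if len(terms) < 2:
            continue                      # nothing to merge on this line
        label, *synonyms = terms
        for synonym in synonyms:
            mapping[synonym] = label
    return mapping


def merge_synonyms(frequencies, mapping):
    """Add the frequency of each synonym to the frequency of its label."""
    merged = {}
    for word, n in frequencies.items():
        label = mapping.get(word.lower(), word.lower())
        merged[label] = merged.get(label, 0) + n
    return merged


file_content = "citation analysis;citations;citation"
mapping = parse_synonyms(file_content, separator=";")

keywords = {"citation analysis": 363, "citations": 101, "citation": 59, "h-index": 120}
merged = merge_synonyms(keywords, mapping)
print(merged)   # "h-index" (an illustrative count) is left untouched
```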
Top Author Keywords by occurrences
Documents Tab
Most Frequent Words
«citation analysis» now contains
the frequency of its synonyms:
- citation analysis (363)
- citations (101)
- citation (59)
Top Keywords Plus by occurrences
Keywords Plus, generated by an automatic computer algorithm, are words or phrases that appear frequently in
the titles of an article's references and not necessarily in the title of the article or as Author Keywords
Documents Tab
Most Frequent Words
Garfield claimed that Keywords Plus terms are able to capture an
article's content with greater depth and variety (Garfield, 1993)
Keywords Plus is as effective as Author Keywords in terms of
bibliometric analysis investigating the knowledge structure of
scientific fields, but it is less comprehensive in representing an
article's content (Zhang et al., 2016)
Documents Tab
Most Frequent Words
Comparing Keywords Plus and
Author Keywords
Title words by occurrences
Words are extracted from titles (or abstracts) after removing “stop words” and punctuation.
For titles and abstracts, you can view the most frequent single words, pairs of words and triplets of words.
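Extraction of single words, pairs and triplets can be sketched with a small n-gram counter. The stop-word list below is a tiny illustrative subset, and the tokenization (lower-casing, keeping only letters and digits) is an assumption of this sketch rather than the rule used by bibliometrix.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "for", "to", "on"}


def ngram_counts(texts, n=1, stop_words=STOP_WORDS):
    """Count n-grams (n=1 words, n=2 pairs, n=3 triplets) after cleaning the text."""
    counts = Counter()
    for text in texts:
        tokens = [
            token
            for token in re.findall(r"[a-z0-9]+", text.lower())   # drops punctuation
            if token not in stop_words
        ]
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i : i + n])] += 1
    return counts


titles = [
    "A bibliometric analysis of citation analysis",
    "Citation analysis in the social sciences",
]
print(ngram_counts(titles, n=1).most_common(2))
print(ngram_counts(titles, n=2).most_common(1))
```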
Documents Tab
Most Frequent Words
Abstract words by occurrences
Abstract words need to be cleaned of trivial terms such as “research”, “study”, … E.g., a typical first sentence of a
generic abstract: “This study…” or “In our paper…” or “Our analysis…”
Documents Tab
Most Frequent Words
Subject Categories by occurrences
Documents Tab
Most Frequent Words
Top Author Keywords, Keyword plus, Title,
Abstract words represented by a Wordcloud
A tag cloud (word cloud, or weighted list in visual design) is a visual
representation of text data, typically used to depict keyword
metadata (tags) on websites, or to visualize free form text
Tags are usually single words, and the importance of each tag is
shown with font size or color
This graph is useful for quickly perceiving the most prominent terms
and for locating a term alphabetically to determine its relative
prominence
Keyword Plus
Title Words
Author Keyword
Abstract Words
Top Keyword plus, subject categories, Author
Keywords, Title, Abstract words represented
by Wordcloud
Documents Tab
Wordcloud
It is possible to choose different scale transformations (Frequency, Square root, Log, Log10) to
smooth the word frequency distribution (McDonald, 2009)
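The effect of the scale options can be previewed on a frequency table. In this sketch "Log" is assumed to be the natural logarithm and every frequency is assumed to be at least 1; whether the interface applies exactly these functions is not confirmed by the slides.

```python
import math

# Assumed meaning of the four options; the interface may differ in detail.
TRANSFORMS = {
    "Frequency": float,
    "Square root": math.sqrt,
    "Log": math.log,
    "Log10": math.log10,
}


def rescale(frequencies, scale="Frequency"):
    """Apply the chosen transformation to every word frequency (all must be >= 1)."""
    transform = TRANSFORMS[scale]
    return {word: transform(n) for word, n in frequencies.items()}


words = {"citation analysis": 100, "h-index": 25, "altmetrics": 1}
print(rescale(words, "Square root"))
print(rescale(words, "Log10"))
```

The stronger the transformation, the smaller the gap between the most frequent word and the rest, so rare words remain legible in the cloud.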
Subject categories
Top Author Keywords and Keyword plus
represented by Treemap
Treemap
Documents Tab
Keyword Plus
Author Keyword
Treemap
Title Words
Abstract Words
Top subject categories, Title, Abstract words
represented by Treemap
Documents Tab
Treemap
Documents Tab
Subject categories (WoS)
Word Dynamics
Keywords Plus
Cumulative number of occurrences
Keywords Plus
Number of occurrences per year
Hovering the mouse over the graph shows the word, the year and its number of occurrences
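The two views of the word dynamics differ only by a running sum: the cumulative curve at a given year is the total of the yearly occurrences up to that year. A minimal sketch, with an illustrative function name and toy counts:

```python
from itertools import accumulate


def cumulative_occurrences(per_year):
    """Turn yearly occurrences of a word into cumulative occurrences."""
    years = sorted(per_year)
    running_totals = accumulate(per_year[year] for year in years)
    return dict(zip(years, running_totals))


per_year = {2019: 3, 2018: 2, 2020: 5}       # occurrences of one word (toy data)
print(cumulative_occurrences(per_year))      # {2018: 2, 2019: 5, 2020: 10}
```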
Documents Tab
Word Dynamics
Word Dynamics
Authors Keywords
Cumulative number of occurrences
Authors Keywords
Number of occurrences per year
Hovering the mouse over the graph shows the word, the year and its number of occurrences
Documents Tab
Word Dynamics
Word Dynamics
Titles
Cumulative number of occurrences
Titles
Number of occurrences per year
Hovering the mouse over the graph shows the word, the year and its number of occurrences
Documents Tab
Word Dynamics
Word Dynamics
Abstracts
Cumulative number of occurrences
Abstracts
Number of occurrences per year
Hovering the mouse over the graph shows the word, the year and its number of occurrences
Documents Tab
Word Dynamics
Keyword plus
Documents Tab
Trend Topics
Each bubble on the graph represents a topic. Bubble size is proportional to the word occurrences.
The gray bar indicates the first and third quartiles of the occurrence distribution
The trend topic graph is a scatter diagram in which time is on the x axis and topic is on the y axis. The reference year for each topic
is identified using the median of the distribution of occurrences over the time period considered. To increase the readability of
the graph, for each year only the first k topics are reported in decreasing order of frequency.
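The construction can be sketched as follows: each topic's occurrences are expanded into a list of years, the median gives the reference year, the first and third quartiles give the bar, and only the k most frequent topics are kept per reference year. The function names, the choice of `median_low` (so that the reference year is always an observed year) and the quartile method are assumptions of this sketch.

```python
from collections import defaultdict
from statistics import median_low, quantiles


def topic_summary(occurrences_per_year):
    """Quartiles, median year and total frequency of one topic.

    occurrences_per_year: dict year -> number of occurrences (at least 2 in total,
    because statistics.quantiles needs two data points).
    """
    years = sorted(
        year for year, n in occurrences_per_year.items() for _ in range(n)
    )
    q1, _, q3 = quantiles(years, n=4, method="inclusive")
    return {"q1": q1, "median": median_low(years), "q3": q3, "freq": len(years)}


def trend_topics(topics, k=3):
    """Keep, for each reference (median) year, the k most frequent topics."""
    by_year = defaultdict(list)
    for topic, occurrences in topics.items():
        summary = topic_summary(occurrences)
        by_year[summary["median"]].append((topic, summary["freq"]))
    return {
        year: sorted(items, key=lambda item: item[1], reverse=True)[:k]
        for year, items in by_year.items()
    }


topics = {                                   # toy data
    "h-index": {2010: 1, 2011: 2, 2012: 1, 2013: 1},
    "altmetrics": {2015: 2, 2016: 4},
    "open access": {2016: 3, 2017: 1},
}
print(trend_topics(topics, k=1))
```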
Data analysis
Level: Clustering of Documents, Authors and Sources
Clustering by Coupling
Bibliographic coupling can be considered the mirror image of co-citation.
It is the relationship between two or more documents citing a third.
Bibliographic coupling starts from the assumption that two documents,
while not directly citing each other, can be significantly correlated as
they share at least one bibliographic reference.
The conceptual value of bibliographic coupling was first theorized in 1963
by M. M. Kessler, who tested his theory directly by analyzing the
articles published in 35 volumes of the Physical Review (Kessler 1963).
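A minimal sketch of coupling strength, taken here as the number of references two documents share. This equals the off-diagonal entries of A·Aᵀ when A is the binary document × reference matrix. The input layout and function name are assumptions of the example, and no normalization is applied.

```python
from itertools import combinations


def coupling_strength(references):
    """Number of shared references for every pair of coupled documents.

    references: dict mapping document id -> list of cited references
    (hypothetical layout). Pairs with no shared reference are omitted.
    """
    reference_sets = {doc: set(refs) for doc, refs in references.items()}
    strengths = {}
    for doc_a, doc_b in combinations(sorted(reference_sets), 2):
        shared = len(reference_sets[doc_a] & reference_sets[doc_b])
        if shared:
            strengths[(doc_a, doc_b)] = shared
    return strengths


references = {
    "D1": ["R1", "R2", "R3"],
    "D2": ["R2", "R3", "R4"],
    "D3": ["R5"],
}
# D1 and D2 do not cite each other, but they are coupled through R2 and R3.
print(coupling_strength(references))
```

The resulting pairs and weights are the edges of a coupling network, on which a clustering algorithm can then be run.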
Clustering by Coupling
Clustering by Coupling
The starting point is a references × references matrix.
It is a symmetric matrix.
It tells us how similar the documents are, since they cite the same references.
Diagonal elements are the occurrences of each reference in the collection.
Non-diagonal elements are the co-occurrences of two references in the collection.
Coupling
Map
Parameters
Clustering by Coupling
3 Units of analysis:
2 Impact measures:
Attributes:
Clustering by Coupling
Clusters by Documents Coupling
Clustering by Coupling
Network
Clusters by Documents Coupling
Units of analysis:
Documents
Coupling measured by:
References
Impact measure:
Local citation Score
Cluster labeling by:
Keyword plus
Clustering by Coupling
Map
Clusters by Documents Coupling
Clustering by Coupling
Table
Clusters by Documents Coupling
Clustering by Coupling
Clusters