Biblioshiny
bibliometrix for non-coders
MASSIMO ARIA
FULL PROFESSOR IN STATISTICS FOR SOCIAL SCIENCES UNIVERSITÀ DEGLI STUDI DI NAPOLI FEDERICO II
CORRADO CUCCURULLO
FULL PROFESSOR IN MANAGEMENT AND ECONOMICS UNIVERSITÀ DEGLI STUDI DELLA CAMPANIA “LUIGI VANVITELLI”
bibliometrix: An R-Tool for Comprehensive
Science Mapping Analysis
bibliometrix is an open-source tool for executing a
comprehensive science mapping analysis of scientific
literature
It was programmed in R language to be flexible and
facilitate integration with other statistical and
graphical packages. Indeed, bibliometrics is a constantly
changing science and bibliometrix has the flexibility to
be quickly upgraded and integrated
Its development can draw on a large and active
community of developers formed by prominent
researchers
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics.
bibliometrix
bibliometrix provides various routines for importing bibliographic data
from SCOPUS, Clarivate Analytics' Web of Science, Dimensions, The
Lens, PubMed and Cochrane databases, performing bibliometric
analysis and building data matrices for co-citation, coupling, scientific
collaboration analysis and co-word analysis.
Biblioshiny App
Biblioshiny is a web-based app included in the bibliometrix package
Biblioshiny allows non-coders to use bibliometrix
It is developed in the Shiny environment
Just install and load bibliometrix with library(bibliometrix), then type
biblioshiny(), and the game starts!
Biblioshiny: how it works
Biblioshiny combines the functionality of bibliometrix package with the
ease of use of web apps using the Shiny package environment
What you see: the biblioshiny web app (Shiny dashboard package)
What it does: the bibliometrix package (R environment)
The Data
What, Where, How
Bibliographic database
A bibliographic database is a database of bibliographic records, an
organized digital collection of references to published scientific
literature, including journal articles, conference proceedings,
patents, books, etc.
They generally contain very rich subject descriptions in the form of
keywords, subject classification terms, or abstracts.
The information related to a bibliographic record is called
bibliographic meta-data
Main bibliographic databases
Multidisciplinary:
Microsoft Academic
CrossRef
Dimensions
OpenAlex
Web of Science
Scopus
Specialized:
ArXiv
Cochrane
EconBiz
IEEE Xplore
PubMed
(Source: Visser, M., van Eck, N. J., & Waltman, L. (2021). Large-scale comparison of bibliographic data sources: Scopus,
Web of Science, Dimensions, Crossref, and Microsoft Academic. Quantitative Science Studies, 2(1), 20-41.)
An example about document meta-data
Meta-Data
Supported databases:
Web of Science (WoS)
Scopus
Dimensions
Lens.org
PubMed
Cochrane
OpenAlex
Dataset (analyzed by biblioshiny)
File formats:
Plain text
BibTeX
CSV/xlsx
CIW
zip (Multiple files importing)
Rdata (bibliometrix file)
An example of WoS
plain text export format
Main meta-data fields:
AU Authors
AF Authors’ full name
TI Title
SO Document source (e.g. journal name)
DT Document type
DE Authors’ keywords
ID Keywords Plus (assigned by a WoS machine learning algorithm)
AB Abstract
C1 Authors’ affiliations
RP Corresponding author’s affiliation
CR Cited references
TC Total citations
PY Publication year
DI DOI
SC Subject category
For a complete list of field tags see
http://www.bibliometrix.org/documents/Field_Tags_bibliometrix.pdf
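A minimal sketch of how the two-letter field tags structure a WoS plain text record. It is written in Python for illustration and is not the bibliometrix converter. It assumes each field starts with its tag, continuation lines are indented, and "ER" closes the record. The sample is a shortened, hand-typed record and the function name is hypothetical.

```python
def parse_wos_record(text):
    """Parse one WoS plain-text record into {tag: [values]}."""
    fields = {}
    tag = None
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        if line.startswith("ER"):
            break                         # end-of-record marker
        if line[:2].strip():              # a new two-letter field tag
            tag = line[:2]
            fields[tag] = [line[3:].strip()]
        elif tag is not None:             # indented continuation line
            fields[tag].append(line.strip())
    return fields


# Shortened illustrative record (not a complete WoS export).
sample = (
    "AU Aria, M\n"
    "   Cuccurullo, C\n"
    "TI bibliometrix: An R-tool for comprehensive science mapping analysis\n"
    "SO JOURNAL OF INFORMETRICS\n"
    "PY 2017\n"
    "ER\n"
)

record = parse_wos_record(sample)
print(record["AU"])
print(record["PY"][0])
```

Multi-valued fields such as AU keep one value per line. A wrapped title would need its lines joined with spaces instead.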
Bibliographic record of: Aria & Cuccurullo, 2017, Journal of Informetrics
An example of BibTeX export formats
Web of Science Scopus
Very different!
(both in terms of content and string format)
The merging of WoS and Scopus collections
is a very difficult task
(no software currently allows this!)
Some remarks about the DBs and the data formats
Web of Science is preferable to other databases in terms of data
quality
In Scopus, the reference elements mentioned are not standardized -> they must
be combined
In Dimensions, the algorithm that classifies research areas is not efficient
In Web of Science, plain text format is preferable to others
Scopus BibTeX format and Dimensions CSV format do not allow exporting some
metadata
Some limits of the DBs
Web of Science:
It permits exporting only 500 records at a time but allows splitting the selected collection into multiple
downloads (e.g. from 1 to 500, 501 to 1000, 1001 to 1500, ..., 5001 to 5500, etc.)
It is not possible to use the API to directly search and export meta-data (using the Italian academic subscription)
Scopus:
It permits exporting 2,000 records at a time but does not allow splitting the selected collection into
multiple downloads (it is necessary to define a multiple search strategy, selecting up to 2,000 documents at a
time!)
It allows scholars to use the API to directly search and export meta-data in the R environment (e.g. rscopus package)
Dimensions:
It permits exporting 50,000 records at a time but does not allow splitting the selected
collection into multiple downloads (it is necessary to define a multiple search strategy, selecting up to
50,000 documents at a time!)
It allows scholars to use the API to search and export meta-data in the R environment (e.g. dimensionsR package)
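The export limits above translate into simple arithmetic. The Python sketch below computes the record ranges for a collection exported in blocks. The limits come from the text above, while the function name and the total of 1,200 records are hypothetical.

```python
# Per-export record limits as stated above.
EXPORT_LIMITS = {"Web of Science": 500, "Scopus": 2000, "Dimensions": 50000}


def export_batches(total, batch_size):
    """Return the (first, last) record ranges needed to cover `total` records."""
    return [
        (start, min(start + batch_size - 1, total))
        for start in range(1, total + 1, batch_size)
    ]


# Hypothetical collection of 1,200 records exported from Web of Science.
batches = export_batches(1200, EXPORT_LIMITS["Web of Science"])
print(batches)
```

For Web of Science each range is one download of the same result set. For Scopus and Dimensions, the number of ranges only indicates how many separate searches would be needed.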
Data Collection
Querying, Selecting, Exporting
“Querying” a bibliographic database
Data can be extracted through a query
A query is a combination of terms linked by Boolean operators
A query defines a search strategy by (search fields):
Keywords
Titles
Abstracts
Authors
Journals
Affiliations
An example using the Web of Science database
“The use of bibliometric approaches in information science and library
science disciplines”
We want to describe the use of bibliometric approaches in the
information science and library science scientific literature
Let's start with the example:
How to define a query?
1) We need to choose the combination of terms which identify the
scientific literature that used bibliometric approaches
2) We need to limit the search to the area of information science and
library science disciplines
3) We need to limit the timespan
4) We need to choose what kind of documents to analyze
PRISMA diagram
PRISMA stands for Preferred Reporting Items for
Systematic Reviews and Meta-Analyses
It is an evidence-based minimum set of items for
reporting in systematic reviews and meta-
analyses
The aim of the PRISMA Statement is to help
authors improve the reporting of systematic
reviews and meta-analyses
PRISMA may also be useful for critical appraisal of
published systematic reviews, although it is not a
quality assessment instrument to gauge the
quality of a systematic review
Search strategy
1. Select the WoS sub-DBs: SCI (Science Citation Index), SSCI (Social
Science Citation Index), and ESCI (Emerging Sources Citation
Index)
2. Select all the documents that contain the words “bibliometric*”
or “science map*” in the title, abstract, or keyword list
3. Select only documents included in the subject category
Information Science & Library Science
4. Select the timespan: all complete years
5. Select document types: Articles or Proceedings Papers or Review
Articles
6. Select only documents written in English
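The six steps can be condensed into one Boolean query string. This is a Python sketch, not output copied from WoS. It assumes the field tags of the WoS advanced search (TS topic, WC category, DT document type, LA language, PY year) and assumed document-type labels. The excluded year is a parameter because "all complete years" depends on when the search is run.

```python
def any_of(field, values):
    """Build one clause of the form FIELD=("a" OR "b")."""
    return f"{field}=(" + " OR ".join(f'"{v}"' for v in values) + ")"


def build_query(terms, category, doc_types, language, excluded_year):
    """Combine the search-strategy steps with Boolean operators."""
    clauses = [
        any_of("TS", terms),        # step 2: topic (title, abstract, keywords)
        any_of("WC", [category]),   # step 3: subject category
        any_of("DT", doc_types),    # step 5: document types
        any_of("LA", [language]),   # step 6: language
    ]
    # Step 4: drop the incomplete (current) year.
    return " AND ".join(clauses) + f" NOT PY=({excluded_year})"


query = build_query(
    terms=["bibliometric*", "science map*"],
    category="Information Science & Library Science",
    doc_types=["Article", "Proceedings Paper", "Review Article"],  # assumed labels
    language="English",
    excluded_year=2022,
)
print(query)
```

Step 1 (the choice of sub-databases) is made in the interface and is not part of the query string.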
Search strategy Step 1
Select the WoS sub-DBs:
SCI-EXPANDED (Science Citation Index Expanded)
SSCI (Social Science Citation Index)
ESCI (Emerging Sources Citation Index)
Select all the documents that contain the words
“bibliometric*” or “science map*” in the topic
“*” is a wildcard symbol: it stands for any sequence of characters
e.g. bibliometric*:
- Bibliometric
- Bibliometrics
- Etc.
e.g. science map*:
- Science map
- Science maps
- Science mapping
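The effect of the "*" symbol can be reproduced with a regular expression. The Python sketch below is an approximation of the matching behaviour shown in the examples above, not the actual WoS search engine. The helper names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a search term with '*' into a case-insensitive regex."""
    parts = [re.escape(part) for part in pattern.split("*")]
    # '*' stands for zero or more word characters.
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)


def matches(pattern, text):
    """True if the wildcard term occurs anywhere in the text."""
    return wildcard_to_regex(pattern).search(text) is not None


for title in ["Bibliometrics in practice", "Science mapping tools", "A scientometric study"]:
    hit = matches("bibliometric*", title) or matches("science map*", title)
    print(title, "->", hit)
```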
Boolean operator:
“OR”
“AND”
“NOT”
Search strategy Step 2
Search field:
“title”
“topic”
“keywords”
Query result: 20,248 documents
Search strategy Step 2
Results for "bibliometric*" (Topic) OR "science map*" (Topic)
WoS categories: Information Science & Library Science
Refine by WoS category: Information Science & Library Science
Results: 5,390 documents
Search strategy Step 3
NOT Publication Years: 2022
Exclude incomplete years: 2022
Results: 5,329 documents
Search strategy Step 4
Document types: Articles or Proceedings Papers or Review Articles
Search strategy Step 5-6
Refine by document type: articles, proceedings papers, and review articles
Results: 5,107 documents
Refine by language: English
Results: 4,441 documents
Languages: English
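The result counts reported at each refinement step can be turned into the exclusion counts of a PRISMA-style flow. The Python sketch below uses only the numbers reported above. The step labels are shortened descriptions.

```python
# (step description, documents remaining after the step), as reported above.
steps = [
    ("Topic query", 20248),
    ("WoS category", 5390),
    ("Exclude incomplete year 2022", 5329),
    ("Document types", 5107),
    ("Language: English", 4441),
]

# Documents excluded by each refinement = previous count - current count.
excluded = [
    previous[1] - current[1]
    for previous, current in zip(steps, steps[1:])
]

for (label, remaining), dropped in zip(steps[1:], excluded):
    print(f"{label}: {dropped} excluded, {remaining} remaining")
```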
Continue the search or export the results
Now we should continue our search strategy (using other filters) or export the results
Search strategy Marked list
Search history
Clicking on a result, we can go back along the search steps.
History represents a classical PRISMA diagram.
It shows the search steps we just performed.
Search strategy Search history
Export allows you to immediately export the results of your search
Marked List is a sort of shopping cart where you can save your
meta-data collection and continue to manipulate or export
the results of your search in the future
Difference between
“Add to marked list” and “Export”
Search strategy
Add to marked list
Search strategy Marked list
“Export”
WoS will download several plain text files
called savedrecs.txt, savedrecs(1).txt, etc.
Search strategy File export
Repeat this operation until the total
number of documents is reached (e.g.,
from 1 to 500, from 501 to 1000, etc.)
Biblioshiny interface
Tabs, Methods, Workflow
bibliometrix package installation
Let's start playing with biblioshiny!
Biblioshiny interface
After using the software, remember to cite it
Notifications about the software:
- Package tutorial
- Information about converting and importing data
- A biblioshiny tutorial
Possibility of making donations that help ensure the future development of bibliometrix.
Links to: bibliometrix, K-Synth, GitHub
3 structures of knowledge
4 levels of analysis
Biblioshiny workflow
Organized according to the science mapping workflow
Welcome Tab
Bibliometric Analysis for Systematic Literature Reviews
Bibliometric Analysis
Focus on domain
(metrics) K structures
Often, new knowledge emerges at the crossroads among structures and time evolution
Level of analysis and metrics
Overview: main information; annual scientific production; average citations per year; three-field plot
Sources: most relevant and cited; Bradford’s Law; impact metrics; source dynamics
Authors, affiliations, countries: most relevant and cited; production over time; Lotka’s Law; impact metrics
Documents, cited references, words (ID, DE, TI, AB): most local/global cited; spectroscopy; words (word cloud, treemap); trend topics
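Structure, bibliometric technique, unit of analysis, statistical techniques

Conceptual structure
Bibliometric technique: co-word
Unit of analysis: ID, DE (keywords); TI; AB; subject categories (WoS)
Statistical techniques: network analysis; thematic mapping; thematic evolution; factorial analysis (CA; MCA; MDS)

Intellectual structure
Bibliometric technique: co-citation; citation
Unit of analysis: papers; authors; sources
Statistical techniques: network analysis; historiograph

Social structure
Bibliometric technique: collaboration
Unit of analysis: authors (co-authorship); institutions; countries
Statistical techniques: collaboration network

Clustering by coupling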
Data analysis
Loading, Converting, Filtering
Loading and converting a meta-data collection
Load Tab
biblioshiny allows you to import or load files, or to gather data using the APIs
(Application Programming Interfaces)
Loading and converting a meta-data collection
Load Tab
Data downloaded through this API cannot be used for identifying citation,
bibliographic coupling, and co-citation links between items.
Database selection
biblioshiny can download data via APIs
bibliometrix provides support for the APIs of Dimensions, NCBI PubMed, and Scopus.
Export meta-data
Number of documents
Loading and converting a meta-data collection
Load Tab
By default, access to the PubMed API is free and does not necessarily require an API key.
In this case, PubMed limits users to making only 3 requests per second.
Users who register for an API key are able to make up to 10 requests per second.
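The two PubMed rate limits can be respected with a small throttle that enforces a minimum interval between requests. This is a Python sketch under the limits stated above. The class and function names are hypothetical, and no request is actually sent.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, requests_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)   # pause until the interval has elapsed
                now = self._clock()
        self._last = now


def pubmed_throttle(api_key=None):
    """3 requests per second without an API key, 10 with one (limits as stated above)."""
    return Throttle(10 if api_key else 3)


print(round(pubmed_throttle().min_interval, 3))
print(pubmed_throttle(api_key="MY_KEY").min_interval)  # "MY_KEY" is a placeholder
```

Calling wait() before each request keeps a download loop within the allowed rate.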
Loading and converting a meta-data collection
Load Tab
The Dimensions API needs an account to obtain a valid token to query the database.
The account can be obtained free of charge for scientometric research projects by requesting it at
https://www.dimensions.ai/scientometric-research/
Loading and converting a meta-data collection
Load Tab
biblioshiny allows you to
import raw files (.bib, .txt, .ciw, .csv),
load bibliometrix files (.rdata, .xlsx),
or use a sample collection,
from the major bibliometric databases.
Database
selection
File selection
Convert/Export meta-data
Multiple export files (e.g. savedrecs.txt, savedrecs(1).txt, etc.) can be imported as a single “.zip” file
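A Python sketch of what bundling several export files in one archive amounts to: every .txt member is read and its contents collected. It does not reproduce how biblioshiny handles the archive. The archive is built in memory with placeholder contents, and the function name is hypothetical.

```python
import io
import zipfile


def read_exports(zip_bytes):
    """Return {file name: text} for every .txt export inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = sorted(n for n in archive.namelist() if n.endswith(".txt"))
        # "utf-8-sig" also strips a byte-order mark if the file has one.
        return {n: archive.read(n).decode("utf-8-sig") for n in names}


# Build a small archive in memory with placeholder file contents.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("savedrecs.txt", "records 1-500")
    archive.writestr("savedrecs(1).txt", "records 501-1000")

exports = read_exports(buffer.getvalue())
print(list(exports))
```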
Load Tab
Number of documents
Loading and converting a meta-data collection
Bibliographic dataframe:
Each row is a document
Each column is a meta-data field
Loading and converting a meta-data collection
Load Tab
Filtering data
Language (LA)
Publication year (PY)
Document Type (DT)
Average Citation per Year
Source by Bradford’s Law
In our example, we do not need to apply any filter
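Since each row of the bibliographic data frame is a document and each column a meta-data field, filtering amounts to keeping the rows that satisfy conditions on the LA, PY and DT columns. The Python sketch below uses three invented rows. It is not the biblioshiny filter code, and the function name is hypothetical.

```python
def filter_documents(docs, languages=None, years=None, doc_types=None):
    """Keep the rows matching every filter that is not None."""
    def keep(doc):
        return (
            (languages is None or doc["LA"] in languages)
            and (years is None or years[0] <= doc["PY"] <= years[1])
            and (doc_types is None or doc["DT"] in doc_types)
        )
    return [doc for doc in docs if keep(doc)]


# Invented rows: one dict per document, one key per meta-data field.
collection = [
    {"TI": "Paper A", "LA": "ENGLISH", "PY": 2015, "DT": "ARTICLE"},
    {"TI": "Paper B", "LA": "SPANISH", "PY": 2018, "DT": "ARTICLE"},
    {"TI": "Paper C", "LA": "ENGLISH", "PY": 2020, "DT": "REVIEW"},
]

recent_english = filter_documents(collection, languages={"ENGLISH"}, years=(2016, 2021))
print([doc["TI"] for doc in recent_english])
```

Calling the function with no filters returns the whole collection, which corresponds to the example above where no filter is applied.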
Filter Tab