The Research Organization Registry (ROR)
The Research Organization Registry (ROR) is a global, community-led registry of open persistent identifiers for research organizations. It includes IDs and metadata for more than 107,000 organizations and counting. Registry data is released under CC0 and openly available via a search interface, REST API, and data dump. Registry updates are curated through a community process and released at least once a month.
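As a quick illustration of the openly available REST API, a single organization record can also be fetched directly by its ROR ID. The sketch below assumes the v2 single-record endpoint and reuses an ID that appears in the results later in this post; any valid ROR ID works the same way.
# Minimal sketch: fetch one organization record by its ROR ID
# (https://ror.org/04tenkb98 shows up in the German education results below)
single_org <- httr2::request("https://api.ror.org/v2/organizations/04tenkb98") |>
  httr2::req_perform() |>
  httr2::resp_body_json()
# The record is returned as a nested list, e.g. its names:
single_org$names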
ROR REST API v2
Version 2 of the ROR schema and API was released on April 15, 2024 (see the official API documentation). Let us use the REST API to retrieve data on all German research organizations of type education.
ror_base_url <- "https://api.ror.org/v2/organizations"
# Request the first page, filtering on organization type and country;
# .multi = "comma" collapses the two filter values into one
# comma-separated `filter` parameter
req <- httr2::request(
  ror_base_url
) |>
  httr2::req_url_query(
    filter = c(
      "types:education",
      "country.country_code:DE"
    ),
    .multi = "comma"
  )
resp <- req |>
  httr2::req_perform()
resp_body <- resp |>
  httr2::resp_body_json()
# The API returns 20 records per page, so the page count follows from
# the total number of results
no_of_results <- resp_body$number_of_results
no_of_pages <- ceiling(no_of_results / 20)
To determine how many pages you need to retrieve in order to obtain the entire result set, check the number_of_results field in the response and divide it by the page size of 20. Regardless of which page you are on, number_of_results always reports the total number of results matching your request.
The query filtering on types:education and country.country_code:DE returns 544 results in total, which means we have to request the data for 28 pages.
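In other words, the page count is just the ceiling of the total divided by the page size of 20:
ceiling(544 / 20)
#> [1] 28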
We can implement the pagination with httr2::iterate_with_offset().
ror_base_url <- "https://api.ror.org/v2/organizations"
req <- httr2::request(
  ror_base_url
) |>
  httr2::req_url_query(
    filter = c(
      "types:education",
      "country.country_code:DE"
    ),
    .multi = "comma"
  ) |>
  # be polite: at most 10 requests per second
  httr2::req_throttle(
    10
  )
resps <- httr2::req_perform_iterative(
  req,
  # increment the `page` query parameter until the last page is reached
  next_req = httr2::iterate_with_offset(
    param_name = "page",
    resp_pages = function(resp) ceiling(httr2::resp_body_json(resp)$number_of_results / 20)
  ),
  max_reqs = Inf
)
#> Iterating ■■■■■■ 18% | ETA: 5s
#> Iterating ■■■■■■■ 21% | ETA: 5s
#> Iterating ■■■■■■■■■■■■■■■■■■■■■■ 71% | ETA: 2s
#> Iterating ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s
df_ror_germany_education <- resps |>
  # parse each response body ...
  purrr::map(
    httr2::resp_body_json
  ) |>
  # ... keep only the `items` element of each page ...
  purrr::map(
    "items"
  ) |>
  # ... and combine all pages into one tibble with one row per organization
  purrr::list_flatten() |>
  tibble::tibble() |>
  tidyr::unnest_wider(
    1
  )
df_ror_germany_education
#> # A tibble: 544 × 11
#> admin domains established external_ids id links locations names
#> <list> <list> <int> <list> <chr> <list> <list> <list>
#> 1 <named list> <NULL> 1997 <list [1]> https:… <list> <list> <list>
#> 2 <named list> <NULL> 2014 <list [1]> https:… <list> <list> <list>
#> 3 <named list> <NULL> 2006 <list [1]> https:… <list> <list> <list>
#> 4 <named list> <NULL> NA <list [1]> https:… <list> <list> <list>
#> 5 <named list> <NULL> 2015 <list [1]> https:… <list> <list> <list>
#> 6 <named list> <NULL> 2005 <list [2]> https:… <list> <list> <list>
#> 7 <named list> <NULL> 1951 <list [3]> https:… <list> <list> <list>
#> 8 <named list> <NULL> NA <list [1]> https:… <list> <list> <list>
#> 9 <named list> <NULL> NA <list [1]> https:… <list> <list> <list>
#> 10 <named list> <NULL> NA <list [1]> https:… <list> <list> <list>
#> # ℹ 534 more rows
#> # ℹ 3 more variables: relationships <list>, status <chr>, types <list>
Extracting the data in the list columns for each German research organization in education
Because df_ror_germany_education contains many list columns, extracting data can be quite cumbersome. Here are workflows for the most important list columns.
Names
For illustrative purposes, this is one way of retrieving all content from df_ror_germany_education[["names"]] for each organization.
df_names <- tibble::tibble(
  id = df_ror_germany_education[["id"]],
  names = df_ror_germany_education[["names"]]
) |>
  # one row per name variant, with a running index per organization
  tidyr::unnest_longer(
    col = 2,
    indices_include = TRUE
  ) |>
  tidyr::unnest_wider(
    col = "names"
  ) |>
  # a name can have several types (e.g. label, alias, acronym, ror_display)
  tidyr::unnest_wider(
    col = "types",
    names_sep = "_"
  ) |>
  dplyr::select(
    id,
    names_id,
    name = value,
    lang,
    dplyr::starts_with("types")
  )
df_names
#> # A tibble: 1,479 × 6
#> id names_id name lang types_1 types_2
#> <chr> <int> <chr> <chr> <chr> <chr>
#> 1 https://ror.org/009dxjw45 1 Faurecia (Germany) en ror_di… label
#> 2 https://ror.org/04839sh14 1 MHB NA acronym NA
#> 3 https://ror.org/04839sh14 2 Medizinische Hochsc… de ror_di… label
#> 4 https://ror.org/02nq6sy34 1 AfG NA acronym NA
#> 5 https://ror.org/02nq6sy34 2 Akademie für Gesund… de ror_di… label
#> 6 https://ror.org/02652az59 1 CUA NA acronym NA
#> 7 https://ror.org/02652az59 2 Control Union Acade… en ror_di… label
#> 8 https://ror.org/05gxhfq53 1 Hochschule der Poli… de ror_di… label
#> 9 https://ror.org/04tenkb98 1 German Graduate Sch… en ror_di… label
#> 10 https://ror.org/04tenkb98 2 Hheilbronn Business… en alias NA
#> # ℹ 1,469 more rows
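If you only need a single name per organization, one option is to keep the name whose types include "ror_display". This is a sketch assuming each record carries exactly one such display name; df_display_names is just an illustrative name.
# Keep only the ROR display name for each organization
df_display_names <- df_names |>
  dplyr::filter(
    dplyr::if_any(
      dplyr::starts_with("types"),
      \(x) x == "ror_display"
    )
  ) |>
  dplyr::select(
    id,
    name
  )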
Links
This is one way of retrieving all content from df_ror_germany_education[["links"]] for each organization.
df_links <- tibble::tibble(
  id = df_ror_germany_education[["id"]],
  links = df_ror_germany_education[["links"]]
) |>
  tidyr::unnest_longer(
    col = 2,
    indices_include = TRUE
  ) |>
  tidyr::unnest_wider(
    col = "links"
  )
df_links
#> # A tibble: 854 × 4
#> id type value links_id
#> <chr> <chr> <chr> <int>
#> 1 https://ror.org/009dxjw45 website http://www.faurecia.de/ 1
#> 2 https://ror.org/009dxjw45 wikipedia https://en.wikipedia.org/wiki/F… 2
#> 3 https://ror.org/04839sh14 website http://www.mhb-fontane.de/ 1
#> 4 https://ror.org/02nq6sy34 website http://www.afg-heidelberg.de/wi… 1
#> 5 https://ror.org/02652az59 website http://www.cu-academy.de/ 1
#> 6 https://ror.org/05gxhfq53 website http://www.polizei.rlp.de/ 1
#> 7 https://ror.org/04tenkb98 website http://www.ggs.de/en/ 1
#> 8 https://ror.org/01a2z2x66 website http://www.iwkoeln.de/en/ 1
#> 9 https://ror.org/01a2z2x66 wikipedia https://en.wikipedia.org/wiki/C… 2
#> 10 https://ror.org/00nm1ws09 website http://www.ifnm.net/ 1
#> # ℹ 844 more rows
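From here, filtering on the link type gives, for example, only the organizations' websites:
# Keep only the website links
df_links |>
  dplyr::filter(
    type == "website"
  )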
Locations
This is one way of retrieving all content from df_ror_germany_education[["locations"]] for each organization.
df_locations <- tibble::tibble(
  id = df_ror_germany_education[["id"]],
  locations = df_ror_germany_education[["locations"]]
) |>
  tidyr::unnest_longer(
    col = 2
  ) |>
  tidyr::unnest_wider(
    col = "locations"
  ) |>
  tidyr::unnest_wider(
    col = "geonames_details"
  )
df_locations
#> # A tibble: 544 × 7
#> id country_code country_name lat lng name geonames_id
#> <chr> <chr> <chr> <dbl> <dbl> <chr> <int>
#> 1 https://ror.org/009d… DE Germany 49.2 9.35 Neue… 2865560
#> 2 https://ror.org/0483… DE Germany 52.9 12.8 Neur… 2864276
#> 3 https://ror.org/02nq… DE Germany 49.4 8.66 Heid… 2907911
#> 4 https://ror.org/0265… DE Germany 52.5 13.5 Berl… 2950159
#> 5 https://ror.org/05gx… DE Germany 49.9 7.25 Büch… 2942559
#> 6 https://ror.org/04te… DE Germany 49.1 9.22 Heil… 2907669
#> 7 https://ror.org/01a2… DE Germany 50.9 6.96 Köln 2886242
#> 8 https://ror.org/00nm… DE Germany 50.7 7.12 Bonn 2946447
#> 9 https://ror.org/01nb… DE Germany 49.9 8.63 Darm… 2938913
#> 10 https://ror.org/02qt… DE Germany 48.9 8.71 Pfor… 2853969
#> # ℹ 534 more rows
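Since this result set has one location per organization, a quick way to see where these organizations cluster is to count the GeoNames place names (the name column from geonames_details):
# Count education organizations per city, most frequent first
df_locations |>
  dplyr::count(
    name,
    sort = TRUE
  )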
External IDs
This is one way of retrieving all content from df_ror_germany_education[["external_ids"]] for each organization.
df_external_ids <- tibble::tibble(
  id = df_ror_germany_education[["id"]],
  external_ids = df_ror_germany_education[["external_ids"]]
) |>
  tidyr::unnest_longer(
    col = 2,
    indices_include = TRUE
  ) |>
  tidyr::unnest_wider(
    col = "external_ids"
  ) |>
  tidyr::unnest_wider(
    col = "all",
    names_sep = "_"
  ) |>
  dplyr::select(
    id,
    external_ids_id,
    type,
    preferred,
    dplyr::starts_with("all")
  )
df_external_ids
#> # A tibble: 1,335 × 10
#> id external_ids_id type preferred all_1 all_2 all_3 all_4 all_5 all_6
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 2 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 3 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 4 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 5 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 6 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 7 https://… 2 isni NA 0000… NA NA NA NA NA
#> 8 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 9 https://… 2 isni NA 0000… NA NA NA NA NA
#> 10 https://… 3 wiki… NA Q958… NA NA NA NA NA
#> # ℹ 1,325 more rows
# Let's filter on all Wikidata entries
df_external_ids |>
  dplyr::filter(
    type == "wikidata"
  )
#> # A tibble: 358 × 10
#> id external_ids_id type preferred all_1 all_2 all_3 all_4 all_5 all_6
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 https://… 3 wiki… NA Q958… NA NA NA NA NA
#> 2 https://… 3 wiki… NA Q455… NA NA NA NA NA
#> 3 https://… 3 wiki… NA Q111… NA NA NA NA NA
#> 4 https://… 3 wiki… NA Q302… NA NA NA NA NA
#> 5 https://… 3 wiki… NA Q314… NA NA NA NA NA
#> 6 https://… 3 wiki… NA Q322… NA NA NA NA NA
#> 7 https://… 3 wiki… NA Q162… NA NA NA NA NA
#> 8 https://… 3 wiki… NA Q314… NA NA NA NA NA
#> 9 https://… 4 wiki… NA Q160… NA NA NA NA NA
#> 10 https://… 3 wiki… NA Q135… NA NA NA NA NA
#> # ℹ 348 more rows
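As a possible next step, the Wikidata identifiers can be combined with the display names extracted in the Names section into a simple lookup of ROR ID, name, and Wikidata QID. This is only a sketch: it assumes the first element of the all column (all_1) holds the QID, and df_wikidata_lookup is an illustrative name.
# Hypothetical follow-up: one row per organization with its Wikidata QID
df_wikidata_lookup <- df_external_ids |>
  dplyr::filter(
    type == "wikidata"
  ) |>
  dplyr::select(
    id,
    wikidata_qid = all_1
  ) |>
  dplyr::left_join(
    df_display_names,
    by = "id"
  )
df_wikidata_lookup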