The Research Organization Registry (ROR)
The Research Organization Registry (ROR) is a global, community-led registry of open persistent identifiers for research organizations. It includes IDs and metadata for more than 107,000 organizations and counting. Registry data is released under CC0 and openly available via a search interface, REST API, and data dump. Registry updates are curated through a community process and released at least once a month.
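As a quick illustration of the openly available REST API, a single organization record can also be fetched directly by its ROR ID. The sketch below assumes the v2 single-record endpoint and reuses an ID that appears in the results later in this post; any valid ROR ID works the same way.
# Minimal sketch: fetch one organization record by its ROR ID
# (https://ror.org/04tenkb98 shows up in the German education results below)
single_org <- httr2::request("https://api.ror.org/v2/organizations/04tenkb98") |>
  httr2::req_perform() |>
  httr2::resp_body_json()
# The record is returned as a nested list, e.g. its names:
single_org$names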
ROR REST API v2
Version 2 of the ROR schema and API was released on April 15, 2024 (see the official API documentation). Let us use the REST API to retrieve data on all German research organizations of type education.
ror_base_url <- "https://api.ror.org/v2/organizations"
# Request the first page, filtering on organization type and country;
# .multi = "comma" collapses the two filter values into one
# comma-separated `filter` parameter
req <- httr2::request(
  ror_base_url
) |>
  httr2::req_url_query(
    filter = c(
      "types:education",
      "country.country_code:DE"
    ),
    .multi = "comma"
  )
resp <- req |>
  httr2::req_perform()
resp_body <- resp |>
  httr2::resp_body_json()
# The API returns 20 records per page, so the page count follows from
# the total number of results
no_of_results <- resp_body$number_of_results
no_of_pages <- ceiling(no_of_results / 20)
To determine how many pages you need to retrieve in order to obtain the entire result set, check the number_of_results field in the response and divide it by the page size of 20. Regardless of which page you are on, number_of_results always reports the total number of results matching your request.
The query filtering on types:education and country.country_code:DE returns 544 results in total, which means we have to request the data for 28 pages.
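In other words, the page count is just the ceiling of the total divided by the page size of 20:
ceiling(544 / 20)
#> [1] 28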
We can implement the pagination with httr2::iterate_with_offset().
ror_base_url <- "https://api.ror.org/v2/organizations"
req <- httr2::request(
  ror_base_url
) |>
  httr2::req_url_query(
    filter = c(
      "types:education",
      "country.country_code:DE"
    ),
    .multi = "comma"
  ) |>
  # be polite: at most 10 requests per second
  httr2::req_throttle(
    10
  )
resps <- httr2::req_perform_iterative(
  req,
  # increment the `page` query parameter until the last page is reached
  next_req = httr2::iterate_with_offset(
    param_name = "page",
    resp_pages = function(resp) ceiling(httr2::resp_body_json(resp)$number_of_results / 20)
  ),
  max_reqs = Inf
)
#> Iterating ■■■■■■ 18% | ETA: 5s
#> Iterating ■■■■■■■ 21% | ETA: 5s
#> Iterating ■■■■■■■■■■■■■■■■■■■■■■ 71% | ETA: 2s
#> Iterating ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s
df_ror_germany_education <- resps |>
  # parse each response body ...
  purrr::map(
    httr2::resp_body_json
  ) |>
  # ... keep only the `items` element of each page ...
  purrr::map(
    "items"
  ) |>
  # ... and combine all pages into one tibble with one row per organization
  purrr::list_flatten() |>
  tibble::tibble() |>
  tidyr::unnest_wider(
    1
  )
df_ror_germany_education
#> # A tibble: 544 × 11
#> admin domains established external_ids id links locations names
#> <list> <list> <int> <list> <chr> <list> <list> <list>
#> 1 <named list> <NULL> 1997 <list [1]> https:… <list> <list> <list>
#> 2 <named list> <NULL> 2014 <list [1]> https:… <list> <list> <list>
#> 3 <named list> <NULL> 2006 <list [1]> https:… <list> <list> <list>
#> 4 <named list> <NULL> NA <list [1]> https:… <list> <list> <list>
#> 5 <named list> <NULL> 2015 <list [1]> https:… <list> <list> <list>
#> 6 <named list> <NULL> 2005 <list [2]> https:… <list> <list> <list>
#> 7 <named list> <NULL> 1951 <list [3]> https:… <list> <list> <list>
#> 8 <named list> <NULL> NA <list [1]> https:… <list> <list> <list>
#> 9 <named list> <NULL> NA <list [1]> https:… <list> <list> <list>
#> 10 <named list> <NULL> NA <list [1]> https:… <list> <list> <list>
#> # ℹ 534 more rows
#> # ℹ 3 more variables: relationships <list>, status <chr>, types <list>
Extracting the data in the list columns for each German research organization in education
Because df_ror_germany_education contains many list columns, extracting data can be quite cumbersome. Here are workflows for the most important list columns.
Names
For illustrative purposes, this is one way of retrieving all content from df_ror_germany_education[["names"]] for each organization.
df_names <- tibble::tibble(
  id = df_ror_germany_education[["id"]],
  names = df_ror_germany_education[["names"]]
) |>
  # one row per name variant, with a running index per organization
  tidyr::unnest_longer(
    col = 2,
    indices_include = TRUE
  ) |>
  tidyr::unnest_wider(
    col = "names"
  ) |>
  # a name can have several types (e.g. label, alias, acronym, ror_display)
  tidyr::unnest_wider(
    col = "types",
    names_sep = "_"
  ) |>
  dplyr::select(
    id,
    names_id,
    name = value,
    lang,
    dplyr::starts_with("types")
  )
df_names
#> # A tibble: 1,479 × 6
#> id names_id name lang types_1 types_2
#> <chr> <int> <chr> <chr> <chr> <chr>
#> 1 https://ror.org/009dxjw45 1 Faurecia (Germany) en ror_di… label
#> 2 https://ror.org/04839sh14 1 MHB NA acronym NA
#> 3 https://ror.org/04839sh14 2 Medizinische Hochsc… de ror_di… label
#> 4 https://ror.org/02nq6sy34 1 AfG NA acronym NA
#> 5 https://ror.org/02nq6sy34 2 Akademie für Gesund… de ror_di… label
#> 6 https://ror.org/02652az59 1 CUA NA acronym NA
#> 7 https://ror.org/02652az59 2 Control Union Acade… en ror_di… label
#> 8 https://ror.org/05gxhfq53 1 Hochschule der Poli… de ror_di… label
#> 9 https://ror.org/04tenkb98 1 German Graduate Sch… en ror_di… label
#> 10 https://ror.org/04tenkb98 2 Hheilbronn Business… en alias NA
#> # ℹ 1,469 more rows
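If you only need a single name per organization, one option is to keep the name whose types include "ror_display". This is a sketch assuming each record carries exactly one such display name; df_display_names is just an illustrative name.
# Keep only the ROR display name for each organization
df_display_names <- df_names |>
  dplyr::filter(
    dplyr::if_any(
      dplyr::starts_with("types"),
      \(x) x == "ror_display"
    )
  ) |>
  dplyr::select(
    id,
    name
  )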
Links
This is one way of retrieving all content from df_ror_germany_education[["links"]] for each organization.
df_links <- tibble::tibble(
  id = df_ror_germany_education[["id"]],
  links = df_ror_germany_education[["links"]]
) |>
  tidyr::unnest_longer(
    col = 2,
    indices_include = TRUE
  ) |>
  tidyr::unnest_wider(
    col = "links"
  )
df_links
#> # A tibble: 854 × 4
#> id type value links_id
#> <chr> <chr> <chr> <int>
#> 1 https://ror.org/009dxjw45 website http://www.faurecia.de/ 1
#> 2 https://ror.org/009dxjw45 wikipedia https://en.wikipedia.org/wiki/F… 2
#> 3 https://ror.org/04839sh14 website http://www.mhb-fontane.de/ 1
#> 4 https://ror.org/02nq6sy34 website http://www.afg-heidelberg.de/wi… 1
#> 5 https://ror.org/02652az59 website http://www.cu-academy.de/ 1
#> 6 https://ror.org/05gxhfq53 website http://www.polizei.rlp.de/ 1
#> 7 https://ror.org/04tenkb98 website http://www.ggs.de/en/ 1
#> 8 https://ror.org/01a2z2x66 website http://www.iwkoeln.de/en/ 1
#> 9 https://ror.org/01a2z2x66 wikipedia https://en.wikipedia.org/wiki/C… 2
#> 10 https://ror.org/00nm1ws09 website http://www.ifnm.net/ 1
#> # ℹ 844 more rows
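From here, filtering on the link type gives, for example, only the organizations' websites:
# Keep only the website links
df_links |>
  dplyr::filter(
    type == "website"
  )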
Locations
This is one way of retrieving all content from df_ror_germany_education[["locations"]] for each organization.
df_locations <- tibble::tibble(
  id = df_ror_germany_education[["id"]],
  locations = df_ror_germany_education[["locations"]]
) |>
  tidyr::unnest_longer(
    col = 2
  ) |>
  tidyr::unnest_wider(
    col = "locations"
  ) |>
  tidyr::unnest_wider(
    col = "geonames_details"
  )
df_locations
#> # A tibble: 544 × 7
#> id country_code country_name lat lng name geonames_id
#> <chr> <chr> <chr> <dbl> <dbl> <chr> <int>
#> 1 https://ror.org/009d… DE Germany 49.2 9.35 Neue… 2865560
#> 2 https://ror.org/0483… DE Germany 52.9 12.8 Neur… 2864276
#> 3 https://ror.org/02nq… DE Germany 49.4 8.66 Heid… 2907911
#> 4 https://ror.org/0265… DE Germany 52.5 13.5 Berl… 2950159
#> 5 https://ror.org/05gx… DE Germany 49.9 7.25 Büch… 2942559
#> 6 https://ror.org/04te… DE Germany 49.1 9.22 Heil… 2907669
#> 7 https://ror.org/01a2… DE Germany 50.9 6.96 Köln 2886242
#> 8 https://ror.org/00nm… DE Germany 50.7 7.12 Bonn 2946447
#> 9 https://ror.org/01nb… DE Germany 49.9 8.63 Darm… 2938913
#> 10 https://ror.org/02qt… DE Germany 48.9 8.71 Pfor… 2853969
#> # ℹ 534 more rows
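Since this result set has one location per organization, a quick way to see where these organizations cluster is to count the GeoNames place names (the name column from geonames_details):
# Count education organizations per city, most frequent first
df_locations |>
  dplyr::count(
    name,
    sort = TRUE
  )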
External IDs
This is one way of retrieving all content from df_ror_germany_education[["external_ids"]] for each organization.
df_external_ids <- tibble::tibble(
  id = df_ror_germany_education[["id"]],
  external_ids = df_ror_germany_education[["external_ids"]]
) |>
  tidyr::unnest_longer(
    col = 2,
    indices_include = TRUE
  ) |>
  tidyr::unnest_wider(
    col = "external_ids"
  ) |>
  tidyr::unnest_wider(
    col = "all",
    names_sep = "_"
  ) |>
  dplyr::select(
    id,
    external_ids_id,
    type,
    preferred,
    dplyr::starts_with("all")
  )
df_external_ids
#> # A tibble: 1,335 × 10
#> id external_ids_id type preferred all_1 all_2 all_3 all_4 all_5 all_6
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 2 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 3 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 4 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 5 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 6 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 7 https://… 2 isni NA 0000… NA NA NA NA NA
#> 8 https://… 1 grid grid.473… grid… NA NA NA NA NA
#> 9 https://… 2 isni NA 0000… NA NA NA NA NA
#> 10 https://… 3 wiki… NA Q958… NA NA NA NA NA
#> # ℹ 1,325 more rows
# Let's filter on all Wikidata entries
df_external_ids |>
  dplyr::filter(
    type == "wikidata"
  )
#> # A tibble: 358 × 10
#> id external_ids_id type preferred all_1 all_2 all_3 all_4 all_5 all_6
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 https://… 3 wiki… NA Q958… NA NA NA NA NA
#> 2 https://… 3 wiki… NA Q455… NA NA NA NA NA
#> 3 https://… 3 wiki… NA Q111… NA NA NA NA NA
#> 4 https://… 3 wiki… NA Q302… NA NA NA NA NA
#> 5 https://… 3 wiki… NA Q314… NA NA NA NA NA
#> 6 https://… 3 wiki… NA Q322… NA NA NA NA NA
#> 7 https://… 3 wiki… NA Q162… NA NA NA NA NA
#> 8 https://… 3 wiki… NA Q314… NA NA NA NA NA
#> 9 https://… 4 wiki… NA Q160… NA NA NA NA NA
#> 10 https://… 3 wiki… NA Q135… NA NA NA NA NA
#> # ℹ 348 more rows
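As a possible next step, the Wikidata identifiers can be combined with the display names extracted in the Names section into a simple lookup of ROR ID, name, and Wikidata QID. This is only a sketch: it assumes the first element of the all column (all_1) holds the QID, and df_wikidata_lookup is an illustrative name.
# Hypothetical follow-up: one row per organization with its Wikidata QID
df_wikidata_lookup <- df_external_ids |>
  dplyr::filter(
    type == "wikidata"
  ) |>
  dplyr::select(
    id,
    wikidata_qid = all_1
  ) |>
  dplyr::left_join(
    df_display_names,
    by = "id"
  )
df_wikidata_lookup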