University Choices

Academia R

Navigating through the college choice options from the CAO.

Eugene https://fizzics.netliffy.app
2021-12-06

I have a daughter finishing up school, and so we are looking at her university options for next year. In Ireland, there is a central clearing house (called the CAO) that takes care of university admissions based on performance in an end-of-school exam. This exam gives you a certain number of points, with a maximum of 625.

The preference (at the moment, it changes frequently) is for something featuring Environmental Science. To that end, we looked around at the different options available at Irish universities and colleges. There are about 10 Irish universities, an amazing number for a small country, but then we have a strong academic ethos here. There is also a similar number of smaller colleges. Looking through the various offerings, there are north of 1,000 different courses available for students wishing to undertake a four-year degree. Quite the range of choices.

The next question is, which ones feature Environmental Science? To find out, we went to the CAO website. There you can track down a bunch of PDFs and Excel spreadsheets that detail the various courses, as well as the points required for entry. And it gives this information going back several years. A data treasure trove.

Of course, PDFs and Excel spreadsheets are not usually designed for easy data mining. The pdftools package helps a lot, but there is still plenty of data wrangling to do to get things into middling, half-decent shape.

# two-digit years used to build the PDF URLs; c() coerces 14:19 to character
years <- c("07", "08", "09", "10", "13", 14:19)
cao_points_year <- function(year) {
  cao_pdf <- glue::glue("http://www2.cao.ie/points/lvl8_{year}.pdf")
  z <- pdf_text(cao_pdf) %>% 
    str_split("\n") %>% 
    unlist()
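  # keep only data rows: not starting with a space, not blank, not a header,
  # and with exactly three runs of two or more spaces (that is, four columns)
  z <- z[!str_detect(z, "^ ") &
           z != "" &
           !str_detect(z, "Course Code") &
           str_count(z, "  +") == 3] %>%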
    str_split("  +") %>% # splits rows based on runs of several spaces
    unlist() %>% 
    str_remove("#") %>% 
    str_remove("\\*") # deletes some annoying characters
  # the flat vector holds four consecutive elements per course; each recycled
  # logical index picks every fourth element, so the tibble has one row per course
  z <- tibble(year = glue::glue("20{year}"), 
              code = z[c(TRUE, FALSE, FALSE, FALSE)], 
              course = z[c(FALSE, TRUE, FALSE, FALSE)], 
              final = z[c(FALSE, FALSE, TRUE, FALSE)], 
              medium = z[c(FALSE, FALSE, FALSE, TRUE)]) 
  z
}

z <- map_df(years, cao_points_year) %>% 
  mutate(year = as.numeric(year))
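
To make the filtering and reshaping steps easier to follow, here is a minimal sketch of the same idea on a handful of made-up lines. The course codes, titles and points are hypothetical and only imitate the layout assumed by the function above; they are not real CAO data.

```r
library(stringr)
library(tibble)

# Hypothetical lines imitating the text extracted from a points PDF (not real data)
lines <- c(
  "Course Code   Course Title   Final   Mid",
  "",
  "XX101  Environmental Science  400*  420",
  "XX102  Earth Sciences  #350  380",
  " a line starting with a space, which is not a data row"
)

# Same four conditions as in cao_points_year()
keep <- !str_detect(lines, "^ ") &
  lines != "" &
  !str_detect(lines, "Course Code") &
  str_count(lines, "  +") == 3

# Split the surviving rows on runs of two or more spaces and flatten
fields <- unlist(str_split(lines[keep], "  +"))
fields <- str_remove(fields, "#")
fields <- str_remove(fields, "\\*")

# Recycled logical indices pick every fourth element of the flat vector
toy <- tibble(
  code   = fields[c(TRUE, FALSE, FALSE, FALSE)],
  course = fields[c(FALSE, TRUE, FALSE, FALSE)],
  final  = fields[c(FALSE, FALSE, TRUE, FALSE)],
  medium = fields[c(FALSE, FALSE, FALSE, TRUE)]
)
toy
```

Note that this approach only works while every kept row splits into exactly four fields, which is what the `str_count(...) == 3` condition is there to guarantee.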

That took care of the PDFs. But the data for the last two years (2020, 2021) was stored in Excel files. It turned out to be simpler to download these from here and here and then whip them into the same shape as our PDF data.

z21 <- readxl::read_excel("data/CAOPointsCharts2021.xlsx", 
                        sheet = "EOS_2021", skip = 10) %>% 
  janitor::clean_names() %>% 
  filter(course_level == 8) %>% 
  mutate(year = 2021) %>% 
  select(year,
         code = course_code, 
         course = course_title,
         final = eos_points,
         medium = eos_midpoints) %>% 
  mutate(medium = as.character(medium)) # year was already set above
z20 <- readxl::read_excel("data/CAOPointsCharts2020.xlsx", 
                        sheet = "PointsCharts2020_V2", skip = 9) %>% 
  janitor::clean_names() %>% 
  filter(level == 8) %>% 
  mutate(year = 2020) %>%  
  select(year,
         code = course_code2, 
         course = course_title,
         final = eos,
         medium = eos_mid_point) # year was already set above

And finally, we put all three together:

z <- bind_rows(z, z20, z21)
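
The plot further down maps `final` to a continuous y axis, so the points have to be numeric at some stage, and that step is not shown in this post. The following is only a sketch of one way it might be done, on made-up rows; the values, the `"AQA"` placeholder and the `final_num` and `medium_num` column names are all assumptions for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical rows standing in for the PDF years and one Excel year
pdf_years <- tibble(year = 2019, code = "XX101", final = "400", medium = "420")
xls_year  <- tibble(year = 2020, code = "XX101", final = "AQA", medium = "431")

combined <- bind_rows(pdf_years, xls_year) %>%
  mutate(
    # non-numeric entries become NA; the coercion warning is silenced on purpose
    final_num  = suppressWarnings(as.numeric(final)),
    medium_num = suppressWarnings(as.numeric(medium))
  )

# Rows whose final points could not be read as a number, worth a manual look
not_numeric <- filter(combined, is.na(final_num))
not_numeric
```

Keeping the original character columns next to the numeric ones makes it easy to see which entries were lost in the conversion.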

Now we take this data (all 10585 rows of it), pick out the courses with an Environmental Science-like hue, and create a plot showing how the entry points have changed over the years. The result is shown below. The labels on the graph give the course code, which also points toward the host university. So TR064 will indicate a course in Trinity College Dublin, GY308 a course in Galway University. The actual names of the courses can be pretty verbose and have been abbreviated in the legend. It’s still possible, just about, to guess what they are: Blgcl,ErthandEnvSc,s, for example, would be Biological, Earth and Environmental Sciences.

I’m not sure if we’re any closer to pinpointing a course, but at least with this analysis it’ll be a lot easier to investigate next week’s favourites.

The code for the plot is given below:

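# `courses` (the course codes to plot) and `max_years` (with columns `year_max`
# and `last_name`) are built elsewhere in the full code
z %>% filter(code %in% courses) %>% 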
  left_join(max_years) %>% 
  mutate(last_name = str_replace(last_name, "Environmental Science", "EnvSci.")) %>% 
  mutate(label = ifelse(year == year_max, code, "")) %>% 
  mutate(code = glue::glue("{code}: {abbreviate(last_name, 20)}")) %>% 
  ggplot(aes(year, final, colour = code, group = code)) +
  geom_line(size = 2) +
  geom_point(size = 5) +
  geom_label_repel(aes(label = label),
                   nudge_x = 0.2,
                   size = 12,
                   na.rm = TRUE,
                   show.legend = F) +
  scale_color_okabe_ito() +
  scale_y_continuous(breaks = seq(100, 700, by = 50)) +
  labs(y = "Final Points", x = "") +
  theme_clean() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  guides(col=guide_legend(nrow=3))
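
The plot code relies on two objects, `courses` and `max_years`, whose construction is not shown here. As a rough sketch of how they could be built (an assumption, not necessarily what the full code does), the example below picks courses by a keyword match on the title and records, for each code, the latest year and the course name in that year. The toy data is made up.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical stand-in for the combined data frame `z`
toy <- tibble(
  year   = c(2019, 2020, 2021, 2020, 2021),
  code   = c("XX101", "XX101", "XX101", "XX202", "XX202"),
  course = c("Environmental Science", "Environmental Science",
             "Environmental Science and Sustainability",
             "History", "History"),
  final  = c(400, 410, 420, 300, 310)
)

# Codes of courses whose title mentions "environment" in any year
courses <- toy %>%
  filter(str_detect(course, regex("environment", ignore_case = TRUE))) %>%
  distinct(code) %>%
  pull(code)

# For each selected code: the most recent year and the course name in that year
max_years <- toy %>%
  filter(code %in% courses) %>%
  group_by(code) %>%
  slice_max(year, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(code, year_max = year, last_name = course)

max_years
```

Because `max_years` shares only the `code` column with the main data, `left_join(max_years)` in the plot code would join on `code` alone.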

And the libraries used are:

# http://www2.cao.ie/points/deg03.htm#trd

library(tidyverse)
library(pdftools)
library(ggokabeito)   # Colorblind-friendly color palette
library(showtext)
library(ggrepel)
library(ggthemes)     # provides theme_clean(), used in the plot

Full code can be seen on GitHub.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/eugene100hickey/fizzics, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Eugene (2021, Dec. 6). Euge: University Choices. Retrieved from https://www.fizzics.ie/posts/2021-12-06-university-choices/

BibTeX citation

@misc{eugene2021university,
  author = {Eugene, },
  title = {Euge: University Choices},
  url = {https://www.fizzics.ie/posts/2021-12-06-university-choices/},
  year = {2021}
}