The Global Health Network (TGHN) Asia has set up a Data Science Club at icddr,b, Bangladesh. For the club's first two sessions, TGHN, in collaboration with HDR UK, icddr,b, Fiocruz, and Africa CDC, organized a hybrid workshop, Introduction to R in Epidemiology, on July 17 and 18, 2023, to deliver training in the R statistical package for aspiring epidemiologists. The goal of the two-day programme was to make R training more accessible and tailored to the broader communities of practice in the Global South.

Participants learned about introductory R topics such as syntax, data cleaning, and basic data analysis and visualization. A total of 37 participants attended the sessions: 15 joined virtually from 6 different countries, and 22 young researchers joined in person from institutions in Bangladesh.


Saimul Islam, Senior Research Investigator, Noncommunicable Diseases, Nutrition Research Division (NRD), icddr,b, moderated the workshop. Mathew Redford from HDR Global gave the opening remarks, in which he introduced the participants to the Global Health Data Science knowledge community and hub, the concept of Data Science Clubs and Clinics, opportunities for networking with researchers, and the availability of relevant tools. He also presented the Data Science hub's initiative to make health data more accessible, along with future initiatives to incorporate Artificial Intelligence and Machine Learning in health research.

Aashna Uppal, an expert data scientist and a candidate in the HDRUK-Turing-Wellcome PhD Programme in Health Data Science at the University of Oxford, facilitated the training. On the first day, participants were introduced to Applied Epidemiology and to R and RStudio, and were then taken through the RStudio environment. The first session touched on functions, packages, calculations, setting a working directory, creating R projects, and R syntax. Participants then learned about object classes and indexing and took part in practice exercises. The second day began with a brief session to address queries about the previous day's topics. The training then moved on to exposing the participants to real-world working challenges. Base R versus Tidyverse coding conventions, importing data and packages, basic data manipulation, and data cleaning in RStudio were discussed. The next segment of the session introduced creating tables, data visualization, R Markdown, and automated reporting features.

In the closing remarks, Dr. Aliya Naheed, lead of TGHN Asia and scientist, Noncommunicable Diseases, Nutrition Research Division, engaged the participants by sharing her own experiences of data management challenges. She encouraged the participants to advance their research careers through effective communication with institutional and global communities. She also emphasized the importance of practicing the demonstrated topics for effective learning.

The participants shared that they found the two sessions useful and learned a lot from the training provided. An in-person participant showed interest in attending more hands-on workshops of this kind, while a virtual participant from Cambodia expressed her intention to incorporate R/RStudio into her team's mixed-methods study analysis.

Feedback from the survey participants will be implemented moving forward. For example, future cohorts may be separated based on skill level, and the length of the workshops may be increased to spend more time on the material covered. Participants may also be asked beforehand if they have specific requests for intermediate and advanced topics, and the team will try to run workshops in participants' mother tongue where possible (although it was noted that running these workshops in English was not a barrier to learning). In addition, a key recommendation is to ground all learning in practical examples, which could not only encourage discussion but also solidify learning. Lastly, given the feedback received, it is recommended to follow up with participants at higher skill levels to gauge whether they would be interested in undertaking a "training of trainers" course, in which they would learn how to run Data Clubs and Clinics themselves. This would help sustain this learning platform over the long term.

Please see the video recordings of the two sessions for more information on the Data Science Club:

Introduction to R for Epidemiology - Session 1 (1 hour 20 min 21 s)

Introduction to R for Epidemiology - Session 2 (1 hour 46 min 52 s)