Parallel Optimisation of RSiena for Social Network Analyses
Social network analysis (SNA) is the science of social connection. SNA can be used to examine complex forms of social connectivity (for example, friendship, social support), and how these connections relate to human behaviour and attributes. In disaster research, researchers have utilised advanced statistical forms of SNA to examine how disaster-affected communities and residents are resilient in the wake of the 2009 “Black Saturday” bushfires, including the complex relationship between social support and mental health.
An enduring methodological issue in SNA is dealing with patchy or incomplete network data, such as sampled or missing data. Patchy data are inevitable in community surveys of network structure, but nevertheless limit our ability to make sound statistical inferences regarding network-based social processes, such as social selection (e.g., homophily) or social influence processes. Bayesian data augmentation techniques are a promising advance in dealing with patchiness in network data; however, these methods are still under development, and have not been widely implemented within SNA software packages.
With support from Melbourne Climate Futures, Dr Colin Gallagher, Dr Johan Koskinen, Elle Pattenden, Jonathan Januar, and Prof Richard Bryant have sought to implement recently developed methods for Bayesian estimation in RSiena (Koskinen and Snijders, 2022), an R-based statistical software package for stochastic actor-oriented modelling of the coevolution of social networks and behaviour. This work underpins the analysis of a longitudinal social network dataset gathered on bushfire-affected communities from the Beyond Bushfires project.
This work required the parallelisation of a Bayesian implementation of this statistical framework (Koskinen and Snijders, 2022), so that it can simulate dynamic network processes for hundreds of imputed networks simultaneously. ADACS programmers identified the need for a tailored MPI routine, as well as improvements to the R functionality of the RSiena package itself. The impact of these optimisations will vary with the problem being simulated, since the workload depends on both the number of networks involved and their size. For a large reference problem used for testing during this work, however, a 36-fold speed-up was obtained. Performance for this problem also scales very well to at least 256 compute cores (see attached plot).
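To illustrate why the achievable speed-up depends on both the number of imputed networks and their size, the following is a minimal sketch of static work partitioning, written in plain Python for readability. It is not the ADACS MPI routine or any part of RSiena: the function names `assign_networks` and `ideal_speedup`, the greedy assignment strategy, and the per-network cost values are all assumptions made for illustration, and communication overhead is ignored.

```python
from typing import List


def assign_networks(costs: List[float], n_ranks: int) -> List[List[int]]:
    """Assign networks (by index) to ranks, most expensive first.

    `costs` holds a hypothetical simulation cost per imputed network.
    Each network goes to whichever rank currently has the least work.
    """
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    loads = [0.0] * n_ranks
    assignment: List[List[int]] = [[] for _ in range(n_ranks)]
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for idx in order:
        rank = min(range(n_ranks), key=lambda r: loads[r])
        assignment[rank].append(idx)
        loads[rank] += costs[idx]
    return assignment


def ideal_speedup(costs: List[float], n_ranks: int) -> float:
    """Upper bound on speed-up: total work divided by the busiest rank's work."""
    if not costs:
        raise ValueError("costs must not be empty")
    assignment = assign_networks(costs, n_ranks)
    slowest = max(sum(costs[i] for i in part) for part in assignment)
    return sum(costs) / slowest


if __name__ == "__main__":
    # Illustrative numbers only: 300 equally sized networks on 256 ranks.
    print(ideal_speedup([1.0] * 300, 256))
    # Only 4 networks: extra ranks beyond 4 cannot help.
    print(ideal_speedup([1.0] * 4, 8))
```

Under these assumptions, the bound is set by the busiest rank: with fewer networks than ranks, the surplus ranks sit idle, and a handful of unusually large networks can dominate the run time regardless of how many cores are available.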
The new code will allow RSiena, the gold-standard software for longitudinal SNA, to be run on distributed systems using a customised message passing interface (MPI) routine, which is a world first.
“It’s been a real treat to work with ADACS, who resolved our software challenge in fine form. While SNA and Astrophysics may seem a long ways apart, there is a fundamental and important common ground in Markov chain Monte Carlo statistical methods that bridges the two fields. As a result, the ADACS team had a firm grasp of the statistical background for the project, which sped up our ability to work with them. Our regular meetings were productive and informative, and culminated in a successful outcome. We’re hoping to continue our collaboration with such a unique, capable, and collegial group.”
This work was externally funded and performed by ADACS using capacity outside of the NCRIS-funded services provided to the national astronomy community.