The Structure of Distributed Scientific Research Teams Affects Collaboration and Research Output

Sarah Gehlert, PhD, Jung Ae Lee, PhD, Jeff Gill, PhD, Graham Colditz, MD, Ruth Patterson, PhD, Kathryn Schmitz, PhD, Linda Nebeling, PhD, RD, Frank Hu, MD, Dale McLerran, Diana Lowry, MPH, the TREC Collaboration and Outcomes Working Group, Mark Thornquist, PhD
The George Warren Brown School, Washington University, Saint Louis, MO; Siteman Cancer Center, Washington University, Saint Louis, MO; Agriculture Statistics Lab, University of Arkansas, Fayetteville, AR; Department of Political Science, Washington University, St. Louis, MO; Moores Cancer Center, University of California at San Diego, La Jolla, CA; Clinical Epidemiology and Biostatistics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA; National Cancer Institute, Bethesda, MD; Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA; Fred Hutchinson Cancer Research Center, Seattle, WA
Correspondence: Sarah Gehlert, College of Social Work, The University of South Carolina, Pendleton, Columbia, SC. Email: sgehlert@mailbox.sc.edu.


Introduction
Current approaches to biomedical research rely increasingly on cross-disciplinary collaboration. In addition, recent funding announcements emphasize team science and call for the development of multi-institutional collaborations or hubs that promote it. Yet despite this emphasis, questions remain about whether the benefits of a team-science-based approach to research outweigh its operational and transactional costs.
Beginning in the late 1990s, the National Cancer Institute (NCI) incentivized team science through three multi-site initiatives, Transdisciplinary Tobacco Use Research Centers (TTURC), Centers for Population Health and Health Disparities (CPHHD), and Transdisciplinary Research in Energetics and Cancer (TREC).
In the present paper, we perform a social network analysis of four waves of internal survey data from TREC investigators, augmented by bibliometric data, to determine the extent to which collaboration across investigators, disciplines, and sites affects the timing, rate, and type of publications. This approach is unique as the first longitudinal examination of an NIH transdisciplinary initiative and as the first study allowing comparison between investigators' subjective views of collaboration and objective counts and types of shared publication. Our research question is whether active efforts by a funded research initiative to foster investigator ties (network "edges") within and across sites affect the density, centrality, and homophily of those ties at the level of the initiative, in turn speeding the onset of publication and increasing the diversity of disciplines and institutions represented in the authorship of those publications. Our comparison is with available data from two other transdisciplinary initiatives funded by NIH.

NIH Transdisciplinary Initiatives and their Evaluation
In 1999, the NIH established the first of three multi-site transdisciplinary research initiatives, the Transdisciplinary Tobacco Use Research Centers (TTURC) [1].
TTURC was the first to undergo evaluation of its impact on research output, as measured by the timing of publications. A retrospectively-conducted bibliographic evaluation of TTURC in comparison with R01s found that after an initial two to three-year lag period, TTURC had higher overall publication rates over a 10-year comparison period [3]. In a companion article to the bibliographic evaluation, Rimer and Abrams cautioned that the ultimate impact and value-added nature of transdisciplinary frameworks may be decades away due to the time that it takes to establish effective team functioning [4].
CPHHD, the second NIH transdisciplinary initiative to be funded, focuses on the determinants of health disparities and the translation of this knowledge into solutions. CPHHD is unique among NIH's transdisciplinary initiatives in its focus on local communities with high rates of disparities and its mandate to partner with these communities at all stages of the research process. Okamoto conducted an analysis of one wave of survey data from the ten-year initiative in 2015 [5] and found a lag period similar to that seen in TTURC.

The TREC Initiative (2010-2015)
In 2005, TREC was funded by NCI as NIH's third transdisciplinary initiative, in response to increasing evidence of the contributions of nutrition, energetics, and physical activity to cancer incidence, morbidity, and mortality. The NCI Request for Applications (RFA) (http://grants.nih.gov/grants/rfa-files/RFA-CA-10-006.html) describes the TREC initiative's purpose as "to foster collaboration across multiple disciplines and encompasses projects that cover the biology, genomics, and genetics of energy balance to behavioral, socio-cultural, and environmental influences on nutrition, physical activity, weight, energetics, and cancer risk."(p. 1).
The TREC Coordination Center and Dr. Gehlert began prospectively gathering data to evaluate the transdisciplinary efforts of TREC at the beginning of its second cycle of funding, in 2011. These data were supplemented with an analysis of prospectively obtained bibliometric data.
The TREC sites (the University of Pennsylvania, Washington University in St. Louis, the University of California-San Diego, Harvard University, and the Coordination Center at Fred Hutchinson Cancer Research Center in Seattle) comprise more than 120 investigators and over 30 clinical, biological, behavioral, and social science disciplines, with inevitable fluctuation over time due to investigator moves. Their scope extends from the biology, genetics, and genomics of energy balance to social and behavioral influences on physical activity and nutrition, weight, energetics, and cancer risk [6]. In part because of the emphasis on the specialized technology inherent in research on energetics and a reliance on animal models, systematic efforts were made to maximize resources by fostering ties across sites. This was incentivized by developmental awards requiring participation by junior and senior investigators across sites that were managed by the TREC Coordination Center. TREC was the only one of the three NIH transdisciplinary initiatives to have a Coordination Center as part of its structure.
The TREC initiative aims to accelerate scientific discovery through a transdisciplinary approach to team science, which Rosenfield defines as exchanging information, altering discipline-specific approaches, sharing resources, and integrating disciplines to achieve a common scientific goal [7]. Transdisciplinary research differs from multidisciplinary and interdisciplinary research in the extent to which investigators operate outside the boundaries of their own disciplines to share language, pool knowledge and theories, and develop new methods of analysis. It is generally considered to represent the highest degree of disciplinary collaboration [8].
In a 2008 article introducing the TREC initiative in a special issue of the Journal of Preventive Medicine, Robert Croyle, who heads NCI's Division of Cancer Control and Population Sciences, wrote that "one important assumption underlying these efforts was that the speed of scientific progress and its effective application to public health problems would depend on the integration of discipline-specific efforts and increased support for collaboration, evidence synthesis, and the science of dissemination" [9]. Thus, integration among collaborators is seen as a measure of the success of transdisciplinary team science initiatives like TREC. In another article on team science in that same year, Karen Emmons (now VP, Director of the Kaiser Foundation Research Institute) was quoted as saying "among the most important indicators of success is rich team communication" [10].

Study Sample
Survey participants were investigators involved in the five TREC sites from 2010 to 2015. A list of investigators was developed by the TREC Steering Committee and, after appropriate IRB approvals were received, each investigator was sent a letter of invitation along with a copy of the social network survey. The 2011 survey established a baseline measure of ties after the first year of funding. The survey was re-sent yearly to assess the degree to which the density of social network ties changed over time during the height of the grant activity. Because the first survey was sent late in 2011, the next survey was received early in 2013, 13 months after the previous survey. Thus, no data are available for 2012, and the time between these two waves was 13 months rather than 12 months. In 2013, we limited the invitations to those who had responded in 2011 and remained active in TREC. Fewer invitations were sent in 2014 and 2015 due to faculty turnover, which is typical of such projects. The TREC Coordination Center collected publication data on a regular basis and examined authorship in terms of the disciplines and sites represented. In the current paper, we report on data from the first four waves of data collection. The survey response rate was approximately 80 percent in each year, although the absolute number of respondents decreased slightly over the period of study. The numbers of respondents by academic position and site are summarized in Table 1.

Measures
Collaboration Network. The survey listed the names of all TREC investigators and asked individual respondents if they currently worked with, or had worked with prior to TREC, each investigator on the list: (1) on a study or grant; (2) on a co-authored publication; (3) on a co-authored presentation; (4) in mentoring or training; (5) on a committee or work group; or (6) in any other activity. These are the conditions for interacting and thus forming an edge (tie) in the researcher network. Thus, the behavior that we study here is two researchers deciding to collaborate in one of these six ways. Note that we are not measuring the quality of these ties, nor the resulting groups from collections of ties, except that a grant award or publication is clearly an indication of edge effectiveness.
Discipline. From a list of 37 academic disciplines, investigators were asked to choose the one that best characterized the disciplinary perspective of their work. For purposes of analysis, responses were collapsed into eight disciplines (Table 2). The subgroups of disciplines and the distribution of disciplines by site are described in Table 3.
Social Network. We consider a social network to be a relational structure among actors. We define the individual TREC investigators who participated in our survey as a set of actors (also called nodes), and assign a tie (as in "tie-together") wherever there are collaborations between those investigators. These ties between two actors are called arcs or edges. Considering the group of TREC investigators as a social network, or a social entity made up of a number of actors, allows the group's structure to be analyzed in its entirety as well as the dyadic relationships between its members. The underlying assumption is that more frequent communication within and across sites will better foster advances in science, specifically in this case energy balance and cancer. For the purposes of analysis, we define communication as the quality of dyadic and group interactions beyond what is measured in the six survey criteria for an edge. We also specify an undirected network, so that if actor 1 is tied to actor 2, the reverse is also true, because we consider peer-level scientific collaborations to be overwhelmingly mutual rather than hierarchical in this context.
Network Size. The network size is the number of actors in the network (i.e., the total number of members in the network).
Network Density. More ties between investigators imply greater network interaction, as defined above. The principal way of evaluating this quality is by measuring the density of social network ties, defined as the number of actual ties between network members compared to the number of potential ties (equal to $(n^2 - n)/2$ for $n$ individuals in an undirected network). Denser networks suggest faster propagation of information and greater group cohesion [11]. Also, individuals who conduct more information tend to be more active in terms of research goals and objectives [12].
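As a minimal sketch of the density definition above (not the authors' code), the following Python function counts distinct undirected ties and divides by the $(n^2 - n)/2$ potential ties. The function name and the toy edge list are hypothetical and serve only as illustration.

```python
def network_density(n_actors, ties):
    """Density of an undirected network: actual ties / potential ties."""
    # Store each tie as an unordered pair so (a, b) and (b, a) count once;
    # self-ties are ignored because they are not collaborations.
    distinct = {frozenset(t) for t in ties if t[0] != t[1]}
    potential = (n_actors ** 2 - n_actors) / 2
    return len(distinct) / potential if potential else 0.0


# Toy example (hypothetical data): 5 investigators, one of whom ("e") has
# no ties; the a-b tie is reported by both parties but counted once.
toy_ties = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("a", "d")]
print(network_density(5, toy_ties))  # 4 distinct ties out of 10 possible
```

Treating reciprocal reports as a single tie mirrors the undirected specification described under Social Network.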
Triads. Triangle relationships occur when two individuals with a tie have a tie with the same third individual, and the number of triangles can be greater than the number of direct ties [13], especially in a network with multiple attributes and diverse individual backgrounds like TREC. Four types of triadic relations occur in undirected networks: no ties (three actors are isolated without ties/edges), a single tie between two actors while the third actor is isolated, two ties among three actors, or all three ties forming a triangle. Counting the number of these types across all possible triples, a so-called triad census [14], allows us to better understand the local social structure [15], which may not be captured by global measures or dyad density. Relative prevalence of these triangles implies that interpersonal choices tend to be mutual and transitive [15].
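The four triad types described above can be counted with a brute-force pass over all triples. The sketch below is illustrative only; the function name and toy data are hypothetical, and the cubic running time is acceptable only for networks of roughly the size studied here.

```python
from itertools import combinations


def triad_census(actors, ties):
    """Count triples of actors holding 0, 1, 2, or 3 undirected ties."""
    tie_set = {frozenset(t) for t in ties}
    census = {0: 0, 1: 0, 2: 0, 3: 0}
    for a, b, c in combinations(actors, 3):
        # Number of the three possible pairs in this triple that are tied.
        k = sum(frozenset(p) in tie_set for p in ((a, b), (a, c), (b, c)))
        census[k] += 1
    return census


# Toy example (hypothetical data): one closed triangle a-b-c, a pendant
# actor d attached to c, and an isolate e.
actors = ["a", "b", "c", "d", "e"]
ties = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(triad_census(actors, ties))  # {0: 2, 1: 5, 2: 2, 3: 1}
```

The four counts always sum to the number of possible triples, here 10 for five actors.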
Centrality. Another traditional way to evaluate a network is through node centrality [16]. While the density of a network is a global measure of overall network function, centrality evaluates the power and influence of each node on social relations. For example, some actors in a network are highly central while others are not widely connected. Large differences in centrality for a given network tend to produce hierarchical structures with isolated individual actors at the periphery: a core-periphery structure [17]. We first measure degree centrality, which shows how many actors are tied directly to each actor. The measure is computed by counting the number of adjacencies for an actor $i$ in a network of size $n$, that is,

$$C_D(i) = \sum_{j \neq i} I_{ij},$$

where $I_{ij}$ is 1 if actors $i$ and $j$ are connected and 0 otherwise. Our second measure of centrality is closeness. This measure provides an index of independence or efficiency [16], meaning the speed with which an actor reaches other actors in the network. The computation of closeness is based on summing the shortest paths (called geodesics) from an actor to all other actors in the network, that is,

$$C_C(i) = \frac{n - 1}{\sum_{j \neq i} d_{ij}},$$

where $d_{ij}$ is the geodesic distance between $i$ and $j$. This closeness score is rescaled between 0 and 1. It is 0 if an actor is an isolate, and 1 if an actor is directly connected to all others. We also measure centrality by the location of actors of high academic rank (full professors), which can be seen as a measure of the effect of seniority. Our third centrality measure is betweenness, which quantifies the number of times an actor is situated as a bridge along the shortest path between two other actors, to assess how likely this actor is to be a direct route between two actors that are not otherwise linked. This measure is computed by

$$C_B(k) = \sum_{i < j;\; i, j \neq k} \frac{g_{ij}(k)}{g_{ij}},$$

where $g_{ij}$ is the number of shortest paths linking $i$ and $j$, and $g_{ij}(k)$ is the number of shortest paths linking $i$ and $j$ that contain $k$.
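To make the three centrality measures concrete, here is a small self-contained sketch in Python. It is not the authors' code: the function names and toy network are hypothetical, closeness uses the common normalization of (n - 1) divided by the sum of geodesic distances, and assigning 0 to any actor who cannot reach every other actor is an assumption that extends the isolate rule stated in the text.

```python
from collections import deque
from itertools import combinations


def bfs(adj, source):
    """Geodesic distances and shortest-path counts from one actor."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:               # first time v is reached
                dist[v], paths[v] = dist[u] + 1, 0
                queue.append(v)
            if dist[v] == dist[u] + 1:      # u lies on a shortest path to v
                paths[v] += paths[u]
    return dist, paths


def centralities(actors, ties):
    """Degree, closeness, and betweenness for an undirected network."""
    adj = {a: set() for a in actors}
    for a, b in ties:
        adj[a].add(b)
        adj[b].add(a)
    n = len(actors)
    info = {a: bfs(adj, a) for a in actors}
    degree = {a: len(adj[a]) for a in actors}
    closeness = {}
    for a in actors:
        dist = info[a][0]
        # Assumption: actors who cannot reach everyone score 0.
        reachable_all = len(dist) == n and n > 1
        closeness[a] = (n - 1) / sum(dist.values()) if reachable_all else 0.0
    betweenness = {a: 0.0 for a in actors}
    for i, j in combinations(actors, 2):
        d_i, g_i = info[i]
        d_j, g_j = info[j]
        if j not in d_i:
            continue                        # i and j are not connected
        for k in actors:
            if k in (i, j) or k not in d_i or k not in d_j:
                continue
            if d_i[k] + d_j[k] == d_i[j]:   # k is on a shortest i-j path
                betweenness[k] += g_i[k] * g_j[k] / g_i[j]
    return degree, closeness, betweenness


# Toy example (hypothetical data): a is tied to everyone; c and d are also tied.
actors = ["a", "b", "c", "d"]
ties = [("a", "b"), ("a", "c"), ("a", "d"), ("c", "d")]
degree, closeness, betweenness = centralities(actors, ties)
print(degree)       # a has degree 3, b has 1, c and d have 2
print(closeness)    # a scores 1.0 because it is adjacent to all others
print(betweenness)  # only a bridges other pairs (b-c and b-d)
```

In this toy network the actor with the highest degree also has the highest betweenness, the kind of overlap between the two measures that the Results mention.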
Homophily. Similar actors tend to bond with each other for many reasons such as opportunity, affinity, ease of communication, reduced transaction costs, and organizational foci [18][19][20][21]. We assume that any tendency toward homophily in the TREC network operates within eight subgroups by discipline and five subgroups by site, within which network members are easily accessible to each other. An effective way to view homophily is through the exponential-family random graph model (ERGM) [22,23]. The model estimates the probability that any two individuals i and j in the network are connected. A tendency toward homophily indicates a higher probability of ties being formed between actors within the same site or within the same discipline.
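The exact ERGM specification is not given in the text, so the following is only a sketch under a strong simplifying assumption: a model with just an edge term and one uniform same-group (homophily) term. In that special case ties are independent, the fitted within-group tie probability equals the observed share of same-group pairs that are tied, and the coefficients are log-odds. All names and data below are hypothetical.

```python
import math
from itertools import combinations


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def homophily_summary(group_of, ties):
    """Within- and between-group tie probabilities and matching log-odds.

    Assumes at least one same-group and one mixed-group pair, and that
    neither probability is exactly 0 or 1 (otherwise the logit is undefined).
    """
    tie_set = {frozenset(t) for t in ties}
    counts = {True: [0, 0], False: [0, 0]}   # same_group -> [ties, pairs]
    for a, b in combinations(sorted(group_of), 2):
        same = group_of[a] == group_of[b]
        counts[same][1] += 1
        counts[same][0] += frozenset((a, b)) in tie_set
    p_within = counts[True][0] / counts[True][1]
    p_between = counts[False][0] / counts[False][1]
    return {
        "p_within": p_within,
        "p_between": p_between,
        "theta_edges": logit(p_between),                    # baseline log-odds
        "theta_match": logit(p_within) - logit(p_between),  # homophily effect
    }


# Toy example (hypothetical data): two sites with three investigators each.
site_of = {"a": "A", "b": "A", "c": "A", "d": "B", "e": "B", "f": "B"}
ties = [("a", "b"), ("a", "c"), ("d", "e"), ("c", "d")]
summary = homophily_summary(site_of, ties)
print(summary["p_within"], summary["p_between"], summary["theta_match"])
```

A positive `theta_match` corresponds to homophily: same-site pairs are more likely to be tied than mixed-site pairs. A full ERGM with additional terms would not reduce to these simple proportions.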
Number of Publications and Authorship. As per the initial NCI Request for Applications (http://grants.nih.gov/grants/rfa-files/RFA-CA-10-006.html), we consider the establishment of collaborations among disciplines (i.e., biology, genomics, and genetics of energy balance and behavioral, socio-cultural, and environmental factors that determine cancer risk) as a key measure of TREC's success. In addition to measuring these collaborations subjectively through the survey, we measured collaborations within and between TREC sites objectively, in terms of the authorship of papers across disciplines and sites. Examining the production of papers across time allowed us to investigate whether the lag time seen in TTURC and CPHHD was also true of TREC.
Although others have measured impact of publications using citation counts [24], we did not consider the four years of our study sufficient time for citations to accrue in a way that would reflect impact. We thus considered citation data to be a less reliable measure of research output. It will be the subject of future papers as the opportunity for TREC work to be cited increases over time. Data were collected from the Coordination Center at Fred Hutchinson Cancer Research Center through the Annual Progress Report to the National Cancer Institute and the citations of the TREC Center grant number in PubMed. We explored the authorship of each paper reported retrospectively in terms of the disciplines and sites involved.

Statistical Analysis
Social network analysis was the principal mode of analysis in the present study. All statistical results and graphs were produced using R version 3.0.3 [25], specifically with the network, sna, and ergm R packages [26][27][28].

Results
The network properties of the TREC research sites over the years are summarized in Table 2. Note that the network size shown in Table 2 is different from the number of survey respondents reported in Table 1. The difference between the number of respondents and the total number of network actors is due to secondary actors, who are included in the network because, although they did not themselves respond to the survey, they were designated by respondents as a link. Thus, we do not know about the relationships among the secondary actors. We take this into account in our analyses because the network density can be sensitive to the number of respondents rather than the network size, which often makes it hard to fairly compare longitudinal social network data when the set of respondents differs at each time point.
The density of the TREC network was 0.086 in 2011 and remained similar or decreased slightly over time (Table 2). The relatively lower density of 0.082 in 2015 does not necessarily mean less collaboration, because we observe more ties in 2015 than in 2011 (547 vs. 415). The lower density means that there are fewer social ties among actors relative to the chances of those ties occurring, which can occur simply because a network is large. As seen in Table 1, the network size was noticeably smaller in the first year of the grant (n=99), when researchers were building new relationships, than in subsequent years, when it stabilized in terms of total participants (n=116, 114, 116). Network ties expanded rapidly in the early period, going from 415 in 2011 to 577 in 2013. However, network ties then stabilized in the mid-500s (t=577, 525, 547). Note that these are not equilibrium values in the traditional long-run sense, since the study contains only four waves.
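The point that density can fall while ties rise can be checked directly from the figures quoted above. The short sketch below, which is illustrative and not part of the original analysis, recomputes density for 2011 and 2015 from the reported network sizes and tie counts; the helper name is hypothetical.

```python
def potential_ties(n):
    """Number of possible undirected ties among n actors: (n^2 - n) / 2."""
    return n * (n - 1) // 2


# year -> (network size, observed ties), as quoted in the text
waves = {2011: (99, 415), 2015: (116, 547)}

for year, (n, ties) in waves.items():
    density = ties / potential_ties(n)
    print(year, potential_ties(n), round(density, 3))
# 2011 4851 0.086
# 2015 6670 0.082
```

Between 2011 and 2015 the number of ties grew by about 32 percent, but the number of potential ties grew by about 37 percent, so the ratio fell slightly even though collaboration increased in absolute terms.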
Triad census is reported in Table 2. Triangle relationships increased over time, from 592 in 2011 to 1175 in 2015. These numbers are translated to the Transitivity Index 1, by dividing by all possible triads, and Transitivity Index 2, by dividing by the number of one-tie triads. Both transitivity indexes increased in 2015. The second index implies that the characteristics of the small group relationships have been changing from "couple only" to "triangle" in general.
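As a rough illustration of the two indexes as described in this paragraph, the sketch below divides the reported triangle counts by the number of possible triads, using the network sizes quoted earlier (99 in 2011, 116 in 2015). The function names are hypothetical, and the values in Table 2 may be scaled or rounded differently. The second index is shown only as a function because the count of one-tie triads is not given in the text.

```python
from math import comb


def transitivity_index_1(triangles, n_actors):
    """Triangles as a share of all possible triads among n_actors."""
    return triangles / comb(n_actors, 3)


def transitivity_index_2(triangles, one_tie_triads):
    """Triangles relative to the number of one-tie triads."""
    return triangles / one_tie_triads


idx_2011 = transitivity_index_1(592, 99)     # 592 triangles, 156849 triads
idx_2015 = transitivity_index_1(1175, 116)   # 1175 triangles, 253460 triads
print(round(idx_2011, 5), round(idx_2015, 5))
```

Under these assumptions the first index rises from roughly 0.0038 to roughly 0.0046, which agrees in direction with the increase reported above.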
Transitivity of interpersonal choices forms a more clustered structure, as evident in Figure 4. In a transitive network, the existence of two ties, a↔b and b↔c, increases the probability of a third tie a↔c, which closes the triangle. Transitivity therefore leads to larger connected groups built around particular clustering members [14,15]: a small number of highly active members surrounded by many much less active members, which creates more triangles than a network in which ties are distributed uniformly at random.
The density by each site is displayed in Table 2. A decreasing trend in density is most noticeable in Sites 1 and 2. This occurs because of the lower number of survey respondents in later years and because the total TREC social network has built up more cross-site relationships over time as researchers from different institutions collaborated more closely.
In Table 2, we report the mean and standard deviation (sd) of three types of centrality measures. In terms of degree centrality, we see an average of eight to ten edges per actor per year, as shown by the mean degree of 8.38, 9.95, 9.21, and 9.43 in the four waves. In terms of seniority, we find that the degree of the group of investigators of high academic rank is greater than the network average for all members. We likewise see that it increased over the period of the study, which indicates that senior faculty play a central role as coordinators or gatekeepers among all investigators [29]. As can be seen in Table 2, the closeness scores do not differ markedly among actors, as indicated by low standard deviations in all years. Each member thus has a similar level of dependency (or efficiency) in connecting to every network member. In Table 2, we also see a large standard deviation of betweenness among actors. This indicates that a small set of actors perform key brokerage functions within the network [29]. Individual differences can be seen in Figs. 1-4, where the size of the node indicates the degree of the ties for each investigator. Although this refers to the degree, it indirectly shows the betweenness distribution among the actors, since these two centrality measures are highly correlated.
In Fig. 1, each node indicates an individual investigator and is colored according to discipline, with sites labeled by the letters A to E. We see that the network is clustered by site, and that each site has key players, shown as large circles indicating greater relative connectedness. For instance, in site A, some investigators in epidemiology and biochemistry/genetics are shown to be key actors in the local network. In site B, by contrast, a member in exercise and physiology, located in the center of the site, is similarly highly connected. In site C, the epidemiology and social and behavioral science fields are at the center of the diagram. In sites D and E, connectedness is more widely spread and diffuse across actors in the local network. Therefore, some disciplines are more influential in some sites than in others. Combined, these observations imply that collaborations across the sites involve cross-disciplinary interactions between geographically distinct leaders.
Homophily. The homophily rows of Table 2 report the results from two separate ERGM models, for site and for discipline. Not surprisingly, there is a homophily effect by both site and discipline in all four years, as shown by a small p-value (<0.01), meaning that actors are more likely to make ties with those in their own site than with those in other sites. A similar interpretation is possible for disciplines. The probability of collaboration within site was 0.311 in 2011, which is higher than the overall density of 0.086 (the density can be interpreted as the probability of a tie between any two actors in general). However, we notice that the probability of ties within sites decreases monotonically over time, to 0.219 in 2015. The within-site homophily tendency decreases as cross-site collaborations increase. Likewise, the probability of collaboration with someone from the same discipline was 0.145 in 2011, which is higher than the overall density of 0.086, after which the number decreases over time. Homophily by discipline is weaker than homophily by site, suggesting that the "transdisciplinary" team mission has been well performed since the beginning of TREC.

Number of publications. Publication numbers by year are summarized in Fig. 5. The number of publications increased in later years of TREC, as we expected from the expanding collaborations in TREC centers. Furthermore, we calculated the number of papers in which the coauthors were from multiple disciplines and from multiple sites. These numbers are also reported in Fig. 5. Note that information about the discipline of authors of papers was not universally accessible. We therefore used information collected in our longitudinal survey and our central directory of TREC investigators, for which only 40% of disciplines are known. This may have resulted in an underestimate of the number of disciplines per publication. In counting multi-site publications, we included both TREC sites and non-TREC sites because coauthors from non-TREC sites indicate extended collaborations with TREC and spinoffs from TREC projects. Having either multi-site or multi-disciplinary authors on a publication is common across all years.

Nature of Collaboration
Perhaps most striking is the rapid development of cross-site ties. In Table 4, we report the proportion of cross-site versus within-site ties. In 2011, cross-site ties accounted for only 15.66% of total ties, increasing over time to 39.67% in 2015. This phenomenon is shown graphically in Figs. 1-4. The network in Fig. 1 is clustered by site, whereas the network in Fig. 4 exhibits a well-mixed cluster of sites. This occurs because cross-site collaborations increased over the years. The number of cross-site ties after the first year was twice as high for TREC as for CPHHD [5], the only other NCI-funded initiative for which they were measured. CPHHD network ties were measured in only one year, the first year of its second round of funding. In terms of type of collaboration, we see that in 2011, grant and committee collaborations were most prevalent, while co-authored publications were most prevalent from 2013 to 2015.
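The cross-site proportion discussed above is simply the share of distinct ties whose two endpoints belong to different sites. A minimal sketch, with hypothetical names and toy data rather than the TREC survey data, is:

```python
def cross_site_share(site_of, ties):
    """Share of distinct undirected ties that link two different sites."""
    distinct = {frozenset(t) for t in ties if t[0] != t[1]}
    # A tie is cross-site when its two endpoints map to two different sites.
    cross = sum(1 for t in distinct if len({site_of[x] for x in t}) == 2)
    return cross / len(distinct) if distinct else 0.0


# Toy example (hypothetical data): two sites, five ties, two of them cross-site.
site_of = {"a": "A", "b": "A", "c": "A", "d": "B", "e": "B", "f": "B"}
ties = [("a", "b"), ("a", "c"), ("d", "e"), ("c", "d"), ("b", "e")]
print(cross_site_share(site_of, ties))  # 2 of 5 ties cross sites
```

Unlike density, this share does not depend on the number of potential ties, which makes it easier to compare across waves with different network sizes.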

Discussion
Several things are notable in terms of the collaborations among TREC investigators over time and the research output that accrued. In terms of subjective reports of collaborations, we see that the number of network ties increased over time. These network ties are reflected in an objective count of the numbers of peer-reviewed publications with authors from multiple disciplines and publications that cross sites. This suggests that TREC was successful as defined in the original NCI Request for Applications for TREC (http://grants.nih.gov/grants/rfa-files/RFA-CA-10-006.html).
Our comparison of network ties is with the one year of data collected from CPHHD. We found many more cross-site ties after the first year of TREC than were observed after the first year of CPHHD. In retrospect, we believe that this was likely due to the role of the TREC Coordination Center as a natural network hub for TREC, an unanticipated consequence of the use of a cross-site coordinating mechanism. Neither TTURC nor CPHHD had such a center as part of its structure. The Coordination Center fostered cross-site ties by establishing and maintaining mechanisms such as active monthly steering committee teleconference calls with representatives from each site and NCI; cross-site working groups representing multiple disciplines on topics such as cancer disparities, biomarkers, and measures of physical activity; and designated cross-site developmental awards that privileged junior and senior investigators working together and mandated that more than one site be represented.
Our results show two main differences between TREC and the other two NIH transdisciplinary initiatives. First, in comparison to CPHHD, which also compared within-site and cross-site network ties after its first year of functioning, TREC had many more cross-site ties. We expect that this is due to CPHHD's mandate to partner with communities, which drew investigators inward, while TREC investigators' emphasis on technology caused us to look across sites for resources to share. At the same time, the Coordination Center fostered and facilitated the sharing of resources and expertise across sites. Second, because TREC was able to promote and support work across disciplines and sites, in part through the Coordination Center, TREC did not experience the two- to three-year lag in publications to the same degree as TTURC and CPHHD [2][3][4].
Both our survey and the bibliometric results consistently show that TREC collaborations have been growing and exemplify successful team science in public health and medical research. Both cross-discipline and cross-site collaborations contributed to this growth. Importantly, the analysis here suggests that multi-site team science initiatives are more likely to foster greater collaboration and cooperation when they are designed to be transdisciplinary from the start, whereas one previous study [24] found that, across all types of biomedical studies, increasing physical distance between investigators is a major deterrent to scientific impact. In the current environment of shrinking grant support, funding agencies may want to consider focusing on transdisciplinary team science as a way to increase research success for a given level of funding.
In analyzing the network, our study has some limitations that stem from its longitudinal design. Losing or adding members in a network in successive years makes some measures less comparable. Lower density can occur simply because of a different network size or a heterogeneous set of actors in each year. Also note that the density measure is based on dyad relationships, that is, ties between two actors, and so may not capture local social structures such as triangles. Rather than relying solely on density, we also looked at the nature of relationships through a triad census and the exponential random graph model. The TREC network is a good example of such change, in that the overall density is similar from year to year while the nature of local relationships changes. Another issue may come from the inclusion of secondary actors, who were designated by survey respondents but did not themselves participate in the survey. The more secondary actors there are, the more ties among them are missing, and therefore the overall number of edges is underestimated by the survey. Fortunately, we could locate more defined groups, notably triangles, in the network with the most secondary actors [29], buffering our general conclusions.
Our analyses of TREC data are ongoing. The number, pace, and nature of publications are important and visible academic outcomes for any scientific team, as are citations of those publications. As stated earlier, while we considered it premature to analyze article citation data, we intend to do so retrospectively after publications have had time to collect citations in the normal fashion. In the meantime, our network survey data are highly nuanced and provide extensive insight into dyadic and group collaborations as well as network dynamics, improving upon previous investigations from NIH-funded projects.

Fig. 1. Nodes are colored by discipline and labeled by the letters A to E for the different sites. The size of each node indicates the degree of the ties that each investigator has. The network is clearly clustered by site.