Practical resources for analysing your first DCE

 

I’m relatively new to discrete choice experiments (DCEs) and have really enjoyed learning about the different analysis approaches and techniques used. It is such a rapidly evolving field and there is always something new to learn. While there is a lot happening to push the boundaries, I’ve recently been helping a couple of people with the analysis of their first DCE. Although much of your analysis approach should be worked out before you begin the DCE, when you actually come to do the analysis for the first time you may still need help with practical details, such as which commands to use. I realised there are some references I keep recommending and coming back to, so I’ve shared them here; maybe you’ll find them helpful too. [Note: this post is updated as I come across new resources.]

General guidance

It often helps to know at the start what you are aiming to achieve at the end. I think this is a nice example of describing the methods and assumptions of a DCE around parental preferences for vaccination programs really clearly and succinctly. The other general information I refer people to is the ISPOR Analysis of DCE guidelines, which include the ESTIMATE checklist of things to consider when justifying your choice of approach.

Analysis approach

When I did the DCE course run through HERU in Aberdeen, it was suggested that the typical approach to analysing DCEs is to start with a simple model and then use more complex models to address specific issues that arise with your data or relate to your research question. This commonly means starting with a conditional logit model, and then considering options such as mixed logit and latent class analysis. The ISPOR Analysis of DCE guidelines have clear descriptions of the theory and assumptions of these approaches, and I found this paper interesting in comparing mixed logit and latent class approaches.
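To make the starting point concrete, here is a minimal Python sketch of how a conditional logit model turns attribute levels into choice probabilities for a single choice set. It is only an illustration: the attribute names, levels and coefficients are made up, and in a real analysis the coefficients would be estimated from your data by one of the packages described below.

```python
import math

def conditional_logit_probabilities(alternatives, coefficients):
    """Return the choice probability of each alternative in one choice set.

    alternatives: list of dicts mapping attribute name -> level
    coefficients: dict mapping attribute name -> preference weight
    """
    # Deterministic utility: weighted sum of the attribute levels.
    utilities = [
        sum(coefficients[name] * level for name, level in alt.items())
        for alt in alternatives
    ]
    # Subtract the maximum utility before exponentiating for numerical stability.
    max_u = max(utilities)
    exp_u = [math.exp(u - max_u) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]

# Hypothetical coefficients and choice set (not taken from any study).
coefficients = {"cost": -0.02, "effectiveness": 0.05}
choice_set = [
    {"cost": 50, "effectiveness": 80},
    {"cost": 20, "effectiveness": 60},
]
probabilities = conditional_logit_probabilities(choice_set, coefficients)
```

The probabilities always sum to one within a choice set, and only differences in utility between the alternatives matter.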

Analysis code

I am originally a SAS user, and so when I first started analysing DCE data I assumed I would do so in SAS. However, after much investigation I’ve realised this is easier said than done and have now moved to using STATA for the DCE analysis, although I’m still much more comfortable doing the data management and preparation in SAS. Using two different packages is time consuming, clunky and the opposite of “reproducible research”, so my next step is to move both my DCE data management AND analysis into R. I haven’t got very far, so if anyone knows any good packages then please pass them on! I promise to update this page if I find something useful.

  • SAS

It is straightforward to run a conditional logit in SAS using PROC MDC (user guide). Some resources I found helpful for implementing PROC MDC are this example code for conditional logit with PROC MDC and this SAS user group paper “Discrete choice modelling with PROC MDC”. The error message I’ve had most often in doing this analysis is “CHOICE=variable contains redundant alternatives”, which relates to the data looking like people have chosen more than one option in a choice set. If you get this, check the cleaning and the sorting of your data!
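One way to check for this problem before running the model is to count how many alternatives are flagged as chosen in each choice set. The sketch below is in Python rather than SAS, and it assumes long-format data with one row per alternative; the column names (person, choice_set, chosen) are hypothetical, so swap in your own.

```python
from collections import defaultdict

def find_invalid_choice_sets(rows):
    """Return {(person, choice_set): number chosen} for every choice set
    that does not have exactly one chosen alternative."""
    chosen_counts = defaultdict(int)
    for row in rows:
        key = (row["person"], row["choice_set"])
        chosen_counts[key] += row["chosen"]
    return {key: n for key, n in chosen_counts.items() if n != 1}

# Hypothetical long-format data: one row per alternative per choice set.
rows = [
    {"person": 1, "choice_set": 1, "alternative": "A", "chosen": 1},
    {"person": 1, "choice_set": 1, "alternative": "B", "chosen": 0},
    {"person": 1, "choice_set": 2, "alternative": "A", "chosen": 1},
    {"person": 1, "choice_set": 2, "alternative": "B", "chosen": 1},  # two options chosen
    {"person": 2, "choice_set": 1, "alternative": "A", "chosen": 0},
    {"person": 2, "choice_set": 1, "alternative": "B", "chosen": 0},  # nothing chosen
]
invalid = find_invalid_choice_sets(rows)
```

Any choice set listed in the result needs another look at the cleaning or sorting steps before the data goes into the model.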

You can do effectively the same analysis using PROC PHREG, as described by this technote, plus there is a suite of marketing research guides that describe various ways to analyse discrete choice data.

Moving on from conditional logit to mixed logit or latent class analysis is more difficult in SAS. There is a guide in this video to running conditional logit models and mixed logit models (using PROC MDC, starts at 5:30 minutes), although I could never get their mixed logit method to work (quite possibly due to user error!). I did also contact the SAS helpdesk and they said it would be difficult, but recommended using PROC BCHOICE (Bayesian Choice) for mixed logit analysis with DCE data that has multiple choice sets per participant. There is some documentation here and a worked example here. Again, I never really got this to work but it could be my mistake.

  • STATA

Having faffed around in SAS for long enough, I caved in and transitioned to using STATA like everyone else in my research group! I found this a really nice introductory, step-by-step guide to analysis in STATA, including data set-up and conditional logit and mixed logit options. There is also this article which is a guide to analysing DCE data and model selection, and includes STATA code (as well as Nlogit and Biogeme) in the supplementary material. Finally, this working paper is useful for describing the theory and code for doing more advanced models, like mixed logit and latent class analysis in STATA, although the code isn’t annotated, which I found frustrating as a new STATA user. I haven’t used it yet, but there was a STATA newsletter article about using the margins option to interpret MIXL choice model results, which could be useful.

For latent class analysis in STATA I found this article in the STATA journal a useful description of the command, and this was a nice example of a paper that used mixed logit and latent class models and wrote them up clearly. Finally, these three articles (one, two, three) seemed like good examples of calculating and displaying relative importance graphs.

  • R

I’m keen to analyse my next DCE in R, so have started looking at how I might do this. I have found the following resources, but if anyone has any experience with DCEs in R then please get in touch!

  • Two papers by Aizaki and Aizaki & Nishimura on designing DCEs in R, and including analysis using conditional logit models
  • Example R code and case study of mixed logit model with multiple choices per respondent, including analysis and helpful tips, written by Kenneth Train and Yves Croissant
  • An mlogit package for analysing DCE data in R, as described in Kenneth Train (2009)
  • Thanks to Nikita Khanna for pointing me to this paper & code for doing sample size calculations for a DCE in R.
  • There is also the Apollo package in R, developed by the group at the Choice Modelling Centre at the University of Leeds, with a website & manual available.

Cancer is about more than health: work and leisure after cancer

This is a guest blogpost by Marjon Faaij, who I was delighted to supervise for her Master of Pharmacy research project. We made a great team: Marjon had a personal interest in the impact of cancer on daily life, and I had access to some data about cancer survivorship through the PROFILES registry. Even better, because Marjon was from Utrecht University, she could translate the Dutch PROFILES data much more easily than I could! Marjon presented the results of her research at the NCRI conference in the UK, and we are now writing them up as a publication. In the meantime, Marjon put together this summary, and was kind enough to let me share it here.

In 2005, I lost my mother to cancer. Before she died, she was sick for almost three years. During this period, cancer had a big impact on her daily life. Shortly after the diagnosis she could still do everything she liked: working in the hospital as a nurse, taking care of her family, cleaning our house, giving music lessons and swimming lessons, and socialising with friends and family. But as time passed after the diagnosis, she became sicker, had more pain and was more tired. She did not have the energy to do all the things she liked. She decided to work fewer hours until she stopped working completely. She used this time to spend more time with us and to rest more.

A lot of different factors influenced her decisions about doing work, unpaid work and leisure. One of the most important factors for her was the support from family and friends, but I can imagine that it will be different for each cancer patient.

Therefore, I decided to do a research project about the different factors of influence on cancer survivors doing daily activities, for my Master of Pharmacy. For this research I used surveys of Dutch cancer survivors, including people with Hodgkin lymphoma, non-Hodgkin’s lymphoma, multiple myeloma, thyroid or prostate cancer.

Factors of influence

From my results it is clear that cancer survivors are less likely to do paid work, and those who do work are likely to work fewer hours. Cancer survivors are also more limited in their unpaid work and leisure. However, how much cancer influences each activity is dependent on cancer type. Each cancer type has different symptoms, and has different treatments, which leads to different influence on doing daily activities.

Like my mother, most cancer survivors try to keep working and to participate fully in leisure and unpaid work activities. However, if they become sicker it is harder to participate fully in these activities. When they are limited in one area, they appear to be limited in all activities.

There are a lot of factors that have influence on doing daily activities. For example:

  • People were less likely to have a paid job if they were female, older or widowed, had had surgery, or had multiple comorbidities.
  • People were more likely to be limited in their unpaid work if they had non-Hodgkin lymphoma or multiple myeloma, had multiple comorbidities, were female, or had never married.
  • People were more limited in their leisure activities if they had a medium level of education or multiple comorbidities.

It was interesting that people who received more follow-up services were no more or less likely to report difficulty with paid work, unpaid work or leisure. But people who felt satisfied with the follow-up care they received had an increased chance of participating in daily activities.

What does this mean?

These results show that there are many factors of influence on daily activities. The factors are unique for each cancer survivor, and so are the impacts. It is important for patients to know that changes can take place across all of their daily activities during cancer, so they can prepare for and react to these changes.

Doctors need to know that cancer and its treatment can influence patients’ daily activities, and that these changes can be important for quality of life. They should discuss these changes with patients and provide support and referral to services that can assist patients (and their families) during this difficult time. These referrals are not possible if there is nowhere to refer patients to, and so health care systems need to ensure that services like work rehabilitation, occupational therapy and palliative care are available and appropriately funded.

Finally, the results are important for health economics. Economic evaluations using a societal perspective account for changes in paid work due to illness (known as lost productivity) but the contribution of unpaid work usually goes unaccounted for. From these results it is clear that cancer has a big impact on both paid and unpaid work, and thus both should be considered in economic evaluations taking a societal perspective.

This research, “Cancer is about more than health: work and leisure after cancer”, is based on data from the PROFILES Registry. The research project was carried out by Marjon Faaij, a Dutch Master of Pharmacy student from Utrecht University, at the Centre for Health Economics Research and Evaluation at the University of Technology Sydney, under the supervision of Alison Pearce and in collaboration with Dounya Schoormans of the PROFILES Registry.

1 in 5 people with cancer report financial difficulties

More than 20% of people with cancer in the Netherlands report financial difficulties as a result of their cancer care. If they are unemployed, this goes up to over 25%, as found in a paper published today in the Journal of Cancer Survivorship.

Dr Alison Pearce, the lead author on the study, explains: “People often think about the extra costs of cancer care putting financial strain on patients and their families. We were interested in whether having difficulties maintaining a job during cancer treatment might also impact people’s financial worries.”

Financial difficulties were also more common for men, young people, people who weren’t married, and people who had lower education or socioeconomic status. For many people in these groups, financial reserves and flexibility might be limited. For example, young people may not have had time to save money for situations like this, or people working casual jobs might have lower income as well as less access to sick leave.

Professor Dr Lonneke van de Poll-Franse from the PROFILES registry that provided the data: “Although in the Netherlands, like Australia, we have a good social security system to pay for cancer treatment and disability, people still experience financial difficulties. More attention should be paid to the potential origins of this problem, for example maintaining employment, getting a mortgage or insurance or missing out on work-related financial bonuses.”

Some types of cancer were more likely to result in financial difficulties. People who had blood cancer or colorectal cancer were more likely to feel stress due to the costs of cancer, while people with a type of skin cancer called Basal Cell Carcinoma were less likely to experience financial stress.  This may reflect the duration and complexity of treatment for different cancers.

Just like the physical side effects of treatment reduce after stopping treatment, the chances of financial difficulties also reduced over time.

Dr Pearce says “This is probably related to people going back to work. But, we know financial difficulties reduce quality of life. So, it would be better if we could help people to avoid or minimise financial problems, rather than just waiting for them to go away.”

Introducing return-to-work programs for cancer survivors might be one way to prevent or reduce financial difficulties among cancer survivors. Research suggests that multidisciplinary teams involving physical therapy, psychological support and workplace-specific training have been effective in helping people return to work.


Link to paper: A Pearce, B Tomalin, B Kaambwa, N Horevoorts, S Duijts, F Mols, L van de Poll-Franse, B Koczwara. Financial toxicity is more than costs of care: The relationship between employment and financial toxicity in long-term cancer survivors. Journal of Cancer Survivorship. Published online 24th October 2018.

About the authors: This research was conducted by a collaborative group, with researchers from the University of Technology Sydney, Flinders University, the Netherlands Comprehensive Cancer Organisation, the University Medical Center Groningen, and Tilburg University.

About PROFILES: PROFILES (Patient Reported Outcomes Following Initial treatment and Long-term Evaluation of Survivorship) is a registry for the study of the physical and psychosocial impact of cancer and its treatment from a dynamic, growing population-based cohort of both short and long-term cancer survivors. Researchers from the Netherlands Comprehensive Cancer Organisation and Tilburg University in Tilburg, The Netherlands, work together with medical specialists from national hospitals in order to setup different PROFILES studies, collect the necessary data, and present the results in scientific journals and (inter)national conferences.

For more information contact:

Alison Pearce: Alison.pearce@chere.uts.edu.au

$46 billion in productivity lost to cancer in developing countries

Premature – and potentially avoidable – death from cancer is costing tens of billions of dollars in lost productivity in a group of key developing economies that includes China, India and South Africa.

Over two-thirds of the world’s cancer deaths occur in economically developing countries, but the societal costs of cancer have rarely been assessed in these settings.

In a paper to be published in the journal Cancer Epidemiology we show that the total cost of lost productivity due to premature cancer mortality in Brazil, Russia, India, China and South Africa, collectively known as the BRICS countries, was $46.3 billion in 2012 (the most recent year for which cancer data was available for all these nations).

The largest loss was in China ($28 billion), while South Africa had the highest cost per cancer death ($101,000).

The BRICS countries are diverse but have been grouped by economists and others because of their particularly rapid demographic and economic growth. Currently the five countries combined comprise over 40% of the world’s population and 25% of global gross domestic product.

Liver and lung cancers had the largest impact on total lost productivity across the BRICS countries due to their high incidence, our research found.

But in South Africa, there are high productivity losses per death due to AIDS-related Kaposi sarcoma – an indication of the magnitude of the HIV/AIDS epidemic in Sub-Saharan Africa, and in India, lip and oral cancers dominated due to the prevalence of chewing tobacco there.

Many cancers which result in high lost productivity in the BRICS countries are amenable to prevention, early detection or treatment. Sadly, and in contrast to developed countries, most developing countries do not have such programs.

In particular, tobacco- and infection-related cancers (such as liver, cervical, stomach cancers and Kaposi sarcoma) were major contributors to productivity losses across BRICS countries.

Beyond the evident public health impact, cancer also imposes economic costs on individuals and society. These costs include lost productivity — where society loses the contribution of an individual to the market economy because they died prematurely from cancer.

Valuing this lost production gives policy- and decision-makers an additional perspective when identifying priorities for cancer prevention and control. This is particularly important in developing economies, where workforce and productivity are key resources in ensuring sustained economic growth.

Developing economies often have different demography, exposure to cancer risk factors, and economic environments than developed countries – all of which could modify the economic impact of cancer.

Locally tailored strategies are required to reduce the economic burden of cancer in developing economies. Focussing on tobacco control, vaccination programs and cancer screening, combined with access to adequate treatment, could yield significant gains for both public health and economic performance of the BRICS countries.

Country specific results

Brazil:

  • In Brazil, lung cancer resulted in the greatest productivity losses ($0.5 billion in 2012), with $402 million in lost productivity each year due to tobacco smoking, although Brazil has recently implemented successful tobacco use reduction policies.
  • Rapidly growing rates of obesity in Brazil result in up to $126 million in lost productivity due to cancer each year.

Russian Federation:

  • Total productivity lost due to cancer in the Russian Federation was $5 billion in 2012. It had the second highest cost per death of the BRICS countries.
  • Both liver and head and neck cancers contribute to the high number of excess alcohol-related deaths in the Russian Federation, with a likely considerable economic impact.

India:

  • Lip and oral cancers dominate lost productivity in India due to the relatively high prevalence of chewing tobacco. The use of smokeless tobacco, often combined with betel quid, may account for lost productivity of $486 million each year.
  • In India, the lost productivity costs per death from leukaemia are relatively high, perhaps because the advanced, multi-modality treatments required are not available, or are difficult to access.

China:

  • Productivity lost due to cancer in China was $26 billion in 2012, more than all the other BRICS countries combined.
  • Two-thirds of total lost productivity costs in China were in urban areas (66%), considerably more than the proportion of people who reside in urban areas (52%).
  • In China, dietary aflatoxins in many staple foods are a major risk factor for liver cancer, and our results suggest this costs the economy $972 million annually.

South Africa:

  • In South Africa there are high productivity losses per death due to AIDS-related Kaposi sarcoma – an indication of the magnitude of the HIV/AIDS epidemic in Sub-Saharan Africa.
  • Cervical cancer represents a particularly large economic impact in South Africa. While there are new vaccinations available to prevent HPV, one of the precursors to cervical cancer, the effects of vaccination need a few decades to show impact. In the meantime, cervical cancer screening can offer an effective solution to reduce both the public health and economic burden of cervical cancer.

The reality of chemotherapy side effects

My latest publication shows that over three-quarters of people having chemotherapy in New South Wales experience multiple side effects during their treatment, and for over 60% of people this included a serious side effect. These results confirm previous research that suggests side effects might be more common, and more serious, in clinical practice (ie ‘real life’) than reported in clinical trials.

During their chemotherapy, 86% of our sample (who had lung, breast or colorectal cancer) reported at least one side effect, and 67% experienced six or more different side effects. Fatigue was the most common side effect (80%), followed by pain (75%), constipation (74%), and diarrhoea (74%). For nearly a quarter of participants (24%) the side effects were mild, but for many more (62%) the side effects were moderate or severe.

[Figure: The number of different self-reported side effects experienced during chemotherapy by participants with lung, breast or colorectal cancer in the EOCC study]

Older people in our sample were less likely to have a side effect. This is perhaps because older people tend to receive less aggressive treatments, despite this also possibly reducing how effective their chemotherapy is. Other things, like the type and stage of cancer, gender, education, and socioeconomic status did not change how likely a person was to have a side effect.

When we looked at the patterns of side effects over time, many people had mild side effects which stayed with them throughout their chemotherapy, especially constipation, diarrhoea, mucositis and nausea / vomiting. There was also a particularly large proportion of people reporting serious fatigue throughout their treatment.

[Figure: The frequency of self-reported side effects experienced during chemotherapy by participants with lung, breast or colorectal cancer in the EOCC study]

The first study of this type in Australia, our Elements of Cancer Care study followed 441 people with breast, lung and colorectal cancer having chemotherapy in New South Wales. We interviewed them each month during their chemotherapy treatment to ask them about a wide range of topics, including what side effects they’d experienced and how serious they were. We also collected information from their medical records at the hospital and with Medicare.

Side effects in real life vs clinical trials

When new chemotherapy treatments are developed, the side effects they cause are tested in research studies called clinical trials. Doctors, patients and policy makers then base their decisions about chemotherapy on the data from these clinical trials.

But what happens in clinical trials does not always reflect what happens in real life. Clinical trials usually have very strict criteria for who can participate. Clinical trial participants are usually younger and fitter than typical cancer patients, and so may be more able to cope physically with chemotherapy and therefore less likely to have a side effect. In addition, clinical trials are usually conducted in large, high-quality teaching hospitals with extra monitoring and treatment of side effects, which may reduce how often they occur, or how serious they become.

Asking patients about side effects

How patients are asked about side effects can also influence what they report. When doctors or nurses ask general questions like “how have you been feeling” or “have you had any side effects” patients might not remember or report all of their side effects, particularly if they are not still happening. We gave participants a checklist of side effects, which may have encouraged them to report a greater variety of side effects, and those side effects which were less severe. This is a technique which could be implemented by doctors and nurses in cancer care clinics.

What does this mean?

On top of dealing with a diagnosis of cancer and being treated with chemotherapy, having side effects can affect someone’s physical health, survival, quality of life and emotional state. Because our information comes from real life, rather than clinical trials, it allows doctors, nurses, policy makers and patients to think more realistically about the side effects of chemotherapy.

 

 

Full reference (Open Access): Pearce A, Haas M, Viney R, Pearson SA, Haywood P, Brown C, Ward R (2017). Incidence and severity of self-reported chemotherapy side effects in routine care: A prospective cohort study. PLOS One 12(10): e0184360

The Elements of Cancer Care study was funded by the National Health and Medical Research Council (Health Services Research Grant ID 455366). Alison was supported by a University of Technology Sydney Doctoral Scholarship, and a PhD top-up scholarship from NHMRC Health Services Research Grant (ID455366). Sallie is supported by a Cancer Institute NSW Career Development Fellowship (ID: 12/CDF/2-25). No funding organisation had any role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.

Selecting a wage growth rate for economic evaluations in an uncertain economy

When doing economic evaluation you often need to forecast into the future. And when projecting earnings, you need to account for changes in the economy (for example, inflation). I am currently working on a study examining productivity losses associated with cancer in Ireland, and need to account for future wage growth. But how do you do this when the economy is as uncertain as it currently is in Ireland?

Wage growth:  This Wall Street Journal blogpost describes wage growth as one of the key indicators of economic health (as well as some of the current problems with wage growth in the US).  In Ireland there are similar economic woes, but future predictions of the real wage growth rate are harder to come by.  Instead, people have used the Gross National Product (GNP) percentage change per year as a proxy for wage growth.  GNP is the total value of all products and services produced by residents of a country over a particular period of time.  Previous work similar to mine (Hanley 2012 & Hanley 2013) has used older versions of these predictions, which estimated an average growth rate of 2.6%.

GNP in Ireland:  At the height of the Celtic Tiger period (mid-1990s to mid-2000s) GNP growth in Ireland was over 5%; however, in the period 2007 to 2012 the growth rate of GNP in Ireland was -2.2%. The Economic and Social Research Institute (ESRI) proposes that this was due to the global financial crisis causing the Irish housing market to crash. This in turn led to the collapse of the construction and banking industries, resulting in Ireland entering a period of recession. This period has been characterised by high levels of state debt and unemployment.


GNP growth projections:  According to the latest report from ESRI, the GNP growth rate in coming years will be dependent on a number of factors, particularly the recovery of the EU economy, domestic policy decisions and the impact of changes in both the EU and Irish economies on domestic government finances.  The report explains that the current government policy-making position is a risk averse one of ‘no regrets’. Although not necessarily resulting in the ‘optimal’ policy option being selected, this approach should result in policy options which lead to generally positive outcomes across a range of possible scenarios being selected / implemented.  This is necessary given the current tenuous position of the Irish economy to withstand any additional shocks, as well as the high level of uncertainty in the economic environment both locally and more broadly in the EU and worldwide.

The ESRI report includes estimates of GNP growth rates in the medium term (2015 to 2020) under three recovery scenarios, ranging from stagnation to recovery.  See table below for summary of GNP growth under the three scenarios.

% GNP change per year

Scenario               2012   2013   2014   2015   2016   2017   2018   2019   2020
Recovery                3.3    1.2    0.5    4.3    3.6    4.0    3.4    3.2    3.6
Delayed adjustment      3.3    1.3   -0.9    3.0    1.1    2.8    3.1      –      –
Stagnation              3.3    1.2    0.0    1.9    0.6    2.1    0.4    0.9    1.7

This report provides an ideal source for the proxy wage growth estimates, as it takes into account many aspects of economic recovery you might not have considered.  For the calculation of productivity losses associated with cancer in Ireland, you could use the wage growth rate based on the forecast GNP growth rate from the recent ESRI report.   You can use the Delayed Adjustment scenario as the base case, with the Recovery and Stagnation scenarios providing upper and lower bounds for sensitivity analysis.

Extra considerations:  If using this, you need to be aware of a number of considerations:

  • You could calculate the growth adjustment per year for each year of lost productivity, or use the average of the annual % change for the years 2015 – 2030.  For my work, the changes between years are less important, and I will use the average.
  • The wage growth rate may not be consistent with reports of other improvements in the economy, making GNP growth a poor proxy. This was well described in the Wall Street Journal article mentioned earlier, which discusses how the current pattern of economic recovery in the US is masking consistently low real wage growth rates. In this case, you must weigh up using current real wage growth (which may not hold for the future) against using a proxy that is potentially poor but has been projected to take account of the changing economic environment. For my research, I believe that the uncertainty around economic recovery scenarios is more important than the potential difference between actual wage growth and the proxy value, so I am going to use the projected GNP growth.
  • As with any projection or forecast or prediction, it is almost certainly wrong!  So you need to carefully consider the uncertainty around the estimates and how they might influence your results.
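As a rough illustration of the averaging option, the Python sketch below takes the 2015 to 2020 values from the table above, averages them for each scenario, and applies the average to a wage. The starting wage and the ten-year horizon are made-up numbers for illustration only, and the Delayed Adjustment average uses only the years for which the table shows a value.

```python
# Annual % change in GNP from 2015 onwards, copied from the table above.
# None marks years where the table shows no value for that scenario.
gnp_growth = {
    "Recovery": [4.3, 3.6, 4.0, 3.4, 3.2, 3.6],
    "Delayed adjustment": [3.0, 1.1, 2.8, 3.1, None, None],
    "Stagnation": [1.9, 0.6, 2.1, 0.4, 0.9, 1.7],
}

def average_growth(rates):
    """Mean annual % change, ignoring years with no reported value."""
    available = [r for r in rates if r is not None]
    return sum(available) / len(available)

def project_wage(current_wage, annual_growth_percent, years):
    """Grow a wage at a constant annual percentage rate for a number of years."""
    return current_wage * (1 + annual_growth_percent / 100) ** years

averages = {name: average_growth(rates) for name, rates in gnp_growth.items()}

base_wage = 35000.0  # hypothetical current annual wage
horizon = 10         # hypothetical number of years to project forward
projected = {
    name: project_wage(base_wage, rate, horizon)
    for name, rate in averages.items()
}
```

With these inputs the Delayed Adjustment average works out at 2.5% a year, with Recovery at about 3.7% and Stagnation at about 1.3% as the upper and lower bounds for sensitivity analysis.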

Overall, this is a difficult economic time to be trying to make forecasts, but the very useful report from ESRI gives a good platform on which to form a base case and sensitivity analysis. And remember, the most important component of choosing a growth rate (or any assumption in your model) is to have a justification for your choice of methods and sources.

Multiple regression ‘cheat sheet’

This is a ‘cheat sheet’ I put together during the ACSPRI 2012 Winter Program course “Fundamentals of Multiple Regression” (Fun Reg). The cheat sheet simply summarises the concepts, formulae and assumptions often used in regression analysis which were discussed in the course.

Fun Reg Cheat Sheet

This was a fantastic course that I would highly recommend to anyone looking to use regression in their research. The course description is below for your information, and you can check out the full range of courses they run at http://www.acspri.org.au/courses

Fundamentals of multiple regression: This course provides an introduction to, and the fundamentals of multiple regression, covering enough of the statistical material for the intelligent use of the technique. The approach is informal and applied rather than emphasising proofs of relevant theorems. The course begins with a review of bivariate regression and extends the relevant principles to the case of multiple regression. Particular attention is given to the application of multiple regression to substantive problems in the social sciences. By the end of the course, the student will have a knowledge of the principles of multiple regression, and the ability to conduct regression analyses, interpret the results, and to inspect elementary regression diagnostics to test the underlying model assumptions. This course provides the foundations necessary for progression to ‘Applied Multiple Regression Analysis’, and to subsequent advanced-level courses in structural equation modelling and log-linear modelling.

My experience of working with data that isn’t ‘mine’

This blog post was originally written for and published by the Health Services Research Association of Australia and New Zealand (HSRAANZ) Emerging Researcher Group (ERGO) section of the August 2012 Newsletter.

 

During my PhD I was lucky enough to be offered access to a large dataset for analysis.  This was a fantastic opportunity, which has strengthened my PhD and my data management and analysis skills.  Logistically however, it was not always easy.  I learnt a number of lessons that I thought other early career researchers may find useful.

There were four main issues I encountered:

Physically accessing the data

The data was held at another institution, which had received ethics approval and access to the data on the basis that it was kept confidential and did not leave their secure building.  I therefore needed to go to their site to conduct the analysis.  Whilst this was not foreseen to be an issue, there was a huge amount of red tape to get access to the university building, a desk, a computer, a log-in and so on, because I was neither a staff member nor a student at that university.

To avoid these issues, start planning logistics early, and be realistic about what you need.  ‘Hot desking’ is not as easy as it sounds, so if you need your own computer, or a larger than average hard drive, be specific.  Make sure you ask very specific questions very early on about how you will access buildings and resources such as stationery and software.

Working with data that wasn’t ‘mine’

It takes extra time to get to know your data when you haven’t been involved in collecting it.  A data dictionary can be extremely helpful in these situations, and it is worth continuing to ask for one if it is not provided with your data.  The other issue with working with data that isn’t yours is that you may end up waiting for other people to prepare or clean datasets before you can use them.  Obviously this impacts on timelines, so build a generous buffer into your project plans.

Working with a very large dataset

My working data file was over 60 GB, and analysis code often took days to run.  The computer system at the university was not really configured to cope with work being done overnight, so my programs would often get interrupted by virus scans and automated backups.  I ended up using a local drive and doing my own backups to avoid the issue, but in future I would try to sort this out before starting.

Working off site

Finally, I have already covered some of the access issues of working off site, but the other issue this raised was that it was quite isolating.  There was no one there who was really responsible for my work or who understood my project and methods, and I couldn’t sit down and show my data to anyone at my PhD office.  It was also difficult to integrate time to be in two offices into my daily schedule.  Meetings and events often prevented me from going for days at a time, by which point I had forgotten what I was working on!  The solution was to get as organized as possible, to keep detailed notes of what I did each day, and to use tools such as Dropbox (where allowed) to keep track of things.