Sunday 28 August 2011

Introduction to OpenMx for twin analysis

Preamble
Twin data can be used to estimate the relative contribution of genetic and environmental influences to quantitative traits. For many years, the key textbook on methods has been Neale & Cardon (1992): Methodology for genetic studies of twins and families. This goes hand in hand with annual workshops on Twin Methodology hosted by the Institute for Behavioral Genetics at Boulder, CO. When I first became interested in this topic, the preferred software for running twin analyses was Mx, which was freely available from Virginia Commonwealth University. Scripts for running different types of analysis were available from the Mx website, and also from a helpful site at the Free University of Amsterdam. Nevertheless, even with these tools, twin analysis was technical, complicated, and often frustrating. Mx had a tendency to crash unexpectedly without giving you much help in understanding why. Its idiosyncratic error messages were at first amusing (e.g., "Well there I was, all ready to equate all the matrices in this group to those of a previous group, and then you didn’t put *which* group on the same line"), but after a bit you wanted to hurl your computer through the nearest window, as you were admonished for doing something completely incomprehensible, e.g., "Your observed covariance matrix is not positive-definite".
Around 2009 (I can't find a precise date on the web), the OpenMx project was started, with the goal of creating a new software package, OpenMx, that runs as a package within the R programming language. As the authors state: "In some ways, Mx is barely recognizeable in OpenMx, since the interface is completely different and the software has been rewritten top to bottom using modern programming techniques and languages. But deep within OpenMx still beats the ancient heart of Mx: a general purpose matrix optimization package."
Last year, when I needed to train a couple of my graduate students to do twin analyses, I was ready to embrace OpenMx and turned with alacrity to the OpenMx website. But alas, the age-old problem arose: the very, very clever people who had written OpenMx had no idea how to communicate with lesser mortals. I already knew a fair bit about twin analysis and Mx, but I struggled with the manuals and examples. And at several points, the general recommendation seemed to be 'Go away and learn R first'. But I knew that you only really learn a programming language if you have a problem you want to tackle. And so I decided I would write an elementary introduction for my students that did not assume that you knew anything about R, Mx, structural equation modeling or behaviour genetics. There was, of course, a problem. I didn't know anything myself about R, and my mastery of the other topics was pretty amateurish. But in some ways, an amateur has an advantage for manual-writing: if you've had to work it out for yourself, you know what needs explaining.
Anyhow, having created this introductory document, I felt it was worth sharing in case it could be useful to others. Although I have tried to simplify and explain, this material is not easy to master, especially if you have no background in programming. But for what it is worth, here is my best attempt at an explanation of the basics, for those rare creatures who want to analyse twin data and haven't already mastered Mx or OpenMx.
I should add that OpenMx is appropriate for much broader applications of structural equation modeling, but the focus here is just on what you can do with twin data.
This manual assumes you will be working on a PC in a Windows-based environment. Please note that the manual was written by someone who was teaching themselves both R and OpenMx, and so some of the example scripts are cumbersome.


A note on formatting
I apologise for the erratic and inconsistent formatting in these blogposts. Google's Blogger has many fine features, but control of formatting is not one of them. It has a mind of its own and will decide which font and spacing to use, despite instructions to the contrary. I don't want to spend hours delving into the HTML version to sort this out, so I decided we'd have to live with it.

The scripts
You may find it easier to copy the scripts from a Word document, rather than from this blog, in which case you can download the original document on which this blog is based from this site.

Feedback
Comments are enabled on this blog, and I'd welcome feedback and suggestions for improvement. I will try to incorporate any good ideas, but I can't provide technical support, as (a) I am not very expert myself, and (b) I have a busy job that is only occasionally concerned with twin analysis.
I hope somebody out there finds this useful!



