Neurosynth: Frequently Asked Questions
- How are these images generated?
- Doesn't the coordinate extraction process have errors?
- I found an error in the coordinate data for a paper. How can I fix it?
- How come my study isn't in the database?
- How do you deal with the fact that different studies report coordinates in different stereotactic spaces?
- How do you distinguish between activations and deactivations when extracting coordinates from published articles?
- Are individual words or phrases really a good proxy for cognitive processes? Can you really say that studies that use the term 'pain' at a certain frequency are about pain?
- Isn't selection bias likely to play a role here? If everyone thinks that the amygdala is involved in emotion, isn't it likely that an automated meta-analysis will simply capture that bias?
- How come so many of the terms are non-content words no one cares about (e.g., 'addressed', 'abstract', or 'reliable')?
- Why do images take a while to load?
- When I enter coordinates manually, they're rounded to different numbers!
- How are the images thresholded?
- What format are the downloadable images in?
- What do the 'forward inference' and 'reverse inference' descriptions mean in the feature image names?
- Activation coordinates are extracted from published neuroimaging articles using an automated parser.
- The full text of all articles is parsed, and each article is 'tagged' with a set of terms that occur at a high frequency in that article.
- A list of several thousand terms that occur at high frequency in 20 or more studies is generated.
- For each term of interest (e.g., 'emotion', 'language', etc.), the entire database of coordinates is divided into two sets: those that occur in articles containing the term, and those that don't.
- A giant meta-analysis is performed comparing the coordinates reported for studies with and without the term of interest. In addition to producing statistical inference maps (i.e., z and p value maps), we also compute posterior probability maps, which display the probability that a given term is used in a study if activation is observed at a particular voxel.
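The tagging, lexicon, and splitting steps above can be sketched in a few lines of Python. This is a minimal illustration, not Neurosynth's code. It counts single words only (no phrases), the function names are hypothetical, and the `freq_threshold` value is an assumption, since the steps above only say "high frequency". The 20-study cutoff comes from the list itself.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple


def term_frequencies(text: str) -> Dict[str, float]:
    """Relative frequency of each single word in an article's full text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}


def tag_articles(articles: Dict[str, str], freq_threshold: float) -> Dict[str, Set[str]]:
    """Tag each article with the terms whose frequency meets the (assumed) threshold."""
    return {
        aid: {t for t, f in term_frequencies(text).items() if f >= freq_threshold}
        for aid, text in articles.items()
    }


def build_lexicon(tags: Dict[str, Set[str]], min_studies: int = 20) -> List[str]:
    """Keep only terms that tag at least `min_studies` articles."""
    counts = Counter(t for terms in tags.values() for t in terms)
    return sorted(t for t, n in counts.items() if n >= min_studies)


def split_by_term(tags: Dict[str, Set[str]], term: str) -> Tuple[Set[str], Set[str]]:
    """Divide article IDs into those tagged with `term` and those that aren't."""
    with_term = {aid for aid, terms in tags.items() if term in terms}
    return with_term, set(tags) - with_term
```

The coordinates from the with-term set then feed the meta-analysis, and the without-term set serves as the comparison group.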
There are two reasons for this. One is that we didn't want to arbitrarily restrict the lexicon to just those words that we personally found meaningful. Some of the terms we think are uninteresting might be of considerable interest to someone else. Since there's no real cost to making additional terms available, we decided to make the list as comprehensive as possible in the latest version of Neurosynth.
The other reason we left non-content terms in is that they provide a nice baseline for evaluating the results of terms that we do care about. Since there's no particular reason why terms like 'several' or 'through' should be associated with specific patterns of activation, the expectation is that meta-analysis of such terms should reveal little or no activation. That's exactly what turns out to be true of most non-content words. In contrast, most of the terms researchers are likely to care about reveal robust patterns of activation, which gives us all the more reason to think the approach is working well.
There are three reasons why images may take a while to display in the interactive viewer when you first load a page. First, the image files themselves are fairly large (typically around 1 MB each), and most pages on Neurosynth display multiple images simultaneously. This means that your browser is usually receiving 1 to 3 MB of data on each request. Needless to say, the images can't be rendered until they've been received, so users on slower connections may have to wait a bit.
Lastly, some images are only generated on demand. For example, if you're the first user to visit a particular brain location, you'll enjoy the privilege of waiting for the coactivation image for that voxel to be generated anew. This typically takes only a second or two, but can take as long as a minute in rare cases (specifically, when too many users are accessing Neurosynth and we need to spin up another background worker to handle the load).
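The on-demand behavior described here amounts to a generate-once cache: the first request for a voxel pays the cost, and later requests reuse the result. A minimal sketch, assuming a hypothetical `generate` callable that stands in for whatever actually computes a coactivation map. It ignores the background-worker side entirely.

```python
from typing import Callable, Dict, Tuple

Voxel = Tuple[int, int, int]


class OnDemandImageCache:
    """Build an image the first time a voxel is requested, then reuse it."""

    def __init__(self, generate: Callable[[Voxel], bytes]) -> None:
        # `generate` is a hypothetical stand-in for the real map computation.
        self._generate = generate
        self._store: Dict[Voxel, bytes] = {}

    def get(self, voxel: Voxel) -> bytes:
        if voxel not in self._store:
            # Slow path: only the first visitor to this voxel waits for this.
            self._store[voxel] = self._generate(voxel)
        return self._store[voxel]
```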
The images you see are thresholded to correct for multiple comparisons. We use a false discovery rate (FDR) criterion of .01, meaning that, on average, no more than about 1% of the voxels you see activated in any given map should be false positives (though the actual proportion in a particular map will vary and is impossible to determine).
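FDR correction is most commonly done with the Benjamini-Hochberg procedure. The answer above doesn't name the exact procedure, so treat this as a sketch of that standard method under that assumption. It returns the largest p-value that survives at level q; voxels with p at or below that cutoff would be displayed, and `None` means nothing survives.

```python
from typing import Optional, Sequence


def fdr_threshold(p_values: Sequence[float], q: float = 0.01) -> Optional[float]:
    """Benjamini-Hochberg: largest p_(k) with p_(k) <= (k / m) * q, else None."""
    ranked = sorted(p_values)
    m = len(ranked)
    cutoff = None
    for k, p in enumerate(ranked, start=1):
        if p <= k * q / m:
            cutoff = p  # keep scanning: BH uses the largest passing rank
    return cutoff
```

Note that BH is a step-up procedure: a p-value can survive even if a smaller-ranked one narrowly failed its own comparison, which is why the loop keeps scanning instead of stopping at the first failure.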
When you click the download link below the map you're currently viewing, you'll start downloading a 3D image corresponding to the map you currently have loaded. The downloaded images are in NIfTI format, and the files are gzipped to conserve space. All major neuroimaging software packages should be able to read these images without any problems. The images are all nominally in MNI152 2mm space (the default space in SPM and FSL), though there's a bit more to it than that: technically, we don't account very well for stereotactic differences between studies in the underlying database. We convert Talairach coordinates to MNI, but the conversion is imperfect, and we don't account for more subtle differences between, e.g., FSL and SPM templates. For a more detailed explanation, see the paper.
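If you want to sanity-check a downloaded file without a full neuroimaging package, the NIfTI-1 header can be read with the Python standard library. The byte offsets below follow the NIfTI-1 header layout; this sketch assumes a single-file NIfTI-1 image (not NIfTI-2), and the function name is our own. For a 2mm MNI152 map you would typically expect dimensions of 91 x 109 x 91 and 2 mm voxels. In practice most people would simply use nibabel (`nibabel.load(path)`), which handles this and much more.

```python
import gzip
import struct


def read_nifti1_header(path: str):
    """Return (shape, voxel_sizes, datatype_code) from a gzipped NIfTI-1 file."""
    with gzip.open(path, "rb") as f:
        hdr = f.read(348)  # the NIfTI-1 header is always 348 bytes
    if len(hdr) < 348:
        raise ValueError("file too short for a NIfTI-1 header")
    # sizeof_hdr must equal 348; use that to detect the byte order.
    for endian in ("<", ">"):
        if struct.unpack(endian + "i", hdr[0:4])[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 header")
    if hdr[344:348] not in (b"n+1\x00", b"ni1\x00"):
        raise ValueError("bad NIfTI-1 magic string")
    dim = struct.unpack(endian + "8h", hdr[40:56])       # dim[0] = number of dims
    datatype = struct.unpack(endian + "h", hdr[70:72])[0]
    pixdim = struct.unpack(endian + "8f", hdr[76:108])   # pixdim[1:] = voxel sizes
    ndim = dim[0]
    return dim[1:1 + ndim], pixdim[1:1 + ndim], datatype
```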
Note that the downloaded images are not dynamically adjusted to reflect the viewing options you currently have set in your browser. For instance, if you've adjusted the settings to only display negative activations at a threshold of z = -7 or lower, clicking the download link won't give you an image with only extremely strong negative activations; it'll give you the original (FDR-corrected) image. Of course, you can easily recreate what you're seeing in your browser by adjusting the thresholds correspondingly in your offline viewer.
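Recreating a browser view offline boils down to masking the z-values yourself. A minimal sketch over a flat list of z-scores; the function and parameter names are hypothetical and only meant to mirror the viewer's negative and positive threshold settings. With a real image you would apply the same mask to the voxel array after loading the file.

```python
from typing import List, Optional, Sequence


def apply_display_threshold(z_values: Sequence[float],
                            neg_cutoff: Optional[float] = None,
                            pos_cutoff: Optional[float] = None) -> List[float]:
    """Zero out every value the chosen view would hide.

    neg_cutoff: keep values <= this (e.g., -7.0); None hides the negative tail.
    pos_cutoff: keep values >= this; None hides the positive tail.
    """
    out = []
    for z in z_values:
        keep = ((neg_cutoff is not None and z <= neg_cutoff)
                or (pos_cutoff is not None and z >= pos_cutoff))
        out.append(z if keep else 0.0)
    return out


# The example from the text: negative activations at z = -7 or lower only.
print(apply_display_threshold([-8.1, -3.0, 0.0, 4.2, 9.5], neg_cutoff=-7.0))
# -> [-8.1, 0.0, 0.0, 0.0, 0.0]
```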
- Forward inference map: z-scores corresponding to the likelihood that a region will activate if a study uses a particular term (i.e., P(Activation|Term));
- Reverse inference map: z-scores corresponding to the likelihood that a term is used in a study given the presence of reported activation (i.e., P(Term|Activation));
- Posterior probability map: the estimated probability of a term being used given the presence of activation (i.e., P(Term|Activation)).
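To make these definitions concrete, the probabilities behind them can be computed from study counts at a single voxel. The prior P(Term) = 0.5 used for the posterior is an assumption of this sketch; a uniform prior removes the influence of how common a term happens to be in the literature. The counts in the usage lines are made up for illustration.

```python
def forward_prob(n_term_active: int, n_term: int) -> float:
    """P(Activation | Term): fraction of term studies reporting activation here."""
    return n_term_active / n_term


def posterior_prob(n_term_active: int, n_term: int,
                   n_noterm_active: int, n_noterm: int,
                   prior: float = 0.5) -> float:
    """P(Term | Activation) via Bayes' rule, with P(Term) = `prior` (assumed)."""
    p_a_given_t = n_term_active / n_term
    p_a_given_not_t = n_noterm_active / n_noterm
    numerator = p_a_given_t * prior
    denominator = numerator + p_a_given_not_t * (1.0 - prior)
    return numerator / denominator if denominator > 0 else float("nan")


# Hypothetical voxel: 40 of 100 term studies and 90 of 900 other studies activate.
print(forward_prob(40, 100))
print(posterior_prob(40, 100, 90, 900))             # uniform prior
print(posterior_prob(40, 100, 90, 900, prior=0.1))  # prior = term's base rate
```

With the uniform prior the posterior comes out at about 0.8; using the term's base rate (100 of 1,000 studies) as the prior drops it to roughly 0.31, which is why the choice of prior matters when reading posterior maps.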
Long answer: The forward inference and reverse inference maps are statistical inference maps; they display z-scores for two different kinds of analyses. The forward inference map can be interpreted in the same way as most standard whole-brain fMRI analyses: it displays the degree to which each voxel is consistently activated in studies that use a given term. For instance, the fact that the forward inference map for the term 'emotion' displays high z-scores in the amygdala implies that studies that use the word emotion a lot tend to report consistent activation in the amygdala. Note that, unlike most meta-analysis packages (e.g., ALE or MKDA), we don't generate z-scores through permutation; instead, we use a chi-square test that compares the observed distribution of activations against a null of uniform distribution throughout gray matter. Generally speaking, this procedure gives slightly (but only very slightly) more liberal results than a permutation test would produce. (We use the chi-square test solely for pragmatic reasons: we generate thousands of maps at a time, so it's not computationally feasible to run thousands of permutations for each one.)
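The one-way chi-square test described above can be sketched as follows. Assumptions: the "uniform distribution throughout gray matter" null is modeled as every voxel sharing the mean activation count across the voxels supplied (which should already be restricted to gray matter), and the function name and inputs are hypothetical. Because the test has one degree of freedom, the z-score is simply the square root of the chi-square statistic, signed by whether activation is above or below expectation.

```python
import math
from typing import List, Sequence, Tuple


def forward_inference(counts: Sequence[int], n_term_studies: int) -> List[Tuple[float, float]]:
    """One-way chi-square test per voxel against a uniform null.

    counts[v] is the number of term studies reporting activation at voxel v.
    Returns a (z, p) pair per voxel.
    """
    expected = sum(counts) / len(counts)  # uniform null: same rate at every voxel
    not_expected = n_term_studies - expected
    if expected == 0 or not_expected == 0:
        return [(0.0, 1.0)] * len(counts)  # degenerate: nothing to test
    results = []
    for observed in counts:
        not_observed = n_term_studies - observed
        chi2 = ((observed - expected) ** 2 / expected
                + (not_observed - not_expected) ** 2 / not_expected)
        z = math.copysign(math.sqrt(chi2), observed - expected)  # 1 df: z**2 == chi2
        p = math.erfc(math.sqrt(chi2 / 2.0))  # two-tailed p for 1 df
        results.append((z, p))
    return results
```

For example, with 100 term studies and voxel counts of [50, 10, 10, 10], the expected count is 20, so the first voxel gets z = 7.5 and each of the others gets z = -2.5.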
The reverse inference maps provide somewhat different (and, in our view, more useful) information. Whereas the forward inference maps tell you about the consistency of activation for a given term, the reverse inference maps tell you about the relative selectivity with which regions activate. Strictly speaking, these maps reflect a comparison between all the studies in our database that contain a term and all those that don't. So, for instance, the fact that the amygdala shows very strong activation in the reverse inference map for emotion implies that studies that use the term emotion frequently are much more likely to report amygdala activation than studies that don't. That's important because it controls for base rate differences between regions: some regions (e.g., dorsal medial frontal cortex and lateral PFC) play a very broad role in cognition, and hence tend to be consistently activated for many different terms despite lacking selectivity. The reverse inference maps let you make more confident claims that a given region is involved relatively selectively in a particular process, rather than in just about every task.
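The with-term versus without-term comparison can be illustrated with a standard 2 x 2 chi-square test of independence (term x activation) at a single voxel. This is a sketch under that assumption; the real implementation may differ in detail, and the counts below are hypothetical. The second call shows the base-rate point: a region active in 60% of studies whether or not they use the term could score highly in a forward map, yet gets z = 0 here.

```python
import math
from typing import Tuple


def reverse_inference(term_active: int, term_total: int,
                      noterm_active: int, noterm_total: int) -> Tuple[float, float]:
    """2 x 2 chi-square test of independence (term x activation) at one voxel.

    Returns (z, p); z > 0 means term studies activate here more often.
    """
    observed = [[term_active, term_total - term_active],
                [noterm_active, noterm_total - noterm_active]]
    row_totals = [term_total, noterm_total]
    col_totals = [term_active + noterm_active,
                  (term_total - term_active) + (noterm_total - noterm_active)]
    n = term_total + noterm_total
    if 0 in row_totals or 0 in col_totals:
        return 0.0, 1.0  # degenerate table: no test possible
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    rate_diff = term_active / term_total - noterm_active / noterm_total
    z = math.copysign(math.sqrt(chi2), rate_diff)  # 1 df: z**2 == chi2
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return z, p


print(reverse_inference(40, 100, 90, 900))   # selective region: large positive z
print(reverse_inference(60, 100, 540, 900))  # broadly active region: z is 0
```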