
Proposed code cleanup #1

Merged
mblesac merged 12 commits into master from 5ttgen_neonatal_rs
Dec 3, 2021

Conversation

@Lestropie
Collaborator

Hi @mblesac @paola-g,

I had some fun cleaning up and Pythonifying the code and so now have a bit of a clearer picture as to what's going on. Given I don't yet have any exemplar data I've not made any attempt at all to run it, so there may well be outright syntax errors in there let alone runtime issues; but depending on their severity / how cryptic the errors are you might be able to address some stuff at your end (feel free to commit directly to this branch if you wish). My changes should also clean up the terminal output quite a bit.

This PR proposes merging my new branch on your fork into the master branch of your fork. Therefore, once this PR is merged, the corresponding changes will appear in MRtrix3#2393. So we will use this PR to discuss the cleanup / refactoring between us, and then MRtrix3#2393 can be used for final discussion with a wider audience before merge into the MRtrix3 dev branch.


I'm tempted to suggest that the dHCP pipeline should be a separate 5ttgen algorithm.
While it's true that the joint label fusion with the M-CRIB data is executed in both scenarios:

  • In the absence of the dHCP pipeline, that forms the entirety of the output data;
  • In the presence of the dHCP pipeline, it forms the vast minority of the output data, only being used to insert additional sub-cortical grey matter structures into the dHCP segmentation.

I'm unsure of how much utility the output of the -parcellation option would be if the dHCP pipeline data are used. Also, if the dHCP version were standalone, it could omit the modality command-line argument, which is non-functional in that use case.


Minor points:

  1. Given other work in BIDS & quantitative imaging I prefer to use e.g. "T1w" rather than "T1" wherever possible.
    (Though admittedly now that I think of it this is a problem in 5ttgen fsl...)

  2. Config file entries are camel case, so I changed "MCRIBpath" to "MCRIBPath".

  3. I'm not sure if the histogram matching is requisite here; I'd have expected the ANTs scripts to be doing their own intensity matching (or using a registration metric that doesn't necessitate such)?

  4. I don't think it is necessarily guaranteed that the sGM segmentations provided by M-CRIB will always lie within voxels that are classified by the dHCP pipeline as containing 100% WM. So greater care might be required regarding how those sGM segmentations are inserted; i.e. it might not be only the WM image that needs to be decreased to accommodate those extra parcels.

  5. I'm strongly considering changing the -sgm_amyg_hipp option to be specific to the 5ttgen fsl algorithm, and to always assign the amygdalae and hippocampi to the sGM volume in other algorithms. The option was put there because in 5ttgen fsl the segmentations of those structures by FIRST are on occasion less extensive than the GM fraction estimated by FAST, and so streamlines are not able to reach those structures for the sGM priors to kick in; but it's a problem specific to that algorithm, and I suspect that it should not only be the default behaviour for other algorithms but also that the converse behaviour doesn't even need to be provided. Do you have any thoughts on this based on your own experiences?

  6. There's a lot of importing of data into the scratch directory that can probably be omitted. Not only the import as .mif then conversion to .nii; much of those data could likely be used in-place without having to convert anything into the scratch directory. You just need to make sure to use absolute filesystem paths (since the user may specify a path relative to the working directory, but the relevant command is executed from inside the scratch directory).

  7. Many scratch file names could possibly be shortened in order to reduce the length of terminal outputs.

  8. There were some errors in what I think was the M-CRIB GM parcel list: a couple of nodes missing and a couple of nodes duplicated. I hope you can see from the code restructuring how it facilitates diagnosing such things. But it's worth (once it's in an operational state) comparing the output of this version to your own code to confirm.
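Point 6 can be sketched concretely; a minimal illustration (the file name and directory layout here are hypothetical, not the script's actual interface):

```python
import os
import tempfile

# Hypothetical input path, as a user might supply it relative to their
# working directory
user_path = os.path.join('data', 'T2w.nii.gz')

# Resolve to an absolute path BEFORE the script changes into its scratch
# directory; after os.chdir(), the relative path would no longer resolve
abs_path = os.path.abspath(user_path)

scratch_dir = tempfile.mkdtemp()
os.chdir(scratch_dir)

# abs_path remains valid from inside the scratch directory, so the data
# can be referenced in-place without importing a converted copy
print(os.path.isabs(abs_path))  # True
```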

Lestropie and others added 4 commits November 15, 2021 19:11
Co-authored-by: Manuel Blesa <mblesac@gmail.com>
Co-authored-by: Paola Galdi <paola.galdi@gmail.com>
@mblesac
Owner

mblesac commented Nov 16, 2021

Hi @Lestropie,

We've been working on this following your advice. You'll see three commits:

  • Fixing some syntax errors
  • Removing the histogram matching
  • Splitting the command into two different algorithms: neonatal_mcrib and neonatal_dhcp

The results are very similar to the original implementation (note that the results are never identical). Let us know what you think about the changes. In the meantime, I'm doing the paperwork to be able to send you a dataset so that you can do the actual testing.

I think your points have been addressed in the different modifications to the code, but there are three I would like to comment on:

I'm unsure of how much utility the output of the -parcellation option would be if the dHCP pipeline data are used. Also, if the dHCP version were standalone, it could omit the modality command-line argument, which is non-functional in that use case.

I think it is still an interesting option to have. The cortical parcellation provided by the dHCP has only 32 nodes, so I don't think it is going to be used for connectomics. The M-CRIB atlas is the translation of the FreeSurfer parcellation to the neonatal brain, so I think it will become one of the most used parcellations (alongside the neonatal AAL-derived or the JHU parcellations). My opinion is to leave it as an additional output: people can always use another parcellation, but in this way you can save a lot of time.

I don't think it is necessarily guaranteed that the sGM segmentations provided by M-CRIB will always lie within voxels that are classified by the dHCP pipeline as containing 100% WM. So greater care might be required regarding how those sGM segmentations are inserted; i.e. it might not be only the WM image that needs to be decreased to accommodate those extra parcels.

The initial WM segmentation is created by combining the WM and the sGM of the dHCP outputs (we call it WM, which can be misleading). So in fact we are inserting the sGM of the M-CRIB atlas into the combination of WM+sGM of the dHCP. I found this approach to provide better delineation of the subcortical structures.

I'm strongly considering changing the -sgm_amyg_hipp option to be specific to the 5ttgen fsl algorithm, and to always assign the amygdalae and hippocampi to the sGM volume in other algorithms. The option was put there because in 5ttgen fsl the segmentations of those structures by FIRST are on occasion less extensive than the GM fraction estimated by FAST, and so streamlines are not able to reach those structures for the sGM priors to kick in; but it's a problem specific to that algorithm, and I suspect that it should not only be the default behavior for other algorithms but also that the converse behavior doesn't even need to be provided. Do you have any thoughts on this based on your own experiences?

I have no answer for that. These structures are very difficult to segment in this population due to the size/resolution and the low contrast. I think it makes sense to leave it as it is for these algorithms too and let the users decide, but again, I have no definitive answer.

@Lestropie
Collaborator Author

Splitting the command into two different algorithms: neonatal_mcrib and neonatal_dhcp

  • Would simply naming them as "dhcp" and "mcrib" be a problem in any way?
    (e.g. is there dHCP data to which this method would not be applicable)

  • One advantage of having separate algorithms is that they can have independent command-line interfaces. So for instance, with the dHCP algorithm, the path to the dHCP processed data must always be specified. Currently it's a mandatory command-line option, which we use very sparingly in MRtrix3: we try to have anything compulsory as a positional argument. So for instance:

    • Is there a mask provided by the dHCP pipeline that one would expect to be used (if not always, then at least in the vast majority of cases)?
    • Is there any need to provide separately an input structural image and the path to the dHCP pipeline output, or would it make more sense to instead only provide the dHCP path, and the structural image would be pulled from that structure?

The results are very similar to the original implementation (note that the results are never identical).

My question here was not about any potential stochasticity within the relevant ANTs commands (which is what I suspect you're referring to here), but more so the fact that I observed potential errors in your code relating to duplicate / absent indices, and so hoped that you would spot the differences in script outcomes from those and confirm whether or not their resolution is correct.

The initial WM segmentation created is done combining the WM and the sGM of the dHCP outputs (we call it WM, which can be misleading). So in fact we are inserting the sGM of the M-CRIB atlas in the combination of WM+sGM of the dHCP. I found this approach to provide a better delineation of the subcortical structures.

While true, it doesn't quite address the original question. Let's say you have a voxel that is outside of the dHCP combined WM & sGM mask; it could be cGM or CSF according to dHCP. Now that voxel is classified as sGM by M-CRIB. What will be the outcome of your script in that voxel?
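To make the question concrete, here is a toy numpy sketch (the tissue ordering and values are purely illustrative, not the script's actual data layout): subtracting the inserted sGM from WM alone can drive the WM fraction negative in a voxel that dHCP classified as cGM/CSF, whereas clearing all tissue fractions within the parcel keeps every voxel summing to one.

```python
import numpy as np

# Toy per-voxel tissue fractions [cGM, sGM, WM, CSF], each summing to 1
fractions = np.array([[0.7, 0.0, 0.0, 0.3],   # outside the dHCP WM+sGM mask
                      [0.0, 0.0, 1.0, 0.0]])  # pure WM voxel

sgm_parcel = np.array([True, True])  # M-CRIB labels both voxels as sGM

# Naive approach: subtract the inserted sGM from the WM fraction only
naive = fractions.copy()
naive[sgm_parcel, 1] += 1.0
naive[sgm_parcel, 2] -= 1.0
# In the first voxel the WM fraction has become negative:
assert naive[0, 2] == -1.0

# Safer approach: clear *all* tissue fractions within the parcel first
safe = fractions.copy()
safe[sgm_parcel] = 0.0
safe[sgm_parcel, 1] = 1.0
assert np.allclose(safe.sum(axis=1), 1.0)
```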

I have no answer for that.

No worries. I'll pose that question over on the main MRtrix3 repo.

DHCP_AMYG_HIPP = list(range(1, 5))

- MCRIB_CGM = list(range(1000, 1036) + range(2000, 2036))
+ MCRIB_CGM = list(range(1000, 1004)) + list(range(1005, 1036)) + list(range(2000, 2004)) + list(range(2005, 2036))
Collaborator Author

This relates to my question RE: absent / duplicated indices. Looks like the elements I thought may have been accidentally omitted you intended to omit; but I have some recollection that there were additionally indices that were duplicated in your original code. This is the better way to do it, but I was partly curious whether or not you would spot the influence of those duplications in a change in output of the script.

(P.S. Given it looks like an unusual omission, it might be worth adding a code comment indicating the structures omitted and why)
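As an aside on the replaced line: concatenating two range() calls directly is additionally a Python 3 failure mode, since range() now returns a lazy object rather than a list. A minimal demonstration (labels 1004 and 2004 deliberately omitted, per the M-CRIB list above):

```python
# In Python 3, range() returns a lazy range object, so the old construction
# raises a TypeError rather than concatenating:
try:
    combined = range(1000, 1036) + range(2000, 2036)
except TypeError:
    combined = None
assert combined is None

# Converting each range to a list first makes concatenation valid; labels
# 1004 and 2004 are deliberately absent
MCRIB_CGM = (list(range(1000, 1004)) + list(range(1005, 1036))
             + list(range(2000, 2004)) + list(range(2005, 2036)))
assert 1004 not in MCRIB_CGM and 2004 not in MCRIB_CGM
assert len(MCRIB_CGM) == 70
```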

Owner

Hi,

We have been working on this, these are the new changes:

  • Now the algorithms are called mcrib and dhcp.
  • The dHCP algorithm has been modified: now you just need to input the path of the derivatives (/anat) folder and that's all; no mask is required anymore. Note that there are some versions of the dHCP pipeline around with different file names; I think it now covers all the possible cases.
  • In the dHCP algorithm we added a filter to ensure that the merging of the sGM doesn't overlap with the CSF.
  • A note has been added about the missing labels: the labels 1004 and 2004 don't exist in the original M-CRIB atlas.

As it is now, the dHCP algorithm is the same as in my original paper (with the addition of this CSF filter to ensure that the sGM doesn't enter the CSF). Basically, it combines the tissue probability maps of the dHCP pipeline with the binary subcortical grey matter parcellations of the M-CRIB atlas. However, I've been thinking of modifying it to offer two possibilities: using the probability maps or not. This would translate into the addition of a new flag (-posteriors): if specified, the algorithm would work as it does now, the only difference being the use of tissue probability maps for the subcortical GM as well, so that all of the tissues would have partial volumes. If not specified, the algorithm would use only the binary segmentations for everything. I was thinking about this because probably a lot of the people who run the dHCP pipeline didn't use the -additional flag (for example in the dHCP data itself, this data is not there). What do you think? Does this make sense?

…put)

Co-authored-by: Manuel Blesa <mblesac@gmail.com>
Co-authored-by: Paola Galdi <paola.galdi@gmail.com>
@Lestropie
Collaborator Author

Diverting conversation from #1 (comment) to main thread as it relates to more items than were highlighted in the code for that specific comment.

now you just need to input the path of the derivatives (/anat) folder

If you wanted to be really fancy, you could make parsing of the user-specified input path more robust, so that as long as the user provides a path that can be reasonably interpreted the script will still proceed. So e.g. if the user provides the path to the subject directory, the anat/ sub-directory within that, or a file within that sub-directory, the script will figure out the location of the anat/ directory and then proceed from there. Just makes it less likely to throw errors at users. I do this with some command interfaces. But it's most definitely not compulsory.

However, I've been thinking to modify it and add two possibilities: the use of the probability maps or not.

The description of the situation is still a bit too ambiguous for me to be able to immediately advise here, as I'm not familiar with the data provided by the pipeline. Let's see if I understand the situation correctly:

  • Everything other than sub-cortical GM:
    • dHCP provides binary tissue classifications
    • dHCP provides tissue probability priors, only if -additional is specified (and there is distributed pre-processed data where this flag was not used)
  • Sub-cortical GM:
    • dHCP provides minimal segmentation, which is immediately ignored in the 5ttgen dhcp algorithm here
    • M-CRIB + ANTs provides binary segmentations only (?)

If this is the case, then my advice would be that the script should check for presence of tissue probability priors in dHCP data.

  • If present, utilise those priors;
  • If absent, or if the user specifies some command-line option e.g. -binary, then utilise the binary segmentations.

However I'm concerned my interpretation is incorrect because you then talk about use of tissue probability priors for sub-cortical GM:

the only difference will be the use of tissue probability maps for the subcortical GM

So please correct my dot points above as necessary and I can revise.

@mblesac
Owner

mblesac commented Nov 24, 2021

The description of the situation is still a bit too ambiguous for me to be able to immediately advise here, as I'm not familiar with the data provided by the pipeline. Let's see if I understand the situation correctly:

  • Everything other than sub-cortical GM:

    • dHCP provides binary tissue classifications
    • dHCP provides tissue probability priors, only if -additional is specified (and there is distributed pre-processed data where this flag was not used)

Yes, that's right.

Sub-cortical GM:

  • dHCP provides minimal segmentation, which is immediately ignored in the 5ttgen dhcp algorithm here
  • M-CRIB + ANTs provides binary segmentations only (?)

dHCP provides a segmentation of the sGM (also tissue probability priors, only if -additional is specified), but I was not happy with how this segmentation looks for this purpose. See an example:

[image]

M-CRIB + ANTs provides both binary masks and tissue probability priors, but for the original implementation I used the binary masks. Please take a look at an example (ignore the cerebellum):

[image]

My reason for choosing the M-CRIB subcortical parcellation was that it was not as "bulky" as the dHCP one and it could allow the WM tracts to better pass through the middle. This was a purely visual judgement based on my opinion; I didn't do any quantitative experiment.

If this is the case, then my advice would be that the script should check for presence of tissue probability priors in dHCP data.

  • If present, utilise those priors;
  • If absent, or if the user specifies some command-line option e.g. -binary, then utilise the binary segmentations.

However I'm concerned my interpretation is incorrect because you then talk about use of tissue probability priors for sub-cortical GM:

the only difference will be the use of tissue probability maps for the subcortical GM

So please correct my dot points above as necessary and I can revise.

What I was thinking is:

  • If the -binary flag is selected (or if the folder of the probability maps doesn't exist): raise a warning and use the binary segmentation for both cGM and sGM.
  • By default, use the posteriors to create the 5TT file, using the tissue probability maps for both tissues, cGM and sGM.

Does that make sense?

@Lestropie
Collaborator Author

OK cool. So an important detail here is that both dHCP and M-CRIB (may) have non-binary segmentations. I personally don't think there's a huge benefit in offering independent control of the two.

If the -binary flag is selected (or if the folder of the probability maps doesn't exist): raise a warning and use the binary segmentation for both cGM and sGM

That logic doesn't quite work as it results in issuing a warning if the user specifies that command-line option. It should instead be something like:

use_hard_segmentation = False
if app.ARGS.hard:
    use_hard_segmentation = True
elif not os.path.isfile(soft_parcellation_path):
    app.warn('No tissue posteriors found (-additional flag not used); output will be hard segmentation')
    use_hard_segmentation = True

Variable use_hard_segmentation would then apply to both dHCP and M-CRIB data utilisation.

Note I've changed to "hard segmentation" rather than "binary"; I think that's more faithful to the fact that the output 5TT image is 4D rather than 3D.
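The 4D point can be illustrated with a toy numpy sketch (tissue ordering illustrative): a "hard" 5TT-style segmentation is still stored as one volume per tissue, with a single 1 per voxel, e.g. by taking the argmax over the soft posteriors.

```python
import numpy as np

# Toy soft segmentation: 2 voxels x 4 tissues (fractions sum to 1)
soft = np.array([[0.6, 0.3, 0.1, 0.0],
                 [0.1, 0.2, 0.6, 0.1]])

# A "hard" segmentation remains 4D: a one-hot volume per tissue
hard = np.zeros_like(soft)
hard[np.arange(soft.shape[0]), soft.argmax(axis=1)] = 1.0

assert np.array_equal(hard.sum(axis=1), np.ones(2))
assert hard[0, 0] == 1.0 and hard[1, 2] == 1.0
```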

…atal algorithms

Co-authored-by: Manuel Blesa <mblesac@gmail.com>
Co-authored-by: Paola Galdi <paola.galdi@gmail.com>
@mblesac
Owner

mblesac commented Nov 26, 2021

Hi,

I added the option to use the hard or the soft segmentation (by default it uses the soft).

I tested it on a dHCP subject and it looks like this:

[images]

Now you can test it with any subject of the dHCP and it should work. I'm still waiting for confirmation from the HR office to be able to send you another subject from our cohort (with all the posteriors) so you can try it there as well.

- Ensure that input files are unique and unambiguous.
- Remove redundant logic across get_inputs() and execute() regarding use of hard segmentation.
- Simplify logic of primary conversion loop.
- Generate sub-cortical grey matter image from M-CRIB only once, rather than once per tissue; this should facilitate code execution without using the -force option.
@Lestropie
Collaborator Author

Also, at some point run the docs/generate_user_docs.sh script and commit the resulting changes to docs/reference/commands/5ttgen.rst and docs/reference/commands_list.rst under your accounts.

@mblesac mblesac merged commit d7475aa into master Dec 3, 2021