This repository was archived by the owner on May 3, 2024. It is now read-only.

[feature] WIP: automate with nextflow #8

Draft
nh13 wants to merge 4 commits into Magdoll:master from nh13:feature/nextflow

nh13 commented Feb 20, 2021

This is a work-in-progress PR to automate the steps described in https://github.com/Magdoll/CoSA/wiki/SARS-CoV-2-variant-calling-using-PacBio-HiFi-data#ccs. My motivation is to understand the process, including the inputs and outputs for all steps along the way.

Author

nh13 commented Feb 20, 2021

@Magdoll do you have any small example data you can point me to that would help me here? Specifically, I am looking for a subreads BAM as input to ccs, along with any metadata I need to know about to execute the remaining steps.

Owner

Magdoll commented Feb 20, 2021

@nh13 - do you really need subreads.bam? I know I wrote the ccs step in there, but at this point the SQ2e generates ccs.bam directly off the instrument, and service providers will run ccs and send the resulting ccs.bam to customers.

I'm waiting for the ccs.bam that was used to create this protocol to become public in SRA. Hopefully soon.

Author

nh13 commented Feb 20, 2021

Nope, so I don’t need the CCS reads then; I just didn’t want to overlook that step if it was needed. I also want to try it out on my end, but it's no big deal.

ch_bams_by_sample = ch_in_lima_outputs
    .join(ch_lima_bams)                         // join by barcode name key
    .map { key, sample, bam -> [sample, bam] }  // discard the key
    .groupTuple()                               // collect all BAMs by sample name

Author

@pditommaso I am requiring a CSV that gives the sample (patient) along with the forward and reverse barcodes used. Since the sample can have multiple F/R barcode pairs, I need to merge the BAMs after demultiplexing. I am trying to figure out a concise way of joining the output of lima (demultiplexing) with the metadata from the CSV, so I can group the BAMs to merge. Is this type of channel joining the idiomatic way, or would you recommend something different?
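
For readers less familiar with Nextflow channels, here is a rough plain-Python model of what the join and group steps in that snippet are meant to compute. It is only a sketch under stated assumptions: the function name `bams_by_sample`, the barcode keys, the sample names, and the BAM file names are all hypothetical and do not come from the PR, and the real pipeline works on channels rather than lists.

```python
from collections import defaultdict


def bams_by_sample(sample_sheet_rows, lima_bams):
    """Group demultiplexed BAMs by sample name.

    sample_sheet_rows: iterable of (barcode_key, sample) pairs, e.g. parsed from the CSV.
    lima_bams: iterable of (barcode_key, bam_path) pairs, one per demultiplexed BAM.
    """
    # Index the demultiplexed BAMs by barcode key (the join key).
    bam_for_key = dict(lima_bams)

    grouped = defaultdict(list)
    for key, sample in sample_sheet_rows:
        # Inner join: barcode pairs without a matching BAM are skipped.
        if key in bam_for_key:
            # The key is discarded; only the sample name and BAM are kept.
            grouped[sample].append(bam_for_key[key])
    return dict(grouped)


# Hypothetical sample sheet: patient1 was sequenced with two barcode pairs.
rows = [
    ("bc1001--bc1002", "patient1"),
    ("bc1003--bc1004", "patient1"),
    ("bc1005--bc1006", "patient2"),
]
# Hypothetical demultiplexed outputs, keyed by the same barcode pair names.
bams = [
    ("bc1001--bc1002", "demux.bc1001--bc1002.bam"),
    ("bc1003--bc1004", "demux.bc1003--bc1004.bam"),
    ("bc1005--bc1006", "demux.bc1005--bc1006.bam"),
]

print(bams_by_sample(rows, bams))
```

Each sample ends up with the list of BAMs that would then be merged, which is the role `groupTuple()` plays in the channel version.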

It looks like you have already improved it in bd3d982. I can't think of anything better 👍

Author

Thanks @pditommaso!

@drpatelh

@nh13 I have been busy updating our SARS-CoV-2 pipeline to DSL2 of late and just finished adding Nanopore support on Friday 😅 I'd be happy to consider adding --platform pacbio to that pipeline; I have intentionally kept the implementation open to this sort of extension. We have also added a bunch of modules to nf-core/modules that you should be able to re-use straight away. Have a look here.

Author

nh13 commented Feb 22, 2021

How about I get a working implementation here first? Adding another (third) code path in that repo seems like too much for my limited free time at the moment.

@drpatelh

Understandable. Happy to help where I can.
