ITS genes not being found #14

Open
rissidaniel opened this issue May 8, 2023 · 11 comments

Comments

@rissidaniel

Hello,
I have run "ufcg download -t full", but when I run the profile analysis with --set NUC, the tool does not find an ITS sequence in any of the genomes.
The other --set options work.

command:

ufcg profile --input ./genomes/ --output results_phylo_2_nuc --set NUC --thread 28 -k -w tmp_results_phylo_2_nuc

How can I solve this?
Thank you in advance

@rissidaniel changed the title from "NUC Analysis" to "ITS genes not being found" on May 8, 2023
@endixk
Member

endixk commented May 9, 2023

Hello,

I found that the download package lacked the ITS database, which is required for the extraction.

I uploaded the updated package that will allow your command to run properly.

Could you please try to run ufcg download -t core once, and run your command again?

Sorry for the inconvenience!

@JWDebler

JWDebler commented Aug 31, 2023

Hi, not sure if I have the same problem, but running
ufcg profile --input genomes --output output --set NUC --force 1 --thread 10 --metadata metadata.tsv gives me a FAILED : ITS sequence not found. for every genome.
I installed via conda and tried both ufcg download -t core and ufcg download -t full.
Cheers

Edit: it is now running after I changed NUC to PRO. I couldn't find anything about this setting in the manual other than one sentence in the tutorial saying "We want to extract protein markers from the sequences. Type 'PRO' to continue."

@ignadb

ignadb commented Sep 19, 2023

Hi, thanks for developing ufcg; it's very useful!
It seems the problem still persists for me. I ran a command with --set NUC and got the ITS sequence not found message for all genomes. I tried downloading the database as suggested above and reran the command, without success with either NUC or PRO. I am not sure if this is relevant, but I found only two folders (busco, pro) in ../steineggerlab/ufcg/1.0.5/confid/model, and there is no HMM profile for ITS in the pro folder. Do you have any suggestions?

I followed the installation instructions from GitHub. Thanks a lot in advance!
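
To repeat the folder check described above on another install, one option is to list the model sets present under the model directory. This is only a minimal sketch: the function name list_model_sets is hypothetical, the directory is passed in rather than assumed, and the thread does not say which folders a complete install should contain, so the output is just a basis for comparison.

```shell
# Hypothetical helper (not part of UFCG): print the name of every
# subfolder found directly under the given model directory.
list_model_sets() {
    model_dir="$1"
    for entry in "$model_dir"/*/; do
        # Skip the unexpanded pattern when the directory has no subfolders.
        if [ -d "$entry" ]; then
            basename "$entry"
        fi
    done
}
```

Called with the model path of the install described above, this would print busco and pro on separate lines.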

@endixk
Member

endixk commented Sep 21, 2023

Hello @ignadb,
It seems the change I made to the MMseqs2 parameters in the recent update broke its nucleotide search 😞
This can be fixed quickly, but it will take some time for the fix to be reflected on the conda mirror.
Please wait for the new release, or install the program manually from a recent clone of the repository.

@alisqq

alisqq commented Mar 5, 2024

Hello @endixk,
Has this issue been resolved? I reinstalled ufcg yesterday (I tried both conda and git) and the pipeline still doesn't find ITS sequences in the genomes.

Edit: never mind, it worked with the git clone install!

@jackscanlan

Hi @endixk, thanks so much for your work on this tool, it's really impressive. Just wondering if the recent commits, including df9d3e6 referenced above, could please be included in a new tag of the ufcg Docker container? I'd love to be able to extract NUC/BUSCO sequences using the container for a Nextflow pipeline I'm working on.

@endixk
Member

endixk commented Jul 2, 2024

Hello @jackscanlan, sorry for the late reply.

I also think this is a good time to release a new minor version including these updates.

I will work on it soonish and leave a note here when it's done :)

@endixk
Member

endixk commented Jul 9, 2024

Hi, I recently pushed a new version of the Docker container. Could you please check it out?

@jackscanlan

Hi @endixk, thanks for making a new version of the Docker container. I'm trying the following command and getting the following output. (Note that this is a Nextflow process with only a single input genome, which is why UFCG isn't finding a bunch of the samples in the metadata file--expected behaviour for me)

Command:

ufcg profile \
    --input $3 \
    --metadata $META_PATH \
    --output . \
    -t $6 \
    --set NUC \
    -f \
    -w /tmp/${2} \
    --nocolor \
    -v

Output:

    __  __ _____ _____ _____
   / / / // ___// ___// ___/
  / / / // /_  / /   / / __
 / /_/ // __/ / /___/ /_/ /
 \____//_/    \____/\____/ v1.0.6


[JUL 15 00:28:13] UFCG  |:  Verbose option check.
[JUL 15 00:28:13] UFCG  |:  Timestamp printing option check.
[JUL 15 00:28:13] UFCG  |:  Input file check : GCA_023212845.1_ASM2321284v1_genomic.fna
[JUL 15 00:28:13] UFCG  |:  Symbolic link detected : GCA_023212845.1_ASM2321284v1_genomic.fna -> /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/work/5b/08202e572e198319df0319156c19e8/GCA_023212845.1_ASM2321284v1_genomic.fna
[JUL 15 00:28:13] UFCG  |:  Input file check : /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/work/5b/08202e572e198319df0319156c19e8/GCA_023212845.1_ASM2321284v1_genomic.fna
[JUL 15 00:28:13] UFCG  |:  Input argument : ASCII text
[JUL 15 00:28:13] UFCG  |:  Output directory check : .
[JUL 15 00:28:13] UFCG  |:  Temporary directory check : /tmp/GCA_023212845.1
[JUL 15 00:28:13] UFCG  |:  Custom CPU thread count check : 8
[JUL 15 00:28:13] UFCG  |:  Metadata file check : /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/work/58/6bb959fc58e39920facc71ee504686/repository_metadata.tsv
[JUL 15 00:28:13] UFCG  |:  SUCCESS : Option parsing
[JUL 15 00:28:13] UFCG  |:  Solving dependencies...
[JUL 15 00:28:14] UFCG  |:  SUCCESS : Dependency solving
[JUL 15 00:28:14] UFCG  |:  Launching UFCG profile module...

[JUL 15 00:28:14] UFCG  |:  Importing given metadata file : /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/work/58/6bb959fc58e39920facc71ee504686/repository_metadata.tsv
[JUL 15 00:28:14] UFCG  |:  Metadata file with 10 entities successfully imported.
[JUL 15 00:28:14] UFCG  |:  Reading input data...
[JUL 15 00:28:14] WARN  |:  Metadata entity EPFG6_scaffolds.fasta is not in the input files.
[JUL 15 00:28:14] WARN  |:  Metadata entity GCA_020975405.1_ASM2097540v1_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN  |:  Metadata entity GCA_000426965.1_ASM42696v1_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN  |:  Metadata entity GCA_000426985.1_ASM42698v1_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN  |:  Metadata entity GCA_000739145.1_Metarhizium_anisopliae_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN  |:  Metadata entity GCA_000814975.1_MAN_1.0_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN  |:  Metadata entity GCA_013305495.1_ASM1330549v1_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN  |:  Metadata entity GCA_013839505.1_ASM1383950v1_genomic.fna is not in the input files.
[JUL 15 00:28:14] WARN  |:  Metadata entity GCA_039654215.1_AGRO-Manis_genomic.fna is not in the input files.
[JUL 15 00:28:14] UFCG  |:  Queries prepared. 1 genome sequences identified.
[JUL 15 00:28:14] UFCG  |:  Temporary directory check : /tmp/GCA_023212845.1/GCA_023212845.1
[JUL 15 00:28:14] UFCG  |:  QUERY 1/1 : GCA_023212845.1 (Metarhizium anisopliae)
[JUL 15 00:28:14] UFCG  |:  Extracting nucleotide markers...
[JUL 15 00:28:22] WARN  |:  Result file not created : /tmp/GCA_023212845.1/GCA_023212845.1/UFCG_4297ba9db2cf4b1a_GCA_023212845.1_ASM2321284v1_genomic.fna.m8
[JUL 15 00:28:22] UFCG  |:  FAILED : ITS sequence not found.
[JUL 15 00:28:22] UFCG  |:  Writing results on : ./GCA_023212845.1.ucg
[JUL 15 00:28:22] UFCG  |:  Cleaning temporary files up...
[JUL 15 00:28:22] UFCG  |:  Job finished. Terminating process.

So it seems like, for me, the new version hasn't fixed the original issue, unfortunately?

@jackscanlan

Just to add to that, when I use --set BUSCO, I get the following, seemingly unrelated, error:

  [JUL 15 04:16:55] UFCG  |:  Timestamp printing option check.
  [JUL 15 04:16:55] UFCG  |:  Input file check : EPFG6_scaffolds.fasta
  [JUL 15 04:16:55] UFCG  |:  Symbolic link detected : EPFG6_scaffolds.fasta -> /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/input/EPFG6_scaffolds.fasta
  [JUL 15 04:16:55] UFCG  |:  Input file check : /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/input/EPFG6_scaffolds.fasta
  [JUL 15 04:16:55] UFCG  |:  Input argument : ASCII text
  [JUL 15 04:16:55] UFCG  |:  Output directory check : .
  [JUL 15 04:16:55] UFCG  |:  Number of BUSCOs to extract : 0
  [JUL 15 04:16:55] UFCG  |:  Temporary directory check : /tmp/EPFG6
  [JUL 15 04:16:55] UFCG  |:  Custom CPU thread count check : 8
  [JUL 15 04:16:55] UFCG  |:  Metadata file check : /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/work/50/3daab5f3d8738f4fef451c36d149cf/repository_metadata.tsv
  [JUL 15 04:16:55] UFCG  |:  SUCCESS : Option parsing
  [JUL 15 04:16:55] UFCG  |:  Solving dependencies...
  [JUL 15 04:16:55] UFCG  |:  SUCCESS : Dependency solving
  [JUL 15 04:16:55] UFCG  |:  Launching UFCG profile module...
  
  [JUL 15 04:16:55] UFCG  |:  Importing given metadata file : /group/pathogens/IAWS/Personal/JackS/dev/fungal-phylo/work/50/3daab5f3d8738f4fef451c36d149cf/repository_metadata.tsv
  [JUL 15 04:16:55] UFCG  |:  Metadata file with 10 entities successfully imported.
  [JUL 15 04:16:55] UFCG  |:  Reading input data...
  [JUL 15 04:16:55] WARN  |:  Metadata entity GCA_020975405.1_ASM2097540v1_genomic.fna is not in the input files.
  [JUL 15 04:16:55] WARN  |:  Metadata entity GCA_000426965.1_ASM42696v1_genomic.fna is not in the input files.
  [JUL 15 04:16:55] WARN  |:  Metadata entity GCA_000426985.1_ASM42698v1_genomic.fna is not in the input files.
  [JUL 15 04:16:55] WARN  |:  Metadata entity GCA_000739145.1_Metarhizium_anisopliae_genomic.fna is not in the input files.
  [JUL 15 04:16:55] WARN  |:  Metadata entity GCA_000814975.1_MAN_1.0_genomic.fna is not in the input files.
  [JUL 15 04:16:55] WARN  |:  Metadata entity GCA_013305495.1_ASM1330549v1_genomic.fna is not in the input files.
  [JUL 15 04:16:55] WARN  |:  Metadata entity GCA_013839505.1_ASM1383950v1_genomic.fna is not in the input files.
  [JUL 15 04:16:55] WARN  |:  Metadata entity GCA_023212845.1_ASM2321284v1_genomic.fna is not in the input files.
  [JUL 15 04:16:55] WARN  |:  Metadata entity GCA_039654215.1_AGRO-Manis_genomic.fna is not in the input files.
  [JUL 15 04:16:55] UFCG  |:  Queries prepared. 1 genome sequences identified.
  [JUL 15 04:16:55] UFCG  |:  Temporary directory check : /tmp/EPFG6/ABMA9
  [JUL 15 04:16:55] UFCG  |:  QUERY 1/1 : ABMA9 (unknown)
  [JUL 15 04:16:55] UFCG  |:  Extracting BUSCOs...
  [JUL 15 04:16:55] UFCG  |:  RESULT : [Single: 0 ; Duplicated: 0 ; Missing: 0]
  [JUL 15 04:16:55] UFCG  |:  Writing results on : ./ABMA9.ucg
  [JUL 15 04:16:55] UFCG  |:  ERROR! java.lang.StringIndexOutOfBoundsException: Range [0, -1) out of bounds for length 0
  [JUL 15 04:16:55] UFCG  |:    at java.base/jdk.internal.util.Preconditions$1.apply(Preconditions.java:55)
  [JUL 15 04:16:55] UFCG  |:    at java.base/jdk.internal.util.Preconditions$1.apply(Preconditions.java:52)
  [JUL 15 04:16:55] UFCG  |:    at java.base/jdk.internal.util.Preconditions$4.apply(Preconditions.java:213)
  [JUL 15 04:16:55] UFCG  |:    at java.base/jdk.internal.util.Preconditions$4.apply(Preconditions.java:210)
  [JUL 15 04:16:55] UFCG  |:    at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:98)
  [JUL 15 04:16:55] UFCG  |:    at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckFromToIndex(Preconditions.java:112)
  [JUL 15 04:16:55] UFCG  |:    at java.base/jdk.internal.util.Preconditions.checkFromToIndex(Preconditions.java:349)
  [JUL 15 04:16:55] UFCG  |:    at java.base/java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:1093)
  [JUL 15 04:16:55] UFCG  |:    at java.base/java.lang.StringBuilder.substring(StringBuilder.java:91)
  [JUL 15 04:16:55] UFCG  |:    at entity.JsonProfileEntity.setRunData(JsonProfileEntity.java:76)
  [JUL 15 04:16:55] UFCG  |:    at entity.JsonProfileEntity.<init>(JsonProfileEntity.java:25)
  [JUL 15 04:16:55] UFCG  |:    at process.JsonBuildProcess.build(JsonBuildProcess.java:67)
  [JUL 15 04:16:55] UFCG  |:    at module.ProfileModule.run(ProfileModule.java:933)
  [JUL 15 04:16:55] UFCG  |:    at pipeline.ModuleHandler.handle_profile(ModuleHandler.java:45)
  [JUL 15 04:16:55] UFCG  |:    at pipeline.ModuleHandler.handle(ModuleHandler.java:83)
  [JUL 15 04:16:55] UFCG  |:    at pipeline.UFCGMainPipeline.main(UFCGMainPipeline.java:301)

@endixk
Member

endixk commented Jul 30, 2024

Hey @jackscanlan,

Thanks for sharing the results. For the second issue, BUSCO config files are missing. Running ufcg download -t busco will solve it. I should have provided a proper error message for this particular issue 😅

For the first one, though, I am not 100% clear about the root cause. I tried to reproduce the issue in my environment with symlinked inputs, but it worked fine. It would be helpful if you could run it with the -dev option to find out exactly which sub-command is failing. It would also help to know whether the program works (or fails) on the default core gene (-s PRO) task.
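
The two failures discussed in this thread leave different lines in the UFCG log, so a saved log can be triaged before reporting back. Below is a minimal sketch, assuming the run's output was captured to a file. The function name diagnose_ufcg_log and the labels it prints are hypothetical; the strings it matches are copied from the logs quoted above. It only reports which message was seen and does not establish the root cause.

```shell
# Hypothetical helper (not part of UFCG): report which of the failure
# messages quoted in this thread appears in a saved log file.
diagnose_ufcg_log() {
    log_file="$1"
    if grep -qF "FAILED : ITS sequence not found" "$log_file"; then
        # Reported with --set NUC in this thread; the cause is still open.
        echo "its-not-found"
    elif grep -qF "Number of BUSCOs to extract : 0" "$log_file"; then
        # Maintainer's diagnosis above: BUSCO config files are missing
        # (suggested fix: ufcg download -t busco).
        echo "busco-config-missing"
    else
        echo "no-known-failure"
    fi
}
```

For the two logs posted above, this would print its-not-found for the --set NUC run and busco-config-missing for the --set BUSCO run.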
