Support for splitting pacbio reads.bam files #367

Open
wants to merge 4 commits into
base: devel
5 changes: 5 additions & 0 deletions Changes
@@ -1,5 +1,10 @@
Upcoming

- Support for splitting PacBio reads.bam files where data is CCS
processed on the instrument. Allow iRODS loading jobs to be
submitted to wr, as instrument run file loading will potentially
take considerably longer.

Release 2.35.0
- Tweak Sequel AnalysisPublisher for SMRT Link 10.2 to allow xml
in entry-points subdir.
31 changes: 31 additions & 0 deletions MANIFEST
@@ -79,6 +79,9 @@ lib/WTSI/NPG/HTS/PacBio/Sequel/RunDelete.pm
lib/WTSI/NPG/HTS/PacBio/Sequel/RunMonitor.pm
lib/WTSI/NPG/HTS/PacBio/Sequel/RunPublisherBase.pm
lib/WTSI/NPG/HTS/PacBio/Sequel/RunPublisher.pm
lib/WTSI/NPG/HTS/PacBio/Sequel/SeqchkCalculator.pm
lib/WTSI/NPG/HTS/PacBio/Sequel/JobExecute/JobBase.pm
lib/WTSI/NPG/HTS/PacBio/Sequel/JobExecute/WrJob.pm
lib/WTSI/NPG/HTS/PathLister.pm
lib/WTSI/NPG/HTS/PublishState.pm
lib/WTSI/NPG/HTS/RunPublisher.pm
@@ -16301,6 +16304,16 @@ t/data/pacbio/sequel/r54097_20170727_165601/1_A02/m54097_170727_170646.subreads.
t/data/pacbio/sequel/r54097_20170727_165601/1_A02/m54097_170727_170646.subreadset.xml
t/data/pacbio/sequel/r54097_20170727_165601/1_A02/m54097_170727_170646.primary_qc.tar.xz
t/data/pacbio/sequel/r54097_20170727_165601/1_A02/m54097_170727_170646.transferdone
t/data/pacbio/sequel/r64016e_20220316_164414/1_A01/m64016e_220316_165505.reads.bam
t/data/pacbio/sequel/r64094e_20220401_114325/4_D01/m64094e_220405_144215.ccs_reports.json
t/data/pacbio/sequel/r64094e_20220401_114325/4_D01/m64094e_220405_144215.ccs_reports.txt
t/data/pacbio/sequel/r64094e_20220401_114325/4_D01/m64094e_220405_144215.consensusreadset.xml
t/data/pacbio/sequel/r64094e_20220401_114325/4_D01/m64094e_220405_144215.primary_qc.tar.xz
t/data/pacbio/sequel/r64094e_20220401_114325/4_D01/m64094e_220405_144215.reads.bam
t/data/pacbio/sequel/r64094e_20220401_114325/4_D01/m64094e_220405_144215.reads.bam.pbi
t/data/pacbio/sequel/r64094e_20220401_114325/4_D01/m64094e_220405_144215.sts.xml
t/data/pacbio/sequel/r64094e_20220401_114325/4_D01/m64094e_220405_144215.transferdone
t/data/pacbio/sequel/r64094e_20220401_114325/4_D01/m64094e_220405_144215.zmw_metrics.json.gz
t/data/pacbio/sequel/r64174e_20210114_161659/1_A01/m64174e_210114_162751.ccs_reports.json
t/data/pacbio/sequel/r64174e_20210114_161659/1_A01/m64174e_210114_162751.ccs_reports.txt
t/data/pacbio/sequel/r64174e_20210114_161659/1_A01/m64174e_210114_162751.consensusreadset.xml
@@ -16420,6 +16433,20 @@ t/data/pacbio/sequel_analysis/001612/tasks/barcoding.tasks.lima-0/lima_output.re
t/data/pacbio/sequel_analysis/001612/tasks/barcoding.tasks.lima-0/lima_output.removed.bam.pbi
t/data/pacbio/sequel_analysis/001612/tasks/barcoding.tasks.lima-0/lima_output.removed.subreadset.xml
t/data/pacbio/sequel_analysis/001612/tasks/barcoding.tasks.lima-0/merged_analysis_report.json
t/data/pacbio/sequel_analysis/001612v2/entry-points/07d85801-6a09-4728-982c-e3c048f95bd8.subreadset.xml
t/data/pacbio/sequel_analysis/001612v2/tasks/barcoding.tasks.lima-0/lima_output.lbc12--lbc12.bam
t/data/pacbio/sequel_analysis/001612v2/tasks/barcoding.tasks.lima-0/lima_output.lbc12--lbc12.bam.pbi
t/data/pacbio/sequel_analysis/001612v2/tasks/barcoding.tasks.lima-0/lima_output.lbc12--lbc12.subreadset.xml
t/data/pacbio/sequel_analysis/001612v2/tasks/barcoding.tasks.lima-0/lima_output.lbc5--lbc5.bam
t/data/pacbio/sequel_analysis/001612v2/tasks/barcoding.tasks.lima-0/lima_output.lbc5--lbc5.bam.pbi
t/data/pacbio/sequel_analysis/001612v2/tasks/barcoding.tasks.lima-0/lima_output.lbc5--lbc5.subreadset.xml
t/data/pacbio/sequel_analysis/001612v2/tasks/barcoding.tasks.lima-0/lima_output.lima.counts
t/data/pacbio/sequel_analysis/001612v2/tasks/barcoding.tasks.lima-0/lima_output.lima.guess.txt
t/data/pacbio/sequel_analysis/001612v2/tasks/barcoding.tasks.lima-0/lima_output.lima.summary.txt
t/data/pacbio/sequel_analysis/001612v2/tasks/barcoding.tasks.lima-0/lima_output.removed.bam
t/data/pacbio/sequel_analysis/001612v2/tasks/barcoding.tasks.lima-0/lima_output.removed.bam.pbi
t/data/pacbio/sequel_analysis/001612v2/tasks/barcoding.tasks.lima-0/lima_output.removed.subreadset.xml
t/data/pacbio/sequel_analysis/001612v2/tasks/barcoding.tasks.lima-0/merged_analysis_report.json
t/data/pacbio/sequel_analysis/000226/entry-points/acf46f00-12b8-45e6-bc10-b0790f8d6758.subreadset.xml
t/data/pacbio/sequel_analysis/000226/tasks/pbcoretools.tasks.auto_ccs_outputs-0/m64016_190608_025655.ccs.bam
t/data/pacbio/sequel_analysis/000226/tasks/pbcoretools.tasks.auto_ccs_outputs-0/m64016_190608_025655.ccs.bam.pbi
@@ -16498,11 +16525,13 @@ t/lib/WTSI/NPG/HTS/PacBio/Sequel/AnalysisPublisherTest.pm
t/lib/WTSI/NPG/HTS/PacBio/Sequel/AnalysisReportTest.pm
t/lib/WTSI/NPG/HTS/PacBio/Sequel/ApiClientTest.pm
t/lib/WTSI/NPG/HTS/PacBio/Sequel/ImageArchiveTest.pm
t/lib/WTSI/NPG/HTS/PacBio/Sequel/JobExecute/JobTest.pm
t/lib/WTSI/NPG/HTS/PacBio/Sequel/RunAuditorTest.pm
t/lib/WTSI/NPG/HTS/PacBio/Sequel/RunDeleteMonitorTest.pm
t/lib/WTSI/NPG/HTS/PacBio/Sequel/RunDeleteTest.pm
t/lib/WTSI/NPG/HTS/PacBio/Sequel/RunMonitorTest.pm
t/lib/WTSI/NPG/HTS/PacBio/Sequel/RunPublisherTest.pm
t/lib/WTSI/NPG/HTS/PacBio/Sequel/SeqchkCalculatorTest.pm
t/lib/WTSI/NPG/HTS/SeqchksumTest.pm
t/lib/WTSI/NPG/HTS/Test.pm
t/lib/WTSI/NPG/HTS/TreePublisherTest.pm
@@ -16521,11 +16550,13 @@ t/pacbio_sequel_analysis_publisher.t
t/pacbio_sequel_analysis_report.t
t/pacbio_sequel_api_client.t
t/pacbio_sequel_image_archive.t
t/pacbio_sequel_job.t
t/pacbio_sequel_run_auditor.t
t/pacbio_sequel_run_delete_monitor.t
t/pacbio_sequel_run_delete.t
t/pacbio_sequel_run_monitor.t
t/pacbio_sequel_run_publisher.t
t/pacbio_sequel_seqchk_calculator.t
t/params.t
t/perlcriticrc
t/README
9 changes: 8 additions & 1 deletion bin/npg_pacbio_run_auxiliary.pl
@@ -20,6 +20,7 @@

Readonly::Scalar my $DEFAULT_INTERVAL_DAYS => 14;
Readonly::Scalar my $DEFAULT_OLDER_THAN_DAYS => 60;
Readonly::Scalar my $MINIMUM_DELETION_DAYS => 30;
Readonly::Array my @TYPES => qw(delete audit);

my $api_uri;
@@ -93,6 +94,11 @@
my ($num_runs, $num_processed, $num_actioned, $num_errors);

if ($task eq 'delete') {
## error if older_than is less than minimum deletion allowed
if($older_than < $MINIMUM_DELETION_DAYS) {
pod2usage(-msg => 'For delete older_than must be >= ' . $MINIMUM_DELETION_DAYS,
-exitval => 2);
}
my $delete = WTSI::NPG::HTS::PacBio::Sequel::RunDeleteMonitor->new(@init_args);
($num_runs, $num_processed, $num_actioned, $num_errors) = $delete->delete_runs;
} elsif ($task eq 'audit') {
@@ -141,7 +147,8 @@ =head1 SYNOPSIS
--logconf A log4perl configuration file. Optional.
--older-than
--older_than Only consider runs older than a specified number of
days. Optional defaults to 60 days.
days. 30 days is the minimum for the delete task.
Optional defaults to 60 days.
--task Required. Current permitted options are delete and
audit.
--verbose Print messages while processing. Optional.
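
Review note: the minimum-age guard added for the delete task can be exercised in isolation. The following is a minimal sketch, assuming a hypothetical helper `check_older_than` that is not part of this PR; it mirrors the comparison in the script, where values below 30 are rejected and 30 itself is accepted.

```perl
use strict;
use warnings;

# Mirrors $MINIMUM_DELETION_DAYS in the script.
my $MINIMUM_DELETION_DAYS = 30;

# Hypothetical helper (not in the PR): returns an error message when the
# requested age is too small for the delete task, and undef otherwise.
sub check_older_than {
  my ($task, $older_than) = @_;
  if ($task eq 'delete' && $older_than < $MINIMUM_DELETION_DAYS) {
    return "For delete older_than must be >= $MINIMUM_DELETION_DAYS";
  }
  return;
}
```

Returning a message rather than exiting keeps the rule testable; the script itself would still pass the message to `pod2usage`.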
57 changes: 48 additions & 9 deletions bin/npg_pacbio_runmonitor.pl
@@ -20,26 +20,33 @@

Readonly::Scalar my $DEFAULT_INTERVAL_DAYS => 14;
Readonly::Scalar my $DEFAULT_OLDER_THAN_DAYS => 0;
Readonly::Scalar my $MODE_GROUP_WRITABLE => oct q(0020);

my $api_uri;
my $collection;
my $debug;
my $execute = 1;
my $interval = $DEFAULT_INTERVAL_DAYS;
my $local_path;
my $log_dir;
my $log4perl_config;
my $older_than = $DEFAULT_OLDER_THAN_DAYS;
my $submit_wr;
my $verbose;

GetOptions('collection=s' => \$collection,
GetOptions('api-uri|api_uri=s' => \$api_uri,
'collection=s' => \$collection,
'debug' => \$debug,
'help' => sub {
pod2usage(-verbose => 2, -exitval => 0);
},
'execute!' => \$execute,
'interval=i' => \$interval,
'logconf=s' => \$log4perl_config,
'local-path|local_path=s' => \$local_path,
'log-dir|log_dir=s' => \$log_dir,
'older-than|older_than=i' => \$older_than,
'api-uri|api_uri=s' => \$api_uri,
'submit-wr|submit_wr' => \$submit_wr,
'verbose' => \$verbose);


@@ -60,20 +67,37 @@
-exitval => 2);
}

if (!$log_dir || !-d $log_dir) {
pod2usage(-msg => 'A log dir must be specified by the log-dir argument and must exist',
-exitval => 2);
}

my $mode = (stat $log_dir)[2];
if ( not ($mode & $MODE_GROUP_WRITABLE) ) {
pod2usage(-msg => 'A log dir specified by the log-dir argument must be group writable',
-exitval => 2);
}

my $irods = WTSI::NPG::iRODS->new;
my $wh_schema = WTSI::DNAP::Warehouse::Schema->connect;

my @init_args = (interval => $interval,
my @init_args = (execute => $execute,
interval => $interval,
irods => $irods,
local_staging_area => $local_path,
log_dir => $log_dir,
mlwh_schema => $wh_schema,
older_than => $older_than,
);

if($api_uri) {
push @init_args, api_uri => $api_uri;
}
if ($collection) {
push @init_args, dest_collection => $collection;
}
if($api_uri) {
push @init_args, api_uri => $api_uri;
if($submit_wr) {
push @init_args, submit_wr => $submit_wr;
}

my $monitor = $module->new(@init_args);
@@ -103,14 +127,18 @@ =head1 NAME
=head1 SYNOPSIS

npg_pacbio_runmonitor --local-path </path/to/staging/area>
[--collection <path>] [--debug] [--interval days] [--logconf <path>]
[--older-than days] [--verbose] [--api-uri]
[--collection <path>] [--debug] [--[no]execute] [--interval days]
[--logconf <path>] --log-dir <path> [--older-than days]
[--submit-wr] [--verbose] [--api-uri]

Options:
--collection The destination collection in iRODS. Optional,
defaults to /seq/pacbio/.
--debug Enable debug level logging. Optional, defaults to
false.
--[no]execute Submit proposed jobs; with --noexecute, jobs are
proposed but not submitted. Optional, defaults to true.
Only relevant in conjunction with the submit-wr option.
--help Display help.
--interval Interval of time in days for run loading.
Optional, defaults to 14.
@@ -119,12 +147,22 @@ =head1 SYNOPSIS
are staged for loading into iRODS.
--logconf A log4perl configuration file. Optional.

--log-dir
--log_dir Path to a log directory. With the submit-wr option it
holds log files; without it, it holds run lock files.
Required. The directory must exist and be group
writable.
--older-than
--older_than Only consider runs older than a specified number of
days. Optional defaults to 0 days.

--execute A flag turning on and off execution. True by default.
--submit-wr
--submit_wr Submit individual run loading jobs to a wr manager.
Optional, defaults to false.
--verbose Print messages while processing. Optional.

--api_uri
--api-uri
--api_uri Specify the server host and port. Optional.
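
Review note: the option table above relies on two Getopt::Long features, a negatable flag (`execute!`) and hyphen/underscore aliases (`log-dir|log_dir=s`). A minimal sketch of how these parse, assuming a hypothetical wrapper `parse_args` that is not part of this PR:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical wrapper (not in the PR): parses a copy of the argument list
# and returns a hash ref holding the three options of interest.
sub parse_args {
  my (@argv) = @_;
  my %opt = (execute => 1, submit_wr => 0);  # same defaults as the script
  GetOptionsFromArray(\@argv,
    'execute!'            => \$opt{execute},    # accepts --execute / --noexecute
    'log-dir|log_dir=s'   => \$opt{log_dir},    # either spelling is accepted
    'submit-wr|submit_wr' => \$opt{submit_wr},  # plain boolean flag
  ) or die "Bad options\n";
  return \%opt;
}
```

For example, `parse_args('--noexecute', '--log_dir', '/tmp/logs', '--submit-wr')` leaves `execute` false, `log_dir` set to `/tmp/logs` and `submit_wr` true.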


@@ -139,7 +177,8 @@ =head1 AUTHOR

=head1 COPYRIGHT AND DISCLAIMER

Copyright (C) 2016, 2019 Genome Research Limited. All Rights Reserved.
Copyright (C) 2016, 2019, 2022 Genome Research Limited. All Rights
Reserved.

This program is free software: you can redistribute it and/or modify
it under the terms of the Perl Artistic License or the GNU General
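
Review note on the log-dir permission check in this file: Perl does not treat a numeric string as octal, so `q(0020)` numifies to decimal 20 (octal 024), whereas `oct q(0020)` is 16, the group-write bit. A minimal sketch of the check using the `S_IWGRP` constant from the core Fcntl module instead; `is_group_writable` is a hypothetical helper and not part of this PR:

```perl
use strict;
use warnings;
use Fcntl qw(:mode);

# Hypothetical helper (not in the PR): returns 1 if $path exists and its
# permission bits include group write, 0 otherwise.
sub is_group_writable {
  my ($path) = @_;
  my @st = stat $path;
  return 0 if !@st;                    # stat failed, e.g. path is missing
  return ($st[2] & S_IWGRP) ? 1 : 0;   # element 2 of stat is the mode
}
```

With a decimal mask of 20, a directory with mode 0705 would pass the test through its other-read bit even though it is not group writable; masking with `S_IWGRP` avoids that.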
29 changes: 19 additions & 10 deletions bin/npg_publish_pacbio_run.pl
@@ -11,6 +11,7 @@

use WTSI::DNAP::Warehouse::Schema;
use WTSI::NPG::iRODS;
use WTSI::NPG::HTS::PacBio::Sequel::APIClient;
use WTSI::NPG::HTS::PacBio::Sequel::RunPublisher;

our $VERSION = '';
@@ -37,14 +38,16 @@
LOGCONF
;

my $api_uri;
my $collection;
my $debug;
my $force = 0;
my $log4perl_config;
my $runfolder_path;
my $verbose;

GetOptions('collection=s' => \$collection,
GetOptions('api-uri|api_uri=s' => \$api_uri,
'collection=s' => \$collection,
'debug' => \$debug,
'force' => \$force,
'help' => sub {
@@ -55,8 +58,7 @@
'verbose' => \$verbose);



my $module = 'WTSI::NPG::HTS::PacBio::Sequel::RunPublisher';
my $module = 'WTSI::NPG::HTS::PacBio::Sequel::RunPublisher';

# Process CLI arguments
if ($log4perl_config) {
@@ -78,27 +80,29 @@
-exitval => 2);
}


my $irods = WTSI::NPG::iRODS->new;
my $wh_schema = WTSI::DNAP::Warehouse::Schema->connect;


my @init_args = (force => $force,
irods => $irods,
mlwh_schema => $wh_schema,
runfolder_path => $runfolder_path);
if ($collection) {
push @init_args, dest_collection => $collection;
}
if ($api_uri) {
my $api_client = WTSI::NPG::HTS::PacBio::Sequel::APIClient->new
('api_uri' => $api_uri);
push @init_args, api_client => $api_client;
}

my $publisher = $module->new(@init_args);

use sigtrap 'handler', \&handler, 'normal-signals';

sub handler {
my ($signal) = @_;

$log->info('Writing restart file ', $publisher->restart_file);
$publisher->write_restart_file;
$log->error("Exiting due to $signal");
exit 1;
}
@@ -114,6 +118,8 @@ sub handler {
"with $num_errors errors");
}

exit 0;

__END__

=head1 NAME
@@ -122,10 +128,12 @@ =head1 NAME

=head1 SYNOPSIS

npg_publish_pacbio_run --runfolder-path <path> [--collection <path>]
[--force] [--debug] [--verbose] [--logconf <path>]
npg_publish_pacbio_run --runfolder-path <path> [--api-uri <uri>]
[--collection <path>] [--force] [--debug] [--verbose] [--logconf <path>]

Options:
--api-uri
--api_uri Specify the server host and port. Optional.
--collection The destination collection in iRODS. Optional,
defaults to /seq/pacbio/.
--debug Enable debug level logging. Optional, defaults to
@@ -171,7 +179,8 @@ =head1 AUTHOR

=head1 COPYRIGHT AND DISCLAIMER

Copyright (C) 2016, 2017 Genome Research Limited. All Rights Reserved.
Copyright (C) 2016, 2017, 2022 Genome Research Limited. All Rights
Reserved.

This program is free software: you can redistribute it and/or modify
it under the terms of the Perl Artistic License or the GNU General
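
Review note: this script registers its handler with the core `sigtrap` pragma, which calls the handler with the signal name for each of the "normal" signals (HUP, INT, PIPE, TERM). A minimal sketch of that mechanism, assuming hypothetical names `record_signal` and `caught_signals`; unlike the script's handler, it records the signal instead of exiting so the behaviour can be observed:

```perl
use strict;
use warnings;

my @caught;  # names of the signals received so far

# Install one handler for HUP, INT, PIPE and TERM.
use sigtrap 'handler', \&record_signal, 'normal-signals';

# Hypothetical handler (not in the PR): sigtrap passes the signal name.
sub record_signal {
  my ($signal) = @_;
  push @caught, $signal;
  return;
}

# Hypothetical accessor so callers can inspect what was received.
sub caught_signals {
  return @caught;
}

# Send TERM to this process; the handler runs instead of the default action.
kill 'TERM', $$;
```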
11 changes: 6 additions & 5 deletions lib/WTSI/NPG/HTS/PacBio/Sequel/AnalysisMonitor.pm
@@ -68,17 +68,18 @@ sub publish_analysed_cells {
foreach my $job (@jobs) {
try {
my $analysis_path = $job->{path};
if(-d $analysis_path){
my ($nf, $np, $ne) = $self->_publish_analysis_path($analysis_path);
if (-d $analysis_path) {
my ($nf, $np, $ne) = $self->_publish_analysis_path($analysis_path);
$self->debug("Processed [$np / $nf] files in ",
"'$analysis_path' with $ne errors");
"'$analysis_path' with $ne errors");

if ($ne > 0) {
$self->logcroak("Encountered $ne errors while processing ",
$self->logcroak("Encountered $ne errors while processing ",
"[$np / $nf] files in '$analysis_path'");
}
$num_processed++;
}else{
}
else {
$self->warn('IGNORING job id ',$job->{id} .
qq[ as output dir [$analysis_path] not found]);
}
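
Review note: the loop in this hunk publishes each job's output directory, treats a non-zero error count as a failure, and ignores jobs whose directory is missing. A minimal sketch of that control flow, with assumptions stated: `process_jobs` and its `$publish` callback are hypothetical stand-ins, a core `eval` block replaces the `try` used in the module, and counting a failed job as one error is an assumption because the catch block is not shown in this diff.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the module's code). $publish is a callback that
# returns (num_files, num_processed, num_errors) for a directory.
sub process_jobs {
  my ($jobs, $publish) = @_;
  my ($num_processed, $num_errors) = (0, 0);
  my @ignored;  # ids of jobs whose output directory is missing

  foreach my $job (@{$jobs}) {
    my $ok = eval {
      my $path = $job->{path};
      if (-d $path) {
        my ($nf, $np, $ne) = $publish->($path);
        if ($ne > 0) {
          die "Encountered $ne errors while processing '$path'\n";
        }
        $num_processed++;
      }
      else {
        push @ignored, $job->{id};  # the module logs a warning here
      }
      1;
    };
    $num_errors++ if !$ok;  # assumption: one error counted per failed job
  }
  return ($num_processed, $num_errors, \@ignored);
}
```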