
Failed to reproduce the results in the paper on KITTI #230

Open
Tord-Zhang opened this issue May 9, 2022 · 3 comments

Comments


Tord-Zhang commented May 9, 2022

Hi, I trained PackNet with train_kitti.yaml and the dataset split you provided, but the results are far worse than the numbers in the paper. I get abs_rel 0.121, while the result in the paper was about 0.07.

This is the config I used for training:
model:
    name: 'SelfSupModel'
    optimizer:
        name: 'Adam'
        depth:
            lr: 0.0002
        pose:
            lr: 0.0002
    scheduler:
        name: 'StepLR'
        step_size: 30
        gamma: 0.5
    depth_net:
        name: 'PackNet01'
        version: '1A'
    pose_net:
        name: 'PoseNet'
        version: ''
    params:
        crop: 'garg'
        min_depth: 0.0
        max_depth: 80.0
datasets:
    augmentation:
        image_shape: (192, 640)
    train:
        batch_size: 4
        dataset: ['KITTI']
        path: ['datasets/KITTI_raw']
        split: ['data_splits/eigen_zhou_files.txt']
        depth_type: ['velodyne']
        repeat: [2]
    validation:
        dataset: ['KITTI']
        path: ['datasets/KITTI_raw']
        split: ['data_splits/eigen_val_files.txt', 'data_splits/eigen_test_files.txt']
        depth_type: ['velodyne']
    test:
        dataset: ['KITTI']
        path: ['datasets/KITTI_raw']
        split: ['data_splits/eigen_test_files.txt']
        depth_type: ['velodyne']
checkpoint:
    filepath: kitti_ckpt
    monitor: 'rmse_pp_gt'
    monitor_index: 0
    mode: 'min'

and this is the result I get:
[screenshot of evaluation results]
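For reference, abs_rel here is the standard absolute relative depth error, averaged over valid ground-truth pixels. A minimal NumPy sketch (the function name and the tiny example arrays are mine, not from the repo):

```python
import numpy as np

def abs_rel(pred, gt, min_depth=0.0, max_depth=80.0):
    """Absolute relative error over pixels with valid ground truth."""
    valid = (gt > min_depth) & (gt < max_depth)
    return np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid])

# Toy example: 0.0 marks pixels with no LiDAR return, which are excluded.
gt = np.array([[10.0, 20.0], [0.0, 40.0]])
pred = np.array([[11.0, 18.0], [5.0, 40.0]])
print(abs_rel(pred, gt))  # mean of 1/10, 2/20, 0/40 ≈ 0.0667
```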

VitorGuizilini-TRI (Collaborator) commented

Thank you for that, I will take a look to see if there is something wrong from our end. What is your hardware configuration for training?

Tord-Zhang (Author) commented May 10, 2022

> Thank you for that, I will take a look to see if there is something wrong from our end. What is your hardware configuration for training?

@VitorGuizilini I am training on 6 GPUs, all V100s. Below is the command:

#!/bin/bash
# Usage: bash <this_script> NGPUS LOG_FILE
NGPUS=$1
LOG_FILE=$2
echo "${NGPUS}"
MPI_CMD="mpirun -allow-run-as-root
-np ${NGPUS}
-H localhost:${NGPUS}
-x MASTER_ADDR=127.0.0.1
-x MASTER_PORT=23457
-x HOROVOD_TIMELINE
-x OMP_NUM_THREADS=1
-x KMP_AFFINITY='granularity=fine,compact,1,0'
-bind-to none -map-by slot -x NCCL_DEBUG=INFO -x NCCL_MIN_NRINGS=4
--report-bindings"
COMMAND="python3 scripts/train.py configs/train_kitti.yaml"
# Run under mpirun, stripping ANSI color codes before writing the log file.
bash -c "${MPI_CMD} ${COMMAND}" 2>&1 | tee >(sed -r 's/\x1b\[[0-9;]*m//g' > "${LOG_FILE}")
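One thing worth checking (my assumption, not something the thread confirms): under Horovod data parallelism the effective batch size scales with the number of workers, so a 6-GPU run may not match the paper's training setup if the paper used a different GPU count:

```python
# Effective batch size under data-parallel training: each of the NGPUS
# workers processes its own per-GPU batch every optimizer step.
per_gpu_batch = 4          # datasets.train.batch_size from the config above
ngpus = 6                  # the reporter's setup
effective_batch = per_gpu_batch * ngpus
print(effective_batch)     # 24

# A common (but repo-dependent) heuristic is to scale the learning rate
# linearly with the worker count; whether packnet-sfm does this internally
# is an assumption to verify, not a given.
base_lr = 0.0002
scaled_lr = base_lr * ngpus  # ≈ 0.0012
```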

liortalker commented

The problem is probably that you are training with a resized image (datasets: augmentation: image_shape: (192, 640)).
Try using the crop in the default YAML:
datasets:
    augmentation:
        crop_train_borders: (-352, 0, 0.5, 1216)
        crop_eval_borders: (-352, 0, 0.5, 1216)
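To illustrate why this matters: a border crop keeps the native pixel scale (and thus the apparent focal length the depth network sees), while resizing to 192x640 rescales every pixel. A minimal NumPy sketch, assuming a typical 375x1242 KITTI raw frame and my own reading of the border tuple as "bottom 352 rows, centered 1216 columns" (the repo's exact convention may differ):

```python
import numpy as np

img = np.zeros((375, 1242, 3), dtype=np.uint8)  # typical KITTI raw frame

# Border crop to 352x1216: bottom rows (road region) and centered columns,
# preserving the original pixel scale up to a principal-point shift.
h, w = 352, 1216
top = img.shape[0] - h
left = (img.shape[1] - w) // 2
cropped = img[top:top + h, left:left + w]
print(cropped.shape)  # (352, 1216, 3)

# Resizing to (192, 640) instead would rescale every pixel, changing the
# effective camera intrinsics relative to the paper's training setup.
```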

3 participants