On Improving Line-Level Defect Prediction: An Evaluation and Enhancement of the DeepLineDP Model Using the Defectors Dataset

Authors

Patrycja Kałużna (252864)

Jakub Walaszek (252897)

Reasons

The topic was chosen because the following are publicly available:

the source code of the models, which makes it possible to reproduce and extend the research,
the datasets on which the models were trained, which also makes reproduction possible,
the scripts, together with instructions for using them to reproduce the research.

Articles

DeepLineDP model: https://ieeexplore.ieee.org/document/9689967

Defectors dataset: https://arxiv.org/abs/2303.04738

DeepLineDP model's GitHub

https://github.com/awsm-research/DeepLineDP

Defectors dataset's source

https://zenodo.org/record/7708984

Trello

https://trello.com/b/tYlTMYCg/main

Overleaf

https://www.overleaf.com/project/6401cc2de33881644150cd5f

Google Colaboratory

For generating the research reproduction results:

https://colab.research.google.com/drive/139uWve5H07uM0SIKZSuevsi-dEjWeK9P?usp=sharing

For generating the figures that answer the research questions:

https://colab.research.google.com/drive/1rglM2qt-w5JA-PXk2WnGHK_eAml3Wqae?usp=sharing

For generating the results of the developed (extended) research:

https://colab.research.google.com/drive/1IPa3uUJq5pp6JZCgie_G34mhz2m8TbqK?usp=sharing

Leadership's schedule

https://politechnikawroclawska-my.sharepoint.com/:x:/g/personal/252864_student_pwr_edu_pl/EZkkBqJQHCROlh3e_fCDmbABqrCMpqgz_4aZvHR55gZ14A?e=RNPHLI

Research reproduction steps

Open the Google Colaboratory environment linked below. It contains a Jupyter Notebook that downloads the required files, configures the required environment, and reproduces the research for the DeepLineDP model and the file-level baselines.

https://colab.research.google.com/drive/139uWve5H07uM0SIKZSuevsi-dEjWeK9P?usp=sharing
Run the notebook cells in order by clicking the arrow icon of each cell. This downloads the required files, configures the environment, and reproduces the research.

Note: A Google Colaboratory session can last up to 12 hours, but it may end unexpectedly before that. Remember to save results outside the environment from time to time. The last 2 cells of the Jupyter Notebook export all data in the environment to Google Drive and import it back.
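As a complement to the export and import cells mentioned in the note, the periodic backup can be sketched in a few lines of Python. This is only a minimal sketch: the function name `backup_output` and the archive name are hypothetical, and it assumes Google Drive is already mounted in the session (the path `/content/drive/MyDrive` in the usage note is an assumption, not something the notebook is confirmed to use).

```python
import shutil
from pathlib import Path


def backup_output(output_dir, backup_dir, name="DeepLineDP_output"):
    """Zip the whole output directory into backup_dir and return the archive path.

    `name` is a hypothetical archive name chosen for this sketch.
    """
    output_dir = Path(output_dir)
    backup_dir = Path(backup_dir)
    # Create the backup folder if it does not exist yet.
    backup_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends the ".zip" extension and returns the full file name.
    archive = shutil.make_archive(
        str(backup_dir / name), "zip", root_dir=str(output_dir)
    )
    return Path(archive)
```

A call such as `backup_output("/content/M8/Reproduction/DeepLineDP/output", "/content/drive/MyDrive/M8_backup")` would then write a single zip file to the mounted Drive folder, assuming that mount point exists.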

To reproduce the research for the line-level baselines on a Windows machine:
1. Install Java and Python if they are not already installed.
2. Run the DeepLineDP_line_level_baselines_local_reproduction.bat script from the M8/Reproduction folder.

The research reproduction results can be found in the M8/Reproduction/DeepLineDP/output directory:

/content/M8/Reproduction/DeepLineDP/output/model/DeepLineDP/<PROJECT_NAME> - contains trained models for <PROJECT_NAME> project,
/content/M8/Reproduction/DeepLineDP/output/loss/DeepLineDP/<PROJECT_NAME>-loss_record.csv - contains training and validation loss for <PROJECT_NAME> project,
/content/M8/Reproduction/DeepLineDP/output/prediction/DeepLineDP/within-release - contains within-release predictions for the projects' releases,
/content/M8/Reproduction/DeepLineDP/output/prediction/DeepLineDP/cross-project/<PROJECT_NAME> - contains cross-project predictions for <PROJECT_NAME> project,
/content/M8/Reproduction/DeepLineDP/output/model/<FILE-LEVEL_BASELINE_NAME>/<PROJECT_NAME> - contains trained models for <FILE-LEVEL_BASELINE_NAME> file-level baseline and <PROJECT_NAME> project,
/content/M8/Reproduction/DeepLineDP/output/loss/<FILE-LEVEL_BASELINE_NAME>/<PROJECT_NAME>-<FILE-LEVEL_BASELINE_NAME>-loss_record.csv - contains training and validation loss for <FILE-LEVEL_BASELINE_NAME> file-level baseline and <PROJECT_NAME> project,
/content/M8/Reproduction/DeepLineDP/output/prediction/<FILE-LEVEL_BASELINE_NAME> - contains predictions for <FILE-LEVEL_BASELINE_NAME> file-level baseline,
M8/Reproduction/DeepLineDP/output/n_gram_result - contains results for N-gram line-level baseline,
M8/Reproduction/DeepLineDP/output/ErrorProne_result - contains results for ErrorProne line-level baseline.

Here <PROJECT_NAME> is one of: activemq, camel, derby, groovy, hbase, hive, jruby, lucene, wicket, and <FILE-LEVEL_BASELINE_NAME> is one of: Bi-LSTM, CNN, DBN, BoW.
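The output layout listed above can be checked programmatically after a run. The following is a minimal sketch, not part of the repository: the function names `expected_outputs` and `missing_outputs` and the dictionary labels are hypothetical, while the paths, project names, and baseline names are taken from the list above.

```python
from pathlib import Path

PROJECTS = ["activemq", "camel", "derby", "groovy", "hbase",
            "hive", "jruby", "lucene", "wicket"]
FILE_LEVEL_BASELINES = ["Bi-LSTM", "CNN", "DBN", "BoW"]


def expected_outputs(root):
    """Map a short label to every output path described in this README."""
    root = Path(root)
    paths = {}
    for project in PROJECTS:
        paths[f"DeepLineDP model ({project})"] = root / "model" / "DeepLineDP" / project
        paths[f"DeepLineDP loss ({project})"] = (
            root / "loss" / "DeepLineDP" / f"{project}-loss_record.csv"
        )
        paths[f"DeepLineDP cross-project ({project})"] = (
            root / "prediction" / "DeepLineDP" / "cross-project" / project
        )
        for baseline in FILE_LEVEL_BASELINES:
            paths[f"{baseline} model ({project})"] = root / "model" / baseline / project
            paths[f"{baseline} loss ({project})"] = (
                root / "loss" / baseline / f"{project}-{baseline}-loss_record.csv"
            )
    paths["DeepLineDP within-release"] = root / "prediction" / "DeepLineDP" / "within-release"
    for baseline in FILE_LEVEL_BASELINES:
        paths[f"{baseline} predictions"] = root / "prediction" / baseline
    # Line-level baselines are produced by the local Windows script.
    paths["N-gram results"] = root / "n_gram_result"
    paths["ErrorProne results"] = root / "ErrorProne_result"
    return paths


def missing_outputs(root):
    """Return the sorted labels of expected outputs that do not exist under root."""
    return sorted(
        label for label, path in expected_outputs(root).items() if not path.exists()
    )
```

Calling `missing_outputs("/content/M8/Reproduction/DeepLineDP/output")` at the end of a session gives a quick list of what still needs to be generated before the results are exported.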

Open the Google Colaboratory environment linked below, which uses the R runtime type. It contains a Jupyter Notebook that downloads the research reproduction results, installs the required R libraries, and generates the figures that answer the research questions.

https://colab.research.google.com/drive/1rglM2qt-w5JA-PXk2WnGHK_eAml3Wqae?usp=sharing
Run the notebook cells in order by clicking the arrow icon of each cell. This downloads the research reproduction results, installs the required R libraries, and generates the figures.

The figures that answer the research questions can be found in the M8/Reproduction/DeepLineDP/output/figure directory.

Note: This Google Colaboratory notebook currently generates figures for only 2 of the 4 research questions, because the get_evaluation_result.R script provided by the researchers contains bugs and we have not yet been able to fix all of them.

Research development reproduction steps

Open the Google Colaboratory environment linked below. It contains a Jupyter Notebook that downloads the required files, configures the required environment, and reproduces the developed research.

https://colab.research.google.com/drive/1IPa3uUJq5pp6JZCgie_G34mhz2m8TbqK?usp=sharing

Run the notebook cells in order by clicking the arrow icon of each cell. This downloads the required files, configures the environment, and reproduces the developed research.

The results of the developed research can be found in the same directory as the previous reproduction results:

/content/M8/Reproduction/DeepLineDP/output/model/DeepLineDP/<DATASET_NAME> - contains trained models for <DATASET_NAME> dataset,
/content/M8/Reproduction/DeepLineDP/output/loss/DeepLineDP/<DATASET_NAME>-loss_record.csv - contains training and validation loss for <DATASET_NAME> dataset,
/content/M8/Reproduction/DeepLineDP/output/prediction/DeepLineDP/within-release - contains predictions for each dataset.

Here <DATASET_NAME> is one of: dataset_RQ1, dataset_RQ2, dataset_RQ3, dataset_RQ4, dataset_RQ5. Each dataset is prepared for one research question.
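Once a `<DATASET_NAME>-loss_record.csv` file exists, it can be inspected with a few lines of Python, for example to find the epoch with the lowest validation loss. The sketch below is an illustration only: the function name `best_epoch` is hypothetical, and the column names `epoch` and `valid_loss` are assumptions about the CSV header, so they are exposed as parameters and should be adjusted to match the actual file.

```python
import csv


def best_epoch(csv_path, loss_column="valid_loss", epoch_column="epoch"):
    """Return (epoch, loss) for the row with the smallest value in loss_column.

    The default column names are assumptions; check the CSV header first.
    The epoch is returned as the string stored in the file.
    """
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError(f"{csv_path} contains no data rows")
    best = min(rows, key=lambda row: float(row[loss_column]))
    return best[epoch_column], float(best[loss_column])
```

For example, `best_epoch("/content/M8/Reproduction/DeepLineDP/output/loss/DeepLineDP/dataset_RQ1-loss_record.csv")` would report the best epoch for the first dataset, provided the header uses the assumed column names.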
