diff --git a/admin/docs/testing/create_module_test_cases.md b/admin/docs/testing/create_module_test_cases.md
new file mode 100644
index 00000000..43f70676
--- /dev/null
+++ b/admin/docs/testing/create_module_test_cases.md
@@ -0,0 +1,107 @@
+# Creating Module Test Cases
+
+This document describes the process of creating test cases for LEAPP modules using the `make_test_data.py` script.
+
+## Overview
+
+The `make_test_data.py` script generates test data for LEAPP modules. It processes an input file (zip, tar, or tar.gz), extracts the relevant files based on the module's artifact patterns, and creates structured test cases.
+
+## Usage
+
+To create test cases for a module, use the following command:
+
+```
+python make_test_data.py <module_name> <case_number> <input_file>
+```
+
+- `<module_name>`: Name of the module (e.g., keyboard or keyboard.py)
+- `<case_number>`: Case number for the test data
+- `<input_file>`: Path to the input file (zip, tar, or tar.gz)
+
+## Process
+
+1. The script imports the specified module and retrieves artifact information.
+2. It creates or updates a JSON file with test case metadata.
+3. The script processes the input archive file, searching for files matching the artifact patterns.
+4. For each artifact, it creates a zip file containing the matching files.
+5. The JSON file is updated with information about the created test data.
+
+## Output
+
+The script generates the following outputs:
+
+1. A JSON file containing test case metadata:
+   - `admin/test/cases/testdata.<module_name>.json`
+2. Zip files for each artifact, containing the relevant test data files:
+   - `admin/test/cases/data/testdata.<module_name>.<artifact_name>.<case_number>.zip`
+   - e.g., `testdata.keyboard.get_keyboard_lexicon.case001.zip`
+
+## Example
+
+```
+python make_test_data.py keyboard 001 /path/to/input/data.zip
+```
+
+This command will create test data for the keyboard module, using case number 001 and the specified input file.
+
+## Notes
+
+- The script supports zip, tar, and tar.gz input files.
+- Test data is stored in the `admin/test/cases/data` directory.
+- JSON metadata files are stored in the `admin/test/cases` directory.
+- Always review and update the generated JSON file with additional test case details as needed.
+
+## Test Case JSON File Structure
+
+The script generates a JSON file (e.g., `testdata.<module_name>.json`) that contains metadata and information about the test cases. This file is crucial for running tests and maintaining test data. Here's an explanation of its structure:
+
+```json
+{
+  "case001": {
+    "description": "",
+    "maker": "",
+    "make_data": {
+      "input_data_path": "/path/to/input/data.tar.gz",
+      "os": "macOS-15.0-x86_64-i386-64bit",
+      "timestamp": "2024-10-14T10:17:49.432528"
+    },
+    "artifacts": {
+      "artifact_name": {
+        "search_patterns": [
+          "*/path/to/artifact/files"
+        ],
+        "file_count": 1,
+        "expected_output": {
+          "headers": [],
+          "data": []
+        }
+      }
+    }
+  }
+}
+```
+
+- `case001`: A unique identifier for each test case.
+  - `description`: A brief description of the test case (to be filled in manually).
+  - `maker`: The person who created the test case (to be filled in manually).
+  - `make_data`: Information about the test case creation process.
+    - `input_data_path`: The path to the input file used to create the test case.
+    - `os`: The operating system on which the test case was created.
+    - `timestamp`: The date and time when the test case was created.
+  - `artifacts`: A dictionary of artifacts tested in this case.
+    - `artifact_name`: The name of the artifact (e.g., "get_keyboard_lexicon").
+      - `search_patterns`: The file patterns used to find relevant files.
+      - `file_count`: The number of files found matching the search patterns.
+      - `expected_output`: The expected results of the artifact extraction.
+        - `headers`: Column headers for the expected output (to be filled in manually).
+        - `data`: Expected data rows (to be filled in manually).
+
+Multiple test cases (e.g., "case001", "case002") can be included in a single JSON file.
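Since `headers` and `data` start out empty, it can be useful to scan a metadata file for artifacts whose expected output still needs to be filled in. A minimal sketch assuming the structure documented above (the `unfilled_artifacts` helper is illustrative, not part of the LEAPP scripts):

```python
# Sample dict mirroring the documented test-case JSON layout
sample = {
    "case001": {
        "description": "",
        "maker": "",
        "artifacts": {
            "get_keyboard_lexicon": {
                "search_patterns": ["*/path/to/artifact/files"],
                "file_count": 1,
                "expected_output": {"headers": [], "data": []},
            }
        },
    }
}

def unfilled_artifacts(test_cases):
    """Yield (case, artifact) pairs whose expected output is still empty."""
    for case_name, case in test_cases.items():
        for artifact_name, artifact in case.get("artifacts", {}).items():
            expected = artifact.get("expected_output", {})
            if not expected.get("headers") and not expected.get("data"):
                yield case_name, artifact_name

print(list(unfilled_artifacts(sample)))  # -> [('case001', 'get_keyboard_lexicon')]
```

In practice you would populate the dict with `json.load` from `admin/test/cases/testdata.<module_name>.json` before scanning it.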
+ +## Updating the JSON File + +After creating test cases, it's important to manually update the JSON file with the following information: + +1. Add a meaningful description for each test case. +2. Include the name of the person who created the test case. +3. Fill in the expected output headers and data for each artifact. + +This information is crucial for validating test results and ensuring the accuracy of the artifact extraction process. diff --git a/admin/docs/testing/readme.md b/admin/docs/testing/readme.md new file mode 100644 index 00000000..a7c49d32 --- /dev/null +++ b/admin/docs/testing/readme.md @@ -0,0 +1,42 @@ +# LEAPP Testing Documentation + +This documentation provides an overview of the testing processes used in the LEAPP project, with a current focus on module testing. + +## Module Testing + +Module testing is a crucial part of ensuring the reliability and accuracy of LEAPP's artifact extraction and analysis capabilities. Our module testing process involves two main steps: + +1. [Creating Module Test Cases](create_module_test_cases.md) +2. [Testing Modules](testing_modules.md) + +### Creating Module Test Cases + +We use a custom script to generate test cases for each module. This process involves: + +- Extracting relevant files from input data (zip, tar, or tar.gz files) +- Creating JSON files with test case metadata +- Generating zip files containing test data for each artifact + +For more details, see the [Creating Module Test Cases](create_module_test_cases.md) documentation. + +### Testing Modules + +Once test cases are created, we use another script to run tests on the modules. This process includes: + +- Selecting specific modules, artifacts, and test cases to run +- Executing the module's artifact extraction function +- Comparing the results with expected output +- Generating test result files + +For more information, refer to the [Testing Modules](testing_modules.md) documentation. 
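The two steps above read and write a small, fixed set of locations. The layout below is inferred from the scripts and file names in this change, shown here for orientation:

```
admin/
├── docs/testing/        # this documentation
└── test/
    ├── cases/           # testdata.<module_name>.json metadata files
    │   └── data/        # testdata.<module_name>.<artifact_name>.<case>.zip archives
    ├── results/         # JSON result files written by test runs
    └── scripts/         # make_test_data.py and test_module.py
```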
+
+## Future Expansions
+
+As the LEAPP project grows, we plan to expand our testing documentation to cover:
+
+- Unit testing
+- Integration testing
+- Performance testing
+- Continuous Integration (CI) processes
diff --git a/admin/docs/testing/testing_modules.md b/admin/docs/testing/testing_modules.md
new file mode 100644
index 00000000..5dc9fb63
--- /dev/null
+++ b/admin/docs/testing/testing_modules.md
@@ -0,0 +1,55 @@
+# Testing Modules
+
+This document describes the process of testing LEAPP modules using the `test_module.py` script.
+
+## Overview
+
+The `test_module.py` script runs tests on LEAPP modules using previously created test cases. It processes the test data, executes the module's artifact extraction function, and generates test results.
+
+## Usage
+
+To test a module, use the following command:
+
+```
+python test_module.py <module_name> [artifact_name] [case_number]
+```
+
+- `<module_name>`: Name of the module to test
+- `[artifact_name]`: (Optional) Name of the specific artifact to test (or 'all')
+- `[case_number]`: (Optional) Specific case number to test (or 'all')
+
+If `artifact_name` or `case_number` are not provided, the script will prompt you to select from the available options or run all. Provide `all` on the command line to run all test cases and artifacts without prompting.
+
+## Process
+
+1. The script loads test cases for the specified module.
+2. It allows selection of specific artifacts and test cases to run.
+3. For each selected test case and artifact:
+   a. The script extracts test data from the corresponding zip file.
+   b. It executes the module's artifact extraction function.
+   c. The results are collected and formatted.
+4. Test results are saved as JSON files.
+
+## Output
+
+The script generates JSON files containing test results, including:
+
+- Metadata about the test run (module name, artifact name, case number, etc.)
+- Execution time and performance metrics
+- The extracted data (headers and rows)
+
+Output files are stored in the `admin/test/results` directory.
+
+## Example
+
+```
+python test_module.py keyboard
+```
+
+This command will start the testing process for the keyboard module, prompting you to select specific artifacts and test cases.
+
+## Notes
+
+- The script uses test data created by the `make_test_data.py` script.
+- Test results include information about the last Git commit for the tested module.
+- You can run tests for all artifacts and all test cases by selecting 'all' when prompted.
\ No newline at end of file
diff --git a/admin/test/cases/data/testdata.keyboard.get_keyboard_app_usage.case001.zip b/admin/test/cases/data/testdata.keyboard.get_keyboard_app_usage.case001.zip
new file mode 100644
index 00000000..6d4904ad
Binary files /dev/null and b/admin/test/cases/data/testdata.keyboard.get_keyboard_app_usage.case001.zip differ
diff --git a/admin/test/cases/data/testdata.keyboard.get_keyboard_lexicon.case001.zip b/admin/test/cases/data/testdata.keyboard.get_keyboard_lexicon.case001.zip
new file mode 100644
index 00000000..b2e4844d
Binary files /dev/null and b/admin/test/cases/data/testdata.keyboard.get_keyboard_lexicon.case001.zip differ
diff --git a/admin/test/cases/data/testdata.keyboard.get_keyboard_usage_stats.case001.zip b/admin/test/cases/data/testdata.keyboard.get_keyboard_usage_stats.case001.zip
new file mode 100644
index 00000000..3ebe37d0
Binary files /dev/null and b/admin/test/cases/data/testdata.keyboard.get_keyboard_usage_stats.case001.zip differ
diff --git a/admin/test/cases/testdata.keyboard.json b/admin/test/cases/testdata.keyboard.json
new file mode 100644
index 00000000..adb5cbc9
--- /dev/null
+++ b/admin/test/cases/testdata.keyboard.json
@@ -0,0 +1,50 @@
+{
+  "case001": {
+    "description": "",
+    "maker": "",
+    "make_data": {
+      "input_data_path": "/Users/jameshabben/Documents/ios backup play/Josh/iOS_15_Public_Image.tar.gz",
+      "os":
"macOS-15.0-x86_64-i386-64bit", + "timestamp": "2024-10-14T11:54:02.029434", + "last_commit": { + "hash": "25ee1e18a1ff21d062595753ba9b33bfc73df248", + "author_name": "stark4n6", + "author_email": "48143894+stark4n6@users.noreply.github.com", + "date": "2024-03-08T10:29:04-05:00", + "message": "Update keyboard.py" + } + }, + "artifacts": { + "get_keyboard_lexicon": { + "search_patterns": [ + "*/mobile/Library/Keyboard/*-dynamic.lm/dynamic-lexicon.dat" + ], + "file_count": 1, + "expected_output": { + "headers": [], + "data": [] + } + }, + "get_keyboard_app_usage": { + "search_patterns": [ + "*/mobile/Library/Keyboard/app_usage_database.plist" + ], + "file_count": 1, + "expected_output": { + "headers": [], + "data": [] + } + }, + "get_keyboard_usage_stats": { + "search_patterns": [ + "*/mobile/Library/Keyboard/user_model_database.sqlite*" + ], + "file_count": 3, + "expected_output": { + "headers": [], + "data": [] + } + } + } + } +} \ No newline at end of file diff --git a/admin/test/results/keyboard_get_keyboard_lexicon_case001_20241014182516.json b/admin/test/results/keyboard_get_keyboard_lexicon_case001_20241014182516.json new file mode 100644 index 00000000..4a4a6327 --- /dev/null +++ b/admin/test/results/keyboard_get_keyboard_lexicon_case001_20241014182516.json @@ -0,0 +1,25 @@ +{ + "metadata": { + "module_name": "keyboard", + "artifact_name": "Keyboard Dynamic Lexicon", + "function_name": "get_keyboard_lexicon", + "case_number": "case001", + "number_of_columns": 2, + "number_of_rows": 1, + "total_data_size_bytes": 2961, + "input_zip_path": "admin/test/cases/data/testdata.keyboard.get_keyboard_lexicon.case001.zip", + "start_time": "2024-10-14T18:25:16.578197+00:00", + "end_time": "2024-10-14T18:25:16.583042+00:00", + "run_time_seconds": 0.0008459091186523438 + }, + "headers": [ + "Found Strings", + "File Location" + ], + "data": [ + [ + 
"3rd,N(%O4fa3,about,N(%Oacb4,accident,actually,ads,N(%Oaf1a,afternoon,after,against,again,agreed,alarm,N(%Alarm,all,also,amazed,Android,and,another,any,apparently,appear,apps,app,6ap,are,around,attached,N(d,Attachment,attachment,audio,Austin,avatar,awesome,backwards,back,bad,N'Bandit,basic,battery,Beach,N(%Obeee,been,before,believe,+Below,best,better,bigger,big,binge,bite,bit,both,breakfast,burned,burner,but,calling,calls,call,canceled,cannot,can't,can,catching,charged,chat,Chinese,chose,close,clue,comes,confusing,cooked,correct,could,covered,cracked,crazy,create,creating,crowded,daily,N(%Daily,data,day,deal,decided,+Deck,definitely,deleted,delete,devices,didn't,did,dinner,DM's,DMs,doesn't,does,doing,N(,gDominant,N(,iDominion,done,don't,download,down,N(,cDo,drained,dropped,N'DS9,dying,eat,edit,emojis,episode,N(%Event,everyone,everything,excuse,experienced,experiencing,extra,N(%Of5c4,FaceTime,fair,fam,favorite,features,few,figure,find,finish,first,forgot,for,from,general,generated,generate,generating,getting,get,give,glad,going,gone,good,Google,got,grabbed,great,group,_Group,guess,Hal,handled,hangs,hang,happened,happens,happen,hard,has,have,head,heard,hearing,heck,hello,here,hey,hide,Hiw,horrendous,horrible,hours,hour,how's,how,huddle,hustle,N'Ibaudio,I'll,ill,N'Images,image,N(%qImage,I'm,incoming,initially,_Ios15,iOS,iPhone,$Is,it's,I've,June,just,keep,keyboard,kids,N'Kik,Kim,knows,know,lame,last,later,leaning,learn,let's,let,like,lines,list,location,lol,Lol,long,looks,look,lot,lunch,machine,main,make,many,man,may,kma,meetings,meeting,N(d7Memes,messages,message,messaging,4miening,mine,minutes,minute,momentarily,moment,more,morning,mostly,move,much,N'Murica,myself,native,needs,need,never,new,next,nice,nope,note,N(,Note,nothing,noticed,notifications,not,now,number,odd,off,old,ones,one,only,honl,open,ouch,our,out,overdue,over,own,pain,party,part,pay,people,percent,phone,picture,N(%]Pic,pic,plans,please,plot,point,poor,portability,present,private,probably,quick,quite,ra
ining,recall,received,regular,relax,removed,remove,reply,retiring,room,Room,run,sailboat,same,saved,saving,says,season,secure,seems,seem,see,sending,send,sent,server,service,setting,set,shall,should,shows,show,sidetracked,since,sir,sitting,slightly,slowly,slow,something,some,sooo,sorry,sounds,N'1Space,speaking,standby,started,start,still,storage,stuck,stuff,sucks,sure,switch,takeout,N'takeou,takes,tell,testing,N'Test,test,texts,thank,that's,that,them,then,there's,there,these,they,the,N(,iThe,things,thing,think,this,thought,though,through,time,today,too,total,towards,trashed,ytrashws,trashy,truck,true,truth,try,turn,two,typo,underrated,unfortunately,unsend,use,video,Voice,waffles,waffle,want,wasn't,was,watched,watching,week,welcome,well,we'll,we've,what's,\nb\t,what,when,which,N'Whic,who,why,will,winding,window,with,won't,would,Wrightsville,N(%wss,N(%]w,yay,yeah,yea,yep,Yes,yet,you'd,N(%IYou'reI,1You're,you're,your,you", + "en-dynamic.lm/dynamic-lexicon.dat" + ] + ] +} \ No newline at end of file diff --git a/admin/test/results/keyboard_get_keyboard_lexicon_case001_20241014183005.json b/admin/test/results/keyboard_get_keyboard_lexicon_case001_20241014183005.json new file mode 100644 index 00000000..dd83b8d3 --- /dev/null +++ b/admin/test/results/keyboard_get_keyboard_lexicon_case001_20241014183005.json @@ -0,0 +1,32 @@ +{ + "metadata": { + "module_name": "keyboard", + "artifact_name": "Keyboard Dynamic Lexicon", + "function_name": "get_keyboard_lexicon", + "case_number": "case001", + "number_of_columns": 2, + "number_of_rows": 1, + "total_data_size_bytes": 2961, + "input_zip_path": "admin/test/cases/data/testdata.keyboard.get_keyboard_lexicon.case001.zip", + "start_time": "2024-10-14T18:30:05.265827+00:00", + "end_time": "2024-10-14T18:30:05.393523+00:00", + "run_time_seconds": 0.0005900859832763672, + "last_commit": { + "hash": "25ee1e18a1ff21d062595753ba9b33bfc73df248", + "author_name": "stark4n6", + "author_email": "48143894+stark4n6@users.noreply.github.com", + 
"date": "2024-03-08T10:29:04-05:00", + "message": "Update keyboard.py" + } + }, + "headers": [ + "Found Strings", + "File Location" + ], + "data": [ + [ + "3rd,N(%O4fa3,about,N(%Oacb4,accident,actually,ads,N(%Oaf1a,afternoon,after,against,again,agreed,alarm,N(%Alarm,all,also,amazed,Android,and,another,any,apparently,appear,apps,app,6ap,are,around,attached,N(d,Attachment,attachment,audio,Austin,avatar,awesome,backwards,back,bad,N'Bandit,basic,battery,Beach,N(%Obeee,been,before,believe,+Below,best,better,bigger,big,binge,bite,bit,both,breakfast,burned,burner,but,calling,calls,call,canceled,cannot,can't,can,catching,charged,chat,Chinese,chose,close,clue,comes,confusing,cooked,correct,could,covered,cracked,crazy,create,creating,crowded,daily,N(%Daily,data,day,deal,decided,+Deck,definitely,deleted,delete,devices,didn't,did,dinner,DM's,DMs,doesn't,does,doing,N(,gDominant,N(,iDominion,done,don't,download,down,N(,cDo,drained,dropped,N'DS9,dying,eat,edit,emojis,episode,N(%Event,everyone,everything,excuse,experienced,experiencing,extra,N(%Of5c4,FaceTime,fair,fam,favorite,features,few,figure,find,finish,first,forgot,for,from,general,generated,generate,generating,getting,get,give,glad,going,gone,good,Google,got,grabbed,great,group,_Group,guess,Hal,handled,hangs,hang,happened,happens,happen,hard,has,have,head,heard,hearing,heck,hello,here,hey,hide,Hiw,horrendous,horrible,hours,hour,how's,how,huddle,hustle,N'Ibaudio,I'll,ill,N'Images,image,N(%qImage,I'm,incoming,initially,_Ios15,iOS,iPhone,$Is,it's,I've,June,just,keep,keyboard,kids,N'Kik,Kim,knows,know,lame,last,later,leaning,learn,let's,let,like,lines,list,location,lol,Lol,long,looks,look,lot,lunch,machine,main,make,many,man,may,kma,meetings,meeting,N(d7Memes,messages,message,messaging,4miening,mine,minutes,minute,momentarily,moment,more,morning,mostly,move,much,N'Murica,myself,native,needs,need,never,new,next,nice,nope,note,N(,Note,nothing,noticed,notifications,not,now,number,odd,off,old,ones,one,only,honl,open,ouch,our,out,ove
rdue,over,own,pain,party,part,pay,people,percent,phone,picture,N(%]Pic,pic,plans,please,plot,point,poor,portability,present,private,probably,quick,quite,raining,recall,received,regular,relax,removed,remove,reply,retiring,room,Room,run,sailboat,same,saved,saving,says,season,secure,seems,seem,see,sending,send,sent,server,service,setting,set,shall,should,shows,show,sidetracked,since,sir,sitting,slightly,slowly,slow,something,some,sooo,sorry,sounds,N'1Space,speaking,standby,started,start,still,storage,stuck,stuff,sucks,sure,switch,takeout,N'takeou,takes,tell,testing,N'Test,test,texts,thank,that's,that,them,then,there's,there,these,they,the,N(,iThe,things,thing,think,this,thought,though,through,time,today,too,total,towards,trashed,ytrashws,trashy,truck,true,truth,try,turn,two,typo,underrated,unfortunately,unsend,use,video,Voice,waffles,waffle,want,wasn't,was,watched,watching,week,welcome,well,we'll,we've,what's,\nb\t,what,when,which,N'Whic,who,why,will,winding,window,with,won't,would,Wrightsville,N(%wss,N(%]w,yay,yeah,yea,yep,Yes,yet,you'd,N(%IYou'reI,1You're,you're,your,you", + "en-dynamic.lm/dynamic-lexicon.dat" + ] + ] +} \ No newline at end of file diff --git a/admin/test/scripts/make_test_data.py b/admin/test/scripts/make_test_data.py new file mode 100644 index 00000000..54fcea98 --- /dev/null +++ b/admin/test/scripts/make_test_data.py @@ -0,0 +1,236 @@ +import os +import sys +import json +import zipfile +import tarfile +import fnmatch +import argparse +import time +import platform +import subprocess +from datetime import datetime +from collections import defaultdict +from io import BytesIO + +# Add the correct path to the system path +repo_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..', '..')) +sys.path.append(repo_root) + +def get_artifact_info(module_name): + try: + module = __import__(f'scripts.artifacts.{module_name}', fromlist=['__artifacts_v2__']) + return module.__artifacts_v2__ + except ImportError: + print(f"Error: Could not 
import module 'scripts.artifacts.{module_name}'") + sys.exit(1) + +def get_last_commit_info(file_path): + try: + # Get the last commit hash + git_log = subprocess.check_output(['git', 'log', '-n', '1', '--pretty=format:%H|%an|%ae|%ad|%s', '--', file_path], universal_newlines=True) + commit_hash, author_name, author_email, commit_date, commit_message = git_log.strip().split('|') + + # Convert the commit date to ISO format + commit_date = datetime.strptime(commit_date, '%a %b %d %H:%M:%S %Y %z').isoformat() + + return { + 'hash': commit_hash, + 'author_name': author_name, + 'author_email': author_email, + 'date': commit_date, + 'message': commit_message + } + except subprocess.CalledProcessError: + return None + +def process_archive(input_file, all_patterns): + matching_files = defaultdict(dict) + print(f"Searching for files matching all patterns") + start_time = time.time() + local_start_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S") + print(f"Search started at: {local_start_time}") + #print("Progress: ", end="", flush=True) + + file_count = 0 + if input_file.endswith('.zip'): + with zipfile.ZipFile(input_file, 'r') as zip_ref: + for file in zip_ref.namelist(): + file_count += 1 + if file_count % 10000 == 0: + print(".", end="", flush=True) + for artifact, patterns in all_patterns.items(): + for pattern in patterns: + if fnmatch.fnmatch(file, pattern): + matching_files[artifact][file] = zip_ref.read(file) + break + elif input_file.endswith('.tar.gz') or input_file.endswith('.tgz'): + print("Processing tar.gz file") + with tarfile.open(input_file, 'r:gz') as tar_ref: + print("Searching files\n") + for member in tar_ref.getmembers(): + file_count += 1 + if file_count % 10000 == 0: + print(".", end="", flush=True) + for artifact, patterns in all_patterns.items(): + for pattern in patterns: + if fnmatch.fnmatch(member.name, pattern): + matching_files[artifact][member.name] = tar_ref.extractfile(member).read() + break + elif input_file.endswith('.tar'): + with 
tarfile.open(input_file, 'r') as tar_ref: + for member in tar_ref.getmembers(): + file_count += 1 + if file_count % 10000 == 0: + print(".", end="", flush=True) + for artifact, patterns in all_patterns.items(): + for pattern in patterns: + if fnmatch.fnmatch(member.name, pattern): + matching_files[artifact][member.name] = tar_ref.extractfile(member).read() + break + else: + raise ValueError("Unsupported file format. Please use .zip, tar, or .tar.gz") + + print() # New line after progress dots + end_time = time.time() + local_end_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S") + print(f"Search completed at: {local_end_time}") + total_matches = sum(len(files) for files in matching_files.values()) + print(f"Searched {file_count} files, found {total_matches} matching files in {end_time - start_time:.2f} seconds") + return matching_files + +def create_test_data(module_name, case_number, input_file): + overall_start_time = time.time() + print(f"Processing module: {module_name}") + artifacts = get_artifact_info(module_name) + + # Get git information for the module + module_path = os.path.join(repo_root, 'scripts', 'artifacts', f'{module_name}.py') + last_commit_info = get_last_commit_info(module_path) + + # Update paths for new folder structure + cases_dir = os.path.join(repo_root, 'admin', 'test', 'cases') + data_dir = os.path.join(cases_dir, 'data') + os.makedirs(data_dir, exist_ok=True) + + json_file = os.path.join(cases_dir, f"testdata.{module_name}.json") + + # Check if JSON file exists and is not empty + if os.path.exists(json_file) and os.path.getsize(json_file) > 0: + try: + with open(json_file, 'r') as f: + json_data = json.load(f) + print(f"Updating existing JSON file: {json_file}") + except json.JSONDecodeError: + print(f"Existing JSON file is invalid.") + create_new = input("Do you want to create new JSON data? This will overwrite the existing file. 
(y/n): ") + if create_new.lower() != 'y': + print("Aborting operation.") + sys.exit(1) + json_data = {} + else: + if os.path.exists(json_file): + create_new = input(f"JSON file {json_file} is empty. Create new JSON data? (y/n): ") + else: + create_new = input(f"JSON file {json_file} does not exist. Create new JSON file? (y/n): ") + + if create_new.lower() != 'y': + print("Aborting operation.") + sys.exit(1) + json_data = {} + print(f"Creating new JSON file: {json_file}") + + # Check if case number already exists + case_key = f"case{case_number:03d}" + if case_key in json_data: + overwrite = input(f"Case {case_number} already exists. Do you want to overwrite it? (y/n): ") + if overwrite.lower() != 'y': + print("Aborting operation.") + sys.exit(1) + + # Create or update case entry + json_data[case_key] = { + "description": "", + "maker": "", + "make_data": { + "input_data_path": os.path.abspath(input_file), + "os": platform.platform(), + "timestamp": datetime.now().isoformat(), + "last_commit": last_commit_info + }, + "artifacts": {} + } + + # Collect all patterns + all_patterns = {artifact_name: artifact_info['paths'] for artifact_name, artifact_info in artifacts.items()} + + # Process archive and get matching files for all artifacts at once + matching_files = process_archive(input_file, all_patterns) + + for artifact_name, artifact_info in artifacts.items(): + artifact_start_time = time.time() + local_artifact_start_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S") + print(f"\nProcessing artifact: {artifact_name}") + print(f"Artifact processing started at: {local_artifact_start_time}") + + # Create file name in the new data directory + file_name = os.path.join(data_dir, f"testdata.{module_name}.{artifact_name}.{case_key}.zip") + + # Create zip file with matching files + zip_start_time = time.time() + with zipfile.ZipFile(file_name, 'w') as zip_file: + for file_path, file_content in matching_files[artifact_name].items(): + zip_file.writestr(file_path, 
file_content) + zip_end_time = time.time() + + # Update JSON data for this artifact + json_data[case_key]["artifacts"][artifact_name] = { + "search_patterns": artifact_info['paths'], + "file_count": len(matching_files[artifact_name]), + "expected_output": { + "headers": [], + "data": [] + } + } + + print(f"Test data created: {file_name}") + print(f"Added {len(matching_files[artifact_name])} files to the test data") + #print(f"Zip file creation took {zip_end_time - zip_start_time:.2f} seconds") + artifact_end_time = time.time() + local_artifact_end_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S") + #print(f"Artifact processing completed at: {local_artifact_end_time}") + #print(f"Total processing time for {artifact_name}: {artifact_end_time - artifact_start_time:.2f} seconds") + + # Write updated JSON data + json_start_time = time.time() + with open(json_file, 'w') as f: + json.dump(json_data, f, indent=2) + json_end_time = time.time() + + print(f"\nJSON file updated: {json_file}") + print(f"JSON file update took {json_end_time - json_start_time:.2f} seconds") + print("Please update the JSON file with test case details.") + + overall_end_time = time.time() + local_overall_end_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S") + print(f"\nTotal processing time: {overall_end_time - overall_start_time:.2f} seconds") + print(f"Script completed at: {local_overall_end_time}") + +if __name__ == "__main__": + parser = argparse.ArgumentParser(description="Create test data for artifacts") + parser.add_argument("module_name", help="Name of the module (e.g., keyboard or keyboard.py)") + parser.add_argument("case_number", type=int, help="Case number for the test data") + parser.add_argument("input_file", help="Path to the input file (zip, tar, or tar.gz)") + + args = parser.parse_args() + + # Remove .py extension if present + module_name = args.module_name[:-3] if args.module_name.endswith('.py') else args.module_name + + script_start_time = datetime.now().strftime("%Y-%m-%d 
%H:%M:%S") + print(f"Starting test data creation for module: {module_name}, case number: {args.case_number}") + print(f"Input file: {args.input_file}") + print(f"Script started at: {script_start_time}") + + create_test_data(module_name, args.case_number, args.input_file) + + print("\nTest data creation completed.") diff --git a/admin/test/scripts/test_module.py b/admin/test/scripts/test_module.py index 83173c6b..76df6dce 100644 --- a/admin/test/scripts/test_module.py +++ b/admin/test/scripts/test_module.py @@ -7,6 +7,9 @@ from pathlib import Path from datetime import datetime, timezone import time +from functools import wraps +import shutil +import subprocess # Adjust import paths as necessary sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..', '..'))) @@ -21,75 +24,197 @@ def mock_environment(): return mock_seeker, mock_wrap_text -def process_zip(zip_path, module_name): - # Extract the module's processing function +def process_artifact(zip_path, module_name, artifact_name, artifact_data): + # Import the module module = importlib.import_module(f'scripts.artifacts.{module_name}') - process_func = getattr(module, f'get_{module_name}') - # Set up mocked environment - mock_seeker, mock_wrap_text = mock_environment() + # Get the function to test + func_to_test = getattr(module, artifact_name) + + # Extract the original function from the decorated one + original_func = func_to_test + while hasattr(original_func, '__wrapped__'): + original_func = original_func.__wrapped__ + + # Prepare mock objects + mock_report_folder = 'mock_report_folder' + mock_seeker = MagicMock() + mock_wrap_text = MagicMock() + timezone_offset = 'UTC' # Prepare a list to hold all files all_files = [] - # Extract all files from the zip - with zipfile.ZipFile(zip_path, 'r') as zip_ref: - temp_dir = Path('temp_extract') - zip_ref.extractall(temp_dir) + # Create the base temp directory if it doesn't exist + base_temp_dir = Path('admin/test/temp') + 
     base_temp_dir.mkdir(parents=True, exist_ok=True)
+
+    # Create a unique temporary directory within the base temp directory
+    temp_dir = base_temp_dir / f'extract_{module_name}_{artifact_name}_{int(time.time())}'
+
+    # Get the module file path
+    module_file_path = module.__file__
+
+    # Get the last commit information
+    last_commit_info = get_last_commit_info(module_file_path)
+
+    try:
+        # Extract all files from the zip
+        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
+            zip_ref.extractall(temp_dir)
+
+        # Recursively get all files
+        for root, _, files in os.walk(temp_dir):
+            for file in files:
+                all_files.append(os.path.join(root, file))
+
+        # Call the original function directly
+        start_time = time.time()
+        data_headers, data_list, _ = original_func(all_files, mock_report_folder, mock_seeker, mock_wrap_text, timezone_offset)
+        end_time = time.time()
 
-    # Recursively get all files
-    for root, _, files in os.walk(temp_dir):
-        for file in files:
-            all_files.append(os.path.join(root, file))
-
-    # Call the module's processing function
-    start_time = time.time()
-    data_headers, data_list, _ = process_func(all_files, 'test_report', mock_seeker, mock_wrap_text, 'UTC')
-    end_time = time.time()
-
-    # Clean up temp directory
-    for file in all_files:
-        os.remove(file)
-    os.rmdir(temp_dir)
-
-    return data_headers, data_list, end_time - start_time
+        return data_headers, data_list, end_time - start_time, last_commit_info
+
+    finally:
+        # Clean up temp directory
+        if temp_dir.exists():
+            shutil.rmtree(temp_dir, ignore_errors=True)
 
 def calculate_data_size(data_list):
     return sum(len(str(item).encode('utf-8')) for row in data_list for item in row)
 
+def load_test_cases(module_name):
+    cases_file = Path(f'admin/test/cases/testdata.{module_name}.json')
+    with open(cases_file, 'r') as f:
+        return json.load(f)
+
+def get_artifact_names(module_name, test_cases):
+    artifact_names = set()
+    for case in test_cases.values():
+        artifact_names.update(case['artifacts'].keys())
+    return list(artifact_names)
+
+def select_case(test_cases):
+    print("Available cases:")
+    sorted_cases = sorted(test_cases.keys())
+    for i, case_num in enumerate(sorted_cases, 1):
+        case_data = test_cases[case_num]
+        print(f"{i}. {case_num}: {case_data.get('description', 'No description')}")
+
+    while True:
+        case_choice = input("Enter case number, name, or 'all' for all cases: ").strip().lower()
+        if case_choice == 'all':
+            return 'all'
+        try:
+            index = int(case_choice) - 1
+            if 0 <= index < len(sorted_cases):
+                return sorted_cases[index]
+        except ValueError:
+            if case_choice in test_cases:
+                return case_choice
+        print("Invalid choice. Please try again.")
+
+def select_artifact(artifact_names):
+    print("Available artifacts:")
+    sorted_artifacts = sorted(artifact_names)
+    for i, name in enumerate(sorted_artifacts, 1):
+        print(f"{i}. {name}")
+
+    while True:
+        artifact_choice = input("Enter artifact number, name, or 'all' for all artifacts: ").strip().lower()
+        if artifact_choice == 'all':
+            return 'all'
+        try:
+            index = int(artifact_choice) - 1
+            if 0 <= index < len(sorted_artifacts):
+                return sorted_artifacts[index]
+        except ValueError:
+            if artifact_choice in sorted_artifacts:
+                return artifact_choice
+        print("Invalid choice. Please try again.")
+
+def main(module_name, artifact_name=None, case_number=None):
+    test_cases = load_test_cases(module_name)
+    artifact_names = get_artifact_names(module_name, test_cases)
+
+    if not artifact_name:
+        artifact_name = select_artifact(artifact_names)
+
+    if not case_number:
+        case_number = select_case(test_cases)
+
+    cases_to_process = [case_number] if case_number != 'all' else test_cases.keys()
+    artifacts_to_process = [artifact_name] if artifact_name != 'all' else artifact_names
+
+    module = importlib.import_module(f'scripts.artifacts.{module_name}')
+    artifacts_info = getattr(module, '__artifacts_v2__', {})
+
+    for case in cases_to_process:
+        case_data = test_cases[case]
+        zip_path = Path('admin/test/cases/data') / f"testdata.{module_name}.{artifact_name}.{case}.zip"
+
+        for artifact in artifacts_to_process:
+            if artifact in case_data['artifacts']:
+                artifact_data = case_data['artifacts'][artifact]
+                artifact_info = artifacts_info.get(artifact, {})
+                start_datetime = datetime.now(timezone.utc)
+                headers, data, run_time, last_commit_info = process_artifact(zip_path, module_name, artifact, artifact_data)
+                end_datetime = datetime.now(timezone.utc)
+
+                result = {
+                    "metadata": {
+                        "module_name": module_name,
+                        "artifact_name": artifact_info.get('name', artifact),
+                        "function_name": artifact,
+                        "case_number": case,
+                        "number_of_columns": len(headers),
+                        "number_of_rows": len(data),
+                        "total_data_size_bytes": calculate_data_size(data),
+                        "input_zip_path": str(zip_path),
+                        "start_time": start_datetime.isoformat(),
+                        "end_time": end_datetime.isoformat(),
+                        "run_time_seconds": run_time,
+                        "last_commit": last_commit_info
+                    },
+                    "headers": headers,
+                    "data": data
+                }
+
+                output_dir = Path('admin/test/results')
+                output_dir.mkdir(parents=True, exist_ok=True)
+                output_file = output_dir / f"{module_name}_{artifact}_{case}_{start_datetime.strftime('%Y%m%d%H%M%S')}.json"
+
+                with open(output_file, 'w') as f:
+                    json.dump(result, f, indent=2)
+
+                print(f"Test results for {module_name} - {artifact} - Case {case} saved to {output_file}")
+
+def get_last_commit_info(file_path):
+    try:
+        # Get the last commit hash
+        git_log = subprocess.check_output(['git', 'log', '-n', '1', '--pretty=format:%H|%an|%ae|%ad|%s', '--', file_path], universal_newlines=True)
+        commit_hash, author_name, author_email, commit_date, commit_message = git_log.strip().split('|')
+
+        # Convert the commit date to ISO format
+        commit_date = datetime.strptime(commit_date, '%a %b %d %H:%M:%S %Y %z').isoformat()
+
+        return {
+            'hash': commit_hash,
+            'author_name': author_name,
+            'author_email': author_email,
+            'date': commit_date,
+            'message': commit_message
+        }
+    except subprocess.CalledProcessError:
+        return None
+
 if __name__ == '__main__':
-    if len(sys.argv) != 3:
-        print("Usage: python test_module.py <module_name> <zip_path>")
+    if len(sys.argv) < 2:
+        print("Usage: python test_module.py <module_name> [artifact_name] [case_number]")
         sys.exit(1)
 
     module_name = sys.argv[1]
-    zip_path = sys.argv[2]
-
-    start_datetime = datetime.now(timezone.utc)
-    headers, data, run_time = process_zip(zip_path, module_name)
-    end_datetime = datetime.now(timezone.utc)
-
-    result = {
-        "metadata": {
-            "module_name": module_name,
-            "function_name": f"get_{module_name}",
-            "number_of_columns": len(headers),
-            "number_of_rows": len(data),
-            "total_data_size_bytes": calculate_data_size(data),
-            "input_zip_path": os.path.abspath(zip_path),
-            "start_time": start_datetime.isoformat(),
-            "end_time": end_datetime.isoformat(),
-            "run_time_seconds": run_time
-        },
-        "headers": headers,
-        "data": data
-    }
-
-    output_dir = Path('/admin/test/results')
-    output_dir.mkdir(parents=True, exist_ok=True)
-    output_file = output_dir / f"{module_name}_get_{module_name}_{start_datetime.strftime('%Y%m%d%H%M%S')}.json"
-
-    with open(output_file, 'w') as f:
-        json.dump(result, f, indent=2)
-
-    print(f"Test results saved to {output_file}")
\ No newline at end of file
+    artifact_name = sys.argv[2] if len(sys.argv) > 2 else None
+    case_number = sys.argv[3] if len(sys.argv) > 3 else None
+
+    main(module_name, artifact_name, case_number)
diff --git a/scripts/artifacts/keyboard.py b/scripts/artifacts/keyboard.py
index d2f7c381..5533e75b 100644
--- a/scripts/artifacts/keyboard.py
+++ b/scripts/artifacts/keyboard.py
@@ -1,138 +1,111 @@
+__artifacts_v2__ = {
+    "get_keyboard_lexicon": {
+        "name": "Keyboard Dynamic Lexicon",
+        "description": "Extracts dynamic lexicon data from the keyboard",
+        "author": "@your_username",
+        "version": "1.0",
+        "date": "2023-05-24",
+        "requirements": "none",
+        "category": "User Activity",
+        "notes": "",
+        "paths": ('*/mobile/Library/Keyboard/*-dynamic.lm/dynamic-lexicon.dat',),
+        "output_types": ["all"]
+    },
+    "get_keyboard_app_usage": {
+        "name": "Keyboard Application Usage",
+        "description": "Extracts keyboard application usage data",
+        "author": "@your_username",
+        "version": "1.0",
+        "date": "2023-05-24",
+        "requirements": "none",
+        "category": "User Activity",
+        "notes": "",
+        "paths": ('*/mobile/Library/Keyboard/app_usage_database.plist',),
+        "output_types": ["all"]
+    },
+    "get_keyboard_usage_stats": {
+        "name": "Keyboard Usage Stats",
+        "description": "Extracts keyboard usage statistics",
+        "author": "@your_username",
+        "version": "1.0",
+        "date": "2023-05-24",
+        "requirements": "none",
+        "category": "User Activity",
+        "notes": "",
+        "paths": ('*/mobile/Library/Keyboard/user_model_database.sqlite*',),
+        "output_types": ["all"]
+    }
+}
+
 import plistlib
 import sqlite3
 import string
 from os.path import dirname
 
-from scripts.artifact_report import ArtifactHtmlReport
-from scripts.ilapfuncs import logfunc, tsv, timeline, open_sqlite_db_readonly, convert_ts_human_to_utc, convert_utc_human_to_timezone
+from scripts.ilapfuncs import logfunc, open_sqlite_db_readonly, convert_ts_human_to_utc, convert_utc_human_to_timezone, artifact_processor, convert_plist_date_to_timezone_offset
 
-def get_keyboard(files_found, report_folder, seeker, wrap_text, timezone_offset):
-    data_list_usage = []
-    data_list_lex = []
-    data_list_stats = []
-    tsv_data_list = []
+@artifact_processor
+def get_keyboard_lexicon(files_found, report_folder, seeker, wrap_text, timezone_offset):
+    data_list = []
 
     for file_found in files_found:
-        file_found = str(file_found)
+        print(file_found)
         strings_list = []
+        with open(file_found, 'rb') as dat_file:
+            print('file opened')
+            dat_content = dat_file.read()
+            dat_content_decoded = str(dat_content, 'utf-8', 'ignore')
+            found_str = ''
+            for char in dat_content_decoded:
+                if char in string.printable:
+                    found_str += char
+                else:
+                    if found_str and len(found_str) > 2 and found_str != 'DynamicDictionary-9':
+                        strings_list.append(found_str)
+                    found_str = ''
 
-        # Keyboard Lexicon
-        if file_found.endswith('dynamic-lexicon.dat'):
-            with open(file_found, 'rb') as dat_file:
-                dat_content = dat_file.read()
-                dat_content_decoded = str(dat_content, 'utf-8', 'ignore')
-                found_str = ''
-                for char in dat_content_decoded:
-                    if char in string.printable:
-                        found_str += char
-                    else:
-                        if found_str:
-                            if len(found_str) > 2:  # reduce noise
-                                if found_str != 'DynamicDictionary-9':
-                                    strings_list.append(found_str)
-                            found_str = ''
-
-            if file_found.find("Keyboard/") >= 0:
-                slash = '/'
-            else:
-                slash = '\\'
-            location_file_found = file_found.split(f"Keyboard{slash}", 1)[1]
-            data_list_lex.append(('<br>'.join(strings_list), location_file_found))
-            tsv_data_list.append((','.join(strings_list), location_file_found))
-
-            dir_file_found = dirname(file_found).split('Keyboard', 1)[0] + 'Keyboard'
-
-        # Keyboard App Usage
-        if file_found.endswith('app_usage_database.plist'):
-            with open(file_found, "rb") as plist_file:
-                plist_content = plistlib.load(plist_file)
-                for app in plist_content:
-                    for entry in plist_content[app]:
-                        data_list_usage.append((entry['startDate'], app, entry['appTime'], ', '.join(map(str, entry['keyboardTimes']))))
-
-        # Keyboard Usage Stats
-        if file_found.endswith('user_model_database.sqlite'):
-            db = open_sqlite_db_readonly(file_found)
-            cursor = db.cursor()
-            cursor.execute('''
-            select
-            datetime(creation_date,'unixepoch'),
-            datetime(last_update_date,'unixepoch'),
-            key,
-            value
-            from usermodeldurablerecords
-            ''')
-
-            all_rows = cursor.fetchall()
-            usageentries = len(all_rows)
-
-            if usageentries > 0:
-                for row in all_rows:
-                    create_ts = convert_utc_human_to_timezone(convert_ts_human_to_utc(row[0]),timezone_offset)
-                    update_ts = convert_utc_human_to_timezone(convert_ts_human_to_utc(row[1]),timezone_offset)
-
-                    data_list_stats.append((create_ts,update_ts,row[2],row[3],file_found))
-
-            else:
-                continue
+        location_file_found = file_found.split("Keyboard/", 1)[1] if "Keyboard/" in file_found else file_found.split("Keyboard\\", 1)[1]
+        data_list.append((','.join(strings_list), location_file_found))
 
-    # Keyboard Lexicon Report
-    if data_list_lex:
-        report = ArtifactHtmlReport('Keyboard Dynamic Lexicon')
-        report.start_artifact_report(report_folder, 'Keyboard Dynamic Lexicon')
-        report.add_script()
-        data_headers = ('Found Strings', 'File Location')
-        report.write_artifact_data_table(data_headers, data_list_lex, dir_file_found, html_no_escape=['Found Strings'])
-        report.end_artifact_report()
-
-        tsvname = 'Keyboard Dynamic Lexicon'
-        tsv(report_folder, data_headers, tsv_data_list, tsvname)
-
-        tlactivity = 'Keyboard Dynamic Lexicon'
-        timeline(report_folder, tlactivity, tsv_data_list, data_headers)
-
-    else:
-        logfunc('No Keyboard Dynamic Lexicon data found')
-
-    # Keyboard App Usage Report
-    if data_list_usage:
-        report = ArtifactHtmlReport('Keyboard Application Usage')
-        report.start_artifact_report(report_folder, 'Keyboard Application Usage')
-        report.add_script()
-        data_headers = ('Date', 'Application Name', 'Application Time Used in Seconds', 'Keyboard Times Used in Seconds')
-        report.write_artifact_data_table(data_headers, data_list_usage, file_found)
-        report.end_artifact_report()
-
-        tsvname = 'Keyboard Application Usage'
-        tsv(report_folder, data_headers, data_list_usage, tsvname)
+    data_headers = ('Found Strings', 'File Location')
+    print(len(data_list))
+    return data_headers, data_list, dirname(files_found[0]).split('Keyboard', 1)[0] + 'Keyboard'
 
-        tlactivity = 'Keyboard Application Usage'
-        timeline(report_folder, tlactivity, data_list_usage, data_headers)
+@artifact_processor
+def get_keyboard_app_usage(files_found, report_folder, seeker, wrap_text, timezone_offset):
+    data_list = []
+
+    for file_found in files_found:
+        with open(file_found, "rb") as plist_file:
+            plist_content = plistlib.load(plist_file)
+            for app in plist_content:
+                for entry in plist_content[app]:
+                    start_date = convert_plist_date_to_timezone_offset(entry['startDate'], timezone_offset)
+                    data_list.append((start_date, app, entry['appTime'], ', '.join(map(str, entry['keyboardTimes']))))
+
+    data_headers = (('Date', 'datetime'), 'Application Name', 'Application Time Used in Seconds', 'Keyboard Times Used in Seconds')
+    return data_headers, data_list, files_found[0]
 
-    else:
-        logfunc('No Keyboard Application Usage found')
+@artifact_processor
+def get_keyboard_usage_stats(files_found, report_folder, seeker, wrap_text, timezone_offset):
+    data_list = []
+
+    for file_found in files_found:
+        db = open_sqlite_db_readonly(file_found)
+        cursor = db.cursor()
+        cursor.execute('''
+        SELECT
+            datetime(creation_date,'unixepoch'),
+            datetime(last_update_date,'unixepoch'),
+            key,
+            value
+        FROM usermodeldurablerecords
+        ''')
 
-    # Keyboard Usage Stats Report
-    if data_list_stats:
-        report = ArtifactHtmlReport('Keyboard Usage Stats')
-        report.start_artifact_report(report_folder, 'Keyboard Usage Stats')
-        report.add_script()
-        data_headers = ('Creation Date', 'Last Update Date', 'Key', 'Value', 'Source File')
-        report.write_artifact_data_table(data_headers, data_list_stats, 'See source paths below')
-        report.end_artifact_report()
-
-        tsvname = 'Keyboard Usage Stats'
-        tsv(report_folder, data_headers, data_list_stats, tsvname)
-
-        tlactivity = 'Keyboard Usage Stats'
-        timeline(report_folder, tlactivity, data_list_stats, data_headers)
-
-    else:
-        logfunc('No Keyboard Application Usage found')
-
-__artifacts__ = {
-    "keyboard": (
-        "Keyboard",
-        ('*/mobile/Library/Keyboard/*-dynamic.lm/dynamic-lexicon.dat','*/mobile/Library/Keyboard/app_usage_database.plist','*/mobile/Library/Keyboard/user_model_database.sqlite*'),
-        get_keyboard)
-}
\ No newline at end of file
+        for row in cursor.fetchall():
+            create_ts = convert_utc_human_to_timezone(convert_ts_human_to_utc(row[0]), timezone_offset)
+            update_ts = convert_utc_human_to_timezone(convert_ts_human_to_utc(row[1]), timezone_offset)
+            data_list.append((create_ts, update_ts, row[2], row[3], file_found))
+
+    data_headers = (('Creation Date', 'datetime'), ('Last Update Date', 'datetime'), 'Key', 'Data Value', 'Source File')
+    return data_headers, data_list, 'See source paths in data'
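The refactor above hinges on the `artifact_processor` decorator from `scripts.ilapfuncs`: each extractor now just returns `(data_headers, data_list, source_path)` and the decorator drives report, TSV, and timeline output from the metadata in `__artifacts_v2__`. The real decorator's internals are not shown in this diff, so the wrapper body below is only a minimal sketch of the wrapping pattern, not LEAPP's actual implementation:

```python
from functools import wraps

def artifact_processor(func):
    """Sketch of an artifact_processor-style decorator (assumed behavior).
    The wrapped function keeps the standard LEAPP signature and returns
    (data_headers, data_list, source_path); a real implementation would
    emit the HTML/TSV/timeline outputs declared in __artifacts_v2__ here."""
    @wraps(func)
    def wrapper(files_found, report_folder, seeker, wrap_text, timezone_offset):
        data_headers, data_list, source_path = func(
            files_found, report_folder, seeker, wrap_text, timezone_offset)
        if not data_list:
            # Real code would log via logfunc; print keeps the sketch standalone.
            print(f'No data found for {func.__name__}')
        return data_headers, data_list, source_path
    return wrapper

@artifact_processor
def get_example(files_found, report_folder, seeker, wrap_text, timezone_offset):
    # Hypothetical extractor used only to exercise the decorator.
    return ('Key', 'Value'), [('a', 1)], files_found[0] if files_found else ''
```

Because the wrapped functions keep the standard five-argument signature, a test harness can import a module and call an artifact function directly, which appears to be what `process_artifact` does via `original_func` above.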
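The byte-scanning loop in `get_keyboard_lexicon` can be exercised in isolation, which is useful when filling in the `expected_output` sections of a test-case JSON file. A standalone sketch of the same technique follows; the helper name `extract_printable_strings`, its parameters, and its flushing of the final run are ours for illustration and not part of the module:

```python
import string

def extract_printable_strings(data: bytes, min_len: int = 3,
                              skip=('DynamicDictionary-9',)):
    """Collect runs of printable characters from raw bytes, mirroring the
    scan in get_keyboard_lexicon: decode with errors ignored, accumulate
    printable characters, and keep runs of at least min_len characters
    that are not in the skip list. Unlike the module, this sketch also
    flushes a printable run that reaches the end of the buffer."""
    decoded = data.decode('utf-8', 'ignore')
    strings_list = []
    found_str = ''
    for char in decoded:
        if char in string.printable:
            found_str += char
        else:
            if len(found_str) >= min_len and found_str not in skip:
                strings_list.append(found_str)
            found_str = ''
    # Flush a trailing run not terminated by a non-printable byte
    if len(found_str) >= min_len and found_str not in skip:
        strings_list.append(found_str)
    return strings_list
```

For example, `extract_printable_strings(b'\x00hello\x00hi\x00world')` keeps `hello` and `world` but drops the two-character run `hi`, matching the module's `len(found_str) > 2` noise filter.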