Logfile Reconciliation - Cognitive Bias (CB)

Experimental data analysis (Oct / Nov 2023)

Authors
Affiliations

Cathrynne Henshall

Michael J. Booth

Published

Tue Nov 12, 2024 11:38 AM

Abstract

Reconciliation of the CB logfiles generated during the experiments.

Purpose of This Notebook

This notebook is an exploratory tool for examining the log files produced during the Cognitive Bias (CB) horse behavioural experiments conducted in October and November 2023. It is used to experiment with text-parsing techniques on the files before they are imported into a database. The primary objective is to reconcile which log files should be included in, or excluded from, the analysis.

Experiment details and naming conventions

Logfile Exclusion rules

Log files from test runs, and log files containing bad data, need to be excluded from the analysis.

Rules are case-insensitive. Files which satisfy the following conditions are excluded:

  • TODO

Problems with log file names during experiments

  • TODO

Time differences

For each trial we calculate the following time differences:

Cognitive Bias Experiments

  • Extract “RIGHT” or “LEFT” from the Comment field.
  • Also extract details from the log file name:
    1. Training experiments:
       • Type 1
       • Type 2
    2. Testing experiments:
       • Type 1
       • Type 2
       • Type 3
       • Type 4 (re-uses Type 1 with indicator to distinguish in Comment field)
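
The Comment-field extraction described above could be sketched as follows. This is a minimal illustration only: the actual format of the Comment field is not shown in this notebook, so the helper name `extract_side` and the assumption that the side appears as a whole word are hypothetical.

```python
import re

# Hypothetical helper: assumes the side appears as a whole word in the comment.
_SIDE_PATTERN = re.compile(r"\b(RIGHT|LEFT)\b", re.IGNORECASE)


def extract_side(comment):
    """Return "RIGHT" or "LEFT" if the comment names exactly one side, else None."""
    if not comment:
        return None
    # Collect the distinct sides mentioned, normalised to upper case.
    sides = {match.upper() for match in _SIDE_PATTERN.findall(comment)}
    # An ambiguous comment (both sides, or neither) is left for manual review.
    return sides.pop() if len(sides) == 1 else None
```

Returning `None` for ambiguous comments, rather than guessing, keeps such trials visible for a manual check during reconciliation.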

Time differences

For each trial:

Training Type 1 (randomised versus fixed):

  • Start datetime = green button pressed and horse is released.
  • Capture positive (GO) / negative (NOGO) response time, subject to a maximum cutoff time (e.g. 30 seconds).
  • In addition to left/right positioning of feed, there are also median, near positive and near negative positions.
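
A response time subject to a maximum cutoff might be computed along these lines. This is a sketch under stated assumptions: the function name `response_time`, the 30-second default (taken from the example above) and the choice to cap late or missing responses at the cutoff are illustrative, not the project's confirmed method.

```python
from datetime import datetime, timedelta

MAX_RESPONSE_SECONDS = 30.0  # example cutoff from the text; the real value may differ


def response_time(start, response, cutoff=MAX_RESPONSE_SECONDS):
    """Return (latency_seconds, timed_out) for a single trial.

    start:    datetime when the green button was pressed and the horse released.
    response: datetime of the logged response, or None if no response was logged.
    """
    if response is None:
        # Assumption: a missing response is recorded at the cutoff.
        return cutoff, True
    elapsed = (response - start).total_seconds()
    if elapsed < 0:
        raise ValueError("response precedes start")
    if elapsed > cutoff:
        # Assumption: responses later than the cutoff are capped at the cutoff.
        return cutoff, True
    return elapsed, False
```

Keeping the `timed_out` flag separate from the latency means capped trials can still be counted or filtered later.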

TODO check with CH: Test only - be in all?

Log file reconciliation

Setup project & directories

INFO     | Data directory: /Users/mjboothaus/code/github/databooth/horse-logic/data
INFO     | Data directory purpose: Parent directory for raw and processed data
INFO     | Sql directory: /Users/mjboothaus/code/github/databooth/horse-logic/sql
INFO     | Sql directory purpose: Store SQL scripts
INFO     | Output directory: /Users/mjboothaus/code/github/databooth/horse-logic/notebooks/results/CB
INFO     | Output directory purpose: Store output files and results by experiment type
INFO     | Logfiles directory: /Users/mjboothaus/code/github/databooth/horse-logic/data/results/zips/cb_data
INFO     | Logfiles directory purpose: Store for the raw log files
INFO     | Notebooks directory: /Users/mjboothaus/code/github/databooth/horse-logic/notebooks
INFO     | Notebooks directory purpose: Jupyter notebooks for performing analysis
INFO     | Existing database file deleted: /Users/mjboothaus/code/github/databooth/horse-logic/data/Experiments_CB_2023_Q4.ddb
INFO     | Database file path: /Users/mjboothaus/code/github/databooth/horse-logic/data/Experiments_CB_2023_Q4.ddb
INFO     | Database purpose: Main project databases (outputs) by experiment type
INFO     | Project initialised (CB): config defined in project_config.yaml

Get Subject info

INFO     | Loaded subject info from: /Users/mjboothaus/code/github/databooth/horse-logic/docs/from_CH/Cohort data for MB.xlsx
INFO     | Subject count: 22
INFO     | Sorted subject names:
    apollo, ash, atom, bonnie, clover, dodge, dougie, dusty, filly, freya, george, gio, jelly, molly, mowgli, myrtle, 
    nix, olive, pumba, smudge, teddy, yoshi

Initial exclusions (rule-based)

Rules:

  • Ignore all CBF1 files - data will not be analysed (CH: What does CBF1 mean? and similar)

  • Ignore all Olive files (6 log files)

  • Ignore Maple CBT1 on 9 Oct (CH: Did Maple have a different name?)

  • Run check to see how many N bucket GO responses exceed 30s in first 3 days (CH: Please explain)

  • 29 *test*.log files (exclude)
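
Three of the rules above (test files, Olive files, CBF1 files) lend themselves to case-insensitive filename patterns. The sketch below shows one way to apply them; the pattern list, the function names and the example file names are hypothetical, since the project's actual patterns and naming conventions are not shown here.

```python
import fnmatch

# Illustrative patterns only; the patterns actually used in the project may differ.
EXCLUDE_PATTERNS = ["*test*", "*olive*", "*cbf1*"]


def is_excluded(filename, patterns=EXCLUDE_PATTERNS):
    """True if the file name matches any exclusion pattern, ignoring case."""
    name = filename.lower()
    return any(fnmatch.fnmatchcase(name, pattern.lower()) for pattern in patterns)


def partition(filenames, patterns=EXCLUDE_PATTERNS):
    """Split file names into (included, excluded) lists, preserving order."""
    included = [f for f in filenames if not is_excluded(f, patterns)]
    excluded = [f for f in filenames if is_excluded(f, patterns)]
    return included, excluded
```

Lower-casing both the name and the pattern before calling `fnmatch.fnmatchcase` makes the match case-insensitive on every platform. A substring pattern such as `*test*` would also catch any legitimate file name containing "test", so the excluded list is worth reviewing by eye.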

### Override - the files to be excluded are specified in the following CSV file (emailed by CH, 16 October 2024)

import pandas as pd

# `project` is the project object created in the setup cell above.
exclude_16Oct2024_csv = (
    project.project_dir / "docs" / "from_CH" / "CB_all_included_files_CH_16Oct2024.csv"
)
# The CSV has no header row; a non-empty "Exclude" value marks a file to drop.
exclude_16Oct2024_df = pd.read_csv(
    exclude_16Oct2024_csv, header=None, names=["Exclude", "csv_index", "logfilename"]
)
exclude_16Oct2024_df.head()
Exclude csv_index logfilename
# Create LOGFILES_TO_INCLUDE_16Oct2024
LOGFILES_TO_INCLUDE_16Oct2024 = exclude_16Oct2024_df[exclude_16Oct2024_df["Exclude"].isna()][
    "logfilename"
].tolist()

# Create LOGFILES_TO_EXCLUDE_16Oct2024
LOGFILES_TO_EXCLUDE_16Oct2024 = exclude_16Oct2024_df[~exclude_16Oct2024_df["Exclude"].isna()][
    "logfilename"
].tolist()
len(LOGFILES_TO_INCLUDE_16Oct2024) + len(LOGFILES_TO_EXCLUDE_16Oct2024)
244
len(LOGFILES_TO_INCLUDE_16Oct2024)
183
len(list(set(LOGFILES_TO_INCLUDE_16Oct2024)))
183
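
The matching counts above indicate that the include list has no duplicate names. If the two counts ever differed, a small helper like the one below could show which names are repeated. The name `find_duplicates` is hypothetical and is not part of the project code.

```python
from collections import Counter


def find_duplicates(names):
    """Return a dict of names that occur more than once, mapped to their counts."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}
```

For example, `find_duplicates(LOGFILES_TO_INCLUDE_16Oct2024)` would return an empty dict when every file name is unique.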

Identify test log files (to exclude)

INFO     | Testing log files list. 29 rows exported to CSV.

Identify olive log files to exclude

INFO     | Olive log files list. 6 rows exported to CSV.

Identify filly log files to exclude

WARNING: There are no filly log files identified.

INFO     | Filly log files list. 12 rows exported to CSV.
#### Override the previous list

logs = Logs(path=LOGFILES_DIR, patterns=LOGFILES_TO_INCLUDE_16Oct2024, include=True)
INFO     | Logs: 183 log files in /Users/mjboothaus/code/github/databooth/horse-logic/data/results/zips/cb_data
INFO     | Included logfiles: 183
INFO     | Excluded logfiles: 96
Excluded

Included log files

INFO     | List of files to be included in analysis. 183 rows exported to CSV.
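
The split into included and excluded files can be thought of as set arithmetic between the file names found on disk and the include list. The sketch below is an assumption about how such a reconciliation could work, not the implementation of the `Logs` class; the function name `reconcile` and its return keys are hypothetical.

```python
def reconcile(found_names, include_names):
    """Compare file names found on disk with the names on the include list.

    Returns sorted lists of:
      included: on disk and on the include list
      excluded: on disk but not on the include list
      missing:  on the include list but not found on disk
    """
    found = set(found_names)
    wanted = set(include_names)
    return {
        "included": sorted(found & wanted),
        "excluded": sorted(found - wanted),
        "missing": sorted(wanted - found),
    }
```

A non-empty `missing` list would flag names in the include list that have no matching file, for example because of a typo or a renamed log file.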

Sessions summary by subject name

1. Atom: 9 session(s)

original_filename datetime session_number experiment_type time_dff

2. Ash: 8 session(s)

original_filename datetime session_number experiment_type time_dff

3. Mowgli: 8 session(s)

original_filename datetime session_number experiment_type time_dff

4. Teddy: 8 session(s)

original_filename datetime session_number experiment_type time_dff

5. Dodge: 8 session(s)

original_filename datetime session_number experiment_type time_dff

6. Filly: 8 session(s)

original_filename datetime session_number experiment_type time_dff

7. Dougie: 8 session(s)

original_filename datetime session_number experiment_type time_dff

8. Bonnie: 9 session(s)

original_filename datetime session_number experiment_type time_dff

9. Apollo: 10 session(s)

original_filename datetime session_number experiment_type time_dff

10. Molly: 9 session(s)

original_filename datetime session_number experiment_type time_dff

11. Jelly: 8 session(s)

original_filename datetime session_number experiment_type time_dff

12. Smudge: 10 session(s)

original_filename datetime session_number experiment_type time_dff

13. George: 7 session(s)

original_filename datetime session_number experiment_type time_dff

14. Myrtle: 10 session(s)

original_filename datetime session_number experiment_type time_dff

15. Yoshi: 10 session(s)

original_filename datetime session_number experiment_type time_dff

16. Nix: 10 session(s)

original_filename datetime session_number experiment_type time_dff

17. Gio: 8 session(s)

original_filename datetime session_number experiment_type time_dff

18. Dusty: 10 session(s)

original_filename datetime session_number experiment_type time_dff

19. Freya: 7 session(s)

original_filename datetime session_number experiment_type time_dff
INFO     | Subject olive: No experiments conducted

21. Pumba: 8 session(s)

original_filename datetime session_number experiment_type time_dff

22. Clover: 10 session(s)

original_filename datetime session_number experiment_type time_dff

Session Summary

Subject number Subject name Session count
INFO     | Session overview (subject session counts). 22 rows exported to CSV.
original_filename subject_name experiment_type session_number datetime time_diff
INFO     | List of log files excluded. 96 rows exported to CSV.
INFO     | Experiment summary. 183 rows exported to CSV.
INFO     | File list of log files included. 183 rows exported to CSV.
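
The per-subject tables above carry a session_number and a time difference column. One plausible way to derive them is sketched below, assuming that sessions are numbered chronologically within each subject and that the time difference is the gap since that subject's previous session. Both assumptions, and the function name `number_sessions`, are illustrative and are not confirmed by the notebook.

```python
from collections import defaultdict
from datetime import datetime


def number_sessions(records):
    """Number sessions per subject and compute the gap to the previous session.

    records: iterable of (subject_name, datetime) pairs, one per log file.
    Returns a list of dicts ordered by subject name, then by datetime.
    """
    by_subject = defaultdict(list)
    for subject, when in records:
        # Subject names are compared case-insensitively.
        by_subject[subject.lower()].append(when)

    rows = []
    for subject in sorted(by_subject):
        previous = None
        for number, when in enumerate(sorted(by_subject[subject]), start=1):
            rows.append(
                {
                    "subject_name": subject,
                    "datetime": when,
                    "session_number": number,
                    # None for a subject's first session, otherwise a timedelta.
                    "time_diff": None if previous is None else when - previous,
                }
            )
            previous = when
    return rows
```

Counting the rows per subject in the result gives the session counts reported in the summary.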