Logfile Reconciliation - Reward Prediction Error (RPE)

Purpose

This notebook is an exploratory tool for examining the log files produced during the horse behavioural experiments conducted in October and November 2023. It is used to try out text-parsing techniques on the files before they are imported into a DuckDB database.

The primary objectives are:

To reconcile which log files should be included in or excluded from the analysis.
To experiment with regular expressions (regex) for extracting the relevant data and fields from the log files.

Experiment details and naming conventions

There are two main types of experiment:

Reward Prediction (RPE)
Cognitive Bias (CB)

The RPE experiments have the following subtypes:

RPE-A : acquisition of response
RPE-H : habit formation
RPE-E : extinction of response
RPE-ER : extinction prior to reinstatement of response
RPE-R : reinstatement of response

RPE-type experiments

A new experiment type (RPE-ER), which was not in the original specification, was created during the experiments.
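As a starting point for the regex experiments, the sketch below parses a log file name into its fields. The layout `Experiment_<ISO datetime>_<subject>_<number>_<experiment type>.log` is inferred from file names that appear in this notebook's log output, and the meaning of the number field (probably the session number) is an assumption. The names `LOGFILE_PATTERN` and `parse_logfile_name` are hypothetical and are not part of the project code.

```python
import re
from datetime import datetime

# Assumed layout: Experiment_<ISO datetime>_<subject>_<number>_<type>.log
# The number field may be the literal text NaN in some file names.
LOGFILE_PATTERN = re.compile(
    r"^Experiment_"
    r"(?P<datetime>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6})_"
    r"(?P<subject>[A-Za-z]+)_"
    r"(?P<number>\d+|NaN)_"
    r"(?P<experiment_type>RPE-(?:ER|A|H|E|R)|CB)"
    r"\.log$",
    re.IGNORECASE,
)


def parse_logfile_name(filename):
    """Return the parsed fields as a dict, or None if the name does not match."""
    match = LOGFILE_PATTERN.match(filename)
    if match is None:
        return None
    number = match.group("number")
    return {
        "datetime": datetime.fromisoformat(match.group("datetime")),
        "subject": match.group("subject").lower(),
        # NaN in the file name is mapped to None rather than to a number.
        "number": None if number.lower() == "nan" else int(number),
        "experiment_type": match.group("experiment_type").upper(),
    }


fields = parse_logfile_name(
    "Experiment_2023-10-11T16:56:09.875604_bonnie_58_RPE-A.log"
)
print(fields)
```

Returning None for non-matching names makes it easy to list files that do not follow the expected convention and review them by hand.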

Logfile Exclusion rules

Log files from test runs, and those containing bad data, need to be excluded from the analysis.

Rules are case-insensitive. Files that satisfy any of the following conditions are excluded:

All files with _TEST_ as the subject name
All files with _FRECKLE_ as the subject name
Some files with _BONNIE_ (14 legitimate files - TODO: CH to confirm details)
All files with test in the Comment field
Possibly some logs with very short run-times - TODO: CH to confirm the files that have been identified
All files with _OLIVE_ as the subject name - probably discard, treat as optional for now (TODO: CH to confirm treatment)
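The confirmed rules above can be sketched as a small predicate. This is a minimal illustration, not the project's implementation: the function name `is_excluded`, its parameters, and the `exclude_olive` switch are hypothetical, and the rules still awaiting confirmation from CH (Bonnie, short run-times) are deliberately left out.

```python
# Subjects whose files are always excluded (compared case-insensitively).
ALWAYS_EXCLUDED_SUBJECTS = {"test", "freckle"}


def is_excluded(subject_name, comment="", exclude_olive=False):
    """Return True if a log file should be excluded under the confirmed rules."""
    # Assumption: surrounding underscores and whitespace are not part of the name.
    subject = subject_name.strip().strip("_").lower()
    if subject in ALWAYS_EXCLUDED_SUBJECTS:
        return True
    # Olive is treated as optional until its treatment is confirmed.
    if exclude_olive and subject == "olive":
        return True
    # Literal reading of the rule: any comment containing "test" is excluded.
    # Note that this substring check would also match words such as "latest".
    return "test" in comment.lower()


print(is_excluded("TEST"))                      # True
print(is_excluded("atom", "normal session"))    # False
```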

Problems with log file names during experiments

It appears that when there were more than 20 trials (subjects?), the value in the filename was reported as NaN; see, for example, the pumba experiments.
Other NaNs? There are 31 files with NaN (some are restarts) - TODO: confirm with CH why there were restarts rather than a “new” experiment.

Time differences

For each trial we calculate the following time differences:

1. Time delta: (touch datetime - start tone datetime)
2. Time delta: (next start datetime - dispense of pellets datetime)
3. Time delta: (dispense of final pellets datetime - start tone datetime)

We use item 3 as a cross-check on the consistency of the previous two time deltas.

These calculated quantities are the same for all RPE-type experiments.
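The three time deltas can be sketched with the standard `datetime` module. The `Trial` class, its field names, and the timestamps are hypothetical and purely illustrative. The sketch also assumes that "next start" means the start tone of the following trial, and that a single dispense timestamp stands in for both "dispense of pellets" and "dispense of final pellets".

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Trial:
    start_tone: datetime
    touch: datetime
    dispense: datetime  # assumed: time the (final) pellets were dispensed


def time_deltas(trial, next_trial=None):
    """Compute the three per-trial time deltas as timedelta objects."""
    return {
        "touch_minus_start": trial.touch - trial.start_tone,
        # Undefined for the last trial, which has no following start tone.
        "next_start_minus_dispense": (
            next_trial.start_tone - trial.dispense if next_trial else None
        ),
        "dispense_minus_start": trial.dispense - trial.start_tone,
    }


# Illustrative timestamps only, not taken from any log file.
first = Trial(
    start_tone=datetime(2023, 10, 11, 16, 56, 10),
    touch=datetime(2023, 10, 11, 16, 56, 14),
    dispense=datetime(2023, 10, 11, 16, 56, 15),
)
second = Trial(
    start_tone=datetime(2023, 10, 11, 16, 56, 25),
    touch=datetime(2023, 10, 11, 16, 56, 27),
    dispense=datetime(2023, 10, 11, 16, 56, 28),
)

deltas = time_deltas(first, second)
for name, value in deltas.items():
    print(name, value)
```

Subtracting the first delta from the third gives the touch-to-dispense interval, which may be a convenient quantity to inspect when checking consistency.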

Setup project & directories


INFO     | Data directory: /Users/mjboothaus/code/github/databooth/horse-logic/data
INFO     | Data directory purpose: Parent directory for raw and processed data
INFO     | Sql directory: /Users/mjboothaus/code/github/databooth/horse-logic/sql
INFO     | Sql directory purpose: Store SQL scripts
INFO     | Output directory: /Users/mjboothaus/code/github/databooth/horse-logic/notebooks/results/RPE
INFO     | Output directory purpose: Store output files and results by experiment type
INFO     | Logfiles directory: /Users/mjboothaus/code/github/databooth/horse-logic/data/results/zips/data_17Jan2020_email_hillydale_equine
INFO     | Logfiles directory purpose: Store for the raw log files
INFO     | Notebooks directory: /Users/mjboothaus/code/github/databooth/horse-logic/notebooks
INFO     | Notebooks directory purpose: Jupyter notebooks for performing analysis
INFO     | Existing database file deleted: /Users/mjboothaus/code/github/databooth/horse-logic/data/Experiments_RPE_2023_Q4.ddb
INFO     | Database file path: /Users/mjboothaus/code/github/databooth/horse-logic/data/Experiments_RPE_2023_Q4.ddb
INFO     | Database purpose: Main project databases (outputs) by experiment type
INFO     | Project initialised (RPE): config defined in project_config.yaml

# project.display_file_with_highlighting("project_config.yaml")

Get subject info

subject = Subject(project)

subject_df = subject.get_subject_info()

INFO     | Loaded subject info from: /Users/mjboothaus/code/github/databooth/horse-logic/docs/from_CH/Cohort data for MB.xlsx
INFO     | Subject count: 22
INFO     | Sorted subject names:
    apollo, ash, atom, bonnie, clover, dodge, dougie, dusty, filly, freya, george, gio, jelly, molly, mowgli, myrtle, 
    nix, olive, pumba, smudge, teddy, yoshi

Log file reconciliation

Initial exclusions (rule-based)

INFO     | Logs: 267 log files in /Users/mjboothaus/code/github/databooth/horse-logic/data/results/zips/data_17Jan2020_email_hillydale_equine
INFO     | Included logfiles: 267
INFO     | Excluded logfiles: 114

INFO     | List of files to be excluded in analysis (initially). 114 rows exported to CSV.

results/RPE/excluded_files_initial.csv

Included log files

Initial list of log files that are included, based on the rules specified above.

INFO     | List of files to be included in analysis. 267 rows exported to CSV.

results/RPE/included_files.csv

Sessions summary by subject name

Each per-subject table has the columns: original_filename, datetime, session_number, experiment_type, time_dff.

1. Atom: 9 session(s)
2. Ash: 14 session(s)
3. Mowgli: 10 session(s)
4. Teddy: 12 session(s)
5. Dodge: 10 session(s)
6. Filly: 10 session(s)
7. Dougie: 9 session(s)
8. Bonnie: 74 session(s)
9. Apollo: 8 session(s)
10. Molly: 8 session(s)
11. Jelly: 12 session(s)
12. Smudge: 10 session(s)
13. George: 9 session(s)
14. Myrtle: 9 session(s)
15. Yoshi: 8 session(s)
16. Nix: 7 session(s)
17. Gio: 8 session(s)
18. Dusty: 9 session(s)
19. Freya: 8 session(s)
20. Olive: 3 session(s)
21. Pumba: 11 session(s)
22. Clover: 9 session(s)

Session Summary - by Subject name and Session count

Subject number	Subject name	Session count

INFO     | Session overview (subject session counts). 22 rows exported to CSV.

results/RPE/session_overview.csv

original_filename	subject_name	experiment_type	session_number	datetime	time_diff

Specific analysis for Bonnie log files

Some Bonnie log files were used for testing, so not all of Bonnie's experiments are valid to include.

	original_filename	subject_name	experiment_type	session_number	datetime	time_diff

INFO     | Bonnie log file exclude list. 36 rows exported to CSV.

results/RPE/bonnie_excluded_files.csv

INFO     | File Experiment_2023-10-11T16:56:09.875604_bonnie_58_RPE-A.log has been included.
INFO     | File Experiment_2023-10-16T16:55:54.839279_bonnie_68_RPE-A.log has been included.
INFO     | File Experiment_2023-10-19T14:37:56.973616_bonnie_72_RPE-H.log has been included.
INFO     | # Files specifically included: 3
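The log output above suggests that an exclude list is applied first and that a short list of specifically included files then takes precedence. A minimal sketch of that idea using sets follows. The function name `final_included_files` is hypothetical, the precedence order is an assumption, and the second file name is a placeholder rather than a real log file.

```python
def final_included_files(included, excluded, force_included):
    """Drop excluded names first, then add the override list back in."""
    return (set(included) - set(excluded)) | set(force_included)


initially_included = {
    "Experiment_2023-10-11T16:56:09.875604_bonnie_58_RPE-A.log",
    "bonnie_placeholder_test_run.log",  # hypothetical name, not a real file
}
# Assume, for illustration, that both names are on the Bonnie exclude list.
bonnie_excluded = set(initially_included)
specifically_included = {
    "Experiment_2023-10-11T16:56:09.875604_bonnie_58_RPE-A.log",
}

final = final_included_files(
    initially_included, bonnie_excluded, specifically_included
)
print(sorted(final))
```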

Write out final lists of excluded and included log files

INFO     | List of all the log files excluded. 150 rows exported to CSV.

results/RPE/all_excluded_files.csv

INFO     | Summary of ALL the experiments (both included and excluded). 267 rows exported to CSV.

results/RPE/Experiment_Summary_Oct2023_included.csv

INFO     | File list of all included log files. 234 rows exported to CSV.

results/RPE/all_included_files.csv

The key output file here is all_included_files.csv.

It lists all of the log files that will be loaded into the DuckDB database for further analysis in the next notebook.

The next notebook to run is notebooks/logfile-to-database-RPE.ipynb.