Logfile Reconciliation - Reward Prediction Error (RPE)

Experimental data analysis (Oct / Nov 2023)

Authors
Affiliations

Cathrynne Henshall

Michael J. Booth

Published

Tue Nov 12, 2024 11:38 AM

Abstract

Reconciles the log files generated during the experiments to determine which of them should be loaded into the database.

Purpose

This notebook is an exploratory tool for examining the log files produced during the horse behavioural experiments conducted in October and November 2023. It is also used to experiment with text-parsing techniques on the files before they are imported into a DuckDB database.

The primary objectives are:

  1. To reconcile which log files should be included in or excluded from the analysis.
  2. To conduct experiments with regular expressions (regex) aimed at extracting pertinent data and fields from the log files.

Experiment details and naming conventions

There are two main types of experiment:

  • Reward Prediction (RPE)
  • Cognitive Bias (CB)

The RPE experiments have the following subtypes:

  • RPE-A : acquisition of response
  • RPE-H : habit formation
  • RPE-E : extinction of response
  • RPE-ER : extinction prior to reinstatement of response
  • RPE-R : reinstatement of response

RPE-type experiments

A new experiment type (RPE-ER), which was not in the original specification, was created during the experiments.

Logfile Exclusion rules

Log files from test runs, and log files containing bad data, need to be excluded from the analysis.

Rules are case-insensitive. Files that satisfy any of the following conditions are excluded:

  • All files with _TEST_ as the subject name
  • All files with _FRECKLE_ as the subject name
  • Some files with _BONNIE_ (14 legitimate files - TODO: CH to confirm details)
  • All files with test in the Comment field
  • Possibly some logs with very short run-times - TODO: CH to confirm the files that have been identified
  • All files with _OLIVE_ as the subject name - probably discard, treat as optional for now (TODO: CH to confirm treatment)
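The unconditional rules above can be sketched as a small predicate. This is a minimal illustration, not the notebook's implementation: the `LogMeta` container, the `exclusion_reason` function and the `exclude_optional` flag are hypothetical names, and the Bonnie and short run-time rules are left out because their details are still to be confirmed.

```python
from dataclasses import dataclass
from typing import Optional

# Subject names that are always excluded (rules are case-insensitive).
ALWAYS_EXCLUDED_SUBJECTS = {"test", "freckle"}
# Subjects whose treatment is still to be confirmed; excluded only on request.
OPTIONAL_EXCLUDED_SUBJECTS = {"olive"}


@dataclass
class LogMeta:
    """Hypothetical container for the fields the rules need."""
    subject_name: str
    comment: str = ""


def exclusion_reason(meta: LogMeta, exclude_optional: bool = False) -> Optional[str]:
    """Return a short reason if the log should be excluded, else None."""
    subject = meta.subject_name.strip().lower()
    if subject in ALWAYS_EXCLUDED_SUBJECTS:
        return f"subject is {subject}"
    if exclude_optional and subject in OPTIONAL_EXCLUDED_SUBJECTS:
        return f"subject is {subject} (optional rule)"
    # Plain substring match: this would also catch words such as "latest".
    if "test" in meta.comment.lower():
        return "comment mentions test"
    return None
```

Returning a reason rather than a bare boolean keeps the exclusion list auditable, which is useful while several rules are still awaiting confirmation.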

Problems with log file names during experiments

  • It appears that for more than 20 trials (or subjects?) the value in the filename was reported as NaN; see, for example, the pumba experiments.
  • Other NaNs? There are 31 files with NaN (some are restarts). TODO: confirm with CH why there were restarts rather than “new” experiments.
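A regular expression for the file names might look like the sketch below. The pattern is inferred from file names that appear in this notebook's log output (for example `Experiment_2023-10-11T16:56:09.875604_bonnie_58_RPE-A.log`); treating the third field as a session number and assuming that a literal `NaN` can appear in that position are both assumptions, and `parse_log_filename` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed layout: Experiment_<ISO datetime>_<subject>_<session>_<type>.log
FILENAME_RE = re.compile(
    r"^Experiment_"
    r"(?P<datetime>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{6})?)_"
    r"(?P<subject>[A-Za-z]+)_"
    r"(?P<session>\d+|NaN)_"
    r"(?P<experiment_type>[A-Za-z]+(?:-[A-Za-z]+)?)"
    r"\.log$",
    re.IGNORECASE,
)


def parse_log_filename(filename):
    """Split a log file name into its fields; return None if it does not match."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    session = match["session"]
    return {
        "datetime": datetime.fromisoformat(match["datetime"]),
        "subject_name": match["subject"].lower(),
        # A NaN in the file name is kept as a missing value, not dropped.
        "session_number": None if session.lower() == "nan" else int(session),
        "experiment_type": match["experiment_type"].upper(),
    }


info = parse_log_filename("Experiment_2023-10-11T16:56:09.875604_bonnie_58_RPE-A.log")
print(info["subject_name"], info["session_number"], info["experiment_type"])
# bonnie 58 RPE-A
```

Mapping `NaN` to `None` lets the affected files be listed and reviewed separately instead of failing the parse.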

Time differences

For each trial we calculate the following time differences:

  1. Time delta: (touch datetime - start tone datetime)
  2. Time delta: (next start datetime - dispense of pellets datetime)
  3. Time delta: (dispense of final pellets datetime - start tone datetime)

We use item 3 as a cross-check on the consistency of the other two time deltas.

These calculated quantities are the same for all RPE-type experiments.
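The three deltas can be sketched with the standard library as follows. The `TrialEvents` container and its field names are hypothetical, and the sketch assumes that a single dispense timestamp serves both item 2 and item 3. The exact form of the cross-check is not specified here; the one shown (item 3 should not be smaller than item 1, which holds if pellets are dispensed after the touch) is only one plausible reading.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class TrialEvents(NamedTuple):
    """Hypothetical per-trial event timestamps."""
    start_tone: datetime
    touch: datetime
    dispense: datetime
    next_start: Optional[datetime] = None  # None for the last trial in a session


def trial_deltas(ev: TrialEvents) -> dict:
    """Compute the three time differences listed above."""
    return {
        "touch_minus_start": ev.touch - ev.start_tone,
        "next_start_minus_dispense": (
            None if ev.next_start is None else ev.next_start - ev.dispense
        ),
        "dispense_minus_start": ev.dispense - ev.start_tone,
    }


def deltas_consistent(deltas: dict) -> bool:
    """Assumed cross-check: dispensing should not precede the touch."""
    return deltas["dispense_minus_start"] >= deltas["touch_minus_start"]


t0 = datetime(2023, 10, 11, 16, 56, 10)
events = TrialEvents(
    start_tone=t0,
    touch=t0 + timedelta(seconds=4),
    dispense=t0 + timedelta(seconds=6),
    next_start=t0 + timedelta(seconds=20),
)
deltas = trial_deltas(events)
print(deltas["touch_minus_start"], deltas_consistent(deltas))
# 0:00:04 True
```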

Setup project & directories

INFO     | Data directory: /Users/mjboothaus/code/github/databooth/horse-logic/data
INFO     | Data directory purpose: Parent directory for raw and processed data
INFO     | Sql directory: /Users/mjboothaus/code/github/databooth/horse-logic/sql
INFO     | Sql directory purpose: Store SQL scripts
INFO     | Output directory: /Users/mjboothaus/code/github/databooth/horse-logic/notebooks/results/RPE
INFO     | Output directory purpose: Store output files and results by experiment type
INFO     | Logfiles directory: /Users/mjboothaus/code/github/databooth/horse-logic/data/results/zips/data_17Jan2020_email_hillydale_equine
INFO     | Logfiles directory purpose: Store for the raw log files
INFO     | Notebooks directory: /Users/mjboothaus/code/github/databooth/horse-logic/notebooks
INFO     | Notebooks directory purpose: Jupyter notebooks for performing analysis
INFO     | Existing database file deleted: /Users/mjboothaus/code/github/databooth/horse-logic/data/Experiments_RPE_2023_Q4.ddb
INFO     | Database file path: /Users/mjboothaus/code/github/databooth/horse-logic/data/Experiments_RPE_2023_Q4.ddb
INFO     | Database purpose: Main project databases (outputs) by experiment type
INFO     | Project initialised (RPE): config defined in project_config.yaml
# project.display_file_with_highlighting("project_config.yaml")

Get subject info

subject = Subject(project)

subject_df = subject.get_subject_info()
INFO     | Loaded subject info from: /Users/mjboothaus/code/github/databooth/horse-logic/docs/from_CH/Cohort data for MB.xlsx
INFO     | Subject count: 22
INFO     | Sorted subject names:
    apollo, ash, atom, bonnie, clover, dodge, dougie, dusty, filly, freya, george, gio, jelly, molly, mowgli, myrtle, 
    nix, olive, pumba, smudge, teddy, yoshi

Log file reconciliation

Initial exclusions (rule-based)

INFO     | Logs: 267 log files in /Users/mjboothaus/code/github/databooth/horse-logic/data/results/zips/data_17Jan2020_email_hillydale_equine
INFO     | Included logfiles: 267
INFO     | Excluded logfiles: 114
INFO     | List of files to be excluded in analysis (initially). 114 rows exported to CSV.

Included log files

Initial list of log files that are included, based on the rules specified above.

INFO     | List of files to be included in analysis. 267 rows exported to CSV.

Sessions summary by subject name

Each per-subject table has the columns: original_filename, datetime, session_number, experiment_type, time_dff.

1. Atom: 9 session(s)
2. Ash: 14 session(s)
3. Mowgli: 10 session(s)
4. Teddy: 12 session(s)
5. Dodge: 10 session(s)
6. Filly: 10 session(s)
7. Dougie: 9 session(s)
8. Bonnie: 74 session(s)
9. Apollo: 8 session(s)
10. Molly: 8 session(s)
11. Jelly: 12 session(s)
12. Smudge: 10 session(s)
13. George: 9 session(s)
14. Myrtle: 9 session(s)
15. Yoshi: 8 session(s)
16. Nix: 7 session(s)
17. Gio: 8 session(s)
18. Dusty: 9 session(s)
19. Freya: 8 session(s)
20. Olive: 3 session(s)
21. Pumba: 11 session(s)
22. Clover: 9 session(s)

Session Summary - by Subject name and Session count

Subject number Subject name Session count
INFO     | Session overview (subject session counts). 22 rows exported to CSV.
original_filename subject_name experiment_type session_number datetime time_diff

Specific analysis for Bonnie log files

Some of the Bonnie log files were produced during testing, so not all of the Bonnie experiments are valid to include.

original_filename subject_name experiment_type session_number datetime time_diff
INFO     | Bonnie log file exclude list. 36 rows exported to CSV.
INFO     | File Experiment_2023-10-11T16:56:09.875604_bonnie_58_RPE-A.log has been included.
INFO     | File Experiment_2023-10-16T16:55:54.839279_bonnie_68_RPE-A.log has been included.
INFO     | File Experiment_2023-10-19T14:37:56.973616_bonnie_72_RPE-H.log has been included.
INFO     | # Files specifically included: 3
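The Bonnie adjustment can be viewed as set operations on file names. The sketch below is one reading that is consistent with the reported counts (267 initially included, 36 on the Bonnie exclude list, 3 specifically included, and 267 - 36 + 3 = 234 in the final included list); it is not a confirmed description of the notebook's logic, and `reconcile` and the toy file names are hypothetical.

```python
def reconcile(initial_included, manual_excluded, manual_included):
    """Drop manually excluded files, then add back files that were specifically included."""
    included = (set(initial_included) - set(manual_excluded)) | set(manual_included)
    return sorted(included)


# Toy example with made-up file names.
initial = ["a.log", "b.log", "c.log", "d.log"]
bonnie_excluded = ["b.log", "c.log"]
bonnie_reincluded = ["c.log"]
print(reconcile(initial, bonnie_excluded, bonnie_reincluded))
# ['a.log', 'c.log', 'd.log']
```

Using sets makes the result independent of the order in which the lists are applied and avoids duplicate entries if a file appears in more than one list.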

Write out final lists of excluded and included log files

INFO     | List of all the log files excluded. 150 rows exported to CSV.
INFO     | Summary of ALL the experiments (both included and excluded). 267 rows exported to CSV.
INFO     | File list of all included log files. 234 rows exported to CSV.

The key output file is all_included_files.csv.

It lists all of the log files that will be loaded into the DuckDB database for further analysis in the next notebook.

The next notebook to run is notebooks/logfile-to-database-RPE.ipynb.