Database apply corrections - Cognitive Bias (CB)

Experimental data analysis (Oct / Nov 2023)

Authors
Affiliations

Cathrynne Henshall

Michael J. Booth

Published

Tue Nov 12, 2024 11:38 AM

Abstract

Apply corrections to create Corrected views prior to analysis.

Purpose of This Notebook

This notebook serves to create the Corrected views.

We create some interim views, before creating the final outputs, new views:

  • ExperimentCBsCorrected (the ExperimentCBs table with additions)
  • TrialCBsCorrected (the TrialCBs table with corrections/additions)
  • ResponseCBsCorrected (the ResponseCBs table with corrections/additions)
  • EventCBsCorrected (the EventCBs table with corrections/additions)

These Corrected versions of the data are the views that should be used for all of the analysis.

Given that only partial (key) data was captured in the Corrections workbook it is not possible to populate all values within these new views. For example details like Start and Finish times are not available, just response times. In these case all such values are represented as NULL in the views.

Setup project & directories

This is the init_notebook_mode cell from ITables v2.1.4
(you should not see this message - is your notebook trusted?)
INFO     | Data directory: /Users/mjboothaus/code/github/databooth/horse-logic/data
INFO     | Data directory purpose: Parent directory for raw and processed data
INFO     | Sql directory: /Users/mjboothaus/code/github/databooth/horse-logic/sql
INFO     | Sql directory purpose: Store SQL scripts
INFO     | Output directory: /Users/mjboothaus/code/github/databooth/horse-logic/notebooks/results/CB
INFO     | Output directory purpose: Store output files and results by experiment type
INFO     | Logfiles directory: /Users/mjboothaus/code/github/databooth/horse-logic/data/results/zips/cb_data
INFO     | Logfiles directory purpose: Store for the raw log files
INFO     | Notebooks directory: /Users/mjboothaus/code/github/databooth/horse-logic/notebooks
INFO     | Notebooks directory purpose: Jupyter notebooks for performing analysis
INFO     | Database file path: /Users/mjboothaus/code/github/databooth/horse-logic/data/Experiments_CB_2023_Q4.ddb
INFO     | Database purpose: Main project databases (outputs) by experiment type
INFO     | Project initialised (CB): config defined in project_config.yaml
INFO     | GitHub repo: https://github.com/DataBooth/horse-logic
INFO     | Connected to database: /Users/mjboothaus/code/github/databooth/horse-logic/data/Experiments_CB_2023_Q4.ddb
INFO     | DuckDB version: v1.1.3
INFO     | Dropped 0 non-default tables.

Create / compare Corrected views of ExperimentCBs, TrialCBs, ResponseCBs and EventCBs tables

ExperimentCBsCorrected table

  • Purpose: Combines existing experiments with new experiments from corrections.
  • Approach: Uses a UNION ALL to combine existing and new experiments.
  • Structure: View that includes all original experiments and adds new ones from the Corrections table.
  • Features: Assigns new ExperimentIDs to added experiments and includes a DataSource column.

Note:

  • At this stage there is only one new experiment (Yoshi with no logfile).

AddExperiments view

First we determine which Experiments need to be added from the Corrections table.

INFO     | Exported table: AddExperiments. 1 total rows exported to XLSX across 1 sheets.

Table: AddExperiments

Cohort ExperimentDate SessionType SubjectName LogFilename Comment DataSource
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Low Cardinality columns for AddExperiments

column_name distinct_count unique_values_with_counts
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)
INFO     | Exported table: ReformattedAddExperiments. 1 total rows exported to XLSX across 1 sheets.

Table: ReformattedAddExperiments

ExperimentID Cohort SubjectName SubjectNumber SessionNumber SessionType ExperimentType Direction Comment DateTime LogFilename Parameters DataSource
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Low Cardinality columns for ReformattedAddExperiments

column_name distinct_count unique_values_with_counts
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)
INFO     | Exported query results: ExperimentCBsCorrected without Parameters. 184 total rows exported to XLSX across 1 sheets.

Query Results: ExperimentCBsCorrected without Parameters

ExperimentID Cohort SubjectName SubjectNumber SessionNumber SessionType ExperimentType Direction Comment DateTime LogFileName DataSource
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)
ExperimentID Cohort SubjectName SubjectNumber SessionNumber SessionType ExperimentType Direction Comment DateTime LogFileName Parameters DataSource
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Mon_11Nov_Item1: RESOLUTION REQUIRED: ExperimentCBsCorrected

  • SubjectNumber and Direction have been inserted (from Subject info) for additional Yoshi experiment.
  • Can any of the additional fields e.g. SessionNumber, ExperimentType (e.g. from Comment), Parameters be populated or assuming NULL?

For completeness, let’s examine the additional Yoshi experiment (with no logfile) and the trial details (additions to be made) to the TrialCBs.

Cohort ExperimentDate SessionType SubjectName TrialNumber TrialType ResponseTime ResponseType Comment LogFilename CorrectTable CorrectField CorrectValue
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

So there are 20 (new) trials to add as expected.

TrialCBsCorrected table

  • Purpose: Combines existing trials with new trials from corrections and applies corrections.
  • Approach: Uses CTEs for new trials and existing trials, then combines them.
  • Structure: View that includes all original trials, adds new ones, and applies corrections.
  • Features: Assigns new TrialIDs to added trials and includes a CorrectedFlag column.

AddTrials view

INFO     | Exported table: AddTrials. 20 total rows exported to XLSX across 1 sheets.

Table: AddTrials

ExperimentID TrialNumber TrialSubType ResponseType CriterionType CriterionCount TrialStartTime TrialEndTime CorrectedFlag
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Low Cardinality columns for AddTrials

column_name distinct_count unique_values_with_counts
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)
INFO     | Exported table: CorrectTrials. 3171 total rows exported to XLSX across 1 sheets.

Table: CorrectTrials

TrialID ExperimentID TrialNumber TrialSubType ResponseType CriterionType CriterionCount TrialStartTime TrialEndTime CorrectedFlag
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Low Cardinality columns for CorrectTrials

column_name distinct_count unique_values_with_counts
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

So there are 50 trials (1.5% of trials) where the ResponseType has (actually) been corrected. Note that only ResponseType can be corrected in the Trials views, ResponseTime is corrected in the Response views.

INFO     | Exported table: TrialCBs. 3171 total rows exported to XLSX across 1 sheets.

Table: TrialCBs

TrialID ExperimentID TrialNumber ResponseType TrialSubType CriterionType CriterionCount TrialStartTime TrialEndTime
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Low Cardinality columns for TrialCBs

column_name distinct_count unique_values_with_counts
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)
INFO     | Exported table: TrialCBsCorrected. 3191 total rows exported to XLSX across 1 sheets.

Table: TrialCBsCorrected

TrialID ExperimentID TrialNumber TrialSubType ResponseType CriterionType CriterionCount TrialStartTime TrialEndTime CorrectedFlag
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Low Cardinality columns for TrialCBsCorrected

column_name distinct_count unique_values_with_counts
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)
┌──────────────────┐
│ nTrialsCorrected │
│      int64       │
├──────────────────┤
│               46 │
└──────────────────┘
┌─────────────────┐
│ nTrialsOriginal │
│      int64      │
├─────────────────┤
│            3145 │
└─────────────────┘

Diffs - TrialCBs vs TrialCBsCorrected

┌────────────────────────────┬───────┐
│         Difference         │ Count │
│          varchar           │ int64 │
├────────────────────────────┼───────┤
│ Additions (new rows)       │    20 │
│ Correction to ResponseType │    46 │
│ Existing (same rows)       │  3125 │
│ Total TrialCBs             │  3171 │
│ Total TrialCBsCorrected    │  3191 │
└────────────────────────────┴───────┘
INFO     | Exported table: CorrectedResponses. 1735 total rows exported to XLSX across 1 sheets.

Table: CorrectedResponses

ResponseID TrialID ExperimentID TrialNumber ResponseTime EventDateTime CorrectedFlag
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Low Cardinality columns for CorrectedResponses

column_name distinct_count unique_values_with_counts
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

So this suggests a total number of corrected ReponseTime of 144 (change of value) and one set to NULL (8% of the overall values).

ResponseCBsCorrected table

  • Purpose: Combines existing responses with new responses from corrections and applies corrections.
  • Approach: Similar to TrialCBsCorrected, uses CTEs for new and existing responses.
  • Structure: View that includes all original responses, adds new ones, and applies corrections.
  • Features: Assigns new ResponseIDs to added responses and includes a CorrectedFlag column.
INFO     | Exported query results: ResponseCBsCorrected_with_context_fields. 1755 total rows exported to XLSX across 1 sheets.

Query Results: ResponseCBsCorrected_with_context_fields

SubjectName SubjectNumber ExperimentID TrialID ResponseType ResponseID ResponseTime EventDateTime CorrectedFlag
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Diff - ResponseCBs vs ResponseCBsCorrected

┌──────────────────────────┬───────┐
│        Difference        │ Count │
│         varchar          │ int64 │
├──────────────────────────┼───────┤
│ Additions (new rows)     │    20 │
│ Corrected response times │   144 │
│ In Both, Identical       │  1591 │
└──────────────────────────┴───────┘

TrialCBsCorrected_compare_TrialCBs view

project.view_and_export_table("TrialCBsCorrected_compare_TrialCBs")
INFO     | Exported table: TrialCBsCorrected_compare_TrialCBs. 3191 total rows exported to XLSX across 1 sheets.

Table: TrialCBsCorrected_compare_TrialCBs

LogFileName ExperimentID TrialID TrialNumber OriginalResponseType CorrectedResponseType ResponseTypeCorrectedFlag OriginalResponseTime CorrectedResponseTime ResponseTimeCorrectedFlag TrialSubType CriterionType CriterionCount TrialStartTime TrialEndTime
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Low Cardinality columns for TrialCBsCorrected_compare_TrialCBs

column_name distinct_count unique_values_with_counts
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

TrialCBsCorrected with additional fields for context

In addition to all the fields in the TrialCBsCorrected view, we have added SubjectName, SubjectNumber, and both the original and corrected ResponseTime.

INFO     | Exported query results: TrialCBsCorrected_with_context_fields. 3191 total rows exported to XLSX across 1 sheets.

Query Results: TrialCBsCorrected_with_context_fields

TrialID ExperimentID SubjectName SubjectNumber TrialNumber TrialSubType ResponseType CriterionType CriterionCount TrialStartTime TrialEndTime OriginalResponseTime CorrectedResponseTime CorrectedFlag
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Comparison: ResponseCBsCorrected vs. ResponseCBs (with additional fields for context)

The view joins the corrected views (ResponseCBsCorrected, TrialCBsCorrected, and ExperimentCBsCorrected) with the original tables (TrialCBs and ResponseCBs) to show both the original and corrected values side by side (specifically for ResponseType and ResponseTime). This allows for easy comparison and verification of the corrections made.

INFO     | Exported table: ResponseCBsCorrected_compare_ResponseCBs. 1755 total rows exported to XLSX across 1 sheets.

Table: ResponseCBsCorrected_compare_ResponseCBs

LogFileName SubjectName SubjectNumber ExperimentID TrialID TrialNumber OriginalResponseType CorrectedResponseType TrialCorrectedFlag ResponseID EventDateTime OriginalResponseTime CorrectedResponseTime ResponseCorrectedFlag
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Low Cardinality columns for ResponseCBsCorrected_compare_ResponseCBs

column_name distinct_count unique_values_with_counts
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Check that the additions from the Corrections workbook align for the trials and responses.

LogFileName SubjectName SubjectNumber ExperimentID TrialID TrialNumber OriginalResponseType CorrectedResponseType TrialCorrectedFlag ResponseID EventDateTime OriginalResponseTime CorrectedResponseTime ResponseCorrectedFlag
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Summary of changes to Responses

The following queries looks the the rows within the ResponseCBsCorrected_compare_ResponseCBs where one or both of the ResponseType or ResponseTime have changed.

INFO     | Exported query results: Summary of changes to Responses. 164 total rows exported to XLSX across 1 sheets.

Query Results: Summary of changes to Responses

LogFileName SubjectName SubjectNumber ExperimentID TrialID TrialNumber OriginalResponseType CorrectedResponseType TrialCorrectedFlag ResponseID EventDateTime OriginalResponseTime CorrectedResponseTime ResponseCorrectedFlag
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

EventCBsCorrected table

  • Purpose: Combines existing events with placeholder events for new trials.
  • Approach: Uses CTEs for new trial events and existing events, then combines them.
  • Structure: View that includes all original events and adds placeholder events for new trials.
  • Features: Assigns new EventIDs to added events and includes a CorrectedFlag column.
INFO     | Exported table: EventCBsCorrected. 6181 total rows exported to XLSX across 1 sheets.

Table: EventCBsCorrected

EventID TrialID EventType EventTime ElapsedTime CorrectedFlag
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Low Cardinality columns for EventCBsCorrected

column_name distinct_count unique_values_with_counts
Loading ITables v2.1.4 from the init_notebook_mode cell... (need help?)

Diff - EventCBs vs EventCBsCorrected

┌──────────────────────┬───────┐
│      Difference      │ Count │
│       varchar        │ int64 │
├──────────────────────┼───────┤
│ Additions (new rows) │    20 │
│ In Both, Identical   │  6161 │
└──────────────────────┴───────┘

Summary of CB database tables / views

Name                           Type            Size            Dependencies
--------------------------------------------------------------------------------
ActualCorrections              VIEW            377 x 19        CorrectionWithExistingValues, Corrections
AddExperiments                 VIEW            1 x 7           Corrections, ExperimentCBs
AddTrials                      VIEW            20 x 9          AddExperiments, Corrections, ExperimentCBs, ExperimentCBsCorrected
CorrectTrials                  VIEW            3171 x 10       Corrections, CorrectionsSplit, ExperimentCBs, ExperimentCBsCorrected, TrialCBs
CorrectedResponses             VIEW            1735 x 7        Corrections, CorrectionsSplit, ExperimentCBs, ExperimentCBsCorrected, ResponseCBs, TrialCBs, TrialCBsCorrected
CorrectionWithExistingValues   VIEW            612 x 19        Corrections, CorrectionsSplit, ExperimentCBs, ResponseCBs, TrialCBs
Corrections                    BASE TABLE      306 x 13        None
CorrectionsSplit               VIEW            612 x 16        Corrections
EventCBs                       BASE TABLE      6161 x 5        None
EventCBsCorrected              VIEW            6181 x 6        Corrections, EventCBs, TrialCBs, TrialCBsCorrected
ExperimentCBs                  BASE TABLE      183 x 12        None
ExperimentCBsCorrected         VIEW            184 x 13        AddExperiments, ExperimentCBs, ReformattedAddExperiments
ReformattedAddExperiments      VIEW            1 x 13          AddExperiments, ExperimentCBs, SubjectCBs
ResponseCBs                    BASE TABLE      1735 x 4        None
ResponseCBsCorrected           VIEW            1755 x 5        AddExperiments, Corrections, CorrectionsSplit, ExperimentCBs, ExperimentCBsCorrected, ResponseCBs, TrialCBs, TrialCBsCorrected
ResponseCBsCorrected_compare_ResponseCBs VIEW            1755 x 14       ExperimentCBs, ExperimentCBsCorrected, ResponseCBs, ResponseCBsCorrected, TrialCBs, TrialCBsCorrected
SubjectCBs                     BASE TABLE      21 x 5          None
TrialCBs                       BASE TABLE      3171 x 9        None
TrialCBsCorrected              VIEW            3191 x 10       AddTrials, Corrections, CorrectionsSplit, ExperimentCBs, ExperimentCBsCorrected, TrialCBs
TrialCBsCorrected_compare_TrialCBs VIEW            3191 x 15       ExperimentCBs, ExperimentCBsCorrected, ResponseCBs, ResponseCBsCorrected, TrialCBs, TrialCBsCorrected
┌───────────────┬─────────────┐
│  dependency   │  dependant  │
│    varchar    │   varchar   │
├───────────────┼─────────────┤
│ ExperimentCBs │ TrialCBs    │
│ TrialCBs      │ ResponseCBs │
│ TrialCBs      │ EventCBs    │
└───────────────┴─────────────┘
┌────────────────────────┬──────────────┬─────────────┬────────────┬───────────────┬───────────┬─────────┬───────────────────────┬──────────┬───────────┬─────────────────┬────────────────┬──────────────┬─────────────┬────────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│     database_name      │ database_oid │ schema_name │ schema_oid │  table_name   │ table_oid │ comment │         tags          │ internal │ temporary │ has_primary_key │ estimated_size │ column_count │ index_count │ check_constraint_count │                                                                                                                                                      sql                                                                                                                                                       │
│        varchar         │    int64     │   varchar   │   int64    │    varchar    │   int64   │ varchar │ map(varchar, varchar) │ boolean  │  boolean  │     boolean     │     int64      │    int64     │    int64    │         int64          │                                                                                                                                                    varchar                                                                                                                                                     │
├────────────────────────┼──────────────┼─────────────┼────────────┼───────────────┼───────────┼─────────┼───────────────────────┼──────────┼───────────┼─────────────────┼────────────────┼──────────────┼─────────────┼────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Experiments_CB_2023_Q4 │         1146 │ main        │       1148 │ Corrections   │      1153 │ NULL    │ {}                    │ false    │ false     │ false           │            306 │           13 │           0 │                      0 │ CREATE TABLE Corrections(Cohort VARCHAR, ExperimentDate DATE, SessionType VARCHAR, SubjectName VARCHAR, TrialNumber INTEGER, TrialType VARCHAR, ResponseTime DOUBLE, ResponseType VARCHAR, "Comment" VARCHAR, LogFilename VARCHAR, CorrectTable VARCHAR, CorrectField VARCHAR, CorrectValue VARCHAR);          │
│ Experiments_CB_2023_Q4 │         1146 │ main        │       1148 │ EventCBs      │      1186 │ NULL    │ {}                    │ false    │ false     │ true            │           6161 │            5 │           2 │                      0 │ CREATE TABLE EventCBs(EventID INTEGER PRIMARY KEY, TrialID INTEGER, EventType VARCHAR, EventTime TIMESTAMP, ElapsedTime DOUBLE, FOREIGN KEY (TrialID) REFERENCES TrialCBs(TrialID));                                                                                                                           │
│ Experiments_CB_2023_Q4 │         1146 │ main        │       1148 │ ExperimentCBs │      1172 │ NULL    │ {}                    │ false    │ false     │ true            │            183 │           12 │           1 │                      0 │ CREATE TABLE ExperimentCBs(ExperimentID INTEGER PRIMARY KEY, Cohort VARCHAR, SubjectName VARCHAR, SubjectNumber INTEGER, SessionNumber INTEGER, SessionType VARCHAR, ExperimentType VARCHAR, Direction VARCHAR, "Comment" VARCHAR, DateTime TIMESTAMP, LogFileName VARCHAR, Parameters JSON);                  │
│ Experiments_CB_2023_Q4 │         1146 │ main        │       1148 │ ResponseCBs   │      1205 │ NULL    │ {}                    │ false    │ false     │ true            │           1735 │            4 │           2 │                      0 │ CREATE TABLE ResponseCBs(ResponseID INTEGER PRIMARY KEY, TrialID INTEGER, ResponseTime DOUBLE, EventDateTime TIMESTAMP, FOREIGN KEY (TrialID) REFERENCES TrialCBs(TrialID));                                                                                                                                   │
│ Experiments_CB_2023_Q4 │         1146 │ main        │       1148 │ SubjectCBs    │      1165 │ NULL    │ {}                    │ false    │ false     │ false           │             21 │            5 │           0 │                      0 │ CREATE TABLE SubjectCBs(SubjectName VARCHAR, SubjectNumber INTEGER, Cohort VARCHAR, Direction VARCHAR, TotalExperiments BIGINT);                                                                                                                                                                               │
│ Experiments_CB_2023_Q4 │         1146 │ main        │       1148 │ TrialCBs      │      1206 │ NULL    │ {}                    │ false    │ false     │ true            │           3171 │            9 │           2 │                      0 │ CREATE TABLE TrialCBs(TrialID INTEGER PRIMARY KEY, ExperimentID INTEGER, TrialNumber INTEGER, ResponseType VARCHAR, TrialSubType VARCHAR, CriterionType VARCHAR, CriterionCount INTEGER, TrialStartTime TIMESTAMP, TrialEndTime TIMESTAMP, FOREIGN KEY (ExperimentID) REFERENCES ExperimentCBs(ExperimentID)); │
└────────────────────────┴──────────────┴─────────────┴────────────┴───────────────┴───────────┴─────────┴───────────────────────┴──────────┴───────────┴─────────────────┴────────────────┴──────────────┴─────────────┴────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌─────────┬───────┬──────────┬────────────┬──────────┬─────────────┬─────────┐
│ classid │ objid │ objsubid │ refclassid │ refobjid │ refobjsubid │ deptype │
│  int64  │ int64 │  int32   │   int64    │  int64   │    int32    │ varchar │
├─────────┼───────┼──────────┼────────────┼──────────┼─────────────┼─────────┤
│       0 │  1206 │        0 │          0 │     1186 │           0 │ n       │
│       0 │  1206 │        0 │          0 │     1205 │           0 │ n       │
│       0 │  1172 │        0 │          0 │     1206 │           0 │ n       │
└─────────┴───────┴──────────┴────────────┴──────────┴─────────────┴─────────┘
INFO     | Database connection closed