WeMWish committed
Commit 557ed35 · 1 Parent(s): 2ad0e14

Fix infinite loop bug in literature search system


- Fix JSON detection regex in GenerationAgent (double backslash escaping issue; sketched below)
- Implement structure-based literature data detection instead of text patterns
- Add operation tracking to prevent duplicate code execution
- Add comprehensive debugging logs for conversation history flow
- Implement proper loop control with attempt limits
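A minimal standalone sketch of the escaping issue behind the first fix (the fenced-JSON sample string is illustrative, not taken from the repository): in a Python raw string, `\\s` reaches the regex engine as a literal backslash followed by `s`, so the pattern never matches ordinary whitespace and fenced JSON blocks go undetected.

```python
import re

fenced = '```json\n{"status": "CODE_COMPLETE"}\n```'

# Broken: in a raw string, r"\\s" is a literal backslash plus "s",
# so the fenced JSON block is never detected.
print(re.search(r"```json\\s*(.*?)\\s*```", fenced, re.DOTALL))  # -> None

# Fixed: r"\s" matches the newlines around the JSON payload.
match = re.search(r"```json\s*(.*?)\s*```", fenced, re.DOTALL)
print(match.group(1))  # -> {"status": "CODE_COMPLETE"}
```

And a condensed sketch of the structure-based detection that replaces text-pattern matching; the indicator keys and the two-match threshold come from the `generation_agent.py` diff below, while the helper name is ours for illustration:

```python
def looks_like_literature_payload(items) -> bool:
    # Literature results arrive as lists of paper-like dicts; require at
    # least two indicator fields instead of matching tool names in raw text.
    indicators = {"title", "authors", "abstract", "doi", "source_api"}
    if not isinstance(items, list) or not items or not isinstance(items[0], dict):
        return False
    return len(indicators & set(items[0].keys())) >= 2
```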

.dockerignore CHANGED
@@ -1,40 +1,40 @@
- .git
- .Rproj.user
- .Rhistory
- .RData
- .Ruserdata
-
- # Ignore API key file
- # api_key.txt
-
- # Ignore Python cache
- __pycache__/
- *.pyc
- *.pyo
- *.pyd
-
- # Ignore R cache/build artifacts if any
- *.rds
- *.Renviron
-
- # OS-specific files
- .DS_Store
- Thumbs.db
-
- # IDE specific folders
- .vscode/
- .idea/
-
- # Logs and temporary files
- traces/
- *.log
- temp/
-
- # Ignore local R/reticulate config
- .Rprofile
-
- # Ignore virtual environments
- venv/
- .venv/
- ENV/
  env/

+ .git
+ .Rproj.user
+ .Rhistory
+ .RData
+ .Ruserdata
+
+ # Ignore API key file
+ # api_key.txt
+
+ # Ignore Python cache
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.pyd
+
+ # Ignore R cache/build artifacts if any
+ *.rds
+ *.Renviron
+
+ # OS-specific files
+ .DS_Store
+ Thumbs.db
+
+ # IDE specific folders
+ .vscode/
+ .idea/
+
+ # Logs and temporary files
+ traces/
+ *.log
+ temp/
+
+ # Ignore local R/reticulate config
+ .Rprofile
+
+ # Ignore virtual environments
+ venv/
+ .venv/
+ ENV/
  env/
.gitignore CHANGED
@@ -1,12 +1,11 @@
- __pycache__/
- api_key.txt
- # Ignore local R/reticulate config
- .Rprofile
-
- # Ignore virtual environments
- venv/
- .venv/
- ENV/
- env/
- *.md
-

+ __pycache__/
+ api_key.txt
+ # Ignore local R/reticulate config
+ .Rprofile
+
+ # Ignore virtual environments
+ venv/
+ .venv/
+ ENV/
+ env/
+
CHANGELOG.md CHANGED
@@ -1,5 +1,32 @@
  # TaijiChat Performance Optimization Changelog

  All notable changes to the TaijiChat performance optimization project are documented in this file.

  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

  # TaijiChat Performance Optimization Changelog

+ ## [Unreleased]
+
+ ### Added
+ - Literature search toggle button in chat interface
+   - Users can now explicitly enable/disable external literature search
+   - Button placed below chat input for easy access
+   - Default state: external literature search disabled
+   - Visual feedback with active/inactive button states
+
+ ### Changed
+ - Replaced automatic literature search behavior with user-controlled toggle
+ - Literature search now defaults to disabled unless explicitly enabled by user
+
+ ### Fixed
+ - Fixed infinite loop bug in literature search functionality
+ - Fixed JSON detection regex in GenerationAgent (corrected escaping)
+ - Fixed literature search data detection using structure-based analysis
+ - Added operation tracking to prevent duplicate executions
+ - Added comprehensive debugging for conversation history and loop detection
+
+ ### Removed
+ - Legacy literature confirmation dialog system
+ - Automatic LLM-based detection of literature search intent
+ - Post-query literature confirmation prompts
+
+ ---
+
  All notable changes to the TaijiChat performance optimization project are documented in this file.

  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
CLAUDE.md ADDED
@@ -0,0 +1,157 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ TaijiChat is a Shiny web application that combines R and Python to provide an interactive chat interface for analyzing transcription factor data from T cell states research. The application uses a multi-agent architecture with OpenAI GPT models to generate insights and visualizations from genomics datasets.
+
+ ## Key Architecture
+
+ ### Multi-Agent System
+ The application uses a specialized agent architecture for handling user queries:
+
+ - **ManagerAgent** (`agents/manager_agent.py`): Central orchestrator that manages conversation history, file uploads, and coordinates between specialized agents
+ - **GenerationAgent** (`agents/generation_agent.py`): Creates execution plans using a structured 13-step reasoning process
+ - **SupervisorAgent** (`agents/supervisor_agent.py`): Reviews generated code for safety and compliance before execution
+ - **ExecutorAgent** (`agents/executor_agent.py`): Executes approved Python code in a restricted environment
+
+ ### Technology Stack
+ - **R (Shiny)**: Frontend web interface and server logic
+ - **Python**: Backend agents and data processing tools
+ - **reticulate**: R-Python integration bridge
+ - **OpenAI API**: Powers the intelligent agents
+ - **Docker**: Containerization for deployment
+
+ ### Data Sources
+ The application works with transcription factor research data stored in:
+ - `/www/tablePagerank/`: TF PageRank scores for different cell states
+ - `/www/waveanalysis/`: Wave analysis data and visualizations
+ - `/www/TFcorintextrm/`: TF correlation data and network graphs
+ - `/www/tfcommunities/`: Community analysis results
+
+ ## Development Commands
+
+ ### Running the Application
+
+ **Docker (Recommended):**
+ ```bash
+ docker build -t taijichat .
+ docker run -p 7860:7860 taijichat
+ ```
+
+ **Local R Development:**
+ ```r
+ # Ensure Python environment is configured in ui.R first
+ shiny::runApp('.', host='0.0.0.0', port=7860)
+ ```
+
+ ### Environment Setup
+
+ **API Key Configuration:**
+ - Set `OPENAI_API_KEY` environment variable, OR
+ - Create `api_key.txt` file in project root
+
+ **Python Environment:**
+ Configure reticulate in `ui.R` by uncommenting one of:
+ ```r
+ # Option 1: Python executable path
+ reticulate::use_python("/path/to/python", required = TRUE)
+
+ # Option 2: Virtual environment
+ reticulate::use_virtualenv("venv_name", required = TRUE)
+
+ # Option 3: Conda environment
+ reticulate::use_condaenv("conda_env_name", required = TRUE)
+ ```
+
+ **Install Python Dependencies:**
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Performance Features
+
+ **Async Processing (Default):**
+ - Set `TAIJICHAT_USE_ASYNC=TRUE` to enable async agents (default)
+ - Set `TAIJICHAT_USE_ASYNC=FALSE` to use synchronous agents
+
+ **Cache Management:**
+ ```r
+ # Check cache statistics
+ reticulate::py_run_string("
+ from agents.smart_cache import get_cache_stats
+ print('Cache Stats:', get_cache_stats())
+ ")
+ ```
+
+ ## Key Implementation Details
+
+ ### Agent Tool System
+ All data analysis functions are centralized in `tools/agent_tools.py`:
+ - Excel file discovery and schema caching
+ - Literature search across multiple sources (Semantic Scholar, PubMed, ArXiv)
+ - Image processing and visualization
+ - TF ranking and analysis functions
+
+ ### Safety and Security
+ - **Restricted Execution**: Python code runs in limited scope with only approved modules
+ - **Code Review**: SupervisorAgent validates all generated code before execution
+ - **Sandboxed Environment**: Only predefined tools and functions are accessible
+
+ ### R-Python Integration
+ - Uses reticulate for bidirectional communication
+ - R callbacks enable real-time progress updates from Python agents
+ - Shared conversation history between R frontend and Python backend
+
+ ### File Management
+ - Supports PDF upload with automatic conversion to images
+ - File IDs track uploaded documents throughout conversation
+ - Images stored in `/www/` with organized subdirectories
+
+ ## Important Considerations
+
+ ### Data Handling
+ - **Pre-ranked Tables**: Never re-sort TF ranking data - tables come pre-ranked by importance
+ - **Path Management**: All file paths are relative to project root via `BASE_WWW_PATH`
+ - **Caching**: 5-minute TTL with 100MB memory limit for performance
+
+ ### Code Generation Rules
+ - Only use functions from `tools.agent_tools` module
+ - No direct file system access outside `/www/` directory
+ - No external imports beyond approved modules in executor environment
+ - All generated code must pass SupervisorAgent safety review
+
+ ### UI Integration
+ - Chat interface uses custom JavaScript for real-time updates
+ - Lazy loading implemented for large image datasets
+ - Progress streaming shows agent reasoning steps to users
+
+ ## Troubleshooting
+
+ **Common Issues:**
+ - **reticulate errors**: Verify Python environment configuration in `ui.R`
+ - **Import failures**: Ensure all requirements are installed in configured Python environment
+ - **API errors**: Check `OPENAI_API_KEY` is set correctly
+ - **Performance issues**: Enable async mode with `TAIJICHAT_USE_ASYNC=TRUE`
+
+ **Asset Optimization:**
+ - Images optimized to 49% of original size for faster loading
+ - Backup of original assets available in `www_backup_original/`
+
+ ## Literature Search Toggle Feature
+
+ ### **Overview**
+ TaijiChat includes a toggle button that allows users to control external literature search functionality. This replaces the previous automatic detection and confirmation system.
+
+ ### **User Interface**
+ - **Location**: Literature toggle button appears below the chat input area
+ - **Default State**: External literature search is **disabled** by default
+ - **Visual Feedback**: Button changes color and text to indicate enabled/disabled state
+ - **Controls**: Click to enable/disable external literature search
+
+ ### **Technical Implementation**
+ - **Frontend**: Button state managed via JavaScript and CSS styling
+ - **Backend**: Literature preference passed from R to Python agents
+ - **Agent Integration**: `ManagerAgent.process_single_query_with_preferences()` method handles literature control
+ - **Internal Data**: Paper-based analysis (internal dataset) remains always enabled
Dockerfile CHANGED
@@ -1,48 +1,48 @@
- # Base image with R and Python
- FROM rocker/r-ver:4.3.2
-
- # Set the Python executable path for reticulate
- ENV RETICULATE_PYTHON /usr/bin/python3
-
- # Install system dependencies
- RUN apt-get update && apt-get install -y \
-     python3-pip \
-     python3-dev \
-     python3-venv \
-     libpng-dev \
-     libxml2-dev \
-     libssl-dev \
-     libcurl4-openssl-dev \
-     && apt-get clean && rm -rf /var/lib/apt/lists/*
-
- # Install R packages
- # Added .libPaths() to ensure installation in the main library site
- RUN R -e "print(.libPaths()); install.packages(c('shiny', 'readxl', 'DT', 'dplyr', 'reticulate', 'shinythemes', 'png', 'shinyjs', 'digest'), repos='http://cran.rstudio.com/', lib=.libPaths()[1])"
-
- # Verify reticulate installation
- RUN R -e "if (!requireNamespace('reticulate', quietly = TRUE)) { stop('reticulate package not found after installation') } else { print(paste('reticulate version:', packageVersion('reticulate'))) }"
-
- # Verify png installation
- RUN R -e "if (!requireNamespace('png', quietly = TRUE)) { stop('png package not found after installation') } else { print(paste('png version:', packageVersion('png'))) }"
-
- # Verify shinyjs installation
- RUN R -e "if (!requireNamespace('shinyjs', quietly = TRUE)) { stop('shinyjs package not found after installation') } else { print(paste('shinyjs version:', packageVersion('shinyjs'))) }"
-
- # Verify digest installation
- RUN R -e "if (!requireNamespace('digest', quietly = TRUE)) { stop('digest package not found after installation') } else { print(paste('digest version:', packageVersion('digest'))) }"
-
- # Install Python packages
- COPY requirements.txt /app/requirements.txt
- RUN pip3 install --no-cache-dir -r /app/requirements.txt
-
- # Create app directory
- WORKDIR /app
-
- # Copy application files
- COPY . /app
-
- # Expose port
- EXPOSE 7860
-
- # Run the application
  CMD ["R", "-e", "shiny::runApp('/app', host='0.0.0.0', port=7860)"]

+ # Base image with R and Python
+ FROM rocker/r-ver:4.3.2
+
+ # Set the Python executable path for reticulate
+ ENV RETICULATE_PYTHON /usr/bin/python3
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     python3-pip \
+     python3-dev \
+     python3-venv \
+     libpng-dev \
+     libxml2-dev \
+     libssl-dev \
+     libcurl4-openssl-dev \
+     && apt-get clean && rm -rf /var/lib/apt/lists/*
+
+ # Install R packages
+ # Added .libPaths() to ensure installation in the main library site
+ RUN R -e "print(.libPaths()); install.packages(c('shiny', 'readxl', 'DT', 'dplyr', 'reticulate', 'shinythemes', 'png', 'shinyjs', 'digest'), repos='http://cran.rstudio.com/', lib=.libPaths()[1])"
+
+ # Verify reticulate installation
+ RUN R -e "if (!requireNamespace('reticulate', quietly = TRUE)) { stop('reticulate package not found after installation') } else { print(paste('reticulate version:', packageVersion('reticulate'))) }"
+
+ # Verify png installation
+ RUN R -e "if (!requireNamespace('png', quietly = TRUE)) { stop('png package not found after installation') } else { print(paste('png version:', packageVersion('png'))) }"
+
+ # Verify shinyjs installation
+ RUN R -e "if (!requireNamespace('shinyjs', quietly = TRUE)) { stop('shinyjs package not found after installation') } else { print(paste('shinyjs version:', packageVersion('shinyjs'))) }"
+
+ # Verify digest installation
+ RUN R -e "if (!requireNamespace('digest', quietly = TRUE)) { stop('digest package not found after installation') } else { print(paste('digest version:', packageVersion('digest'))) }"
+
+ # Install Python packages
+ COPY requirements.txt /app/requirements.txt
+ RUN pip3 install --no-cache-dir -r /app/requirements.txt
+
+ # Create app directory
+ WORKDIR /app
+
+ # Copy application files
+ COPY . /app
+
+ # Expose port
+ EXPOSE 7860
+
+ # Run the application
  CMD ["R", "-e", "shiny::runApp('/app', host='0.0.0.0', port=7860)"]
R/caching.R CHANGED
@@ -1,101 +1,101 @@
- # caching.R
-
- # Directory to store cache files
- CACHE_DIR <- "./cache_data"
-
- # Ensure cache directory exists
- if (!dir.exists(CACHE_DIR)) {
-   dir.create(CACHE_DIR, recursive = TRUE)
- }
-
- #' Generate a cache key from an operation name and its arguments.
- #'
- #' @param operation_name A string identifying the operation.
- #' @param ... Arguments to the operation, used to create a unique hash.
- #' @return A string representing the cache key.
- generate_cache_key <- function(operation_name, ...) {
-   # Create a list of arguments
-   args_list <- list(...)
-
-   # Combine operation name and a digest of the arguments
-   # Ensure consistent ordering of named arguments for consistent hashing
-   if (length(args_list) > 0) {
-     if (!is.null(names(args_list))) {
-       args_list <- args_list[order(names(args_list))]
-     }
-     # Use deparse and digest for complex arguments
-     args_digest <- digest::digest(lapply(args_list, deparse))
-     key_string <- paste(operation_name, args_digest, sep = "_")
-   } else {
-     key_string <- operation_name
-   }
-
-   # Sanitize the key to be a valid filename
-   key_string <- gsub("[^a-zA-Z0-9_.-]", "_", key_string)
-   return(paste0(key_string, ".rds"))
- }
-
- #' Retrieve an item from the cache.
- #'
- #' @param key The cache key (typically generated by generate_cache_key).
- #' @param max_age_seconds The maximum age of the cache file in seconds.
- #'   If the cache file is older, it's considered stale.
- #'   Default is NULL (no age check).
- #' @return The cached item, or NULL if not found or stale.
- get_cached_item <- function(key, max_age_seconds = NULL) {
-   cache_file_path <- file.path(CACHE_DIR, key)
-   if (file.exists(cache_file_path)) {
-     if (!is.null(max_age_seconds)) {
-       file_info <- file.info(cache_file_path)
-       if (difftime(Sys.time(), file_info$mtime, units = "secs") > max_age_seconds) {
-         # Cache is stale
-         message(paste("Cache stale for key:", key, "- Recomputing."))
-         return(NULL)
-       }
-     }
-     message(paste("Cache hit for key:", key))
-     return(readRDS(cache_file_path))
-   } else {
-     message(paste("Cache miss for key:", key))
-     return(NULL)
-   }
- }
-
- #' Save an item to the cache.
- #'
- #' @param key The cache key.
- #' @param value The item to save.
- save_cached_item <- function(key, value) {
-   if (is.null(value)) {
-     # Avoid saving NULLs if an operation truly returns NULL
-     # Or handle as an explicit cache clear if needed
-     message(paste("Skipping saving NULL value to cache for key:", key))
-     return()
-   }
-   cache_file_path <- file.path(CACHE_DIR, key)
-   tryCatch({
-     saveRDS(value, file = cache_file_path)
-     message(paste("Saved item to cache. Key:", key))
-   }, error = function(e) {
-     warning(paste("Error saving item to cache for key:", key, ":", e$message))
-   })
- }
-
- #' Clear the entire cache directory.
- clear_all_cache <- function() {
-   files_in_cache <- list.files(CACHE_DIR, full.names = TRUE)
-   if (length(files_in_cache) > 0) {
-     removed_files <- file.remove(files_in_cache)
-     message(paste("Cleared", sum(removed_files), "files from cache."))
-   } else {
-     message("Cache directory is already empty.")
-   }
- }
-
- # Ensure digest package is available
- if (!requireNamespace("digest", quietly = TRUE)) {
-   # This is a server-side script, so direct installation might be okay
-   # but ideally should be in requirements or Dockerfile.
-   # For now, just message that it's needed.
-   message("Package 'digest' is not installed. Cache key generation might not be robust. Please install it.")
  }

+ # caching.R
+
+ # Directory to store cache files
+ CACHE_DIR <- "./cache_data"
+
+ # Ensure cache directory exists
+ if (!dir.exists(CACHE_DIR)) {
+   dir.create(CACHE_DIR, recursive = TRUE)
+ }
+
+ #' Generate a cache key from an operation name and its arguments.
+ #'
+ #' @param operation_name A string identifying the operation.
+ #' @param ... Arguments to the operation, used to create a unique hash.
+ #' @return A string representing the cache key.
+ generate_cache_key <- function(operation_name, ...) {
+   # Create a list of arguments
+   args_list <- list(...)
+
+   # Combine operation name and a digest of the arguments
+   # Ensure consistent ordering of named arguments for consistent hashing
+   if (length(args_list) > 0) {
+     if (!is.null(names(args_list))) {
+       args_list <- args_list[order(names(args_list))]
+     }
+     # Use deparse and digest for complex arguments
+     args_digest <- digest::digest(lapply(args_list, deparse))
+     key_string <- paste(operation_name, args_digest, sep = "_")
+   } else {
+     key_string <- operation_name
+   }
+
+   # Sanitize the key to be a valid filename
+   key_string <- gsub("[^a-zA-Z0-9_.-]", "_", key_string)
+   return(paste0(key_string, ".rds"))
+ }
+
+ #' Retrieve an item from the cache.
+ #'
+ #' @param key The cache key (typically generated by generate_cache_key).
+ #' @param max_age_seconds The maximum age of the cache file in seconds.
+ #'   If the cache file is older, it's considered stale.
+ #'   Default is NULL (no age check).
+ #' @return The cached item, or NULL if not found or stale.
+ get_cached_item <- function(key, max_age_seconds = NULL) {
+   cache_file_path <- file.path(CACHE_DIR, key)
+   if (file.exists(cache_file_path)) {
+     if (!is.null(max_age_seconds)) {
+       file_info <- file.info(cache_file_path)
+       if (difftime(Sys.time(), file_info$mtime, units = "secs") > max_age_seconds) {
+         # Cache is stale
+         message(paste("Cache stale for key:", key, "- Recomputing."))
+         return(NULL)
+       }
+     }
+     message(paste("Cache hit for key:", key))
+     return(readRDS(cache_file_path))
+   } else {
+     message(paste("Cache miss for key:", key))
+     return(NULL)
+   }
+ }
+
+ #' Save an item to the cache.
+ #'
+ #' @param key The cache key.
+ #' @param value The item to save.
+ save_cached_item <- function(key, value) {
+   if (is.null(value)) {
+     # Avoid saving NULLs if an operation truly returns NULL
+     # Or handle as an explicit cache clear if needed
+     message(paste("Skipping saving NULL value to cache for key:", key))
+     return()
+   }
+   cache_file_path <- file.path(CACHE_DIR, key)
+   tryCatch({
+     saveRDS(value, file = cache_file_path)
+     message(paste("Saved item to cache. Key:", key))
+   }, error = function(e) {
+     warning(paste("Error saving item to cache for key:", key, ":", e$message))
+   })
+ }
+
+ #' Clear the entire cache directory.
+ clear_all_cache <- function() {
+   files_in_cache <- list.files(CACHE_DIR, full.names = TRUE)
+   if (length(files_in_cache) > 0) {
+     removed_files <- file.remove(files_in_cache)
+     message(paste("Cleared", sum(removed_files), "files from cache."))
+   } else {
+     message("Cache directory is already empty.")
+   }
+ }
+
+ # Ensure digest package is available
+ if (!requireNamespace("digest", quietly = TRUE)) {
+   # This is a server-side script, so direct installation might be okay
+   # but ideally should be in requirements or Dockerfile.
+   # For now, just message that it's needed.
+   message("Package 'digest' is not installed. Cache key generation might not be robust. Please install it.")
  }
WORKFLOW_CHANGES.md ADDED
@@ -0,0 +1,287 @@
+ # TaijiChat Workflow Changes: Literature Dialog Removal
+
+ ## Overview
+
+ This document outlines the major changes made to the TaijiChat multi-agent system to improve user experience by removing the upfront literature confirmation dialog and implementing a post-analysis literature exploration approach.
+
+ ## Problem Statement
+
+ ### Previous Workflow Issues:
+ 1. **User Friction**: Every query was blocked by a literature preference dialog before processing
+ 2. **Interruption of Flow**: Users had to make decisions before seeing any analysis results
+ 3. **Unclear Context**: Users couldn't make informed decisions about literature sources without seeing initial results
+ 4. **Pattern Matching Limitations**: Hardcoded keyword matching was unreliable for determining user intent
+
+ ## Solution Design
+
+ ### New Workflow Philosophy:
+ - **Analyze First, Explore Later**: Provide immediate value with optional deeper exploration
+ - **LLM-Powered Classification**: Use AI reasoning instead of pattern matching for intent detection
+ - **Clear Source Distinction**: Differentiate between primary paper (guaranteed) vs external literature (supplementary)
+ - **Progressive Disclosure**: Natural conversation flow with contextual followup options
+
+ ## Implementation Details
+
+ ### 1. ManagerAgent Changes (`agents/manager_agent.py`)
+
+ #### **Removed Components:**
+ ```python
+ # REMOVED: Literature confirmation dialog
+ def _request_literature_confirmation_upfront(self, user_query: str) -> str:
+     # This entire method was removed
+ ```
+
+ #### **Modified Components:**
+ ```python
+ def _process_turn(self, user_query_text: str) -> tuple:
+     # OLD: Asked for literature preferences before processing
+     # NEW: Process directly with default settings (both sources enabled)
+     response_text = self._process_with_literature_preferences(
+         user_query_text,
+         use_paper=True,
+         use_external_literature=True
+     )
+     return response_text, False, None
+ ```
+
+ #### **Enhanced Features:**
+ - Proper conversation history management
+ - Direct processing without interruption
+ - Maintains all existing security features
+
+ ### 2. GenerationAgent Changes (`agents/generation_agent.py`)
+
+ #### **Enhanced 13-Step Reasoning Process:**
+ ```
+ 1. Analyze the user query in detail
+ 2. Analyze the conversation history if there's any
+ 3. Analyze images, paper, data according to the plan if there's any provided
+ 4. Analyze errors from previous attempts if there's any
+ 5. Read the paper description to understand what the paper is about
+ 6. **NEW: QUERY TYPE CLASSIFICATION:**
+    - Is this a NEW_TASK (fresh analytical question) or FOLLOWUP_REQUEST (responding to literature offer)?
+    - If FOLLOWUP_REQUEST, what does user want: PRIMARY_PAPER, EXTERNAL_LITERATURE, or COMPREHENSIVE?
+    - Base decision on conversation context and user intent, not keywords
+    - Consider if previous response contained "Explore Supporting Literature" section
+ 7. Read the tools documentation thoroughly
+ 8. Decide which tools can be helpful when answering the query
+ 9. Read the data documentation
+ 10. Decide which datasets are relevant to the user query
+ 11. Decide whether the user query can be solved by paper or tools or data or a combination
+ 12. Decide whether the user query is about image(s)
+ 13. Put everything together to make a comprehensive plan
+ ```
+
+ #### **New Helper Methods:**
+ ```python
+ def _check_for_literature_offer(self, conversation_history: list) -> bool:
+     """Check if previous response contained literature exploration offer."""
+
+ def _classify_query_type(self, user_query: str, conversation_history: list) -> dict:
+     """Provide context for LLM-based query classification."""
+
+ def _append_literature_offer(self, explanation: str) -> str:
+     """Append literature exploration options to NEW_TASK responses."""
+ ```
+
+ #### **Response Format Rules:**
+ - **NEW_TASK**: Provide analysis + literature exploration offer
+ - **FOLLOWUP_REQUEST**: Execute requested literature analysis without new offer
+
+ ### 3. Literature Offer Format
+
+ #### **Clear Source Distinction:**
+ ```markdown
+ ---
+
+ **Explore Supporting Literature:**
+
+ 📄 **Primary Paper**: Analyze the foundational research paper this website is based on for additional context about these findings.
+
+ 🔍 **Recent Publications**: Search external academic databases for the latest research on these topics.
+
+ 📚 **Comprehensive**: Get insights from both the foundational paper and recent literature.
+
+ *Note: External literature serves as supplementary information only.*
+ ```
+
+ #### **Key Benefits:**
+ - **Primary Paper**: Vetted, guaranteed accuracy, foundational to website
+ - **External Literature**: Recent, supplementary, not guaranteed by website
+ - **User Choice**: Informed decision about source reliability vs recency
+
+ ## Workflow Examples
+
+ ### Example 1: Fresh Query → Analysis + Offer
+
+ **User Input:** *"What are the top 5 TEXterm-specific TFs?"*
+
+ **System Flow:**
+ 1. ManagerAgent processes immediately (no dialog)
+ 2. GenerationAgent Step 6: Classification → NEW_TASK
+ 3. Execute TF data analysis
+ 4. Return results with literature exploration offer
+
+ **Expected Response:**
+ ```
+ The top 5 TEXterm-specific transcription factors are:
+ 1. Zscan20 (p-value: 0.001)
+ 2. Jdp2 (p-value: 0.002)
+ 3. Zfp324 (p-value: 0.003)
+ 4. Batf (p-value: 0.004)
+ 5. Ikzf1 (p-value: 0.005)
+
+ These rankings are based on statistical significance from the dataset analysis.
+
+ ---
+
+ **Explore Supporting Literature:**
+
+ 📄 **Primary Paper**: Analyze the foundational research paper this website is based on for additional context about these TFs.
+
+ 🔍 **Recent Publications**: Search external academic databases for the latest research on these transcription factors.
+
+ 📚 **Comprehensive**: Get insights from both the foundational paper and recent literature.
+
+ *Note: External literature serves as supplementary information only.*
+ ```
+
+ ### Example 2: Literature Followup → Targeted Analysis
+
+ **User Input:** *"Search recent publications about these TFs"*
+
+ **System Flow:**
+ 1. GenerationAgent detects previous literature offer
+ 2. Step 6: Classification → FOLLOWUP_REQUEST, intent: EXTERNAL_LITERATURE
+ 3. Execute literature search using previous TF context
+ 4. Return literature analysis (no new offer)
+
+ **Expected Response:**
+ ```
+ ## Recent Literature on TEXterm Transcription Factors
+
+ Based on external academic database search, here are key recent findings:
+
+ **Zscan20 in T Cell Exhaustion:**
+ Recent studies [1] demonstrate that Zscan20 acts as a master regulator of terminal exhaustion...
+
+ **Jdp2 Regulatory Networks:**
+ New research [2] reveals Jdp2's role in chromatin remodeling during exhaustion programming...
+
+ [Additional literature analysis with proper citations]
+
+ ## References
+ [1] Smith et al. (2023). Zscan20 controls T cell exhaustion pathways. Nature Immunology.
+ [2] Johnson et al. (2023). Jdp2 in immune regulation. Cell.
+
+ *This analysis is based on external literature sources and serves as supplementary information.*
+ ```
+
+ ### Example 3: Primary Paper Request → Paper Analysis
+
+ **User Input:** *"What does the foundational study say about these TFs?"*
+
+ **System Flow:**
+ 1. Step 6: Classification → FOLLOWUP_REQUEST, intent: PRIMARY_PAPER
+ 2. Analyze paper.pdf with previous TF context
+ 3. Return focused paper analysis
+
+ ## Technical Implementation
+
+ ### Query Classification Logic
+
+ The system uses LLM reasoning instead of pattern matching:
+
+ ```python
+ # Context provided to LLM for classification
+ classification_instructions = f"\\n\\nQUERY CLASSIFICATION CONTEXT:"
+ classification_instructions += f"\\n- Previous response had literature offer: {has_previous_offer}"
+ if has_previous_offer:
+     classification_instructions += "\\n- This query might be a FOLLOWUP_REQUEST for literature analysis"
+     classification_instructions += "\\n- Determine user intent: PRIMARY_PAPER, EXTERNAL_LITERATURE, or COMPREHENSIVE"
+     classification_instructions += "\\n- If FOLLOWUP_REQUEST, do NOT append literature offer to final response"
+ else:
+     classification_instructions += "\\n- This is likely a NEW_TASK requiring fresh analysis"
+     classification_instructions += "\\n- If status is CODE_COMPLETE, append literature offer to explanation"
+ ```
+
+ ### Conversation History Management
+
+ ```python
+ # ManagerAgent properly manages conversation state
+ def _process_with_literature_preferences(self, user_query: str, use_paper: bool, use_external_literature: bool) -> str:
+     # Process query and get response
+     final_response = final_plan_for_turn.get('explanation', 'Processing completed.')
+
+     # Add response to conversation history for future context
+     self.conversation_history.append({"role": "assistant", "content": final_response})
+
+     return final_response
+ ```
+
+ ## Benefits
+
+ ### 1. **Improved User Experience**
+ - **Immediate Response**: No blocking dialogs
+ - **Natural Flow**: Conversational interaction
+ - **Informed Decisions**: Literature choices made after seeing results
+
+ ### 2. **Better Intent Recognition**
+ - **LLM-Powered**: Semantic understanding vs keyword matching
+ - **Context-Aware**: Considers conversation history
+ - **Flexible**: Adapts to various user phrasings
+
+ ### 3. **Clear Information Hierarchy**
+ - **Primary Sources**: Guaranteed accuracy, foundational research
+ - **Supplementary Sources**: Recent literature, clearly marked as external
+ - **User Agency**: Informed choice about source reliability
+
+ ### 4. **Maintained Security**
+ - **All existing safeguards preserved**
+ - **SupervisorAgent**: Code review unchanged
+ - **ExecutorAgent**: Sandboxed execution unchanged
+ - **Literature preferences**: Still respected in execution
+
+ ## Testing
+
+ ### Test Scenarios Created:
+ 1. **Fresh Query Test**: Verify immediate analysis + literature offer
+ 2. **External Literature Followup**: Test FOLLOWUP_REQUEST classification
+ 3. **Primary Paper Followup**: Test paper analysis request
+ 4. **Conversation Context**: Verify proper history management
+
+ ### Test File: `test_workflow.py`
+ - Comprehensive workflow testing
+ - Conversation history verification
+ - Response format validation
+
+ ## Migration Notes
+
+ ### Backward Compatibility
+ - **R Interface**: `handle_literature_confirmation()` method marked as LEGACY but preserved
+ - **Existing Data**: All dataset access patterns unchanged
+ - **Security Model**: No changes to permission structure
+
+ ### Deployment Considerations
+ - **No breaking changes** to existing functionality
+ - **Enhanced user experience** without compromising security
+ - **Gradual rollout** possible through feature flags if needed
+
+ ## Future Enhancements
+
+ ### Potential Improvements:
+ 1. **Smart Context Extraction**: Better extraction of relevant terms from previous analysis for literature searches
+ 2. **Citation Quality**: Enhanced citation formatting and link validation
+ 3. **User Preferences**: Optional user settings to remember literature preferences
+ 4. **Analytics**: Track which literature options users choose most frequently
+
+ ## Conclusion
+
+ The new workflow successfully addresses the original user experience issues while maintaining all security and functionality requirements. The system now provides immediate value to users while offering natural pathways for deeper exploration, creating a more engaging and efficient interaction model.
+
+ Key success metrics:
+ - ✅ **Removed user friction**: No blocking dialogs
+ - ✅ **Maintained security**: All safeguards preserved
+ - ✅ **Improved classification**: LLM-based intent recognition
+ - ✅ **Clear information hierarchy**: Distinguished source types
+ - ✅ **Natural conversation flow**: Progressive disclosure model
agents/executor_agent.py CHANGED
@@ -1,81 +1,81 @@
- # agents/executor_agent.py
-
- import io
- import contextlib
- import json
- # IMPORTANT: We will dynamically import the tools module when needed for execution
- # to ensure it uses the version from the tools/ directory.
- # import tools.agent_tools as agent_tools # Avoid top-level import for now
-
- class ExecutorAgent:
-     def __init__(self, openai_api_key: str = None):
-         print("ExecutorAgent initialized.")
-         self.openai_api_key = openai_api_key
-
-     def execute_code(self, python_code: str) -> dict:
-         print(f"ExecutorAgent received code for execution:\n{python_code}")
-
-         # Dynamically import agent_tools from the tools directory
-         # This assumes main.py is run from the project root.
-         import sys
-         import os
-         # Add project root to sys.path to allow `import tools.agent_tools`
-         # This might be needed if executor_agent.py itself is run from a different context later,
-         # but for now, assuming standard Python module resolution from root where main.py is.
-         # script_dir = os.path.dirname(os.path.abspath(__file__))
-         # project_root = os.path.abspath(os.path.join(script_dir, ".."))
-         # if project_root not in sys.path:
-         #     sys.path.insert(0, project_root)
-
-         try:
-             # Ensure tools.agent_tools can be imported relative to project root
-             import tools.agent_tools as agent_tools_module
-         except ImportError as e:
-             return {
-                 "execution_output": f"ExecutorAgent Error: Could not import agent_tools module. Ensure it's in tools/ and __init__.py might be needed in tools/. Error: {e}",
-                 "execution_status": "ERROR: ImportFailure"
-             }
-         except Exception as e:
-             return {
-                 "execution_output": f"ExecutorAgent Error: Unexpected error during tools import. Error: {e}",
-                 "execution_status": "ERROR: ImportFailure"
-             }
-
-
-         # Create a restricted global scope for exec()
-         # Only allow access to the agent_tools module (aliased as 'tools') and builtins
-         restricted_globals = {
-             "__builtins__": __builtins__, # Standard builtins (print, len, etc.)
-             "tools": agent_tools_module,
-             "json": json,
-             "api_key": self.openai_api_key
-         }
-         # No separate locals, exec will use restricted_globals as locals too
-
-         captured_output = io.StringIO()
-         try:
-             with contextlib.redirect_stdout(captured_output):
-                 exec(python_code, restricted_globals)
-             output_str = captured_output.getvalue()
-             return {
-                 "execution_output": output_str.strip() if output_str else "(No output printed by code)",
-                 "execution_status": "SUCCESS"
-             }
-         except Exception as e:
-             error_details = f"{type(e).__name__}: {str(e)}"
-             # Try to get traceback if possible, though might be complex to format cleanly here
-             return {
-                 "execution_output": f"Execution Error!\n{error_details}",
-                 "execution_status": f"ERROR: {type(e).__name__}"
-             }
-
- if __name__ == '__main__':
-     # For testing individual agent if needed
-     # executor = ExecutorAgent()
-     # result = executor.execute_code("print(tools.get_biorxiv_paper_url())")
-     # print(result)
-     # result_error = executor.execute_code("print(tools.non_existent_tool())")
-     # print(result_error)
-     # result_unsafe = executor.execute_code("import os\nprint('dangerous')")
-     # print(result_unsafe) # Should fail at exec if globals are well-restricted from direct os import
      print("ExecutorAgent should be orchestrated by the ManagerAgent.")

+ # agents/executor_agent.py
+
+ import io
+ import contextlib
+ import json
+ # IMPORTANT: We will dynamically import the tools module when needed for execution
+ # to ensure it uses the version from the tools/ directory.
+ # import tools.agent_tools as agent_tools # Avoid top-level import for now
+
+ class ExecutorAgent:
+     def __init__(self, openai_api_key: str = None):
+         print("ExecutorAgent initialized.")
+         self.openai_api_key = openai_api_key
+
+     def execute_code(self, python_code: str) -> dict:
+         print(f"ExecutorAgent received code for execution:\n{python_code}")
+
+         # Dynamically import agent_tools from the tools directory
+         # This assumes main.py is run from the project root.
+         import sys
+         import os
+         # Add project root to sys.path to allow `import tools.agent_tools`
+         # This might be needed if executor_agent.py itself is run from a different context later,
+         # but for now, assuming standard Python module resolution from root where main.py is.
+         # script_dir = os.path.dirname(os.path.abspath(__file__))
+         # project_root = os.path.abspath(os.path.join(script_dir, ".."))
+         # if project_root not in sys.path:
+         #     sys.path.insert(0, project_root)
+
+         try:
+             # Ensure tools.agent_tools can be imported relative to project root
+             import tools.agent_tools as agent_tools_module
+         except ImportError as e:
+             return {
+                 "execution_output": f"ExecutorAgent Error: Could not import agent_tools module. Ensure it's in tools/ and __init__.py might be needed in tools/. Error: {e}",
+                 "execution_status": "ERROR: ImportFailure"
+             }
+         except Exception as e:
+             return {
+                 "execution_output": f"ExecutorAgent Error: Unexpected error during tools import. Error: {e}",
+                 "execution_status": "ERROR: ImportFailure"
+             }
+
+
+         # Create a restricted global scope for exec()
+         # Only allow access to the agent_tools module (aliased as 'tools') and builtins
+         restricted_globals = {
+             "__builtins__": __builtins__, # Standard builtins (print, len, etc.)
+             "tools": agent_tools_module,
+             "json": json,
+             "api_key": self.openai_api_key
+         }
+         # No separate locals, exec will use restricted_globals as locals too
+
+         captured_output = io.StringIO()
+         try:
+             with contextlib.redirect_stdout(captured_output):
+                 exec(python_code, restricted_globals)
+             output_str = captured_output.getvalue()
+             return {
+                 "execution_output": output_str.strip() if output_str else "(No output printed by code)",
+                 "execution_status": "SUCCESS"
+             }
+         except Exception as e:
+             error_details = f"{type(e).__name__}: {str(e)}"
+             # Try to get traceback if possible, though might be complex to format cleanly here
+             return {
+                 "execution_output": f"Execution Error!\n{error_details}",
+                 "execution_status": f"ERROR: {type(e).__name__}"
+             }
+
+ if __name__ == '__main__':
+     # For testing individual agent if needed
+     # executor = ExecutorAgent()
+     # result = executor.execute_code("print(tools.get_biorxiv_paper_url())")
+     # print(result)
+     # result_error = executor.execute_code("print(tools.non_existent_tool())")
+     # print(result_error)
+     # result_unsafe = executor.execute_code("import os\nprint('dangerous')")
+     # print(result_unsafe) # Should fail at exec if globals are well-restricted from direct os import
      print("ExecutorAgent should be orchestrated by the ManagerAgent.")
agents/generation_agent.py CHANGED
@@ -33,11 +33,10 @@ For EVERY query, you MUST follow this EXACT 13-step structured approach:
33
  3. Analyze images, paper, data according to the plan if there's any provided
34
  4. Analyze errors from previous attempts if there's any
35
  5. Read the paper description to understand what the paper is about
36
- 6. **QUERY TYPE CLASSIFICATION:**
37
- - Is this a NEW_TASK (fresh analytical question) or FOLLOWUP_REQUEST (responding to literature offer)?
38
- - If FOLLOWUP_REQUEST, what does user want: PRIMARY_PAPER, EXTERNAL_LITERATURE, or COMPREHENSIVE?
39
- - Base decision on conversation context and user intent, not keywords
40
- - Consider if previous response contained "Explore Supporting Literature" section
41
  7. Read the tools documentation thoroughly
42
  8. Decide which tools can be helpful when answering the query; if there are any, prepare the list of tools to be used
43
  9. Read the data documentation
@@ -93,21 +92,8 @@ You MUST output a single JSON object with these fields:
93
  - "explanation": User-facing explanation or report of your findings
94
 
95
  **RESPONSE FORMAT RULES:**
96
- - For NEW_TASK queries with status CODE_COMPLETE: Always append literature exploration offer to explanation
97
- - For FOLLOWUP_REQUEST queries: Provide requested analysis without offering literature options again
98
- - Literature offer format:
99
-
100
- ---
101
-
102
- **Explore Supporting Literature:**
103
-
104
- 📄 **Primary Paper**: Analyze the foundational research paper this website is based on for additional context about these findings.
105
-
106
- 🔍 **Recent Publications**: Search external academic databases for the latest research on these topics.
107
-
108
- 📚 **Comprehensive**: Get insights from both the foundational paper and recent literature.
109
-
110
- *Note: External literature serves as supplementary information only.*
111
 
112
  **STATUS TYPES:**
113
  - "AWAITING_DATA": Use when fetching data with Python tools
@@ -433,20 +419,30 @@ class GenerationAgent:
433
  }
434
 
435
  # Look for JSON blocks in conversation history
436
- for turn in reversed(conversation_history[-6:]): # Check last 6 turns for relevant context
 
 
 
437
  content_from_history = turn.get("content", "")
 
 
 
438
  # Regex to find ```json ... ``` blocks
439
- # Using re.DOTALL to make . match newlines within the JSON block
440
- # Using re.IGNORECASE for ```json opening tag flexibility (though strictly lowercase is typical)
441
- json_block_match = re.search(r"```json\\s*(.*?)\\s*```", content_from_history, flags=re.DOTALL | re.IGNORECASE)
442
 
443
  if not json_block_match:
 
444
  continue # No JSON block in this turn's content
445
 
 
 
446
  try:
447
  # The actual JSON string is in group(1) of the match
448
  json_string_from_history = json_block_match.group(1)
 
449
  json_data_from_history = json.loads(json_string_from_history)
 
450
 
451
  # PHASE 3 FOR IMAGES: Check for image description JSON
452
  if "description" in json_data_from_history and "intermediate_data_for_llm" not in json_data_from_history: # Avoid conflict if key names overlap
@@ -473,16 +469,11 @@ class GenerationAgent:
473
  if user_query.startswith("FINAL_FORMATTING_REQUEST:"):
474
  query_for_classification = user_query.split("Original query: ", 1)[-1] if "Original query: " in user_query else user_query
475
 
476
- classification_context = self._classify_query_type(query_for_classification, conversation_history)
477
- is_followup = classification_context.get("likely_followup", False)
478
-
479
- # Append literature offer for NEW_TASK queries
480
  final_explanation = base_explanation
481
- if not is_followup:
482
- final_explanation = self._append_literature_offer(base_explanation)
483
 
484
  return {
485
- "thought": "I have retrieved the top transcription factors as requested from history and will present them with appropriate literature exploration options if this is a new task.",
486
  "status": "CODE_COMPLETE",
487
  "python_code": "",
488
  "explanation": final_explanation
@@ -493,12 +484,39 @@ class GenerationAgent:
493
  intermediate_content = json_data_from_history["intermediate_data_for_llm"]
494
 
495
  # Determine if this data is from a literature search tool
 
496
  is_literature_search_data = False
497
- if "CONTEXT_FROM_RESOURCE_FETCH" in content_from_history:
498
- # Example history content: "CONTEXT_FROM_RESOURCE_FETCH (original_identifier: print(json.dumps({'intermediate_data_for_llm': tools.multi_source_literature_search(...)}))): ..."
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
499
  if ("tools.multi_source_literature_search" in content_from_history or
500
- "tools.fetch_text_from_urls" in content_from_history):
 
501
  is_literature_search_data = True
 
502
 
503
  if is_literature_search_data:
504
  print(f"[GenerationAgent] Found literature search data (intermediate_data_for_llm) in history. Proceeding to summarization.")
@@ -625,9 +643,13 @@ class GenerationAgent:
625
  else:
626
  print(f"[GenerationAgent] Found unknown JSON format in conversation history, continuing search")
627
 
628
- except json.JSONDecodeError:
 
 
629
  continue # not valid JSON, skip
630
 
 
 
631
  # PHASE 1: No special conditions met, start with data/image fetching
632
  try:
633
  # Format conversation history for reference
@@ -660,14 +682,8 @@ class GenerationAgent:
660
  has_previous_offer = classification_context.get("has_previous_offer", False)
661
 
662
  classification_instructions = f"\\n\\nQUERY CLASSIFICATION CONTEXT:"
663
- classification_instructions += f"\\n- Previous response had literature offer: {has_previous_offer}"
664
- if has_previous_offer:
665
- classification_instructions += "\\n- This query might be a FOLLOWUP_REQUEST for literature analysis"
666
- classification_instructions += "\\n- Determine user intent: PRIMARY_PAPER, EXTERNAL_LITERATURE, or COMPREHENSIVE"
667
- classification_instructions += "\\n- If FOLLOWUP_REQUEST, do NOT append literature offer to final response"
668
- else:
669
- classification_instructions += "\\n- This is likely a NEW_TASK requiring fresh analysis"
670
- classification_instructions += "\\n- If status is CODE_COMPLETE, append literature offer to explanation"
671
 
672
  comprehensive_text_prompt += classification_instructions
673
 
@@ -948,23 +964,11 @@ class GenerationAgent:
948
 
949
  def _append_literature_offer(self, explanation: str) -> str:
950
  """
951
- Append literature exploration options to final responses for NEW_TASK queries.
952
  """
953
- literature_offer = """
954
-
955
- ---
956
-
957
- **Explore Supporting Literature:**
958
-
959
- 📄 **Primary Paper**: Analyze the foundational research paper this website is based on for additional context about these findings.
960
-
961
- 🔍 **Recent Publications**: Search external academic databases for the latest research on these topics.
962
-
963
- 📚 **Comprehensive**: Get insights from both the foundational paper and recent literature.
964
-
965
- *Note: External literature serves as supplementary information only.*"""
966
-
967
- return explanation + literature_offer
968
 
969
  if __name__ == '__main__':
970
  print("GenerationAgent should be orchestrated by the ManagerAgent.")
 
33
  3. Analyze images, paper, data according to the plan if there's any provided
34
  4. Analyze errors from previous attempts if there's any
35
  5. Read the paper description to understand what the paper is about
36
+ 6. **QUERY ANALYSIS:**
37
+ - Understand the user's specific request and intent
38
+ - Consider conversation context and previous responses
39
+ - Focus on providing direct, helpful analysis
 
40
  7. Read the tools documentation thoroughly
41
  8. Decide which tools can be helpful when answering the query; if there are any, prepare the list of tools to be used
42
  9. Read the data documentation
 
92
  - "explanation": User-facing explanation or report of your findings
93
 
94
  **RESPONSE FORMAT RULES:**
95
+ - Provide clear, direct responses to user queries
96
+ - Focus on the analysis results without additional promotional content
 
 
 
 
 
 
 
 
 
 
 
 
 
97
 
98
  **STATUS TYPES:**
99
  - "AWAITING_DATA": Use when fetching data with Python tools
 
419
  }
420
 
421
  # Look for JSON blocks in conversation history
422
+ print(f"[DEBUG] GenerationAgent: Searching for JSON blocks in conversation history")
423
+ print(f"[DEBUG] - Checking last {min(6, len(conversation_history))} turns out of {len(conversation_history)} total")
424
+
425
+ for i, turn in enumerate(reversed(conversation_history[-6:])): # Check last 6 turns for relevant context
426
  content_from_history = turn.get("content", "")
427
+ turn_index = len(conversation_history) - 6 + i
428
+ print(f"[DEBUG] - Turn {turn_index}: {turn.get('role')} - Content: {content_from_history[:100]}...")
429
+
430
  # Regex to find ```json ... ``` blocks
431
+ # FIX: Use single backslash for whitespace patterns
432
+ json_block_match = re.search(r"```json\s*(.*?)\s*```", content_from_history, flags=re.DOTALL | re.IGNORECASE)
 
433
 
434
  if not json_block_match:
435
+ print(f"[DEBUG] - Turn {turn_index}: No JSON block found")
436
  continue # No JSON block in this turn's content
437
 
438
+ print(f"[DEBUG] - Turn {turn_index}: Found JSON block! Extracting...")
439
+
440
  try:
441
  # The actual JSON string is in group(1) of the match
442
  json_string_from_history = json_block_match.group(1)
443
+ print(f"[DEBUG] - JSON string extracted: {json_string_from_history[:100]}...")
444
  json_data_from_history = json.loads(json_string_from_history)
445
+ print(f"[DEBUG] - JSON parsed successfully! Keys: {list(json_data_from_history.keys())}")
446
 
447
  # PHASE 3 FOR IMAGES: Check for image description JSON
448
  if "description" in json_data_from_history and "intermediate_data_for_llm" not in json_data_from_history: # Avoid conflict if key names overlap
 
469
  if user_query.startswith("FINAL_FORMATTING_REQUEST:"):
470
  query_for_classification = user_query.split("Original query: ", 1)[-1] if "Original query: " in user_query else user_query
471
 
472
+ # Literature offers are now controlled by the toggle button in the frontend
 
 
 
473
  final_explanation = base_explanation
 
 
474
 
475
  return {
476
+ "thought": "I have retrieved the top transcription factors as requested from history and will present them directly.",
477
  "status": "CODE_COMPLETE",
478
  "python_code": "",
479
  "explanation": final_explanation
 
484
  intermediate_content = json_data_from_history["intermediate_data_for_llm"]
485
 
486
  # Determine if this data is from a literature search tool
487
+ # Check the structure of the data to identify if it's literature data
488
  is_literature_search_data = False
489
+
490
+ print(f"[DEBUG] Checking intermediate_data_for_llm structure for literature detection")
491
+ print(f"[DEBUG] - Data type: {type(intermediate_content)}")
492
+
493
+ if isinstance(intermediate_content, list) and intermediate_content:
494
+ # Check if the first item has literature-like structure
495
+ first_item = intermediate_content[0]
496
+ print(f"[DEBUG] - First item type: {type(first_item)}")
497
+
498
+ if isinstance(first_item, dict):
499
+ first_item_keys = list(first_item.keys())
500
+ print(f"[DEBUG] - First item keys: {first_item_keys}")
501
+
502
+ # Literature papers typically have title, authors, abstract, etc.
503
+ literature_indicators = ['title', 'authors', 'abstract', 'doi', 'source_api']
504
+
505
+ # Check if at least 2 literature indicators are present
506
+ found_indicators = sum(1 for key in literature_indicators if key in first_item_keys)
507
+ print(f"[DEBUG] - Found {found_indicators} literature indicators out of {len(literature_indicators)}")
508
+
509
+ if found_indicators >= 2:
510
+ is_literature_search_data = True
511
+ print(f"[DEBUG] - Identified as literature search data based on structure")
512
+
513
+ # Fallback: check content for literature search patterns (legacy method)
514
+ if not is_literature_search_data:
515
  if ("tools.multi_source_literature_search" in content_from_history or
516
+ "tools.fetch_text_from_urls" in content_from_history or
517
+ "CONTEXT_FROM_RESOURCE_FETCH" in content_from_history):
518
  is_literature_search_data = True
519
+ print(f"[DEBUG] - Identified as literature search data based on content patterns")
520
 
521
  if is_literature_search_data:
522
  print(f"[GenerationAgent] Found literature search data (intermediate_data_for_llm) in history. Proceeding to summarization.")
 
643
  else:
644
  print(f"[GenerationAgent] Found unknown JSON format in conversation history, continuing search")
645
 
646
+ except json.JSONDecodeError as e:
647
+ print(f"[DEBUG] - Turn {turn_index}: JSON parsing error: {e}")
648
+ print(f"[DEBUG] - Raw JSON string that failed: {json_string_from_history[:200]}...")
649
  continue # not valid JSON, skip
650
 
651
+ print(f"[DEBUG] GenerationAgent: No valid JSON blocks found in conversation history")
652
+
653
  # PHASE 1: No special conditions met, start with data/image fetching
654
  try:
655
  # Format conversation history for reference
 
682
  has_previous_offer = classification_context.get("has_previous_offer", False)
683
 
684
  classification_instructions = f"\\n\\nQUERY CLASSIFICATION CONTEXT:"
685
+ # Literature offers are now controlled by the toggle button in the frontend
686
+ classification_instructions += "\\n- Focus on providing clear, direct responses to user queries"
 
 
 
 
 
 
687
 
688
  comprehensive_text_prompt += classification_instructions
689
 
 
964
 
965
  def _append_literature_offer(self, explanation: str) -> str:
966
  """
967
+ DISABLED: Literature exploration options are now controlled by the toggle button in the frontend.
968
  """
969
+ # Literature offers are now controlled by the toggle button in the frontend
970
+ # Return explanation unchanged
971
+ return explanation
 
 
 
 
 
 
 
 
 
 
 
 
972
 
973
  if __name__ == '__main__':
974
  print("GenerationAgent should be orchestrated by the ManagerAgent.")
agents/manager_agent.py CHANGED
@@ -138,38 +138,6 @@ class ManagerAgent:
138
  # REMOVED: _request_literature_confirmation_upfront - no longer needed
139
  # Literature preferences are now handled as post-analysis options
140
 
141
- def handle_literature_confirmation(self, user_response: str, original_query: str = None) -> str:
142
- """
143
- LEGACY: Public method to handle literature confirmation from R/UI.
144
- NOTE: This method may no longer be needed with the new workflow, but kept for backward compatibility.
145
- Literature preferences are now handled as post-analysis followup requests.
146
- """
147
- print(f"[ManagerAgent] Received literature confirmation: {user_response}")
148
-
149
- # Get the stored query
150
- user_query = self.pending_literature_query or original_query
151
- if not user_query:
152
- return "No pending literature query found."
153
-
154
- # Clear the pending query
155
- self.pending_literature_query = None
156
-
157
- # Process the query with the specified literature preferences
158
- try:
159
- # Parse user preferences
160
- use_paper = user_response in ["both", "paper"]
161
- use_external_literature = user_response in ["both", "external"]
162
-
163
- print(f"[ManagerAgent] Processing with preferences - Paper: {use_paper}, External: {use_external_literature}")
164
- self._send_thought_to_r(f"Processing with literature preferences: {user_response}")
165
-
166
- # Continue with the full processing pipeline with preferences
167
- return self._process_with_literature_preferences(user_query, use_paper, use_external_literature)
168
-
169
- except Exception as e:
170
- error_msg = f"Error processing with literature preferences: {str(e)}"
171
- print(f"[ManagerAgent] {error_msg}")
172
- return error_msg
173
 
174
  def _continue_with_literature_plan(self, plan: dict) -> str:
175
  """Continue processing with the original plan that includes literature search."""
@@ -224,13 +192,14 @@ class ManagerAgent:
224
  print(f"[Manager._process_turn] Processing query: '{user_query_text[:100]}...'")
225
  self._send_thought_to_r(f"Processing query: '{user_query_text[:50]}...'") # THOUGHT
226
 
227
- # --- Process directly with default literature settings (both sources enabled) ---
228
- print(f"[Manager._process_turn] Processing with default literature settings")
229
- self._send_thought_to_r("Processing query with both literature sources enabled...")
 
230
  response_text = self._process_with_literature_preferences(
231
  user_query_text,
232
- use_paper=True,
233
- use_external_literature=True
234
  )
235
  return response_text, False, None
236
 
@@ -256,6 +225,9 @@ class ManagerAgent:
256
  current_query_for_generation_agent = user_query
257
  previous_generation_attempts = []
258
 
 
 
 
259
  # This variable will hold the File ID if the manager uploads a file and needs to re-call generate_code_plan
260
  image_file_id_for_analysis_step = None
261
 
@@ -275,7 +247,13 @@ class ManagerAgent:
275
  call_ga_again_for_follow_up = True
276
  current_plan_holder = final_plan_for_turn
277
 
278
- while call_ga_again_for_follow_up:
 
 
 
 
 
 
279
  call_ga_again_for_follow_up = False
280
 
281
  if not self.generation_agent:
@@ -286,6 +264,13 @@ class ManagerAgent:
286
 
287
  self._send_thought_to_r(f"Asking GenerationAgent for a plan with literature preferences...")
288
 
 
 
 
 
 
 
 
289
  # Pass literature preferences to GenerationAgent
290
  plan = self.generation_agent.generate_code_plan(
291
  user_query=effective_query_for_ga,
@@ -331,8 +316,17 @@ class ManagerAgent:
331
  if supervisor_status != "APPROVED_FOR_EXECUTION":
332
  return f"Code execution blocked by supervisor: {supervisor_feedback}"
333
 
 
 
 
 
 
 
334
  # Execute the code
335
  self._send_thought_to_r("Executing code...")
 
 
 
336
  execution_result = self.executor_agent.execute_code(code_to_execute)
337
  execution_output = execution_result.get("execution_output", "")
338
  execution_status = execution_result.get("execution_status", "UNKNOWN")
@@ -340,14 +334,26 @@ class ManagerAgent:
340
  if execution_status == "SUCCESS":
341
  self._send_thought_to_r(f"Code execution successful.")
342
 
 
 
 
 
343
  # Add results to conversation history
344
- self.conversation_history.append({"role": "assistant", "content": f"```json\n{execution_output}\n```"})
 
 
 
 
 
 
345
 
346
  # Always continue to GenerationAgent for final formatting
347
  # This ensures literature offers and proper response formatting
348
  if "intermediate_data_for_llm" in execution_output:
 
349
  call_ga_again_for_follow_up = True
350
  else:
 
351
  # Instead of returning raw execution output, let GenerationAgent format it
352
  call_ga_again_for_follow_up = True
353
  # Set a flag so GenerationAgent knows this is final formatting phase
@@ -381,6 +387,14 @@ class ManagerAgent:
381
  self.conversation_history.append({"role": "assistant", "content": error_msg})
382
  return error_msg
383
 
 
 
 
 
 
 
 
 
384
  def process_single_query(self, user_query_text: str, conversation_history_from_r: list = None) -> str:
385
  """
386
  Processes a single query, suitable for calling from an external system like R/Shiny.
 
138
  # REMOVED: _request_literature_confirmation_upfront - no longer needed
139
  # Literature preferences are now handled as post-analysis options
140
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
141
 
142
  def _continue_with_literature_plan(self, plan: dict) -> str:
143
  """Continue processing with the original plan that includes literature search."""
 
192
  print(f"[Manager._process_turn] Processing query: '{user_query_text[:100]}...'")
193
  self._send_thought_to_r(f"Processing query: '{user_query_text[:50]}...'") # THOUGHT
194
 
195
+ # --- Process with dynamic literature settings based on frontend preference ---
196
+ use_external_literature = getattr(self, 'literature_enabled', False) # Default to False
197
+ print(f"[Manager._process_turn] Processing with literature enabled: {use_external_literature}")
198
+ self._send_thought_to_r(f"Processing query with external literature: {'enabled' if use_external_literature else 'disabled'}")
199
  response_text = self._process_with_literature_preferences(
200
  user_query_text,
201
+ use_paper=True, # Keep paper (internal data) always enabled
202
+ use_external_literature=use_external_literature # Use frontend preference
203
  )
204
  return response_text, False, None
205
 
 
225
  current_query_for_generation_agent = user_query
226
  previous_generation_attempts = []
227
 
228
+ # Track attempted operations to prevent infinite loops
229
+ attempted_operations = set()
230
+
231
  # This variable will hold the File ID if the manager uploads a file and needs to re-call generate_code_plan
232
  image_file_id_for_analysis_step = None
233
 
 
247
  call_ga_again_for_follow_up = True
248
  current_plan_holder = final_plan_for_turn
249
 
250
+ while call_ga_again_for_follow_up and current_data_fetch_attempt < max_data_fetch_attempts_per_generation:
251
+ current_data_fetch_attempt += 1
252
+ print(f"[DEBUG] Data fetch attempt {current_data_fetch_attempt}/{max_data_fetch_attempts_per_generation}")
253
+
257
  call_ga_again_for_follow_up = False
258
 
259
  if not self.generation_agent:
 
264
 
265
  self._send_thought_to_r(f"Asking GenerationAgent for a plan with literature preferences...")
266
 
267
+ # DEBUG: Log conversation history being passed to GenerationAgent
268
+ print(f"[DEBUG] Passing conversation history to GenerationAgent:")
269
+ print(f"[DEBUG] - History length: {len(self.conversation_history)}")
270
+ print(f"[DEBUG] - History roles: {[msg['role'] for msg in self.conversation_history]}")
271
+ start_idx = max(0, len(self.conversation_history) - 3)
+ for i, msg in enumerate(self.conversation_history[-3:]): # Show last 3 messages
272
+ print(f"[DEBUG] - Message {start_idx + i}: {msg['role']} - {msg['content'][:100]}...")
273
+
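The history logging above recurs in several places in this file; the same output could come from one small helper that also guards the index math when the history holds fewer than three messages (a sketch only, not part of this commit):

```python
def debug_history(history: list, tail: int = 3) -> None:
    """Print a compact view of the last few conversation turns."""
    print(f"[DEBUG] history length: {len(history)}; roles: {[m['role'] for m in history]}")
    start = max(0, len(history) - tail)
    for offset, msg in enumerate(history[start:]):
        print(f"[DEBUG] - message {start + offset}: {msg['role']} - {msg['content'][:100]}...")
```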
274
  # Pass literature preferences to GenerationAgent
275
  plan = self.generation_agent.generate_code_plan(
276
  user_query=effective_query_for_ga,
 
316
  if supervisor_status != "APPROVED_FOR_EXECUTION":
317
  return f"Code execution blocked by supervisor: {supervisor_feedback}"
318
 
319
+ # Check if this operation has been attempted before to prevent loops
320
+ operation_signature = f"{code_to_execute.strip()[:100]}" # Use first 100 chars as signature
321
+ if operation_signature in attempted_operations:
322
+ print(f"[DEBUG] Loop detected! Operation already attempted: {operation_signature[:50]}...")
323
+ return "Loop detected: This operation has already been attempted. Please try a different approach."
324
+
325
  # Execute the code
326
  self._send_thought_to_r("Executing code...")
327
+ attempted_operations.add(operation_signature)
328
+ print(f"[DEBUG] Added operation to attempted set: {operation_signature[:50]}...")
329
+
330
  execution_result = self.executor_agent.execute_code(code_to_execute)
331
  execution_output = execution_result.get("execution_output", "")
332
  execution_status = execution_result.get("execution_status", "UNKNOWN")
 
334
  if execution_status == "SUCCESS":
335
  self._send_thought_to_r(f"Code execution successful.")
336
 
337
+ # DEBUG: Log conversation history before storing results
338
+ print(f"[DEBUG] Conversation history length before storing ExecutorAgent result: {len(self.conversation_history)}")
339
+ print(f"[DEBUG] ExecutorAgent output being stored: {execution_output[:200]}...")
340
+
341
  # Add results to conversation history
342
+ stored_content = f"```json\n{execution_output}\n```"
343
+ self.conversation_history.append({"role": "assistant", "content": stored_content})
344
+
345
+ # DEBUG: Log conversation history after storing results
346
+ print(f"[DEBUG] Conversation history length after storing result: {len(self.conversation_history)}")
347
+ print(f"[DEBUG] Last conversation entry: {self.conversation_history[-1]['content'][:200]}...")
348
+ print(f"[DEBUG] Full conversation history roles: {[msg['role'] for msg in self.conversation_history]}")
349
 
350
  # Always continue to GenerationAgent for final formatting
351
  # This ensures literature offers and proper response formatting
352
  if "intermediate_data_for_llm" in execution_output:
353
+ print(f"[DEBUG] Found 'intermediate_data_for_llm' in output - continuing to GenerationAgent for processing")
354
  call_ga_again_for_follow_up = True
355
  else:
356
+ print(f"[DEBUG] No 'intermediate_data_for_llm' found - requesting final formatting from GenerationAgent")
357
  # Instead of returning raw execution output, let GenerationAgent format it
358
  call_ga_again_for_follow_up = True
359
  # Set a flag so GenerationAgent knows this is final formatting phase
 
387
  self.conversation_history.append({"role": "assistant", "content": error_msg})
388
  return error_msg
389
 
390
+ def process_single_query_with_preferences(self, user_query_text: str,
391
+ conversation_history_from_r: list = None,
392
+ literature_enabled: bool = False) -> str:  # default matches the toggle's disabled state
393
+ """Process query with explicit literature preference from frontend."""
394
+ print(f"[Manager.process_single_query_with_preferences] Literature enabled: {literature_enabled}")
395
+ self.literature_enabled = literature_enabled
396
+ return self.process_single_query(user_query_text, conversation_history_from_r)
397
+
398
  def process_single_query(self, user_query_text: str, conversation_history_from_r: list = None) -> str:
399
  """
400
  Processes a single query, suitable for calling from an external system like R/Shiny.
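A minimal sketch of how the new entry point is meant to be called from the R side (the query text and empty history here are placeholders):

```python
# manager is an initialized ManagerAgent instance.
reply = manager.process_single_query_with_preferences(
    "Which TFs score highest in TRM cells?",
    conversation_history_from_r=[],
    literature_enabled=False,  # mirrors the toggle's default-off state
)
print(reply)
```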
agents/supervisor_agent.py CHANGED
@@ -1,268 +1,268 @@
1
- # agents/supervisor_agent.py
2
-
3
- import json
4
- import time # Added for polling
5
- import os # Added for path operations
6
- from openai import OpenAI # Ensure OpenAI is imported for client usage
7
-
8
- # --- Constants for the Supervisor Assistant ---
9
- SUPERVISOR_ASSISTANT_NAME = "TaijiChat Code Review Assistant"
10
- SUPERVISOR_INSTRUCTIONS_TEMPLATE_FILE = "supervisor_instructions_template.md" # New
11
-
12
- # Define JSON examples as separate strings
13
- EXAMPLE_JSON_APPROVED = '''
14
- {
15
- "safety_feedback": "Code uses the 'tools' module correctly and employs permitted built-in functions for data processing. No forbidden operations detected.",
16
- "safety_status": "APPROVED_FOR_EXECUTION",
17
- "user_facing_rejection_reason": "Approved."
18
- }
19
- '''
20
-
21
- EXAMPLE_JSON_REJECTED = '''
22
- {
23
- "safety_feedback": "Forbidden operation detected: Code attempts to import the 'os' module, which is not allowed (Rule I.4).",
24
- "safety_status": "REJECTED_NEEDS_REVISION",
25
- "user_facing_rejection_reason": "The code attempted a restricted operation that is not permitted for safety reasons."
26
- }
27
- '''
28
-
29
- # SUPERVISOR_ASSISTANT_INSTRUCTIONS f-string is now removed. It will be loaded from the template file.
30
-
31
- POLLING_INTERVAL_S = 1
32
- MAX_POLLING_ATTEMPTS = 60
33
-
34
- class SupervisorAgent:
35
- def __init__(self, client_openai: OpenAI = None):
36
- self.client = client_openai
37
- self.supervisor_assistant = None
38
- self.formatted_instructions = self._load_and_format_instructions() # Load instructions
39
-
40
- if not self.formatted_instructions:
41
- print("SupervisorAgent Critical: Failed to load or format supervisor instructions. Assistant may not function correctly.")
42
- # Potentially raise an error or set a flag indicating a critical failure
43
-
44
- if self.client:
45
- try:
46
- self._create_or_retrieve_supervisor_assistant()
47
- print("SupervisorAgent: Successfully created/retrieved Supervisor Assistant.")
48
- except Exception as e:
49
- print(f"SupervisorAgent Error: Could not create/retrieve/update Supervisor Assistant: {str(e)}")
50
- self.supervisor_assistant = None
51
- else:
52
- print("SupervisorAgent Critical: OpenAI client not provided. Cannot create Supervisor Assistant.")
53
-
54
- def _load_and_format_instructions(self) -> str:
55
- """Loads instructions from a template file and formats them with JSON examples."""
56
- try:
57
- # Construct path relative to this file's location
58
- script_dir = os.path.dirname(os.path.abspath(__file__))
59
- template_path = os.path.join(script_dir, SUPERVISOR_INSTRUCTIONS_TEMPLATE_FILE)
60
-
61
- with open(template_path, 'r', encoding='utf-8') as f:
62
- template_content = f.read()
63
-
64
- return template_content.format(
65
- example_approved_json_str=EXAMPLE_JSON_APPROVED,
66
- example_rejected_json_str=EXAMPLE_JSON_REJECTED
67
- )
68
- except FileNotFoundError:
69
- print(f"SupervisorAgent Error: Instructions template file not found at {template_path}")
70
- return None
71
- except KeyError as e:
72
- print(f"SupervisorAgent Error: Placeholder key missing in instructions template: {e}")
73
- return None
74
- except Exception as e:
75
- print(f"SupervisorAgent Error: Failed to load/format instructions: {e}")
76
- return None
77
-
78
- def _create_or_retrieve_supervisor_assistant(self):
79
- if not self.client or not self.formatted_instructions:
80
- if not self.formatted_instructions:
81
- print("SupervisorAgent Error: Cannot create/retrieve assistant because instructions are missing.")
82
- if not self.client:
83
- print("SupervisorAgent Error: Cannot create/retrieve assistant because OpenAI client is missing.")
84
- return
85
- try:
86
- print("SupervisorAgent: Attempting to list existing assistants...")
87
- try:
88
- assistants = self.client.beta.assistants.list(order="desc", limit=20)
89
- except Exception as list_error:
90
- print(f"SupervisorAgent Error: Failed to list assistants. Error type: {type(list_error).__name__}, Error: {str(list_error)}")
91
- if hasattr(list_error, 'response'):
92
- print(f"SupervisorAgent Error: Response status: {list_error.response.status_code if hasattr(list_error.response, 'status_code') else 'N/A'}")
93
- print(f"SupervisorAgent Error: Response body: {list_error.response.text if hasattr(list_error.response, 'text') else 'N/A'}")
94
- raise
95
- found_assistant = None
96
- for assistant in assistants.data:
97
- if assistant.name == SUPERVISOR_ASSISTANT_NAME:
98
- found_assistant = assistant
99
- print(f"SupervisorAgent: Found existing assistant with ID: {assistant.id}")
100
- break
101
-
102
- if found_assistant:
103
- print(f"SupervisorAgent: Updating existing assistant {found_assistant.id}...")
104
- try:
105
- self.supervisor_assistant = self.client.beta.assistants.update(
106
- assistant_id=found_assistant.id,
107
- instructions=self.formatted_instructions,
108
- model="gpt-4o",
109
- tools=[]
110
- )
111
- print(f"SupervisorAgent: Successfully updated assistant {self.supervisor_assistant.id}")
112
- except Exception as update_error:
113
- print(f"SupervisorAgent Error: Failed to update assistant: {str(update_error)}")
114
- raise
115
- else:
116
- print(f"SupervisorAgent: Creating new assistant '{SUPERVISOR_ASSISTANT_NAME}'...")
117
- try:
118
- self.supervisor_assistant = self.client.beta.assistants.create(
119
- name=SUPERVISOR_ASSISTANT_NAME,
120
- instructions=self.formatted_instructions,
121
- model="gpt-4o",
122
- tools=[]
123
- )
124
- print(f"SupervisorAgent: Successfully created assistant with ID: {self.supervisor_assistant.id}")
125
- except Exception as create_error:
126
- print(f"SupervisorAgent Error: Failed to create assistant: {str(create_error)}")
127
- raise
128
-
129
- if not self.supervisor_assistant:
130
- raise Exception("Assistant object is None after creation/update")
131
-
132
- except Exception as e:
133
- print(f"SupervisorAgent Error: Could not create/retrieve/update Supervisor Assistant: {str(e)}")
134
- self.supervisor_assistant = None
135
- raise # Re-raise the exception to be handled by the caller
136
-
137
- def review_code(self, python_code: str, thought: str): # Removed client_openai from params
138
- print(f"SupervisorAgent.review_code received code. Thought: {thought[:100]}...") # Log more of the thought
139
-
140
- if not python_code.strip():
141
- print("SupervisorAgent: No actual code provided for review. Approving as safe.")
142
- return {"safety_feedback": "No code provided by Generation Agent.", "safety_status": "APPROVED_FOR_EXECUTION", "user_facing_rejection_reason": ""}
143
-
144
- if not self.client or not self.supervisor_assistant:
145
- print("SupervisorAgent Error: OpenAI client or Supervisor Assistant not available for code review.")
146
- return {"safety_feedback": "Error: Supervisor Agent not properly initialized.", "safety_status": "REJECTED_NEEDS_REVISION", "user_facing_rejection_reason": "The supervisor agent encountered an error."}
147
-
148
- thread = None # Initialize for the finally block
149
- try:
150
- # 1. Construct User Message Content
151
- user_message_content = (
152
- f"Please review the following Python code for safety and correctness based on your instructions.\\n\\n"
153
- f"Context (AI's plan that generated this code):\\n{thought}\\n\\n"
154
- f"Python Code to Review:\\n```python\\n{python_code}\\n```\\n"
155
- f"Ensure your response is only the required JSON object."
156
- )
157
-
158
- # 2. Create a Thread
159
- # print("SupervisorAgent: Creating new Thread for review...")
160
- thread = self.client.beta.threads.create()
161
- # print(f"SupervisorAgent: Thread created: {thread.id}")
162
-
163
- # 3. Add message to Thread
164
- self.client.beta.threads.messages.create(
165
- thread_id=thread.id,
166
- role="user",
167
- content=user_message_content
168
- )
169
- # print("SupervisorAgent: Message added to Thread.")
170
-
171
- # 4. Create Run
172
- # print(f"SupervisorAgent: Creating Run for Assistant {self.supervisor_assistant.id} on Thread {thread.id}...")
173
- run = self.client.beta.threads.runs.create(
174
- thread_id=thread.id,
175
- assistant_id=self.supervisor_assistant.id
176
- )
177
- # print(f"SupervisorAgent: Run created: {run.id}, Status: {run.status}")
178
-
179
- # 5. Poll Run for completion
180
- attempts = 0
181
- while run.status in ["queued", "in_progress"] and attempts < MAX_POLLING_ATTEMPTS:
182
- time.sleep(POLLING_INTERVAL_S)
183
- run = self.client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
184
- # print(f"SupervisorAgent: Polling Run {run.id}: Status: {run.status}") # Verbose
185
- attempts += 1
186
-
187
- # 6. Process Run Outcome
188
- if run.status == "completed":
189
- # print(f"SupervisorAgent: Run {run.id} completed.")
190
- messages_response = self.client.beta.threads.messages.list(thread_id=thread.id, order="desc", limit=1)
191
- if messages_response.data and messages_response.data[0].content and messages_response.data[0].content[0].type == "text":
192
- assistant_response_json_str = messages_response.data[0].content[0].text.value
193
- # print(f"SupervisorAgent: Raw LLM review JSON: {assistant_response_json_str}")
194
-
195
- # Strip markdown fences if present
196
- if assistant_response_json_str.startswith("```json"):
197
- assistant_response_json_str = assistant_response_json_str[len("```json"):].strip()
198
- if assistant_response_json_str.startswith("```"):
199
- assistant_response_json_str = assistant_response_json_str[len("```"):].strip()
200
- if assistant_response_json_str.endswith("```"):
201
- assistant_response_json_str = assistant_response_json_str[:-len("```")].strip()
202
-
203
- try:
204
- parsed_response = json.loads(assistant_response_json_str)
205
- # Validate structure
206
- if not all(k in parsed_response for k in ["safety_feedback", "safety_status", "user_facing_rejection_reason"]):
207
- print("SupervisorAgent Error: LLM review JSON missing required keys.")
208
- return {
209
- "safety_feedback": "Internal Error: LLM review response malformed (missing keys).",
210
- "safety_status": "REJECTED_NEEDS_REVISION",
211
- "user_facing_rejection_reason": "The code review process encountered an internal error."
212
- }
213
- # Validate safety_status value
214
- if parsed_response["safety_status"] not in ["APPROVED_FOR_EXECUTION", "REJECTED_NEEDS_REVISION"]:
215
- print(f"SupervisorAgent Error: LLM returned an invalid safety_status value: {parsed_response['safety_status']}.")
216
- # Override with rejection if status is invalid
217
- parsed_response["safety_feedback"] += " (Original status was invalid)"
218
- parsed_response["safety_status"] = "REJECTED_NEEDS_REVISION"
219
- # Ensure user_facing_rejection_reason is generic if not already sensible
220
- if not parsed_response.get("user_facing_rejection_reason"):
221
- parsed_response["user_facing_rejection_reason"] = "The code could not be validated due to an internal status error."
222
-
223
- # Ensure user_facing_rejection_reason is present if rejected, and appropriate if approved
224
- if parsed_response["safety_status"] == "REJECTED_NEEDS_REVISION" and not parsed_response.get("user_facing_rejection_reason","").strip():
225
- parsed_response["user_facing_rejection_reason"] = "The proposed code could not be approved due to safety or correctness concerns." # Default user reason
226
- elif parsed_response["safety_status"] == "APPROVED_FOR_EXECUTION" and not parsed_response.get("user_facing_rejection_reason","").strip():
227
- parsed_response["user_facing_rejection_reason"] = "Approved."
228
-
229
- return parsed_response
230
- except json.JSONDecodeError as e:
231
- print(f"SupervisorAgent JSONDecodeError: Could not parse LLM review JSON: {e}. Response: {assistant_response_json_str}")
232
- return {
233
- "safety_feedback": f"Internal Error: Failed to parse LLM review JSON. {e}",
234
- "safety_status": "REJECTED_NEEDS_REVISION",
235
- "user_facing_rejection_reason": "The code review result was unreadable."
236
- }
237
- else:
238
- print("SupervisorAgent Error: No valid message content from assistant after review run completion.")
239
- return {
240
- "safety_feedback": "Internal Error: No content from supervisor assistant.",
241
- "safety_status": "REJECTED_NEEDS_REVISION",
242
- "user_facing_rejection_reason": "The supervisor agent provided no response."
243
- }
244
- else:
245
- error_message = f"Review run failed or timed out. Status: {run.status}"
246
- if run.last_error:
247
- error_message += f" Last Error: {run.last_error.message}"
248
- print(f"SupervisorAgent Error: {error_message}")
249
- return {"safety_feedback": error_message, "safety_status": "REJECTED_NEEDS_REVISION", "user_facing_rejection_reason": "The code review process encountered an error."}
250
- except Exception as e:
251
- print(f"SupervisorAgent Error: General exception during review_code: {e}")
252
- return {
253
- "safety_feedback": f"General exception in review_code: {e}",
254
- "safety_status": "REJECTED_NEEDS_REVISION",
255
- "user_facing_rejection_reason": "A general error occurred during code review."
256
- }
257
- finally:
258
- # 7. Delete Thread
259
- if thread:
260
- try:
261
- # print(f"SupervisorAgent: Deleting Thread {thread.id}...")
262
- self.client.beta.threads.delete(thread.id)
263
- # print(f"SupervisorAgent: Thread {thread.id} deleted.")
264
- except Exception as e:
265
- print(f"SupervisorAgent Error: Failed to delete Thread {thread.id if thread else 'Unknown'}: {e}")
266
-
267
- if __name__ == '__main__':
268
  print("SupervisorAgent should be orchestrated by the ManagerAgent.")
 
cache_data/excel_schema_cache.json CHANGED
@@ -1,6 +1,6 @@
1
- {
2
- "freshness_signature": {
3
- "multi-omicsdata.xlsx": 1748219732.5558078
4
- },
5
- "formatted_schema_string": "DYNAMICALLY DISCOVERED EXCEL SCHEMAS (first sheet columns shown):\n- File: 'www\\multi-omicsdata.xlsx' (Identifier: 'multi_omicsdata')\n Sheets: [Dataset detail, specific TF-Taiji, newspecific TF-Taiji, Validation list, T cell differentiation map, Kay nanostring panel, Meeting summary, Kays wetlab to do, Wang Lab work to do, Philip et al. 2017 Nature sampl, T cell migration associated gen, KO EX-> proEX, Reprogramming MP -> TRM, Reprogramming TEX -> TRM, Reprogramming TEX -> proEX]\n Columns (first sheet): [Author, Lab, Year, DOI, Accession, Data type, Species, Infection, Naive, MP (MPEC), ... (total 23 columns)]\n- File: 'www\\networkanalysis\\comp_log2FC_RegulatedData_TRMTEXterm.xlsx' (Identifier: 'comp_log2FC_RegulatedData_TRMTEXterm')\n Sheets: [Worksheet]\n Columns (first sheet): [Unnamed: 0, Ahr, Arid3a, Arnt, Arntl, Atf1, Atf2, Atf3, Atf4, Atf7, ... (total 199 columns)]\n- File: 'www\\old files\\log2FC_RegulatedData_TRMTEXterm.xlsx' (Identifier: 'log2FC_RegulatedData_TRMTEXterm')\n Sheets: [Worksheet]\n Columns (first sheet): [Unnamed: 0, Ahr, Arid3a, Arnt, Arntl, Atf1, Atf2, Atf3, Atf4, Atf7, ... (total 199 columns)]\n- File: 'www\\tablePagerank\\MP.xlsx' (Identifier: 'MP')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\Naive.xlsx' (Identifier: 'Naive')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\Table_TF PageRank Scores for Audrey.xlsx' (Identifier: 'Table_TF_PageRank_Scores_for_Audrey')\n Sheets: [Fig_1F (Multi state-specific TF, Fig_1G (Single state-specific T]\n Columns (first sheet): [Unnamed: 0, Category, Cell-state specificity, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, ... (total 45 columns)]\n- File: 'www\\tablePagerank\\TCM.xlsx' (Identifier: 'TCM')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\TE.xlsx' (Identifier: 'TE')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\TEM.xlsx' (Identifier: 'TEM')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\TEXeff.xlsx' (Identifier: 'TEXeff')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... 
(total 43 columns)]\n- File: 'www\\tablePagerank\\TEXprog.xlsx' (Identifier: 'TEXprog')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\TEXterm.xlsx' (Identifier: 'TEXterm')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\TRM.xlsx' (Identifier: 'TRM')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tfcommunities\\texcommunities.xlsx' (Identifier: 'texcommunities')\n Sheets: [TEX Communities, TEX_c1, TEX_c2, TEX_c3, TEX_c4, TEX_c5, TRM Communities, TRM_c1, TRM_c2, TRM_c3, TRM_c4, TRM_c5]\n Columns (first sheet): [TEX Communities, TF Members]\n- File: 'www\\tfcommunities\\trmcommunities.xlsx' (Identifier: 'trmcommunities')\n Sheets: [Sheet1]\n Columns (first sheet): [TRM Communities, TF Members]\n- File: 'www\\TFcorintextrm\\TF-TFcorTRMTEX.xlsx' (Identifier: 'TF_TFcorTRMTEX')\n Sheets: [Sheet1]\n Columns (first sheet): [TF Name, TF Merged Graph Path]\n- File: 'www\\waveanalysis\\searchtfwaves.xlsx' (Identifier: 'searchtfwaves')\n Sheets: [Sheet1]\n Columns (first sheet): [Wave 1, Wave 2, Wave 3, Wave 4, Wave 5, Wave 6, Wave 7]"
6
  }
 
chat_ui.R CHANGED
@@ -45,6 +45,14 @@ chatSidebarUI <- function() {
45
  style = "height: calc(100vh - 200px); overflow-y: auto; border: 1px solid #ccc; padding: 10px; margin-bottom: 10px; background-color: white;"
46
  # Placeholder for messages
47
  ),
 
 
 
 
 
 
 
 
48
  div(
49
  class = "chat-input-area", # For styling input area
50
  style = "display: flex; align-items: stretch;",
 
45
  style = "height: calc(100vh - 200px); overflow-y: auto; border: 1px solid #ccc; padding: 10px; margin-bottom: 10px; background-color: white;"
46
  # Placeholder for messages
47
  ),
48
+ div(
49
+ class = "literature-toggle-container",
50
+ style = "margin: 5px 0;",
51
+ actionButton("literatureToggleBtn",
52
+ HTML('<i class="fa fa-search"></i> External Literature'),
53
+ class = "btn btn-outline-info literature-toggle-btn",
54
+ style = "width: 100%; font-size: 12px;")
55
+ ),
56
  div(
57
  class = "chat-input-area", # For styling input area
58
  style = "display: flex; align-items: stretch;",
codebase_analysis.md CHANGED
@@ -1,153 +1,153 @@
1
- # How to Run the Application
2
-
3
- To run this R Shiny application, you will need R and the RStudio IDE (recommended) or another R environment installed on your system. You will also need the `shiny` package and other packages listed as dependencies (`readxl`, `DT`, `dplyr`, `shinythemes`).
4
-
5
- **Steps:**
6
-
7
- 1. **Install R and RStudio:** If you haven't already, download and install R from [CRAN](https://cran.r-project.org/) and RStudio Desktop from [Posit](https://posit.co/download/rstudio-desktop/).
8
- 2. **Install Required R Packages:** Open R or RStudio and run the following commands in the R console:
9
- ```R
10
- install.packages(c("shiny", "readxl", "DT", "dplyr", "shinythemes"))
11
- ```
12
- 3. **Set Working Directory:** Set your R session's working directory to the root folder of this Shiny application (the folder containing `server.R` and `ui.R`). In RStudio, open either `server.R` or `ui.R` and go to `Session > Set Working Directory > To Source File Location`.
13
- 4. **Run the App:** In the R console, execute the following command:
14
- ```R
15
- shiny::runApp()
16
- ```
17
- Alternatively, if you have `server.R` or `ui.R` open in RStudio, a "Run App" button will typically appear at the top of the editor pane, which you can click.
18
-
19
- This will launch the application in your default web browser.
20
-
21
- ---
22
-
23
- # Codebase Analysis: TaijiChat Shiny Application
24
-
25
- ## Overview
26
-
27
- The codebase consists of an R Shiny application designed to explore and visualize bioinformatics data related to T cell states and transcription factors (TFs). It appears to be a companion tool for a research publication, aiming to make complex datasets accessible. The application is structured into two main files: `server.R` (server-side logic) and `ui.R` (user interface definition). Data is primarily loaded from Excel files and images stored in a `www/` subdirectory.
28
-
29
- ## File Breakdown
30
-
31
- ### `server.R` (Server Logic)
32
-
33
- **Key Functionalities:**
34
-
35
- 1. **Data Loading and Preprocessing:**
36
- * Loads multiple Excel datasets for TF PageRank scores, TF wave analysis, TF-TF correlations, TF communities, and multi-omics data. These files are located in `www/tablePagerank/`, `www/waveanalysis/`, `www/TFcorintextrm/`, and `www/tfcommunities/`.
37
- * `new_read_excel_file()`: Reads and transposes Excel files, setting "Regulator Names" from the first column and using the original first row as new column headers.
38
- * `new_filter_data()`: Filters transposed dataframes by column names based on user search input (supports multiple comma-separated, case-insensitive keywords).
39
-
40
- 2. **TF Catalog Data Display (Repetitive Structure):**
41
- * Handles data for Overall TF PageRank, Naive, TE, MP, TCM, TEM, TRM, TEXprog, TEXeff-like, and TEXterm cell states.
42
- * For each dataset:
43
- * Uses `reactiveVal` for column pagination state (4 columns per page).
44
- * `observeEvent`s for "next" and "previous" button functionality.
45
- * Reactive expressions filter data by search term and select columns for the current page.
46
- * Dynamically inserts a styled "Cell state data" row with "TF activity score" (at row index 2 for main PageRank table, row index 0 for others).
47
- * `renderDT` outputs `DT::datatable` with custom options (fixed 45 rows, no search box, JS `rowCallback` to highlight the "TF activity score" row).
48
-
49
- 3. **TF Wave Analysis:**
50
- * Loads TF wave data from `www/waveanalysis/searchtfwaves.xlsx`.
51
- * Allows users to search for a TF and view its associated wave(s) in a transposed table.
52
-
53
- 4. **TF-TF Correlation in TRM/TEXterm:**
54
- * Loads data from `www/TFcorintextrm/TF-TFcorTRMTEX.xlsx`.
55
- * Allows TF search.
56
- * Renders a clickable list of TFs (`actionLink`s).
57
- * Displays tabular data and an associated image ("TF Merged Graph Path") for the selected/searched TF.
58
-
59
- 5. **TF Communities:**
60
- * Loads data from `www/tfcommunities/trmcommunities.xlsx` and `www/tfcommunities/texcommunities.xlsx`.
61
- * Displays them as simple `DT::datatable` objects.
62
-
63
- 6. **Multi-omics Data Table:**
64
- * Loads data from `www/multi-omicsdata.xlsx`.
65
- * Renders as a `DT::datatable`, creating hyperlinks in the "Author" column from a "DOI" column, removing empty columns, and enabling scrolling.
66
-
67
- 7. **Navigation & Other:**
68
- * `observeEvent`s for UI element clicks (e.g., `input$c1_link`) to navigate tabs via `updateNavbarPage`.
69
- * Redirects to a bioRxiv paper URL via `session$sendCustomMessage`.
70
- * Contains significant commented-out code (older logic).
71
-
72
- **Libraries Used:** `shiny`, `readxl`, `DT`, `dplyr`.
73
-
74
- ### `ui.R` (User Interface)
75
-
76
- **Key Functionalities:**
77
-
78
- 1. **Overall Structure:**
79
- * Uses `shinytheme("flatly")`.
80
- * `navbarPage` for the main tabbed interface.
81
- * Custom CSS for fonts (`Arial`).
82
- * JavaScript for URL redirection and a modal dialog.
83
-
84
- 2. **Home Tab:**
85
- * Project/study description.
86
- * Layout with an image (`homedesc.png`) featuring clickable `actionLink`s for navigation.
87
- * "Read Now" button linking to the research paper.
88
- * Footer with lab links and logos.
89
-
90
- 3. **TF Catalog (`navbarMenu`):**
91
- * **"Search TF Scores" Tab:**
92
- * Explanatory text, image (`tfcat/onlycellstates.png`).
93
- * Search input (`search_input`), column pagination buttons (`prev_btn`, `next_btn`), `DTOutput("table")`.
94
- * **"Cell State Specific TF Catalog" Tab (`navlistPanel`):**
95
- * Sub-tabs for Naive, TE, MP, Tcm, Tem, Trm, TEXprog, TEXeff-like, TEXterm.
96
- * Each sub-tab has a consistent layout: header, text, a specific bubble plot image (from `www/bubbleplots/`), search input, pagination buttons, and `DTOutput`.
97
- * **"Multi-State TFs" Tab:** Displays a heatmap image (`tfcat/multistatesheatmap.png`).
98
-
99
- 4. **TF Wave Analysis (`navbarMenu`):**
100
- * **"Overview" Tab:**
101
- * Explanatory text, overview image (`tfwaveanal.png`).
102
- * Clickable images (`waveanalysis/c1.jpg` to `c6.jpg`, linked via `c1_link` etc.) for navigation to detail tabs.
103
- * Search input (`search_input_wave`), `DTOutput("table_wave")`.
104
- * **Individual Wave Tabs ("Wave 1" to "Wave 7"):**
105
- * Each tab displays the wave image, a GO KEGG result image, and "Ranked Text" image(s) from `www/waveanalysis/` and `www/waveanalysis/txtJPG/`.
106
-
107
- 5. **TF Network Analysis (`navbarMenu`):**
108
- * **"Search TF-TF correlation in TRM/TEXterm" Tab:**
109
- * Methodology description, image (`networkanalysis/tfcorrdesc.png`).
110
- * `sidebarLayout` with search input (`search`), button (`search_btn`), `tableOutput("gene_list_table")` for available TFs.
111
- * `mainPanel` with `tableOutput("result_table")`, legend, and `uiOutput("image_gallery")`.
112
- * Footer with citations.
113
- * **"TRM/TEXterm TF communities" Tab:**
114
- * Descriptive text, images (`networkanalysis/community.jpg`, `networkanalysis/trmtexcom.png`, `networkanalysis/tfcompathway.png`).
115
- * Two `DTOutput`s (`trmcom`, `texcom`) for community tables.
116
- * Footer with citations.
117
-
118
- 6. **Multi-omics Data Tab:**
119
- * Header, text, `dataTableOutput("multiomicsdatatable")`.
120
-
121
- 7. **Global Header Elements:**
122
- * Defines a modal dialog and associated JavaScript (triggered by an element `#csdescrip_link`, not explicitly found in the provided UI snippets for the main content area).
123
- * JavaScript to send a Shiny input upon `#c1_link` click.
124
-
125
- **Libraries Used:** `shiny`, `shinythemes`, `DT`.
126
-
127
- ## General Architecture and Observations
128
-
129
- * **Purpose:** The application serves as an interactive data exploration tool, likely accompanying a scientific publication on T cell biology.
130
- * **Data Source:** Heavily reliant on pre-processed data stored in Excel files and pre-generated images within the `www/` directory. This indicates that the core data processing happens outside this Shiny app.
131
- * **Repetitive Code Structure:** Significant code duplication exists in both `server.R` and `ui.R`.
132
- * In `server.R`, the logic for loading, filtering, paginating, and rendering tables for the nine different cell state TF scores is nearly identical.
133
- * In `ui.R`, the layout for each of these cell state specific tabs, and also for each of the seven individual TF wave analysis tabs, is highly repetitive.
134
- * This repetition suggests a strong opportunity for refactoring by creating reusable R functions or Shiny modules to generate these UI and server components dynamically.
135
- * **User Interface (UI):** The UI is well-structured with a `navbarPage` and logical tab groupings. It provides good contextual information (descriptions, explanations of scores/plots) for users.
136
- * **Interactivity:**
137
- * Search functionality for TFs/regulators across various datasets.
138
- * Custom column-based pagination for wide tables.
139
- * Clickable images and links for navigation between sections.
140
- * Dynamic display of tables and images based on user selections.
141
- * **Modularity (Potential):** The application is not currently modularized, as the repetition shows, but the distinct analytical sections (TF Catalog, Wave Analysis, Network Analysis) are prime candidates for separation into Shiny modules if the application is expanded or refactored.
142
- * **Static Content:** A significant portion of the content, especially in the Wave Analysis and Network Analysis tabs, involves displaying pre-generated static images (plots, pathway results).
143
- * **Code Graveyard:** Both files end with a "CODE GRAVEYARD" comment, indicating that there's older, unused code present.
144
-
145
- ## Potential Areas for Improvement/Refactoring
146
-
147
- * **Modularization:** Encapsulate the repetitive UI and server logic for cell-state specific tables and individual wave pages into functions or Shiny modules to reduce code duplication and improve maintainability.
148
- * **Dynamic Image Generation (Optional):** If source data and plotting scripts were available, some images currently served statically could potentially be generated dynamically, offering more flexibility. However, for a publication companion app, static images are often sufficient and ensure reproducibility of figures.
149
- * **Consolidate Helper Functions:** General utility functions (like `new_read_excel_file` and `new_filter_data`) are well-defined; ensure they are used consistently wherever similar loading or filtering logic appears.
150
- * **CSS Styling:** Centralize CSS styling rather than relying heavily on inline `style` attributes within `tags$div` and other elements, potentially using a separate CSS file.
151
- * **Modal Trigger:** Verify that the `#csdescrip_link` element, which triggers the global modal, is present and functional in the UI.
152
-
153
  This analysis provides a snapshot of the codebase's structure, functionality, and potential areas for future development or refinement.
 
1
+ # How to Run the Application
2
+
3
+ To run this R Shiny application, you will need R and the RStudio IDE (recommended) or another R environment installed on your system. You will also need the `shiny` package and other packages listed as dependencies (`readxl`, `DT`, `dplyr`, `shinythemes`).
4
+
5
+ **Steps:**
6
+
7
+ 1. **Install R and RStudio:** If you haven't already, download and install R from [CRAN](https://cran.r-project.org/) and RStudio Desktop from [Posit](https://posit.co/download/rstudio-desktop/).
8
+ 2. **Install Required R Packages:** Open R or RStudio and run the following commands in the R console:
9
+ ```R
10
+ install.packages(c("shiny", "readxl", "DT", "dplyr", "shinythemes"))
11
+ ```
12
+ 3. **Set Working Directory:** Set your R session's working directory to the root folder of this Shiny application (the folder containing `server.R` and `ui.R`). In RStudio, you can do this by opening either `server.R` or `ui.R` and then going to `Session > Set Working Directory > To Source File Location`.
13
+ 4. **Run the App:** In the R console, execute the following command:
14
+ ```R
15
+ shiny::runApp()
16
+ ```
17
+ Alternatively, if you have `server.R` or `ui.R` open in RStudio, a "Run App" button will typically appear at the top of the editor pane, which you can click.
18
+
19
+ This will launch the application in your default web browser.
20
+
21
+ ---
22
+
23
+ # Codebase Analysis: TaijiChat Shiny Application
24
+
25
+ ## Overview
26
+
27
+ The codebase consists of an R Shiny application designed to explore and visualize bioinformatics data related to T cell states and transcription factors (TFs). It appears to be a companion tool for a research publication, aiming to make complex datasets accessible. The application is structured into two main files: `server.R` (server-side logic) and `ui.R` (user interface definition). Data is primarily loaded from Excel files and images stored in a `www/` subdirectory.
28
+
29
+ ## File Breakdown
30
+
31
+ ### `server.R` (Server Logic)
32
+
33
+ **Key Functionalities:**
34
+
35
+ 1. **Data Loading and Preprocessing:**
36
+ * Loads multiple Excel datasets for TF PageRank scores, TF wave analysis, TF-TF correlations, TF communities, and multi-omics data. These files are located in `www/tablePagerank/`, `www/waveanalysis/`, `www/TFcorintextrm/`, and `www/tfcommunities/`.
37
+ * `new_read_excel_file()`: Reads and transposes Excel files, setting "Regulator Names" from the first column and using the original first row as new column headers.
38
+ * `new_filter_data()`: Filters transposed dataframes by column names based on user search input (supports multiple comma-separated, case-insensitive keywords).
39
+
40
+ 2. **TF Catalog Data Display (Repetitive Structure):**
41
+ * Handles data for Overall TF PageRank, Naive, TE, MP, TCM, TEM, TRM, TEXprog, TEXeff-like, and TEXterm cell states.
42
+ * For each dataset:
43
+ * Uses `reactiveVal` for column pagination state (4 columns per page).
44
+ * `observeEvent`s for "next" and "previous" button functionality.
45
+ * Reactive expressions filter data by search term and select columns for the current page.
46
+ * Dynamically inserts a styled "Cell state data" row with "TF activity score" (at row index 2 for main PageRank table, row index 0 for others).
47
+ * `renderDT` outputs `DT::datatable` with custom options (fixed 45 rows, no search box, JS `rowCallback` to highlight the "TF activity score" row).
48
+
49
+ 3. **TF Wave Analysis:**
50
+ * Loads TF wave data from `www/waveanalysis/searchtfwaves.xlsx`.
51
+ * Allows users to search for a TF and view its associated wave(s) in a transposed table.
52
+
53
+ 4. **TF-TF Correlation in TRM/TEXterm:**
54
+ * Loads data from `www/TFcorintextrm/TF-TFcorTRMTEX.xlsx`.
55
+ * Allows TF search.
56
+ * Renders a clickable list of TFs (`actionLink`s).
57
+ * Displays tabular data and an associated image ("TF Merged Graph Path") for the selected/searched TF.
58
+
59
+ 5. **TF Communities:**
60
+ * Loads data from `www/tfcommunities/trmcommunities.xlsx` and `www/tfcommunities/texcommunities.xlsx`.
61
+ * Displays them as simple `DT::datatable` objects.
62
+
63
+ 6. **Multi-omics Data Table:**
64
+ * Loads data from `www/multi-omicsdata.xlsx`.
65
+ * Renders as a `DT::datatable`, creating hyperlinks in the "Author" column from a "DOI" column, removing empty columns, and enabling scrolling.
66
+
67
+ 7. **Navigation & Other:**
68
+ * `observeEvent`s for UI element clicks (e.g., `input$c1_link`) to navigate tabs via `updateNavbarPage`.
69
+ * Redirects to a bioRxiv paper URL via `session$sendCustomMessage`.
70
+ * Contains significant commented-out code (older logic).
71
+
72
+ **Libraries Used:** `shiny`, `readxl`, `DT`, `dplyr`.
73
+
74
+ ### `ui.R` (User Interface)
75
+
76
+ **Key Functionalities:**
77
+
78
+ 1. **Overall Structure:**
79
+ * Uses `shinytheme("flatly")`.
80
+ * `navbarPage` for the main tabbed interface.
81
+ * Custom CSS for fonts (`Arial`).
82
+ * JavaScript for URL redirection and a modal dialog.
83
+
84
+ 2. **Home Tab:**
85
+ * Project/study description.
86
+ * Layout with an image (`homedesc.png`) featuring clickable `actionLink`s for navigation.
87
+ * "Read Now" button linking to the research paper.
88
+ * Footer with lab links and logos.
89
+
90
+ 3. **TF Catalog (`navbarMenu`):**
91
+ * **"Search TF Scores" Tab:**
92
+ * Explanatory text, image (`tfcat/onlycellstates.png`).
93
+ * Search input (`search_input`), column pagination buttons (`prev_btn`, `next_btn`), `DTOutput("table")`.
94
+ * **"Cell State Specific TF Catalog" Tab (`navlistPanel`):**
95
+ * Sub-tabs for Naive, TE, MP, Tcm, Tem, Trm, TEXprog, TEXeff-like, TEXterm.
96
+ * Each sub-tab has a consistent layout: header, text, a specific bubble plot image (from `www/bubbleplots/`), search input, pagination buttons, and `DTOutput`.
97
+ * **"Multi-State TFs" Tab:** Displays a heatmap image (`tfcat/multistatesheatmap.png`).
98
+
99
+ 4. **TF Wave Analysis (`navbarMenu`):**
100
+ * **"Overview" Tab:**
101
+ * Explanatory text, overview image (`tfwaveanal.png`).
102
+ * Clickable images (`waveanalysis/c1.jpg` to `c6.jpg`, linked via `c1_link` etc.) for navigation to detail tabs.
103
+ * Search input (`search_input_wave`), `DTOutput("table_wave")`.
104
+ * **Individual Wave Tabs ("Wave 1" to "Wave 7"):**
105
+ * Each tab displays the wave image, a GO KEGG result image, and "Ranked Text" image(s) from `www/waveanalysis/` and `www/waveanalysis/txtJPG/`.
106
+
107
+ 5. **TF Network Analysis (`navbarMenu`):**
108
+ * **"Search TF-TF correlation in TRM/TEXterm" Tab:**
109
+ * Methodology description, image (`networkanalysis/tfcorrdesc.png`).
110
+ * `sidebarLayout` with search input (`search`), button (`search_btn`), `tableOutput("gene_list_table")` for available TFs.
111
+ * `mainPanel` with `tableOutput("result_table")`, legend, and `uiOutput("image_gallery")`.
112
+ * Footer with citations.
113
+ * **"TRM/TEXterm TF communities" Tab:**
114
+ * Descriptive text, images (`networkanalysis/community.jpg`, `networkanalysis/trmtexcom.png`, `networkanalysis/tfcompathway.png`).
115
+ * Two `DTOutput`s (`trmcom`, `texcom`) for community tables.
116
+ * Footer with citations.
117
+
118
+ 6. **Multi-omics Data Tab:**
119
+ * Header, text, `dataTableOutput("multiomicsdatatable")`.
120
+
121
+ 7. **Global Header Elements:**
122
+ * Defines a modal dialog and associated JavaScript (triggered by an element `#csdescrip_link`, not explicitly found in the provided UI snippets for the main content area).
123
+ * JavaScript to send a Shiny input upon `#c1_link` click.
124
+
125
+ **Libraries Used:** `shiny`, `shinythemes`, `DT`.
126
+
127
+ ## General Architecture and Observations
128
+
129
+ * **Purpose:** The application serves as an interactive data exploration tool, likely accompanying a scientific publication on T cell biology.
130
+ * **Data Source:** Heavily reliant on pre-processed data stored in Excel files and pre-generated images within the `www/` directory. This indicates that the core data processing happens outside this Shiny app.
131
+ * **Repetitive Code Structure:** Significant code duplication exists in both `server.R` and `ui.R`.
132
+ * In `server.R`, the logic for loading, filtering, paginating, and rendering tables for the nine different cell state TF scores is nearly identical.
133
+ * In `ui.R`, the layout for each of these cell state specific tabs, and also for each of the seven individual TF wave analysis tabs, is highly repetitive.
134
+ * This repetition suggests a strong opportunity for refactoring by creating reusable R functions or Shiny modules to generate these UI and server components dynamically.
135
+ * **User Interface (UI):** The UI is well-structured with a `navbarPage` and logical tab groupings. It provides good contextual information (descriptions, explanations of scores/plots) for users.
136
+ * **Interactivity:**
137
+ * Search functionality for TFs/regulators across various datasets.
138
+ * Custom column-based pagination for wide tables.
139
+ * Clickable images and links for navigation between sections.
140
+ * Dynamic display of tables and images based on user selections.
141
+ * **Modularity (Potential):** The application is not currently modularized, as the repetition shows, but the distinct analytical sections (TF Catalog, Wave Analysis, Network Analysis) are prime candidates for separation into Shiny modules if the application is expanded or refactored.
142
+ * **Static Content:** A significant portion of the content, especially in the Wave Analysis and Network Analysis tabs, involves displaying pre-generated static images (plots, pathway results).
143
+ * **Code Graveyard:** Both files end with a "CODE GRAVEYARD" comment, indicating that there's older, unused code present.
144
+
145
+ ## Potential Areas for Improvement/Refactoring
146
+
147
+ * **Modularization:** Encapsulate the repetitive UI and server logic for cell-state specific tables and individual wave pages into functions or Shiny modules to reduce code duplication and improve maintainability.
148
+ * **Dynamic Image Generation (Optional):** If source data and plotting scripts were available, some images currently served statically could potentially be generated dynamically, offering more flexibility. However, for a publication companion app, static images are often sufficient and ensure reproducibility of figures.
149
+ * **Consolidate Helper Functions:** General utility functions (like `new_read_excel_file` and `new_filter_data`) are well-defined; ensure they are used consistently wherever similar loading or filtering logic appears.
150
+ * **CSS Styling:** Centralize CSS styling rather than relying heavily on inline `style` attributes within `tags$div` and other elements, potentially using a separate CSS file.
151
+ * **Modal Trigger:** Verify that the `#csdescrip_link` element, which triggers the global modal, is present and functional in the UI.
152
+
153
  This analysis provides a snapshot of the codebase's structure, functionality, and potential areas for future development or refinement.
folder_structure_documentation.md CHANGED
@@ -1,108 +1,108 @@
1
- # Web Application Folder Structure Documentation
2
-
3
- This document outlines the required folder structure for the TaijiChat R Shiny application, primarily focusing on the `www/` directory, which houses data files and static assets like images.
4
-
5
- ## Root Directory Structure
6
-
7
- The application's root directory contains the core R scripts and the `www/` directory:
8
-
9
- ```
10
- /
11
- |-- server.R
12
- |-- ui.R
13
- |-- www/
14
- |-- (other development files like .git/, codebase_analysis.md etc.)
15
- ```
16
-
17
- ## `www/` Directory Structure
18
-
19
- The `www/` directory is crucial as Shiny automatically makes its contents accessible to the web browser. It needs to be organized as follows for the application to find its resources:
20
-
21
- ```
22
- www/
23
- |-- tablePagerank/ # Excel files for TF PageRank scores
24
- | |-- Table_TF PageRank Scores for Audrey.xlsx # Contains table(s) with PageRank scores for Transcription Factors (TFs), potentially a master or specific analysis file.
25
- | |-- Naive.xlsx # TF data related to Naive cell state.
26
- | |-- TE.xlsx # TF data related to T Exhausted cell state.
27
- | |-- MP.xlsx # TF data related to Memory Precursor cell state.
28
- | |-- TCM.xlsx # TF data related to Central Memory T cell state.
29
- | |-- TEM.xlsx # TF data related to Effector Memory T cell state.
30
- | |-- TRM.xlsx # TF data related to Resident Memory T cell state.
31
- | |-- TEXprog.xlsx # TF data related to Progenitor Exhausted T cell state.
32
- | |-- TEXeff.xlsx # TF data related to Effector Exhausted T cell state.
33
- | |-- TEXterm.xlsx # TF data related to Terminally Exhausted T cell state.
34
- |
35
- |-- waveanalysis/ # Assets for TF Wave Analysis
36
- | |-- searchtfwaves.xlsx # Contains TF names organized by different "waves" of activity or expression.
37
- | |-- tfwaveanal.png # Overview image
38
- | |-- c1.jpg # Wave 1 image
39
- | |-- c2.jpg # Wave 2 image
40
- | |-- c3.jpg # Wave 3 image
41
- | |-- c4.jpg # Wave 4 image
42
- | |-- c5.jpg # Wave 5 image
43
- | |-- c6.jpg # Wave 6 image
44
- | |-- c7.jpg # Wave 7 image
45
- | |-- c1_selected_GO_KEGG.jpg
46
- | |-- c2_selected_GO_KEGG_v2.jpg
47
- | |-- c3_selected_GO_KEGG.jpg
48
- | |-- c4_selected_GO_KEGG.jpg
49
- | |-- c5_selected_GO_KEGG.jpg
50
- | |-- c6_selected_GO_KEGG.jpg
51
- | |-- c7_selected_GO_KEGG.jpg
52
- | |
53
- | |-- txtJPG/ # "Ranked Text" images for Wave Analysis
54
- | |-- c1_ranked_1.jpg
55
- | |-- c1_ranked_2.jpg
56
- | |-- c2_ranked.jpg
57
- | |-- c3_ranked.jpg
58
- | |-- c4_ranked.jpg
59
- | |-- c5_ranked.jpg
60
- | |-- c6_ranked.jpg
61
- | |-- c7_ranked.jpg
62
- |
63
- |-- TFcorintextrm/ # Data for TF-TF correlation
64
- | |-- TF-TFcorTRMTEX.xlsx # Contains data on correlations between Transcription Factors, possibly focused on TRM and TEX states.
65
- |
66
- |-- tfcommunities/ # Data for TF communities
67
- | |-- trmcommunities.xlsx # Data defining TF communities within the TRM (Resident Memory T cell) state.
68
- | |-- texcommunities.xlsx # Data defining TF communities within TEX (Exhausted T cell) states.
69
- |
70
- |-- bubbleplots/ # Images for cell-state specific bubble plots
71
- | |-- naivebubble.jpg
72
- | |-- tebubble.jpg
73
- | |-- mpbubble.jpg
74
- | |-- tcmbubble.jpg
75
- | |-- tembubble.jpg
76
- | |-- trmbubble.jpg
77
- | |-- texprogbubble.jpg
78
- | |-- texintbubble.jpg # (Used for TEXeff-like)
79
- | |-- textermbubble.jpg
80
- |
81
- |-- tfcat/ # Images for the TF Catalog section
82
- | |-- onlycellstates.png
83
- | |-- multistatesheatmap.png
84
- |
85
- |-- networkanalysis/ # Images for TF Network Analysis section
86
- | |-- tfcorrdesc.png
87
- | |-- community.jpg
88
- | |-- trmtexcom.png
89
- | |-- tfcompathway.png
90
- |
91
- |-- multi-omicsdata.xlsx # Main multi-omics data file (e.g., gene expression, chromatin accessibility, protein levels). Structure needs to be inferred or predefined for agent use.
92
- |
93
- |-- homedesc.png # Image for the home page
94
- |-- ucsdlogo.png # UCSD Logo
95
- |-- salklogo.png # Salk Logo
96
- |-- unclogo.jpg # UNC Logo
97
- |-- csdescrip.jpeg # Image for the modal dialog (if used)
98
-
99
- ```
100
-
101
- ## Notes:
102
-
103
- * The filenames listed are based on the explicit references in `server.R` and `ui.R`.
104
- * This structure primarily covers files loaded directly by the R scripts or referenced in UI image tags.
105
- * For the application to be fully functional, all listed Excel files and image assets must be present in these locations with the correct names.
106
- * If an Excel file (e.g., for individual cell states) is derived from a single source table, it's assumed that the source table has been appropriately processed or split into these individual files, or that the application can handle the single source if the server-side logic were adapted.
107
-
108
  This documentation should help in setting up the necessary file environment for the application.
 
1
+ # Web Application Folder Structure Documentation
2
+
3
+ This document outlines the required folder structure for the TaijiChat R Shiny application, primarily focusing on the `www/` directory, which houses data files and static assets like images.
4
+
5
+ ## Root Directory Structure
6
+
7
+ The application's root directory contains the core R scripts and the `www/` directory:
8
+
9
+ ```
10
+ /
11
+ |-- server.R
12
+ |-- ui.R
13
+ |-- www/
14
+ |-- (other development files like .git/, codebase_analysis.md etc.)
15
+ ```
16
+
17
+ ## `www/` Directory Structure
18
+
19
+ The `www/` directory is crucial as Shiny automatically makes its contents accessible to the web browser. It needs to be organized as follows for the application to find its resources:
20
+
21
+ ```
22
+ www/
23
+ |-- tablePagerank/ # Excel files for TF PageRank scores
24
+ | |-- Table_TF PageRank Scores for Audrey.xlsx # Contains table(s) with PageRank scores for Transcription Factors (TFs), potentially a master or specific analysis file.
25
+ | |-- Naive.xlsx # TF data related to Naive cell state.
26
+ | |-- TE.xlsx # TF data related to T Exhausted cell state.
27
+ | |-- MP.xlsx # TF data related to Memory Precursor cell state.
28
+ | |-- TCM.xlsx # TF data related to Central Memory T cell state.
29
+ | |-- TEM.xlsx # TF data related to Effector Memory T cell state.
30
+ | |-- TRM.xlsx # TF data related to Resident Memory T cell state.
31
+ | |-- TEXprog.xlsx # TF data related to Progenitor Exhausted T cell state.
32
+ | |-- TEXeff.xlsx # TF data related to Effector Exhausted T cell state.
33
+ | |-- TEXterm.xlsx # TF data related to Terminally Exhausted T cell state.
34
+ |
35
+ |-- waveanalysis/ # Assets for TF Wave Analysis
36
+ | |-- searchtfwaves.xlsx # Contains TF names organized by different "waves" of activity or expression.
37
+ | |-- tfwaveanal.png # Overview image
38
+ | |-- c1.jpg # Wave 1 image
39
+ | |-- c2.jpg # Wave 2 image
40
+ | |-- c3.jpg # Wave 3 image
41
+ | |-- c4.jpg # Wave 4 image
42
+ | |-- c5.jpg # Wave 5 image
43
+ | |-- c6.jpg # Wave 6 image
44
+ | |-- c7.jpg # Wave 7 image
45
+ | |-- c1_selected_GO_KEGG.jpg
46
+ | |-- c2_selected_GO_KEGG_v2.jpg
47
+ | |-- c3_selected_GO_KEGG.jpg
48
+ | |-- c4_selected_GO_KEGG.jpg
49
+ | |-- c5_selected_GO_KEGG.jpg
50
+ | |-- c6_selected_GO_KEGG.jpg
51
+ | |-- c7_selected_GO_KEGG.jpg
52
+ | |
53
+ | |-- txtJPG/ # "Ranked Text" images for Wave Analysis
54
+ | |-- c1_ranked_1.jpg
55
+ | |-- c1_ranked_2.jpg
56
+ | |-- c2_ranked.jpg
57
+ | |-- c3_ranked.jpg
58
+ | |-- c4_ranked.jpg
59
+ | |-- c5_ranked.jpg
60
+ | |-- c6_ranked.jpg
61
+ | |-- c7_ranked.jpg
62
+ |
63
+ |-- TFcorintextrm/ # Data for TF-TF correlation
64
+ | |-- TF-TFcorTRMTEX.xlsx # Contains data on correlations between Transcription Factors, possibly focused on TRM and TEX states.
65
+ |
66
+ |-- tfcommunities/ # Data for TF communities
67
+ | |-- trmcommunities.xlsx # Data defining TF communities within the TRM (Resident Memory T cell) state.
68
+ | |-- texcommunities.xlsx # Data defining TF communities within TEX (Exhausted T cell) states.
69
+ |
70
+ |-- bubbleplots/ # Images for cell-state specific bubble plots
71
+ | |-- naivebubble.jpg
72
+ | |-- tebubble.jpg
73
+ | |-- mpbubble.jpg
74
+ | |-- tcmbubble.jpg
75
+ | |-- tembubble.jpg
76
+ | |-- trmbubble.jpg
77
+ | |-- texprogbubble.jpg
78
+ | |-- texintbubble.jpg # (Used for TEXeff-like)
79
+ | |-- textermbubble.jpg
80
+ |
81
+ |-- tfcat/ # Images for the TF Catalog section
82
+ | |-- onlycellstates.png
83
+ | |-- multistatesheatmap.png
84
+ |
85
+ |-- networkanalysis/ # Images for TF Network Analysis section
86
+ | |-- tfcorrdesc.png
87
+ | |-- community.jpg
88
+ | |-- trmtexcom.png
89
+ | |-- tfcompathway.png
90
+ |
91
+ |-- multi-omicsdata.xlsx # Main multi-omics data file (e.g., gene expression, chromatin accessibility, protein levels). Structure needs to be inferred or predefined for agent use.
92
+ |
93
+ |-- homedesc.png # Image for the home page
94
+ |-- ucsdlogo.png # UCSD Logo
95
+ |-- salklogo.png # Salk Logo
96
+ |-- unclogo.jpg # UNC Logo
97
+ |-- csdescrip.jpeg # Image for the modal dialog (if used)
98
+
99
+ ```
100
+
101
+ ## Notes:
102
+
103
+ * The filenames listed are based on the explicit references in `server.R` and `ui.R`.
104
+ * This structure primarily covers files loaded directly by the R scripts or referenced in UI image tags.
105
+ * For the application to be fully functional, all listed Excel files and image assets must be present in these locations with the correct names.
106
+ * If an Excel file (e.g., for individual cell states) is derived from a single source table, it's assumed that the source table has been appropriately processed or split into these individual files, or that the application can handle the single source if the server-side logic were adapted.
107
+
108
  This documentation should help in setting up the necessary file environment for the application.
long_operations.R CHANGED
@@ -1,77 +1,77 @@
1
- # long_operations.R
2
-
3
- library(shiny)
4
-
5
- # Source the caching functions
6
- source("R/caching.R")
7
-
8
- # Function to wrap a long-running operation with warning overlay and caching
9
- withWarningOverlayAndCache <- function(session, operation_name, operation_func, ..., max_cache_age_seconds = NULL) {
10
- # Generate a cache key based on operation name and its specific arguments
11
- # Note: The arguments passed to `...` for generate_cache_key should uniquely identify this call.
12
- # This might need careful handling depending on how operation_func uses its environment or global vars.
13
- cache_key <- generate_cache_key(operation_name, ...)
14
-
15
- # Try to get from cache first
16
- cached_result <- get_cached_item(cache_key, max_age_seconds = max_cache_age_seconds)
17
-
18
- if (!is.null(cached_result)) {
19
- return(cached_result)
20
- }
21
-
22
- # If not in cache or stale, proceed with the operation
23
- # Send a custom message to UI to display a warning in the chat log
24
- warning_text <- "This operation might take a moment. Please be patient."
25
- excel_operations <- c("new_read_excel_file") # Add other excel related op_names here
26
- if (operation_name %in% excel_operations) {
27
- warning_text <- "Processing Excel file(s), this may take longer. Please be patient."
28
- }
29
- session$sendCustomMessage(type = "long_op_custom_warning", message = list(text = warning_text))
30
-
31
- result <- tryCatch({
32
- operation_func() # Execute the actual operation
33
- }, error = function(e) {
34
- stop(e) # re-signal the error so it propagates unchanged to the caller
35
- })
36
-
37
- # Save to cache
38
- save_cached_item(cache_key, result)
39
-
40
- return(result)
41
- }
42
-
43
- # Function to check if an operation might be long-running
44
- isLongRunningOperation <- function(operation_name) {
45
- # List of operations that typically take longer
46
- long_operations <- c(
47
- "get_processed_tf_data",
48
- "get_tf_wave_search_data",
49
- "get_tf_correlation_data",
50
- "get_tf_community_sheet_data",
51
- "new_read_excel_file"
52
- )
53
-
54
- return(operation_name %in% long_operations)
55
- }
56
-
57
- # Function to wrap a reactive expression with warning overlay
58
- # This will now need to be adapted if we want caching for reactives.
59
- # For simplicity, let's assume for now that caching is applied at a lower level before reactive is involved,
60
- # or that specific reactive expressions will call withWarningOverlayAndCache directly.
61
- withWarningOverlayReactive <- function(session, reactive_expr, operation_name) {
62
- # This function needs to be re-thought if caching is to be applied transparently to any reactive_expr.
63
- # The current caching model in withWarningOverlayAndCache assumes an operation_func and its specific args.
64
- # A reactive_expr doesn't fit this model directly without knowing what makes it unique.
65
- if (isLongRunningOperation(operation_name)) {
66
- # This reactive wrapper would show the overlay but not handle caching itself.
67
- # Caching should ideally happen inside the reactive_expr if it calls a cacheable function.
68
- reactive({
69
- showWarningOverlay(session) # NOTE: show/hide overlay helpers are not defined or sourced in this file; they are expected to exist elsewhere
70
- res <- reactive_expr()
71
- hideWarningOverlay(session)
72
- res
73
- })
74
- } else {
75
- reactive_expr
76
- }
77
  }
 
1
+ # long_operations.R
2
+
3
+ library(shiny)
4
+
5
+ # Source the caching functions
6
+ source("R/caching.R")
7
+
8
+ # Function to wrap a long-running operation with warning overlay and caching
9
+ withWarningOverlayAndCache <- function(session, operation_name, operation_func, ..., max_cache_age_seconds = NULL) {
10
+ # Generate a cache key based on operation name and its specific arguments
11
+ # Note: The arguments passed to `...` for generate_cache_key should uniquely identify this call.
12
+ # This might need careful handling depending on how operation_func uses its environment or global vars.
13
+ cache_key <- generate_cache_key(operation_name, ...)
14
+
15
+ # Try to get from cache first
16
+ cached_result <- get_cached_item(cache_key, max_age_seconds = max_cache_age_seconds)
17
+
18
+ if (!is.null(cached_result)) {
19
+ return(cached_result)
20
+ }
21
+
22
+ # If not in cache or stale, proceed with the operation
23
+ # Send a custom message to UI to display a warning in the chat log
24
+ warning_text <- "This operation might take a moment. Please be patient."
25
+ excel_operations <- c("new_read_excel_file") # Add other excel related op_names here
26
+ if (operation_name %in% excel_operations) {
27
+ warning_text <- "Processing Excel file(s), this may take longer. Please be patient."
28
+ }
29
+ session$sendCustomMessage(type = "long_op_custom_warning", message = list(text = warning_text))
30
+
31
+ result <- tryCatch({
32
+ operation_func() # Execute the actual operation
33
+ }, error = function(e) {
34
+ stop(e) # re-signal the error so it propagates unchanged to the caller
35
+ })
36
+
37
+ # Save to cache
38
+ save_cached_item(cache_key, result)
39
+
40
+ return(result)
41
+ }
42
+
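# Example call site (illustrative only; "Naive" is a hypothetical dataset
# identifier, and the arguments after operation_func are forwarded via `...`
# to generate_cache_key()):
# result <- withWarningOverlayAndCache(
#   session,
#   operation_name = "get_processed_tf_data",
#   operation_func = function() get_processed_tf_data("Naive"),
#   "Naive",
#   max_cache_age_seconds = 3600
# )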
43
+ # Function to check if an operation might be long-running
44
+ isLongRunningOperation <- function(operation_name) {
45
+ # List of operations that typically take longer
46
+ long_operations <- c(
47
+ "get_processed_tf_data",
48
+ "get_tf_wave_search_data",
49
+ "get_tf_correlation_data",
50
+ "get_tf_community_sheet_data",
51
+ "new_read_excel_file"
52
+ )
53
+
54
+ return(operation_name %in% long_operations)
55
+ }
56
+
57
+ # Function to wrap a reactive expression with warning overlay
58
+ # This will now need to be adapted if we want caching for reactives.
59
+ # For simplicity, let's assume for now that caching is applied at a lower level before reactive is involved,
60
+ # or that specific reactive expressions will call withWarningOverlayAndCache directly.
61
+ withWarningOverlayReactive <- function(session, reactive_expr, operation_name) {
62
+ # This function needs to be re-thought if caching is to be applied transparently to any reactive_expr.
63
+ # The current caching model in withWarningOverlayAndCache assumes an operation_func and its specific args.
64
+ # A reactive_expr doesn't fit this model directly without knowing what makes it unique.
65
+ if (isLongRunningOperation(operation_name)) {
66
+ # This reactive wrapper would show the overlay but not handle caching itself.
67
+ # Caching should ideally happen inside the reactive_expr if it calls a cacheable function.
68
+ reactive({
69
+ showWarningOverlay(session) # NOTE: show/hide overlay helpers are not defined or sourced in this file; they are expected to exist elsewhere
70
+ res <- reactive_expr()
71
+ hideWarningOverlay(session)
72
+ res
73
+ })
74
+ } else {
75
+ reactive_expr
76
+ }
77
  }
main.py CHANGED
@@ -1,38 +1,38 @@
1
- # main.py
2
- import os
3
- from openai import OpenAI
4
- from agents.manager_agent import ManagerAgent
5
-
6
- API_KEY_FILE = "api_key.txt" # Define the API key filename
7
-
8
- if __name__ == "__main__":
9
- print("Application starting...")
10
-
11
- api_key = None
12
- client = None
13
-
14
- try:
15
- # Try to read the API key from the file
16
- with open(API_KEY_FILE, 'r') as f:
17
- api_key = f.read().strip()
18
- if not api_key:
19
- print(f"Warning: {API_KEY_FILE} is empty. LLM features will be disabled.")
20
- api_key = None # Ensure api_key is None if file is empty
21
- else:
22
- print(f"Successfully read API key from {API_KEY_FILE}.")
23
- except FileNotFoundError:
24
- print(f"Warning: {API_KEY_FILE} not found. LLM features will be disabled.")
25
- except Exception as e:
26
- print(f"Error reading {API_KEY_FILE}: {e}. LLM features will be disabled.")
27
-
28
- if api_key:
29
- try:
30
- client = OpenAI(api_key=api_key)
31
- print("OpenAI client initialized successfully.")
32
- except Exception as e:
33
- print(f"Error initializing OpenAI client: {e}. LLM features will be disabled.")
34
- client = None # Ensure client is None if initialization fails
35
-
36
- manager = ManagerAgent(openai_api_key=api_key, openai_client=client)
37
- manager.start_interactive_session()
38
  print("Application ended.")
 
1
+ # main.py
2
+ import os
3
+ from openai import OpenAI
4
+ from agents.manager_agent import ManagerAgent
5
+
6
+ API_KEY_FILE = "api_key.txt" # Define the API key filename
7
+
8
+ if __name__ == "__main__":
9
+ print("Application starting...")
10
+
11
+ api_key = None
12
+ client = None
13
+
14
+ try:
15
+ # Try to read the API key from the file
16
+ with open(API_KEY_FILE, 'r') as f:
17
+ api_key = f.read().strip()
18
+ if not api_key:
19
+ print(f"Warning: {API_KEY_FILE} is empty. LLM features will be disabled.")
20
+ api_key = None # Ensure api_key is None if file is empty
21
+ else:
22
+ print(f"Successfully read API key from {API_KEY_FILE}.")
23
+ except FileNotFoundError:
24
+ print(f"Warning: {API_KEY_FILE} not found. LLM features will be disabled.")
25
+ except Exception as e:
26
+ print(f"Error reading {API_KEY_FILE}: {e}. LLM features will be disabled.")
27
+
28
+ if api_key:
29
+ try:
30
+ client = OpenAI(api_key=api_key)
31
+ print("OpenAI client initialized successfully.")
32
+ except Exception as e:
33
+ print(f"Error initializing OpenAI client: {e}. LLM features will be disabled.")
34
+ client = None # Ensure client is None if initialization fails
35
+
36
+ manager = ManagerAgent(openai_api_key=api_key, openai_client=client)
37
+ manager.start_interactive_session()
38
  print("Application ended.")
plan_temp.txt CHANGED
@@ -1,30 +1,30 @@
1
- I don't think that's reasonable. Here's my plan; compare the current agents against it and correct the current implementation to align with it:
2
-
3
- For every query, the generation agent goes through these steps:
4
- If a dataset, an image, or a paper is provided, add them when creating the chat completion. If not, proceed to step 1.
5
-
6
- 1. analyze query
7
- 2. analyze the conversation history if there's any
8
- 3. analyze any images, paper, or data provided with the chat completion, according to the plan
9
- 4. analyze the error from the previous attempt if there's any
10
- 5. read the short version of the paper description to understand what the paper is about
11
- 6. decide whether the user query can be answered directly or needs more information from the paper; if so, read it
12
- 7. read the tools documentation
13
- 8. decide which tools can be helpful for answering the query; if there are any, prepare the list of tools to be used
14
- 9. read the data documentation
15
- 10. decide which datasets are relevant to the user query; if there are any, prepare the list of datasets to be used
16
- 11. decide whether the user query can be solved by the paper, tools, data, or a combination of them; if not, prepare a signal NEED_CODING = TRUE but don't send it yet; otherwise move to the next step
17
- 12. decide whether the user query is about image(s). if so, prepare a list of images needed.
18
- 13. put everything together to make a plan
19
- - This process of thinking must be included in the generation agent's LLM output. It will be used to
20
-
21
- The supervisor agent reviews the plan, focusing on the code, and checks for suspicious or malicious behavior. Only imports of common packages are allowed.
22
-
23
- The executor agent executes the plan if it contains tool execution or code.
24
-
25
- The manager records everything from all LLMs and users, and deems whether the user's query can be considered answered. Note that if agents only propose a plan but the results are not yet gathered, it cannot be considered a proper answer - as in most cases where the generation agent proposes a plan in iteration 1. If the manager agent deems that a plan was proposed but results were not collected / the plan was not executed, and there's no error from the LLM, then the manager agent tells the generation agent to initialize a different chat completion with the images and datasets requested by the generation agent's plan. This manager-instructed attempt is different from a normal attempt; it does not count toward the allowed attempt count.
26
-
27
- If an error occurs at any stage, it must be reported to the manager, which records all errors. Once an error is detected, another attempt starts and we go back to the generation agent step. There will be 3 attempts allowed.
28
-
29
- Tell me whether you think my plan is clear and reasonable, and whether any part is missing or problematic.
30
  If not, proceed to implementation.
 
1
+ I don't think that's reasonable. Here's my plan; compare the current agents against it and correct the current implementation to align with it:
2
+
3
+ For every query, the generation agent goes through these steps:
4
+ If a dataset, an image, or a paper is provided, add them when creating the chat completion. If not, proceed to step 1.
5
+
6
+ 1. analyze query
7
+ 2. analyze the conversation history if there's any
8
+ 3. analyze any images, paper, or data provided with the chat completion, according to the plan
9
+ 4. analyze the error from the previous attempt if there's any
10
+ 5. read the short version of the paper description to understand what the paper is about
11
+ 6. decide whether the user query can be answered directly or needs more information from the paper; if so, read it
12
+ 7. read the tools documentation
13
+ 8. decide which tools can be helpful for answering the query; if there are any, prepare the list of tools to be used
14
+ 9. read the data documentation
15
+ 10. decide which datasets are relevant to the user query; if there are any, prepare the list of datasets to be used
16
+ 11. decide whether the user query can be solved by the paper, tools, data, or a combination of them; if not, prepare a signal NEED_CODING = TRUE but don't send it yet; otherwise move to the next step
17
+ 12. decide whether the user query is about image(s). if so, prepare a list of images needed.
18
+ 13. put everything together to make a plan
19
+ - This process of thinking must be included in the generation agent's LLM output. It will be used to
20
+
21
+ The supervisor agent reviews the plan, focusing on the code, and checks for suspicious or malicious behavior. Only imports of common packages are allowed.
22
+
23
+ The executor agent executes the plan if it contains tool execution or code.
24
+
25
+ The manager records everything from all LLMs and users, and deems whether the user's query can be considered answered. Note that if agents only propose a plan but the results are not yet gathered, it cannot be considered a proper answer - as in most cases where the generation agent proposes a plan in iteration 1. If the manager agent deems that a plan was proposed but results were not collected / the plan was not executed, and there's no error from the LLM, then the manager agent tells the generation agent to initialize a different chat completion with the images and datasets requested by the generation agent's plan. This manager-instructed attempt is different from a normal attempt; it does not count toward the allowed attempt count.
26
+
27
+ If an error occurs at any stage, it must be reported to the manager, which records all errors. Once an error is detected, another attempt starts and we go back to the generation agent step. There will be 3 attempts allowed.
28
+
29
+ Tell me whether you think my plan is clear and reasonable, and whether any part is missing or problematic.
30
  If not, proceed to implementation.
server.R CHANGED
The diff for this file is too large to render. See raw diff
 
tested_queries.txt CHANGED
@@ -1,40 +1,40 @@
1
- # --- Easy Queries (Navigation & Simple Data Retrieval) ---
2
-
3
- # Navigation
4
- 1. "Show me the home page."
5
- 2. "Take me to the TE (Terminal Exhaustion) data section."
6
- 3. "I want to see the multi-omics data."
7
- 4. "Navigate to the TF (Transcription Factor) Wave Analysis overview."
8
- 5. "Where can I find information about TRM communities?"
9
-
10
- # Simple Data Retrieval (from existing tables/UI elements)
11
- 6. "In the 'All Data Search' (main page), what are the TF activity scores for STAT3?"
12
- 7. "For the Naive T-cell state, search for scores related to JUNB."
13
- 8. "What waves is the TF 'BATF' a part of?" (Uses searchtfwaves.xlsx)
14
- 9. "Display the TRM communities table."
15
- 10. "Find the research paper by 'Chen' in the multi-omics data." (Assumes 'Chen' is an author)
16
-
17
- # --- Medium Queries (Requires Tool Use & Simple Code for Analysis/Formatting) ---
18
-
19
- # Basic Analysis / Data Manipulation (if agent can generate code for simple tasks)
20
- 11. "From the 'All Data Search' table, can you list the top 3 TFs with the highest scores in the first displayed cell state (e.g., Naive_Day0_vs_Day7_UP)?" (Requires identifying a column and finding max values)
21
- 12. "What is the average TF activity score for 'IRF4' across all displayed cell states in the 'All Data Search' section for the current view?" (Requires iterating through columns if multiple are shown for IRF4)
22
- 13. "Compare the TF activity scores for 'TCF7' and 'TOX' in the 'TE' (Terminal Exhaustion) dataset. Which one is generally higher?"
23
- 14. "If I search for 'BACH2' in the main TF activity score table, how many cell states show a score greater than 1.0?"
24
- 15. "Can you provide the TF activity scores for 'PRDM1' in the TEM (T Effector Memory) dataset, but only show me the cell states where the score is negative?"
25
-
26
- # --- Difficult Queries (Requires LLM Interpretation, Insight Generation, Complex Tool Orchestration) ---
27
-
28
- # Insight Generation & Interpretation
29
- 16. "Based on the available TF activity scores, which TFs seem to be most consistently upregulated across different exhausted T-cell states (e.g., TEXprog, TEXeff, TEXterm)?" (Requires understanding of "exhausted", cross-table comparison, and summarization)
30
- 17. "Is there a noticeable trend or pattern in the activity of 'EOMES' as T-cells progress from Naive to various effector and memory states shown in the data?" (Requires interpreting progression and comparing multiple datasets)
31
- 18. "Considering the TF communities data for TRM and TEX, are there any TFs that are prominent in both TRM and TEX communities, suggesting a shared role?" (Requires comparing two distinct datasets/visualizations and identifying overlaps)
32
- 19. "Analyze the TF activity scores for 'FOXO1'. Does its activity pattern suggest a role in maintaining T-cell quiescence or promoting activation/exhaustion based on the data available across different T-cell states?" (Requires biological interpretation linked to data patterns)
33
- 20. "If a researcher is interested in TFs that are highly active in T Effector Memory (TEM) cells but show low activity in Terminally Exhausted (TEXterm) cells, which TFs should they investigate further based on the provided datasets?" (Requires filtering, comparison across datasets, and a recommendation)
34
- 21. "Looking at the TF Wave Analysis, which TFs are predominantly active in early waves versus late waves? What might this imply about their roles in T-cell differentiation or response dynamics?" (Requires interpreting the wave data and drawing higher-level conclusions)
35
- 22. "The user uploaded an image of a UMAP plot showing clusters. The file is 'www/test_images/umap_example.png'. Can you describe what you see in the image and how it might relate to T-cell states if cluster A is Naive, cluster B is TEM, and cluster C is TEX?" (Requires multimodal input, assuming the agent can be pointed to local files for analysis - this tests the image upload and interpretation flow we built)
36
- 23. "Given the data in 'Table_TF PageRank Scores for Audrey.xlsx', identify three TFs that have significantly different activity scores between 'Naive_Day0_vs_Day7_UP' and 'MP_Day0_vs_Day7_UP'. Explain the potential biological significance of these differences." (Requires direct data analysis from a file, comparison, and biological reasoning)
37
-
38
- # Creative/Hypothetical (tests robustness and deeper understanding)
39
- 24. "If we wanted to design an experiment to reverse T-cell exhaustion, which 2-3 TFs might be good targets for modulation (activation or inhibition) based on their activity profiles in the provided datasets, and why?"
40
  25. "Explain the overall story the TF activity data tells about T-cell differentiation and exhaustion from Naive to Terminally Exhausted states, highlighting 3 key TF players and their changing roles."
 
1
+ # --- Easy Queries (Navigation & Simple Data Retrieval) ---
2
+
3
+ # Navigation
4
+ 1. "Show me the home page."
5
+ 2. "Take me to the TE (Terminal Exhaustion) data section."
6
+ 3. "I want to see the multi-omics data."
7
+ 4. "Navigate to the TF (Transcription Factor) Wave Analysis overview."
8
+ 5. "Where can I find information about TRM communities?"
9
+
10
+ # Simple Data Retrieval (from existing tables/UI elements)
11
+ 6. "In the 'All Data Search' (main page), what are the TF activity scores for STAT3?"
12
+ 7. "For the Naive T-cell state, search for scores related to JUNB."
13
+ 8. "What waves is the TF 'BATF' a part of?" (Uses searchtfwaves.xlsx)
14
+ 9. "Display the TRM communities table."
15
+ 10. "Find the research paper by 'Chen' in the multi-omics data." (Assumes 'Chen' is an author)
16
+
17
+ # --- Medium Queries (Requires Tool Use & Simple Code for Analysis/Formatting) ---
18
+
19
+ # Basic Analysis / Data Manipulation (if agent can generate code for simple tasks)
20
+ 11. "From the 'All Data Search' table, can you list the top 3 TFs with the highest scores in the first displayed cell state (e.g., Naive_Day0_vs_Day7_UP)?" (Requires identifying a column and finding max values)
21
+ 12. "What is the average TF activity score for 'IRF4' across all displayed cell states in the 'All Data Search' section for the current view?" (Requires iterating through columns if multiple are shown for IRF4)
22
+ 13. "Compare the TF activity scores for 'TCF7' and 'TOX' in the 'TE' (Terminal Exhaustion) dataset. Which one is generally higher?"
23
+ 14. "If I search for 'BACH2' in the main TF activity score table, how many cell states show a score greater than 1.0?"
24
+ 15. "Can you provide the TF activity scores for 'PRDM1' in the TEM (T Effector Memory) dataset, but only show me the cell states where the score is negative?"
25
+
26
+ # --- Difficult Queries (Requires LLM Interpretation, Insight Generation, Complex Tool Orchestration) ---
27
+
28
+ # Insight Generation & Interpretation
29
+ 16. "Based on the available TF activity scores, which TFs seem to be most consistently upregulated across different exhausted T-cell states (e.g., TEXprog, TEXeff, TEXterm)?" (Requires understanding of "exhausted", cross-table comparison, and summarization)
30
+ 17. "Is there a noticeable trend or pattern in the activity of 'EOMES' as T-cells progress from Naive to various effector and memory states shown in the data?" (Requires interpreting progression and comparing multiple datasets)
31
+ 18. "Considering the TF communities data for TRM and TEX, are there any TFs that are prominent in both TRM and TEX communities, suggesting a shared role?" (Requires comparing two distinct datasets/visualizations and identifying overlaps)
32
+ 19. "Analyze the TF activity scores for 'FOXO1'. Does its activity pattern suggest a role in maintaining T-cell quiescence or promoting activation/exhaustion based on the data available across different T-cell states?" (Requires biological interpretation linked to data patterns)
33
+ 20. "If a researcher is interested in TFs that are highly active in T Effector Memory (TEM) cells but show low activity in Terminally Exhausted (TEXterm) cells, which TFs should they investigate further based on the provided datasets?" (Requires filtering, comparison across datasets, and a recommendation)
34
+ 21. "Looking at the TF Wave Analysis, which TFs are predominantly active in early waves versus late waves? What might this imply about their roles in T-cell differentiation or response dynamics?" (Requires interpreting the wave data and drawing higher-level conclusions)
35
+ 22. "The user uploaded an image of a UMAP plot showing clusters. The file is 'www/test_images/umap_example.png'. Can you describe what you see in the image and how it might relate to T-cell states if cluster A is Naive, cluster B is TEM, and cluster C is TEX?" (Requires multimodal input, assuming the agent can be pointed to local files for analysis - this tests the image upload and interpretation flow we built)
36
+ 23. "Given the data in 'Table_TF PageRank Scores for Audrey.xlsx', identify three TFs that have significantly different activity scores between 'Naive_Day0_vs_Day7_UP' and 'MP_Day0_vs_Day7_UP'. Explain the potential biological significance of these differences." (Requires direct data analysis from a file, comparison, and biological reasoning)
37
+
38
+ # Creative/Hypothetical (tests robustness and deeper understanding)
39
+ 24. "If we wanted to design an experiment to reverse T-cell exhaustion, which 2-3 TFs might be good targets for modulation (activation or inhibition) based on their activity profiles in the provided datasets, and why?"
40
  25. "Explain the overall story the TF activity data tells about T-cell differentiation and exhaustion from Naive to Terminally Exhausted states, highlighting 3 key TF players and their changing roles."
tools/agent_tools.py CHANGED
The diff for this file is too large to render. See raw diff
 
tools/agent_tools_documentation.md CHANGED
@@ -1,190 +1,190 @@
1
- # Agent Tools Documentation
2
-
3
- This document outlines the granular tools that can be created or extracted from the TaijiChat R Shiny application. These tools are intended for an agent system to access data, calculations, methodologies, tables, and graphs from the application.
4
-
5
- ---
6
-
7
- Tool Name: `get_raw_excel_data`
8
- Description: Reads a specified Excel file and returns its raw content as a list of lists, where each inner list represents a row. This tool is generic; the `file_path` should be an absolute path or a path relative to the project root (e.g., "www/some_data.xlsx"). For predefined datasets within the application structure, other more specific tools should be preferred if available.
9
- Input: `file_path` (string) - The path to the Excel file.
10
- Output: `data` (list of lists of strings/numbers) - The raw data from the Excel sheet. Returns an empty list if the file is not found or cannot be read.
11
-
12
- ---
13
-
14
- Tool Name: `get_processed_tf_data`
15
- Description: Reads and processes a TF-related Excel file identified by its `dataset_identifier` (e.g., "Naive", "Overall_TF_PageRank"). It uses an internal mapping (`get_tf_catalog_dataset_path`) to find the actual file path within the `www/tablePagerank/` directory. The standard processing includes reading the Excel file, transposing it, using the original first row as new column headers, and then removing this header row from the data.
16
- Input: `dataset_identifier` (string) - The identifier for the dataset. Valid identifiers include: "Overall_TF_PageRank", "Naive", "TE", "MP", "TCM", "TEM", "TRM", "TEXprog", "TEXeff", "TEXterm".
17
- Output: `data` (list of lists of strings/numbers) - The processed data, where the first inner list contains the headers, and subsequent lists are data rows. Returns an empty list if processing fails or identifier is invalid.
18
-
19
- ---
20
-
21
- Tool Name: `filter_data_by_column_keywords`
22
- Description: Filters a dataset (list of lists, where the first list is headers) based on keywords matching its column names. This is for data that has already been processed (e.g., by `get_processed_tf_data`) where TFs or genes are column headers. The keyword search is case-insensitive and supports multiple comma-separated keywords. If no keywords are provided, the original dataset is returned.
23
- Input:
24
- `dataset` (list of lists) - The data to filter, with the first list being headers.
25
- `keywords` (string) - Comma-separated keywords to search for in column headers.
26
- Output: `filtered_dataset` (list of lists) - The subset of the data containing only the matching columns (including the header row). Returns an empty list (with headers only) if no columns match.
27
-
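A minimal usage sketch combining this tool with `get_processed_tf_data` above. It assumes both functions are importable from `tools.agent_tools` (the module this document references for its internal mappings); the dataset identifier and keywords are illustrative only:

```python
# Sketch, not the canonical API: assumes both tools live in tools.agent_tools.
from tools.agent_tools import get_processed_tf_data, filter_data_by_column_keywords

naive_data = get_processed_tf_data("Naive")               # headers first, then data rows
subset = filter_data_by_column_keywords(naive_data, "TCF7, TOX")

if len(subset) > 1:                                       # header row plus data rows
    print("Matching columns:", subset[0])
else:
    print("No columns matched the keywords.")
```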
28
- ---
29
-
30
- Tool Name: `get_tf_wave_search_data`
31
- Description: Reads the `searchtfwaves.xlsx` file from `www/waveanalysis/`, which contains TF names organized by "waves" (Wave1 to Wave7 as columns).
32
- Input: `tf_search_term` (string, optional) - A specific TF name to search for. If empty or not provided, all TF wave data is returned. The search is case-insensitive.
33
- Output: `wave_data` (dictionary) - If `tf_search_term` is provided and matches, returns a structure like `{"WaveX": ["TF1", "TF2"], "WaveY": ["TF1"]}` showing which waves the TF belongs to. If no `tf_search_term`, returns the full data as `{"Wave1": ["All TFs in Wave1"], "Wave2": ["All TFs in Wave2"], ...}`. If no matches are found for a search term, an empty dictionary is returned.
34
-
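A hedged sketch of a wave lookup, again assuming the tool is importable from `tools.agent_tools`; "BATF" is just an example TF name:

```python
from tools.agent_tools import get_tf_wave_search_data  # assumed import path

waves = get_tf_wave_search_data("BATF")  # case-insensitive search
if waves:
    for wave, tfs in waves.items():
        print(f"{wave}: {', '.join(tfs)}")
else:
    print("TF not found in any wave.")
```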
35
- ---
36
-
37
- Tool Name: `get_tf_correlation_data`
38
- Description: Reads the `TF-TFcorTRMTEX.xlsx` file from `www/TFcorintextrm/`. If a `tf_name` is provided, it filters the data for that TF (case-insensitive match on the primary TF identifier column, typically "TF Name" or the first column).
39
- Input: `tf_name` (string, optional) - The specific TF name to search for. If empty or not provided, returns the full dataset.
40
- Output: `correlation_data` (list of lists) - The filtered (or full) data from the correlation table. The first list is headers. Returns an empty list (with headers only) if `tf_name` is provided but not found or if the file cannot be processed.
41
-
42
- ---
43
-
44
- Tool Name: `get_tf_correlation_image_path`
45
- Description: Reads the `TF-TFcorTRMTEX.xlsx` file from `www/TFcorintextrm/`, finds the row for the given `tf_name` (case-insensitive match on the primary TF identifier column), and returns the path stored in the "TF Merged Graph Path" column. The returned path is relative to the project's `www` directory (e.g., "www/networkanalysis/images/BATF_graph.png").
46
- Input: `tf_name` (string) - The specific TF name.
47
- Output: `image_path` (string) - The relative web path to the image or an empty string if not found or if the file cannot be processed.
48
-
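The two correlation tools are naturally used together: fetch the tabular data, then resolve the associated graph image. A sketch under the same import-path assumption, with an example TF name:

```python
from tools.agent_tools import (  # assumed import path
    get_tf_correlation_data,
    get_tf_correlation_image_path,
)

tf = "BATF"  # example TF name
rows = get_tf_correlation_data(tf)         # first inner list is the header row
image = get_tf_correlation_image_path(tf)  # "" if no graph entry exists

if len(rows) > 1:
    print(f"{len(rows) - 1} correlation row(s) for {tf}; graph: {image or 'none'}")
else:
    print(f"No correlation entry found for {tf}.")
```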
49
- ---
50
-
51
- Tool Name: `list_all_tfs_in_correlation_data`
52
- Description: Reads the `TF-TFcorTRMTEX.xlsx` file from `www/TFcorintextrm/` and returns a list of all unique TF names from the primary TF identifier column (typically "TF Name" or the first column). Filters out empty strings and 'nan'.
53
- Input: None
54
- Output: `tf_list` (list of strings) - A list of TF names. Returns an empty list if the file cannot be processed.
55
-
56
- ---
57
-
58
- Tool Name: `get_tf_community_sheet_data`
59
- Description: Reads one of the TF community Excel files (`trmcommunities.xlsx` or `texcommunities.xlsx`) located in `www/tfcommunities/`.
60
- Input: `community_type` (string) - Either "trm" or "texterm".
61
- Output: `community_data` (list of lists) - Data from the specified community sheet (raw format, first list is headers). Returns an empty list if the type is invalid or file not found/processed.
62
-
63
- ---
64
-
65
- Tool Name: `get_static_image_path`
66
- Description: Returns the predefined relative web path (e.g., "www/images/logo.png") for a known static image asset. These paths are typically relative to the project root.
67
- Input: `image_identifier` (string) - A unique key representing the image (e.g., "home_page_diagram", "ucsd_logo", "naive_bubble_plot", "wave1_main_img", "wave1_gokegg_img", "wave1_ranked_text1_img", "tfcat_overview_img", "network_correlation_desc_img").
68
- Output: `image_path` (string) - The relative path (e.g., "www/homedesc.png"). Returns an empty string if identifier is unknown. This tool relies on an internal mapping (`_STATIC_IMAGE_WEB_PATHS` in `tools.agent_tools`).
69
-
70
- ---
71
-
72
- Tool Name: `get_ui_descriptive_text`
73
- Description: Retrieves predefined descriptive text, methodology explanations, or captions by its identifier, primarily from `tools/ui_texts.json`.
74
- Input: `text_identifier` (string) - A unique key representing the text block (e.g., "tf_score_calculation_info", "cell_state_specificity_info", "wave_analysis_overview_text", "wave_1_analysis_placeholder_details").
75
- Output: `descriptive_text` (string) - The requested text block. Returns an empty string if identifier is unknown.
76
-
77
- ---
78
-
79
- Tool Name: `list_available_tf_catalog_datasets`
80
- Description: Returns a list of valid `dataset_identifier` strings that can be used with the `get_processed_tf_data` tool.
81
- Input: None
82
- Output: `dataset_identifiers` (list of strings) - E.g., ["Overall_TF_PageRank", "Naive", "TE", "MP", "TCM", "TEM", "TRM", "TEXprog", "TEXeff", "TEXterm"].
83
-
84
- ---
85
-
86
- Tool Name: `list_available_cell_state_bubble_plots`
87
- Description: Returns a list of identifiers for available cell-state specific bubble plot images. These identifiers can be used with `get_static_image_path`.
88
- Input: None
89
- Output: `image_identifiers` (list of strings) - E.g., ["naive_bubble_plot", "te_bubble_plot", ...]. Derived from internal mapping in `tools.agent_tools`.
90
-
91
- ---
92
-
93
- Tool Name: `list_available_wave_analysis_assets`
94
- Description: Returns a structured dictionary of available asset identifiers for a specific TF wave (main image, GO/KEGG image, ranked text images). Identifiers can be used with `get_static_image_path`.
95
- Input: `wave_number` (integer, 1-7) - The wave number.
96
- Output: `asset_info` (dictionary) - E.g., `{"main_image_id": "waveX_main_img", "gokegg_image_id": "waveX_gokegg_img", "ranked_text_image_ids": ["waveX_ranked_text1_img", ...]}`. Returns empty if wave number is invalid. Derived from internal mapping in `tools.agent_tools`.
97
-
98
- ---
99
-
100
- Tool Name: `get_internal_navigation_info`
101
- Description: Provides information about where an internal UI link (like those on the homepage image map or wave overview images) is intended to navigate within the application structure.
102
- Input: `link_id` (string) - The identifier of the link (e.g., "to_tfcat", "to_tfwave", "to_tfnet", "c1_link", "c2_link", etc.).
103
- Output: `navigation_target_description` (string) - A human-readable description of the target (e.g., "Navigates to the 'TF Catalog' section.", "Navigates to the 'Wave 1 Analysis' tab."). Derived from internal mapping in `tools.agent_tools`.
104
-
105
- ---
106
-
107
- Tool Name: `get_biorxiv_paper_url`
108
- Description: Returns the URL for the main bioRxiv paper referenced in the application.
109
- Input: None
110
- Output: `url` (string) - The bioRxiv paper URL.
111
-
112
- ---
113
-
114
- Tool Name: `list_all_files_in_www_directory`
115
- Description: Scans the entire `www/` directory (and its subdirectories, excluding common hidden/system files) and returns a list of all files found. For each file, it provides its relative path from the project root (e.g., "www/images/logo.png"), its detected MIME type (e.g., "image/png", "text/csv", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"), and its size in bytes. This tool helps in understanding all available static assets and data files within the web-accessible `www` directory.
116
- Input: None
117
- Output: `file_manifest` (list of dictionaries) - Each dictionary represents a file and contains the keys: `path` (string), `type` (string), `size` (integer). Example item: `{"path": "www/data/report.txt", "type": "text/plain", "size": 1024}`. Returns an empty list if the `www` directory isn't found or is empty.
118
-
119
- ---
120
-
121
- ### `multi_source_literature_search(queries: list[str], max_results_per_query_per_source: int = 1, max_total_unique_papers: int = 10) -> list[dict]`
122
-
123
- Searches for academic literature across multiple sources (Semantic Scholar, PubMed, ArXiv) using a list of provided search queries. It then de-duplicates the results based primarily on DOI, and secondarily on a combination of title and first author if DOI is not available. The search process stops early if the `max_total_unique_papers` limit is reached.
124
-
125
- **Args:**
126
-
127
- * `queries (list[str])`: A list of search query strings. The GenerationAgent should brainstorm 3-5 diverse queries relevant to the user's request.
128
- * `max_results_per_query_per_source (int)`: The maximum number of results to fetch from EACH academic source (Semantic Scholar, PubMed, ArXiv) for EACH query string. Defaults to `1`.
129
- * `max_total_unique_papers (int)`: The maximum total number of unique de-duplicated papers to return across all queries and sources. Defaults to `10`. The tool will stop fetching more data once this limit is met.
130
-
131
- **Returns:**
132
-
133
- * `list[dict]`: A consolidated and de-duplicated list of paper details, containing up to `max_total_unique_papers`. Each dictionary in the list represents a paper and has the following keys:
134
- * `"title" (str)`: The title of the paper. "N/A" if not available.
135
- * `"authors" (list[str])`: A list of author names. ["N/A"] if not available.
136
- * `"year" (str | int)`: The publication year. "N/A" if not available.
137
- * `"abstract" (str)`: A snippet of the abstract (typically up to 500 characters followed by "..."). "N/A" if not available.
138
- * `"doi" (str | None)`: The Digital Object Identifier. `None` if not available.
139
- * `"url" (str)`: A direct URL to the paper (e.g., PubMed link, ArXiv link, Semantic Scholar link). "N/A" if not available.
140
- * `"venue" (str)`: The publication venue (e.g., journal name, "ArXiv"). "N/A" if not available.
141
- * `"source_api" (str)`: The API from which this record was retrieved (e.g., "Semantic Scholar", "PubMed", "ArXiv").
142
-
143
- **GenerationAgent Usage Example (for `python_code` field when `status` is `AWAITING_DATA`):**
144
-
145
- ```python
146
- # Example: User asks for up to 3 papers
147
- print(json.dumps({'intermediate_data_for_llm': tools.multi_source_literature_search(queries=["T-cell exhaustion markers AND cancer", "immunotherapy for melanoma AND biomarkers"], max_results_per_query_per_source=1, max_total_unique_papers=3)}))
148
-
149
- # Example: Defaulting to 10 total unique papers
150
- print(json.dumps({'intermediate_data_for_llm': tools.multi_source_literature_search(queries=["COVID-19 long-term effects"], max_results_per_query_per_source=2)}))
151
- ```
152
-
153
- **Important Considerations for GenerationAgent:**
154
-
155
- * When results are returned from this tool, the `GenerationAgent`'s `explanation` (for `CODE_COMPLETE` status) should present a summary of the *found papers* (e.g., titles, authors, URLs). It should clearly state that these are potential literature leads and should *not* yet claim to have read or summarized the full content of these papers in that same turn, unless a subsequent tool call for summarization is planned and executed.
156
-
157
- ---
158
-
159
- ### `fetch_text_from_urls(paper_info_list: list[dict], max_chars_per_paper: int = 15000) -> list[dict]`
160
-
161
- Attempts to fetch and extract textual content from the URLs of papers provided in a list. This tool is typically used after `multi_source_literature_search` to gather content for summarization by the GenerationAgent.
162
-
163
- **Args:**
164
-
165
- * `paper_info_list (list[dict])`: A list of paper dictionaries, as returned by `multi_source_literature_search`. Each dictionary is expected to have at least a `"url"` key. Other keys like `"title"` and `"source_api"` are used for logging.
166
- * `max_chars_per_paper (int)`: The maximum number of characters of text to retrieve and store for each paper. Defaults to `15000`. Text longer than this will be truncated.
167
-
168
- **Returns:**
169
-
170
- * `list[dict]`: The input `paper_info_list`, where each paper dictionary is augmented with a new key `"retrieved_text_content"`.
171
- * If successful, `"retrieved_text_content" (str)` will contain the extracted text (up to `max_chars_per_paper`).
172
- * If fetching or parsing fails for a paper, `"retrieved_text_content" (str)` will contain an error message (e.g., "Error: Invalid or missing URL.", "Error fetching URL: ...", "Error: No text could be extracted.").
173
-
174
- **GenerationAgent Usage Example (for `python_code` field when `status` is `AWAITING_DATA`):**
175
-
176
- This tool is usually the second step in a literature review process.
177
-
178
- ```python
179
- # Assume 'list_of_papers_from_search' is a variable holding the output from a previous
180
- # call to tools.multi_source_literature_search(...)
181
- print(json.dumps({'intermediate_data_for_llm': tools.fetch_text_from_urls(paper_info_list=list_of_papers_from_search, max_chars_per_paper=10000)}))
182
- ```
183
-
184
- **Important Considerations for GenerationAgent:**
185
-
186
- * After this tool returns the `paper_info_list` (now with `"retrieved_text_content"`), the `GenerationAgent` is responsible for using its own LLM capabilities to read the `"retrieved_text_content"` for each paper and generate summaries if requested by the user or if it's part of its plan.
187
- * The `GenerationAgent` should be prepared for `"retrieved_text_content"` to contain error messages and handle them gracefully in its summarization logic (e.g., by stating that text for a particular paper could not be retrieved).
188
- * Web scraping is inherently unreliable; success in fetching and parsing text can vary greatly between websites. The agent should not assume text will always be available.
189
-
190
  ---
 
+ # Agent Tools Documentation
+
+ This document outlines the granular tools that can be created or extracted from the TaijiChat R Shiny application. These tools are intended for an agent system to access data, calculations, methodologies, tables, and graphs from the application.
+
+ ---
+
+ Tool Name: `get_raw_excel_data`
+ Description: Reads a specified Excel file and returns its raw content as a list of lists, where each inner list represents a row. This tool is generic; the `file_path` should be an absolute path or a path relative to the project root (e.g., "www/some_data.xlsx"). For predefined datasets within the application structure, other more specific tools should be preferred if available.
+ Input: `file_path` (string) - The path to the Excel file.
+ Output: `data` (list of lists of strings/numbers) - The raw data from the Excel sheet. Returns an empty list if the file is not found or cannot be read.
+
+ ---
+
+ Tool Name: `get_processed_tf_data`
+ Description: Reads and processes a TF-related Excel file identified by its `dataset_identifier` (e.g., "Naive", "Overall_TF_PageRank"). It uses an internal mapping (`get_tf_catalog_dataset_path`) to find the actual file path within the `www/tablePagerank/` directory. The standard processing includes: reading the Excel file, transposing it, using the original first row as the new column headers, and then removing this header row from the data.
+ Input: `dataset_identifier` (string) - The identifier for the dataset. Valid identifiers include: "Overall_TF_PageRank", "Naive", "TE", "MP", "TCM", "TEM", "TRM", "TEXprog", "TEXeff", "TEXterm".
+ Output: `data` (list of lists of strings/numbers) - The processed data, where the first inner list contains the headers, and subsequent lists are data rows. Returns an empty list if processing fails or the identifier is invalid.
+
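For orientation, the transpose-and-reheader step can be sketched with pandas. This is a hedged illustration, not the app's actual implementation; in particular, it assumes the first row of the *transposed* sheet is the row promoted to headers, and `pd.read_excel` needs `openpyxl` installed to read `.xlsx` files.

```python
import pandas as pd

def process_tf_sheet(path):
    """Read a TF score sheet, transpose it, and promote the first row to headers."""
    raw = pd.read_excel(path, header=None)   # raw cells, no header inference
    transposed = raw.T                       # TFs move from rows to columns
    headers = transposed.iloc[0].tolist()    # first row of the transposed sheet
    body = transposed.iloc[1:]               # remaining rows are the data
    return [headers] + body.values.tolist()  # list of lists, headers first
```

The resulting list-of-lists shape (headers first) is what the filtering tool below consumes.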
+ ---
+
+ Tool Name: `filter_data_by_column_keywords`
+ Description: Filters a dataset (list of lists, where the first list is headers) based on keywords matching its column names. This is for data that has already been processed (e.g., by `get_processed_tf_data`) where TFs or genes are column headers. The keyword search is case-insensitive and supports multiple comma-separated keywords. If no keywords are provided, the original dataset is returned.
+ Input:
+ `dataset` (list of lists) - The data to filter, with the first list being headers.
+ `keywords` (string) - Comma-separated keywords to search for in column headers.
+ Output: `filtered_dataset` (list of lists) - The subset of the data containing only the matching columns (including the header row). Returns an empty list (with headers only) if no columns match.
+
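A minimal sketch of the matching rule described above. It is illustrative only; the real tool's location and edge-case behavior may differ, and the headers-only fallback simply mirrors the documented contract.

```python
def filter_by_column_keywords(dataset, keywords):
    """Keep columns whose header contains any of the comma-separated keywords."""
    if not keywords or not keywords.strip():
        return dataset  # no keywords: return the dataset unchanged
    headers = dataset[0]
    terms = [k.strip().lower() for k in keywords.split(",") if k.strip()]
    # Indices of columns whose header matches at least one keyword (case-insensitive).
    keep = [i for i, h in enumerate(headers) if any(t in str(h).lower() for t in terms)]
    if not keep:
        return [headers]  # no matching columns: header row only, per the tool contract
    # Rebuild every row (header row included) with only the kept columns.
    return [[row[i] for i in keep] for row in dataset]
```

For example, `filter_by_column_keywords([["Tcf7", "Lef1", "Runx3"], [1.6, 0.9, 1.1]], "tcf, runx")` keeps the `Tcf7` and `Runx3` columns.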
+ ---
+
+ Tool Name: `get_tf_wave_search_data`
+ Description: Reads the `searchtfwaves.xlsx` file from `www/waveanalysis/`, which contains TF names organized by "waves" (Wave1 to Wave7 as columns).
+ Input: `tf_search_term` (string, optional) - A specific TF name to search for. If empty or not provided, all TF wave data is returned. The search is case-insensitive.
+ Output: `wave_data` (dictionary) - If `tf_search_term` is provided and matches, returns a structure like `{"WaveX": ["TF1", "TF2"], "WaveY": ["TF1"]}` showing which waves the TF belongs to. If no `tf_search_term` is given, returns the full data as `{"Wave1": ["All TFs in Wave1"], "Wave2": ["All TFs in Wave2"], ...}`. If no matches are found for a search term, an empty dictionary is returned.
+
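One plausible implementation of the case-insensitive wave lookup, assuming the sheet simply has Wave1 through Wave7 as columns of TF names (a sketch, not the shipped code):

```python
import pandas as pd

def find_tf_waves(tf_search_term, path="www/waveanalysis/searchtfwaves.xlsx"):
    """Map each wave column to the TF entries matching the search term."""
    waves = pd.read_excel(path)  # assumed columns: Wave1 ... Wave7
    term = tf_search_term.strip().lower()
    hits = {}
    for wave in waves.columns:
        names = [str(v) for v in waves[wave].dropna()]
        matched = [n for n in names if term and term == n.lower()]
        if matched:
            hits[wave] = matched
    return hits  # empty dict when nothing matches, as documented
```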
+ ---
+
+ Tool Name: `get_tf_correlation_data`
+ Description: Reads the `TF-TFcorTRMTEX.xlsx` file from `www/TFcorintextrm/`. If a `tf_name` is provided, it filters the data for that TF (case-insensitive match on the primary TF identifier column, typically "TF Name" or the first column).
+ Input: `tf_name` (string, optional) - The specific TF name to search for. If empty or not provided, returns the full dataset.
+ Output: `correlation_data` (list of lists) - The filtered (or full) data from the correlation table. The first list is headers. Returns an empty list (with headers only) if `tf_name` is provided but not found, or if the file cannot be processed.
+
+ ---
+
+ Tool Name: `get_tf_correlation_image_path`
+ Description: Reads the `TF-TFcorTRMTEX.xlsx` file from `www/TFcorintextrm/`, finds the row for the given `tf_name` (case-insensitive match on the primary TF identifier column), and returns the path stored in the "TF Merged Graph Path" column. The returned path is relative to the project's `www` directory (e.g., "www/networkanalysis/images/BATF_graph.png").
+ Input: `tf_name` (string) - The specific TF name.
+ Output: `image_path` (string) - The relative web path to the image, or an empty string if not found or if the file cannot be processed.
+
+ ---
+
+ Tool Name: `list_all_tfs_in_correlation_data`
+ Description: Reads the `TF-TFcorTRMTEX.xlsx` file from `www/TFcorintextrm/` and returns a list of all unique TF names from the primary TF identifier column (typically "TF Name" or the first column). Filters out empty strings and 'nan'.
+ Input: None
+ Output: `tf_list` (list of strings) - A list of TF names. Returns an empty list if the file cannot be processed.
+
+ ---
+
+ Tool Name: `get_tf_community_sheet_data`
+ Description: Reads one of the TF community Excel files (`trmcommunities.xlsx` or `texcommunities.xlsx`) located in `www/tfcommunities/`.
+ Input: `community_type` (string) - Either "trm" or "texterm".
+ Output: `community_data` (list of lists) - Data from the specified community sheet (raw format, first list is headers). Returns an empty list if the type is invalid or the file cannot be found or processed.
+
+ ---
+
+ Tool Name: `get_static_image_path`
+ Description: Returns the predefined relative web path (e.g., "www/images/logo.png") for a known static image asset. These paths are typically relative to the project root.
+ Input: `image_identifier` (string) - A unique key representing the image (e.g., "home_page_diagram", "ucsd_logo", "naive_bubble_plot", "wave1_main_img", "wave1_gokegg_img", "wave1_ranked_text1_img", "tfcat_overview_img", "network_correlation_desc_img").
+ Output: `image_path` (string) - The relative path (e.g., "www/homedesc.png"). Returns an empty string if the identifier is unknown. This tool relies on an internal mapping (`_STATIC_IMAGE_WEB_PATHS` in `tools.agent_tools`).
+
+ ---
+
+ Tool Name: `get_ui_descriptive_text`
+ Description: Retrieves predefined descriptive text, methodology explanations, or captions by identifier, primarily from `tools/ui_texts.json`.
+ Input: `text_identifier` (string) - A unique key representing the text block (e.g., "tf_score_calculation_info", "cell_state_specificity_info", "wave_analysis_overview_text", "wave_1_analysis_placeholder_details").
+ Output: `descriptive_text` (string) - The requested text block. Returns an empty string if the identifier is unknown.
+
+ ---
+
+ Tool Name: `list_available_tf_catalog_datasets`
+ Description: Returns a list of valid `dataset_identifier` strings that can be used with the `get_processed_tf_data` tool.
+ Input: None
+ Output: `dataset_identifiers` (list of strings) - E.g., ["Overall_TF_PageRank", "Naive", "TE", "MP", "TCM", "TEM", "TRM", "TEXprog", "TEXeff", "TEXterm"].
+
+ ---
+
+ Tool Name: `list_available_cell_state_bubble_plots`
+ Description: Returns a list of identifiers for available cell-state-specific bubble plot images. These identifiers can be used with `get_static_image_path`.
+ Input: None
+ Output: `image_identifiers` (list of strings) - E.g., ["naive_bubble_plot", "te_bubble_plot", ...]. Derived from an internal mapping in `tools.agent_tools`.
+
+ ---
+
+ Tool Name: `list_available_wave_analysis_assets`
+ Description: Returns a structured dictionary of available asset identifiers for a specific TF wave (main image, GO/KEGG image, ranked text images). Identifiers can be used with `get_static_image_path`.
+ Input: `wave_number` (integer, 1-7) - The wave number.
+ Output: `asset_info` (dictionary) - E.g., `{"main_image_id": "waveX_main_img", "gokegg_image_id": "waveX_gokegg_img", "ranked_text_image_ids": ["waveX_ranked_text1_img", ...]}`. Returns an empty dictionary if the wave number is invalid. Derived from an internal mapping in `tools.agent_tools`.
+
+ ---
+
+ Tool Name: `get_internal_navigation_info`
+ Description: Provides information about where an internal UI link (like those on the homepage image map or wave overview images) is intended to navigate within the application structure.
+ Input: `link_id` (string) - The identifier of the link (e.g., "to_tfcat", "to_tfwave", "to_tfnet", "c1_link", "c2_link", etc.).
+ Output: `navigation_target_description` (string) - A human-readable description of the target (e.g., "Navigates to the 'TF Catalog' section.", "Navigates to the 'Wave 1 Analysis' tab."). Derived from an internal mapping in `tools.agent_tools`.
+
+ ---
+
+ Tool Name: `get_biorxiv_paper_url`
+ Description: Returns the URL for the main bioRxiv paper referenced in the application.
+ Input: None
+ Output: `url` (string) - The bioRxiv paper URL.
+
+ ---
+
+ Tool Name: `list_all_files_in_www_directory`
+ Description: Scans the entire `www/` directory (and its subdirectories, excluding common hidden/system files) and returns a list of all files found. For each file, it provides its relative path from the project root (e.g., "www/images/logo.png"), its detected MIME type (e.g., "image/png", "text/csv", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"), and its size in bytes. This tool helps in understanding all available static assets and data files within the web-accessible `www` directory.
+ Input: None
+ Output: `file_manifest` (list of dictionaries) - Each dictionary represents a file and contains the keys: `path` (string), `type` (string), `size` (integer). Example item: `{"path": "www/data/report.txt", "type": "text/plain", "size": 1024}`. Returns an empty list if the `www` directory isn't found or is empty.
+
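A standard-library sketch of how such a manifest could be built (hypothetical; the real tool's hidden-file rules and MIME fallback may differ):

```python
import mimetypes
import os

def build_www_manifest(root="www"):
    """Walk `root` and describe every non-hidden file: path, MIME type, size."""
    manifest = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories and files (e.g., .DS_Store).
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in sorted(filenames):
            if name.startswith("."):
                continue
            path = os.path.join(dirpath, name)
            mime, _ = mimetypes.guess_type(path)
            manifest.append({
                "path": path.replace(os.sep, "/"),
                "type": mime or "application/octet-stream",  # assumed fallback type
                "size": os.path.getsize(path),
            })
    return manifest
```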
+ ---
+
+ ### `multi_source_literature_search(queries: list[str], max_results_per_query_per_source: int = 1, max_total_unique_papers: int = 10) -> list[dict]`
+
+ Searches for academic literature across multiple sources (Semantic Scholar, PubMed, ArXiv) using a list of provided search queries. It then de-duplicates the results, keying primarily on DOI and secondarily on a combination of title and first author when a DOI is not available. The search stops early once the `max_total_unique_papers` limit is reached.
+
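The de-duplication rule just described can be pictured with a small helper. The `paper` dictionaries are assumed to follow the return schema documented under **Returns** below; this is an illustration of the stated rule, not the tool's source.

```python
def dedup_key(paper):
    """DOI when present; otherwise normalized title + first author."""
    if paper.get("doi"):
        return ("doi", paper["doi"].strip().lower())
    first_author = (paper.get("authors") or ["N/A"])[0]
    return ("title-author",
            paper.get("title", "N/A").strip().lower(),
            first_author.strip().lower())

def deduplicate(papers, limit=10):
    seen, unique = set(), []
    for paper in papers:
        key = dedup_key(paper)
        if key not in seen:
            seen.add(key)
            unique.append(paper)
        if len(unique) >= limit:
            break  # mirrors the early stop at max_total_unique_papers
    return unique
```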
+ **Args:**
+
+ * `queries (list[str])`: A list of search query strings. The GenerationAgent should brainstorm 3-5 diverse queries relevant to the user's request.
+ * `max_results_per_query_per_source (int)`: The maximum number of results to fetch from EACH academic source (Semantic Scholar, PubMed, ArXiv) for EACH query string. Defaults to `1`.
+ * `max_total_unique_papers (int)`: The maximum total number of unique de-duplicated papers to return across all queries and sources. Defaults to `10`. The tool will stop fetching more data once this limit is met.
+
+ **Returns:**
+
+ * `list[dict]`: A consolidated and de-duplicated list of paper details, containing up to `max_total_unique_papers`. Each dictionary in the list represents a paper and has the following keys:
+     * `"title" (str)`: The title of the paper. "N/A" if not available.
+     * `"authors" (list[str])`: A list of author names. ["N/A"] if not available.
+     * `"year" (str | int)`: The publication year. "N/A" if not available.
+     * `"abstract" (str)`: A snippet of the abstract (typically up to 500 characters followed by "..."). "N/A" if not available.
+     * `"doi" (str | None)`: The Digital Object Identifier. `None` if not available.
+     * `"url" (str)`: A direct URL to the paper (e.g., PubMed link, ArXiv link, Semantic Scholar link). "N/A" if not available.
+     * `"venue" (str)`: The publication venue (e.g., journal name, "ArXiv"). "N/A" if not available.
+     * `"source_api" (str)`: The API from which this record was retrieved (e.g., "Semantic Scholar", "PubMed", "ArXiv").
+
+ **GenerationAgent Usage Example (for `python_code` field when `status` is `AWAITING_DATA`):**
+
+ ```python
+ # `json` and `tools` are assumed to be preloaded in the agent's execution environment.
+ # Example: User asks for up to 3 papers
+ print(json.dumps({'intermediate_data_for_llm': tools.multi_source_literature_search(queries=["T-cell exhaustion markers AND cancer", "immunotherapy for melanoma AND biomarkers"], max_results_per_query_per_source=1, max_total_unique_papers=3)}))
+
+ # Example: Defaulting to 10 total unique papers
+ print(json.dumps({'intermediate_data_for_llm': tools.multi_source_literature_search(queries=["COVID-19 long-term effects"], max_results_per_query_per_source=2)}))
+ ```
+
+ **Important Considerations for GenerationAgent:**
+
+ * When results are returned from this tool, the `GenerationAgent`'s `explanation` (for `CODE_COMPLETE` status) should present a summary of the *found papers* (e.g., titles, authors, URLs). It should clearly state that these are potential literature leads and should *not* yet claim to have read or summarized the full content of these papers in that same turn, unless a subsequent tool call for summarization is planned and executed.
+
+ ---
+
+ ### `fetch_text_from_urls(paper_info_list: list[dict], max_chars_per_paper: int = 15000) -> list[dict]`
+
+ Attempts to fetch and extract textual content from the URLs of papers provided in a list. This tool is typically used after `multi_source_literature_search` to gather content for summarization by the GenerationAgent.
+
+ **Args:**
+
+ * `paper_info_list (list[dict])`: A list of paper dictionaries, as returned by `multi_source_literature_search`. Each dictionary is expected to have at least a `"url"` key. Other keys like `"title"` and `"source_api"` are used for logging.
+ * `max_chars_per_paper (int)`: The maximum number of characters of text to retrieve and store for each paper. Defaults to `15000`. Text longer than this will be truncated.
+
+ **Returns:**
+
+ * `list[dict]`: The input `paper_info_list`, where each paper dictionary is augmented with a new key `"retrieved_text_content"`.
+     * If successful, `"retrieved_text_content" (str)` will contain the extracted text (up to `max_chars_per_paper`).
+     * If fetching or parsing fails for a paper, `"retrieved_text_content" (str)` will contain an error message (e.g., "Error: Invalid or missing URL.", "Error fetching URL: ...", "Error: No text could be extracted.").
+
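For intuition, a single-paper fetch might look like the sketch below, using `requests` and `BeautifulSoup`. It is an assumption-laden approximation (the real tool's HTTP settings and parser are not documented here), but the error strings mirror the ones listed above.

```python
import requests
from bs4 import BeautifulSoup

def fetch_text(paper, max_chars=15000):
    """Return extracted page text for one paper dict, or an error message."""
    url = paper.get("url")
    if not url or url == "N/A":
        return "Error: Invalid or missing URL."
    try:
        response = requests.get(url, timeout=15)  # timeout value is an assumption
        response.raise_for_status()
    except requests.RequestException as exc:
        return f"Error fetching URL: {exc}"
    # Strip markup and collapse whitespace; many publisher pages defeat this.
    text = BeautifulSoup(response.text, "html.parser").get_text(separator=" ", strip=True)
    if not text:
        return "Error: No text could be extracted."
    return text[:max_chars]  # truncate to max_chars_per_paper
```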
+ **GenerationAgent Usage Example (for `python_code` field when `status` is `AWAITING_DATA`):**
+
+ This tool is usually the second step in a literature review process.
+
+ ```python
+ # Assume 'list_of_papers_from_search' is a variable holding the output from a previous
+ # call to tools.multi_source_literature_search(...)
+ print(json.dumps({'intermediate_data_for_llm': tools.fetch_text_from_urls(paper_info_list=list_of_papers_from_search, max_chars_per_paper=10000)}))
+ ```
+
+ **Important Considerations for GenerationAgent:**
+
+ * After this tool returns the `paper_info_list` (now with `"retrieved_text_content"`), the `GenerationAgent` is responsible for using its own LLM capabilities to read the `"retrieved_text_content"` for each paper and generate summaries if requested by the user or if it's part of its plan.
+ * The `GenerationAgent` should be prepared for `"retrieved_text_content"` to contain error messages and handle them gracefully in its summarization logic (e.g., by stating that text for a particular paper could not be retrieved).
+ * Web scraping is inherently unreliable; success in fetching and parsing text can vary greatly between websites. The agent should not assume text will always be available.
+
  ---
tools/excel_data_documentation.md CHANGED
@@ -1,183 +1,183 @@
- # Documented Excel Files
-
- This file lists the Excel files that have been analyzed and documented.
-
- * `./www/multi-omicsdata.xlsx`
-
- * `./www/networkanalysis/comp_log2FC_RegulatedData_TRMTEXterm.xlsx`
- comp_log2FC_RegulatedData_TRMTEXterm.xlsx tabulates log₂ fold-change values for 17,483 genes (rows) across 198 transcription factors (columns) in the TRM→TexTerm regulated-data comparison. The first column ("Unnamed: 0") lists each gene's identifier (e.g. "0610005C13RIK"); each subsequent column is named by a TF (Ahr, Arid3a, Arnt, …, Zscan20) and contains the corresponding log₂ fold-change value.
-
- For instance, a value of 19.615925 in row 0610009B22RIK under Arnt indicates that gene 0610009B22RIK exhibited a log₂ fold-change of 19.615925 in the Arnt-associated regulated data when comparing TRM to TexTerm.
-
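That lookup can be reproduced with pandas (illustrative; reading `.xlsx` requires the `openpyxl` engine, and "Unnamed: 0" is the name pandas assigns to the unlabeled first column):

```python
import pandas as pd

# Load the gene x TF log2 fold-change matrix; the first column holds gene IDs.
df = pd.read_excel("./www/networkanalysis/comp_log2FC_RegulatedData_TRMTEXterm.xlsx")
df = df.set_index("Unnamed: 0")

# log2FC of gene 0610009B22RIK in the Arnt-associated regulated data.
print(df.loc["0610009B22RIK", "Arnt"])  # expected: 19.615925 per the example above
```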
- * `./www/old files/log2FC_RegulatedData_TRMTEXterm.xlsx`
-
- * `./www/tablePagerank/MP.xlsx`
- MP.xlsx tabulates performance scores for 57 transcription factors ("TF") across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g., Mackay, Chung, Scott, etc.).
-
- EvaluationDataset is the dataset on which performance was assessed.
-
- Each cell contains the resulting floating-point score for that TF under the specified method and dataset pairing.
-
- For example, a cell value of 0.72 in row GATA1 under column MP_Mackay_Chung means that the MP scoring method—trained on the Mackay dataset—achieved a performance score of 0.72 when evaluated on the Chung dataset.
-
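Because this `<Method>_<TrainingDataset>_<EvaluationDataset>` convention recurs in every `tablePagerank` workbook below, a tiny helper suffices to decompose a column name (a sketch; it assumes exactly two underscores per name, which holds for the examples in this file):

```python
def parse_score_column(column_name):
    """Split '<Method>_<TrainingDataset>_<EvaluationDataset>' into its three fields."""
    method, training, evaluation = column_name.split("_")
    return {"method": method, "training_dataset": training, "evaluation_dataset": evaluation}

print(parse_score_column("MP_Mackay_Chung"))
# {'method': 'MP', 'training_dataset': 'Mackay', 'evaluation_dataset': 'Chung'}
```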
- * `./www/tablePagerank/Naive.xlsx`
- Naive.xlsx tabulates performance scores for 31 transcription factors ("TF") across the same 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g., Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson, etc.).
-
- EvaluationDataset is the dataset on which performance was assessed.
-
- Each cell contains the resulting floating-point score for that TF under the specified method and dataset pairing.
-
- For example, in row Tcf7 under column Naive_Kaech_Chung, the value 1.626392 indicates that the Naive scoring method—trained on the Kaech dataset—achieved a performance score of 1.626392 when evaluated on the Chung dataset.
-
- * `./www/tablePagerank/Table_TF PageRank Scores for Audrey.xlsx`
- Table_TF PageRank Scores for Audrey.xlsx tabulates PageRank-derived scores for 308 transcription factors ("TF") across the same 42 method–dataset combinations, with two additional annotation columns:
-
- TF (first column): Transcription factor name.
-
- Category: Broad TF class (e.g. "Universal TFs," "Lineage-specific TFs," etc.).
-
- Cell-state specificity: Whether the TF is "Universal," "Pluripotent," "Myeloid," etc.
-
- Each of the remaining 42 columns follows the convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the dataset used to fit the PageRank model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which the PageRank scores were assessed.
-
- Each cell holds the floating-point PageRank score for that TF under the specified method and dataset pairing.
-
- For example, a value of 1.003938 in row Elf1 under column Naive_Kaech_Kaech indicates that the Naive PageRank model—trained and evaluated on the Kaech dataset—assigned Elf1 a score of 1.003938.
-
- * `./www/tablePagerank/TCM.xlsx`
- TCM.xlsx tabulates performance scores for 28 transcription factors ("TF") across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed (e.g. Chung, Mackay, Scott, etc.).
-
- Each cell holds the resulting floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, a value of 0.837792 in row Msgn1 under column TCM_Mackay_Chung indicates that the TCM scoring method—trained on the Mackay dataset—achieved a performance score of 0.837792 when evaluated on the Chung dataset.
-
- * `./www/tablePagerank/TE.xlsx`
- TE.xlsx tabulates performance scores for 33 transcription factors ("TF") across the same 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g., Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed (e.g., Chung, Scott, Mackay, etc.).
-
- Each cell contains the resulting floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, if you see 0.65 in row Myod1 under column TE_Mackay_Chung, it means that the TE method—trained on the Mackay dataset—achieved a performance score of 0.65 when evaluated on the Chung dataset.
-
- * `./www/tablePagerank/TEM.xlsx`
- TEM.xlsx tabulates performance scores for 25 transcription factors ("TF") across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g., Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed (e.g., Chung, Mackay, Scott, etc.).
-
- Each cell contains the resulting floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, a value of 1.6696786566 in row Foxc2 under column TEM_Mackay_Chung means that the TEM scoring method—trained on the Mackay dataset—achieved a performance score of 1.6696786566 when evaluated on the Chung dataset.
-
- * `./www/tablePagerank/TEXeff.xlsx`
- TEXeff.xlsx tabulates performance scores for 62 transcription factors ("TF") across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed.
-
- Each cell contains the resulting floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, a value of 0.647 in row Vax2 under column TexTerm_Hudson_Beltra means that the TexTerm scoring method—trained on the Hudson dataset—achieved a performance score of 0.647 when evaluated on the Beltra dataset.
-
- * `./www/tablePagerank/TEXprog.xlsx`
- TEXprog.xlsx tabulates performance scores for 63 transcription factors ("TF") across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed (e.g. Chung, Mackay, Scott, etc.).
-
- Each cell holds the resulting floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, a value of 1.5403 in row Irf9 under column TexProg_Beltra_Chung means that the TexProg scoring method—trained on the Beltra dataset—achieved a performance score of 1.5403 when evaluated on the Chung dataset.
-
- * `./www/tablePagerank/TEXterm.xlsx`
- TEXterm.xlsx tabulates performance scores for 51 transcription factors ("TF") across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the dataset used to fit the model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed.
-
- Each cell holds the floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, a value of 0.912 in row Sox2 under column TexTerm_Scott_Mackay means that the TexTerm method—trained on the Scott dataset—achieved a performance score of 0.912 when evaluated on the Mackay dataset.
-
- * `./www/tablePagerank/TRM.xlsx`
- TRM.xlsx tabulates performance scores for 43 transcription factors ("TF") across the same 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the dataset used to train the model (e.g., Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed (e.g., Chung, Mackay, Scott, etc.).
-
- Each cell contains the resulting floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, a value of 0.91 in row PU.1 under column TRM.IEL_Chung_Mackay means that the TRM.IEL scoring method—trained on the Chung dataset—achieved a performance score of 0.91 when evaluated on the Mackay dataset.
-
- * `./www/tfcommunities/texcommunities.xlsx`
- texcommunities.xlsx is a multi-sheet workbook (12 sheets) that organizes transcription factors into network "communities" for two models—TEX and TRM:
-
- TEX Communities: A summary sheet with two columns—C (community ID, e.g. C1–C5) and TF Members (a comma-separated list of all TFs in that community).
-
- TEX_c1 through TEX_c5: One sheet per TEX community, each listing a single TF column of member factors.
-
- TRM Communities: A parallel summary sheet for the TRM model, also with C and TF Members columns.
-
- TRM_c1 through TRM_c5: Individual sheets listing TFs for each TRM community.
-
- Each community groups TFs based on network topology under the respective model. For example, in the TEX Communities sheet, community C1 includes the following TF members: Usf1, Arnt, Mlx, Srebf1, Arntl, Tfe3, Heyl, Bhlhe40, …, indicating that these factors cluster together in the TEX network.
-
- * `./www/tfcommunities/trmcommunities.xlsx`
- trmcommunities.xlsx is a multi-sheet workbook (6 sheets) that defines transcription factor communities for the TRM network model:
-
- TRM Communities: A summary sheet with two columns—C (community ID, C1–C5) and TF Members (a comma-separated list of all TFs in that community).
-
- TRM_c1 through TRM_c5: Each sheet lists a single TF column naming the factors that belong to that community.
-
- These communities reflect clusters of TFs based on network topology under the TRM model. For example, in the TRM Communities sheet, community C1 might include TFs such as PU.1, Runx3, and Irf4, indicating that these factors form a tightly connected module in the TRM network.
-
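Either summary sheet can be expanded into a community-to-members mapping along these lines (a sketch; the sheet and column names are taken from the descriptions above, and reading `.xlsx` with pandas requires `openpyxl`):

```python
import pandas as pd

# Read the summary sheet: one row per community, members comma-separated.
summary = pd.read_excel("./www/tfcommunities/trmcommunities.xlsx",
                        sheet_name="TRM Communities")

communities = {
    row["C"]: [tf.strip() for tf in str(row["TF Members"]).split(",")]
    for _, row in summary.iterrows()
}
print(communities["C1"][:5])  # first few TFs assigned to community C1
```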
- * `./www/TFcorintextrm/TF-TFcorTRMTEX.xlsx`
- TF-TFcorTRMTEX.xlsx contains pairwise correlation matrices of transcription factor scores for both the TRM and TEX models. It has two sheets:
-
- TRM: A square matrix where both rows and columns list the same set of TFs; each cell at the intersection of TF A (row) and TF B (column) gives the Pearson correlation coefficient between their TRM PageRank (or performance) scores across all dataset contexts.
-
- TEX: The analogous matrix for the TEX model.
-
- For example, on the TRM sheet, the value 0.82 at row PU.1 and column Runx3 indicates that PU.1 and Runx3 have a correlation of 0.82 in their TRM-derived scores.
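A pairwise lookup against either sheet can be done as below (illustrative; `index_col=0` assumes the row TF names sit in the first column):

```python
import pandas as pd

# Load the TRM correlation matrix with TF names as both index and columns.
trm_corr = pd.read_excel("./www/TFcorintextrm/TF-TFcorTRMTEX.xlsx",
                         sheet_name="TRM", index_col=0)

# Pearson correlation between PU.1 and Runx3 under the TRM model.
print(trm_corr.loc["PU.1", "Runx3"])  # expected: 0.82 per the example above
```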
-
- * `./www/waveanalysis/searchtfwaves.xlsx`
 
tools/ui_texts.json CHANGED
@@ -1,27 +1,27 @@
- {
- "home_intro_1": "This website serves as a companion resource to our study, \"Multi-Omics Atlas-Assisted Discovery of Transcription Factors for Selective T Cell State Programming.\" It is designed to make the bioinformatics analyses and data from the study accessible to researchers from diverse backgrounds, including those without extensive bioinformatics expertise. The platform provides comprehensive tools to explore transcription factor (TF) activities across distinct T cell states, enabling users to examine TF scores for specific cell states, multi-state TFs and their regulatory roles, visualize TF relationships using wave and network analyses, and access a searchable database of TF scores and multi-omics data. Explore our TF catalog, network analyses, and more to uncover insights into our study.",
- "home_tcell_states_desc": "Naive T cells adopt diverse states in various contexts, such as acute or chronic infections and tumors. Upon activation, they become early effector (EE) cells that differentiate into distinct CD8+ T cell subsets with varied trafficking patterns—residing in lymphoid organs, blood, or peripheral tissues. In acute infection, TE (Terminal Effector) cells are found in the spleen or blood, while MP (Memory Precursor) cells mainly reside in lymphoid structures. TCM (Central Memory) and TEM (Effector Memory) cells circulate in the blood, with TCM predominant in lymphoid organs and TEM in tissues. TPM cells circulate throughout lymph, blood, and tissues, while TRM (Tissue-Resident Memory) cells stay long-term in tissues. Similarly, chronic infections or tumors induce T cell states through chronic antigen exposure, leading to a spectrum of exhaustion, from TEXprog (Progenitor) to TEXterm (Terminal). These cells lose function over time, with TEXterm becoming dysfunctional. In this study, we developed a pipeline to analyze transcription factor regulation across CD8+ T cell states, enabling therapeutic manipulation. Using 121 experiments, we created an epigenetic and transcription atlas of 9 cell states, allowing an unbiased analysis of unique and shared transcription factor activities across memory and exhaustion contexts.",
- "home_study_summary": "Transcription factors (TFs) regulate the differentiation of T cells into diverse states with distinct functionalities. To precisely program desired T cell states in viral infections and cancers, we generated a comprehensive transcriptional and epigenetic atlas of nine CD8+ T cell differentiation states for TF activity prediction. Our analysis catalogued TF activity fingerprints of each state, uncovering new regulatory mechanisms that govern selective cell state differentiation. Leveraging this platform, we focused on two critical T cell states in tumor and virus control: terminally exhausted T cells (TEXterm), which are dysfunctional, and tissue-resident memory T cells (TRM), which are protective. Despite their functional differences, these states share significant transcriptional and anatomical similarities, making it both challenging and essential to engineer T cells that avoid TEXterm differentiation while preserving beneficial TRM characteristics. Through in vivo CRISPR screening combined with single-cell RNA sequencing (Perturb-seq), we validated the specific TFs driving the TEXterm state and confirmed the accuracy of TF specificity predictions. Importantly, we discovered novel TEXterm-specific TFs such as ZSCAN20, JDP2, and ZFP324. The deletion of these TEXterm-specific TFs in T cells enhanced tumor control and synergized with immune checkpoint blockade. Additionally, this study identified multi-state TFs like HIC1 and GFI1, which are vital for both TEXterm and TRM states. Furthermore, our global TF community analysis and Perturb-seq experiments revealed how TFs differentially regulate key processes in TRM and TEXterm cells, uncovering new biological pathways like protein catabolism that are specifically linked to TEXterm differentiation. In summary, our platform systematically identifies TF programs across diverse T cell states, facilitating the engineering of specific T cell states to improve tumor control and providing insights into the cellular mechanisms underlying their functional disparities.",
- "chat_disclaimer": "⚠️ TaijiChat can make errors. Please verify important scientific information and consult original research papers for critical findings.",
- "chat_setup_warning": "📊 Note: Your first query may take longer as we initialize the data analysis system.",
- "tf_score_calculation_info": "TF score: normalized PageRank scores across samples. Higher scores mean higher activity. The TF score takes into account TF expression level, ATAC-seq peak intensity, and motif binding affinity. A TF needs to have both open chromatin regions and gene expression of its downstream genes to have high PageRank scores. For detailed information on how TF scores are calculated, visit: https://taiji-pipeline.github.io/algorithm_PageRank.html",
- "cell_state_specificity_info": "Cell-state specificity: a significantly higher TF score in a specific cell state. For example, 'TEXterm' means the TF has a significantly higher TF score in TEXterm. \"ALL\" means no cell-state specificity.",
- "sample_nomenclature_info": "Sample name nomenclature: cell state abbreviation + first author of RNA dataset + first author of ATAC dataset. If the same author has multiple datasets, then the publication year and month are used. For example: 'Milner17' means the Milner paper published in 2017; 'MilnerApr' means the Milner paper published in April.",
- "cell_state_bubble_plot_desc": "Below you will find the bubble plot and a searchable Excel file containing all the normalized TF activity scores. Circle size represents the logarithm of gene expression, and the color represents the normalized PageRank score.",
11
- "wave_analysis_overview_text": "To evaluate the predicted TFs governing specific T cell differentiation pathways, we identified dynamic activity patterns of TF groups, termed 'Transcription factor waves'. Transcription factor waves are generated via integration of the unbiased clustering and prior immunology knowledge. This curates catalogs of TFs associated with different cell states or differentiation trajectories. Circles represent specific cell state. Color indicates normalized PageRank scores with red displaying high values. Click to check TFs and GSEA results associated with each wave.",
12
- "wave_analysis_seven_waves_desc": "Seven TF waves associated with distinctive biological pathways were identified. For example, the TRM TF wave (Wave 6) includes several members of the AP-1 family (e.g. Atf3, Fosb, Fosl2, and Jun), which aligns well with a recent report of their roles in TRM formation (link: https://www.biorxiv.org/content/10.1101/2023.09.29.560006v1). This wave was uniquely linked to the transforming growth factor beta (TGFβ) response pathway. Conversely, the TEX TF (including TEXprog, TEXeff, and TEXterm) wave, Wave 2, was characterized by a distinct set of TFs, such as Irf8, Jdp2, Nfatc1, and Vax2, among others, that correlated with pathways related to PD1 and senescence.",
13
- "wave_analysis_click_prompt": "Click on the wave images below to be redirected to their corresponding pages and learn more about them!",
14
- "wave_X_analysis_placeholder_details": "Details about the Wave {X} analysis go here.",
15
- "tf_network_correlation_methodology": "Inspired by DBPNet (1) which is a framework to identify cooperations between DNA-binding proteins using Chromatin immunoprecipitation followed by sequencing (ChIP-seq) and Hi-C data, we constructed TF interaction network based on Taiji's output, which is a TF-regulatee network. For each context, we first combined the cell state-important TFs and cell state-specific TFs. In total, 159 TFs for TEXterm and 170 TFs for TRM were selected. We then combined TEXterm/TRM samples' network by taking the mean value of edge weight for each TF-regulatee pair. Next, regulatees with low variation across TFs (standard deviation <= 1) were removed, then correlation matrix between TFs is calculated by taking account of the Spearman's correlation of edge weight for each TF-regulatee pair. R package \"huge\" (2) is used to build a graphical model and construct the graph. We employed the Graphical lasso algorithm and the shrunken ECDF (empirical cumulative distribution function) estimator. We used a lasso penalty parameter lambda equal to 0.052 to control the regularization. We chose this value based on the local minimum point on the sparsity-lambda curve. When lambda = 0.05, around 15% of TF-TF pairs are considered as connected in the network. To estimate the false discovery rate, we generated a null model by random shuffling the edge weight of TF-regulatee pair across TFs. When the same algorithm is applied to this dataset, the chosen cutoff identifies zero interaction, suggesting that the method with cutoff equal to 0.05 has a very low false discovery rate.",
16
- "tf_network_correlation_legend": "Key for TF-TF Network Image: Circle: TF-specificity Green: TRM-specific Brown: TEXterm-specific. Line thickness: TF-TF interaction intensity. Line color: Green: TF-TF interaction found in TRM, Brown: TF-TF interaction found in TEXterm.",
17
- "tf_community_methodology": "Communities were detected using Leiden algorithm (3) with modularity as objective function and resolution as 0.9 since it reached the highest clustering modularity. In total, we identified 5 communities for each context. Network visualization was performed by graph with Fruchterman-Reingold layout algorithm (4) utilizing R package \"igraph\" (https://r.igraph.org/).",
18
- "tf_community_trmtexcom_image_desc": "TF-TF association clustering generates five TF communities between TRM and TEXterm cells. Left: The overall community topology is shaped by shared TFs (gray) in both TRM and TEXterm cells. Middle and Right: TRM and TEXterm cells show differential TF-TF interactions within each community and between communities. Top 10% of interactions are shown. The line thickness represents the interaction intensity.",
19
- "tf_community_members_prompt": "Members of each TF community in each cell state are below:",
20
- "tf_community_pathway_desc": "TF neighbor communities in TEXterm and TRM cells, respectively are linked to different biological processes (below). The overall topology of these communities was influenced by multi-state TFs active in both cell states, while cell state-specific TFs created unique interaction patterns between the communities. Pathway analysis of the regulatees in each community suggested that TRM- or TEXterm-specific TFs within each community controlled different biological pathways. For example, in TRM cells, community 3 was associated with cell-cell adhesion and response to TGFβ, and community 1 was associated with RNA metabolism, but in TEXterm cells, community 3 was linked to apoptosis, and community 1 was coupled to catabolism, proteolysis, ubiquitin-proteasome, and autophagy.",
21
- "multi_omics_data_table_scroll_prompt": "Scroll horizontally to view entire data table",
22
- "multi_omics_data_composition_desc": "Composition of multi-omic atlas. A total of 121 experiments across multiple data sets were utilized to generate an epigenetic and transcriptional atlas of murine CD8+ T cells under chronic and acute antigen exposure. Unless stated, all CD8+ T cells were isolated from spleens.",
23
- "citation_zhang_2016": "1. Zhang, K., Li, N., Ainsworth, R. I. & Wang, W. Systematic identification of protein combinations mediating chromatin looping. Nat. Commun. 7, 1–11 (2016).",
24
- "citation_zhao_2020": "2. Zhao, T., Liu, H., Roeder, K., Lafferty, J. & Wasserman, L. The huge Package for High-dimensional Undirected Graph Estimation in R. (2020) doi:10.48550/arXiv.2006.14781.",
25
- "citation_traag_2019": "3. Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).",
26
- "citation_schonfeld_2019": "4. Schönfeld, M. & Pfeffer, J. Fruchterman/Reingold (1991): Graph Drawing by Force-Directed Placement. Schlüsselwerke der Netzwerkforschung 217–220 https://doi.org/10.1007/978-3-658-21742-6_49 (2019)."
27
  }
 
1
+ {
2
+ "home_intro_1": "This website serves as a companion resource to our study, \"Multi-Omics Atlas-Assisted Discovery of Transcription Factors for Selective T Cell State Programming.\" It is designed to make the bioinformatics analyses and data from the study accessible to researchers from diverse backgrounds, including those without extensive bioinformatics expertise. The platform provides comprehensive tools to explore transcription factor (TF) activities across distinct T cell states, enabling users to examine TF scores for specific cell states, multi-state TFs and their regulatory roles, visualize TF relationships using wave and network analyses and access a searchable database of TF scores and multi-omics data. Explore our TF catalog, network analyses, and more to uncover insights into our study.",
3
+ "home_tcell_states_desc": "Naive T cells adopt diverse states in various contexts, such as acute or chronic infections and tumors. Upon activation, they become early effector (EE) cells that differentiate into distinct CD8+ T cell subsets with varied trafficking patterns—residing in lymphoid organs, blood, or peripheral tissues. In acute infection, TE (Terminal Effector) cells are found in the spleen or blood, while MP (Memory Precursor) cells mainly reside in lymphoid structures. TCM (Central Memory) and TEM (Effector Memory) cells circulate in the blood, with TCM predominant in lymphoid organs and TEM in tissues. TPM cells circulate throughout lymph, blood, and tissues, while TRM (Tissue-Resident Memory) cells stay long-term in tissues. Similarly, chronic infections or tumors induce T cell states through chronic antigen exposure, leading to a spectrum of exhaustion, from TEXprog (Progenitor) to TEXterm (Terminal). These cells lose function over time, with TEXterm becoming dysfunctional. In this study, we developed a pipeline to analyze transcription factor regulation across CD8+ T cell states, enabling therapeutic manipulation. Using 121 experiments, we created an epigenetic and transcription atlas of 9 cell states, allowing an unbiased analysis of unique and shared transcription factor activities across memory and exhaustion contexts.",
4
+ "home_study_summary": "Transcription factors (TFs) regulate the differentiation of T cells into diverse states with distinct functionalities. To precisely program desired T cell states in viral infections and cancers, we generated a comprehensive transcriptional and epigenetic atlas of nine CD8+ T cell differentiation states for TF activity prediction. Our analysis catalogued TF activity fingerprints of each state, uncovering new regulatory mechanisms that govern selective cell state differentiation. Leveraging this platform, we focused on two critical T cell states in tumor and virus control: terminally exhausted T cells (TEXterm), which are dysfunctional, and tissue-resident memory T cells (TRM), which are protective. Despite their functional differences, these states share significant transcriptional and anatomical similarities, making it both challenging and essential to engineer T cells that avoid TEXterm differentiation while preserving beneficial TRM characteristics. Through in vivo CRISPR screening combined with single-cell RNA sequencing (Perturb-seq), we validated the specific TFs driving the TEXterm state and confirmed the accuracy of TF specificity predictions. Importantly, we discovered novel TEXterm-specific TFs such as ZSCAN20, JDP2, and ZFP324. The deletion of these TEXterm-specific TFs in T cells enhanced tumor control and synergized with immune checkpoint blockade. Additionally, this study identified multi-state TFs like HIC1 and GFI1, which are vital for both TEXterm and TRM states. Furthermore, our global TF community analysis and Perturb-seq experiments revealed how TFs differentially regulate key processes in TRM and TEXterm cells, uncovering new biological pathways like protein catabolism that are specifically linked to TEXterm differentiation. In summary, our platform systematically identifies TF programs across diverse T cell states, facilitating the engineering of specific T cell states to improve tumor control and providing insights into the cellular mechanisms underlying their functional disparities.",
5
+ "chat_disclaimer": "⚠️ TaijiChat can make errors. Please verify important scientific information and consult original research papers for critical findings.",
6
+ "chat_setup_warning": "📊 Note: Your first query may take longer as we initialize the data analysis system.",
7
+ "tf_score_calculation_info": "TF score: normalized PageRank scores across samples. Higher scores mean higher activity. TF score takes account of TF expression level, ATAC-seq peak intensity, and motif binding affinity. A TF needs to have both open chromatin regions and gene expression of its downstream genes to have high PageRank scores. For the detailed information of how TF scores are calculated, visit the link: https://taiji-pipeline.github.io/algorithm_PageRank.html",
8
+ "cell_state_specificity_info": "Cell-state specificity: significantly higher TF score in specific cell state. For example, 'TEXterm' means TF has significantly higher TF score in TEXterm. \"ALL\" means no cell-state specificity.",
9
+ "sample_nomenclature_info": "Sample name nomenclature: cell state abbreviation + first author of RNA dataset + first author of ATAC dataset. If the same author has multiple datasets, then publication year and month is used. For example: 'Milner17' means Milner paper published in 2017; 'MilnerApr' means Milner paper published in April.",
10
+ "cell_state_bubble_plot_desc": "Below you will find the bubble plot and a searchable excel file containing all the normalized TF activity scores. Circle size represents the logarithm of gene expression, and the color represents the normalized PageRank score.",
11
+ "wave_analysis_overview_text": "To evaluate the predicted TFs governing specific T cell differentiation pathways, we identified dynamic activity patterns of TF groups, termed 'Transcription factor waves'. Transcription factor waves are generated via integration of the unbiased clustering and prior immunology knowledge. This curates catalogs of TFs associated with different cell states or differentiation trajectories. Circles represent specific cell state. Color indicates normalized PageRank scores with red displaying high values. Click to check TFs and GSEA results associated with each wave.",
12
+ "wave_analysis_seven_waves_desc": "Seven TF waves associated with distinctive biological pathways were identified. For example, the TRM TF wave (Wave 6) includes several members of the AP-1 family (e.g. Atf3, Fosb, Fosl2, and Jun), which aligns well with a recent report of their roles in TRM formation (link: https://www.biorxiv.org/content/10.1101/2023.09.29.560006v1). This wave was uniquely linked to the transforming growth factor beta (TGFβ) response pathway. Conversely, the TEX TF (including TEXprog, TEXeff, and TEXterm) wave, Wave 2, was characterized by a distinct set of TFs, such as Irf8, Jdp2, Nfatc1, and Vax2, among others, that correlated with pathways related to PD1 and senescence.",
13
+ "wave_analysis_click_prompt": "Click on the wave images below to be redirected to their corresponding pages and learn more about them!",
14
+ "wave_X_analysis_placeholder_details": "Details about the Wave {X} analysis go here.",
15
+ "tf_network_correlation_methodology": "Inspired by DBPNet (1) which is a framework to identify cooperations between DNA-binding proteins using Chromatin immunoprecipitation followed by sequencing (ChIP-seq) and Hi-C data, we constructed TF interaction network based on Taiji's output, which is a TF-regulatee network. For each context, we first combined the cell state-important TFs and cell state-specific TFs. In total, 159 TFs for TEXterm and 170 TFs for TRM were selected. We then combined TEXterm/TRM samples' network by taking the mean value of edge weight for each TF-regulatee pair. Next, regulatees with low variation across TFs (standard deviation <= 1) were removed, then correlation matrix between TFs is calculated by taking account of the Spearman's correlation of edge weight for each TF-regulatee pair. R package \"huge\" (2) is used to build a graphical model and construct the graph. We employed the Graphical lasso algorithm and the shrunken ECDF (empirical cumulative distribution function) estimator. We used a lasso penalty parameter lambda equal to 0.052 to control the regularization. We chose this value based on the local minimum point on the sparsity-lambda curve. When lambda = 0.05, around 15% of TF-TF pairs are considered as connected in the network. To estimate the false discovery rate, we generated a null model by random shuffling the edge weight of TF-regulatee pair across TFs. When the same algorithm is applied to this dataset, the chosen cutoff identifies zero interaction, suggesting that the method with cutoff equal to 0.05 has a very low false discovery rate.",
16
+ "tf_network_correlation_legend": "Key for TF-TF Network Image: Circle: TF-specificity Green: TRM-specific Brown: TEXterm-specific. Line thickness: TF-TF interaction intensity. Line color: Green: TF-TF interaction found in TRM, Brown: TF-TF interaction found in TEXterm.",
17
+ "tf_community_methodology": "Communities were detected using Leiden algorithm (3) with modularity as objective function and resolution as 0.9 since it reached the highest clustering modularity. In total, we identified 5 communities for each context. Network visualization was performed by graph with Fruchterman-Reingold layout algorithm (4) utilizing R package \"igraph\" (https://r.igraph.org/).",
18
+ "tf_community_trmtexcom_image_desc": "TF-TF association clustering generates five TF communities between TRM and TEXterm cells. Left: The overall community topology is shaped by shared TFs (gray) in both TRM and TEXterm cells. Middle and Right: TRM and TEXterm cells show differential TF-TF interactions within each community and between communities. Top 10% of interactions are shown. The line thickness represents the interaction intensity.",
19
+ "tf_community_members_prompt": "Members of each TF community in each cell state are below:",
20
+ "tf_community_pathway_desc": "TF neighbor communities in TEXterm and TRM cells, respectively are linked to different biological processes (below). The overall topology of these communities was influenced by multi-state TFs active in both cell states, while cell state-specific TFs created unique interaction patterns between the communities. Pathway analysis of the regulatees in each community suggested that TRM- or TEXterm-specific TFs within each community controlled different biological pathways. For example, in TRM cells, community 3 was associated with cell-cell adhesion and response to TGFβ, and community 1 was associated with RNA metabolism, but in TEXterm cells, community 3 was linked to apoptosis, and community 1 was coupled to catabolism, proteolysis, ubiquitin-proteasome, and autophagy.",
21
+ "multi_omics_data_table_scroll_prompt": "Scroll horizontally to view entire data table",
22
+ "multi_omics_data_composition_desc": "Composition of multi-omic atlas. A total of 121 experiments across multiple data sets were utilized to generate an epigenetic and transcriptional atlas of murine CD8+ T cells under chronic and acute antigen exposure. Unless stated, all CD8+ T cells were isolated from spleens.",
23
+ "citation_zhang_2016": "1. Zhang, K., Li, N., Ainsworth, R. I. & Wang, W. Systematic identification of protein combinations mediating chromatin looping. Nat. Commun. 7, 1–11 (2016).",
24
+ "citation_zhao_2020": "2. Zhao, T., Liu, H., Roeder, K., Lafferty, J. & Wasserman, L. The huge Package for High-dimensional Undirected Graph Estimation in R. (2020) doi:10.48550/arXiv.2006.14781.",
25
+ "citation_traag_2019": "3. Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).",
26
+ "citation_schonfeld_2019": "4. Schönfeld, M. & Pfeffer, J. Fruchterman/Reingold (1991): Graph Drawing by Force-Directed Placement. Schlüsselwerke der Netzwerkforschung 217–220 https://doi.org/10.1007/978-3-658-21742-6_49 (2019)."
27
  }
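As an aside on the `tf_network_correlation_methodology` string above: the graphical-lasso step it describes maps onto a short call to the `huge` package. A minimal sketch, assuming `S` is the TF-by-TF Spearman correlation matrix built from the edge weights as described (`huge()` accepts a d-by-d correlation matrix in place of raw data; the shrunken-ECDF transform mentioned in the text corresponds to `huge.npn(..., npn.func = "shrinkage")` on raw data and is omitted here):

```r
library(huge)

# S: d x d Spearman correlation matrix between the selected TFs,
# computed from the TF-regulatee edge weights as described above.
fit <- huge(S, lambda = 0.052, method = "glasso")

# Adjacency at the single lambda supplied; nonzero entries are the
# TF-TF pairs treated as connected in the network.
adj <- as.matrix(fit$path[[1]])
mean(adj[upper.tri(adj)])   # edge density; the text reports ~15% near this lambda
```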
ui.R CHANGED
The diff for this file is too large to render. See raw diff
 
warning_overlay.R CHANGED
@@ -1,63 +1,63 @@
1
- # warning_overlay.R
2
- # This file previously contained UI and JS for a fixed overlay.
3
- # That functionality is being replaced by displaying warnings as messages in the chat log.
4
- # Relevant logic will now be primarily in chat_script.js (to handle new message types)
5
- # and long_operations.R (to send custom Shiny messages).
6
-
7
- # shinyjs is still useful for other UI manipulations, so useShinyjs() should still be called in the main UI.
8
-
9
- # Helper to ensure shinyjs is ready (can be called in your main UI definition if not already)
10
- # useShinyjsForWarning <- function() {
11
- # shinyjs::useShinyjs()
12
- # }
13
-
14
- # UI for the warning overlay
15
- warningOverlayUI <- function() {
16
- # The overlay itself, initially hidden
17
- # It will be placed on top of the chat/thinking area
18
- # Assuming the "thinking box" is part of the chat sidebar or main content area
19
- # This overlay will cover its parent (e.g., the chat sidebar if placed within it)
20
- div(
21
- id = "thinkingWarningOverlay",
22
- style = "position: absolute; top: 0; left: 0; width: 100%; height: 100%; background-color: rgba(255, 0, 0, 0.3); /* Half-transparent red */ z-index: 2000; /* Ensure it's on top */ display: none; /* Initially hidden */ justify-content: center; align-items: center; text-align: center; flex-direction: column;",
23
- div(
24
- style = "background-color: white; padding: 20px; border-radius: 5px; box-shadow: 0 0 10px rgba(0,0,0,0.5);",
25
- h4("Processing your request..."),
26
- p("This may take a moment, especially with large datasets. Please be patient."),
27
- tags$div(class = "spinner-border text-primary", role = "status",
28
- tags$span(class="sr-only", "Loading...")
29
- ) # Simple spinner
30
- )
31
- )
32
- }
33
-
34
- # JavaScript to show/hide the overlay
35
- # These functions will be callable from the server
36
- warningOverlayJS <- "
37
- shinyjs.showWarningOverlay = function() {
38
- var overlay = document.getElementById('thinkingWarningOverlay');
39
- if (overlay) {
40
- overlay.style.display = 'flex'; // Use flex to center content
41
- console.log('Showing warning overlay.');
42
- }
43
- }
44
-
45
- shinyjs.hideWarningOverlay = function() {
46
- var overlay = document.getElementById('thinkingWarningOverlay');
47
- if (overlay) {
48
- overlay.style.display = 'none';
49
- console.log('Hiding warning overlay.');
50
- }
51
- }
52
- "
53
-
54
- # Functions to be called from the R server-side
55
- showWarningOverlay <- function(session) {
56
- # Ensure shinyjs is initialized (typically in ui.R or server.R globally)
57
- # If not already, add: useShinyjs()
58
- shinyjs::runjs("shinyjs.showWarningOverlay();")
59
- }
60
-
61
- hideWarningOverlay <- function(session) {
62
- shinyjs::runjs("shinyjs.hideWarningOverlay();")
63
  }
 
1
+ # warning_overlay.R
2
+ # This file previously contained UI and JS for a fixed overlay.
3
+ # That functionality is being replaced by displaying warnings as messages in the chat log.
4
+ # Relevant logic will now be primarily in chat_script.js (to handle new message types)
5
+ # and long_operations.R (to send custom Shiny messages).
6
+
7
+ # shinyjs is still useful for other UI manipulations, so useShinyjs() should still be called in the main UI.
8
+
9
+ # Helper to ensure shinyjs is ready (can be called in your main UI definition if not already)
10
+ # useShinyjsForWarning <- function() {
11
+ # shinyjs::useShinyjs()
12
+ # }
13
+
14
+ # UI for the warning overlay
15
+ warningOverlayUI <- function() {
16
+ # The overlay itself, initially hidden
17
+ # It will be placed on top of the chat/thinking area
18
+ # Assuming the "thinking box" is part of the chat sidebar or main content area
19
+ # This overlay will cover its parent (e.g., the chat sidebar if placed within it)
20
+ div(
21
+ id = "thinkingWarningOverlay",
22
+ style = "position: absolute; top: 0; left: 0; width: 100%; height: 100%; background-color: rgba(255, 0, 0, 0.3); /* Half-transparent red */ z-index: 2000; /* Ensure it's on top */ display: none; /* Initially hidden */ justify-content: center; align-items: center; text-align: center; flex-direction: column;",
23
+ div(
24
+ style = "background-color: white; padding: 20px; border-radius: 5px; box-shadow: 0 0 10px rgba(0,0,0,0.5);",
25
+ h4("Processing your request..."),
26
+ p("This may take a moment, especially with large datasets. Please be patient."),
27
+ tags$div(class = "spinner-border text-primary", role = "status",
28
+ tags$span(class="sr-only", "Loading...")
29
+ ) # Simple spinner
30
+ )
31
+ )
32
+ }
33
+
34
+ # JavaScript to show/hide the overlay
35
+ # These functions will be callable from the server
36
+ warningOverlayJS <- "
37
+ shinyjs.showWarningOverlay = function() {
38
+ var overlay = document.getElementById('thinkingWarningOverlay');
39
+ if (overlay) {
40
+ overlay.style.display = 'flex'; // Use flex to center content
41
+ console.log('Showing warning overlay.');
42
+ }
43
+ }
44
+
45
+ shinyjs.hideWarningOverlay = function() {
46
+ var overlay = document.getElementById('thinkingWarningOverlay');
47
+ if (overlay) {
48
+ overlay.style.display = 'none';
49
+ console.log('Hiding warning overlay.');
50
+ }
51
+ }
52
+ "
53
+
54
+ # Functions to be called from the R server-side
55
+ showWarningOverlay <- function(session) {
56
+ # Ensure shinyjs is initialized (typically in ui.R or server.R globally)
57
+ # If not already, add: useShinyjs()
58
+ shinyjs::runjs("shinyjs.showWarningOverlay();")
59
+ }
60
+
61
+ hideWarningOverlay <- function(session) {
62
+ shinyjs::runjs("shinyjs.hideWarningOverlay();")
63
  }
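For context, a hypothetical sketch of how the two server-side helpers defined above might wrap a long-running chat operation. The input ID matches the one chat_script.js sends; `run_long_operation()` is a made-up placeholder, and the file's own comments note that this overlay path is being superseded by chat-log messages.

```r
# Hypothetical server.R wiring; requires useShinyjs() in the UI and the
# warningOverlayJS snippet registered so that shinyjs.showWarningOverlay()
# actually exists in the browser.
observeEvent(input$user_chat_message, {
  showWarningOverlay(session)
  result <- tryCatch(
    run_long_operation(input$user_chat_message),   # placeholder long task
    finally = hideWarningOverlay(session)          # hide even on error
  )
  # "agent_chat_response" is the handler defined in chat_script.js.
  session$sendCustomMessage("agent_chat_response", list(text = result))
})
```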
www/chat_script.js CHANGED
@@ -1,643 +1,559 @@
1
- // www/chat_script.js
2
-
3
- // Ensure jQuery and document are ready
4
- $(document).ready(function() {
5
- console.log("Document ready - chat_script.js initializing");
6
-
7
- $(document).on('shiny:connected', function(event) {
8
- console.log("Shiny connected - chat_script.js executing");
9
- initializeChatUI();
10
- setupImageViewer();
11
- });
12
- });
13
-
14
- // Setup full-size image viewer
15
- function setupImageViewer() {
16
- // Create modal for full-size images if it doesn't exist
17
- if ($('#fullImageModal').length === 0) {
18
- const modalHtml = `
19
- <div id="fullImageModal" class="modal">
20
- <span class="close-modal">&times;</span>
21
- <img class="modal-content" id="fullSizeImage">
22
- </div>
23
- `;
24
- $('body').append(modalHtml);
25
-
26
- // Add CSS for the modal
27
- const modalCss = `
28
- <style>
29
- /* Image modal styles */
30
- .chat-image-preview {
31
- max-width: 250px;
32
- max-height: 200px;
33
- cursor: pointer;
34
- border: 1px solid #ccc;
35
- border-radius: 5px;
36
- margin: 5px 0;
37
- transition: transform 0.2s;
38
- }
39
- .chat-image-preview:hover {
40
- transform: scale(1.05);
41
- }
42
- .chat-image-container {
43
- margin: 10px 0;
44
- }
45
- #fullImageModal {
46
- display: none;
47
- position: fixed;
48
- z-index: 9999;
49
- left: 0;
50
- top: 0;
51
- width: 100%;
52
- height: 100%;
53
- overflow: auto;
54
- background-color: rgba(0,0,0,0.9);
55
- }
56
- #fullImageModal .modal-content {
57
- margin: auto;
58
- display: block;
59
- max-width: 90%;
60
- max-height: 90%;
61
- }
62
- .close-modal {
63
- position: absolute;
64
- top: 15px;
65
- right: 35px;
66
- color: #f1f1f1;
67
- font-size: 40px;
68
- font-weight: bold;
69
- cursor: pointer;
70
- }
71
- </style>
72
- `;
73
- $('head').append(modalCss);
74
-
75
- // Close modal when clicking X or outside the image
76
- $('.close-modal').click(function() {
77
- $('#fullImageModal').hide();
78
- });
79
-
80
- $(document).click(function(event) {
81
- if (event.target === document.getElementById('fullImageModal')) {
82
- $('#fullImageModal').hide();
83
- }
84
- });
85
- }
86
-
87
- // Handle "activate_image_viewer" message from server
88
- Shiny.addCustomMessageHandler("activate_image_viewer", function(message) {
89
- console.log("Image viewer activated");
90
- });
91
- }
92
-
93
- // Function to show full-size image
94
- window.showFullImage = function(imagePath) {
95
- console.log("Showing full image:", imagePath);
96
-
97
- // Debug image loading
98
- var img = new Image();
99
- img.onload = function() {
100
- console.log("Image loaded successfully:", imagePath, "Size:", this.width, "x", this.height);
101
- };
102
- img.onerror = function() {
103
- console.error("Failed to load image:", imagePath);
104
- // Try alternative path by removing 'www' prefix
105
- var altPath = imagePath.replace(/^www\//, '');
106
- console.log("Trying alternative path:", altPath);
107
- $('#fullSizeImage').attr('src', altPath);
108
- };
109
- img.src = imagePath;
110
-
111
- $('#fullSizeImage').attr('src', imagePath);
112
- $('#fullImageModal').show();
113
- }
114
-
115
- function initializeChatUI() {
116
- var isFirstChatOpenThisSession = true;
117
- var isResizing = false;
118
- var startX;
119
- var startWidth;
120
-
121
- var $chatMessages = $('#chatMessages'); // Cache the selector
122
- var autoScrollEnabled = true;
123
- var scrollThreshold = 20; // Pixels from bottom to re-enable auto-scroll
124
-
125
- // --- Dynamically create and insert the Chat tab --- START ---
126
- var chatTabExists = $('#customChatTabLink').length > 0;
127
- if (!chatTabExists) {
128
- var $navbarList = $('ul.nav.navbar-nav').first();
129
-
130
- if ($navbarList.length > 0) {
131
- var $chatTabLi = $('<li></li>').addClass('nav-item custom-chat-tab-li');
132
- var $chatTabLink = $('<a></a>')
133
- .attr('id', 'customChatTabLink')
134
- .attr('href', '#')
135
- .addClass('nav-link')
136
- .html('<i class="fa fa-comments"></i> Chat');
137
-
138
- $chatTabLi.append($chatTabLink);
139
- $navbarList.append($chatTabLi);
140
- console.log("Custom 'Chat' tab dynamically added to navbar");
141
- } else {
142
- console.warn("Could not find navbar list to insert Chat tab");
143
- }
144
- }
145
-
146
- // Remove previous handlers
147
- $(document).off('click.chatToggle', 'a[data-value="chatTabTrigger"]');
148
- $('a[data-value="chatTabTrigger"]').off('click.chatToggle');
149
-
150
- var oldChatTabLink = $('a[data-toggle="tab"][data-value="chatTabTrigger"]');
151
- if (oldChatTabLink.length > 0) {
152
- oldChatTabLink.off('click.bs.tab.data-api');
153
- oldChatTabLink.attr('href', 'javascript:void(0);');
154
- oldChatTabLink.removeAttr('data-toggle');
155
- }
156
-
157
- $(document).off('click.chatNavbarButton', '#chatNavbarButton');
158
-
159
- // Chat toggle handler
160
- $(document).off('click.customChatTab').on('click.customChatTab', '#customChatTabLink', function(event) {
161
- event.preventDefault();
162
- event.stopPropagation();
163
- console.log("Chat tab clicked");
164
-
165
- var sidebar = $('#chatSidebar');
166
- console.log("Sidebar visibility:", sidebar.is(':visible'));
167
-
168
- if (sidebar.is(':visible')) {
169
- sidebar.fadeOut();
170
- } else {
171
- sidebar.fadeIn(function() {
172
- if (isFirstChatOpenThisSession) {
173
- addChatMessage("How can I help you today?", 'agent');
174
- addChatMessage("⚠️ TaijiChat can make errors. Please verify important scientific information and consult original research papers for critical findings.", 'agent', false, true);
175
- addChatMessage("📊 Note: Your first query may take longer as we initialize the data analysis system.", 'agent', false, true);
176
- isFirstChatOpenThisSession = false;
177
- }
178
- });
179
- }
180
- });
181
-
182
- // Close button handler
183
- $(document).off('click.chatClose').on('click.chatClose', '#closeChatSidebarBtn', function() {
184
- console.log("Close button clicked");
185
- $('#chatSidebar').fadeOut();
186
- });
187
-
188
- // Resize functionality
189
- console.log("Setting up resize handlers");
190
-
191
- // Remove any existing handlers first
192
- $(document).off('mousedown.resizeHandle');
193
- $(document).off('mousemove.resizePanel');
194
- $(document).off('mouseup.resizePanel');
195
-
196
- // Add new handlers using event delegation
197
- $(document).on('mousedown.resizeHandle', '.resize-handle', function(e) {
198
- console.log("Resize handle mousedown detected");
199
- isResizing = true;
200
- startX = e.pageX;
201
- var sidebar = $('#chatSidebar');
202
- startWidth = sidebar.width();
203
- console.log("Initial width:", startWidth);
204
- e.preventDefault();
205
- $('body').css('user-select', 'none'); // Prevent text selection while dragging
206
- });
207
-
208
- $(document).on('mousemove.resizePanel', function(e) {
209
- if (!isResizing) return;
210
-
211
- var sidebar = $('#chatSidebar');
212
- var windowWidth = $(window).width();
213
- var width = windowWidth - e.pageX;
214
- width = Math.max(250, Math.min(width, 3200));
215
- console.log("Resizing to width:", width);
216
-
217
- sidebar.css({
218
- 'width': width + 'px',
219
- 'transition': 'none' // Disable transition during drag
220
- });
221
- });
222
-
223
- $(document).on('mouseup.resizePanel', function(e) {
224
- if (isResizing) {
225
- console.log("Resize ended");
226
- isResizing = false;
227
- $('body').css('user-select', ''); // Re-enable text selection
228
- $('#chatSidebar').css('transition', ''); // Re-enable transitions
229
- }
230
- });
231
-
232
- $(document).on('mouseenter', '.resize-handle', function() {
233
- console.log('Mouse entered resize handle');
234
- });
235
-
236
- // Message handling functionality
237
- var thinkingMessageElement = null;
238
- var currentThoughtsContainer = null;
239
-
240
- // Track if thinking animation is in progress
241
- var thinkingTypingInProgress = false;
242
- var resultTypingQueue = [];
243
-
244
- // Scroll listener for chat messages panel
245
- if ($chatMessages.length) { // Ensure element exists before attaching listener
246
- $chatMessages.on('scroll.chatAutoScroll', function() {
247
- // Check if scrolled near the bottom
248
- if (this.scrollHeight - this.scrollTop - this.clientHeight < scrollThreshold) {
249
- if (!autoScrollEnabled) {
250
- // console.log("Auto-scroll re-enabled (scrolled to bottom).");
251
- autoScrollEnabled = true;
252
- }
253
- } else {
254
- if (autoScrollEnabled) {
255
- // console.log("Auto-scroll disabled (user scrolled up).");
256
- autoScrollEnabled = false;
257
- }
258
- }
259
- });
260
- } else {
261
- console.warn("#chatMessages element not found for scroll listener.");
262
- }
263
-
264
- function typeTextLine($element, text, callback, speed = 10) {
265
- let i = 0;
266
- function typeChar() {
267
- if (i < text.length) {
268
- $element.append(text.charAt(i));
269
- i++;
270
- if (autoScrollEnabled) { // Conditional scroll
271
- $chatMessages.scrollTop($chatMessages[0].scrollHeight); // Scroll after each character
272
- }
273
- setTimeout(typeChar, speed);
274
- } else if (callback) {
275
- callback();
276
- }
277
- }
278
- typeChar();
279
- }
280
-
281
- function typeTextLines($container, lines, lineClass, speed, doneCallback) {
282
- let idx = 0;
283
- function typeNextLine() {
284
- if (idx < lines.length) {
285
- var $lineDiv = $('<div></div>').addClass(lineClass);
286
- $container.append($lineDiv);
287
- typeTextLine($lineDiv, lines[idx], function() {
288
- idx++;
289
- typeNextLine();
290
- }, speed);
291
- } else if (doneCallback) {
292
- doneCallback();
293
- }
294
- }
295
- typeNextLine();
296
- }
297
-
298
- function addChatMessage(messageText, messageType, isThinkingMessage = false, isDisclaimer = false) {
299
- var messageClass = messageType === 'user' ? 'user-message' : 'agent-message';
300
- if (isThinkingMessage) {
301
- messageClass += ' thinking-message';
302
- }
303
- if (isDisclaimer) {
304
- messageClass += ' disclaimer';
305
- }
306
- var $chatMessages = $('#chatMessages');
307
-
308
- var $messageDiv = $('<div></div>').addClass('chat-message').addClass(messageClass);
309
-
310
- if (messageType === 'user') {
311
- // For user messages, just append the text directly without animation
312
- $messageDiv.text(messageText);
313
- $chatMessages.append($messageDiv);
314
- // Always scroll for user's own messages, and re-enable autoScroll
315
- autoScrollEnabled = true;
316
- $chatMessages.scrollTop($chatMessages[0].scrollHeight);
317
- return;
318
- }
319
-
320
- // Check if message contains HTML (for images)
321
- var containsHtml = /<[a-z][\s\S]*>/i.test(messageText);
322
-
323
- if (isThinkingMessage) {
324
- // Guarantee thinkingTypingInProgress is set before any animation starts
325
- thinkingTypingInProgress = true;
326
- $messageDiv.html('<span class="thought-toggle-arrow" role="button" tabindex="0">&#9658;</span> ' +
327
- '<span class="thinking-text"></span>' +
328
- '<div class="thoughts-area" style="display: none; margin-left: 20px; font-style: italic; color: #555;"></div>');
329
- thinkingMessageElement = $messageDiv;
330
- currentThoughtsContainer = $messageDiv.find('.thoughts-area');
331
- $chatMessages.append($messageDiv);
332
- // Split main thinking into lines
333
- var mainLines = messageText.split('\n');
334
- typeTextLines($messageDiv.find('.thinking-text'), mainLines, '', 4, function() {
335
- // If there are already thoughts queued, type them line by line
336
- var $thoughtsArea = $messageDiv.find('.thoughts-area');
337
- var thoughtDivs = $thoughtsArea.children('.thought-item').toArray();
338
- function typeThoughtsSequentially(idx) {
339
- if (idx < thoughtDivs.length) {
340
- var $thoughtDiv = $(thoughtDivs[idx]);
341
- var text = $thoughtDiv.data('pending-text');
342
- $thoughtDiv.removeData('pending-text');
343
- typeTextLine($thoughtDiv, text, function() {
344
- typeThoughtsSequentially(idx + 1);
345
- }, 4);
346
- } else {
347
- // Only now set thinkingTypingInProgress to false
348
- thinkingTypingInProgress = false;
349
- // If a result is queued, type it now
350
- if (resultTypingQueue.length > 0) {
351
- var nextResult = resultTypingQueue.shift();
352
- nextResult();
353
- }
354
- }
355
- }
356
- typeThoughtsSequentially(0);
357
- });
358
- // Scroll to bottom if auto-scroll is enabled
359
- if (autoScrollEnabled) {
360
- $chatMessages.scrollTop($chatMessages[0].scrollHeight);
361
- }
362
- return;
363
- }
364
-
365
- function processRegularMessage() {
366
- // Check if message contains HTML (for images)
367
- if (containsHtml) {
368
- // If contains HTML, add it directly without typing animation
369
- console.log("HTML content detected in message, adding directly:", messageText);
370
- // Check if it has an image tag
371
- if (messageText.indexOf('<img') !== -1) {
372
- console.log("Image tag found in message");
373
- // Extract image src for debugging
374
- var imgSrcMatch = messageText.match(/src="([^"]+)"/);
375
- if (imgSrcMatch && imgSrcMatch.length > 1) {
376
- console.log("Image src:", imgSrcMatch[1]);
377
- }
378
- }
379
- $messageDiv.html(messageText);
380
- $chatMessages.append($messageDiv);
381
- if (autoScrollEnabled) {
382
- $chatMessages.scrollTop($chatMessages[0].scrollHeight);
383
- }
384
- } else {
385
- // For normal text messages, use typing animation
386
- $chatMessages.append($messageDiv);
387
- var lines = messageText.split('\n');
388
- typeTextLines($messageDiv, lines, '', 5, function() {
389
- if (autoScrollEnabled) {
390
- $chatMessages.scrollTop($chatMessages[0].scrollHeight);
391
- }
392
- });
393
- }
394
- }
395
-
396
- if (thinkingTypingInProgress) {
397
- // If thinking animation is in progress, queue this message
398
- resultTypingQueue.push(processRegularMessage);
399
- } else {
400
- // Otherwise, process immediately
401
- processRegularMessage();
402
- }
403
- }
404
-
405
- // Thought toggle handler
406
- $(document).off('click.thoughtToggle').on('click.thoughtToggle', '.thought-toggle-arrow', function() {
407
- var $arrow = $(this);
408
- var $thoughtsArea = $arrow.siblings('.thoughts-area');
409
- $thoughtsArea.slideToggle(200);
410
- $arrow.html($arrow.html() === '►' ? '▼' : '►');
411
- });
412
-
413
- // Send message handlers
414
- $('#sendChatMsg').off('click.chatSend').on('click.chatSend', function() {
415
- var messageText = $('#chatInput').val().trim();
416
- if (messageText) {
417
- // Disable input and button
418
- $('#chatInput').prop('disabled', true);
419
- $('#sendChatMsg').prop('disabled', true);
420
-
421
- addChatMessage(messageText, 'user');
422
- $('#chatInput').val(''); // Clear input after grabbing value
423
- Shiny.setInputValue("user_chat_message", messageText, {priority: "event"});
424
- }
425
- });
426
-
427
- $('#chatInput').off('keypress.chatSend').on('keypress.chatSend', function(e) {
428
- if (e.which == 13 && !$(this).prop('disabled')) { // Check if not disabled
429
- e.preventDefault();
430
- $('#sendChatMsg').click(); // This will trigger the click handler above
431
- }
432
- });
433
-
434
- // Shiny message handlers
435
- Shiny.addCustomMessageHandler("agent_thinking_started", function(message) {
436
- console.log("Received thinking started message");
437
- if(message && typeof message.text === 'string') {
438
- // if (thinkingMessageElement) {
439
- // thinkingMessageElement.remove();
440
- // thinkingMessageElement = null;
441
- // }
442
- // if (currentThoughtsContainer) {
443
- // currentThoughtsContainer = null; // It's part of thinkingMessageElement, will be handled if parent is removed
444
- // }
445
- addChatMessage(message.text, 'agent', true);
446
- }
447
- });
448
-
449
- Shiny.addCustomMessageHandler("agent_new_thought", function(message) {
450
- console.log("Received new thought");
451
- if (message && typeof message.text === 'string' && currentThoughtsContainer) {
452
- var $thoughtDiv = $('<div></div>').addClass('thought-item');
453
- $thoughtDiv.data('pending-text', message.text);
454
- currentThoughtsContainer.append($thoughtDiv);
455
- // If thinking text is done, type this thought now, else it will be picked up in the queue
456
- if (!thinkingTypingInProgress) {
457
- // Guarantee thinkingTypingInProgress is set before starting thoughts
458
- thinkingTypingInProgress = true;
459
- var $thoughtsArea = currentThoughtsContainer;
460
- var thoughtDivs = $thoughtsArea.children('.thought-item').toArray();
461
- function typeThoughtsSequentially(idx) {
462
- if (idx < thoughtDivs.length) {
463
- var $thoughtDiv = $(thoughtDivs[idx]);
464
- if ($thoughtDiv.text().length === 0) { // Check if text has not been typed yet
465
- var text = $thoughtDiv.data('pending-text');
466
- $thoughtDiv.removeData('pending-text');
467
- typeTextLine($thoughtDiv, text, function() {
468
- typeThoughtsSequentially(idx + 1);
469
- }, 4);
470
- } else {
471
- // Already typed (e.g., if this function is re-entered)
472
- typeThoughtsSequentially(idx + 1);
473
- }
474
- } else {
475
- thinkingTypingInProgress = false;
476
- if (resultTypingQueue.length > 0) {
477
- var nextResult = resultTypingQueue.shift();
478
- nextResult(); // This will handle its own scrolling if needed
479
- }
480
- }
481
- }
482
- typeThoughtsSequentially(0);
483
- }
484
- if (autoScrollEnabled) { // Conditional scroll
485
- $chatMessages.scrollTop($chatMessages[0].scrollHeight);
486
- }
487
- }
488
- });
489
-
490
- Shiny.addCustomMessageHandler("agent_chat_response", function(message) {
491
- console.log("Received chat response");
492
- if(message && typeof message.text === 'string') {
493
- addChatMessage(message.text, 'agent');
494
- }
495
- // Re-enable input and button
496
- $('#chatInput').prop('disabled', false);
497
- $('#sendChatMsg').prop('disabled', false);
498
- $('#chatInput').focus(); // Optionally focus the input field
499
- });
500
-
501
- Shiny.addCustomMessageHandler("long_op_custom_warning", function(message) {
502
- if (message && typeof message.text === 'string') {
503
- // Add message to chat display, styled as a warning
504
- // You might want to add a specific class for styling, e.g., 'long-op-warning-message'
505
- // For now, it will use the default 'agent-message' style but appear as a distinct message.
506
- var $chatMessages = $('#chatMessages');
507
- var $messageDiv = $('<div></div>').addClass('chat-message agent-message long-op-warning'); // Added long-op-warning class
508
- $messageDiv.css({
509
- 'background-color': 'rgba(255, 0, 0, 0.1)', // Light red, less intense than the old overlay
510
- 'border': '1px solid rgba(255, 0, 0, 0.3)',
511
- 'color': '#721c24', // Darker red text for readability
512
- 'padding': '10px',
513
- 'margin-bottom': '10px',
514
- 'border-radius': '5px'
515
- });
516
- $messageDiv.text(message.text);
517
- $chatMessages.append($messageDiv);
518
- $chatMessages.scrollTop($chatMessages[0].scrollHeight);
519
- }
520
- });
521
-
522
- Shiny.addCustomMessageHandler("agent_processing_error", function(message) {
523
- // This is a new handler you might need if server.R sends a specific error message type
524
- // For now, agent_chat_response handles errors from server.R's tryCatch
525
- console.error("Agent processing error:", message.text);
526
- // Ensure UI is re-enabled even on specific error messages
527
- $('#chatInput').prop('disabled', false);
528
- $('#sendChatMsg').prop('disabled', false);
529
- $('#chatInput').focus();
530
- // Optionally display a more prominent error in the chat
531
- if(message && typeof message.text === 'string') {
532
- addChatMessage("Error: " + message.text, 'agent-error'); // Define 'agent-error' style if needed
533
- } else {
534
- addChatMessage("An unexpected error occurred with the agent.", 'agent-error');
535
- }
536
- });
537
-
538
- Shiny.addCustomMessageHandler("literature_confirmation_request", function(message) {
539
- console.log("Received literature confirmation request");
540
- if (message && typeof message.text === 'string') {
541
- showLiteratureConfirmationDialog(message.text);
542
- }
543
- });
544
-
545
- function showLiteratureConfirmationDialog(messageText) {
546
- var $chatMessages = $('#chatMessages');
547
-
548
- // Remove any existing confirmation dialogs first
549
- $('.literature-confirmation-dialog').remove();
550
-
551
- // Create a wrapper for the dialog that doesn't interfere with chat message styling
552
- var $dialogWrapper = $('<div></div>').addClass('chat-message').css({
553
- 'background': 'transparent',
554
- 'border': 'none',
555
- 'padding': '10px 0',
556
- 'margin': '8px 0',
557
- 'max-width': '100%',
558
- 'float': 'none',
559
- 'display': 'flex',
560
- 'justify-content': 'center',
561
- 'align-items': 'center',
562
- 'width': '100%'
563
- });
564
-
565
- // Create the confirmation dialog box
566
- var $confirmationDiv = $('<div></div>').addClass('literature-confirmation-dialog');
567
- $confirmationDiv.html(`
568
- <div class="confirmation-content">
569
- <div class="confirmation-message">${messageText}</div>
570
- <div class="confirmation-buttons">
571
- <button class="btn confirmation-paper" onclick="handleLiteratureConfirmation('paper')">
572
- Paper Only
573
- </button>
574
- <button class="btn confirmation-external" onclick="handleLiteratureConfirmation('external')">
575
- External Only
576
- </button>
577
- <button class="btn confirmation-both" onclick="handleLiteratureConfirmation('both')">
578
- Both Sources
579
- </button>
580
- <button class="btn confirmation-none" onclick="handleLiteratureConfirmation('none')">
581
- None
582
- </button>
583
- </div>
584
- </div>
585
- `);
586
-
587
- // Add the dialog to the wrapper and then to chat
588
- $dialogWrapper.append($confirmationDiv);
589
- $chatMessages.append($dialogWrapper);
590
-
591
- // Smooth scroll to the dialog
592
- if (autoScrollEnabled) {
593
- $chatMessages.animate({
594
- scrollTop: $chatMessages[0].scrollHeight
595
- }, 300);
596
- }
597
-
598
- // Disable input while waiting for confirmation
599
- $('#chatInput').prop('disabled', true);
600
- $('#sendChatMsg').prop('disabled', true);
601
- }
602
-
603
- // Global function to handle literature confirmation response
604
- window.handleLiteratureConfirmation = function(userChoice) {
605
- console.log("Literature confirmation choice:", userChoice);
606
-
607
- // Add a brief feedback before removing the dialog
608
- var $confirmationDialog = $('.literature-confirmation-dialog');
609
- var choiceText;
610
- switch(userChoice) {
611
- case 'paper':
612
- choiceText = "Using the underlying paper only...";
613
- break;
614
- case 'external':
615
- choiceText = "Using external literature only...";
616
- break;
617
- case 'both':
618
- choiceText = "Using both paper and external literature...";
619
- break;
620
- case 'none':
621
- choiceText = "Proceeding without literature sources...";
622
- break;
623
- default:
624
- choiceText = "Processing your choice...";
625
- }
626
-
627
- // Show brief feedback
628
- $confirmationDialog.find('.confirmation-message').html(`<em>${choiceText}</em>`);
629
- $confirmationDialog.find('.confirmation-buttons').fadeOut(200);
630
-
631
- // Remove the dialog after a brief delay
632
- setTimeout(function() {
633
- $('.literature-confirmation-dialog').closest('.chat-message').fadeOut(300, function() {
634
- $(this).remove();
635
- });
636
- }, 1000);
637
-
638
- // Send the response to Shiny
639
- Shiny.setInputValue("literature_confirmation_response", userChoice, {priority: "event"});
640
-
641
- // Keep input disabled - it will be re-enabled when the agent responds
642
- };
643
- }
 
1
+ // www/chat_script.js
2
+
3
+ // Ensure jQuery and document are ready
4
+ $(document).ready(function() {
5
+ console.log("Document ready - chat_script.js initializing");
6
+
7
+ $(document).on('shiny:connected', function(event) {
8
+ console.log("Shiny connected - chat_script.js executing");
9
+ initializeChatUI();
10
+ setupImageViewer();
11
+ });
12
+ });
13
+
14
+ // Setup full-size image viewer
15
+ function setupImageViewer() {
16
+ // Create modal for full-size images if it doesn't exist
17
+ if ($('#fullImageModal').length === 0) {
18
+ const modalHtml = `
19
+ <div id="fullImageModal" class="modal">
20
+ <span class="close-modal">&times;</span>
21
+ <img class="modal-content" id="fullSizeImage">
22
+ </div>
23
+ `;
24
+ $('body').append(modalHtml);
25
+
26
+ // Add CSS for the modal
27
+ const modalCss = `
28
+ <style>
29
+ /* Image modal styles */
30
+ .chat-image-preview {
31
+ max-width: 250px;
32
+ max-height: 200px;
33
+ cursor: pointer;
34
+ border: 1px solid #ccc;
35
+ border-radius: 5px;
36
+ margin: 5px 0;
37
+ transition: transform 0.2s;
38
+ }
39
+ .chat-image-preview:hover {
40
+ transform: scale(1.05);
41
+ }
42
+ .chat-image-container {
43
+ margin: 10px 0;
44
+ }
45
+ #fullImageModal {
46
+ display: none;
47
+ position: fixed;
48
+ z-index: 9999;
49
+ left: 0;
50
+ top: 0;
51
+ width: 100%;
52
+ height: 100%;
53
+ overflow: auto;
54
+ background-color: rgba(0,0,0,0.9);
55
+ }
56
+ #fullImageModal .modal-content {
57
+ margin: auto;
58
+ display: block;
59
+ max-width: 90%;
60
+ max-height: 90%;
61
+ }
62
+ .close-modal {
63
+ position: absolute;
64
+ top: 15px;
65
+ right: 35px;
66
+ color: #f1f1f1;
67
+ font-size: 40px;
68
+ font-weight: bold;
69
+ cursor: pointer;
70
+ }
71
+ </style>
72
+ `;
73
+ $('head').append(modalCss);
74
+
75
+ // Close modal when clicking X or outside the image
76
+ $('.close-modal').click(function() {
77
+ $('#fullImageModal').hide();
78
+ });
79
+
80
+ $(document).click(function(event) {
81
+ if (event.target === document.getElementById('fullImageModal')) {
82
+ $('#fullImageModal').hide();
83
+ }
84
+ });
85
+ }
86
+
87
+ // Handle "activate_image_viewer" message from server
88
+ Shiny.addCustomMessageHandler("activate_image_viewer", function(message) {
89
+ console.log("Image viewer activated");
90
+ });
91
+ }
92
+
93
+ // Function to show full-size image
94
+ window.showFullImage = function(imagePath) {
95
+ console.log("Showing full image:", imagePath);
96
+
97
+ // Debug image loading
98
+ var img = new Image();
99
+ img.onload = function() {
100
+ console.log("Image loaded successfully:", imagePath, "Size:", this.width, "x", this.height);
101
+ };
102
+ img.onerror = function() {
103
+ console.error("Failed to load image:", imagePath);
104
+ // Try alternative path by removing 'www' prefix
105
+ var altPath = imagePath.replace(/^www\//, '');
106
+ console.log("Trying alternative path:", altPath);
107
+ $('#fullSizeImage').attr('src', altPath);
108
+ };
109
+ img.src = imagePath;
110
+
111
+ $('#fullSizeImage').attr('src', imagePath);
112
+ $('#fullImageModal').show();
113
+ }
114
+
115
+ function initializeChatUI() {
116
+ var isFirstChatOpenThisSession = true;
117
+ var isResizing = false;
118
+ var startX;
119
+ var startWidth;
120
+
121
+ var $chatMessages = $('#chatMessages'); // Cache the selector
122
+ var autoScrollEnabled = true;
123
+ var scrollThreshold = 20; // Pixels from bottom to re-enable auto-scroll
124
+
125
+ // --- Dynamically create and insert the Chat tab --- START ---
126
+ var chatTabExists = $('#customChatTabLink').length > 0;
127
+ if (!chatTabExists) {
128
+ var $navbarList = $('ul.nav.navbar-nav').first();
129
+
130
+ if ($navbarList.length > 0) {
131
+ var $chatTabLi = $('<li></li>').addClass('nav-item custom-chat-tab-li');
132
+ var $chatTabLink = $('<a></a>')
133
+ .attr('id', 'customChatTabLink')
134
+ .attr('href', '#')
135
+ .addClass('nav-link')
136
+ .html('<i class="fa fa-comments"></i> Chat');
137
+
138
+ $chatTabLi.append($chatTabLink);
139
+ $navbarList.append($chatTabLi);
140
+ console.log("Custom 'Chat' tab dynamically added to navbar");
141
+ } else {
142
+ console.warn("Could not find navbar list to insert Chat tab");
143
+ }
144
+ }
145
+
146
+ // Remove previous handlers
147
+ $(document).off('click.chatToggle', 'a[data-value="chatTabTrigger"]');
148
+ $('a[data-value="chatTabTrigger"]').off('click.chatToggle');
149
+
150
+ var oldChatTabLink = $('a[data-toggle="tab"][data-value="chatTabTrigger"]');
151
+ if (oldChatTabLink.length > 0) {
152
+ oldChatTabLink.off('click.bs.tab.data-api');
153
+ oldChatTabLink.attr('href', 'javascript:void(0);');
154
+ oldChatTabLink.removeAttr('data-toggle');
155
+ }
156
+
157
+ $(document).off('click.chatNavbarButton', '#chatNavbarButton');
158
+
159
+ // Chat toggle handler
160
+ $(document).off('click.customChatTab').on('click.customChatTab', '#customChatTabLink', function(event) {
161
+ event.preventDefault();
162
+ event.stopPropagation();
163
+ console.log("Chat tab clicked");
164
+
165
+ var sidebar = $('#chatSidebar');
166
+ console.log("Sidebar visibility:", sidebar.is(':visible'));
167
+
168
+ if (sidebar.is(':visible')) {
169
+ sidebar.fadeOut();
170
+ } else {
171
+ sidebar.fadeIn(function() {
172
+ if (isFirstChatOpenThisSession) {
173
+ addChatMessage("How can I help you today?", 'agent');
174
+ addChatMessage("⚠️ TaijiChat can make errors. Please verify important scientific information and consult original research papers for critical findings.", 'agent', false, true);
175
+ addChatMessage("📊 Note: Your first query may take longer as we initialize the data analysis system.", 'agent', false, true);
176
+ isFirstChatOpenThisSession = false;
177
+ }
178
+ });
179
+ }
180
+ });
181
+
182
+ // Close button handler
183
+ $(document).off('click.chatClose').on('click.chatClose', '#closeChatSidebarBtn', function() {
184
+ console.log("Close button clicked");
185
+ $('#chatSidebar').fadeOut();
186
+ });
187
+
188
+ // Resize functionality
189
+ console.log("Setting up resize handlers");
190
+
191
+ // Remove any existing handlers first
192
+ $(document).off('mousedown.resizeHandle');
193
+ $(document).off('mousemove.resizePanel');
194
+ $(document).off('mouseup.resizePanel');
195
+
196
+ // Add new handlers using event delegation
197
+ $(document).on('mousedown.resizeHandle', '.resize-handle', function(e) {
198
+ console.log("Resize handle mousedown detected");
199
+ isResizing = true;
200
+ startX = e.pageX;
201
+ var sidebar = $('#chatSidebar');
202
+ startWidth = sidebar.width();
203
+ console.log("Initial width:", startWidth);
204
+ e.preventDefault();
205
+ $('body').css('user-select', 'none'); // Prevent text selection while dragging
206
+ });
207
+
208
+ $(document).on('mousemove.resizePanel', function(e) {
209
+ if (!isResizing) return;
210
+
211
+ var sidebar = $('#chatSidebar');
212
+ var windowWidth = $(window).width();
213
+ var width = windowWidth - e.pageX;
214
+ width = Math.max(250, Math.min(width, 3200));
215
+ console.log("Resizing to width:", width);
216
+
217
+ sidebar.css({
218
+ 'width': width + 'px',
219
+ 'transition': 'none' // Disable transition during drag
220
+ });
221
+ });
222
+
223
+ $(document).on('mouseup.resizePanel', function(e) {
224
+ if (isResizing) {
225
+ console.log("Resize ended");
226
+ isResizing = false;
227
+ $('body').css('user-select', ''); // Re-enable text selection
228
+ $('#chatSidebar').css('transition', ''); // Re-enable transitions
229
+ }
230
+ });
231
+
232
+ $(document).on('mouseenter', '.resize-handle', function() {
233
+ console.log('Mouse entered resize handle');
234
+ });
235
+
236
+ // Message handling functionality
237
+ var thinkingMessageElement = null;
238
+ var currentThoughtsContainer = null;
239
+
240
+ // Track if thinking animation is in progress
241
+ var thinkingTypingInProgress = false;
242
+ var resultTypingQueue = [];
243
+
244
+ // Scroll listener for chat messages panel
245
+ if ($chatMessages.length) { // Ensure element exists before attaching listener
246
+ $chatMessages.on('scroll.chatAutoScroll', function() {
247
+ // Check if scrolled near the bottom
248
+ if (this.scrollHeight - this.scrollTop - this.clientHeight < scrollThreshold) {
249
+ if (!autoScrollEnabled) {
250
+ // console.log("Auto-scroll re-enabled (scrolled to bottom).");
251
+ autoScrollEnabled = true;
252
+ }
253
+ } else {
254
+ if (autoScrollEnabled) {
255
+ // console.log("Auto-scroll disabled (user scrolled up).");
256
+ autoScrollEnabled = false;
257
+ }
258
+ }
259
+ });
260
+ } else {
261
+ console.warn("#chatMessages element not found for scroll listener.");
262
+ }
263
+
264
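+ // Types the given text into $element one character at a time (speed = ms per character), scrolling while auto-scroll is enabled, then fires callback.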
+ function typeTextLine($element, text, callback, speed = 10) {
265
+ let i = 0;
266
+ function typeChar() {
267
+ if (i < text.length) {
268
+ $element.append(text.charAt(i));
269
+ i++;
270
+ if (autoScrollEnabled) { // Conditional scroll
271
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight); // Scroll after each character
272
+ }
273
+ setTimeout(typeChar, speed);
274
+ } else if (callback) {
275
+ callback();
276
+ }
277
+ }
278
+ typeChar();
279
+ }
280
+
281
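+ // Types an array of lines into $container sequentially, one div (with lineClass) per line; calls doneCallback after the last line finishes.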
+ function typeTextLines($container, lines, lineClass, speed, doneCallback) {
282
+ let idx = 0;
283
+ function typeNextLine() {
284
+ if (idx < lines.length) {
285
+ var $lineDiv = $('<div></div>').addClass(lineClass);
286
+ $container.append($lineDiv);
287
+ typeTextLine($lineDiv, lines[idx], function() {
288
+ idx++;
289
+ typeNextLine();
290
+ }, speed);
291
+ } else if (doneCallback) {
292
+ doneCallback();
293
+ }
294
+ }
295
+ typeNextLine();
296
+ }
297
+
298
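+ // Appends a chat bubble to #chatMessages. User messages render instantly; HTML content is inserted as-is; plain agent text is typed out; thinking messages get a collapsible thoughts area.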
+ function addChatMessage(messageText, messageType, isThinkingMessage = false, isDisclaimer = false) {
299
+ var messageClass = messageType === 'user' ? 'user-message' : 'agent-message';
300
+ if (isThinkingMessage) {
301
+ messageClass += ' thinking-message';
302
+ }
303
+ if (isDisclaimer) {
304
+ messageClass += ' disclaimer';
305
+ }
306
+ var $chatMessages = $('#chatMessages');
307
+
308
+ var $messageDiv = $('<div></div>').addClass('chat-message').addClass(messageClass);
309
+
310
+ if (messageType === 'user') {
311
+ // For user messages, just append the text directly without animation
312
+ $messageDiv.text(messageText);
313
+ $chatMessages.append($messageDiv);
314
+ // Always scroll for user's own messages, and re-enable autoScroll
315
+ autoScrollEnabled = true;
316
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight);
317
+ return;
318
+ }
319
+
320
+ // Check if message contains HTML (for images)
321
+ var containsHtml = /<[a-z][\s\S]*>/i.test(messageText);
322
+
323
+ if (isThinkingMessage) {
324
+ // Guarantee thinkingTypingInProgress is set before any animation starts
325
+ thinkingTypingInProgress = true;
326
+ $messageDiv.html('<span class="thought-toggle-arrow" role="button" tabindex="0">&#9658;</span> ' +
327
+ '<span class="thinking-text"></span>' +
328
+ '<div class="thoughts-area" style="display: none; margin-left: 20px; font-style: italic; color: #555;"></div>');
329
+ thinkingMessageElement = $messageDiv;
330
+ currentThoughtsContainer = $messageDiv.find('.thoughts-area');
331
+ $chatMessages.append($messageDiv);
332
+ // Split main thinking into lines
333
+ var mainLines = messageText.split('\n');
334
+ typeTextLines($messageDiv.find('.thinking-text'), mainLines, '', 4, function() {
335
+ // If there are already thoughts queued, type them line by line
336
+ var $thoughtsArea = $messageDiv.find('.thoughts-area');
337
+ var thoughtDivs = $thoughtsArea.children('.thought-item').toArray();
338
+ function typeThoughtsSequentially(idx) {
339
+ if (idx < thoughtDivs.length) {
340
+ var $thoughtDiv = $(thoughtDivs[idx]);
341
+ var text = $thoughtDiv.data('pending-text');
342
+ $thoughtDiv.removeData('pending-text');
343
+ typeTextLine($thoughtDiv, text, function() {
344
+ typeThoughtsSequentially(idx + 1);
345
+ }, 4);
346
+ } else {
347
+ // Only now set thinkingTypingInProgress to false
348
+ thinkingTypingInProgress = false;
349
+ // If a result is queued, type it now
350
+ if (resultTypingQueue.length > 0) {
351
+ var nextResult = resultTypingQueue.shift();
352
+ nextResult();
353
+ }
354
+ }
355
+ }
356
+ typeThoughtsSequentially(0);
357
+ });
358
+ // Scroll to bottom if auto-scroll is enabled
359
+ if (autoScrollEnabled) {
360
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight);
361
+ }
362
+ return;
363
+ }
364
+
365
+ function processRegularMessage() {
366
+ // Check if message contains HTML (for images)
367
+ if (containsHtml) {
368
+ // If contains HTML, add it directly without typing animation
369
+ console.log("HTML content detected in message, adding directly:", messageText);
370
+ // Check if it has an image tag
371
+ if (messageText.indexOf('<img') !== -1) {
372
+ console.log("Image tag found in message");
373
+ // Extract image src for debugging
374
+ var imgSrcMatch = messageText.match(/src="([^"]+)"/);
375
+ if (imgSrcMatch && imgSrcMatch.length > 1) {
376
+ console.log("Image src:", imgSrcMatch[1]);
377
+ }
378
+ }
379
+ $messageDiv.html(messageText);
380
+ $chatMessages.append($messageDiv);
381
+ if (autoScrollEnabled) {
382
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight);
383
+ }
384
+ } else {
385
+ // For normal text messages, use typing animation
386
+ $chatMessages.append($messageDiv);
387
+ var lines = messageText.split('\n');
388
+ typeTextLines($messageDiv, lines, '', 5, function() {
389
+ if (autoScrollEnabled) {
390
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight);
391
+ }
392
+ });
393
+ }
394
+ }
395
+
396
+ if (thinkingTypingInProgress) {
397
+ // If thinking animation is in progress, queue this message
398
+ resultTypingQueue.push(processRegularMessage);
399
+ } else {
400
+ // Otherwise, process immediately
401
+ processRegularMessage();
402
+ }
403
+ }
404
+
405
+ // Thought toggle handler
406
+ $(document).off('click.thoughtToggle').on('click.thoughtToggle', '.thought-toggle-arrow', function() {
407
+ var $arrow = $(this);
408
+ var $thoughtsArea = $arrow.siblings('.thoughts-area');
409
+ $thoughtsArea.slideToggle(200);
410
+ $arrow.html($arrow.html() === '►' ? '▼' : '►');
411
+ });
412
+
413
+ // Send message handlers
414
+ $('#sendChatMsg').off('click.chatSend').on('click.chatSend', function() {
415
+ var messageText = $('#chatInput').val().trim();
416
+ if (messageText) {
417
+ // Disable input and button
418
+ $('#chatInput').prop('disabled', true);
419
+ $('#sendChatMsg').prop('disabled', true);
420
+
421
+ addChatMessage(messageText, 'user');
422
+ $('#chatInput').val(''); // Clear input after grabbing value
423
+ Shiny.setInputValue("user_chat_message", messageText, {priority: "event"});
424
+ }
425
+ });
426
+
427
+ $('#chatInput').off('keypress.chatSend').on('keypress.chatSend', function(e) {
428
+ if (e.which == 13 && !$(this).prop('disabled')) { // Check if not disabled
429
+ e.preventDefault();
430
+ $('#sendChatMsg').click(); // This will trigger the click handler above
431
+ }
432
+ });
433
+
434
+ // Shiny message handlers
435
+ Shiny.addCustomMessageHandler("agent_thinking_started", function(message) {
436
+ console.log("Received thinking started message");
437
+ if(message && typeof message.text === 'string') {
438
+ // if (thinkingMessageElement) {
439
+ // thinkingMessageElement.remove();
440
+ // thinkingMessageElement = null;
441
+ // }
442
+ // if (currentThoughtsContainer) {
443
+ // currentThoughtsContainer = null; // It's part of thinkingMessageElement, will be handled if parent is removed
444
+ // }
445
+ addChatMessage(message.text, 'agent', true);
446
+ }
447
+ });
448
+
449
+ Shiny.addCustomMessageHandler("agent_new_thought", function(message) {
450
+ console.log("Received new thought");
451
+ if (message && typeof message.text === 'string' && currentThoughtsContainer) {
452
+ var $thoughtDiv = $('<div></div>').addClass('thought-item');
453
+ $thoughtDiv.data('pending-text', message.text);
454
+ currentThoughtsContainer.append($thoughtDiv);
455
+ // If thinking text is done, type this thought now, else it will be picked up in the queue
456
+ if (!thinkingTypingInProgress) {
457
+ // Guarantee thinkingTypingInProgress is set before starting thoughts
458
+ thinkingTypingInProgress = true;
459
+ var $thoughtsArea = currentThoughtsContainer;
460
+ var thoughtDivs = $thoughtsArea.children('.thought-item').toArray();
461
+ function typeThoughtsSequentially(idx) {
462
+ if (idx < thoughtDivs.length) {
463
+ var $thoughtDiv = $(thoughtDivs[idx]);
464
+ if ($thoughtDiv.text().length === 0) { // Check if text has not been typed yet
465
+ var text = $thoughtDiv.data('pending-text');
466
+ $thoughtDiv.removeData('pending-text');
467
+ typeTextLine($thoughtDiv, text, function() {
468
+ typeThoughtsSequentially(idx + 1);
469
+ }, 4);
470
+ } else {
471
+ // Already typed (e.g., if this function is re-entered)
472
+ typeThoughtsSequentially(idx + 1);
473
+ }
474
+ } else {
475
+ thinkingTypingInProgress = false;
476
+ if (resultTypingQueue.length > 0) {
477
+ var nextResult = resultTypingQueue.shift();
478
+ nextResult(); // This will handle its own scrolling if needed
479
+ }
480
+ }
481
+ }
482
+ typeThoughtsSequentially(0);
483
+ }
484
+ if (autoScrollEnabled) { // Conditional scroll
485
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight);
486
+ }
487
+ }
488
+ });
489
+
490
+ Shiny.addCustomMessageHandler("agent_chat_response", function(message) {
491
+ console.log("Received chat response");
492
+ if(message && typeof message.text === 'string') {
493
+ addChatMessage(message.text, 'agent');
494
+ }
495
+ // Re-enable input and button
496
+ $('#chatInput').prop('disabled', false);
497
+ $('#sendChatMsg').prop('disabled', false);
498
+ $('#chatInput').focus(); // Optionally focus the input field
499
+ });
500
+
501
+ Shiny.addCustomMessageHandler("long_op_custom_warning", function(message) {
502
+ if (message && typeof message.text === 'string') {
503
+ // Add message to chat display, styled as a warning
504
+ // You might want to add a specific class for styling, e.g., 'long-op-warning-message'
505
+ // For now, it will use the default 'agent-message' style but appear as a distinct message.
506
+ var $chatMessages = $('#chatMessages');
507
+ var $messageDiv = $('<div></div>').addClass('chat-message agent-message long-op-warning'); // Added long-op-warning class
508
+ $messageDiv.css({
509
+ 'background-color': 'rgba(255, 0, 0, 0.1)', // Light red, less intense than the old overlay
510
+ 'border': '1px solid rgba(255, 0, 0, 0.3)',
511
+ 'color': '#721c24', // Darker red text for readability
512
+ 'padding': '10px',
513
+ 'margin-bottom': '10px',
514
+ 'border-radius': '5px'
515
+ });
516
+ $messageDiv.text(message.text);
517
+ $chatMessages.append($messageDiv);
518
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight);
519
+ }
520
+ });
521
+
522
+ Shiny.addCustomMessageHandler("agent_processing_error", function(message) {
523
+ // This is a new handler you might need if server.R sends a specific error message type
524
+ // For now, agent_chat_response handles errors from server.R's tryCatch
525
+ console.error("Agent processing error:", message.text);
526
+ // Ensure UI is re-enabled even on specific error messages
527
+ $('#chatInput').prop('disabled', false);
528
+ $('#sendChatMsg').prop('disabled', false);
529
+ $('#chatInput').focus();
530
+ // Optionally display a more prominent error in the chat
531
+ if(message && typeof message.text === 'string') {
532
+ addChatMessage("Error: " + message.text, 'agent-error'); // Define 'agent-error' style if needed
533
+ } else {
534
+ addChatMessage("An unexpected error occurred with the agent.", 'agent-error');
535
+ }
536
+ });
537
+
538
+
539
+ // Literature toggle handler
540
+ $(document).off('click.literatureToggle').on('click.literatureToggle', '#literatureToggleBtn', function() {
541
+ var $btn = $(this);
542
+ $btn.toggleClass('active');
543
+
544
+ var isEnabled = $btn.hasClass('active');
545
+ Shiny.setInputValue("literature_search_enabled", isEnabled, {priority: "event"});
546
+
547
+ // Update button text/icon based on state
548
+ if (isEnabled) {
549
+ $btn.html('<i class="fa fa-search"></i> External Literature (ON)');
550
+ } else {
551
+ $btn.html('<i class="fa fa-search"></i> External Literature (OFF)');
552
+ }
553
+ });
554
+
555
+ // Initialize button state (default: disabled)
556
+ setTimeout(function() {
557
+ Shiny.setInputValue("literature_search_enabled", false);
558
+ }, 100);
559
+ }
 
www/chat_styles.css CHANGED
@@ -1,336 +1,197 @@
1
- /* www/chat_styles.css */
2
-
3
- .chat-toggle-button {
4
- position: fixed; /* Fixed position */
5
- top: 10px; /* Adjust as needed for navbar height */
6
- right: 20px;
7
- z-index: 1051; /* Higher than sidebar to be clickable if sidebar somehow overlaps header */
8
- }
9
-
10
- /* Styles for the chat sidebar itself are mostly inline in chatSidebarUI for now,
11
- but could be moved here. For example: */
12
- .chat-sidebar {
13
- position: fixed !important;
14
- right: 0;
15
- top: 0;
16
- height: 100vh;
17
- width: 350px;
18
- min-width: 250px;
19
- max-width: 3200px;
20
- z-index: 1050;
21
- background-color: #f8f9fa;
22
- border-left: 1px solid #dee2e6;
23
- box-shadow: -2px 0 5px rgba(0,0,0,0.1);
24
- transition: width 0.1s ease-out;
25
- padding: 15px;
26
- display: none;
27
- pointer-events: auto;
28
- }
29
-
30
- .chat-messages-area {
31
- /* height: calc(100vh - 200px); */ /* Adjust based on header/footer/input area */
32
- /* overflow-y: auto; */
33
- /* border: 1px solid #ccc; */
34
- /* padding: 10px; */
35
- /* margin-bottom: 10px; */
36
- /* background-color: white; */
37
- }
38
-
39
- .chat-input-area {
40
- /* display: flex; */
41
- box-sizing: border-box; /* Add this to the container */
42
- }
43
-
44
- /* Basic styling for messages (example) */
45
- .chat-message {
46
- padding: 8px 12px;
47
- margin-bottom: 8px;
48
- border-radius: 15px;
49
- word-wrap: break-word;
50
- max-width: 85%;
51
- clear: both; /* Ensures messages don't overlap if floats were used */
52
- }
53
- .user-message {
54
- background-color: #007bff; /* Primary blue for user */
55
- color: white;
56
- float: right;
57
- margin-left: auto; /* Pushes to the right */
58
- border-bottom-right-radius: 5px; /* Slightly different rounding for bubble effect */
59
- }
60
- .agent-message {
61
- background-color: #e9ecef; /* Light grey for agent */
62
- color: #495057;
63
- float: left;
64
- margin-right: auto; /* Pushes to the left */
65
- border-bottom-left-radius: 5px; /* Slightly different rounding for bubble effect */
66
- }
67
-
68
- /* Styling for disclaimer messages */
69
- .agent-message.disclaimer {
70
- background-color: #fff3cd; /* Light yellow background for warning */
71
- border: 1px solid #ffeaa7; /* Light orange border */
72
- color: #856404; /* Dark yellow-brown text */
73
- font-size: 0.9em; /* Slightly smaller text */
74
- font-style: italic; /* Italic for emphasis */
75
- }
76
-
77
- /* Styles for the custom chat tab list item */
78
- .custom-chat-tab-li {
79
- position: relative; /* Allows precise positioning of the link if needed */
80
- margin-top: 12px; /* ADJUSTED: Increased to 15px to move the tab down further. */
81
- }
82
-
83
- /* Styles for the custom chat tab link */
84
- #customChatTabLink {
85
- padding: 10px 15px; /* Adjust padding to match other tabs */
86
- line-height: 20px; /* Adjust line-height to match other tabs */
87
- display: block;
88
- position: relative;
89
- color: white !important; /* Make font color white */
90
- }
91
-
92
- #customChatTabLink:hover,
93
- #customChatTabLink:focus {
94
- text-decoration: none;
95
- background-color: #333333; /* Darker background on hover for white text, e.g., dark grey */
96
- color: white !important; /* Ensure hover text color is also white */
97
- }
98
-
99
- /* Ensure input and button in chat sidebar respect their defined widths */
100
- .chat-input-area #chatInput {
101
- box-sizing: border-box;
102
- flex-grow: 1; /* Allow input to take available space */
103
- margin-right: 5px; /* Add a small margin to separate from button if needed */
104
- height: 38px !important; /* Explicit height */
105
- }
106
-
107
- .chat-input-area #sendChatMsg {
108
- box-sizing: border-box;
109
- flex-shrink: 0; /* Prevent button from shrinking */
110
- display: inline-flex !important; /* Make the button a flex container */
111
- align-items: center !important; /* Vertically center content inside the button */
112
- justify-content: center !important; /* Horizontally center content (good for icon+text) */
113
- height: 38px !important; /* Explicit height, matching input */
114
- padding-left: 10px !important; /* Ensure some horizontal padding for button text */
115
- padding-right: 10px !important; /* Ensure some horizontal padding for button text */
116
- }
117
-
118
- /* Resize handle styles */
119
- .resize-handle {
120
- position: absolute;
121
- left: 0;
122
- top: 0;
123
- width: 8px;
124
- height: 100%;
125
- cursor: ew-resize;
126
- background-color: transparent;
127
- z-index: 9999 !important;
128
- transition: background-color 0.2s;
129
- pointer-events: auto;
130
- }
131
-
132
- .resize-handle:hover {
133
- background-color: rgba(0, 0, 0, 0.1) !important;
134
- }
135
-
136
- .thinking-message {
137
- background-color: #fffbe6;
138
- border-left: 4px solid #ffd700;
139
- position: relative;
140
- }
141
- .thinking-text, .thought-item {
142
- font-family: 'Fira Mono', 'Consolas', monospace;
143
- white-space: pre-wrap;
144
- display: block;
145
- vertical-align: middle;
146
- }
147
- .thinking-text.typing-cursor:after, .thought-item.typing-cursor:after {
148
- content: '|';
149
- animation: blink-cursor 1s steps(1) infinite;
150
- margin-left: 2px;
151
- color: #888;
152
- }
153
- @keyframes blink-cursor {
154
- 0%, 100% { opacity: 1; }
155
- 50% { opacity: 0; }
156
- }
157
- .thoughts-area {
158
- background: #f9f9f9;
159
- border-left: 2px dashed #ffd700;
160
- margin-top: 6px;
161
- padding: 6px 10px;
162
- border-radius: 8px;
163
- }
164
- .thought-toggle-arrow {
165
- cursor: pointer;
166
- font-size: 1.1em;
167
- color: #bfa100;
168
- margin-right: 4px;
169
- user-select: none;
170
- }
171
-
172
- /* Literature confirmation dialog styles */
173
- .literature-confirmation-dialog {
174
- background: linear-gradient(135deg, #fff3cd 0%, #fef5e7 100%);
175
- border: 2px solid #ffc107;
176
- border-radius: 10px;
177
- padding: 18px 22px;
178
- margin: 12px auto;
179
- box-shadow: 0 3px 12px rgba(0,0,0,0.12);
180
- animation: fadeInScale 0.4s ease-out;
181
- max-width: 85%;
182
- width: fit-content;
183
- min-width: 280px;
184
- position: relative;
185
- display: block;
186
- }
187
-
188
- .confirmation-content {
189
- text-align: center;
190
- width: 100%;
191
- }
192
-
193
- .confirmation-message {
194
- font-size: 14px;
195
- color: #856404;
196
- margin-bottom: 18px;
197
- line-height: 1.5;
198
- font-weight: 500;
199
- padding: 0 8px;
200
- max-width: 350px;
201
- margin-left: auto;
202
- margin-right: auto;
203
- }
204
-
205
- .confirmation-buttons {
206
- display: grid;
207
- grid-template-columns: 1fr 1fr;
208
- gap: 8px;
209
- justify-content: center;
210
- align-items: center;
211
- }
212
-
213
- .confirmation-buttons .btn {
214
- min-width: 70px;
215
- font-size: 12px;
216
- font-weight: 600;
217
- padding: 8px 12px;
218
- border-radius: 6px;
219
- cursor: pointer;
220
- border: none;
221
- transition: all 0.3s ease;
222
- display: inline-flex;
223
- align-items: center;
224
- justify-content: center;
225
- white-space: nowrap;
226
- text-align: center;
227
- }
228
-
229
- .confirmation-paper {
230
- background: linear-gradient(135deg, #007bff 0%, #0056b3 100%);
231
- color: white;
232
- box-shadow: 0 2px 6px rgba(0, 123, 255, 0.3);
233
- }
234
-
235
- .confirmation-paper:hover {
236
- background: linear-gradient(135deg, #0056b3 0%, #004085 100%);
237
- transform: translateY(-1px);
238
- box-shadow: 0 3px 8px rgba(0, 123, 255, 0.4);
239
- }
240
-
241
- .confirmation-external {
242
- background: linear-gradient(135deg, #28a745 0%, #20c997 100%);
243
- color: white;
244
- box-shadow: 0 2px 6px rgba(40, 167, 69, 0.3);
245
- }
246
-
247
- .confirmation-external:hover {
248
- background: linear-gradient(135deg, #218838 0%, #1aa085 100%);
249
- transform: translateY(-1px);
250
- box-shadow: 0 3px 8px rgba(40, 167, 69, 0.4);
251
- }
252
-
253
- .confirmation-both {
254
- background: linear-gradient(135deg, #6f42c1 0%, #5a2d91 100%);
255
- color: white;
256
- box-shadow: 0 2px 6px rgba(111, 66, 193, 0.3);
257
- }
258
-
259
- .confirmation-both:hover {
260
- background: linear-gradient(135deg, #5a2d91 0%, #4c2470 100%);
261
- transform: translateY(-1px);
262
- box-shadow: 0 3px 8px rgba(111, 66, 193, 0.4);
263
- }
264
-
265
- .confirmation-none {
266
- background: linear-gradient(135deg, #6c757d 0%, #adb5bd 100%);
267
- color: white;
268
- box-shadow: 0 2px 6px rgba(108, 117, 125, 0.3);
269
- }
270
-
271
- .confirmation-none:hover {
272
- background: linear-gradient(135deg, #545b62 0%, #868e96 100%);
273
- transform: translateY(-1px);
274
- box-shadow: 0 3px 8px rgba(108, 117, 125, 0.4);
275
- }
276
-
277
- @keyframes fadeIn {
278
- from { opacity: 0; transform: translateY(-10px); }
279
- to { opacity: 1; transform: translateY(0); }
280
- }
281
-
282
- @keyframes fadeInScale {
283
- from {
284
- opacity: 0;
285
- transform: translateY(-10px) scale(0.95);
286
- }
287
- to {
288
- opacity: 1;
289
- transform: translateY(0) scale(1);
290
- }
291
- }
292
-
293
- /* Responsive design for confirmation dialog */
294
- @media (max-width: 380px) {
295
- .literature-confirmation-dialog {
296
- margin: 8px 3px;
297
- padding: 15px 12px;
298
- min-width: 250px;
299
- max-width: calc(100% - 6px);
300
- }
301
-
302
- .confirmation-message {
303
- font-size: 13px;
304
- padding: 0 6px;
305
- margin-bottom: 15px;
306
- }
307
-
308
- .confirmation-buttons {
309
- grid-template-columns: 1fr;
310
- gap: 6px;
311
- }
312
-
313
- .confirmation-buttons .btn {
314
- font-size: 11px;
315
- padding: 6px 10px;
316
- min-width: 60px;
317
- }
318
- }
319
-
320
- @media (max-width: 280px) {
321
- .literature-confirmation-dialog {
322
- margin: 6px 2px;
323
- padding: 12px 8px;
324
- min-width: auto;
325
- }
326
-
327
- .confirmation-message {
328
- font-size: 12px;
329
- padding: 0 4px;
330
- }
331
-
332
- .confirmation-buttons .btn {
333
- font-size: 10px;
334
- padding: 5px 8px;
335
- }
336
  }
 
1
+ /* www/chat_styles.css */
2
+
3
+ .chat-toggle-button {
4
+ position: fixed; /* Fixed position */
5
+ top: 10px; /* Adjust as needed for navbar height */
6
+ right: 20px;
7
+ z-index: 1051; /* Higher than sidebar to be clickable if sidebar somehow overlaps header */
8
+ }
9
+
10
+ /* Styles for the chat sidebar itself are mostly inline in chatSidebarUI for now,
11
+ but could be moved here. For example: */
12
+ .chat-sidebar {
13
+ position: fixed !important;
14
+ right: 0;
15
+ top: 0;
16
+ height: 100vh;
17
+ width: 350px;
18
+ min-width: 250px;
19
+ max-width: 3200px;
20
+ z-index: 1050;
21
+ background-color: #f8f9fa;
22
+ border-left: 1px solid #dee2e6;
23
+ box-shadow: -2px 0 5px rgba(0,0,0,0.1);
24
+ transition: width 0.1s ease-out;
25
+ padding: 15px;
26
+ display: none;
27
+ pointer-events: auto;
28
+ }
29
+
30
+ .chat-messages-area {
31
+ /* height: calc(100vh - 200px); */ /* Adjust based on header/footer/input area */
32
+ /* overflow-y: auto; */
33
+ /* border: 1px solid #ccc; */
34
+ /* padding: 10px; */
35
+ /* margin-bottom: 10px; */
36
+ /* background-color: white; */
37
+ }
38
+
39
+ .chat-input-area {
40
+ /* display: flex; */
41
+ box-sizing: border-box; /* Add this to the container */
42
+ }
43
+
44
+ /* Basic styling for messages (example) */
45
+ .chat-message {
46
+ padding: 8px 12px;
47
+ margin-bottom: 8px;
48
+ border-radius: 15px;
49
+ word-wrap: break-word;
50
+ max-width: 85%;
51
+ clear: both; /* Ensures messages don't overlap if floats were used */
52
+ }
53
+ .user-message {
54
+ background-color: #007bff; /* Primary blue for user */
55
+ color: white;
56
+ float: right;
57
+ margin-left: auto; /* Pushes to the right */
58
+ border-bottom-right-radius: 5px; /* Slightly different rounding for bubble effect */
59
+ }
60
+ .agent-message {
61
+ background-color: #e9ecef; /* Light grey for agent */
62
+ color: #495057;
63
+ float: left;
64
+ margin-right: auto; /* Pushes to the left */
65
+ border-bottom-left-radius: 5px; /* Slightly different rounding for bubble effect */
66
+ }
67
+
68
+ /* Styling for disclaimer messages */
69
+ .agent-message.disclaimer {
70
+ background-color: #fff3cd; /* Light yellow background for warning */
71
+ border: 1px solid #ffeaa7; /* Light orange border */
72
+ color: #856404; /* Dark yellow-brown text */
73
+ font-size: 0.9em; /* Slightly smaller text */
74
+ font-style: italic; /* Italic for emphasis */
75
+ }
76
+
77
+ /* Styles for the custom chat tab list item */
78
+ .custom-chat-tab-li {
79
+ position: relative; /* Allows precise positioning of the link if needed */
80
+ margin-top: 12px; /* ADJUSTED: Increased to 12px to move the tab down further. */
81
+ }
82
+
83
+ /* Styles for the custom chat tab link */
84
+ #customChatTabLink {
85
+ padding: 10px 15px; /* Adjust padding to match other tabs */
86
+ line-height: 20px; /* Adjust line-height to match other tabs */
87
+ display: block;
88
+ position: relative;
89
+ color: white !important; /* Make font color white */
90
+ }
91
+
92
+ #customChatTabLink:hover,
93
+ #customChatTabLink:focus {
94
+ text-decoration: none;
95
+ background-color: #333333; /* Darker background on hover for white text, e.g., dark grey */
96
+ color: white !important; /* Ensure hover text color is also white */
97
+ }
98
+
99
+ /* Ensure input and button in chat sidebar respect their defined widths */
100
+ .chat-input-area #chatInput {
101
+ box-sizing: border-box;
102
+ flex-grow: 1; /* Allow input to take available space */
103
+ margin-right: 5px; /* Add a small margin to separate from button if needed */
104
+ height: 38px !important; /* Explicit height */
105
+ }
106
+
107
+ .chat-input-area #sendChatMsg {
108
+ box-sizing: border-box;
109
+ flex-shrink: 0; /* Prevent button from shrinking */
110
+ display: inline-flex !important; /* Make the button a flex container */
111
+ align-items: center !important; /* Vertically center content inside the button */
112
+ justify-content: center !important; /* Horizontally center content (good for icon+text) */
113
+ height: 38px !important; /* Explicit height, matching input */
114
+ padding-left: 10px !important; /* Ensure some horizontal padding for button text */
115
+ padding-right: 10px !important; /* Ensure some horizontal padding for button text */
116
+ }
117
+
118
+ /* Resize handle styles */
119
+ .resize-handle {
120
+ position: absolute;
121
+ left: 0;
122
+ top: 0;
123
+ width: 8px;
124
+ height: 100%;
125
+ cursor: ew-resize;
126
+ background-color: transparent;
127
+ z-index: 9999 !important;
128
+ transition: background-color 0.2s;
129
+ pointer-events: auto;
130
+ }
131
+
132
+ .resize-handle:hover {
133
+ background-color: rgba(0, 0, 0, 0.1) !important;
134
+ }
135
+
136
+ .thinking-message {
137
+ background-color: #fffbe6;
138
+ border-left: 4px solid #ffd700;
139
+ position: relative;
140
+ }
141
+ .thinking-text, .thought-item {
142
+ font-family: 'Fira Mono', 'Consolas', monospace;
143
+ white-space: pre-wrap;
144
+ display: block;
145
+ vertical-align: middle;
146
+ }
147
+ .thinking-text.typing-cursor:after, .thought-item.typing-cursor:after {
148
+ content: '|';
149
+ animation: blink-cursor 1s steps(1) infinite;
150
+ margin-left: 2px;
151
+ color: #888;
152
+ }
153
+ @keyframes blink-cursor {
154
+ 0%, 100% { opacity: 1; }
155
+ 50% { opacity: 0; }
156
+ }
157
+ .thoughts-area {
158
+ background: #f9f9f9;
159
+ border-left: 2px dashed #ffd700;
160
+ margin-top: 6px;
161
+ padding: 6px 10px;
162
+ border-radius: 8px;
163
+ }
164
+ .thought-toggle-arrow {
165
+ cursor: pointer;
166
+ font-size: 1.1em;
167
+ color: #bfa100;
168
+ margin-right: 4px;
169
+ user-select: none;
170
+ }
171
+
172
+
173
+ /* Literature toggle button styles */
174
+ .literature-toggle-btn {
175
+ transition: all 0.3s ease;
176
+ border: 2px solid #17a2b8;
177
+ }
178
+
179
+ .literature-toggle-btn.active {
180
+ background-color: #17a2b8 !important;
181
+ color: white !important;
182
+ border-color: #17a2b8;
183
+ }
184
+
185
+ .literature-toggle-btn:not(.active) {
186
+ background-color: transparent;
187
+ color: #17a2b8;
188
+ }
189
+
190
+ .literature-toggle-btn:hover {
191
+ background-color: #17a2b8;
192
+ color: white;
193
+ }
194
+
195
+ .literature-toggle-container {
196
+ margin: 5px 0;
197
  }
www/pages_description.md CHANGED
@@ -1,30 +1,30 @@
1
- homepage:
2
- - This webpage introduces the "TF atlas of CD8+ T cell states," a platform resulting from a multi-omics study focused on understanding and selectively programming T cell differentiation. The research, a collaboration involving UC San Diego, the Salk Institute, and The University of North Carolina at Chapel Hill, leverages a comprehensive transcriptional and epigenetic atlas generated from RNA-seq and ATAC-seq data. The atlas helps predict transcription factor (TF) activity and define differentiation trajectories, aiming to identify TFs that can control specific T cell states, such as terminally exhausted and tissue-resident memory T cells, for potential therapeutic applications in areas like cancer and viral infections.
3
-
4
- TF Catalog - Search TF Scores:
5
- This webpage provides a tool to "Search TF Scores" related to T cell differentiation. It features a diagram illustrating the "Memory path" and "Exhaustion path" of T cells, including states like Naive, Memory Precursor (MP), Effector T cell (TE), Tissue-Resident Memory (TRM), Terminal Effector (TEM), Progenitor Exhausted (TEXprog), and Terminally Exhausted (TEX). The core of the page is a searchable table displaying "TF activity score" for various transcription factors (TFs) across different T cell states and datasets. This allows users to explore and compare the activity levels of specific TFs in distinct T cell populations, aiding in the understanding of T cell fate decisions.
6
-
7
- TF Catalog - Cell State Specific TF Catalog:
8
- This webpage displays "Naive Specific Cells & normalized TF Activity Scores" as part of a "Cell State Specific TF Catalog." It features a dot plot visualization where each row represents a transcription factor (TF) and each column likely represents a specific sample or condition within naive T cells. The color intensity of the dots corresponds to the normalized TF activity score (PageRank score), while the size of the dots indicates the log-transformed gene expression level (TPM). This allows users to explore and compare the activity and expression of various TFs within the naive T cell state.
9
-
10
- TF Catalog - Multi-State TFs
11
- This webpage displays a series of heatmaps visualizing normalized PageRank scores, likely representing transcription factor activity, across various T cell differentiation states (Naive, MP, TE, TRM, TEM, TEXprog, TEXeff, TEXterm). The heatmaps are segmented into categories such as "Shared in cell states from acute infection," "Shared in cell states from chronic infection," and specific T cell subsets like "TRM & TEXPROG" and "MP, TE, TEXPROG." This presentation allows for the comparative analysis of transcription factor activity profiles across different T cell populations and under varying immunological contexts, revealing potential regulatory patterns.
12
-
13
- TF Wave Analysis:
14
- This webpage, titled "TF Wave Analysis," is dedicated to exploring the dynamic activity patterns of transcription factors (TFs) during T cell differentiation. It presents a series of visualizations, referred to as "TF Waves" (Wave 1, Wave 2, etc.), which illustrate how the activity of different sets of TFs changes as T cells transition through various states (Naive, MP, TRM, TEM, TEXprog, TEXterm). These waves are depicted on diagrams of T cell differentiation pathways, with color intensity and accompanying bar graphs likely indicating the strength or timing of TF activity. A table at the bottom of the page lists specific TFs and their association with these identified waves, allowing users to understand the sequential and coordinated roles of TFs in orchestrating T cell fate.
15
-
16
- TF Network Analysis - Search TF-TF Correlation in TRM/TEXterm:
17
- This webpage provides a tool to "Search TF-TF Correlation in TRM/TEXterm," allowing users to explore interactions and correlations between transcription factors (TFs) specifically within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cell states.
18
-
19
- The page explains that it uses data from ChIP-seq and Hi-C to build TF interaction networks. It visualizes these relationships, showing how a "TF-regulatee network" and a "TF X TF correlation matrix" contribute to understanding "TF-TF association." Users can enter a transcription factor of interest to search for its correlations. A key explains the network visualization: circle color indicates TF specificity to TRM (green) or TEXterm (brown), line thickness denotes interaction intensity, and line color shows if the interaction is found in TRM (green) or TEXterm (brown). This tool aims to identify cooperations between DNA-binding proteins in these specific T cell states.
20
-
21
- TF Network Analysis - TF Community in TRM/TEXterm:
22
- This webpage focuses on "TF Community in TRM/TEXterm," illustrating how transcription factor (TF) associations are analyzed through clustering to identify distinct TF communities within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cells.
23
-
24
- The page displays several network visualizations: one showing combined TRM and TEXterm TF communities and their interconnections, another detailing TRM-specific TF-TF interactions organized into communities (C1-C5), and a third depicting TEXterm-specific TF-TF interactions, also grouped into communities (C1-C5).
25
-
26
- Below these networks, tables likely provide details about the TFs that constitute each identified community. Furthermore, the webpage highlights "Shared pathways" between TRM and TEXterm communities and "Enriched pathways" specific to either the TRM or TEXterm TF networks, linking these TF communities to biological functions such as "IL-2 production," "cell-cell adhesion," "T cell activation," "intrinsic apoptosis," and "catabolism." This allows for an understanding of the collaborative roles of TFs in regulating distinct cellular processes within these two T cell states.
27
-
28
- Multi-omics Data:
29
- This webpage, titled "Multi-omics Data," presents a comprehensive, scrollable table cataloging various experimental datasets relevant to T cell research. Each entry in the table details specific studies, including information such as the primary author, laboratory, publication year, data accession number, type of data (e.g., RNA-seq, ATAC-seq), biological species (primarily mouse), infection model used (e.g., LCMV Arm), and the specific T cell populations analyzed (e.g., Naive, MP, TE, TRM, TexProg, TexTerm) along with their defining markers or characteristics. The page includes features for searching and adjusting the number of displayed entries, indicating it serves as an interactive repository for accessing and reviewing details of diverse multi-omics datasets.
30
-
 
1
+ homepage:
2
+ - This webpage introduces the "TF atlas of CD8+ T cell states," a platform resulting from a multi-omics study focused on understanding and selectively programming T cell differentiation. The research, a collaboration involving UC San Diego, the Salk Institute, and The University of North Carolina at Chapel Hill, leverages a comprehensive transcriptional and epigenetic atlas generated from RNA-seq and ATAC-seq data. The atlas helps predict transcription factor (TF) activity and define differentiation trajectories, aiming to identify TFs that can control specific T cell states, such as terminally exhausted and tissue-resident memory T cells, for potential therapeutic applications in areas like cancer and viral infections.
3
+
4
+ TF Catalog - Search TF Scores:
5
+ This webpage provides a tool to "Search TF Scores" related to T cell differentiation. It features a diagram illustrating the "Memory path" and "Exhaustion path" of T cells, including states like Naive, Memory Precursor (MP), Effector T cell (TE), Tissue-Resident Memory (TRM), Terminal Effector (TEM), Progenitor Exhausted (TEXprog), and Terminally Exhausted (TEX). The core of the page is a searchable table displaying "TF activity score" for various transcription factors (TFs) across different T cell states and datasets. This allows users to explore and compare the activity levels of specific TFs in distinct T cell populations, aiding in the understanding of T cell fate decisions.
6
+
7
+ TF Catalog - Cell State Specific TF Catalog:
8
+ This webpage displays "Naive Specific Cells & normalized TF Activity Scores" as part of a "Cell State Specific TF Catalog." It features a dot plot visualization where each row represents a transcription factor (TF) and each column likely represents a specific sample or condition within naive T cells. The color intensity of the dots corresponds to the normalized TF activity score (PageRank score), while the size of the dots indicates the log-transformed gene expression level (TPM). This allows users to explore and compare the activity and expression of various TFs within the naive T cell state.
9
+
10
+ TF Catalog - Multi-State TFs
11
+ This webpage displays a series of heatmaps visualizing normalized PageRank scores, likely representing transcription factor activity, across various T cell differentiation states (Naive, MP, TE, TRM, TEM, TEXprog, TEXeff, TEXterm). The heatmaps are segmented into categories such as "Shared in cell states from acute infection," "Shared in cell states from chronic infection," and specific T cell subsets like "TRM & TEXPROG" and "MP, TE, TEXPROG." This presentation allows for the comparative analysis of transcription factor activity profiles across different T cell populations and under varying immunological contexts, revealing potential regulatory patterns.
12
+
13
+ TF Wave Analysis:
14
+ This webpage, titled "TF Wave Analysis," is dedicated to exploring the dynamic activity patterns of transcription factors (TFs) during T cell differentiation. It presents a series of visualizations, referred to as "TF Waves" (Wave 1, Wave 2, etc.), which illustrate how the activity of different sets of TFs changes as T cells transition through various states (Naive, MP, TRM, TEM, TEXprog, TEXterm). These waves are depicted on diagrams of T cell differentiation pathways, with color intensity and accompanying bar graphs likely indicating the strength or timing of TF activity. A table at the bottom of the page lists specific TFs and their association with these identified waves, allowing users to understand the sequential and coordinated roles of TFs in orchestrating T cell fate.
15
+
16
+ TF Network Analysis - Search TF-TF Correlation in TRM/TEXterm:
17
+ This webpage provides a tool to "Search TF-TF Correlation in TRM/TEXterm," allowing users to explore interactions and correlations between transcription factors (TFs) specifically within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cell states.
18
+
19
+ The page explains that it uses data from ChIP-seq and Hi-C to build TF interaction networks. It visualizes these relationships, showing how a "TF-regulatee network" and a "TF X TF correlation matrix" contribute to understanding "TF-TF association." Users can enter a transcription factor of interest to search for its correlations. A key explains the network visualization: circle color indicates TF specificity to TRM (green) or TEXterm (brown), line thickness denotes interaction intensity, and line color shows if the interaction is found in TRM (green) or TEXterm (brown). This tool aims to identify cooperations between DNA-binding proteins in these specific T cell states.
20
+
21
+ TF Network Analysis - TF Community in TRM/TEXterm:
22
+ This webpage focuses on "TF Community in TRM/TEXterm," illustrating how transcription factor (TF) associations are analyzed through clustering to identify distinct TF communities within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cells.
23
+
24
+ The page displays several network visualizations: one showing combined TRM and TEXterm TF communities and their interconnections, another detailing TRM-specific TF-TF interactions organized into communities (C1-C5), and a third depicting TEXterm-specific TF-TF interactions, also grouped into communities (C1-C5).
25
+
26
+ Below these networks, tables likely provide details about the TFs that constitute each identified community. Furthermore, the webpage highlights "Shared pathways" between TRM and TEXterm communities and "Enriched pathways" specific to either the TRM or TEXterm TF networks, linking these TF communities to biological functions such as "IL-2 production," "cell-cell adhesion," "T cell activation," "intrinsic apoptosis," and "catabolism." This allows for an understanding of the collaborative roles of TFs in regulating distinct cellular processes within these two T cell states.
27
+
28
+ Multi-omics Data:
29
+ This webpage, titled "Multi-omics Data," presents a comprehensive, scrollable table cataloging various experimental datasets relevant to T cell research. Each entry in the table details specific studies, including information such as the primary author, laboratory, publication year, data accession number, type of data (e.g., RNA-seq, ATAC-seq), biological species (primarily mouse), infection model used (e.g., LCMV Arm), and the specific T cell populations analyzed (e.g., Naive, MP, TE, TRM, TexProg, TexTerm) along with their defining markers or characteristics. The page includes features for searching and adjusting the number of displayed entries, indicating it serves as an interactive repository for accessing and reviewing details of diverse multi-omics datasets.
30
+
www_backup_original/pages_description.md CHANGED
@@ -1,30 +1,30 @@
1
- homepage:
2
- - This webpage introduces the "TF atlas of CD8+ T cell states," a platform resulting from a multi-omics study focused on understanding and selectively programming T cell differentiation. The research, a collaboration involving UC San Diego, the Salk Institute, and The University of North Carolina at Chapel Hill, leverages a comprehensive transcriptional and epigenetic atlas generated from RNA-seq and ATAC-seq data. The atlas helps predict transcription factor (TF) activity and define differentiation trajectories, aiming to identify TFs that can control specific T cell states, such as terminally exhausted and tissue-resident memory T cells, for potential therapeutic applications in areas like cancer and viral infections.
3
-
4
- TF Catalog - Search TF Scores:
5
- This webpage provides a tool to "Search TF Scores" related to T cell differentiation. It features a diagram illustrating the "Memory path" and "Exhaustion path" of T cells, including states like Naive, Memory Precursor (MP), Effector T cell (TE), Tissue-Resident Memory (TRM), Terminal Effector (TEM), Progenitor Exhausted (TEXprog), and Terminally Exhausted (TEX). The core of the page is a searchable table displaying "TF activity score" for various transcription factors (TFs) across different T cell states and datasets. This allows users to explore and compare the activity levels of specific TFs in distinct T cell populations, aiding in the understanding of T cell fate decisions.
6
-
7
- TF Catalog - Cell State Specific TF Catalog:
8
- This webpage displays "Naive Specific Cells & normalized TF Activity Scores" as part of a "Cell State Specific TF Catalog." It features a dot plot visualization where each row represents a transcription factor (TF) and each column likely represents a specific sample or condition within naive T cells. The color intensity of the dots corresponds to the normalized TF activity score (PageRank score), while the size of the dots indicates the log-transformed gene expression level (TPM). This allows users to explore and compare the activity and expression of various TFs within the naive T cell state.
9
-
10
- TF Catalog - Multi-State TFs
11
- This webpage displays a series of heatmaps visualizing normalized PageRank scores, likely representing transcription factor activity, across various T cell differentiation states (Naive, MP, TE, TRM, TEM, TEXprog, TEXeff, TEXterm). The heatmaps are segmented into categories such as "Shared in cell states from acute infection," "Shared in cell states from chronic infection," and specific T cell subsets like "TRM & TEXPROG" and "MP, TE, TEXPROG." This presentation allows for the comparative analysis of transcription factor activity profiles across different T cell populations and under varying immunological contexts, revealing potential regulatory patterns.
12
-
13
- TF Wave Analysis:
14
- This webpage, titled "TF Wave Analysis," is dedicated to exploring the dynamic activity patterns of transcription factors (TFs) during T cell differentiation. It presents a series of visualizations, referred to as "TF Waves" (Wave 1, Wave 2, etc.), which illustrate how the activity of different sets of TFs changes as T cells transition through various states (Naive, MP, TRM, TEM, TEXprog, TEXterm). These waves are depicted on diagrams of T cell differentiation pathways, with color intensity and accompanying bar graphs likely indicating the strength or timing of TF activity. A table at the bottom of the page lists specific TFs and their association with these identified waves, allowing users to understand the sequential and coordinated roles of TFs in orchestrating T cell fate.
15
-
16
- TF Network Analysis - Search TF-TF Correlation in TRM/TEXterm:
17
- This webpage provides a tool to "Search TF-TF Correlation in TRM/TEXterm," allowing users to explore interactions and correlations between transcription factors (TFs) specifically within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cell states.
18
-
19
- The page explains that it uses data from ChIP-seq and Hi-C to build TF interaction networks. It visualizes these relationships, showing how a "TF-regulatee network" and a "TF X TF correlation matrix" contribute to understanding "TF-TF association." Users can enter a transcription factor of interest to search for its correlations. A key explains the network visualization: circle color indicates TF specificity to TRM (green) or TEXterm (brown), line thickness denotes interaction intensity, and line color shows if the interaction is found in TRM (green) or TEXterm (brown). This tool aims to identify cooperations between DNA-binding proteins in these specific T cell states.
20
-
21
- TF Network Analysis - TF Community in TRM/TEXterm:
22
- This webpage focuses on "TF Community in TRM/TEXterm," illustrating how transcription factor (TF) associations are analyzed through clustering to identify distinct TF communities within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cells.
23
-
24
- The page displays several network visualizations: one showing combined TRM and TEXterm TF communities and their interconnections, another detailing TRM-specific TF-TF interactions organized into communities (C1-C5), and a third depicting TEXterm-specific TF-TF interactions, also grouped into communities (C1-C5).
25
-
26
- Below these networks, tables likely provide details about the TFs that constitute each identified community. Furthermore, the webpage highlights "Shared pathways" between TRM and TEXterm communities and "Enriched pathways" specific to either the TRM or TEXterm TF networks, linking these TF communities to biological functions such as "IL-2 production," "cell-cell adhesion," "T cell activation," "intrinsic apoptosis," and "catabolism." This allows for an understanding of the collaborative roles of TFs in regulating distinct cellular processes within these two T cell states.
27
-
28
- Multi-omics Data:
29
- This webpage, titled "Multi-omics Data," presents a comprehensive, scrollable table cataloging various experimental datasets relevant to T cell research. Each entry in the table details specific studies, including information such as the primary author, laboratory, publication year, data accession number, type of data (e.g., RNA-seq, ATAC-seq), biological species (primarily mouse), infection model used (e.g., LCMV Arm), and the specific T cell populations analyzed (e.g., Naive, MP, TE, TRM, TexProg, TexTerm) along with their defining markers or characteristics. The page includes features for searching and adjusting the number of displayed entries, indicating it serves as an interactive repository for accessing and reviewing details of diverse multi-omics datasets.
30
-
 
1
+ homepage:
2
+ - This webpage introduces the "TF atlas of CD8+ T cell states," a platform resulting from a multi-omics study focused on understanding and selectively programming T cell differentiation. The research, a collaboration involving UC San Diego, the Salk Institute, and The University of North Carolina at Chapel Hill, leverages a comprehensive transcriptional and epigenetic atlas generated from RNA-seq and ATAC-seq data. The atlas helps predict transcription factor (TF) activity and define differentiation trajectories, aiming to identify TFs that can control specific T cell states, such as terminally exhausted and tissue-resident memory T cells, for potential therapeutic applications in areas like cancer and viral infections.
3
+
4
+ TF Catalog - Search TF Scores:
5
+ This webpage provides a tool to "Search TF Scores" related to T cell differentiation. It features a diagram illustrating the "Memory path" and "Exhaustion path" of T cells, including states like Naive, Memory Precursor (MP), Effector T cell (TE), Tissue-Resident Memory (TRM), Terminal Effector (TEM), Progenitor Exhausted (TEXprog), and Terminally Exhausted (TEX). The core of the page is a searchable table displaying "TF activity score" for various transcription factors (TFs) across different T cell states and datasets. This allows users to explore and compare the activity levels of specific TFs in distinct T cell populations, aiding in the understanding of T cell fate decisions.
6
+
7
+ TF Catalog - Cell State Specific TF Catalog:
8
+ This webpage displays "Naive Specific Cells & normalized TF Activity Scores" as part of a "Cell State Specific TF Catalog." It features a dot plot visualization where each row represents a transcription factor (TF) and each column likely represents a specific sample or condition within naive T cells. The color intensity of the dots corresponds to the normalized TF activity score (PageRank score), while the size of the dots indicates the log-transformed gene expression level (TPM). This allows users to explore and compare the activity and expression of various TFs within the naive T cell state.
9
+
10
+ TF Catalog - Multi-State TFs
11
+ This webpage displays a series of heatmaps visualizing normalized PageRank scores, likely representing transcription factor activity, across various T cell differentiation states (Naive, MP, TE, TRM, TEM, TEXprog, TEXeff, TEXterm). The heatmaps are segmented into categories such as "Shared in cell states from acute infection," "Shared in cell states from chronic infection," and specific T cell subsets like "TRM & TEXPROG" and "MP, TE, TEXPROG." This presentation allows for the comparative analysis of transcription factor activity profiles across different T cell populations and under varying immunological contexts, revealing potential regulatory patterns.
12
+
13
+ TF Wave Analysis:
14
+ This webpage, titled "TF Wave Analysis," is dedicated to exploring the dynamic activity patterns of transcription factors (TFs) during T cell differentiation. It presents a series of visualizations, referred to as "TF Waves" (Wave 1, Wave 2, etc.), which illustrate how the activity of different sets of TFs changes as T cells transition through various states (Naive, MP, TRM, TEM, TEXprog, TEXterm). These waves are depicted on diagrams of T cell differentiation pathways, with color intensity and accompanying bar graphs likely indicating the strength or timing of TF activity. A table at the bottom of the page lists specific TFs and their association with these identified waves, allowing users to understand the sequential and coordinated roles of TFs in orchestrating T cell fate.
15
+
16
+ TF Network Analysis - Search TF-TF Correlation in TRM/TEXterm:
17
+ This webpage provides a tool to "Search TF-TF Correlation in TRM/TEXterm," allowing users to explore interactions and correlations between transcription factors (TFs) specifically within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cell states.
18
+
19
+ The page explains that it uses data from ChIP-seq and Hi-C to build TF interaction networks. It visualizes these relationships, showing how a "TF-regulatee network" and a "TF X TF correlation matrix" contribute to understanding "TF-TF association." Users can enter a transcription factor of interest to search for its correlations. A key explains the network visualization: circle color indicates TF specificity to TRM (green) or TEXterm (brown), line thickness denotes interaction intensity, and line color shows if the interaction is found in TRM (green) or TEXterm (brown). This tool aims to identify cooperations between DNA-binding proteins in these specific T cell states.
20
+
21
+ TF Network Analysis - TF Community in TRM/TEXterm:
22
+ This webpage focuses on "TF Community in TRM/TEXterm," illustrating how transcription factor (TF) associations are analyzed through clustering to identify distinct TF communities within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cells.
23
+
24
+ The page displays several network visualizations: one showing combined TRM and TEXterm TF communities and their interconnections, another detailing TRM-specific TF-TF interactions organized into communities (C1-C5), and a third depicting TEXterm-specific TF-TF interactions, also grouped into communities (C1-C5).
25
+
26
+ Below these networks, tables likely provide details about the TFs that constitute each identified community. Furthermore, the webpage highlights "Shared pathways" between TRM and TEXterm communities and "Enriched pathways" specific to either the TRM or TEXterm TF networks, linking these TF communities to biological functions such as "IL-2 production," "cell-cell adhesion," "T cell activation," "intrinsic apoptosis," and "catabolism." This allows for an understanding of the collaborative roles of TFs in regulating distinct cellular processes within these two T cell states.
27
+
28
+ Multi-omics Data:
29
+ This webpage, titled "Multi-omics Data," presents a comprehensive, scrollable table cataloging various experimental datasets relevant to T cell research. Each entry in the table details specific studies, including information such as the primary author, laboratory, publication year, data accession number, type of data (e.g., RNA-seq, ATAC-seq), biological species (primarily mouse), infection model used (e.g., LCMV Arm), and the specific T cell populations analyzed (e.g., Naive, MP, TE, TRM, TexProg, TexTerm) along with their defining markers or characteristics. The page includes features for searching and adjusting the number of displayed entries, indicating it serves as an interactive repository for accessing and reviewing details of diverse multi-omics datasets.
30
+