WeMWish committed
Commit 557ed35 · 1 Parent(s): 2ad0e14

Fix infinite loop bug in literature search system


- Fix JSON detection regex in GenerationAgent (double backslash escaping issue; sketched below)
- Implement structure-based literature data detection instead of text patterns
- Add operation tracking to prevent duplicate code execution
- Add comprehensive debugging logs for conversation history flow
- Implement proper loop control with attempt limits
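A minimal standalone sketch of the escaping issue behind the first fix (the fenced-JSON sample string is illustrative, not taken from the repository): in a Python raw string, `\\s` reaches the regex engine as a literal backslash followed by `s`, so the pattern never matches ordinary whitespace and fenced JSON blocks go undetected.

```python
import re

fenced = '```json\n{"status": "CODE_COMPLETE"}\n```'

# Broken: in a raw string, r"\\s" is a literal backslash plus "s",
# so the fenced JSON block is never detected.
print(re.search(r"```json\\s*(.*?)\\s*```", fenced, re.DOTALL))  # -> None

# Fixed: r"\s" matches the newlines around the JSON payload.
match = re.search(r"```json\s*(.*?)\s*```", fenced, re.DOTALL)
print(match.group(1))  # -> {"status": "CODE_COMPLETE"}
```

And a condensed sketch of the structure-based detection that replaces text-pattern matching; the indicator keys and the two-match threshold come from the `generation_agent.py` diff below, while the helper name is ours for illustration:

```python
def looks_like_literature_payload(items) -> bool:
    # Literature results arrive as lists of paper-like dicts; require at
    # least two indicator fields instead of matching tool names in raw text.
    indicators = {"title", "authors", "abstract", "doi", "source_api"}
    if not isinstance(items, list) or not items or not isinstance(items[0], dict):
        return False
    return len(indicators & set(items[0].keys())) >= 2
```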

.dockerignore CHANGED
@@ -1,40 +1,40 @@
- .git
- .Rproj.user
- .Rhistory
- .RData
- .Ruserdata
-
- # Ignore API key file
- # api_key.txt
-
- # Ignore Python cache
- __pycache__/
- *.pyc
- *.pyo
- *.pyd
-
- # Ignore R cache/build artifacts if any
- *.rds
- *.Renviron
-
- # OS-specific files
- .DS_Store
- Thumbs.db
-
- # IDE specific folders
- .vscode/
- .idea/
-
- # Logs and temporary files
- traces/
- *.log
- temp/
-
- # Ignore local R/reticulate config
- .Rprofile
-
- # Ignore virtual environments
- venv/
- .venv/
- ENV/
  env/

+ .git
+ .Rproj.user
+ .Rhistory
+ .RData
+ .Ruserdata
+
+ # Ignore API key file
+ # api_key.txt
+
+ # Ignore Python cache
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.pyd
+
+ # Ignore R cache/build artifacts if any
+ *.rds
+ *.Renviron
+
+ # OS-specific files
+ .DS_Store
+ Thumbs.db
+
+ # IDE specific folders
+ .vscode/
+ .idea/
+
+ # Logs and temporary files
+ traces/
+ *.log
+ temp/
+
+ # Ignore local R/reticulate config
+ .Rprofile
+
+ # Ignore virtual environments
+ venv/
+ .venv/
+ ENV/
  env/
.gitignore CHANGED
@@ -1,12 +1,11 @@
- __pycache__/
- api_key.txt
- # Ignore local R/reticulate config
- .Rprofile
-
- # Ignore virtual environments
- venv/
- .venv/
- ENV/
- env/
- *.md
-

+ __pycache__/
+ api_key.txt
+ # Ignore local R/reticulate config
+ .Rprofile
+
+ # Ignore virtual environments
+ venv/
+ .venv/
+ ENV/
+ env/
+
CHANGELOG.md CHANGED
@@ -1,5 +1,32 @@
  # TaijiChat Performance Optimization Changelog

  All notable changes to the TaijiChat performance optimization project are documented in this file.

  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

  # TaijiChat Performance Optimization Changelog

+ ## [Unreleased]
+
+ ### Added
+ - Literature search toggle button in chat interface
+   - Users can now explicitly enable/disable external literature search
+   - Button placed below chat input for easy access
+   - Default state: external literature search disabled
+   - Visual feedback with active/inactive button states
+
+ ### Changed
+ - Replaced automatic literature search behavior with user-controlled toggle
+ - Literature search now defaults to disabled unless explicitly enabled by user
+
+ ### Fixed
+ - Fixed infinite loop bug in literature search functionality
+ - Fixed JSON detection regex in GenerationAgent (corrected escaping)
+ - Fixed literature search data detection using structure-based analysis
+ - Added operation tracking to prevent duplicate executions
+ - Added comprehensive debugging for conversation history and loop detection
+
+ ### Removed
+ - Legacy literature confirmation dialog system
+ - Automatic LLM-based detection of literature search intent
+ - Post-query literature confirmation prompts
+
+ ---
+
  All notable changes to the TaijiChat performance optimization project are documented in this file.

  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
CLAUDE.md ADDED
@@ -0,0 +1,157 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ TaijiChat is a Shiny web application that combines R and Python to provide an interactive chat interface for analyzing transcription factor data from T cell states research. The application uses a multi-agent architecture with OpenAI GPT models to generate insights and visualizations from genomics datasets.
+
+ ## Key Architecture
+
+ ### Multi-Agent System
+ The application uses a specialized agent architecture for handling user queries:
+
+ - **ManagerAgent** (`agents/manager_agent.py`): Central orchestrator that manages conversation history, file uploads, and coordinates between specialized agents
+ - **GenerationAgent** (`agents/generation_agent.py`): Creates execution plans using a structured 13-step reasoning process
+ - **SupervisorAgent** (`agents/supervisor_agent.py`): Reviews generated code for safety and compliance before execution
+ - **ExecutorAgent** (`agents/executor_agent.py`): Executes approved Python code in a restricted environment
+
+ ### Technology Stack
+ - **R (Shiny)**: Frontend web interface and server logic
+ - **Python**: Backend agents and data processing tools
+ - **reticulate**: R-Python integration bridge
+ - **OpenAI API**: Powers the intelligent agents
+ - **Docker**: Containerization for deployment
+
+ ### Data Sources
+ The application works with transcription factor research data stored in:
+ - `/www/tablePagerank/`: TF PageRank scores for different cell states
+ - `/www/waveanalysis/`: Wave analysis data and visualizations
+ - `/www/TFcorintextrm/`: TF correlation data and network graphs
+ - `/www/tfcommunities/`: Community analysis results
+
+ ## Development Commands
+
+ ### Running the Application
+
+ **Docker (Recommended):**
+ ```bash
+ docker build -t taijichat .
+ docker run -p 7860:7860 taijichat
+ ```
+
+ **Local R Development:**
+ ```r
+ # Ensure Python environment is configured in ui.R first
+ shiny::runApp('.', host='0.0.0.0', port=7860)
+ ```
+
+ ### Environment Setup
+
+ **API Key Configuration:**
+ - Set `OPENAI_API_KEY` environment variable, OR
+ - Create `api_key.txt` file in project root
+
+ **Python Environment:**
+ Configure reticulate in `ui.R` by uncommenting one of:
+ ```r
+ # Option 1: Python executable path
+ reticulate::use_python("/path/to/python", required = TRUE)
+
+ # Option 2: Virtual environment
+ reticulate::use_virtualenv("venv_name", required = TRUE)
+
+ # Option 3: Conda environment
+ reticulate::use_condaenv("conda_env_name", required = TRUE)
+ ```
+
+ **Install Python Dependencies:**
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Performance Features
+
+ **Async Processing (Default):**
+ - Set `TAIJICHAT_USE_ASYNC=TRUE` to enable async agents (default)
+ - Set `TAIJICHAT_USE_ASYNC=FALSE` to use synchronous agents
+
+ **Cache Management:**
+ ```r
+ # Check cache statistics
+ reticulate::py_run_string("
+ from agents.smart_cache import get_cache_stats
+ print('Cache Stats:', get_cache_stats())
+ ")
+ ```
+
+ ## Key Implementation Details
+
+ ### Agent Tool System
+ All data analysis functions are centralized in `tools/agent_tools.py`:
+ - Excel file discovery and schema caching
+ - Literature search across multiple sources (Semantic Scholar, PubMed, ArXiv)
+ - Image processing and visualization
+ - TF ranking and analysis functions
+
+ ### Safety and Security
+ - **Restricted Execution**: Python code runs in limited scope with only approved modules
+ - **Code Review**: SupervisorAgent validates all generated code before execution
+ - **Sandboxed Environment**: Only predefined tools and functions are accessible
+
+ ### R-Python Integration
+ - Uses reticulate for bidirectional communication
+ - R callbacks enable real-time progress updates from Python agents
+ - Shared conversation history between R frontend and Python backend
+
+ ### File Management
+ - Supports PDF upload with automatic conversion to images
+ - File IDs track uploaded documents throughout conversation
+ - Images stored in `/www/` with organized subdirectories
+
+ ## Important Considerations
+
+ ### Data Handling
+ - **Pre-ranked Tables**: Never re-sort TF ranking data - tables come pre-ranked by importance
+ - **Path Management**: All file paths are relative to project root via `BASE_WWW_PATH`
+ - **Caching**: 5-minute TTL with 100MB memory limit for performance
+
+ ### Code Generation Rules
+ - Only use functions from `tools.agent_tools` module
+ - No direct file system access outside `/www/` directory
+ - No external imports beyond approved modules in executor environment
+ - All generated code must pass SupervisorAgent safety review
+
+ ### UI Integration
+ - Chat interface uses custom JavaScript for real-time updates
+ - Lazy loading implemented for large image datasets
+ - Progress streaming shows agent reasoning steps to users
+
+ ## Troubleshooting
+
+ **Common Issues:**
+ - **reticulate errors**: Verify Python environment configuration in `ui.R`
+ - **Import failures**: Ensure all requirements are installed in configured Python environment
+ - **API errors**: Check `OPENAI_API_KEY` is set correctly
+ - **Performance issues**: Enable async mode with `TAIJICHAT_USE_ASYNC=TRUE`
+
+ **Asset Optimization:**
+ - Images optimized to 49% of original size for faster loading
+ - Backup of original assets available in `www_backup_original/`
+
+ ## Literature Search Toggle Feature
+
+ ### **Overview**
+ TaijiChat includes a toggle button that allows users to control external literature search functionality. This replaces the previous automatic detection and confirmation system.
+
+ ### **User Interface**
+ - **Location**: Literature toggle button appears below the chat input area
+ - **Default State**: External literature search is **disabled** by default
+ - **Visual Feedback**: Button changes color and text to indicate enabled/disabled state
+ - **Controls**: Click to enable/disable external literature search
+
+ ### **Technical Implementation**
+ - **Frontend**: Button state managed via JavaScript and CSS styling
+ - **Backend**: Literature preference passed from R to Python agents
+ - **Agent Integration**: `ManagerAgent.process_single_query_with_preferences()` method handles literature control
+ - **Internal Data**: Paper-based analysis (internal dataset) remains always enabled
Dockerfile CHANGED
@@ -1,48 +1,48 @@
- # Base image with R and Python
- FROM rocker/r-ver:4.3.2
-
- # Set the Python executable path for reticulate
- ENV RETICULATE_PYTHON /usr/bin/python3
-
- # Install system dependencies
- RUN apt-get update && apt-get install -y \
-     python3-pip \
-     python3-dev \
-     python3-venv \
-     libpng-dev \
-     libxml2-dev \
-     libssl-dev \
-     libcurl4-openssl-dev \
-     && apt-get clean && rm -rf /var/lib/apt/lists/*
-
- # Install R packages
- # Added .libPaths() to ensure installation in the main library site
- RUN R -e "print(.libPaths()); install.packages(c('shiny', 'readxl', 'DT', 'dplyr', 'reticulate', 'shinythemes', 'png', 'shinyjs', 'digest'), repos='http://cran.rstudio.com/', lib=.libPaths()[1])"
-
- # Verify reticulate installation
- RUN R -e "if (!requireNamespace('reticulate', quietly = TRUE)) { stop('reticulate package not found after installation') } else { print(paste('reticulate version:', packageVersion('reticulate'))) }"
-
- # Verify png installation
- RUN R -e "if (!requireNamespace('png', quietly = TRUE)) { stop('png package not found after installation') } else { print(paste('png version:', packageVersion('png'))) }"
-
- # Verify shinyjs installation
- RUN R -e "if (!requireNamespace('shinyjs', quietly = TRUE)) { stop('shinyjs package not found after installation') } else { print(paste('shinyjs version:', packageVersion('shinyjs'))) }"
-
- # Verify digest installation
- RUN R -e "if (!requireNamespace('digest', quietly = TRUE)) { stop('digest package not found after installation') } else { print(paste('digest version:', packageVersion('digest'))) }"
-
- # Install Python packages
- COPY requirements.txt /app/requirements.txt
- RUN pip3 install --no-cache-dir -r /app/requirements.txt
-
- # Create app directory
- WORKDIR /app
-
- # Copy application files
- COPY . /app
-
- # Expose port
- EXPOSE 7860
-
- # Run the application
  CMD ["R", "-e", "shiny::runApp('/app', host='0.0.0.0', port=7860)"]

+ # Base image with R and Python
+ FROM rocker/r-ver:4.3.2
+
+ # Set the Python executable path for reticulate
+ ENV RETICULATE_PYTHON /usr/bin/python3
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     python3-pip \
+     python3-dev \
+     python3-venv \
+     libpng-dev \
+     libxml2-dev \
+     libssl-dev \
+     libcurl4-openssl-dev \
+     && apt-get clean && rm -rf /var/lib/apt/lists/*
+
+ # Install R packages
+ # Added .libPaths() to ensure installation in the main library site
+ RUN R -e "print(.libPaths()); install.packages(c('shiny', 'readxl', 'DT', 'dplyr', 'reticulate', 'shinythemes', 'png', 'shinyjs', 'digest'), repos='http://cran.rstudio.com/', lib=.libPaths()[1])"
+
+ # Verify reticulate installation
+ RUN R -e "if (!requireNamespace('reticulate', quietly = TRUE)) { stop('reticulate package not found after installation') } else { print(paste('reticulate version:', packageVersion('reticulate'))) }"
+
+ # Verify png installation
+ RUN R -e "if (!requireNamespace('png', quietly = TRUE)) { stop('png package not found after installation') } else { print(paste('png version:', packageVersion('png'))) }"
+
+ # Verify shinyjs installation
+ RUN R -e "if (!requireNamespace('shinyjs', quietly = TRUE)) { stop('shinyjs package not found after installation') } else { print(paste('shinyjs version:', packageVersion('shinyjs'))) }"
+
+ # Verify digest installation
+ RUN R -e "if (!requireNamespace('digest', quietly = TRUE)) { stop('digest package not found after installation') } else { print(paste('digest version:', packageVersion('digest'))) }"
+
+ # Install Python packages
+ COPY requirements.txt /app/requirements.txt
+ RUN pip3 install --no-cache-dir -r /app/requirements.txt
+
+ # Create app directory
+ WORKDIR /app
+
+ # Copy application files
+ COPY . /app
+
+ # Expose port
+ EXPOSE 7860
+
+ # Run the application
  CMD ["R", "-e", "shiny::runApp('/app', host='0.0.0.0', port=7860)"]
R/caching.R CHANGED
@@ -1,101 +1,101 @@
- # caching.R
-
- # Directory to store cache files
- CACHE_DIR <- "./cache_data"
-
- # Ensure cache directory exists
- if (!dir.exists(CACHE_DIR)) {
-   dir.create(CACHE_DIR, recursive = TRUE)
- }
-
- #' Generate a cache key from an operation name and its arguments.
- #'
- #' @param operation_name A string identifying the operation.
- #' @param ... Arguments to the operation, used to create a unique hash.
- #' @return A string representing the cache key.
- generate_cache_key <- function(operation_name, ...) {
-   # Create a list of arguments
-   args_list <- list(...)
-
-   # Combine operation name and a digest of the arguments
-   # Ensure consistent ordering of named arguments for consistent hashing
-   if (length(args_list) > 0) {
-     if (!is.null(names(args_list))) {
-       args_list <- args_list[order(names(args_list))]
-     }
-     # Use deparse and digest for complex arguments
-     args_digest <- digest::digest(lapply(args_list, deparse))
-     key_string <- paste(operation_name, args_digest, sep = "_")
-   } else {
-     key_string <- operation_name
-   }
-
-   # Sanitize the key to be a valid filename
-   key_string <- gsub("[^a-zA-Z0-9_.-]", "_", key_string)
-   return(paste0(key_string, ".rds"))
- }
-
- #' Retrieve an item from the cache.
- #'
- #' @param key The cache key (typically generated by generate_cache_key).
- #' @param max_age_seconds The maximum age of the cache file in seconds.
- #'   If the cache file is older, it's considered stale.
- #'   Default is NULL (no age check).
- #' @return The cached item, or NULL if not found or stale.
- get_cached_item <- function(key, max_age_seconds = NULL) {
-   cache_file_path <- file.path(CACHE_DIR, key)
-   if (file.exists(cache_file_path)) {
-     if (!is.null(max_age_seconds)) {
-       file_info <- file.info(cache_file_path)
-       if (difftime(Sys.time(), file_info$mtime, units = "secs") > max_age_seconds) {
-         # Cache is stale
-         message(paste("Cache stale for key:", key, "- Recomputing."))
-         return(NULL)
-       }
-     }
-     message(paste("Cache hit for key:", key))
-     return(readRDS(cache_file_path))
-   } else {
-     message(paste("Cache miss for key:", key))
-     return(NULL)
-   }
- }
-
- #' Save an item to the cache.
- #'
- #' @param key The cache key.
- #' @param value The item to save.
- save_cached_item <- function(key, value) {
-   if (is.null(value)) {
-     # Avoid saving NULLs if an operation truly returns NULL
-     # Or handle as an explicit cache clear if needed
-     message(paste("Skipping saving NULL value to cache for key:", key))
-     return()
-   }
-   cache_file_path <- file.path(CACHE_DIR, key)
-   tryCatch({
-     saveRDS(value, file = cache_file_path)
-     message(paste("Saved item to cache. Key:", key))
-   }, error = function(e) {
-     warning(paste("Error saving item to cache for key:", key, ":", e$message))
-   })
- }
-
- #' Clear the entire cache directory.
- clear_all_cache <- function() {
-   files_in_cache <- list.files(CACHE_DIR, full.names = TRUE)
-   if (length(files_in_cache) > 0) {
-     removed_files <- file.remove(files_in_cache)
-     message(paste("Cleared", sum(removed_files), "files from cache."))
-   } else {
-     message("Cache directory is already empty.")
-   }
- }
-
- # Ensure digest package is available
- if (!requireNamespace("digest", quietly = TRUE)) {
-   # This is a server-side script, so direct installation might be okay
-   # but ideally should be in requirements or Dockerfile.
-   # For now, just message that it's needed.
-   message("Package 'digest' is not installed. Cache key generation might not be robust. Please install it.")
  }

+ # caching.R
+
+ # Directory to store cache files
+ CACHE_DIR <- "./cache_data"
+
+ # Ensure cache directory exists
+ if (!dir.exists(CACHE_DIR)) {
+   dir.create(CACHE_DIR, recursive = TRUE)
+ }
+
+ #' Generate a cache key from an operation name and its arguments.
+ #'
+ #' @param operation_name A string identifying the operation.
+ #' @param ... Arguments to the operation, used to create a unique hash.
+ #' @return A string representing the cache key.
+ generate_cache_key <- function(operation_name, ...) {
+   # Create a list of arguments
+   args_list <- list(...)
+
+   # Combine operation name and a digest of the arguments
+   # Ensure consistent ordering of named arguments for consistent hashing
+   if (length(args_list) > 0) {
+     if (!is.null(names(args_list))) {
+       args_list <- args_list[order(names(args_list))]
+     }
+     # Use deparse and digest for complex arguments
+     args_digest <- digest::digest(lapply(args_list, deparse))
+     key_string <- paste(operation_name, args_digest, sep = "_")
+   } else {
+     key_string <- operation_name
+   }
+
+   # Sanitize the key to be a valid filename
+   key_string <- gsub("[^a-zA-Z0-9_.-]", "_", key_string)
+   return(paste0(key_string, ".rds"))
+ }
+
+ #' Retrieve an item from the cache.
+ #'
+ #' @param key The cache key (typically generated by generate_cache_key).
+ #' @param max_age_seconds The maximum age of the cache file in seconds.
+ #'   If the cache file is older, it's considered stale.
+ #'   Default is NULL (no age check).
+ #' @return The cached item, or NULL if not found or stale.
+ get_cached_item <- function(key, max_age_seconds = NULL) {
+   cache_file_path <- file.path(CACHE_DIR, key)
+   if (file.exists(cache_file_path)) {
+     if (!is.null(max_age_seconds)) {
+       file_info <- file.info(cache_file_path)
+       if (difftime(Sys.time(), file_info$mtime, units = "secs") > max_age_seconds) {
+         # Cache is stale
+         message(paste("Cache stale for key:", key, "- Recomputing."))
+         return(NULL)
+       }
+     }
+     message(paste("Cache hit for key:", key))
+     return(readRDS(cache_file_path))
+   } else {
+     message(paste("Cache miss for key:", key))
+     return(NULL)
+   }
+ }
+
+ #' Save an item to the cache.
+ #'
+ #' @param key The cache key.
+ #' @param value The item to save.
+ save_cached_item <- function(key, value) {
+   if (is.null(value)) {
+     # Avoid saving NULLs if an operation truly returns NULL
+     # Or handle as an explicit cache clear if needed
+     message(paste("Skipping saving NULL value to cache for key:", key))
+     return()
+   }
+   cache_file_path <- file.path(CACHE_DIR, key)
+   tryCatch({
+     saveRDS(value, file = cache_file_path)
+     message(paste("Saved item to cache. Key:", key))
+   }, error = function(e) {
+     warning(paste("Error saving item to cache for key:", key, ":", e$message))
+   })
+ }
+
+ #' Clear the entire cache directory.
+ clear_all_cache <- function() {
+   files_in_cache <- list.files(CACHE_DIR, full.names = TRUE)
+   if (length(files_in_cache) > 0) {
+     removed_files <- file.remove(files_in_cache)
+     message(paste("Cleared", sum(removed_files), "files from cache."))
+   } else {
+     message("Cache directory is already empty.")
+   }
+ }
+
+ # Ensure digest package is available
+ if (!requireNamespace("digest", quietly = TRUE)) {
+   # This is a server-side script, so direct installation might be okay
+   # but ideally should be in requirements or Dockerfile.
+   # For now, just message that it's needed.
+   message("Package 'digest' is not installed. Cache key generation might not be robust. Please install it.")
  }
WORKFLOW_CHANGES.md ADDED
@@ -0,0 +1,287 @@
+ # TaijiChat Workflow Changes: Literature Dialog Removal
+
+ ## Overview
+
+ This document outlines the major changes made to the TaijiChat multi-agent system to improve user experience by removing the upfront literature confirmation dialog and implementing a post-analysis literature exploration approach.
+
+ ## Problem Statement
+
+ ### Previous Workflow Issues:
+ 1. **User Friction**: Every query was blocked by a literature preference dialog before processing
+ 2. **Interruption of Flow**: Users had to make decisions before seeing any analysis results
+ 3. **Unclear Context**: Users couldn't make informed decisions about literature sources without seeing initial results
+ 4. **Pattern Matching Limitations**: Hardcoded keyword matching was unreliable for determining user intent
+
+ ## Solution Design
+
+ ### New Workflow Philosophy:
+ - **Analyze First, Explore Later**: Provide immediate value with optional deeper exploration
+ - **LLM-Powered Classification**: Use AI reasoning instead of pattern matching for intent detection
+ - **Clear Source Distinction**: Differentiate between primary paper (guaranteed) vs external literature (supplementary)
+ - **Progressive Disclosure**: Natural conversation flow with contextual followup options
+
+ ## Implementation Details
+
+ ### 1. ManagerAgent Changes (`agents/manager_agent.py`)
+
+ #### **Removed Components:**
+ ```python
+ # REMOVED: Literature confirmation dialog
+ def _request_literature_confirmation_upfront(self, user_query: str) -> str:
+     # This entire method was removed
+ ```
+
+ #### **Modified Components:**
+ ```python
+ def _process_turn(self, user_query_text: str) -> tuple:
+     # OLD: Asked for literature preferences before processing
+     # NEW: Process directly with default settings (both sources enabled)
+     response_text = self._process_with_literature_preferences(
+         user_query_text,
+         use_paper=True,
+         use_external_literature=True
+     )
+     return response_text, False, None
+ ```
+
+ #### **Enhanced Features:**
+ - Proper conversation history management
+ - Direct processing without interruption
+ - Maintains all existing security features
+
+ ### 2. GenerationAgent Changes (`agents/generation_agent.py`)
+
+ #### **Enhanced 13-Step Reasoning Process:**
+ ```
+ 1. Analyze the user query in detail
+ 2. Analyze the conversation history if there's any
+ 3. Analyze images, paper, data according to the plan if there's any provided
+ 4. Analyze errors from previous attempts if there's any
+ 5. Read the paper description to understand what the paper is about
+ 6. **NEW: QUERY TYPE CLASSIFICATION:**
+    - Is this a NEW_TASK (fresh analytical question) or FOLLOWUP_REQUEST (responding to literature offer)?
+    - If FOLLOWUP_REQUEST, what does user want: PRIMARY_PAPER, EXTERNAL_LITERATURE, or COMPREHENSIVE?
+    - Base decision on conversation context and user intent, not keywords
+    - Consider if previous response contained "Explore Supporting Literature" section
+ 7. Read the tools documentation thoroughly
+ 8. Decide which tools can be helpful when answering the query
+ 9. Read the data documentation
+ 10. Decide which datasets are relevant to the user query
+ 11. Decide whether the user query can be solved by paper or tools or data or a combination
+ 12. Decide whether the user query is about image(s)
+ 13. Put everything together to make a comprehensive plan
+ ```
+
+ #### **New Helper Methods:**
+ ```python
+ def _check_for_literature_offer(self, conversation_history: list) -> bool:
+     """Check if previous response contained literature exploration offer."""
+
+ def _classify_query_type(self, user_query: str, conversation_history: list) -> dict:
+     """Provide context for LLM-based query classification."""
+
+ def _append_literature_offer(self, explanation: str) -> str:
+     """Append literature exploration options to NEW_TASK responses."""
+ ```
+
+ #### **Response Format Rules:**
+ - **NEW_TASK**: Provide analysis + literature exploration offer
+ - **FOLLOWUP_REQUEST**: Execute requested literature analysis without new offer
+
+ ### 3. Literature Offer Format
+
+ #### **Clear Source Distinction:**
+ ```markdown
+ ---
+
+ **Explore Supporting Literature:**
+
+ 📄 **Primary Paper**: Analyze the foundational research paper this website is based on for additional context about these findings.
+
+ 🔍 **Recent Publications**: Search external academic databases for the latest research on these topics.
+
+ 📚 **Comprehensive**: Get insights from both the foundational paper and recent literature.
+
+ *Note: External literature serves as supplementary information only.*
+ ```
+
+ #### **Key Benefits:**
+ - **Primary Paper**: Vetted, guaranteed accuracy, foundational to website
+ - **External Literature**: Recent, supplementary, not guaranteed by website
+ - **User Choice**: Informed decision about source reliability vs recency
+
+ ## Workflow Examples
+
+ ### Example 1: Fresh Query → Analysis + Offer
+
+ **User Input:** *"What are the top 5 TEXterm-specific TFs?"*
+
+ **System Flow:**
+ 1. ManagerAgent processes immediately (no dialog)
+ 2. GenerationAgent Step 6: Classification → NEW_TASK
+ 3. Execute TF data analysis
+ 4. Return results with literature exploration offer
+
+ **Expected Response:**
+ ```
+ The top 5 TEXterm-specific transcription factors are:
+ 1. Zscan20 (p-value: 0.001)
+ 2. Jdp2 (p-value: 0.002)
+ 3. Zfp324 (p-value: 0.003)
+ 4. Batf (p-value: 0.004)
+ 5. Ikzf1 (p-value: 0.005)
+
+ These rankings are based on statistical significance from the dataset analysis.
+
+ ---
+
+ **Explore Supporting Literature:**
+
+ 📄 **Primary Paper**: Analyze the foundational research paper this website is based on for additional context about these TFs.
+
+ 🔍 **Recent Publications**: Search external academic databases for the latest research on these transcription factors.
+
+ 📚 **Comprehensive**: Get insights from both the foundational paper and recent literature.
+
+ *Note: External literature serves as supplementary information only.*
+ ```
+
+ ### Example 2: Literature Followup → Targeted Analysis
+
+ **User Input:** *"Search recent publications about these TFs"*
+
+ **System Flow:**
+ 1. GenerationAgent detects previous literature offer
+ 2. Step 6: Classification → FOLLOWUP_REQUEST, intent: EXTERNAL_LITERATURE
+ 3. Execute literature search using previous TF context
+ 4. Return literature analysis (no new offer)
+
+ **Expected Response:**
+ ```
+ ## Recent Literature on TEXterm Transcription Factors
+
+ Based on external academic database search, here are key recent findings:
+
+ **Zscan20 in T Cell Exhaustion:**
+ Recent studies [1] demonstrate that Zscan20 acts as a master regulator of terminal exhaustion...
+
+ **Jdp2 Regulatory Networks:**
+ New research [2] reveals Jdp2's role in chromatin remodeling during exhaustion programming...
+
+ [Additional literature analysis with proper citations]
+
+ ## References
+ [1] Smith et al. (2023). Zscan20 controls T cell exhaustion pathways. Nature Immunology.
+ [2] Johnson et al. (2023). Jdp2 in immune regulation. Cell.
+
+ *This analysis is based on external literature sources and serves as supplementary information.*
+ ```
+
+ ### Example 3: Primary Paper Request → Paper Analysis
+
+ **User Input:** *"What does the foundational study say about these TFs?"*
+
+ **System Flow:**
+ 1. Step 6: Classification → FOLLOWUP_REQUEST, intent: PRIMARY_PAPER
+ 2. Analyze paper.pdf with previous TF context
+ 3. Return focused paper analysis
+
+ ## Technical Implementation
+
+ ### Query Classification Logic
+
+ The system uses LLM reasoning instead of pattern matching:
+
+ ```python
+ # Context provided to LLM for classification
+ classification_instructions = f"\\n\\nQUERY CLASSIFICATION CONTEXT:"
+ classification_instructions += f"\\n- Previous response had literature offer: {has_previous_offer}"
+ if has_previous_offer:
+     classification_instructions += "\\n- This query might be a FOLLOWUP_REQUEST for literature analysis"
+     classification_instructions += "\\n- Determine user intent: PRIMARY_PAPER, EXTERNAL_LITERATURE, or COMPREHENSIVE"
+     classification_instructions += "\\n- If FOLLOWUP_REQUEST, do NOT append literature offer to final response"
+ else:
+     classification_instructions += "\\n- This is likely a NEW_TASK requiring fresh analysis"
+     classification_instructions += "\\n- If status is CODE_COMPLETE, append literature offer to explanation"
+ ```
+
+ ### Conversation History Management
+
+ ```python
+ # ManagerAgent properly manages conversation state
+ def _process_with_literature_preferences(self, user_query: str, use_paper: bool, use_external_literature: bool) -> str:
+     # Process query and get response
+     final_response = final_plan_for_turn.get('explanation', 'Processing completed.')
+
+     # Add response to conversation history for future context
+     self.conversation_history.append({"role": "assistant", "content": final_response})
+
+     return final_response
+ ```
+
+ ## Benefits
+
+ ### 1. **Improved User Experience**
+ - **Immediate Response**: No blocking dialogs
+ - **Natural Flow**: Conversational interaction
+ - **Informed Decisions**: Literature choices made after seeing results
+
+ ### 2. **Better Intent Recognition**
+ - **LLM-Powered**: Semantic understanding vs keyword matching
+ - **Context-Aware**: Considers conversation history
+ - **Flexible**: Adapts to various user phrasings
+
+ ### 3. **Clear Information Hierarchy**
+ - **Primary Sources**: Guaranteed accuracy, foundational research
+ - **Supplementary Sources**: Recent literature, clearly marked as external
+ - **User Agency**: Informed choice about source reliability
+
+ ### 4. **Maintained Security**
+ - **All existing safeguards preserved**
+ - **SupervisorAgent**: Code review unchanged
+ - **ExecutorAgent**: Sandboxed execution unchanged
+ - **Literature preferences**: Still respected in execution
+
+ ## Testing
+
+ ### Test Scenarios Created:
+ 1. **Fresh Query Test**: Verify immediate analysis + literature offer
+ 2. **External Literature Followup**: Test FOLLOWUP_REQUEST classification
+ 3. **Primary Paper Followup**: Test paper analysis request
+ 4. **Conversation Context**: Verify proper history management
+
+ ### Test File: `test_workflow.py`
+ - Comprehensive workflow testing
+ - Conversation history verification
+ - Response format validation
+
+ ## Migration Notes
+
+ ### Backward Compatibility
+ - **R Interface**: `handle_literature_confirmation()` method marked as LEGACY but preserved
+ - **Existing Data**: All dataset access patterns unchanged
+ - **Security Model**: No changes to permission structure
+
+ ### Deployment Considerations
+ - **No breaking changes** to existing functionality
+ - **Enhanced user experience** without compromising security
+ - **Gradual rollout** possible through feature flags if needed
+
+ ## Future Enhancements
+
+ ### Potential Improvements:
+ 1. **Smart Context Extraction**: Better extraction of relevant terms from previous analysis for literature searches
+ 2. **Citation Quality**: Enhanced citation formatting and link validation
+ 3. **User Preferences**: Optional user settings to remember literature preferences
+ 4. **Analytics**: Track which literature options users choose most frequently
+
+ ## Conclusion
+
+ The new workflow successfully addresses the original user experience issues while maintaining all security and functionality requirements. The system now provides immediate value to users while offering natural pathways for deeper exploration, creating a more engaging and efficient interaction model.
+
+ Key success metrics:
+ - ✅ **Removed user friction**: No blocking dialogs
+ - ✅ **Maintained security**: All safeguards preserved
+ - ✅ **Improved classification**: LLM-based intent recognition
+ - ✅ **Clear information hierarchy**: Distinguished source types
+ - ✅ **Natural conversation flow**: Progressive disclosure model
agents/executor_agent.py CHANGED
@@ -1,81 +1,81 @@
- # agents/executor_agent.py
-
- import io
- import contextlib
- import json
- # IMPORTANT: We will dynamically import the tools module when needed for execution
- # to ensure it uses the version from the tools/ directory.
- # import tools.agent_tools as agent_tools # Avoid top-level import for now
-
- class ExecutorAgent:
-     def __init__(self, openai_api_key: str = None):
-         print("ExecutorAgent initialized.")
-         self.openai_api_key = openai_api_key
-
-     def execute_code(self, python_code: str) -> dict:
-         print(f"ExecutorAgent received code for execution:\n{python_code}")
-
-         # Dynamically import agent_tools from the tools directory
-         # This assumes main.py is run from the project root.
-         import sys
-         import os
-         # Add project root to sys.path to allow `import tools.agent_tools`
-         # This might be needed if executor_agent.py itself is run from a different context later,
-         # but for now, assuming standard Python module resolution from root where main.py is.
-         # script_dir = os.path.dirname(os.path.abspath(__file__))
-         # project_root = os.path.abspath(os.path.join(script_dir, ".."))
-         # if project_root not in sys.path:
-         #     sys.path.insert(0, project_root)
-
-         try:
-             # Ensure tools.agent_tools can be imported relative to project root
-             import tools.agent_tools as agent_tools_module
-         except ImportError as e:
-             return {
-                 "execution_output": f"ExecutorAgent Error: Could not import agent_tools module. Ensure it's in tools/ and __init__.py might be needed in tools/. Error: {e}",
-                 "execution_status": "ERROR: ImportFailure"
-             }
-         except Exception as e:
-             return {
-                 "execution_output": f"ExecutorAgent Error: Unexpected error during tools import. Error: {e}",
-                 "execution_status": "ERROR: ImportFailure"
-             }
-
-
-         # Create a restricted global scope for exec()
-         # Only allow access to the agent_tools module (aliased as 'tools') and builtins
-         restricted_globals = {
-             "__builtins__": __builtins__, # Standard builtins (print, len, etc.)
-             "tools": agent_tools_module,
-             "json": json,
-             "api_key": self.openai_api_key
-         }
-         # No separate locals, exec will use restricted_globals as locals too
-
-         captured_output = io.StringIO()
-         try:
-             with contextlib.redirect_stdout(captured_output):
-                 exec(python_code, restricted_globals)
-             output_str = captured_output.getvalue()
-             return {
-                 "execution_output": output_str.strip() if output_str else "(No output printed by code)",
-                 "execution_status": "SUCCESS"
-             }
-         except Exception as e:
-             error_details = f"{type(e).__name__}: {str(e)}"
-             # Try to get traceback if possible, though might be complex to format cleanly here
-             return {
-                 "execution_output": f"Execution Error!\n{error_details}",
-                 "execution_status": f"ERROR: {type(e).__name__}"
-             }
-
- if __name__ == '__main__':
-     # For testing individual agent if needed
-     # executor = ExecutorAgent()
-     # result = executor.execute_code("print(tools.get_biorxiv_paper_url())")
-     # print(result)
-     # result_error = executor.execute_code("print(tools.non_existent_tool())")
-     # print(result_error)
-     # result_unsafe = executor.execute_code("import os\nprint('dangerous')")
-     # print(result_unsafe) # Should fail at exec if globals are well-restricted from direct os import
      print("ExecutorAgent should be orchestrated by the ManagerAgent.")

+ # agents/executor_agent.py
+
+ import io
+ import contextlib
+ import json
+ # IMPORTANT: We will dynamically import the tools module when needed for execution
+ # to ensure it uses the version from the tools/ directory.
+ # import tools.agent_tools as agent_tools # Avoid top-level import for now
+
+ class ExecutorAgent:
+     def __init__(self, openai_api_key: str = None):
+         print("ExecutorAgent initialized.")
+         self.openai_api_key = openai_api_key
+
+     def execute_code(self, python_code: str) -> dict:
+         print(f"ExecutorAgent received code for execution:\n{python_code}")
+
+         # Dynamically import agent_tools from the tools directory
+         # This assumes main.py is run from the project root.
+         import sys
+         import os
+         # Add project root to sys.path to allow `import tools.agent_tools`
+         # This might be needed if executor_agent.py itself is run from a different context later,
+         # but for now, assuming standard Python module resolution from root where main.py is.
+         # script_dir = os.path.dirname(os.path.abspath(__file__))
+         # project_root = os.path.abspath(os.path.join(script_dir, ".."))
+         # if project_root not in sys.path:
+         #     sys.path.insert(0, project_root)
+
+         try:
+             # Ensure tools.agent_tools can be imported relative to project root
+             import tools.agent_tools as agent_tools_module
+         except ImportError as e:
+             return {
+                 "execution_output": f"ExecutorAgent Error: Could not import agent_tools module. Ensure it's in tools/ and __init__.py might be needed in tools/. Error: {e}",
+                 "execution_status": "ERROR: ImportFailure"
+             }
+         except Exception as e:
+             return {
+                 "execution_output": f"ExecutorAgent Error: Unexpected error during tools import. Error: {e}",
+                 "execution_status": "ERROR: ImportFailure"
+             }
+
+
+         # Create a restricted global scope for exec()
+         # Only allow access to the agent_tools module (aliased as 'tools') and builtins
+         restricted_globals = {
+             "__builtins__": __builtins__, # Standard builtins (print, len, etc.)
+             "tools": agent_tools_module,
+             "json": json,
+             "api_key": self.openai_api_key
+         }
+         # No separate locals, exec will use restricted_globals as locals too
+
+         captured_output = io.StringIO()
+         try:
+             with contextlib.redirect_stdout(captured_output):
+                 exec(python_code, restricted_globals)
+             output_str = captured_output.getvalue()
+             return {
+                 "execution_output": output_str.strip() if output_str else "(No output printed by code)",
+                 "execution_status": "SUCCESS"
+             }
+         except Exception as e:
+             error_details = f"{type(e).__name__}: {str(e)}"
+             # Try to get traceback if possible, though might be complex to format cleanly here
+             return {
+                 "execution_output": f"Execution Error!\n{error_details}",
+                 "execution_status": f"ERROR: {type(e).__name__}"
+             }
+
+ if __name__ == '__main__':
+     # For testing individual agent if needed
+     # executor = ExecutorAgent()
+     # result = executor.execute_code("print(tools.get_biorxiv_paper_url())")
+     # print(result)
+     # result_error = executor.execute_code("print(tools.non_existent_tool())")
+     # print(result_error)
+     # result_unsafe = executor.execute_code("import os\nprint('dangerous')")
+     # print(result_unsafe) # Should fail at exec if globals are well-restricted from direct os import
      print("ExecutorAgent should be orchestrated by the ManagerAgent.")
agents/generation_agent.py CHANGED
@@ -33,11 +33,10 @@ For EVERY query, you MUST follow this EXACT 13-step structured approach:
33
  3. Analyze images, paper, data according to the plan if there's any provided
34
  4. Analyze errors from previous attempts if there's any
35
  5. Read the paper description to understand what the paper is about
36
- 6. **QUERY TYPE CLASSIFICATION:**
37
- - Is this a NEW_TASK (fresh analytical question) or FOLLOWUP_REQUEST (responding to literature offer)?
38
- - If FOLLOWUP_REQUEST, what does user want: PRIMARY_PAPER, EXTERNAL_LITERATURE, or COMPREHENSIVE?
39
- - Base decision on conversation context and user intent, not keywords
40
- - Consider if previous response contained "Explore Supporting Literature" section
41
  7. Read the tools documentation thoroughly
42
  8. Decide which tools can be helpful when answering the query; if there are any, prepare the list of tools to be used
43
  9. Read the data documentation
@@ -93,21 +92,8 @@ You MUST output a single JSON object with these fields:
93
  - "explanation": User-facing explanation or report of your findings
94
 
95
  **RESPONSE FORMAT RULES:**
96
- - For NEW_TASK queries with status CODE_COMPLETE: Always append literature exploration offer to explanation
97
- - For FOLLOWUP_REQUEST queries: Provide requested analysis without offering literature options again
98
- - Literature offer format:
99
-
100
- ---
101
-
102
- **Explore Supporting Literature:**
103
-
104
- 📄 **Primary Paper**: Analyze the foundational research paper this website is based on for additional context about these findings.
105
-
106
- 🔍 **Recent Publications**: Search external academic databases for the latest research on these topics.
107
-
108
- 📚 **Comprehensive**: Get insights from both the foundational paper and recent literature.
109
-
110
- *Note: External literature serves as supplementary information only.*
111
 
112
  **STATUS TYPES:**
113
  - "AWAITING_DATA": Use when fetching data with Python tools
@@ -433,20 +419,30 @@ class GenerationAgent:
433
  }
434
 
435
  # Look for JSON blocks in conversation history
436
- for turn in reversed(conversation_history[-6:]): # Check last 6 turns for relevant context
 
 
 
437
  content_from_history = turn.get("content", "")
 
 
 
438
  # Regex to find ```json ... ``` blocks
439
- # Using re.DOTALL to make . match newlines within the JSON block
440
- # Using re.IGNORECASE for ```json opening tag flexibility (though strictly lowercase is typical)
441
- json_block_match = re.search(r"```json\\s*(.*?)\\s*```", content_from_history, flags=re.DOTALL | re.IGNORECASE)
442
 
443
  if not json_block_match:
 
444
  continue # No JSON block in this turn's content
445
 
 
 
446
  try:
447
  # The actual JSON string is in group(1) of the match
448
  json_string_from_history = json_block_match.group(1)
 
449
  json_data_from_history = json.loads(json_string_from_history)
 
450
 
451
  # PHASE 3 FOR IMAGES: Check for image description JSON
452
  if "description" in json_data_from_history and "intermediate_data_for_llm" not in json_data_from_history: # Avoid conflict if key names overlap
@@ -473,16 +469,11 @@ class GenerationAgent:
473
  if user_query.startswith("FINAL_FORMATTING_REQUEST:"):
474
  query_for_classification = user_query.split("Original query: ", 1)[-1] if "Original query: " in user_query else user_query
475
 
476
- classification_context = self._classify_query_type(query_for_classification, conversation_history)
477
- is_followup = classification_context.get("likely_followup", False)
478
-
479
- # Append literature offer for NEW_TASK queries
480
  final_explanation = base_explanation
481
- if not is_followup:
482
- final_explanation = self._append_literature_offer(base_explanation)
483
 
484
  return {
485
- "thought": "I have retrieved the top transcription factors as requested from history and will present them with appropriate literature exploration options if this is a new task.",
486
  "status": "CODE_COMPLETE",
487
  "python_code": "",
488
  "explanation": final_explanation
@@ -493,12 +484,39 @@ class GenerationAgent:
493
  intermediate_content = json_data_from_history["intermediate_data_for_llm"]
494
 
495
  # Determine if this data is from a literature search tool
 
496
  is_literature_search_data = False
497
- if "CONTEXT_FROM_RESOURCE_FETCH" in content_from_history:
498
- # Example history content: "CONTEXT_FROM_RESOURCE_FETCH (original_identifier: print(json.dumps({'intermediate_data_for_llm': tools.multi_source_literature_search(...)}))): ..."
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
499
  if ("tools.multi_source_literature_search" in content_from_history or
500
- "tools.fetch_text_from_urls" in content_from_history):
 
501
  is_literature_search_data = True
 
502
 
503
  if is_literature_search_data:
504
  print(f"[GenerationAgent] Found literature search data (intermediate_data_for_llm) in history. Proceeding to summarization.")
@@ -625,9 +643,13 @@ class GenerationAgent:
625
  else:
626
  print(f"[GenerationAgent] Found unknown JSON format in conversation history, continuing search")
627
 
628
- except json.JSONDecodeError:
 
 
629
  continue # not valid JSON, skip
630
 
 
 
631
  # PHASE 1: No special conditions met, start with data/image fetching
632
  try:
633
  # Format conversation history for reference
@@ -660,14 +682,8 @@ class GenerationAgent:
660
  has_previous_offer = classification_context.get("has_previous_offer", False)
661
 
662
  classification_instructions = f"\\n\\nQUERY CLASSIFICATION CONTEXT:"
663
- classification_instructions += f"\\n- Previous response had literature offer: {has_previous_offer}"
664
- if has_previous_offer:
665
- classification_instructions += "\\n- This query might be a FOLLOWUP_REQUEST for literature analysis"
666
- classification_instructions += "\\n- Determine user intent: PRIMARY_PAPER, EXTERNAL_LITERATURE, or COMPREHENSIVE"
667
- classification_instructions += "\\n- If FOLLOWUP_REQUEST, do NOT append literature offer to final response"
668
- else:
669
- classification_instructions += "\\n- This is likely a NEW_TASK requiring fresh analysis"
670
- classification_instructions += "\\n- If status is CODE_COMPLETE, append literature offer to explanation"
671
 
672
  comprehensive_text_prompt += classification_instructions
673
 
@@ -948,23 +964,11 @@ class GenerationAgent:
948
 
949
  def _append_literature_offer(self, explanation: str) -> str:
950
  """
951
- Append literature exploration options to final responses for NEW_TASK queries.
952
  """
953
- literature_offer = """
954
-
955
- ---
956
-
957
- **Explore Supporting Literature:**
958
-
959
- 📄 **Primary Paper**: Analyze the foundational research paper this website is based on for additional context about these findings.
960
-
961
- 🔍 **Recent Publications**: Search external academic databases for the latest research on these topics.
962
-
963
- 📚 **Comprehensive**: Get insights from both the foundational paper and recent literature.
964
-
965
- *Note: External literature serves as supplementary information only.*"""
966
-
967
- return explanation + literature_offer
968
 
969
  if __name__ == '__main__':
970
  print("GenerationAgent should be orchestrated by the ManagerAgent.")
 
33
  3. Analyze images, paper, data according to the plan if there's any provided
34
  4. Analyze errors from previous attempts if there's any
35
  5. Read the paper description to understand what the paper is about
36
+ 6. **QUERY ANALYSIS:**
37
+ - Understand the user's specific request and intent
38
+ - Consider conversation context and previous responses
39
+ - Focus on providing direct, helpful analysis
 
40
  7. Read the tools documentation thoroughly
41
  8. Decide which tools can be helpful when answering the query; if there are any, prepare the list of tools to be used
42
  9. Read the data documentation
 
92
  - "explanation": User-facing explanation or report of your findings
93
 
94
  **RESPONSE FORMAT RULES:**
95
+ - Provide clear, direct responses to user queries
96
+ - Focus on the analysis results without additional promotional content
 
 
 
 
 
 
 
 
 
 
 
 
 
97
 
98
  **STATUS TYPES:**
99
  - "AWAITING_DATA": Use when fetching data with Python tools
 
419
  }
420
 
421
  # Look for JSON blocks in conversation history
422
+ print(f"[DEBUG] GenerationAgent: Searching for JSON blocks in conversation history")
423
+ print(f"[DEBUG] - Checking last {min(6, len(conversation_history))} turns out of {len(conversation_history)} total")
424
+
425
+ for i, turn in enumerate(reversed(conversation_history[-6:])): # Check last 6 turns for relevant context
426
  content_from_history = turn.get("content", "")
427
+ turn_index = len(conversation_history) - 6 + i
428
+ print(f"[DEBUG] - Turn {turn_index}: {turn.get('role')} - Content: {content_from_history[:100]}...")
429
+
430
  # Regex to find ```json ... ``` blocks
431
+ # FIX: Use single backslash for whitespace patterns
432
+ json_block_match = re.search(r"```json\s*(.*?)\s*```", content_from_history, flags=re.DOTALL | re.IGNORECASE)
 
433
 
434
  if not json_block_match:
435
+ print(f"[DEBUG] - Turn {turn_index}: No JSON block found")
436
  continue # No JSON block in this turn's content
437
 
438
+ print(f"[DEBUG] - Turn {turn_index}: Found JSON block! Extracting...")
439
+
440
  try:
441
  # The actual JSON string is in group(1) of the match
442
  json_string_from_history = json_block_match.group(1)
443
+ print(f"[DEBUG] - JSON string extracted: {json_string_from_history[:100]}...")
444
  json_data_from_history = json.loads(json_string_from_history)
445
+ print(f"[DEBUG] - JSON parsed successfully! Keys: {list(json_data_from_history.keys())}")
446
 
447
  # PHASE 3 FOR IMAGES: Check for image description JSON
448
  if "description" in json_data_from_history and "intermediate_data_for_llm" not in json_data_from_history: # Avoid conflict if key names overlap
 
469
  if user_query.startswith("FINAL_FORMATTING_REQUEST:"):
470
  query_for_classification = user_query.split("Original query: ", 1)[-1] if "Original query: " in user_query else user_query
471
 
472
+ # Literature offers are now controlled by the toggle button in the frontend
 
 
 
473
  final_explanation = base_explanation
 
 
474
 
475
  return {
476
+ "thought": "I have retrieved the top transcription factors as requested from history and will present them directly.",
477
  "status": "CODE_COMPLETE",
478
  "python_code": "",
479
  "explanation": final_explanation
 
484
  intermediate_content = json_data_from_history["intermediate_data_for_llm"]
485
 
486
  # Determine if this data is from a literature search tool
487
+ # Check the structure of the data to identify if it's literature data
488
  is_literature_search_data = False
489
+
490
+ print(f"[DEBUG] Checking intermediate_data_for_llm structure for literature detection")
491
+ print(f"[DEBUG] - Data type: {type(intermediate_content)}")
492
+
493
+ if isinstance(intermediate_content, list) and intermediate_content:
494
+ # Check if the first item has literature-like structure
495
+ first_item = intermediate_content[0]
496
+ print(f"[DEBUG] - First item type: {type(first_item)}")
497
+
498
+ if isinstance(first_item, dict):
499
+ first_item_keys = list(first_item.keys())
500
+ print(f"[DEBUG] - First item keys: {first_item_keys}")
501
+
502
+ # Literature papers typically have title, authors, abstract, etc.
503
+ literature_indicators = ['title', 'authors', 'abstract', 'doi', 'source_api']
504
+
505
+ # Check if at least 2 literature indicators are present
506
+ found_indicators = sum(1 for key in literature_indicators if key in first_item_keys)
507
+ print(f"[DEBUG] - Found {found_indicators} literature indicators out of {len(literature_indicators)}")
508
+
509
+ if found_indicators >= 2:
510
+ is_literature_search_data = True
511
+ print(f"[DEBUG] - Identified as literature search data based on structure")
512
+
513
+ # Fallback: check content for literature search patterns (legacy method)
514
+ if not is_literature_search_data:
515
  if ("tools.multi_source_literature_search" in content_from_history or
516
+ "tools.fetch_text_from_urls" in content_from_history or
517
+ "CONTEXT_FROM_RESOURCE_FETCH" in content_from_history):
518
  is_literature_search_data = True
519
+ print(f"[DEBUG] - Identified as literature search data based on content patterns")
520
 
521
  if is_literature_search_data:
522
  print(f"[GenerationAgent] Found literature search data (intermediate_data_for_llm) in history. Proceeding to summarization.")
 
643
  else:
644
  print(f"[GenerationAgent] Found unknown JSON format in conversation history, continuing search")
645
 
646
+ except json.JSONDecodeError as e:
647
+ print(f"[DEBUG] - Turn {turn_index}: JSON parsing error: {e}")
648
+ print(f"[DEBUG] - Raw JSON string that failed: {json_string_from_history[:200]}...")
649
  continue # not valid JSON, skip
650
 
651
+ print(f"[DEBUG] GenerationAgent: No valid JSON blocks found in conversation history")
652
+
653
  # PHASE 1: No special conditions met, start with data/image fetching
654
  try:
655
  # Format conversation history for reference
 
682
  has_previous_offer = classification_context.get("has_previous_offer", False)
683
 
684
  classification_instructions = f"\\n\\nQUERY CLASSIFICATION CONTEXT:"
685
+ # Literature offers are now controlled by the toggle button in the frontend
686
+ classification_instructions += "\\n- Focus on providing clear, direct responses to user queries"
 
 
 
 
 
 
687
 
688
  comprehensive_text_prompt += classification_instructions
689
 
 
964
 
965
  def _append_literature_offer(self, explanation: str) -> str:
966
  """
967
+ DISABLED: Literature exploration options are now controlled by the toggle button in the frontend.
968
  """
969
+ # Literature offers are now controlled by the toggle button in the frontend
970
+ # Return explanation unchanged
971
+ return explanation
 
 
 
 
 
 
 
 
 
 
 
 
972
 
973
  if __name__ == '__main__':
974
  print("GenerationAgent should be orchestrated by the ManagerAgent.")
agents/manager_agent.py CHANGED
@@ -138,38 +138,6 @@ class ManagerAgent:
138
  # REMOVED: _request_literature_confirmation_upfront - no longer needed
139
  # Literature preferences are now handled as post-analysis options
140
 
141
- def handle_literature_confirmation(self, user_response: str, original_query: str = None) -> str:
142
- """
143
- LEGACY: Public method to handle literature confirmation from R/UI.
144
- NOTE: This method may no longer be needed with the new workflow, but kept for backward compatibility.
145
- Literature preferences are now handled as post-analysis followup requests.
146
- """
147
- print(f"[ManagerAgent] Received literature confirmation: {user_response}")
148
-
149
- # Get the stored query
150
- user_query = self.pending_literature_query or original_query
151
- if not user_query:
152
- return "No pending literature query found."
153
-
154
- # Clear the pending query
155
- self.pending_literature_query = None
156
-
157
- # Process the query with the specified literature preferences
158
- try:
159
- # Parse user preferences
160
- use_paper = user_response in ["both", "paper"]
161
- use_external_literature = user_response in ["both", "external"]
162
-
163
- print(f"[ManagerAgent] Processing with preferences - Paper: {use_paper}, External: {use_external_literature}")
164
- self._send_thought_to_r(f"Processing with literature preferences: {user_response}")
165
-
166
- # Continue with the full processing pipeline with preferences
167
- return self._process_with_literature_preferences(user_query, use_paper, use_external_literature)
168
-
169
- except Exception as e:
170
- error_msg = f"Error processing with literature preferences: {str(e)}"
171
- print(f"[ManagerAgent] {error_msg}")
172
- return error_msg
173
 
174
  def _continue_with_literature_plan(self, plan: dict) -> str:
175
  """Continue processing with the original plan that includes literature search."""
@@ -224,13 +192,14 @@ class ManagerAgent:
224
  print(f"[Manager._process_turn] Processing query: '{user_query_text[:100]}...'")
225
  self._send_thought_to_r(f"Processing query: '{user_query_text[:50]}...'") # THOUGHT
226
 
227
- # --- Process directly with default literature settings (both sources enabled) ---
228
- print(f"[Manager._process_turn] Processing with default literature settings")
229
- self._send_thought_to_r("Processing query with both literature sources enabled...")
 
230
  response_text = self._process_with_literature_preferences(
231
  user_query_text,
232
- use_paper=True,
233
- use_external_literature=True
234
  )
235
  return response_text, False, None
236
 
@@ -256,6 +225,9 @@ class ManagerAgent:
256
  current_query_for_generation_agent = user_query
257
  previous_generation_attempts = []
258
 
 
 
 
259
  # This variable will hold the File ID if the manager uploads a file and needs to re-call generate_code_plan
260
  image_file_id_for_analysis_step = None
261
 
@@ -275,7 +247,13 @@ class ManagerAgent:
275
  call_ga_again_for_follow_up = True
276
  current_plan_holder = final_plan_for_turn
277
 
278
- while call_ga_again_for_follow_up:
 
 
 
 
 
 
279
  call_ga_again_for_follow_up = False
280
 
281
  if not self.generation_agent:
@@ -286,6 +264,13 @@ class ManagerAgent:
286
 
287
  self._send_thought_to_r(f"Asking GenerationAgent for a plan with literature preferences...")
288
 
 
 
 
 
 
 
 
289
  # Pass literature preferences to GenerationAgent
290
  plan = self.generation_agent.generate_code_plan(
291
  user_query=effective_query_for_ga,
@@ -331,8 +316,17 @@ class ManagerAgent:
331
  if supervisor_status != "APPROVED_FOR_EXECUTION":
332
  return f"Code execution blocked by supervisor: {supervisor_feedback}"
333
 
 
 
 
 
 
 
334
  # Execute the code
335
  self._send_thought_to_r("Executing code...")
 
 
 
336
  execution_result = self.executor_agent.execute_code(code_to_execute)
337
  execution_output = execution_result.get("execution_output", "")
338
  execution_status = execution_result.get("execution_status", "UNKNOWN")
@@ -340,14 +334,26 @@ class ManagerAgent:
340
  if execution_status == "SUCCESS":
341
  self._send_thought_to_r(f"Code execution successful.")
342
 
 
 
 
 
343
  # Add results to conversation history
344
- self.conversation_history.append({"role": "assistant", "content": f"```json\n{execution_output}\n```"})
 
 
 
 
 
 
345
 
346
  # Always continue to GenerationAgent for final formatting
347
  # This ensures literature offers and proper response formatting
348
  if "intermediate_data_for_llm" in execution_output:
 
349
  call_ga_again_for_follow_up = True
350
  else:
 
351
  # Instead of returning raw execution output, let GenerationAgent format it
352
  call_ga_again_for_follow_up = True
353
  # Set a flag so GenerationAgent knows this is final formatting phase
@@ -381,6 +387,14 @@ class ManagerAgent:
381
  self.conversation_history.append({"role": "assistant", "content": error_msg})
382
  return error_msg
383
 
 
 
 
 
 
 
 
 
384
  def process_single_query(self, user_query_text: str, conversation_history_from_r: list = None) -> str:
385
  """
386
  Processes a single query, suitable for calling from an external system like R/Shiny.
 
138
  # REMOVED: _request_literature_confirmation_upfront - no longer needed
139
  # Literature preferences are now handled as post-analysis options
140
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
141
 
142
  def _continue_with_literature_plan(self, plan: dict) -> str:
143
  """Continue processing with the original plan that includes literature search."""
 
192
  print(f"[Manager._process_turn] Processing query: '{user_query_text[:100]}...'")
193
  self._send_thought_to_r(f"Processing query: '{user_query_text[:50]}...'") # THOUGHT
194
 
195
+ # --- Process with dynamic literature settings based on frontend preference ---
196
+ use_external_literature = getattr(self, 'literature_enabled', False) # Default to False
197
+ print(f"[Manager._process_turn] Processing with literature enabled: {use_external_literature}")
198
+ self._send_thought_to_r(f"Processing query with external literature: {'enabled' if use_external_literature else 'disabled'}")
199
  response_text = self._process_with_literature_preferences(
200
  user_query_text,
201
+ use_paper=True, # Keep paper (internal data) always enabled
202
+ use_external_literature=use_external_literature # Use frontend preference
203
  )
204
  return response_text, False, None
205
 
 
225
  current_query_for_generation_agent = user_query
226
  previous_generation_attempts = []
227
 
228
+ # Track attempted operations to prevent infinite loops
229
+ attempted_operations = set()
230
+
231
  # This variable will hold the File ID if the manager uploads a file and needs to re-call generate_code_plan
232
  image_file_id_for_analysis_step = None
233
 
 
247
  call_ga_again_for_follow_up = True
248
  current_plan_holder = final_plan_for_turn
249
 
250
+ while call_ga_again_for_follow_up and current_data_fetch_attempt < max_data_fetch_attempts_per_generation:
251
+ current_data_fetch_attempt += 1
252
+ print(f"[DEBUG] Data fetch attempt {current_data_fetch_attempt}/{max_data_fetch_attempts_per_generation}")
253
+
257
  call_ga_again_for_follow_up = False
258
 
259
  if not self.generation_agent:
 
264
 
265
  self._send_thought_to_r(f"Asking GenerationAgent for a plan with literature preferences...")
266
 
267
+ # DEBUG: Log conversation history being passed to GenerationAgent
268
+ print(f"[DEBUG] Passing conversation history to GenerationAgent:")
269
+ print(f"[DEBUG] - History length: {len(self.conversation_history)}")
270
+ print(f"[DEBUG] - History roles: {[msg['role'] for msg in self.conversation_history]}")
271
+ start_idx = max(0, len(self.conversation_history) - 3)
+ for i, msg in enumerate(self.conversation_history[-3:]): # Show last 3 messages
272
+ print(f"[DEBUG] - Message {start_idx + i}: {msg['role']} - {msg['content'][:100]}...")
273
+
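The history logging above recurs in several places in this file; the same output could come from one small helper that also guards the index math when the history holds fewer than three messages (a sketch only, not part of this commit):

```python
def debug_history(history: list, tail: int = 3) -> None:
    """Print a compact view of the last few conversation turns."""
    print(f"[DEBUG] history length: {len(history)}; roles: {[m['role'] for m in history]}")
    start = max(0, len(history) - tail)
    for offset, msg in enumerate(history[start:]):
        print(f"[DEBUG] - message {start + offset}: {msg['role']} - {msg['content'][:100]}...")
```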
274
  # Pass literature preferences to GenerationAgent
275
  plan = self.generation_agent.generate_code_plan(
276
  user_query=effective_query_for_ga,
 
316
  if supervisor_status != "APPROVED_FOR_EXECUTION":
317
  return f"Code execution blocked by supervisor: {supervisor_feedback}"
318
 
319
+ # Check if this operation has been attempted before to prevent loops
320
+ operation_signature = f"{code_to_execute.strip()[:100]}" # Use first 100 chars as signature
321
+ if operation_signature in attempted_operations:
322
+ print(f"[DEBUG] Loop detected! Operation already attempted: {operation_signature[:50]}...")
323
+ return "Loop detected: This operation has already been attempted. Please try a different approach."
324
+
325
  # Execute the code
326
  self._send_thought_to_r("Executing code...")
327
+ attempted_operations.add(operation_signature)
328
+ print(f"[DEBUG] Added operation to attempted set: {operation_signature[:50]}...")
329
+
330
  execution_result = self.executor_agent.execute_code(code_to_execute)
331
  execution_output = execution_result.get("execution_output", "")
332
  execution_status = execution_result.get("execution_status", "UNKNOWN")
 
334
  if execution_status == "SUCCESS":
335
  self._send_thought_to_r(f"Code execution successful.")
336
 
337
+ # DEBUG: Log conversation history before storing results
338
+ print(f"[DEBUG] Conversation history length before storing ExecutorAgent result: {len(self.conversation_history)}")
339
+ print(f"[DEBUG] ExecutorAgent output being stored: {execution_output[:200]}...")
340
+
341
  # Add results to conversation history
342
+ stored_content = f"```json\n{execution_output}\n```"
343
+ self.conversation_history.append({"role": "assistant", "content": stored_content})
344
+
345
+ # DEBUG: Log conversation history after storing results
346
+ print(f"[DEBUG] Conversation history length after storing result: {len(self.conversation_history)}")
347
+ print(f"[DEBUG] Last conversation entry: {self.conversation_history[-1]['content'][:200]}...")
348
+ print(f"[DEBUG] Full conversation history roles: {[msg['role'] for msg in self.conversation_history]}")
349
 
350
  # Always continue to GenerationAgent for final formatting
351
  # This ensures literature offers and proper response formatting
352
  if "intermediate_data_for_llm" in execution_output:
353
+ print(f"[DEBUG] Found 'intermediate_data_for_llm' in output - continuing to GenerationAgent for processing")
354
  call_ga_again_for_follow_up = True
355
  else:
356
+ print(f"[DEBUG] No 'intermediate_data_for_llm' found - requesting final formatting from GenerationAgent")
357
  # Instead of returning raw execution output, let GenerationAgent format it
358
  call_ga_again_for_follow_up = True
359
  # Set a flag so GenerationAgent knows this is final formatting phase
 
387
  self.conversation_history.append({"role": "assistant", "content": error_msg})
388
  return error_msg
389
 
390
+ def process_single_query_with_preferences(self, user_query_text: str,
391
+ conversation_history_from_r: list = None,
392
+ literature_enabled: bool = False) -> str:  # default matches the toggle's disabled state
393
+ """Process query with explicit literature preference from frontend."""
394
+ print(f"[Manager.process_single_query_with_preferences] Literature enabled: {literature_enabled}")
395
+ self.literature_enabled = literature_enabled
396
+ return self.process_single_query(user_query_text, conversation_history_from_r)
397
+
398
  def process_single_query(self, user_query_text: str, conversation_history_from_r: list = None) -> str:
399
  """
400
  Processes a single query, suitable for calling from an external system like R/Shiny.
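A minimal sketch of how the new entry point is meant to be called from the R side (the query text and empty history here are placeholders):

```python
# manager is an initialized ManagerAgent instance.
reply = manager.process_single_query_with_preferences(
    "Which TFs score highest in TRM cells?",
    conversation_history_from_r=[],
    literature_enabled=False,  # mirrors the toggle's default-off state
)
print(reply)
```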
agents/supervisor_agent.py CHANGED
@@ -1,268 +1,268 @@
1
- # agents/supervisor_agent.py
2
-
3
- import json
4
- import time # Added for polling
5
- import os # Added for path operations
6
- from openai import OpenAI # Ensure OpenAI is imported for client usage
7
-
8
- # --- Constants for the Supervisor Assistant ---
9
- SUPERVISOR_ASSISTANT_NAME = "TaijiChat Code Review Assistant"
10
- SUPERVISOR_INSTRUCTIONS_TEMPLATE_FILE = "supervisor_instructions_template.md" # New
11
-
12
- # Define JSON examples as separate strings
13
- EXAMPLE_JSON_APPROVED = '''
14
- {
15
- "safety_feedback": "Code uses the 'tools' module correctly and employs permitted built-in functions for data processing. No forbidden operations detected.",
16
- "safety_status": "APPROVED_FOR_EXECUTION",
17
- "user_facing_rejection_reason": "Approved."
18
- }
19
- '''
20
-
21
- EXAMPLE_JSON_REJECTED = '''
22
- {
23
- "safety_feedback": "Forbidden operation detected: Code attempts to import the 'os' module, which is not allowed (Rule I.4).",
24
- "safety_status": "REJECTED_NEEDS_REVISION",
25
- "user_facing_rejection_reason": "The code attempted a restricted operation that is not permitted for safety reasons."
26
- }
27
- '''
28
-
29
- # SUPERVISOR_ASSISTANT_INSTRUCTIONS f-string is now removed. It will be loaded from the template file.
30
-
31
- POLLING_INTERVAL_S = 1
32
- MAX_POLLING_ATTEMPTS = 60
33
-
34
- class SupervisorAgent:
35
- def __init__(self, client_openai: OpenAI = None):
36
- self.client = client_openai
37
- self.supervisor_assistant = None
38
- self.formatted_instructions = self._load_and_format_instructions() # Load instructions
39
-
40
- if not self.formatted_instructions:
41
- print("SupervisorAgent Critical: Failed to load or format supervisor instructions. Assistant may not function correctly.")
42
- # Potentially raise an error or set a flag indicating a critical failure
43
-
44
- if self.client:
45
- try:
46
- self._create_or_retrieve_supervisor_assistant()
47
- print("SupervisorAgent: Successfully created/retrieved Supervisor Assistant.")
48
- except Exception as e:
49
- print(f"SupervisorAgent Error: Could not create/retrieve/update Supervisor Assistant: {str(e)}")
50
- self.supervisor_assistant = None
51
- else:
52
- print("SupervisorAgent Critical: OpenAI client not provided. Cannot create Supervisor Assistant.")
53
-
54
- def _load_and_format_instructions(self) -> str:
55
- """Loads instructions from a template file and formats them with JSON examples."""
56
- try:
57
- # Construct path relative to this file's location
58
- script_dir = os.path.dirname(os.path.abspath(__file__))
59
- template_path = os.path.join(script_dir, SUPERVISOR_INSTRUCTIONS_TEMPLATE_FILE)
60
-
61
- with open(template_path, 'r', encoding='utf-8') as f:
62
- template_content = f.read()
63
-
64
- return template_content.format(
65
- example_approved_json_str=EXAMPLE_JSON_APPROVED,
66
- example_rejected_json_str=EXAMPLE_JSON_REJECTED
67
- )
68
- except FileNotFoundError:
69
- print(f"SupervisorAgent Error: Instructions template file not found at {template_path}")
70
- return None
71
- except KeyError as e:
72
- print(f"SupervisorAgent Error: Placeholder key missing in instructions template: {e}")
73
- return None
74
- except Exception as e:
75
- print(f"SupervisorAgent Error: Failed to load/format instructions: {e}")
76
- return None
77
-
78
- def _create_or_retrieve_supervisor_assistant(self):
79
- if not self.client or not self.formatted_instructions:
80
- if not self.formatted_instructions:
81
- print("SupervisorAgent Error: Cannot create/retrieve assistant because instructions are missing.")
82
- if not self.client:
83
- print("SupervisorAgent Error: Cannot create/retrieve assistant because OpenAI client is missing.")
84
- return
85
- try:
86
- print("SupervisorAgent: Attempting to list existing assistants...")
87
- try:
88
- assistants = self.client.beta.assistants.list(order="desc", limit=20)
89
- except Exception as list_error:
90
- print(f"SupervisorAgent Error: Failed to list assistants. Error type: {type(list_error).__name__}, Error: {str(list_error)}")
91
- if hasattr(list_error, 'response'):
92
- print(f"SupervisorAgent Error: Response status: {list_error.response.status_code if hasattr(list_error.response, 'status_code') else 'N/A'}")
93
- print(f"SupervisorAgent Error: Response body: {list_error.response.text if hasattr(list_error.response, 'text') else 'N/A'}")
94
- raise
95
- found_assistant = None
96
- for assistant in assistants.data:
97
- if assistant.name == SUPERVISOR_ASSISTANT_NAME:
98
- found_assistant = assistant
99
- print(f"SupervisorAgent: Found existing assistant with ID: {assistant.id}")
100
- break
101
-
102
- if found_assistant:
103
- print(f"SupervisorAgent: Updating existing assistant {found_assistant.id}...")
104
- try:
105
- self.supervisor_assistant = self.client.beta.assistants.update(
106
- assistant_id=found_assistant.id,
107
- instructions=self.formatted_instructions,
108
- model="gpt-4o",
109
- tools=[]
110
- )
111
- print(f"SupervisorAgent: Successfully updated assistant {self.supervisor_assistant.id}")
112
- except Exception as update_error:
113
- print(f"SupervisorAgent Error: Failed to update assistant: {str(update_error)}")
114
- raise
115
- else:
116
- print(f"SupervisorAgent: Creating new assistant '{SUPERVISOR_ASSISTANT_NAME}'...")
117
- try:
118
- self.supervisor_assistant = self.client.beta.assistants.create(
119
- name=SUPERVISOR_ASSISTANT_NAME,
120
- instructions=self.formatted_instructions,
121
- model="gpt-4o",
122
- tools=[]
123
- )
124
- print(f"SupervisorAgent: Successfully created assistant with ID: {self.supervisor_assistant.id}")
125
- except Exception as create_error:
126
- print(f"SupervisorAgent Error: Failed to create assistant: {str(create_error)}")
127
- raise
128
-
129
- if not self.supervisor_assistant:
130
- raise Exception("Assistant object is None after creation/update")
131
-
132
- except Exception as e:
133
- print(f"SupervisorAgent Error: Could not create/retrieve/update Supervisor Assistant: {str(e)}")
134
- self.supervisor_assistant = None
135
- raise # Re-raise the exception to be handled by the caller
136
-
137
- def review_code(self, python_code: str, thought: str): # Removed client_openai from params
138
- print(f"SupervisorAgent.review_code received code. Thought: {thought[:100]}...") # Log more of the thought
139
-
140
- if not python_code.strip():
141
- print("SupervisorAgent: No actual code provided for review. Approving as safe.")
142
- return {"safety_feedback": "No code provided by Generation Agent.", "safety_status": "APPROVED_FOR_EXECUTION", "user_facing_rejection_reason": ""}
143
-
144
- if not self.client or not self.supervisor_assistant:
145
- print("SupervisorAgent Error: OpenAI client or Supervisor Assistant not available for code review.")
146
- return {"safety_feedback": "Error: Supervisor Agent not properly initialized.", "safety_status": "REJECTED_NEEDS_REVISION", "user_facing_rejection_reason": "The supervisor agent encountered an error."}
147
-
148
- thread = None # Initialize for the finally block
149
- try:
150
- # 1. Construct User Message Content
151
- user_message_content = (
152
- f"Please review the following Python code for safety and correctness based on your instructions.\\n\\n"
153
- f"Context (AI's plan that generated this code):\\n{thought}\\n\\n"
154
- f"Python Code to Review:\\n```python\\n{python_code}\\n```\\n"
155
- f"Ensure your response is only the required JSON object."
156
- )
157
-
158
- # 2. Create a Thread
159
- # print("SupervisorAgent: Creating new Thread for review...")
160
- thread = self.client.beta.threads.create()
161
- # print(f"SupervisorAgent: Thread created: {thread.id}")
162
-
163
- # 3. Add message to Thread
164
- self.client.beta.threads.messages.create(
165
- thread_id=thread.id,
166
- role="user",
167
- content=user_message_content
168
- )
169
- # print("SupervisorAgent: Message added to Thread.")
170
-
171
- # 4. Create Run
172
- # print(f"SupervisorAgent: Creating Run for Assistant {self.supervisor_assistant.id} on Thread {thread.id}...")
173
- run = self.client.beta.threads.runs.create(
174
- thread_id=thread.id,
175
- assistant_id=self.supervisor_assistant.id
176
- )
177
- # print(f"SupervisorAgent: Run created: {run.id}, Status: {run.status}")
178
-
179
- # 5. Poll Run for completion
180
- attempts = 0
181
- while run.status in ["queued", "in_progress"] and attempts < MAX_POLLING_ATTEMPTS:
182
- time.sleep(POLLING_INTERVAL_S)
183
- run = self.client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
184
- # print(f"SupervisorAgent: Polling Run {run.id}: Status: {run.status}") # Verbose
185
- attempts += 1
186
-
187
- # 6. Process Run Outcome
188
- if run.status == "completed":
189
- # print(f"SupervisorAgent: Run {run.id} completed.")
190
- messages_response = self.client.beta.threads.messages.list(thread_id=thread.id, order="desc", limit=1)
191
- if messages_response.data and messages_response.data[0].content and messages_response.data[0].content[0].type == "text":
192
- assistant_response_json_str = messages_response.data[0].content[0].text.value
193
- # print(f"SupervisorAgent: Raw LLM review JSON: {assistant_response_json_str}")
194
-
195
- # Strip markdown fences if present
196
- if assistant_response_json_str.startswith("```json"):
197
- assistant_response_json_str = assistant_response_json_str[len("```json"):].strip()
198
- if assistant_response_json_str.startswith("```"):
199
- assistant_response_json_str = assistant_response_json_str[len("```"):].strip()
200
- if assistant_response_json_str.endswith("```"):
201
- assistant_response_json_str = assistant_response_json_str[:-len("```")].strip()
202
-
203
- try:
204
- parsed_response = json.loads(assistant_response_json_str)
205
- # Validate structure
206
- if not all(k in parsed_response for k in ["safety_feedback", "safety_status", "user_facing_rejection_reason"]):
207
- print("SupervisorAgent Error: LLM review JSON missing required keys.")
208
- return {
209
- "safety_feedback": "Internal Error: LLM review response malformed (missing keys).",
210
- "safety_status": "REJECTED_NEEDS_REVISION",
211
- "user_facing_rejection_reason": "The code review process encountered an internal error."
212
- }
213
- # Validate safety_status value
214
- if parsed_response["safety_status"] not in ["APPROVED_FOR_EXECUTION", "REJECTED_NEEDS_REVISION"]:
215
- print(f"SupervisorAgent Error: LLM returned an invalid safety_status value: {parsed_response['safety_status']}.")
216
- # Override with rejection if status is invalid
217
- parsed_response["safety_feedback"] += " (Original status was invalid)"
218
- parsed_response["safety_status"] = "REJECTED_NEEDS_REVISION"
219
- # Ensure user_facing_rejection_reason is generic if not already sensible
220
- if not parsed_response.get("user_facing_rejection_reason"):
221
- parsed_response["user_facing_rejection_reason"] = "The code could not be validated due to an internal status error."
222
-
223
- # Ensure user_facing_rejection_reason is present if rejected, and appropriate if approved
224
- if parsed_response["safety_status"] == "REJECTED_NEEDS_REVISION" and not parsed_response.get("user_facing_rejection_reason","").strip():
225
- parsed_response["user_facing_rejection_reason"] = "The proposed code could not be approved due to safety or correctness concerns." # Default user reason
226
- elif parsed_response["safety_status"] == "APPROVED_FOR_EXECUTION" and not parsed_response.get("user_facing_rejection_reason","").strip():
227
- parsed_response["user_facing_rejection_reason"] = "Approved."
228
-
229
- return parsed_response
230
- except json.JSONDecodeError as e:
231
- print(f"SupervisorAgent JSONDecodeError: Could not parse LLM review JSON: {e}. Response: {assistant_response_json_str}")
232
- return {
233
- "safety_feedback": f"Internal Error: Failed to parse LLM review JSON. {e}",
234
- "safety_status": "REJECTED_NEEDS_REVISION",
235
- "user_facing_rejection_reason": "The code review result was unreadable."
236
- }
237
- else:
238
- print("SupervisorAgent Error: No valid message content from assistant after review run completion.")
239
- return {
240
- "safety_feedback": "Internal Error: No content from supervisor assistant.",
241
- "safety_status": "REJECTED_NEEDS_REVISION",
242
- "user_facing_rejection_reason": "The supervisor agent provided no response."
243
- }
244
- else:
245
- error_message = f"Review run failed or timed out. Status: {run.status}"
246
- if run.last_error:
247
- error_message += f" Last Error: {run.last_error.message}"
248
- print(f"SupervisorAgent Error: {error_message}")
249
- return {"safety_feedback": error_message, "safety_status": "REJECTED_NEEDS_REVISION", "user_facing_rejection_reason": "The code review process encountered an error."}
250
- except Exception as e:
251
- print(f"SupervisorAgent Error: General exception during review_code: {e}")
252
- return {
253
- "safety_feedback": f"General exception in review_code: {e}",
254
- "safety_status": "REJECTED_NEEDS_REVISION",
255
- "user_facing_rejection_reason": "A general error occurred during code review."
256
- }
257
- finally:
258
- # 7. Delete Thread
259
- if thread:
260
- try:
261
- # print(f"SupervisorAgent: Deleting Thread {thread.id}...")
262
- self.client.beta.threads.delete(thread.id)
263
- # print(f"SupervisorAgent: Thread {thread.id} deleted.")
264
- except Exception as e:
265
- print(f"SupervisorAgent Error: Failed to delete Thread {thread.id if thread else 'Unknown'}: {e}")
266
-
267
- if __name__ == '__main__':
268
  print("SupervisorAgent should be orchestrated by the ManagerAgent.")
 
cache_data/excel_schema_cache.json CHANGED
@@ -1,6 +1,6 @@
1
- {
2
- "freshness_signature": {
3
- "multi-omicsdata.xlsx": 1748219732.5558078
4
- },
5
- "formatted_schema_string": "DYNAMICALLY DISCOVERED EXCEL SCHEMAS (first sheet columns shown):\n- File: 'www\\multi-omicsdata.xlsx' (Identifier: 'multi_omicsdata')\n Sheets: [Dataset detail, specific TF-Taiji, newspecific TF-Taiji, Validation list, T cell differentiation map, Kay nanostring panel, Meeting summary, Kays wetlab to do, Wang Lab work to do, Philip et al. 2017 Nature sampl, T cell migration associated gen, KO EX-> proEX, Reprogramming MP -> TRM, Reprogramming TEX -> TRM, Reprogramming TEX -> proEX]\n Columns (first sheet): [Author, Lab, Year, DOI, Accession, Data type, Species, Infection, Naive, MP (MPEC), ... (total 23 columns)]\n- File: 'www\\networkanalysis\\comp_log2FC_RegulatedData_TRMTEXterm.xlsx' (Identifier: 'comp_log2FC_RegulatedData_TRMTEXterm')\n Sheets: [Worksheet]\n Columns (first sheet): [Unnamed: 0, Ahr, Arid3a, Arnt, Arntl, Atf1, Atf2, Atf3, Atf4, Atf7, ... (total 199 columns)]\n- File: 'www\\old files\\log2FC_RegulatedData_TRMTEXterm.xlsx' (Identifier: 'log2FC_RegulatedData_TRMTEXterm')\n Sheets: [Worksheet]\n Columns (first sheet): [Unnamed: 0, Ahr, Arid3a, Arnt, Arntl, Atf1, Atf2, Atf3, Atf4, Atf7, ... (total 199 columns)]\n- File: 'www\\tablePagerank\\MP.xlsx' (Identifier: 'MP')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\Naive.xlsx' (Identifier: 'Naive')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\Table_TF PageRank Scores for Audrey.xlsx' (Identifier: 'Table_TF_PageRank_Scores_for_Audrey')\n Sheets: [Fig_1F (Multi state-specific TF, Fig_1G (Single state-specific T]\n Columns (first sheet): [Unnamed: 0, Category, Cell-state specificity, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, ... (total 45 columns)]\n- File: 'www\\tablePagerank\\TCM.xlsx' (Identifier: 'TCM')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\TE.xlsx' (Identifier: 'TE')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\TEM.xlsx' (Identifier: 'TEM')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\TEXeff.xlsx' (Identifier: 'TEXeff')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... 
(total 43 columns)]\n- File: 'www\\tablePagerank\\TEXprog.xlsx' (Identifier: 'TEXprog')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\TEXterm.xlsx' (Identifier: 'TEXterm')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tablePagerank\\TRM.xlsx' (Identifier: 'TRM')\n Sheets: [Sheet1]\n Columns (first sheet): [TF, Naive_Kaech_Kaech, Naive_Kaech_Chung, Naive_Mackay_Chung, Naive_MilnerAug_Chung, Naive_Renkema_Chung, Naive_Scott_Scott, MP_Kaech_Chung, MP_Kaech_Kaech, MP_Kaech_Scott, ... (total 43 columns)]\n- File: 'www\\tfcommunities\\texcommunities.xlsx' (Identifier: 'texcommunities')\n Sheets: [TEX Communities, TEX_c1, TEX_c2, TEX_c3, TEX_c4, TEX_c5, TRM Communities, TRM_c1, TRM_c2, TRM_c3, TRM_c4, TRM_c5]\n Columns (first sheet): [TEX Communities, TF Members]\n- File: 'www\\tfcommunities\\trmcommunities.xlsx' (Identifier: 'trmcommunities')\n Sheets: [Sheet1]\n Columns (first sheet): [TRM Communities, TF Members]\n- File: 'www\\TFcorintextrm\\TF-TFcorTRMTEX.xlsx' (Identifier: 'TF_TFcorTRMTEX')\n Sheets: [Sheet1]\n Columns (first sheet): [TF Name, TF Merged Graph Path]\n- File: 'www\\waveanalysis\\searchtfwaves.xlsx' (Identifier: 'searchtfwaves')\n Sheets: [Sheet1]\n Columns (first sheet): [Wave 1, Wave 2, Wave 3, Wave 4, Wave 5, Wave 6, Wave 7]"
6
  }
 
chat_ui.R CHANGED
@@ -45,6 +45,14 @@ chatSidebarUI <- function() {
45
  style = "height: calc(100vh - 200px); overflow-y: auto; border: 1px solid #ccc; padding: 10px; margin-bottom: 10px; background-color: white;"
46
  # Placeholder for messages
47
  ),
 
 
 
 
 
 
 
 
48
  div(
49
  class = "chat-input-area", # For styling input area
50
  style = "display: flex; align-items: stretch;",
 
45
  style = "height: calc(100vh - 200px); overflow-y: auto; border: 1px solid #ccc; padding: 10px; margin-bottom: 10px; background-color: white;"
46
  # Placeholder for messages
47
  ),
48
+ div(
49
+ class = "literature-toggle-container",
50
+ style = "margin: 5px 0;",
51
+ actionButton("literatureToggleBtn",
52
+ HTML('<i class="fa fa-search"></i> External Literature'),
53
+ class = "btn btn-outline-info literature-toggle-btn",
54
+ style = "width: 100%; font-size: 12px;")
55
+ ),
56
  div(
57
  class = "chat-input-area", # For styling input area
58
  style = "display: flex; align-items: stretch;",
codebase_analysis.md CHANGED
@@ -1,153 +1,153 @@
1
- # How to Run the Application
2
-
3
- To run this R Shiny application, you will need R and the RStudio IDE (recommended) or another R environment installed on your system. You will also need the `shiny` package and other packages listed as dependencies (`readxl`, `DT`, `dplyr`, `shinythemes`).
4
-
5
- **Steps:**
6
-
7
- 1. **Install R and RStudio:** If you haven't already, download and install R from [CRAN](https://cran.r-project.org/) and RStudio Desktop from [Posit](https://posit.co/download/rstudio-desktop/).
8
- 2. **Install Required R Packages:** Open R or RStudio and run the following commands in the R console:
9
- ```R
10
- install.packages(c("shiny", "readxl", "DT", "dplyr", "shinythemes"))
11
- ```
12
- 3. **Set Working Directory:** Set your R session's working directory to the root folder of this Shiny application (the folder containing `server.R` and `ui.R`). In RStudio, open either `server.R` or `ui.R` and go to `Session > Set Working Directory > To Source File Location`.
13
- 4. **Run the App:** In the R console, execute the following command:
14
- ```R
15
- shiny::runApp()
16
- ```
17
- Alternatively, if you have `server.R` or `ui.R` open in RStudio, a "Run App" button will typically appear at the top of the editor pane, which you can click.
18
-
19
- This will launch the application in your default web browser.
20
-
21
- ---
22
-
23
- # Codebase Analysis: TaijiChat Shiny Application
24
-
25
- ## Overview
26
-
27
- The codebase consists of an R Shiny application designed to explore and visualize bioinformatics data related to T cell states and transcription factors (TFs). It appears to be a companion tool for a research publication, aiming to make complex datasets accessible. The application is structured into two main files: `server.R` (server-side logic) and `ui.R` (user interface definition). Data is primarily loaded from Excel files and images stored in a `www/` subdirectory.
28
-
29
- ## File Breakdown
30
-
31
- ### `server.R` (Server Logic)
32
-
33
- **Key Functionalities:**
34
-
35
- 1. **Data Loading and Preprocessing:**
36
- * Loads multiple Excel datasets for TF PageRank scores, TF wave analysis, TF-TF correlations, TF communities, and multi-omics data. These files are located in `www/tablePagerank/`, `www/waveanalysis/`, `www/TFcorintextrm/`, and `www/tfcommunities/`.
37
- * `new_read_excel_file()`: Reads and transposes Excel files, setting "Regulator Names" from the first column and using the original first row as new column headers.
38
- * `new_filter_data()`: Filters transposed dataframes by column names based on user search input (supports multiple comma-separated, case-insensitive keywords).
39
-
40
- 2. **TF Catalog Data Display (Repetitive Structure):**
41
- * Handles data for Overall TF PageRank, Naive, TE, MP, TCM, TEM, TRM, TEXprog, TEXeff-like, and TEXterm cell states.
42
- * For each dataset:
43
- * Uses `reactiveVal` for column pagination state (4 columns per page).
44
- * `observeEvent`s for "next" and "previous" button functionality.
45
- * Reactive expressions filter data by search term and select columns for the current page.
46
- * Dynamically inserts a styled "Cell state data" row with "TF activity score" (at row index 2 for main PageRank table, row index 0 for others).
47
- * `renderDT` outputs `DT::datatable` with custom options (fixed 45 rows, no search box, JS `rowCallback` to highlight the "TF activity score" row).
48
-
49
- 3. **TF Wave Analysis:**
50
- * Loads TF wave data from `www/waveanalysis/searchtfwaves.xlsx`.
51
- * Allows users to search for a TF and view its associated wave(s) in a transposed table.
52
-
53
- 4. **TF-TF Correlation in TRM/TEXterm:**
54
- * Loads data from `www/TFcorintextrm/TF-TFcorTRMTEX.xlsx`.
55
- * Allows TF search.
56
- * Renders a clickable list of TFs (`actionLink`s).
57
- * Displays tabular data and an associated image ("TF Merged Graph Path") for the selected/searched TF.
58
-
59
- 5. **TF Communities:**
60
- * Loads data from `www/tfcommunities/trmcommunities.xlsx` and `www/tfcommunities/texcommunities.xlsx`.
61
- * Displays them as simple `DT::datatable` objects.
62
-
63
- 6. **Multi-omics Data Table:**
64
- * Loads data from `www/multi-omicsdata.xlsx`.
65
- * Renders as a `DT::datatable`, creating hyperlinks in the "Author" column from a "DOI" column, removing empty columns, and enabling scrolling.
66
-
67
- 7. **Navigation & Other:**
68
- * `observeEvent`s for UI element clicks (e.g., `input$c1_link`) to navigate tabs via `updateNavbarPage`.
69
- * Redirects to a bioRxiv paper URL via `session$sendCustomMessage`.
70
- * Contains significant commented-out code (older logic).
71
-
72
- **Libraries Used:** `shiny`, `readxl`, `DT`, `dplyr`.
73
-
74
- ### `ui.R` (User Interface)
75
-
76
- **Key Functionalities:**
77
-
78
- 1. **Overall Structure:**
79
- * Uses `shinytheme("flatly")`.
80
- * `navbarPage` for the main tabbed interface.
81
- * Custom CSS for fonts (`Arial`).
82
- * JavaScript for URL redirection and a modal dialog.
83
-
84
- 2. **Home Tab:**
85
- * Project/study description.
86
- * Layout with an image (`homedesc.png`) featuring clickable `actionLink`s for navigation.
87
- * "Read Now" button linking to the research paper.
88
- * Footer with lab links and logos.
89
-
90
- 3. **TF Catalog (`navbarMenu`):**
91
- * **"Search TF Scores" Tab:**
92
- * Explanatory text, image (`tfcat/onlycellstates.png`).
93
- * Search input (`search_input`), column pagination buttons (`prev_btn`, `next_btn`), `DTOutput("table")`.
94
- * **"Cell State Specific TF Catalog" Tab (`navlistPanel`):**
95
- * Sub-tabs for Naive, TE, MP, Tcm, Tem, Trm, TEXprog, TEXeff-like, TEXterm.
96
- * Each sub-tab has a consistent layout: header, text, a specific bubble plot image (from `www/bubbleplots/`), search input, pagination buttons, and `DTOutput`.
97
- * **"Multi-State TFs" Tab:** Displays a heatmap image (`tfcat/multistatesheatmap.png`).
98
-
99
- 4. **TF Wave Analysis (`navbarMenu`):**
100
- * **"Overview" Tab:**
101
- * Explanatory text, overview image (`tfwaveanal.png`).
102
- * Clickable images (`waveanalysis/c1.jpg` to `c6.jpg`, linked via `c1_link` etc.) for navigation to detail tabs.
103
- * Search input (`search_input_wave`), `DTOutput("table_wave")`.
104
- * **Individual Wave Tabs ("Wave 1" to "Wave 7"):**
105
- * Each tab displays the wave image, a GO KEGG result image, and "Ranked Text" image(s) from `www/waveanalysis/` and `www/waveanalysis/txtJPG/`.
106
-
107
- 5. **TF Network Analysis (`navbarMenu`):**
108
- * **"Search TF-TF correlation in TRM/TEXterm" Tab:**
109
- * Methodology description, image (`networkanalysis/tfcorrdesc.png`).
110
- * `sidebarLayout` with search input (`search`), button (`search_btn`), `tableOutput("gene_list_table")` for available TFs.
111
- * `mainPanel` with `tableOutput("result_table")`, legend, and `uiOutput("image_gallery")`.
112
- * Footer with citations.
113
- * **"TRM/TEXterm TF communities" Tab:**
114
- * Descriptive text, images (`networkanalysis/community.jpg`, `networkanalysis/trmtexcom.png`, `networkanalysis/tfcompathway.png`).
115
- * Two `DTOutput`s (`trmcom`, `texcom`) for community tables.
116
- * Footer with citations.
117
-
118
- 6. **Multi-omics Data Tab:**
119
- * Header, text, `dataTableOutput("multiomicsdatatable")`.
120
-
121
- 7. **Global Header Elements:**
122
- * Defines a modal dialog and associated JavaScript (triggered by an element `#csdescrip_link`, not explicitly found in the provided UI snippets for the main content area).
123
- * JavaScript to send a Shiny input upon `#c1_link` click.
124
-
125
- **Libraries Used:** `shiny`, `shinythemes`, `DT`.
126
-
127
- ## General Architecture and Observations
128
-
129
- * **Purpose:** The application serves as an interactive data exploration tool, likely accompanying a scientific publication on T cell biology.
130
- * **Data Source:** Heavily reliant on pre-processed data stored in Excel files and pre-generated images within the `www/` directory. This indicates that the core data processing happens outside this Shiny app.
131
- * **Repetitive Code Structure:** Significant code duplication exists in both `server.R` and `ui.R`.
132
- * In `server.R`, the logic for loading, filtering, paginating, and rendering tables for the nine different cell state TF scores is nearly identical.
133
- * In `ui.R`, the layout for each of these cell state specific tabs, and also for each of the seven individual TF wave analysis tabs, is highly repetitive.
134
- * This repetition suggests a strong opportunity for refactoring by creating reusable R functions or Shiny modules to generate these UI and server components dynamically.
135
- * **User Interface (UI):** The UI is well-structured with a `navbarPage` and logical tab groupings. It provides good contextual information (descriptions, explanations of scores/plots) for users.
136
- * **Interactivity:**
137
- * Search functionality for TFs/regulators across various datasets.
138
- * Custom column-based pagination for wide tables.
139
- * Clickable images and links for navigation between sections.
140
- * Dynamic display of tables and images based on user selections.
141
- * **Modularity (Potential):** The application is not currently modularized, as the repetition shows, but the distinct analytical sections (TF Catalog, Wave Analysis, Network Analysis) are prime candidates for separation into Shiny modules if the application is expanded or refactored.
142
- * **Static Content:** A significant portion of the content, especially in the Wave Analysis and Network Analysis tabs, involves displaying pre-generated static images (plots, pathway results).
143
- * **Code Graveyard:** Both files end with a "CODE GRAVEYARD" comment, indicating that there's older, unused code present.
144
-
145
- ## Potential Areas for Improvement/Refactoring
146
-
147
- * **Modularization:** Encapsulate the repetitive UI and server logic for cell-state specific tables and individual wave pages into functions or Shiny modules to reduce code duplication and improve maintainability.
148
- * **Dynamic Image Generation (Optional):** If source data and plotting scripts were available, some images currently served statically could potentially be generated dynamically, offering more flexibility. However, for a publication companion app, static images are often sufficient and ensure reproducibility of figures.
149
- * **Consolidate Helper Functions:** General utility functions (like `new_read_excel_file` and `new_filter_data`) are well-defined; ensure they are used consistently wherever similar loading or filtering logic appears.
150
- * **CSS Styling:** Centralize CSS styling rather than relying heavily on inline `style` attributes within `tags$div` and other elements, potentially using a separate CSS file.
151
- * **Modal Trigger:** Verify that the `#csdescrip_link` element, which triggers the global modal, is present and functional in the UI.
152
-
153
  This analysis provides a snapshot of the codebase's structure, functionality, and potential areas for future development or refinement.
 
1
+ # How to Run the Application
2
+
3
+ To run this R Shiny application, you will need R and the RStudio IDE (recommended) or another R environment installed on your system. You will also need the `shiny` package and other packages listed as dependencies (`readxl`, `DT`, `dplyr`, `shinythemes`).
4
+
5
+ **Steps:**
6
+
7
+ 1. **Install R and RStudio:** If you haven't already, download and install R from [CRAN](https://cran.r-project.org/) and RStudio Desktop from [Posit](https://posit.co/download/rstudio-desktop/).
8
+ 2. **Install Required R Packages:** Open R or RStudio and run the following commands in the R console:
9
+ ```R
10
+ install.packages(c("shiny", "readxl", "DT", "dplyr", "shinythemes"))
11
+ ```
12
+ 3. **Set Working Directory:** Set your R session's working directory to the root folder of this Shiny application (the folder containing `server.R` and `ui.R`). In RStudio, you can do this by opening either `server.R` or `ui.R` and then going to `Session > Set Working Directory > To Source File Location`.
13
+ 4. **Run the App:** In the R console, execute the following command:
14
+ ```R
15
+ shiny::runApp()
16
+ ```
17
+ Alternatively, if you have `server.R` or `ui.R` open in RStudio, a "Run App" button will typically appear at the top of the editor pane, which you can click.
18
+
19
+ This will launch the application in your default web browser.
20
+
21
+ ---
22
+
23
+ # Codebase Analysis: TaijiChat Shiny Application
24
+
25
+ ## Overview
26
+
27
+ The codebase consists of an R Shiny application designed to explore and visualize bioinformatics data related to T cell states and transcription factors (TFs). It appears to be a companion tool for a research publication, aiming to make complex datasets accessible. The application is structured into two main files: `server.R` (server-side logic) and `ui.R` (user interface definition). Data is primarily loaded from Excel files and images stored in a `www/` subdirectory.
28
+
29
+ ## File Breakdown
30
+
31
+ ### `server.R` (Server Logic)
32
+
33
+ **Key Functionalities:**
34
+
35
+ 1. **Data Loading and Preprocessing:**
36
+ * Loads multiple Excel datasets for TF PageRank scores, TF wave analysis, TF-TF correlations, TF communities, and multi-omics data. These files are located in `www/tablePagerank/`, `www/waveanalysis/`, `www/TFcorintextrm/`, and `www/tfcommunities/`.
37
+ * `new_read_excel_file()`: Reads and transposes Excel files, setting "Regulator Names" from the first column and using the original first row as new column headers.
38
+ * `new_filter_data()`: Filters transposed dataframes by column names based on user search input (supports multiple comma-separated, case-insensitive keywords).
39
+
40
+ 2. **TF Catalog Data Display (Repetitive Structure):**
41
+ * Handles data for Overall TF PageRank, Naive, TE, MP, TCM, TEM, TRM, TEXprog, TEXeff-like, and TEXterm cell states.
42
+ * For each dataset:
43
+ * Uses `reactiveVal` for column pagination state (4 columns per page).
44
+ * `observeEvent`s for "next" and "previous" button functionality.
45
+ * Reactive expressions filter data by search term and select columns for the current page.
46
+ * Dynamically inserts a styled "Cell state data" row with "TF activity score" (at row index 2 for main PageRank table, row index 0 for others).
47
+ * `renderDT` outputs `DT::datatable` with custom options (fixed 45 rows, no search box, JS `rowCallback` to highlight the "TF activity score" row).
48
+
49
+ 3. **TF Wave Analysis:**
50
+ * Loads TF wave data from `www/waveanalysis/searchtfwaves.xlsx`.
51
+ * Allows users to search for a TF and view its associated wave(s) in a transposed table.
52
+
53
+ 4. **TF-TF Correlation in TRM/TEXterm:**
54
+ * Loads data from `www/TFcorintextrm/TF-TFcorTRMTEX.xlsx`.
55
+ * Allows TF search.
56
+ * Renders a clickable list of TFs (`actionLink`s).
57
+ * Displays tabular data and an associated image ("TF Merged Graph Path") for the selected/searched TF.
58
+
59
+ 5. **TF Communities:**
60
+ * Loads data from `www/tfcommunities/trmcommunities.xlsx` and `www/tfcommunities/texcommunities.xlsx`.
61
+ * Displays them as simple `DT::datatable` objects.
62
+
63
+ 6. **Multi-omics Data Table:**
64
+ * Loads data from `www/multi-omicsdata.xlsx`.
65
+ * Renders as a `DT::datatable`, creating hyperlinks in the "Author" column from a "DOI" column, removing empty columns, and enabling scrolling.
66
+
67
+ 7. **Navigation & Other:**
68
+ * `observeEvent`s for UI element clicks (e.g., `input$c1_link`) to navigate tabs via `updateNavbarPage`.
69
+ * Redirects to a bioRxiv paper URL via `session$sendCustomMessage`.
70
+ * Contains significant commented-out code (older logic).
71
+
72
+ **Libraries Used:** `shiny`, `readxl`, `DT`, `dplyr`.
73
+
74
+ ### `ui.R` (User Interface)
75
+
76
+ **Key Functionalities:**
77
+
78
+ 1. **Overall Structure:**
79
+ * Uses `shinytheme("flatly")`.
80
+ * `navbarPage` for the main tabbed interface.
81
+ * Custom CSS for fonts (`Arial`).
82
+ * JavaScript for URL redirection and a modal dialog.
83
+
84
+ 2. **Home Tab:**
85
+ * Project/study description.
86
+ * Layout with an image (`homedesc.png`) featuring clickable `actionLink`s for navigation.
87
+ * "Read Now" button linking to the research paper.
88
+ * Footer with lab links and logos.
89
+
90
+ 3. **TF Catalog (`navbarMenu`):**
91
+ * **"Search TF Scores" Tab:**
92
+ * Explanatory text, image (`tfcat/onlycellstates.png`).
93
+ * Search input (`search_input`), column pagination buttons (`prev_btn`, `next_btn`), `DTOutput("table")`.
94
+ * **"Cell State Specific TF Catalog" Tab (`navlistPanel`):**
95
+ * Sub-tabs for Naive, TE, MP, Tcm, Tem, Trm, TEXprog, TEXeff-like, TEXterm.
96
+ * Each sub-tab has a consistent layout: header, text, a specific bubble plot image (from `www/bubbleplots/`), search input, pagination buttons, and `DTOutput`.
97
+ * **"Multi-State TFs" Tab:** Displays a heatmap image (`tfcat/multistatesheatmap.png`).
98
+
99
+ 4. **TF Wave Analysis (`navbarMenu`):**
100
+ * **"Overview" Tab:**
101
+ * Explanatory text, overview image (`tfwaveanal.png`).
102
+ * Clickable images (`waveanalysis/c1.jpg` to `c6.jpg`, linked via `c1_link` etc.) for navigation to detail tabs.
103
+ * Search input (`search_input_wave`), `DTOutput("table_wave")`.
104
+ * **Individual Wave Tabs ("Wave 1" to "Wave 7"):**
105
+ * Each tab displays the wave image, a GO KEGG result image, and "Ranked Text" image(s) from `www/waveanalysis/` and `www/waveanalysis/txtJPG/`.
106
+
107
+ 5. **TF Network Analysis (`navbarMenu`):**
108
+ * **"Search TF-TF correlation in TRM/TEXterm" Tab:**
109
+ * Methodology description, image (`networkanalysis/tfcorrdesc.png`).
110
+ * `sidebarLayout` with search input (`search`), button (`search_btn`), `tableOutput("gene_list_table")` for available TFs.
111
+ * `mainPanel` with `tableOutput("result_table")`, legend, and `uiOutput("image_gallery")`.
112
+ * Footer with citations.
113
+ * **"TRM/TEXterm TF communities" Tab:**
114
+ * Descriptive text, images (`networkanalysis/community.jpg`, `networkanalysis/trmtexcom.png`, `networkanalysis/tfcompathway.png`).
115
+ * Two `DTOutput`s (`trmcom`, `texcom`) for community tables.
116
+ * Footer with citations.
117
+
118
+ 6. **Multi-omics Data Tab:**
119
+ * Header, text, `dataTableOutput("multiomicsdatatable")`.
120
+
121
+ 7. **Global Header Elements:**
122
+ * Defines a modal dialog and associated JavaScript (triggered by an element `#csdescrip_link`, not explicitly found in the provided UI snippets for the main content area).
123
+ * JavaScript to send a Shiny input upon `#c1_link` click.
124
+
125
+ **Libraries Used:** `shiny`, `shinythemes`, `DT`.
126
+
127
+ ## General Architecture and Observations
128
+
129
+ * **Purpose:** The application serves as an interactive data exploration tool, likely accompanying a scientific publication on T cell biology.
130
+ * **Data Source:** Heavily reliant on pre-processed data stored in Excel files and pre-generated images within the `www/` directory. This indicates that the core data processing happens outside this Shiny app.
131
+ * **Repetitive Code Structure:** Significant code duplication exists in both `server.R` and `ui.R`.
132
+ * In `server.R`, the logic for loading, filtering, paginating, and rendering tables for the nine different cell state TF scores is nearly identical.
133
+ * In `ui.R`, the layout for each of these cell state specific tabs, and also for each of the seven individual TF wave analysis tabs, is highly repetitive.
134
+ * This repetition suggests a strong opportunity for refactoring by creating reusable R functions or Shiny modules to generate these UI and server components dynamically.
135
+ * **User Interface (UI):** The UI is well-structured with a `navbarPage` and logical tab groupings. It provides good contextual information (descriptions, explanations of scores/plots) for users.
136
+ * **Interactivity:**
137
+ * Search functionality for TFs/regulators across various datasets.
138
+ * Custom column-based pagination for wide tables.
139
+ * Clickable images and links for navigation between sections.
140
+ * Dynamic display of tables and images based on user selections.
141
+ * **Modularity (Potential):** The application is not currently modularized, as the repetition shows, but the distinct analytical sections (TF Catalog, Wave Analysis, Network Analysis) are prime candidates for separation into Shiny modules if the application is expanded or refactored.
142
+ * **Static Content:** A significant portion of the content, especially in the Wave Analysis and Network Analysis tabs, involves displaying pre-generated static images (plots, pathway results).
143
+ * **Code Graveyard:** Both files end with a "CODE GRAVEYARD" comment, indicating that there's older, unused code present.
144
+
145
+ ## Potential Areas for Improvement/Refactoring
146
+
147
+ * **Modularization:** Encapsulate the repetitive UI and server logic for cell-state specific tables and individual wave pages into functions or Shiny modules to reduce code duplication and improve maintainability.
148
+ * **Dynamic Image Generation (Optional):** If source data and plotting scripts were available, some images currently served statically could potentially be generated dynamically, offering more flexibility. However, for a publication companion app, static images are often sufficient and ensure reproducibility of figures.
149
+ * **Consolidate Helper Functions:** General utility functions (like `new_read_excel_file` and `new_filter_data`) are well-defined; ensure they are used consistently wherever similar loading or filtering logic appears.
150
+ * **CSS Styling:** Centralize CSS styling rather than relying heavily on inline `style` attributes within `tags$div` and other elements, potentially using a separate CSS file.
151
+ * **Modal Trigger:** Verify that the `#csdescrip_link` element, which triggers the global modal, is present and functional in the UI.
152
+
153
  This analysis provides a snapshot of the codebase's structure, functionality, and potential areas for future development or refinement.
folder_structure_documentation.md CHANGED
@@ -1,108 +1,108 @@
1
- # Web Application Folder Structure Documentation
2
-
3
- This document outlines the required folder structure for the TaijiChat R Shiny application, primarily focusing on the `www/` directory, which houses data files and static assets like images.
4
-
5
- ## Root Directory Structure
6
-
7
- The application's root directory contains the core R scripts and the `www/` directory:
8
-
9
- ```
10
- /
11
- |-- server.R
12
- |-- ui.R
13
- |-- www/
14
- |-- (other development files like .git/, codebase_analysis.md etc.)
15
- ```
16
-
17
- ## `www/` Directory Structure
18
-
19
- The `www/` directory is crucial as Shiny automatically makes its contents accessible to the web browser. It needs to be organized as follows for the application to find its resources:
20
-
21
- ```
22
- www/
23
- |-- tablePagerank/ # Excel files for TF PageRank scores
24
- | |-- Table_TF PageRank Scores for Audrey.xlsx # Contains table(s) with PageRank scores for Transcription Factors (TFs), potentially a master or specific analysis file.
25
- | |-- Naive.xlsx # TF data related to Naive cell state.
26
- | |-- TE.xlsx # TF data related to T Exhausted cell state.
27
- | |-- MP.xlsx # TF data related to Memory Precursor cell state.
28
- | |-- TCM.xlsx # TF data related to Central Memory T cell state.
29
- | |-- TEM.xlsx # TF data related to Effector Memory T cell state.
30
- | |-- TRM.xlsx # TF data related to Resident Memory T cell state.
31
- | |-- TEXprog.xlsx # TF data related to Progenitor Exhausted T cell state.
32
- | |-- TEXeff.xlsx # TF data related to Effector Exhausted T cell state.
33
- | |-- TEXterm.xlsx # TF data related to Terminally Exhausted T cell state.
34
- |
35
- |-- waveanalysis/ # Assets for TF Wave Analysis
36
- | |-- searchtfwaves.xlsx # Contains TF names organized by different "waves" of activity or expression.
37
- | |-- tfwaveanal.png # Overview image
38
- | |-- c1.jpg # Wave 1 image
39
- | |-- c2.jpg # Wave 2 image
40
- | |-- c3.jpg # Wave 3 image
41
- | |-- c4.jpg # Wave 4 image
42
- | |-- c5.jpg # Wave 5 image
43
- | |-- c6.jpg # Wave 6 image
44
- | |-- c7.jpg # Wave 7 image
45
- | |-- c1_selected_GO_KEGG.jpg
46
- | |-- c2_selected_GO_KEGG_v2.jpg
47
- | |-- c3_selected_GO_KEGG.jpg
48
- | |-- c4_selected_GO_KEGG.jpg
49
- | |-- c5_selected_GO_KEGG.jpg
50
- | |-- c6_selected_GO_KEGG.jpg
51
- | |-- c7_selected_GO_KEGG.jpg
52
- | |
53
- | |-- txtJPG/ # "Ranked Text" images for Wave Analysis
54
- | |-- c1_ranked_1.jpg
55
- | |-- c1_ranked_2.jpg
56
- | |-- c2_ranked.jpg
57
- | |-- c3_ranked.jpg
58
- | |-- c4_ranked.jpg
59
- | |-- c5_ranked.jpg
60
- | |-- c6_ranked.jpg
61
- | |-- c7_ranked.jpg
62
- |
63
- |-- TFcorintextrm/ # Data for TF-TF correlation
64
- | |-- TF-TFcorTRMTEX.xlsx # Contains data on correlations between Transcription Factors, possibly focused on TRM and TEX states.
65
- |
66
- |-- tfcommunities/ # Data for TF communities
67
- | |-- trmcommunities.xlsx # Data defining TF communities within the TRM (Resident Memory T cell) state.
68
- | |-- texcommunities.xlsx # Data defining TF communities within TEX (Exhausted T cell) states.
69
- |
70
- |-- bubbleplots/ # Images for cell-state specific bubble plots
71
- | |-- naivebubble.jpg
72
- | |-- tebubble.jpg
73
- | |-- mpbubble.jpg
74
- | |-- tcmbubble.jpg
75
- | |-- tembubble.jpg
76
- | |-- trmbubble.jpg
77
- | |-- texprogbubble.jpg
78
- | |-- texintbubble.jpg # (Used for TEXeff-like)
79
- | |-- textermbubble.jpg
80
- |
81
- |-- tfcat/ # Images for the TF Catalog section
82
- | |-- onlycellstates.png
83
- | |-- multistatesheatmap.png
84
- |
85
- |-- networkanalysis/ # Images for TF Network Analysis section
86
- | |-- tfcorrdesc.png
87
- | |-- community.jpg
88
- | |-- trmtexcom.png
89
- | |-- tfcompathway.png
90
- |
91
- |-- multi-omicsdata.xlsx # Main multi-omics data file (e.g., gene expression, chromatin accessibility, protein levels). Structure needs to be inferred or predefined for agent use.
92
- |
93
- |-- homedesc.png # Image for the home page
94
- |-- ucsdlogo.png # UCSD Logo
95
- |-- salklogo.png # Salk Logo
96
- |-- unclogo.jpg # UNC Logo
97
- |-- csdescrip.jpeg # Image for the modal dialog (if used)
98
-
99
- ```
100
-
101
- ## Notes:
102
-
103
- * The filenames listed are based on the explicit references in `server.R` and `ui.R`.
104
- * This structure primarily covers files loaded directly by the R scripts or referenced in UI image tags.
105
- * For the application to be fully functional, all listed Excel files and image assets must be present in these locations with the correct names.
106
- * If an Excel file (e.g., for individual cell states) is derived from a single source table, it's assumed that the source table has been appropriately processed or split into these individual files, or that the application can handle the single source if the server-side logic were adapted.
107
-
108
  This documentation should help in setting up the necessary file environment for the application.
 
1
+ # Web Application Folder Structure Documentation
2
+
3
+ This document outlines the required folder structure for the TaijiChat R Shiny application, primarily focusing on the `www/` directory, which houses data files and static assets like images.
4
+
5
+ ## Root Directory Structure
6
+
7
+ The application's root directory contains the core R scripts and the `www/` directory:
8
+
9
+ ```
10
+ /
11
+ |-- server.R
12
+ |-- ui.R
13
+ |-- www/
14
+ |-- (other development files like .git/, codebase_analysis.md etc.)
15
+ ```
16
+
17
+ ## `www/` Directory Structure
18
+
19
+ The `www/` directory is crucial as Shiny automatically makes its contents accessible to the web browser. It needs to be organized as follows for the application to find its resources:
20
+
21
+ ```
22
+ www/
23
+ |-- tablePagerank/ # Excel files for TF PageRank scores
24
+ | |-- Table_TF PageRank Scores for Audrey.xlsx # Contains table(s) with PageRank scores for Transcription Factors (TFs), potentially a master or specific analysis file.
25
+ | |-- Naive.xlsx # TF data related to Naive cell state.
26
+ | |-- TE.xlsx # TF data related to T Exhausted cell state.
27
+ | |-- MP.xlsx # TF data related to Memory Precursor cell state.
28
+ | |-- TCM.xlsx # TF data related to Central Memory T cell state.
29
+ | |-- TEM.xlsx # TF data related to Effector Memory T cell state.
30
+ | |-- TRM.xlsx # TF data related to Resident Memory T cell state.
31
+ | |-- TEXprog.xlsx # TF data related to Progenitor Exhausted T cell state.
32
+ | |-- TEXeff.xlsx # TF data related to Effector Exhausted T cell state.
33
+ | |-- TEXterm.xlsx # TF data related to Terminally Exhausted T cell state.
34
+ |
35
+ |-- waveanalysis/ # Assets for TF Wave Analysis
36
+ | |-- searchtfwaves.xlsx # Contains TF names organized by different "waves" of activity or expression.
37
+ | |-- tfwaveanal.png # Overview image
38
+ | |-- c1.jpg # Wave 1 image
39
+ | |-- c2.jpg # Wave 2 image
40
+ | |-- c3.jpg # Wave 3 image
41
+ | |-- c4.jpg # Wave 4 image
42
+ | |-- c5.jpg # Wave 5 image
43
+ | |-- c6.jpg # Wave 6 image
44
+ | |-- c7.jpg # Wave 7 image
45
+ | |-- c1_selected_GO_KEGG.jpg
46
+ | |-- c2_selected_GO_KEGG_v2.jpg
47
+ | |-- c3_selected_GO_KEGG.jpg
48
+ | |-- c4_selected_GO_KEGG.jpg
49
+ | |-- c5_selected_GO_KEGG.jpg
50
+ | |-- c6_selected_GO_KEGG.jpg
51
+ | |-- c7_selected_GO_KEGG.jpg
52
+ | |
53
+ | |-- txtJPG/ # "Ranked Text" images for Wave Analysis
54
+ | |-- c1_ranked_1.jpg
55
+ | |-- c1_ranked_2.jpg
56
+ | |-- c2_ranked.jpg
57
+ | |-- c3_ranked.jpg
58
+ | |-- c4_ranked.jpg
59
+ | |-- c5_ranked.jpg
60
+ | |-- c6_ranked.jpg
61
+ | |-- c7_ranked.jpg
62
+ |
63
+ |-- TFcorintextrm/ # Data for TF-TF correlation
64
+ | |-- TF-TFcorTRMTEX.xlsx # Contains data on correlations between Transcription Factors, possibly focused on TRM and TEX states.
65
+ |
66
+ |-- tfcommunities/ # Data for TF communities
67
+ | |-- trmcommunities.xlsx # Data defining TF communities within the TRM (Resident Memory T cell) state.
68
+ | |-- texcommunities.xlsx # Data defining TF communities within TEX (Exhausted T cell) states.
69
+ |
70
+ |-- bubbleplots/ # Images for cell-state specific bubble plots
71
+ | |-- naivebubble.jpg
72
+ | |-- tebubble.jpg
73
+ | |-- mpbubble.jpg
74
+ | |-- tcmbubble.jpg
75
+ | |-- tembubble.jpg
76
+ | |-- trmbubble.jpg
77
+ | |-- texprogbubble.jpg
78
+ | |-- texintbubble.jpg # (Used for TEXeff-like)
79
+ | |-- textermbubble.jpg
80
+ |
81
+ |-- tfcat/ # Images for the TF Catalog section
82
+ | |-- onlycellstates.png
83
+ | |-- multistatesheatmap.png
84
+ |
85
+ |-- networkanalysis/ # Images for TF Network Analysis section
86
+ | |-- tfcorrdesc.png
87
+ | |-- community.jpg
88
+ | |-- trmtexcom.png
89
+ | |-- tfcompathway.png
90
+ |
91
+ |-- multi-omicsdata.xlsx # Main multi-omics data file (e.g., gene expression, chromatin accessibility, protein levels). Structure needs to be inferred or predefined for agent use.
92
+ |
93
+ |-- homedesc.png # Image for the home page
94
+ |-- ucsdlogo.png # UCSD Logo
95
+ |-- salklogo.png # Salk Logo
96
+ |-- unclogo.jpg # UNC Logo
97
+ |-- csdescrip.jpeg # Image for the modal dialog (if used)
98
+
99
+ ```
100
+
101
+ ## Notes:
102
+
103
+ * The filenames listed are based on the explicit references in `server.R` and `ui.R`.
104
+ * This structure primarily covers files loaded directly by the R scripts or referenced in UI image tags.
105
+ * For the application to be fully functional, all listed Excel files and image assets must be present in these locations with the correct names.
106
+ * If an Excel file (e.g., for individual cell states) is derived from a single source table, it's assumed that the source table has been appropriately processed or split into these individual files, or that the application can handle the single source if the server-side logic were adapted.
107
+
108
  This documentation should help in setting up the necessary file environment for the application.
long_operations.R CHANGED
@@ -1,77 +1,77 @@
1
- # long_operations.R
2
-
3
- library(shiny)
4
-
5
- # Source the caching functions
6
- source("R/caching.R")
7
-
8
- # Function to wrap a long-running operation with warning overlay and caching
9
- withWarningOverlayAndCache <- function(session, operation_name, operation_func, ..., max_cache_age_seconds = NULL) {
10
- # Generate a cache key based on operation name and its specific arguments
11
- # Note: The arguments passed to `...` for generate_cache_key should uniquely identify this call.
12
- # This might need careful handling depending on how operation_func uses its environment or global vars.
13
- cache_key <- generate_cache_key(operation_name, ...)
14
-
15
- # Try to get from cache first
16
- cached_result <- get_cached_item(cache_key, max_age_seconds = max_cache_age_seconds)
17
-
18
- if (!is.null(cached_result)) {
19
- return(cached_result)
20
- }
21
-
22
- # If not in cache or stale, proceed with the operation
23
- # Send a custom message to UI to display a warning in the chat log
24
- warning_text <- "This operation might take a moment. Please be patient."
25
- excel_operations <- c("new_read_excel_file") # Add other excel related op_names here
26
- if (operation_name %in% excel_operations) {
27
- warning_text <- "Processing Excel file(s), this may take longer. Please be patient."
28
- }
29
- session$sendCustomMessage(type = "long_op_custom_warning", message = list(text = warning_text))
30
-
31
- result <- tryCatch({
32
- operation_func() # Execute the actual operation
33
- }, error = function(e) {
34
- stop(e) # re-signal the error so it propagates unchanged to the caller
35
- })
36
-
37
- # Save to cache
38
- save_cached_item(cache_key, result)
39
-
40
- return(result)
41
- }
42
-
43
- # Function to check if an operation might be long-running
44
- isLongRunningOperation <- function(operation_name) {
45
- # List of operations that typically take longer
46
- long_operations <- c(
47
- "get_processed_tf_data",
48
- "get_tf_wave_search_data",
49
- "get_tf_correlation_data",
50
- "get_tf_community_sheet_data",
51
- "new_read_excel_file"
52
- )
53
-
54
- return(operation_name %in% long_operations)
55
- }
56
-
57
- # Function to wrap a reactive expression with warning overlay
58
- # This will now need to be adapted if we want caching for reactives.
59
- # For simplicity, let's assume for now that caching is applied at a lower level before reactive is involved,
60
- # or that specific reactive expressions will call withWarningOverlayAndCache directly.
61
- withWarningOverlayReactive <- function(session, reactive_expr, operation_name) {
62
- # This function needs to be re-thought if caching is to be applied transparently to any reactive_expr.
63
- # The current caching model in withWarningOverlayAndCache assumes an operation_func and its specific args.
64
- # A reactive_expr doesn't fit this model directly without knowing what makes it unique.
65
- if (isLongRunningOperation(operation_name)) {
66
- # This reactive wrapper would show the overlay but not handle caching itself.
67
- # Caching should ideally happen inside the reactive_expr if it calls a cacheable function.
68
- reactive({
69
- showWarningOverlay(session) # NOTE: show/hide overlay helpers are not defined or sourced in this file; they are expected to exist elsewhere
70
- res <- reactive_expr()
71
- hideWarningOverlay(session)
72
- res
73
- })
74
- } else {
75
- reactive_expr
76
- }
77
  }
 
1
+ # long_operations.R
2
+
3
+ library(shiny)
4
+
5
+ # Source the caching functions
6
+ source("R/caching.R")
7
+
8
+ # Function to wrap a long-running operation with warning overlay and caching
9
+ withWarningOverlayAndCache <- function(session, operation_name, operation_func, ..., max_cache_age_seconds = NULL) {
10
+ # Generate a cache key based on operation name and its specific arguments
11
+ # Note: The arguments passed to `...` for generate_cache_key should uniquely identify this call.
12
+ # This might need careful handling depending on how operation_func uses its environment or global vars.
13
+ cache_key <- generate_cache_key(operation_name, ...)
14
+
15
+ # Try to get from cache first
16
+ cached_result <- get_cached_item(cache_key, max_age_seconds = max_cache_age_seconds)
17
+
18
+ if (!is.null(cached_result)) {
19
+ return(cached_result)
20
+ }
21
+
22
+ # If not in cache or stale, proceed with the operation
23
+ # Send a custom message to UI to display a warning in the chat log
24
+ warning_text <- "This operation might take a moment. Please be patient."
25
+ excel_operations <- c("new_read_excel_file") # Add other excel related op_names here
26
+ if (operation_name %in% excel_operations) {
27
+ warning_text <- "Processing Excel file(s), this may take longer. Please be patient."
28
+ }
29
+ session$sendCustomMessage(type = "long_op_custom_warning", message = list(text = warning_text))
30
+
31
+ result <- tryCatch({
32
+ operation_func() # Execute the actual operation
33
+ }, error = function(e) {
34
+ stop(e) # re-signal the error so it propagates unchanged to the caller
35
+ })
36
+
37
+ # Save to cache
38
+ save_cached_item(cache_key, result)
39
+
40
+ return(result)
41
+ }
42
+
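# Example call site (illustrative only; "Naive" is a hypothetical dataset
# identifier, and the arguments after operation_func are forwarded via `...`
# to generate_cache_key()):
# result <- withWarningOverlayAndCache(
#   session,
#   operation_name = "get_processed_tf_data",
#   operation_func = function() get_processed_tf_data("Naive"),
#   "Naive",
#   max_cache_age_seconds = 3600
# )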
43
+ # Function to check if an operation might be long-running
44
+ isLongRunningOperation <- function(operation_name) {
45
+ # List of operations that typically take longer
46
+ long_operations <- c(
47
+ "get_processed_tf_data",
48
+ "get_tf_wave_search_data",
49
+ "get_tf_correlation_data",
50
+ "get_tf_community_sheet_data",
51
+ "new_read_excel_file"
52
+ )
53
+
54
+ return(operation_name %in% long_operations)
55
+ }
56
+
57
+ # Function to wrap a reactive expression with warning overlay
58
+ # This will now need to be adapted if we want caching for reactives.
59
+ # For simplicity, let's assume for now that caching is applied at a lower level before reactive is involved,
60
+ # or that specific reactive expressions will call withWarningOverlayAndCache directly.
61
+ withWarningOverlayReactive <- function(session, reactive_expr, operation_name) {
62
+ # This function needs to be re-thought if caching is to be applied transparently to any reactive_expr.
63
+ # The current caching model in withWarningOverlayAndCache assumes an operation_func and its specific args.
64
+ # A reactive_expr doesn't fit this model directly without knowing what makes it unique.
65
+ if (isLongRunningOperation(operation_name)) {
66
+ # This reactive wrapper would show the overlay but not handle caching itself.
67
+ # Caching should ideally happen inside the reactive_expr if it calls a cacheable function.
68
+ reactive({
69
+ showWarningOverlay(session) # NOTE: show/hide overlay helpers are not defined or sourced in this file; they are expected to exist elsewhere
70
+ res <- reactive_expr()
71
+ hideWarningOverlay(session)
72
+ res
73
+ })
74
+ } else {
75
+ reactive_expr
76
+ }
77
  }
main.py CHANGED
@@ -1,38 +1,38 @@
1
- # main.py
2
- import os
3
- from openai import OpenAI
4
- from agents.manager_agent import ManagerAgent
5
-
6
- API_KEY_FILE = "api_key.txt" # Define the API key filename
7
-
8
- if __name__ == "__main__":
9
- print("Application starting...")
10
-
11
- api_key = None
12
- client = None
13
-
14
- try:
15
- # Try to read the API key from the file
16
- with open(API_KEY_FILE, 'r') as f:
17
- api_key = f.read().strip()
18
- if not api_key:
19
- print(f"Warning: {API_KEY_FILE} is empty. LLM features will be disabled.")
20
- api_key = None # Ensure api_key is None if file is empty
21
- else:
22
- print(f"Successfully read API key from {API_KEY_FILE}.")
23
- except FileNotFoundError:
24
- print(f"Warning: {API_KEY_FILE} not found. LLM features will be disabled.")
25
- except Exception as e:
26
- print(f"Error reading {API_KEY_FILE}: {e}. LLM features will be disabled.")
27
-
28
- if api_key:
29
- try:
30
- client = OpenAI(api_key=api_key)
31
- print("OpenAI client initialized successfully.")
32
- except Exception as e:
33
- print(f"Error initializing OpenAI client: {e}. LLM features will be disabled.")
34
- client = None # Ensure client is None if initialization fails
35
-
36
- manager = ManagerAgent(openai_api_key=api_key, openai_client=client)
37
- manager.start_interactive_session()
38
  print("Application ended.")
 
1
+ # main.py
2
+ import os
3
+ from openai import OpenAI
4
+ from agents.manager_agent import ManagerAgent
5
+
6
+ API_KEY_FILE = "api_key.txt" # Define the API key filename
7
+
8
+ if __name__ == "__main__":
9
+ print("Application starting...")
10
+
11
+ api_key = None
12
+ client = None
13
+
14
+ try:
15
+ # Try to read the API key from the file
16
+ with open(API_KEY_FILE, 'r') as f:
17
+ api_key = f.read().strip()
18
+ if not api_key:
19
+ print(f"Warning: {API_KEY_FILE} is empty. LLM features will be disabled.")
20
+ api_key = None # Ensure api_key is None if file is empty
21
+ else:
22
+ print(f"Successfully read API key from {API_KEY_FILE}.")
23
+ except FileNotFoundError:
24
+ print(f"Warning: {API_KEY_FILE} not found. LLM features will be disabled.")
25
+ except Exception as e:
26
+ print(f"Error reading {API_KEY_FILE}: {e}. LLM features will be disabled.")
27
+
28
+ if api_key:
29
+ try:
30
+ client = OpenAI(api_key=api_key)
31
+ print("OpenAI client initialized successfully.")
32
+ except Exception as e:
33
+ print(f"Error initializing OpenAI client: {e}. LLM features will be disabled.")
34
+ client = None # Ensure client is None if initialization fails
35
+
36
+ manager = ManagerAgent(openai_api_key=api_key, openai_client=client)
37
+ manager.start_interactive_session()
38
  print("Application ended.")
plan_temp.txt CHANGED
@@ -1,30 +1,30 @@
1
- I don't think that's reasonable. Here's my plan; compare the current agents against it and correct the current implementation to align with it:
2
-
3
- For every query, the generation agent goes through these steps:
4
- If a dataset, an image, or a paper is provided, add them when creating the chat completion. If not, proceed to step 1.
5
-
6
- 1. analyze query
7
- 2. analyze the conversation history if there's any
8
- 3. analyze any images, paper, or data provided with the chat completion, according to the plan
9
- 4. analyze the error from the previous attempt if there's any
10
- 5. read the short version of the paper description to understand what the paper is about
11
- 6. decide whether the user query can be answered directly or needs more information from the paper; if so, read it
12
- 7. read the tools documentation
13
- 8. decide which tools can be helpful for answering the query; if there are any, prepare the list of tools to be used
14
- 9. read the data documentation
15
- 10. decide which datasets are relevant to the user query; if there are any, prepare the list of datasets to be used
16
- 11. decide whether the user query can be solved by the paper, tools, data, or a combination of them; if not, prepare a signal NEED_CODING = TRUE but don't send it yet; otherwise move to the next step
17
- 12. decide whether the user query is about image(s). if so, prepare a list of images needed.
18
- 13. put everything together to make a plan
19
- - This process of thinking must be included in the generation agent's LLM output. It will be used to
20
-
21
- The supervisor agent reviews the plan, focusing on the code, and checks for suspicious or malicious behavior. Only imports of common packages are allowed.
22
-
23
- The executor agent executes the plan if it contains tool execution or code.
24
-
25
- The manager records everything from all LLMs and users, and deems whether the user's query can be considered answered. Note that if agents only propose a plan but the results are not yet gathered, it cannot be considered a proper answer - as in most cases where the generation agent proposes a plan in iteration 1. If the manager agent deems that a plan was proposed but results were not collected / the plan was not executed, and there's no error from the LLM, then the manager agent tells the generation agent to initialize a different chat completion with the images and datasets requested by the generation agent's plan. This manager-instructed attempt is different from a normal attempt; it does not count toward the allowed attempt count.
26
-
27
- If an error occurs at any stage, it must be reported to the manager, which records all errors. Once an error is detected, another attempt starts and we go back to the generation agent step. There will be 3 attempts allowed.
28
-
29
- Tell me whether you think my plan is clear and reasonable, and whether any part is missing or problematic.
30
  If not, proceed to implementation.
 
1
+ I don't think that's reasonable. Here's my plan; compare the current agents against it and correct the current implementation to align with it:
2
+
3
+ For every query, the generation agent goes through these steps:
4
+ If a dataset, an image, or a paper is provided, add them when creating the chat completion. If not, proceed to step 1.
5
+
6
+ 1. analyze query
7
+ 2. analyze the conversation history if there's any
8
+ 3. analyze any images, paper, or data provided with the chat completion, according to the plan
9
+ 4. analyze the error from the previous attempt if there's any
10
+ 5. read the short version of the paper description to understand what the paper is about
11
+ 6. decide whether the user query can be answered directly or needs more information from the paper; if so, read it
12
+ 7. read the tools documentation
13
+ 8. decide which tools can be helpful for answering the query; if there are any, prepare the list of tools to be used
14
+ 9. read the data documentation
15
+ 10. decide which datasets are relevant to the user query; if there are any, prepare the list of datasets to be used
16
+ 11. decide whether the user query can be solved by the paper, tools, data, or a combination of them; if not, prepare a signal NEED_CODING = TRUE but don't send it yet; otherwise move to the next step
17
+ 12. decide whether the user query is about image(s). if so, prepare a list of images needed.
18
+ 13. put everything together to make a plan
19
+ - This process of thinking must be included in the generation agent's LLM output. It will be used to
20
+
21
+ The supervisor agent reviews the plan, focusing on the code, and checks for suspicious or malicious behavior. Only imports of common packages are allowed.
22
+
23
+ The executor agent executes the plan if it contains tool execution or code.
24
+
25
+ The manager records everything from all LLMs and users, and deems whether the user's query can be considered answered. Note that if agents only propose a plan but the results are not yet gathered, it cannot be considered a proper answer - as in most cases where the generation agent proposes a plan in iteration 1. If the manager agent deems that a plan was proposed but results were not collected / the plan was not executed, and there's no error from the LLM, then the manager agent tells the generation agent to initialize a different chat completion with the images and datasets requested by the generation agent's plan. This manager-instructed attempt is different from a normal attempt; it does not count toward the allowed attempt count.
26
+
27
+ If an error occurs at any stage, it must be reported to the manager, which records all errors. Once an error is detected, another attempt starts and we go back to the generation agent step. There will be 3 attempts allowed.
28
+
29
+ Tell me whether you think my plan is clear and reasonable, and whether any part is missing or problematic.
30
  If not, proceed to implementation.
server.R CHANGED
The diff for this file is too large to render. See raw diff
 
tested_queries.txt CHANGED
@@ -1,40 +1,40 @@
1
- # --- Easy Queries (Navigation & Simple Data Retrieval) ---
2
-
3
- # Navigation
4
- 1. "Show me the home page."
5
- 2. "Take me to the TE (Terminal Exhaustion) data section."
6
- 3. "I want to see the multi-omics data."
7
- 4. "Navigate to the TF (Transcription Factor) Wave Analysis overview."
8
- 5. "Where can I find information about TRM communities?"
9
-
10
- # Simple Data Retrieval (from existing tables/UI elements)
11
- 6. "In the 'All Data Search' (main page), what are the TF activity scores for STAT3?"
12
- 7. "For the Naive T-cell state, search for scores related to JUNB."
13
- 8. "What waves is the TF 'BATF' a part of?" (Uses searchtfwaves.xlsx)
14
- 9. "Display the TRM communities table."
15
- 10. "Find the research paper by 'Chen' in the multi-omics data." (Assumes 'Chen' is an author)
16
-
17
- # --- Medium Queries (Requires Tool Use & Simple Code for Analysis/Formatting) ---
18
-
19
- # Basic Analysis / Data Manipulation (if agent can generate code for simple tasks)
20
- 11. "From the 'All Data Search' table, can you list the top 3 TFs with the highest scores in the first displayed cell state (e.g., Naive_Day0_vs_Day7_UP)?" (Requires identifying a column and finding max values)
21
- 12. "What is the average TF activity score for 'IRF4' across all displayed cell states in the 'All Data Search' section for the current view?" (Requires iterating through columns if multiple are shown for IRF4)
22
- 13. "Compare the TF activity scores for 'TCF7' and 'TOX' in the 'TE' (Terminal Exhaustion) dataset. Which one is generally higher?"
23
- 14. "If I search for 'BACH2' in the main TF activity score table, how many cell states show a score greater than 1.0?"
24
- 15. "Can you provide the TF activity scores for 'PRDM1' in the TEM (T Effector Memory) dataset, but only show me the cell states where the score is negative?"
25
-
26
- # --- Difficult Queries (Requires LLM Interpretation, Insight Generation, Complex Tool Orchestration) ---
27
-
28
- # Insight Generation & Interpretation
29
- 16. "Based on the available TF activity scores, which TFs seem to be most consistently upregulated across different exhausted T-cell states (e.g., TEXprog, TEXeff, TEXterm)?" (Requires understanding of "exhausted", cross-table comparison, and summarization)
30
- 17. "Is there a noticeable trend or pattern in the activity of 'EOMES' as T-cells progress from Naive to various effector and memory states shown in the data?" (Requires interpreting progression and comparing multiple datasets)
31
- 18. "Considering the TF communities data for TRM and TEX, are there any TFs that are prominent in both TRM and TEX communities, suggesting a shared role?" (Requires comparing two distinct datasets/visualizations and identifying overlaps)
32
- 19. "Analyze the TF activity scores for 'FOXO1'. Does its activity pattern suggest a role in maintaining T-cell quiescence or promoting activation/exhaustion based on the data available across different T-cell states?" (Requires biological interpretation linked to data patterns)
33
- 20. "If a researcher is interested in TFs that are highly active in T Effector Memory (TEM) cells but show low activity in Terminally Exhausted (TEXterm) cells, which TFs should they investigate further based on the provided datasets?" (Requires filtering, comparison across datasets, and a recommendation)
34
- 21. "Looking at the TF Wave Analysis, which TFs are predominantly active in early waves versus late waves? What might this imply about their roles in T-cell differentiation or response dynamics?" (Requires interpreting the wave data and drawing higher-level conclusions)
35
- 22. "The user uploaded an image of a UMAP plot showing clusters. The file is 'www/test_images/umap_example.png'. Can you describe what you see in the image and how it might relate to T-cell states if cluster A is Naive, cluster B is TEM, and cluster C is TEX?" (Requires multimodal input, assuming the agent can be pointed to local files for analysis - this tests the image upload and interpretation flow we built)
36
- 23. "Given the data in 'Table_TF PageRank Scores for Audrey.xlsx', identify three TFs that have significantly different activity scores between 'Naive_Day0_vs_Day7_UP' and 'MP_Day0_vs_Day7_UP'. Explain the potential biological significance of these differences." (Requires direct data analysis from a file, comparison, and biological reasoning)
37
-
38
- # Creative/Hypothetical (tests robustness and deeper understanding)
39
- 24. "If we wanted to design an experiment to reverse T-cell exhaustion, which 2-3 TFs might be good targets for modulation (activation or inhibition) based on their activity profiles in the provided datasets, and why?"
40
  25. "Explain the overall story the TF activity data tells about T-cell differentiation and exhaustion from Naive to Terminally Exhausted states, highlighting 3 key TF players and their changing roles."
 
1
+ # --- Easy Queries (Navigation & Simple Data Retrieval) ---
2
+
3
+ # Navigation
4
+ 1. "Show me the home page."
5
+ 2. "Take me to the TE (Terminal Exhaustion) data section."
6
+ 3. "I want to see the multi-omics data."
7
+ 4. "Navigate to the TF (Transcription Factor) Wave Analysis overview."
8
+ 5. "Where can I find information about TRM communities?"
9
+
10
+ # Simple Data Retrieval (from existing tables/UI elements)
11
+ 6. "In the 'All Data Search' (main page), what are the TF activity scores for STAT3?"
12
+ 7. "For the Naive T-cell state, search for scores related to JUNB."
13
+ 8. "What waves is the TF 'BATF' a part of?" (Uses searchtfwaves.xlsx)
14
+ 9. "Display the TRM communities table."
15
+ 10. "Find the research paper by 'Chen' in the multi-omics data." (Assumes 'Chen' is an author)
16
+
17
+ # --- Medium Queries (Requires Tool Use & Simple Code for Analysis/Formatting) ---
18
+
19
+ # Basic Analysis / Data Manipulation (if agent can generate code for simple tasks)
20
+ 11. "From the 'All Data Search' table, can you list the top 3 TFs with the highest scores in the first displayed cell state (e.g., Naive_Day0_vs_Day7_UP)?" (Requires identifying a column and finding max values)
21
+ 12. "What is the average TF activity score for 'IRF4' across all displayed cell states in the 'All Data Search' section for the current view?" (Requires iterating through columns if multiple are shown for IRF4)
22
+ 13. "Compare the TF activity scores for 'TCF7' and 'TOX' in the 'TE' (Terminal Exhaustion) dataset. Which one is generally higher?"
23
+ 14. "If I search for 'BACH2' in the main TF activity score table, how many cell states show a score greater than 1.0?"
24
+ 15. "Can you provide the TF activity scores for 'PRDM1' in the TEM (T Effector Memory) dataset, but only show me the cell states where the score is negative?"
25
+
26
+ # --- Difficult Queries (Requires LLM Interpretation, Insight Generation, Complex Tool Orchestration) ---
27
+
28
+ # Insight Generation & Interpretation
29
+ 16. "Based on the available TF activity scores, which TFs seem to be most consistently upregulated across different exhausted T-cell states (e.g., TEXprog, TEXeff, TEXterm)?" (Requires understanding of "exhausted", cross-table comparison, and summarization)
30
+ 17. "Is there a noticeable trend or pattern in the activity of 'EOMES' as T-cells progress from Naive to various effector and memory states shown in the data?" (Requires interpreting progression and comparing multiple datasets)
31
+ 18. "Considering the TF communities data for TRM and TEX, are there any TFs that are prominent in both TRM and TEX communities, suggesting a shared role?" (Requires comparing two distinct datasets/visualizations and identifying overlaps)
32
+ 19. "Analyze the TF activity scores for 'FOXO1'. Does its activity pattern suggest a role in maintaining T-cell quiescence or promoting activation/exhaustion based on the data available across different T-cell states?" (Requires biological interpretation linked to data patterns)
33
+ 20. "If a researcher is interested in TFs that are highly active in T Effector Memory (TEM) cells but show low activity in Terminally Exhausted (TEXterm) cells, which TFs should they investigate further based on the provided datasets?" (Requires filtering, comparison across datasets, and a recommendation)
34
+ 21. "Looking at the TF Wave Analysis, which TFs are predominantly active in early waves versus late waves? What might this imply about their roles in T-cell differentiation or response dynamics?" (Requires interpreting the wave data and drawing higher-level conclusions)
35
+ 22. "The user uploaded an image of a UMAP plot showing clusters. The file is 'www/test_images/umap_example.png'. Can you describe what you see in the image and how it might relate to T-cell states if cluster A is Naive, cluster B is TEM, and cluster C is TEX?" (Requires multimodal input, assuming the agent can be pointed to local files for analysis - this tests the image upload and interpretation flow we built)
36
+ 23. "Given the data in 'Table_TF PageRank Scores for Audrey.xlsx', identify three TFs that have significantly different activity scores between 'Naive_Day0_vs_Day7_UP' and 'MP_Day0_vs_Day7_UP'. Explain the potential biological significance of these differences." (Requires direct data analysis from a file, comparison, and biological reasoning)
37
+
38
+ # Creative/Hypothetical (tests robustness and deeper understanding)
39
+ 24. "If we wanted to design an experiment to reverse T-cell exhaustion, which 2-3 TFs might be good targets for modulation (activation or inhibition) based on their activity profiles in the provided datasets, and why?"
40
  25. "Explain the overall story the TF activity data tells about T-cell differentiation and exhaustion from Naive to Terminally Exhausted states, highlighting 3 key TF players and their changing roles."
tools/agent_tools.py CHANGED
The diff for this file is too large to render. See raw diff
 
tools/agent_tools_documentation.md CHANGED
@@ -1,190 +1,190 @@
1
- # Agent Tools Documentation
2
-
3
- This document outlines the granular tools that can be created or extracted from the TaijiChat R Shiny application. These tools are intended for an agent system to access data, calculations, methodologies, tables, and graphs from the application.
4
-
5
- ---
6
-
7
- Tool Name: `get_raw_excel_data`
8
- Description: Reads a specified Excel file and returns its raw content as a list of lists, where each inner list represents a row. This tool is generic; the `file_path` should be an absolute path or a path relative to the project root (e.g., "www/some_data.xlsx"). For predefined datasets within the application structure, other more specific tools should be preferred if available.
9
- Input: `file_path` (string) - The path to the Excel file.
10
- Output: `data` (list of lists of strings/numbers) - The raw data from the Excel sheet. Returns an empty list if the file is not found or cannot be read.
11
-
12
- ---
13
-
14
- Tool Name: `get_processed_tf_data`
15
- Description: Reads and processes a TF-related Excel file identified by its `dataset_identifier` (e.g., "Naive", "Overall_TF_PageRank"). It uses an internal mapping (`get_tf_catalog_dataset_path`) to find the actual file path within the `www/tablePagerank/` directory. The standard processing includes reading the Excel file, transposing it, using the original first row as new column headers, and then removing this header row from the data.
16
- Input: `dataset_identifier` (string) - The identifier for the dataset. Valid identifiers include: "Overall_TF_PageRank", "Naive", "TE", "MP", "TCM", "TEM", "TRM", "TEXprog", "TEXeff", "TEXterm".
17
- Output: `data` (list of lists of strings/numbers) - The processed data, where the first inner list contains the headers, and subsequent lists are data rows. Returns an empty list if processing fails or identifier is invalid.
18
-
19
- ---
20
-
21
- Tool Name: `filter_data_by_column_keywords`
22
- Description: Filters a dataset (list of lists, where the first list is headers) based on keywords matching its column names. This is for data that has already been processed (e.g., by `get_processed_tf_data`) where TFs or genes are column headers. The keyword search is case-insensitive and supports multiple comma-separated keywords. If no keywords are provided, the original dataset is returned.
23
- Input:
24
- `dataset` (list of lists) - The data to filter, with the first list being headers.
25
- `keywords` (string) - Comma-separated keywords to search for in column headers.
26
- Output: `filtered_dataset` (list of lists) - The subset of the data containing only the matching columns (including the header row). Returns an empty list (with headers only) if no columns match.
27
-
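A minimal usage sketch combining this tool with `get_processed_tf_data` above. It assumes both functions are importable from `tools.agent_tools` (the module this document references for its internal mappings); the dataset identifier and keywords are illustrative only:

```python
# Sketch, not the canonical API: assumes both tools live in tools.agent_tools.
from tools.agent_tools import get_processed_tf_data, filter_data_by_column_keywords

naive_data = get_processed_tf_data("Naive")               # headers first, then data rows
subset = filter_data_by_column_keywords(naive_data, "TCF7, TOX")

if len(subset) > 1:                                       # header row plus data rows
    print("Matching columns:", subset[0])
else:
    print("No columns matched the keywords.")
```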
28
- ---
29
-
30
- Tool Name: `get_tf_wave_search_data`
31
- Description: Reads the `searchtfwaves.xlsx` file from `www/waveanalysis/`, which contains TF names organized by "waves" (Wave1 to Wave7 as columns).
32
- Input: `tf_search_term` (string, optional) - A specific TF name to search for. If empty or not provided, all TF wave data is returned. The search is case-insensitive.
33
- Output: `wave_data` (dictionary) - If `tf_search_term` is provided and matches, returns a structure like `{"WaveX": ["TF1", "TF2"], "WaveY": ["TF1"]}` showing which waves the TF belongs to. If no `tf_search_term`, returns the full data as `{"Wave1": ["All TFs in Wave1"], "Wave2": ["All TFs in Wave2"], ...}`. If no matches are found for a search term, an empty dictionary is returned.
34
-
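A hedged sketch of a wave lookup, again assuming the tool is importable from `tools.agent_tools`; "BATF" is just an example TF name:

```python
from tools.agent_tools import get_tf_wave_search_data  # assumed import path

waves = get_tf_wave_search_data("BATF")  # case-insensitive search
if waves:
    for wave, tfs in waves.items():
        print(f"{wave}: {', '.join(tfs)}")
else:
    print("TF not found in any wave.")
```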
35
- ---
36
-
37
- Tool Name: `get_tf_correlation_data`
38
- Description: Reads the `TF-TFcorTRMTEX.xlsx` file from `www/TFcorintextrm/`. If a `tf_name` is provided, it filters the data for that TF (case-insensitive match on the primary TF identifier column, typically "TF Name" or the first column).
39
- Input: `tf_name` (string, optional) - The specific TF name to search for. If empty or not provided, returns the full dataset.
40
- Output: `correlation_data` (list of lists) - The filtered (or full) data from the correlation table. The first list is headers. Returns an empty list (with headers only) if `tf_name` is provided but not found or if the file cannot be processed.
41
-
42
- ---
43
-
44
- Tool Name: `get_tf_correlation_image_path`
45
- Description: Reads the `TF-TFcorTRMTEX.xlsx` file from `www/TFcorintextrm/`, finds the row for the given `tf_name` (case-insensitive match on the primary TF identifier column), and returns the path stored in the "TF Merged Graph Path" column. The returned path is relative to the project's `www` directory (e.g., "www/networkanalysis/images/BATF_graph.png").
46
- Input: `tf_name` (string) - The specific TF name.
47
- Output: `image_path` (string) - The relative web path to the image or an empty string if not found or if the file cannot be processed.
48
-
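The two correlation tools are naturally used together: fetch the tabular data, then resolve the associated graph image. A sketch under the same import-path assumption, with an example TF name:

```python
from tools.agent_tools import (  # assumed import path
    get_tf_correlation_data,
    get_tf_correlation_image_path,
)

tf = "BATF"  # example TF name
rows = get_tf_correlation_data(tf)         # first inner list is the header row
image = get_tf_correlation_image_path(tf)  # "" if no graph entry exists

if len(rows) > 1:
    print(f"{len(rows) - 1} correlation row(s) for {tf}; graph: {image or 'none'}")
else:
    print(f"No correlation entry found for {tf}.")
```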
49
- ---
50
-
51
- Tool Name: `list_all_tfs_in_correlation_data`
52
- Description: Reads the `TF-TFcorTRMTEX.xlsx` file from `www/TFcorintextrm/` and returns a list of all unique TF names from the primary TF identifier column (typically "TF Name" or the first column). Filters out empty strings and 'nan'.
53
- Input: None
54
- Output: `tf_list` (list of strings) - A list of TF names. Returns an empty list if the file cannot be processed.
55
-
56
- ---
57
-
58
- Tool Name: `get_tf_community_sheet_data`
59
- Description: Reads one of the TF community Excel files (`trmcommunities.xlsx` or `texcommunities.xlsx`) located in `www/tfcommunities/`.
60
- Input: `community_type` (string) - Either "trm" or "texterm".
61
- Output: `community_data` (list of lists) - Data from the specified community sheet (raw format, first list is headers). Returns an empty list if the type is invalid or file not found/processed.
62
-
63
- ---
64
-
65
- Tool Name: `get_static_image_path`
66
- Description: Returns the predefined relative web path (e.g., "www/images/logo.png") for a known static image asset. These paths are typically relative to the project root.
67
- Input: `image_identifier` (string) - A unique key representing the image (e.g., "home_page_diagram", "ucsd_logo", "naive_bubble_plot", "wave1_main_img", "wave1_gokegg_img", "wave1_ranked_text1_img", "tfcat_overview_img", "network_correlation_desc_img").
68
- Output: `image_path` (string) - The relative path (e.g., "www/homedesc.png"). Returns an empty string if identifier is unknown. This tool relies on an internal mapping (`_STATIC_IMAGE_WEB_PATHS` in `tools.agent_tools`).
69
-
70
- ---
71
-
72
- Tool Name: `get_ui_descriptive_text`
73
- Description: Retrieves predefined descriptive text, methodology explanations, or captions by its identifier, primarily from `tools/ui_texts.json`.
74
- Input: `text_identifier` (string) - A unique key representing the text block (e.g., "tf_score_calculation_info", "cell_state_specificity_info", "wave_analysis_overview_text", "wave_1_analysis_placeholder_details").
75
- Output: `descriptive_text` (string) - The requested text block. Returns an empty string if identifier is unknown.
76
-
77
- ---
78
-
79
- Tool Name: `list_available_tf_catalog_datasets`
80
- Description: Returns a list of valid `dataset_identifier` strings that can be used with the `get_processed_tf_data` tool.
81
- Input: None
82
- Output: `dataset_identifiers` (list of strings) - E.g., ["Overall_TF_PageRank", "Naive", "TE", "MP", "TCM", "TEM", "TRM", "TEXprog", "TEXeff", "TEXterm"].
83
-
84
- ---
85
-
86
- Tool Name: `list_available_cell_state_bubble_plots`
87
- Description: Returns a list of identifiers for available cell-state specific bubble plot images. These identifiers can be used with `get_static_image_path`.
88
- Input: None
89
- Output: `image_identifiers` (list of strings) - E.g., ["naive_bubble_plot", "te_bubble_plot", ...]. Derived from internal mapping in `tools.agent_tools`.
90
-
91
- ---
92
-
93
- Tool Name: `list_available_wave_analysis_assets`
94
- Description: Returns a structured dictionary of available asset identifiers for a specific TF wave (main image, GO/KEGG image, ranked text images). Identifiers can be used with `get_static_image_path`.
95
- Input: `wave_number` (integer, 1-7) - The wave number.
96
- Output: `asset_info` (dictionary) - E.g., `{"main_image_id": "waveX_main_img", "gokegg_image_id": "waveX_gokegg_img", "ranked_text_image_ids": ["waveX_ranked_text1_img", ...]}`. Returns empty if wave number is invalid. Derived from internal mapping in `tools.agent_tools`.
97
-
98
- ---
99
-
100
- Tool Name: `get_internal_navigation_info`
101
- Description: Provides information about where an internal UI link (like those on the homepage image map or wave overview images) is intended to navigate within the application structure.
102
- Input: `link_id` (string) - The identifier of the link (e.g., "to_tfcat", "to_tfwave", "to_tfnet", "c1_link", "c2_link", etc.).
103
- Output: `navigation_target_description` (string) - A human-readable description of the target (e.g., "Navigates to the 'TF Catalog' section.", "Navigates to the 'Wave 1 Analysis' tab."). Derived from internal mapping in `tools.agent_tools`.
104
-
105
- ---
106
-
107
- Tool Name: `get_biorxiv_paper_url`
108
- Description: Returns the URL for the main bioRxiv paper referenced in the application.
109
- Input: None
110
- Output: `url` (string) - The bioRxiv paper URL.
111
-
112
- ---
113
-
114
- Tool Name: `list_all_files_in_www_directory`
115
- Description: Scans the entire `www/` directory (and its subdirectories, excluding common hidden/system files) and returns a list of all files found. For each file, it provides its relative path from the project root (e.g., "www/images/logo.png"), its detected MIME type (e.g., "image/png", "text/csv", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"), and its size in bytes. This tool helps in understanding all available static assets and data files within the web-accessible `www` directory.
116
- Input: None
117
- Output: `file_manifest` (list of dictionaries) - Each dictionary represents a file and contains the keys: `path` (string), `type` (string), `size` (integer). Example item: `{"path": "www/data/report.txt", "type": "text/plain", "size": 1024}`. Returns an empty list if the `www` directory isn't found or is empty.
118
-
119
- ---
120
-
121
- ### `multi_source_literature_search(queries: list[str], max_results_per_query_per_source: int = 1, max_total_unique_papers: int = 10) -> list[dict]`
122
-
123
- Searches for academic literature across multiple sources (Semantic Scholar, PubMed, ArXiv) using a list of provided search queries. It then de-duplicates the results based primarily on DOI, and secondarily on a combination of title and first author if DOI is not available. The search process stops early if the `max_total_unique_papers` limit is reached.
124
-
125
- **Args:**
126
-
127
- * `queries (list[str])`: A list of search query strings. The GenerationAgent should brainstorm 3-5 diverse queries relevant to the user's request.
128
- * `max_results_per_query_per_source (int)`: The maximum number of results to fetch from EACH academic source (Semantic Scholar, PubMed, ArXiv) for EACH query string. Defaults to `1`.
129
- * `max_total_unique_papers (int)`: The maximum total number of unique de-duplicated papers to return across all queries and sources. Defaults to `10`. The tool will stop fetching more data once this limit is met.
130
-
131
- **Returns:**
132
-
133
- * `list[dict]`: A consolidated and de-duplicated list of paper details, containing up to `max_total_unique_papers`. Each dictionary in the list represents a paper and has the following keys:
134
- * `"title" (str)`: The title of the paper. "N/A" if not available.
135
- * `"authors" (list[str])`: A list of author names. ["N/A"] if not available.
136
- * `"year" (str | int)`: The publication year. "N/A" if not available.
137
- * `"abstract" (str)`: A snippet of the abstract (typically up to 500 characters followed by "..."). "N/A" if not available.
138
- * `"doi" (str | None)`: The Digital Object Identifier. `None` if not available.
139
- * `"url" (str)`: A direct URL to the paper (e.g., PubMed link, ArXiv link, Semantic Scholar link). "N/A" if not available.
140
- * `"venue" (str)`: The publication venue (e.g., journal name, "ArXiv"). "N/A" if not available.
141
- * `"source_api" (str)`: The API from which this record was retrieved (e.g., "Semantic Scholar", "PubMed", "ArXiv").
142
-
143
- **GenerationAgent Usage Example (for `python_code` field when `status` is `AWAITING_DATA`):**
144
-
145
- ```python
146
- # Example: User asks for up to 3 papers
147
- print(json.dumps({'intermediate_data_for_llm': tools.multi_source_literature_search(queries=["T-cell exhaustion markers AND cancer", "immunotherapy for melanoma AND biomarkers"], max_results_per_query_per_source=1, max_total_unique_papers=3)}))
148
-
149
- # Example: Defaulting to 10 total unique papers
150
- print(json.dumps({'intermediate_data_for_llm': tools.multi_source_literature_search(queries=["COVID-19 long-term effects"], max_results_per_query_per_source=2)}))
151
- ```
152
-
153
- **Important Considerations for GenerationAgent:**
154
-
155
- * When results are returned from this tool, the `GenerationAgent`'s `explanation` (for `CODE_COMPLETE` status) should present a summary of the *found papers* (e.g., titles, authors, URLs). It should clearly state that these are potential literature leads and should *not* yet claim to have read or summarized the full content of these papers in that same turn, unless a subsequent tool call for summarization is planned and executed.
156
-
157
- ---
158
-
159
- ### `fetch_text_from_urls(paper_info_list: list[dict], max_chars_per_paper: int = 15000) -> list[dict]`
160
-
161
- Attempts to fetch and extract textual content from the URLs of papers provided in a list. This tool is typically used after `multi_source_literature_search` to gather content for summarization by the GenerationAgent.
162
-
163
- **Args:**
164
-
165
- * `paper_info_list (list[dict])`: A list of paper dictionaries, as returned by `multi_source_literature_search`. Each dictionary is expected to have at least a `"url"` key. Other keys like `"title"` and `"source_api"` are used for logging.
166
- * `max_chars_per_paper (int)`: The maximum number of characters of text to retrieve and store for each paper. Defaults to `15000`. Text longer than this will be truncated.
167
-
168
- **Returns:**
169
-
170
- * `list[dict]`: The input `paper_info_list`, where each paper dictionary is augmented with a new key `"retrieved_text_content"`.
171
- * If successful, `"retrieved_text_content" (str)` will contain the extracted text (up to `max_chars_per_paper`).
172
- * If fetching or parsing fails for a paper, `"retrieved_text_content" (str)` will contain an error message (e.g., "Error: Invalid or missing URL.", "Error fetching URL: ...", "Error: No text could be extracted.").
173
-
174
- **GenerationAgent Usage Example (for `python_code` field when `status` is `AWAITING_DATA`):**
175
-
176
- This tool is usually the second step in a literature review process.
177
-
178
- ```python
179
- # Assume 'list_of_papers_from_search' is a variable holding the output from a previous
180
- # call to tools.multi_source_literature_search(...)
181
- print(json.dumps({'intermediate_data_for_llm': tools.fetch_text_from_urls(paper_info_list=list_of_papers_from_search, max_chars_per_paper=10000)}))
182
- ```
183
-
184
- **Important Considerations for GenerationAgent:**
185
-
186
- * After this tool returns the `paper_info_list` (now with `"retrieved_text_content"`), the `GenerationAgent` is responsible for using its own LLM capabilities to read the `"retrieved_text_content"` for each paper and generate summaries if requested by the user or if it's part of its plan.
187
- * The `GenerationAgent` should be prepared for `"retrieved_text_content"` to contain error messages and handle them gracefully in its summarization logic (e.g., by stating that text for a particular paper could not be retrieved).
188
- * Web scraping is inherently unreliable; success in fetching and parsing text can vary greatly between websites. The agent should not assume text will always be available.
189
-
190
  ---
 
+ # Agent Tools Documentation
+
+ This document outlines the granular tools that can be created or extracted from the TaijiChat R Shiny application. These tools are intended for an agent system to access data, calculations, methodologies, tables, and graphs from the application.
+
+ ---
+
+ Tool Name: `get_raw_excel_data`
+ Description: Reads a specified Excel file and returns its raw content as a list of lists, where each inner list represents a row. This tool is generic; the `file_path` should be an absolute path or a path relative to the project root (e.g., "www/some_data.xlsx"). For predefined datasets within the application structure, other more specific tools should be preferred if available.
+ Input: `file_path` (string) - The path to the Excel file.
+ Output: `data` (list of lists of strings/numbers) - The raw data from the Excel sheet. Returns an empty list if the file is not found or cannot be read.
+
+ ---
+
+ Tool Name: `get_processed_tf_data`
+ Description: Reads and processes a TF-related Excel file identified by its `dataset_identifier` (e.g., "Naive", "Overall_TF_PageRank"). It uses an internal mapping (`get_tf_catalog_dataset_path`) to find the actual file path within the `www/tablePagerank/` directory. The standard processing includes: reading the Excel file, transposing it, using the original first row as the new column headers, and then removing this header row from the data.
+ Input: `dataset_identifier` (string) - The identifier for the dataset. Valid identifiers include: "Overall_TF_PageRank", "Naive", "TE", "MP", "TCM", "TEM", "TRM", "TEXprog", "TEXeff", "TEXterm".
+ Output: `data` (list of lists of strings/numbers) - The processed data, where the first inner list contains the headers, and subsequent lists are data rows. Returns an empty list if processing fails or the identifier is invalid.
+
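For orientation, the transpose-and-reheader step can be sketched with pandas. This is a hedged illustration, not the app's actual implementation; in particular, it assumes the first row of the *transposed* sheet is the row promoted to headers, and `pd.read_excel` needs `openpyxl` installed to read `.xlsx` files.

```python
import pandas as pd

def process_tf_sheet(path):
    """Read a TF score sheet, transpose it, and promote the first row to headers."""
    raw = pd.read_excel(path, header=None)   # raw cells, no header inference
    transposed = raw.T                       # TFs move from rows to columns
    headers = transposed.iloc[0].tolist()    # first row of the transposed sheet
    body = transposed.iloc[1:]               # remaining rows are the data
    return [headers] + body.values.tolist()  # list of lists, headers first
```

The resulting list-of-lists shape (headers first) is what the filtering tool below consumes.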
+ ---
+
+ Tool Name: `filter_data_by_column_keywords`
+ Description: Filters a dataset (list of lists, where the first list is headers) based on keywords matching its column names. This is for data that has already been processed (e.g., by `get_processed_tf_data`) where TFs or genes are column headers. The keyword search is case-insensitive and supports multiple comma-separated keywords. If no keywords are provided, the original dataset is returned.
+ Input:
+ `dataset` (list of lists) - The data to filter, with the first list being headers.
+ `keywords` (string) - Comma-separated keywords to search for in column headers.
+ Output: `filtered_dataset` (list of lists) - The subset of the data containing only the matching columns (including the header row). Returns an empty list (with headers only) if no columns match.
+
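A minimal sketch of the matching rule described above. It is illustrative only; the real tool's location and edge-case behavior may differ, and the headers-only fallback simply mirrors the documented contract.

```python
def filter_by_column_keywords(dataset, keywords):
    """Keep columns whose header contains any of the comma-separated keywords."""
    if not keywords or not keywords.strip():
        return dataset  # no keywords: return the dataset unchanged
    headers = dataset[0]
    terms = [k.strip().lower() for k in keywords.split(",") if k.strip()]
    # Indices of columns whose header matches at least one keyword (case-insensitive).
    keep = [i for i, h in enumerate(headers) if any(t in str(h).lower() for t in terms)]
    if not keep:
        return [headers]  # no matching columns: header row only, per the tool contract
    # Rebuild every row (header row included) with only the kept columns.
    return [[row[i] for i in keep] for row in dataset]
```

For example, `filter_by_column_keywords([["Tcf7", "Lef1", "Runx3"], [1.6, 0.9, 1.1]], "tcf, runx")` keeps the `Tcf7` and `Runx3` columns.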
+ ---
+
+ Tool Name: `get_tf_wave_search_data`
+ Description: Reads the `searchtfwaves.xlsx` file from `www/waveanalysis/`, which contains TF names organized by "waves" (Wave1 to Wave7 as columns).
+ Input: `tf_search_term` (string, optional) - A specific TF name to search for. If empty or not provided, all TF wave data is returned. The search is case-insensitive.
+ Output: `wave_data` (dictionary) - If `tf_search_term` is provided and matches, returns a structure like `{"WaveX": ["TF1", "TF2"], "WaveY": ["TF1"]}` showing which waves the TF belongs to. If no `tf_search_term` is given, returns the full data as `{"Wave1": ["All TFs in Wave1"], "Wave2": ["All TFs in Wave2"], ...}`. If no matches are found for a search term, an empty dictionary is returned.
+
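One plausible implementation of the case-insensitive wave lookup, assuming the sheet simply has Wave1 through Wave7 as columns of TF names (a sketch, not the shipped code):

```python
import pandas as pd

def find_tf_waves(tf_search_term, path="www/waveanalysis/searchtfwaves.xlsx"):
    """Map each wave column to the TF entries matching the search term."""
    waves = pd.read_excel(path)  # assumed columns: Wave1 ... Wave7
    term = tf_search_term.strip().lower()
    hits = {}
    for wave in waves.columns:
        names = [str(v) for v in waves[wave].dropna()]
        matched = [n for n in names if term and term == n.lower()]
        if matched:
            hits[wave] = matched
    return hits  # empty dict when nothing matches, as documented
```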
+ ---
+
+ Tool Name: `get_tf_correlation_data`
+ Description: Reads the `TF-TFcorTRMTEX.xlsx` file from `www/TFcorintextrm/`. If a `tf_name` is provided, it filters the data for that TF (case-insensitive match on the primary TF identifier column, typically "TF Name" or the first column).
+ Input: `tf_name` (string, optional) - The specific TF name to search for. If empty or not provided, returns the full dataset.
+ Output: `correlation_data` (list of lists) - The filtered (or full) data from the correlation table. The first list is headers. Returns an empty list (with headers only) if `tf_name` is provided but not found, or if the file cannot be processed.
+
+ ---
+
+ Tool Name: `get_tf_correlation_image_path`
+ Description: Reads the `TF-TFcorTRMTEX.xlsx` file from `www/TFcorintextrm/`, finds the row for the given `tf_name` (case-insensitive match on the primary TF identifier column), and returns the path stored in the "TF Merged Graph Path" column. The returned path is relative to the project's `www` directory (e.g., "www/networkanalysis/images/BATF_graph.png").
+ Input: `tf_name` (string) - The specific TF name.
+ Output: `image_path` (string) - The relative web path to the image, or an empty string if not found or if the file cannot be processed.
+
+ ---
+
+ Tool Name: `list_all_tfs_in_correlation_data`
+ Description: Reads the `TF-TFcorTRMTEX.xlsx` file from `www/TFcorintextrm/` and returns a list of all unique TF names from the primary TF identifier column (typically "TF Name" or the first column). Filters out empty strings and 'nan'.
+ Input: None
+ Output: `tf_list` (list of strings) - A list of TF names. Returns an empty list if the file cannot be processed.
+
+ ---
+
+ Tool Name: `get_tf_community_sheet_data`
+ Description: Reads one of the TF community Excel files (`trmcommunities.xlsx` or `texcommunities.xlsx`) located in `www/tfcommunities/`.
+ Input: `community_type` (string) - Either "trm" or "texterm".
+ Output: `community_data` (list of lists) - Data from the specified community sheet (raw format, first list is headers). Returns an empty list if the type is invalid or the file cannot be found or processed.
+
+ ---
+
+ Tool Name: `get_static_image_path`
+ Description: Returns the predefined relative web path (e.g., "www/images/logo.png") for a known static image asset. These paths are typically relative to the project root.
+ Input: `image_identifier` (string) - A unique key representing the image (e.g., "home_page_diagram", "ucsd_logo", "naive_bubble_plot", "wave1_main_img", "wave1_gokegg_img", "wave1_ranked_text1_img", "tfcat_overview_img", "network_correlation_desc_img").
+ Output: `image_path` (string) - The relative path (e.g., "www/homedesc.png"). Returns an empty string if the identifier is unknown. This tool relies on an internal mapping (`_STATIC_IMAGE_WEB_PATHS` in `tools.agent_tools`).
+
+ ---
+
+ Tool Name: `get_ui_descriptive_text`
+ Description: Retrieves predefined descriptive text, methodology explanations, or captions by identifier, primarily from `tools/ui_texts.json`.
+ Input: `text_identifier` (string) - A unique key representing the text block (e.g., "tf_score_calculation_info", "cell_state_specificity_info", "wave_analysis_overview_text", "wave_1_analysis_placeholder_details").
+ Output: `descriptive_text` (string) - The requested text block. Returns an empty string if the identifier is unknown.
+
+ ---
+
+ Tool Name: `list_available_tf_catalog_datasets`
+ Description: Returns a list of valid `dataset_identifier` strings that can be used with the `get_processed_tf_data` tool.
+ Input: None
+ Output: `dataset_identifiers` (list of strings) - E.g., ["Overall_TF_PageRank", "Naive", "TE", "MP", "TCM", "TEM", "TRM", "TEXprog", "TEXeff", "TEXterm"].
+
+ ---
+
+ Tool Name: `list_available_cell_state_bubble_plots`
+ Description: Returns a list of identifiers for available cell-state-specific bubble plot images. These identifiers can be used with `get_static_image_path`.
+ Input: None
+ Output: `image_identifiers` (list of strings) - E.g., ["naive_bubble_plot", "te_bubble_plot", ...]. Derived from an internal mapping in `tools.agent_tools`.
+
+ ---
+
+ Tool Name: `list_available_wave_analysis_assets`
+ Description: Returns a structured dictionary of available asset identifiers for a specific TF wave (main image, GO/KEGG image, ranked text images). Identifiers can be used with `get_static_image_path`.
+ Input: `wave_number` (integer, 1-7) - The wave number.
+ Output: `asset_info` (dictionary) - E.g., `{"main_image_id": "waveX_main_img", "gokegg_image_id": "waveX_gokegg_img", "ranked_text_image_ids": ["waveX_ranked_text1_img", ...]}`. Returns an empty dictionary if the wave number is invalid. Derived from an internal mapping in `tools.agent_tools`.
+
+ ---
+
+ Tool Name: `get_internal_navigation_info`
+ Description: Provides information about where an internal UI link (like those on the homepage image map or wave overview images) is intended to navigate within the application structure.
+ Input: `link_id` (string) - The identifier of the link (e.g., "to_tfcat", "to_tfwave", "to_tfnet", "c1_link", "c2_link", etc.).
+ Output: `navigation_target_description` (string) - A human-readable description of the target (e.g., "Navigates to the 'TF Catalog' section.", "Navigates to the 'Wave 1 Analysis' tab."). Derived from an internal mapping in `tools.agent_tools`.
+
+ ---
+
+ Tool Name: `get_biorxiv_paper_url`
+ Description: Returns the URL for the main bioRxiv paper referenced in the application.
+ Input: None
+ Output: `url` (string) - The bioRxiv paper URL.
+
+ ---
+
+ Tool Name: `list_all_files_in_www_directory`
+ Description: Scans the entire `www/` directory (and its subdirectories, excluding common hidden/system files) and returns a list of all files found. For each file, it provides its relative path from the project root (e.g., "www/images/logo.png"), its detected MIME type (e.g., "image/png", "text/csv", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"), and its size in bytes. This tool helps in understanding all available static assets and data files within the web-accessible `www` directory.
+ Input: None
+ Output: `file_manifest` (list of dictionaries) - Each dictionary represents a file and contains the keys: `path` (string), `type` (string), `size` (integer). Example item: `{"path": "www/data/report.txt", "type": "text/plain", "size": 1024}`. Returns an empty list if the `www` directory isn't found or is empty.
+
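A standard-library sketch of how such a manifest could be built (hypothetical; the real tool's hidden-file rules and MIME fallback may differ):

```python
import mimetypes
import os

def build_www_manifest(root="www"):
    """Walk `root` and describe every non-hidden file: path, MIME type, size."""
    manifest = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories and files (e.g., .DS_Store).
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in sorted(filenames):
            if name.startswith("."):
                continue
            path = os.path.join(dirpath, name)
            mime, _ = mimetypes.guess_type(path)
            manifest.append({
                "path": path.replace(os.sep, "/"),
                "type": mime or "application/octet-stream",  # assumed fallback type
                "size": os.path.getsize(path),
            })
    return manifest
```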
+ ---
+
+ ### `multi_source_literature_search(queries: list[str], max_results_per_query_per_source: int = 1, max_total_unique_papers: int = 10) -> list[dict]`
+
+ Searches for academic literature across multiple sources (Semantic Scholar, PubMed, ArXiv) using a list of provided search queries. It then de-duplicates the results, keying primarily on DOI and secondarily on a combination of title and first author when a DOI is not available. The search stops early once the `max_total_unique_papers` limit is reached.
+
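The de-duplication rule just described can be pictured with a small helper. The `paper` dictionaries are assumed to follow the return schema documented under **Returns** below; this is an illustration of the stated rule, not the tool's source.

```python
def dedup_key(paper):
    """DOI when present; otherwise normalized title + first author."""
    if paper.get("doi"):
        return ("doi", paper["doi"].strip().lower())
    first_author = (paper.get("authors") or ["N/A"])[0]
    return ("title-author",
            paper.get("title", "N/A").strip().lower(),
            first_author.strip().lower())

def deduplicate(papers, limit=10):
    seen, unique = set(), []
    for paper in papers:
        key = dedup_key(paper)
        if key not in seen:
            seen.add(key)
            unique.append(paper)
        if len(unique) >= limit:
            break  # mirrors the early stop at max_total_unique_papers
    return unique
```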
+ **Args:**
+
+ * `queries (list[str])`: A list of search query strings. The GenerationAgent should brainstorm 3-5 diverse queries relevant to the user's request.
+ * `max_results_per_query_per_source (int)`: The maximum number of results to fetch from EACH academic source (Semantic Scholar, PubMed, ArXiv) for EACH query string. Defaults to `1`.
+ * `max_total_unique_papers (int)`: The maximum total number of unique de-duplicated papers to return across all queries and sources. Defaults to `10`. The tool will stop fetching more data once this limit is met.
+
+ **Returns:**
+
+ * `list[dict]`: A consolidated and de-duplicated list of paper details, containing up to `max_total_unique_papers`. Each dictionary in the list represents a paper and has the following keys:
+     * `"title" (str)`: The title of the paper. "N/A" if not available.
+     * `"authors" (list[str])`: A list of author names. ["N/A"] if not available.
+     * `"year" (str | int)`: The publication year. "N/A" if not available.
+     * `"abstract" (str)`: A snippet of the abstract (typically up to 500 characters followed by "..."). "N/A" if not available.
+     * `"doi" (str | None)`: The Digital Object Identifier. `None` if not available.
+     * `"url" (str)`: A direct URL to the paper (e.g., PubMed link, ArXiv link, Semantic Scholar link). "N/A" if not available.
+     * `"venue" (str)`: The publication venue (e.g., journal name, "ArXiv"). "N/A" if not available.
+     * `"source_api" (str)`: The API from which this record was retrieved (e.g., "Semantic Scholar", "PubMed", "ArXiv").
+
+ **GenerationAgent Usage Example (for `python_code` field when `status` is `AWAITING_DATA`):**
+
+ ```python
+ # `json` and `tools` are assumed to be preloaded in the agent's execution environment.
+ # Example: User asks for up to 3 papers
+ print(json.dumps({'intermediate_data_for_llm': tools.multi_source_literature_search(queries=["T-cell exhaustion markers AND cancer", "immunotherapy for melanoma AND biomarkers"], max_results_per_query_per_source=1, max_total_unique_papers=3)}))
+
+ # Example: Defaulting to 10 total unique papers
+ print(json.dumps({'intermediate_data_for_llm': tools.multi_source_literature_search(queries=["COVID-19 long-term effects"], max_results_per_query_per_source=2)}))
+ ```
+
+ **Important Considerations for GenerationAgent:**
+
+ * When results are returned from this tool, the `GenerationAgent`'s `explanation` (for `CODE_COMPLETE` status) should present a summary of the *found papers* (e.g., titles, authors, URLs). It should clearly state that these are potential literature leads and should *not* yet claim to have read or summarized the full content of these papers in that same turn, unless a subsequent tool call for summarization is planned and executed.
+
+ ---
+
+ ### `fetch_text_from_urls(paper_info_list: list[dict], max_chars_per_paper: int = 15000) -> list[dict]`
+
+ Attempts to fetch and extract textual content from the URLs of papers provided in a list. This tool is typically used after `multi_source_literature_search` to gather content for summarization by the GenerationAgent.
+
+ **Args:**
+
+ * `paper_info_list (list[dict])`: A list of paper dictionaries, as returned by `multi_source_literature_search`. Each dictionary is expected to have at least a `"url"` key. Other keys like `"title"` and `"source_api"` are used for logging.
+ * `max_chars_per_paper (int)`: The maximum number of characters of text to retrieve and store for each paper. Defaults to `15000`. Text longer than this will be truncated.
+
+ **Returns:**
+
+ * `list[dict]`: The input `paper_info_list`, where each paper dictionary is augmented with a new key `"retrieved_text_content"`.
+     * If successful, `"retrieved_text_content" (str)` will contain the extracted text (up to `max_chars_per_paper`).
+     * If fetching or parsing fails for a paper, `"retrieved_text_content" (str)` will contain an error message (e.g., "Error: Invalid or missing URL.", "Error fetching URL: ...", "Error: No text could be extracted.").
+
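For intuition, a single-paper fetch might look like the sketch below, using `requests` and `BeautifulSoup`. It is an assumption-laden approximation (the real tool's HTTP settings and parser are not documented here), but the error strings mirror the ones listed above.

```python
import requests
from bs4 import BeautifulSoup

def fetch_text(paper, max_chars=15000):
    """Return extracted page text for one paper dict, or an error message."""
    url = paper.get("url")
    if not url or url == "N/A":
        return "Error: Invalid or missing URL."
    try:
        response = requests.get(url, timeout=15)  # timeout value is an assumption
        response.raise_for_status()
    except requests.RequestException as exc:
        return f"Error fetching URL: {exc}"
    # Strip markup and collapse whitespace; many publisher pages defeat this.
    text = BeautifulSoup(response.text, "html.parser").get_text(separator=" ", strip=True)
    if not text:
        return "Error: No text could be extracted."
    return text[:max_chars]  # truncate to max_chars_per_paper
```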
+ **GenerationAgent Usage Example (for `python_code` field when `status` is `AWAITING_DATA`):**
+
+ This tool is usually the second step in a literature review process.
+
+ ```python
+ # Assume 'list_of_papers_from_search' is a variable holding the output from a previous
+ # call to tools.multi_source_literature_search(...)
+ print(json.dumps({'intermediate_data_for_llm': tools.fetch_text_from_urls(paper_info_list=list_of_papers_from_search, max_chars_per_paper=10000)}))
+ ```
+
+ **Important Considerations for GenerationAgent:**
+
+ * After this tool returns the `paper_info_list` (now with `"retrieved_text_content"`), the `GenerationAgent` is responsible for using its own LLM capabilities to read the `"retrieved_text_content"` for each paper and generate summaries if requested by the user or if it's part of its plan.
+ * The `GenerationAgent` should be prepared for `"retrieved_text_content"` to contain error messages and handle them gracefully in its summarization logic (e.g., by stating that text for a particular paper could not be retrieved).
+ * Web scraping is inherently unreliable; success in fetching and parsing text can vary greatly between websites. The agent should not assume text will always be available.
+
  ---
tools/excel_data_documentation.md CHANGED
@@ -1,183 +1,183 @@
- # Documented Excel Files
-
- This file lists the Excel files that have been analyzed and documented.
-
- * `./www/multi-omicsdata.xlsx`
-
- * `./www/networkanalysis/comp_log2FC_RegulatedData_TRMTEXterm.xlsx`
- comp_log2FC_RegulatedData_TRMTEXterm.xlsx tabulates log₂ fold-change values for 17,483 genes (rows) across 198 transcription factors (columns) in the TRM→TexTerm regulated-data comparison. The first column ("Unnamed: 0") lists each gene's identifier (e.g. "0610005C13RIK"); each subsequent column is named by a TF (Ahr, Arid3a, Arnt, …, Zscan20) and contains the corresponding log₂ fold-change value.
-
- For instance, a value of 19.615925 in row 0610009B22RIK under Arnt indicates that gene 0610009B22RIK exhibited a log₂ fold-change of 19.615925 in the Arnt-associated regulated data when comparing TRM to TexTerm.
-
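That lookup can be reproduced with pandas (illustrative; reading `.xlsx` requires the `openpyxl` engine, and "Unnamed: 0" is the name pandas assigns to the unlabeled first column):

```python
import pandas as pd

# Load the gene x TF log2 fold-change matrix; the first column holds gene IDs.
df = pd.read_excel("./www/networkanalysis/comp_log2FC_RegulatedData_TRMTEXterm.xlsx")
df = df.set_index("Unnamed: 0")

# log2FC of gene 0610009B22RIK in the Arnt-associated regulated data.
print(df.loc["0610009B22RIK", "Arnt"])  # expected: 19.615925 per the example above
```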
- * `./www/old files/log2FC_RegulatedData_TRMTEXterm.xlsx`
-
- * `./www/tablePagerank/MP.xlsx`
- MP.xlsx tabulates performance scores for 57 transcription factors ("TF") across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g., Mackay, Chung, Scott, etc.).
-
- EvaluationDataset is the dataset on which performance was assessed.
-
- Each cell contains the resulting floating-point score for that TF under the specified method and dataset pairing.
-
- For example, a cell value of 0.72 in row GATA1 under column MP_Mackay_Chung means that the MP scoring method—trained on the Mackay dataset—achieved a performance score of 0.72 when evaluated on the Chung dataset.
-
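Because this `<Method>_<TrainingDataset>_<EvaluationDataset>` convention recurs in every `tablePagerank` workbook below, a tiny helper suffices to decompose a column name (a sketch; it assumes exactly two underscores per name, which holds for the examples in this file):

```python
def parse_score_column(column_name):
    """Split '<Method>_<TrainingDataset>_<EvaluationDataset>' into its three fields."""
    method, training, evaluation = column_name.split("_")
    return {"method": method, "training_dataset": training, "evaluation_dataset": evaluation}

print(parse_score_column("MP_Mackay_Chung"))
# {'method': 'MP', 'training_dataset': 'Mackay', 'evaluation_dataset': 'Chung'}
```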
- * `./www/tablePagerank/Naive.xlsx`
- Naive.xlsx tabulates performance scores for 31 transcription factors ("TF") across the same 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g., Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson, etc.).
-
- EvaluationDataset is the dataset on which performance was assessed.
-
- Each cell contains the resulting floating-point score for that TF under the specified method and dataset pairing.
-
- For example, in row Tcf7 under column Naive_Kaech_Chung, the value 1.626392 indicates that the Naive scoring method—trained on the Kaech dataset—achieved a performance score of 1.626392 when evaluated on the Chung dataset.
-
- * `./www/tablePagerank/Table_TF PageRank Scores for Audrey.xlsx`
- Table_TF PageRank Scores for Audrey.xlsx tabulates PageRank-derived scores for 308 transcription factors ("TF") across the same 42 method–dataset combinations, with two additional annotation columns:
-
- TF (first column): Transcription factor name.
-
- Category: Broad TF class (e.g. "Universal TFs," "Lineage-specific TFs," etc.).
-
- Cell-state specificity: Whether the TF is "Universal," "Pluripotent," "Myeloid," etc.
-
- Each of the remaining 42 columns follows the convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the dataset used to fit the PageRank model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which the PageRank scores were assessed.
-
- Each cell holds the floating-point PageRank score for that TF under the specified method and dataset pairing.
-
- For example, a value of 1.003938 in row Elf1 under column Naive_Kaech_Kaech indicates that the Naive PageRank model—trained and evaluated on the Kaech dataset—assigned Elf1 a score of 1.003938.
-
- * `./www/tablePagerank/TCM.xlsx`
- TCM.xlsx tabulates performance scores for 28 transcription factors ("TF") across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed (e.g. Chung, Mackay, Scott, etc.).
-
- Each cell holds the resulting floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, a value of 0.837792 in row Msgn1 under column TCM_Mackay_Chung indicates that the TCM scoring method—trained on the Mackay dataset—achieved a performance score of 0.837792 when evaluated on the Chung dataset.
-
- * `./www/tablePagerank/TE.xlsx`
- TE.xlsx tabulates performance scores for 33 transcription factors ("TF") across the same 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g., Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed (e.g., Chung, Scott, Mackay, etc.).
-
- Each cell contains the resulting floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, if you see 0.65 in row Myod1 under column TE_Mackay_Chung, it means that the TE method—trained on the Mackay dataset—achieved a performance score of 0.65 when evaluated on the Chung dataset.
-
- * `./www/tablePagerank/TEM.xlsx`
- TEM.xlsx tabulates performance scores for 25 transcription factors ("TF") across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g., Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed (e.g., Chung, Mackay, Scott, etc.).
-
- Each cell contains the resulting floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, a value of 1.6696786566 in row Foxc2 under column TEM_Mackay_Chung means that the TEM scoring method—trained on the Mackay dataset—achieved a performance score of 1.6696786566 when evaluated on the Chung dataset.
-
- * `./www/tablePagerank/TEXeff.xlsx`
- TEXeff.xlsx tabulates performance scores for 62 transcription factors ("TF") across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed.
-
- Each cell contains the resulting floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, a value of 0.647 in row Vax2 under column TexTerm_Hudson_Beltra means that the TexTerm scoring method—trained on the Hudson dataset—achieved a performance score of 0.647 when evaluated on the Beltra dataset.
-
- * `./www/tablePagerank/TEXprog.xlsx`
- TEXprog.xlsx tabulates performance scores for 63 transcription factors ("TF") across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the source dataset used to train the model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed (e.g. Chung, Mackay, Scott, etc.).
-
- Each cell holds the resulting floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, a value of 1.5403 in row Irf9 under column TexProg_Beltra_Chung means that the TexProg scoring method—trained on the Beltra dataset—achieved a performance score of 1.5403 when evaluated on the Chung dataset.
-
- * `./www/tablePagerank/TEXterm.xlsx`
- TEXterm.xlsx tabulates performance scores for 51 transcription factors ("TF") across 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the dataset used to fit the model (e.g. Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed.
-
- Each cell holds the floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, a value of 0.912 in row Sox2 under column TexTerm_Scott_Mackay means that the TexTerm method—trained on the Scott dataset—achieved a performance score of 0.912 when evaluated on the Mackay dataset.
-
- * `./www/tablePagerank/TRM.xlsx`
- TRM.xlsx tabulates performance scores for 43 transcription factors ("TF") across the same 42 method–dataset combinations. The first column, TF, lists each factor's name; the remaining columns follow the naming convention <Method>_<TrainingDataset>_<EvaluationDataset>, where:
-
- Method is one of twelve cell states: Naive, MP, TCM, TE, TEM, TRM.IEL, TRM.liver, TexProg1, TexProg2, TexProg, TexInt, or TexTerm.
-
- TrainingDataset is the dataset used to train the model (e.g., Kaech, Chung, Mackay, MilnerAug, Renkema, Scott, Beltra, Hudson).
-
- EvaluationDataset is the dataset on which performance was assessed (e.g., Chung, Mackay, Scott, etc.).
-
- Each cell contains the resulting floating-point metric for that TF under the specified method and dataset pairing.
-
- For example, a value of 0.91 in row PU.1 under column TRM.IEL_Chung_Mackay means that the TRM.IEL scoring method—trained on the Chung dataset—achieved a performance score of 0.91 when evaluated on the Mackay dataset.
-
- * `./www/tfcommunities/texcommunities.xlsx`
- texcommunities.xlsx is a multi-sheet workbook (12 sheets) that organizes transcription factors into network "communities" for two models—TEX and TRM:
-
- TEX Communities: A summary sheet with two columns—C (community ID, e.g. C1–C5) and TF Members (a comma-separated list of all TFs in that community).
-
- TEX_c1 through TEX_c5: One sheet per TEX community, each listing a single TF column of member factors.
-
- TRM Communities: A parallel summary sheet for the TRM model, also with C and TF Members columns.
-
- TRM_c1 through TRM_c5: Individual sheets listing TFs for each TRM community.
-
- Each community groups TFs based on network topology under the respective model. For example, in the TEX Communities sheet, community C1 includes the following TF members: Usf1, Arnt, Mlx, Srebf1, Arntl, Tfe3, Heyl, Bhlhe40, …, indicating that these factors cluster together in the TEX network.
-
- * `./www/tfcommunities/trmcommunities.xlsx`
- trmcommunities.xlsx is a multi-sheet workbook (6 sheets) that defines transcription factor communities for the TRM network model:
-
- TRM Communities: A summary sheet with two columns—C (community ID, C1–C5) and TF Members (a comma-separated list of all TFs in that community).
-
- TRM_c1 through TRM_c5: Each sheet lists a single TF column naming the factors that belong to that community.
-
- These communities reflect clusters of TFs based on network topology under the TRM model. For example, in the TRM Communities sheet, community C1 might include TFs such as PU.1, Runx3, and Irf4, indicating that these factors form a tightly connected module in the TRM network.
-
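Either summary sheet can be expanded into a community-to-members mapping along these lines (a sketch; the sheet and column names are taken from the descriptions above, and reading `.xlsx` with pandas requires `openpyxl`):

```python
import pandas as pd

# Read the summary sheet: one row per community, members comma-separated.
summary = pd.read_excel("./www/tfcommunities/trmcommunities.xlsx",
                        sheet_name="TRM Communities")

communities = {
    row["C"]: [tf.strip() for tf in str(row["TF Members"]).split(",")]
    for _, row in summary.iterrows()
}
print(communities["C1"][:5])  # first few TFs assigned to community C1
```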
- * `./www/TFcorintextrm/TF-TFcorTRMTEX.xlsx`
- TF-TFcorTRMTEX.xlsx contains pairwise correlation matrices of transcription factor scores for both the TRM and TEX models. It has two sheets:
-
- TRM: A square matrix where both rows and columns list the same set of TFs; each cell at the intersection of TF A (row) and TF B (column) gives the Pearson correlation coefficient between their TRM PageRank (or performance) scores across all dataset contexts.
-
- TEX: The analogous matrix for the TEX model.
-
- For example, on the TRM sheet, the value 0.82 at row PU.1 and column Runx3 indicates that PU.1 and Runx3 have a correlation of 0.82 in their TRM-derived scores.
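A pairwise lookup against either sheet can be done as below (illustrative; `index_col=0` assumes the row TF names sit in the first column):

```python
import pandas as pd

# Load the TRM correlation matrix with TF names as both index and columns.
trm_corr = pd.read_excel("./www/TFcorintextrm/TF-TFcorTRMTEX.xlsx",
                         sheet_name="TRM", index_col=0)

# Pearson correlation between PU.1 and Runx3 under the TRM model.
print(trm_corr.loc["PU.1", "Runx3"])  # expected: 0.82 per the example above
```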
-
- * `./www/waveanalysis/searchtfwaves.xlsx`
 
tools/ui_texts.json CHANGED
@@ -1,27 +1,27 @@
- {
- "home_intro_1": "This website serves as a companion resource to our study, \"Multi-Omics Atlas-Assisted Discovery of Transcription Factors for Selective T Cell State Programming.\" It is designed to make the bioinformatics analyses and data from the study accessible to researchers from diverse backgrounds, including those without extensive bioinformatics expertise. The platform provides comprehensive tools to explore transcription factor (TF) activities across distinct T cell states, enabling users to examine TF scores for specific cell states, multi-state TFs and their regulatory roles, visualize TF relationships using wave and network analyses, and access a searchable database of TF scores and multi-omics data. Explore our TF catalog, network analyses, and more to uncover insights into our study.",
- "home_tcell_states_desc": "Naive T cells adopt diverse states in various contexts, such as acute or chronic infections and tumors. Upon activation, they become early effector (EE) cells that differentiate into distinct CD8+ T cell subsets with varied trafficking patterns—residing in lymphoid organs, blood, or peripheral tissues. In acute infection, TE (Terminal Effector) cells are found in the spleen or blood, while MP (Memory Precursor) cells mainly reside in lymphoid structures. TCM (Central Memory) and TEM (Effector Memory) cells circulate in the blood, with TCM predominant in lymphoid organs and TEM in tissues. TPM cells circulate throughout lymph, blood, and tissues, while TRM (Tissue-Resident Memory) cells stay long-term in tissues. Similarly, chronic infections or tumors induce T cell states through chronic antigen exposure, leading to a spectrum of exhaustion, from TEXprog (Progenitor) to TEXterm (Terminal). These cells lose function over time, with TEXterm becoming dysfunctional. In this study, we developed a pipeline to analyze transcription factor regulation across CD8+ T cell states, enabling therapeutic manipulation. Using 121 experiments, we created an epigenetic and transcription atlas of 9 cell states, allowing an unbiased analysis of unique and shared transcription factor activities across memory and exhaustion contexts.",
- "home_study_summary": "Transcription factors (TFs) regulate the differentiation of T cells into diverse states with distinct functionalities. To precisely program desired T cell states in viral infections and cancers, we generated a comprehensive transcriptional and epigenetic atlas of nine CD8+ T cell differentiation states for TF activity prediction. Our analysis catalogued TF activity fingerprints of each state, uncovering new regulatory mechanisms that govern selective cell state differentiation. Leveraging this platform, we focused on two critical T cell states in tumor and virus control: terminally exhausted T cells (TEXterm), which are dysfunctional, and tissue-resident memory T cells (TRM), which are protective. Despite their functional differences, these states share significant transcriptional and anatomical similarities, making it both challenging and essential to engineer T cells that avoid TEXterm differentiation while preserving beneficial TRM characteristics. Through in vivo CRISPR screening combined with single-cell RNA sequencing (Perturb-seq), we validated the specific TFs driving the TEXterm state and confirmed the accuracy of TF specificity predictions. Importantly, we discovered novel TEXterm-specific TFs such as ZSCAN20, JDP2, and ZFP324. The deletion of these TEXterm-specific TFs in T cells enhanced tumor control and synergized with immune checkpoint blockade. Additionally, this study identified multi-state TFs like HIC1 and GFI1, which are vital for both TEXterm and TRM states. Furthermore, our global TF community analysis and Perturb-seq experiments revealed how TFs differentially regulate key processes in TRM and TEXterm cells, uncovering new biological pathways like protein catabolism that are specifically linked to TEXterm differentiation. In summary, our platform systematically identifies TF programs across diverse T cell states, facilitating the engineering of specific T cell states to improve tumor control and providing insights into the cellular mechanisms underlying their functional disparities.",
- "chat_disclaimer": "⚠️ TaijiChat can make errors. Please verify important scientific information and consult original research papers for critical findings.",
- "chat_setup_warning": "📊 Note: Your first query may take longer as we initialize the data analysis system.",
- "tf_score_calculation_info": "TF score: normalized PageRank scores across samples. Higher scores mean higher activity. The TF score takes into account TF expression level, ATAC-seq peak intensity, and motif binding affinity. A TF needs to have both open chromatin regions and gene expression of its downstream genes to have high PageRank scores. For detailed information on how TF scores are calculated, visit: https://taiji-pipeline.github.io/algorithm_PageRank.html",
- "cell_state_specificity_info": "Cell-state specificity: a significantly higher TF score in a specific cell state. For example, 'TEXterm' means the TF has a significantly higher TF score in TEXterm. \"ALL\" means no cell-state specificity.",
- "sample_nomenclature_info": "Sample name nomenclature: cell state abbreviation + first author of RNA dataset + first author of ATAC dataset. If the same author has multiple datasets, then the publication year and month are used. For example: 'Milner17' means the Milner paper published in 2017; 'MilnerApr' means the Milner paper published in April.",
- "cell_state_bubble_plot_desc": "Below you will find the bubble plot and a searchable Excel file containing all the normalized TF activity scores. Circle size represents the logarithm of gene expression, and the color represents the normalized PageRank score.",
11
- "wave_analysis_overview_text": "To evaluate the predicted TFs governing specific T cell differentiation pathways, we identified dynamic activity patterns of TF groups, termed 'Transcription factor waves'. Transcription factor waves are generated via integration of the unbiased clustering and prior immunology knowledge. This curates catalogs of TFs associated with different cell states or differentiation trajectories. Circles represent specific cell state. Color indicates normalized PageRank scores with red displaying high values. Click to check TFs and GSEA results associated with each wave.",
12
- "wave_analysis_seven_waves_desc": "Seven TF waves associated with distinctive biological pathways were identified. For example, the TRM TF wave (Wave 6) includes several members of the AP-1 family (e.g. Atf3, Fosb, Fosl2, and Jun), which aligns well with a recent report of their roles in TRM formation (link: https://www.biorxiv.org/content/10.1101/2023.09.29.560006v1). This wave was uniquely linked to the transforming growth factor beta (TGFβ) response pathway. Conversely, the TEX TF (including TEXprog, TEXeff, and TEXterm) wave, Wave 2, was characterized by a distinct set of TFs, such as Irf8, Jdp2, Nfatc1, and Vax2, among others, that correlated with pathways related to PD1 and senescence.",
13
- "wave_analysis_click_prompt": "Click on the wave images below to be redirected to their corresponding pages and learn more about them!",
14
- "wave_X_analysis_placeholder_details": "Details about the Wave {X} analysis go here.",
15
- "tf_network_correlation_methodology": "Inspired by DBPNet (1) which is a framework to identify cooperations between DNA-binding proteins using Chromatin immunoprecipitation followed by sequencing (ChIP-seq) and Hi-C data, we constructed TF interaction network based on Taiji's output, which is a TF-regulatee network. For each context, we first combined the cell state-important TFs and cell state-specific TFs. In total, 159 TFs for TEXterm and 170 TFs for TRM were selected. We then combined TEXterm/TRM samples' network by taking the mean value of edge weight for each TF-regulatee pair. Next, regulatees with low variation across TFs (standard deviation <= 1) were removed, then correlation matrix between TFs is calculated by taking account of the Spearman's correlation of edge weight for each TF-regulatee pair. R package \"huge\" (2) is used to build a graphical model and construct the graph. We employed the Graphical lasso algorithm and the shrunken ECDF (empirical cumulative distribution function) estimator. We used a lasso penalty parameter lambda equal to 0.052 to control the regularization. We chose this value based on the local minimum point on the sparsity-lambda curve. When lambda = 0.05, around 15% of TF-TF pairs are considered as connected in the network. To estimate the false discovery rate, we generated a null model by random shuffling the edge weight of TF-regulatee pair across TFs. When the same algorithm is applied to this dataset, the chosen cutoff identifies zero interaction, suggesting that the method with cutoff equal to 0.05 has a very low false discovery rate.",
16
- "tf_network_correlation_legend": "Key for TF-TF Network Image: Circle: TF-specificity Green: TRM-specific Brown: TEXterm-specific. Line thickness: TF-TF interaction intensity. Line color: Green: TF-TF interaction found in TRM, Brown: TF-TF interaction found in TEXterm.",
17
- "tf_community_methodology": "Communities were detected using Leiden algorithm (3) with modularity as objective function and resolution as 0.9 since it reached the highest clustering modularity. In total, we identified 5 communities for each context. Network visualization was performed by graph with Fruchterman-Reingold layout algorithm (4) utilizing R package \"igraph\" (https://r.igraph.org/).",
18
- "tf_community_trmtexcom_image_desc": "TF-TF association clustering generates five TF communities between TRM and TEXterm cells. Left: The overall community topology is shaped by shared TFs (gray) in both TRM and TEXterm cells. Middle and Right: TRM and TEXterm cells show differential TF-TF interactions within each community and between communities. Top 10% of interactions are shown. The line thickness represents the interaction intensity.",
19
- "tf_community_members_prompt": "Members of each TF community in each cell state are below:",
20
- "tf_community_pathway_desc": "TF neighbor communities in TEXterm and TRM cells, respectively are linked to different biological processes (below). The overall topology of these communities was influenced by multi-state TFs active in both cell states, while cell state-specific TFs created unique interaction patterns between the communities. Pathway analysis of the regulatees in each community suggested that TRM- or TEXterm-specific TFs within each community controlled different biological pathways. For example, in TRM cells, community 3 was associated with cell-cell adhesion and response to TGFβ, and community 1 was associated with RNA metabolism, but in TEXterm cells, community 3 was linked to apoptosis, and community 1 was coupled to catabolism, proteolysis, ubiquitin-proteasome, and autophagy.",
21
- "multi_omics_data_table_scroll_prompt": "Scroll horizontally to view entire data table",
22
- "multi_omics_data_composition_desc": "Composition of multi-omic atlas. A total of 121 experiments across multiple data sets were utilized to generate an epigenetic and transcriptional atlas of murine CD8+ T cells under chronic and acute antigen exposure. Unless stated, all CD8+ T cells were isolated from spleens.",
23
- "citation_zhang_2016": "1. Zhang, K., Li, N., Ainsworth, R. I. & Wang, W. Systematic identification of protein combinations mediating chromatin looping. Nat. Commun. 7, 1–11 (2016).",
24
- "citation_zhao_2020": "2. Zhao, T., Liu, H., Roeder, K., Lafferty, J. & Wasserman, L. The huge Package for High-dimensional Undirected Graph Estimation in R. (2020) doi:10.48550/arXiv.2006.14781.",
25
- "citation_traag_2019": "3. Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).",
26
- "citation_schonfeld_2019": "4. Schönfeld, M. & Pfeffer, J. Fruchterman/Reingold (1991): Graph Drawing by Force-Directed Placement. Schlüsselwerke der Netzwerkforschung 217–220 https://doi.org/10.1007/978-3-658-21742-6_49 (2019)."
27
  }
 
1
+ {
2
+ "home_intro_1": "This website serves as a companion resource to our study, \"Multi-Omics Atlas-Assisted Discovery of Transcription Factors for Selective T Cell State Programming.\" It is designed to make the bioinformatics analyses and data from the study accessible to researchers from diverse backgrounds, including those without extensive bioinformatics expertise. The platform provides comprehensive tools to explore transcription factor (TF) activities across distinct T cell states, enabling users to examine TF scores for specific cell states, multi-state TFs and their regulatory roles, visualize TF relationships using wave and network analyses and access a searchable database of TF scores and multi-omics data. Explore our TF catalog, network analyses, and more to uncover insights into our study.",
3
+ "home_tcell_states_desc": "Naive T cells adopt diverse states in various contexts, such as acute or chronic infections and tumors. Upon activation, they become early effector (EE) cells that differentiate into distinct CD8+ T cell subsets with varied trafficking patterns—residing in lymphoid organs, blood, or peripheral tissues. In acute infection, TE (Terminal Effector) cells are found in the spleen or blood, while MP (Memory Precursor) cells mainly reside in lymphoid structures. TCM (Central Memory) and TEM (Effector Memory) cells circulate in the blood, with TCM predominant in lymphoid organs and TEM in tissues. TPM cells circulate throughout lymph, blood, and tissues, while TRM (Tissue-Resident Memory) cells stay long-term in tissues. Similarly, chronic infections or tumors induce T cell states through chronic antigen exposure, leading to a spectrum of exhaustion, from TEXprog (Progenitor) to TEXterm (Terminal). These cells lose function over time, with TEXterm becoming dysfunctional. In this study, we developed a pipeline to analyze transcription factor regulation across CD8+ T cell states, enabling therapeutic manipulation. Using 121 experiments, we created an epigenetic and transcription atlas of 9 cell states, allowing an unbiased analysis of unique and shared transcription factor activities across memory and exhaustion contexts.",
4
+ "home_study_summary": "Transcription factors (TFs) regulate the differentiation of T cells into diverse states with distinct functionalities. To precisely program desired T cell states in viral infections and cancers, we generated a comprehensive transcriptional and epigenetic atlas of nine CD8+ T cell differentiation states for TF activity prediction. Our analysis catalogued TF activity fingerprints of each state, uncovering new regulatory mechanisms that govern selective cell state differentiation. Leveraging this platform, we focused on two critical T cell states in tumor and virus control: terminally exhausted T cells (TEXterm), which are dysfunctional, and tissue-resident memory T cells (TRM), which are protective. Despite their functional differences, these states share significant transcriptional and anatomical similarities, making it both challenging and essential to engineer T cells that avoid TEXterm differentiation while preserving beneficial TRM characteristics. Through in vivo CRISPR screening combined with single-cell RNA sequencing (Perturb-seq), we validated the specific TFs driving the TEXterm state and confirmed the accuracy of TF specificity predictions. Importantly, we discovered novel TEXterm-specific TFs such as ZSCAN20, JDP2, and ZFP324. The deletion of these TEXterm-specific TFs in T cells enhanced tumor control and synergized with immune checkpoint blockade. Additionally, this study identified multi-state TFs like HIC1 and GFI1, which are vital for both TEXterm and TRM states. Furthermore, our global TF community analysis and Perturb-seq experiments revealed how TFs differentially regulate key processes in TRM and TEXterm cells, uncovering new biological pathways like protein catabolism that are specifically linked to TEXterm differentiation. In summary, our platform systematically identifies TF programs across diverse T cell states, facilitating the engineering of specific T cell states to improve tumor control and providing insights into the cellular mechanisms underlying their functional disparities.",
5
+ "chat_disclaimer": "⚠️ TaijiChat can make errors. Please verify important scientific information and consult original research papers for critical findings.",
6
+ "chat_setup_warning": "📊 Note: Your first query may take longer as we initialize the data analysis system.",
7
+ "tf_score_calculation_info": "TF score: normalized PageRank scores across samples. Higher scores mean higher activity. TF score takes account of TF expression level, ATAC-seq peak intensity, and motif binding affinity. A TF needs to have both open chromatin regions and gene expression of its downstream genes to have high PageRank scores. For the detailed information of how TF scores are calculated, visit the link: https://taiji-pipeline.github.io/algorithm_PageRank.html",
8
+ "cell_state_specificity_info": "Cell-state specificity: significantly higher TF score in specific cell state. For example, 'TEXterm' means TF has significantly higher TF score in TEXterm. \"ALL\" means no cell-state specificity.",
9
+ "sample_nomenclature_info": "Sample name nomenclature: cell state abbreviation + first author of RNA dataset + first author of ATAC dataset. If the same author has multiple datasets, then publication year and month is used. For example: 'Milner17' means Milner paper published in 2017; 'MilnerApr' means Milner paper published in April.",
10
+ "cell_state_bubble_plot_desc": "Below you will find the bubble plot and a searchable excel file containing all the normalized TF activity scores. Circle size represents the logarithm of gene expression, and the color represents the normalized PageRank score.",
11
+ "wave_analysis_overview_text": "To evaluate the predicted TFs governing specific T cell differentiation pathways, we identified dynamic activity patterns of TF groups, termed 'Transcription factor waves'. Transcription factor waves are generated via integration of the unbiased clustering and prior immunology knowledge. This curates catalogs of TFs associated with different cell states or differentiation trajectories. Circles represent specific cell state. Color indicates normalized PageRank scores with red displaying high values. Click to check TFs and GSEA results associated with each wave.",
12
+ "wave_analysis_seven_waves_desc": "Seven TF waves associated with distinctive biological pathways were identified. For example, the TRM TF wave (Wave 6) includes several members of the AP-1 family (e.g. Atf3, Fosb, Fosl2, and Jun), which aligns well with a recent report of their roles in TRM formation (link: https://www.biorxiv.org/content/10.1101/2023.09.29.560006v1). This wave was uniquely linked to the transforming growth factor beta (TGFβ) response pathway. Conversely, the TEX TF (including TEXprog, TEXeff, and TEXterm) wave, Wave 2, was characterized by a distinct set of TFs, such as Irf8, Jdp2, Nfatc1, and Vax2, among others, that correlated with pathways related to PD1 and senescence.",
13
+ "wave_analysis_click_prompt": "Click on the wave images below to be redirected to their corresponding pages and learn more about them!",
14
+ "wave_X_analysis_placeholder_details": "Details about the Wave {X} analysis go here.",
15
+ "tf_network_correlation_methodology": "Inspired by DBPNet (1) which is a framework to identify cooperations between DNA-binding proteins using Chromatin immunoprecipitation followed by sequencing (ChIP-seq) and Hi-C data, we constructed TF interaction network based on Taiji's output, which is a TF-regulatee network. For each context, we first combined the cell state-important TFs and cell state-specific TFs. In total, 159 TFs for TEXterm and 170 TFs for TRM were selected. We then combined TEXterm/TRM samples' network by taking the mean value of edge weight for each TF-regulatee pair. Next, regulatees with low variation across TFs (standard deviation <= 1) were removed, then correlation matrix between TFs is calculated by taking account of the Spearman's correlation of edge weight for each TF-regulatee pair. R package \"huge\" (2) is used to build a graphical model and construct the graph. We employed the Graphical lasso algorithm and the shrunken ECDF (empirical cumulative distribution function) estimator. We used a lasso penalty parameter lambda equal to 0.052 to control the regularization. We chose this value based on the local minimum point on the sparsity-lambda curve. When lambda = 0.05, around 15% of TF-TF pairs are considered as connected in the network. To estimate the false discovery rate, we generated a null model by random shuffling the edge weight of TF-regulatee pair across TFs. When the same algorithm is applied to this dataset, the chosen cutoff identifies zero interaction, suggesting that the method with cutoff equal to 0.05 has a very low false discovery rate.",
16
+ "tf_network_correlation_legend": "Key for TF-TF Network Image: Circle: TF-specificity Green: TRM-specific Brown: TEXterm-specific. Line thickness: TF-TF interaction intensity. Line color: Green: TF-TF interaction found in TRM, Brown: TF-TF interaction found in TEXterm.",
17
+ "tf_community_methodology": "Communities were detected using Leiden algorithm (3) with modularity as objective function and resolution as 0.9 since it reached the highest clustering modularity. In total, we identified 5 communities for each context. Network visualization was performed by graph with Fruchterman-Reingold layout algorithm (4) utilizing R package \"igraph\" (https://r.igraph.org/).",
18
+ "tf_community_trmtexcom_image_desc": "TF-TF association clustering generates five TF communities between TRM and TEXterm cells. Left: The overall community topology is shaped by shared TFs (gray) in both TRM and TEXterm cells. Middle and Right: TRM and TEXterm cells show differential TF-TF interactions within each community and between communities. Top 10% of interactions are shown. The line thickness represents the interaction intensity.",
19
+ "tf_community_members_prompt": "Members of each TF community in each cell state are below:",
20
+ "tf_community_pathway_desc": "TF neighbor communities in TEXterm and TRM cells, respectively are linked to different biological processes (below). The overall topology of these communities was influenced by multi-state TFs active in both cell states, while cell state-specific TFs created unique interaction patterns between the communities. Pathway analysis of the regulatees in each community suggested that TRM- or TEXterm-specific TFs within each community controlled different biological pathways. For example, in TRM cells, community 3 was associated with cell-cell adhesion and response to TGFβ, and community 1 was associated with RNA metabolism, but in TEXterm cells, community 3 was linked to apoptosis, and community 1 was coupled to catabolism, proteolysis, ubiquitin-proteasome, and autophagy.",
21
+ "multi_omics_data_table_scroll_prompt": "Scroll horizontally to view entire data table",
22
+ "multi_omics_data_composition_desc": "Composition of multi-omic atlas. A total of 121 experiments across multiple data sets were utilized to generate an epigenetic and transcriptional atlas of murine CD8+ T cells under chronic and acute antigen exposure. Unless stated, all CD8+ T cells were isolated from spleens.",
23
+ "citation_zhang_2016": "1. Zhang, K., Li, N., Ainsworth, R. I. & Wang, W. Systematic identification of protein combinations mediating chromatin looping. Nat. Commun. 7, 1–11 (2016).",
24
+ "citation_zhao_2020": "2. Zhao, T., Liu, H., Roeder, K., Lafferty, J. & Wasserman, L. The huge Package for High-dimensional Undirected Graph Estimation in R. (2020) doi:10.48550/arXiv.2006.14781.",
25
+ "citation_traag_2019": "3. Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).",
26
+ "citation_schonfeld_2019": "4. Schönfeld, M. & Pfeffer, J. Fruchterman/Reingold (1991): Graph Drawing by Force-Directed Placement. Schlüsselwerke der Netzwerkforschung 217–220 https://doi.org/10.1007/978-3-658-21742-6_49 (2019)."
27
  }
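As an aside on the `tf_network_correlation_methodology` string above: the graphical-lasso step it describes maps onto a short call to the `huge` package. A minimal sketch, assuming `S` is the TF-by-TF Spearman correlation matrix built from the edge weights as described (`huge()` accepts a d-by-d correlation matrix in place of raw data; the shrunken-ECDF transform mentioned in the text corresponds to `huge.npn(..., npn.func = "shrinkage")` on raw data and is omitted here):

```r
library(huge)

# S: d x d Spearman correlation matrix between the selected TFs,
# computed from the TF-regulatee edge weights as described above.
fit <- huge(S, lambda = 0.052, method = "glasso")

# Adjacency at the single lambda supplied; nonzero entries are the
# TF-TF pairs treated as connected in the network.
adj <- as.matrix(fit$path[[1]])
mean(adj[upper.tri(adj)])   # edge density; the text reports ~15% near this lambda
```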
ui.R CHANGED
The diff for this file is too large to render. See raw diff
 
warning_overlay.R CHANGED
@@ -1,63 +1,63 @@
1
- # warning_overlay.R
2
- # This file previously contained UI and JS for a fixed overlay.
3
- # That functionality is being replaced by displaying warnings as messages in the chat log.
4
- # Relevant logic will now be primarily in chat_script.js (to handle new message types)
5
- # and long_operations.R (to send custom Shiny messages).
6
-
7
- # shinyjs is still useful for other UI manipulations, so useShinyjs() should still be called in the main UI.
8
-
9
- # Helper to ensure shinyjs is ready (can be called in your main UI definition if not already)
10
- # useShinyjsForWarning <- function() {
11
- # shinyjs::useShinyjs()
12
- # }
13
-
14
- # UI for the warning overlay
15
- warningOverlayUI <- function() {
16
- # The overlay itself, initially hidden
17
- # It will be placed on top of the chat/thinking area
18
- # Assuming the "thinking box" is part of the chat sidebar or main content area
19
- # This overlay will cover its parent (e.g., the chat sidebar if placed within it)
20
- div(
21
- id = "thinkingWarningOverlay",
22
- style = "position: absolute; top: 0; left: 0; width: 100%; height: 100%; background-color: rgba(255, 0, 0, 0.3); /* Half-transparent red */ z-index: 2000; /* Ensure it's on top */ display: none; /* Initially hidden */ justify-content: center; align-items: center; text-align: center; flex-direction: column;",
23
- div(
24
- style = "background-color: white; padding: 20px; border-radius: 5px; box-shadow: 0 0 10px rgba(0,0,0,0.5);",
25
- h4("Processing your request..."),
26
- p("This may take a moment, especially with large datasets. Please be patient."),
27
- tags$div(class = "spinner-border text-primary", role = "status",
28
- tags$span(class="sr-only", "Loading...")
29
- ) # Simple spinner
30
- )
31
- )
32
- }
33
-
34
- # JavaScript to show/hide the overlay
35
- # These functions will be callable from the server
36
- warningOverlayJS <- "
37
- shinyjs.showWarningOverlay = function() {
38
- var overlay = document.getElementById('thinkingWarningOverlay');
39
- if (overlay) {
40
- overlay.style.display = 'flex'; // Use flex to center content
41
- console.log('Showing warning overlay.');
42
- }
43
- }
44
-
45
- shinyjs.hideWarningOverlay = function() {
46
- var overlay = document.getElementById('thinkingWarningOverlay');
47
- if (overlay) {
48
- overlay.style.display = 'none';
49
- console.log('Hiding warning overlay.');
50
- }
51
- }
52
- "
53
-
54
- # Functions to be called from the R server-side
55
- showWarningOverlay <- function(session) {
56
- # Ensure shinyjs is initialized (typically in ui.R or server.R globally)
57
- # If not already, add: useShinyjs()
58
- shinyjs::runjs("shinyjs.showWarningOverlay();")
59
- }
60
-
61
- hideWarningOverlay <- function(session) {
62
- shinyjs::runjs("shinyjs.hideWarningOverlay();")
63
  }
 
1
+ # warning_overlay.R
2
+ # This file previously contained UI and JS for a fixed overlay.
3
+ # That functionality is being replaced by displaying warnings as messages in the chat log.
4
+ # Relevant logic will now be primarily in chat_script.js (to handle new message types)
5
+ # and long_operations.R (to send custom Shiny messages).
6
+
7
+ # shinyjs is still useful for other UI manipulations, so useShinyjs() should still be called in the main UI.
8
+
9
+ # Helper to ensure shinyjs is ready (can be called in your main UI definition if not already)
10
+ # useShinyjsForWarning <- function() {
11
+ # shinyjs::useShinyjs()
12
+ # }
13
+
14
+ # UI for the warning overlay
15
+ warningOverlayUI <- function() {
16
+ # The overlay itself, initially hidden
17
+ # It will be placed on top of the chat/thinking area
18
+ # Assuming the "thinking box" is part of the chat sidebar or main content area
19
+ # This overlay will cover its parent (e.g., the chat sidebar if placed within it)
20
+ div(
21
+ id = "thinkingWarningOverlay",
22
+ style = "position: absolute; top: 0; left: 0; width: 100%; height: 100%; background-color: rgba(255, 0, 0, 0.3); /* Half-transparent red */ z-index: 2000; /* Ensure it's on top */ display: none; /* Initially hidden */ justify-content: center; align-items: center; text-align: center; flex-direction: column;",
23
+ div(
24
+ style = "background-color: white; padding: 20px; border-radius: 5px; box-shadow: 0 0 10px rgba(0,0,0,0.5);",
25
+ h4("Processing your request..."),
26
+ p("This may take a moment, especially with large datasets. Please be patient."),
27
+ tags$div(class = "spinner-border text-primary", role = "status",
28
+ tags$span(class="sr-only", "Loading...")
29
+ ) # Simple spinner
30
+ )
31
+ )
32
+ }
33
+
34
+ # JavaScript to show/hide the overlay
35
+ # These functions will be callable from the server
36
+ warningOverlayJS <- "
37
+ shinyjs.showWarningOverlay = function() {
38
+ var overlay = document.getElementById('thinkingWarningOverlay');
39
+ if (overlay) {
40
+ overlay.style.display = 'flex'; // Use flex to center content
41
+ console.log('Showing warning overlay.');
42
+ }
43
+ }
44
+
45
+ shinyjs.hideWarningOverlay = function() {
46
+ var overlay = document.getElementById('thinkingWarningOverlay');
47
+ if (overlay) {
48
+ overlay.style.display = 'none';
49
+ console.log('Hiding warning overlay.');
50
+ }
51
+ }
52
+ "
53
+
54
+ # Functions to be called from the R server-side
55
+ showWarningOverlay <- function(session) {
56
+ # Ensure shinyjs is initialized (typically in ui.R or server.R globally)
57
+ # If not already, add: useShinyjs()
58
+ shinyjs::runjs("shinyjs.showWarningOverlay();")
59
+ }
60
+
61
+ hideWarningOverlay <- function(session) {
62
+ shinyjs::runjs("shinyjs.hideWarningOverlay();")
63
  }
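For context, a hypothetical sketch of how the two server-side helpers defined above might wrap a long-running chat operation. The input ID matches the one chat_script.js sends; `run_long_operation()` is a made-up placeholder, and the file's own comments note that this overlay path is being superseded by chat-log messages.

```r
# Hypothetical server.R wiring; requires useShinyjs() in the UI and the
# warningOverlayJS snippet registered so that shinyjs.showWarningOverlay()
# actually exists in the browser.
observeEvent(input$user_chat_message, {
  showWarningOverlay(session)
  result <- tryCatch(
    run_long_operation(input$user_chat_message),   # placeholder long task
    finally = hideWarningOverlay(session)          # hide even on error
  )
  # "agent_chat_response" is the handler defined in chat_script.js.
  session$sendCustomMessage("agent_chat_response", list(text = result))
})
```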
www/chat_script.js CHANGED
@@ -1,643 +1,559 @@
1
- // www/chat_script.js
2
-
3
- // Ensure jQuery and document are ready
4
- $(document).ready(function() {
5
- console.log("Document ready - chat_script.js initializing");
6
-
7
- $(document).on('shiny:connected', function(event) {
8
- console.log("Shiny connected - chat_script.js executing");
9
- initializeChatUI();
10
- setupImageViewer();
11
- });
12
- });
13
-
14
- // Setup full-size image viewer
15
- function setupImageViewer() {
16
- // Create modal for full-size images if it doesn't exist
17
- if ($('#fullImageModal').length === 0) {
18
- const modalHtml = `
19
- <div id="fullImageModal" class="modal">
20
- <span class="close-modal">&times;</span>
21
- <img class="modal-content" id="fullSizeImage">
22
- </div>
23
- `;
24
- $('body').append(modalHtml);
25
-
26
- // Add CSS for the modal
27
- const modalCss = `
28
- <style>
29
- /* Image modal styles */
30
- .chat-image-preview {
31
- max-width: 250px;
32
- max-height: 200px;
33
- cursor: pointer;
34
- border: 1px solid #ccc;
35
- border-radius: 5px;
36
- margin: 5px 0;
37
- transition: transform 0.2s;
38
- }
39
- .chat-image-preview:hover {
40
- transform: scale(1.05);
41
- }
42
- .chat-image-container {
43
- margin: 10px 0;
44
- }
45
- #fullImageModal {
46
- display: none;
47
- position: fixed;
48
- z-index: 9999;
49
- left: 0;
50
- top: 0;
51
- width: 100%;
52
- height: 100%;
53
- overflow: auto;
54
- background-color: rgba(0,0,0,0.9);
55
- }
56
- #fullImageModal .modal-content {
57
- margin: auto;
58
- display: block;
59
- max-width: 90%;
60
- max-height: 90%;
61
- }
62
- .close-modal {
63
- position: absolute;
64
- top: 15px;
65
- right: 35px;
66
- color: #f1f1f1;
67
- font-size: 40px;
68
- font-weight: bold;
69
- cursor: pointer;
70
- }
71
- </style>
72
- `;
73
- $('head').append(modalCss);
74
-
75
- // Close modal when clicking X or outside the image
76
- $('.close-modal').click(function() {
77
- $('#fullImageModal').hide();
78
- });
79
-
80
- $(document).click(function(event) {
81
- if (event.target === document.getElementById('fullImageModal')) {
82
- $('#fullImageModal').hide();
83
- }
84
- });
85
- }
86
-
87
- // Handle "activate_image_viewer" message from server
88
- Shiny.addCustomMessageHandler("activate_image_viewer", function(message) {
89
- console.log("Image viewer activated");
90
- });
91
- }
92
-
93
- // Function to show full-size image
94
- window.showFullImage = function(imagePath) {
95
- console.log("Showing full image:", imagePath);
96
-
97
- // Debug image loading
98
- var img = new Image();
99
- img.onload = function() {
100
- console.log("Image loaded successfully:", imagePath, "Size:", this.width, "x", this.height);
101
- };
102
- img.onerror = function() {
103
- console.error("Failed to load image:", imagePath);
104
- // Try alternative path by removing 'www' prefix
105
- var altPath = imagePath.replace(/^www\//, '');
106
- console.log("Trying alternative path:", altPath);
107
- $('#fullSizeImage').attr('src', altPath);
108
- };
109
- img.src = imagePath;
110
-
111
- $('#fullSizeImage').attr('src', imagePath);
112
- $('#fullImageModal').show();
113
- }
114
-
115
- function initializeChatUI() {
116
- var isFirstChatOpenThisSession = true;
117
- var isResizing = false;
118
- var startX;
119
- var startWidth;
120
-
121
- var $chatMessages = $('#chatMessages'); // Cache the selector
122
- var autoScrollEnabled = true;
123
- var scrollThreshold = 20; // Pixels from bottom to re-enable auto-scroll
124
-
125
- // --- Dynamically create and insert the Chat tab --- START ---
126
- var chatTabExists = $('#customChatTabLink').length > 0;
127
- if (!chatTabExists) {
128
- var $navbarList = $('ul.nav.navbar-nav').first();
129
-
130
- if ($navbarList.length > 0) {
131
- var $chatTabLi = $('<li></li>').addClass('nav-item custom-chat-tab-li');
132
- var $chatTabLink = $('<a></a>')
133
- .attr('id', 'customChatTabLink')
134
- .attr('href', '#')
135
- .addClass('nav-link')
136
- .html('<i class="fa fa-comments"></i> Chat');
137
-
138
- $chatTabLi.append($chatTabLink);
139
- $navbarList.append($chatTabLi);
140
- console.log("Custom 'Chat' tab dynamically added to navbar");
141
- } else {
142
- console.warn("Could not find navbar list to insert Chat tab");
143
- }
144
- }
145
-
146
- // Remove previous handlers
147
- $(document).off('click.chatToggle', 'a[data-value="chatTabTrigger"]');
148
- $('a[data-value="chatTabTrigger"]').off('click.chatToggle');
149
-
150
- var oldChatTabLink = $('a[data-toggle="tab"][data-value="chatTabTrigger"]');
151
- if (oldChatTabLink.length > 0) {
152
- oldChatTabLink.off('click.bs.tab.data-api');
153
- oldChatTabLink.attr('href', 'javascript:void(0);');
154
- oldChatTabLink.removeAttr('data-toggle');
155
- }
156
-
157
- $(document).off('click.chatNavbarButton', '#chatNavbarButton');
158
-
159
- // Chat toggle handler
160
- $(document).off('click.customChatTab').on('click.customChatTab', '#customChatTabLink', function(event) {
161
- event.preventDefault();
162
- event.stopPropagation();
163
- console.log("Chat tab clicked");
164
-
165
- var sidebar = $('#chatSidebar');
166
- console.log("Sidebar visibility:", sidebar.is(':visible'));
167
-
168
- if (sidebar.is(':visible')) {
169
- sidebar.fadeOut();
170
- } else {
171
- sidebar.fadeIn(function() {
172
- if (isFirstChatOpenThisSession) {
173
- addChatMessage("How can I help you today?", 'agent');
174
- addChatMessage("⚠️ TaijiChat can make errors. Please verify important scientific information and consult original research papers for critical findings.", 'agent', false, true);
175
- addChatMessage("📊 Note: Your first query may take longer as we initialize the data analysis system.", 'agent', false, true);
176
- isFirstChatOpenThisSession = false;
177
- }
178
- });
179
- }
180
- });
181
-
182
- // Close button handler
183
- $(document).off('click.chatClose').on('click.chatClose', '#closeChatSidebarBtn', function() {
184
- console.log("Close button clicked");
185
- $('#chatSidebar').fadeOut();
186
- });
187
-
188
- // Resize functionality
189
- console.log("Setting up resize handlers");
190
-
191
- // Remove any existing handlers first
192
- $(document).off('mousedown.resizeHandle');
193
- $(document).off('mousemove.resizePanel');
194
- $(document).off('mouseup.resizePanel');
195
-
196
- // Add new handlers using event delegation
197
- $(document).on('mousedown.resizeHandle', '.resize-handle', function(e) {
198
- console.log("Resize handle mousedown detected");
199
- isResizing = true;
200
- startX = e.pageX;
201
- var sidebar = $('#chatSidebar');
202
- startWidth = sidebar.width();
203
- console.log("Initial width:", startWidth);
204
- e.preventDefault();
205
- $('body').css('user-select', 'none'); // Prevent text selection while dragging
206
- });
207
-
208
- $(document).on('mousemove.resizePanel', function(e) {
209
- if (!isResizing) return;
210
-
211
- var sidebar = $('#chatSidebar');
212
- var windowWidth = $(window).width();
213
- var width = windowWidth - e.pageX;
214
- width = Math.max(250, Math.min(width, 3200));
215
- console.log("Resizing to width:", width);
216
-
217
- sidebar.css({
218
- 'width': width + 'px',
219
- 'transition': 'none' // Disable transition during drag
220
- });
221
- });
222
-
223
- $(document).on('mouseup.resizePanel', function(e) {
224
- if (isResizing) {
225
- console.log("Resize ended");
226
- isResizing = false;
227
- $('body').css('user-select', ''); // Re-enable text selection
228
- $('#chatSidebar').css('transition', ''); // Re-enable transitions
229
- }
230
- });
231
-
232
- $(document).on('mouseenter', '.resize-handle', function() {
233
- console.log('Mouse entered resize handle');
234
- });
235
-
236
- // Message handling functionality
237
- var thinkingMessageElement = null;
238
- var currentThoughtsContainer = null;
239
-
240
- // Track if thinking animation is in progress
241
- var thinkingTypingInProgress = false;
242
- var resultTypingQueue = [];
243
-
244
- // Scroll listener for chat messages panel
245
- if ($chatMessages.length) { // Ensure element exists before attaching listener
246
- $chatMessages.on('scroll.chatAutoScroll', function() {
247
- // Check if scrolled near the bottom
248
- if (this.scrollHeight - this.scrollTop - this.clientHeight < scrollThreshold) {
249
- if (!autoScrollEnabled) {
250
- // console.log("Auto-scroll re-enabled (scrolled to bottom).");
251
- autoScrollEnabled = true;
252
- }
253
- } else {
254
- if (autoScrollEnabled) {
255
- // console.log("Auto-scroll disabled (user scrolled up).");
256
- autoScrollEnabled = false;
257
- }
258
- }
259
- });
260
- } else {
261
- console.warn("#chatMessages element not found for scroll listener.");
262
- }
263
-
264
- function typeTextLine($element, text, callback, speed = 10) {
265
- let i = 0;
266
- function typeChar() {
267
- if (i < text.length) {
268
- $element.append(text.charAt(i));
269
- i++;
270
- if (autoScrollEnabled) { // Conditional scroll
271
- $chatMessages.scrollTop($chatMessages[0].scrollHeight); // Scroll after each character
272
- }
273
- setTimeout(typeChar, speed);
274
- } else if (callback) {
275
- callback();
276
- }
277
- }
278
- typeChar();
279
- }
280
-
281
- function typeTextLines($container, lines, lineClass, speed, doneCallback) {
282
- let idx = 0;
283
- function typeNextLine() {
284
- if (idx < lines.length) {
285
- var $lineDiv = $('<div></div>').addClass(lineClass);
286
- $container.append($lineDiv);
287
- typeTextLine($lineDiv, lines[idx], function() {
288
- idx++;
289
- typeNextLine();
290
- }, speed);
291
- } else if (doneCallback) {
292
- doneCallback();
293
- }
294
- }
295
- typeNextLine();
296
- }
297
-
298
- function addChatMessage(messageText, messageType, isThinkingMessage = false, isDisclaimer = false) {
299
- var messageClass = messageType === 'user' ? 'user-message' : 'agent-message';
300
- if (isThinkingMessage) {
301
- messageClass += ' thinking-message';
302
- }
303
- if (isDisclaimer) {
304
- messageClass += ' disclaimer';
305
- }
306
- var $chatMessages = $('#chatMessages');
307
-
308
- var $messageDiv = $('<div></div>').addClass('chat-message').addClass(messageClass);
309
-
310
- if (messageType === 'user') {
311
- // For user messages, just append the text directly without animation
312
- $messageDiv.text(messageText);
313
- $chatMessages.append($messageDiv);
314
- // Always scroll for user's own messages, and re-enable autoScroll
315
- autoScrollEnabled = true;
316
- $chatMessages.scrollTop($chatMessages[0].scrollHeight);
317
- return;
318
- }
319
-
320
- // Check if message contains HTML (for images)
321
- var containsHtml = /<[a-z][\s\S]*>/i.test(messageText);
322
-
323
- if (isThinkingMessage) {
324
- // Guarantee thinkingTypingInProgress is set before any animation starts
325
- thinkingTypingInProgress = true;
326
- $messageDiv.html('<span class="thought-toggle-arrow" role="button" tabindex="0">&#9658;</span> ' +
327
- '<span class="thinking-text"></span>' +
328
- '<div class="thoughts-area" style="display: none; margin-left: 20px; font-style: italic; color: #555;"></div>');
329
- thinkingMessageElement = $messageDiv;
330
- currentThoughtsContainer = $messageDiv.find('.thoughts-area');
331
- $chatMessages.append($messageDiv);
332
- // Split main thinking into lines
333
- var mainLines = messageText.split('\n');
334
- typeTextLines($messageDiv.find('.thinking-text'), mainLines, '', 4, function() {
335
- // If there are already thoughts queued, type them line by line
336
- var $thoughtsArea = $messageDiv.find('.thoughts-area');
337
- var thoughtDivs = $thoughtsArea.children('.thought-item').toArray();
338
- function typeThoughtsSequentially(idx) {
339
- if (idx < thoughtDivs.length) {
340
- var $thoughtDiv = $(thoughtDivs[idx]);
341
- var text = $thoughtDiv.data('pending-text');
342
- $thoughtDiv.removeData('pending-text');
343
- typeTextLine($thoughtDiv, text, function() {
344
- typeThoughtsSequentially(idx + 1);
345
- }, 4);
346
- } else {
347
- // Only now set thinkingTypingInProgress to false
348
- thinkingTypingInProgress = false;
349
- // If a result is queued, type it now
350
- if (resultTypingQueue.length > 0) {
351
- var nextResult = resultTypingQueue.shift();
352
- nextResult();
353
- }
354
- }
355
- }
356
- typeThoughtsSequentially(0);
357
- });
358
- // Scroll to bottom if auto-scroll is enabled
359
- if (autoScrollEnabled) {
360
- $chatMessages.scrollTop($chatMessages[0].scrollHeight);
361
- }
362
- return;
363
- }
364
-
365
- function processRegularMessage() {
366
- // Check if message contains HTML (for images)
367
- if (containsHtml) {
368
- // If contains HTML, add it directly without typing animation
369
- console.log("HTML content detected in message, adding directly:", messageText);
370
- // Check if it has an image tag
371
- if (messageText.indexOf('<img') !== -1) {
372
- console.log("Image tag found in message");
373
- // Extract image src for debugging
374
- var imgSrcMatch = messageText.match(/src="([^"]+)"/);
375
- if (imgSrcMatch && imgSrcMatch.length > 1) {
376
- console.log("Image src:", imgSrcMatch[1]);
377
- }
378
- }
379
- $messageDiv.html(messageText);
380
- $chatMessages.append($messageDiv);
381
- if (autoScrollEnabled) {
382
- $chatMessages.scrollTop($chatMessages[0].scrollHeight);
383
- }
384
- } else {
385
- // For normal text messages, use typing animation
386
- $chatMessages.append($messageDiv);
387
- var lines = messageText.split('\n');
388
- typeTextLines($messageDiv, lines, '', 5, function() {
389
- if (autoScrollEnabled) {
390
- $chatMessages.scrollTop($chatMessages[0].scrollHeight);
391
- }
392
- });
393
- }
394
- }
395
-
396
- if (thinkingTypingInProgress) {
397
- // If thinking animation is in progress, queue this message
398
- resultTypingQueue.push(processRegularMessage);
399
- } else {
400
- // Otherwise, process immediately
401
- processRegularMessage();
402
- }
403
- }
404
-
405
- // Thought toggle handler
406
- $(document).off('click.thoughtToggle').on('click.thoughtToggle', '.thought-toggle-arrow', function() {
407
- var $arrow = $(this);
408
- var $thoughtsArea = $arrow.siblings('.thoughts-area');
409
- $thoughtsArea.slideToggle(200);
410
- $arrow.html($arrow.html() === '►' ? '▼' : '►');
411
- });
412
-
413
- // Send message handlers
414
- $('#sendChatMsg').off('click.chatSend').on('click.chatSend', function() {
415
- var messageText = $('#chatInput').val().trim();
416
- if (messageText) {
417
- // Disable input and button
418
- $('#chatInput').prop('disabled', true);
419
- $('#sendChatMsg').prop('disabled', true);
420
-
421
- addChatMessage(messageText, 'user');
422
- $('#chatInput').val(''); // Clear input after grabbing value
423
- Shiny.setInputValue("user_chat_message", messageText, {priority: "event"});
424
- }
425
- });
426
-
427
- $('#chatInput').off('keypress.chatSend').on('keypress.chatSend', function(e) {
428
- if (e.which == 13 && !$(this).prop('disabled')) { // Check if not disabled
429
- e.preventDefault();
430
- $('#sendChatMsg').click(); // This will trigger the click handler above
431
- }
432
- });
433
-
434
- // Shiny message handlers
435
- Shiny.addCustomMessageHandler("agent_thinking_started", function(message) {
436
- console.log("Received thinking started message");
437
- if(message && typeof message.text === 'string') {
438
- // if (thinkingMessageElement) {
439
- // thinkingMessageElement.remove();
440
- // thinkingMessageElement = null;
441
- // }
442
- // if (currentThoughtsContainer) {
443
- // currentThoughtsContainer = null; // It's part of thinkingMessageElement, will be handled if parent is removed
444
- // }
445
- addChatMessage(message.text, 'agent', true);
446
- }
447
- });
448
-
449
- Shiny.addCustomMessageHandler("agent_new_thought", function(message) {
450
- console.log("Received new thought");
451
- if (message && typeof message.text === 'string' && currentThoughtsContainer) {
452
- var $thoughtDiv = $('<div></div>').addClass('thought-item');
453
- $thoughtDiv.data('pending-text', message.text);
454
- currentThoughtsContainer.append($thoughtDiv);
455
- // If thinking text is done, type this thought now, else it will be picked up in the queue
456
- if (!thinkingTypingInProgress) {
457
- // Guarantee thinkingTypingInProgress is set before starting thoughts
458
- thinkingTypingInProgress = true;
459
- var $thoughtsArea = currentThoughtsContainer;
460
- var thoughtDivs = $thoughtsArea.children('.thought-item').toArray();
461
- function typeThoughtsSequentially(idx) {
462
- if (idx < thoughtDivs.length) {
463
- var $thoughtDiv = $(thoughtDivs[idx]);
464
- if ($thoughtDiv.text().length === 0) { // Check if text has not been typed yet
465
- var text = $thoughtDiv.data('pending-text');
466
- $thoughtDiv.removeData('pending-text');
467
- typeTextLine($thoughtDiv, text, function() {
468
- typeThoughtsSequentially(idx + 1);
469
- }, 4);
470
- } else {
471
- // Already typed (e.g., if this function is re-entered)
472
- typeThoughtsSequentially(idx + 1);
473
- }
474
- } else {
475
- thinkingTypingInProgress = false;
476
- if (resultTypingQueue.length > 0) {
477
- var nextResult = resultTypingQueue.shift();
478
- nextResult(); // This will handle its own scrolling if needed
479
- }
480
- }
481
- }
482
- typeThoughtsSequentially(0);
483
- }
484
- if (autoScrollEnabled) { // Conditional scroll
485
- $chatMessages.scrollTop($chatMessages[0].scrollHeight);
486
- }
487
- }
488
- });
489
-
490
- Shiny.addCustomMessageHandler("agent_chat_response", function(message) {
491
- console.log("Received chat response");
492
- if(message && typeof message.text === 'string') {
493
- addChatMessage(message.text, 'agent');
494
- }
495
- // Re-enable input and button
496
- $('#chatInput').prop('disabled', false);
497
- $('#sendChatMsg').prop('disabled', false);
498
- $('#chatInput').focus(); // Optionally focus the input field
499
- });
500
-
501
- Shiny.addCustomMessageHandler("long_op_custom_warning", function(message) {
502
- if (message && typeof message.text === 'string') {
503
- // Add message to chat display, styled as a warning
504
- // You might want to add a specific class for styling, e.g., 'long-op-warning-message'
505
- // For now, it will use the default 'agent-message' style but appear as a distinct message.
506
- var $chatMessages = $('#chatMessages');
507
- var $messageDiv = $('<div></div>').addClass('chat-message agent-message long-op-warning'); // Added long-op-warning class
508
- $messageDiv.css({
509
- 'background-color': 'rgba(255, 0, 0, 0.1)', // Light red, less intense than the old overlay
510
- 'border': '1px solid rgba(255, 0, 0, 0.3)',
511
- 'color': '#721c24', // Darker red text for readability
512
- 'padding': '10px',
513
- 'margin-bottom': '10px',
514
- 'border-radius': '5px'
515
- });
516
- $messageDiv.text(message.text);
517
- $chatMessages.append($messageDiv);
518
- $chatMessages.scrollTop($chatMessages[0].scrollHeight);
519
- }
520
- });
521
-
522
- Shiny.addCustomMessageHandler("agent_processing_error", function(message) {
523
- // This is a new handler you might need if server.R sends a specific error message type
524
- // For now, agent_chat_response handles errors from server.R's tryCatch
525
- console.error("Agent processing error:", message.text);
526
- // Ensure UI is re-enabled even on specific error messages
527
- $('#chatInput').prop('disabled', false);
528
- $('#sendChatMsg').prop('disabled', false);
529
- $('#chatInput').focus();
530
- // Optionally display a more prominent error in the chat
531
- if(message && typeof message.text === 'string') {
532
- addChatMessage("Error: " + message.text, 'agent-error'); // Define 'agent-error' style if needed
533
- } else {
534
- addChatMessage("An unexpected error occurred with the agent.", 'agent-error');
535
- }
536
- });
537
-
538
- Shiny.addCustomMessageHandler("literature_confirmation_request", function(message) {
539
- console.log("Received literature confirmation request");
540
- if (message && typeof message.text === 'string') {
541
- showLiteratureConfirmationDialog(message.text);
542
- }
543
- });
544
-
545
- function showLiteratureConfirmationDialog(messageText) {
546
- var $chatMessages = $('#chatMessages');
547
-
548
- // Remove any existing confirmation dialogs first
549
- $('.literature-confirmation-dialog').remove();
550
-
551
- // Create a wrapper for the dialog that doesn't interfere with chat message styling
552
- var $dialogWrapper = $('<div></div>').addClass('chat-message').css({
553
- 'background': 'transparent',
554
- 'border': 'none',
555
- 'padding': '10px 0',
556
- 'margin': '8px 0',
557
- 'max-width': '100%',
558
- 'float': 'none',
559
- 'display': 'flex',
560
- 'justify-content': 'center',
561
- 'align-items': 'center',
562
- 'width': '100%'
563
- });
564
-
565
- // Create the confirmation dialog box
566
- var $confirmationDiv = $('<div></div>').addClass('literature-confirmation-dialog');
567
- $confirmationDiv.html(`
568
- <div class="confirmation-content">
569
- <div class="confirmation-message">${messageText}</div>
570
- <div class="confirmation-buttons">
571
- <button class="btn confirmation-paper" onclick="handleLiteratureConfirmation('paper')">
572
- Paper Only
573
- </button>
574
- <button class="btn confirmation-external" onclick="handleLiteratureConfirmation('external')">
575
- External Only
576
- </button>
577
- <button class="btn confirmation-both" onclick="handleLiteratureConfirmation('both')">
578
- Both Sources
579
- </button>
580
- <button class="btn confirmation-none" onclick="handleLiteratureConfirmation('none')">
581
- None
582
- </button>
583
- </div>
584
- </div>
585
- `);
586
-
587
- // Add the dialog to the wrapper and then to chat
588
- $dialogWrapper.append($confirmationDiv);
589
- $chatMessages.append($dialogWrapper);
590
-
591
- // Smooth scroll to the dialog
592
- if (autoScrollEnabled) {
593
- $chatMessages.animate({
594
- scrollTop: $chatMessages[0].scrollHeight
595
- }, 300);
596
- }
597
-
598
- // Disable input while waiting for confirmation
599
- $('#chatInput').prop('disabled', true);
600
- $('#sendChatMsg').prop('disabled', true);
601
- }
602
-
603
- // Global function to handle literature confirmation response
604
- window.handleLiteratureConfirmation = function(userChoice) {
605
- console.log("Literature confirmation choice:", userChoice);
606
-
607
- // Add a brief feedback before removing the dialog
608
- var $confirmationDialog = $('.literature-confirmation-dialog');
609
- var choiceText;
610
- switch(userChoice) {
611
- case 'paper':
612
- choiceText = "Using the underlying paper only...";
613
- break;
614
- case 'external':
615
- choiceText = "Using external literature only...";
616
- break;
617
- case 'both':
618
- choiceText = "Using both paper and external literature...";
619
- break;
620
- case 'none':
621
- choiceText = "Proceeding without literature sources...";
622
- break;
623
- default:
624
- choiceText = "Processing your choice...";
625
- }
626
-
627
- // Show brief feedback
628
- $confirmationDialog.find('.confirmation-message').html(`<em>${choiceText}</em>`);
629
- $confirmationDialog.find('.confirmation-buttons').fadeOut(200);
630
-
631
- // Remove the dialog after a brief delay
632
- setTimeout(function() {
633
- $('.literature-confirmation-dialog').closest('.chat-message').fadeOut(300, function() {
634
- $(this).remove();
635
- });
636
- }, 1000);
637
-
638
- // Send the response to Shiny
639
- Shiny.setInputValue("literature_confirmation_response", userChoice, {priority: "event"});
640
-
641
- // Keep input disabled - it will be re-enabled when the agent responds
642
- };
643
- }
 
1
+ // www/chat_script.js
2
+
3
+ // Ensure jQuery and document are ready
4
+ $(document).ready(function() {
5
+ console.log("Document ready - chat_script.js initializing");
6
+
7
+ $(document).on('shiny:connected', function(event) {
8
+ console.log("Shiny connected - chat_script.js executing");
9
+ initializeChatUI();
10
+ setupImageViewer();
11
+ });
12
+ });
13
+
14
+ // Setup full-size image viewer
15
+ function setupImageViewer() {
16
+ // Create modal for full-size images if it doesn't exist
17
+ if ($('#fullImageModal').length === 0) {
18
+ const modalHtml = `
19
+ <div id="fullImageModal" class="modal">
20
+ <span class="close-modal">&times;</span>
21
+ <img class="modal-content" id="fullSizeImage">
22
+ </div>
23
+ `;
24
+ $('body').append(modalHtml);
25
+
26
+ // Add CSS for the modal
27
+ const modalCss = `
28
+ <style>
29
+ /* Image modal styles */
30
+ .chat-image-preview {
31
+ max-width: 250px;
32
+ max-height: 200px;
33
+ cursor: pointer;
34
+ border: 1px solid #ccc;
35
+ border-radius: 5px;
36
+ margin: 5px 0;
37
+ transition: transform 0.2s;
38
+ }
39
+ .chat-image-preview:hover {
40
+ transform: scale(1.05);
41
+ }
42
+ .chat-image-container {
43
+ margin: 10px 0;
44
+ }
45
+ #fullImageModal {
46
+ display: none;
47
+ position: fixed;
48
+ z-index: 9999;
49
+ left: 0;
50
+ top: 0;
51
+ width: 100%;
52
+ height: 100%;
53
+ overflow: auto;
54
+ background-color: rgba(0,0,0,0.9);
55
+ }
56
+ #fullImageModal .modal-content {
57
+ margin: auto;
58
+ display: block;
59
+ max-width: 90%;
60
+ max-height: 90%;
61
+ }
62
+ .close-modal {
63
+ position: absolute;
64
+ top: 15px;
65
+ right: 35px;
66
+ color: #f1f1f1;
67
+ font-size: 40px;
68
+ font-weight: bold;
69
+ cursor: pointer;
70
+ }
71
+ </style>
72
+ `;
73
+ $('head').append(modalCss);
74
+
75
+ // Close modal when clicking X or outside the image
76
+ $('.close-modal').click(function() {
77
+ $('#fullImageModal').hide();
78
+ });
79
+
80
+ $(document).click(function(event) {
81
+ if (event.target === document.getElementById('fullImageModal')) {
82
+ $('#fullImageModal').hide();
83
+ }
84
+ });
85
+ }
86
+
87
+ // Handle "activate_image_viewer" message from server
88
+ Shiny.addCustomMessageHandler("activate_image_viewer", function(message) {
89
+ console.log("Image viewer activated");
90
+ });
91
+ }
92
+
93
+ // Function to show full-size image
94
+ window.showFullImage = function(imagePath) {
95
+ console.log("Showing full image:", imagePath);
96
+
97
+ // Debug image loading
98
+ var img = new Image();
99
+ img.onload = function() {
100
+ console.log("Image loaded successfully:", imagePath, "Size:", this.width, "x", this.height);
101
+ };
102
+ img.onerror = function() {
103
+ console.error("Failed to load image:", imagePath);
104
+ // Try alternative path by removing 'www' prefix
105
+ var altPath = imagePath.replace(/^www\//, '');
106
+ console.log("Trying alternative path:", altPath);
107
+ $('#fullSizeImage').attr('src', altPath);
108
+ };
109
+ img.src = imagePath;
110
+
111
+ $('#fullSizeImage').attr('src', imagePath);
112
+ $('#fullImageModal').show();
113
+ }
114
+
115
+ function initializeChatUI() {
116
+ var isFirstChatOpenThisSession = true;
117
+ var isResizing = false;
118
+ var startX;
119
+ var startWidth;
120
+
121
+ var $chatMessages = $('#chatMessages'); // Cache the selector
122
+ var autoScrollEnabled = true;
123
+ var scrollThreshold = 20; // Pixels from bottom to re-enable auto-scroll
124
+
125
+ // --- Dynamically create and insert the Chat tab --- START ---
126
+ var chatTabExists = $('#customChatTabLink').length > 0;
127
+ if (!chatTabExists) {
128
+ var $navbarList = $('ul.nav.navbar-nav').first();
129
+
130
+ if ($navbarList.length > 0) {
131
+ var $chatTabLi = $('<li></li>').addClass('nav-item custom-chat-tab-li');
132
+ var $chatTabLink = $('<a></a>')
133
+ .attr('id', 'customChatTabLink')
134
+ .attr('href', '#')
135
+ .addClass('nav-link')
136
+ .html('<i class="fa fa-comments"></i> Chat');
137
+
138
+ $chatTabLi.append($chatTabLink);
139
+ $navbarList.append($chatTabLi);
140
+ console.log("Custom 'Chat' tab dynamically added to navbar");
141
+ } else {
142
+ console.warn("Could not find navbar list to insert Chat tab");
143
+ }
144
+ }
145
+
146
+ // Remove previous handlers
147
+ $(document).off('click.chatToggle', 'a[data-value="chatTabTrigger"]');
148
+ $('a[data-value="chatTabTrigger"]').off('click.chatToggle');
149
+
150
+ var oldChatTabLink = $('a[data-toggle="tab"][data-value="chatTabTrigger"]');
151
+ if (oldChatTabLink.length > 0) {
152
+ oldChatTabLink.off('click.bs.tab.data-api');
153
+ oldChatTabLink.attr('href', 'javascript:void(0);');
154
+ oldChatTabLink.removeAttr('data-toggle');
155
+ }
156
+
157
+ $(document).off('click.chatNavbarButton', '#chatNavbarButton');
158
+
159
+ // Chat toggle handler
160
+ $(document).off('click.customChatTab').on('click.customChatTab', '#customChatTabLink', function(event) {
161
+ event.preventDefault();
162
+ event.stopPropagation();
163
+ console.log("Chat tab clicked");
164
+
165
+ var sidebar = $('#chatSidebar');
166
+ console.log("Sidebar visibility:", sidebar.is(':visible'));
167
+
168
+ if (sidebar.is(':visible')) {
169
+ sidebar.fadeOut();
170
+ } else {
171
+ sidebar.fadeIn(function() {
172
+ if (isFirstChatOpenThisSession) {
173
+ addChatMessage("How can I help you today?", 'agent');
174
+ addChatMessage("⚠️ TaijiChat can make errors. Please verify important scientific information and consult original research papers for critical findings.", 'agent', false, true);
175
+ addChatMessage("📊 Note: Your first query may take longer as we initialize the data analysis system.", 'agent', false, true);
176
+ isFirstChatOpenThisSession = false;
177
+ }
178
+ });
179
+ }
180
+ });
181
+
182
+ // Close button handler
183
+ $(document).off('click.chatClose').on('click.chatClose', '#closeChatSidebarBtn', function() {
184
+ console.log("Close button clicked");
185
+ $('#chatSidebar').fadeOut();
186
+ });
187
+
188
+ // Resize functionality
189
+ console.log("Setting up resize handlers");
190
+
191
+ // Remove any existing handlers first
192
+ $(document).off('mousedown.resizeHandle');
193
+ $(document).off('mousemove.resizePanel');
194
+ $(document).off('mouseup.resizePanel');
195
+
196
+ // Add new handlers using event delegation
197
+ $(document).on('mousedown.resizeHandle', '.resize-handle', function(e) {
198
+ console.log("Resize handle mousedown detected");
199
+ isResizing = true;
200
+ startX = e.pageX;
201
+ var sidebar = $('#chatSidebar');
202
+ startWidth = sidebar.width();
203
+ console.log("Initial width:", startWidth);
204
+ e.preventDefault();
205
+ $('body').css('user-select', 'none'); // Prevent text selection while dragging
206
+ });
207
+
208
+ $(document).on('mousemove.resizePanel', function(e) {
209
+ if (!isResizing) return;
210
+
211
+ var sidebar = $('#chatSidebar');
212
+ var windowWidth = $(window).width();
213
+ var width = windowWidth - e.pageX;
214
+ width = Math.max(250, Math.min(width, 3200));
215
+ console.log("Resizing to width:", width);
216
+
217
+ sidebar.css({
218
+ 'width': width + 'px',
219
+ 'transition': 'none' // Disable transition during drag
220
+ });
221
+ });
222
+
223
+ $(document).on('mouseup.resizePanel', function(e) {
224
+ if (isResizing) {
225
+ console.log("Resize ended");
226
+ isResizing = false;
227
+ $('body').css('user-select', ''); // Re-enable text selection
228
+ $('#chatSidebar').css('transition', ''); // Re-enable transitions
229
+ }
230
+ });
231
+
232
+ $(document).on('mouseenter', '.resize-handle', function() {
233
+ console.log('Mouse entered resize handle');
234
+ });
235
+
236
+ // Message handling functionality
237
+ var thinkingMessageElement = null;
238
+ var currentThoughtsContainer = null;
239
+
240
+ // Track if thinking animation is in progress
241
+ var thinkingTypingInProgress = false;
242
+ var resultTypingQueue = [];
243
+
244
+ // Scroll listener for chat messages panel
245
+ if ($chatMessages.length) { // Ensure element exists before attaching listener
246
+ $chatMessages.on('scroll.chatAutoScroll', function() {
247
+ // Check if scrolled near the bottom
248
+ if (this.scrollHeight - this.scrollTop - this.clientHeight < scrollThreshold) {
249
+ if (!autoScrollEnabled) {
250
+ // console.log("Auto-scroll re-enabled (scrolled to bottom).");
251
+ autoScrollEnabled = true;
252
+ }
253
+ } else {
254
+ if (autoScrollEnabled) {
255
+ // console.log("Auto-scroll disabled (user scrolled up).");
256
+ autoScrollEnabled = false;
257
+ }
258
+ }
259
+ });
260
+ } else {
261
+ console.warn("#chatMessages element not found for scroll listener.");
262
+ }
263
+
264
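+ // Types the given text into $element one character at a time (speed = ms per character), scrolling while auto-scroll is enabled, then fires callback.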
+ function typeTextLine($element, text, callback, speed = 10) {
265
+ let i = 0;
266
+ function typeChar() {
267
+ if (i < text.length) {
268
+ $element.append(text.charAt(i));
269
+ i++;
270
+ if (autoScrollEnabled) { // Conditional scroll
271
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight); // Scroll after each character
272
+ }
273
+ setTimeout(typeChar, speed);
274
+ } else if (callback) {
275
+ callback();
276
+ }
277
+ }
278
+ typeChar();
279
+ }
280
+
281
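+ // Types an array of lines into $container sequentially, one div (with lineClass) per line; calls doneCallback after the last line finishes.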
+ function typeTextLines($container, lines, lineClass, speed, doneCallback) {
282
+ let idx = 0;
283
+ function typeNextLine() {
284
+ if (idx < lines.length) {
285
+ var $lineDiv = $('<div></div>').addClass(lineClass);
286
+ $container.append($lineDiv);
287
+ typeTextLine($lineDiv, lines[idx], function() {
288
+ idx++;
289
+ typeNextLine();
290
+ }, speed);
291
+ } else if (doneCallback) {
292
+ doneCallback();
293
+ }
294
+ }
295
+ typeNextLine();
296
+ }
297
+
298
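+ // Appends a chat bubble to #chatMessages. User messages render instantly; HTML content is inserted as-is; plain agent text is typed out; thinking messages get a collapsible thoughts area.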
+ function addChatMessage(messageText, messageType, isThinkingMessage = false, isDisclaimer = false) {
299
+ var messageClass = messageType === 'user' ? 'user-message' : 'agent-message';
300
+ if (isThinkingMessage) {
301
+ messageClass += ' thinking-message';
302
+ }
303
+ if (isDisclaimer) {
304
+ messageClass += ' disclaimer';
305
+ }
306
+ var $chatMessages = $('#chatMessages');
307
+
308
+ var $messageDiv = $('<div></div>').addClass('chat-message').addClass(messageClass);
309
+
310
+ if (messageType === 'user') {
311
+ // For user messages, just append the text directly without animation
312
+ $messageDiv.text(messageText);
313
+ $chatMessages.append($messageDiv);
314
+ // Always scroll for user's own messages, and re-enable autoScroll
315
+ autoScrollEnabled = true;
316
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight);
317
+ return;
318
+ }
319
+
320
+ // Check if message contains HTML (for images)
321
+ var containsHtml = /<[a-z][\s\S]*>/i.test(messageText);
322
+
323
+ if (isThinkingMessage) {
324
+ // Guarantee thinkingTypingInProgress is set before any animation starts
325
+ thinkingTypingInProgress = true;
326
+ $messageDiv.html('<span class="thought-toggle-arrow" role="button" tabindex="0">&#9658;</span> ' +
327
+ '<span class="thinking-text"></span>' +
328
+ '<div class="thoughts-area" style="display: none; margin-left: 20px; font-style: italic; color: #555;"></div>');
329
+ thinkingMessageElement = $messageDiv;
330
+ currentThoughtsContainer = $messageDiv.find('.thoughts-area');
331
+ $chatMessages.append($messageDiv);
332
+ // Split main thinking into lines
333
+ var mainLines = messageText.split('\n');
334
+ typeTextLines($messageDiv.find('.thinking-text'), mainLines, '', 4, function() {
335
+ // If there are already thoughts queued, type them line by line
336
+ var $thoughtsArea = $messageDiv.find('.thoughts-area');
337
+ var thoughtDivs = $thoughtsArea.children('.thought-item').toArray();
338
+ function typeThoughtsSequentially(idx) {
339
+ if (idx < thoughtDivs.length) {
340
+ var $thoughtDiv = $(thoughtDivs[idx]);
341
+ var text = $thoughtDiv.data('pending-text');
342
+ $thoughtDiv.removeData('pending-text');
343
+ typeTextLine($thoughtDiv, text, function() {
344
+ typeThoughtsSequentially(idx + 1);
345
+ }, 4);
346
+ } else {
347
+ // Only now set thinkingTypingInProgress to false
348
+ thinkingTypingInProgress = false;
349
+ // If a result is queued, type it now
350
+ if (resultTypingQueue.length > 0) {
351
+ var nextResult = resultTypingQueue.shift();
352
+ nextResult();
353
+ }
354
+ }
355
+ }
356
+ typeThoughtsSequentially(0);
357
+ });
358
+ // Scroll to bottom if auto-scroll is enabled
359
+ if (autoScrollEnabled) {
360
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight);
361
+ }
362
+ return;
363
+ }
364
+
365
+ function processRegularMessage() {
366
+ // Check if message contains HTML (for images)
367
+ if (containsHtml) {
368
+ // If contains HTML, add it directly without typing animation
369
+ console.log("HTML content detected in message, adding directly:", messageText);
370
+ // Check if it has an image tag
371
+ if (messageText.indexOf('<img') !== -1) {
372
+ console.log("Image tag found in message");
373
+ // Extract image src for debugging
374
+ var imgSrcMatch = messageText.match(/src="([^"]+)"/);
375
+ if (imgSrcMatch && imgSrcMatch.length > 1) {
376
+ console.log("Image src:", imgSrcMatch[1]);
377
+ }
378
+ }
379
+ $messageDiv.html(messageText);
380
+ $chatMessages.append($messageDiv);
381
+ if (autoScrollEnabled) {
382
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight);
383
+ }
384
+ } else {
385
+ // For normal text messages, use typing animation
386
+ $chatMessages.append($messageDiv);
387
+ var lines = messageText.split('\n');
388
+ typeTextLines($messageDiv, lines, '', 5, function() {
389
+ if (autoScrollEnabled) {
390
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight);
391
+ }
392
+ });
393
+ }
394
+ }
395
+
396
+ if (thinkingTypingInProgress) {
397
+ // If thinking animation is in progress, queue this message
398
+ resultTypingQueue.push(processRegularMessage);
399
+ } else {
400
+ // Otherwise, process immediately
401
+ processRegularMessage();
402
+ }
403
+ }
404
+
405
+ // Thought toggle handler
406
+ $(document).off('click.thoughtToggle').on('click.thoughtToggle', '.thought-toggle-arrow', function() {
407
+ var $arrow = $(this);
408
+ var $thoughtsArea = $arrow.siblings('.thoughts-area');
409
+ $thoughtsArea.slideToggle(200);
410
+ $arrow.html($arrow.html() === '►' ? '▼' : '►');
411
+ });
412
+
413
+ // Send message handlers
414
+ $('#sendChatMsg').off('click.chatSend').on('click.chatSend', function() {
415
+ var messageText = $('#chatInput').val().trim();
416
+ if (messageText) {
417
+ // Disable input and button
418
+ $('#chatInput').prop('disabled', true);
419
+ $('#sendChatMsg').prop('disabled', true);
420
+
421
+ addChatMessage(messageText, 'user');
422
+ $('#chatInput').val(''); // Clear input after grabbing value
423
+ Shiny.setInputValue("user_chat_message", messageText, {priority: "event"});
424
+ }
425
+ });
426
+
427
+ $('#chatInput').off('keypress.chatSend').on('keypress.chatSend', function(e) {
428
+ if (e.which == 13 && !$(this).prop('disabled')) { // Check if not disabled
429
+ e.preventDefault();
430
+ $('#sendChatMsg').click(); // This will trigger the click handler above
431
+ }
432
+ });
433
+
434
+ // Shiny message handlers
435
+ Shiny.addCustomMessageHandler("agent_thinking_started", function(message) {
436
+ console.log("Received thinking started message");
437
+ if(message && typeof message.text === 'string') {
438
+ // if (thinkingMessageElement) {
439
+ // thinkingMessageElement.remove();
440
+ // thinkingMessageElement = null;
441
+ // }
442
+ // if (currentThoughtsContainer) {
443
+ // currentThoughtsContainer = null; // It's part of thinkingMessageElement, will be handled if parent is removed
444
+ // }
445
+ addChatMessage(message.text, 'agent', true);
446
+ }
447
+ });
448
+
449
+ Shiny.addCustomMessageHandler("agent_new_thought", function(message) {
450
+ console.log("Received new thought");
451
+ if (message && typeof message.text === 'string' && currentThoughtsContainer) {
452
+ var $thoughtDiv = $('<div></div>').addClass('thought-item');
453
+ $thoughtDiv.data('pending-text', message.text);
454
+ currentThoughtsContainer.append($thoughtDiv);
455
+ // If thinking text is done, type this thought now, else it will be picked up in the queue
456
+ if (!thinkingTypingInProgress) {
457
+ // Guarantee thinkingTypingInProgress is set before starting thoughts
458
+ thinkingTypingInProgress = true;
459
+ var $thoughtsArea = currentThoughtsContainer;
460
+ var thoughtDivs = $thoughtsArea.children('.thought-item').toArray();
461
+ function typeThoughtsSequentially(idx) {
462
+ if (idx < thoughtDivs.length) {
463
+ var $thoughtDiv = $(thoughtDivs[idx]);
464
+ if ($thoughtDiv.text().length === 0) { // Check if text has not been typed yet
465
+ var text = $thoughtDiv.data('pending-text');
466
+ $thoughtDiv.removeData('pending-text');
467
+ typeTextLine($thoughtDiv, text, function() {
468
+ typeThoughtsSequentially(idx + 1);
469
+ }, 4);
470
+ } else {
471
+ // Already typed (e.g., if this function is re-entered)
472
+ typeThoughtsSequentially(idx + 1);
473
+ }
474
+ } else {
475
+ thinkingTypingInProgress = false;
476
+ if (resultTypingQueue.length > 0) {
477
+ var nextResult = resultTypingQueue.shift();
478
+ nextResult(); // This will handle its own scrolling if needed
479
+ }
480
+ }
481
+ }
482
+ typeThoughtsSequentially(0);
483
+ }
484
+ if (autoScrollEnabled) { // Conditional scroll
485
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight);
486
+ }
487
+ }
488
+ });
489
+
490
+ Shiny.addCustomMessageHandler("agent_chat_response", function(message) {
491
+ console.log("Received chat response");
492
+ if(message && typeof message.text === 'string') {
493
+ addChatMessage(message.text, 'agent');
494
+ }
495
+ // Re-enable input and button
496
+ $('#chatInput').prop('disabled', false);
497
+ $('#sendChatMsg').prop('disabled', false);
498
+ $('#chatInput').focus(); // Optionally focus the input field
499
+ });
500
+
501
+ Shiny.addCustomMessageHandler("long_op_custom_warning", function(message) {
502
+ if (message && typeof message.text === 'string') {
503
+ // Add message to chat display, styled as a warning
504
+ // You might want to add a specific class for styling, e.g., 'long-op-warning-message'
505
+ // For now, it will use the default 'agent-message' style but appear as a distinct message.
506
+ var $chatMessages = $('#chatMessages');
507
+ var $messageDiv = $('<div></div>').addClass('chat-message agent-message long-op-warning'); // Added long-op-warning class
508
+ $messageDiv.css({
509
+ 'background-color': 'rgba(255, 0, 0, 0.1)', // Light red, less intense than the old overlay
510
+ 'border': '1px solid rgba(255, 0, 0, 0.3)',
511
+ 'color': '#721c24', // Darker red text for readability
512
+ 'padding': '10px',
513
+ 'margin-bottom': '10px',
514
+ 'border-radius': '5px'
515
+ });
516
+ $messageDiv.text(message.text);
517
+ $chatMessages.append($messageDiv);
518
+ $chatMessages.scrollTop($chatMessages[0].scrollHeight);
519
+ }
520
+ });
521
+
522
+ Shiny.addCustomMessageHandler("agent_processing_error", function(message) {
523
+ // This is a new handler you might need if server.R sends a specific error message type
524
+ // For now, agent_chat_response handles errors from server.R's tryCatch
525
+ console.error("Agent processing error:", message.text);
526
+ // Ensure UI is re-enabled even on specific error messages
527
+ $('#chatInput').prop('disabled', false);
528
+ $('#sendChatMsg').prop('disabled', false);
529
+ $('#chatInput').focus();
530
+ // Optionally display a more prominent error in the chat
531
+ if(message && typeof message.text === 'string') {
532
+ addChatMessage("Error: " + message.text, 'agent-error'); // Define 'agent-error' style if needed
533
+ } else {
534
+ addChatMessage("An unexpected error occurred with the agent.", 'agent-error');
535
+ }
536
+ });
537
+
538
+
539
+ // Literature toggle handler
540
+ $(document).off('click.literatureToggle').on('click.literatureToggle', '#literatureToggleBtn', function() {
541
+ var $btn = $(this);
542
+ $btn.toggleClass('active');
543
+
544
+ var isEnabled = $btn.hasClass('active');
545
+ Shiny.setInputValue("literature_search_enabled", isEnabled, {priority: "event"});
546
+
547
+ // Update button text/icon based on state
548
+ if (isEnabled) {
549
+ $btn.html('<i class="fa fa-search"></i> External Literature (ON)');
550
+ } else {
551
+ $btn.html('<i class="fa fa-search"></i> External Literature (OFF)');
552
+ }
553
+ });
554
+
555
+ // Initialize button state (default: disabled)
556
+ setTimeout(function() {
557
+ Shiny.setInputValue("literature_search_enabled", false);
558
+ }, 100);
559
+ }
 
www/chat_styles.css CHANGED
@@ -1,336 +1,197 @@
1
- /* www/chat_styles.css */
2
-
3
- .chat-toggle-button {
4
- position: fixed; /* Fixed position */
5
- top: 10px; /* Adjust as needed for navbar height */
6
- right: 20px;
7
- z-index: 1051; /* Higher than sidebar to be clickable if sidebar somehow overlaps header */
8
- }
9
-
10
- /* Styles for the chat sidebar itself are mostly inline in chatSidebarUI for now,
11
- but could be moved here. For example: */
12
- .chat-sidebar {
13
- position: fixed !important;
14
- right: 0;
15
- top: 0;
16
- height: 100vh;
17
- width: 350px;
18
- min-width: 250px;
19
- max-width: 3200px;
20
- z-index: 1050;
21
- background-color: #f8f9fa;
22
- border-left: 1px solid #dee2e6;
23
- box-shadow: -2px 0 5px rgba(0,0,0,0.1);
24
- transition: width 0.1s ease-out;
25
- padding: 15px;
26
- display: none;
27
- pointer-events: auto;
28
- }
29
-
30
- .chat-messages-area {
31
- /* height: calc(100vh - 200px); */ /* Adjust based on header/footer/input area */
32
- /* overflow-y: auto; */
33
- /* border: 1px solid #ccc; */
34
- /* padding: 10px; */
35
- /* margin-bottom: 10px; */
36
- /* background-color: white; */
37
- }
38
-
39
- .chat-input-area {
40
- /* display: flex; */
41
- box-sizing: border-box; /* Add this to the container */
42
- }
43
-
44
- /* Basic styling for messages (example) */
45
- .chat-message {
46
- padding: 8px 12px;
47
- margin-bottom: 8px;
48
- border-radius: 15px;
49
- word-wrap: break-word;
50
- max-width: 85%;
51
- clear: both; /* Ensures messages don't overlap if floats were used */
52
- }
53
- .user-message {
54
- background-color: #007bff; /* Primary blue for user */
55
- color: white;
56
- float: right;
57
- margin-left: auto; /* Pushes to the right */
58
- border-bottom-right-radius: 5px; /* Slightly different rounding for bubble effect */
59
- }
60
- .agent-message {
61
- background-color: #e9ecef; /* Light grey for agent */
62
- color: #495057;
63
- float: left;
64
- margin-right: auto; /* Pushes to the left */
65
- border-bottom-left-radius: 5px; /* Slightly different rounding for bubble effect */
66
- }
67
-
68
- /* Styling for disclaimer messages */
69
- .agent-message.disclaimer {
70
- background-color: #fff3cd; /* Light yellow background for warning */
71
- border: 1px solid #ffeaa7; /* Light orange border */
72
- color: #856404; /* Dark yellow-brown text */
73
- font-size: 0.9em; /* Slightly smaller text */
74
- font-style: italic; /* Italic for emphasis */
75
- }
76
-
77
- /* Styles for the custom chat tab list item */
78
- .custom-chat-tab-li {
79
- position: relative; /* Allows precise positioning of the link if needed */
80
- margin-top: 12px; /* ADJUSTED: Increased to 15px to move the tab down further. */
81
- }
82
-
83
- /* Styles for the custom chat tab link */
84
- #customChatTabLink {
85
- padding: 10px 15px; /* Adjust padding to match other tabs */
86
- line-height: 20px; /* Adjust line-height to match other tabs */
87
- display: block;
88
- position: relative;
89
- color: white !important; /* Make font color white */
90
- }
91
-
92
- #customChatTabLink:hover,
93
- #customChatTabLink:focus {
94
- text-decoration: none;
95
- background-color: #333333; /* Darker background on hover for white text, e.g., dark grey */
96
- color: white !important; /* Ensure hover text color is also white */
97
- }
98
-
99
- /* Ensure input and button in chat sidebar respect their defined widths */
100
- .chat-input-area #chatInput {
101
- box-sizing: border-box;
102
- flex-grow: 1; /* Allow input to take available space */
103
- margin-right: 5px; /* Add a small margin to separate from button if needed */
104
- height: 38px !important; /* Explicit height */
105
- }
106
-
107
- .chat-input-area #sendChatMsg {
108
- box-sizing: border-box;
109
- flex-shrink: 0; /* Prevent button from shrinking */
110
- display: inline-flex !important; /* Make the button a flex container */
111
- align-items: center !important; /* Vertically center content inside the button */
112
- justify-content: center !important; /* Horizontally center content (good for icon+text) */
113
- height: 38px !important; /* Explicit height, matching input */
114
- padding-left: 10px !important; /* Ensure some horizontal padding for button text */
115
- padding-right: 10px !important; /* Ensure some horizontal padding for button text */
116
- }
117
-
118
- /* Resize handle styles */
119
- .resize-handle {
120
- position: absolute;
121
- left: 0;
122
- top: 0;
123
- width: 8px;
124
- height: 100%;
125
- cursor: ew-resize;
126
- background-color: transparent;
127
- z-index: 9999 !important;
128
- transition: background-color 0.2s;
129
- pointer-events: auto;
130
- }
131
-
132
- .resize-handle:hover {
133
- background-color: rgba(0, 0, 0, 0.1) !important;
134
- }
135
-
136
- .thinking-message {
137
- background-color: #fffbe6;
138
- border-left: 4px solid #ffd700;
139
- position: relative;
140
- }
141
- .thinking-text, .thought-item {
142
- font-family: 'Fira Mono', 'Consolas', monospace;
143
- white-space: pre-wrap;
144
- display: block;
145
- vertical-align: middle;
146
- }
147
- .thinking-text.typing-cursor:after, .thought-item.typing-cursor:after {
148
- content: '|';
149
- animation: blink-cursor 1s steps(1) infinite;
150
- margin-left: 2px;
151
- color: #888;
152
- }
153
- @keyframes blink-cursor {
154
- 0%, 100% { opacity: 1; }
155
- 50% { opacity: 0; }
156
- }
157
- .thoughts-area {
158
- background: #f9f9f9;
159
- border-left: 2px dashed #ffd700;
160
- margin-top: 6px;
161
- padding: 6px 10px;
162
- border-radius: 8px;
163
- }
164
- .thought-toggle-arrow {
165
- cursor: pointer;
166
- font-size: 1.1em;
167
- color: #bfa100;
168
- margin-right: 4px;
169
- user-select: none;
170
- }
171
-
172
- /* Literature confirmation dialog styles */
173
- .literature-confirmation-dialog {
174
- background: linear-gradient(135deg, #fff3cd 0%, #fef5e7 100%);
175
- border: 2px solid #ffc107;
176
- border-radius: 10px;
177
- padding: 18px 22px;
178
- margin: 12px auto;
179
- box-shadow: 0 3px 12px rgba(0,0,0,0.12);
180
- animation: fadeInScale 0.4s ease-out;
181
- max-width: 85%;
182
- width: fit-content;
183
- min-width: 280px;
184
- position: relative;
185
- display: block;
186
- }
187
-
188
- .confirmation-content {
189
- text-align: center;
190
- width: 100%;
191
- }
192
-
193
- .confirmation-message {
194
- font-size: 14px;
195
- color: #856404;
196
- margin-bottom: 18px;
197
- line-height: 1.5;
198
- font-weight: 500;
199
- padding: 0 8px;
200
- max-width: 350px;
201
- margin-left: auto;
202
- margin-right: auto;
203
- }
204
-
205
- .confirmation-buttons {
206
- display: grid;
207
- grid-template-columns: 1fr 1fr;
208
- gap: 8px;
209
- justify-content: center;
210
- align-items: center;
211
- }
212
-
213
- .confirmation-buttons .btn {
214
- min-width: 70px;
215
- font-size: 12px;
216
- font-weight: 600;
217
- padding: 8px 12px;
218
- border-radius: 6px;
219
- cursor: pointer;
220
- border: none;
221
- transition: all 0.3s ease;
222
- display: inline-flex;
223
- align-items: center;
224
- justify-content: center;
225
- white-space: nowrap;
226
- text-align: center;
227
- }
228
-
229
- .confirmation-paper {
230
- background: linear-gradient(135deg, #007bff 0%, #0056b3 100%);
231
- color: white;
232
- box-shadow: 0 2px 6px rgba(0, 123, 255, 0.3);
233
- }
234
-
235
- .confirmation-paper:hover {
236
- background: linear-gradient(135deg, #0056b3 0%, #004085 100%);
237
- transform: translateY(-1px);
238
- box-shadow: 0 3px 8px rgba(0, 123, 255, 0.4);
239
- }
240
-
241
- .confirmation-external {
242
- background: linear-gradient(135deg, #28a745 0%, #20c997 100%);
243
- color: white;
244
- box-shadow: 0 2px 6px rgba(40, 167, 69, 0.3);
245
- }
246
-
247
- .confirmation-external:hover {
248
- background: linear-gradient(135deg, #218838 0%, #1aa085 100%);
249
- transform: translateY(-1px);
250
- box-shadow: 0 3px 8px rgba(40, 167, 69, 0.4);
251
- }
252
-
253
- .confirmation-both {
254
- background: linear-gradient(135deg, #6f42c1 0%, #5a2d91 100%);
255
- color: white;
256
- box-shadow: 0 2px 6px rgba(111, 66, 193, 0.3);
257
- }
258
-
259
- .confirmation-both:hover {
260
- background: linear-gradient(135deg, #5a2d91 0%, #4c2470 100%);
261
- transform: translateY(-1px);
262
- box-shadow: 0 3px 8px rgba(111, 66, 193, 0.4);
263
- }
264
-
265
- .confirmation-none {
266
- background: linear-gradient(135deg, #6c757d 0%, #adb5bd 100%);
267
- color: white;
268
- box-shadow: 0 2px 6px rgba(108, 117, 125, 0.3);
269
- }
270
-
271
- .confirmation-none:hover {
272
- background: linear-gradient(135deg, #545b62 0%, #868e96 100%);
273
- transform: translateY(-1px);
274
- box-shadow: 0 3px 8px rgba(108, 117, 125, 0.4);
275
- }
276
-
277
- @keyframes fadeIn {
278
- from { opacity: 0; transform: translateY(-10px); }
279
- to { opacity: 1; transform: translateY(0); }
280
- }
281
-
282
- @keyframes fadeInScale {
283
- from {
284
- opacity: 0;
285
- transform: translateY(-10px) scale(0.95);
286
- }
287
- to {
288
- opacity: 1;
289
- transform: translateY(0) scale(1);
290
- }
291
- }
292
-
293
- /* Responsive design for confirmation dialog */
294
- @media (max-width: 380px) {
295
- .literature-confirmation-dialog {
296
- margin: 8px 3px;
297
- padding: 15px 12px;
298
- min-width: 250px;
299
- max-width: calc(100% - 6px);
300
- }
301
-
302
- .confirmation-message {
303
- font-size: 13px;
304
- padding: 0 6px;
305
- margin-bottom: 15px;
306
- }
307
-
308
- .confirmation-buttons {
309
- grid-template-columns: 1fr;
310
- gap: 6px;
311
- }
312
-
313
- .confirmation-buttons .btn {
314
- font-size: 11px;
315
- padding: 6px 10px;
316
- min-width: 60px;
317
- }
318
- }
319
-
320
- @media (max-width: 280px) {
321
- .literature-confirmation-dialog {
322
- margin: 6px 2px;
323
- padding: 12px 8px;
324
- min-width: auto;
325
- }
326
-
327
- .confirmation-message {
328
- font-size: 12px;
329
- padding: 0 4px;
330
- }
331
-
332
- .confirmation-buttons .btn {
333
- font-size: 10px;
334
- padding: 5px 8px;
335
- }
336
  }
 
1
+ /* www/chat_styles.css */
2
+
3
+ .chat-toggle-button {
4
+ position: fixed; /* Fixed position */
5
+ top: 10px; /* Adjust as needed for navbar height */
6
+ right: 20px;
7
+ z-index: 1051; /* Higher than sidebar to be clickable if sidebar somehow overlaps header */
8
+ }
9
+
10
+ /* Styles for the chat sidebar itself are mostly inline in chatSidebarUI for now,
11
+ but could be moved here. For example: */
12
+ .chat-sidebar {
13
+ position: fixed !important;
14
+ right: 0;
15
+ top: 0;
16
+ height: 100vh;
17
+ width: 350px;
18
+ min-width: 250px;
19
+ max-width: 3200px;
20
+ z-index: 1050;
21
+ background-color: #f8f9fa;
22
+ border-left: 1px solid #dee2e6;
23
+ box-shadow: -2px 0 5px rgba(0,0,0,0.1);
24
+ transition: width 0.1s ease-out;
25
+ padding: 15px;
26
+ display: none;
27
+ pointer-events: auto;
28
+ }
29
+
30
+ .chat-messages-area {
31
+ /* height: calc(100vh - 200px); */ /* Adjust based on header/footer/input area */
32
+ /* overflow-y: auto; */
33
+ /* border: 1px solid #ccc; */
34
+ /* padding: 10px; */
35
+ /* margin-bottom: 10px; */
36
+ /* background-color: white; */
37
+ }
38
+
39
+ .chat-input-area {
40
+ /* display: flex; */
41
+ box-sizing: border-box; /* Add this to the container */
42
+ }
43
+
44
+ /* Basic styling for messages (example) */
45
+ .chat-message {
46
+ padding: 8px 12px;
47
+ margin-bottom: 8px;
48
+ border-radius: 15px;
49
+ word-wrap: break-word;
50
+ max-width: 85%;
51
+ clear: both; /* Ensures messages don't overlap if floats were used */
52
+ }
53
+ .user-message {
54
+ background-color: #007bff; /* Primary blue for user */
55
+ color: white;
56
+ float: right;
57
+ margin-left: auto; /* Pushes to the right */
58
+ border-bottom-right-radius: 5px; /* Slightly different rounding for bubble effect */
59
+ }
60
+ .agent-message {
61
+ background-color: #e9ecef; /* Light grey for agent */
62
+ color: #495057;
63
+ float: left;
64
+ margin-right: auto; /* Pushes to the left */
65
+ border-bottom-left-radius: 5px; /* Slightly different rounding for bubble effect */
66
+ }
67
+
68
+ /* Styling for disclaimer messages */
69
+ .agent-message.disclaimer {
70
+ background-color: #fff3cd; /* Light yellow background for warning */
71
+ border: 1px solid #ffeaa7; /* Light orange border */
72
+ color: #856404; /* Dark yellow-brown text */
73
+ font-size: 0.9em; /* Slightly smaller text */
74
+ font-style: italic; /* Italic for emphasis */
75
+ }
76
+
77
+ /* Styles for the custom chat tab list item */
78
+ .custom-chat-tab-li {
79
+ position: relative; /* Allows precise positioning of the link if needed */
80
+ margin-top: 12px; /* ADJUSTED: Increased to 12px to move the tab down further. */
81
+ }
82
+
83
+ /* Styles for the custom chat tab link */
84
+ #customChatTabLink {
85
+ padding: 10px 15px; /* Adjust padding to match other tabs */
86
+ line-height: 20px; /* Adjust line-height to match other tabs */
87
+ display: block;
88
+ position: relative;
89
+ color: white !important; /* Make font color white */
90
+ }
91
+
92
+ #customChatTabLink:hover,
93
+ #customChatTabLink:focus {
94
+ text-decoration: none;
95
+ background-color: #333333; /* Darker background on hover for white text, e.g., dark grey */
96
+ color: white !important; /* Ensure hover text color is also white */
97
+ }
98
+
99
+ /* Ensure input and button in chat sidebar respect their defined widths */
100
+ .chat-input-area #chatInput {
101
+ box-sizing: border-box;
102
+ flex-grow: 1; /* Allow input to take available space */
103
+ margin-right: 5px; /* Add a small margin to separate from button if needed */
104
+ height: 38px !important; /* Explicit height */
105
+ }
106
+
107
+ .chat-input-area #sendChatMsg {
108
+ box-sizing: border-box;
109
+ flex-shrink: 0; /* Prevent button from shrinking */
110
+ display: inline-flex !important; /* Make the button a flex container */
111
+ align-items: center !important; /* Vertically center content inside the button */
112
+ justify-content: center !important; /* Horizontally center content (good for icon+text) */
113
+ height: 38px !important; /* Explicit height, matching input */
114
+ padding-left: 10px !important; /* Ensure some horizontal padding for button text */
115
+ padding-right: 10px !important; /* Ensure some horizontal padding for button text */
116
+ }
117
+
118
+ /* Resize handle styles */
119
+ .resize-handle {
120
+ position: absolute;
121
+ left: 0;
122
+ top: 0;
123
+ width: 8px;
124
+ height: 100%;
125
+ cursor: ew-resize;
126
+ background-color: transparent;
127
+ z-index: 9999 !important;
128
+ transition: background-color 0.2s;
129
+ pointer-events: auto;
130
+ }
131
+
132
+ .resize-handle:hover {
133
+ background-color: rgba(0, 0, 0, 0.1) !important;
134
+ }
135
+
136
+ .thinking-message {
137
+ background-color: #fffbe6;
138
+ border-left: 4px solid #ffd700;
139
+ position: relative;
140
+ }
141
+ .thinking-text, .thought-item {
142
+ font-family: 'Fira Mono', 'Consolas', monospace;
143
+ white-space: pre-wrap;
144
+ display: block;
145
+ vertical-align: middle;
146
+ }
147
+ .thinking-text.typing-cursor:after, .thought-item.typing-cursor:after {
148
+ content: '|';
149
+ animation: blink-cursor 1s steps(1) infinite;
150
+ margin-left: 2px;
151
+ color: #888;
152
+ }
153
+ @keyframes blink-cursor {
154
+ 0%, 100% { opacity: 1; }
155
+ 50% { opacity: 0; }
156
+ }
157
+ .thoughts-area {
158
+ background: #f9f9f9;
159
+ border-left: 2px dashed #ffd700;
160
+ margin-top: 6px;
161
+ padding: 6px 10px;
162
+ border-radius: 8px;
163
+ }
164
+ .thought-toggle-arrow {
165
+ cursor: pointer;
166
+ font-size: 1.1em;
167
+ color: #bfa100;
168
+ margin-right: 4px;
169
+ user-select: none;
170
+ }
171
+
172
+
173
+ /* Literature toggle button styles */
174
+ .literature-toggle-btn {
175
+ transition: all 0.3s ease;
176
+ border: 2px solid #17a2b8;
177
+ }
178
+
179
+ .literature-toggle-btn.active {
180
+ background-color: #17a2b8 !important;
181
+ color: white !important;
182
+ border-color: #17a2b8;
183
+ }
184
+
185
+ .literature-toggle-btn:not(.active) {
186
+ background-color: transparent;
187
+ color: #17a2b8;
188
+ }
189
+
190
+ .literature-toggle-btn:hover {
191
+ background-color: #17a2b8;
192
+ color: white;
193
+ }
194
+
195
+ .literature-toggle-container {
196
+ margin: 5px 0;
197
  }
www/pages_description.md CHANGED
@@ -1,30 +1,30 @@
1
- homepage:
2
- - This webpage introduces the "TF atlas of CD8+ T cell states," a platform resulting from a multi-omics study focused on understanding and selectively programming T cell differentiation. The research, a collaboration involving UC San Diego, the Salk Institute, and The University of North Carolina at Chapel Hill, leverages a comprehensive transcriptional and epigenetic atlas generated from RNA-seq and ATAC-seq data. The atlas helps predict transcription factor (TF) activity and define differentiation trajectories, aiming to identify TFs that can control specific T cell states, such as terminally exhausted and tissue-resident memory T cells, for potential therapeutic applications in areas like cancer and viral infections.
3
-
4
- TF Catalog - Search TF Scores:
5
- This webpage provides a tool to "Search TF Scores" related to T cell differentiation. It features a diagram illustrating the "Memory path" and "Exhaustion path" of T cells, including states like Naive, Memory Precursor (MP), Effector T cell (TE), Tissue-Resident Memory (TRM), Terminal Effector (TEM), Progenitor Exhausted (TEXprog), and Terminally Exhausted (TEX). The core of the page is a searchable table displaying "TF activity score" for various transcription factors (TFs) across different T cell states and datasets. This allows users to explore and compare the activity levels of specific TFs in distinct T cell populations, aiding in the understanding of T cell fate decisions.
6
-
7
- TF Catalog - Cell State Specific TF Catalog:
8
- This webpage displays "Naive Specific Cells & normalized TF Activity Scores" as part of a "Cell State Specific TF Catalog." It features a dot plot visualization where each row represents a transcription factor (TF) and each column likely represents a specific sample or condition within naive T cells. The color intensity of the dots corresponds to the normalized TF activity score (PageRank score), while the size of the dots indicates the log-transformed gene expression level (TPM). This allows users to explore and compare the activity and expression of various TFs within the naive T cell state.
9
-
10
- TF Catalog - Multi-State TFs
11
- This webpage displays a series of heatmaps visualizing normalized PageRank scores, likely representing transcription factor activity, across various T cell differentiation states (Naive, MP, TE, TRM, TEM, TEXprog, TEXeff, TEXterm). The heatmaps are segmented into categories such as "Shared in cell states from acute infection," "Shared in cell states from chronic infection," and specific T cell subsets like "TRM & TEXPROG" and "MP, TE, TEXPROG." This presentation allows for the comparative analysis of transcription factor activity profiles across different T cell populations and under varying immunological contexts, revealing potential regulatory patterns.
12
-
13
- TF Wave Analysis:
14
- This webpage, titled "TF Wave Analysis," is dedicated to exploring the dynamic activity patterns of transcription factors (TFs) during T cell differentiation. It presents a series of visualizations, referred to as "TF Waves" (Wave 1, Wave 2, etc.), which illustrate how the activity of different sets of TFs changes as T cells transition through various states (Naive, MP, TRM, TEM, TEXprog, TEXterm). These waves are depicted on diagrams of T cell differentiation pathways, with color intensity and accompanying bar graphs likely indicating the strength or timing of TF activity. A table at the bottom of the page lists specific TFs and their association with these identified waves, allowing users to understand the sequential and coordinated roles of TFs in orchestrating T cell fate.
15
-
16
- TF Network Analysis - Search TF-TF Correlation in TRM/TEXterm:
17
- This webpage provides a tool to "Search TF-TF Correlation in TRM/TEXterm," allowing users to explore interactions and correlations between transcription factors (TFs) specifically within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cell states.
18
-
19
- The page explains that it uses data from ChIP-seq and Hi-C to build TF interaction networks. It visualizes these relationships, showing how a "TF-regulatee network" and a "TF X TF correlation matrix" contribute to understanding "TF-TF association." Users can enter a transcription factor of interest to search for its correlations. A key explains the network visualization: circle color indicates TF specificity to TRM (green) or TEXterm (brown), line thickness denotes interaction intensity, and line color shows if the interaction is found in TRM (green) or TEXterm (brown). This tool aims to identify cooperations between DNA-binding proteins in these specific T cell states.
20
-
21
- TF Network Analysis - TF Community in TRM/TEXterm:
22
- This webpage focuses on "TF Community in TRM/TEXterm," illustrating how transcription factor (TF) associations are analyzed through clustering to identify distinct TF communities within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cells.
23
-
24
- The page displays several network visualizations: one showing combined TRM and TEXterm TF communities and their interconnections, another detailing TRM-specific TF-TF interactions organized into communities (C1-C5), and a third depicting TEXterm-specific TF-TF interactions, also grouped into communities (C1-C5).
25
-
26
- Below these networks, tables likely provide details about the TFs that constitute each identified community. Furthermore, the webpage highlights "Shared pathways" between TRM and TEXterm communities and "Enriched pathways" specific to either the TRM or TEXterm TF networks, linking these TF communities to biological functions such as "IL-2 production," "cell-cell adhesion," "T cell activation," "intrinsic apoptosis," and "catabolism." This allows for an understanding of the collaborative roles of TFs in regulating distinct cellular processes within these two T cell states.
27
-
28
- Multi-omics Data:
29
- This webpage, titled "Multi-omics Data," presents a comprehensive, scrollable table cataloging various experimental datasets relevant to T cell research. Each entry in the table details specific studies, including information such as the primary author, laboratory, publication year, data accession number, type of data (e.g., RNA-seq, ATAC-seq), biological species (primarily mouse), infection model used (e.g., LCMV Arm), and the specific T cell populations analyzed (e.g., Naive, MP, TE, TRM, TexProg, TexTerm) along with their defining markers or characteristics. The page includes features for searching and adjusting the number of displayed entries, indicating it serves as an interactive repository for accessing and reviewing details of diverse multi-omics datasets.
30
-
 
1
+ homepage:
2
+ - This webpage introduces the "TF atlas of CD8+ T cell states," a platform resulting from a multi-omics study focused on understanding and selectively programming T cell differentiation. The research, a collaboration involving UC San Diego, the Salk Institute, and The University of North Carolina at Chapel Hill, leverages a comprehensive transcriptional and epigenetic atlas generated from RNA-seq and ATAC-seq data. The atlas helps predict transcription factor (TF) activity and define differentiation trajectories, aiming to identify TFs that can control specific T cell states, such as terminally exhausted and tissue-resident memory T cells, for potential therapeutic applications in areas like cancer and viral infections.
3
+
4
+ TF Catalog - Search TF Scores:
5
+ This webpage provides a tool to "Search TF Scores" related to T cell differentiation. It features a diagram illustrating the "Memory path" and "Exhaustion path" of T cells, including states like Naive, Memory Precursor (MP), Effector T cell (TE), Tissue-Resident Memory (TRM), Terminal Effector (TEM), Progenitor Exhausted (TEXprog), and Terminally Exhausted (TEX). The core of the page is a searchable table displaying "TF activity score" for various transcription factors (TFs) across different T cell states and datasets. This allows users to explore and compare the activity levels of specific TFs in distinct T cell populations, aiding in the understanding of T cell fate decisions.
6
+
7
+ TF Catalog - Cell State Specific TF Catalog:
8
+ This webpage displays "Naive Specific Cells & normalized TF Activity Scores" as part of a "Cell State Specific TF Catalog." It features a dot plot visualization where each row represents a transcription factor (TF) and each column likely represents a specific sample or condition within naive T cells. The color intensity of the dots corresponds to the normalized TF activity score (PageRank score), while the size of the dots indicates the log-transformed gene expression level (TPM). This allows users to explore and compare the activity and expression of various TFs within the naive T cell state.
9
+
10
+ TF Catalog - Multi-State TFs
11
+ This webpage displays a series of heatmaps visualizing normalized PageRank scores, likely representing transcription factor activity, across various T cell differentiation states (Naive, MP, TE, TRM, TEM, TEXprog, TEXeff, TEXterm). The heatmaps are segmented into categories such as "Shared in cell states from acute infection," "Shared in cell states from chronic infection," and specific T cell subsets like "TRM & TEXPROG" and "MP, TE, TEXPROG." This presentation allows for the comparative analysis of transcription factor activity profiles across different T cell populations and under varying immunological contexts, revealing potential regulatory patterns.
12
+
13
+ TF Wave Analysis:
14
+ This webpage, titled "TF Wave Analysis," is dedicated to exploring the dynamic activity patterns of transcription factors (TFs) during T cell differentiation. It presents a series of visualizations, referred to as "TF Waves" (Wave 1, Wave 2, etc.), which illustrate how the activity of different sets of TFs changes as T cells transition through various states (Naive, MP, TRM, TEM, TEXprog, TEXterm). These waves are depicted on diagrams of T cell differentiation pathways, with color intensity and accompanying bar graphs likely indicating the strength or timing of TF activity. A table at the bottom of the page lists specific TFs and their association with these identified waves, allowing users to understand the sequential and coordinated roles of TFs in orchestrating T cell fate.
15
+
16
+ TF Network Analysis - Search TF-TF Correlation in TRM/TEXterm:
17
+ This webpage provides a tool to "Search TF-TF Correlation in TRM/TEXterm," allowing users to explore interactions and correlations between transcription factors (TFs) specifically within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cell states.
18
+
19
+ The page explains that it uses data from ChIP-seq and Hi-C to build TF interaction networks. It visualizes these relationships, showing how a "TF-regulatee network" and a "TF X TF correlation matrix" contribute to understanding "TF-TF association." Users can enter a transcription factor of interest to search for its correlations. A key explains the network visualization: circle color indicates TF specificity to TRM (green) or TEXterm (brown), line thickness denotes interaction intensity, and line color shows if the interaction is found in TRM (green) or TEXterm (brown). This tool aims to identify cooperations between DNA-binding proteins in these specific T cell states.
20
+
21
+ TF Network Analysis - TF Community in TRM/TEXterm:
22
+ This webpage focuses on "TF Community in TRM/TEXterm," illustrating how transcription factor (TF) associations are analyzed through clustering to identify distinct TF communities within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cells.
23
+
24
+ The page displays several network visualizations: one showing combined TRM and TEXterm TF communities and their interconnections, another detailing TRM-specific TF-TF interactions organized into communities (C1-C5), and a third depicting TEXterm-specific TF-TF interactions, also grouped into communities (C1-C5).
25
+
26
+ Below these networks, tables likely provide details about the TFs that constitute each identified community. Furthermore, the webpage highlights "Shared pathways" between TRM and TEXterm communities and "Enriched pathways" specific to either the TRM or TEXterm TF networks, linking these TF communities to biological functions such as "IL-2 production," "cell-cell adhesion," "T cell activation," "intrinsic apoptosis," and "catabolism." This allows for an understanding of the collaborative roles of TFs in regulating distinct cellular processes within these two T cell states.
27
+
28
+ Multi-omics Data:
29
+ This webpage, titled "Multi-omics Data," presents a comprehensive, scrollable table cataloging various experimental datasets relevant to T cell research. Each entry in the table details specific studies, including information such as the primary author, laboratory, publication year, data accession number, type of data (e.g., RNA-seq, ATAC-seq), biological species (primarily mouse), infection model used (e.g., LCMV Arm), and the specific T cell populations analyzed (e.g., Naive, MP, TE, TRM, TexProg, TexTerm) along with their defining markers or characteristics. The page includes features for searching and adjusting the number of displayed entries, indicating it serves as an interactive repository for accessing and reviewing details of diverse multi-omics datasets.
30
+
www_backup_original/pages_description.md CHANGED
@@ -1,30 +1,30 @@
1
- homepage:
2
- - This webpage introduces the "TF atlas of CD8+ T cell states," a platform resulting from a multi-omics study focused on understanding and selectively programming T cell differentiation. The research, a collaboration involving UC San Diego, the Salk Institute, and The University of North Carolina at Chapel Hill, leverages a comprehensive transcriptional and epigenetic atlas generated from RNA-seq and ATAC-seq data. The atlas helps predict transcription factor (TF) activity and define differentiation trajectories, aiming to identify TFs that can control specific T cell states, such as terminally exhausted and tissue-resident memory T cells, for potential therapeutic applications in areas like cancer and viral infections.
3
-
4
- TF Catalog - Search TF Scores:
5
- This webpage provides a tool to "Search TF Scores" related to T cell differentiation. It features a diagram illustrating the "Memory path" and "Exhaustion path" of T cells, including states like Naive, Memory Precursor (MP), Effector T cell (TE), Tissue-Resident Memory (TRM), Terminal Effector (TEM), Progenitor Exhausted (TEXprog), and Terminally Exhausted (TEX). The core of the page is a searchable table displaying "TF activity score" for various transcription factors (TFs) across different T cell states and datasets. This allows users to explore and compare the activity levels of specific TFs in distinct T cell populations, aiding in the understanding of T cell fate decisions.
6
-
7
- TF Catalog - Cell State Specific TF Catalog:
8
- This webpage displays "Naive Specific Cells & normalized TF Activity Scores" as part of a "Cell State Specific TF Catalog." It features a dot plot visualization where each row represents a transcription factor (TF) and each column likely represents a specific sample or condition within naive T cells. The color intensity of the dots corresponds to the normalized TF activity score (PageRank score), while the size of the dots indicates the log-transformed gene expression level (TPM). This allows users to explore and compare the activity and expression of various TFs within the naive T cell state.
9
-
10
- TF Catalog - Multi-State TFs
11
- This webpage displays a series of heatmaps visualizing normalized PageRank scores, likely representing transcription factor activity, across various T cell differentiation states (Naive, MP, TE, TRM, TEM, TEXprog, TEXeff, TEXterm). The heatmaps are segmented into categories such as "Shared in cell states from acute infection," "Shared in cell states from chronic infection," and specific T cell subsets like "TRM & TEXPROG" and "MP, TE, TEXPROG." This presentation allows for the comparative analysis of transcription factor activity profiles across different T cell populations and under varying immunological contexts, revealing potential regulatory patterns.
12
-
13
- TF Wave Analysis:
14
- This webpage, titled "TF Wave Analysis," is dedicated to exploring the dynamic activity patterns of transcription factors (TFs) during T cell differentiation. It presents a series of visualizations, referred to as "TF Waves" (Wave 1, Wave 2, etc.), which illustrate how the activity of different sets of TFs changes as T cells transition through various states (Naive, MP, TRM, TEM, TEXprog, TEXterm). These waves are depicted on diagrams of T cell differentiation pathways, with color intensity and accompanying bar graphs likely indicating the strength or timing of TF activity. A table at the bottom of the page lists specific TFs and their association with these identified waves, allowing users to understand the sequential and coordinated roles of TFs in orchestrating T cell fate.
15
-
16
- TF Network Analysis - Search TF-TF Correlation in TRM/TEXterm:
17
- This webpage provides a tool to "Search TF-TF Correlation in TRM/TEXterm," allowing users to explore interactions and correlations between transcription factors (TFs) specifically within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cell states.
18
-
19
- The page explains that it uses data from ChIP-seq and Hi-C to build TF interaction networks. It visualizes these relationships, showing how a "TF-regulatee network" and a "TF X TF correlation matrix" contribute to understanding "TF-TF association." Users can enter a transcription factor of interest to search for its correlations. A key explains the network visualization: circle color indicates TF specificity to TRM (green) or TEXterm (brown), line thickness denotes interaction intensity, and line color shows if the interaction is found in TRM (green) or TEXterm (brown). This tool aims to identify cooperations between DNA-binding proteins in these specific T cell states.
20
-
21
- TF Network Analysis - TF Community in TRM/TEXterm:
22
- This webpage focuses on "TF Community in TRM/TEXterm," illustrating how transcription factor (TF) associations are analyzed through clustering to identify distinct TF communities within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cells.
23
-
24
- The page displays several network visualizations: one showing combined TRM and TEXterm TF communities and their interconnections, another detailing TRM-specific TF-TF interactions organized into communities (C1-C5), and a third depicting TEXterm-specific TF-TF interactions, also grouped into communities (C1-C5).
25
-
26
- Below these networks, tables likely provide details about the TFs that constitute each identified community. Furthermore, the webpage highlights "Shared pathways" between TRM and TEXterm communities and "Enriched pathways" specific to either the TRM or TEXterm TF networks, linking these TF communities to biological functions such as "IL-2 production," "cell-cell adhesion," "T cell activation," "intrinsic apoptosis," and "catabolism." This allows for an understanding of the collaborative roles of TFs in regulating distinct cellular processes within these two T cell states.
27
-
28
- Multi-omics Data:
29
- This webpage, titled "Multi-omics Data," presents a comprehensive, scrollable table cataloging various experimental datasets relevant to T cell research. Each entry in the table details specific studies, including information such as the primary author, laboratory, publication year, data accession number, type of data (e.g., RNA-seq, ATAC-seq), biological species (primarily mouse), infection model used (e.g., LCMV Arm), and the specific T cell populations analyzed (e.g., Naive, MP, TE, TRM, TexProg, TexTerm) along with their defining markers or characteristics. The page includes features for searching and adjusting the number of displayed entries, indicating it serves as an interactive repository for accessing and reviewing details of diverse multi-omics datasets.
30
-
 
1
+ homepage:
2
+ - This webpage introduces the "TF atlas of CD8+ T cell states," a platform resulting from a multi-omics study focused on understanding and selectively programming T cell differentiation. The research, a collaboration involving UC San Diego, the Salk Institute, and The University of North Carolina at Chapel Hill, leverages a comprehensive transcriptional and epigenetic atlas generated from RNA-seq and ATAC-seq data. The atlas helps predict transcription factor (TF) activity and define differentiation trajectories, aiming to identify TFs that can control specific T cell states, such as terminally exhausted and tissue-resident memory T cells, for potential therapeutic applications in areas like cancer and viral infections.
3
+
4
+ TF Catalog - Search TF Scores:
5
+ This webpage provides a tool to "Search TF Scores" related to T cell differentiation. It features a diagram illustrating the "Memory path" and "Exhaustion path" of T cells, including states like Naive, Memory Precursor (MP), Effector T cell (TE), Tissue-Resident Memory (TRM), Terminal Effector (TEM), Progenitor Exhausted (TEXprog), and Terminally Exhausted (TEX). The core of the page is a searchable table displaying "TF activity score" for various transcription factors (TFs) across different T cell states and datasets. This allows users to explore and compare the activity levels of specific TFs in distinct T cell populations, aiding in the understanding of T cell fate decisions.
6
+
7
+ TF Catalog - Cell State Specific TF Catalog:
8
+ This webpage displays "Naive Specific Cells & normalized TF Activity Scores" as part of a "Cell State Specific TF Catalog." It features a dot plot visualization where each row represents a transcription factor (TF) and each column likely represents a specific sample or condition within naive T cells. The color intensity of the dots corresponds to the normalized TF activity score (PageRank score), while the size of the dots indicates the log-transformed gene expression level (TPM). This allows users to explore and compare the activity and expression of various TFs within the naive T cell state.
9
+
10
+ TF Catalog - Multi-State TFs
11
+ This webpage displays a series of heatmaps visualizing normalized PageRank scores, likely representing transcription factor activity, across various T cell differentiation states (Naive, MP, TE, TRM, TEM, TEXprog, TEXeff, TEXterm). The heatmaps are segmented into categories such as "Shared in cell states from acute infection," "Shared in cell states from chronic infection," and specific T cell subsets like "TRM & TEXPROG" and "MP, TE, TEXPROG." This presentation allows for the comparative analysis of transcription factor activity profiles across different T cell populations and under varying immunological contexts, revealing potential regulatory patterns.
12
+
13
+ TF Wave Analysis:
14
+ This webpage, titled "TF Wave Analysis," is dedicated to exploring the dynamic activity patterns of transcription factors (TFs) during T cell differentiation. It presents a series of visualizations, referred to as "TF Waves" (Wave 1, Wave 2, etc.), which illustrate how the activity of different sets of TFs changes as T cells transition through various states (Naive, MP, TRM, TEM, TEXprog, TEXterm). These waves are depicted on diagrams of T cell differentiation pathways, with color intensity and accompanying bar graphs likely indicating the strength or timing of TF activity. A table at the bottom of the page lists specific TFs and their association with these identified waves, allowing users to understand the sequential and coordinated roles of TFs in orchestrating T cell fate.
15
+
16
+ TF Network Analysis - Search TF-TF Correlation in TRM/TEXterm:
17
+ This webpage provides a tool to "Search TF-TF Correlation in TRM/TEXterm," allowing users to explore interactions and correlations between transcription factors (TFs) specifically within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cell states.
18
+
19
+ The page explains that it uses data from ChIP-seq and Hi-C to build TF interaction networks. It visualizes these relationships, showing how a "TF-regulatee network" and a "TF X TF correlation matrix" contribute to understanding "TF-TF association." Users can enter a transcription factor of interest to search for its correlations. A key explains the network visualization: circle color indicates TF specificity to TRM (green) or TEXterm (brown), line thickness denotes interaction intensity, and line color shows if the interaction is found in TRM (green) or TEXterm (brown). This tool aims to identify cooperations between DNA-binding proteins in these specific T cell states.
20
+
21
+ TF Network Analysis - TF Community in TRM/TEXterm:
22
+ This webpage focuses on "TF Community in TRM/TEXterm," illustrating how transcription factor (TF) associations are analyzed through clustering to identify distinct TF communities within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cells.
23
+
24
+ The page displays several network visualizations: one showing combined TRM and TEXterm TF communities and their interconnections, another detailing TRM-specific TF-TF interactions organized into communities (C1-C5), and a third depicting TEXterm-specific TF-TF interactions, also grouped into communities (C1-C5).
25
+
26
+ Below these networks, tables likely provide details about the TFs that constitute each identified community. Furthermore, the webpage highlights "Shared pathways" between TRM and TEXterm communities and "Enriched pathways" specific to either the TRM or TEXterm TF networks, linking these TF communities to biological functions such as "IL-2 production," "cell-cell adhesion," "T cell activation," "intrinsic apoptosis," and "catabolism." This allows for an understanding of the collaborative roles of TFs in regulating distinct cellular processes within these two T cell states.
27
+
28
+ Multi-omics Data:
29
+ This webpage, titled "Multi-omics Data," presents a comprehensive, scrollable table cataloging various experimental datasets relevant to T cell research. Each entry in the table details specific studies, including information such as the primary author, laboratory, publication year, data accession number, type of data (e.g., RNA-seq, ATAC-seq), biological species (primarily mouse), infection model used (e.g., LCMV Arm), and the specific T cell populations analyzed (e.g., Naive, MP, TE, TRM, TexProg, TexTerm) along with their defining markers or characteristics. The page includes features for searching and adjusting the number of displayed entries, indicating it serves as an interactive repository for accessing and reviewing details of diverse multi-omics datasets.
30
+