Candidates: 48
Top Picks: 5
Blockbusters: 0
Max Score: 18

Editor's Rationale

Top pick: LongSeeker introduces Context-ReAct, an elastic context-orchestration paradigm for long-horizon search agents built on five atomic operations (Skip, Compress, Rollback, Snippet, Delete). It reaches 61.5% on BrowseComp, substantially ahead of Tongyi DeepResearch and AgentFold, which gives it practical impact for anyone shipping agentic search systems.

Top Picks

5 papers
#01 (score 18)

LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents

Yijun Lu, Rui Ye, Yuwen Du et al.
cs.AI
Long-horizon search agents must manage a rapidly growing working context as they reason, call tools, and observe information. Naively accumulating all intermediate content can overwhelm the agent, increasing costs and the risk of errors. We propose that effective context manageme…
agent, agentic, tool use, ReAct, reasoning, orchestration
#02 (score 12)

Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers

Senkang Hu, Yong Dai, Xudong Han et al.
cs.LG, cs.CL
Long-horizon LLM agents depend on intermediate information-gathering turns, yet training feedback is usually observed only at the final answer, because process-level rewards require high-quality human annotation. Existing turn-level shaping methods reward turns that increase the …
agent, agentic, reasoning, RAG, benchmark, PPO
#03 (score 9)

Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours

The Verkor Team, Ravi Krishna, Suresh Krishna et al.
cs.AR, cs.AI
Driven by a rapid co-evolution of both harness and underlying models, LLM agents are improving at a dizzying pace. In our prior work (performed in Dec. 2025), we introduced "Design Conductor" (or just "Conductor"), a system capable of building a 5-stage Linux-capable RISC-V CPU i…
agent, multi-agent, autonomous, PPO
#04 (score 8)

Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime

Tianshu Zhu, Wenyu Zhang, Xiaoying Zuo et al.
cs.LG
SWE-bench-style agentic reinforcement learning relies on expensive stateful trajectories, yet substantial compute is wasted on sampled rollout groups with skewed pass rates, where binary rewards provide a weak contrastive signal. We frame this inefficiency as a pass-rate control …
agent, agentic, reasoning, RAG, GRPO
#05 (score 6)

Executable World Models for ARC-AGI-3 in the Era of Coding Agents

Sergey Rodionov
cs.AI
We evaluate an initial coding-agent system for ARC-AGI-3 in which the agent maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions as a practical proxy for an MDL-like simplicity bias, and plans through the …
agent

All Candidates

48 papers
Score | Paper | Authors | Category
6 | Misaligned by Reward: Socially Undesirable Preferences in LLMs | Gayane Ghazaryan, Esra Dönmez | cs.CL, cs.AI, cs.CY
6 | Why Expert Alignment Is Hard: Evidence from Subjective Evaluation | Tzu-Mi Lin, Wataru Hirota et al. | cs.CL
5 | Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning | Alper Kamil Bozkurt, Xiaoan Xu et al. | cs.LG, cs.AI
5 | Beyond Semantics: An Evidential Reasoning-Aware Multi-View Learning Framework for Trustworthy Mental Health Prediction | Yucheng Ruan, Ling Huang et al. | cs.CL
5 | TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding | Minjie Qiang, Mingming Zhang et al. | cs.CL, cs.IR
4 | Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation | Enhui Chai, Sicheng Chen et al. | cs.CV, cs.AI
4 | Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement | Nicholas S. Kersting, Vittorio Castelli et al. | cs.CL, cs.AI, cs.CY
4 | Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction | Dan Wilson, Mohamed Akrout | cs.LG, math.DS
4 | MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge | Perry E. Radau | eess.IV, cs.CL, physics.med-ph
4 | The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences | Hubert Plisiecki, Sabina Siudaj et al. | cs.CL
4 | UFAL-CUNI at SemEval-2026 Task 11: An Efficient Modular Neuro-symbolic Method for Syllogistic Reasoning | Ivan Kartáč, Kristýna Onderková et al. | cs.CL
3 | Human-AI Co-Mentorship in Project-Based Learning: A Case Study in Financial Forecasting | Freyaa Chawla, Ahan Chawla et al. | cs.LG, cs.CY
3 | Automatically Finding and Validating Unexpected Side-Effects of Interventions on Language Models | Quintin Pope, Ajay Hayagreeve Balaji et al. | cs.CL, cs.AI
3 | When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise | Philip Wootaek Shin, Ajay Narayanan Sridhar et al. | cs.CV, cs.CL
2 | The First Token Knows: Single-Decode Confidence for Hallucination Detection | Mina Gabriel | cs.CL, cs.AI
2 | Aes3D: Aesthetic Assessment in 3D Gaussian Splatting | Chuanzhi Xu, Boyu Wei et al. | cs.CV, cs.AI
2 | Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting | Alper Yıldırım | cs.LG, cs.AI
2 | Joint Treatment Effect Estimation from Incomplete Healthcare Data: Temporal Causal Normalizing Flows with LLM-driven Evolutionary MNAR Imputation | Olivia Jullian Parra, Sara Zoccheddu et al. | cs.LG, cs.AI
2 | Transformed Latent Variable Multi-Output Gaussian Processes | Xiaoyu Jiang, Xinxing Shi et al. | cs.LG
2 | Conditional outlier detection for clinical alerting | Milos Hauskrecht, Michal Valko et al. | cs.LG, cs.CY
2 | Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior | Daniel Wurgaft, Can Rager et al. | cs.LG
2 | Adapting Large Language Models to a Low-Resource Agglutinative Language: A Comparative Study of LoRA and QLoRA for Bashkir | Mullosharaf K. Arabov, Svetlana S. Khaybullina | cs.CL
1 | When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning | Lakshita Dodeja, Ondrej Biza et al. | cs.RO, cs.AI
1 | PSK at SemEval-2026 Task 9: Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentation | Srikar Kashyap Pulipaka | cs.CL, cs.AI, cs.LG
1 | LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts | Seungeun Rho, Shamel Fahmi et al. | cs.RO, cs.AI
1 | Building informative materials datasets beyond targeted objectives | Rafael Espinosa Castañeda, Ashley Dale et al. | cond-mat.mtrl-sci, cs.AI, cs.DB, cs.LG, stat.AP
1 | Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval | Nicholas Barnfield, Juno Kim et al. | stat.ML, cs.IT, cs.LG
1 | Estimating the expected output of wide random MLPs more efficiently than sampling | Wilson Wu, Victor Lecomte et al. | cs.LG, cond-mat.dis-nn, stat.ML
1 | Physiologically Grounded Driver Behavior Classification: SHAP-Driven Elite Feature Selection and Hybrid Gradient Boosting for Multimodal Physiological Signals | Sahar Askari, Mohammad Mahdi Mirza Ali Mohammadi et al. | cs.LG, eess.SP
1 | On the Hardness of Junking LLMs | Marco Rando, Samuel Vaiter | cs.LG
1 | Implicit Representations of Grammaticality in Language Models | Yingshan Susan Wang, Linlu Qiu et al. | cs.CL
1 | Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals | Gijs van Dijk | cs.CL
1 | Conceptors for Semantic Steering | Ilias Triantafyllopoulos, Young-Min Cho et al. | cs.LG, cs.CL
0 | Taming Outlier Tokens in Diffusion Transformers | Xiaoyu Wu, Yifei Wang et al. | cs.CV, cs.AI, cs.LG
0 | Grokability in five inequalities | Paata Ivanisvili, Xinyuan Xie | math.PR, cs.AI, math.AP, math.CA, math.FA
0 | Almost-Orthogonality in Lp Spaces: A Case Study with Grok | Ziang Chen, Jaume de Dios Pont et al. | math.CA, cs.AI, math.CO, math.PR
0 | What Matters in Practical Learned Image Compression | Kedar Tatwawadi, Parisa Rahimzadeh et al. | cs.CV, cs.AI, cs.LG
0 | On the Wasserstein Gradient Flow Interpretation of Drifting Models | Arthur Gretton, Li Kevin Wenliang et al. | cs.LG, cs.AI, stat.ML
0 | Continual Knowledge Updating in LLM Systems: Learning Through Multi-Timescale Memory Dynamics | Andreas Pattichis, Constantine Dovrolis | cs.LG, cs.AI, cs.CL
0 | Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer | Alexander Hsu, Zhaiming Shen et al. | cs.LG, math.NA
0 | How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences | Mariia Seleznova | cs.LG
0 | The Impossibility Triangle of Long-Context Modeling | Yan Zhou | cs.CL, cs.AI, cs.LG
0 | Why Geometric Continuity Emerges in Deep Neural Networks: Residual Connections and Rotational Symmetry Breaking | Kyungwon Jeong, Won-Gi Paeng et al. | cs.LG, cs.AI, cs.CL