Hacker News
Soft Contamination Means Benchmarks Test Shallow Generalization
(
arxiv.org
)
2 points
by
cjbarber
14 days ago
|
past
|
1 comment
SkillsBench: Benchmarking how well agent skills work across diverse tasks
(
arxiv.org
)
364 points
by
mustaphah
14 days ago
|
past
|
171 comments
Virtual Width Networks (VWN)
(
arxiv.org
)
9 points
by
tesserato
14 days ago
|
past
CodeLogician: Neuro-symbolic reasoning for precise software analysis
(
arxiv.org
)
2 points
by
NTCTech
14 days ago
|
past
|
1 comment
Intelligent AI Delegation (2026)
(
arxiv.org
)
1 point
by
Nydhal
14 days ago
|
past
Delegated Agent Authorization Constrained to Semantic Task-to-Scope Matching
(
arxiv.org
)
1 point
by
mooreds
14 days ago
|
past
Evaluating AGENTS.md: are they helpful for coding agents?
(
arxiv.org
)
232 points
by
mustaphah
14 days ago
|
past
|
161 comments
Multi-Agent Teams Hold Experts Back
(
arxiv.org
)
1 point
by
fauigerzigerk
15 days ago
|
past
Large Language Model Reasoning Failures
(
arxiv.org
)
1 point
by
kawera
15 days ago
|
past
Towards Autonomous Mathematics Research
(
arxiv.org
)
107 points
by
gmays
15 days ago
|
past
|
53 comments
Retrieval-Aware Distillation for Transformer-SSM Hybrids
(
arxiv.org
)
2 points
by
readitalready
15 days ago
|
past
Biases in the Blind Spot: Detecting What LLMs Fail to Mention
(
arxiv.org
)
2 points
by
mpweiher
16 days ago
|
past
A Framework for Time-Updating Probabilistic Forecasts
(
arxiv.org
)
6 points
by
Luc
16 days ago
|
past
Towards Autonomous Mathematics Research (Google DeepMind)
(
arxiv.org
)
1 point
by
u1hcw9nx
16 days ago
|
past
Remote Labor Index: Measuring AI Automation of Remote Work
(
arxiv.org
)
2 points
by
Leynos
17 days ago
|
past
Generalized on-policy distillation with reward extrapolation
(
arxiv.org
)
3 points
by
fzliu
17 days ago
|
past
OpenAI model proposes and proves Physics result
(
arxiv.org
)
1 point
by
KothuRoti
17 days ago
|
past
An API for Biological Neural Networks
(
arxiv.org
)
1 point
by
bwjx
17 days ago
|
past
Adversarial Patch: images that make classifiers ignore other items in a scene
(
arxiv.org
)
1 point
by
felineflock
17 days ago
|
past
Maximum Agreement Linear Predictor (MALP)
(
arxiv.org
)
1 point
by
tesserato
17 days ago
|
past
|
1 comment
Standardized and In-Depth Benchmarking of Post-Moore Dataflow AI Accelerators
(
arxiv.org
)
1 point
by
PaulHoule
17 days ago
|
past
Fine-Tuning GPT-5 for GPU Kernel Generation
(
arxiv.org
)
4 points
by
matt_d
17 days ago
|
past
SWE-ContextBench: context learning benchmark in coding
(
arxiv.org
)
1 point
by
mustaphah
17 days ago
|
past
LLMs exceed physicians on complex text-based differential diagnosis
(
arxiv.org
)
3 points
by
rippeltippel
17 days ago
|
past
|
2 comments
Horus: A Protocol For Trustless Verification Under Uncertainty
(
arxiv.org
)
1 point
by
optimalsolver
17 days ago
|
past
Learning to Reason in 13 Parameters
(
arxiv.org
)
2 points
by
stared
17 days ago
|
past
LLM Reasoning Failures
(
arxiv.org
)
1 point
by
gradus_ad
18 days ago
|
past
Defining causal mechanism in dual process theory and 2 types of feedback control
(
arxiv.org
)
1 point
by
s6i
18 days ago
|
past
Routing LLM queries using internal success predictions (70% cost reduction)
(
arxiv.org
)
1 point
by
stansApprentice
18 days ago
|
past
|
3 comments
SWE-AGI: benchmarking spec-driven software construction
(
arxiv.org
)
1 point
by
mustaphah
18 days ago
|
past
|
1 comment