Hacker News
Updated 7 minutes ago
InfoQ
Updated 4 minutes ago
Netflix TechBlog
Updated about 2 hours ago
Pinterest Engineering
Updated about 2 hours ago
Spotify Engineering
Updated about 2 hours ago
The system we built to ensure our AI agents produce predictable, trustworthy code. The post Background Coding Agents: Predictable Results Through Strong Feedback Loops (Part 3) appeared first on Spotify Engineering.
We explore context engineering for background coding agents and what makes a good migration prompt. The post Background Coding Agents: Context Engineering (Part 2) appeared first on Spotify Engineering.
Shuffle has always been one of Spotify’s most-used features, and also one of the most misunderstood. For... The post Shuffle: Making Random Feel More Human appeared first on Spotify Engineering.
Thousands of merged AI-generated pull requests and the future of large-scale software maintenance. The post 1,500+ PRs Later: Spotify’s Journey with Our Background Coding Agent (Part 1) appeared first on Spotify Engineering.
TL;DR Spotify’s experimentation platform, Confidence, scaled product decision-making across hundreds of... The post Beyond Winning: Spotify’s Experiments with Learning Framework appeared first on Spotify Engineering.
The Airbnb Tech Blog
Updated about 2 hours ago
Martin Fowler
Updated about 2 hours ago
Gitanjali Venkatraman does wonderful illustrations of complex subjects (which is why I was so happy to work with her on our Expert Generalists article). She has now published the latest in her series of illustrated guides: tackling the complex topic of mainframe modernisation (https://www.thoughtworks.com/content/dam/thoughtworks/documents/blog/mainframe_modernisation_illust…)
If you’re a regular reader of my site, you’ll have noticed that in the last few months I’ve been making a number of “fragments” posts. Such a post is a short post with a bunch of little, unconnected segments. These are usually a reference to something I’ve found on the web, sometimes a small thought of my own. A few years ago, I wouldn’t have covered…
Why does AI write like… that (NYT, gift link). Sam Kriss delves into the quiet hum of AI writing. AI’s work is not compelling prose: it’s phantom text, ghostly scribblings, a spectre woven into our communal tapestry. ❄ ❄ ❄ ❄ ❄ https://coding-is-like-cooking.info/2025/12/test-desidera…
Rob Bowley summarizes a study from Carnegie Mellon looking at the impact of AI on a bunch of open-source software projects. Like any such study, we shouldn’t take its results as definitive, but there seems enough there to make it a handy data point. The key point is that the AI code probably reduced the quality of the code base - at least if static code analysis can be trusted to determine…
I’ve been on the road in Europe for the last couple of weeks, and while I was there Thoughtworks released volume 33 of our Technology Radar. Again it’s dominated by the AI wave, with lots of blips capturing our explorations of how to use LLMs and similar technology. “Agents” are the big thing these days, but we’re also seeing grow…
Uber Engineering
Feed not available
The feed URL may have changed or is temporarily unavailable.
Anthropic News
Updated about 2 hours ago
The best model in the world for coding, agents, and computer use, with meaningful improvements to everyday tasks like slides and spreadsheets.
Claude Sonnet 4.5 sets new benchmark records in coding, reasoning, and computer use while being Anthropic's most aligned model.
Claude Haiku 4.5 matches state-of-the-art coding capabilities from months ago while delivering unprecedented speed and cost-efficiency for complex tasks.
Anthropic raised $13 billion in a Series F round at a $183 billion valuation to expand enterprise offerings, safety research, and international growth.
Anthropic's response to the White House AI Action Plan supports infrastructure and safety measures while calling for stronger export controls.
Windsurf News
Updated about 2 hours ago
Parallel agents, Git worktrees, multi-pane Cascade, dedicated terminal, and SWE-1.5 Free
GPT-5.2 is now live in Windsurf! Available for 0x credits for a limited time (paid and trial users). The version bump undersells the jump in intelligence: the biggest leap for GPT models in agentic coding since GPT-5, a SOTA coding model at its price point, and the default in Windsurf.
The most capable model in Windsurf yet, now available at Sonnet prices for a limited time
GPT-5.1, GPT-5.1-Codex, and GPT-5.1-Codex Mini deliver a solid upgrade for agentic coding with variable thinking and improved steerability
SWE-1.5 is our latest frontier model, delivering near-SOTA coding performance at unprecedented speed.
Cursor
Updated about 2 hours ago
We're partnering with ecosystem vendors who have built hooks support with Cursor.
Graphite has entered into a definitive agreement to be acquired by Cursor.
Bringing design and engineering closer together.
Debug Mode helps you reproduce and fix the most tricky bugs.
How we updated our agent harness to support GPT-5.1-Codex-Max.
OpenAI News
Updated about 2 hours ago
More than one million customers around the world now use OpenAI to empower their teams and unlock new opportunities. This post highlights how companies like PayPal, Virgin Atlantic, BBVA, Cisco, Moderna, and Canva are transforming the way work gets done with AI.
OpenAI is strengthening ChatGPT Atlas against prompt injection attacks using automated red teaming trained with reinforcement learning. This proactive discover-and-patch loop helps identify novel exploits early and harden the browser agent’s defenses as AI becomes more agentic.
OpenAI introduces a new framework and evaluation suite for chain-of-thought monitorability, covering 13 evaluations across 24 environments. Our findings show that monitoring a model’s internal reasoning is far more effective than monitoring outputs alone, offering a promising path toward scalable control as AI systems grow more capable.
OpenAI and the U.S. Department of Energy have signed a memorandum of understanding to deepen collaboration on AI and advanced computing in support of scientific discovery. The agreement builds on ongoing work with national laboratories and helps establish a framework for applying AI to high-impact research across the DOE ecosystem.
OpenAI shares new AI literacy resources to help teens and parents use ChatGPT thoughtfully, safely, and with confidence. The guides include expert-vetted tips for responsible use, critical thinking, healthy boundaries, and supporting teens through emotional or sensitive topics.
Google DeepMind News
Updated about 2 hours ago
Google 2025 recap: Research breakthroughs of the year
Gemini 3 Flash offers frontier intelligence built for speed at a fraction of the cost.
Open interpretability tools for language models are now available across the entire Gemma 3 family with the release of Gemma Scope 2.
Google DeepMind and UK AI Security Institute (AISI) strengthen collaboration on critical AI safety and security research
Anthropic Engineering
Updated about 2 hours ago
Agents still face challenges working across many context windows. We looked to human engineers for inspiration in creating a more effective harness for long-running agents.
Introducing advanced tool use on the Claude Developer Platform
Code execution with MCP: Building more efficient agents
Beyond permission prompts: making Claude Code more secure and autonomous
Equipping agents for the real world with Agent Skills
Hugging Face Blog
Updated about 2 hours ago
Nvidia Blog
Updated about 2 hours agoNVIDIA will join the U.S. Department of Energy’s (DOE) Genesis Mission as a private industry partner to keep U.S. AI both the leader and the standard in technology around the world. The Genesis Mission, which is part of an Executive Order recently signed by President Trump, aims to redefine American leadership in AI across three Read Article </span
The NVIDIA RTX PRO 5000 72GB Blackwell GPU is now generally available, bringing robust agentic and generative AI capabilities powered by the NVIDIA Blackwell architecture to more desktops and professionals across the world.
Step out of the vault and into the future of gaming with Fallout: New Vegas streaming on GeForce NOW, just in time to celebrate the newest season of the hit Amazon TV show Fallout. To mark the occasion, GeForce NOW members can claim Fallout 3 and Fallout 4 as special rewards, completing a wasteland-ready trilogy…
Physical AI is moving from research labs into the real world, powering intelligent robots and autonomous vehicles (AVs) — such as robotaxis — that must reliably sense, reason and act amid unpredictable conditions.
The Hao AI Lab research team at the University of California San Diego — at the forefront of pioneering AI model innovation — recently received an NVIDIA DGX B200 system to elevate their critical work in large language model inference. Many LLM inference platforms in production today, such as NVIDIA Dynamo, use research concepts that…
arXiv
Updated about 2 hours ago
Masked Diffusion Models (MDMs) offer flexible, non-autoregressive generation, but this freedom introduces a challenge: final output quality is highly sensitive to the decoding order. We are the first to formalize this issue, attributing the variability in output quality to the cumulative predictive uncertainty along a generative path. To quantify this uncertainty, we introduce Denoising Entropy, a computable metric that serves as an internal signal for evaluating the generative process. Leveraging t…
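As a rough, illustrative sketch of the idea in the abstract above (the function and tensor layout below are hypothetical assumptions, not taken from the paper), cumulative predictive uncertainty along a generative path can be approximated by summing the entropy of the model's predictive distributions at each denoising step:

```python
import torch

def denoising_entropy(step_logits: list[torch.Tensor]) -> torch.Tensor:
    """Sketch: sum per-step predictive entropy over a generative path.

    step_logits: one [num_masked_positions, vocab_size] tensor per denoising
    step, holding the model's logits for the positions unmasked at that step.
    (Hypothetical interface; the paper's exact formulation may differ.)
    """
    total = torch.tensor(0.0)
    for logits in step_logits:
        probs = torch.softmax(logits, dim=-1)
        log_probs = torch.log_softmax(logits, dim=-1)
        entropy = -(probs * log_probs).sum(dim=-1)  # entropy per unmasked position
        total = total + entropy.sum()               # accumulate over the whole path
    return total
```

Under this reading, a decoding order that accumulates higher entropy is one the model was less certain about; how the paper uses the metric to choose or score orders is not covered by this sketch.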
Computational point-of-care (POC) sensors enable rapid, low-cost, and accessible diagnostics in emergency, remote and resource-limited areas that lack access to centralized medical facilities. These systems can utilize neural network-based algorithms to accurately infer a diagnosis from the signals generated by rapid diagnostic tests or sensors. However, neural network-based diagnostic models are subject to hallucinations and can produce erroneous predictions, posing a risk of misdiagnosis and i…
Separating signal from noise is central to experimental science. Applying well-established statistical methods effectively to LLM evals requires consideration of their unique noise characteristics. We clearly define and measure three types of noise: prediction noise from generating different answers on a given question, data noise from sampling questions, and their combined total noise following the law of total variance. To emphasize relative comparisons and gain statistical power, we propose th…
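The decomposition referenced above follows the law of total variance; as a minimal sketch (the array layout and function name are illustrative assumptions, not from the paper), the two components can be estimated from repeated runs over a fixed question set:

```python
import numpy as np

def eval_noise_decomposition(scores: np.ndarray) -> dict[str, float]:
    """scores[i, j] = score of run j on question i (e.g. 0/1 correctness).

    Law of total variance:
      total = E[Var(score | question)]   (prediction noise)
            + Var(E[score | question])   (data noise)
    """
    prediction_noise = scores.var(axis=1, ddof=1).mean()  # within-question variance
    data_noise = scores.mean(axis=1).var(ddof=1)          # between-question variance
    return {
        "prediction_noise": float(prediction_noise),
        "data_noise": float(data_noise),
        "total_noise": float(prediction_noise + data_noise),
    }

# Example: 100 questions, 5 independent generations each
rng = np.random.default_rng(0)
scores = rng.integers(0, 2, size=(100, 5)).astype(float)
print(eval_noise_decomposition(scores))
```

In practice, averaging more generations per question reduces the contribution of prediction noise to a final eval estimate, while data noise is reduced only by sampling more questions.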
We propose Parallel Token Prediction (PTP), a universal framework for parallel sequence generation in language models. PTP jointly predicts multiple dependent tokens in a single transformer call by incorporating the sampling procedure into the model. This reduces the latency bottleneck of autoregressive decoding, and avoids the restrictive independence assumptions common in existing multi-token prediction methods. We prove that PTP can represent arbitrary autoregressive sequence distributions. P…
It has been discovered that latent-Euclidean variational autoencoders (VAEs) admit, in various capacities, Riemannian structure. We adapt these arguments to complex VAEs with a complex latent stage. We show that complex VAEs reveal, to some degree, Kähler geometric structure. Our methods are tailored to decoder geometry. We derive the Fisher information metric in the complex case under a latent complex Gaussian with trivial relation matrix. It is well known from statistical information theory…
Hugging Face Trending
Updated 4 minutes ago
GitHub Trending
Updated 18 minutes ago
AWS News Blog
Updated about 2 hours ago
Can you believe it? We’re nearly at the end of 2025. And what a year it’s been! From re:Invent recap events, to AWS Summits, AWS Innovate, AWS re:Inforce, Community Days, and DevDays and, recently, adding that cherry on the cake, re:Invent 2025, we have lived through a year filled with exciting moments and technology advancements […]
The week after AWS re:Invent builds on the excitement and energy of the event and is a good time to learn more and understand how the recent announcements can help you solve your challenges and unlock new opportunities. As usual, we have you covered with our top announcements of AWS re:Invent 2025 that you can […]
Amazon Bedrock now supports reinforcement fine-tuning, delivering 66% accuracy gains on average over base models.
Accelerate AI model development with new training features that enable rapid recovery from failures and automatic scaling based on resource availability.
Alibaba Cloud
Updated about 2 hours ago
Cloudflare
Updated about 2 hours ago
Physical data center maintenance is risky on a global network. We built a maintenance scheduler on Workers to safely plan disruptive operations, while solving scaling challenges by viewing the state of our infrastructure through a graph interface on top of multiple data sources and metrics pipelines.
We have declared “Code Orange: Fail Small” to focus everyone at Cloudflare on a set of high-priority workstreams with one simple goal: ensure that the cause of our last two global outages never happens again.
Cloudflare's H1 2025 Transparency Report is here. We discuss our principles on content blocking and our innovative approach to combating unauthorized streaming and copyright abuse.
Cloudflare’s R2 SQL, a distributed query engine, now supports aggregations. Explore how we built distributed GROUP BY execution, using scatter-gather and shuffling strategies to run analytics directly over your R2 Data Catalog.
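As a generic illustration of the scatter-gather aggregation pattern mentioned above (this is not Cloudflare's implementation; the shard layout and merge logic are assumptions made for the sketch), each shard computes partial aggregates for its own data and a coordinator merges them by group key:

```python
from collections import defaultdict

def partial_group_by(rows, key, value):
    """Scatter phase: each shard reduces its rows to (count, sum) per group key."""
    partials = defaultdict(lambda: [0, 0.0])
    for row in rows:
        acc = partials[row[key]]
        acc[0] += 1
        acc[1] += row[value]
    return partials

def merge_partials(shard_partials):
    """Gather phase: the coordinator merges partial aggregates from all shards."""
    merged = defaultdict(lambda: [0, 0.0])
    for partials in shard_partials:
        for k, (count, total) in partials.items():
            merged[k][0] += count
            merged[k][1] += total
    # Final aggregates, e.g. COUNT(*) and AVG(value) per group
    return {k: {"count": c, "avg": s / c} for k, (c, s) in merged.items()}

# Example: two shards answering GROUP BY status with COUNT(*) and AVG(latency)
shard_a = [{"status": "ok", "latency": 12.0}, {"status": "err", "latency": 90.0}]
shard_b = [{"status": "ok", "latency": 8.0}]
print(merge_partials([partial_group_by(s, "status", "latency") for s in (shard_a, shard_b)]))
```

The shuffling strategy the post mentions would matter when the group keys are too numerous to merge on a single coordinator; this sketch only shows the single-coordinator case.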
We present our 6th annual review of Internet trends and patterns observed across the globe, revealing the disruptions, advances and metrics that defined 2025.