In this interview, we catch up with Ashton, a founding engineer at Theta, to discuss the bleeding edge of Reinforcement Learning infrastructure. He breaks down…

Meet the Writer: Ashton Chew, Founding Engineer at Theta

2025/12/15 04:25


Let’s start! Tell us a bit about yourself. For example, name, profession, and personal interests.

Hey! My name is Ashton, and I’m a founding engineer at Theta, where I work on RL infra, RL, and distributed systems. I focus specifically on computer-use and tool-use. Previously, I worked at Amazon AGI on inference and tool-use infrastructure. In my free time, I love graphic design, side projects, and bouldering.

Interesting! What was your latest HackerNoon Top Story about?

My latest story, “Can Your AI Actually Use a Computer? A 2025 Map of Computer‑Use Benchmarks,” touched on one of the hottest spaces in VC right now: RL environments and evals. I gave a comprehensive overview of the most-used computer-use benchmarks, plus practical advice on how to pick benchmarks for training and testing computer-use agents.

I kept running into the same gap: there aren’t many articles that review the benchmarks themselves. And as this field grows, it’s vital that we’re actually assessing quality instead of rewarding whatever happens to game the metric. We’ve been here before. In the early days of LLMs, benchmarks were inconsistent and disparate enough that they only weakly reflected which model was actually best.

Benchmarks became the de facto scoreboard for “best model,” and then people realized a lot of them weren’t measuring what they claimed.

One of the most revealing early-era failures was when “reading comprehension” quietly became “pattern matching on dataset structure.” Researchers ran intentionally provocative baselines (question-only, last-sentence-only), and the results were high enough to raise an uncomfortable possibility: the benchmark didn’t consistently force models to use the full passage. In a 2018 critique, the point wasn’t that reading never matters, but that some datasets accidentally made it optional by over-rewarding shortcuts like recency and stereotyped answer priors.

# Supposed task: answer the question given the passage and question
Passage (summary):
  - Sentences 1–8: John’s day at school (mostly irrelevant detail)
  - Sentence 9: "After school, John went to the kitchen."
  - Sentence 10: "He ate a slice of pizza before starting his homework."
Question: "What did John eat?"
Answer: "pizza"

The benchmark accidentally rewards a shortcut where the model overweights the last sentence (because the answer is often near the end) and simply extracts the direct object of the most recent action (“ate ___”), which in this case yields “pizza.”
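
A minimal sketch of that shortcut, for illustration only. The passage text, the function names, and the crude "take the last word after the verb" rule are all assumptions made up here, not code from any benchmark or paper. The point is that a few lines of string matching can produce the right answer without reading sentences 1–9.

```python
import re
from typing import Optional

# Hypothetical toy passage mirroring the example above; only the last sentence matters.
PASSAGE = (
    "John had a long day at school. "
    "After school, John went to the kitchen. "
    "He ate a slice of pizza before starting his homework."
)

def last_sentence(passage: str) -> str:
    """Return the final sentence and discard the rest of the passage."""
    parts = [p for p in re.split(r"(?<=[.!?])\s+", passage.strip()) if p]
    return parts[-1] if parts else ""

def shortcut_answer(passage: str, verb: str = "ate") -> Optional[str]:
    """Guess the answer as the last word of the phrase that follows `verb`."""
    pattern = rf"\b{re.escape(verb)}\s+(.+?)(?:\s+(?:before|after|while|and)\b|[.!?]|$)"
    match = re.search(pattern, last_sentence(passage))
    return match.group(1).split()[-1] if match else None

print(shortcut_answer(PASSAGE))
```

If a heuristic this shallow scores close to a real model on a dataset, the dataset is not forcing anyone to read the passage.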

And then comes the even more damaging baseline: remove the passage entirely and see what happens. If a question-only model is competitive, it’s a sign the dataset is leaking signal through repetition and priors rather than testing passage-grounded comprehension.

Question: "What did John eat?"

This baseline is basically a sanity check: can the model still score well by leaning on high-frequency answer templates without grounding on the passage at all? In practice it just guesses a token the dataset disproportionately rewards (“pizza,” “sandwich”), and if that works more often than it should, you’re not measuring comprehension so much as you’re measuring the dataset’s priors.
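
One way to run that sanity check is sketched below, under stated assumptions: the `(passage, question, answer)` tuple layout, the toy examples, and the function names are hypothetical, and the "model" is just a majority-answer lookup rather than a trained network. It never looks at the passage.

```python
from collections import Counter, defaultdict

def fit_question_only(train):
    """Build a predictor from (passage, question, answer) tuples, ignoring the passage.

    Assumes `train` is non-empty.
    """
    by_question = defaultdict(Counter)
    overall = Counter()
    for _passage, question, answer in train:
        by_question[question.lower().strip()][answer] += 1
        overall[answer] += 1
    fallback = overall.most_common(1)[0][0]  # the dataset's most frequent answer

    def predict(question: str) -> str:
        counts = by_question.get(question.lower().strip())
        return counts.most_common(1)[0][0] if counts else fallback

    return predict

def accuracy(predict, examples) -> float:
    """Fraction of examples answered correctly from the question alone."""
    return sum(predict(q) == a for _p, q, a in examples) / len(examples)

# Hypothetical toy data; passages are placeholders because they are never read.
train = [
    ("...", "What did John eat?", "pizza"),
    ("...", "What did Mary eat?", "pizza"),
    ("...", "What did John eat?", "pizza"),
    ("...", "What did Sam eat?", "sandwich"),
]
held_out = [
    ("...", "What did John eat?", "pizza"),
    ("...", "What did Ann eat?", "pizza"),
    ("...", "What did Sam eat?", "salad"),
]

predict = fit_question_only(train)
print(f"question-only accuracy: {accuracy(predict, held_out):.2f}")
```

The number to watch is the gap between this baseline and the full model: the smaller the gap, the less the passage is contributing.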

Computer-use evals have already produced an even more literal shortcut: the agent has a browser, the benchmark is public, and the evaluation turns into an open-book exam with an answer key on the final page. In the Holistic Agent Leaderboard (HAL) paper, the authors report observing agents that searched for the benchmark on HuggingFace instead of solving the task, a behavior you only catch if you inspect logs.

# Supposed task: complete a workflow inside the web environment
Task: "Configure setting X in the app and verify it's enabled."
Failure mode:
  1) Open a new tab
  2) Search for: "benchmark X expected enabled state" / "HAL <benchmark> setting X"
  3) Find: repo / leaderboard writeup / dataset card / issue thread
  4) Reproduce the expected end state (answer)

At that point, the evaluation is measuring whether the agent can locate the answer key, not whether it can complete the workflow.
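
Since this behavior only shows up in logs, a rough log audit might look like the sketch below. Everything about the log format is an assumption: the `url` and `query` step fields, the blocked-host list, and the benchmark name `examplebench` are hypothetical, and real agent traces will differ. It flags steps for human review rather than failing runs automatically.

```python
from urllib.parse import urlparse

# Assumption: hosts where public benchmark artifacts might live.
BLOCKED_HOSTS = {"huggingface.co", "github.com"}

def flag_leak_steps(trajectory, benchmark_terms, blocked_hosts=BLOCKED_HOSTS):
    """Return indices of steps that visit a blocked host or mention the benchmark."""
    flagged = []
    for index, step in enumerate(trajectory):
        url = step.get("url", "")
        host = (urlparse(url).hostname or "").lower()
        text = f"{url} {step.get('query', '')}".lower()
        host_hit = any(host == h or host.endswith("." + h) for h in blocked_hosts)
        term_hit = any(term.lower() in text for term in benchmark_terms)
        if host_hit or term_hit:
            flagged.append(index)
    return flagged

# Hypothetical trajectory: one legitimate step, then two suspicious ones.
trajectory = [
    {"action": "click", "url": "http://localhost:8080/settings"},
    {"action": "search", "query": "examplebench expected enabled state"},
    {"action": "goto", "url": "https://huggingface.co/datasets/examplebench"},
]

print(flag_leak_steps(trajectory, benchmark_terms=["examplebench"]))
```

A host list this broad will produce false positives for tasks that legitimately involve those sites, so it is better treated as a triage filter than as a verdict.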

Task: "Find the correct page and extract Y."
Failure mode:
  - Search: "<benchmark name> Y"
  - Copy from a public artifact (docs, forum post, dataset card)
  - Paste the value into the agent output as if it came from interaction

If an agent can pull the value from a dataset card or repo and still “pass,” the success check is grading plausibility, not interaction correctness. Public tasks plus shallow verification turn web search into an exploit.
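
One possible way to make the check grade interaction rather than plausibility is sketched here. It is an illustration under assumptions, not a method described in the HAL paper or used by any particular benchmark: the `Episode` class, the `visited_pages` log, and the page path are hypothetical. The idea is to seed a fresh target value per run, so no public artifact can contain it, and to require that the environment itself recorded a visit to the right page.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class Episode:
    # Fresh per-run value placed on the target page, so it cannot be copied from the web.
    target_value: str = field(default_factory=lambda: secrets.token_hex(4))
    # Pages recorded by the environment, not self-reported by the agent.
    visited_pages: list = field(default_factory=list)

def check_extraction(episode: Episode, agent_output: str, required_page: str) -> bool:
    """Pass only if the value matches and the environment saw the required page visit."""
    value_ok = agent_output.strip() == episode.target_value
    visited_ok = required_page in episode.visited_pages
    return value_ok and visited_ok

# Hypothetical run: the agent opened the page and read the value there.
episode = Episode()
episode.visited_pages.append("/orders/42")
print(check_extraction(episode, episode.target_value, "/orders/42"))

# A value copied from a public artifact does not match this run's value.
print(check_extraction(episode, "value-from-dataset-card", "/orders/42"))
```

The first call passes and the second fails, because a stale public value cannot match a target generated for this run.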

These two examples are the warning shot: if we don’t hold computer-use benchmarks to higher standards early, we’ll repeat the LLM era just with better UIs and more elaborate ways to cheat.

Do you usually write on similar topics? If not, what do you usually write about?

Yes! Working on RL environments and RL infra for computer-use, I’m constantly surrounded by the best computer-use models and the most realistic training environments. So I wrote another article, “The Screen Is the API,” which makes the case for computer-use and why it’s the future of AI models.

This space is extremely underreported for two reasons:

  1. Models aren’t as capable in computer-use as they are in other tasks (coding, math, etc.).
  2. Computer-use is fast-moving and extremely new.

I want to change that.

Great! What is your usual writing routine like (if you have one)?

I usually read a bunch of research papers and speak to my peers in the industry about their thoughts on a topic. Other than that, I spend a lot of time reading articles by great bloggers like PG. So I usually take a lot of inspiration from other people in my writing.

Being a writer in tech can be a challenge. It’s not often our main role, but an addition to another one. What is the biggest challenge you have when it comes to writing?

Finding the time to sit down and put my lived experience into words.

What is the next thing you hope to achieve in your career?

To tackle harder problems with great people, to learn from those people, and share my experiences.

Wow, that’s admirable. Now, something more casual: What is your guilty pleasure of choice?

Watching movies! My favorite movie right now is Catch Me If You Can (2002).

Do you have a non-tech-related hobby? If yes, what is it?

I love bouldering because it makes me feel like I’m a human computer-use agent interacting with the climbing wall. I’m kidding. I think bouldering is a lot of fun because it allows me to take my mind off of work and consolidate my thinking.

What can the HackerNoon community expect to read from you next?

I’m currently writing another piece on RL environment infrastructure!

What’s your opinion on HackerNoon as a platform for writers?

I think the review structure is awesome, and it was a great place for me to put my thoughts in front of technical readers.

Thanks for taking the time to join our “Meet the writer” series. It was a pleasure. Do you have any closing words?

I love writing. Thank you, HackerNoon!
