
Many data teams find themselves working as 'tool jockeys' instead of becoming true engineers. They mostly learn some company data, and then rely on drag-and-drop or YAML configuration within the constraints of the tool's environment.

Their organization often insists they must use standard tools, and their idea of a good job is that the task works fine within their personal version. No automatic testing, no automated deployment, no version control, and handcrafted environments. And then they get yelled at when things break and yelled at for taking too long. And most DEs want to quit the field after a few years.

The real question is not that DE and software engineering are converging. It's why most DEs don't have the self-respect and confidence to engineer systems so that their lives don't suck.

CalRobert · 1h ago

Data engineering was software engineering from the very beginning. Then a bunch of business analysts who didn't know anything about writing software got jealous and said that if you knew SQL/dbt you were a data engineer. I've had to explain too many times that yes, indeed, I can set up a CI/CD pipeline or set up Kafka or deploy Dagster on ECS, to the point where I think I need to change my title just to not be cheapened.

sdairs · 55m ago

I think even before dbt turned DE into "just write sql & yaml", there was an appreciable difference in DE vs SE. There was defo some DEs writing a lot of java/scala if they were in Spark heavy co's, but my experience is that DEs were doing a lot more platform engineering (similar to what you suggest), SQL and point-and-click (just because that was the nature of the tooling). I wasn't really seeing many DEs spending a lot of time in an IDE.

But I think what's interesting from the post is looking at SEs adopting data infra into their workflow, as opposed to DEs writing more software.

craneca0 · 16m ago

yeah, i've seen large fortune 100 data and analytics orgs where the majority of folks with data engineering titles are uncomfortable with even the basics of git.

isaacremuant · 24m ago

Agreed. Weird distinction, used to pay less to people who did certain things, and you could see a high variance between "data engineers". Some who had only done a course and others who had extensive knowledge of software engineering practices were considered the same.

Ridiculous.

giantg2 · 1h ago

I've never really seen the distinction between data and software engineering. It's more like front-end vs backend. If you're a data engineer and it's all no code tooling, then you're just an analyst or something.

flexiflex · 18m ago

When I worked at bigCo, it was a totally different world. Data engineers used data platform tools to do data work, usually for data's sake. Software teams trying to build stuff with data had to finagle their way onto roadmaps.

banku_brougham · 11m ago

If you are orchestrating pipelines in Airflow or Prefect you are having to write the client software around those engines, and it's a lot of Python.

Another anecdatum: the data engineer role at Zillow is called "Software Development Engineer, Big Data"

getnormality · 56m ago

It's not hard to do data engineering to the standards of software engineering, and many people do it already, provided that

1. You use a real programming language that supports all the abstractions software engineers rely on, not (just) SQL.

2. The data is not too big, so the feedback cycle is not too horrendously slow.

#2 can't ever be fully solved, but testing a data pipeline on randomly subsampled data can help a lot in my experience.
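A minimal sketch of the subsampling idea, in plain Python. Note the swap: it samples by hashing a key rather than picking rows at random, so the same keys are kept on every run and all rows for a kept key stay together. The names `keep_row`, `subsample` and `total_by_user`, and the toy data, are made up for illustration.

```python
import hashlib


def keep_row(key: str, rate: float, seed: str = "test-sample") -> bool:
    """Deterministically keep roughly `rate` of all keys by hashing them."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


def subsample(rows, key_field, rate):
    """Keep every row whose key falls in the sampled fraction."""
    return [r for r in rows if keep_row(str(r[key_field]), rate)]


def total_by_user(rows):
    """Toy pipeline step (hypothetical): sum `amount` per `user_id`."""
    totals = {}
    for r in rows:
        totals[r["user_id"]] = totals.get(r["user_id"], 0) + r["amount"]
    return totals


# Toy input: 1000 users with 100 rows each.
rows = [{"user_id": i % 1000, "amount": 1} for i in range(100_000)]

# Run the pipeline step on about 5% of users for a fast feedback loop.
sample = subsample(rows, "user_id", 0.05)
totals = total_by_user(sample)
```

Because whole keys are kept or dropped, per-key aggregates on the sample match what the full run would produce for those keys, which makes assertions on the sample meaningful.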

sdairs · 49m ago

In your experience, how are folks doing (1)? The post is talking about a framework to add e.g. type safety, schema-as-code, etc. over assets in data infra in a familiar way as to what is common with Postgres; I'm not familiar with much else out there for that?

SrslyJosh · 44m ago

"Data engineering and software engineering are converging" says firm selling analytics products/services. I think the perspective here may be a bit skewed.

zurfer · 1h ago

Maybe. On the one side you have something like dbt or Moosestack. On the other hand analytics and data pipelining is still a lot of no code tooling and I doubt it will go away. However I would love to learn more about how other people use coding agents to do DE tasks.

craneca0 · 10m ago

agreed on the presence and stickiness of no-code tooling. but in a future where we want to enable LLMs and agents to do as much of that work as possible, a code-first approach seems far more likely to make that effective. not just because agents are better at writing code than clicking through interfaces (maybe that will change as agents evolve?), but because the SDLC is valuable for agents for the same reasons it's valuable for human developers - collaboration, testing, auditing, versioning, etc.

rawgabbit · 1h ago

In Snowflake, I am now writing Python Stored Procedures that make REST API calls to things like Datadog REST API and dumping the JSON into a Snowflake table. I then unpack the JSON and transform it into a normalized table. So far it works reasonably well. This is possible using Snowflake's external access feature. https://docs.snowflake.com/en/developer-guide/external-netwo...
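For illustration only, the unpack-and-normalize step could look roughly like this in plain Python. The payload shape (`series`, `name`, `points`) is a made-up example, not the actual Datadog response format, and the Snowflake stored procedure and external access setup are left out entirely.

```python
import json


def flatten_series(payload: str) -> list[dict]:
    """Turn a nested JSON document into one flat row per data point.

    The field names here are hypothetical placeholders.
    """
    doc = json.loads(payload)
    rows = []
    for series in doc.get("series", []):
        for ts, value in series.get("points", []):
            rows.append({"metric": series["name"], "ts": ts, "value": value})
    return rows


# Stand-in for the raw JSON that was dumped into a staging table.
raw = json.dumps(
    {
        "series": [
            {"name": "cpu", "points": [[1, 0.5], [2, 0.7]]},
            {"name": "mem", "points": [[1, 10.0]]},
        ]
    }
)

rows = flatten_series(raw)
```

Keeping the flattening in a pure function like this means it can be tested against saved sample payloads without making any network calls.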

zamalek · 1h ago

One thing I have seen through my more recent exposure to experienced data engineers is the lack of repeatability rigor (CI/CD, IaC, etc.). There's a lot of doing things in notebooks and calling that production-ready. Databricks has git (GitHub only from what I can tell) integration, but that's just checking out and directly committing to trunk. If it's in git then we have SDLC, right? It's fucking nuts.

Anyone have workflows or tooling that are highly compatible with the entrenched notebook approach, and are easy to adopt? I want to prevent these people from learning well-trodden lessons the hard way.

RobinL · 1h ago

I think this may be a databricks thing? From what I've seen there's a gap between data engineers forced to use databricks and everyone else. From what I've seen, at least how it's used in practice, databricks seems to result in a mess of notebooks with poor dependency and version management.

zamalek · 53m ago

Interesting, databricks has been my first exposure to DE at scale and it does seem to solve many problems (even though it sounds like it's causing some). So what does everyone else do? Run spark etc. themselves?

RobinL · 26m ago

We use aws glue for spark (but are increasingly moving towards duckdb because it's faster for our workloads and easier to test and deploy).

For Spark, glue works quite well. We use it as 'spark as a service', keeping our code as close to vanilla pyspark as possible. This leaves us free to write our code in normal python files, write our own (tested) libraries which are used in our jobs, use GitHub for version control and ci and so on
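A rough sketch of what "normal python files plus tested libraries" can mean in practice. It uses plain Python instead of PySpark so it runs anywhere, and the function and field names are invented for illustration, not taken from any real job.

```python
def latest_per_key(records, key="id", version="updated_at"):
    """Keep only the most recent record for each key.

    Pure function: no I/O and no cluster needed, so it is easy to unit test.
    """
    latest = {}
    for rec in records:
        current = latest.get(rec[key])
        if current is None or rec[version] > current[version]:
            latest[rec[key]] = rec
    return sorted(latest.values(), key=lambda r: r[key])


def test_latest_per_key():
    """The kind of test that would live next to the library and run in CI."""
    records = [
        {"id": 1, "updated_at": "2024-01-01", "status": "new"},
        {"id": 1, "updated_at": "2024-02-01", "status": "paid"},
        {"id": 2, "updated_at": "2024-01-15", "status": "new"},
    ]
    result = latest_per_key(records)
    assert [r["status"] for r in result] == ["paid", "new"]


test_latest_per_key()
```

The job itself then stays a thin wrapper that reads data, calls functions like this, and writes the result.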

sdairs · 38m ago

tbh I see just as much notebook-hell outside of dbx, it's certainly not contained to just them. There's some folks doing good SDLC with Spark jobs in java/scala, but I've never found it to be overly common, I see "dump it on the shared drive" equally as much lol. IME data has always been a bit behind in this area

personally you couldn't pay me to run Spark myself these days (and I used to work for the biggest Hadoop vendor in the mid 2010s doing a lot of Spark!)

esafak · 1h ago

For CI, try dagger. It's code based and runs locally too, so you can write tests. But it is a moving target and more complex than Docker.

Data engineering and software engineering are converging

Comments (20)