6NF File Format

75 points by sergeyprokhoren | 9/3/2025, 5:49:08 PM | habr.com ↗ | 25 comments

Comments (25)

nikolayasdf123 · 1d ago
> habr.com

interesting to see this forum show up again.

remember 15 years ago there were posts about a DIY drone from some random guy, with lots of theoretical physics deriving stability conditions. it got a lot of criticism. now, looking back and following what DJI is doing with sensors, his approach was totally wrong and that community nailed it with the feedback. the forum had some extravagant ideas and some worthy criticism. at least back then.

artemonster · 1d ago
I remember visiting this site daily 10-15 years ago, in Russian, ofc. The moderation bar was super high, the karma system worked great, the content quality was astonishing. Then they switched owners, tried to monetize heavily with corpo-pseudo-blogpost-marketing crap, and it all went downhill from there.
0x457 · 1d ago
It went downhill when they allowed getting an invitation via a single blog post, requiring just one person to like it enough to hand out an invitation. Such a post wasn't hard to write - just translate something popular from Hacker News before anyone else did.

Shortly after, it became hilariously easy to farm and manipulate karma balances across the entire site. With 50 accounts (multi-accounts or real people, all the same) you could create a new account a day.

Monetization started when it was already in a death spiral.

NooneAtAll3 · 1d ago
don't forget the awful redesign, including completely replacing the post formatter

all the mastery of creating posts that experienced authors had accumulated - gone overnight

balamatom · 1d ago
habr is an institution. it's like the "runet hn", minus the wild west vc ecosystem, plus integrated blog posting the way the lj ogs intended it. that probably helps a lot with original work like TFA getting traction. more power to that!

runet sites of that era were often born out of the hacker's characteristic contrarian "because we can" attitude. attempts to monetize them in more recent years are bound to accomplish little more than fuck up the content quality and/or the "owner cashes out and opens a cafe" thing.

nevertheless, to this day, when i think habrahabr, i think of a way higher bar for technical competence than hn. it's all in the attitude.

wearable · 1d ago
What are the modern equivalents of habr?
throw-the-towel · 1d ago
There's probably none. The Russian Internet has been Eternal Septembered too much for something similar to appear.
balamatom · 18h ago
if i knew any, i sure as fuck wouldn't post them on hn of all places.
jojobas · 1d ago
It was also notoriously politics-free, until something happened.
deepsun · 1d ago
Improvement idea -- in my experience "valid_from" is always a date (no time, no timezone). That's how it's reported in documents (e.g. contract validity period).

Rows that need seconds (e.g. bank transactions) are events; they aren't "valid" from a particular point in time forward, they just happen.

nine_k · 1d ago
In my experience, validity time may start at the start of a business day, and it likely has a specific time zone. In particular, I've seen documents related to stock trading on NASDAQ specify Eastern Standard Time as the applicable timezone.

I understand how convenient it is to use UTC-only timestamps. It works in most cases but not all.

adammarples · 21h ago
No point in losing information like that. What do you do if someone opens and closes an account on the same day? Changes their email address three times in one day? Etc.
deepsun · 18h ago
Agreed: if your updates carry a time, then it should be a datetime. It's just that in my work everything is date-only. E.g. employment starts on a date, not a datetime, so there's no data loss.
rixed · 1d ago
> country_code 01K3Y07Z94DGJWVMB0JG4YSDBV

A 7th normal form should mandate that identifiers never be assigned to identifiers.

hdjrudni · 1d ago
Not sure about that. Some folks would argue you should always use surrogate keys.

I probably wouldn't for country_code specifically, but for most things it's useful even when you have a 'natural' key.

rixed · 1d ago
It can be useful in some cases but can also be a hindrance. First, because identifiers are more useful when they actually let you identify the thing; and also because now they can change from instance to instance, from customer to customer, etc.
mhalle · 1d ago
This format requires temporal validity with `valid_from`, but doesn't include `valid_to`. I don't understand how `valid_from` and the also-required `recorded_at` interact.
STKFLT · 1d ago
I don't have any additional insight into the format, but I think the idea is that there is an implied ->infinity on every date range. Every bank can only have one bank_name at a time, so multiple bank_names for the same bank entity can be sorted on the 'valid' and 'recorded' axes to find the upper bound of each.
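
Roughly, I picture it like this (my guess at the intent, not anything from the spec; BankNameRow and implied_valid_to are made-up names):

    # A minimal sketch of how start-only records imply their own upper bounds:
    # for a given bank, each bank_name row is valid until the valid_from of the
    # next newer row for the same entity, or to infinity if there is none.
    from dataclasses import dataclass
    from datetime import date
    from typing import Optional

    @dataclass
    class BankNameRow:
        bank_id: str          # entity key
        value: str            # the bank_name attribute value
        valid_from: date      # business ("valid") axis, start only
        recorded_at: date     # system ("recorded") axis, start only

    def implied_valid_to(rows: list[BankNameRow]) -> list[tuple[BankNameRow, Optional[date]]]:
        """Pair each row with the valid_from of the next newer row for the
        same entity; None means 'valid until further notice'."""
        rows = sorted(rows, key=lambda r: (r.bank_id, r.valid_from))
        paired = []
        for i, row in enumerate(rows):
            nxt = rows[i + 1] if i + 1 < len(rows) else None
            same_entity = nxt is not None and nxt.bank_id == row.bank_id
            paired.append((row, nxt.valid_from if same_entity else None))
        return paired

    history = [
        BankNameRow("b1", "First National", date(2020, 1, 1), date(2020, 1, 1)),
        BankNameRow("b1", "FN Bancorp",     date(2023, 6, 1), date(2023, 5, 20)),
    ]
    for row, valid_to in implied_valid_to(history):
        print(row.value, row.valid_from, "->", valid_to or "infinity")
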
dragonwriter · 1d ago
In the bitemporal model, both system and valid times are half-open intervals, and both the preceding and following interval can either have a different value or no value. Using only start times means that while records can be updated in either time stream, they cannot be logically deleted (in transaction time) or invalidated (in valid time) once they exist. There are databases where this assumption is valid, but in general it is problematic.
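
For contrast, something like this illustrative shape (names are mine, not the format's), where both axes are half-open intervals and a logical delete just closes the system-time interval:

    from dataclasses import dataclass, replace
    from datetime import datetime
    from typing import Optional

    @dataclass(frozen=True)
    class BitemporalFact:
        entity_id: str
        attribute: str
        value: str
        valid_from: datetime
        valid_to: Optional[datetime]      # valid time:  [valid_from, valid_to)
        recorded_from: datetime
        recorded_to: Optional[datetime]   # system time: [recorded_from, recorded_to)

    def logically_delete(fact: BitemporalFact, now: datetime) -> BitemporalFact:
        """Close the system-time interval: after 'now' the fact is no longer
        asserted, yet its history is preserved. With start times only there
        is no way to express this."""
        return replace(fact, recorded_to=now)
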
spennant · 1d ago
Odd for me to be seeing this right now. I recently implemented a 6NF schema for parsed XBRL files from EDGAR. The architecture was the right call... too bad the data is not useful for analytics.
unquietwiki · 1d ago
Looks interesting, but there are few comments on the forum and even a negative vote count ATM. The format kinda looks "old school" in terms of how it defines records, but I guess that can be a positive in some circumstances?
sergeyprokhoren · 21h ago
What does "looks old school" mean? Do you want to wrap this format in JSON, like JSON-LD? I don't mind.
inkyoto · 1d ago
I would say it is a niche solution that solves a specific problem.

Modern data sources increasingly lean towards producing nested and deeply nested semi-structured datasets (i.e. JSON) that are heavily denormalised and rely on organisation-wide entity IDs rather than system-generated referential integrity IDs (PK and FK IDs). That is one reason why modern data warehouse products (e.g. Redshift) have added extensive support for nested data processing: it neither makes sense to flatten/un-nest the nested data, nor is it easy to do anyway.

sergeyprokhoren · 1d ago
This is a fairly common problem. Data is often transferred between information systems in denormalized form (tables with hundreds of columns, i.e. attributes). In the data warehouse it is normalized (duplication is eliminated by replacing repeated values with references to reference tables) to make complex analytical queries easier. Usually data is normalized to 3NF and very rarely to 6NF, since there is still no convenient tool for 6NF (see my DSL: https://medium.com/@sergeyprokhorenko777/dsl-for-bitemporal-... ). And then the data is denormalized again in data marts to generate reports for external users. All these cycles of normalization - denormalization - normalization - denormalization are very expensive for IT departments. So I had the idea of transferring data between information systems directly in normalized form, so that nothing would have to be normalized again. The prototypes were the Anchor Modeling and (to a much lesser extent) Data Vault methodologies.
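
Roughly, the target shape is like this toy Python illustration (not the format itself; names and values are made up): one anchor of entity keys plus one small historized table per attribute, so data can travel between systems already normalized and a flat data-mart view is just a join:

    from datetime import date

    bank_anchor = {"b1"}  # entity identifiers only

    bank_name = [  # each attribute is historized independently: no NULLs, no duplication
        {"bank_id": "b1", "value": "First National",
         "valid_from": date(2020, 1, 1), "recorded_at": date(2020, 1, 1)},
        {"bank_id": "b1", "value": "FN Bancorp",
         "valid_from": date(2023, 6, 1), "recorded_at": date(2023, 5, 20)},
    ]

    bank_country = [
        {"bank_id": "b1", "value": "US",
         "valid_from": date(2020, 1, 1), "recorded_at": date(2020, 1, 1)},
    ]

    def as_of_view(bank_id: str, as_of: date) -> dict:
        """Reassemble a flat, 3NF-style row for a data mart: one lookup per attribute."""
        def latest(table):
            rows = [r for r in table if r["bank_id"] == bank_id and r["valid_from"] <= as_of]
            return max(rows, key=lambda r: r["valid_from"])["value"] if rows else None
        return {"bank_id": bank_id, "name": latest(bank_name), "country": latest(bank_country)}

    print(as_of_view("b1", date(2024, 1, 1)))
    # -> {'bank_id': 'b1', 'name': 'FN Bancorp', 'country': 'US'}
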
snthpy · 1d ago
Nice. Anchor Modelling is underappreciated.

Gonna have a look at your DSL.