Show HN: Samchika – A Java Library for Fast, Multithreaded File Processing

Most Java file processing solutions either involve a lot of boilerplate or don’t handle concurrency, backpressure, or metrics well out of the box. I needed something fast, clean, and production-friendly — so I built this.

Key features:

Multi-threaded line/batch processing using a configurable thread pool

Producer/consumer model with built-in backpressure

Buffered, asynchronous writing with optional auto-flush

Live metrics: memory usage, throughput, thread times, queue stats

Simple builder API — minimal setup to get going

Output metrics to JSON, CSV, or human-readable format
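To make the producer/consumer and backpressure idea concrete, here is a minimal sketch using only the JDK. It is not Samchika's actual code or API: the class name `BackpressureSketch`, the method `countChars`, and the sentinel value are hypothetical. The point is that a bounded queue gives backpressure for free, because `put()` blocks the producer whenever the consumer falls behind.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Samchika's implementation.
class BackpressureSketch {
    // Unique sentinel instance that tells the consumer to stop.
    private static final String POISON = new String("__END__");

    // Sends every line through a bounded queue and returns the total character count.
    static long countChars(List<String> lines, int queueCapacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicLong total = new AtomicLong();

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String line = queue.take();   // blocks while the queue is empty
                    if (line == POISON) break;    // identity check on purpose
                    total.addAndGet(line.length());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (String line : lines) {
                queue.put(line);                  // blocks while the queue is full: backpressure
            }
            queue.put(POISON);
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while producing", e);
        }
        return total.get();
    }
}
```

A small `queueCapacity` bounds memory use at the cost of more blocking; a larger one smooths out bursts.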

Use cases:

Large CSV or log file parsing

ETL pre-processing

Line-by-line filtering and transformation

Batch preparation before ingestion

I’d really appreciate your feedback — feature ideas, performance improvements, critiques, or whether this solves a real problem for others. Thanks for checking it out!

Comments (56)

Calzifer · 96d ago

        for(int i=0;i<10000; ++i){

            // do nothing just compute hash again and again.
            hash = str.hashCode();
        }

https://github.com/MayankPratap/Samchika/blob/ebf45acad1963d...

"do nothing" is correct, "again and again" not so much. Java caches the hash code for Strings, and since the JIT knows that (at least in recent versions [1]), it might even remove this loop entirely.

[1] https://news.ycombinator.com/item?id=43854337

hyperpape · 95d ago

Even in older versions, if the compiler can see that there are no side-effects, it is free to remove the loop and simply return the value from the first iteration.

I'm actually pretty curious to see what this method does on versions that don't have the optimization to treat hashCodes as quasi-final.

A quick test using Java 17 shows it's not being optimized away _completely_, but it's taking...~1 ns per iteration, which is not enough to compute a hash code.

Edit: I'm being silly. It will just compute the hashcode the first time, and then repeatedly check that it's cached and return it. So the JIT doesn't have to do any _real_ work to make this skip the hash code calculation.

So most likely, the effective code is:

    computeHashCode();
    for (int i = 0; i < 10000; i++) {
        if (false) { // pretend this wouldn't have dead code elimination, and the boolean is actually checked
            computeHashCode();
        }
    }
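If the sample is meant to burn CPU per line, one option is to recompute the hash by hand so the cached value in `String` is never consulted. This is only a sketch: `HashWorkSketch` and `recomputeHash` are made-up names, and the JIT is still free to optimise loops like this, so a harness such as JMH remains the more reliable way to measure anything.

```java
// Hypothetical helper, not part of Samchika.
class HashWorkSketch {
    // Recomputes the documented String.hashCode() polynomial on every round
    // instead of calling hashCode(), which returns the cached value.
    static int recomputeHash(String s, int rounds) {
        int last = 0;
        for (int r = 0; r < rounds; r++) {
            int h = 0;
            for (int i = 0; i < s.length(); i++) {
                h = 31 * h + s.charAt(i);
            }
            last = h; // returning the result keeps it from being trivially dead
        }
        return last;
    }
}
```

For `rounds >= 1` the result matches `s.hashCode()`, which makes it easy to check that the loop computes what it claims to.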

rzzzt · 95d ago

JMH, the microbenchmark harness has an example that highlights this: https://github.com/openjdk/jmh/blob/master/jmh-samples/src/m...

Calzifer · 95d ago

And since benchmarking is hard, it also has a helper to actually "waste" time. [1] The implementation [2] might give an idea that it is not always trivial to do nothing but still appear busy.

Btw I found most of the jmh samples interesting. IMO a quite effective mix of example and documentation. (and I'm not sure there is even much other official documentation)

[1] https://github.com/openjdk/jmh/blob/master/jmh-samples/src/m...

[2] https://github.com/openjdk/jmh/blob/872b7203c294d90c17766d19...

mprataps · 95d ago

You are right. This code does not recalculate the hash. However, it was written just as a sample. Mainly, the user will provide their own method to process the file.

mprataps · 95d ago

Guys. I love you all. I did not expect such quality feedback.

I will try to incorporate most of your feedback. Your comments have given me much to learn.

This project was started to just learn more about multithreading in a practical way. I think I succeeded with that.

sieve · 95d ago

A note on the name.

The nasal "m" takes on the form of the nasal in the row/class of the letter that follows it. As "ñ" is the nasal of the "c" class, the "m" becomes "ñ"

Writing Sanskrit terms using the roman script without using something like IAST/ISO-15919 is a pain in the neck. They are going to be mispronounced one way or the other. I try to get the ISO-15919 form and strip away everything that is not a-z.

So, सञ्चिका (sañcikā) = sancika

You probably want to keep the "ch," as the average English speaker is not going to remember that the "c" is the "ch" of "cheese" and not "see."

arnsholt · 95d ago

It’s been ages since I did Sanskrit last, but wouldn’t sam-cika typically have the m realized as an anusvara rather than ñ?

sieve · 95d ago

Not unless it precedes a classless letter or it is actually "m."

All nasals becoming anusvaras is something Hindi/Marathi and other languages using the Devanagari script do. Sanskrit uses the specific form of the nasal when available.

sidcool · 96d ago

It would be even more amazing if it had tests. It's already pretty good.

mprataps · 95d ago

I will add unit tests next.

DannyB2 · 96d ago

Should the tests include some 10 GB files?

VWWHFSfQ · 96d ago

Should include a script for generating 10GB files maybe

diggan · 95d ago

Use tmpfs (/dev/shm) and it doesn't even have to hit the disk, all in memory but with filepaths as the library API might expect :)
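A generator for such a file can be quite small. The following is a sketch with hypothetical names (`TestFileGenerator`, `generate`) and an arbitrary line layout; it is not part of the library.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for producing large test inputs.
class TestFileGenerator {
    // Writes numbered lines until at least targetBytes have been written.
    // Returns the number of lines written.
    static long generate(Path out, long targetBytes) {
        long written = 0;
        long lines = 0;
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            while (written < targetBytes) {
                String line = "line-" + lines + ",some,sample,payload\n";
                w.write(line);
                written += line.length(); // ASCII only, so chars == bytes
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

Pointing it at tmpfs would look like `TestFileGenerator.generate(Path.of("/dev/shm/big.txt"), 10L * 1024 * 1024 * 1024)`, assuming the machine has that much memory to spare.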

sidcool · 95d ago

Naah. I meant unit tests. Not load tests.

sureglymop · 96d ago

Perhaps I misunderstand something but doesn't reading from a file require a system call? And when there is a system call, the context switches? So wouldn't using multiple threads to read from a file mean that they can't really read in parallel anyway because they block each other when executing that system call?

mike_hearn · 96d ago

System calls aren't context switches. They flip a permission bit in the CPU but don't do the work a context switch involves like modifying the MMU, flushing the TLBs, modifying kernel structures, doing scheduling etc.

Also, modern filing systems are all thread safe. You can have multiple threads reading and even writing in parallel on different CPU cores.
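As a sketch of what parallel reading can look like in Java: `FileChannel.read(ByteBuffer, long)` takes an explicit position and is safe to call from several threads on the same channel. The class and method names below are hypothetical, and counting newlines just stands in for real per-chunk work.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: several threads read disjoint ranges of one file.
class ParallelReadSketch {
    static long countNewlines(Path file, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            long chunk = Math.max(1, (size + threads - 1) / threads);
            List<Future<Long>> parts = new ArrayList<>();
            for (long start = 0; start < size; start += chunk) {
                final long from = start;
                final long to = Math.min(size, start + chunk);
                parts.add(pool.submit(() -> countRange(ch, from, to)));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // wait for every range before closing the channel
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Counts '\n' bytes in [from, to) using positional reads only.
    private static long countRange(FileChannel ch, long from, long to) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
        long count = 0;
        long pos = from;
        while (pos < to) {
            buf.clear();
            buf.limit((int) Math.min(buf.capacity(), to - pos));
            int n = ch.read(buf, pos); // does not touch the channel's own position
            if (n < 0) break;
            for (int i = 0; i < n; i++) {
                if (buf.get(i) == '\n') count++;
            }
            pos += n;
        }
        return count;
    }
}
```

Whether this is actually faster than a single reader depends on the storage and the page cache, so it is worth measuring.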

porridgeraisin · 95d ago

> system call, the context switches

No, there is no separate kernel "executing". When you do a syscall, your thread becomes kernel mode and it executes the function behind the syscall, then when it's done, your thread reverts to user mode.

A context switch is when one thread is being swapped out for another. Now the syscall could internally spawn a thread and context switch to that, but I'm not sure if this happens in read() or any syscall for that matter.

xxs · 95d ago

What all the other siblings said: syscalls are not context switches. They are called "mode switches" and have significantly less impact.

bionsystem · 96d ago

If you open() read-only I don't think it blocks (some other process writing to it might block though).

VWWHFSfQ · 96d ago

Am I wrong in thinking that this is duplicating lines in memory repeatedly when buffering lines into batches, and then submitting batches to threads? And then again when calling the line processor? Seems like it might be a memory hog

Calzifer · 95d ago

Since most things in Java are handled by reference, including Strings, there should not be much memory overhead. From a quick look I could not find any actual line duplication.
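A tiny check of that claim, with made-up names: putting a String into a list, and copying that list, copies references only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: batching a line does not duplicate its characters.
class ReferenceSketch {
    static boolean sameInstanceAfterBatching(String line) {
        List<String> batch = new ArrayList<>();
        batch.add(line);                                 // stores a reference, not a copy
        List<String> handedOff = new ArrayList<>(batch); // copies references only
        return handedOff.get(0) == line;                 // identity comparison on purpose
    }
}
```

The per-line cost of batching is therefore one reference slot per list, not another copy of the text.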

mprataps · 94d ago

I have CONTRIBUTING.md with guidelines regarding Pull Requests if any of you would take out your precious time to make some changes in the library.

codetiger · 96d ago

Do you have a benchmark comparison with other similar tools?

stopthe · 95d ago

Does it handle line breaks inside quotes in CSV? Frankly, I don't think it's possible to reliably process CSV in a multi-threaded manner.

drob518 · 95d ago

At least not without an initial scan. You could do post processing (e.g. parsing numbers and dates and things) in parallel after you’ve done correct line break processing.
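Such an initial scan could look roughly like this. It is a sketch that assumes RFC 4180-style quoting and `\n` record separators; `CsvRecordSplitter` and `splitRecords` are hypothetical names, and a real implementation would stream instead of holding the text in a String.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-threaded pre-scan that finds CSV record boundaries.
class CsvRecordSplitter {
    // Splits on '\n' only when the newline is outside a quoted field.
    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state ends up unchanged.
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == '\n' && !inQuotes) {
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // last record without trailing newline
        }
        return records;
    }
}
```

The resulting records can then be handed to worker threads for field parsing, since each one is complete on its own.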

gavinray · 96d ago

Please don't do this.

Have the OS handle memory paging and buffering for you and then use Java's parallel algorithms to do concurrent processing.

Create a "MappedByteBuffer" and mmap the file into memory.

If the file is too large, use an "AsynchronousFileChannel" and asynchronously read + process segments of the buffer.
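For reference, a minimal sketch of the mmap approach with hypothetical names (`MmapSketch`, `countNewlines`). Counting newlines stands in for real processing, and a single `MappedByteBuffer` can only cover up to `Integer.MAX_VALUE` bytes, so larger files need several mappings or a different API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: let the OS page the file in via a read-only mapping.
class MmapSketch {
    static long countNewlines(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Throws IllegalArgumentException if the file is larger than Integer.MAX_VALUE bytes.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long count = 0;
            while (buf.hasRemaining()) {
                if (buf.get() == '\n') count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```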

papercrane · 96d ago

If you're using a newer JVM you can also map a "MemorySegment", which doesn't have the 2GiB limit that byte buffers have.

gavinray · 96d ago

Good point, have written about this in the past

https://gavinray97.github.io/blog/panama-not-so-foreign-memo...

switchbak · 96d ago

Memory mapping is fun, but shouldn't we have some kind of async IO / uring support by now? If you're looking at really high-perf I/O, mmaping isn't really state of the art right now.

Then again, if you're in Java/JVM land you're probably not building bleeding edge DBs ala ScyllaDB. But I'm somewhat surprised at the lack of projects in this space. One would think this would pair well with some of the reactive stream implementations so that you wouldn't have to reimplement things like backpressure, etc.

threeseed · 95d ago

a) There have been libraries supporting io_uring on the JVM for many years now.

b) ScyllaDB is not bleeding edge. It uses the relatively old now DPDK.

c) There are countless reactive stream implementations e.g. https://vertx.io/docs/vertx-reactive-streams/java/

switchbak · 83d ago

Compared to what the JVM offers, Scylla is certainly way ahead - happy to hear what the latest greatest approaches are.

I'm very aware of various reactive stream impls - I was saying that this work should plug into them rather than reinventing the wheel.

hawk_ · 95d ago

I thought DPDK would still be faster than io_uring.

jlokier · 95d ago

Last time I measured on Linux (a few years ago), with NVMe, mmap + calling out to a thread pool to async-page-touch (so the main thread didn't block) was faster than io_uring (from the main thread) for random access reads.

SillyUsername · 95d ago

Better caveat that with, "but watch memory consumption, given the nature of the likes of CopyOnWriteArrayList". GC will be a bitch.

mprataps · 95d ago

Thanks for this comment. This will be an interesting aspect to explore.

SillyUsername · 95d ago

An ArrayList for huge numbers of add operations is not performant. LinkedList will see your list throughput performance at least double. There are other optimisations you can do but in a brief perusal this stood out like a sore thumb.

Calzifer · 95d ago

Arrays are fast, and ArrayList is like a fancy array with bounds checks and automatic growth. Only the growing part can be problematic if it has to grow very often. But that can be avoided by providing an appropriate initial size or by reusing the ArrayList with clear() instead of creating a new one. Both are used by OP in this project. Especially since the code copies lists quite often, I would expect LinkedList to perform way worse.
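The two techniques can be sketched like this. The names are hypothetical and this is not OP's code: the list is presized once, and `clear()` keeps the backing array, so no regrowth happens after the first batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a presized, reused batch buffer.
class BatchBufferSketch {
    // Feeds lines to handler in batches of batchSize; returns the number of batches.
    static int processInBatches(Iterable<String> lines, int batchSize,
                                Consumer<List<String>> handler) {
        List<String> batch = new ArrayList<>(batchSize); // presized: no growth while filling
        int batches = 0;
        for (String line : lines) {
            batch.add(line);
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batch.clear(); // drops the elements but keeps the allocated capacity
                batches++;
            }
        }
        if (!batch.isEmpty()) {
            handler.accept(batch); // final partial batch
            batches++;
        }
        return batches;
    }
}
```

Because the same list instance is reused, the handler must finish with it before returning; handing batches to other threads would need a copy per batch.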

SillyUsername · 95d ago

Wrong. In fact, the downvoters are wrong too; I'm guessing most are junior devs who don't want to be proven wrong. LinkedList is much faster for inserts and slow for retrieval. ArrayLists are the opposite. To the downvoters, I say: try it. This is why LinkedList is in the standard library. When you find I'm right, please consider re-upvoting for the free education.

pkulak · 95d ago

I've literally never seen a linked list be faster than an array list in a real application, so if you're right, this is kinda huge for me.

SillyUsername · 95d ago

LinkedList => use when adds total more than reads

ArrayList => use when reads total more than adds.

pkulak · 84d ago

No, that's not true at all. Adds aren't free. Adding in the middle involves following pointers into the heap all over the disk n/2 times, making them generally as expensive as reads. The only situation I can imagine a linked list making sense is if you only add to the front and only read from/delete the front (or back, if it's doubly linked). So a stack or queue.

But even then, I'm pretty sure Go actually uses an array for its green stacks nowadays, even while paying the copy penalty for expansion.

stopthe · 95d ago

Did you count an allocation of LinkedList.Node<E> on every add operation? You may say it's negligible thanks to TLAB, and I will agree that fast allocation is Java's strength, but in practice I've seen that creating new objects gives order-of-magnitude perf degradation.

SillyUsername · 94d ago

I have seen it for millions of add/del operations, an analytics framework actually for a big American games company (first guess and you'll probably say it), which is where I originally did the analysis about 10 years ago.

I also wrote a video processor around that time that was bottlenecked using ArrayLists - typically a decode, store and read once op. It was at this point I looked at other collections, other list implementations and blocking deques (ArrayList was the wrong collection type to use, but I'd been in a rush for MVP) and ultimately came across https://github.com/conversant/disruptor and used that instead.

The ArrayList vs LinkedList difference was a real eye-opener for me: in two different systems the same behaviour was replicated when using ArrayLists like queues, or with incorrect sizing of the buffer increments as load increases.

stopthe · 93d ago

Of course, deletion is a whole different story. I was talking about addition in isolation.

Anyway, I felt I had to run the benchmarks myself.

  @Benchmark
  @Fork(1)
  @BenchmarkMode(Mode.Throughput)
  @OutputTimeUnit(TimeUnit.SECONDS)
  public Object arrayListPreallocAddMillionNulls() {
    ArrayList<Object> arrList = new ArrayList<>(1048576);
    for (int i = 0; i <= 1_000_000; i++) {
      arrList.add(null);
    }
    return arrList;
  }

  @Benchmark
  @Fork(1)
  @BenchmarkMode(Mode.Throughput)
  @OutputTimeUnit(TimeUnit.SECONDS)
  public Object arrayListAddMillionNulls() {
    ArrayList<Object> arrList = new ArrayList<>();
    for (int i = 0; i <= 1_000_000; i++) {
      arrList.add(null);
    }
    return arrList;
  }

  @Benchmark
  @Fork(1)
  @BenchmarkMode(Mode.Throughput)
  @OutputTimeUnit(TimeUnit.SECONDS)
  public Object linkedListAddMillionNulls() {
    LinkedList<Object> linkList = new LinkedList<>();
    for (int i = 0; i <= 1_000_000; i++) {
      linkList.add(null);
    }
    return linkList;
  }

And as I expected, on JDK 8 ArrayList with an appropriate initial capacity was faster than LinkedList. Admittedly not an order of magnitude difference, only 1.7x.

  JDK8
  Benchmark                                      Mode  Cnt    Score    Error  Units
  MyBenchmark.arrayListAddMillionNulls          thrpt    5  229.950 ±  9.994  ops/s
  MyBenchmark.arrayListPreallocAddMillionNulls  thrpt    5  344.116 ±  7.070  ops/s
  MyBenchmark.linkedListAddMillionNulls         thrpt    5  199.446 ± 15.910  ops/s

But! On JDK 17 the situation is completely upside-down:

  JDK17
  Benchmark                                      Mode  Cnt    Score    Error  Units
  MyBenchmark.arrayListAddMillionNulls          thrpt    5   90.462 ± 18.576  ops/s
  MyBenchmark.arrayListPreallocAddMillionNulls  thrpt    5  214.079 ± 15.505  ops/s
  MyBenchmark.linkedListAddMillionNulls         thrpt    5  216.796 ± 19.392  ops/s

I wonder why ArrayList with default initial capacity got so much worse. Worth investigating further.

SillyUsername · 89d ago

Thanks for taking the time to test.

This helps prove my point that adds (and deletes) are generally faster by default when not pre sizing, or removing.

Typically (in my experience) ArrayLists are used without thought to sizing, often because the initial capacity and the amount to resize by cannot be determined sensibly or consistently.

If in your example you were also to resize the lists, (perhaps adding then dropping those in the Fibonacci sequence?), it would help prove my statement further.

Certainly not worth the -2 points I got from making the statement, but hey you can "please some people some of the time..." :D

fedsocpuppet · 95d ago

Huh? It'll be slower and eat a massive amount of memory too.

SillyUsername · 95d ago

It's holding a reference on each element, but it no longer has to add large chunks of memory on insert when the current array size is exceeded, just single elements. So reads are slower and a small amount of reference memory is used per node. Writes however are much faster particularly when the lists are huge (as in this case). Also I've written video frame processors so I am experienced in this area.
