We had a bug that corrupted investment records silently for 3 weeks. No audit trail. No way to replay what happened. That was the day we decided to rebuild our transaction core with event sourcing. Twelve months later: here is what it looks like in production, what the migration cost us, and whether it was worth it.
Writing
Blogs
Long-form essays on distributed systems, blockchain architecture, security, and the craft of building reliable software. I write to think clearly.
Every 48 hours, our Node.js app in PM2 cluster mode would run out of memory and crash. We thought it was normal memory pressure. It was not. It was a leak in our WebSocket event listener code that accumulated objects in memory without bound. Here is the full diagnosis with heap snapshots, the root cause, and the fix.
Our admin dashboard was loading slowly. We blamed it on a complex query. The real culprit was a silent N+1 problem in Sequelize that was generating one database query per user — and we had 3,000 users. Here is how we found it, what it looked like in the ORM layer, and how a single `include` option fixed it.
One misconfigured AWS security group inbound rule. That is all it took to expose our internal admin API — with no authentication — to the public internet for 11 days. We only discovered it during a routine audit. Here is the full story and how we locked down our AWS infrastructure for good.
Our Jenkins pipeline had been running flawlessly for 9 months. Then one routine deploy on a Tuesday afternoon cascaded into 4 hours of production downtime, two failed rollbacks, and a very angry on-call rotation. Here is exactly what happened, step by step, and how we rebuilt our CI/CD pipeline to never let it happen again.
XRP and XRPL look simple on the surface. Ledger transactions, wallets, payments — all well-documented. But building a production payment system on top of XRPL for real estate investments is a different beast. Here is every sharp edge we hit, from destination tags to reserve requirements to transaction finality.
Standard JWTs are signed but not encrypted. In a fintech platform handling investor data across multiple tenants, that was not good enough. Here is the full story of why we switched to JWE, how we implemented it with RSA-OAEP + AES-256-GCM, and what broke along the way.
At 9 AM on a Monday, every Redis key for our portfolio data expired simultaneously. What followed was a thundering herd that pushed our API response times from 80 ms to 14 seconds. Here is exactly how it happened, how we diagnosed it, and how we fixed it for good.
In my first week on a new codebase, I found 12,000 lines of dead code, zombie services, and commented-out features from 2019. This is the story of how I got rid of all of it without breaking anything, and what I learned about reading a codebase like an archaeologist.
A real incident from our fintech platform: a misconfigured analytics query, 3 AM pages, and the longest night of my engineering career. Here is everything that went wrong and what we fixed.
Exploring what automation and AI mean for human value, employment, and the future of software engineering — through the lens of economic history and philosophical reflection.
A speculative macro-level look at how AI will transform software engineering, employment, education, healthcare, sales, and income inequality by the year 2037.
Key findings from reading 100+ research papers and watching 1,000+ videos on AI's impact on employment and value creation. AI is fast-forwarding value creation, enabling single-person unicorns, and rewriting the rules of mastery and entrepreneurship.
Beyond the Hype: The Real Impact of LLMs on Software Engineering (and Your Career)