We discovered it during a quarterly infrastructure review. While auditing our AWS security groups, I noticed an inbound rule on our internal API server's security group that should not have existed:
Type: Custom TCP
Port: 3001
Source: 0.0.0.0/0
Port 3001 was our internal admin API. The one with endpoints like /admin/users, /admin/transactions, /admin/kyc-override. The one with no authentication because we assumed it was only accessible from within our VPC.
0.0.0.0/0 matches every IPv4 address. In practice: anyone on the internet.
The rule had been created 11 days earlier.
How It Got There
A developer was debugging a connectivity issue between two EC2 instances. Instead of checking the VPC routing, they added a temporary inbound rule that allowed traffic from any source on port 3001. They planned to remove it after the debugging session. They forgot.
This is not a story about malicious intent. It is a story about temporary changes that become permanent, and infrastructure with no guardrails to prevent it.
What Was Exposed
The internal admin API on port 3001 exposed these routes:
- /admin/users — full user list with KYC data
- /admin/investments — all investment records
- /admin/kyc — KYC document status and override endpoints
- /admin/transactions — all financial transaction history
- /admin/wallets — XRP wallet addresses and balances
None of these endpoints had authentication middleware. They were meant to be reachable only from our admin frontend, which ran on the same EC2 instance and called the API via localhost.
Was It Actually Accessed?
We pulled 11 days of Apache access logs from the instance. Filtering for port 3001 requests from external IPs:
grep ':3001' /var/log/apache2/access.log | grep -vE '(^|[^0-9.])(10\.0\.|127\.0\.0\.1)'
We found 847 requests from external IPs. The vast majority came from automated scanners: bots that probe every open port and try common paths such as /admin, /.env, and /wp-admin.
However, 23 requests were more specific. They hit our actual API routes — /admin/users, /admin/investments. These were not random probes. Someone had discovered the endpoint and was specifically querying it.
Those 23 requests all came from the same IP address, across a 4-hour window on day 7 of the exposure. The IP geolocated to a data center in Singapore. We could not determine if this was a legitimate security researcher, a malicious actor, or an automated scanner that had learned our API structure.
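The triage described above, separating generic scanner noise from hits on real admin routes and grouping the latter by source IP, can be sketched in a few lines of Node.js. This is an illustration rather than the exact script we ran. It assumes each log line starts with the client IP and contains a quoted request line, and the sample lines are made up using documentation-only IP ranges.

```javascript
// Prefixes of the real admin routes listed earlier in this post.
const ADMIN_ROUTES = ['/admin/users', '/admin/investments', '/admin/kyc',
  '/admin/transactions', '/admin/wallets'];

// Mirrors the grep above: skip 10.0.x.x and loopback addresses.
function isInternal(ip) {
  return ip.startsWith('10.0.') || ip.startsWith('127.');
}

// Count generic probes and tally hits on real admin routes per source IP.
function triage(lines) {
  const targetedByIp = new Map(); // ip -> number of hits on real admin routes
  let probes = 0;
  for (const line of lines) {
    // Assumed layout: <client-ip> ... "<METHOD> <path> HTTP/x.y" ...
    const match = line.match(/^(\S+) .*?"[A-Z]+ (\S+) HTTP/);
    if (!match) continue;
    const [, ip, path] = match;
    if (isInternal(ip)) continue;
    const pathname = path.split('?')[0];
    if (ADMIN_ROUTES.some((route) => pathname.startsWith(route))) {
      targetedByIp.set(ip, (targetedByIp.get(ip) || 0) + 1);
    } else {
      probes += 1;
    }
  }
  return { probes, targetedByIp };
}

// Made-up sample lines (203.0.113.0/24 and 198.51.100.0/24 are reserved for documentation).
const sample = [
  '203.0.113.7 - - [01/Jan/2024:10:00:00 +0000] "GET /wp-admin HTTP/1.1" 404 0',
  '198.51.100.9 - - [01/Jan/2024:10:05:00 +0000] "GET /admin/users HTTP/1.1" 200 512',
  '198.51.100.9 - - [01/Jan/2024:10:06:00 +0000] "GET /admin/investments?page=2 HTTP/1.1" 200 900',
  '10.0.1.15 - - [01/Jan/2024:10:07:00 +0000] "GET /admin/users HTTP/1.1" 200 512',
];
const result = triage(sample);
console.log(result.probes, Object.fromEntries(result.targetedByIp));
```

A single IP with many hits on real routes, as in our case, stands out immediately in the per-IP tally.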
We treated it as a breach and acted accordingly.
Immediate Response
1. Removed the security group rule immediately.
2. Rotated all API keys, database passwords, and JWT secrets.
3. Invalidated all active user sessions (forced re-login for everyone).
4. Filed an internal security incident report.
5. Notified legal and compliance teams.
6. Reviewed the 23 specific requests to determine what data was accessed.
The 23 requests had returned data — the API had responded with 200s. We could not determine if the data was captured or discarded. We documented this as a potential data exposure event.
Structural Fixes
Fix 1: Mandatory authentication on all admin routes. Regardless of network-level access controls, every admin endpoint now requires a JWT with an admin role claim. Defense in depth: network controls and application controls.
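To make Fix 1 concrete, here is a minimal sketch of an admin-only middleware using nothing but Node's built-in crypto module. It is not our production code. It assumes HS256-signed tokens, a claim named `role` whose value is `admin`, and a Connect/Express-style `(req, res, next)` signature; a real service would more likely use a maintained JWT library.

```javascript
const crypto = require('node:crypto');

// Verify an HS256 JWT. Returns the payload, or null if the token is
// malformed, wrongly signed, or expired.
function verifyJwt(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [headerB64, payloadB64, signatureB64] = parts;

  let header;
  let payload;
  try {
    header = JSON.parse(Buffer.from(headerB64, 'base64url').toString('utf8'));
    payload = JSON.parse(Buffer.from(payloadB64, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
  // Pin the algorithm so a token cannot opt into "none" or another scheme.
  if (!header || header.alg !== 'HS256') return null;

  const expected = crypto.createHmac('sha256', secret)
    .update(`${headerB64}.${payloadB64}`)
    .digest();
  const actual = Buffer.from(signatureB64, 'base64url');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (actual.length !== expected.length) return null;
  if (!crypto.timingSafeEqual(actual, expected)) return null;

  // Require an expiry claim and reject tokens that are past it.
  if (!payload || typeof payload.exp !== 'number' || payload.exp <= nowSeconds) {
    return null;
  }
  return payload;
}

// Middleware factory: 401 without a valid token, 403 without the admin role.
function requireAdmin(secret) {
  return (req, res, next) => {
    const auth = req.headers.authorization || '';
    const token = auth.startsWith('Bearer ') ? auth.slice(7) : '';
    const payload = verifyJwt(token, secret);
    if (!payload) {
      res.statusCode = 401;
      return res.end('Unauthorized');
    }
    if (payload.role !== 'admin') { // "role" is an assumed claim name
      res.statusCode = 403;
      return res.end('Forbidden');
    }
    req.user = payload;
    return next();
  };
}
```

The important property is that the check runs on every admin route regardless of where the request came from, so a network misconfiguration alone no longer exposes data.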
Fix 2: AWS Config rules to detect open security groups. We created a Config rule that fires an alert whenever any security group inbound rule has 0.0.0.0/0 as a source for non-standard ports (anything other than 80 and 443):
// AWS Config rule pseudocode
if (rule.source === '0.0.0.0/0' && !['80', '443'].includes(rule.port)) {
  triggerAlert('Open security group detected: ' + securityGroupId);
}
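A runnable version of that check might look like the sketch below. It is an illustration, not our deployed rule. It assumes the input has the shape of a security group as returned by the EC2 `DescribeSecurityGroups` API (`IpPermissions`, `IpRanges[].CidrIp`, `Ipv6Ranges[].CidrIpv6`), and it also treats `::/0`, the IPv6 equivalent of `0.0.0.0/0`, as open to the world. The function name and the sample group ID are made up.

```javascript
const ALLOWED_PUBLIC_PORTS = new Set([80, 443]);
const ANYWHERE = new Set(['0.0.0.0/0', '::/0']);

// Return one finding per ingress permission that is open to the world
// on anything other than exactly port 80 or exactly port 443.
function findOpenIngress(securityGroup) {
  const findings = [];
  for (const perm of securityGroup.IpPermissions || []) {
    const cidrs = [
      ...(perm.IpRanges || []).map((r) => r.CidrIp),
      ...(perm.Ipv6Ranges || []).map((r) => r.CidrIpv6),
    ];
    const openCidrs = cidrs.filter((cidr) => ANYWHERE.has(cidr));
    if (openCidrs.length === 0) continue;

    // IpProtocol "-1" means all protocols, so every port is reachable.
    const allProtocols = perm.IpProtocol === '-1';
    const singleAllowedPort = !allProtocols
      && perm.FromPort === perm.ToPort
      && ALLOWED_PUBLIC_PORTS.has(perm.FromPort);

    if (!singleAllowedPort) {
      findings.push({
        groupId: securityGroup.GroupId,
        protocol: perm.IpProtocol,
        fromPort: perm.FromPort ?? null,
        toPort: perm.ToPort ?? null,
        cidrs: openCidrs,
      });
    }
  }
  return findings;
}

// Hypothetical group resembling the one from this incident.
const exampleGroup = {
  GroupId: 'sg-0123456789abcdef0',
  IpPermissions: [
    { IpProtocol: 'tcp', FromPort: 443, ToPort: 443, IpRanges: [{ CidrIp: '0.0.0.0/0' }] },
    { IpProtocol: 'tcp', FromPort: 3001, ToPort: 3001, IpRanges: [{ CidrIp: '0.0.0.0/0' }] },
    { IpProtocol: 'tcp', FromPort: 22, ToPort: 22, IpRanges: [{ CidrIp: '10.0.0.0/16' }] },
  ],
};
console.log(findOpenIngress(exampleGroup));
```

Comparing the whole port range, rather than a single port value, matters because a rule covering ports 0 to 65535 would otherwise slip past a check that only looks at the starting port.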
Fix 3: Temporary rule expiration. We evaluated AWS Network Firewall and, separately, wrote a Lambda that runs every 15 minutes. It identifies security group rules that were added outside our Terraform code and are more than 24 hours old, then sends a Slack alert. Any rule added outside of IaC is flagged as drift.
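The core decision inside such a Lambda is small. The sketch below shows only that filtering step, under stated assumptions: each rule record already carries a `createdAt` timestamp (recovered from somewhere such as CloudTrail), and the set of Terraform-managed rule IDs has been loaded from state. The field names, function name, and sample IDs are all hypothetical, and fetching the data and posting to Slack are left out.

```javascript
const MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours

// rules: [{ ruleId, groupId, createdAt }] with createdAt as an ISO 8601 string.
// managedRuleIds: Set of rule IDs that exist in Terraform state.
// Returns the rules that are both unmanaged and older than 24 hours.
function findStaleUnmanagedRules(rules, managedRuleIds, now = new Date()) {
  return rules.filter((rule) => {
    if (managedRuleIds.has(rule.ruleId)) return false; // defined in IaC
    const ageMs = now.getTime() - new Date(rule.createdAt).getTime();
    return ageMs > MAX_AGE_MS;
  });
}

// Made-up data: one managed rule, one fresh manual rule, one stale manual rule.
const now = new Date('2024-01-12T00:00:00Z');
const rules = [
  { ruleId: 'sgr-managed', groupId: 'sg-1', createdAt: '2023-06-01T00:00:00Z' },
  { ruleId: 'sgr-fresh', groupId: 'sg-1', createdAt: '2024-01-11T12:00:00Z' },
  { ruleId: 'sgr-stale', groupId: 'sg-1', createdAt: '2024-01-01T00:00:00Z' },
];
const stale = findStaleUnmanagedRules(rules, new Set(['sgr-managed']), now);
console.log(stale.map((rule) => rule.ruleId));
```

Passing `now` as a parameter keeps the function easy to test with fixed timestamps.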
Fix 4: Infrastructure as Code enforcement. All security group changes now go through Terraform. Direct console edits to security groups generate a CloudTrail event that triggers a Lambda alert. If the change is not in Terraform, it gets flagged and reviewed within 1 hour.
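One way to decide whether a CloudTrail record represents drift is to check which identity made the call. The sketch below is illustrative only. It assumes Terraform runs under a dedicated IAM role (the role name `terraform-deployer` is made up) and that the function receives the CloudTrail record itself; when the Lambda is triggered through EventBridge, that record sits under `event.detail`. Reading the group ID from `requestParameters.groupId` is also an assumption about the record layout.

```javascript
// EC2 API calls that change security group rules.
const SG_MUTATIONS = new Set([
  'AuthorizeSecurityGroupIngress',
  'AuthorizeSecurityGroupEgress',
  'RevokeSecurityGroupIngress',
  'RevokeSecurityGroupEgress',
  'ModifySecurityGroupRules',
]);

// Hypothetical: ARN fragment of the role our Terraform pipeline assumes.
const TERRAFORM_ROLE_MARKER = ':assumed-role/terraform-deployer/';

// Classify one CloudTrail record as irrelevant, expected (Terraform), or drift.
function classifySecurityGroupEvent(record) {
  const isSgChange = record.eventSource === 'ec2.amazonaws.com'
    && SG_MUTATIONS.has(record.eventName);
  if (!isSgChange) return { relevant: false };

  const actor = (record.userIdentity && record.userIdentity.arn) || 'unknown';
  return {
    relevant: true,
    drift: !actor.includes(TERRAFORM_ROLE_MARKER),
    actor,
    eventName: record.eventName,
    groupId: (record.requestParameters && record.requestParameters.groupId) || null,
  };
}

// Made-up record resembling a manual change by a developer.
const manualChange = {
  eventSource: 'ec2.amazonaws.com',
  eventName: 'AuthorizeSecurityGroupIngress',
  userIdentity: { arn: 'arn:aws:sts::123456789012:assumed-role/developer/alice' },
  requestParameters: { groupId: 'sg-0123456789abcdef0' },
};
console.log(classifySecurityGroupEvent(manualChange));
```

Anything classified as drift would then be sent to the alerting channel for review.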
Fix 5: Unix domain socket for the internal admin API. Our internal admin API now listens on a Unix socket, not a TCP port. The admin frontend on the same instance communicates via the socket. There is no TCP port to accidentally expose.
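For readers who have not served HTTP over a Unix socket before, here is a minimal Node.js sketch of both sides. It is not our actual service: the handler is a placeholder, the function names are made up, and the socket path is whatever the caller passes in. It relies on `server.listen(path)` and the `socketPath` request option from Node's built-in `http` module.

```javascript
const http = require('node:http');
const fs = require('node:fs');

// Start the admin API on a Unix domain socket instead of a TCP port.
function startAdminApi(socketPath, onReady) {
  // A stale socket file from a previous run would make listen() fail.
  if (fs.existsSync(socketPath)) fs.unlinkSync(socketPath);

  const server = http.createServer((req, res) => {
    // The authentication from Fix 1 would still run here; omitted for brevity.
    res.setHeader('Content-Type', 'application/json');
    res.end(JSON.stringify({ ok: true, path: req.url }));
  });

  server.listen(socketPath, () => {
    // Restrict the socket file to the owning user.
    fs.chmodSync(socketPath, 0o600);
    if (onReady) onReady(server);
  });
  return server;
}

// Client side: pass socketPath in place of host and port.
function getViaSocket(socketPath, path, callback) {
  http.get({ socketPath, path, agent: false }, (res) => {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => callback(null, res.statusCode, body));
  }).on('error', callback);
}

// Usage (the path is hypothetical):
// startAdminApi('/run/admin-api/admin.sock', () => console.log('listening'));
```

Because the listener is a file on the local filesystem, no security group rule can make it reachable from the network; access is governed by file permissions instead.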
What I Learned About Cloud Security
The biggest lesson from this incident was not technical. It was operational.
Cloud environments change fast. Developers add security group rules to debug issues. They mean to clean up. They do not. Nobody notices for 11 days because there is no automated detection.
Security groups are not just a firewall configuration. They are a security boundary. Any change to them, even temporary, should require the same review process as a code change.
Three things would have prevented this incident, or at least cut it short:
1. Authentication on every admin endpoint, regardless of network location
2. Automated detection of 0.0.0.0/0 rules on non-standard ports
3. A cultural norm that temporary security group changes require a Jira ticket with an expiry date
We now have all three. But we had to learn the hard way.