
I found a bug in Rusoto, the best & maybe the only AWS SDK for the Rust programming language. It was painful and irritating, but the feeling I had when the maintainers confirmed that it was actually a bug was amazing!
What happened?
I was developing a Kinesis Consumer Client Library (KCL) in Rust, and suddenly after a month the AWS CloudTrail bill increased by a couple of thousand USD. What the fuck happened? It was AssumeRole for my KCL!!
The bug in a nutshell: the AssumeRole API was getting called 1 million times in 1 hour instead of only once (as it should be).
Bug Effect
- It was a multi-account AWS setup.
- The KCL was in account A & the Kinesis stream itself was in account B.
- AssumeRole is used for cross-account authentication, so to use a service in another account you have to call AssumeRole first.
- An AssumeRole session lives for 1 hour & could be extended to 12 hours with AWS support help.
- You have to use this session while calling any API in the other account & when it expires, call the AssumeRole API again -> the bug in Rusoto.
- There was an AssumeRole request happening alongside every other API request (~17,000 per minute).
- So it was ~1M requests per hour when it was supposed to be only one request.
- That caused throttling of the API because of AWS rate limits on the AssumeRole API.
- Throttling the API resulted in way more logs on CloudTrail, to notify that someone is abusing the AssumeRole API.
- More logs on CloudTrail caused the increase in the bill.
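The arithmetic behind those numbers is easy to check. A minimal sketch, where `requests_per_hour` is a helper name made up for this illustration and the 17,000 per minute figure is the rate quoted above:

```rust
/// Requests per hour for a given per-minute request rate.
/// (Hypothetical helper, only for this illustration.)
fn requests_per_hour(per_minute: u64) -> u64 {
    per_minute * 60
}

fn main() {
    // ~17,000 AssumeRole calls per minute, as observed in the list above.
    let observed = requests_per_hour(17_000);
    // With a one-hour session, a single AssumeRole call per hour is enough.
    let expected: u64 = 1;
    println!(
        "observed: {} AssumeRole calls/hour, expected: {}",
        observed, expected
    );
}
```

That is 1,020,000 calls per hour, which is where the "~1M" comes from.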
Bug Details
- The session_duration parameter is not used for caching.
- It causes:
  - A huge performance issue, because it’s 2 requests instead of 1 (your API request + the AssumeRole request).
  - Throttling of the AssumeRole API if you have a high load, which costs more money if CloudTrail is enabled.
- The session is valid for one hour, so it should be reused until it expires.
Example: the Kinesis GetRecords API was calling AssumeRole with each request instead of using the cached value.
The solution is really simple & it’s one line of code: use rusoto_credential::AutoRefreshingProvider to wrap the StsAssumeRoleSessionCredentialsProvider.
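To show why the wrapper matters, here is a toy model of the idea using only the standard library. It is not Rusoto's code: `Credentials`, `ProvideCredentials`, `AssumeRoleProvider`, `Cached`, `call_api` and `assume_role_calls` are all names I made up for this sketch, and time is passed in by hand so one hour can be simulated instantly.

```rust
use std::time::{Duration, Instant};

/// Stand-in for the temporary credentials an AssumeRole call returns.
#[derive(Clone, Debug)]
struct Credentials {
    token: String,
    expires_at: Instant,
}

/// Hypothetical trait: "something that can hand out credentials".
trait ProvideCredentials {
    fn credentials(&mut self, now: Instant) -> Credentials;
}

/// Models the uncached provider: every call is a fresh AssumeRole request.
struct AssumeRoleProvider {
    session_duration: Duration,
    calls: u64,
}

impl ProvideCredentials for AssumeRoleProvider {
    fn credentials(&mut self, now: Instant) -> Credentials {
        self.calls += 1; // one (simulated) AssumeRole request
        Credentials {
            token: format!("session-{}", self.calls),
            expires_at: now + self.session_duration,
        }
    }
}

/// Caching wrapper: reuses the session until it expires.
/// In Rusoto the analogous step is `AutoRefreshingProvider::new(provider)`.
struct Cached<P> {
    inner: P,
    current: Option<Credentials>,
}

impl<P: ProvideCredentials> ProvideCredentials for Cached<P> {
    fn credentials(&mut self, now: Instant) -> Credentials {
        if let Some(c) = &self.current {
            if now < c.expires_at {
                return c.clone(); // still valid, no AssumeRole request
            }
        }
        let fresh = self.inner.credentials(now);
        self.current = Some(fresh.clone());
        fresh
    }
}

/// Models one API request that needs the session token; returns its length.
fn call_api<P: ProvideCredentials>(provider: &mut P, now: Instant) -> usize {
    provider.credentials(now).token.len()
}

/// Spreads `requests` API calls evenly over one hour and returns how many
/// simulated AssumeRole requests were made.
fn assume_role_calls(requests: u32, use_cache: bool) -> u64 {
    assert!(requests > 0, "need at least one request");
    let hour = Duration::from_secs(3600);
    let start = Instant::now();
    let step = hour / requests;
    let inner = AssumeRoleProvider { session_duration: hour, calls: 0 };
    if use_cache {
        let mut provider = Cached { inner, current: None };
        for i in 0..requests {
            call_api(&mut provider, start + step * i);
        }
        provider.inner.calls
    } else {
        let mut provider = inner;
        for i in 0..requests {
            call_api(&mut provider, start + step * i);
        }
        provider.calls
    }
}

fn main() {
    println!("uncached: {} AssumeRole calls", assume_role_calls(1000, false));
    println!("cached:   {} AssumeRole calls", assume_role_calls(1000, true));
}
```

With 1000 simulated requests inside a single hour, the uncached provider makes 1000 AssumeRole calls and the cached one makes 1, which is the same shape as the ~1M vs. 1 difference I saw in production.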
Post Mortem
I struggled a lot to convince myself that the bug was in Rusoto. It’s the best SDK out there, they can’t have such a bug & if they do, I won’t be the first to find it. No way!
Also, because I’m a newbie in Rust, it was very hard to debug the code as it’s really complicated.
This was a mistake. I would have saved a lot of time and effort if this hadn’t been my mindset.
Have faith in yourself!