TL;DR: We ended up tuning the ARP cache on our EC2 instances.
This article was posted on our Medium Engineering Blog.