Amazon Web Services Outage
Amazon Web Services (AWS), the Internet infrastructure service that is the backbone of many websites and apps, experienced a long outage last week that affected a large portion of the Internet. Amazon now says service has been almost fully restored. A “relatively small addition of capacity” to the Amazon Kinesis real-time data processing service triggered the widespread outage, the company said.
Amazon said that it was working to add new computer servers, but this created a large number of errors that took down Web-connected security cameras, robotic vacuums and publishing sites.
Within a few hours, the malfunctions began hitting customers of Amazon Web Services, the company’s cloud-computing unit. Customers of the Amazon-owned Ring security camera service couldn’t log in or watch video. Users struggled to operate their iRobot vacuum cleaners because the outage affected the iRobot Home app.
Adobe, the multinational software company, was also affected, although it was among the first AWS customers to report an all-clear.
Media companies, including The Washington Post (owned by Amazon founder and chief executive Jeff Bezos), also experienced publishing system outages.
“We have restored all traffic to Kinesis Data Streams via all endpoints and it is now operating normally. We have also resolved the error rates invoking CloudWatch APIs,” reads an update on the AWS Service Health Dashboard. “We continue to work towards full recovery for IoT Site-Wise and details of the service status is below. All other services are operating normally. We have identified the root cause of the Kinesis Data Streams event, and have completed immediate actions to prevent recurrence.”
AWS is one of the most widely used cloud computing services in the world, so any issues can have major ripple effects for other web services and apps, as evidenced by the number of companies affected by the outage.
Amazon apologised and said it would apply lessons learned to further improve its reliability: “While we are proud of our long track record of availability with Amazon Kinesis, we know how critical this service, and the other AWS services that were impacted, are to our customers, their applications and end users, and their businesses. We will do everything we can to learn from this event and use it to improve our availability even further.”
CloudWatch and other large AWS services will move to a separate, partitioned front-end fleet. Amazon is also working on a broader project to isolate failures in one service so that they do not affect other services. This is the first major outage to interrupt many customers since 2017, when websites hosted in US-EAST-1, the same region affected this time, went offline.