Admin 2 » knocktotalk

Major Microsoft Outage! Azure and 365 Services Down

Jul 30, 2024

Microsoft is investigating an ongoing global outage blocking access to some Microsoft 365 and Azure services.

According to DownDetector, which tracks IT outages globally, Minecraft, Microsoft 365, Microsoft Teams, and Microsoft Azure are among the affected services.

Microsoft confirmed which services the outage did and did not impact:

Impacted Services	Not Impacted Services
Microsoft 365 admin center	SharePoint Online
Intune	OneDrive for Business
Entra	Microsoft Teams
Power Platform services	Exchange Online

Microsoft said the incident resulted from a faulty configuration change deployed by an Azure backend workflow. As Redmond explained:

“A backend cluster management workflow deployed a configuration change causing backend access to be blocked between a subset of Azure Storage clusters and compute resources in the Central US region.”

“This resulted in the compute resources automatically restarting when connectivity was lost to virtual disks.”

“Users who can access the impacted Microsoft 365 services may experience latency or degraded feature performance,” Microsoft explains on the service health status page.

“We’re analyzing traffic patterns within a section of a networking infrastructure to assist our investigations. Additionally, we’re reviewing mitigation options, including potential failovers, to provide relief.”

After acknowledging the outage, Microsoft 365 stated on social media:

“We’re currently investigating access issues and degraded performance with multiple Microsoft 365 services and features. More information can be found under MO842351 in the admin center.”

As of the last update at 18:13 UTC, a subset of customers may have experienced issues connecting to Microsoft services globally.

“We are investigating reports of issues connecting to Microsoft services globally. Customers may experience timeouts connecting to Azure services,” Redmond says on the Azure status page.

“We have multiple engineering teams engaged to diagnose and resolve the issue. More details will be provided as soon as possible.”

Microsoft 365 customers experienced another severe outage earlier this month, when a faulty CrowdStrike Falcon update caused a widespread Windows outage, crashing systems with blue screen of death (BSOD) errors and impacting many organizations and services worldwide, including banks, airlines, airports, TV stations, and hospitals.

Follow the Knocktotalk blog and social media to stay up to date with technology.

CrowdStrike’s Sensor Issue – From Crises to Solution

Jul 27, 2024

Last week a major incident created a buzz in the world of IT. CrowdStrike, a well-known company that offers cloud workload protection, endpoint security, threat intelligence, and cyberattack response services, shipped a faulty update. The update caused its Falcon sensor to conflict with the Windows operating system, resulting in the infamous “Blue Screen of Death”. Within about a week, the company had largely overcome the disruption, now known as the global IT outage.

David Weston, Microsoft VP of Enterprise and OS Security, shared some figures on the incident: “We currently estimate that CrowdStrike’s update affected 8.5 million Windows devices, or less than one percent of all Windows machines. While the percentage was small, the broad economic and societal impacts reflect the use of CrowdStrike by enterprises that run many critical services.”

On Saturday, David Weston described Microsoft’s “first responder” approach: since the start, the company has engaged over 5,000 support engineers working 24×7 to help bring critical services back online. Microsoft is providing ongoing updates via the Windows release health dashboard, where it details remediation steps, including a signed Microsoft Recovery Tool.

George Kurtz, CEO of CrowdStrike, expressed gratitude for the recovery effort: “I want to share that over 97% of Windows sensors are back online as of July 25. This progress is thanks to the tireless efforts of our customers, partners, and the dedication of our team.”

He apologized for the disruption and promised to resolve it urgently. He stated, “To our customers still affected, please know we will not rest until we achieve full recovery. At CrowdStrike, our mission is to earn your trust by safeguarding your operations. I am deeply sorry for the disruption this outage has caused and personally apologize to everyone impacted. While I can’t promise perfection, I can promise a response that is focused, effective, and with a sense of urgency.”

Follow Knocktotalk to stay up to date with cyber updates and solutions.

Issue Resolved – Apology from CrowdStrike CEO

Jul 20, 2024

CrowdStrike CEO George Kurtz has apologized to the company’s customers and partners for crashing their Windows systems, and the company has described the error that caused the disaster.

The issue has been identified, isolated and a fix has been deployed.

George Kurtz, CrowdStrike founder and CEO, said:

“I want to sincerely apologize directly to all of you for today’s outage. All of CrowdStrike understands the gravity and impact of the situation. We quickly identified the issue and deployed a fix, allowing us to focus diligently on restoring customer systems as our highest priority.

The outage was caused by a defect found in a Falcon content update for Windows hosts. Mac and Linux hosts are not impacted. This was not a cyberattack.

CrowdStrike is operating normally, and this issue does not affect our Falcon platform systems. There is no impact to any protection if the Falcon sensor is installed. Falcon Complete and Falcon OverWatch services are not disrupted.

Nothing is more important to me than the trust and confidence that our customers and partners have put into CrowdStrike. As we resolve this incident, you have my commitment to provide full transparency on how this occurred and steps we’re taking to prevent anything like this from happening again.”

Ref: CrowdStrike Blog Statement on IT Outage

Details

Symptoms include hosts experiencing a bugcheck\blue screen error related to the Falcon sensor.
Windows hosts which have not been impacted do not require any action as the problematic channel file has been reverted.
Windows hosts which are brought online after 0527 UTC will also not be impacted.
This issue is not impacting Mac- or Linux-based hosts.
Channel file “C-00000291*.sys” with timestamp of 0527 UTC or later is the reverted (good) version.
Channel file “C-00000291*.sys” with timestamp of 0409 UTC is the problematic version.

Note: It is normal for multiple “C-00000291*.sys” files to be present in the CrowdStrike directory. As long as one of the files in the folder has a timestamp of 0527 UTC or later, that will be the active content.
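For admins who want to check a machine by hand, the timestamp rule above can be sketched in Python. This is a minimal sketch under stated assumptions, not CrowdStrike's own tooling: the incident date (July 19, 2024), the default directory path, and the function names `classify` and `scan` are assumptions that do not come from the advisory text. Only the file pattern and the 0409 and 0527 UTC times are taken from it.

```python
from datetime import datetime, timezone
from pathlib import Path

# Assumption: the advisory gives only times; the date 2024-07-19 is supplied here.
PROBLEMATIC = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
REVERTED = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)

# Assumption: a commonly cited install location; adjust for your environment.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")


def classify(mtime: datetime) -> str:
    """Classify a channel file by its UTC timestamp."""
    if mtime >= REVERTED:
        return "good"  # 0527 UTC or later: the reverted version
    # The advisory gives HHMM only, so compare at minute resolution.
    if mtime.replace(second=0, microsecond=0) == PROBLEMATIC:
        return "problematic"  # 0409 UTC: the defective version
    return "other"  # any other timestamp: not covered by the advisory


def scan(directory: Path = DEFAULT_DIR) -> dict[str, str]:
    """Map each C-00000291*.sys file name in the directory to its classification."""
    results = {}
    for path in sorted(directory.glob("C-00000291*.sys")):
        mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        results[path.name] = classify(mtime)
    return results
```

Following the note above, a host is on the reverted content if at least one file is classified as good, which can be checked with `"good" in scan().values()`.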

“CrowdStrike has corrected the logic error by updating the content in Channel File 291,” CrowdStrike concluded in its blog.

That did not, however, solve the problem for the many Windows machines that had already downloaded the defective content and crashed.