Microsoft’s Data Leak: The Unexpected Challenge of AI Exposure

Microsoft inadvertently exposed 38 terabytes of sensitive data

The exposure occurred while the company was sharing open-source AI training data through its GitHub repository. The repository, named “robust-models-transfer”, is no longer accessible. It accompanied a 2020 research paper on adversarially robust ImageNet models.

Behind the Breach

Azure’s Shared Access Signature (SAS) tokens, a key feature for sharing storage data, were at the center of this mishap. A SAS token grants access to specified resources, defines which operations the holder may perform, and can restrict access to particular networks (IP address ranges).
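A SAS token travels as query-string parameters appended to a storage URL, so anyone holding the URL can read exactly what it grants. The following minimal sketch uses only the Python standard library to pull those parameters out of a URL. The account name, container, blob, IP range and signature in the example URL are made up for illustration; the parameter names (`sv`, `sr`, `sp`, `st`, `se`, `sip`, `spr`, `sig`, and so on) are the standard Azure SAS fields.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical example URL: the account, container, blob, IP range and
# signature are invented for illustration only.
EXAMPLE_SAS_URL = (
    "https://exampleaccount.blob.core.windows.net/models/resnet50.ckpt"
    "?sv=2022-11-02&sr=b&sp=r&st=2023-06-01T00:00:00Z"
    "&se=2023-06-02T00:00:00Z&sip=203.0.113.0-203.0.113.255&spr=https&sig=REDACTED"
)

# Meaning of the common SAS query parameters.
SAS_FIELDS = {
    "sv": "signed version (storage service version)",
    "ss": "signed services (account SAS only)",
    "srt": "signed resource types (account SAS only)",
    "sr": "signed resource (service SAS: b=blob, c=container)",
    "sp": "signed permissions (r=read, w=write, d=delete, l=list, ...)",
    "st": "start time (UTC)",
    "se": "expiry time (UTC)",
    "sip": "allowed IP address or range",
    "spr": "allowed protocol(s)",
    "si": "stored access policy identifier",
    "sig": "signature computed with the account or delegation key",
}


def describe_sas(url: str) -> dict:
    """Return the recognised SAS query parameters of a URL, keyed by name."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; each SAS parameter appears once.
    return {key: values[0] for key, values in query.items() if key in SAS_FIELDS}


if __name__ == "__main__":
    for key, value in describe_sas(EXAMPLE_SAS_URL).items():
        print(f"{key:4} {value:35} {SAS_FIELDS[key]}")
```

In this example the token is scoped to a single blob (`sr=b`), is read-only (`sp=r`), is valid for one day, and only works over HTTPS from one IP range. Because every one of these limits is visible in the URL itself, the same parsing step is a natural starting point for reviewing tokens before they are shared.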

However, a Microsoft employee accidentally shared an overly permissive SAS token, exposing far more of an internal storage account than just the intended AI models.

The exposed data included backups of two former Microsoft employees’ workstations. These backups contained passwords for Microsoft services, secret encryption keys, and over 30,000 internal Microsoft Teams messages from more than 350 Microsoft employees.

The Potential Risks and Prevention

The risk went beyond data exposure. Wiz researchers pointed out that, because the token allowed files to be modified as well as read, malicious actors could have altered the AI models in the storage account. Any user who trusted Microsoft’s GitHub repository and downloaded those models could then have been affected.

Microsoft learned of the exposure on June 22 and revoked the problematic SAS token within two days. It stated that the only data exposed was related to the two former employees’ workstations, and emphasized that no customer data was exposed and no other Microsoft services were compromised.

In response, Microsoft improved its secret scanning on GitHub to detect and warn about overly permissive SAS tokens. It also recommended best practices for SAS tokens, including:

applying the principle of least privilege,
using short-lived SAS tokens,
treating SAS tokens as application secrets,
having a robust revocation plan,
regularly monitoring and auditing applications.

However, at Kaduu we caution against using SAS tokens for external sharing due to their inherent management challenges.

If you liked this article, we recommend our previous article about the massive cyberattack on MGM Resorts. Follow us on Twitter and LinkedIn for more content.

Stay up to date with exposed information online. Kaduu’s cyber threat intelligence service offers affordable insight into the darknet, social media and the deep web.
