It is not uncommon to observe complacency when securing digital identities: organisations mandate strong password requirements and enforce multifactor authentication (MFA). Then, if a user account’s credentials are compromised, the threat actor will be thwarted by MFA. Job done! Well, not quite.
No doubt, readers will all be aware that threat actors persistently invent novel techniques to bypass the myriad of security controls deployed by organisations. MFA is no exception, and one such technique garnering notoriety of late is MFA fatigue (T1621), also known as MFA spamming or MFA bombing. Our industry is increasingly observing breaches resulting from MFA bypass techniques, and even state-backed threat actors such as Cozy Bear (APT29) are known to employ this technique. Notable breaches leveraging MFA attacks to gain initial access include the infamous Uber breach attributed to the Lapsus$ group (Gatlan, 2022) along with several US government entities compromised by a Russian threat actor (Jenkins, Hawley, Najafi, and Bienstock, 2021). These types of attacks often also leverage social engineering to coerce victims into providing their MFA codes to attackers, rendering security controls ineffective. This underlines the importance of deploying robust detection capabilities in conjunction with best-practice security controls.
There are several ways this attack can be carried out, but all of them rely on the attacker repeatedly prompting the target victim with MFA notifications until the victim accepts. This blog will provide an in-depth breakdown of Azure Active Directory’s (AAD) SigninLogs schema so that readers can learn how these log records are generated and how they can be leveraged to detect MFA fatigue attacks. It will also provide a KQL query that you can immediately deploy to Sentinel to detect various methods of MFA fatigue attack. Finally, it will recommend security controls to help prevent MFA fatigue attacks.
Understanding Sign in Logs
Successful detection engineering relies upon a comprehensive understanding of our data sources. We will utilise the SigninLogs table in Microsoft Sentinel to build our alert rule, so we must recognise a few key points around the data schema before designing our queries. AAD provides a powerful activity log which captures sign-in events from users against applications and services that are authenticated via AAD. Log records include numerous fields that provide valuable information about the authentication event, including:
- Timestamps of the various authentication flow events
- User principal name
- Source IP address
- Target application
- Authentication protocol
- User agent
- Result status and error codes
- Information about the source device
The various timestamp columns in the schema must be clearly understood. Currently there are:
- TimeGenerated: based on the time the log record was ingested into Log Analytics. In my testing, I have observed up to a one-hour delay between the initial sign in attempt and the final log ingestion event into Sentinel (see image). This is because Microsoft continues to generate new log records as more information is gathered about the authentication event.
- CreatedDateTime: based on the start and completion of the sign in event.
- authenticationStepDateTime: This field is nested as a key value pair inside the AuthenticationDetails array and is based on the time of each authentication flow event. Note that Microsoft round this number down, meaning it is not accurate to the microsecond and can cause issues when trying to find a maximum timestamp.
Each event in the SigninLogs table will include all three timestamps. Analysis of SigninLogs records should focus on CreatedDateTime and authenticationStepDateTime rather than TimeGenerated, as the latter won’t provide accurate timestamps for the sign in activity itself.
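To see how far TimeGenerated drifts from CreatedDateTime in your own workspace, a query along the following lines may help. It is a minimal sketch rather than a tested rule: the one-day lookback and the output column names (Delay, MaxDelay, Records) are assumptions chosen for illustration.

```kql
// Sketch: compare the ingestion-based timestamp with the sign in start time.
// Assumption: a one-day lookback is enough to see representative delays.
SigninLogs
| where TimeGenerated > ago(1d)
| extend Delay = TimeGenerated - CreatedDateTime      // gap between sign in start and this record
| summarize Records = count(), MaxDelay = max(Delay) by CorrelationId
| top 20 by MaxDelay desc                             // sessions whose records arrived latest
```

Sessions with a large MaxDelay are the ones where relying on TimeGenerated alone would misplace the sign in activity in time.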
Next, there are three other fields which prove particularly useful for correlating sign in events. These are UserPrincipalName, CorrelationId, and AuthenticationDetails which are also included within each log event in the table.
Correlation ID is a unique identifier for that sign in session. One sign in attempt can have multiple distinct log events based on the authentication flow (entering password, prompting for MFA, accepting MFA, and successful sign in), but all of those events will contain the same CorrelationId value. For example, when I sign into the Azure Portal, three SigninLogs events are generated, one for each stage of the authentication flow:
You will note that the CorrelationId value is the same for each log event.
The AuthenticationDetails is a field containing a JSON array that provides information about the authentication flow. Each stage of the sign in attempt will generate an updated field in the AuthenticationDetails array which contains the timestamp and status of that authentication flow event. This is an important consideration for MFA sign ins, as the various MFA check events will all generate additional SigninLog records reflecting the authentication flow.
In the example above, when I enter my password, a new SigninLog event was generated which includes an AuthenticationDetails array. That array provides additional details about that stage in the sign in flow event, which in this example was me correctly entering my password:
Next, when prompted for MFA, a second SigninLogs record is generated. The log record is the same except for the AuthenticationDetails column, which now includes a new field in the array detailing the MFA prompt:
Finally, after entering my MFA code, the third log record is generated, again with an updated AuthenticationDetails column containing a new field stating that MFA was successfully completed:
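To inspect this progression for a single session in your own workspace, something like the following sketch could be used. The CorrelationId value is a hypothetical placeholder to replace with a real one, and the Step, StepTime, StepResult and FirstSeen names are illustrative assumptions. Only the two nested keys discussed in this post are extracted.

```kql
// Sketch: list the distinct authentication steps recorded for one sign in session.
let targetCorrelationId = "<correlation-id>";          // hypothetical placeholder
SigninLogs
| where CorrelationId == targetCorrelationId
| mv-expand Step = todynamic(AuthenticationDetails)    // one row per entry in the array
| extend StepTime = todatetime(Step.authenticationStepDateTime),
         StepResult = tostring(Step.authenticationStepResultDetail)
| summarize FirstSeen = min(TimeGenerated) by StepTime, StepResult  // collapse steps repeated across records
| order by StepTime asc
```

Because later log records repeat the earlier steps, the summarize line collapses them so that each step appears once, alongside the time its record first arrived.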
ResultType / ResultDescription
Also worth noting is that Microsoft include several different results based on the outcome of the authentication attempt. For example, 500121 is the ResultType for a logon which failed MFA, while ResultType 0 is for successful logons.
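A quick way to see which results occur in a workspace, and how many log records each one produces per session, is sketched below. The one-day lookback and the Records and Sessions column names are assumptions for illustration.

```kql
// Sketch: compare record counts with session counts for each sign in result.
SigninLogs
| where TimeGenerated > ago(1d)
| summarize Records = count(), Sessions = dcount(CorrelationId) by ResultType, ResultDescription
| order by Records desc
```

Where Records is much larger than Sessions for a given ResultType, a single sign in session is generating several log records with that result.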
Putting it all together
In essence, here is a step-by-step breakdown of how the log records are generated:
- When you attempt an interactive sign in by typing in your password, a new log record will be generated with a time stamp captured in the CreatedDateTime field based on when the logon was initiated.
- Then each stage of the authentication flow will generate a new log record in the SigninLogs table but with the same CreatedDateTime and CorrelationId values.
- However, each new log record will have an updated row in the AuthenticationDetails array detailing the stage of the sign in flow with a corresponding timestamp for that event. Note that if you begin a new authentication flow, for example by cancelling the first MFA prompt and beginning the MFA check flow again, a new AuthenticationDetails array will be created.
- New log records will continuously be generated as Microsoft gathers more information about the sign in event. As stated previously, this can take up to one hour.
Most important to remember is that the CreatedDateTime field in the log record is based on the start of the sign in session. Subsequent log records are generated with a growing AuthenticationDetails array that tracks the authentication progress, and each field in that array has its own timestamp corresponding to the time of that authentication flow event.
Leveraging this to detect MFA fatigue attacks
Next, we must understand the different methods in which an MFA fatigue attack can be carried out.
In the first method, the threat actor (TA) enters the correct password, moves to the MFA screen, and waits for the legitimate end user to accept the MFA prompt. If it times out, the TA prompts for MFA again within that same session. They keep doing this until the user accepts the MFA prompt. This will all be recorded as a single authentication flow.
This will show in the SigninLogs as a single sign in attempt – only one CorrelationId – but will have multiple individual log records capturing the authentication flow. As this is all a single authentication flow, the AuthenticationDetails array will continuously be updated with each flow event. The following log records correspond to a sign in session where I prompted for MFA 3 times after entering my password:
Note that the AuthenticationFlowEvents is a custom column illustrating the number of fields in the AuthenticationDetails array. You can see that in this example, there are 11 log records with the same CorrelationId and CreatedDateTime, but the AuthenticationDetails array contains more fields as time goes on.
Expanding the final AuthenticationDetails array shows that there were 3 MFA deny events for this particular sign in session.
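The exact definition of the custom AuthenticationFlowEvents column is not shown in this post. One plausible way to compute it is with array_length, as in this sketch. The CorrelationId value is a hypothetical placeholder.

```kql
// Sketch: show how the AuthenticationDetails array grows across records of one session.
SigninLogs
| where CorrelationId == "<correlation-id>"            // hypothetical placeholder
| extend AuthenticationFlowEvents = array_length(todynamic(AuthenticationDetails))
| project TimeGenerated, CreatedDateTime, CorrelationId, ResultType, AuthenticationFlowEvents
| order by TimeGenerated asc
```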
Alternatively, the TA could attempt to log into an application by entering the correct password then prompting for MFA. If they do not get a response, they could restart the MFA authentication flow by cancelling the first prompt and then re-prompting from that session. This will be recorded as multiple different authentication flows originating from a single session.
This will show in the SigninLogs as a single sign in attempt – only one CorrelationId – but will have multiple individual log records capturing the authentication flow. However, as the TA has begun a new MFA authentication flow, the AuthenticationDetails array will only capture the single MFA prompt from each MFA check, meaning that there will never be a single AuthenticationDetails array that contains the total MFA prompts. The following log records correspond to a sign in session where I restarted the MFA flow 6 times:
In this case, there were 41 log records with the same CorrelationId value but distinct AuthenticationDetails arrays. No AuthenticationDetails array included more than one MFA deny event.
In the third method, the TA repeatedly begins new sign in sessions, inputs the correct credentials, and prompts for MFA. They may do this across a range of applications or by ending the session each time an MFA prompt is denied or times out.
This will appear in the SigninLogs as several distinct sign in attempts – multiple different CorrelationIds all with unique AuthenticationDetails arrays. The following log records correspond to 7 different sign in sessions which all prompted for MFA:
In this case, there were 29 log records which had different CorrelationId and CreatedDateTime values based on the sign in session. Each AuthenticationDetails array included the MFA deny event that corresponded to the session.
Understanding the difference in how the log records are created is crucial as it will influence our query logic. Because of the way SigninLogs records are constructed, the obvious ways to detect MFA fatigue attacks won’t work: counting logon failures with a ResultType of 500121 (Authentication failed during strong authentication request), or counting the number of distinct CorrelationIds where the authentication resulted in an MFA failure. Either you will count duplicate events, or you will miss events where a single CorrelationId has multiple distinct MFA prompts. Those queries may also fail to detect sessions which contained multiple MFA prompts but resulted in a successful logon.
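The counting problem described above can be reproduced on a handful of toy rows. Everything in the datatable below is invented for illustration (session names, timestamps, and the wording after "MFA denied"). Only the column names and the two nested keys come from the schema discussed in this post.

```kql
// Toy data only: session A has three MFA denies in one flow, session B has one deny logged twice.
let ToySignins = datatable(CorrelationId:string, ResultType:string, AuthenticationDetails:dynamic)
[
    "A", "500121", dynamic([{"authenticationStepDateTime":"2023-01-01T10:00:05Z","authenticationStepResultDetail":"MFA denied; user declined the authentication"}]),
    "A", "500121", dynamic([{"authenticationStepDateTime":"2023-01-01T10:00:05Z","authenticationStepResultDetail":"MFA denied; user declined the authentication"},{"authenticationStepDateTime":"2023-01-01T10:01:10Z","authenticationStepResultDetail":"MFA denied; user declined the authentication"}]),
    "A", "500121", dynamic([{"authenticationStepDateTime":"2023-01-01T10:00:05Z","authenticationStepResultDetail":"MFA denied; user declined the authentication"},{"authenticationStepDateTime":"2023-01-01T10:01:10Z","authenticationStepResultDetail":"MFA denied; user declined the authentication"},{"authenticationStepDateTime":"2023-01-01T10:02:20Z","authenticationStepResultDetail":"MFA denied; user declined the authentication"}]),
    "B", "500121", dynamic([{"authenticationStepDateTime":"2023-01-01T10:05:00Z","authenticationStepResultDetail":"MFA denied; user declined the authentication"}]),
    "B", "500121", dynamic([{"authenticationStepDateTime":"2023-01-01T10:05:00Z","authenticationStepResultDetail":"MFA denied; user declined the authentication"}])
];
// Naive approach 1: count every record with ResultType 500121.
let NaiveRows = toscalar(ToySignins | where ResultType == "500121" | count);
// Naive approach 2: count distinct sessions that ended in an MFA failure.
let NaiveSessions = toscalar(ToySignins | where ResultType == "500121" | summarize dcount(CorrelationId));
ToySignins
| mv-expand Step = AuthenticationDetails
| extend StepTime = todatetime(Step.authenticationStepDateTime),
         StepResult = tostring(Step.authenticationStepResultDetail)
| where StepResult has "MFA denied"
| distinct CorrelationId, StepTime                     // one row per unique deny event
| summarize DistinctMfaDenies = count()
| extend RowCount500121 = NaiveRows, SessionCount500121 = NaiveSessions
```

On these toy rows the record count is 5 and the session count is 2, while the number of distinct MFA deny events is 4. The first naive count overstates the activity and the second understates it.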
Therefore, to detect MFA fatigue attacks, the query will need to count the number of distinct MFA prompts per sign in session. We can achieve this via a single query which performs the following steps:
- Unpacks the AuthenticationDetails array for each log record.
- Extracts the unique timestamps that correspond to MFA prompts within the authentication flow.
- Counts the number of unique MFA prompts per sign in session.
- Aggregates the number of MFA prompts per user principal name to account for cases where there are multiple sessions.
When engineering queries, we must also ensure that the results contain sufficient detail for analysts to drill down in their investigation. The alert should provide as much context as possible to avoid requiring analysts to run additional queries to simply begin an investigation. So, in this case, it’s also important that we don’t lose the broader log record data when unpacking the AuthenticationDetails array.
This can be achieved via the following query:
//Title: Potential MFA fatigue attack
//TTPs: T1621
//Cadence: 20 minutes
let data = materialize(SigninLogs); //Materialize the data for efficiency.
let interval = 20m; //Set bin range interval.
let threshold = 3; //Set MFA prompt threshold.
data
| join kind=inner (
    data //Do this as a separate query so we can pass the results back to our main query and ensure we don't lose any important log context.
    | mv-expand todynamic(AuthenticationDetails) //Unpack the AuthenticationDetails array so we can extract the unique timestamps of MFA deny events.
    | extend AuthStepResult = tostring(AuthenticationDetails.authenticationStepResultDetail)
    | extend AuthStepTime = todatetime(AuthenticationDetails.authenticationStepDateTime)
    | where AuthStepResult has "MFA denied" //Find only those which contain an MFA deny.
    | summarize arg_max(CreatedDateTime, *) by CorrelationId, AuthStepTime //Get the unique timestamps per correlation ID value which contained an MFA deny.
    | extend MFAEvents = pack("AuthenticationStepTime", AuthStepTime, "FlowEvent", AuthStepResult) //Pack the unique MFA deny events so we can pass this information back later.
    | summarize AppsTargeted = make_set(AppDisplayName), MFAEvents = make_list(MFAEvents), arg_max(CreatedDateTime, *) by CorrelationId //Count the number of unique MFA deny events per correlation ID.
    | extend MFAFailures = array_length(MFAEvents)
    | distinct CorrelationId, MFAFailures, tostring(MFAEvents)
    ) on CorrelationId //Join on correlation ID so we can retrieve the original log record along with our additional data about MFA deny events.
| summarize arg_max(CreatedDateTime, *) by CorrelationId
| summarize StartTime = min(CreatedDateTime), EndTime = max(CreatedDateTime), TotalMFAFailure = sum(MFAFailures), CorrelationIDs = make_set(CorrelationId), UniqueCorrelationIDs = dcount(CorrelationId), MFAEvents = make_set(MFAEvents), AppsTargeted = make_set(AppDisplayName), arg_max(TimeGenerated, *) by UserPrincipalName, BinTime = bin(CreatedDateTime, interval) //Count the number of MFA prompt events across all sessions for each UPN.
| extend Reason = strcat(UserPrincipalName, " received ", TotalMFAFailure, " MFA deny events to the application ", AppsTargeted, " from the IP address ", IPAddress) //Add context to aid investigation.
| project-reorder TimeGenerated, StartTime, EndTime, TotalMFAFailure, Reason, MFAEvents, ResultType, ResultDescription, CorrelationId
| where TotalMFAFailure >= threshold
Step-by-step, this query:
- Queries the SigninLogs table.
- Performs an inner join so we can run a sub query on the SigninLogs table.
- In this join, it uses mv-expand to expand all the nested fields in the AuthenticationDetails array so that we can query each event.
- It then extracts the authenticationStepDateTime and authenticationStepResultDetail values and adds them to two new columns called AuthStepTime and AuthStepResult.
- It then filters away any results which don't contain an MFA deny.
- It then uses a summarize statement to remove duplicates, summarising by AuthStepTime and Correlation ID to find the unique timestamps for authentication deny events.
- It also packs the various MFA deny events into a new array called MFAEvents so that this information can be provided in the final query results to aid investigations. It will also be used to count the number of total MFA failures later.
- Then it summarizes by CorrelationId to count the number of unique MFA prompts per CorrelationId value.
- Now we have a count of the number of unique MFA prompts per CorrelationId, along with an array that includes information about each of those MFA prompts.
- Next, we join back to our original query on the CorrelationId value. The purpose of this is to retrieve the original log record which will include the full authentication flow, rather than just MFA deny events.
- Then it summarizes by UserPrincipalName and adds up all the distinct MFA failures in a column called TotalMFAFailure to count how many MFA failures each user has had within our 20-minute bin range.
- Finally, it compares that TotalMFAFailure value to our specified threshold to find instances where a user has had more MFA prompts than our alert threshold.
This query will detect all three types of MFA fatigue attacks described above and has an exceptionally low false positive rate (zero false positives in a 90-day simulation). I would recommend deploying the alert with a 20-minute interval and lookback. Additional logic can be built into the query to ignore MFA prompts from office IPs, domain-joined devices, or other trusted sources. This may be useful in situations where you experience false positives, such as when users restore browser sessions and multiple tabs prompt for MFA at once.
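As one possible shape for that additional logic, the sketch below filters out sign ins from trusted IPv4 ranges before the data is materialised. The range shown is a hypothetical placeholder taken from a documentation prefix, and the trustedRanges and IsTrustedSource names are assumptions. Addresses that cannot be parsed as IPv4 (including IPv6) are treated as untrusted here.

```kql
// Sketch: exclude sign ins from trusted office ranges before running the detection.
let trustedRanges = dynamic(["203.0.113.0/24"]);       // hypothetical office range
let data = materialize(
    SigninLogs
    // coalesce turns a null result (unparseable or non-IPv4 address) into false.
    | extend IsTrustedSource = coalesce(ipv4_is_in_any_range(IPAddress, trustedRanges), false)
    | where not(IsTrustedSource)
);
data
| summarize Records = count() by IPAddress             // quick check of what remains
| order by Records desc
```

If this approach suits your environment, the let data definition could stand in for the first line of the detection query. Test the exclusion carefully, since an attacker operating from a trusted range would no longer be detected.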
Recommended security controls
Along with developing alert rules in Sentinel to detect MFA fatigue attacks, there are several security controls that organisations can deploy to help prevent these types of attacks:
- Ensure MFA number matching is enabled in your environment. More information can be found in Microsoft’s documentation on MFA number matching.
- Enable Fraud Alerts via Microsoft Entra to provide users with the ability to report malicious MFA prompts after denying.
- Restrict logons to compliant devices based on domain-trust or other compliance requirements such as operating system version, device password requirements, or risk-score. More information can be found in Microsoft’s documentation on device compliance and conditional access.
- Where possible, deploy conditional access policies which limit offshore sign ins or sign ins from countries which users are not expected to operate from.
- Leverage Microsoft’s sign in risk level to block users from signing in when the attempt is deemed high risk.
- Educate users on identifying legitimate MFA prompts and organisation processes for MFA requirements. Many MFA fatigue attacks leverage social engineering to coerce victims into accepting MFA prompts or providing MFA codes to the attacker. User education can help protect users from falling victim to this type of social engineering.
- In cases where MFA fatigue attacks are suspected, the target user’s password should be rotated.
Gatlan, S. (2022, September 19). Uber links breach to Lapsus$ group, blames contractor for hack. BleepingComputer. Retrieved from https://www.bleepingcomputer.com/news/security/uber-links-breach-to-lapsus-group-blames-contractor-for-hack/
Jenkins, L., Hawley, S., Najafi, P., & Bienstock, D. (2021, December 6). Suspected Russian Activity Targeting Government and Business Entities Around the Globe. Mandiant Blog. Retrieved from https://www.mandiant.com/resources/blog/russian-targeting-gov-business