Correlating CloudTrail and Identity Logs: The Attack Patterns You're Missing

An AWS AssumeRole call from Singapore and an Okta authentication event on the same account thirty seconds later are two low-priority alerts in two separate tools until you correlate them. This article walks through the event fields, entity resolution logic, and detection window configuration that connect CloudTrail and identity logs into a confirmed credential theft sequence.

Two Blind Men Describing the Same Elephant

AWS CloudTrail sees API calls. Okta System Log sees authentication events. Active Directory event logs see lateral movement and privilege escalation. Each of these sources has excellent visibility into its own domain — and nearly zero visibility into what the others are seeing at the same moment.

This is the fundamental gap in single-source cloud security monitoring. CloudTrail will faithfully record an AssumeRole call from an IP in São Paulo at 02:14 UTC. Okta will record a successful MFA push on the same user account, from a device registered in that user's normal geography, at 02:11 UTC, three minutes earlier. Neither system, looking only at its own logs, has the information to know that the sequence is anomalous. The Okta event is a clean authentication. The CloudTrail AssumeRole is a normal API operation. Together, once the geographic and temporal delta is identified, they point to a plausible session-token theft: someone authenticated legitimately through Okta, the session token was stolen (T1539, Steal Web Session Cookie), and it is now being used to assume an IAM role from a different geography.

The attack is only visible at the seam between two data sources. That seam is exactly where most mid-market detection architectures have no coverage.
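The pairwise check described above can be sketched in a few lines of Python. This is a minimal illustration, not a production detector: the Event type, the is_geo_temporal_mismatch function, the five-minute default gap, the "jsmith" identity, the date, and the "US" home country are all assumptions introduced here, and the sketch assumes identities and IP geolocation have already been resolved upstream.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    source: str       # "okta" or "cloudtrail"
    entity: str       # canonical identity; assumed to be resolved upstream
    action: str
    country: str      # country of the source IP; assumed to be enriched upstream
    time: datetime


def is_geo_temporal_mismatch(auth: Event, api_call: Event,
                             max_gap: timedelta = timedelta(minutes=5)) -> bool:
    """True when one entity authenticates in one country and then calls the
    cloud API from a different country within max_gap."""
    if auth.entity != api_call.entity:
        return False
    gap = api_call.time - auth.time
    return timedelta(0) <= gap <= max_gap and auth.country != api_call.country


# The 02:11 / 02:14 UTC example; the date, entity and "US" are placeholders.
okta_auth = Event("okta", "jsmith", "mfa_push_success", "US",
                  datetime(2024, 1, 1, 2, 11, tzinfo=timezone.utc))
assume_role = Event("cloudtrail", "jsmith", "AssumeRole", "BR",
                    datetime(2024, 1, 1, 2, 14, tzinfo=timezone.utc))

print(is_geo_temporal_mismatch(okta_auth, assume_role))  # True
```

Neither event is suspicious alone; the function only returns True because both records are visible in one place and share an entity key.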

The CloudTrail Event Fields That Actually Matter

CloudTrail logs are voluminous. A moderately active AWS environment — 400 EC2 instances, a handful of Lambda functions, S3 for data storage — can generate several hundred thousand CloudTrail events per day. Most of them are routine: DescribeInstances, GetObject, ListBuckets called by automated tooling. The events that matter for cross-source correlation occupy a small subset of API calls, and they cluster around a predictable set of patterns.

The high-signal CloudTrail event categories for cross-source correlation:

  • Identity and access manipulation: CreateAccessKey, AttachUserPolicy, PutUserPolicy, AddUserToGroup, CreateLoginProfile — these are T1098 (Account Manipulation) indicators that warrant immediate correlation against the calling identity's behavior baseline and any concurrent Okta or Entra ID events on the same account.
  • Cross-account and role assumption: AssumeRole, AssumeRoleWithSAML, AssumeRoleWithWebIdentity — particularly when the assumed role differs from the account's normal access pattern, or when the source IP is outside the entity's behavioral geographic baseline.
  • Data staging and exfiltration setup: PutBucketPolicy with public access enabled, CreateBucket in a new region, GetBucketAcl called at unusual frequency — potential T1537 (Transfer Data to Cloud Account) staging behavior.
  • Defense evasion: DeleteTrail, UpdateTrail with logging disabled, StopLogging — these almost always indicate post-compromise cleanup (T1562.008) and should trigger immediate escalation regardless of other corroborating signals.
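As a rough first pass, the categories above can be expressed as a lookup keyed on the CloudTrail eventName field. The sketch below assumes records have already been parsed into dictionaries; the category labels, the classify and needs_immediate_escalation helpers, and the sample records are hypothetical. Matching on name alone is deliberately coarse: it does not inspect request parameters, so it cannot tell whether a PutBucketPolicy actually enables public access or whether an UpdateTrail actually disables logging.

```python
from typing import Optional, Tuple

# eventName -> (category, ATT&CK technique named in the text, if any)
HIGH_SIGNAL_EVENTS = {
    # Identity and access manipulation
    "CreateAccessKey": ("identity_manipulation", "T1098"),
    "AttachUserPolicy": ("identity_manipulation", "T1098"),
    "PutUserPolicy": ("identity_manipulation", "T1098"),
    "AddUserToGroup": ("identity_manipulation", "T1098"),
    "CreateLoginProfile": ("identity_manipulation", "T1098"),
    # Cross-account and role assumption (no single technique given in the text)
    "AssumeRole": ("role_assumption", None),
    "AssumeRoleWithSAML": ("role_assumption", None),
    "AssumeRoleWithWebIdentity": ("role_assumption", None),
    # Data staging and exfiltration setup
    "PutBucketPolicy": ("data_staging", "T1537"),
    "CreateBucket": ("data_staging", "T1537"),
    "GetBucketAcl": ("data_staging", "T1537"),
    # Defense evasion
    "DeleteTrail": ("defense_evasion", "T1562.008"),
    "UpdateTrail": ("defense_evasion", "T1562.008"),
    "StopLogging": ("defense_evasion", "T1562.008"),
}


def classify(record: dict) -> Optional[Tuple[str, Optional[str]]]:
    """Return (category, technique) for a high-signal record, else None."""
    return HIGH_SIGNAL_EVENTS.get(record.get("eventName", ""))


def needs_immediate_escalation(record: dict) -> bool:
    """Defense-evasion events escalate without waiting for corroboration."""
    hit = classify(record)
    return hit is not None and hit[0] == "defense_evasion"


# Hypothetical sample records using documentation IP ranges.
records = [
    {"eventName": "DescribeInstances", "sourceIPAddress": "198.51.100.10"},
    {"eventName": "CreateAccessKey", "sourceIPAddress": "203.0.113.7"},
    {"eventName": "StopLogging", "sourceIPAddress": "203.0.113.7"},
]
high_signal = [r for r in records if classify(r) is not None]
```

Filtering this way shrinks the stream that the correlation stage has to evaluate, which is the point of separating high-signal calls from routine ones.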

The challenge is not identifying which events are high-signal in isolation. The challenge is building a detection system that can evaluate those events in the context of what the identity system and endpoint telemetry were showing within the same time window.

Entity Resolution: The Technical Problem That Breaks Naive Correlation

Before you can correlate a CloudTrail event with an Okta event, you need to know that the IAM principal in the CloudTrail record and the user in the Okta record are the same entity. This is not as straightforward as it sounds.

An IAM user might be represented in CloudTrail as arn:aws:iam::123456789012:user/jsmith. The same person's Okta record has a sub claim of [email protected] and a display name of "John Smith." The Entra ID sign-in log shows UPN: [email protected]. The Windows AD event log shows a logon event for domain account CORP\jsmith. These are four different identifier formats for the same person, spread across four different log sources with no native join key.

Entity resolution — building and maintaining a canonical identity graph that maps these heterogeneous identifiers to a single entity record — is the foundational technical requirement for cross-source correlation. Without it, you are not correlating events; you are running independent queries against independent datasets and manually noting when the results seem related.

A properly constructed entity graph maps each identity to its canonical form and maintains the source identifiers as attributes. It also handles service accounts and machine identities separately from human accounts, and distinguishes between interactive user sessions and programmatic API access — distinctions that are critical for determining whether an AssumeRole call is expected behavior or anomalous access.
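A minimal sketch of such an identity graph, assuming an in-memory index and case-insensitive matching, might look like the following. The Entity and EntityGraph names, the source labels, the canonical ID "jsmith", the service-account entry, and the example.com addresses are all placeholders invented for illustration; a real deployment would populate the mappings from HR, IdP, and IAM inventories and would need more careful normalization than lowercasing.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Entity:
    canonical_id: str
    kind: str  # "human" or "service"; kept separate as the text recommends
    identifiers: Dict[str, str] = field(default_factory=dict)  # source -> raw id


class EntityGraph:
    """Maps (log source, identifier) pairs onto one canonical entity."""

    def __init__(self) -> None:
        self._entities: Dict[str, Entity] = {}
        self._index: Dict[Tuple[str, str], str] = {}

    @staticmethod
    def _norm(identifier: str) -> str:
        # Simplifying assumption: identifiers compare case-insensitively.
        return identifier.strip().lower()

    def add(self, canonical_id: str, kind: str, identifiers: Dict[str, str]) -> None:
        self._entities[canonical_id] = Entity(canonical_id, kind, dict(identifiers))
        for source, identifier in identifiers.items():
            self._index[(source, self._norm(identifier))] = canonical_id

    def resolve(self, source: str, identifier: str) -> Optional[Entity]:
        canonical_id = self._index.get((source, self._norm(identifier)))
        return self._entities.get(canonical_id) if canonical_id else None


graph = EntityGraph()
graph.add("jsmith", "human", {
    "cloudtrail": "arn:aws:iam::123456789012:user/jsmith",
    "okta": "jsmith@example.com",      # placeholder address
    "entra": "jsmith@example.com",     # placeholder UPN
    "ad": "CORP\\jsmith",
})
graph.add("svc-etl", "service", {
    "cloudtrail": "arn:aws:iam::123456789012:role/etl-runner",  # hypothetical role
})

same_person = graph.resolve("ad", "corp\\JSMITH")
```

With the index in place, events from any source can be keyed on canonical_id before correlation, and the kind attribute lets a rule treat programmatic role assumption differently from an interactive user session.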

The Detection Window and Why It Is Not Arbitrary

Cross-source correlation requires defining a detection window — the time period within which events from multiple sources must occur to be considered part of the same sequence. This number matters more than most practitioners initially expect.

A 30-second window catches same-session token reuse and near-instantaneous automated attack tooling, but misses the "low-and-slow" lateral movement patterns where an attacker manually pivots between steps over minutes. A 60-minute window catches those slow-moving scenarios but produces significant false positive correlation noise — many independent events from different users will spuriously co-occur within an hour simply by chance in an active environment.

The practical answer is technique-dependent windowing. Credential theft and immediate reuse (T1550) warrant a tight window of 60 to 300 seconds. Lateral movement after initial access (T1078 combined with T1021) might be detected across a 15-minute window. Data staging behaviors (T1537) are sometimes observed over hours. A correlation engine that applies a single global window to all technique pairings will either miss slow-moving attacks or generate unacceptable false positive rates on fast-moving ones.
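One way to express technique-dependent windowing is a per-pattern lookup table rather than a global constant. In the sketch below the pattern names, the within_window helper, and the four-hour figure for data staging are assumptions made for illustration (the text only says "hours"); the other two values come from the ranges given above.

```python
from datetime import datetime, timedelta, timezone

# Correlation window per technique pattern.
WINDOWS = {
    "credential_reuse": timedelta(seconds=300),  # T1550: top of the 60-300 s range
    "lateral_movement": timedelta(minutes=15),   # T1078 combined with T1021
    "data_staging": timedelta(hours=4),          # T1537: placeholder for "hours"
}


def within_window(pattern: str, first: datetime, second: datetime) -> bool:
    """True if `second` follows `first` within the window for `pattern`."""
    gap = second - first
    return timedelta(0) <= gap <= WINDOWS[pattern]


t0 = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)  # arbitrary reference time
ten_minutes_later = t0 + timedelta(minutes=10)

fast = within_window("credential_reuse", t0, ten_minutes_later)   # outside 300 s
slow = within_window("lateral_movement", t0, ten_minutes_later)   # inside 15 min
```

The same ten-minute gap is rejected under one pattern and accepted under another, which is exactly the behavior a single global window cannot provide.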

This is not to say that technique-dependent windows eliminate false positives entirely. Any cross-source correlation system will produce some false positives: two independent events that happen to match a technique pattern within the window by coincidence. The quality signal to track is the escalation-to-confirmed-incident ratio, meaning the share of escalated correlations that turn out to be confirmed incidents, measured over weeks. If that ratio drops below roughly 30%, the window or scoring threshold needs adjustment.
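The ratio check itself is simple arithmetic. The sketch below reads the ratio as confirmed incidents divided by escalations; the function names and the sample counts are hypothetical, and the 0.30 floor is the rough threshold mentioned above rather than a calibrated value.

```python
def confirmed_ratio(escalated: int, confirmed: int) -> float:
    """Share of escalated correlations that became confirmed incidents."""
    if escalated <= 0:
        raise ValueError("no escalations in the measurement period")
    return confirmed / escalated


def needs_tuning(escalated: int, confirmed: int, floor: float = 0.30) -> bool:
    """True when the ratio falls below the floor, so windows or scoring
    thresholds should be revisited."""
    return confirmed_ratio(escalated, confirmed) < floor


# Hypothetical counts for a multi-week period.
noisy_period = needs_tuning(escalated=40, confirmed=10)    # 0.25 -> tune
healthy_period = needs_tuning(escalated=40, confirmed=16)  # 0.40 -> leave as is
```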

A Worked Scenario: SaaS-to-Cloud Lateral Movement

Consider a mid-market professional services firm, roughly 700 employees, running AWS for its analytics workloads and Okta for SSO. A phishing campaign targets a finance team member. The user's Okta credentials are captured and a session cookie is stolen via an adversary-in-the-middle proxy (T1557, with the cookie theft itself mapping to T1539). The attacker now has a valid Okta session.

What happens in the logs:

At T+0: Okta System Log records a successful authentication from the user's normal device (the original legitimate login the attacker proxied).

At T+4m: Okta System Log records a second session initiation for the same account from a residential IP in Eastern Europe — a new device fingerprint, new IP geolocation, no Okta FastPass device trust signal.

At T+7m: CloudTrail records an AssumeRoleWithSAML call (role assumption via SAML federation) into the user's IAM role from the same Eastern European IP. The assumed role has ReadOnlyAccess to S3, which is not unusual for a finance user, so no IAM anomaly alert fires.

At T+9m: CloudTrail records repeated ListBuckets and GetObject calls against buckets containing financial data. Still within normal behavior for the role.

At T+14m: CloudTrail records a CreateAccessKey call under the assumed role — creating a long-lived credential that persists beyond the session.

Almost every individual event is explainable in isolation. The Okta re-authentication could be a coffee shop login. The S3 reads are role-appropriate. Only the CreateAccessKey at T+14m is anomalous on its own, but by then the access key exists and the attacker has had roughly seven minutes of S3 read access since the role assumption at T+7m.

A cross-source correlation engine with entity resolution applied would connect event 2 (Okta auth from anomalous geography) → event 3 (role assumption via SAML from the same IP) → event 5 (CreateAccessKey under the assumed role) as a three-event kill-chain sequence. MTTD in this scenario drops from "whenever someone manually reviews Okta anomalies" to approximately 10 minutes after the anomalous authentication at T+4m, the point at which the third event lands at T+14m, with full context assembled for the analyst.
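The three-event sequence can be sketched as an ordered match over entity-resolved events. Everything in this example is illustrative: the Event type, the find_takeover_sequence function, the geo_anomalous flag (assumed to be set by an upstream baseline), the 15-minute window, the date, and the documentation-range IP addresses are assumptions. The Okta event type user.session.start and the CloudTrail event names are real, but the records themselves are invented to mirror the timeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

ROLE_ASSUMPTION = {"AssumeRole", "AssumeRoleWithSAML"}


@dataclass(frozen=True)
class Event:
    source: str
    entity: str                  # canonical identity after entity resolution
    action: str
    ip: str
    time: datetime
    geo_anomalous: bool = False  # assumed to be computed by an upstream baseline


def find_takeover_sequence(events: List[Event],
                           window: timedelta = timedelta(minutes=15)
                           ) -> Optional[List[Event]]:
    """Anomalous Okta auth -> role assumption from the same IP ->
    CreateAccessKey, all for one entity and inside `window`."""
    ordered = sorted(events, key=lambda e: e.time)
    for i, auth in enumerate(ordered):
        if not (auth.source == "okta" and auth.geo_anomalous):
            continue
        deadline = auth.time + window
        later = [e for e in ordered[i + 1:]
                 if e.entity == auth.entity and e.time <= deadline]
        assume = next((e for e in later
                       if e.action in ROLE_ASSUMPTION and e.ip == auth.ip), None)
        if assume is None:
            continue
        persist = next((e for e in later
                        if e.action == "CreateAccessKey" and e.time > assume.time),
                       None)
        if persist is not None:
            return [auth, assume, persist]
    return None


t0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)  # arbitrary start time


def at(minutes: int) -> datetime:
    return t0 + timedelta(minutes=minutes)


events = [
    Event("okta", "jsmith", "user.session.start", "198.51.100.10", at(0)),
    Event("okta", "jsmith", "user.session.start", "203.0.113.7", at(4), True),
    Event("cloudtrail", "jsmith", "AssumeRoleWithSAML", "203.0.113.7", at(7)),
    Event("cloudtrail", "jsmith", "GetObject", "203.0.113.7", at(9)),
    Event("cloudtrail", "jsmith", "CreateAccessKey", "203.0.113.7", at(14)),
]
sequence = find_takeover_sequence(events)
```

On this timeline the match returns the events at T+4m, T+7m, and T+14m, so the alert can fire as soon as the CreateAccessKey record is ingested.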

Building the Detection Coverage Map

Before implementing cross-source correlation, it is worth mapping your current detection coverage against the CloudTrail + identity attack surface explicitly. A coverage matrix — MITRE ATT&CK technique on one axis, detection source on the other — will quickly reveal which technique stages have single-source detection only and which have no coverage at all.

Techniques that are characteristically multi-source and frequently missing in mid-market deployments: T1078 (Valid Accounts) used for cloud service access, T1098 (Account Manipulation) combined with T1550 (Use Alternate Authentication Material), and T1537 (Transfer Data to Cloud Account) preceded by T1530 (Data from Cloud Storage Object). These are not exotic techniques. They appear routinely in confirmed cloud-based intrusions that reach the data exfiltration stage.
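A coverage matrix of this kind can start as a plain mapping from technique to the set of sources that currently detect it. The sketch below uses the techniques named above; the coverage values are placeholders that do not describe any real environment, and the COVERAGE and coverage_gaps names are hypothetical.

```python
from typing import Dict, List, Set

# Technique -> log sources with a working detection today (placeholder values).
COVERAGE: Dict[str, Set[str]] = {
    "T1078": {"okta", "cloudtrail"},
    "T1098": {"cloudtrail"},
    "T1550": set(),
    "T1530": {"cloudtrail"},
    "T1537": set(),
}


def coverage_gaps(coverage: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Split techniques into those with no detection source and those that
    depend on a single source."""
    return {
        "no_coverage": sorted(t for t, srcs in coverage.items() if not srcs),
        "single_source": sorted(t for t, srcs in coverage.items() if len(srcs) == 1),
    }


gaps = coverage_gaps(COVERAGE)
print(gaps["no_coverage"])    # ['T1537', 'T1550']
print(gaps["single_source"])  # ['T1098', 'T1530']
```

Techniques in the single_source list are the natural first candidates for a cross-source correlation rule, since a second source is what turns an isolated alert into a sequence.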

The coverage map also helps you prioritize which source integrations to bring online first. If your current detection architecture has strong EDR coverage but no CloudTrail ingestion, the CloudTrail + identity correlation pair delivers the highest marginal detection coverage for typical cloud-assisted lateral movement patterns. If you have CloudTrail but no identity log correlation, the identity-to-cloud correlation pair closes the session token theft visibility gap.

Start with the highest-frequency attack patterns, map them to their required event sources, and build correlation detection iteratively. Detection coverage is not a binary state — it is a surface that expands as you connect more sources with entity resolution applied.