Symantec EDR Internals — Event Enrichment Rules [Part I]
In the first post of this series about Symantec EDR Internals, I talked about “Criterion”, a machine learning engine used by SEDR to detect files in the gray area. If you haven't read it, please do (link below).
For this post we’ll talk about a feature of SEDR called enrichment and how it works.
Enrichment Overview
According to the Oxford dictionary, enrichment is:
The action of improving or enhancing the quality or value of something
In the context of detection and EDR, enrichment is the idea of adding context to an event or any piece of collected data.
So let’s say we have the hash of a file. On its own this hash doesn’t provide any real value. But suppose we answer questions like the following :
- Has the file been detected by AV vendors ?
- Is the file connected to any threat actors or campaign ?
- Are there any related samples online ?
- Does the file contain interesting strings ?
- Is the file packed ?
- Does the file spawn suspicious processes or execute suspicious commands (Process Execution) ?
- Etc.
We’ll go from having a simple hash to a lot more information and context. This is the process of enrichment.
The same would apply if we had a domain or an IP. Only the questions and the information we get would be different. In the case of a domain, for example, we would be more interested in the registrar information, the associated IP addresses, etc.
In the context of Symantec EDR, when an event is collected, SEDR will try to provide as much context and additional data as possible. For example, a process creation event (TypeID 8001) will typically contain the following information :
- Parent / Child Image path on disk.
- Parent / Child Command line.
- MD5 / SHA256 hashes for both the Parent / Child processes.
- PID / PPID.
- Security Descriptor for both the Parent / Child processes.
- Time of execution.
- Username and SID.
- Operating System.
- Publisher
In addition to this SEDR will enrich the event with other information such as mappings to ATT&CK.
But one interesting enrichment is the following :
This indicates a scheduled task was launched. I wanted to understand how SEDR was able to enrich the event with this kind of information. With this we start the journey to discover enrichment land.
SEDR Enriched Data
From the documentation about SEDR event schema we can see that there are 10 sub-fields related to the “enriched_data” field
- enriched_data.category_id
- enriched_data.category_name
- enriched_data.event_group_id
- enriched_data.extra_numeric_info.key_name
- enriched_data.extra_numeric_info.value
- enriched_data.extra_string_info.key_name
- enriched_data.extra_string_info.value
- enriched_data.rule_description
- enriched_data.rule_id
- enriched_data.rule_name
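Once events are exported, these sub-fields are easy to work with offline. As a minimal sketch, assuming events are available as Python dictionaries with a nested `enriched_data` object (a hypothetical shape chosen for illustration, the real export format may differ), the snippet below keeps only events where `rule_name` exists and removes duplicates. The helper name `unique_rule_names` and the sample events are made up; only the rule names themselves come from this post.

```python
from typing import Iterable, List


def unique_rule_names(events: Iterable[dict]) -> List[str]:
    """Return the distinct enriched_data.rule_name values, sorted."""
    names = set()
    for event in events:
        # Assumption: enrichment lives under a nested "enriched_data" dict.
        rule_name = event.get("enriched_data", {}).get("rule_name")
        if rule_name:  # keep only events where the field exists
            names.add(rule_name)
    return sorted(names)


# Hypothetical, heavily trimmed events used only to exercise the helper.
sample_events = [
    {"type_id": 8001, "enriched_data": {"rule_name": "eScheduledTaskLaunch"}},
    {"type_id": 8001, "enriched_data": {"rule_name": "eScheduledTaskLaunch"}},
    {"type_id": 8001, "enriched_data": {"rule_name": "eGenericProcessLaunch"}},
    {"type_id": 8001},  # event without any enrichment
]

print(unique_rule_names(sample_events))
```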
All of these seem very interesting, but I was most interested in the “rule_name” sub-field. So in my lab, in the SEDR interface, I filtered for the existence of this field and removed any duplicated entries. Below are some examples of the rules that I found :
By reading the description of these rules we can see that they are inferring something about the behavior captured by the event. Let’s take an example.
The following command line from a process launch event :
Will have the following “rule_name” associated to it:
Understanding this and how SEDR enriches events with these rules can be very helpful for both blue and red teams.
- By understanding this, blue teams (detection engineers) can write more informed queries and obtain a deeper understanding of the underlying internals.
- Red teams can also use this to gain a deeper understanding and to try to bypass any detections that are in place and are based on these rules.
“atp-rules.sen” — Discovery
Using prior research I did on SEDR events, I decided to start looking for traces of these enrichment rules on the endpoint. (You can read more on this research below)
To recap, the events collected by SEDR are stored in LevelDB databases on the endpoint before they are sent to the EDR server. These files contained references to enrichment rules. Knowing this, I simply ran the “grep” command on the whole Symantec directory located in “ProgramData”, using the “eScheduledTask” keyword as an example.
grep -irn "eScheduledTask" [Symantec_ProgramData_PATH]
The result of this command contained matches with LevelDB files which was to be expected. But another file popped up by the name of “atp-rules.sen”.
C:\ProgramData\Symantec\Symantec Endpoint Protection\CurrentVersion\Data\Definitions\EDRDefs\[Definition Version]\atp-rules.sen
Skimming through it quickly, I found that this was actually a big JSON file with more than 599000 lines, containing references to a lot of the enrichment rules previously seen in the SEDR interface. (So this is it 😀)
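If “grep” is not available on the analysis machine, the same kind of search can be approximated in Python. This is only a sketch of a recursive, case-insensitive keyword search. The function name `find_keyword` is made up, and the demo runs against a throwaway directory with invented file names rather than the real Symantec folder.

```python
import os
import tempfile


def find_keyword(root: str, keyword: str):
    """Yield paths of files under root whose bytes contain keyword (case-insensitive)."""
    needle = keyword.lower().encode("utf-8")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Reads the whole file at once: fine for a sketch,
                    # wasteful for very large files.
                    if needle in handle.read().lower():
                        yield path
            except OSError:
                continue  # locked or unreadable files are skipped


# Self-contained demo on a temporary directory (file names are made up).
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "rules.sen"), "w") as f:
        f.write('{"ruleName": "eScheduledTaskLaunch"}')
    with open(os.path.join(root, "other.txt"), "w") as f:
        f.write("nothing here")
    hits = [os.path.basename(p) for p in find_keyword(root, "eScheduledTask")]

print(hits)
```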
“atp-rules.sen” — Structure
After spending quite some time analyzing the file, I’ve come up with the following hypothesis of how things work.
The JSON file contains “actions” / “definitions” to be applied to a collected event in order to enrich it. It’s divided into two major sections :
- Aggregators : Function definitions / references to be used by nodes to perform specific actions.
- Nodes : Represent a tree-like structure where every node “can” be related to another, following a specific flow based on conditions and node type.
We’ll discuss the aggregators in a later blog post. For now let’s focus on the nodes and their structure.
At a minimum each node is comprised of the following fields
- nodeID
- childrenIDs
- parentID
- nodeType
To switch from one node to another, the “parser” must first read the value of “nodeType” to determine the action to take and the next set of nodes to jump to. In the version I’ve analyzed there are 8 node types:
- AlphaNoTestNode : Jump directly to the nodeID’s specified inside the childrenIDs attribute.
- AlphaConstantTestNode : Verify some value within the event against a constant value defined inside the node.
- AlphaSwitchNode : Act as a switch case statement. A specific value within the event will be checked against multiple cases.
- BetaActionNode : Specify the action to take against the event (We’ll see an example later)
- BetaResolutionNode : TBD
- BetaJoinNode : Jump to “BetaActionNode” nodes
- AlphaMemoryNode : Jump to “BetaJoinNode” nodes
- AlphaIntraEventTestNode : TBD
Each node type can have additional attributes. For example the “AlphaConstantTestNode” will often contain the following attributes :
- bop : Stands for “Binary Operation”. It can have one of the following values : “NOT_EQUAL”, “EQUAL”, “REGEX_LIKE”, “REGEX_NOT_LIKE”, “LESS_THAN”, “CONTAINS”, “NOT_CONTAINS”, “GREATER_THAN”, “GREATER_OR_EQUAL”, “LESS_OR_EQUAL”
- constantLiteral : Contains the value to compare against
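To make the role of “bop” and “constantLiteral” more concrete, here is a small Python sketch of how such a test could be evaluated. The operator names come from the rule file, but their exact semantics are my assumption (in particular for “REGEX_LIKE” and “CONTAINS”), and `BINARY_OPS` and `constant_test` are hypothetical names, not something found in SEDR.

```python
import operator
import re

# Assumed semantics for each "bop" value. The names come from the rule file;
# the behavior (especially for REGEX_* and CONTAINS) is a guess.
BINARY_OPS = {
    "EQUAL": operator.eq,
    "NOT_EQUAL": operator.ne,
    "LESS_THAN": operator.lt,
    "LESS_OR_EQUAL": operator.le,
    "GREATER_THAN": operator.gt,
    "GREATER_OR_EQUAL": operator.ge,
    "CONTAINS": lambda value, literal: literal in value,
    "NOT_CONTAINS": lambda value, literal: literal not in value,
    "REGEX_LIKE": lambda value, literal: re.search(literal, value) is not None,
    "REGEX_NOT_LIKE": lambda value, literal: re.search(literal, value) is None,
}


def constant_test(value, bop: str, constant_literal) -> bool:
    """Compare a value taken from the event against a node's constantLiteral."""
    return BINARY_OPS[bop](value, constant_literal)


print(constant_test(60, "NOT_EQUAL", 50))
print(constant_test("svchost.exe -k netsvcs", "CONTAINS", "netsvcs"))
```

A lookup table keeps the sketch short and makes it obvious which operators are covered; an unknown “bop” value simply raises a `KeyError`.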
Here is an example where we are testing that the value of “signature_level_id” is not equal to 50 :
Now that we’ve seen the general gist of how things work, let’s walk through a proper example. The following event was captured by SEDR on an endpoint and parsed from the “.ldb” file.
{
"artifacts": {
"actor": {
"object_type": "process",
"path": {
"value": [
"c:\\windows\\system32\\svchost.exe"
]
},
"normalized_path": {
"value": [
"CSIDL_SYSTEM\\svchost.exe"
]
},
"sha2": {
"value": [
"DD191A5B23DF92E12A8845291F2FB5ED423B76A28A5A464418442584AFD1E048"
]
},
"md5": {
"value": [
"9520A99E87D7196E5D09833146424113"
]
},
"user_sid": {
"value": [
"S-1-5-18"
]
},
"user_name": {
"value": [
"SYSTEM"
]
},
"user_domain": {
"value": [
"NT AUTHORITY"
]
},
"file_id": {
"value": [
281273379610221
]
},
"size": {
"value": [
53744
]
},
"session_id": {
"value": [
0
]
},
"pid": {
"value": [
2344
]
},
"uid": {
"value": [
"73324417-689B-F1EB-AB5E-602D86CC3F92"
]
},
"created": {
"value": "2019-03-19T04:44:33.676Z"
},
"modified": {
"value": "2019-03-19T04:44:33.676Z"
},
"security_descriptor": {
"value": [
"O:S-1-5-5-0-210746G:SYD:(A;;0x1fffff;;;S-1-5-5-0-111111)(A;;0x1400;;;BA)S:AI"
]
},
"signature_company_name": {
"value": [
"Microsoft Windows Publisher"
]
},
"signature_value_ids": {
"value": [
3,
5
]
},
"signature_level_id": {
"value": [
60
]
},
"cmd_line": {
"value": [
"C:\\WINDOWS\\system32\\svchost.exe -k netsvcs -p -s Schedule"
]
},
"original_name": {
"value": [
"svchost.exe"
]
},
"integrity_id": {
"value": [
6
]
},
"start_time": "2021-02-06T16:51:01.623Z",
"tid": {
"value": [
19480
]
}
},
"target": {
"object_type": "process",
"path": {
"value": [
"c:\\program files (x86)\\google\\update\\googleupdate.exe"
]
},
"normalized_path": {
"value": [
"CSIDL_PROGRAM_FILES\\google\\update\\googleupdate.exe"
]
},
"sha2": {
"value": [
"794CF7644115198DB451431ACF5F89FF9A97550482B1E3F7F13EB7ACA6120A11"
]
},
"md5": {
"value": [
"82F657B0AEE67A6A560321CF0927F9F7"
]
},
"user_sid": {
"value": [
"S-1-5-18"
]
},
"user_name": {
"value": [
"SYSTEM"
]
},
"user_domain": {
"value": [
"NT AUTHORITY"
]
},
"file_id": {
"value": [
281634986915819
]
},
"size": {
"value": [
154920
]
},
"session_id": {
"value": [
0
]
},
"pid": {
"value": [
17964
]
},
"uid": {
"value": [
"11111111-1111-1111-1111-111111111111"
]
},
"created": {
"value": "2019-10-03T07:50:16.921Z"
},
"modified": {
"value": "2019-10-03T07:50:16.765Z"
},
"security_descriptor": {
"value": [
"O:SYG:SYD:(A;;0x1fffff;;;SY)(A;;RC;;;OW)(A;;0x1fffff;;;S-1-5-11-1111111111-1111111111-111111111-1111111111-111111111)S:AI"
]
},
"signature_company_name": {
"value": [
"Google Inc"
]
},
"signature_value_ids": {
"value": [
3
]
},
"signature_level_id": {
"value": [
40
]
},
"cmd_line": {
"value": [
"C:\\Program Files (x86)\\Google\\Update\\GoogleUpdate.exe\\ /ua /installsource scheduler"
]
},
"original_name": {
"value": [
"GoogleUpdate.exe"
]
},
"integrity_id": {
"value": [
6
]
},
"start_time": "2021-02-11T07:16:24.866Z"
}
},
"action": [
"launch"
],
"type_id": {
"value": [
8001
]
},
"id": {
"value": [
1
]
},
"begin_time": {
"value": "2021-02-11T07:16:24.866Z"
},
"correlation_uid": {
"value": [
"XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
]
},
"device_user_idle": true,
"receive_time": {
"value": "2021-02-11T07:16:24.907Z"
},
"timezone": -60,
"ref_uid": {
"value": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
},
"seq_num": XXXXXX,
"edr_ver": "XXXXXX"
}
The idea is to start from the first node and follow the paths until we reach the enrichment rule(s) that map to this event.
Note that for the sake of brevity I will not detail all the possible choices / paths that an event could pass through. Instead, I will show only the path actually taken after performing all the tests.
Let’s start with “nodeID : 1”
{
"nodeID": 1,
"childrenIDs": [
2,
286225
],
"parentID": 0,
"nodeType": "AlphaNoTestNode"
}
The first node is an “AlphaNoTestNode”, which means that we’ll switch directly to the nodes specified inside “childrenIDs”. Next is “nodeID : 2”
{
"nodeID": 2,
"childrenIDs": [
286392,
248750,
223501,
248844,
223496,
262526,
285418,
223507,
262514,
259973,
225407,
3,
248874,
223504,
248879,
248847
],
"parentID": 1,
"nodeType": "AlphaConstantTestNode",
"field": [
"artifacts",
"actor",
"signature_level_id",
"value",
"[0]"
],
"bop": "NOT_EQUAL",
"constantLiteral": 50
}
This one is an “AlphaConstantTestNode”, meaning we need to compare the value specified in the “field” attribute against the value of the “constantLiteral” attribute using the “bop” operation. In this case, to jump to the next set of nodes, the actor’s “signature_level_id” must not be equal to “50”. In our example it is equal to “60” (the field path points at “artifacts.actor”, not at the target, whose value is “40”). The test passes, so we’ll jump to all the children nodes specified in “childrenIDs”. For now let’s focus on “nodeID : 3”
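Before moving on, here is a minimal Python sketch of the test performed by node 2. It assumes the “field” list is a path into the event where plain strings are dictionary keys and entries such as “[0]” are list indexes. That reading is my interpretation of the file, and `resolve_field` is a hypothetical helper name.

```python
def resolve_field(event: dict, path: list):
    """Walk a node's "field" path through the event."""
    current = event
    for part in path:
        if part.startswith("[") and part.endswith("]"):
            current = current[int(part[1:-1])]  # "[0]" -> list index 0
        else:
            current = current[part]  # plain dictionary key
    return current


# Trimmed copies of the event and of node 2 from this post.
event = {"artifacts": {"actor": {"signature_level_id": {"value": [60]}}}}
node = {
    "nodeID": 2,
    "childrenIDs": [286392, 3],  # shortened for the example
    "field": ["artifacts", "actor", "signature_level_id", "value", "[0]"],
    "bop": "NOT_EQUAL",
    "constantLiteral": 50,
}

value = resolve_field(event, node["field"])
# Only NOT_EQUAL is handled here, which is all this node needs.
next_nodes = node["childrenIDs"] if value != node["constantLiteral"] else []
print(value, next_nodes)
```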
{
"nodeID": 3,
"nodeType": "AlphaSwitchNode",
"parentID": 2,
"switchField": [
"action",
"[0]"
],
"switchCases": [
{
"caseLiteral": "statistic",
"caseChildNodeID": 248225
},
{
"caseLiteral": "set",
"caseChildNodeID": 15
},
{
"caseLiteral": "logon",
"caseChildNodeID": 262550
},
{
"caseLiteral": "launch",
"caseChildNodeID": 8506
},
{
"caseLiteral": "set_attributes",
"caseChildNodeID": 248211
},
{
"caseLiteral": "delete",
"caseChildNodeID": 4
},
{
"caseLiteral": "accept",
"caseChildNodeID": 279450
},
{
"caseLiteral": "modify",
"caseChildNodeID": 4101
},
{
"caseLiteral": "logoff",
"caseChildNodeID": 279983
},
{
"caseLiteral": "load",
"caseChildNodeID": 220446
},
{
"caseLiteral": "rename",
"caseChildNodeID": 248314
},
{
"caseLiteral": "create",
"caseChildNodeID": 24
},
{
"caseLiteral": "set_security",
"caseChildNodeID": 262472
},
{
"caseLiteral": "terminate",
"caseChildNodeID": 262507
},
{
"caseLiteral": "injection",
"caseChildNodeID": 8461
},
{
"caseLiteral": "close",
"caseChildNodeID": 262445
},
{
"caseLiteral": "connect",
"caseChildNodeID": 8518
},
{
"caseLiteral": "open",
"caseChildNodeID": 248882
}
]
}
This node is an “AlphaSwitchNode”, which means we’ll switch to a node based on a condition. In this case the “action” field is compared against the cases specified in the “caseLiteral” fields. In our example it’s equal to “launch”, so we’ll jump to “nodeID : 8506”.
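Before following that jump, the switch lookup itself can be sketched in Python. As with the other snippets, this is an illustration under assumptions: `resolve_field` and `switch_target` are hypothetical helpers, and returning `None` when no case matches is my guess at the behavior, not something confirmed by the file.

```python
def resolve_field(event: dict, path: list):
    """Walk a path such as ["action", "[0]"] through the event."""
    current = event
    for part in path:
        if part.startswith("[") and part.endswith("]"):
            current = current[int(part[1:-1])]  # list index
        else:
            current = current[part]  # dictionary key
    return current


def switch_target(event: dict, node: dict):
    """Return the child node ID whose caseLiteral equals the switched value."""
    value = resolve_field(event, node["switchField"])
    lookup = {case["caseLiteral"]: case["caseChildNodeID"] for case in node["switchCases"]}
    return lookup.get(value)  # None when no case matches (assumed behavior)


node = {
    "nodeID": 3,
    "nodeType": "AlphaSwitchNode",
    "switchField": ["action", "[0]"],
    "switchCases": [  # only a few of the cases from node 3
        {"caseLiteral": "set", "caseChildNodeID": 15},
        {"caseLiteral": "launch", "caseChildNodeID": 8506},
        {"caseLiteral": "delete", "caseChildNodeID": 4},
    ],
}
event = {"action": ["launch"]}

print(switch_target(event, node))
```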
{
"nodeID": 8506,
"childrenIDs": [
260984,
261179,
223053,
260248,
260987,
223049,
225796,
223027,
260954,
260963,
285406,
220361,
260966,
223002,
220418,
260972,
260975,
223830,
223035,
260960,
260990,
225259,
281861,
261980,
223041,
223190,
222948,
220341,
248795,
223107,
222955,
220352,
223045,
260981,
260978,
284341,
220443,
225647,
225946,
225397,
223031,
222964,
260969,
225246
],
"parentID": 3,
"nodeType": "AlphaNoTestNode"
}
Once again it’s an “AlphaNoTestNode”, so we jump to the next set of nodes. Again, for the sake of brevity, we’ll only look at “nodeID : 220341”
{
"nodeID": 220341,
"nodeType": "AlphaSwitchNode",
"parentID": 8506,
"switchField": {
"fieldType": "CalculatedField",
"function": {
"functionName": "TAIL_STRING",
"literalParameters": [
11
]
},
"parameterField": {
"fieldType": "CalculatedField",
"function": {
"functionName": "LOWERCASE",
"literalParameters": []
},
"parameterField": [
"artifacts",
"actor",
"normalized_path",
"value",
"[0]"
]
}
},
"switchCases": [
{
"caseLiteral": "\\dnscmd.exe",
"caseChildNodeID": 223695
},
{
"caseLiteral": "\\pcwrun.exe",
"caseChildNodeID": 223695
},
{
"caseLiteral": "outlook.exe",
"caseChildNodeID": 220386
},
{
"caseLiteral": "\\regasm.exe",
"caseChildNodeID": 223695
},
{
"caseLiteral": "winproj.exe",
"caseChildNodeID": 220386
},
{
"caseLiteral": "\\bginfo.exe",
"caseChildNodeID": 223695
},
{
"caseLiteral": "acrobat.exe",
"caseChildNodeID": 220386
},
{
"caseLiteral": "\\ieexec.exe",
"caseChildNodeID": 223695
},
{
"caseLiteral": "\\chrome.exe",
"caseChildNodeID": 260257
},
{
"caseLiteral": "\\pcalua.exe",
"caseChildNodeID": 223695
},
{
"caseLiteral": "\\appvlp.exe",
"caseChildNodeID": 223695
},
{
"caseLiteral": "svchost.exe",
"caseChildNodeID": 223166
},
{
"caseLiteral": "winword.exe",
"caseChildNodeID": 220386
},
{
"caseLiteral": "\\windbg.exe",
"caseChildNodeID": 279426
}
]
}
This is an interesting node to show because it contains function calls. As before, we’ll switch to the next node depending on the match between the string specified in the “caseLiteral” attribute and the value of the “switchField” attribute. This time, instead of taking the value directly from the event, we need to apply some functions to it.
First is the “LOWERCASE” function. As the name suggests, it’ll transform all the characters in a string to lowercase. So we grab the field specified inside the innermost “parameterField” and turn it to lowercase.
CSIDL_SYSTEM\\svchost.exe ==> csidl_system\\svchost.exe
Then we apply the “TAIL_STRING” function, which keeps only the specified number of characters counted from the end of the string. In this case we keep the last “11” characters.
csidl_system\\svchost.exe ==> svchost.exe
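The two steps above can be sketched as a small recursive evaluator in Python. The behavior of “LOWERCASE” and “TAIL_STRING” is inferred from their names and from this walkthrough, so treat it as an assumption. `FUNCTIONS`, `resolve_path` and `evaluate_field` are hypothetical names.

```python
# Assumed behavior of the two functions used by node 220341.
FUNCTIONS = {
    "LOWERCASE": lambda text: text.lower(),
    "TAIL_STRING": lambda text, count: text[-count:],
}


def resolve_path(event: dict, path: list):
    """Walk a plain field path such as [..., "value", "[0]"]."""
    current = event
    for part in path:
        if part.startswith("[") and part.endswith("]"):
            current = current[int(part[1:-1])]
        else:
            current = current[part]
    return current


def evaluate_field(event: dict, field):
    """Evaluate a plain path or a nested CalculatedField, innermost first."""
    if isinstance(field, list):
        return resolve_path(event, field)
    inner = evaluate_field(event, field["parameterField"])
    function = FUNCTIONS[field["function"]["functionName"]]
    return function(inner, *field["function"]["literalParameters"])


# "switchField" of node 220341, copied from the node shown earlier.
switch_field = {
    "fieldType": "CalculatedField",
    "function": {"functionName": "TAIL_STRING", "literalParameters": [11]},
    "parameterField": {
        "fieldType": "CalculatedField",
        "function": {"functionName": "LOWERCASE", "literalParameters": []},
        "parameterField": ["artifacts", "actor", "normalized_path", "value", "[0]"],
    },
}
event = {"artifacts": {"actor": {"normalized_path": {"value": ["CSIDL_SYSTEM\\svchost.exe"]}}}}

print(evaluate_field(event, switch_field))
```

The recursion mirrors the nesting in the JSON: the innermost “parameterField” is resolved first, then each enclosing function is applied on the way back out.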
Now we are ready to compare against the “caseLiteral” values. In this case we get a match on “svchost.exe”, which leads to “nodeID : 223166”
{
"nodeID": 223166,
"childrenIDs": [
223167
],
"parentID": 220341,
"nodeType": "AlphaNoTestNode"
}
{
"nodeID": 223167,
"childrenIDs": [
223168
],
"parentID": 223166,
"nodeType": "AlphaConstantTestNode",
"field": {
"fieldType": "CalculatedField",
"function": {
"functionName": "LOWERCASE",
"literalParameters": []
},
"parameterField": {
"fieldType": "CalculatedField",
"function": {
"functionName": "EXTRACT_MATCH",
"literalParameters": [
".*svchost\\.exe.*-k\\s+netsvcs.*schedule.*",
0
]
},
"parameterField": {
"fieldType": "CalculatedField",
"function": {
"functionName": "LOWERCASE",
"literalParameters": []
},
"parameterField": [
"artifacts",
"actor",
"cmd_line",
"value",
"[0]"
]
}
}
},
"bop": "NOT_EQUAL",
"constantLiteral": ""
}
Following the same logic, next we need to verify that the command line matches the following regex
".*svchost\\.exe.*-k\\s+netsvcs.*schedule.*"
This is achieved using both the “LOWERCASE” and “EXTRACT_MATCH” functions: the test passes when the extracted match is not an empty string (“NOT_EQUAL” to “”). We’ll continue like this until we reach our destination a couple of jumps later.
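As a rough illustration of this check, the Python sketch below reproduces the chain of functions from node 223167. I’m assuming that “EXTRACT_MATCH” returns the requested capture group, or an empty string when the regex does not match. That is a guess based on the “NOT_EQUAL” comparison against “”, and Python’s `re` module is only a stand-in for whatever regex engine SEDR really uses.

```python
import re

# The regex from node 223167 with the JSON escaping removed.
PATTERN = r".*svchost\.exe.*-k\s+netsvcs.*schedule.*"


def extract_match(text: str, pattern: str, group: int) -> str:
    """Assumed EXTRACT_MATCH: return the requested group, or "" if no match."""
    match = re.search(pattern, text)
    return match.group(group) if match else ""


def is_task_scheduler_service(cmd_line: str) -> bool:
    # LOWERCASE -> EXTRACT_MATCH -> LOWERCASE, then NOT_EQUAL "".
    extracted = extract_match(cmd_line.lower(), PATTERN, 0).lower()
    return extracted != ""


print(is_task_scheduler_service(r"C:\WINDOWS\system32\svchost.exe -k netsvcs -p -s Schedule"))
print(is_task_scheduler_service(r"C:\WINDOWS\system32\svchost.exe -k netsvcs -p -s BITS"))
```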
{
"nodeID": 223170,
"childrenIDs": [],
"nodeType": "BetaActionNode",
"nextResolverNodeID": 223171,
"ruleSourceFile": "mitre_ttps.fl",
"ruleSourceLine": 216,
"ruleSourceColumn": 0,
"ruleName": "eScheduledTaskLaunch",
"ruleID": 998,
"ruleDesc": "Scheduled task launch detected",
"actions": [
{
"actionType": "send",
"betaIndex": 0,
"fieldsToAdd": [
{
"fieldName": [
"suspicion_score"
],
"fieldValue": {
"rvalType": "Literal",
"literalValue": 0
}
},
{
"fieldName": [
"category_id"
],
"fieldValue": {
"rvalType": "Literal",
"literalValue": 201
}
},
{
"fieldName": [
"category_name"
],
"fieldValue": {
"rvalType": "Literal",
"literalValue": "Generic Data to be sent to ATP"
}
}
]
}
]
}
This “BetaActionNode” contains the enrichment rule “eScheduledTaskLaunch” and other information to apply to the event. We went from having an event of a “Process launching something” to having more context into what was actually launched.
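To picture what “applying” such a node could look like, here is a hedged Python sketch. The mapping from “ruleName”, “ruleID” and “ruleDesc” to the “enriched_data” sub-fields, and the idea that “fieldsToAdd” ends up next to them, are assumptions on my part. `apply_action_node` is a hypothetical function, not the real engine.

```python
def apply_action_node(event: dict, node: dict) -> dict:
    """Return a copy of the event with enrichment taken from a BetaActionNode."""
    enriched = {
        # Assumed mapping from node attributes to enriched_data sub-fields.
        "rule_name": node["ruleName"],
        "rule_id": node["ruleID"],
        "rule_description": node["ruleDesc"],
    }
    for action in node["actions"]:
        for field in action["fieldsToAdd"]:
            key = ".".join(field["fieldName"])
            # Only literal values appear in this node; other rvalTypes are ignored.
            if field["fieldValue"]["rvalType"] == "Literal":
                enriched[key] = field["fieldValue"]["literalValue"]
    return {**event, "enriched_data": enriched}


# Trimmed copy of node 223170 from this post.
node = {
    "nodeID": 223170,
    "nodeType": "BetaActionNode",
    "ruleName": "eScheduledTaskLaunch",
    "ruleID": 998,
    "ruleDesc": "Scheduled task launch detected",
    "actions": [
        {
            "actionType": "send",
            "fieldsToAdd": [
                {"fieldName": ["suspicion_score"],
                 "fieldValue": {"rvalType": "Literal", "literalValue": 0}},
                {"fieldName": ["category_id"],
                 "fieldValue": {"rvalType": "Literal", "literalValue": 201}},
            ],
        }
    ],
}
event = {"action": ["launch"], "type_id": {"value": [8001]}}

result = apply_action_node(event, node)
print(result["enriched_data"])
```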
The same logic applies to different events. Below are examples of the regexps that SEDR might look for to determine the enrichment rule for an event.
".*forfiles(?:\\.exe)?.*\\/c.*\\.lnk.*"".*wmic(\\.exe\\\"?)?.*\\s+/node.*"".*reg(\\.exe)?\\\"?\\s+(save|query|export)\\s+(hkey_local_machine|hklm)\\\\(sam|security)(\\\\|\\s+|$).*""procdump(64)?(\\.exe)?\\\"?\\s+(.*\\s+)?\\-ma(\\s+.*)?\\s+lsass\\.exe\\s+.*\\.dmp""psexec(64)?(\\.exe)?\\\"?\\s+(.*\\s+)?net\\s+(start|stop|pause|continue)\\s+.*"
Conclusion & Future Research
This concludes the first part of this blog post. This was just an overview and a glimpse of how things work under the hood. Hopefully more is to come as there are many aspects still worth exploring in this file and in the SEDR enrichment process in general. For example :
- DLL responsible for parsing the events (Listener.dll).
- Aggregation functions.
- Enrichment engine within SEDR.
Keep an eye out for the rest of this series in the next few months.
[Spoilers]
These enrichment rules are used internally by the SEDR rule engine to create detections and incidents. Here are a couple of examples :
"rule": "(enriched_data.rule_name='eScripting' and event_actor.file.name='regsvr32.exe')"
"rule": "(enriched_data.rule_name='dedup_eSND_ePECreation' and event_actor.file.name='powershell.exe')"
"rule": "(enriched_data.rule_name='eGenericProcessLaunch' and (process.file.name='cscript.exe' or process.file.name='wscript.exe'))"
Thanks for reading. If you have any questions or want to discuss this, you can catch me on Twitter @nas_bench