Hello everyone,
I am having a hard time parsing Malwarebytes logs in CEF + KV format: the key-value pairs are separated by a single space, and several values contain spaces as well. Here is a (reconstructed) example:
<13>Apr 8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|Endpoint Protection 1.2.0.1193|Detection|PUP found|2|deviceExternalId=239dw57h9861fe48342534f dvchost=cercer deviceDnsDomain=fake.local dvcmac=458234F23E33 dvc=10.10.10.10 rt=Apr 08 2024 14:59:06 Z fileType=file cat=PUP act=found msg=PUP found\nFile: C:\\Users\\Gengis\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Sync Data\\Rott\\in.ldb\nMD5: HEHEDEE3DE24DE4343FHT3TT3HTW\nSHA256:R3OU4HTIF39U4TND3487H387S64HDE9CU309JV4UT9F0V4KUY5GTJ894YHV3JTY9 filePath=C:\\Users\\Gengis\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Sync Data\\Rott\\1NCu.ldb cs1Label=Detection name cs1=PUP.Optional.PushNotifications.Generic cs3Label=Detection ID cs3=ijbf4398-7ryn-3944-38fy-n3g48ygr3uyj
I tried several approaches but could not make any of them work. The big problem is that regex capturing groups do not seem to work, so regex patterns like
gsub => ["inner_message", "(\\w=)", ",\\1"]
to modify the separator character are useless.
Is there any other function or trick that I am missing? I see there are several prebuilt parsers that work on CEF formats, so there must be a way around this...
Many thanks
A
You're on the right track! We need to use gsub to replace the spaces that separate the pairs with a character we can then use as field_split. I quickly wrote the example below, which replaces them with "^" and then parses using kv extraction. Here are the parser snippet and the output it yielded.
mutate {
gsub => ["message", " ([a-zA-Z0-9]+)=", "^$1="]
}
kv {
source => "message"
field_split => "^"
value_split => "="
}
My statedump yields the following output (you'll still have to parse the CEF header, but it solves the KV problem):
Internal State (label=):
{
"\u003c13\u003eApr 8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|Endpoint Protection 1": {
"2": {
"0": {
"1193|Detection|PUP found|2|deviceExternalId": "239dw57h9861fe48342534f"
}
}
},
"@createTimestamp": {
"nanos": 0,
"seconds": 1712754380
},
"@enableCbnForLoop": true,
"@onErrorCount": 0,
"@output": [],
"@timezone": "",
"act": "found",
"cat": "PUP",
"cs1": "PUP.Optional.PushNotifications.Generic",
"cs1Label": "Detection name",
"cs3": "ijbf4398-7ryn-3944-38fy-n3g48ygr3uyj",
"cs3Label": "Detection ID",
"deviceDnsDomain": "fake.local",
"dvc": "10.10.10.10",
"dvchost": "cercer",
"dvcmac": "458234F23E33",
"filePath": "C:\\\\Users\\\\Gengis\\\\AppData\\\\Local\\\\Google\\\\Chrome\\\\User Data\\\\Default\\\\Sync Data\\\\Rott\\\\1NCu.ldb",
"fileType": "file",
"message": "\u003c13\u003eApr 8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|Endpoint Protection 1.2.0.1193|Detection|PUP found|2|deviceExternalId=239dw57h9861fe48342534f^dvchost=cercer^deviceDnsDomain=fake.local^dvcmac=458234F23E33^dvc=10.10.10.10^rt=Apr 08 2024 14:59:06 Z^fileType=file^cat=PUP^act=found^msg=PUP found\\nFile: C:\\\\Users\\\\Gengis\\\\AppData\\\\Local\\\\Google\\\\Chrome\\\\User Data\\\\Default\\\\Sync Data\\\\Rott\\\\in.ldb\\nMD5: HEHEDEE3DE24DE4343FHT3TT3HTW\\nSHA256:R3OU4HTIF39U4TND3487H387S64HDE9CU309JV4UT9F0V4KUY5GTJ894YHV3JTY9^filePath=C:\\\\Users\\\\Gengis\\\\AppData\\\\Local\\\\Google\\\\Chrome\\\\User Data\\\\Default\\\\Sync Data\\\\Rott\\\\1NCu.ldb^cs1Label=Detection name^cs1=PUP.Optional.PushNotifications.Generic^cs3Label=Detection ID^cs3=ijbf4398-7ryn-3944-38fy-n3g48ygr3uyj \n",
"msg": "PUP found\\nFile: C:\\\\Users\\\\Gengis\\\\AppData\\\\Local\\\\Google\\\\Chrome\\\\User Data\\\\Default\\\\Sync Data\\\\Rott\\\\in.ldb\\nMD5: HEHEDEE3DE24DE4343FHT3TT3HTW\\nSHA256:R3OU4HTIF39U4TND3487H387S64HDE9CU309JV4UT9F0V4KUY5GTJ894YHV3JTY9",
"node": "",
"rt": "Apr 08 2024 14:59:06 Z"
}
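For anyone who wants to try the substitution outside the parser, here is a minimal Python sketch of the same idea. It is not the parser's implementation, only an illustration under a few assumptions: the sample line is a shortened version of the log in the question, the function name parse_cef and the header field names are hypothetical, and escaped pipes in the header or values containing "^" are not handled. Python writes the capture group reference as \1 where the snippet above uses $1.

```python
import re

# Shortened sample modelled on the log in the question (some fields omitted).
SAMPLE = (
    "<13>Apr 8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|"
    "Endpoint Protection 1.2.0.1193|Detection|PUP found|2|"
    "deviceExternalId=239dw57h9861fe48342534f dvchost=cercer "
    "rt=Apr 08 2024 14:59:06 Z cat=PUP act=found "
    "cs1Label=Detection name cs1=PUP.Optional.PushNotifications.Generic"
)

# Hypothetical names for the seven pipe-delimited CEF header fields.
HEADER_NAMES = [
    "version", "vendor", "product", "device_version",
    "signature_id", "name", "severity",
]


def parse_cef(line):
    """Split a CEF line into (header dict, extension dict)."""
    # Drop the syslog prefix: keep only what follows "CEF:".
    _, _, cef = line.partition("CEF:")
    # Seven header fields, then the extension as the eighth part.
    parts = cef.strip().split("|", 7)
    header = dict(zip(HEADER_NAMES, parts[:7]))
    extension = parts[7] if len(parts) > 7 else ""

    # Same substitution as the gsub above: a space followed by "key="
    # becomes "^key=", so spaces inside values are left alone.
    marked = re.sub(r" ([a-zA-Z0-9]+)=", r"^\1=", extension)

    fields = {}
    for pair in marked.split("^"):
        # Split on the first "=" only, so values may contain "=".
        key, sep, value = pair.partition("=")
        if sep:
            fields[key] = value
    return header, fields


if __name__ == "__main__":
    header, fields = parse_cef(SAMPLE)
    print(header)
    print(fields)
```

Splitting the header off first avoids the odd first key in the statedump above, where the header text ends up glued to deviceExternalId. One caveat of the pattern itself: a value that happens to contain a space followed by word= would be split at that point.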
That's great @mikewilusz! Thanks so much for the fast solution.
So it turns out that capturing groups are available, can you confirm? I must have misread something about it!
Many thanks again!
A
Correct, capture groups are supported. Note the use of "$1" in the replacement string to reference the capture group that holds the field name.
-mike