Parsing CEF format, or other fields whose values contain the separator character

Hello everyone,

I am having quite a hard time trying to parse Malwarebytes logs in CEF + KV format, since the KV pairs are separated by a single space and several values contain spaces as well. Here is a (reconstructed) example:

 

 

<13>Apr  8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|Endpoint Protection 1.2.0.1193|Detection|PUP found|2|deviceExternalId=239dw57h9861fe48342534f dvchost=cercer deviceDnsDomain=fake.local dvcmac=458234F23E33 dvc=10.10.10.10 rt=Apr 08 2024 14:59:06 Z fileType=file cat=PUP act=found msg=PUP found\nFile: C:\\Users\\Gengis\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Sync Data\\Rott\\in.ldb\nMD5: HEHEDEE3DE24DE4343FHT3TT3HTW\nSHA256:R3OU4HTIF39U4TND3487H387S64HDE9CU309JV4UT9F0V4KUY5GTJ894YHV3JTY9 filePath=C:\\Users\\Gengis\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Sync Data\\Rott\\1NCu.ldb cs1Label=Detection name cs1=PUP.Optional.PushNotifications.Generic cs3Label=Detection ID cs3=ijbf4398-7ryn-3944-38fy-n3g48ygr3uyj 

 

 

 

I tried several approaches to solve this, but could not make any of them work. The big problem is that the regex capture group functions do not seem to work, so regex patterns like

 

 

gsub => ["inner_message", "(\\w=)", ",\\1"]

 

 

to modify the separator char are useless.

Is there any other peculiar function or trick that I am missing? I see there are several prebuilt parsers that work on CEF formats, so there must be a way around this...

 

Many thanks

 

A

ACCEPTED SOLUTION

You're on the right track! We need to use gsub to replace the spaces that separate the pairs with a character we can then use as our field_split. I quickly wrote the example below, which replaces them with "^" and then parses the result using kv extraction. Here are the parser snippet and the output it yielded.

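mutate {
   # Replace only the spaces that are followed by "key=" with "^";
   # $1 re-inserts the captured key name.
   gsub => ["message", " ([a-zA-Z0-9]+)=", "^$1="]
}
kv {
   # Split pairs on the new "^" separator and keys from values on "=".
   source => "message"
   field_split => "^"
   value_split => "="
}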

My statedump yields the following output. You will still have to parse the CEF header (note how it ends up mangled into the first key below), but this solves the KV problem.

Internal State (label=):

{
  "\u003c13\u003eApr  8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|Endpoint Protection 1": {
    "2": {
      "0": {
        "1193|Detection|PUP found|2|deviceExternalId": "239dw57h9861fe48342534f"
      }
    }
  },
  "@createTimestamp": {
    "nanos": 0,
    "seconds": 1712754380
  },
  "@enableCbnForLoop": true,
  "@onErrorCount": 0,
  "@output": [],
  "@timezone": "",
  "act": "found",
  "cat": "PUP",
  "cs1": "PUP.Optional.PushNotifications.Generic",
  "cs1Label": "Detection name",
  "cs3": "ijbf4398-7ryn-3944-38fy-n3g48ygr3uyj",
  "cs3Label": "Detection ID",
  "deviceDnsDomain": "fake.local",
  "dvc": "10.10.10.10",
  "dvchost": "cercer",
  "dvcmac": "458234F23E33",
  "filePath": "C:\\\\Users\\\\Gengis\\\\AppData\\\\Local\\\\Google\\\\Chrome\\\\User Data\\\\Default\\\\Sync Data\\\\Rott\\\\1NCu.ldb",
  "fileType": "file",
  "message": "\u003c13\u003eApr  8 14:59:06 cercer CEF: 0|Malwarebytes|Malwarebytes Endpoint Protection|Endpoint Protection 1.2.0.1193|Detection|PUP found|2|deviceExternalId=239dw57h9861fe48342534f^dvchost=cercer^deviceDnsDomain=fake.local^dvcmac=458234F23E33^dvc=10.10.10.10^rt=Apr 08 2024 14:59:06 Z^fileType=file^cat=PUP^act=found^msg=PUP found\\nFile: C:\\\\Users\\\\Gengis\\\\AppData\\\\Local\\\\Google\\\\Chrome\\\\User Data\\\\Default\\\\Sync Data\\\\Rott\\\\in.ldb\\nMD5: HEHEDEE3DE24DE4343FHT3TT3HTW\\nSHA256:R3OU4HTIF39U4TND3487H387S64HDE9CU309JV4UT9F0V4KUY5GTJ894YHV3JTY9^filePath=C:\\\\Users\\\\Gengis\\\\AppData\\\\Local\\\\Google\\\\Chrome\\\\User Data\\\\Default\\\\Sync Data\\\\Rott\\\\1NCu.ldb^cs1Label=Detection name^cs1=PUP.Optional.PushNotifications.Generic^cs3Label=Detection ID^cs3=ijbf4398-7ryn-3944-38fy-n3g48ygr3uyj \n",
  "msg": "PUP found\\nFile: C:\\\\Users\\\\Gengis\\\\AppData\\\\Local\\\\Google\\\\Chrome\\\\User Data\\\\Default\\\\Sync Data\\\\Rott\\\\in.ldb\\nMD5: HEHEDEE3DE24DE4343FHT3TT3HTW\\nSHA256:R3OU4HTIF39U4TND3487H387S64HDE9CU309JV4UT9F0V4KUY5GTJ894YHV3JTY9",
  "node": "",
  "rt": "Apr 08 2024 14:59:06 Z"
}
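The first key in the statedump above absorbs the whole syslog prefix and CEF header, because nothing separates the header from the first pair. As a rough illustration of one way to handle that, here is a minimal Python sketch (not parser syntax) that peels off the seven pipe-delimited header fields first and only then applies the same separator rewrite to the extension. The function name, the header field names, and the shortened sample line are assumptions made for this example; it also assumes no escaped pipes in the header and no "^" inside any value.

```python
import re

# Hypothetical names for the seven CEF header fields (chosen for this sketch).
HEADER_NAMES = ["version", "vendor", "product", "device_version",
                "signature_id", "name", "severity"]

def parse_cef(line: str) -> dict:
    # Drop the syslog prefix: keep only what follows "CEF:".
    _, _, cef = line.partition("CEF:")
    # Seven header fields, then everything else is the extension.
    parts = cef.strip().split("|", 7)
    if len(parts) != 8:
        raise ValueError("not a CEF record")
    record = dict(zip(HEADER_NAMES, (p.strip() for p in parts[:7])))
    # Same idea as the gsub above: mark only the spaces that precede "key=".
    extension = re.sub(r" ([a-zA-Z0-9]+)=", r"^\1=", parts[7].strip())
    for pair in extension.split("^"):
        key, _, value = pair.partition("=")
        record[key] = value
    return record

# Shortened version of the sample log from the question.
sample = ("<13>Apr  8 14:59:06 cercer CEF: 0|Malwarebytes|"
          "Malwarebytes Endpoint Protection|Endpoint Protection 1.2.0.1193|"
          "Detection|PUP found|2|deviceExternalId=239dw57h9861fe48342534f "
          "dvchost=cercer rt=Apr 08 2024 14:59:06 Z "
          "cs1Label=Detection name cs1=PUP.Optional.PushNotifications.Generic")
record = parse_cef(sample)
```

In a parser, the equivalent step would be to extract the header into its own fields before running kv on the remainder, so that kv only ever sees the extension part.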

 


That's great @mikewilusz! thanks so much for the fast solution. 

So it turns out that the capture group functions are available, can you confirm? I must have misread something about it!

Many thanks again!

 

A

 

Correct, capture groups are supported. Note the use of "$1" in the replacement string to reference the capture group that holds the field name.

-mike
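For anyone comparing the two patterns in this thread, the small Python sketch below may help show why the pattern from the question would not have produced usable separators even with capture groups working: `(\w=)` captures only the last character of each key. Note that Python's `re` module writes the backreference as `\1`, whereas the parser snippet above uses `$1`. The shortened `extension` string is made up from the sample log for illustration.

```python
import re

extension = "dvchost=cercer dvc=10.10.10.10 cs1Label=Detection name cs1=PUP"

# Pattern from the question: \w matches a single character, so the
# separator lands inside every key, in front of its last character.
from_question = re.sub(r"(\w=)", r",\1", extension)

# Pattern from the accepted answer: the group captures the whole key, and
# the leading space restricts matches to spaces that precede "key=".
from_answer = re.sub(r" ([a-zA-Z0-9]+)=", r"^\1=", extension)

print(from_question)
print(from_answer)
```

The second pattern leaves spaces inside values such as "Detection name" untouched, because they are not followed by a key and an equals sign.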