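Project: jupyter-collection/iocextractor at main · fr0gger/jupyter-collection (github.com)
1 Description
Individuals and organizations in the security community publish threat intelligence reports every week, and the volume is considerable. These reports contain many valuable indicators of compromise (IOCs), usually listed at the end of a blog post or in a supplementary document. Some lists are short and some are long, but either way, copying and pasting them by hand quickly becomes impractical. Fortunately, several automatic IOC extractors are available on GitHub. This short note shows how to use the IoCExtract module from the MSTICpy library to pull IOCs out of a URL, or out of any other source.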
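2 Limitations
The tool is at an early stage of development, so the IOCs extracted from a URL are not necessarily all malicious: the extractor cannot tell a malicious URL from a legitimate one. To work around this, I added a whitelist that removes false positives from the extracted results. How well it works depends on the page being parsed, and more entries may need to be filtered out.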
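The whitelist idea can be sketched with nothing but the standard `re` module. This is a minimal illustration, not the project's code: the function name `filter_whitelisted`, the two whitelist entries, and the sample URLs are all made up for the example.

```python
import re

# Hypothetical whitelist entries, written in the same style as the project's whitelist file.
WHITELIST = [r"^https?://www\.w3\.org/", r"^https?://twitter\.com/"]


def filter_whitelisted(urls, whitelist=WHITELIST):
    """Return the URLs that match none of the whitelist patterns."""
    compiled = [re.compile(p) for p in whitelist]
    return [u for u in urls if not any(p.search(u) for p in compiled)]


# Made-up extraction result: two legitimate links and one suspicious one.
extracted = [
    "https://www.w3.org/1999/xhtml",
    "http://malicious.example/payload.bin",
    "https://twitter.com/some_researcher",
]
remaining = filter_whitelisted(extracted)
print(remaining)  # ['http://malicious.example/payload.bin']
```

Anything the whitelist does not cover stays in the result, which is why a page that links to many unrelated legitimate sites may need extra entries.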
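3 Planned improvements
- Improve extraction
- Reduce false positives in the extracted data
- Extract from multiple sources (PDF, text)
- Add more regular expressions
- Add multiple export formats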
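To illustrate the "multiple export formats" item, here is one possible shape for a CSV export. It is only a sketch under assumptions: the helper `iocs_to_csv` does not exist in the project, and the input is assumed to be a dictionary that maps an IOC type to a set of values, with invented sample data.

```python
import csv
import io


def iocs_to_csv(iocs):
    """Flatten {ioc_type: set_of_values} into CSV text with one row per value."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["type", "value"])
    # Sort both levels so the output is stable between runs.
    for ioc_type in sorted(iocs):
        for value in sorted(iocs[ioc_type]):
            writer.writerow([ioc_type, value])
    return buffer.getvalue()


# Invented sample data using documentation-only addresses and domains.
sample = {"ipv4": {"203.0.113.7", "198.51.100.23"}, "dns": {"bad.example"}}
csv_text = iocs_to_csv(sample)
print(csv_text)
```

The same flattened rows could feed other formats later, so the export step stays independent of how the IOCs were extracted.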
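4 Code
Clone the repository locally and install the required libraries, then run the following code in a Jupyter notebook (or another IPython front end that can render ipywidgets):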
# Imports and configuration
import os
import glob
import requests
import json
import re
import ipywidgets as widgets
import pandas as pd
from ipywidgets import Button, Layout, Checkbox
from IPython.display import display, HTML
from bs4 import BeautifulSoup
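# Import path used by the original notebook; newer MSTICpy releases (2.x)
# document `from msticpy.transform import IoCExtract` instead.
from msticpy.sectools import IoCExtract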
# Loading whitelists
searchdir = "whitelists/whitelist_*.txt"
fpaths = glob.glob(searchdir)
patterns = []
# Collect the patterns from every whitelist file into one list
for fpath in fpaths:
    with open(fpath) as whitelist_file:
        # extend (rather than reassign) so earlier files are not overwritten
        patterns.extend(line.strip() for line in whitelist_file if line.strip())
# Initiate the IOC extractor
ioc_extractor = IoCExtract()
# Adding btc regex
ioc_extractor.add_ioc_type(ioc_type='btc', ioc_regex='^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$')

# Configure widget
keyword = widgets.Text(
    value="",
    placeholder='Enter the URL',
    description='Extract IOCs:',
    layout=Layout(width='90%', height='40px'),
    disabled=False
)
display(keyword)

# Configure checkboxes
checkbox_json = widgets.Checkbox(value=False, description="Json")
display(checkbox_json)
checkbox_table = widgets.Checkbox(value=False, description="Table")
display(checkbox_table)

# Configure click button (display='flex' belongs to the Layout, not the Button)
button = widgets.Button(
    description="Extract IOCs",
    layout=Layout(display='flex', width='20%', height='40px', flex='3 1 0%'),
    icon='check',
    button_style='primary'
)
output = widgets.Output()

# Box layout
box_layout = widgets.Layout(display='flex', flex_flow='column', align_items='center', width='100%')
box = widgets.HBox(children=[button], layout=box_layout)
display(box)


# Searching for the input url
@output.capture()
def userInput(b):
    try:
        # Request to the url
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
        result = requests.get(keyword.value, headers=headers)
        soup = BeautifulSoup(result.text, 'html.parser')
        print("[+] Extracting IOC from: " + keyword.value)
        iocs_found = ioc_extractor.extract(str(soup.get_text()))

        if iocs_found:
            # Remove every element that matches a whitelist pattern
            compiled = [re.compile(w) for w in patterns]
            for k in iocs_found:
                for i in iocs_found[k].copy():
                    if any(w.search(i) for w in compiled):
                        iocs_found[k].remove(i)

            display(HTML('<h4>Potential IoCs found:</h4>'))

            # Get JSON result (sets are not JSON serializable, so convert to lists)
            if checkbox_json.value is True:
                ioc = {k: list(v) for k, v in iocs_found.items()}
                jsonioc = json.dumps(ioc, indent=4, sort_keys=True)
                print(jsonioc)

            # Get table result
            if checkbox_table.value is True:
                # DataFrame.append was removed in pandas 2.0, so build the rows first
                rows = [(k, i) for k, v in iocs_found.items() for i in v]
                ioctable = pd.DataFrame(rows, columns=["type", "value"])
                display(ioctable)
        else:
            print("no IOC found!")
    except requests.exceptions.RequestException as e:
        print(e)
    except (AttributeError, KeyError) as er:
        print(er)


# get the input url
button.on_click(userInput)
display(output)
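One detail of the custom `btc` pattern is worth checking before relying on it: it is anchored with `^` and `$`, so what it finds depends on the regex flags and on where the address sits in the text. The sketch below uses only the standard `re` module. It makes no claim about which flags MSTICpy applies when it compiles a custom pattern. The `UNANCHORED` variant is my own alternative, not something from the project, and the sample text is made up around a widely published Bitcoin address.

```python
import re

# Same pattern as the one registered with add_ioc_type above.
BTC_PATTERN = r"^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$"

# Hypothetical alternative that replaces the anchors with word boundaries.
UNANCHORED = r"\b(?:[13][a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})\b"

text = (
    "Ransom note wallet:\n"
    "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa\n"
    "Send payment to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa today\n"
)

# Without MULTILINE, ^ and $ only anchor to the whole string: nothing matches.
single_line = re.findall(BTC_PATTERN, text)
# With MULTILINE, an address that sits alone on a line is found.
multi_line = re.findall(BTC_PATTERN, text, flags=re.MULTILINE)
# Word boundaries also catch the address embedded in a sentence.
inline = re.findall(UNANCHORED, text)

print(len(single_line), len(multi_line), len(inline))  # 0 1 2
```

If addresses in the reports you parse usually appear inside sentences, a boundary-based variant like the second one may be the better fit, at the cost of more false positives.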
Whitelist URLs
^https?://www.fireeye.com/
^https?://blog.fireeye.com/
^https?://www.symantec.com/
^https?://blog.kaspersky.com/
^https?://blog.trendmicro.com/
^https?://blogs.rsa.com/
^https?://www.trendmicro.com/
^https?://blog.trendmicro.com/
^https?://blogs.norman.com/
^https?://www.securelist.com/
^https?://www.mcafee.com/
^https?://blog.crysys.hu/
^https?://blogs.cisco.com
^https?://tools.cisco.com/security/
^https?://www.secureworks.com/research/
^https?://threatexpert.com/
^https?://www.f-secure.com/weblog/
^https?://nakedsecurity.sophos.com/
^https?://blog.eset.com/
^https?://www.gdata.de/
^https?://www.sophos.com/
^https?://normanshark.com/
^https?://www.cve.mitre.org/
^https?://www.virusbtn.com/pdf/
^https?://www.blackhat.com/presentations/
^https?://www.usenix.org/
^https?://blogs.sans.org/
^https?://www.shadowserver.org/
^https?://contagiodump.blogspot.com/
^https?://support.clean-mx.de/
^https?://lists.clean-mx.com/
^https?://citizenlab.org/
^https?://www.eff.org/document/
^https?://www.exploit-db.com/exploits/
^https?://www.adobe.com/support/security/
^https?://krebsonsecurity.com/
^https?://en.wikipedia.org/wiki/
^https?://www.google.com/
^https?://blogger.googleusercontent.com/
^https?://apis.google.com/
^https?://(?:[\w-]+\.)+(?:google.com|google-analytics|googleapis).com/
^https?://(?:[\w-]+\.)+(?:talosintelligence|snort|blogger).com/
^http://(?:[\w-]+\.)+(?:talosintelligence).com/
^http://(?:[\w-]+\.)+(?:talosintelligence).com
^https?://talosintelligence.com/
^https?://www.talosintelligence.com/
^https?://www.talosintelligence.com
^https?://blog.talosintelligence.com/
^https?://blog.talosintelligence.com
^https?://www.youtube.com/
^https?://(?:[\w-]+\.)+(?:snort).org/
^https://(?:[\w-]+\.)+(?:w3).org/
^https?://www.linkedin.com/
^https?://schema.org/
^https?://(?:[\w-]+\.)+(?:clamav).net/
^https?://www.reddit.com/
^https?://www.w3.org/
^https?://twitter.com/
^https?://www.facebook.com/
^https?://snort.org/
^https?://cisco.com/
^https?://www.cisco.com/
^https?://www.blogger.com
^https?://talosintel.com/
^https?://www.talosintel.com
^https?://static.cloudflareinsights.com/
^https?://www.spamcop.net/
^https?://(?:[\w-]+\.)+(?:blogblog).com/
^https?://www.welivesecurity.com
^https?://www.sentinelone.com
^https?://Microsoft.net
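Because every whitelist line is compiled as a regular expression at extraction time, a single malformed entry can make the filtering step fail. A quick sanity check before using the file might look like the sketch below. The helper `check_patterns` and the sample lines are illustrative assumptions, and the third sample line is deliberately broken.

```python
import re


def check_patterns(lines):
    """Split whitelist lines into (valid, broken) based on whether they compile."""
    valid, broken = [], []
    for line in lines:
        pattern = line.strip()
        if not pattern:
            continue  # ignore blank lines
        try:
            re.compile(pattern)
            valid.append(pattern)
        except re.error as exc:
            broken.append((pattern, str(exc)))
    return valid, broken


sample_lines = [
    "^https?://www.fireeye.com/",
    r"^https?://(?:[\w-]+\.)+(?:snort).org/",
    "^https?://(?:unclosed.example/",  # hypothetical broken entry: missing ')'
    "",
]
valid, broken = check_patterns(sample_lines)
print(len(valid), "valid,", len(broken), "broken")  # 2 valid, 1 broken
```

Note that an unescaped `.` in these patterns matches any character, so an entry such as `www.fireeye.com` is slightly broader than the literal host name.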