LNDb2020——肺结节自动诊断分析

今天将分享CT图像肺结节自动诊断分析完整实现版本，为了方便大家学习理解整个流程，将整个流程步骤进行了整理，并给出详细的步骤结果。感兴趣的朋友赶紧动手试一试吧。

一、LNDb2020介绍

肺癌是全球男性和女性最致命的癌症类型。与其他癌症类型相比，提高肺癌存活率的进展是出了名的缓慢，主要是由于疾病的晚期诊断。长期以来，低剂量计算机断层扫描（CT）一直被认为是一种潜在的早期筛查工具，并且已被证明对于肺癌风险人群的肺癌死亡率降低了20%。然而，由于设备和人员成本以及任务的复杂性，将这些筛查计划推广到普通人群是具有挑战性的。也就是说，肺结节呈现出多种形状和特征，因此这些异常的识别和表征并非易事，并且容易引起观察者间的高度变异性。因此，计算机辅助诊断（CAD）系统可以通过减轻临床医生的负担并提供独立的第二意见来促进筛查计划的采用和推广。

二、LNDb2020任务

主挑战-Fleischner分类：从胸部CT扫描中，根据2017年Fleischner指南预测正确的随访，根据结节数量（单个或多发）、结节体积和结节纹理分成四个类别（0：根据患者风险无需常规随访或在12个月进行CT扫描，1、需要6-12个月的CT扫描，2、需要3-6个月的CT扫描，3、需要3个月的CT扫描，PET/CT或组织取样）。

子挑战A-肺结节检测：通过胸部 CT 扫描，检测肺结节。

子挑战B-肺结节分割：根据给定肺结节质心的列表，分割肺结节。

子挑战C-肺结节纹理表征：根据给定肺结节质心列表，将肺结节分为三个纹理类别——实性结节、亚实性结节和GGO（毛玻璃浑浊）。

三、LNDb2020数据集

LNDb数据集包含2016年至2018年间在葡萄牙波尔图圣若昂医院（CHUSJ）回顾性收集的294次CT扫描。所有数据均在CHUSJ道德委员会的批准下获得，并在进行任何分析之前进行匿名处理，以删除除患者出生年份和性别以外的个人信息。

至少有一名放射科医生读取了一例CT扫描，以识别肺结节和其他可疑病变。共有5名放射科医生，他们至少有4年的经验，每周阅读多达30个CT，并参与了整个项目的注释过程。注释以单一盲法进行，即放射科医生将读取一次CT扫描，放射科医生之间没有达成共识或审查。每次扫描都由至少一名放射科医生读取。注释过程因类别而异。根据 LIDC-IDRI 对 ⩾3mm 结节进行分割和主观表征（对细微程度、内部结构、钙化、球形度、边缘、分叶、针状、质地和恶性肿瘤可能性的评级）。对于<3mm的结节，对结节质心进行标记，并对结节的特征进行主观评估。对于非结节，仅标记病变质心。鉴于不同的放射科医生可能阅读了相同的CT并且没有进行共识审查，预计放射科医生的注释存在差异。

CT数据是以MetaImage（.mhd / .raw）格式提供。文件名遵循 LNDb-XXXX.mhd 格式，其中 XXXX 是 LNDb CT ID。

单个结节注释是在csv文件（trainNodules.csv）中，该文件每行包含一个由放射科医生标记的发现。每行包含LNDb CT ID、标记发现的放射科医生（每个CT内从1到nrad编号）、发现的ID（每个放射科医生在每个CT中从1到nfind编号）、世界坐标中结节质心xyz坐标、无论是结节（1）还是非结节（0）、相应的结节体积和给出的结节纹理等级（1-5）。将LNDb注释中的5个类别（1-GGO、2-中间、3部分实性、4-中间、5-实性）将金标准肺结节5类纹理重新转换为三个类别，将GGO视为1-2，部分实性为3，实性为4-5。对于非结节，给出的纹理为 0。

合并不同放射科医生的注释后，结节注释列表可在csv文件（trainNodules_gt.csv）中找到，该文件每行包含一个结果。每行包含LNDb CT ID，标记结果的放射科医生（每个CT内的编号从1到nrad），结节上每个放射科医生的匹配结果的ID.csv，合并后的唯一结节ID （每个CT内的编号从1到nfind），世界坐标中结节质心xyz坐标，协议级别（注释每个发现的放射科医生人数，无论是结节（1）还是非结核（0），相应的结节体积和结核纹理（给出的纹理等级平均值），如果多个放射科医生确定了结核，则计算平均纹理，并通过将GGO视为<2.3（3），部分实性为2.3（3）-3.6（6）和实性为>3.6（6）来做为计算Fleischner指南类别的三类。对于非结节，给出的纹理为 0。

结节分割以元图像（*.mhd/*.raw）格式给出。每个LNDbXXXX_radR.mhd在CT XXXX上保存CT XXXX上所有结节的分割，在CT大小的3D数组中，其中每个像素的值是trainNodules.csv中发现的ID。

最后，Fleischner分数可以在一个单独的csv文件（trainFleischner.csv）上获得，该文件每行包含一个CT扫描。每行包含LNDb CT ID和Fleischner真实值分数。依据2017 年Fleischner 指南对 CT 扫描进行自动分类，以便患者随访建议。Fleischner 指南广泛用于结节发现的患者管理，由 4 类组成：

0）根据患者风险，无需常规随访或在12个月时进行可选的CT;

1）需要6-12个月的CT;

2）需要3-6个月的CT;

3）需要在3个月时进行CT，PET / CT或组织取样。

四、技术路线

根据trainCTs.csv文件将标注数据划分成196例训练集和40例验证集。

任务1、肺结节检测与分割

1、根据trainNodules.csv文件，读取每一例CT图像和全部标注Mask图像，并将全部标注Mask图像进行合并生成一个Mask图像结果。

2、去除多余的背景区域，只保留肺部区域ROI。

3、分析ROI图像，得到图像平均Spacing大小是0.65x0.65x1，因此将图像缩放到固定Spacing大小0.6x0.6x1。图像预处理，对步骤2的ROI图像进行（-1000，800）窗宽窗位截断，然后采用均值为0，方差为1的方式进行归一化处理，对每个图像随机裁切提取20个patch，大小为128x128x128。

4、搭建VNet3d网络，使用AdamW优化器，学习率是0.001，batchsize是6，epoch是300，损失函数采用二值化的dice和交叉熵。

5、训练结果和验证结果

6、验证集分割结果

左图是金标准分割结果，右图是预测分割结果。

有些CT数据图像上存在肺结节金标准漏检，但网络可以预测分割出肺结节。

任务2、肺结节纹理分类

1、根据trainNodules_gt.csv文件，读取对应CT图像和Mask图像，然后根据csv文件中的肺结节中心点坐标裁切出64x64x64大小的ROI图像和ROI Mask出来，同时将肺结节纹理0设置成标签0，1和2设置成标签1，3设置成标签2，4和5设置成标签3。统计训练集个数，标签0为373，标签1为34，标签2为46，标签3为564，验证集个数，标签0为78，标签1为7，标签2为11，标签3为106。如下所示为标签0，1，2，3对应效果图。

2、考虑到类别不平衡主要是在标签1和标签2，所以采用两个分类网络来实现是否是肺结节和肺结节三分类（磨玻璃结节，部分实性结节，实性结节）。

3、判断是否是肺结节任务，对裁切出来的图像采用均值为0，方差为1的方式进行归一化处理，并将标签1，2，3数据进行合并成为标签1和标签0组成训练数据和验证数据。

4、搭建ResNet3d网络，使用AdamW优化器，学习率是0.0001，batchsize是64，epoch是300，损失函数采用交叉熵。

5、训练结果和验证结果

6、验证集分类结果

precision recall f1-score support

0.0 0.84 0.81 0.82 78

1.0 0.88 0.90 0.89 124

accuracy 0.87 202

macro avg 0.86 0.86 0.86 202

weighted avg 0.87 0.87 0.87 202

7、判断肺结节三分类任务，对裁切出来的图像采用均值为0，方差为1的方式进行归一化处理，去除标签0数据，只保留标签1，2，3数据组成训练数据和验证数据，其中对标签1和2的训练数据使用10倍数据增强操作，例如旋转，缩放，平移，反转等操作。

8、搭建ResNet3d网络，使用AdamW优化器，学习率是0.0001，batchsize是64，epoch是300，损失函数采用交叉熵。

9、训练结果和验证结果

10、验证集分类结果

precision recall f1-score support

0.0 0.86 0.86 0.86 7

1.0 0.54 0.64 0.58 11

2.0 0.96 0.94 0.95 106

accuracy 0.91 124

macro avg 0.79 0.81 0.80 124

weighted avg 0.92 0.91 0.91 124

任务3、肺结节Fleischner 分类

1、对于验证集数据使用肺结节分割和肺结节纹理分类进行处理，得到如下所示结果。

2、将第1步成成的csv文件通过calcFleischner.py脚本计算得到Fleischner类别，得到如下所示结果。

代码语言：javascript复制

import numpy as np
import itertools
import math
from utils import readCsv
from readNoduleList import joinNodules

def calcCTFleischnerProb(nd,nn,v0,v1,v2,t0,t1,t2,verb=False):
    # Compute probability of belonging to each Fleischner class if given lists with:
    # nd: the probability of each finding being a nodule
    # nn: 1-nd
    # v0-v2: the probability of each finding belonging to each volume class
    # t0-t2: the probability of each finding belonging to each texture class
    p0 = 0.0
    p1 = 0.0
    p2 = 0.0
    p3 = 0.0
    
    vs = np.sum([v0,v1,v2],axis=0)
    ts = np.sum([t0,t1,t2],axis=0)
    v0=v0/vs
    v1=v1/vs
    v2=v2/vs
    t0=t0/ts
    t1=t1/ts
    t2=t2/ts
    
    # No nodules
    p0  = np.prod(nn)
    if verb:
        print('none',p0,p1,p2,p3)
    
    # Single nodule
    for i in range(len(nd)):
        p0  = np.prod(np.delete(nn,i))*nd[i]*v0[i]
        p1  = np.prod(np.delete(nn,i))*nd[i]*((v1[i] v2[i])*t0[i] v1[i]*t2[i])
        p2  = np.prod(np.delete(nn,i))*nd[i]*((v1[i] v2[i])*t1[i])
        p3  = np.prod(np.delete(nn,i))*nd[i]*v2[i]*t2[i]
    if verb:
        print('single',p0,p1,p2,p3)
    
    # Multiple nodules
    if len(nd)>1:
        combN = []
        indN = [i for i in range(len(nd))]
        for i in range(2,len(nd) 1):
            combN.extend(list(itertools.combinations(indN,i)))
        combN = [list(c) for c in combN]
        for cN in combN:
            cNN = [i for i in indN if not i in cN]
            # All Non Solid
            pans = np.prod(nd[cN]*(t0[cN] t1[cN]))*np.prod(nn[cNN])
            p3  = pans
            if verb:
                print('pans',p0,p1,p2,p3)
            # All Solid
            pas = np.prod(nd[cN]*t2[cN])*np.prod(nn[cNN])
            pasp0 = np.prod(nd[cN]*t2[cN]*v0[cN])*np.prod(nn[cNN])
            combV2 = []
            for i in range(1,len(cN) 1):
                combV2.extend(list(itertools.combinations(cN,i)))
            pasp3 = 0.0
            
            combV2 = [list(c) for c in combV2]
            for cV2 in combV2:
                cV01 = [i for i in cN if not i in cV2]
                pasp3  = np.prod(nd[cV2]*t2[cV2]*v2[cV2])*np.prod(nd[cV01]*t2[cV01]*(v0[cV01] v1[cV01]))*np.prod(nn[cNN])
            p0  = pasp0
            p3  = pasp3
            p2  = pas-pasp3-pasp0
            if verb:
                print('pas',p0,p1,p2,p3)
            # Mixed
            combT2 = []
            for i in range(1,len(cN)):
                combT2.extend(list(itertools.combinations(cN,i)))
            combT2 = [list(c) for c in combT2]
            for cT2 in combT2:
                cT01 = [i for i in cN if not i in cT2]
                pcomb = np.prod(nn[cNN])*np.prod(nd[cN])*np.prod(t2[cT2])*np.prod(t0[cT01] t1[cT01])
                pT2 = calcCTFleischnerProb_Mixed(v0[cT2],v1[cT2],v2[cT2],t0[cT2],t1[cT2],t2[cT2],'allsolid')
                pT01 = calcCTFleischnerProb_Mixed(v0[cT01],v1[cT01],v2[cT01],t0[cT01],t1[cT01],t2[cT01],'allnonsolid')
                p0  = pT2[0]*pT01[0]*pcomb
                p1  = (pT2[1]*pT01[0] pT2[0]*pT01[1] pT2[1]*pT01[1])*pcomb
                p2  = (pT2[2]*np.sum(pT01[:2]) np.sum(pT2[:2])*pT01[2] pT2[2]*pT01[2])*pcomb
                p3  = (pT2[3]*np.sum(pT01[:3]) np.sum(pT2[:3])*pT01[3] pT2[3]*pT01[3])*pcomb
            if verb:
                print('mxd',p0,p1,p2,p3)
    
    if verb:
        print('sum',p0 p1 p2 p3)
    return [p0,p1,p2,p3]

def calcCTFleischnerProb_Mixed(v0,v1,v2,t0,t1,t2,texstr,verb = False):
    # Compute probability of belonging to each Fleischner class for mixed nodule lists (called by calcCTFleischnerProb):
    p0 = 0.0
    p1 = 0.0
    p2 = 0.0
    p3 = 0.0
    
    if texstr=='allsolid':
        t0[:]=0
        t1[:]=0
        t2[:]=t2/t2
        t2[np.where(np.isnan(t2))] = 0
    else:
        ts = t0 t1
        t0=t0/ts
        t1=t1/ts        
        t0[np.where(np.isnan(t0))] = 0
        t1[np.where(np.isnan(t1))] = 0
        t2[:]=0
    
    if len(v0)==1: # Single nodule
        p0  = v0[0]
        p1  = (v1[0] v2[0])*t0[0] v1[0]*t2[0]
        p2  = (v1[0] v2[0])*t1[0]
        p3  = v2[0]*t2[0]
        if verb:
            print('single mixed',p0,p1,p2,p3)
    
    elif len(v0)>1: # Multiple nodules
        # All Non Solid
        pans = np.prod((t0 t1))
        p3  = pans
        if verb:
            print('pans mixed',p0,p1,p2,p3)
        # All Solid
        pas = np.prod(t2)
        pasp0 = np.prod(t2*v0)
        
        combV2 = []
        indN = [i for i in range(len(v0))]
        for i in range(1,len(v0) 1):
            combV2.extend(list(itertools.combinations(indN,i)))
        pasp3 = 0.0
        
        combV2 = [list(c) for c in combV2]
        for cV2 in combV2:
            cV01 = [i for i in indN if not i in cV2]
            pasp3  = np.prod(t2[cV2]*v2[cV2])*np.prod(t2[cV01]*(v0[cV01] v1[cV01]))
        p0  = pasp0
        p3  = pasp3
        p2  = pas-pasp3-pasp0
        if verb:
            print('pas mixed',p0,p1,p2,p3)
    
    return [p0,p1,p2,p3]

def calcCTFleischnerScore(nd,vclass,tclass,nodprob = 0.5):
    # Compute Fleischner class if given lists with:
    # nd: the probability of each finding being a nodule
    # vclass: each finding's volume class (0,1,2)
    # tclass: each finding's texture class (0,1,2)
    # nodprob: threshold to use to divide nodules/nonnodules
    vclass = vclass[nd>=nodprob]
    tclass = tclass[nd>=nodprob]
    nd = nd[nd>=nodprob]
    
    if len(nd)==0:# No nodules
        f = 0
    elif len(nd)==1: # Single nodule
        if vclass[0]==0: #small
            f = 0
        elif tclass[0]==1: #partsolid
            f = 2        
        elif tclass[0]==0 or vclass[0]==1: #ggo/solid medium
            f = 1
        else:
            f = 3 #large
    else: # Multiple nodules
        if np.all(tclass<2): #all nonsolid
            f = 2
        elif np.all(tclass==2): #all solid
            if np.all(vclass==0): #small
                f = 0
            elif np.any(vclass==2): #at least one big
                f = 3
            else: #otherwise
                f = 2
        else: #mixed nodules
            fs = calcCTFleischnerScore(nd[tclass==2],vclass[tclass==2],tclass[tclass==2])
            fns = calcCTFleischnerScore(nd[tclass<2],vclass[tclass<2],tclass[tclass<2])
            f = max(fs,fns)
        
    return f

def calcFleischner(nodules):
    # Compute Fleischner class for a list of nodules from different CTs (trainNodules_gt.csv):
    # nodules: list of nodules read from csv
    header = nodules[0]
    lines = nodules[1:]  
    
    indlnd = header.index('LNDbID')
    LND = np.asarray([line[indlnd] for line in lines])
    
    if 'Volume0' in header and 'Text0' in header:
        # if probabilities of volume and texture class are given compute Fleischner class probabilities
        Nd = np.asarray([float(line[header.index('Nodule')]) for line in lines])
        Nn = 1-Nd
        V0 = np.asarray([float(line[header.index('Volume0')]) for line in lines])
        V1 = np.asarray([float(line[header.index('Volume1')]) for line in lines])
        V2 = np.asarray([float(line[header.index('Volume2')]) for line in lines])
        T0 = np.asarray([float(line[header.index('Text0')]) for line in lines])
        T1 = np.asarray([float(line[header.index('Text1')]) for line in lines])
        T2 = np.asarray([float(line[header.index('Text2')]) for line in lines])
    elif 'Volume' in header and 'Text' in header:
        # if volume and texture are given compute Fleischner class
        Nd = np.asarray([float(line[header.index('Nodule')]) for line in lines])
        V = np.asarray([float(line[header.index('Volume')]) for line in lines])
        T = np.asarray([float(line[header.index('Text')]) for line in lines])
    elif 'VolumeClass' in header and 'TextClass' in header:
        # if volume and texture class are given compute Fleischner class
        Nd = np.asarray([float(line[header.index('Nodule')]) for line in lines])
        Vclass = np.asarray([float(line[header.index('VolumeClass')]) for line in lines])
        Tclass = np.asarray([float(line[header.index('TextClass')]) for line in lines])
    
    p0 = 0; p1 = 0; p2 = 0; p3 = 0
    fleischner = [['LNDbID','Fleischner','Fleischner0','Fleischner1','Fleischner2','Fleischner3']]
    for lndU in np.unique(LND):
        if 'Volume0' in header and 'Text0' in header:
            # if probabilities of volume and texture class are given compute Fleischner class probabilities
            nd = Nd[LND==lndU]
            nn = Nn[LND==lndU]
            v0 = V0[LND==lndU]
            v1 = V1[LND==lndU]
            v2 = V2[LND==lndU]
            t0 = T0[LND==lndU]
            t1 = T1[LND==lndU]
            t2 = T2[LND==lndU]
            [p0,p1,p2,p3] = calcCTFleischnerProb(nd,nn,v0,v1,v2,t0,t1,t2,verb=False)
            
            vclass = np.ndarray.flatten(np.argmax([v0,v1,v2], axis=0))
            tclass = np.ndarray.flatten(np.argmax([t0,t1,t2], axis=0))
        elif 'Volume' in header and 'Text' in header:
            # if volume and texture are given compute Fleischner class
            nd = Nd[LND==lndU]
            v = V[LND==lndU]
            t = T[LND==lndU]
            vclass = calcNodVolClass(v)
            tclass = calcNodTexClass(t)
            
        elif 'VolumeClass' in header and 'TextClass' in header:
            # if volume and texture class are given compute Fleischner class
            nd = Nd[LND==lndU]
            vclass = Vclass[LND==lndU]
            tclass = Tclass[LND==lndU]
            
        f = calcCTFleischnerScore(nd,vclass,tclass,nodprob=.5)
        
        print('LNDb {}'.format(lndU))
        print('Fleischner class: {}'.format(f))
        print('Fleischner class probabilities: {} {} {} {}'.format(p0,p1,p2,p3))
        # return csv with Fleischner score for each CT
        fleischner.append([lndU,f,p0,p1,p2,p3])
    return fleischner

def calcNodVolClass(nodvol):
    # Compute volume class from nodule volume nodvol
    volthr = [100,250]
    if isinstance(nodvol,float) or isinstance(nodvol,int):
        if nodvol>=volthr[0] and nodvol<volthr[1]:
            vclass = 1
        elif nodvol>=volthr[1]:
            vclass = 2
        else:
            vclass = 0
    else: #numpy array
        vclass = np.zeros(nodvol.shape)
        vclass[np.bitwise_and(nodvol>=volthr[0],nodvol<volthr[1])]=1
        vclass[nodvol>=volthr[1]]=2
    return vclass

def calcNodTexClass(nodtex):
    # Compute texture class from texture rating nodtex
    texthr = [7/3,11/3]
    if isinstance(nodtex,float) or isinstance(nodtex,int):
        if nodtex>=texthr[0] and nodtex<=texthr[1]:
            vclass = 1
        elif nodtex>texthr[1]:
            vclass = 2
        else:
            vclass = 0
    else: #numpy array
        vclass = np.zeros(nodtex.shape)
        vclass[np.bitwise_and(nodtex>=texthr[0],nodtex<texthr[1])]=1
        vclass[nodtex>=texthr[1]]=2
    return vclass

if __name__ == "__main__":
    # Compute Fleischner score for all trainset nodules
    fname_gtNodulesFleischner = 'trainNodules.csv'
    gtNodules = readCsv(fname_gtNodulesFleischner)
    gtNodules = joinNodules(gtNodules)
    pdFleischner = calcFleischner(gtNodules)