Locating a "segfault at 0 ip ..." crash when there is no core file (Part 1)

2021-03-22 10:59:30

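Programs written in C/C++ sometimes crash without producing a core file, even when `ulimit -c unlimited` has been set, so it is often unclear what happened. Production environments generally do not allow developers to attach a debugger, and the logs do not always reveal the problem. (If a core file was generated, or the logs are enough to locate the problem, you can skip this article.)

This article deals specifically with the case where no core file was generated and the problem cannot be found from the logs.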
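Before concluding that no core file can be produced, it can help to confirm what core-size limit the process actually runs with, since the limit set in a login shell is not necessarily the one a service inherits. The following is a minimal sketch, not part of the original article; the function names `describe_core_limit` and `current_core_limit` are made up for illustration. Even with an unlimited value, other settings such as the kernel's `core_pattern` can redirect or suppress the file.

```cpp
#include <string>
#include <sys/resource.h>

// Turn a core-size limit into a short human-readable description.
// (Hypothetical helper, named for this example only.)
std::string describe_core_limit(rlim_t limit)
{
    if (limit == 0)             return "disabled";
    if (limit == RLIM_INFINITY) return "unlimited";
    return std::to_string(static_cast<unsigned long long>(limit)) + " bytes";
}

// Query the soft RLIMIT_CORE of the current process.
// Returns an empty string if getrlimit() fails.
std::string current_core_limit()
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        return "";
    }
    return describe_core_limit(rl.rlim_cur);
}
```

Logging `current_core_limit()` once at program start records the effective limit next to the rest of the startup information.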

Step 1: write some test code, main.cpp:

#include <iostream>
#include <cstdio>
#include <memory.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <ucontext.h>
#include <dlfcn.h>
#include <execinfo.h>
#include <thread>
#include <chrono>
#include <vector>
#include <functional>
#include <iomanip>
#include <mutex>
#include <random>

using namespace std;

void sigsegv_handler(int signum)
{
    std::cout<<"catch signal:"<<signum<<endl;
    void *buffer[128];  // a 10M-entry array (80 MB) would overflow the thread stack
    char **strings;
    int j,nptrs;
    nptrs=backtrace(buffer,128);
    cout<<"backtrace returned address:"<<nptrs<<endl;
    strings=backtrace_symbols(buffer,nptrs);
    if (strings!=NULL)
    {
        for(j=0;j<nptrs;++j)
        {
            cout<<strings[j]<<endl;
        }
    }
    free(strings);
}

static void  catch_sigsegv()
{
   struct sigaction action;
   memset(&action, 0, sizeof(action));
   action.sa_handler=sigsegv_handler;
   action.sa_flags=SA_NODEFER|SA_RESETHAND;

    if (sigaction(SIGSEGV, &action, NULL) != 0) { cout<<"sig_action error"<<endl; }
    if (sigaction(SIGFPE,  &action, NULL) != 0) { cout<<"sig_action error"<<endl; }
    if (sigaction(SIGINT,  &action, NULL) != 0) { cout<<"sig_action error"<<endl; }
    if (sigaction(SIGILL,  &action, NULL) != 0) { cout<<"sig_action error"<<endl; }
    if (sigaction(SIGTERM, &action, NULL) != 0) { cout<<"sig_action error"<<endl; }
    if (sigaction(SIGABRT, &action, NULL) != 0) { cout<<"sig_action error"<<endl; }
    if (sigaction(SIGSEGV, &action, NULL) != 0) { cout<<"sig_action error"<<endl; }
}

std::mutex m_mutex;
void *thread_entry(int thread_index)
{
    unsigned long index=0;
    std::default_random_engine e;
    e.seed(thread_index);

    while(true){
       auto random_value=e();
       if(random_value%123==0){
            int *p=nullptr;
            *p=10;
        }
        {
            auto t=std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
            std::lock_guard<std::mutex> lock(m_mutex);
            std::cout<<"thread_index["<<thread_index<<"]:id["<<std::this_thread::get_id()<<"]:["<<std::put_time(std::localtime(&t), "%Y-%m-%d %X")<<"]:"<<++index<<":random_value="<<random_value<<std::endl;
        }
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    return nullptr;
}

int main(int argc,char *argv[])
{
    //catch_sigsegv();

    std::vector<std::thread> thread_vector;
    for(int i=0;i<10;++i){
        thread_vector.push_back(std::move(std::thread(std::bind(thread_entry,i))));
    }

    for(int i=0;i<10;++i){
       thread_vector[i].join();
    }
    return 0;
}

The code is a bit messy and contains a bug. The bug is not obvious at a glance, but it shows up after the program has run for a while.
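The handler in the listing prints through `std::cout` and calls `backtrace_symbols()`, which allocates memory. Neither is safe inside a signal handler after a crash. A more conservative variant is sketched here under the assumption of glibc's `<execinfo.h>`; the names `dump_backtrace`, `crash_handler` and `install_crash_handler` are illustrative and do not come from the article.

```cpp
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

// Write the current call stack to fd; returns the number of frames captured.
int dump_backtrace(int fd)
{
    void *frames[64];                      // small fixed buffer, fits on any thread stack
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, fd);   // writes straight to fd, no malloc() call
    return n;
}

extern "C" void crash_handler(int signum)
{
    static const char msg[] = "caught fatal signal, backtrace:\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    dump_backtrace(STDERR_FILENO);
    // SA_RESETHAND has already restored the default action, so re-raising
    // lets the process terminate with the original signal.
    raise(signum);
}

// Install the handler for SIGSEGV; returns true on success.
bool install_crash_handler()
{
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    action.sa_handler = crash_handler;
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_RESETHAND;
    return sigaction(SIGSEGV, &action, nullptr) == 0;
}
```

Compiling with `-rdynamic`, as the article does, is what lets the backtrace show function names. The glibc manual notes that `backtrace()` may still allocate the first time it is used, because libgcc is loaded lazily, so calling `dump_backtrace()` once at startup is a common precaution.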

Once it is written, compile it (your compiler must support at least C++11):

g++ -Wall -g -rdynamic main.cpp -o main -lpthread -std=c++11

After compiling, run it: ./main

The output looks like this:

thread_index[9]:id[139681072420608]:[2020-07-28 18:24:26]:4:random_value=274558334
thread_index[6]:id[139681097598720]:[2020-07-28 18:24:26]:4:random_value=1614694654
thread_index[5]:id[139681105991424]:[2020-07-28 18:24:26]:4:random_value=629750996
thread_index[8]:id[139681080813312]:[2020-07-28 18:24:26]:4:random_value=1437098323
thread_index[7]:id[139681089206016]:[2020-07-28 18:24:26]:4:random_value=452154665
thread_index[0]:id[139681147954944]:[2020-07-28 18:24:27]:5:random_value=1144108930
thread_index[1]:id[139681139562240]:[2020-07-28 18:24:27]:5:random_value=1144108930
thread_index[4]:id[139681114384128]:[2020-07-28 18:24:27]:5:random_value=281468426
thread_index[2]:id[139681131169536]:[2020-07-28 18:24:27]:5:random_value=140734213
thread_index[3]:id[139681122776832]:[2020-07-28 18:24:27]:5:random_value=1284843143
thread_index[9]:id[139681072420608]:[2020-07-28 18:24:27]:5:random_value=1707045782
thread_index[6]:id[139681097598720]:[2020-07-28 18:24:27]:5:random_value=422202639
thread_index[5]:id[139681105991424]:[2020-07-28 18:24:27]:5:random_value=1425577356
thread_index[8]:id[139681080813312]:[2020-07-28 18:24:27]:5:random_value=562936852
thread_index[7]:id[139681089206016]:[2020-07-28 18:24:27]:5:random_value=1566311569
thread_index[0]:id[139681147954944]:[2020-07-28 18:24:28]:6:random_value=470211272
thread_index[1]:id[139681139562240]:[2020-07-28 18:24:28]:6:random_value=470211272
thread_index[4]:id[139681114384128]:[2020-07-28 18:24:28]:6:random_value=1880845088
thread_index[2]:id[139681131169536]:[2020-07-28 18:24:28]:6:random_value=940422544
thread_index[3]:id[139681122776832]:[2020-07-28 18:24:28]:6:random_value=1410633816
thread_index[5]:id[139681105991424]:[2020-07-28 18:24:28]:6:random_value=203572713
thread_index[9]:id[139681072420608]:[2020-07-28 18:24:28]:6:random_value=2084417801
thread_index[8]:id[139681080813312]:[2020-07-28 18:24:28]:6:random_value=1614206529
thread_index[6]:id[139681097598720]:[2020-07-28 18:24:28]:6:random_value=673783985
thread_index[7]:id[139681089206016]:[2020-07-28 18:24:28]:6:random_value=1143995257
thread_index[0]:id[139681147954944]:[2020-07-28 18:24:29]:7:random_value=101027544
thread_index[2]:id[139681131169536]:[2020-07-28 18:24:29]:7:random_value=202055088
thread_index[3]:id[139681122776832]:[2020-07-28 18:24:29]:7:random_value=303082632
thread_index[5]:id[139681105991424]:[2020-07-28 18:24:29]:7:random_value=505137720
thread_index[9]:id[139681072420608]:[2020-07-28 18:24:29]:7:random_value=909247896
thread_index[1]:id[139681139562240]:[2020-07-28 18:24:29]:7:random_value=101027544
thread_index[4]:id[139681114384128]:[2020-07-28 18:24:29]:7:random_value=404110176
thread_index[8]:id[139681080813312]:[2020-07-28 18:24:29]:7:random_value=808220352
thread_index[6]:id[139681097598720]:[2020-07-28 18:24:29]:7:random_value=606165264
thread_index[7]:id[139681089206016]:[2020-07-28 18:24:29]:7:random_value=707192808
thread_index[0]:id[139681147954944]:[2020-07-28 18:24:30]:8:random_value=1457850878
thread_index[2]:id[139681131169536]:[2020-07-28 18:24:30]:8:random_value=768218109
thread_index[3]:id[139681122776832]:[2020-07-28 18:24:30]:8:random_value=78585340
thread_index[6]:id[139681097598720]:[2020-07-28 18:24:30]:8:random_value=157170680
thread_index[9]:id[139681072420608]:[2020-07-28 18:24:30]:8:random_value=235756020
thread_index[7]:id[139681089206016]:[2020-07-28 18:24:30]:8:random_value=1615021558
thread_index[4]:id[139681114384128]:[2020-07-28 18:24:30]:8:random_value=1536436218
thread_index[5]:id[139681105991424]:[2020-07-28 18:24:30]:8:random_value=846803449
thread_index[8]:id[139681080813312]:[2020-07-28 18:24:30]:8:random_value=925388789
thread_index[1]:id[139681139562240]:[2020-07-28 18:24:30]:8:random_value=1457850878
thread_index[0]:id[139681147954944]:[2020-07-28 18:24:31]:9:random_value=1458777923
thread_index[2]:id[139681131169536]:[2020-07-28 18:24:31]:9:random_value=770072199
thread_index[3]:id[139681122776832]:[2020-07-28 18:24:31]:9:random_value=81366475
thread_index[6]:id[139681097598720]:[2020-07-28 18:24:31]:9:random_value=162732950
thread_index[5]:id[139681105991424]:[2020-07-28 18:24:31]:9:random_value=851438674
thread_index[7]:id[139681089206016]:[2020-07-28 18:24:31]:9:random_value=1621510873
thread_index[4]:id[139681114384128]:[2020-07-28 18:24:31]:9:random_value=1540144398
thread_index[9]:id[139681072420608]:[2020-07-28 18:24:31]:9:random_value=244099425
thread_index[8]:id[139681080813312]:[2020-07-28 18:24:31]:9:random_value=932805149
thread_index[1]:id[139681139562240]:[2020-07-28 18:24:31]:9:random_value=1458777923
thread_index[0]:id[139681147954944]:[2020-07-28 18:24:32]:10:random_value=2007237709
thread_index[2]:id[139681131169536]:[2020-07-28 18:24:32]:10:random_value=1866991771
thread_index[3]:id[139681122776832]:[2020-07-28 18:24:32]:10:random_value=1726745833
thread_index[6]:id[139681097598720]:[2020-07-28 18:24:32]:10:random_value=1306008019
thread_index[5]:id[139681105991424]:[2020-07-28 18:24:32]:10:random_value=1446253957
thread_index[9]:id[139681072420608]:[2020-07-28 18:24:32]:10:random_value=885270205
thread_index[4]:id[139681114384128]:[2020-07-28 18:24:32]:10:random_value=1586499895
thread_index[7]:id[139681089206016]:[2020-07-28 18:24:32]:10:random_value=1165762081
thread_index[8]:id[139681080813312]:[2020-07-28 18:24:32]:10:random_value=1025516143
thread_index[1]:id[139681139562240]:[2020-07-28 18:24:32]:10:random_value=2007237709
thread_index[0]:id[139681147954944]:[2020-07-28 18:24:33]:11:random_value=823564440
thread_index[2]:id[139681131169536]:[2020-07-28 18:24:33]:11:random_value=1647128880
thread_index[3]:id[139681122776832]:[2020-07-28 18:24:33]:11:random_value=323209673
thread_index[6]:id[139681097598720]:[2020-07-28 18:24:33]:11:random_value=646419346
thread_index[5]:id[139681105991424]:[2020-07-28 18:24:33]:11:random_value=1970338553
thread_index[9]:id[139681072420608]:[2020-07-28 18:24:33]:11:random_value=969629019
thread_index[4]:id[139681114384128]:[2020-07-28 18:24:33]:11:random_value=1146774113
thread_index[1]:id[139681139562240]:[2020-07-28 18:24:33]:11:random_value=823564440

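How soon it crashes varies from run to run, but it will not take long. If it still has not crashed after a long time, lower the constant 123 on line 65 of main.cpp (the divisor in the modulo test). The smaller the better; with a single-digit value it crashes very quickly.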

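Step 2: assume no core file was generated (if one was, you can delete it)

Use the `dmesg` command to view the kernel's record of the crash. The output looks like this (it varies somewhat between machines):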

[85277.854606] CPU7: Core temperature/speed normal
[90749.125541] CPU7: Core temperature above threshold, cpu clock throttled (total events = 2266885)
[90749.125542] CPU15: Core temperature above threshold, cpu clock throttled (total events = 2266885)
[90749.125544] CPU0: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125546] CPU12: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125547] CPU8: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125549] CPU6: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125551] CPU1: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125553] CPU5: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125554] CPU13: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125556] CPU2: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125557] CPU4: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125559] CPU3: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125560] CPU11: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125561] CPU10: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125562] CPU9: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125563] CPU14: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125564] CPU15: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.125580] CPU7: Package temperature above threshold, cpu clock throttled (total events = 4718542)
[90749.126527] CPU7: Core temperature/speed normal
[90749.126528] CPU5: Package temperature/speed normal
[90749.126529] CPU11: Package temperature/speed normal
[90749.126530] CPU3: Package temperature/speed normal
[90749.126531] CPU10: Package temperature/speed normal
[90749.126531] CPU2: Package temperature/speed normal
[90749.126533] CPU6: Package temperature/speed normal
[90749.126534] CPU1: Package temperature/speed normal
[90749.126535] CPU15: Core temperature/speed normal
[90749.126535] CPU9: Package temperature/speed normal
[90749.126537] CPU12: Package temperature/speed normal
[90749.126538] CPU14: Package temperature/speed normal
[90749.126539] CPU8: Package temperature/speed normal
[90749.126540] CPU4: Package temperature/speed normal
[90749.126541] CPU13: Package temperature/speed normal
[90749.126542] CPU0: Package temperature/speed normal
[90749.126542] CPU15: Package temperature/speed normal
[90749.126555] CPU7: Package temperature/speed normal
[93203.608134] main[32241]: segfault at 0 ip 000000000040749a sp 00007fc3c8f13c90 error 6 in main[400000+c000]
[95130.640597] main[9295]: segfault at 0 ip 000000000040742a sp 00007ff8bff35c90 error 6 in main[400000+c000]
[95130.640616] main[9296]: segfault at 0 ip 000000000040742a sp 00007ff8bf734c90 error 6 in main[400000+c000]

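Most of the output above is irrelevant. What matters is the segfault lines, which are the last three lines here. They break down as follows:

1. The output contains three segfault lines related to the main program.

2. segfault at 0: 0 is the memory address whose access faulted. The program probably accessed an invalid address here, such as nullptr.

3. ip 000000000040749a / ip 000000000040742a: ip here is not the network IP but the instruction pointer. See an assembly reference for background; it is not explained here. The address after ip is very important: it is where the CPU was executing when the crash happened. (Sometimes the address after ip is null. That case will be analyzed in the next part, and it can also be handled.)

4. sp 00007fc3c8f13c90: sp is the stack pointer, the counterpart of bp. bp is the base pointer register, while sp points to the top of the stack. If this is unfamiliar, an assembly reference is again the place to look.

5. error 6: as you might guess, this is an error code. The codes follow fixed rules, documented in fault.c in the Linux kernel source:

     The error code depends on the operating system and platform, so always interpret the code after error in the context of your own system.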

 * Page fault error code bits:
 *
 *   bit 0 == 0: no page found       1: protection fault
 *   bit 1 == 0: read access         1: write access
 *   bit 2 == 0: kernel-mode access  1: user-mode access
 *   bit 3 == 1: use of reserved bit detected
 *   bit 4 == 1: fault was an instruction fetch

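Error code 6 is 110 in binary. Read against the table above: bit 2 is set (user-mode access), bit 1 is set (write access), and bit 0 is clear (no page found). In other words, a write was made from user mode to an address with no mapped page. At this point the preliminary conclusion is that an assignment caused the crash.

6. in main[400000+c000]: 400000 is the address at which main is mapped, and c000 is the size of that mapping.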
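The bit table can be applied mechanically. The sketch below decodes only the five bits quoted above from the x86 fault.c comment; newer kernels define further bits that it ignores. The type `PageFaultInfo` and both function names are invented for this example. The kernel prints the error code in hexadecimal, so parse it as hex before passing it in.

```cpp
#include <string>

// One flag per bit of the x86 page fault error code quoted above.
struct PageFaultInfo {
    bool protection_fault;   // bit 0: 0 = no page found, 1 = protection fault
    bool write_access;       // bit 1: 0 = read, 1 = write
    bool user_mode;          // bit 2: 0 = kernel mode, 1 = user mode
    bool reserved_bit;       // bit 3: reserved bit detected
    bool instruction_fetch;  // bit 4: fault was an instruction fetch
};

PageFaultInfo decode_page_fault(unsigned long code)
{
    PageFaultInfo info;
    info.protection_fault  = (code & 0x01) != 0;
    info.write_access      = (code & 0x02) != 0;
    info.user_mode         = (code & 0x04) != 0;
    info.reserved_bit      = (code & 0x08) != 0;
    info.instruction_fetch = (code & 0x10) != 0;
    return info;
}

// Build a one-line description such as "user-mode write, no page found".
std::string describe_page_fault(unsigned long code)
{
    const PageFaultInfo f = decode_page_fault(code);
    std::string text = f.user_mode ? "user-mode " : "kernel-mode ";
    if (f.instruction_fetch) {
        text += "instruction fetch";
    } else {
        text += f.write_access ? "write" : "read";
    }
    text += f.protection_fault ? ", protection fault" : ", no page found";
    if (f.reserved_bit) {
        text += ", reserved bit set";
    }
    return text;
}
```

For the crash in this article, `describe_page_fault(6)` yields "user-mode write, no page found", which matches a write through a null pointer.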

At this point all the information we need has been collected.
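When many such lines have to be processed, the fields can be pulled out programmatically. This is a rough sketch that assumes the kernel's usual `name[pid]: segfault at ... in module[base+size]` layout with hexadecimal fields; `SegfaultRecord`, `parse_segfault_line` and `offset_in_mapping` are hypothetical names. The offset helper matters for position-independent executables and shared libraries, where the address to look up is ip minus the mapping base rather than ip itself. The main binary in this article is mapped at a fixed 400000, so ip can be used directly.

```cpp
#include <cstdio>
#include <cstring>

// Fields of one "segfault at" line from dmesg (all numbers are printed in hex).
struct SegfaultRecord {
    char name[64];            // process name
    int pid;
    unsigned long fault_addr; // address whose access faulted
    unsigned long ip;         // instruction pointer
    unsigned long sp;         // stack pointer
    unsigned long error;      // page fault error code
    char module[64];          // mapped file containing ip
    unsigned long base;       // start address of the mapping
    unsigned long size;       // size of the mapping
};

// Returns true if the line matched and every field was filled in.
bool parse_segfault_line(const char *line, SegfaultRecord &out)
{
    const char *p = line;
    if (*p == '[') {                       // skip the "[timestamp] " prefix if present
        p = strchr(p, ']');
        if (p == nullptr) return false;
        ++p;
        while (*p == ' ') ++p;
    }
    int n = sscanf(p,
                   "%63[^[][%d]: segfault at %lx ip %lx sp %lx error %lx in %63[^[][%lx+%lx]",
                   out.name, &out.pid, &out.fault_addr, &out.ip, &out.sp,
                   &out.error, out.module, &out.base, &out.size);
    return n == 9;
}

// Offset of the faulting instruction inside its mapping.
unsigned long offset_in_mapping(const SegfaultRecord &r)
{
    return r.ip - r.base;
}
```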

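Step 3: the moment of truth

1. Disassemble the compiled main: objdump -d main >main.od. While at it, dump the symbols too: nm main >main.nm

2. Open main.od in vim and search for the addresses that follow ip in the segfault lines, here 000000000040749a and 000000000040742a. 40749a does not appear in the listing, although it lies within the program's address range. 40742a does appear (the screenshot of the listing is not reproduced here):

Line 627 of the listing reads mov -0x28(%rbp),%rax: it loads the value stored at offset -0x28 from %rbp (a local variable of the function) into %rax.

Line 628 reads movl $0xa,(%rax): $0xa is an immediate value (10), and (%rax) is register-indirect addressing, meaning the operand is the memory location whose address is held in %rax (see the addressing modes in an assembly reference if this is unclear; there are seven or eight of them). This instruction writes 10 to the address held in %rax, that is, through the pointer just loaded from the local variable.

At this point the problem is essentially located. Compare with the source code and it becomes clear.

All in all this is rather dizzying, especially for those who have not studied assembly or are weak at it. Is there a simpler way? The answer is yes.

3. Use the addr2line tool

    Run: addr2line -f -e main 40749a (-f also prints the function name). The result: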

_Z12thread_entryi
/home/lian.shao.hua/work/demo/catch_segv/main.cpp:73 (discriminator 3)

    Run: addr2line -f -e main 40742a. The result:

_Z12thread_entryi
/home/lian.shao.hua/work/demo/catch_segv/main.cpp:68

 This makes the offending source lines obvious: lines 73 and 68 of main.cpp.
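The function name in the addr2line output, `_Z12thread_entryi`, is the mangled C++ symbol. On the command line, `c++filt` or the `-C` option of addr2line turns it into a readable name. Inside a program the same can be done with the `abi::__cxa_demangle` function that GCC and Clang ship in `<cxxabi.h>`. The wrapper name `demangle` below is just for illustration.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Convert a mangled C++ symbol into its readable form.
// Falls back to the input string if it cannot be demangled.
std::string demangle(const char *mangled)
{
    int status = 0;
    char *readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);   // the buffer was allocated with malloc() by __cxa_demangle
    return result;
}
```

Applied to the symbol above, `demangle("_Z12thread_entryi")` gives `thread_entry(int)`, the thread function in main.cpp.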

Of course, if the program was compiled with -O1, -O2, or -O3, optimization will make it harder to locate the problem this way.

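This article comes from ztenv and was compiled and edited by javajgs_com. Copyright belongs to ztenv. The content reflects the author's personal views and does not imply endorsement by Java架构师必看. If you repost it, please credit the source.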
