Preamble

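The start menu that ships with the GNOME environment is not pretty, and it is too bare-bones: it is essentially just an ItemList control and can hardly be customized. After comparing many open-source alternatives, mintMenu still looks like the best option to me, but it has problems when run on non-x86 architectures.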

Symptoms

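After installing the mintmenu package, the applet cannot be added to the panel: nothing happens after adding it. Checking the related process shows that a core dump was produced, and analyzing the core file confirms a segmentation fault. The faulting location differs from one core file to another; the following is one typical case: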

Core was generated by `/usr/bin/python /usr/share/linuxmint/mintMenu/mintMenu1.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000ffefe17c4c in IA__gdk_window_get_origin (window=0x202ae5a0, x=0xffec356400, y=0xffec356490) at gdkwindow.c:8346
8346	gdkwindow.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install
 …
(gdb) bt
#0  0x000000ffefe17c4c in IA__gdk_window_get_origin (window=0x202ae5a0, x=0xffec356400, y=0xffec356490) at gdkwindow.c:8346
#1  0x000000fff0c33af4 in retint () at ../src/mips/n32.S:211
#2  0x000000fff0c32fd4 in ffi_call (cif=0xffffeb95b0, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>) at ../src/mips/ffi.c:644
#3  0x000000ffef07ff2c in _ctypes_callproc (argcount=3, resmem=0xffffeb9520, restype=<optimized out>, atypes=0xffffeb94c0, avalues=0xffffeb94f0, pProc=0xffefe17c08 <IA__gdk_window_get_origin>, flags=4353) at /home/xorg-x11-server/BUILD/Python-2.7.8/Modules/_ctypes/callproc.c:836
#4  0x000000ffef07ff2c in _ctypes_callproc (pProc=0xffefe17c08 <IA__gdk_window_get_origin>, argtuple=<optimized out>, flags=<optimized out>, argtypes=<optimized out>, restype=<_ctypes.PyCSimpleType at remote 0x1202fa7f0>, checker=0x0) at /home/xorg-x11-server/BUILD/Python-2.7.8/Modules/_ctypes/callproc.c:1183
#5  0x000000ffef073f68 in PyCFuncPtr_call (self=self@entry=0xffec34ba10, inargs=(4834649504, <CArgObject at remote 0xffec0c27f0>, <CArgObject at remote 0xffec0c2570>), kwds=<optimized out>)
    at /home/xorg-x11-server/BUILD/Python-2.7.8/Modules/_ctypes/_ctypes.c:3965
#6  0x000000fff79b52e4 in PyObject_Call (func=<_FuncPtr(__name__='gdk_window_get_origin') at remote 0xffec34ba10>, arg=<optimized out>, kw=<optimized out>)
    at /home/xorg-x11-server/BUILD/Python-2.7.8/Objects/abstract.c:2529
#7  0x000000fff7a95f88 in PyEval_EvalFrameEx (nk=<optimized out>, na=3, pp_stack=0xffffeb97c0, func=<_FuncPtr(__name__='gdk_window_get_origin') at remote 0xffec34ba10>)
    at /home/xorg-x11-server/BUILD/Python-2.7.8/Python/ceval.c:4328
#8  0x000000fff7a95f88 in PyEval_EvalFrameEx (oparg=<optimized out>, pp_stack=0xffffeb97c0) at /home/xorg-x11-server/BUILD/Python-2.7.8/Python/ceval.c:4133
#9  0x000000fff7a95f88 in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at /home/xorg-x11-server/BUILD/Python-2.7.8/Python/ceval.c:2753
#10 0x000000fff7a99454 in PyEval_EvalFrameEx (nk=<optimized out>, na=1, n=1, pp_stack=0xffffeb9900, func=<function at remote 0xffee9726e0>)
    at /home/xorg-x11-server/BUILD/Python-2.7.8/Python/ceval.c:4196
#11 0x000000fff7a99454 in PyEval_EvalFrameEx (oparg=<optimized out>, pp_stack=0xffffeb9900) at /home/xorg-x11-server/BUILD/Python-2.7.8/Python/ceval.c:4131
#12 0x000000fff7a99454 in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at /home/xorg-x11-server/BUILD/Python-2.7.8/Python/ceval.c:2753
#13 0x000000fff7a99454 in PyEval_EvalFrameEx (nk=<optimized out>, na=1, n=1, pp_stack=0xffffeb9a40, func=<function at remote 0xffee9725f0>)
    at /home/xorg-x11-server/BUILD/Python-2.7.8/Python/ceval.c:4196
#14 0x000000fff7a99454 in PyEval_EvalFrameEx (oparg=<optimized out>, pp_stack=0xffffeb9a40) at /home/xorg-x11-server/BUILD/Python-2.7.8/Python/ceval.c:4131
#15 0x000000fff7a99454 in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at /home/xorg-x11-server/BUILD/Python-2.7.8/Python/ceval.c:2753
#16 0x000000fff7a9aa94 in PyEval_EvalCodeEx (co=0xfff74dc330, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=0x0, kwcount=<optimized out>, defs=0xffee9d6380, defcount=2, closure=0x0) at /home/xorg-x11-server/BUILD/Python-2.7.8/Python/ceval.c:3342
#17 0x000000fff79f42f8 in function_call (func=func@entry=<function at remote 0xffee972578>, arg=(<MenuWin(hotkeyText='Super_L', mate_settings=<Settings at remote 0xffecfbe370>, theme_name='default', settings=<Settings at remote 0xffee9ca9b0>, iconSize=22, buttonText='Menu', mainwin=<MainWindow(tooltips=<Tooltips at remote 0xffecfbedc0>, panelSettings=<Settings at remote 0xffecfbe910>, paneholder=<HBox at remote 0xffecfbe780>, custombordercolor='#001155', toggle=<HBox at remote 0xffecfbe5a0>, panesToColor=[<EventBox at remote 0xffecfbee10>, <EventBox at remote 0xffecfbeeb0>, <EventBox at remote 0xffecbcbd20>, <Viewport at remote 0xffecbcbd70>, <EventBox at remote 0xffec3238c0>, <Viewport at remote 0xffec323910>, <EventBox at remote 0xffec3364b0>, <EventBox at remote 0xffecfbee60>, <EventBox at remote 0xffec275320>, <Viewport at remote 0xffec275370>, <Viewport at remote 0xffec2753c0>, <Viewport at remote 0xffec275410>], borderwidth=1, plugins={'applications': <pluginclass(adminMenu=None, showcategoryicons=True, categoryhoverdelay=150, searchButton=<Button at remote 0xffec342fa0>, mintMenuWin=<...>, applic...(truncated), kw=0x0)
    at /home/xorg-x11-server/BUILD/Python-2.7.8/Objects/funcobject.c:526

Analysis

Disassemble to confirm the immediate site of the fault

(gdb) disassemble 
Dump of assembler code for function IA__gdk_window_get_origin:
   0x000000ffefe17c08 <+0>:	daddiu	sp,sp,-48
   0x000000ffefe17c0c <+4>:	sd	gp,32(sp)
   0x000000ffefe17c10 <+8>:	lui	gp,0xa
   0x000000ffefe17c14 <+12>:	daddu	gp,gp,t9
   0x000000ffefe17c18 <+16>:	sd	s0,0(sp)
   0x000000ffefe17c1c <+20>:	move	s0,a0
   0x000000ffefe17c20 <+24>:	daddiu	gp,gp,1480
   0x000000ffefe17c24 <+28>:	sd	s3,24(sp)
   0x000000ffefe17c28 <+32>:	move	s3,a1
   0x000000ffefe17c2c <+36>:	ld	t9,-32504(gp)
   0x000000ffefe17c30 <+40>:	sd	s2,16(sp)
   0x000000ffefe17c34 <+44>:	move	s2,a2
   0x000000ffefe17c38 <+48>:	sd	ra,40(sp)
   0x000000ffefe17c3c <+52>:	bal	0xffefe0de40 <IA__gdk_window_object_get_type>
   0x000000ffefe17c40 <+56>:	sd	s1,8(sp)
   0x000000ffefe17c44 <+60>:	beqz	s0,0xffefe17c78 <IA__gdk_window_get_origin+112>
   0x000000ffefe17c48 <+64>:	move	s1,v0
=> 0x000000ffefe17c4c <+68>:	ld	v0,0(s0)
   0x000000ffefe17c50 <+72>:	beqz	v0,0xffefe17c64 <IA__gdk_window_get_origin+92>

The problem is at:

=> 0x000000ffefe17c4c <+68>:	ld	v0,0(s0)

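ld is a memory load instruction. This line takes the value in register s0 as a memory address, adds the offset (0), reads memory at that address, and loads the data into register v0. Let's look at the value of s0: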

(gdb) info registers 
                  zero               at               v0               v1
 R0   0000000000000000 0000000000000001 00000001202a8540 0000000000000000 
                    a0               a1               a2               a3
 R4   00000000202ae5a0 000000ffec356400 000000ffec356490 000000ffec044be0 
                    a4               a5               a6               a7
 R8   00000001202ae5a0 0000000000000000 0000000000000018 0000000000010000 
                    t0               t1               t2               t3
 R12  0000000000000000 000000ffffeb9520 0000000000000001 0000000000000003 
                    s0               s1               s2               s3
 R16  00000000202ae5a0 00000001202a8540 000000ffec356490 000000ffec356400 
                    s4               s5               s6               s7
 R20  0000000000000000 0000000000000000 000000ffffeb95b0 000000ffec0c2570 
                    t8               t9               k0               k1
 R24  ffffffffffffffd8 000000ffefe0de40 0000000000000000 0000000000000000 
                    gp               sp               s8               ra
 R28  000000ffefeb81d0 000000ffffeb93e0 000000ffffeb9430 000000ffefe17c44 
                    sr               lo               hi              bad
      000000004400ccf3 745d1745eba6cf10 000000003464e3f1 00000000202ae5a0 
                 cause               pc
      0000000010000008 000000ffefe17c4c 
                   fsr              fir
              0c800004         00000000 

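As shown, s0 holds 00000000202ae5a0, in which only the low 4 bytes are nonzero. Dereferencing this value as a pointer triggers the segmentation fault; the bad column in the register dump shows the same address.

Now look at the arguments of the innermost stack frame: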

#0  0x000000ffefe17c4c in IA__gdk_window_get_origin (window=0x202ae5a0, x=0xffec356400, y=0xffec356490) at gdkwindow.c:8346

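Sure enough, window=0x202ae5a0, so the pointer that triggered the segmentation fault came in as an argument. It has only 4 significant bytes, so it has clearly been truncated, which means the truncation happened somewhere along the argument-passing path.

Looking closely at the call flow in the stack, together with how Python's ctypes works, the flow is essentially a Python script calling a C library function through the ctypes module. The problem should lie somewhere in that process.

Digging deeper into the stack: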

(gdb) bt full
#0  0x000000ffefe17c4c in IA__gdk_window_get_origin (window=0x202ae5a0, x=0xffec356400, y=0xffec356490) at gdkwindow.c:8346
        __inst = 0x202ae5a0
        __t = <optimized out>
        __r = <optimized out>
        private = <optimized out>
        impl_iface = <optimized out>
        __FUNCTION__ = "IA__gdk_window_get_origin"
#1  0x000000fff0c33af4 in retint () at ../src/mips/n32.S:211
#2  0x000000fff0c32fd4 in ffi_call (cif=0xffffeb95b0, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>) at ../src/mips/ffi.c:644
        copy_rvalue = 0
        copy_offset = 0
        rvalue_copy = <optimized out>
        ecif = {cif = 0xffffeb95b0, rvalue = 0xffffeb9520, avalue = 0xffffeb94f0}
#3  0x000000ffef07ff2c in _ctypes_callproc (argcount=3, resmem=0xffffeb9520, restype=<optimized out>, atypes=0xffffeb94c0, avalues=0xffffeb94f0, pProc=0xffefe17c08 <IA__gdk_window_get_origin>, flags=4353) at /home/xorg-x11-server/BUILD/Python-2.7.8/Modules/_ctypes/callproc.c:836
        error_object = 0x0
        cc = 3
        _save = 0x1200140a0
        space = 0xfff0e6b004 <g_base_info_unref+212>
        cif = {abi = FFI_N64, nargs = 3, arg_types = 0xffffeb94c0, rtype = 0xffef0e54b0, bytes = 24, flags = 65536, rstruct_flag = 0}
        i = <optimized out>
        n = 3
        argcount = 3
        argtype_count = 0
        resbuf = 0xffffeb9520
        args = 0xffffeb9540
        pa = <optimized out>
        atypes = 0xffffeb94c0
        rtype = <optimized out>
        avalues = 0xffffeb94f0
        retval = 0x0
#4  0x000000ffef07ff2c in _ctypes_callproc (pProc=0xffefe17c08 <IA__gdk_window_get_origin>, argtuple=<optimized out>, flags=<optimized out>, argtypes=<optimized out>, restype=<_ctypes.PyCSimpleType at remote 0x1202fa7f0>, checker=0x0) at /home/xorg-x11-server/BUILD/Python-2.7.8/Modules/_ctypes/callproc.c:1183
        i = <optimized out>
        n = 3
        argcount = 3
        argtype_count = 0
        resbuf = 0xffffeb9520
        args = 0xffffeb9540
        pa = <optimized out>
        atypes = 0xffffeb94c0
        rtype = <optimized out>
        avalues = 0xffffeb94f0
        retval = 0x0
#5  0x000000ffef073f68 in PyCFuncPtr_call (self=self@entry=0xffec34ba10, inargs=(4834649504, <CArgObject at remote 0xffec0c27f0>, <CArgObject at remote 0xffec0c2570>), kwds=<optimized out>)
    at /home/xorg-x11-server/BUILD/Python-2.7.8/Modules/_ctypes/_ctypes.c:3965
        restype = <optimized out>
        converters = <optimized out>
        checker = <optimized out>
        argtypes = <optimized out>
        dict = <optimized out>
        result = <optimized out>
        errcheck = 0x0
        pProc = <optimized out>
        inoutmask = 0
        outmask = 0
        numretvals = 0
#6  0x000000fff79b52e4 in PyObject_Call (func=<_FuncPtr(__name__='gdk_window_get_origin') at remote 0xffec34ba10>, arg=<optimized out>, kw=<optimized out>)
    at /home/xorg-x11-server/BUILD/Python-2.7.8/Objects/abstract.c:2529
        result = <optimized out>
        call = 0xffef073e70 <PyCFuncPtr_call>
#7  0x000000fff7a95f88 in PyEval_EvalFrameEx (nk=<optimized out>, na=3, pp_stack=0xffffeb97c0, func=<_FuncPtr(__name__='gdk_window_get_origin') at remote 0xffec34ba10>)
    at /home/xorg-x11-server/BUILD/Python-2.7.8/Python/ceval.c:4328
        callargs = <optimized out>

This reveals:

#5  0x000000ffef073f68 in PyCFuncPtr_call (self=self@entry=0xffec34ba10, inargs=(4834649504, <CArgObject at remote 0xffec0c27f0>, <CArgObject at remote 0xffec0c2570>), kwds=<optimized out>)

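In this frame, the first element of the inargs tuple is 4834649504, which is exactly 0x1202ae5a0 in hexadecimal, and its low four bytes are the pointer that caused the segmentation fault. I suspected that the pointer was passed in from here, and a careful reading of the code path confirmed it. So the problem must lie somewhere between frame #1 and frame #5.

Analyzing the code frame by frame

Frame #1 is assembly code. I chewed through it; the relevant work happens in the pre_args-related flow it calls, and I found no way for an argument to be truncated there: every copy is at least 8-byte aligned.

Frame #2 is libffi code. After reading up on the relevant concepts and principles, I found nothing abnormal there either, and the frame contains no trace of the bad pointer.

Frame #3 looks redundant and somewhat off (it reports the same function and address as frame #4), so I skipped it.

Frame #4 is the key. It processes the arguments, and the function's own parameters carry the argument information, but they were optimized out by the compiler and cannot be inspected directly, so the local variables are all there is to go on. The key one is: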
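The relationship between the two values can be checked with a few lines of Python. This is only arithmetic on the numbers taken from the backtrace, not code from mintMenu or ctypes, and the variable names are made up for illustration.

```python
# Value seen in inargs at frame #5: the full 64-bit pointer as a Python integer.
full_pointer = 4834649504

# Keep only the low 32 bits, which is all that survives when the value
# is stored in a 32-bit C int.
low_32_bits = full_pointer & 0xFFFFFFFF

# The part that gets lost in the process.
lost_high_bits = full_pointer >> 32

print("full pointer  : 0x%x" % full_pointer)
print("low 32 bits   : 0x%x" % low_32_bits)
print("lost high bits: 0x%x" % lost_high_bits)
```

The low 32 bits match the window argument reported in frame #0.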

args = 0xffffeb9540

From the code logic, the local variable args holds the arguments. Inspect the memory at that address:

(gdb) x/100x  0xffffeb9540
0xffffeb9540:	0xf0c34140	0x000000ff	0x00000000	0x00000000
0xffffeb9550:	0x202ae5a0	0x00000000	0x00000000	0x00000000
0xffffeb9560:	0xf0c340f8	0x000000ff	0xec0c27f0	0x000000ff

args points to struct argument entries; here is the definition:

struct argument {
    ffi_type *ffi_type;
    PyObject *keep;
    union result value;
};

So the first 16 bytes correspond to the ffi_type and keep pointers, and the value at address 0xffffeb9550 is the actual argument value:

0xffffeb9550:	0x202ae5a0	0x00000000

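Clearly the data is already truncated at this point, so the problem must be somewhere between frame #3 and frame #5.

Analyzing the suspect code

(The Python script is invoked as a plugin, which makes it awkward to debug directly, and adding prints to the Python source and rebuilding takes far too long, so I relied mainly on code reading.) Combining the local variable information from each frame, the problem code was essentially pinned down: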
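As a rough cross-check, the dumped words can be reassembled into the fields of the first struct argument entry. The sketch below assumes a little-endian target with 8-byte pointers (the word order in the dump suggests this, but it is an assumption), and it reads only the first 8 bytes of the value union; the variable names are illustrative.

```python
import struct

# 32-bit words copied from the x/100x dump, starting at 0xffffeb9540.
words = [
    0xf0c34140, 0x000000ff, 0x00000000, 0x00000000,
    0x202ae5a0, 0x00000000, 0x00000000, 0x00000000,
]

# Assumption: little-endian target, so the words are packed in that byte order.
raw = struct.pack("<8I", *words)

# Assumed layout: ffi_type pointer (8 bytes), keep pointer (8 bytes),
# then the first 8 bytes of the value union.
ffi_type_ptr, keep_ptr, value_bits = struct.unpack_from("<QQQ", raw, 0)

print("ffi_type = 0x%x" % ffi_type_ptr)
print("keep     = 0x%x" % keep_ptr)
print("value    = 0x%x" % value_bits)
```

Under these assumptions the value field holds only the low half of the pointer, with the upper 4 bytes already zero.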

/*
 * Convert a single Python object into a PyCArgObject and return it.
 */
/*
  * Copy the argument
  */
static int ConvParam(PyObject *obj, Py_ssize_t index, struct argument *pa)
{
...
	/* FIXME:
	  * This looks questionable: why use the int member and ffi_type_sint
	  * for a long value? The result is that the argument gets truncated.
	  */
    if (PyLong_Check(obj)) {
        pa->ffi_type = &ffi_type_sint;
        pa->value.i = (long)PyLong_AsUnsignedLong(obj);
        if (pa->value.i == -1 && PyErr_Occurred()) {
            PyErr_Clear();
            pa->value.i = PyLong_AsLong(obj);
            if (pa->value.i == -1 && PyErr_Occurred()) {
                PyErr_SetString(PyExc_OverflowError,
                                "long int too long to convert");
                return -1;
            }
        }
        return 0;
    }
...
}
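The effect of storing a pointer-sized value in an int can be reproduced from Python with ctypes itself. This is a minimal sketch of the effect, not of the ConvParam code path; it assumes a platform where C int is 32 bits and pointers are 64 bits, and relies on the fact that ctypes integer types do no overflow checking.

```python
import ctypes

full_pointer = 4834649504  # 0x1202ae5a0, the value seen in inargs

# A C int keeps only the low 32 bits (assumption: sizeof(int) == 4).
as_int = ctypes.c_int(full_pointer).value

# A void pointer keeps the whole value (assumption: 64-bit platform).
as_void_p = ctypes.c_void_p(full_pointer).value

print("as c_int   : 0x%x" % as_int)
print("as c_void_p: 0x%x" % as_void_p)
```

This is why declaring the parameter as c_void_p matters: the pointer-sized type keeps the high bits that the int conversion drops.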

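Root cause

That is how the problem arises, but the root of it is not actually here. The real root cause is a matter of coding standards when calling C interfaces from Python. According to the ctypes reference documentation, plus Google and Stack Overflow, the root cause is that the code calls a C function through CDLL without correctly setting the function's argtypes and restype. mintMenu.py contains the following call into the gdk library: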

gdk = CDLL("libgdk-x11-2.0.so.0")
...
gdk.gdk_window_get_origin(hash(self.applet.window), byref(x), byref(y))

This code loads the gdk library with CDLL and then calls gdk_window_get_origin from it, but it does so without setting argtypes and restype for the function.

The correct approach is:

gdk = CDLL("libgdk-x11-2.0.so.0")
...
gdk.gdk_window_get_origin.restype = c_int
gdk.gdk_window_get_origin.argtypes = [c_void_p, c_void_p, c_void_p]
...
gdk.gdk_window_get_origin(hash(self.applet.window), byref(x), byref(y))

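Per the ctypes reference documentation, argtypes sets the parameter types of the function and restype sets its return type. If they are not set correctly, both arguments and return values may be truncated because of type-size mismatches, for example when int and long have different sizes, or when a pointer is wider than an int.

mintMenu has quite a few similar calls; the type of every argument and return value needs to be set correctly. After fixing them one by one, it runs normally.

Open questions

  1. The same Python code runs fine on x86_64, with no segmentation fault. The reason is unknown.

  2. The suspect code may have been designed that way for the author's own reasons; the specifics are unknown.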
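To illustrate declaring argtypes and restype, here is a small sketch that uses libc functions as stand-ins, since the gdk library may not be installed. It assumes a POSIX system where ctypes.CDLL(None) exposes the symbols already loaded into the process; malloc, memset and free are standard C functions, while the variable names are illustrative.

```python
import ctypes

# Assumption: POSIX system; CDLL(None) gives access to symbols already
# loaded into the current process, including libc.
libc = ctypes.CDLL(None)

# Declare every pointer-sized parameter and return value explicitly.
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]
libc.memset.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None

buf = libc.malloc(64)
if buf is None:
    raise MemoryError("malloc failed")

# memset returns its first argument, so a full-width round trip
# means the pointer was neither truncated going in nor coming out.
returned = libc.memset(buf, 0, 64)
roundtrip_ok = (returned == buf)
libc.free(buf)

print("pointer round trip intact: %s" % roundtrip_ok)
```

Without the restype declarations, ctypes would treat the return values as C int, and a pointer above 4 GiB would come back truncated.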
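When many functions need declarations, one option is to keep the signatures in a single table and apply them in a loop. The helper below is hypothetical (declare_prototypes and PROTOTYPES are not part of ctypes or mintMenu), and it again uses libc functions on a POSIX system as stand-ins for the gdk calls.

```python
import ctypes

# Hypothetical table: function name -> (restype, argtypes).
PROTOTYPES = {
    "strlen": (ctypes.c_size_t, [ctypes.c_char_p]),
    "abs": (ctypes.c_int, [ctypes.c_int]),
}


def declare_prototypes(lib, prototypes):
    """Set restype and argtypes on each listed function of lib."""
    for name, (restype, argtypes) in prototypes.items():
        func = getattr(lib, name)
        func.restype = restype
        func.argtypes = argtypes


# Assumption: POSIX system where CDLL(None) exposes libc symbols.
libc = ctypes.CDLL(None)
declare_prototypes(libc, PROTOTYPES)

length = libc.strlen(b"mintMenu")
magnitude = libc.abs(-5)
print("strlen: %d, abs: %d" % (length, magnitude))
```

Keeping the declarations in one place makes it easier to review that no call is left with the default int conversions.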