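About the ELF Auxiliary Vector
ELF is the most common executable file format on Linux. Plenty of material already covers how ELF files are loaded and dynamically linked, so this article focuses on one specific piece of that process: the auxiliary vector.
Before an ELF file is loaded and starts running as a process, the loader passes it some information. Besides the startup arguments supplied by the user, this includes information about the system environment (the familiar environment variables) and the subject of this article: the auxiliary vector.
We are all familiar with user arguments and environment variables and what they do, and we use them all the time, user arguments in particular, so they need no introduction. The auxiliary vector is far less well known. It is simply another mechanism by which the kernel passes information to an application.
The auxiliary vector is stored in much the same place as the user arguments and environment variables, namely on the stack. The layout looks roughly like this: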
position            content                        size (bytes) + comment
------------------------------------------------------------------------
(0xc0000000)        < bottom of stack >            0   (virtual)
(0xbffffffc)        [ end marker ]                 4   (= NULL)

                    [ environment ASCIIZ str. ]    >= 0
                    [ argument ASCIIZ strings ]    >= 0
                    [ padding ]                    0 - 16

                    [ auxv[term] (Elf32_auxv_t) ]  8   (= AT_NULL vector)
                    [ auxv[..] (Elf32_auxv_t) ]    8
                    [ auxv[1] (Elf32_auxv_t) ]     8
                    [ auxv[0] (Elf32_auxv_t) ]     8

                    [ envp[term] (pointer) ]       4   (= NULL)
                    [ envp[..] (pointer) ]         4
                    [ envp[1] (pointer) ]          4
                    [ envp[0] (pointer) ]          4

                    [ argv[n] (pointer) ]          4   (= NULL)
                    [ argv[n - 1] (pointer) ]      4
                    [ argv[..] (pointer) ]         4 * x
                    [ argv[1] (pointer) ]          4
                    [ argv[0] (pointer) ]          4   (program name)
stack pointer ->    [ argc = number of args ]      4
------------------------------------------------------------------------
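The layout above is mostly taken from the first reference, rearranged to match the Linux stack, which grows downward with the stack pointer addressing the last item pushed (a full descending stack). The addresses shown are for a 32-bit system; on a 64-bit system the order of the items is the same, although pointers are 8 bytes wide and each auxv entry (Elf64_auxv_t) takes 16 bytes.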
Taking a 32-bit system as an example, let us verify this layout:
[root@lenky auxv]# uname -a
Linux lenky 2.6.30 #2 SMP Tue Sep 21 17:19:57 CST 2010 i686 i686 i386 GNU/Linux
[root@lenky auxv]# cat main.c
/**
 * filename: main.c
 */
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("argc:%d, argv[0]:%s\n", argc, argv[0]);
    return 0;
}
[root@lenky auxv]# gcc -O0 -g main.c -o main
[root@lenky auxv]# echo 0 > /proc/sys/kernel/randomize_va_space
[root@lenky auxv]# gdb ./main -q
(gdb) b main
Breakpoint 1 at 0x8048395: file main.c, line 8.
(gdb) r a b c d
Starting program: /home/work/auxv/main a b c d

Breakpoint 1, main (argc=5, argv=0xbffffab4) at main.c:8
8           printf("argc:%d, argv[0]:%s\n", argc, argv[0]);
(gdb) info reg esp
esp            0xbffffa00       0xbffffa00
(gdb) p &argc
$1 = (int *) 0xbffffa30
(gdb) dump memory /tmp/main.data 0xbffffa30 0xc0000000
(gdb) q
The program is running.  Exit anyway? (y or n) y
[root@lenky auxv]#
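This is a very simple test program with a breakpoint on main. (ASLR was turned off beforehand so that the addresses stay stable and the logic is easier to follow.) When gdb stops at the breakpoint, esp is 0xbffffa00, while the parameter argc lives at 0xbffffa30. This differs slightly from the layout given earlier. The reason is that in C the arguments are pushed onto the stack by the caller (here __libc_start_main), so by the time main is executing, more data has already been added to the stack, such as the saved return address and the space main needs for its locals. The stack pointer esp is therefore lower at this point.
Now look at the memory we dumped: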
[root@lenky auxv]# hexdump -C /tmp/main.data
00000000  05 00 00 00 b4 fa ff bf  cc fa ff bf 10 48 16 00  |.............H..|
00000010  00 00 00 00 01 00 00 00  01 00 00 00 00 00 00 00  |................|
00000020  f4 cf 2a 00 a0 3c 16 00  00 00 00 00 88 fa ff bf  |..*..<..........|
00000030  b8 6d a4 83 e9 89 43 3c  00 00 00 00 00 00 00 00  |.m....C<........|
00000040  00 00 00 00 b0 c4 15 00  cd 1d 18 00 c0 3f 16 00  |.............?..|
00000050  05 00 00 00 b0 82 04 08  00 00 00 00 d1 82 04 08  |................|
00000060  84 83 04 08 05 00 00 00  b4 fa ff bf d0 83 04 08  |................|
00000070  c0 83 04 08 c0 75 15 00  ac fa ff bf eb ff 15 00  |.....u..........|
00000080  05 00 00 00 f6 fb ff bf  0b fc ff bf 0d fc ff bf  |................|
00000090  0f fc ff bf 11 fc ff bf  00 00 00 00 13 fc ff bf  |................|
000000a0  22 fc ff bf 32 fc ff bf  3d fc ff bf 4b fc ff bf  |"...2...=...K...|
000000b0  6d fc ff bf 80 fc ff bf  8a fc ff bf 4d fe ff bf  |m...........M...|
000000c0  58 fe ff bf c9 fe ff bf  e3 fe ff bf f2 fe ff bf  |X...............|
000000d0  0d ff ff bf 21 ff ff bf  36 ff ff bf 47 ff ff bf  |....!...6...G...|
000000e0  50 ff ff bf 5b ff ff bf  63 ff ff bf 70 ff ff bf  |P...[...c...p...|
000000f0  7c ff ff bf b0 ff ff bf  d2 ff ff bf 00 00 00 00  ||...............|
00000100  20 00 00 00 14 f4 ff b7  21 00 00 00 00 f0 ff b7  | .......!.......|
00000110  10 00 00 00 ff fb eb 0f  06 00 00 00 00 10 00 00  |................|
00000120  11 00 00 00 64 00 00 00  03 00 00 00 34 80 04 08  |....d.......4...|
00000130  04 00 00 00 20 00 00 00  05 00 00 00 07 00 00 00  |.... ...........|
00000140  07 00 00 00 00 00 00 00  08 00 00 00 00 00 00 00  |................|
00000150  09 00 00 00 b0 82 04 08  0b 00 00 00 00 00 00 00  |................|
00000160  0c 00 00 00 00 00 00 00  0d 00 00 00 00 00 00 00  |................|
00000170  0e 00 00 00 00 00 00 00  17 00 00 00 00 00 00 00  |................|
00000180  19 00 00 00 db fb ff bf  1f 00 00 00 e7 ff ff bf  |................|
00000190  0f 00 00 00 eb fb ff bf  00 00 00 00 00 00 00 00  |................|
000001a0  00 00 00 00 00 00 00 00  00 00 00 d8 28 62 8a b0  |............(b..|
000001b0  f6 f3 b2 fe 7b ae e7 aa  49 02 e2 69 36 38 36 00  |....{...I..i686.|
000001c0  00 00 00 00 00 00 2f 68  6f 6d 65 2f 77 6f 72 6b  |....../home/work|
000001d0  2f 61 75 78 76 2f 6d 61  69 6e 00 61 00 62 00 63  |/auxv/main.a.b.c|
000001e0  00 64 00 48 4f 53 54 4e  41 4d 45 3d 6c 65 6e 6b  |.d.HOSTNAME=lenk|
000001f0  79 00 53 48 45 4c 4c 3d  2f 62 69 6e 2f 62 61 73  |y.SHELL=/bin/bas|
00000200  68 00 54 45 52 4d 3d 78  74 65 72 6d 00 48 49 53  |h.TERM=xterm.HIS|
00000210  54 53 49 5a 45 3d 31 30  30 30 00 53 53 48 5f 43  |TSIZE=1000.SSH_C|
...
000005a0  73 00 47 5f 42 52 4f 4b  45 4e 5f 46 49 4c 45 4e  |s.G_BROKEN_FILEN|
000005b0  41 4d 45 53 3d 31 00 2f  68 6f 6d 65 2f 77 6f 72  |AMES=1./home/wor|
000005c0  6b 2f 61 75 78 76 2f 6d  61 69 6e 00 00 00 00 00  |k/auxv/main.....|
000005d0
[root@lenky auxv]#
Note that the dump starts at the address of the parameter argc, so the first 4 bytes (one int, little-endian):
05 00 00 00
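are the value of argc, namely 5. This matches the command "r a b c d": counting argument 0, which names the program being executed, there are 5 user arguments in total.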
Next:
b4 fa ff bf
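is the value of argv, the second parameter of main. It is a pointer to an array of pointers, that is, roughly speaking, a pointer to a pointer in C. Its offset within the dump above is:
0xbffffab4 - 0xbffffa30 = 0x84
and the content at that offset is: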
f6 fb ff bf
That is argv[0], which has type char *, so the string itself is located at address 0xbffffbf6:
0xbffffbf6 - 0xbffffa30 = 0x1c6
At that offset in the dump we find:
000001c6 "/home/work/auxv/main"
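The remaining slots argv[1] to argv[4] sit at 0xbffffab8, 0xbffffabc, 0xbffffac0 and 0xbffffac4 (offsets 0x88 to 0x94 in the dump) and hold the pointers 0xbffffc0b, 0xbffffc0d, 0xbffffc0f and 0xbffffc11, which lead to the strings "a", "b", "c" and "d". The word just before the array, at offset 0x80, is again 05 00 00 00; this looks like the argc that was placed on the stack at exec time, as in the layout above, while the argc at offset 0 is the copy passed to main.
The argument array is terminated by NULL, and the environment variable pointers come right after it. For example: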
[root@lenky auxv]# gdb ./main -q
(gdb) b main
Breakpoint 1 at 0x8048395: file main.c, line 8.
(gdb) r
Starting program: /home/work/auxv/main

Breakpoint 1, main (argc=1, argv=0xbffffac4) at main.c:8
8           printf("argc:%d, argv[0]:%s\n", argc, argv[0]);
(gdb) p argc
$1 = 1
(gdb) p argv[0]
$2 = 0xbffffbfe "/home/work/auxv/main"
(gdb) p argv[1]
$3 = 0x0
(gdb) p argv[2]
$4 = 0xbffffc13 "HOSTNAME=lenky"
(gdb) p argv[3]
$5 = 0xbffffc22 "SHELL=/bin/bash"
(gdb) q
The program is running.  Exit anyway? (y or n) y
[root@lenky auxv]#
In fact, main can also be defined like this:
[root@lenky auxv]# cat env.c
/**
 * filename: env.c
 */
#include <stdio.h>

int main(int argc, char *argv[], char *envp[])
{
    printf("argc:%d, argv[0]:%s, envp[0]:%s\n", argc, argv[0], envp[0]);
    return 0;
}
[root@lenky auxv]# gcc env.c -o env
[root@lenky auxv]# ./env
argc:1, argv[0]:./env, envp[0]:HOSTNAME=lenky
[root@lenky auxv]#
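So why does main normally work fine when we define it without the third parameter envp? Because in C the arguments are pushed onto the stack by the caller; whether the callee uses them or not, and whether it uses two or three of them, makes no difference. The following prototypes therefore all let the program run correctly: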
int main();
int main(int argc, char *argv[]);
int main(int argc, char *argv[], char *envp[]);
Let us look at the assembly code:
[root@lenky auxv]# gdb ./main -q
(gdb) b main
Breakpoint 1 at 0x8048395: file main.c, line 8.
(gdb) r
Starting program: /home/work/auxv/main

Breakpoint 1, main (argc=1, argv=0xbffffac4) at main.c:8
8           printf("argc:%d, argv[0]:%s\n", argc, argv[0]);
(gdb)
This is the two-parameter case. We are currently inside main; let us see who called it:
(gdb) info reg ebp
ebp            0xbffffa28       0xbffffa28
(gdb) x/i *(0xbffffa28+4)
0x181e9c <__libc_start_main+220>:       mov    %eax,(%esp)
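If you are familiar with C stack frames, you know that the address ebp + 4 holds the return address. Disassembling at that address with the x command shows that the caller is __libc_start_main.
Now let us see how many arguments were passed: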
(gdb) x/10i *(0xbffffa28+4)-32
0x181e7c <__libc_start_main+188>:       pusha
0x181e7d <__libc_start_main+189>:       add    %al,(%eax)
0x181e7f <__libc_start_main+191>:       add    %cl,-0x4b7d(%ebx)
0x181e85 <__libc_start_main+197>:       decl   0x8b0c55(%ebx)
0x181e8b <__libc_start_main+203>:       mov    %edx,(%esp)
0x181e8e <__libc_start_main+206>:       mov    %eax,0x8(%esp)
0x181e92 <__libc_start_main+210>:       mov    0x10(%ebp),%eax
0x181e95 <__libc_start_main+213>:       mov    %eax,0x4(%esp)
0x181e99 <__libc_start_main+217>:       call   *0x8(%ebp)
0x181e9c <__libc_start_main+220>:       mov    %eax,(%esp)
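The instruction at 0x181e99 is the call into main, and the next one, at 0x181e9c, is the return address. The instructions just before the call set up the arguments: three mov instructions store to different stack slots, namely esp, esp+4 and esp+8. (The first few lines of the listing look odd, most likely because the disassembly started in the middle of an instruction; they can be ignored.)
The relevant part can also be seen in the glibc source code: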
STATIC int
LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
                 int argc, char *__unbounded *__unbounded ubp_av,
#ifdef LIBC_START_MAIN_AUXVEC_ARG
                 ElfW(auxv_t) *__unbounded auxvec,
#endif
                 __typeof (main) init,
                 void (*fini) (void),
                 void (*rtld_fini) (void), void *__unbounded stack_end)
{
LIBC_START_MAIN is __libc_start_main. One of its parameters is the function pointer main. Note main's prototype; the important part is the macro MAIN_AUXVEC_DECL:
#ifdef MAIN_AUXVEC_ARG
/* main gets passed a pointer to the auxiliary.  */
# define MAIN_AUXVEC_DECL       , void *
# define MAIN_AUXVEC_PARAM      , auxvec
#else
# define MAIN_AUXVEC_DECL
# define MAIN_AUXVEC_PARAM
#endif
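main takes at least 3 parameters, and depending on whether MAIN_AUXVEC_ARG is defined (which makes MAIN_AUXVEC_DECL non-empty) there may be a fourth one representing the auxv. In my glibc that macro does not appear to be enabled, so the real prototype of main on my system should be: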
int main(int argc, char *argv[], char *envp[]);
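No parameter gives the number of elements in the environment array, but it too is terminated by NULL, so we can walk envp past its end to reach the auxv. For example: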
[root@lenky auxv]# cat auxv.c
/**
 * filename: auxv.c
 */
#include <stdio.h>
#include <elf.h>

int main(int argc, char *argv[], char *envp[])
{
    Elf32_auxv_t *auxv;

    /*from stack diagram above: *envp = NULL marks end of envp*/
    while(*envp++ != NULL);

    /* auxv->a_type = AT_NULL marks the end of auxv */
    for (auxv = (Elf32_auxv_t *)envp; auxv->a_type != AT_NULL; auxv++)
    {
        if( auxv->a_type == AT_SYSINFO)
            printf("AT_SYSINFO is: 0x%x\n", auxv->a_un.a_val);
    }
}
[root@lenky auxv]# gcc auxv.c -g -o auxv
[root@lenky auxv]# ./auxv
AT_SYSINFO is: 0xb7fff414
[root@lenky auxv]# gdb -q ./auxv
(gdb) b 18
Breakpoint 1 at 0x80483e8: file auxv.c, line 18.
(gdb) r
Starting program: /home/work/auxv/auxv
AT_SYSINFO is: 0xb7fff414

Breakpoint 1, main (argc=1, argv=0xbffffac4, envp=0xbffffb30) at auxv.c:19
warning: Source file is more recent than executable.
19      }
(gdb) info auxv
32   AT_SYSINFO           Special system info/entry points               0xb7fff414
33   AT_SYSINFO_EHDR      System-supplied DSO's ELF header               0xb7fff000
16   AT_HWCAP             Machine-dependent CPU capability hints         0xfebfbff
6    AT_PAGESZ            System page size                               4096
17   AT_CLKTCK            Frequency of times()                           100
3    AT_PHDR              Program headers for program                    0x8048034
4    AT_PHENT             Size of program header entry                   32
5    AT_PHNUM             Number of program headers                      7
7    AT_BASE              Base address of interpreter                    0x0
8    AT_FLAGS             Flags                                          0x0
9    AT_ENTRY             Entry point of program                         0x80482b0
11   AT_UID               Real user ID                                   0
12   AT_EUID              Effective user ID                              0
13   AT_GID               Real group ID                                  0
14   AT_EGID              Effective group ID                             0
23   AT_SECURE            Boolean, was exec setuid-like?                 0
25   ???                                                                 0xbffffbdb
31   ???                                                                 0xbfffffe7
15   AT_PLATFORM          String identifying platform                    0xbffffbeb "i686"
0    AT_NULL              End of vector                                  0x0
(gdb)
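From the analysis above (I will not check every dumped word against main's parameters one by one), we can see that the layout given at the beginning is correct. The auxv carries all kinds of information to the application, as the gdb output shows: the vsyscall entry point (AT_SYSINFO), the real uid, the real gid and so on. The two entries printed as ??? (types 25 and 31) are AT_RANDOM and AT_EXECFN, which this version of gdb does not know by name. The same information can also be displayed like this: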
[root@lenky auxv]# LD_SHOW_AUXV=1 ./auxv
AT_SYSINFO:      0xb7fff414
AT_SYSINFO_EHDR: 0xb7fff000
AT_HWCAP:    fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
AT_PAGESZ:       4096
AT_CLKTCK:       100
AT_PHDR:         0x8048034
AT_PHENT:        32
AT_PHNUM:        7
AT_BASE:         0x0
AT_FLAGS:        0x0
AT_ENTRY:        0x80482b0
AT_UID:          0
AT_EUID:         0
AT_GID:          0
AT_EGID:         0
AT_SECURE:       0
AT_??? (0x19): 0xbffffbeb
AT_??? (0x1f): 0xbffffff5
AT_PLATFORM:     i686
AT_SYSINFO is: 0xb7fff414
[root@lenky auxv]#
On a 64-bit system:
[root@localhost auxv]# uname -a
Linux localhost.localdomain 3.7.0 #1 SMP Wed Jan 9 04:46:12 CST 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost auxv]# cat /etc/issue
CentOS Linux release 6.0 (Final)
Kernel \r on an \m

[root@localhost auxv]# cat auxv64.c
/**
 * filename: auxv.c
 */
#include <stdio.h>
#include <elf.h>

int main(int argc, char *argv[], char *envp[])
{
    Elf64_auxv_t *auxv;

    /*from stack diagram above: *envp = NULL marks end of envp*/
    while(*envp++ != NULL);

    /* auxv->a_type = AT_NULL marks the end of auxv */
    for (auxv = (Elf64_auxv_t *)envp; auxv->a_type != AT_NULL; auxv++)
    {
        if( auxv->a_type == AT_SYSINFO_EHDR)
            printf("AT_SYSINFO_EHDR is: 0x%p\n", auxv->a_un.a_val);
    }
}
[root@localhost auxv]# gcc auxv64.c -o auxv64
[root@localhost auxv]# LD_SHOW_AUXV=1 ./auxv64
AT_SYSINFO_EHDR: 0x7fff89b11000
AT_HWCAP:        febfbff
AT_PAGESZ:       4096
AT_CLKTCK:       100
AT_PHDR:         0x400040
AT_PHENT:        56
AT_PHNUM:        8
AT_BASE:         0x0
AT_FLAGS:        0x0
AT_ENTRY:        0x4003e0
AT_UID:          0
AT_EUID:         0
AT_GID:          0
AT_EGID:         0
AT_SECURE:       0
AT_RANDOM:       0x7fff89a07ca9
AT_EXECFN:       ./auxv64
AT_PLATFORM:     x86_64
AT_SYSINFO_EHDR is: 0x0x7fff89b11000
[root@localhost auxv]#
References:
http://articles.manugarg.com/aboutelfauxiliaryvectors.html
http://lwn.net/Articles/519085/
http://www.gnu.org/software/libc/manual/html_node/Auxiliary-Vector.html
http://www.win.tue.nl/~aeb/linux/hh/hh-14.html
http://articles.manugarg.com/systemcallinlinux2_6.html
http://www.win.tue.nl/~aeb/linux/lk/lk-4.html
Please keep the original address when reposting: http://lenky.info/archives/2013/02/05/2203 or http://lenky.info/?p=2203