How “Killing a Process” Works


# Core source files

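| Key file | Path |
| --- | --- |
| android_util_Process.cpp | frameworks/base/core/jni/android_util_Process.cpp |
| processgroup.cpp | system/core/libprocessgroup/processgroup.cpp |
| signal.c | kernel/kernel/signal.c |
| Process.java | frameworks/base/core/java/android/os/Process.java |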
Process.java frameworks/base/core/java/android/os/Process.java


# Overview

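“Killing a process” is done by sending it a signal. Creating a process starts from Process.start(); correspondingly, our look at killing a process starts from Process.killProcess(). In this article we walk through how a process actually gets killed. Interviewers at many large companies are especially fond of this topic, and it is a core piece of knowledge worth mastering.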
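The claim that killing a process is just sending it a signal can be tried directly from C. The following is a minimal sketch assuming a Linux/Android-style POSIX environment; the helper name kill_child_with_sigkill is hypothetical. It forks a child, sends it SIGKILL with kill(), and reads the terminating signal back with waitpid():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that just waits, kill it with SIGKILL
 * (signal 9, the value behind Process.SIGNAL_KILL), and return the signal
 * number that terminated it, or -1 on error. */
static int kill_child_with_sigkill(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Child: wait for signals forever. SIGKILL cannot be caught,
         * so this loop only ends when the process is killed. */
        for (;;)
            pause();
    }

    /* Parent: this one call is what Process.killProcess(pid) boils down to. */
    if (kill(child, SIGKILL) == -1)
        return -1;

    int status;
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Under these assumptions `assert(kill_child_with_sigkill() == SIGKILL)` holds: the child is reported as terminated by signal 9, the same value Process.SIGNAL_KILL carries on the Java side.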


1. User Space

Process.java provides three methods for killing processes:

// frameworks/base/core/java/android/os/Process.java

public class Process {

    /**
     * Kill the process with the given PID.
     * Note that, though this API allows us to request to
     * kill any process based on its PID, the kernel will
     * still impose standard restrictions on which PIDs you
     * are actually able to kill.  Typically this means only
     * the process running the caller's packages/application
     * and any additional processes created by that app; packages
     * sharing a common UID will also be able to kill each
     * other's processes.
     */
    public static final void killProcess(int pid) {
        sendSignal(pid, SIGNAL_KILL);
    }

    /**
     * @hide
     * Private impl for avoiding a log message...  DO NOT USE without doing
     * your own log, or the Android Illuminati will find you some night and
     * beat you up.
     */
    public static final void killProcessQuiet(int pid) {
        sendSignalQuiet(pid, SIGNAL_KILL);
    }

    /**
     * Kill all processes in a process group started for the given
     * pid.
     * @hide
     */
    public static final native int killProcessGroup(int uid, int pid);

}

1.1 Process.killProcess()

First, let's look at killProcess():

// frameworks/base/core/java/android/os/Process.java

public class Process {

    // public static final int SIGNAL_KILL = 9;
    public static final void killProcess(int pid) {
        sendSignal(pid, SIGNAL_KILL);        // calls sendSignal()
    }

    /**
     * Send a signal to the given process.
     * 
     * @param pid The pid of the target process.
     * @param signal The signal to send.
     */
    public static final native void sendSignal(int pid, int signal);    // native method

}

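As we saw in the earlier discussions of “Zygote” and “JNI”, the virtual machine registers the various JNI methods the framework needs, and the native-layer function behind a Java-level native method can usually be found under frameworks/base/core/jni.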

The JNI function behind sendSignal() is android_os_Process_sendSignal() in android_util_Process.cpp.

android_util_Process.android_os_Process_sendSignal()

// frameworks/base/core/jni/android_util_Process.cpp

void android_os_Process_sendSignal(JNIEnv* env, jobject clazz, jint pid, jint sig)
{
    if (pid > 0) {
        ALOGI("Sending signal. PID: %" PRId32 " SIG: %" PRId32, pid, sig);    // log the signal being sent
        kill(pid, sig);                                                       // call kill()
    }
}

1.2 Process.killProcessQuiet()

Next, let's look at killProcessQuiet():

// frameworks/base/core/java/android/os/Process.java

public class Process {

    public static final void killProcessQuiet(int pid) {
        sendSignalQuiet(pid, SIGNAL_KILL);
    }

    /**
     * @hide
     * Private impl for avoiding a log message...  DO NOT USE without doing
     * your own log, or the Android Illuminati will find you some night and
     * beat you up.
     */
    public static final native void sendSignalQuiet(int pid, int signal);

}

Likewise, the JNI function behind sendSignalQuiet() is android_os_Process_sendSignalQuiet() in android_util_Process.cpp.

android_util_Process.android_os_Process_sendSignalQuiet()

// frameworks/base/core/jni/android_util_Process.cpp

void android_os_Process_sendSignalQuiet(JNIEnv* env, jobject clazz, jint pid, jint sig)
{
    if (pid > 0) {
        kill(pid, sig);        // call kill()
    }
}

The only difference between sendSignal() and sendSignalQuiet() is the ALOGI() line, i.e. whether a log message is printed. Both ultimately kill the process by calling kill(pid, sig).
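The two JNI functions can be restated as one small C function with a quiet flag. This is a sketch under stated assumptions: send_signal_sketch is a hypothetical name, fprintf to stderr stands in for ALOGI, and a rejected pid is reported as -1 with errno set to EINVAL, whereas the real JNI functions simply return. The comment spells out why the pid > 0 guard matters:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical C mirror of android_os_Process_sendSignal() (quiet == 0)
 * and android_os_Process_sendSignalQuiet() (quiet != 0). */
static int send_signal_sketch(pid_t pid, int sig, int quiet)
{
    /* pid <= 0 is rejected on purpose: kill(0, sig) would signal the
     * caller's own process group, and kill(-1, sig) (almost) every
     * process the caller is allowed to signal. */
    if (pid <= 0) {
        errno = EINVAL;
        return -1;
    }
    if (!quiet) /* stand-in for ALOGI() */
        fprintf(stderr, "Sending signal. PID: %d SIG: %d\n", (int)pid, sig);
    return kill(pid, sig);
}
```

Signal 0 is a safe way to exercise it: `send_signal_sketch(getpid(), 0, 1)` performs the checks without delivering anything and returns 0.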

1.3 Process.killProcessGroup()

Finally, killProcessGroup():

// frameworks/base/core/java/android/os/Process.java

public class Process {

    // killProcessGroup() is itself a native method
    public static final native int killProcessGroup(int uid, int pid);

}

Following the same pattern as above, it is easy to see that the native function behind killProcessGroup() is android_os_Process_killProcessGroup():

android_util_Process.android_os_Process_killProcessGroup()

// frameworks/base/core/jni/android_util_Process.cpp

jint android_os_Process_killProcessGroup(JNIEnv* env, jobject clazz, jint uid, jint pid)
{
    return killProcessGroup(uid, pid, SIGKILL);
}

Following killProcessGroup() into system/core/libprocessgroup/processgroup.cpp:

// system/core/libprocessgroup/processgroup.cpp

int killProcessGroup(uid_t uid, int initialPid, int signal, int* max_processes) {
    return KillProcessGroup(uid, initialPid, signal, 40 /*retries*/, max_processes);    // retry at most 40 times
}

Next, follow the static helper KillProcessGroup() (note the capital K):

// system/core/libprocessgroup/processgroup.cpp

static int KillProcessGroup(uid_t uid, int initialPid, int signal, int retries,
                            int* max_processes) {
    ... ...

    while ((processes = DoKillProcessGroupOnce(cgroup, uid, initialPid, signal)) > 0) {
        if (max_processes != nullptr && processes > *max_processes) {
            *max_processes = processes;
        }
        // Some processes are still alive: retry, at most 40 times
        LOG(VERBOSE) << "Killed " << processes << " processes for processgroup " << initialPid;
        if (retry > 0) {
            std::this_thread::sleep_for(5ms);
            --retry;
        } else {
            break;    // still alive after all 40 retries: give up, the kill has failed
        }
    }
    ... ...
}
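Stripped of logging, the loop in KillProcessGroup() is a bounded retry: run one kill pass, and while members remain, sleep 5 ms and try again, up to `retries` times. Below is a minimal C sketch of just that control flow. kill_group_with_retries, kill_once_fn and the fake_kill_once test double are hypothetical names. What the real function does after the loop is in the elided part, so this sketch simply returns the count from the last pass:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One pass over the group (stand-in for DoKillProcessGroupOnce()):
 * signals every member still present and returns how many it found
 * (0 = group empty, -1 = error). */
typedef int (*kill_once_fn)(void *ctx);

/* Hypothetical simplification of the KillProcessGroup() retry loop. */
static int kill_group_with_retries(kill_once_fn kill_once, void *ctx,
                                   int retries, int *max_processes)
{
    const struct timespec delay = { .tv_sec = 0, .tv_nsec = 5 * 1000 * 1000 }; /* 5 ms */
    int retry = retries;
    int processes;

    while ((processes = kill_once(ctx)) > 0) {
        if (max_processes != NULL && processes > *max_processes)
            *max_processes = processes;
        if (retry > 0) {
            nanosleep(&delay, NULL);
            --retry;
        } else {
            break; /* members remain after all retries: give up */
        }
    }
    return processes;
}

/* Test double: a fake group that loses one member per pass. */
static int fake_kill_once(void *ctx)
{
    int *remaining = ctx;
    int seen = *remaining;
    if (*remaining > 0)
        --*remaining;
    return seen;
}
```

With a fake group of 3 members and 40 retries the loop needs 3 passes that still find members, so it returns 0 with a recorded maximum of 3; with only 1 retry it stops early and returns 2.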

Then DoKillProcessGroupOnce():

// system/core/libprocessgroup/processgroup.cpp

static int DoKillProcessGroupOnce(const char* cgroup, uid_t uid, int initialPid, int signal) {
    ... ...

    // GetOneAppProcess's job: read pids from the node /acct/uid_/pid_/cgroup.procs; these are processes, not threads
    while (fscanf(fd.get(), "%d\n", &pid) == 1 && pid >= 0) {
        processes++;
        if (pid == 0) {
            LOG(WARNING) << "Yikes, we've been told to kill pid 0! How about we don't do that?";
            continue;    // this branch should never be taken
        }
        pid_t pgid = getpgid(pid);
        if (pgid == -1) PLOG(ERROR) << "getpgid(" << pid << ") failed";
        if (pgid == pid) {
            pgids.emplace(pid);
        } else {
            pids.emplace(pid);
        }
    }
    ... ...

    // Kill all process groups.
    for (const auto pgid : pgids) {
        LOG(VERBOSE) << "Killing process group " << -pgid << " in uid " << uid
                     << " as part of process cgroup " << initialPid;

        if (kill(-pgid, signal) == -1) {        // kill() with a negative pid signals the whole process group pgid
            PLOG(WARNING) << "kill(" << -pgid << ", " << signal << ") failed";
        }
    }

    // Kill remaining pids.
    for (const auto pid : pids) {
        LOG(VERBOSE) << "Killing pid " << pid << " in uid " << uid << " as part of process cgroup "
                     << initialPid;
        if (kill(pid, signal) == -1) {          // kill() with a positive pid signals that one process
            PLOG(WARNING) << "kill(" << pid << ", " << signal << ") failed";
        }
    }

    // processes = number of processes found (and signalled) in the group on this pass
    return feof(fd.get()) ? processes : -1;
}

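DoKillProcessGroupOnce() kills every process under uid that is in the same group as initialPid: process-group leaders (pgid == pid) are killed with kill(-pgid, sig), which also takes down the rest of their process group, and the remaining members individually with kill(pid, sig). It also means that with kill <pid>, if pid is a child thread of some process, what ultimately gets killed is still the whole process.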
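The bookkeeping in DoKillProcessGroupOnce() boils down to splitting the group's members into process-group leaders and everyone else. Here is a simplified sketch of that split. The pgids are passed in instead of queried through getpgid(), so it runs without real processes. partition_members is a hypothetical name, and plain arrays replace the set containers (pgids, pids) used in the original, so duplicates are not removed:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper mirroring the first loop of DoKillProcessGroupOnce().
 * pids[i] is a member of the group and pgids[i] its process-group id (what
 * getpgid(pids[i]) would return). Leaders (pgid == pid) go to `leaders`
 * and would later get kill(-pgid, sig); all others go to `others` and would
 * get kill(pid, sig). pid 0 is skipped, as in the original. */
static void partition_members(const pid_t *pids, const pid_t *pgids, size_t n,
                              pid_t *leaders, size_t *n_leaders,
                              pid_t *others, size_t *n_others)
{
    *n_leaders = 0;
    *n_others = 0;
    for (size_t i = 0; i < n; i++) {
        if (pids[i] == 0)
            continue; /* never signal pid 0 */
        if (pgids[i] == pids[i])
            leaders[(*n_leaders)++] = pids[i];
        else
            others[(*n_others)++] = pids[i];
    }
}
```

Sending an extra kill(pid, sig) to a member whose group leader has already been signalled with kill(-pgid, sig) is harmless, which is why the original does not bother to filter those out.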

Once again, the actual killing is done by kill(pid, sig).

1.4 Summary

        ✎ Process.killProcess(int pid): kills process pid;

        ✎ Process.killProcessQuiet(int pid): kills process pid without printing a log message;

        ✎ Process.killProcessGroup(int uid, int pid): kills all processes under the same uid that are in the same process group as pid.

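All three methods end up calling kill(pid, sig), a native user-space function that enters the Linux kernel's sys_kill through a system call. For killing a process sig = 9 (SIGKILL), so the effect is essentially the same as typing kill -9 <pid> in an adb shell.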
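One reason kill -9 is so dependable is that the kernel does not let a process install a handler for SIGKILL (or SIGSTOP), so the target cannot intercept it. A minimal sketch, assuming a POSIX system; try_install_handler and ignore_handler are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

/* A no-op handler used only to test whether installation is allowed. */
static void ignore_handler(int sig)
{
    (void)sig;
}

/* Try to install ignore_handler for `sig`; return 0 on success,
 * otherwise the errno value sigaction() reported. */
static int try_install_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = ignore_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) == -1)
        return errno;
    return 0;
}
```

try_install_handler(SIGKILL) is refused with EINVAL, while try_install_handler(SIGUSR1) succeeds and returns 0.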

The flow is shown below:

[Figure: kill-process flowchart]


2. Kernel Space

In user space, the kill function kill(pid, sig) ends up in the Linux kernel's sys_kill, so we continue the analysis from there (the kernel source can be browsed at https://code.woboq.org/linux/linux/).

2.1 sys_kill: SYSCALL_DEFINE2

sys_kill() is not defined directly in the Linux kernel source; it is generated by the SYSCALL_DEFINE2 macro:

// kernel/kernel/signal.c

SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
{
    struct kernel_siginfo info;

    prepare_kill_siginfo(sig, &info);

    return kill_something_info(sig, &info, pid);
}

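SYSCALL_DEFINE2 is the macro for defining a system call that takes two arguments; expanded layer by layer, it produces the actual definition of sys_kill.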
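To make the layer-by-layer expansion concrete, here is a deliberately tiny stand-in macro, TOY_SYSCALL_DEFINE2 (hypothetical, not the kernel's). It shows only the token-pasting step that turns a name such as kill into a sys_-prefixed function. The real SYSCALL_DEFINEx machinery additionally generates wrappers that cast the raw register arguments, tracing metadata and architecture-specific entry points:

```c
#include <assert.h>
#include <sys/types.h>

/* Toy stand-in for SYSCALL_DEFINE2: paste the name into sys_<name> and
 * declare the two typed parameters. Illustration only. */
#define TOY_SYSCALL_DEFINE2(name, t1, a1, t2, a2) \
    static long sys_##name(t1 a1, t2 a2)

/* Expands to: static long sys_toy_kill(pid_t pid, int sig) { ... } */
TOY_SYSCALL_DEFINE2(toy_kill, pid_t, pid, int, sig)
{
    /* The real sys_kill body calls prepare_kill_siginfo() and
     * kill_something_info(); this toy just encodes its arguments. */
    return (long)pid * 100 + sig;
}
```

Calling sys_toy_kill(12, 9) returns 1209, confirming that the macro produced an ordinary two-argument function with the pasted name.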

2.2 kill_something_info()

Following kill_something_info():

// kernel/kernel/signal.c

static int kill_something_info(int sig, struct kernel_siginfo *info, pid_t pid)
{
    int ret;

    if (pid > 0) {
        rcu_read_lock();
        // pid > 0: send to the process with this pid
        ret = kill_pid_info(sig, info, find_vpid(pid));
        rcu_read_unlock();
        return ret;
    }

    /* -INT_MIN is undefined.  Exclude this case to avoid a UBSAN warning */
    if (pid == INT_MIN)
        return -ESRCH;

    read_lock(&tasklist_lock);
    if (pid != -1) {
        // pid == 0: send to the caller's process group; pid < -1: send to process group -pid
        ret = __kill_pgrp_info(sig, info, pid ? find_vpid(-pid) : task_pgrp(current));
    } else {
        // pid == -1: send to every process with pid > 1 except the caller's own thread group (permissions permitting)
        int retval = 0, count = 0;
        struct task_struct * p;

        for_each_process(p) {
            if (task_pid_vnr(p) > 1 &&
                    !same_thread_group(p, current)) {
                int err = group_send_sig_info(sig, info, p, PIDTYPE_MAX);
                ++count;
                if (err != -EPERM)
                    retval = err;
            }
        }
        ret = count ? retval : -ESRCH;
    }
    read_unlock(&tasklist_lock);

    return ret;
}
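kill_something_info() is essentially a dispatch on the value of pid. The toy classifier below restates that dispatch as a pure function so the cases can be checked in isolation. The enum and the name classify_kill_pid are hypothetical, and permission checks are not modelled:

```c
#include <assert.h>
#include <limits.h>

/* Which targets kill_something_info() picks for a given pid argument. */
enum kill_target {
    TARGET_ONE_PROCESS, /* pid > 0: the process whose pid is `pid` */
    TARGET_OWN_GROUP,   /* pid == 0: the caller's process group */
    TARGET_ALL,         /* pid == -1: all processes with pid > 1, except the caller's thread group */
    TARGET_GROUP,       /* pid < -1: the process group -pid */
    TARGET_INVALID      /* pid == INT_MIN: -pid is undefined, returns -ESRCH */
};

static enum kill_target classify_kill_pid(int pid)
{
    if (pid > 0)
        return TARGET_ONE_PROCESS;
    if (pid == INT_MIN) /* checked before any negation, as in the kernel */
        return TARGET_INVALID;
    if (pid == 0)
        return TARGET_OWN_GROUP;
    if (pid == -1)
        return TARGET_ALL;
    return TARGET_GROUP;
}
```

This is also why kill(-pgid, sig) in DoKillProcessGroupOnce() reaches every member of a process group: any pid below -1 falls into the TARGET_GROUP case.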

2.3 kill_pid_info()

// kernel/kernel/signal.c

int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
{
    int error = -ESRCH;
    struct task_struct *p;

    for (;;) {
        rcu_read_lock();
        p = pid_task(pid, PIDTYPE_PID);    // look up the task_struct for this pid
        if (p)
            error = group_send_sig_info(sig, info, p, PIDTYPE_TGID);
        rcu_read_unlock();
        if (likely(!p || error != -ESRCH))
            return error;
    }
}

2.4 group_send_sig_info()

// kernel/kernel/signal.c

int group_send_sig_info(int sig, struct kernel_siginfo *info,
                struct task_struct *p, enum pid_type type)
{
    int ret;

    rcu_read_lock();
    ret = check_kill_permission(sig, info, p);       // check that sig is valid and that the caller may signal p
    rcu_read_unlock();

    if (!ret && sig)
        ret = do_send_sig_info(sig, info, p, type);  // calls do_send_sig_info()

    return ret;
}
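group_send_sig_info() only calls do_send_sig_info() when check_kill_permission() succeeds and sig is non-zero, which is why kill(pid, 0) works as an existence and permission probe from user space. A minimal sketch, assuming a POSIX system; probe_pid and reaped_child_pid are hypothetical helper names:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Probe a pid with signal 0: the kernel does the lookup and the permission
 * check but, because sig == 0, never sends anything. Returns 0 if the
 * process exists and may be signalled, else errno (ESRCH: no such process,
 * EPERM: exists but not permitted). */
static int probe_pid(pid_t pid)
{
    return kill(pid, 0) == 0 ? 0 : errno;
}

/* Return the pid of a child that has already exited and been reaped, so
 * nothing currently owns that pid (barring reuse); -1 on error. */
static pid_t reaped_child_pid(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0);
    if (waitpid(child, NULL, 0) != child)
        return -1;
    return child;
}
```

probe_pid(getpid()) returns 0, and probing a just-reaped pid gives ESRCH, the -ESRCH that kill_pid_info() returns when no task is found, unless the pid has been reused in between.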

2.5 do_send_sig_info()

// kernel/kernel/signal.c

int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p, enum pid_type type)
{
    unsigned long flags;
    int ret = -ESRCH;

    if (lock_task_sighand(p, &flags)) {
        ret = send_signal(sig, info, p, type);       // calls send_signal()
        unlock_task_sighand(p, &flags);
    }

    return ret;
}

2.6 send_signal()

// kernel/kernel/signal.c

static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t, enum pid_type type)
{
    int from_ancestor_ns = 0;

#ifdef CONFIG_PID_NS
    from_ancestor_ns = si_fromuser(info) &&
               !task_pid_nr_ns(current, task_active_pid_ns(t));
#endif

    return __send_signal(sig, info, t, type, from_ancestor_ns);    // calls __send_signal()
}

2.7 __send_signal()

// kernel/kernel/signal.c

static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
                enum pid_type type, int from_ancestor_ns)
{
    ... ...

out_set:
    signalfd_notify(t, sig);             // notify any signalfd on t that is waiting for sig
    sigaddset(&pending->signal, sig);    // add sig to the pending signal set

    /* Let multiprocess signals appear after on-going forks */
    if (type > PIDTYPE_TGID) {
        struct multiprocess_signals *delayed;
        hlist_for_each_entry(delayed, &t->signal->multiprocess, node) {
            sigset_t *signal = &delayed->signal;
            /* Can't queue both a stop and a continue signal */
            if (sig == SIGCONT)
                sigdelsetmask(signal, SIG_KERNEL_STOP_MASK);
            else if (sig_kernel_stop(sig))
                sigdelset(signal, SIGCONT);
            sigaddset(signal, sig);
        }
    }

    complete_signal(sig, t, type);    // finish delivery: choose a thread and wake it up
ret:
    trace_signal_generate(sig, info, t, type != PIDTYPE_PID, result);
    return ret;
}
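The key line in __send_signal() is sigaddset(&pending->signal, sig): sending a signal first records it as pending. That step is visible from user space if the signal is blocked, so it cannot be delivered right away. The sketch below assumes a POSIX system and uses SIGUSR1 rather than SIGKILL, because SIGKILL cannot be blocked; sigusr1_becomes_pending is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Block SIGUSR1, send it to this process with kill(), and report whether
 * it now sits in the pending set. The signal is then consumed with
 * sigwait() and the old mask restored, so its default action never runs.
 * Returns 1 if SIGUSR1 was pending, 0 if not, -1 on error. */
static int sigusr1_becomes_pending(void)
{
    sigset_t block, old, pending;
    int result = -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    /* Process-directed signal: it is queued as pending, and because it is
     * blocked no thread "wants" it, so nobody is woken up. */
    if (kill(getpid(), SIGUSR1) == 0 && sigpending(&pending) == 0)
        result = sigismember(&pending, SIGUSR1);

    if (result == 1) {
        int sig;
        sigwait(&block, &sig); /* dequeue it instead of letting it be delivered */
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```

A return value of 1 shows the signal was recorded in the pending set before any thread handled it, mirroring the sigaddset() step above.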

2.8 complete_signal()

// kernel/kernel/signal.c

static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
{
    struct signal_struct *signal = p->signal;
    struct task_struct *t;

    // find a thread that can handle this signal
    if (wants_signal(sig, p))
        t = p;
    else if ((type == PIDTYPE_PID) || thread_group_empty(p))
        return;
    else {
        // otherwise walk the thread group, starting from curr_target, looking for a suitable thread
        t = signal->curr_target;
        while (!wants_signal(sig, t)) {
            t = next_thread(t);
            if (t == signal->curr_target)
                return;
        }
        signal->curr_target = t;
    }

    // a target thread was found; if the signal is fatal (e.g. SIGKILL), take down the whole thread group at once
    if (sig_fatal(p, sig) &&
          !(signal->flags & SIGNAL_GROUP_EXIT) &&
          !sigismember(&t->real_blocked, sig) &&
          (sig == SIGKILL || !p->ptrace)) {
        // the signal will terminate the whole thread group
        if (!sig_kernel_coredump(sig)) {
            signal->flags = SIGNAL_GROUP_EXIT;
            signal->group_exit_code = sig;
            signal->group_stop_count = 0;
            t = p;
            // walk every thread in the group and end them all
            do {
                task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
                // mark SIGKILL pending for this thread
                sigaddset(&t->pending.signal, SIGKILL);
                signal_wake_up(t, 1);
            } while_each_thread(p, t);
            return;
        }
    }

    // the signal stays on the shared pending queue; wake the chosen thread so that it dequeues and handles it
    signal_wake_up(t, sig == SIGKILL);
    return;
}
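The thread-selection half of complete_signal() is a round-robin search that starts from signal->curr_target and remembers where it stopped. The toy model below captures only that search. wants[] stands in for wants_signal(), index 0 stands in for p, and pick_target_thread is a hypothetical name. The fatal-signal branch and the PIDTYPE_PID early return are left out:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of complete_signal()'s thread selection. wants[i] says whether
 * thread i could take the signal now; thread 0 plays the role of p.
 * Assumes n >= 1 and 0 <= *curr_target < n. Returns the chosen index
 * (updating *curr_target when the search was used), or -1 if no thread
 * wants the signal, in which case it simply stays pending. */
static int pick_target_thread(const bool *wants, int n, int *curr_target)
{
    if (wants[0])          /* wants_signal(sig, p) */
        return 0;
    if (n == 1)            /* thread_group_empty(p) */
        return -1;

    int t = *curr_target;
    while (!wants[t]) {
        t = (t + 1) % n;   /* next_thread(t), wrapping around the group */
        if (t == *curr_target)
            return -1;     /* went all the way round: nobody wants it */
    }
    *curr_target = t;      /* start here next time */
    return t;
}
```

Remembering curr_target spreads successive process-directed signals across threads instead of always waking the same one.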

2.9 Summary

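At this point the signal has been delivered to the target: for SIGKILL, every thread in the group has SIGKILL pending and has been woken up, and each one acts on it (and exits) on its way back to user mode. Here is the overall flow: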

[Figure: kill-process flowchart]

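The flow is split into user space and kernel space. To get from user space into kernel space a program issues a syscall: user-space programs use the various system calls to request the corresponding kernel services. A system call makes the user-space program trap into the kernel, and this trap is implemented as a software interrupt. Once a user-mode process makes a system call, the CPU switches to kernel mode and starts executing kernel functions. The system call numbers are defined in unistd.h; a user program selects a kernel service by its system call number, and the kernel uses that number to find the corresponding service in the system call table.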
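Because libc's kill() is a thin wrapper around the system call, the same request can be made by number with syscall(). A minimal sketch for Linux, assuming the C library exposes syscall() and the SYS_kill constant (glibc does; other C libraries may differ); raw_kill is a hypothetical name:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invoke the kill system call by number, bypassing the libc kill()
 * wrapper. SYS_kill's value differs between architectures; the kernel
 * uses it to index its system call table and ends up in sys_kill. */
static long raw_kill(pid_t pid, int sig)
{
    return syscall(SYS_kill, pid, sig);
}
```

With signal 0, raw_kill(getpid(), 0) returns 0 just as kill(getpid(), 0) would, since both requests reach the same sys_kill.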