1、遇到的問題
開發中常常會打印業務日志,這些日志在排查問題或跟蹤調用流程時非常有用,很多業務日志並沒有考慮排查問題時的便利性,看似都記錄了日志,一旦出現了線上問題,排查起來特別麻煩,很多時候又不方便讓用戶重現問題,造成解決問題周期長甚至是無法解決問題;如果日志記錄的好,可以根據關鍵字搜索得出一條調用鏈日志,順着這個日志鏈條就可以看出程序的執行全過程了。
2、如何記錄日志
當然,應用、環境、ip、日志時間、日志級別、線程、輸入輸出參數…這些都是必要便於排錯的日志信息就不多說了,重點說下如何記錄跟蹤一次調用流程的日志記錄方法。
一種方法是:打印日志時候指明業務線,比如支付的流程里,所有步驟業務日志統一前綴 "支付流程:userId:",那么排查問題時候就可以根據關鍵字和userId來快速得到某個userid在支付流程里的所有軌跡。
但是這種方法有如下問題:
1、所有log統一前綴實行起來不太容易
2、同一個userid的log可能會非常多,並不能按request來歸類
3、如果涉及到了公共方法調用(同時被多個業務流程調用),統一前綴很難實施
……
另一種方法是給每個request生成唯一的標識,給打印log的地方埋點這個標識,根據這個標識可以自動形成log鏈路。
自己埋點方式
使用filter(filter使用傳送門)給每個request生成唯一標識,在log里帶上這個標識,將一個request的log通過唯一標識串成一串。
@WebFilter(filterName = "myFilter",urlPatterns = "/*") public class LogFilter implements Filter { @Override public void init(FilterConfig filterConfig) throws ServletException { } @Override public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException { String replace = UUID.randomUUID().toString().replace("-", ""); LogHolder.setLogId(replace); LogUtil.info("thread:"+Thread.currentThread().getName() + ",uuid : " + replace+",url:"+((HttpServletRequest) request).getRequestURI()); chain.doFilter(request, response); LogHolder.remove(); } @Override public void destroy() { } }
LogUtil:
@Slf4j public class LogUtil { public static void info(String content, Object... args) { log.info(String.format("LogFlowId: %s , %s", LogHolder.getLogId(), content), args); } public static void warn(String content, Object... args) { log.info(String.format("LogFlowId: %s , %s", LogHolder.getLogId(), content), args); } public static void error(String content, Exception ex) { log.error(String.format("LogFlowId: %s , %s", LogHolder.getLogId(), content), ex);
}
}
業務場景:
@RequestMapping("/log")
public void writelog() throws InterruptedException {
LogUtil.info("request begin");
test();
LogUtil.info("request end");
}
private void test() throws InterruptedException {
long l = System.currentTimeMillis();
Thread.sleep(1000);
LogUtil.info("操作耗時:" + (System.currentTimeMillis() - l) + " ms");
}
輸出日志如下:

可以根據任意一步的logFlowId檢索出整個request生命周期內的所有log。
對於一般的情況都可以實現,但是如果程序里使用了線程池會不會有線程復用的問題?來驗證一下:
業務代碼修改如下:
@RequestMapping("/log")
public void writelog() {
LogUtil.info("request begin");
ExecuterServiceUtil.run(() -> test());
LogUtil.info("request end");
}
public void test() {
long l = System.currentTimeMillis();
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
LogUtil.info("操作耗時:" + (System.currentTimeMillis() - l) + " ms");
}

獲取不到設置在ThreadLocal里的logFlowId了,這是因為線程池重新開辟了子線程,子線程不能訪問主線程設置的logFlowId,可以使用 InheritableThreadLocal 來解決問題。
修改LogHolder為:
public class LogHolder { private LogHolder() { } public static final ThreadLocal<String> LOG_ID = new InheritableThreadLocal<>(); public static String getLogId() { return LOG_ID.get(); } public static void setLogId(String logId) { LOG_ID.set(logId); } public static void remove() { LOG_ID.remove(); } }
再次運行發現logFlowId有值了,但是卻重復了,很明顯是因為線程復用造成的問題。

最終解決辦法:
1、獲取主線程設置的logFlowId
2、執行前設置到子線程里
3、用完后清除子線程變量

輸出日志如下所示:

如果是分布式應用跟蹤,可以通過header透傳方式,將logFlowId傳入下一個應用里。
附錄
使用MDC
使用攔截器
使用前(preHandle)生成並設置log_trace_id,在使用完(afterCompletion)清除log_trace_id
public class LogInterceptor implements HandlerInterceptor { private static final String DEMO_LOG_TRACE_ID = "demo_log_trace_id"; @Override public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception { MDC.put(DEMO_LOG_TRACE_ID, UUIdUtil.getUUID()); return true; } @Override public void postHandle(HttpServletRequest request, HttpServletResponse response, Object handler, ModelAndView modelAndView) throws Exception { } @Override public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex) throws Exception { MDC.remove(DEMO_LOG_TRACE_ID); } }
同樣的,如果使用線程池,會有類似的問題。我們繼承線程池方法,稍作修改
public class ThreadExecutorPoolUtils extends ThreadPoolTaskExecutor { /** * 所有線程都會委托給這個execute方法,在這個方法中我們把父線程的MDC內容賦值給子線程 * https://logback.qos.ch/manual/mdc.html#managedThreads * * @param runnable */ @Override public void execute(Runnable runnable) { // 獲取父線程MDC中的內容,必須在run方法之前,否則等異步線程執行的時候有可能MDC里面的值已經被清空了,這個時候就會返回null Map<String, String> context = MDC.getCopyOfContextMap(); super.execute(() -> run(runnable, context)); } @Override public Future<?> submit(Runnable runnable) { // 獲取父線程MDC中的內容,必須在run方法之前,否則等異步線程執行的時候有可能MDC里面的值已經被清空了,這個時候就會返回null Map<String, String> context = MDC.getCopyOfContextMap(); return super.submit(()->run(runnable, context)); } @Override public <T> Future<T> submit(Callable<T> callable) { // 獲取父線程MDC中的內容,必須在run方法之前,否則等異步線程執行的時候有可能MDC里面的值已經被清空了,這個時候就會返回null Map<String, String> context = MDC.getCopyOfContextMap(); return super.submit(() -> call(callable, context)); } /** * 子線程委托的執行方法 * * @param runnable {@link Runnable} * @param context 父線程MDC內容 */ private void run(Runnable runnable, Map<String, String> context) { // 將父線程的MDC內容傳給子線程 MDC.setContextMap(context); try { // 執行異步操作 runnable.run(); } finally { // 清空MDC內容 MDC.clear(); } } /** * 子線程委托的執行方法 * * @param callable {@link Callable} * @param context 父線程MDC內容 */ private <T> T call(Callable<T> callable, Map<String, String> context) throws Exception { // 將父線程的MDC內容傳給子線程 MDC.setContextMap(context); try { // 執行異步操作 return callable.call(); } finally { // 清空子線程MDC內容,防止線程復用問題 MDC.clear(); } } }
logback-xml配置
<pattern>[%d{yyyy-MM-dd HH:mm:ss.SSS}] [%thread] [%level] [%X{demo_log_trace_id}] %logger:%L x:\(%X\) - %msg%n</pattern>
輸出日志格式
2019-02-02 10:27:14.279 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.279 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.280 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.280 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.280 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.280 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.280 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.281 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.282 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.282 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.282 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.282 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.282 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.255 [INFO][Test-Thread-Pool4]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool4 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.260 [INFO][Test-Thread-Pool2]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool2 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.282 [INFO][Test-Thread-Pool4]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool4 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4 2019-02-02 10:27:14.282 [INFO][Test-Thread-Pool3]:demo.web.api.DemoController [lambda$mutilLog$0:69] [d4676669641a4c6ab94f841d77c2b18e] Test-Thread-Pool3 runs >>>>>, activeAccount:4, corePoolSize:4, maxPoolSize:5, maxPoolSize:Test-Thread-Group, poolSize:4
