Notes on an Azkaban error


Exception output:


2020/08/05 01:13:36.673 +0800 INFO [ExecutorServlet] [Azkaban] User null has called action log on 6
2020/08/05 01:13:36.752 +0800 INFO [ExecutorServlet] [Azkaban] User null has called action log on 6
2020/08/05 01:14:06.776 +0800 INFO [ExecutorServlet] [Azkaban] User null has called action log on 6
[emo@hadoop102 executor]$ java.lang.RuntimeException: azkaban.jobExecutor.utils.process.ProcessFailureException
        at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:94)
        at azkaban.execapp.JobRunner.runJob(JobRunner.java:516)
        at azkaban.execapp.JobRunner.run(JobRunner.java:436)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: azkaban.jobExecutor.utils.process.ProcessFailureException
        at azkaban.jobExecutor.utils.process.AzkabanProcess.run(AzkabanProcess.java:98)
        at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:88)
        ... 7 more
2020/08/05 01:14:14.692 +0800 INFO [export] [Azkaban] Job import finished with status FAILED in 41 seconds
2020/08/05 01:14:14.695 +0800 INFO [export] [Azkaban] Setting export to FAILED_FINISHING
2020/08/05 01:14:14.696 +0800 INFO [export] [Azkaban] Cancelling 'ods' due to prior errors.
2020/08/05 01:14:14.822 +0800 INFO [export] [Azkaban] Cancelling 'dwd' due to prior errors.
2020/08/05 01:14:14.828 +0800 INFO [export] [Azkaban] Cancelling 'dws' due to prior errors.
2020/08/05 01:14:14.830 +0800 INFO [export] [Azkaban] No attachment file for job import written.
2020/08/05 01:14:14.837 +0800 INFO [export] [Azkaban] Cancelling 'ads' due to prior errors.
2020/08/05 01:14:14.844 +0800 INFO [export] [Azkaban] Cancelling 'export' due to prior errors.
2020/08/05 01:14:14.850 +0800 INFO [export] [Azkaban] Setting flow '' status to FAILED in 41 seconds
2020/08/05 01:14:14.856 +0800 INFO [export] [Azkaban] Finishing up flow. Awaiting Termination
2020/08/05 01:14:14.863 +0800 INFO [export] [Azkaban] Finished Flow
2020/08/05 01:14:14.863 +0800 INFO [export] [Azkaban] Setting end time for flow 6 to 1596561254863
2020/08/05 01:14:14.872 +0800 INFO [FlowRunnerManager] [Azkaban] Flow 6 is finished. Adding it to recently finished flows list.
2020/08/05 01:14:57.777 +0800 INFO [FlowRunnerManager] [Azkaban] Cleaning recently finished
2020/08/05 01:16:57.831 +0800 INFO [FlowRunnerManager] [Azkaban] Cleaning recently finished
2020/08/05 01:16:57.831 +0800 INFO [FlowRunnerManager] [Azkaban] Cleaning execution 6 from recently finished flows list.
2020/08/05 01:16:57.831 +0800 INFO [FlowRunnerManager] [Azkaban] Cleaning old projects

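Troubleshooting this kind of exception:
1. Once you are sure the executor and server are both configured correctly, check whether every uploaded file is in Unix format (LF line endings, not CRLF). Make sure all of them are, without exception.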
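The line-ending check in step 1 can be sketched as a small script to run over the job files before zipping and uploading them. This is only an illustration: the helper names `find_crlf_files` and `to_unix` and the list of file patterns are assumptions, not something from the original setup.

```python
from pathlib import Path


def find_crlf_files(root, patterns=("*.job", "*.sh", "*.properties")):
    """Return files under root that contain Windows (CRLF) line endings.

    The patterns are an assumed set of text files in an Azkaban project;
    adjust them to whatever you actually upload.
    """
    offenders = set()
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            if path.is_file() and b"\r\n" in path.read_bytes():
                offenders.add(path)
    return sorted(offenders)


def to_unix(path):
    """Rewrite one file in place so that every CRLF becomes LF."""
    data = Path(path).read_bytes()
    Path(path).write_bytes(data.replace(b"\r\n", b"\n"))
```

Running `find_crlf_files` on the project directory and calling `to_unix` on each result is roughly what a tool such as `dos2unix` would do for the same files.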

2. Open Job List/Details. The long run of ERROR lines never points clearly at the real problem, but near the end a familiar phrase shows up: Safe mode

The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
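The likely cause is insufficient memory, or an external event such as a sudden power loss or forced shutdown, which left enough block files missing to cross the threshold, so the cluster put itself into safe mode.
Fix: force HDFS out of safe mode with: hadoop dfsadmin -safemode leave (on Hadoop 2 and later this form prints a deprecation notice; the recommended equivalent is hdfs dfsadmin -safemode leave).
After that, the jobs turned green.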
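Before forcing the exit, it can help to confirm that the NameNode really is in safe mode. The following is a minimal sketch that wraps `hdfs dfsadmin -safemode get`; it assumes the command prints a line starting with "Safe mode is ON" or "Safe mode is OFF", and the function names `parse_safemode` and `safemode_is_on` are made up for this example.

```python
import subprocess


def parse_safemode(output):
    """Return True if any status line reports safe mode as ON.

    Assumes lines of the form 'Safe mode is ON' / 'Safe mode is OFF'
    (possibly followed by a host name on HA clusters).
    """
    lines = [ln.strip() for ln in output.splitlines()
             if ln.strip().startswith("Safe mode is")]
    if not lines:
        raise ValueError("no 'Safe mode is ...' line found in output")
    return any(ln.startswith("Safe mode is ON") for ln in lines)


def safemode_is_on(cmd=("hdfs", "dfsadmin", "-safemode", "get")):
    """Run the status command and parse its output (needs a Hadoop client)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_safemode(result.stdout)
```

If `safemode_is_on()` returns True, the `-safemode leave` command above applies. Note that leaving safe mode does not bring back missing blocks, so `hdfs fsck /` is worth a look afterwards.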


