/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

/*
 * When the operating system detects that it is in an invalid state, a panic
 * is initiated in order to minimize potential damage to user data and to
 * facilitate debugging. There are three major tasks to be performed in
 * a system panic: recording information about the panic in memory (and thus
 * making it part of the crash dump), synchronizing the file systems to
 * preserve user file data, and generating the crash dump.
 * We define the system to be in one of four states with respect to the panic
 * code:
 *
 * CALM    - the state of the system prior to any thread initiating a panic
 *
 * QUIESCE - the state of the system when the first thread to initiate
 *           a system panic records information about the cause of the panic
 *           and renders the system quiescent by stopping other processors
 *
 * SYNC    - the state of the system when we synchronize the file systems
 * DUMP    - the state when we generate the crash dump.
 *
 * The transitions between these states are irreversible: once we begin
 * panicking, we only make one attempt to perform the actions associated with
 * each state.
 *
 * The panic code itself must be re-entrant because actions taken during any
 * state may lead to another system panic. Additionally, any Solaris
 * thread may initiate a panic at any time, and so we must have synchronization
 * between threads which attempt to initiate a state transition simultaneously.
 * The panic code makes use of a special locking primitive, a trigger, to
 * perform this synchronization. A trigger is simply a word which is set
 * atomically and can only be set once. We declare three triggers, one for
 * each transition between the four states. When a thread enters the panic
 * code it attempts to set each trigger; if it fails it moves on to the
 * next trigger. A special case is the first trigger: if two threads race
 * to perform the transition to QUIESCE, the losing thread may execute before
 * the winner has a chance to stop its CPU.
 * To solve this problem, we have the loser look ahead to see if any other
 * triggers are set; if not, it presumes a panic is underway and simply spins.
 * Unfortunately, since we are panicking, it is not possible to know this with
 * absolute certainty.
 *
 * There are two common reasons for re-entering the panic code once a panic
 * has been initiated: (1) after we debug_enter() at the end of QUIESCE,
 * the operator may type "sync" instead of "go", and the PROM's sync callback
 * routine will invoke panic(); (2) if the clock routine decides that sync
 * or dump is not making progress, it will invoke panic() to force a timeout.
 * The design assumes that a third possibility, another thread causing an
 * unrelated panic while sync or dump is still underway, is extremely unlikely.
 * If this situation occurs, we may end up triggering dump while sync is
 * still in progress. This third case is considered extremely unlikely because
 * all other CPUs are stopped and low-level interrupts have been blocked.
 *
 * The panic code is entered via a call directly to the vpanic() function,
 * or its varargs wrappers panic() and cmn_err(9F). The vpanic routine
 * is implemented in assembly language to record the current machine
 * registers, attempt to set the trigger for the QUIESCE state, and
 * if successful, switch stacks on to the panic_stack before calling into
 * the common panicsys() routine. The first thread to initiate a panic
 * is allowed to make use of the reserved panic_stack so that executing
 * the panic code itself does not overwrite valuable data on that thread's
 * stack *ahead* of the current stack pointer.
 * This data will be preserved in the crash dump and may prove invaluable
 * in determining what this thread has previously been doing. The first
 * thread, saved in panic_thread, is also responsible for stopping the other
 * CPUs as quickly as possible, and then setting the various panic_*
 * variables. Most important among these is panicstr, which allows threads
 * to subsequently bypass held locks so that we can proceed without ever
 * blocking. We must stop the other CPUs *prior* to setting panicstr in case
 * threads running there are currently spinning to acquire a lock; we want
 * that state to be preserved. Every thread which initiates a panic has its
 * T_PANIC flag set so we can identify all such threads in the crash dump.
 *
 * The panic_thread is also allowed to make use of the special memory buffer
 * panicbuf, which on machines with appropriate hardware is preserved across
 * reboots. We allow the panic_thread to store its register set and panic
 * message in this buffer, so even if we fail to obtain a crash dump we will
 * be able to examine the machine after reboot and determine some of the
 * state at the time of the panic. If we do get a dump, the panic buffer
 * data is structured so that a debugger can easily consume the information
 * therein (see <sys/panic.h>).
 *
 * Each platform or architecture is required to implement the functions
 * panic_savetrap() to record trap-specific information to panicbuf,
 * panic_saveregs() to record a register set to panicbuf, panic_stopcpus()
 * to halt all CPUs but the panicking CPU, panic_quiesce_hw() to perform
 * miscellaneous platform-specific tasks *after* panicstr is set,
 * panic_showtrap() to print trap-specific information to the console,
 * and panic_dump_hw() to perform platform tasks prior to calling dumpsys().
 *
 * A Note on Word Formation, courtesy of the Oxford Guide to English Usage:
 *
 * Words ending in -c interpose k before suffixes which otherwise would
 * indicate a soft c, and thus the verb and adjective forms of 'panic' are
 * spelled "panicked", "panicking", and "panicky" respectively. Use of
 * the ill-conceived "panicing" and "panic'd" is discouraged.
 */

#include <sys/types.h>
#include <sys/varargs.h>
#include <sys/sysmacros.h>
#include <sys/cmn_err.h>
#include <sys/cpuvar.h>
#include <sys/thread.h>
#include <sys/t_lock.h>
#include <sys/cred.h>
#include <sys/systm.h>
#include <sys/archsystm.h>
#include <sys/uadmin.h>
#include <sys/callb.h>
#include <sys/vfs.h>
#include <sys/log.h>
#include <sys/disp.h>
#include <sys/param.h>
#include <sys/dumphdr.h>
#include <sys/ftrace.h>
#include <sys/reboot.h>
#include <sys/debug.h>
#include <sys/stack.h>
#include <sys/spl.h>
#include <sys/errorq.h>
#include <sys/panic.h>
#include <sys/fm/util.h>
#include <sys/clock_impl.h>

/*
 * Panic variables which are set once during the QUIESCE state by the
 * first thread to initiate a panic. These are examined by post-mortem
 * debugging tools; the inconsistent use of 'panic' versus 'panic_' in
 * the variable naming is historical and allows legacy tools to work.
 */
#pragma align STACK_ALIGN(panic_stack)
char panic_stack[PANICSTKSIZE];		/* reserved stack for panic_thread */
kthread_t *panic_thread;		/* first thread to call panicsys() */
cpu_t panic_cpu;			/* cpu from first call to panicsys() */
label_t panic_regs;			/* setjmp label from panic_thread */
struct regs *panic_reg;			/* regs struct from first panicsys() */
char *volatile panicstr;		/* format string to first panicsys() */
va_list panicargs;			/* arguments to first panicsys() */
clock_t panic_lbolt;			/* lbolt at time of panic */
int64_t panic_lbolt64;			/* lbolt64 at time of panic */
hrtime_t panic_hrtime;			/* hrtime at time of panic */
timespec_t panic_hrestime;		/* hrestime at time of panic */
int panic_ipl;				/* ipl on panic_cpu at time of panic */
ushort_t panic_schedflag;		/* t_schedflag for panic_thread */
cpu_t *panic_bound_cpu;			/* t_bound_cpu for panic_thread */
char panic_preempt;			/* t_preempt for panic_thread */

/*
 * Panic variables which can be set via /etc/system or patched while
 * the system is in operation. Again, the stupid names are historic.
 */
char *panic_bootstr = NULL;		/* mdboot string to use after panic */
int panic_bootfcn = AD_BOOT;		/* mdboot function to use after panic */
int halt_on_panic = 0;			/* halt after dump instead of reboot? */
int nopanicdebug = 0;			/* reboot instead of call debugger? */
int in_sync = 0;			/* skip vfs_syncall() and just dump? */

/*
 * The do_polled_io flag is set by the panic code to inform the SCSI subsystem
 * to use polled mode instead of interrupt-driven i/o.
 */
int do_polled_io = 0;

/*
 * The panic_forced flag is set by the uadmin A_DUMP code to inform the
 * panic subsystem that it should not attempt an initial debug_enter.
 */
int panic_forced = 0;

/*
 * Triggers for panic state transitions:
 */
int panic_quiesce;			/* trigger for CALM -> QUIESCE */
int panic_sync;				/* trigger for QUIESCE -> SYNC */
int panic_dump;				/* trigger for SYNC -> DUMP */

/*
 * Variable signifying quiesce(9E) is in progress.
 */
volatile int quiesce_active = 0;

void
panicsys(const char *format, va_list alist, struct regs *rp, int on_panic_stack)
{
	int s = spl8();
	kthread_t *t = curthread;
	cpu_t *cp = CPU;

	caddr_t intr_stack = NULL;
	uint_t intr_actv;

	ushort_t schedflag = t->t_schedflag;
	cpu_t *bound_cpu = t->t_bound_cpu;
	char preempt = t->t_preempt;

	(void) setjmp(&t->t_pcb);
	t->t_flag |= T_PANIC;

	t->t_schedflag |= TS_DONT_SWAP;
	t->t_bound_cpu = cp;
	t->t_preempt++;

	/*
	 * Switch lbolt to event driven mode.
	 */
	lbolt_hybrid = lbolt_event_driven;

	panic_enter_hw(s);

	/*
	 * If we're on the interrupt stack and an interrupt thread is available
	 * in this CPU's pool, preserve the interrupt stack by detaching an
	 * interrupt thread and making its stack the intr_stack.
	 */
	if (CPU_ON_INTR(cp) && cp->cpu_intr_thread != NULL) {
		kthread_t *it = cp->cpu_intr_thread;

		intr_stack = cp->cpu_intr_stack;
		intr_actv = cp->cpu_intr_actv;

		cp->cpu_intr_stack = thread_stk_init(it->t_stk);
		cp->cpu_intr_thread = it->t_link;

		/*
		 * Clear only the high level bits of cpu_intr_actv.
		 * We want to indicate that high-level interrupts are
		 * not active without destroying the low-level interrupt
		 * information stored there.
		 */
		cp->cpu_intr_actv &= ((1 << (LOCK_LEVEL + 1)) - 1);
	}

	/*
	 * Record one-time panic information and quiesce the other CPUs.
	 * Then print out the panic message and stack trace.
	 */
	if (on_panic_stack) {
		panic_data_t *pdp = (panic_data_t *)panicbuf;

		pdp->pd_version = PANICBUFVERS;
		pdp->pd_msgoff = sizeof (panic_data_t) - sizeof (panic_nv_t);

		if (t->t_panic_trap != NULL)
			panic_savetrap(pdp, t->t_panic_trap);
		else
			panic_saveregs(pdp, rp);

		(void) vsnprintf(&panicbuf[pdp->pd_msgoff],
		    PANICBUFSIZE - pdp->pd_msgoff, format, alist);

		/*
		 * Call into the platform code to stop the other CPUs.
		 * We currently have all interrupts blocked, and expect that
		 * the platform code will lower ipl only as far as needed to
		 * perform cross-calls, and will acquire as *few* locks as is
		 * possible -- panicstr is not set so we can still deadlock.
		 */
		panic_stopcpus(cp, t, s);

		panicstr = (char *)format;
		va_copy(panicargs, alist);
		panic_lbolt = LBOLT_NO_ACCOUNT;
		panic_lbolt64 = LBOLT_NO_ACCOUNT64;
		panic_hrestime = hrestime;
		panic_hrtime = gethrtime_waitfree();
		panic_thread = t;
		panic_regs = t->t_pcb;
		panic_reg = rp;
		panic_cpu = *cp;
		panic_ipl = spltoipl(s);
		panic_schedflag = schedflag;
		panic_bound_cpu = bound_cpu;
		panic_preempt = preempt;

		if (intr_stack != NULL) {
			panic_cpu.cpu_intr_stack = intr_stack;
			panic_cpu.cpu_intr_actv = intr_actv;
		}

		/*
		 * Lower ipl to 10 to keep clock() from running, but allow
		 * keyboard interrupts to enter the debugger. These callbacks
		 * are executed with panicstr set so they can bypass locks.
		 */
		splx(ipltospl(CLOCK_LEVEL));
		panic_quiesce_hw(pdp);
		(void) FTRACE_STOP();
		(void) callb_execute_class(CB_CL_PANIC, NULL);

		if (log_intrq != NULL)
			log_flushq(log_intrq);

		/*
		 * If log_consq has been initialized and syslogd has started,
		 * print any messages in log_consq that haven't been consumed.
		 */
		if (log_consq != NULL && log_consq != log_backlogq)
			log_printq(log_consq);

		fm_banner();

#if defined(__x86)
		/*
		 * A hypervisor panic originates outside of Solaris, so we
		 * don't want to prepend the panic message with misleading
		 * pointers from within Solaris.
		 */
		if (!IN_XPV_PANIC())
#endif
			printf("\n\rpanic[cpu%d]/thread=%p: ", cp->cpu_id,
			    (void *)t);
		vprintf(format, alist);
		printf("\n\n");

		if (t->t_panic_trap != NULL) {
			panic_showtrap(t->t_panic_trap);
			printf("\n");
		}

		traceregs(rp);
		printf("\n");

		if (((boothowto & RB_DEBUG) || obpdebug) &&
		    !nopanicdebug && !panic_forced) {
			if (dumpvp != NULL) {
				debug_enter("panic: entering debugger "
				    "(continue to save dump)");
			} else {
				debug_enter("panic: entering debugger "
				    "(no dump device, continue to reboot)");
			}
		}

	} else if (panic_dump != 0 || panic_sync != 0 || panicstr != NULL) {
		printf("\n\rpanic[cpu%d]/thread=%p: ", cp->cpu_id, (void *)t);
		vprintf(format, alist);
		printf("\n");
	} else
		goto spin;

	/*
	 * Prior to performing sync or dump, we make sure that do_polled_io is
	 * set, but we'll leave ipl at 10; deadman(), a CY_HIGH_LEVEL cyclic,
	 * will re-enter panic if we are not making progress with sync or dump.
	 */

	/*
	 * Sync the filesystems. Reset t_cred if not set because much of
	 * the filesystem code depends on CRED() being valid.
	 */
	if (!in_sync && panic_trigger(&panic_sync)) {
		if (t->t_cred == NULL)
			t->t_cred = kcred;
		splx(ipltospl(CLOCK_LEVEL));
		do_polled_io = 1;
		vfs_syncall();
	}

	/*
	 * Take the crash dump.
	 * If the dump trigger is already set, try to enter the debugger again
	 * before rebooting the system.
	 */
	if (panic_trigger(&panic_dump)) {
		panic_dump_hw(s);
		splx(ipltospl(CLOCK_LEVEL));
		errorq_panic();
		do_polled_io = 1;
		dumpsys();
	} else if (((boothowto & RB_DEBUG) || obpdebug) && !nopanicdebug) {
		debug_enter("panic: entering debugger (continue to reboot)");
	} else
		printf("dump aborted: please record the above information!\n");

	if (halt_on_panic)
		mdboot(A_REBOOT, AD_HALT, NULL, B_FALSE);
	else
		mdboot(A_REBOOT, panic_bootfcn, panic_bootstr, B_FALSE);
spin:
	/*
	 * Restore ipl to at most CLOCK_LEVEL so we don't end up spinning
	 * and unable to jump into the debugger.
	 */
	splx(MIN(s, ipltospl(CLOCK_LEVEL)));
	for (;;)
		;
}

void
panic(const char *format, ...)
{
	va_list alist;

	va_start(alist, format);
	vpanic(format, alist);
	va_end(alist);
}