Directory: /onnv/onnv-gate/usr/src/cmd/cmd-inet/sbin/dhcpagent
Name              Date         Size
adopt.c           13-Oct-2009  9.7K
agent.c           13-Oct-2009  41.9K
agent.h           08-Dec-2008  5.4K
async.c           08-Dec-2008  2.9K
async.h           08-Dec-2008  1.8K
bound.c           15-May-2009  33.1K
class_id.c        08-Dec-2008  4.6K
class_id.h        08-Dec-2008  1.2K
common.h          08-Dec-2008  1.8K
defaults.c        15-May-2009  7.7K
defaults.h        15-May-2009  2.1K
dhcpagent.dfl     15-May-2009  5.7K
dhcpagent.xcl     08-Dec-2008  1.4K
inform.c          08-Dec-2008  3.7K
init_reboot.c     15-May-2009  7.8K
interface.c       17-Nov-2009  46.9K
interface.h       13-Oct-2009  7.7K
ipc_action.c      08-Dec-2008  7K
ipc_action.h      08-Dec-2008  2K
Makefile          08-Dec-2008  1.9K
packet.c          07-Jan-2009  40.4K
packet.h          08-Dec-2008  4.9K
README            08-Dec-2008  24.1K
README.v6         08-Dec-2008  55.4K
release.c         08-Dec-2008  8.4K
renew.c           08-Dec-2008  15.1K
request.c         07-Jan-2009  32.9K
script_handler.c  30-Apr-2009  9.2K
script_handler.h  30-Apr-2009  2.4K
select.c          08-Dec-2008  7.5K
states.c          15-May-2009  40K
states.h          15-May-2009  10.9K
util.c            08-Dec-2008  16.6K
util.h            08-Dec-2008  2.4K

README

      1 CDDL HEADER START
      2 
      3 The contents of this file are subject to the terms of the
      4 Common Development and Distribution License (the "License").
      5 You may not use this file except in compliance with the License.
      6 
      7 You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
      8 or http://www.opensolaris.org/os/licensing.
      9 See the License for the specific language governing permissions
     10 and limitations under the License.
     11 
     12 When distributing Covered Code, include this CDDL HEADER in each
     13 file and include the License file at usr/src/OPENSOLARIS.LICENSE.
     14 If applicable, add the following below this CDDL HEADER, with the
     15 fields enclosed by brackets "[]" replaced with your own identifying
     16 information: Portions Copyright [yyyy] [name of copyright owner]
     17 
     18 CDDL HEADER END
     19 
     20 Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
     21 Use is subject to license terms.
     22 
     23 Architectural Overview for the DHCP agent
     24 Peter Memishian
     25 ident	"%Z%%M%	%I%	%E% SMI"
     26 
     27 INTRODUCTION
     28 ============
     29 
     30 The Solaris DHCP agent (dhcpagent) is a DHCP client implementation
     31 compliant with RFCs 2131, 3315, and others.  The major forces shaping
     32 its design were:
     33 
     34 	* Must be capable of managing multiple network interfaces.
     35 	* Must consume little CPU, since it will always be running.
     36 	* Must have a small memory footprint, since it will always be
     37 	  running.
     38 	* Must not rely on any shared libraries outside of /lib, since
     39 	  it must run before all filesystems have been mounted.
     40 
     41 When a DHCP agent implementation is only required to control a single
     42 interface on a machine, the problem is expressed well as a simple
     43 state-machine, as shown in RFC2131.  However, when a DHCP agent is
     44 responsible for managing more than one interface at a time, the
     45 problem becomes much more complicated.
     46 
     47 This can be resolved using threads or with an event-driven model.
     48 Given that DHCP's behavior can be expressed concisely as a state
     49 machine, the event-driven model is the closest match.
     50 
     51 While tried-and-true, that model is subtle and easy to get wrong.
     52 Indeed, much of the agent's code is there to manage the complexity of
     53 programming in an asynchronous event-driven paradigm.
     54 
     55 THE BASICS
     56 ==========
     57 
     58 The DHCP agent consists of roughly 30 source files, most with a
     59 companion header file.  While the largest source file is around 1700
     60 lines, most are much shorter.  The source files can largely be broken
     61 up into three groups:
     62 
     63 	* Source files that, along with their companion header files,
     64 	  define an abstract "object" that is used by other parts of
     65 	  the system.  Examples include "packet.c", which along with
     66 	  "packet.h" provides a Packet object for use by the rest of
     67 	  the agent; and "async.c", which along with "async.h" defines
     68 	  an interface for managing asynchronous transactions within
     69 	  the agent.
     70 
     71 	* Source files that implement a given state of the agent; for
     72 	  instance, there is a "request.c" which comprises all of
     73 	  the procedural "work" which must be done while in the
     74 	  REQUESTING state of the agent.  By encapsulating states in
     75 	  files, it becomes easier to debug errors in the
     76 	  client/server protocol and adapt the agent to new
     77 	  constraints, since all the relevant code is in one place.
     78 
     79 	* Source files that, along with their companion header files,
     80 	  encapsulate a given task or related set of tasks.  The
     81 	  difference between this and the first group is that the
     82 	  interfaces exported from these files do not operate on
     83 	  an "object", but rather perform a specific task.  Examples
     84 	  include "defaults.c", which provides a useful interface
     85 	  to /etc/default/dhcpagent file operations.
     86 
     87 OVERVIEW
     88 ========
     89 
     90 Here we discuss the essential objects and subtle aspects of the
     91 DHCP agent implementation.  Note that there is of course much more
     92 that is not discussed here, but after this overview you should be able 
     93 to fend for yourself in the source code.
     94 
     95 For details on the DHCPv6 aspects of the design, and how this relates
     96 to the implementation present in previous releases of Solaris, see the
     97 README.v6 file.
     98 
     99 Event Handlers and Timer Queues
    100 -------------------------------
    101 
    102 The most important object in the agent is the event handler, whose
    103 interface is in libinetutil.h and whose implementation is in
    104 libinetutil.  The event handler is essentially an object-oriented
    105 wrapper around poll(2): other components of the agent can register to
    106 be called back when specific events on file descriptors happen -- for
    107 instance, to wait for requests to arrive on its IPC socket, the agent
    108 registers a callback function (accept_event()) that will be called
    109 back whenever a new connection arrives on the file descriptor
    110 associated with the IPC socket.  When the agent initially begins in
    111 main(), it registers a number of events with the event handler, and
    112 then calls iu_handle_events(), which proceeds to wait for events to
    113 happen -- this function does not return until the agent is shut down
    114 via signal.
    115 
    116 When the registered events occur, the callback functions are called
    117 back, which in turn might lead to additional callbacks being
    118 registered -- this is the classic event-driven model.  (As an aside,
    119 note that programming in an event-driven model means that callbacks
    120 cannot block, or else the agent will become unresponsive.)
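
The register-then-dispatch pattern described above can be sketched
directly on top of poll(2).  This is a minimal illustration, not the
libinetutil implementation: the names eh_t, eh_register(),
eh_poll_once() and count_bytes_cb() are hypothetical, the table has a
fixed capacity, and unregistering is left out.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define EH_MAX_EVENTS 16    /* hypothetical fixed capacity */

/* Callback signature: descriptor, returned poll events, caller context. */
typedef void eh_callback_t(int fd, short revents, void *arg);

typedef struct {
    struct pollfd fds[EH_MAX_EVENTS];
    eh_callback_t *cbs[EH_MAX_EVENTS];
    void *args[EH_MAX_EVENTS];
    nfds_t count;
} eh_t;

/* Register a callback for `events' on `fd'; returns a slot id or -1. */
int
eh_register(eh_t *eh, int fd, short events, eh_callback_t *cb, void *arg)
{
    assert(eh != NULL && cb != NULL);
    if (eh->count == EH_MAX_EVENTS)
        return (-1);
    eh->fds[eh->count].fd = fd;
    eh->fds[eh->count].events = events;
    eh->fds[eh->count].revents = 0;
    eh->cbs[eh->count] = cb;
    eh->args[eh->count] = arg;
    return ((int)eh->count++);
}

/*
 * One pass of the event loop: wait up to timeout_ms for activity, then
 * call back every ready descriptor.  Returns the number of callbacks
 * run, 0 on timeout, or -1 if poll(2) failed.
 */
int
eh_poll_once(eh_t *eh, int timeout_ms)
{
    int ready = poll(eh->fds, eh->count, timeout_ms);
    int ran = 0;

    if (ready <= 0)
        return (ready);
    for (nfds_t i = 0; i < eh->count && ran < ready; i++) {
        if (eh->fds[i].revents != 0) {
            eh->cbs[i](eh->fds[i].fd, eh->fds[i].revents, eh->args[i]);
            ran++;
        }
    }
    return (ran);
}

/* Example callback: read what is available and add it to a size_t. */
void
count_bytes_cb(int fd, short revents, void *arg)
{
    char buf[64];
    ssize_t n;

    if ((revents & POLLIN) && (n = read(fd, buf, sizeof (buf))) > 0)
        *(size_t *)arg += (size_t)n;
}
```

A real loop would call eh_poll_once() repeatedly until told to stop.
Note that count_bytes_cb() performs a single read(2) on a descriptor
that poll(2) reported as readable, so it returns promptly, in keeping
with the rule that callbacks must not block.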
    121 
    122 A special kind of "event" is a timeout.  Since there are many timers
    123 which must be maintained for each DHCP-controlled interface (such as a
    124 lease expiration timer, time-to-first-renewal (t1) timer, and so
    125 forth), an object-oriented abstraction to timers called a "timer
    126 queue" is provided, whose interface is in libinetutil.h with a
    127 corresponding implementation in libinetutil.  The timer queue allows
    128 callback functions to be "scheduled" for callback after a certain
    129 amount of time has passed.
    130 
    131 The event handler and timer queue objects work hand-in-hand: the event
    132 handler is passed a pointer to a timer queue in iu_handle_events() --
    133 from there, it can use the iu_earliest_timer() routine to find the
    134 timer which will next fire, and use this to set its timeout value in
    135 its call to poll(2).  If poll(2) returns due to a timeout, the event
    136 handler calls iu_expire_timers() to run the callbacks of every timer
    137 that has come due (note that more than one may be due if, for example,
    138 multiple timers were set to expire at the same time).
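
The cooperation between the two objects comes down to two operations:
turning the earliest pending timer into a poll(2) timeout, and firing
everything that has come due.  The sketch below shows both under
simplifying assumptions: the names (tq_t, tq_schedule(),
tq_poll_timeout(), tq_expire(), bump_cb()) are hypothetical, times are
plain milliseconds supplied by the caller, and the queue is a small
fixed array rather than whatever libinetutil uses internally.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define TQ_MAX_TIMERS 32    /* hypothetical fixed capacity */

typedef void tq_callback_t(void *arg);

typedef struct {
    int64_t when_ms;        /* absolute time at which to fire */
    tq_callback_t *cb;
    void *arg;
    int active;
} tq_timer_t;

typedef struct {
    tq_timer_t timers[TQ_MAX_TIMERS];
} tq_t;

void
tq_init(tq_t *tq)
{
    for (int i = 0; i < TQ_MAX_TIMERS; i++)
        tq->timers[i].active = 0;
}

/* Schedule cb(arg) for absolute time when_ms; returns a timer id or -1. */
int
tq_schedule(tq_t *tq, int64_t when_ms, tq_callback_t *cb, void *arg)
{
    assert(tq != NULL && cb != NULL);
    for (int i = 0; i < TQ_MAX_TIMERS; i++) {
        if (!tq->timers[i].active) {
            tq->timers[i] = (tq_timer_t){ when_ms, cb, arg, 1 };
            return (i);
        }
    }
    return (-1);
}

/*
 * Timeout to hand to poll(2): -1 if no timer is pending (wait
 * forever), 0 if one is already due, else milliseconds until the
 * earliest one.
 */
int
tq_poll_timeout(const tq_t *tq, int64_t now_ms)
{
    int64_t best = -1;

    for (int i = 0; i < TQ_MAX_TIMERS; i++) {
        const tq_timer_t *t = &tq->timers[i];
        int64_t delta;

        if (!t->active)
            continue;
        delta = t->when_ms > now_ms ? t->when_ms - now_ms : 0;
        if (best == -1 || delta < best)
            best = delta;
    }
    return (best > INT_MAX ? INT_MAX : (int)best);
}

/* Fire every timer due at now_ms; returns how many callbacks ran. */
int
tq_expire(tq_t *tq, int64_t now_ms)
{
    int fired = 0;

    for (int i = 0; i < TQ_MAX_TIMERS; i++) {
        tq_timer_t *t = &tq->timers[i];

        if (t->active && t->when_ms <= now_ms) {
            tq_callback_t *cb = t->cb;
            void *arg = t->arg;

            t->active = 0;  /* free the slot before calling back */
            cb(arg);
            fired++;
        }
    }
    return (fired);
}

/* Example callback: count how many times it was called. */
void
bump_cb(void *arg)
{
    (*(int *)arg)++;
}
```

The slot is released before the callback runs so that a callback may
schedule a new timer, which is the usual way a periodic timer is
built on a one-shot queue.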
    139 
    140 Although it is possible to instantiate more than one timer queue or
    141 event handler object, it doesn't make a lot of sense -- these objects
    142 are really "singletons".  Accordingly, the agent has two global
    143 variables, `eh' and `tq', which store pointers to the global event
    144 handler and timer queue.
    145 
    146 Network Interfaces
    147 ------------------
    148 
    149 For each network interface managed by the agent, there is a set of
    150 associated state that describes both its general properties (such as
    151 the maximum MTU) and its connections to DHCP-related state (the
    152 protocol state machines).  This state is stored in a pair of
    153 structures called `dhcp_pif_t' (the IP physical interface layer or
    154 PIF) and `dhcp_lif_t' (the IP logical interface layer or LIF).  Each
    155 dhcp_pif_t represents a single physical interface, such as "hme0," for
    156 a given IP protocol version (4 or 6), and has a list of dhcp_lif_t
    157 structures representing the logical interfaces (such as "hme0:1") in
    158 use by the agent.
    159 
    160 This split is important because of differences between IPv4 and IPv6.
    161 For IPv4, each DHCP state machine manages a single IP address and
    162 associated configuration data.  This corresponds to a single logical
    163 interface, which must be specified by the user.  For IPv6, however,
    164 each DHCP state machine manages a group of addresses, and is
    165 associated with a DUID value rather than with just an interface.
    166 
    167 Thus, DHCPv6 behaves more like in.ndpd in its creation of "ADDRCONF"
    168 interfaces.  The agent automatically plumbs logical interfaces when
    169 needed and removes them when the addresses expire.
    170 
    171 The state for a given session is stored separately in `dhcp_smach_t'.
    172 This state machine then points to the main LIF used for I/O, and to a
    173 list of `dhcp_lease_t' structures representing individual leases, and
    174 each of those points to a list of LIFs corresponding to the individual
    175 addresses being managed.
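
The relationships just described can be pictured with a handful of
linked structures.  The sketch below is deliberately simplified and
uses hypothetical sk_* names and fields; the real dhcp_pif_t,
dhcp_lif_t, dhcp_lease_t and dhcp_smach_t definitions carry far more
state.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sk_pif {
    const char *name;       /* e.g. "hme0" */
    int is_v6;              /* one PIF per IP protocol version */
} sk_pif_t;

typedef struct sk_lif {
    const char *name;       /* e.g. "hme0:1" */
    sk_pif_t *pif;          /* owning physical interface */
    struct sk_lif *next;    /* next LIF on the same lease */
} sk_lif_t;

typedef struct sk_lease {
    sk_lif_t *lifs;         /* one LIF per leased address */
    struct sk_lease *next;  /* next lease on the same state machine */
} sk_lease_t;

typedef struct sk_smach {
    sk_lif_t *io_lif;       /* main LIF used for I/O */
    sk_lease_t *leases;
} sk_smach_t;

/* Walk state machine -> leases -> LIFs and count managed addresses. */
size_t
sk_count_addresses(const sk_smach_t *sm)
{
    size_t n = 0;

    assert(sm != NULL);
    for (const sk_lease_t *le = sm->leases; le != NULL; le = le->next) {
        for (const sk_lif_t *lif = le->lifs; lif != NULL; lif = lif->next)
            n++;
    }
    return (n);
}
```

For DHCPv4 the walk would visit a single lease with a single LIF; for
DHCPv6 a state machine may hold several leases, each with several
addresses.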
    176 
    177 One point that was brushed over in the preceding discussion of event
    178 handlers and timer queues was context.  Recall that the event-driven
    179 nature of the agent requires that functions cannot block, lest they
    180 starve out others and impact the observed responsiveness of the agent.
    181 As an example, consider the process of extending a lease: the agent
    182 must send a REQUEST packet and wait for an ACK or NAK packet in
    183 response.  This is done by sending a REQUEST and then returning to the
    184 event handler that waits for an ACK or NAK packet to arrive on the
    185 file descriptor associated with the interface.  Note, however, that
    186 when the ACK or NAK does arrive and the callback function is called
    187 back, it must know which state machine this packet is for (it must get
    188 back its context).  This could be handled through an ad-hoc mapping of
    189 file descriptors to state machines, but a cleaner approach is to have
    190 the event handler's register function (iu_register_event()) take in an
    191 opaque context pointer, which will then be passed back to the
    192 callback.  In the agent, the context pointer used depends on the
    193 nature of the event: events on LIFs use the dhcp_lif_t pointer, events
    194 on the state machine use dhcp_smach_t, and so on.
    195 
    196 Note that there is nothing that guarantees the pointer passed into
    197 iu_register_event() or iu_schedule_timer() will still be valid when
    198 the callback is called back (for instance, the memory may have been
    199 freed in the meantime).  To solve this problem, all of the data
    200 structures used in this way are reference counted.  For more details
    201 on how the reference count scheme is implemented, see the closing
    202 comments in interface.h regarding memory management.
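
As a rough illustration of the idea (not the scheme documented in
interface.h), a reference-counted context object might look like the
following.  The names ctx_t, ctx_new(), ctx_hold() and ctx_release()
are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ctx {
    unsigned hold_count;    /* number of outstanding references */
    int value;              /* stand-in for the real payload */
} ctx_t;

/* Allocate a context; the caller owns the initial reference. */
ctx_t *
ctx_new(int value)
{
    ctx_t *c = malloc(sizeof (*c));

    if (c != NULL) {
        c->hold_count = 1;
        c->value = value;
    }
    return (c);
}

/* Take a reference, e.g. just before registering a callback. */
void
ctx_hold(ctx_t *c)
{
    assert(c != NULL);
    c->hold_count++;
}

/*
 * Drop a reference.  Returns 1 if the object is still alive, or 0 if
 * this was the last reference and the object has been freed, in which
 * case the caller must not touch it again.
 */
int
ctx_release(ctx_t *c)
{
    assert(c != NULL && c->hold_count > 0);
    if (--c->hold_count > 0)
        return (1);
    free(c);
    return (0);
}
```

The usual discipline is to call ctx_hold() when the pointer is handed
to a register or schedule routine, and ctx_release() either in the
callback or when the event or timer is cancelled, so the memory stays
valid for as long as a callback could still receive it.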
    203 
    204 Transactions
    205 ------------
    206 
    207 Many operations performed via DHCP must be performed in groups -- for
    208 instance, acquiring a lease requires several steps: sending a
    209 DISCOVER, collecting OFFERs, selecting an OFFER, sending a REQUEST,
    210 and receiving an ACK, assuming everything goes well.  Note however
    211 that due to the event-driven model the agent operates in, these
    212 operations are not inherently "grouped" -- instead, the agent sends a
    213 DISCOVER, goes back into the main event loop, waits for events
    214 (perhaps even requests on the IPC channel to begin acquiring a lease
    215 on another state machine), eventually checks to see if an acceptable
    216 OFFER has come in, and so forth.  To some degree, the notion of the
    217 state machine's current state (SELECTING, REQUESTING, etc.) helps
    218 control the potential chaos of the event-driven model.  For instance,
    219 if, while the agent is waiting for an OFFER on a given state machine,
    220 an IPC event comes in requesting that the leases be released, the
    221 agent knows to send back an error, since the state machine must be in
    222 at least the BOUND state before a RELEASE can be performed.
    223 
    224 However, states are not enough -- for instance, suppose that the agent
    225 begins trying to renew a lease.  This is done by sending a REQUEST
    226 packet and waiting for an ACK or NAK, which might never come.  If,
    227 while waiting for the ACK or NAK, the user sends a request to renew
    228 the lease as well, then if the agent were to send another REQUEST,
    229 things could get quite complicated (and this is only the beginning of
    230 this rathole).  To protect against this, two objects exist:
    231 `async_action' and `ipc_action'.  These objects are related, but
    232 independent of one another; the more essential object is the
    233 `async_action', which we will discuss first.
    234 
    235 In short, an `async_action' represents a pending transaction (aka
    236 asynchronous action), of which each state machine can have at most
    237 one.  The `async_action' structure is embedded in the `dhcp_smach_t'
    238 structure, which is fine since there can be at most one pending
    239 transaction per state machine.  Typical "asynchronous transactions"
    240 are START, EXTEND, and INFORM, since each consists of a sequence of
    241 packets that must be done without interruption.  Note that not all
    242 DHCP operations are "asynchronous" -- for instance, a DHCPv4 RELEASE
    243 operation is synchronous (not asynchronous) since after the RELEASE is
    244 sent no reply is expected from the DHCP server, but DHCPv6 Release is
    245 asynchronous, as all DHCPv6 messages are transactional.  Some
    246 operations, such as status query, are synchronous and do not affect
    247 the system state, and thus do not require sequencing.
    248 
    249 When the agent realizes it must perform an asynchronous transaction,
    250 it calls async_start() to open the transaction.  If one is already
    251 pending, then the new transaction must fail (the details of failure
    252 depend on how the transaction was initiated, which is described in
    253 more detail later when the `ipc_action' object is discussed).  If
    254 there is no pending asynchronous transaction, the operation succeeds.
    255 
    256 When the transaction is complete, either async_finish() or
    257 async_cancel() must be called to complete or cancel the asynchronous
    258 action on that state machine.  If the transaction is unable to
    259 complete within a certain amount of time (more on this later), a timer
    260 should be used to cancel the operation.
    261 
    262 The notion of asynchronous transactions is complicated by the fact
    263 that they may originate from both inside and outside of the agent.
    264 For instance, a user initiates an asynchronous START transaction when
    265 he performs an `ifconfig hme0 dhcp start', but the agent will
    266 internally need to perform asynchronous EXTEND transactions to extend
    267 the lease before it expires.  Note that user-initiated actions always
    268 have priority over internal actions: the former will cancel the
    269 latter, if necessary.
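
The "at most one pending transaction" rule and the priority of user
requests can be captured in a few lines.  This is a sketch of the
policy as described here, with hypothetical txn_* names; it assumes
that a user request arriving while another user request is pending
simply fails, and it omits the timers and retransmission state that a
real cancel would have to tear down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TXN_NONE, TXN_START, TXN_EXTEND, TXN_INFORM } txn_cmd_t;

typedef struct {
    txn_cmd_t cmd;          /* TXN_NONE when nothing is pending */
    bool user;              /* true if started by a user request */
} txn_t;

/* Abandon whatever is pending. */
void
txn_cancel(txn_t *t)
{
    assert(t != NULL);
    t->cmd = TXN_NONE;
    t->user = false;
}

/* Mark the pending transaction as completed. */
void
txn_finish(txn_t *t)
{
    assert(t != NULL && t->cmd != TXN_NONE);
    t->cmd = TXN_NONE;
    t->user = false;
}

/*
 * Try to open a transaction.  Fails if one is already pending, except
 * that a user request cancels and replaces a pending internal one.
 */
bool
txn_start(txn_t *t, txn_cmd_t cmd, bool user)
{
    assert(t != NULL && cmd != TXN_NONE);
    if (t->cmd != TXN_NONE) {
        if (t->user || !user)
            return (false);
        txn_cancel(t);      /* user preempts internal */
    }
    t->cmd = cmd;
    t->user = user;
    return (true);
}
```

Because the structure holds only one pending command, it can be
embedded directly in the per-state-machine structure, as the text
notes.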
    270 
    271 This leads us into the `ipc_action' object.  An `ipc_action'
    272 represents the IPC-related pieces of an asynchronous transaction that
    273 was started as a result of a user request, as well as the `BUSY' state
    274 of the administrative interface.  Only IPC-generated asynchronous
    275 transactions have a valid `ipc_action' object.  Note that since there
    276 can be at most one asynchronous action per state machine, there can
    277 also be at most one `ipc_action' per state machine (this means it can
    278 also conveniently be embedded inside the `dhcp_smach_t' structure).
    279 
    280 One of the main purposes of the `ipc_action' object is to time out user
    281 events.  When the user specifies a timeout value as an argument to
    282 ifconfig, he is specifying an `ipc_action' timeout; in other words,
    283 how long he is willing to wait for the command to complete.  When this
    284 time expires, the ipc_action is terminated, as well as the
    285 asynchronous operation.
    286 
    287 The API provided for the `ipc_action' object is quite similar to the
    288 one for the `async_action' object: when an IPC request comes in for an 
    289 operation requiring asynchronous operation, ipc_action_start() is
    290 called.  When the request completes, ipc_action_finish() is called.
    291 If the user times out before the request completes, then
    292 ipc_action_timeout() is called.
    293 
    294 Packet Management
    295 -----------------
    296 
    297 Another complicated area is packet management: building, manipulating,
    298 sending and receiving packets.  These operations are all encapsulated
    299 behind a dozen or so interfaces (see packet.h) that abstract the
    300 unimportant details away from the rest of the agent code.  In order to
    301 send a DHCP packet, code first calls init_pkt(), which returns a
    302 dhcp_pkt_t initialized suitably for transmission.  Note that currently
    303 init_pkt() returns a dhcp_pkt_t that is actually allocated as part of
    304 the `dhcp_smach_t', but this may change in the future.  After calling
    305 init_pkt(), the add_pkt_opt*() functions are used to add options to
    306 the DHCP packet.  Finally, send_pkt() and send_pkt_v6() can be used to
    307 transmit the packet to a given IP address.
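
To give a feel for what adding an option involves, the sketch below
appends DHCPv4 options in the code/length/value layout defined by
RFC 2132.  It is not the agent's add_pkt_opt*() code: optbuf_t,
opt_add() and opt_add16() are hypothetical names, and the buffer size
is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_AREA_SIZE 312   /* illustrative size for the options area */

typedef struct {
    uint8_t buf[OPT_AREA_SIZE];
    size_t len;             /* bytes used so far */
} optbuf_t;

/* Append one option: a code byte, a length byte, then the data. */
bool
opt_add(optbuf_t *ob, uint8_t code, const void *data, uint8_t dlen)
{
    assert(ob != NULL && (data != NULL || dlen == 0));
    if (ob->len + 2 + dlen > sizeof (ob->buf))
        return (false);     /* no room left */
    ob->buf[ob->len++] = code;
    ob->buf[ob->len++] = dlen;
    if (dlen > 0)
        memcpy(&ob->buf[ob->len], data, dlen);
    ob->len += dlen;
    return (true);
}

/* Append a 16-bit value in network byte order. */
bool
opt_add16(optbuf_t *ob, uint8_t code, uint16_t value)
{
    uint8_t be[2] = { (uint8_t)(value >> 8), (uint8_t)(value & 0xff) };

    return (opt_add(ob, code, be, sizeof (be)));
}
```

For example, adding option 53 (DHCP Message Type) with value 1
(DISCOVER) followed by option 57 (Maximum DHCP Message Size) with
value 1500 yields the seven bytes 53, 1, 1, 57, 2, 0x05, 0xdc.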
    308 
    309 The send_pkt() function handles the details of packet timeout and
    310 retransmission.  The last argument to send_pkt() is a pointer to a
    311 "stop function."  If this argument is passed as NULL, then the packet
    312 will only be sent once (it won't be retransmitted).  Otherwise, the
    313 stop function will be called back before each retransmission.  The
    314 callback may alter dsm_send_timeout if necessary to place a cap on
    315 the next timeout; this is done for DHCPv6 in stop_init_reboot() in
    316 order to implement the CNF_MAX_RD constraint.
    317 
    318 The return value from this function indicates whether to continue
    319 retransmission or not, which allows the send_pkt() caller to control
    320 the retransmission policy without making it have to deal with the
    321 retransmission mechanism.  See request.c for an example of this in
    322 action.
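
The split between retransmission mechanism and retransmission policy
can be sketched as follows.  Everything here is hypothetical
(rexmit_t, rexmit_timer_fired(), stop_after_four()); in particular,
doubling the timeout on each attempt is an assumption made for the
example, not a statement about what send_pkt() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    unsigned sent;          /* packets transmitted so far */
    unsigned timeout_ms;    /* wait before the next retransmission */
} rexmit_t;

/* Policy hook: return true to stop; may lower rx->timeout_ms. */
typedef bool stop_func_t(rexmit_t *rx);

/*
 * Mechanism: called when the retransmission timer fires.  Returns
 * true if the caller should send the packet again and re-arm the
 * timer for rx->timeout_ms.
 */
bool
rexmit_timer_fired(rexmit_t *rx, stop_func_t *stop)
{
    assert(rx != NULL);
    if (stop == NULL)
        return (false);     /* NULL stop function: send only once */
    rx->timeout_ms *= 2;    /* assumed backoff, see the lead-in */
    if (stop(rx))
        return (false);
    rx->sent++;
    return (true);
}

/* Example policy: at most four transmissions, never wait over 8 s. */
bool
stop_after_four(rexmit_t *rx)
{
    if (rx->sent >= 4)
        return (true);
    if (rx->timeout_ms > 8000)
        rx->timeout_ms = 8000;
    return (false);
}
```

Starting from one transmission and a 3000 ms timeout, this policy
allows three retransmissions with waits of 6000, 8000 and 8000 ms,
and then stops.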
    323 
    324 The recv_pkt() function is simpler but still complicated by the fact
    325 that one may want to receive several different types of packets at
    326 once.  The caller registers an event handler on the file descriptor,
    327 and then calls recv_pkt() to read in the packet along with meta
    328 information about the message (the sender and interface identifier).
    329 				
    330 For IPv6, packet reception is done with a single socket, using
    331 IPV6_PKTINFO to determine the actual destination address and receiving
    332 interface.  Packets are then matched against the state machines on the
    333 given interface through the transaction ID.
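
Matching by transaction ID relies on the DHCPv6 message layout from
RFC 3315: a one-byte message type followed by a three-byte
transaction ID.  The sketch below uses hypothetical names (v6_smach_t,
v6_xid(), v6_match()) and a plain array in place of the agent's
per-interface list of state machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t xid;           /* 24-bit ID of the pending transaction */
    const char *name;       /* label for the example only */
} v6_smach_t;

/* Extract the 24-bit transaction ID, or -1 if the message is short. */
int32_t
v6_xid(const uint8_t *msg, size_t len)
{
    assert(msg != NULL);
    if (len < 4)
        return (-1);
    return (((int32_t)msg[1] << 16) | ((int32_t)msg[2] << 8) | msg[3]);
}

/* Find the state machine waiting on this message's transaction ID. */
v6_smach_t *
v6_match(v6_smach_t *sms, size_t n, const uint8_t *msg, size_t len)
{
    int32_t xid = v6_xid(msg, len);

    if (xid < 0)
        return (NULL);
    for (size_t i = 0; i < n; i++) {
        if (sms[i].xid == (uint32_t)xid)
            return (&sms[i]);
    }
    return (NULL);
}
```

A message that matches no state machine returns NULL and would
simply be dropped by the caller.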
    334 
    335 For IPv4, due to oddities in the DHCP specification (discussed in
    336 PSARC/2007/571), a special IP_DHCPINIT_IF socket option must be used
    337 to allow unicast DHCP traffic to be received on an interface during
    338 lease acquisition.  Since the IP_DHCPINIT_IF socket option can only
    339 enable one interface at a time, one socket must be used per interface.
    340 
    341 Time
    342 ----
    343 
    344 The notion of time is an exceptionally subtle area.  You will notice
    345 five ways that time is represented in the source: as lease_t's,
    346 uint32_t's, time_t's, hrtime_t's, and monosec_t's.  Each of these
    347 types serves a slightly different function.
    348 
    349 The `lease_t' type is the simplest to understand; it is the unit of
    350 time in the CD_{LEASE,T1,T2}_TIME options in a DHCP packet, as defined
    351 by RFC2131. This is defined as a positive number of seconds (relative
    352 to some fixed point in time) or the value `-1' (DHCP_PERM) which
    353 represents infinity (i.e., a permanent lease).  The lease_t should be
    354 used either when dealing with actual DHCP packets that are sent on the
    355 wire or for variables which follow the exact definition given in the
    356 RFC.
    357 
    358 The `uint32_t' type is also used to represent a relative time in
    359 seconds.  However, here the value `-1' is not special and of course
    360 this type is not tied to any definition given in RFC2131.  Use this
    361 for representing "offsets" from another point in time that are not
    362 DHCP lease times.
    363 
    364 The `time_t' type is the natural Unix type for representing time since
    365 the epoch.  Unfortunately, it is affected by stime(2) or adjtime(2)
    366 and since the DHCP client is used during system installation (and thus
    367 when time is typically being configured), the time_t cannot be used in
    368 general to represent an absolute time since the epoch.  For instance,
    369 if a time_t were used to keep track of when a lease began, and then a
    370 minute later stime(2) was called to adjust the system clock forward a
    371 year, then the lease would appear to have expired a year ago even
    372 though it has only been a minute.  For this reason, time_t's should
    373 only be used either when wall time must be displayed (such as in a
    374 DHCP_STATUS IPC transaction) or when a time meaningful across reboots
    375 must be obtained (such as when caching an ACK packet at system
    376 shutdown).
    377 
    378 The `hrtime_t' type returned from gethrtime() works around the
    379 limitations of the time_t in that it is not affected by stime(2) or
    380 adjtime(2), with the disadvantage that it represents time from some
    381 arbitrary time in the past and in nanoseconds.  The timer queue code
    382 deals with hrtime_t's directly since that particular piece of code is
    383 meant to be fairly independent of the rest of the DHCP client.
    384 
    385 However, dealing with nanoseconds is error-prone when all the other
    386 time types are in seconds.  As a result, yet another time type, the
    387 `monosec_t' was created to represent a monotonically increasing time
    388 in seconds, and is really no more than (hrtime_t / NANOSEC).  Note
    389 that this unit is typically used where time_t's would've traditionally
    390 been used.  The function monosec() in util.c returns the current
    391 monosec, and monosec_to_time() can convert a given monosec to wall
    392 time, using the system's current notion of time.
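
The relationship between the three clocks can be sketched with
portable calls.  This is an illustration only: it uses
clock_gettime(CLOCK_MONOTONIC) as a stand-in for gethrtime(), and the
names ns_now(), ns_to_mono(), mono_now() and mono_to_wall() are
hypothetical rather than the agent's own monosec() and
monosec_to_time().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NANOSEC_PER_SEC 1000000000LL

typedef int64_t ns_time_t;      /* stand-in for hrtime_t */
typedef int64_t mono_sec_t;     /* stand-in for monosec_t */

/* Monotonic nanoseconds since some arbitrary point in the past. */
ns_time_t
ns_now(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void) rc;
    return ((ns_time_t)ts.tv_sec * NANOSEC_PER_SEC + ts.tv_nsec);
}

/* Reduce nanoseconds to whole seconds. */
mono_sec_t
ns_to_mono(ns_time_t ns)
{
    return (ns / NANOSEC_PER_SEC);
}

/* Current monotonic time in seconds. */
mono_sec_t
mono_now(void)
{
    return (ns_to_mono(ns_now()));
}

/*
 * Convert a monotonic timestamp to wall time by measuring how far it
 * lies from "now" and applying that offset to the current wall clock.
 */
time_t
mono_to_wall(mono_sec_t abs)
{
    return (time(NULL) - (time_t)(mono_now() - abs));
}
```

Because the conversion is anchored to the wall clock at the moment of
the call, the result shifts if the system time is later changed,
which is why such values are suitable for display but not for lease
bookkeeping.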
    393 
    394 One additional limitation of the `hrtime_t' and `monosec_t' types is
    395 that they are unaware of the passage of time across checkpoint/resume
    396 events (e.g., those generated by sys-suspend(1M)).  For example, if
    397 gethrtime() returns time T, and then the machine is suspended for 2
    398 hours, and then gethrtime() is called again, the time returned is not
    399 T + (2 * 60 * 60 * NANOSEC), but rather approximately still T.
    400 
    401 To work around this (and other checkpoint/resume related problems),
    402 when a system is resumed, the DHCP client makes the pessimistic
    403 assumption that all finite leases have expired while the machine was
    404 suspended and must be obtained again.  This is known as "refreshing"
    405 the leases, and is handled by refresh_smachs().
    406 
    407 It might appear that a more intelligent approach would be to
    408 record the time(2) when the system is suspended, compare that against
    409 the time(2) when the system is resumed, and use the delta between them
    410 to decide which leases have expired.  Sadly, this cannot be done since
    411 through at least Solaris 10, it is not possible for userland programs
    412 to be notified of system suspend events.
    413 
    414 Configuration
    415 -------------
    416 
    417 For the most part, the DHCP client only *retrieves* configuration data
    418 from the DHCP server, leaving the configuration to scripts (such as
    419 boot scripts), which themselves use dhcpinfo(1) to retrieve the data
    420 from the DHCP client.  This is desirable because it keeps the mechanism
    421 of retrieving the configuration data decoupled from the policy of using
    422 the data.
    423 
    424 However, unless used in "inform" mode, the DHCP client *does*
    425 configure each IP interface enough to allow it to communicate with
    426 other hosts.  Specifically, the DHCP client configures the interface's
    427 IP address, netmask, and broadcast address using the information
    428 provided by the server.  Further, for IPv4 logical interface 0
    429 ("hme0"), any provided default routes are also configured.
    430 
    431 For IPv6, only the IP addresses are set.  The netmask (prefix) is then
    432 set automatically by in.ndpd, and routes are discovered in the usual
    433 way by router discovery or routing protocols.  DHCPv6 doesn't set
    434 routes.
    435 
    436 Since logical interfaces cannot be specified as output interfaces in
    437 the kernel forwarding table, and in most cases, logical interfaces
    438 share a default route with their associated physical interface, the
    439 DHCP client does not automatically add or remove default routes when
    440 IPv4 leases are acquired or expired on logical interfaces.
    441 
    442 Event Scripting
    443 ---------------
    444 
    445 The DHCP client supports user program invocations on DHCP events.  The
    446 supported events are BOUND, EXTEND, EXPIRE, DROP, RELEASE, and INFORM
    447 for DHCPv4, and BOUND6, EXTEND6, EXPIRE6, DROP6, LOSS6, RELEASE6, and
    448 INFORM6 for DHCPv6.  The user program runs asynchronously to the DHCP
    449 client so that the main event loop stays active to process other
    450 events, including events triggered by the user program (for example,
    451 when it invokes dhcpinfo).
    452 
    453 The user program execution is part of the transaction of a DHCP command.
    454 For example, if the user program is not enabled, the transaction of the
    455 DHCP command START is considered over when an ACK is received and the
    456 interface is configured successfully.  If the user program is enabled,
    457 it is invoked after the interface is configured successfully, and the
    458 transaction is considered over only when the user program exits.  The
    459 event scripting implementation makes use of the asynchronous operations
    460 discussed in the "Transactions" section.
    461 
    462 An upper bound of 58 seconds is imposed on how long the user program
    463 can run. If the user program does not exit after 55 seconds, the signal
    464 SIGTERM is sent to it. If it still does not exit after an additional 3
    465 seconds, the signal SIGKILL is sent to it.  Since the event handler is
    466 a wrapper around poll(), the DHCP client cannot directly observe the
    467 completion of the user program.  Instead, the DHCP client creates a
    468 child "helper" process to synchronously monitor the user program (this
    469 process is also used to send the aforementioned signals to the process,
    470 if necessary).  The DHCP client and the helper process share a pipe
    471 which is included in the set of poll descriptors monitored by the DHCP
    472 client's event handler.  When the user program exits, the helper process
    473 passes the user program exit status to the DHCP client through the pipe,
    474 informing the DHCP client that the user program has finished.  When the
    475 DHCP client is asked to shut down, it will wait for any running instances
    476 of the user program to complete.
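
The helper's job of watching the user program and escalating from
SIGTERM to SIGKILL can be sketched as below.  This is a simplified
illustration, not the code in script_handler.c: run_with_deadline()
and mono_ms() are hypothetical names, the child is polled with
waitpid(2) every 100 ms, and the pipe used to report the exit status
back to the agent is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in milliseconds. */
static int64_t
mono_ms(void)
{
    struct timespec ts;

    (void) clock_gettime(CLOCK_MONOTONIC, &ts);
    return ((int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000);
}

/*
 * Run argv[0] with the given arguments.  Send SIGTERM after
 * term_after_s seconds and SIGKILL kill_after_s seconds later.
 * Returns the wait status, or -1 if fork or waitpid failed.
 */
int
run_with_deadline(char *const argv[], int term_after_s, int kill_after_s)
{
    const struct timespec tick = { 0, 100 * 1000 * 1000 };
    int64_t term_at, kill_at;
    bool term_sent = false, kill_sent = false;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);
    if ((pid = fork()) == -1)
        return (-1);
    if (pid == 0) {
        (void) execv(argv[0], argv);
        _exit(127);         /* exec failed */
    }
    term_at = mono_ms() + (int64_t)term_after_s * 1000;
    kill_at = term_at + (int64_t)kill_after_s * 1000;
    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        int64_t now = mono_ms();

        if (r == pid)
            return (status);
        if (r == -1)
            return (-1);
        if (!term_sent && now >= term_at) {
            (void) kill(pid, SIGTERM);
            term_sent = true;
        } else if (term_sent && !kill_sent && now >= kill_at) {
            (void) kill(pid, SIGKILL);
            kill_sent = true;
        }
        (void) nanosleep(&tick, NULL);
    }
}
```

With the limits described above, the call would be
run_with_deadline(argv, 55, 3).  Since this loop blocks until the
program exits, it belongs in the helper process rather than in the
agent itself; the helper would then write the returned status to the
shared pipe.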
    477 

README.v6

      1 CDDL HEADER START
      2 
      3 The contents of this file are subject to the terms of the
      4 Common Development and Distribution License (the "License").
      5 You may not use this file except in compliance with the License.
      6 
      7 You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
      8 or http://www.opensolaris.org/os/licensing.
      9 See the License for the specific language governing permissions
     10 and limitations under the License.
     11 
     12 When distributing Covered Code, include this CDDL HEADER in each
     13 file and include the License file at usr/src/OPENSOLARIS.LICENSE.
     14 If applicable, add the following below this CDDL HEADER, with the
     15 fields enclosed by brackets "[]" replaced with your own identifying
     16 information: Portions Copyright [yyyy] [name of copyright owner]
     17 
     18 CDDL HEADER END
     19 
     20 Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
     21 Use is subject to license terms.
     22 
     23 ident	"%Z%%M%	%I%	%E% SMI"
     24 
     25 
     26 **  PLEASE NOTE:
     27 **
     28 **  This document discusses aspects of the DHCPv4 client design that have
     29 **  since changed (e.g., DLPI is no longer used).  However, since those
     30 **  aspects affected the DHCPv6 design, the discussion has been left for
     31 **  historical record.
     32 
     33 
     34 DHCPv6 Client Low-Level Design
     35 
     36 Introduction
     37 
     38   This project adds DHCPv6 client-side (not server) support to
     39   Solaris.  Future projects may add server-side support as well as
     40   enhance the basic capabilities added here.  These future projects
     41   are not discussed in detail in this document.
     42 
     43   This document assumes that the reader is familiar with the following
     44   other documents:
     45 
     46   - RFC 3315: the primary description of DHCPv6
     47   - RFCs 2131 and 2132: IPv4 DHCP
     48   - RFCs 2461 and 2462: IPv6 NDP and stateless autoconfiguration
     49   - RFC 3484: IPv6 default address selection
     50   - ifconfig(1M): Solaris IP interface configuration
     51   - in.ndpd(1M): Solaris IPv6 Neighbor and Router Discovery daemon
     52   - dhcpagent(1M): Solaris DHCP client
     53   - dhcpinfo(1): Solaris DHCP parameter utility
     54   - ndpd.conf(4): in.ndpd configuration file
     55   - netstat(1M): Solaris network status utility
     56   - snoop(1M): Solaris network packet capture and inspection
     57   - "DHCPv6 Client High-Level Design"
     58 
     59   Several terms from those documents (such as the DHCPv6 IA_NA and
     60   IAADDR options) are used without further explanation in this
     61   document; see the reference documents above for details.
     62 
     63   The overall plan is to enhance the existing Solaris dhcpagent so
     64   that it is able to process DHCPv6.  It would also have been possible
     65   to create a new, separate daemon process for this, or to integrate
     66   the feature into in.ndpd.  These alternatives, and the reason for
     67   the chosen design, are discussed in Appendix A.
     68 
     69   This document discusses the internal design issues involved in the
     70   protocol implementation and in the associated components (such as
     71   in.ndpd, snoop, and the kernel's source address selection
     72   algorithm).  It does not discuss the details of the protocol itself,
     73   which are more than adequately described in the RFC, nor the
     74   individual lines of code, which will be in the code review.
     75 
     76   As a cross-reference, Appendix B has a summary of the components
     77   involved and the changes to each.
     78 
     79 
     80 Background
     81 
     82   In order to discuss the design changes for DHCPv6, it's necessary
     83   first to talk about the current IPv4-only design, and the
     84   assumptions built into that design.
     85 
     86   The main data structure used in dhcpagent is the 'struct ifslist'.
     87   Each instance of this structure represents a Solaris logical IP
     88   interface under DHCP's control.  It also represents the shared state
     89   with the DHCP server that granted the address, the address itself,
     90   and copies of the negotiated options.
     91 
     92   There is one list in dhcpagent containing all of the IP interfaces
     93   that are under DHCP control.  IP interfaces not under DHCP control
     94   (for example, those that are statically addressed) are not included
     95   in this list, even when plumbed on the system.  These ifslist
     96   entries are chained like this:
     97 
     98   ifsheadp -> ifslist -> ifslist -> ifslist -> NULL
     99 	        net0	  net0:1     net1
    100 
    101   Each ifslist entry contains the address, mask, lease information,
    102   interface name, hardware information, packets, protocol state, and
    103   timers.  The name of the logical IP interface under DHCP's control
    104   is also the name used in the administrative interfaces (dhcpinfo,
    105   ifconfig) and when logging events.
    106 
    107   Each entry holds open a DLPI stream and two sockets.  The DLPI
    108   stream is nulled-out with a filter when not in use, but still
    109   consumes system resources.  (Most significantly, it causes data
    110   copies in the driver layer that end up sapping performance.)
    111 
    112   The entry storage is managed by an insert/hold/release/remove model
    113   and reference counts.  In this model, insert_ifs() allocates a new
    114   ifslist entry and inserts it into the global list, with the global
    115   list holding a reference.  remove_ifs() removes it from the global
    116   list and drops that reference.  hold_ifs() and release_ifs() are
    117   used by data structures that refer to ifslist entries, such as timer
    118   entries, to make sure that the ifslist entry isn't freed until the
    119   timer has been dispatched or deleted.
    120 
    121   The design is single-threaded, so code that walks the global list
    122   needn't bother taking holds on the ifslist structure.  Only
    123   references that may be used at a different time (i.e., pointers
    124   stored in other data structures) need to be recorded.
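
  The insert/hold/release/remove model can be illustrated with a
  minimal sketch.  The entry_t type, its field names, and the *_entry()
  functions below are hypothetical stand-ins chosen for this example;
  they are not the actual ifslist definitions from dhcpagent.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ifslist-like entry; not the real layout. */
typedef struct entry {
	struct entry	*e_next;	/* global list linkage */
	unsigned int	e_hold_count;	/* outstanding references */
	int		e_on_list;	/* nonzero while on the global list */
} entry_t;

static entry_t *entry_head;		/* global list of entries */
static unsigned int entries_freed;	/* bookkeeping for this example only */

/* Take an additional reference on behalf of another data structure. */
static void
hold_entry(entry_t *ep)
{
	ep->e_hold_count++;
}

/* Drop a reference; the entry is freed when the last one goes away. */
static void
release_entry(entry_t *ep)
{
	assert(ep->e_hold_count > 0);
	if (--ep->e_hold_count == 0) {
		entries_freed++;
		free(ep);
	}
}

/* Allocate an entry and link it in; the list itself holds a reference. */
static entry_t *
insert_entry(void)
{
	entry_t *ep = calloc(1, sizeof (*ep));

	if (ep == NULL)
		return (NULL);
	ep->e_next = entry_head;
	entry_head = ep;
	ep->e_on_list = 1;
	hold_entry(ep);
	return (ep);
}

/* Unlink an entry from the list and drop the list's reference. */
static void
remove_entry(entry_t *ep)
{
	entry_t **epp;

	if (!ep->e_on_list)
		return;
	for (epp = &entry_head; *epp != NULL; epp = &(*epp)->e_next) {
		if (*epp == ep) {
			*epp = ep->e_next;
			break;
		}
	}
	ep->e_next = NULL;
	ep->e_on_list = 0;
	release_entry(ep);
}
```

  In this sketch, a timer that stores a pointer to an entry would call
  hold_entry() when scheduled and release_entry() when dispatched or
  cancelled, so a remove_entry() in between unlinks the entry without
  freeing it out from under the timer.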
    125 
    126   Packets are handled using PKT (struct dhcp; <netinet/dhcp.h>),
    127   PKT_LIST (struct dhcp_list; <dhcp_impl.h>), and dhcp_pkt_t (struct
    128   dhcp_pkt; "packet.h").  PKT is just the RFC 2131 DHCP packet
    129   structure, and has no additional information, such as packet length.
    130   PKT_LIST contains a PKT pointer, length, decoded option arrays, and
    131   linkage for putting the packet in a list.  Finally, dhcp_pkt_t has a
    132   PKT pointer and length values suitable for modifying the packet.
    133 
    134   Essentially, PKT_LIST is a wrapper for received packets, and
    135   dhcp_pkt_t is a wrapper for packets to be sent.
    136 
    137   The basic PKT structure is used in dhcpagent, inetboot, in.dhcpd,
    138   libdhcpagent, libwanboot, libdhcputil, and others.  PKT_LIST is used
    139   in a similar set of places, including the kernel NFS modules.
    140   dhcp_pkt_t is (as the header file implies) limited to dhcpagent.
    141 
    142   In addition to these structures, dhcpagent maintains a set of
    143   internal supporting abstractions.  Two key ones involved in this
    144   project are the "async operation" and the "IPC action."  An async
    145   operation encapsulates the actions needed for a given operation, so
    146   that if cancellation is needed, there's a single point where the
    147   associated resources can be freed.  An IPC action represents the
    148   user state related to the private interface used by ifconfig.
    149 
    150 
    151 DHCPv6 Inherent Differences
    152 
    153   DHCPv6 naturally has some commonality with IPv4 DHCP, but also has
    154   some significant differences.
    155 
    156   Unlike IPv4 DHCP, DHCPv6 relies on link-local IP addresses to do its
    157   work.  This means that, on Solaris, the client doesn't need DLPI to
    158   perform any of the I/O; regular IP sockets will do the job.  It also
    159   means that, unlike IPv4 DHCP, DHCPv6 does not need to obtain a lease
    160   for the address used in its messages to the server.  The system
    161   provides the address automatically.
    162 
    163   IPv4 DHCP expects some messages from the server to be broadcast.
    164   DHCPv6 has no such mechanism; all messages from the server to the
    165   client are unicast.  In the case where the client and server aren't
    166   on the same subnet, a relay agent is used to get the unicast replies
    167   back to the client's link-local address.
    168 
    169   With IPv4 DHCP, a single address plus configuration options is
    170   leased with a given client ID and a single state machine instance,
    171   and the implementation binds that to a single IP logical interface
    172   specified by the user.  The lease has a "Lease Time," a required
    173   option, as well as two timers, called T1 (renew) and T2 (rebind),
    174   which are controlled by regular options.
    175 
    176   DHCPv6 uses a single client/server session to control the
    177   acquisition of configuration options and "identity associations"
    178   (IAs).  The identity associations, in turn, contain lists of
    179   addresses for the client to use and the T1/T2 timer values.  Each
    180   individual address has its own preferred and valid lifetime, with
    181   the address being marked "deprecated" at the end of the preferred
    182   interval, and removed at the end of the valid interval.
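
  As a rough illustration of the two per-address lifetimes, the sketch
  below classifies an address from the time elapsed since it was
  granted.  The function and type names are invented for this example;
  the treatment of 0xffffffff as "infinity" follows RFC 3315.

```c
#include <assert.h>
#include <stdint.h>

#define	LIFETIME_INFINITY	0xffffffffU	/* RFC 3315 "infinity" value */

typedef enum { ADDR_PREFERRED, ADDR_DEPRECATED, ADDR_EXPIRED } addr_state_t;

/*
 * Classify an address given the seconds elapsed since it was granted
 * and its preferred and valid lifetimes (also in seconds).
 */
static addr_state_t
addr_state(uint32_t elapsed, uint32_t preferred, uint32_t valid)
{
	assert(preferred <= valid);
	if (valid != LIFETIME_INFINITY && elapsed >= valid)
		return (ADDR_EXPIRED);		/* remove from the system */
	if (preferred != LIFETIME_INFINITY && elapsed >= preferred)
		return (ADDR_DEPRECATED);	/* mark "deprecated" */
	return (ADDR_PREFERRED);
}
```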
    183 
    184   IPv4 DHCP leaves many of the retransmit decisions up to the client,
    185   and some things (such as RELEASE and DECLINE) are sent just once.
    186   Others (such as the REQUEST message used for renew and rebind) are
    187   dealt with by heuristics.  DHCPv6 treats each message to the server
    188   as a separate transaction, and resends each message using a common
    189   retransmission mechanism.  DHCPv6 also has separate messages for
    190   Renew, Rebind, and Confirm rather than reusing the Request
    191   mechanism.
    192 
    193   The set of options (which are used to convey configuration
    194   information) for each protocol are distinct.  Notably, two of the
    195   mistakes from IPv4 DHCP have been fixed: DHCPv6 doesn't carry a
    196   client name, and doesn't attempt to impersonate a routing protocol
    197   by setting a "default route."
    198 
    199   Another welcome change is the lack of a netmask/prefix length with
    200   DHCPv6.  Instead, the client uses the Router Advertisement prefixes
    201   to set the correct interface netmask.  This reduces the number of
    202   databases that need to be kept in sync.  (The equivalent mechanism
    203   in IPv4 would have been the use of ICMP Address Mask Request /
    204   Reply, but the BOOTP designers chose to embed it in the address
    205   assignment protocol itself.)
    206 
    207   Otherwise, DHCPv6 is similar to IPv4 DHCP.  The same overall
    208   renew/rebind and lease expiry strategy is used, although the state
    209   machine events must now take into account multiple IAs and the fact
    210   that each can cause RENEWING or REBINDING state independently.
    211 
    212 
    213 DHCPv6 And Solaris
    214 
    215   The protocol distinctions above have several important implications.
    216   For the logical interfaces:
    217 
    218     - Because Solaris uses IP logical interfaces to configure
    219       addresses, we must have multiple IP logical interfaces per IA
    220       with IPv6.
    221 
    222     - Because we need to support multiple addresses (and thus multiple
    223       IP logical interfaces) per IA and multiple IAs per client/server
    224       session, the IP logical interface name isn't a unique name for
    225       the lease.
    226 
    227   As a result, IP logical interfaces will come and go with DHCPv6,
    228   just as happens with the existing stateless address
    229   autoconfiguration support in in.ndpd.  The logical interface names
    230   (visible in ifconfig) have no administrative significance.
    231 
    232   Fortunately, DHCPv6 does end up with one fixed name that can be used
    233   to identify a session.  Because DHCPv6 uses link local addresses for
    234   communication with the server, the name of the IP logical interface
    235   that has this link local address (normally the same as the IP
    236   physical interface) can be used as an identifier for dhcpinfo and
    237   logging purposes.
    238 
    239 
    240 Dhcpagent Redesign Overview
    241 
    242   The redesign starts by refactoring the IP interface representation.
    243   Because we need to have multiple IP logical interfaces (LIFs) for a
    244   single identity association (IA), we should not store all of the
    245   DHCP state information along with the LIF information.
    246 
    247   For DHCPv6, we will need to keep LIFs on a single IP physical
    248   interface (PIF) together, so this is probably also a good time to
    249   reconsider the way dhcpagent represents physical interfaces.  The
    250   current design simply replicates the state (notably the DLPI stream,
    251   but also the hardware address and other bits) among all of the
    252   ifslist entries on the same physical interface.
    253 
    254   The new design creates two lists of dhcp_pif_t entries, one list for
    255   IPv4 and the other for IPv6.  Each dhcp_pif_t represents a PIF, with
    256   a list of dhcp_lif_t entries attached, each of which represents a
    257   LIF used by dhcpagent.  This structure mirrors the kernel's ill_t
    258   and ipif_t interface representations.
    259 
    260   Next, the lease-tracking needs to be refactored.  DHCPv6 is the
    261   functional superset in this case, as it has two lifetimes per
    262   address (LIF) and IA groupings with shared T1/T2 timers.  To
    263   represent these groupings, we will use a new dhcp_lease_t structure.
    264   IPv4 DHCP will have one such structure per state machine, while
    265   DHCPv6 will have a list.  (Note: the initial implementation will
    266   have only one lease per DHCPv6 state machine, because each state
    267   machine uses a single link-local address, a single DUID+IAID pair,
    268   and supports only Non-temporary Addresses [IA_NA option].  Future
    269   enhancements may use multiple leases per DHCPv6 state machine or
    270   support other IA types.)
    271 
    272   For all of these new structures, we will use the same insert/hold/
    273   release/remove model as with the original ifslist.
    274 
    275   Finally, the remaining items (and the bulk of the original ifslist
    276   members) are kept on a per-state-machine basis.  As this is no
    277   longer just an "interface," a new dhcp_smach_t structure will hold
    278   these, and the ifslist structure is gone.
    279 
    280 
    281 Lease Representation
    282 
    283   For DHCPv6, we need to track multiple LIFs per lease (IA), but we
    284   also need multiple LIFs per PIF.  Rather than having two sets of
    285   list linkage for each LIF, we can observe that a LIF is on exactly
    286   one PIF and is a member of at most one lease, and then simplify: the
    287   lease structure will use a base pointer for the first LIF in the
    288   lease, and a count for the number of consecutive LIFs in the PIF's
    289   list of LIFs that belong to the lease.
    290 
    291   When removing a LIF from the system, we need to decrement the count
    292   of LIFs in the lease, and advance the base pointer if the LIF being
    293   removed is the first one.  Inserting a LIF means just moving it into
    294   this list and bumping the counter.
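
  The base-pointer-plus-count scheme can be sketched as follows.  All
  type, field, and function names here are hypothetical and exist only
  for illustration; the sketch also assumes that a new run of LIFs may
  be started at the head of the PIF's list, which is one of several
  placements that keep each lease's LIFs contiguous.

```c
#include <assert.h>
#include <stddef.h>

struct lease;

typedef struct lif {
	struct lif	*l_next;	/* next LIF on the same PIF */
	struct lease	*l_lease;	/* owning lease, or NULL */
} lif_t;

typedef struct lease {
	lif_t		*ls_first;	/* first LIF belonging to the lease */
	unsigned int	ls_nlifs;	/* number of consecutive LIFs */
} lease_t;

typedef struct pif {
	lif_t		*p_lifs;	/* all LIFs on this PIF */
} pif_t;

/* Put a LIF on the PIF's list as part of the given lease's run. */
static void
lease_add_lif(pif_t *pif, lease_t *lease, lif_t *lif)
{
	lif->l_lease = lease;
	if (lease->ls_nlifs == 0) {
		/* Empty lease: start a new run at the head of the list. */
		lif->l_next = pif->p_lifs;
		pif->p_lifs = lif;
		lease->ls_first = lif;
	} else {
		/* Walk to the last LIF of the run and append after it. */
		lif_t *last = lease->ls_first;
		unsigned int i;

		for (i = 1; i < lease->ls_nlifs; i++)
			last = last->l_next;
		lif->l_next = last->l_next;
		last->l_next = lif;
	}
	lease->ls_nlifs++;
}

/* Take a LIF off the PIF's list and fix up its lease, if any. */
static void
lease_remove_lif(pif_t *pif, lif_t *lif)
{
	lif_t **lpp;
	lease_t *lease = lif->l_lease;

	for (lpp = &pif->p_lifs; *lpp != NULL; lpp = &(*lpp)->l_next) {
		if (*lpp == lif) {
			*lpp = lif->l_next;
			break;
		}
	}
	if (lease != NULL) {
		assert(lease->ls_nlifs > 0);
		/* Advance the base pointer if the first LIF is leaving. */
		if (lease->ls_first == lif)
			lease->ls_first = lif->l_next;
		if (--lease->ls_nlifs == 0)
			lease->ls_first = NULL;
	}
	lif->l_next = NULL;
	lif->l_lease = NULL;
}
```

  Because a LIF carries only one set of list linkage, membership in a
  lease is implied entirely by its position in the PIF's list, which
  is why insertion must keep each run contiguous.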
    295 
    296   When removing a lease from a state machine, we need to dispose of
    297   the LIFs referenced.  If the LIF being disposed of is the main LIF for
    298   a state machine, then all that we can do is canonize the LIF
    299   (returning it to a default state); this represents the normal IPv4
    300   DHCP operation on lease expiry.  Otherwise, the lease is the owner
    301   of that LIF (it was created because of a DHCPv6 IA), and disposal
    302   means unplumbing the LIF from the actual system and removing the LIF
    303   entry from the PIF.
    304 
    305 
    306 Main Structure Linkage
    307 
    308   For IPv4 DHCP, the new linkage is straightforward.  Using the same
    309   system configuration example as in the initial design discussion:
    310 
    311           +- lease  +- lease       +- lease
    312           |  ^      |  ^           |  ^
    313           |  |      |  |           |  |
    314           \  smach  \  smach       \  smach
    315            \ ^|      \ ^|           \ ^|
    316             v|v       v|v            v|v
    317             lif ----> lif -> NULL     lif -> NULL
    318             net0      net0:1          net1
    319             ^                         ^
    320             |                         |
    321   v4root -> pif --------------------> pif -> NULL
    322             net0                      net1
    323 
    324   This diagram shows three separate state machines running (with
    325   backpointers omitted for clarity).  Each state machine has a single
    326   "main" LIF with which it's associated (and named).  Each also has a
    327   single lease structure that points back to the same LIF (count of
    328   1), because IPv4 DHCP controls a single address allocation per state
    329   machine.
    330 
    331   DHCPv6 is a bit more complex.  This shows DHCPv6 running on two
    332   interfaces (more or fewer interfaces are of course possible) and
    333   with multiple leases on the first interface, and each lease with
    334   multiple addresses (one with two addresses, the second with one).
    335 
    336             lease ----------------> lease -> NULL   lease -> NULL
    337             ^   \(2)                |(1)            ^   \ (1)
    338             |    \                  |               |    \
    339             smach \                 |               smach \
    340             ^ |    \                |               ^ |    \
    341             | v     v               v               | v     v
    342             lif --> lif --> lif --> lif --> NULL    lif --> lif -> NULL
    343             net0    net0:1  net0:4  net0:2          net1    net1:5
    344             ^                                       ^
    345             |                                       |
    346   v6root -> pif ----------------------------------> pif -> NULL
    347             net0                                    net1
    348 
    349   Note that there's intentionally no ordering based on name in the
    350   list of LIFs.  Instead, the contiguous LIF structures in that list
    351   represent the addresses in each lease.  The logical interfaces
    352   themselves are allocated and numbered by the system kernel, so they
    353   may not be sequential, and there may be gaps in the list if other
    354   entities (such as in.ndpd) are also configuring interfaces.
    355 
    356   Note also that with IPv4 DHCP, the lease points to the LIF that's
    357   also the main LIF for the state machine, because that's the IP
    358   interface that dhcpagent controls.  With DHCPv6, the lease (one per
    359   IA structure) points to a separate set of LIFs that are created just
    360   for the leased addresses (one per IA address in an IAADDR option).
    361   The state machine alone points to the main LIF.
    362 
    363 
    364 Packet Structure Extensions
    365 
    366   Obviously, we need some DHCPv6 packet data structures and
    367   definitions.  A new <netinet/dhcp6.h> file will be introduced with
    368   the necessary #defines and structures.  The key structure there will
    369   be:
    370 
    371 	struct dhcpv6_message {
    372 		uint8_t		d6m_msg_type;
    373 		uint8_t		d6m_transid_ho;
    374 		uint16_t	d6m_transid_lo;
    375 	};
    376 	typedef	struct dhcpv6_message	dhcpv6_message_t;
    377 
    378   This defines the usual (non-relay) DHCPv6 packet header, and is
    379   roughly equivalent to PKT for IPv4.
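
  The header splits the 24-bit transaction ID defined by RFC 3315
  into an 8-bit high part and a 16-bit low part.  The sketch below
  shows one way to pack and unpack it in network byte order; the
  set_transid() and get_transid() helpers are hypothetical names for
  this example and are not part of the proposed header file.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copied from the structure shown above. */
struct dhcpv6_message {
	uint8_t		d6m_msg_type;
	uint8_t		d6m_transid_ho;
	uint16_t	d6m_transid_lo;
};

/* Store a 24-bit transaction ID in network byte order. */
static void
set_transid(struct dhcpv6_message *d6m, uint32_t xid)
{
	uint8_t lo[2];

	d6m->d6m_transid_ho = (uint8_t)((xid >> 16) & 0xff);
	lo[0] = (uint8_t)((xid >> 8) & 0xff);
	lo[1] = (uint8_t)(xid & 0xff);
	memcpy(&d6m->d6m_transid_lo, lo, sizeof (lo));
}

/* Fetch the 24-bit transaction ID, regardless of host byte order. */
static uint32_t
get_transid(const struct dhcpv6_message *d6m)
{
	uint8_t lo[2];

	memcpy(lo, &d6m->d6m_transid_lo, sizeof (lo));
	return (((uint32_t)d6m->d6m_transid_ho << 16) |
	    ((uint32_t)lo[0] << 8) | lo[1]);
}
```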
    380 
    381   Extending dhcp_pkt_t for DHCPv6 is straightforward, as it's used
    382   only within dhcpagent.  This structure will be amended to use a
    383   union for v4/v6 and include a boolean to flag which version is in
    384   use.
    385 
    386   For the PKT_LIST structure, things are more complex.  This defines
    387   both a queuing mechanism for received packets (typically OFFERs) and
    388   a set of packet decoding structures.  The decoding structures are
    389   highly specific to IPv4 DHCP -- they have no means to handle nested
    390   or repeated options (as used heavily in DHCPv6) and make use of the
    391   DHCP_OPT structure which is specific to IPv4 DHCP -- and are
    392   somewhat expensive in storage, due to the use of arrays indexed by
    393   option code number.
    394 
    395   Worse, this structure is used throughout the system, so changes to
    396   it need to be made carefully.  (For example, the existing 'pkt'
    397   member can't just be turned into a union.)
    398 
    399   For an initial prototype, since discarded, I created a new
    400   dhcp_plist_t structure to represent packet lists as used inside
    401   dhcpagent and made dhcp_pkt_t valid for use on input and output.
    402   The result is unsatisfying, though, as it results in code that
    403   manipulates far too many data structures in common cases; it's a sea
    404   of pointers to pointers.
    405 
    406   The better answer is to use PKT_LIST for both IPv4 and IPv6, adding
    407   the few new bits of metadata required to the end (receiving ifIndex,
    408   packet source/destination addresses), and staying within the overall
    409   existing design.
    410 
    411   For option parsing, dhcpv6_find_option() and dhcpv6_pkt_option()
    412   functions will be added to libdhcputil.  The former function will
    413   walk a DHCPv6 option list, and provide safe (bounds-checked) access
    414   to the options inside.  The function can be called recursively, so
    415   that option nesting can be handled fairly simply by nested loops,
    416   and can be called repeatedly to return each instance of a given
    417   option code number.  The latter function is just a convenience
    418   wrapper on dhcpv6_find_option() that starts with a PKT_LIST pointer
    419   and iterates over the top-level options with a given code number.
    420 
    421   There are two special considerations for the use of these library
    422   interfaces: there's no "pad" option for DHCPv6 or alignment
    423   requirements on option headers or contents, and nested options
    424   always follow a structure that has type-dependent length.  This
    425   means that code that handles options must all be written to deal
    426   with unaligned data, and suboption code must index the pointer past
    427   the type-dependent part.
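
  To make the unaligned, bounds-checked walk concrete, here is a
  minimal sketch of an option iterator.  The next_v6opt() function is a
  hypothetical illustration and should not be taken as the signature
  or implementation of dhcpv6_find_option(); it only assumes the
  RFC 3315 option layout (16-bit code, 16-bit length, then data).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	V6OPT_HDRLEN	4	/* 16-bit code plus 16-bit length */

/* Read a big-endian 16-bit value from a possibly unaligned pointer. */
static uint16_t
get_be16(const uint8_t *p)
{
	return ((uint16_t)((p[0] << 8) | p[1]));
}

/*
 * Return a pointer to the next option with the given code in
 * buf[0..buflen), starting after 'prev' (or at the beginning when prev
 * is NULL).  'prev' must be a value returned by an earlier call on the
 * same buffer.  On success, *lenp is set to the option's data length.
 * Returns NULL when no such option remains or the list is truncated.
 */
static const uint8_t *
next_v6opt(const uint8_t *buf, size_t buflen, const uint8_t *prev,
    uint16_t code, size_t *lenp)
{
	size_t off = 0;

	if (prev != NULL)
		off = (size_t)(prev - buf) + V6OPT_HDRLEN + get_be16(prev + 2);

	while (off <= buflen && buflen - off >= V6OPT_HDRLEN) {
		const uint8_t *opt = buf + off;
		size_t olen = get_be16(opt + 2);

		if (olen > buflen - off - V6OPT_HDRLEN)
			return (NULL);	/* option runs past the buffer */
		if (get_be16(opt) == code) {
			*lenp = olen;
			return (opt);
		}
		off += V6OPT_HDRLEN + olen;
	}
	return (NULL);
}
```

  Nested options are handled by calling the same function on the
  enclosing option's data, after skipping the type-dependent fixed
  part (for example, 12 bytes for IA_NA before any IAADDR options).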
    428 
    429 
    430 Packet Construction
    431 
    432   Unlike DHCPv4, DHCPv6 places the transaction timer value in an
    433   option.  The existing code sets the current time value in
    434   send_pkt_internal(), which allows it to be updated in a
    435   straightforward way when doing retransmits.
    436 
    437   To make this work in a simple manner for DHCPv6, I added a
    438   remove_pkt_opt() function.  The update logic just does a remove and
    439   re-adds the option.  We could also just assume the presence of the
    440   option, find it, and modify in place, but the remove feature seems
    441   more general.
    442 
    443   DHCPv6 uses nesting options.  To make this work, two new utility
    444   functions are needed.  First, an add_pkt_subopt() function will take
    445   a pointer to an existing option and add an embedded option within
    446   it.  The packet length and existing option length are updated.  If
    447   that existing option isn't a top-level option, though, this means
    448   that the caller must update the lengths of all of the enclosing
    449   options up to the top level.  To do this, update_v6opt_len() will be
    450   added.  This is used in the special case of adding a Status Code
    451   option to an IAADDR option within an IA_NA top-level option.
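
  The length bookkeeping for nested options can be sketched as below.
  The pkt_t type and the append_subopt() and grow_opt_len() functions
  are hypothetical simplifications written for this example (they are
  not add_pkt_subopt() or update_v6opt_len()), and the sketch assumes
  that the enclosing option currently ends at the end of the packet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define	PKT_MAX		256	/* arbitrary buffer size for the sketch */
#define	OPT_HDRLEN	4	/* 16-bit code plus 16-bit length */
#define	NO_ROOM		((size_t)-1)

typedef struct {
	uint8_t	buf[PKT_MAX];
	size_t	len;		/* bytes used so far */
} pkt_t;

static void
put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

static uint16_t
get_be16(const uint8_t *p)
{
	return ((uint16_t)((p[0] << 8) | p[1]));
}

/* Add 'delta' to the length field of the option at offset opt_off. */
static void
grow_opt_len(pkt_t *pkt, size_t opt_off, size_t delta)
{
	uint8_t *lenp = pkt->buf + opt_off + 2;

	put_be16(lenp, (uint16_t)(get_be16(lenp) + delta));
}

/*
 * Append an option at the end of the packet, nested inside the option
 * at parent_off.  Updates the packet length and the parent's length
 * only; any outer enclosing options are the caller's responsibility.
 * Returns the new option's offset, or NO_ROOM.
 */
static size_t
append_subopt(pkt_t *pkt, size_t parent_off, uint16_t code,
    const void *data, size_t dlen)
{
	size_t off = pkt->len;

	if (dlen > 0xffff || PKT_MAX - pkt->len < OPT_HDRLEN + dlen)
		return (NO_ROOM);
	put_be16(pkt->buf + off, code);
	put_be16(pkt->buf + off + 2, (uint16_t)dlen);
	if (dlen > 0)
		memcpy(pkt->buf + off + OPT_HDRLEN, data, dlen);
	pkt->len += OPT_HDRLEN + dlen;
	grow_opt_len(pkt, parent_off, OPT_HDRLEN + dlen);
	return (off);
}
```

  For the Status Code case mentioned above, the caller would append
  the Status Code option to the IAADDR option and then call
  grow_opt_len() on the enclosing IA_NA option with the same number of
  added bytes.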
    452 
    453 
    454 Sockets and I/O Handling
    455 
    456   DHCPv6 doesn't need or use either a DLPI stream or a broadcast IP
    457   socket.  Instead, a single unicast-bound IP socket on a link-local
    458   address is the most that is needed.  This is roughly equivalent to
    459   if_sock_ip_fd in the existing design, but that existing socket is
    460   bound only after DHCP reaches BOUND state -- that is, when it
    461   switches away from DLPI.  We need something different.
    462 
    463   This, along with the excess of open file descriptors in an otherwise
    464   idle daemon and the potentially serious performance problems in
    465   leaving DLPI open at all times, argues for a larger redesign of the
    466   I/O logic in dhcpagent.
    467 
    468   The first thing that we can do is eliminate the need for the
    469   per-ifslist if_sock_fd.  This is used primarily for issuing ioctls
    470   to configure interfaces -- a task that would work as well with any
    471   open socket -- and is also registered to receive any ACK/NAK packets
    472   that may arrive via broadcast.  Both of these can be eliminated by
    473   creating a pair of global sockets (IPv4 and IPv6), bound and
    474   configured for ACK/NAK reception.  The only functional difference is
    475   that the list of running state machines must be scanned on reception
    476   to find the correct transaction ID, but the existing design
    477   effectively already goes to this effort because the kernel
    478   replicates received datagrams among all matching sockets, and each
    479   ifslist entry has a socket open.
    480 
    481   (The existing code for if_sock_fd makes oblique reference to unknown
    482   problems in the system that may prevent binding from working in some
    483   cases.  The reference dates back some seven years to the original
    484   DHCP implementation.  I've observed no such problems in extensive
    485   testing and if any do show up, they will be dealt with by fixing the
    486   underlying bugs.)
    487 
    488   This leads to an important simplification: it's no longer necessary
    489   to register, unregister, and re-register for packet reception while
    490   changing state -- register_acknak() and unregister_acknak() are
    491   gone.  Instead, we always receive, and we dispatch the packets as
    492   they arrive.  As a result, when receiving a DHCPv4 ACK or DHCPv6
    493   Reply when in BOUND state, we know it's a duplicate, and we can
    494   discard it.
    495 
    496   The next part is in minimizing DLPI usage.  A DLPI stream is needed
    497   at most for each IPv4 PIF, and it's not needed when all of the
    498   DHCP instances on that PIF are bound.  In fact, the current
    499   implementation deals with this in configure_bound() by setting a
    500   "blackhole" packet filter.  The stream is left open.
    501 
    502   To simplify this, we will open at most one DLPI stream on a PIF, and
    503   use reference counts from the state machines to determine when the
    504   stream must be open and when it can be closed.  This mechanism will
    505   be centralized in a set_smach_state() function that changes the
    506   state and opens/closes the DLPI stream when needed.
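
  A minimal sketch of this reference-counting idea follows.  The
  types, field names, and the set_state() function are hypothetical
  and much simpler than the real set_smach_state(); in particular, the
  rule used here for which states need DLPI (everything before BOUND)
  is an assumption made for the example.

```c
#include <assert.h>

typedef enum {
	S_INIT, S_SELECTING, S_REQUESTING,	/* assumed to need DLPI */
	S_BOUND, S_RENEWING, S_REBINDING	/* assumed to use sockets */
} state_t;

typedef struct {
	unsigned int	p_dlpi_users;	/* state machines needing DLPI */
	int		p_dlpi_open;	/* nonzero while stream is open */
} pif_t;

typedef struct {
	pif_t	*sm_pif;
	state_t	sm_state;
	int	sm_using_dlpi;
} smach_t;

/* Assumption for this sketch only: DLPI is needed until we are bound. */
static int
state_needs_dlpi(state_t st)
{
	return (st < S_BOUND);
}

/* Change state, opening or closing the shared stream as needed. */
static void
set_state(smach_t *sm, state_t st)
{
	int need = state_needs_dlpi(st);
	pif_t *pif = sm->sm_pif;

	if (need && !sm->sm_using_dlpi) {
		if (pif->p_dlpi_users++ == 0)
			pif->p_dlpi_open = 1;	/* real code opens here */
	} else if (!need && sm->sm_using_dlpi) {
		assert(pif->p_dlpi_users > 0);
		if (--pif->p_dlpi_users == 0)
			pif->p_dlpi_open = 0;	/* real code closes here */
	}
	sm->sm_using_dlpi = need;
	sm->sm_state = st;
}
```

  With every state change funneled through one function, the stream
  is open exactly when at least one state machine on the PIF needs it.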
    507 
    508   This leads to another simplification.  The I/O logic in the existing
    509   dhcpagent makes use of the protocol state to select between DLPI and
    510   sockets.  Now that we keep track of this in a simpler manner, we no
    511   longer need to switch on state when sending a packet; we can just
    512   test the dsm_using_dlpi flag instead.
    513 
    514   Still another simplification is in the handling of DHCPv4 INFORM.
    515   The current code has separate logic in it for getting the interface
    516   state and address information.  This is no longer necessary, as the
    517   LIF mechanism keeps track of the interface state.  And since we have
    518   separate lease structures, and INFORM doesn't acquire a lease, we no
    519   longer have to be careful about canonizing the interface on
    520   shutdown.
    521 
    522   Although the default is to send all client messages to a well-known
    523   multicast address for servers and relays, DHCPv6 also has a
    524   mechanism that allows the client to send unicast messages to the
    525   server.  The operation of this mechanism is slightly complex.
    526   First, the server sends the client a unicast address via an option.
    527   We may use this address as the destination (rather than the
    528   well-known multicast address for local DHCPv6 servers and relays)
    529   only if we have a viable local source address.  This means using
    530   SIOCGDSTINFO each time we try to send unicast.  Next, the server may
    531   send back a special status code: UseMulticast.  If this is received,
    532   and if we were actually using unicast in our messages to the server,
    533   then we need to forget the unicast address, switch back to
    534   multicast, and resend our last message.
    535 
    536   Note that it's important to avoid the temptation to resend the last
    537   message every time UseMulticast is seen, and do it only once on
    538   switching back to multicast: otherwise, a potential feedback loop is
    539   created.
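
  The resend-once rule can be captured in a few lines.  The session_t
  type and handle_use_multicast() are hypothetical names used only for
  this sketch.

```c
#include <assert.h>

typedef struct {
	int	s_have_unicast;	/* server gave us a usable unicast address */
} session_t;

/*
 * Called when a reply carries the UseMulticast status code.  Returns
 * nonzero if the last message should be resent (via multicast).
 */
static int
handle_use_multicast(session_t *sp)
{
	if (!sp->s_have_unicast)
		return (0);	/* already multicasting; do not resend */
	sp->s_have_unicast = 0;	/* forget the unicast address */
	return (1);		/* resend once after switching back */
}
```

  A second UseMulticast reply for the same exchange returns zero here,
  which is what breaks the potential feedback loop.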
    540 
    541   Because IP_PKTINFO (PSARC 2006/466) has been integrated, we could go a
    542   step further by removing the need for any per-LIF sockets and just
    543   use the global sockets for all but DLPI.  However, in order to
    544   facilitate a Solaris 10 backport, this will be done separately as CR
    545   6509317.
    546 
    547   In the case of DHCPv6, we already have IPV6_PKTINFO, so we will pave
    548   the way for IPv4 by beginning to use this now, and thus have just
    549   a single socket (bound to "::") for all of DHCPv6.  Doing this
    550   requires switching from the old BSD4.2 -lsocket -lnsl to the
    551   standards-compliant -lxnet in order to use ancillary data.
    552 
    553   It may also be possible to remove the need for DLPI for IPv4, and
    554   incidentally simplify the code a fair amount, by adding a kernel
    555   option to allow transmission and reception of UDP packets over
    556   interfaces that are plumbed but not marked IFF_UP.  This is left for
    557   future work.
    558 
    559 
    560 The State Machine
    561 
    562   Several parts of the existing state machine need additions to handle
    563   DHCPv6, which is a superset of DHCPv4.
    564 
    565   First, there are the RENEWING and REBINDING states.  For IPv4 DHCP,
    566   these states map one-to-one with a single address and single lease
    567   that's undergoing renewal.  It's a simple progression (on timeout)
    568   from BOUND, to RENEWING, to REBINDING and finally back to SELECTING
    569   to start over.  Each retransmit is done by simply rescheduling the
    570   T1 or T2 timer.
    571 
    572   For DHCPv6, things are somewhat more complex.  At any one time,
    573   there may be multiple IAs (leases) that are effectively in renewing
    574   or rebinding state, based on the T1/T2 timers for each IA, and many
    575   addresses that have expired.
    576 
    577   However, because all of the leases are related to a single server,
    578   and that server either responds to our requests or doesn't, we can
    579   simplify the states to be nearly identical to IPv4 DHCP.
    580 
    581   The revised definition for use with DHCPv6 is:
    582 
    583     - Transition from BOUND to RENEWING state when the first T1 timer
    584       (of any lease on the state machine) expires.  At this point, as
    585       an optimization, we should begin attempting to renew any IAs
    586       that are within REN_TIMEOUT (10 seconds) of reaching T1 as well.
    587       We may as well avoid sending an excess of packets.
    588 
    589     - When a T1 lease timer expires and we're in RENEWING or REBINDING
    590       state, just ignore it, because the transaction is already in
    591       progress.
    592 
    593     - At each retransmit timeout, we should check to see if there are
    594       more IAs that need to join in because they've passed point T1 as
    595       well, and, if so, add them.  This check isn't necessary at this
    596       time, because only a single IA_NA is possible with the initial
    597       design.
    598 
    599     - When we reach T2 on any IA and we're in BOUND or RENEWING state,
    600       enter REBINDING state.  At this point, we have a choice.  For
    601       those other IAs that are past T1 but not yet at T2, we could
    602       ignore them (sending only those that have passed point T2),
    603       continue to send separate Renew messages for them, or just
    604       include them in the Rebind message.  This isn't an issue that
    605       must be dealt with for this project, but the plan is to include
    606       them in the Rebind message.
    607 
    608     - When a T2 lease timer expires and we're in REBINDING state, just
    609       ignore it, as with the corresponding T1 timer.
    610 
    611     - As addresses reach the end of their preferred lifetimes, set the
    612       IFF_DEPRECATED flag.  As they reach the end of the valid
    613       lifetime, remove them from the system.  When an IA (lease)
    614       becomes empty, just remove it.  When there are no more leases
    615       left, return to SELECTING state to start over.
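
  The timer-driven transitions in the list above can be sketched as a
  handful of small functions.  The names and the reduced set of states
  are hypothetical simplifications for this example; times are taken
  to be absolute values in seconds.

```c
#include <assert.h>
#include <stdint.h>

#define	REN_TIMEOUT	10	/* seconds, per RFC 3315 */

typedef enum {
	ST_SELECTING, ST_BOUND, ST_RENEWING, ST_REBINDING
} sm_state_t;

/* A T1 timer fired on some lease: only BOUND reacts. */
static sm_state_t
t1_expired(sm_state_t st)
{
	return (st == ST_BOUND ? ST_RENEWING : st);
}

/* A T2 timer fired on some lease: BOUND and RENEWING react. */
static sm_state_t
t2_expired(sm_state_t st)
{
	if (st == ST_BOUND || st == ST_RENEWING)
		return (ST_REBINDING);
	return (st);
}

/* Should an IA with the given T1 be included in the Renew sent now? */
static int
ia_joins_renew(uint32_t ia_t1, uint32_t now)
{
	return ((uint64_t)ia_t1 <= (uint64_t)now + REN_TIMEOUT);
}

/* A lease became empty and was removed; start over if none remain. */
static sm_state_t
lease_removed(sm_state_t st, unsigned int nleases_left)
{
	return (nleases_left == 0 ? ST_SELECTING : st);
}
```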
    616 
    617   Note that the RFC treats the IAs as separate entities when
    618   discussing the renew/rebind T1/T2 timers, but treats them as a unit
    619   when doing the initial negotiation.  This is, to say the least,
    620   confusing, especially so given that there's no reason to expect that
    621   after having failed to elicit any responses at all from the server
    622   on one IA, the server will suddenly start responding when we attempt
    623   to renew some other IA.  We rationalize this behavior by using a
    624   single renew/rebind state for the entire state machine (and thus
    625   client/server pair).
    626 
    627   There's a subtle timing difference here between DHCPv4 and DHCPv6.
    628   For DHCPv4, the client just sends packets more and more frequently
    629   (shorter timeouts) as the next state gets nearer.  DHCPv6 treats
    630   each as a transaction, using the same retransmit logic as for other
    631   messages.  The DHCPv6 method is a cleaner design, so we will change
    632   the DHCPv4 implementation to do the same, and compute the new timer
    633   values as part of stop_extending().
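
  For reference, the per-transaction retransmit rule in RFC 3315
  (section 14) can be sketched as below.  This is an illustration
  under stated assumptions rather than the dhcpagent implementation:
  the function name is invented, times are in milliseconds, and the
  random factor RAND (between -0.1 and +0.1) is passed in by the
  caller in thousandths so that the arithmetic stays in integers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the next retransmission timeout (RT).
 *
 *   prev_rt_ms     previous RT, or 0 for the first transmission
 *   irt_ms         initial retransmission time (IRT)
 *   mrt_ms         maximum retransmission time (MRT), 0 for no cap
 *   rand_permille  RAND scaled by 1000; must be in [-100, 100]
 */
int64_t
ex_next_rt(int64_t prev_rt_ms, int64_t irt_ms, int64_t mrt_ms,
    int rand_permille)
{
	int64_t rt;

	assert(rand_permille >= -100 && rand_permille <= 100);

	if (prev_rt_ms == 0)
		rt = irt_ms + irt_ms * rand_permille / 1000;
	else
		rt = 2 * prev_rt_ms + prev_rt_ms * rand_permille / 1000;

	/* Once the doubling passes MRT, hover around MRT instead. */
	if (mrt_ms != 0 && rt > mrt_ms)
		rt = mrt_ms + mrt_ms * rand_permille / 1000;

	return (rt);
}
```

  With the Renew parameters (IRT of 10 seconds as noted earlier, and
  an MRT of 600 seconds taken from the RFC), the timeout doubles from
  about 10 seconds until it settles near 600 seconds.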
    634 
    635   Note that it would be possible to start the SELECTING state earlier
    636   than waiting for the last lease to expire, and thus avoid a loss of
    637   connectivity.  However, at this point, there are other servers on
    638   the network that have seen us attempting to Rebind for quite some
    639   time, and they have not responded.  The likelihood that there's a
    640   server that will ignore Rebind but then suddenly spring into action
    641   on a Solicit message seems low enough that the optimization won't be
    642   done now.  (Starting SELECTING state earlier may be done in the
    643   future, if it's found to be useful.)
    644 
    645 
    646 Persistent State
    647 
    648   IPv4 DHCP has only minimal need for persistent state, beyond the
    649   configuration parameters.  The state is stored when "ifconfig dhcp
    650   drop" is run or the daemon receives SIGTERM, which is typically done
    651   only well after the system is booted and running.
    652 
    653   The daemon stores this state in /etc/dhcp, because it needs to be
    654   available when only the root file system has been mounted.
    655 
    656   Moreover, dhcpagent starts very early in the boot process.  It runs
    657   as part of svc:/network/physical:default, which runs well before
    658   root is mounted read/write:
    659 
    660      svc:/system/filesystem/root:default ->
    661         svc:/system/metainit:default ->
    662            svc:/system/identity:node ->
    663               svc:/network/physical:default
    664            svc:/network/iscsi_initiator:default ->
    665               svc:/network/physical:default
    666 
    667   and, of course, well before either /var or /usr is mounted.  This
    668   means that any persistent state must be kept in the root file
    669   system, and that if we write before shutdown, we have to cope
    670   gracefully with the root file system returning EROFS on write
    671   attempts.
    672 
    673   For DHCPv6, we need to try to keep our DUID and IAID values
    674   stable across reboots to fulfill the demands of RFC 3315.
    675 
    676   The DUID is either configured or automatically generated.  When
    677   configured, it comes from the /etc/default/dhcpagent file, and thus
    678   does not need to be saved by the daemon.  If automatically
    679   generated, there's exactly one of these created, and it will
    680   eventually be needed before /usr is mounted, if /usr is mounted over
    681   IPv6.  This means a new file in the root file system,
    682   /etc/dhcp/duid, will be used to hold the automatically generated
    683   DUID.
    684 
    685   The determination of whether to use a configured DUID or one saved
    686   in a file is made in get_smach_cid().  This function will
    687   encapsulate all of the DUID parsing and generation machinery for the
    688   rest of dhcpagent.
    689 
    690   If root is not writable at the point when dhcpagent starts, and our
    691   attempt fails with EROFS, we will set a timer for 60-second
    692   intervals to retry the operation periodically.  In the unlikely case
    693   that it just never succeeds or that we're rebooted before root
    694   becomes writable, then the impact will be that the daemon will wake
    695   up once a minute and, ultimately, we'll choose a different DUID on
    696   next start-up, and we'll thus lose our leases across a reboot.
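
  A minimal sketch of the write-and-retry decision follows.  The
  function, the result enum, and EX_RETRY_SECS are hypothetical names
  used only here; the real daemon would schedule the retry through
  its own timer facility, which is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define	EX_RETRY_SECS	60	/* retry interval described above */

typedef enum { EX_SAVED, EX_RETRY_LATER, EX_FAILED } ex_save_t;

/*
 * Try to store the DUID bytes in 'path'.  A read-only root shows up
 * as EROFS from open(2); in that case the caller should try again
 * EX_RETRY_SECS seconds later rather than give up.
 */
ex_save_t
ex_save_duid(const char *path, const unsigned char *duid, size_t len)
{
	int fd;
	ssize_t n;

	assert(path != NULL && duid != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (errno == EROFS ? EX_RETRY_LATER : EX_FAILED);

	n = write(fd, duid, len);
	(void) close(fd);
	return (n == (ssize_t)len ? EX_SAVED : EX_FAILED);
}
```

  Separating "retry later" from "failed" keeps the once-a-minute
  wakeup limited to the read-only case and avoids looping on errors
  that will never clear.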
    697 
    698   The IAID similarly must be kept stable if at all possible, but
    699   cannot be configured by the user.  To make these values stable,
    700   we will use two strategies.  First, the IAID value for a given
    701   interface (if not known) will just default to the IP ifIndex value,
    702   provided that there's no known saved IAID using that value.  Second,
    703   we will save off the IAID we choose in a single /etc/dhcp/iaid file,
    704   containing an array of entries indexed by logical interface name.
    705   Keeping it in a single file allows us to scan for used and unused
    706   IAID values when necessary.
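
  The two strategies might combine as in the sketch below.  The entry
  layout and function names are assumptions made for illustration
  (the on-disk format of /etc/dhcp/iaid is not described here), and
  the final fallback of scanning upward from 1 for a free value is
  one possible policy, not necessarily the one used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One saved entry: hypothetical in-memory form of the iaid file. */
typedef struct {
	const char	*name;	/* logical interface name */
	uint32_t	iaid;
} ex_iaid_ent_t;

static int
ex_iaid_in_use(const ex_iaid_ent_t *tab, size_t n, uint32_t iaid)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tab[i].iaid == iaid)
			return (1);
	}
	return (0);
}

uint32_t
ex_choose_iaid(const ex_iaid_ent_t *tab, size_t n, const char *ifname,
    uint32_t ifindex)
{
	size_t i;
	uint32_t iaid;

	assert(ifname != NULL);

	/* A value already saved for this interface always wins. */
	for (i = 0; i < n; i++) {
		if (strcmp(tab[i].name, ifname) == 0)
			return (tab[i].iaid);
	}

	/* Otherwise default to ifIndex, unless a saved entry has it. */
	if (!ex_iaid_in_use(tab, n, ifindex))
		return (ifindex);

	/* Assumed fallback: take the lowest value not yet saved. */
	for (iaid = 1; ex_iaid_in_use(tab, n, iaid); iaid++)
		;
	return (iaid);
}
```

  Because every entry lives in one table, the in-use check is a
  single scan, which is the benefit of the single-file layout.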
    707 
    708   This mechanism depends on the interface name, and thus will need to
    709   be revisited when Clearview vanity naming and NWAM are available.
    710 
    711   Currently, the boot system (GRUB, OBP, the miniroot) does not
    712   support installing over IPv6.  This could change in the future, so
    713   one of the goals of the above stability plan is to support that
    714   event.
    715 
    716   When running in the miniroot on an x86 system, /etc/dhcp (and the
    717   rest of the root) is mounted on a read-only ramdisk.  In this case,
    718   writing to /etc/dhcp will just never work.  A possible solution
    719   would be to add a new privileged command in ifconfig that forces
    720   dhcpagent to write to an alternate location.  The initial install
    721   process could then do "ifconfig <x> dhcp write /a" to get the needed
    722   state written out to the newly-constructed system root.
    723 
    724   This part (the new write option) won't be implemented as part of
    725   this project, because it's not needed yet.
    726 
    727 
    728 Router Advertisements
    729 
    730   IPv6 Router Advertisements perform two functions related to DHCPv6:
    731 
    732     - they specify whether and how to run DHCPv6 on a given interface.
    733     - they provide a list of the valid prefixes on an interface.
    734 
    735   For the first function, in.ndpd needs to use the same DHCP control
    736   interfaces that ifconfig uses, so that it can launch dhcpagent and
    737   trigger DHCPv6 when necessary.  Note that it never needs to shut
    738   down DHCPv6, as router advertisements can't do that.
    739 
    740   However, launching dhcpagent presents new problems.  As a part of
    741   the "Quagga SMF Modifications" project (PSARC 2006/552), in.ndpd in
    742   Nevada is now privilege-aware and runs with limited privileges,
    743   courtesy of SMF.  Dhcpagent, on the other hand, must run with all
    744   privileges.
    745 
    746   A simple work-around for this issue is to rip out the "privileges="
    747   clause from the method_credential for in.ndpd.  I've taken this
    748   direction initially, but the right longer-term answer seems to be
    749   converting dhcpagent into an SMF service.  This is quite a bit more
    750   complex, as it means turning the /sbin/dhcpagent command line
    751   interface into a utility that manipulates the service and passes the
    752   command line options via IPC extensions.
    753 
    754   Such a design also raises the question of whether dhcpagent itself
    755   ought to run with reduced privileges.  It could, but it still needs
    756   the ability to grant "all" (traditional UNIX root) privileges to the
    757   eventhook script, if present.  There seem to be few ways to do this,
    758   though it's a good area for research.
    759 
    760   The second function, prefix handling, is also subtle.  Unlike IPv4
    761   DHCP, DHCPv6 does not give the netmask or prefix length along with
    762   the leased address.  The client is on its own to determine the right
    763   netmask to use.  This is where the advertised prefixes come in:
    764   these must be used to finish the interface configuration.
    765 
    766   We will have the DHCPv6 client configure each interface with an
    767   all-ones (/128) netmask by default.  In.ndpd will be modified so
    768   that when it detects a new IFF_DHCPRUNNING IP logical interface, it
    769   checks for a known matching prefix, and sets the netmask as
    770   necessary.  If no matching prefix is known, it will send a new
    771   Router Solicitation message to try to find one.
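
  The prefix check that in.ndpd performs can be sketched as follows.
  The types and names are invented for this example, addresses are
  plain 16-byte arrays rather than struct in6_addr, and picking the
  longest match when several prefixes cover the address is an
  assumption.  A result of -1 stands for "no matching prefix known",
  the case in which the interface stays at /128 and a Router
  Solicitation is sent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
	uint8_t	addr[16];	/* prefix bits, network byte order */
	int	len;		/* prefix length in bits, 0..128 */
} ex_prefix_t;

/* Return nonzero if the first p->len bits of addr match the prefix. */
static int
ex_prefix_match(const uint8_t addr[16], const ex_prefix_t *p)
{
	int full = p->len / 8;
	int rem = p->len % 8;
	int i;

	for (i = 0; i < full; i++) {
		if (addr[i] != p->addr[i])
			return (0);
	}
	if (rem != 0) {
		uint8_t mask = (uint8_t)(0xff << (8 - rem));

		if ((addr[full] & mask) != (p->addr[full] & mask))
			return (0);
	}
	return (1);
}

/* Longest advertised prefix length covering addr, or -1 if none. */
int
ex_pick_prefixlen(const uint8_t addr[16], const ex_prefix_t *tab, size_t n)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(tab[i].len >= 0 && tab[i].len <= 128);
		if (tab[i].len > best && ex_prefix_match(addr, &tab[i]))
			best = tab[i].len;
	}
	return (best);
}
```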
    772 
    773   When in.ndpd learns of a new prefix from a Router Advertisement, it
    774   will scan all of the IFF_DHCPRUNNING IP logical interfaces on the
    775   same physical interface and set the netmasks when necessary.
    776   Dhcpagent, for its part, will ignore the netmask on IPv6 interfaces
    777   when checking for changes that would require it to "abandon" the
    778   interface.
    779 
    780   Given the way that DHCPv6 and in.ndpd control both the horizontal
    781   and the vertical in plumbing and removing logical interfaces, and
    782   users do not, it might be worthwhile to consider roping off any
    783   direct user changes to IPv6 logical interfaces under control of
    784   in.ndpd or dhcpagent, and instead force users through a higher-level
    785   interface.  This won't be done as part of this project, however.
    786 
    787 
    788 ARP Hardware Types
    789 
    790   There are multiple places within the DHCPv6 client where the mapping
    791   of DLPI MAC type to ARP Hardware Type is required:
    792 
    793   - When we are constructing an automatic, stable DUID for our own
    794     identity, we prefer to use a DUID-LLT if possible.  This is done
    795     by finding a link-layer interface, opening it, reading the MAC
    796     address and type, and translating in the make_stable_duid()
    797     function in libdhcpagent.
    798 
    799   - When we translate a user-configured DUID from
    800     /etc/default/dhcpagent into a binary representation, we may have
    801     to deal with a physical interface name.  In this case, we must
    802     open that interface and read the MAC address and type.
    803 
    804   - As part of the PIF data structure initialization, we need to read
    805     out the MAC type so that it can be used in the BOOTP/DHCPv4
    806     'htype' field.
    807 
    808   Ideally, these would all be provided by a single libdlpi
    809   implementation.  However, that project is on-going at this time and
    810   has not yet integrated.  For the time being, a dlpi_to_arp()
    811   translation function (taking dl_mac_type and returning an ARP
    812   Hardware Type number) will be placed in libdhcputil.
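
  As a rough idea of what such a translation looks like, here is a
  sketch covering a handful of types.  It is not the libdhcputil
  source.  The EX_* constants are local stand-ins: the DLPI values
  are as recalled from <sys/dlpi.h> and the ARP values from the IANA
  hardware type registry, so both should be checked against the real
  headers.  Returning 0 for an unknown type is also an assumption.

```c
#include <assert.h>

/* Stand-ins for DL_* MAC types (verify against <sys/dlpi.h>). */
#define	EX_DL_CSMACD	0x00
#define	EX_DL_TPR	0x02
#define	EX_DL_ETHER	0x04
#define	EX_DL_FC	0x10
#define	EX_DL_IB	0x1a

/* Stand-ins for ARPHRD_* values (IANA hardware type numbers). */
#define	EX_ARPHRD_ETHER		1
#define	EX_ARPHRD_IEEE802	6
#define	EX_ARPHRD_FC		18
#define	EX_ARPHRD_IB		32

/* Map a DLPI MAC type to an ARP Hardware Type; 0 means unknown. */
int
ex_dlpi_to_arp(unsigned int dlpi_type)
{
	switch (dlpi_type) {
	case EX_DL_ETHER:
		return (EX_ARPHRD_ETHER);
	case EX_DL_CSMACD:
	case EX_DL_TPR:
		return (EX_ARPHRD_IEEE802);
	case EX_DL_FC:
		return (EX_ARPHRD_FC);
	case EX_DL_IB:
		return (EX_ARPHRD_IB);
	default:
		return (0);
	}
}
```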
    813 
    814   This temporary function should be removed and this section of the
    815   code updated when the new libdlpi from Clearview integrates.
    816 
    817 
    818 Field Mappings
    819 
    820   Old (all in ifslist)	New
    821   next			dhcp_smach_t.dsm_next
    822   prev			dhcp_smach_t.dsm_prev
    823   if_hold_count		dhcp_smach_t.dsm_hold_count
    824   if_ia			dhcp_smach_t.dsm_ia
    825   if_async		dhcp_smach_t.dsm_async
    826   if_state		dhcp_smach_t.dsm_state
    827   if_dflags		dhcp_smach_t.dsm_dflags
    828   if_name		dhcp_smach_t.dsm_name (see text)
    829   if_index		dhcp_pif_t.pif_index
    830   if_max		dhcp_lif_t.lif_max and dhcp_pif_t.pif_max
    831   if_min		(was unused; removed)
    832   if_opt		(was unused; removed)
    833   if_hwaddr		dhcp_pif_t.pif_hwaddr
    834   if_hwlen		dhcp_pif_t.pif_hwlen
    835   if_hwtype		dhcp_pif_t.pif_hwtype
    836   if_cid		dhcp_smach_t.dsm_cid
    837   if_cidlen		dhcp_smach_t.dsm_cidlen
    838   if_prl		dhcp_smach_t.dsm_prl
    839   if_prllen		dhcp_smach_t.dsm_prllen
    840   if_daddr		dhcp_pif_t.pif_daddr
    841   if_dlen		dhcp_pif_t.pif_dlen
    842   if_saplen		dhcp_pif_t.pif_saplen
    843   if_sap_before		dhcp_pif_t.pif_sap_before
    844   if_dlpi_fd		dhcp_pif_t.pif_dlpi_fd
    845   if_sock_fd		v4_sock_fd and v6_sock_fd (globals)
    846   if_sock_ip_fd		dhcp_lif_t.lif_sock_ip_fd
    847   if_timer		(see text)
    848   if_t1			dhcp_lease_t.dl_t1
    849   if_t2			dhcp_lease_t.dl_t2
    850   if_lease		dhcp_lif_t.lif_expire
    851   if_nrouters		dhcp_smach_t.dsm_nrouters
    852   if_routers		dhcp_smach_t.dsm_routers
    853   if_server		dhcp_smach_t.dsm_server
    854   if_addr		dhcp_lif_t.lif_v6addr
    855   if_netmask		dhcp_lif_t.lif_v6mask
    856   if_broadcast		dhcp_lif_t.lif_v6peer
    857   if_ack		dhcp_smach_t.dsm_ack
    858   if_orig_ack		dhcp_smach_t.dsm_orig_ack
    859   if_offer_wait		dhcp_smach_t.dsm_offer_wait
    860   if_offer_timer	dhcp_smach_t.dsm_offer_timer
    861   if_offer_id		dhcp_pif_t.pif_dlpi_id
    862   if_acknak_id		dhcp_lif_t.lif_acknak_id
    863   if_acknak_bcast_id	v4_acknak_bcast_id (global)
    864   if_neg_monosec	dhcp_smach_t.dsm_neg_monosec
    865   if_newstart_monosec	dhcp_smach_t.dsm_newstart_monosec
    866   if_curstart_monosec	dhcp_smach_t.dsm_curstart_monosec
    867   if_disc_secs		dhcp_smach_t.dsm_disc_secs
    868   if_reqhost		dhcp_smach_t.dsm_reqhost
    869   if_recv_pkt_list	dhcp_smach_t.dsm_recv_pkt_list
    870   if_sent		dhcp_smach_t.dsm_sent
    871   if_received		dhcp_smach_t.dsm_received
    872   if_bad_offers		dhcp_smach_t.dsm_bad_offers
    873   if_send_pkt		dhcp_smach_t.dsm_send_pkt
    874   if_send_timeout	dhcp_smach_t.dsm_send_timeout
    875   if_send_dest		dhcp_smach_t.dsm_send_dest
    876   if_send_stop_func	dhcp_smach_t.dsm_send_stop_func
    877   if_packet_sent	dhcp_smach_t.dsm_packet_sent
    878   if_retrans_timer	dhcp_smach_t.dsm_retrans_timer
    879   if_script_fd		dhcp_smach_t.dsm_script_fd
    880   if_script_pid		dhcp_smach_t.dsm_script_pid
    881   if_script_helper_pid	dhcp_smach_t.dsm_script_helper_pid
    882   if_script_event	dhcp_smach_t.dsm_script_event
    883   if_script_event_id	dhcp_smach_t.dsm_script_event_id
    884   if_callback_msg	dhcp_smach_t.dsm_callback_msg
    885   if_script_callback	dhcp_smach_t.dsm_script_callback
    886 
    887   Notes:
    888 
    889     - The dsm_name field currently just points to the lif_name on the
    890       controlling LIF.  This may need to be named differently in the
    891       future; perhaps when Zones are supported.
    892 
    893     - The timer mechanism will be refactored.  Rather than using the
    894       separate if_timer[] array to hold the timer IDs and
    895       if_{t1,t2,lease} to hold the relative timer values, we will
    896       gather this information into a dhcp_timer_t structure:
    897 
    898 	dt_id		timer ID value
    899 	dt_start	relative start time
    900 
    901   New fields not accounted for above:
    902 
    903   dhcp_pif_t.pif_next		linkage in global list of PIFs
    904   dhcp_pif_t.pif_prev		linkage in global list of PIFs
    905   dhcp_pif_t.pif_lifs		pointer to list of LIFs on this PIF
    906   dhcp_pif_t.pif_isv6		IPv6 flag
    907   dhcp_pif_t.pif_dlpi_count	number of state machines using DLPI
    908   dhcp_pif_t.pif_hold_count	reference count
    909   dhcp_pif_t.pif_name		name of physical interface
    910   dhcp_lif_t.lif_next		linkage in per-PIF list of LIFs
    911   dhcp_lif_t.lif_prev		linkage in per-PIF list of LIFs
    912   dhcp_lif_t.lif_pif		backpointer to parent PIF
    913   dhcp_lif_t.lif_smachs		pointer to list of state machines
    914   dhcp_lif_t.lif_lease		backpointer to lease holding LIF
    915   dhcp_lif_t.lif_flags		interface flags (IFF_*)
    916   dhcp_lif_t.lif_hold_count	reference count
    917   dhcp_lif_t.lif_dad_wait	waiting for DAD resolution flag
    918   dhcp_lif_t.lif_removed	removed from list flag
    919   dhcp_lif_t.lif_plumbed	plumbed by dhcpagent flag
    920   dhcp_lif_t.lif_expired	lease has expired flag
    921   dhcp_lif_t.lif_declined	reason to refuse this address (string)
    922   dhcp_lif_t.lif_iaid		unique and stable 32-bit identifier
    923   dhcp_lif_t.lif_iaid_id	timer for delayed /etc writes
    924   dhcp_lif_t.lif_preferred	preferred timer for v6; deprecate after
    925   dhcp_lif_t.lif_name		name of logical interface
    926   dhcp_smach_t.dsm_lif		controlling (main) LIF
    927   dhcp_smach_t.dsm_leases	pointer to list of leases
    928   dhcp_smach_t.dsm_lif_wait	number of LIFs waiting on DAD
    929   dhcp_smach_t.dsm_lif_down	number of LIFs that have failed
    930   dhcp_smach_t.dsm_using_dlpi	currently using DLPI flag
    931   dhcp_smach_t.dsm_send_tcenter	v4 central timer value; v6 MRT
    932   dhcp_lease_t.dl_next		linkage in per-state-machine list of leases
    933   dhcp_lease_t.dl_prev		linkage in per-state-machine list of leases
    934   dhcp_lease_t.dl_smach		back pointer to state machine
    935   dhcp_lease_t.dl_lifs		pointer to first LIF configured by lease
    936   dhcp_lease_t.dl_nlifs		number of configured consecutive LIFs
    937   dhcp_lease_t.dl_hold_count	reference counter
    938   dhcp_lease_t.dl_removed	removed from list flag
    939   dhcp_lease_t.dl_stale		lease was not updated by Renew/Rebind
    940 
    941 
    942 Snoop
    943 
    944   The snoop changes are fairly straightforward.  As snoop just decodes
    945   the messages, and the message format is quite different between
    946   DHCPv4 and DHCPv6, a new module will be created to handle DHCPv6
    947   decoding, and will export an interpret_dhcpv6() function.
    948 
    949   The one bit of commonality between the two protocols is the use of
    950   ARP Hardware Type numbers, which are found in the underlying BOOTP
    951   message format for DHCPv4 and in the DUID-LL and DUID-LLT
    952   construction for DHCPv6.  To simplify this, the existing static
    953   show_htype() function in snoop_dhcp.c will be renamed to arp_htype()
    954   (to better reflect its functionality), updated with more modern
    955   hardware types, moved to snoop_arp.c (where it belongs), and made a
    956   public symbol within snoop.
    957 
    958   While I'm there, I'll update snoop_arp.c so that when it prints an
    959   ARP message in verbose mode, it uses arp_htype() to translate the
    960   ar_hrd value.
    961 
    962   The snoop updates also involve the addition of a new "dhcp6" keyword
    963   for filtering.  As a part of this, CR 6487534 will be fixed.
    964 
    965 
    966 IPv6 Source Address Selection
    967 
    968   One of the customer requests for DHCPv6 is to be able to predict the
    969   address selection behavior in the presence of both stateful and
    970   stateless addresses on the same network.
    971 
    972   Solaris implements RFC 3484 address selection behavior.  In this
    973   scheme, the first seven rules implement some basic preferences for
    974   addresses, with Rule 8 being a deterministic tie breaker.
    975 
    976   Rule 8 relies on a special function, CommonPrefixLen, defined in the
    977   RFC, that compares leading bits of the address without regard to
    978   configured prefix length.  As Rule 1 eliminates equal addresses,
    979   this always picks a single address.
    980 
    981   This rule, though, allows for additional checks:
    982 
    983    Rule 8 may be superseded if the implementation has other means of
    984    choosing among source addresses.  For example, if the implementation
    985    somehow knows which source address will result in the "best"
    986    communications performance.
    987 
    988   We will thus split Rule 8 into three separate rules:
    989 
    990   - First, compare on configured prefix.  The interface with the
    991     longest configured prefix length that also matches the candidate
    992     address will be preferred.
    993 
    994   - Next, check the type of address.  Prefer statically configured
    995     addresses above all others.  Next, those from DHCPv6.  Next,
    996     stateless autoconfigured addresses.  Finally, temporary addresses.
    997     (Note that Rule 7 will take care of temporary address preferences,
    998     so that this rule doesn't actually need to look at them.)
    999 
   1000   - Finally, run the check-all-bits (CommonPrefixLen) tie breaker.
   1001 
   1002   The result of this is that if there's a local address in the same
   1003   configured prefix, then we'll prefer that over other addresses.  If
   1004   there are multiple to choose from, then we'll pick static first, then
   1005   DHCPv6, then dynamic.  Finally, if there are still multiples, we'll
   1006   use the "closest" address, bitwise.
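
  The three-way split can be sketched as a comparison between two
  candidate source addresses.  This is an illustration only, not the
  ip module code.  All names are invented, and the first step follows
  one reading of the text: a candidate "matches" when the destination
  falls inside its configured prefix.

```c
#include <assert.h>
#include <stdint.h>

/* Lower value is preferred, in the order given in the text. */
typedef enum {
	EX_STATIC, EX_DHCPV6, EX_STATELESS, EX_TEMPORARY
} ex_atype_t;

typedef struct {
	uint8_t		addr[16];
	int		plen;	/* configured prefix length, 0..128 */
	ex_atype_t	type;
} ex_src_t;

/* CommonPrefixLen (RFC 3484): number of equal leading bits. */
int
ex_common_prefix_len(const uint8_t a[16], const uint8_t b[16])
{
	int i, bits = 0;

	for (i = 0; i < 16; i++) {
		uint8_t x = a[i] ^ b[i];

		if (x == 0) {
			bits += 8;
			continue;
		}
		while ((x & 0x80) == 0) {
			bits++;
			x <<= 1;
		}
		break;
	}
	return (bits);
}

/* Configured prefix length if it covers dst, else -1. */
static int
ex_prefix_score(const ex_src_t *s, const uint8_t dst[16])
{
	assert(s->plen >= 0 && s->plen <= 128);
	return (ex_common_prefix_len(s->addr, dst) >= s->plen ?
	    s->plen : -1);
}

/* Return whichever of a and b is preferred as a source for dst. */
const ex_src_t *
ex_prefer(const ex_src_t *a, const ex_src_t *b, const uint8_t dst[16])
{
	int sa = ex_prefix_score(a, dst);
	int sb = ex_prefix_score(b, dst);

	if (sa != sb)				/* configured prefix */
		return (sa > sb ? a : b);
	if (a->type != b->type)			/* address type */
		return (a->type < b->type ? a : b);
	return (ex_common_prefix_len(a->addr, dst) >=
	    ex_common_prefix_len(b->addr, dst) ? a : b);
}
```

  Each step only runs when the previous one ties, which mirrors the
  way the RFC's rules are applied in order.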
   1007 
   1008   This basic implementation scheme also addresses CR 6485164, so
   1009   a fix for that will be included with this project.
   1010 
   1011 
   1012 Minor Improvements
   1013 
   1014   Various small problems with the system encountered during
   1015   development will be fixed along with this project.  Some of these
   1016   are:
   1017 
   1018   - List of ARPHRD_* types is a bit short; add some new ones.
   1019 
   1020   - List of IPPORT_* values is similarly sparse; add others in use by
   1021     snoop.
   1022 
   1023   - dhcpmsg.h lacks PRINTFLIKE for dhcpmsg(); add it.
   1024 
   1025   - CR 6482163 causes excessive lint errors with libxnet; will fix.
   1026 
   1027   - libdhcpagent uses gettimeofday() for I/O timing, and this can
   1028     drift on systems with NTP.  It should use a stable time source
   1029     (gethrtime()) instead, and should return better error values.
   1030 
   1031   - Controlling debug mode in the daemon shouldn't require changing
   1032     the command line arguments or jumping through special hoops.  I've
   1033     added undocumented ".DEBUG_LEVEL=[0-3]" and ".VERBOSE=[01]"
   1034     features to /etc/default/dhcpagent.
   1035 
   1036   - The various attributes of the IPC commands (requires privileges,
   1037     creates a new session, valid with BOOTP, immediate reply) should
   1038     be gathered together into one look-up table rather than scattered
   1039     as hard-coded tests.
   1040 
   1041   - Remove the event unregistration from the command dispatch loop and
   1042     get rid of the ipc_action_pending() botch.  We'll get a
   1043     zero-length read any time the client goes away, and that will be
   1044     enough to trigger termination.  This fix removes async_pending()
   1045     and async_timeout() as well, and fixes CR 6487958 as a
   1046     side-effect.
   1047 
   1048   - Throughout the dhcpagent code, there are private implementations
   1049     of doubly-linked and singly-linked lists for each data type.
   1050     These will all be removed and replaced with insque(3C) and
   1051     remque(3C).
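
  For the last item, a minimal sketch of a list built on insque(3C)
  and remque(3C) is shown here.  The node type and wrapper names are
  made up for the example.  The one hard requirement is that the
  forward and backward pointers be the first two members of the
  structure.  Passing a null predecessor to start a linear list
  follows the POSIX description of insque().

```c
#ifndef _XOPEN_SOURCE
#define	_XOPEN_SOURCE	700	/* expose insque/remque on strict compilers */
#endif
#include <assert.h>
#include <search.h>
#include <stddef.h>

/* The two link pointers must come first for insque/remque. */
typedef struct ex_node {
	struct ex_node	*next;
	struct ex_node	*prev;
	int		value;
} ex_node_t;

/* Insert elem after pred; a NULL pred starts a new linear list. */
void
ex_list_insert(ex_node_t *elem, ex_node_t *pred)
{
	assert(elem != NULL);
	insque(elem, pred);
}

/* Unlink elem from whatever list it is on. */
void
ex_list_remove(ex_node_t *elem)
{
	assert(elem != NULL);
	remque(elem);
}

/* Count the nodes from head to the end of the list. */
size_t
ex_list_len(const ex_node_t *head)
{
	size_t n = 0;

	for (; head != NULL; head = head->next)
		n++;
	return (n);
}
```

  Since the link layout is shared, one pair of helpers can serve
  every structure that starts with the same two pointers.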
   1052 
   1053 
   1054 Testing
   1055 
   1056   The implementation was tested using the TAHI test suite for DHCPv6
   1057   (www.tahi.org).  There are some peculiar aspects to this test suite,
   1058   and these issues directed some of the design.  In particular:
   1059 
   1060   - If Renew/Rebind doesn't mention one of our leases, then we need to
   1061     allow the message to be retransmitted.  Real servers are unlikely
   1062     to do this.
   1063 
   1064   - We must look for a status code within IAADDR and within IA_NA, and
   1065     handle the paradoxical case of "NoAddrsAvail."  That doesn't make
   1066     sense, as a server with no addresses wouldn't use those options.
   1067     That option makes more sense at the top level of the message.
   1068 
   1069   - If we get "UseMulticast" when we were already using multicast,
   1070     then ignore the error code.  Sending another request would cause a
   1071     loop.
   1072 
   1073   - TAHI uses "NoBinding" at the top level of the message.  This
   1074     status code only makes sense within an IA, as it refers to the
   1075     DUID:IAID binding, which doesn't exist outside an IA.  We must
   1076     ignore such errors -- treat them as success.
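
  The two top-level status code quirks can be captured in a small
  classifier, sketched below.  The names are hypothetical.  The
  numeric codes are those assigned in RFC 3315.  How the daemon
  treats other top-level codes is not stated here, so this sketch
  simply rejects them.

```c
#include <assert.h>
#include <stdint.h>

/* Status codes from RFC 3315, under local names. */
#define	EX_ST_SUCCESS		0
#define	EX_ST_NOBINDING		3
#define	EX_ST_USEMULTICAST	5

typedef enum {
	EX_ACCEPT,		/* treat the reply as successful */
	EX_RESEND_MULTICAST,	/* retry the request via multicast */
	EX_REJECT		/* treat the reply as a failure */
} ex_action_t;

/*
 * Decide what to do with a status code found at the top level of a
 * reply.  using_multicast must be 0 or 1 and says how the request
 * that drew this reply was sent.
 */
ex_action_t
ex_top_level_status(uint16_t code, int using_multicast)
{
	assert(using_multicast == 0 || using_multicast == 1);

	switch (code) {
	case EX_ST_SUCCESS:
		return (EX_ACCEPT);
	case EX_ST_NOBINDING:
		/* Only meaningful inside an IA; ignore it up here. */
		return (EX_ACCEPT);
	case EX_ST_USEMULTICAST:
		/* Already multicasting: resending would just loop. */
		return (using_multicast ? EX_ACCEPT : EX_RESEND_MULTICAST);
	default:
		return (EX_REJECT);
	}
}
```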
   1077 
   1078 
   1079 Interactions With Other Projects
   1080 
   1081   Clearview UV (vanity naming) will cause link names, and thus IP
   1082   interface names, to become changeable over time.  This will break
   1083   the IAID stability mechanism if UV is used for arbitrary renaming,
   1084   rather than as just a DR enhancement.
   1085 
   1086   When this portion of Clearview integrates, this part of the DHCPv6
   1087   design may need to be revisited.  (The solution will likely be
   1088   handled at some higher layer, such as within Network Automagic.)
   1089 
   1090   Clearview is also contributing a new libdlpi that will work for
   1091   dhcpagent, and is thus removing the private dlpi_io.[ch] functions
   1092   from this daemon.  When that Clearview project integrates, the
   1093   DHCPv6 project will need to adjust to the new interfaces, and remove
   1094   or relocate the dlpi_to_arp() function.
   1095 
   1096 
   1097 Futures
   1098 
   1099   Zones currently cannot address any IP interfaces by way of DHCP.
   1100   This project will not fix that problem, but the DUID/IAID could be
   1101   used to help fix it in the future.
   1102 
   1103   In particular, the DUID allows the client to obtain separate sets of
   1104   addresses and configuration parameters on a single interface, just
   1105   like an IPv4 Client ID, but it includes a clean mechanism for vendor
   1106   extensions.  If we associate the DUID with the zone identifier or
   1107   name through an extension, then we have a really simple way of
   1108   allocating per-zone addresses.
   1109 
   1110   Moreover, RFC 4361 describes a handy way of using DHCPv6 DUID/IAID
   1111   values with IPv4 DHCP, which would quickly solve the problem of
   1112   using DHCP for IPv4 address assignment in non-global zones as well.
   1113 
   1114   (One potential risk with this plan is that there may be server
   1115   implementations that either do not implement the RFC correctly or
   1116   otherwise mishandle the DUID.  This has apparently bitten some early
   1117   adopters.)
   1118 
   1119   Implementing the FQDN option for DHCPv6 would, given the current
   1120   libdhcputil design, require a new 'type' of entry for the inittab6
   1121   file.  This is because the design does not allow for any simple
   1122   means to "compose" a sequence of basic types together.  Thus,
   1123   every type of option must either be a basic type, or an array of
   1124   multiple instances of the same basic type.
   1125 
   1126   If we implement FQDN in the future, it may be useful to explore some
   1127   means of allowing a given option instance to be a sequence of basic
   1128   types.
   1129 
   1130   This project does not make the DNS resolver or any other subsystem
   1131   use the data gathered by DHCPv6.  It just makes the data available
   1132   through dhcpinfo(1).  Future projects should modify those services
   1133   to use configuration data learned via DHCPv6.  (One of the reasons
   1134   this is not being done now is that Network Automagic [NWAM] will
   1135   likely be changing this area substantially in the very near future,
   1136   and thus the effort would be largely wasted.)
   1137 
   1138 
   1139 Appendix A - Choice of Venue
   1140 
   1141   There are three logical places to implement DHCPv6:
   1142 
   1143     - in dhcpagent
   1144     - in in.ndpd
   1145     - in a new daemon (say, 'dhcp6agent')
   1146 
   1147   We need to access parameters via dhcpinfo, and should provide the
   1148   same set of status and control features via ifconfig as are present
   1149   for IPv4.  (For the latter, if we fail to do that, it will likely
   1150   confuse users.  The expense for doing it is comparatively small, and
   1151   it will be useful for testing, even though it should not be needed
   1152   in normal operation.)
   1153 
   1154   If we implement somewhere other than dhcpagent, then we need to give
   1155   that new daemon (in.ndpd or dhcp6agent) the same basic IPC features
   1156   as dhcpagent already has.  This means either extracting those bits
   1157   (async.c and ipc_action.c) into a shared library or just copying
   1158   them.  Obviously, the former would be preferred, but as those bits
   1159   depend on the rest of the dhcpagent infrastructure for timers and
   1160   state handling, this means that the new process would have to look a
   1161   lot like dhcpagent.
   1162 
   1163   Implementing DHCPv6 as part of in.ndpd is attractive, as it
   1164   eliminates the confusion that the router discovery process for
   1165   determining interface netmasks can cause, along with the need to do
   1166   any signaling at all to bring DHCPv6 up.  However, the need to make
   1167   in.ndpd more like dhcpagent is unattractive.
   1168 
   1169   Having a new dhcp6agent daemon seems to have little to recommend it,
   1170   other than leaving the existing dhcpagent code untouched.  If we do
   1171   that, then we end up with two implementations that do many similar
   1172   things, and must be maintained in parallel.
   1173 
   1174   Thus, although it leads to some complexity in reworking the data
   1175   structures to fit both protocols, on balance the simplest solution
   1176   is to extend dhcpagent.
   1177 
   1178 
   1179 Appendix B - Cross-Reference
   1180 
   1181   in.ndpd
   1182 
   1183     - Start dhcpagent and issue "dhcp start" command via libdhcpagent
   1184     - Parse StatefulAddrConf interface option from ndpd.conf
   1185     - Watch for M and O bits to trigger DHCPv6
   1186     - Handle "no routers found" case and start DHCPv6
   1187     - Track prefixes and set prefix length on IFF_DHCPRUNNING aliases
   1188     - Send new Router Solicitation when prefix unknown
   1189     - Change privileges so that dhcpagent can be launched successfully
   1190 
   1191   libdhcputil
   1192 
   1193     - Parse new /etc/dhcp/inittab6 file
   1194     - Handle new UNUMBER24, SNUMBER64, IPV6, DUID and DOMAIN types
   1195     - Add DHCPv6 option iterators (dhcpv6_find_option and
   1196       dhcpv6_pkt_option)
   1197     - Add dlpi_to_arp function (temporary)
   1198 
   1199   libdhcpagent
   1200 
   1201     - Add stable DUID and IAID creation and storage support
   1202       functions and add new dhcp_stable.h include file
   1203     - Support new DECLINING and RELEASING states introduced by DHCPv6.
   1204     - Update implementation so that it doesn't rely on gettimeofday()
   1205       for I/O timeouts
   1206     - Extend the hostconf functions to support DHCPv6, using a new
   1207       ".dh6" file
   1208 
   1209   snoop
   1210 
   1211     - Add support for DHCPv6 packet decoding (all types)
   1212     - Add "dhcp6" filter keyword
   1213     - Fix known bugs in DHCP filtering
   1214 
   1215   ifconfig
   1216 
   1217     - Remove inet-only restriction on "dhcp" keyword
   1218 
   1219   netstat
   1220 
   1221     - Remove strange "-I list" feature.
   1222     - Add support for DHCPv6 and iterating over IPv6 interfaces.
   1223 
   1224   ip
   1225 
   1226     - Add extensions to IPv6 source address selection to prefer DHCPv6
   1227       addresses when all else is equal
   1228     - Fix known bugs in source address selection (remaining from TX
   1229       integration)
   1230 
   1231   other
   1232 
   1233     - Add ifindex and source/destination address into PKT_LIST.
   1234     - Add more ARPHRD_* and IPPORT_* values.
   1235