- 1. Control Interfaces
- The interfaces for receiving network packet timestamps are:
- * SO_TIMESTAMP
- Generates a timestamp for each incoming packet in (not necessarily
- monotonic) system time. Reports the timestamp via recvmsg() in a
- control message as struct timeval (usec resolution).
- * SO_TIMESTAMPNS
- Same timestamping mechanism as SO_TIMESTAMP, but reports the
- timestamp as struct timespec (nsec resolution).
- * IP_MULTICAST_LOOP + SO_TIMESTAMP[NS]
- Only for multicast: approximate transmit timestamp obtained by
- reading the looped packet receive timestamp.
- * SO_TIMESTAMPING
- Generates timestamps on reception, transmission or both. Supports
- multiple timestamp sources, including hardware. Supports generating
- timestamps for stream sockets.
- 1.1 SO_TIMESTAMP:
- This socket option enables timestamping of datagrams on the reception
- path. Because the destination socket, if any, is not known early in
- the network stack, the feature has to be enabled for all packets. The
- same is true for all early receive timestamp options.
- For interface details, see `man 7 socket`.
- 1.2 SO_TIMESTAMPNS:
- This option is identical to SO_TIMESTAMP except for the returned data type.
- Its struct timespec allows for higher resolution (ns) timestamps than the
- timeval of SO_TIMESTAMP (us).
- 1.3 SO_TIMESTAMPING:
- Supports multiple types of timestamp requests. As a result, this
- socket option takes a bitmap of flags, not a boolean. In
- err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val,
- sizeof(val));
- val is an integer with any of the following bits set. Setting other
- bits returns EINVAL and does not change the current state.
- The socket option configures timestamp generation for individual
- sk_buffs (1.3.1), timestamp reporting to the socket's error
- queue (1.3.2) and options (1.3.3). Timestamp generation can also
- be enabled for individual sendmsg calls using cmsg (1.3.4).
- 1.3.1 Timestamp Generation
- Some bits are requests to the stack to try to generate timestamps. Any
- combination of them is valid. Changes to these bits apply to newly
- created packets, not to packets already in the stack. As a result, it
- is possible to selectively request timestamps for a subset of packets
- (e.g., for sampling) by embedding a send() call within two setsockopt
- calls, one to enable timestamp generation and one to disable it.
- Timestamps may also be generated for reasons other than being
- requested by a particular socket, such as when receive timestamping is
- enabled system wide, as explained earlier.
- SOF_TIMESTAMPING_RX_HARDWARE:
- Request rx timestamps generated by the network adapter.
- SOF_TIMESTAMPING_RX_SOFTWARE:
- Request rx timestamps when data enters the kernel. These timestamps
- are generated just after a device driver hands a packet to the
- kernel receive stack.
- SOF_TIMESTAMPING_TX_HARDWARE:
- Request tx timestamps generated by the network adapter. This flag
- can be enabled via both socket options and control messages.
- SOF_TIMESTAMPING_TX_SOFTWARE:
- Request tx timestamps when data leaves the kernel. These timestamps
- are generated in the device driver as close as possible, but always
- prior to, passing the packet to the network interface. Hence, they
- require driver support and may not be available for all devices.
- This flag can be enabled via both socket options and control messages.
- SOF_TIMESTAMPING_TX_SCHED:
- Request tx timestamps prior to entering the packet scheduler. Kernel
- transmit latency is, if long, often dominated by queuing delay. The
- difference between this timestamp and one taken at
- SOF_TIMESTAMPING_TX_SOFTWARE will expose this latency independent
- of protocol processing. The latency incurred in protocol
- processing, if any, can be computed by subtracting a userspace
- timestamp taken immediately before send() from this timestamp. On
- machines with virtual devices where a transmitted packet travels
- through multiple devices and, hence, multiple packet schedulers,
- a timestamp is generated at each layer. This allows for fine
- grained measurement of queuing delay. This flag can be enabled
- via both socket options and control messages.
- SOF_TIMESTAMPING_TX_ACK:
- Request tx timestamps when all data in the send buffer has been
- acknowledged. This only makes sense for reliable protocols. It is
- currently only implemented for TCP. For that protocol, it may
- over-report measurement, because the timestamp is generated when all
- data up to and including the buffer at send() was acknowledged: the
- cumulative acknowledgment. The mechanism ignores SACK and FACK.
- This flag can be enabled via both socket options and control messages.
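- The latency arithmetic described for SOF_TIMESTAMPING_TX_SCHED can be
- sketched as below. The function names are hypothetical. The sketch
- assumes that the userspace sample is taken with
- clock_gettime(CLOCK_REALTIME), so that it is comparable with the
- software timestamps, which are reported in system time.

```c
#include <stdint.h>
#include <time.h>

/* Return a - b in nanoseconds; negative if a is earlier than b. */
int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
	return ((int64_t)a->tv_sec - (int64_t)b->tv_sec) * 1000000000LL +
	       ((int64_t)a->tv_nsec - (int64_t)b->tv_nsec);
}

/* Queuing delay: the SOF_TIMESTAMPING_TX_SOFTWARE timestamp minus the
 * SOF_TIMESTAMPING_TX_SCHED timestamp of the same packet.
 */
int64_t queuing_delay_ns(const struct timespec *tx_sched,
			 const struct timespec *tx_software)
{
	return timespec_diff_ns(tx_software, tx_sched);
}

/* Protocol processing delay: the SOF_TIMESTAMPING_TX_SCHED timestamp
 * minus a userspace sample taken immediately before send().
 */
int64_t protocol_delay_ns(const struct timespec *before_send,
			  const struct timespec *tx_sched)
{
	return timespec_diff_ns(tx_sched, before_send);
}
```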
- 1.3.2 Timestamp Reporting
- The other three bits control which timestamps will be reported in a
- generated control message. Changes to the bits take immediate
- effect at the timestamp reporting locations in the stack. Timestamps
- are only reported for packets that also have the relevant timestamp
- generation request set.
- SOF_TIMESTAMPING_SOFTWARE:
- Report any software timestamps when available.
- SOF_TIMESTAMPING_SYS_HARDWARE:
- This option is deprecated and ignored.
- SOF_TIMESTAMPING_RAW_HARDWARE:
- Report hardware timestamps as generated by
- SOF_TIMESTAMPING_TX_HARDWARE or SOF_TIMESTAMPING_RX_HARDWARE when
- available.
- 1.3.3 Timestamp Options
- The interface supports the following options:
- SOF_TIMESTAMPING_OPT_ID:
- Generate a unique identifier along with each packet. A process can
- have multiple concurrent timestamping requests outstanding. Packets
- can be reordered in the transmit path, for instance in the packet
- scheduler. In that case timestamps will be queued onto the error
- queue out of order from the original send() calls. Timestamp order
- or payload inspection alone is then not always enough to uniquely
- match timestamps to the original send() calls.
- This option associates each packet at send() with a unique
- identifier and returns that along with the timestamp. The identifier
- is derived from a per-socket u32 counter (that wraps). For datagram
- sockets, the counter increments with each sent packet. For stream
- sockets, it increments with every byte.
- The counter starts at zero. It is initialized the first time that
- the socket option is enabled. It is reset each time the option is
- enabled after having been disabled. Resetting the counter does not
- change the identifiers of existing packets in the system.
- This option is implemented only for transmit timestamps. There, the
- timestamp is always looped along with a struct sock_extended_err.
- The option modifies field ee_data to pass an id that is unique
- among all possibly concurrently outstanding timestamp requests for
- that socket.
- SOF_TIMESTAMPING_OPT_CMSG:
- Support recv() cmsg for all timestamped packets. Control messages
- are already supported unconditionally on all packets with receive
- timestamps and on IPv6 packets with transmit timestamp. This option
- extends them to IPv4 packets with transmit timestamp. One use case
- is to correlate packets with their egress device, by enabling socket
- option IP_PKTINFO simultaneously.
- SOF_TIMESTAMPING_OPT_TSONLY:
- Applies to transmit timestamps only. Makes the kernel return the
- timestamp as a cmsg alongside an empty packet, as opposed to
- alongside the original packet. This reduces the amount of memory
- charged to the socket's receive budget (SO_RCVBUF) and delivers
- the timestamp even if sysctl net.core.tstamp_allow_data is 0.
- This option disables SOF_TIMESTAMPING_OPT_CMSG.
- New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to
- disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate
- regardless of the setting of sysctl net.core.tstamp_allow_data.
- An exception is when a process needs additional cmsg data, for
- instance SOL_IP/IP_PKTINFO to detect the egress network interface.
- Then pass option SOF_TIMESTAMPING_OPT_CMSG. This option depends on
- having access to the contents of the original packet, so cannot be
- combined with SOF_TIMESTAMPING_OPT_TSONLY.
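- To illustrate how a process could predict the identifier that
- SOF_TIMESTAMPING_OPT_ID will return, here is a small userspace
- mirror of the per-socket counter. The type and function names are
- hypothetical. The sketch assumes that for stream sockets the
- reported identifier is the byte offset of the last byte of each
- write, that each tracked write has a non-zero length, and that
- tracking starts at the moment the option is enabled.

```c
#include <stdint.h>

/* Hypothetical userspace mirror of the per-socket u32 counter. */
struct tskey_tracker {
	uint32_t next;		/* counter value before the next send */
	int is_stream;		/* non-zero for stream sockets */
};

/* Call once per successful send of len bytes (len > 0 for streams).
 * Returns the identifier expected in ee_data for that send.
 * Unsigned arithmetic wraps modulo 2^32, like the kernel counter.
 */
uint32_t tskey_on_send(struct tskey_tracker *t, uint32_t len)
{
	uint32_t key;

	if (t->is_stream) {
		/* Assumption: id is the offset of the last byte written. */
		key = t->next + len - 1;
		t->next += len;
	} else {
		/* Datagram sockets: one increment per packet. */
		key = t->next;
		t->next += 1;
	}
	return key;
}
```

- A process would record the returned value next to each send() and
- compare it against ee_data when the timestamp arrives on the error
- queue.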
- 1.3.4 Enabling timestamps via control messages
- In addition to socket options, timestamp generation can be requested
- per write via cmsg, only for SOF_TIMESTAMPING_TX_* (see Section 1.3.1).
- Using this feature, applications can sample timestamps per sendmsg()
- without paying the overhead of enabling and disabling timestamps via
- setsockopt:
- struct msghdr *msg;
- struct cmsghdr *cmsg;
- ...
- cmsg = CMSG_FIRSTHDR(msg);
- cmsg->cmsg_level = SOL_SOCKET;
- cmsg->cmsg_type = SO_TIMESTAMPING;
- cmsg->cmsg_len = CMSG_LEN(sizeof(__u32));
- *((__u32 *) CMSG_DATA(cmsg)) = SOF_TIMESTAMPING_TX_SCHED |
- SOF_TIMESTAMPING_TX_SOFTWARE |
- SOF_TIMESTAMPING_TX_ACK;
- err = sendmsg(fd, msg, 0);
- The SOF_TIMESTAMPING_TX_* flags set via cmsg will override
- the SOF_TIMESTAMPING_TX_* flags set via setsockopt.
- Moreover, applications must still enable timestamp reporting via
- setsockopt to receive timestamps:
- __u32 val = SOF_TIMESTAMPING_SOFTWARE |
- SOF_TIMESTAMPING_OPT_ID /* or any other flag */;
- err = setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val,
- sizeof(val));
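- For completeness, a self-contained variant of the per-call request
- might look as follows. It is only a sketch: the union tstamp_ctrl
- type and the two function names are hypothetical, and the union is
- there to give the control buffer the alignment that the CMSG_*
- macros expect.

```c
#include <linux/net_tstamp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for one 32-bit SO_TIMESTAMPING cmsg. */
union tstamp_ctrl {
	char buf[CMSG_SPACE(sizeof(uint32_t))];
	struct cmsghdr align;
};

/* Point msg at ctrl and store tx_flags (SOF_TIMESTAMPING_TX_* bits)
 * in a single SOL_SOCKET/SO_TIMESTAMPING control message.
 */
void attach_tx_tstamp_flags(struct msghdr *msg, union tstamp_ctrl *ctrl,
			    uint32_t tx_flags)
{
	struct cmsghdr *cmsg;

	memset(ctrl, 0, sizeof(*ctrl));
	msg->msg_control = ctrl->buf;
	msg->msg_controllen = sizeof(ctrl->buf);

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SO_TIMESTAMPING;
	cmsg->cmsg_len = CMSG_LEN(sizeof(tx_flags));
	memcpy(CMSG_DATA(cmsg), &tx_flags, sizeof(tx_flags));
}

/* Send len bytes from data on fd, requesting the given transmit
 * timestamps for this call only. Returns the sendmsg() result.
 */
ssize_t send_with_tx_tstamp(int fd, const void *data, size_t len,
			    uint32_t tx_flags)
{
	union tstamp_ctrl ctrl;
	struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	attach_tx_tstamp_flags(&msg, &ctrl, tx_flags);

	return sendmsg(fd, &msg, 0);
}
```

- Using memcpy() instead of a cast on CMSG_DATA() avoids relying on the
- alignment of the payload pointer.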
- 1.4 Bytestream Timestamps
- The SO_TIMESTAMPING interface supports timestamping of bytes in a
- bytestream. Each request is interpreted as a request for when the
- entire contents of the buffer has passed a timestamping point. That
- is, for streams option SOF_TIMESTAMPING_TX_SOFTWARE will record
- when all bytes have reached the device driver, regardless of how
- many packets the data has been converted into.
- In general, bytestreams have no natural delimiters and therefore
- correlating a timestamp with data is non-trivial. A range of bytes
- may be split across segments, and segments may be merged (possibly
- coalescing sections of previously segmented buffers associated with
- independent send() calls). Segments can be reordered and the same
- byte range can coexist in multiple segments for protocols that
- implement retransmissions.
- It is essential that all timestamps implement the same semantics,
- regardless of these possible transformations, as otherwise they are
- incomparable. Handling "rare" corner cases differently from the
- simple case (a 1:1 mapping from buffer to skb) is insufficient
- because performance debugging often needs to focus on such outliers.
- In practice, timestamps can be correlated with segments of a
- bytestream consistently, if both semantics of the timestamp and the
- timing of measurement are chosen correctly. This challenge is no
- different from deciding on a strategy for IP fragmentation. There, the
- definition is that only the first fragment is timestamped. For
- bytestreams, we chose that a timestamp is generated only when all
- bytes have passed a point. SOF_TIMESTAMPING_TX_ACK as defined is easy to
- implement and reason about. An implementation that has to take into
- account SACK would be more complex due to possible transmission holes
- and out of order arrival.
- On the host, TCP can also break the simple 1:1 mapping from buffer to
- skbuff as a result of Nagle, cork, autocork, segmentation and GSO. The
- implementation ensures correctness in all cases by tracking the
- individual last byte passed to send(), even if it is no longer the
- last byte after an skbuff extend or merge operation. It stores the
- relevant sequence number in skb_shinfo(skb)->tskey. Because an skbuff
- has only one such field, only one timestamp can be generated.
- In rare cases, a timestamp request can be missed if two requests are
- collapsed onto the same skb. A process can detect this situation by
- enabling SOF_TIMESTAMPING_OPT_ID and comparing the byte offset at
- send time with the value returned for each timestamp. It can prevent
- the situation by always flushing the TCP stack in between requests,
- for instance by enabling TCP_NODELAY and disabling TCP_CORK and
- autocork.
- These precautions ensure that the timestamp is generated only when all
- bytes have passed a timestamp point, assuming that the network stack
- itself does not reorder the segments. The stack indeed tries to avoid
- reordering. The one exception is under administrator control: it is
- possible to construct a packet scheduler configuration that delays
- segments from the same stream differently. Such a setup would be
- unusual.
- 2. Data Interfaces
- Timestamps are read using the ancillary data feature of recvmsg().
- See `man 3 cmsg` for details of this interface. The socket manual
- page (`man 7 socket`) describes how timestamps generated with
- SO_TIMESTAMP and SO_TIMESTAMPNS can be retrieved.
- 2.1 SCM_TIMESTAMPING records
- These timestamps are returned in a control message with cmsg_level
- SOL_SOCKET, cmsg_type SCM_TIMESTAMPING, and payload of type
- struct scm_timestamping {
- struct timespec ts[3];
- };
- The structure can return up to three timestamps. This is a legacy
- feature. Only one field is non-zero at any time. Most timestamps
- are passed in ts[0]. Hardware timestamps are passed in ts[2].
- ts[1] used to hold hardware timestamps converted to system time.
- Instead, expose the hardware clock device on the NIC directly as
- a HW PTP clock source, to allow time conversion in userspace and
- optionally synchronize system time with a userspace PTP stack such
- as linuxptp. For the PTP clock API, see Documentation/ptp/ptp.txt.
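- Given the layout above, selecting the timestamp that was actually
- filled in could be sketched like this. The function name
- scm_pick_timestamp() is hypothetical. The sketch prefers the
- hardware slot ts[2] when it is non-zero and never reads the
- deprecated ts[1].

```c
#include <stddef.h>
#include <sys/socket.h>
#include <time.h>
#include <linux/errqueue.h>

/* Hypothetical helper: return a pointer to the timestamp that is set
 * in tss, or NULL if both ts[0] and ts[2] are zero.
 * *is_hw is set to 1 for a hardware timestamp (ts[2]) and to 0 for a
 * software timestamp (ts[0]); it is left untouched on NULL.
 */
const struct timespec *scm_pick_timestamp(const struct scm_timestamping *tss,
					  int *is_hw)
{
	if (tss->ts[2].tv_sec || tss->ts[2].tv_nsec) {
		*is_hw = 1;
		return &tss->ts[2];
	}
	if (tss->ts[0].tv_sec || tss->ts[0].tv_nsec) {
		*is_hw = 0;
		return &tss->ts[0];
	}
	return NULL;
}
```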
- 2.1.1 Transmit timestamps with MSG_ERRQUEUE
- For transmit timestamps the outgoing packet is looped back to the
- socket's error queue with the send timestamp(s) attached. A process
- receives the timestamps by calling recvmsg() with flag MSG_ERRQUEUE
- set and with a msg_control buffer sufficiently large to receive the
- relevant metadata structures. The recvmsg call returns the original
- outgoing data packet with two ancillary messages attached.
- A message of cmsg_level SOL_IP(V6) and cmsg_type IP(V6)_RECVERR
- embeds a struct sock_extended_err. This defines the error type. For
- timestamps, the ee_errno field is ENOMSG. The other ancillary message
- will have cmsg_level SOL_SOCKET and cmsg_type SCM_TIMESTAMPING. This
- embeds the struct scm_timestamping.
- 2.1.1.2 Timestamp types
- The semantics of the three struct timespec are defined by field
- ee_info in the extended error structure. It contains a value of
- type SCM_TSTAMP_* to define the actual timestamp passed in
- scm_timestamping.
- The SCM_TSTAMP_* types are 1:1 matches to the SOF_TIMESTAMPING_*
- control fields discussed previously, with one exception. For legacy
- reasons, SCM_TSTAMP_SND is equal to zero and can be set for both
- SOF_TIMESTAMPING_TX_HARDWARE and SOF_TIMESTAMPING_TX_SOFTWARE. It
- is the first if ts[2] is non-zero, the second otherwise, in which
- case the timestamp is stored in ts[0].
- 2.1.1.3 Fragmentation
- Fragmentation of outgoing datagrams is rare, but is possible, e.g., by
- explicitly disabling PMTU discovery. If an outgoing packet is fragmented,
- then only the first fragment is timestamped and returned to the sending
- socket.
- 2.1.1.4 Packet Payload
- The calling application is often not interested in receiving the whole
- packet payload that it passed to the stack originally: the socket
- error queue mechanism is just a method to piggyback the timestamp on.
- In this case, the application can choose to read datagrams with a
- smaller buffer, possibly even of length 0. The payload is truncated
- accordingly. Until the process calls recvmsg() on the error queue,
- however, the full packet is queued, taking up budget from SO_RCVBUF.
- 2.1.1.5 Blocking Read
- Reading from the error queue is always a non-blocking operation. To
- block waiting on a timestamp, use poll or select. poll() will return
- POLLERR in pollfd.revents if any data is ready on the error queue.
- There is no need to pass this flag in pollfd.events. This flag is
- ignored on request. See also `man 2 poll`.
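- A blocking wait on the error queue could be sketched as below. The
- function name wait_for_errqueue() is hypothetical. Note that
- pollfd.events is left at zero, since POLLERR is reported regardless.

```c
#include <poll.h>

/* Hypothetical helper: wait up to timeout_ms for the error queue of fd.
 * Returns 1 if POLLERR was reported, 0 on timeout or when only other
 * events (e.g. POLLHUP) fired, and -1 if poll() failed (errno set).
 */
int wait_for_errqueue(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = 0 };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret;
	return (pfd.revents & POLLERR) ? 1 : 0;
}
```

- After a return value of 1, the process would call recvmsg() with
- MSG_ERRQUEUE to fetch the queued timestamp.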
- 2.1.2 Receive timestamps
- On reception, there is no reason to read from the socket error queue.
- The SCM_TIMESTAMPING ancillary data is sent along with the packet data
- on a normal recvmsg(). Since this is not a socket error, it is not
- accompanied by a message SOL_IP(V6)/IP(V6)_RECVERR. In this case,
- the meaning of the three fields in struct scm_timestamping is
- implicitly defined. ts[0] holds a software timestamp if set, ts[1]
- is again deprecated and ts[2] holds a hardware timestamp if set.
- 3. Hardware Timestamping configuration: SIOCSHWTSTAMP and SIOCGHWTSTAMP
- Hardware time stamping must also be initialized for each device driver
- that is expected to do hardware time stamping. The parameter is defined in
- /include/linux/net_tstamp.h as:
- struct hwtstamp_config {
- int flags; /* no flags defined right now, must be zero */
- int tx_type; /* HWTSTAMP_TX_* */
- int rx_filter; /* HWTSTAMP_FILTER_* */
- };
- Desired behavior is passed into the kernel and to a specific device by
- calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose
- ifr_data points to a struct hwtstamp_config. The tx_type and
- rx_filter are hints to the driver what it is expected to do. If
- the requested fine-grained filtering for incoming packets is not
- supported, the driver may time stamp more than just the requested types
- of packets.
- Drivers are free to use a more permissive configuration than the requested
- configuration. Drivers are expected to implement directly only the most
- generic mode that the hardware can support. For example, if the hardware can
- support HWTSTAMP_FILTER_V2_EVENT, then it should generally always upscale
- HWTSTAMP_FILTER_V2_L2_SYNC_MESSAGE, and so forth, as HWTSTAMP_FILTER_V2_EVENT
- is more generic (and more useful to applications).
- A driver which supports hardware time stamping shall update the struct
- with the actual, possibly more permissive configuration. If the
- requested packets cannot be time stamped, then nothing should be
- changed and ERANGE shall be returned (in contrast to EINVAL, which
- indicates that SIOCSHWTSTAMP is not supported at all).
- Only a process with admin rights may change the configuration. User
- space is responsible to ensure that multiple processes don't interfere
- with each other and that the settings are reset.
- Any process can read the actual configuration by passing this
- structure to ioctl(SIOCGHWTSTAMP) in the same way. However, this has
- not been implemented in all drivers.
- /* possible values for hwtstamp_config->tx_type */
- enum {
- /*
- * no outgoing packet will need hardware time stamping;
- * should a packet arrive which asks for it, no hardware
- * time stamping will be done
- */
- HWTSTAMP_TX_OFF,
- /*
- * enables hardware time stamping for outgoing packets;
- * the sender of the packet decides which are to be
- * time stamped by setting SOF_TIMESTAMPING_TX_SOFTWARE
- * before sending the packet
- */
- HWTSTAMP_TX_ON,
- };
- /* possible values for hwtstamp_config->rx_filter */
- enum {
- /* time stamp no incoming packet at all */
- HWTSTAMP_FILTER_NONE,
- /* time stamp any incoming packet */
- HWTSTAMP_FILTER_ALL,
- /* return value: time stamp all packets requested plus some others */
- HWTSTAMP_FILTER_SOME,
- /* PTP v1, UDP, any kind of event packet */
- HWTSTAMP_FILTER_PTP_V1_L4_EVENT,
- /* for the complete list of values, please check
- * the include file /include/linux/net_tstamp.h
- */
- };
- 3.1 Hardware Timestamping Implementation: Device Drivers
- A driver which supports hardware time stamping must support the
- SIOCSHWTSTAMP ioctl and update the supplied struct hwtstamp_config with
- the actual values as described in the section on SIOCSHWTSTAMP. It
- should also support SIOCGHWTSTAMP.
- Time stamps for received packets must be stored in the skb. To get a pointer
- to the shared time stamp structure of the skb, call skb_hwtstamps(). Then
- set the time stamps in the structure:
- struct skb_shared_hwtstamps {
- /* hardware time stamp transformed into duration
- * since arbitrary point in time
- */
- ktime_t hwtstamp;
- };
- Time stamps for outgoing packets are to be generated as follows:
- - In hard_start_xmit(), check if (skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)
- is non-zero. If yes, then the driver is expected to do hardware time
- stamping.
- - If this is possible for the skb and requested, then declare
- that the driver is doing the time stamping by setting the flag
- SKBTX_IN_PROGRESS in skb_shinfo(skb)->tx_flags, e.g. with
- skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
- You might want to keep a pointer to the associated skb for the next step
- and not free the skb. A driver not supporting hardware time stamping doesn't
- do that. A driver must never touch sk_buff::tstamp! It is used to store
- software generated time stamps by the network subsystem.
- - The driver should call skb_tx_timestamp() as close to passing sk_buff to hardware
- as possible. skb_tx_timestamp() provides a software time stamp if requested
- and hardware timestamping is not possible (SKBTX_IN_PROGRESS not set).
- - As soon as the driver has sent the packet and/or obtained a
- hardware time stamp for it, it passes the time stamp back by
- calling skb_hwtstamp_tx() with the original skb and the raw
- hardware time stamp. skb_hwtstamp_tx() clones the original skb and
- adds the timestamps, therefore the original skb has to be freed now.
- If obtaining the hardware time stamp somehow fails, then the driver
- should not fall back to software time stamping. The rationale is that
- this would occur at a later time in the processing pipeline than other
- software time stamping and therefore could lead to unexpected deltas
- between time stamps.