Class LibIOURing
io_uring allows the user to submit one or more I/O requests, which are processed asynchronously without blocking the calling process. io_uring gets
its name from ring buffers which are shared between user space and kernel space. This arrangement allows for efficient I/O, while avoiding the overhead
of copying buffers between them, where possible. This interface makes io_uring different from other UNIX I/O APIs: rather than communicating
between kernel and user space only through system calls, it uses ring buffers as the main mode of communication. This arrangement has various
performance benefits, which are discussed in a separate section below. This man page uses the terms shared buffers, shared ring buffers and queues
interchangeably.
The general programming model you need to follow for io_uring is outlined below:
- Set up shared buffers with io_uring_setup(2) and mmap(2), mapping into user space shared buffers for the submission queue (SQ) and the completion queue (CQ). You place I/O requests you want to make on the SQ, while the kernel places the results of those operations on the CQ.
- For every I/O request you need to make (like to read a file, write a file, accept a socket connection, etc), you create a submission queue entry, or SQE, describe the I/O operation you need to get done and add it to the tail of the submission queue (SQ). Each I/O operation is, in essence, the equivalent of a system call you would have made otherwise, if you were not using io_uring. You can add more than one SQE to the queue depending on the number of operations you want to request.
- After you add one or more SQEs, you need to call io_uring_enter(2) to tell the kernel to dequeue your I/O requests off the SQ and begin processing them.
- For each SQE you submit, once it is done processing the request, the kernel places a completion queue event or CQE at the tail of the completion queue or CQ. The kernel places exactly one matching CQE in the CQ for every SQE you submit on the SQ. After you retrieve a CQE, minimally, you might be interested in checking the res field of the CQE structure, which corresponds to the return value of the system call's equivalent, had you used it directly without using io_uring. For instance, a read operation under io_uring, started with the OP_READ operation, which issues the equivalent of the read(2) system call, would return as part of res what read(2) would have returned if called directly, without using io_uring.
- Optionally, io_uring_enter(2) can also wait for a specified number of requests to be processed by the kernel before it returns. If you specified a certain number of completions to wait for, the kernel would have placed at least that many CQEs on the CQ, which you can then readily read, right after the return from io_uring_enter(2).
- It is important to remember that I/O requests submitted to the kernel can complete in any order. It is not necessary for the kernel to process one request after another, in the order you placed them. Given that the interface is a ring, the requests are attempted in order; however, that doesn't imply any sort of ordering on their completion. When more than one request is in flight, it is not possible to determine which one will complete first. When you dequeue CQEs off the CQ, you should always check which submitted request each one corresponds to. The most common method for doing so is utilizing the user_data field in the request, which is passed back on the completion side.
Adding to and reading from the queues:
- You add SQEs to the tail of the SQ. The kernel reads SQEs off the head of the queue.
- The kernel adds CQEs to the tail of the CQ. You read CQEs off the head of the queue.
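The head and tail bookkeeping behind these two rules can be illustrated with plain arithmetic. The following is a minimal sketch, assuming free-running 32-bit counters and a power-of-two ring size; the helper names ring_slot and ring_pending are hypothetical and not part of io_uring or of this class.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Slot in the ring that a free-running head or tail counter refers to. */
static uint32_t ring_slot(uint32_t counter, uint32_t ring_entries) {
    /* Masking only works when the ring size is a power of two. */
    assert(ring_entries != 0 && (ring_entries & (ring_entries - 1)) == 0);
    return counter & (ring_entries - 1);
}

/* Number of entries queued between head (consumer) and tail (producer). */
static uint32_t ring_pending(uint32_t head, uint32_t tail) {
    /* Unsigned subtraction stays correct when the counters wrap around. */
    return tail - head;
}
```

Because the counters are never reset, the producer only ever increments the tail and the consumer only ever increments the head; the mask is applied just when a slot is looked up.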
Submission queue polling
One of the goals of io_uring is to provide a means for efficient I/O. To this end, io_uring supports a polling mode that lets you avoid
the call to io_uring_enter(2), which you use to inform the kernel that you have queued SQEs on to the SQ. With SQ Polling, io_uring starts a kernel
thread that polls the submission queue for any I/O requests you submit by adding SQEs. With SQ Polling enabled, there is no need for you to call
io_uring_enter(2), letting you avoid the overhead of system calls. A designated kernel thread dequeues SQEs off the SQ as you add them and
dispatches them for asynchronous processing.
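One detail not spelled out above: as documented in io_uring_setup(2), the polling thread goes to sleep after an idle period and then sets IORING_SQ_NEED_WAKEUP in the SQ ring flags, at which point a call to io_uring_enter(2) with IORING_ENTER_SQ_WAKEUP is needed again. The sketch below shows only that check; the macro values mirror the kernel header, while the function name sqpoll_enter_flags is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SQ_NEED_WAKEUP  (1U << 0) /* mirrors IORING_SQ_NEED_WAKEUP */
#define ENTER_SQ_WAKEUP (1U << 1) /* mirrors IORING_ENTER_SQ_WAKEUP */

/* Hypothetical helper: returns the flags to pass to io_uring_enter(2),
 * or 0 when the polling thread is awake and no system call is needed. */
static uint32_t sqpoll_enter_flags(_Atomic uint32_t *sq_flags) {
    assert(sq_flags != 0);
    uint32_t flags = atomic_load_explicit(sq_flags, memory_order_acquire);
    return (flags & SQ_NEED_WAKEUP) ? ENTER_SQ_WAKEUP : 0;
}
```

An application would typically run this check after publishing a new tail value and skip the system call whenever it returns 0.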
Setting up io_uring
The main steps in setting up io_uring consist of mapping in the shared buffers with mmap(2) calls.
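As a rough illustration of what gets mapped, the sketch below computes the three mapping lengths from values that io_uring_setup(2) reports back in io_uring_params. It is a sketch under stated assumptions: struct ring_layout is a hypothetical trimmed-down mirror of the relevant fields, the offsets mirror the IORING_OFF_* constants, and the entry sizes assume the default 64-byte SQE and 16-byte CQE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets passed to mmap(2); values mirror IORING_OFF_SQ_RING,
 * IORING_OFF_CQ_RING and IORING_OFF_SQES. */
#define OFF_SQ_RING 0ULL
#define OFF_CQ_RING 0x8000000ULL
#define OFF_SQES    0x10000000ULL

/* Assumed default sizes of struct io_uring_sqe and struct io_uring_cqe. */
enum { SQE_SIZE = 64, CQE_SIZE = 16 };

/* Hypothetical subset of what io_uring_setup(2) fills into io_uring_params. */
struct ring_layout {
    uint32_t sq_entries;   /* params.sq_entries */
    uint32_t cq_entries;   /* params.cq_entries */
    uint32_t sq_off_array; /* params.sq_off.array */
    uint32_t cq_off_cqes;  /* params.cq_off.cqes */
};

/* Length of the SQ ring mapping (at OFF_SQ_RING). */
static size_t sq_ring_bytes(const struct ring_layout *l) {
    assert(l != NULL);
    return l->sq_off_array + (size_t)l->sq_entries * sizeof(uint32_t);
}

/* Length of the CQ ring mapping (at OFF_CQ_RING). */
static size_t cq_ring_bytes(const struct ring_layout *l) {
    assert(l != NULL);
    return l->cq_off_cqes + (size_t)l->cq_entries * CQE_SIZE;
}

/* Length of the SQE array mapping (at OFF_SQES). */
static size_t sqe_array_bytes(const struct ring_layout *l) {
    assert(l != NULL);
    return (size_t)l->sq_entries * SQE_SIZE;
}
```

Each length would then be passed to mmap(2) together with the ring file descriptor and the matching offset.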
Submitting I/O requests
The process of submitting a request consists of describing the I/O operation you need to get done using an io_uring_sqe structure instance.
These details describe the equivalent system call and its parameters. Because the range of I/O operations Linux supports is very varied and the
io_uring_sqe structure needs to be able to describe them, it has several fields, some packed into unions for space efficiency.
To submit an I/O request to io_uring, you need to acquire a submission queue entry (SQE) from the submission queue (SQ), fill it up with
details of the operation you want to submit and call io_uring_enter(2). If you want to avoid calling io_uring_enter(2), you have the option of setting
up Submission Queue Polling.
SQEs are added to the tail of the submission queue. The kernel picks up SQEs off the head of the SQ. The general algorithm to get the next available SQE and update the tail is as follows.
struct io_uring_sqe *sqe;
unsigned tail, index;
tail = *sqring->tail;
index = tail & (*sqring->ring_mask);
sqe = &sqring->sqes[index];
// fill up details about this I/O request
describe_io(sqe);
// fill the sqe index into the SQ ring array
sqring->array[index] = index;
tail++;
atomic_store_release(sqring->tail, tail);
To get the index of an entry, the application must mask the current tail index with the size mask of the ring. This holds true for both SQs and CQs. Once the SQE is acquired, the necessary fields are filled in, describing the request. While the CQ ring directly indexes the shared array of CQEs, the submission side has an indirection array between them. The submission side ring buffer is an index into this array, which in turn contains the index into the SQEs.
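The steps above can be tried out without a kernel by simulating the submission ring in user space. This is a minimal sketch, not the real data layout: struct sqe and struct sq_ring are hypothetical trimmed-down stand-ins, and the ring-full check is an addition that the algorithm above leaves out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define ENTRIES 4u
static_assert((ENTRIES & (ENTRIES - 1)) == 0, "ring size must be a power of two");

/* Hypothetical, trimmed-down SQE. */
struct sqe { uint8_t opcode; int32_t fd; uint64_t user_data; };

/* Hypothetical stand-in for the shared submission ring. */
struct sq_ring {
    _Atomic uint32_t head;   /* advanced by the consumer (the kernel) */
    _Atomic uint32_t tail;   /* advanced by the producer (the application) */
    uint32_t mask;           /* ENTRIES - 1 */
    uint32_t array[ENTRIES]; /* indirection: ring slot -> SQE index */
    struct sqe sqes[ENTRIES];
};

/* Queues one request. Returns 0 on success, -1 if the ring is full. */
static int sq_push(struct sq_ring *sq, uint8_t opcode, int32_t fd, uint64_t user_data) {
    uint32_t tail = atomic_load_explicit(&sq->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&sq->head, memory_order_acquire);
    if (tail - head == ENTRIES)
        return -1;
    uint32_t index = tail & sq->mask;
    /* Describe the request, then record its index in the indirection array. */
    sq->sqes[index] = (struct sqe){ .opcode = opcode, .fd = fd, .user_data = user_data };
    sq->array[index] = index;
    /* Publish the entry: the release store makes the SQE visible before the new tail. */
    atomic_store_explicit(&sq->tail, tail + 1, memory_order_release);
    return 0;
}
```

The SQE and the indirection slot are written before the tail is published, so a consumer that observes the new tail also observes a fully described request.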
The following code snippet demonstrates how a read operation, the equivalent of a preadv2(2) system call, is described by filling up an SQE with the necessary parameters.
struct iovec iovecs[16];
...
sqe->opcode = IORING_OP_READV;
sqe->fd = fd;
sqe->addr = (unsigned long) iovecs;
sqe->len = 16;
sqe->off = offset;
sqe->flags = 0;
Memory ordering
Modern compilers and CPUs freely reorder reads and writes without affecting the program's outcome to optimize performance. Some aspects of this need to
be kept in mind on SMP systems since io_uring involves buffers shared between kernel and user space. These buffers are both visible and
modifiable from kernel and user space. As heads and tails belonging to these shared buffers are updated by kernel and user space, changes need to be
coherently visible on either side, irrespective of whether a CPU switch took place after the kernel-user mode switch happened. We use memory barriers
to enforce this coherency. Being significantly large subjects on their own, memory barriers are out of scope for further discussion on this man page.
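The snippets on this page call atomic_store_release and atomic_load_acquire without defining them. One possible definition, assuming a C11 compiler with <stdatomic.h>, is sketched below; real programs may rely on different primitives, so this is an illustration rather than the definition used by any particular library.

```c
#include <assert.h>
#include <stdatomic.h>

static_assert(sizeof(unsigned) == 4, "io_uring ring indices are 32 bits wide");

/* Store with release semantics: writes made before this call become visible
 * to any thread that later reads the same location with acquire semantics. */
static inline void atomic_store_release(_Atomic unsigned *p, unsigned v) {
    atomic_store_explicit(p, v, memory_order_release);
}

/* Load with acquire semantics: reads made after this call cannot be
 * reordered before it. */
static inline unsigned atomic_load_acquire(_Atomic unsigned *p) {
    return atomic_load_explicit(p, memory_order_acquire);
}
```

The producer side pairs a release store on the tail with the consumer's acquire load of it, and the head is handled the same way in the opposite direction.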
Letting the kernel know about I/O submissions
Once you place one or more SQEs on to the SQ, you need to let the kernel know that you've done so. You can do this by calling the io_uring_enter(2) system call.
This system call is also capable of waiting for a specified count of events to complete. This way, you can be sure to find completion events in the
completion queue without having to poll it for events later.
Reading completion events
Similar to the submission queue (SQ), the completion queue (CQ) is a shared buffer between the kernel and user space. Whereas you placed submission queue entries on the tail of the SQ and the kernel read off the head, when it comes to the CQ, the kernel places completion queue events or CQEs on the tail of the CQ and you read off its head.
Submission is flexible (and thus a bit more complicated) since it needs to be able to encode different types of system calls that take various
parameters. Completion, on the other hand is simpler since we're looking only for a return value back from the kernel. This is easily understood by
looking at the completion queue event structure, IOURingCQE.
In this structure, user_data is custom data that is passed unchanged from submission to completion, that is, from SQEs to CQEs. This field can be used to
set context, uniquely identifying submissions that got completed. Given that I/O requests can complete in any order, this field can be used to
correlate a submission with a completion. res is the return value of the system call that was performed as part of the submission. The flags field
carries request-specific metadata; for example, CQE_F_BUFFER is set when a buffer was selected for a request submitted with IOSQE_BUFFER_SELECT.
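For reference, the sketch below mirrors the three fields of the kernel's struct io_uring_cqe and shows one common way to use user_data: storing a pointer to a per-request context. The struct request type and both helper functions are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout of struct io_uring_cqe. */
struct cqe {
    uint64_t user_data; /* copied unchanged from the SQE */
    int32_t  res;       /* result: >= 0 on success, -errno on failure */
    uint32_t flags;     /* request-specific metadata */
};
static_assert(sizeof(struct cqe) == 16, "a default CQE is 16 bytes");

/* Hypothetical per-request context owned by the application. */
struct request { int id; int done; int32_t result; };

/* Value to place in sqe->user_data when submitting the request. */
static uint64_t request_to_user_data(struct request *r) {
    return (uint64_t)(uintptr_t)r;
}

/* Finds the request a CQE belongs to and records its result. */
static struct request *complete(const struct cqe *cqe) {
    struct request *r = (struct request *)(uintptr_t)cqe->user_data;
    assert(r != NULL);
    r->done = 1;
    r->result = cqe->res;
    return r;
}
```

Since completions can arrive in any order, the lookup relies only on user_data and never on the position of the CQE in the ring.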
The general sequence to read completion events off the completion queue is as follows:
unsigned head;
head = *cqring->head;
if (head != atomic_load_acquire(cqring->tail)) {
    struct io_uring_cqe *cqe;
    unsigned index;
    index = head & (cqring->mask);
    cqe = &cqring->cqes[index];
    // process completed CQE
    process_cqe(cqe);
    // CQE consumption complete
    head++;
}
atomic_store_release(cqring->head, head);
It helps to be reminded that the kernel adds CQEs to the tail of the CQ, while you need to dequeue them off the head. To get the index of an entry at the head, the application must mask the current head index with the size mask of the ring. Once the CQE has been consumed or processed, the head needs to be updated to reflect the consumption of the CQE. Attention should be paid to the read and write barriers to ensure successful read and update of the head.
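The sequence above handles a single CQE. A loop that drains everything currently available can be sketched as follows, again as a user-space simulation: struct cq_ring is a hypothetical stand-in for the shared completion ring and cq_drain is a hypothetical helper name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define CQ_ENTRIES 4u

struct cqe { uint64_t user_data; int32_t res; uint32_t flags; };

/* Hypothetical stand-in for the shared completion ring. */
struct cq_ring {
    _Atomic uint32_t head; /* advanced by the consumer (the application) */
    _Atomic uint32_t tail; /* advanced by the producer (the kernel) */
    uint32_t mask;         /* CQ_ENTRIES - 1 */
    struct cqe cqes[CQ_ENTRIES];
};

/* Copies up to max available CQEs into out and returns how many were consumed. */
static unsigned cq_drain(struct cq_ring *cq, struct cqe *out, unsigned max) {
    assert(cq->mask == CQ_ENTRIES - 1);
    uint32_t head = atomic_load_explicit(&cq->head, memory_order_relaxed);
    unsigned n = 0;
    /* The acquire load ensures the CQE contents are read after the tail update. */
    while (n < max && head != atomic_load_explicit(&cq->tail, memory_order_acquire)) {
        out[n++] = cq->cqes[head & cq->mask];
        head++;
    }
    /* One release store hands all consumed slots back to the producer. */
    atomic_store_explicit(&cq->head, head, memory_order_release);
    return n;
}
```

Updating the head once after the loop, rather than per entry, keeps the number of shared-memory writes low when several completions are pending.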
io_uring performance
Because of the shared ring buffers between kernel and user space, io_uring can be a zero-copy system. Copying buffers to and from becomes
necessary when system calls that transfer data between kernel and user space are involved. But since the bulk of the communication in io_uring
is via buffers shared between the kernel and user space, this huge performance overhead is completely avoided.
While system calls may not seem like a significant overhead, in high performance applications, making a lot of them will begin to matter. While
workarounds the operating system has in place to deal with Spectre and Meltdown are ideally best done away with, unfortunately, some of these
workarounds are around the system call interface, making system calls not as cheap as before on affected hardware. While newer hardware should not need
these workarounds, hardware with these vulnerabilities can be expected to be in the wild for a long time. While using synchronous programming
interfaces or even when using asynchronous programming interfaces under Linux, there is at least one system call involved in the submission of each
request. In io_uring, on the other hand, you can batch several requests in one go, simply by queueing up multiple SQEs, each describing an I/O
operation you want and make a single call to io_uring_enter(2). This is possible due to io_uring's shared buffers based design.
While this batching in itself can avoid the overhead associated with potentially multiple and frequent system calls, you can reduce even this overhead
further with Submission Queue Polling, by having the kernel poll and pick up your SQEs for processing as you add them to the submission queue. This
avoids the io_uring_enter(2) call you need to make to tell the kernel to pick SQEs up. For high-performance applications, this means even
less system call overhead.
Field Summary
Fields (Modifier and Type: Description)
- static final int: io_wq_type
- static final int: Accept flags stored in sqe->ioprio
- static final int: ASYNC_CANCEL flags.
- static final int: cq_ring->flags
- static final int: cqe->flags
- static final int: io_uring_enter(2) flags
- static final int: io_uring_params->features flags
- static final int: If sqe->file_index is set to this for opcodes that instantiate a new direct descriptor (like openat/openat2/accept), then io_uring will allocate an available direct descriptor instead of having the application pass one in.
- static final int: OP_FIXED_FD_INSTALL flags (sqe->install_fd_flags)
- static final int: sqe->fsync_flags
- static final int: sqe->timeout_flags
- static final int: io_uring_msg_ring_flags
- static final int: OP_MSG_RING flags (sqe->msg_ring_flags)
- static final int: OP_NOP flags (sqe->nop_flags)
- static final long: Magic offsets for the application to mmap the data it needs
- static final byte: io_uring_op
- static final int: POLL_ADD flags.
- static final int: send/sendmsg and recv/recvmsg flags (sqe->ioprio)
- static final int: io_uring_register_op
- static final int: Skip updating fd indexes set to this value in the fd table.
- static final int: io_uring_register_restriction_op
- static final int: Register a fully sparse file space, rather than pass in an array of all -1 file descriptors.
- static final int: io_uring_setup() flags
- static final int: sqe->splice_flags, extends splice(2) flags
- static final int: sq_ring->flags
- static final int: Use registered buffer; pass this flag along with setting sqe->buf_index.
- static final int: io_uring_sqe->flags bitfield values
- static final int: io_uring_sqe_flags_bit
- static final int: Flags for REGISTER_PBUF_RING.
- static final int: io_uring_socket_op
Method Summary
Methods (Modifier and Type, Method: Description)
- static int io_uring_enter(@Nullable IntBuffer _errno, int fd, int to_submit, int min_complete, int flags, long sig): io_uring_enter() is used to initiate and complete I/O using the shared submission and completion queues set up by a call to io_uring_setup.
- static int io_uring_enter2(@Nullable IntBuffer _errno, int fd, int to_submit, int min_complete, int flags, long sig, int sz)
- static int io_uring_register(@Nullable IntBuffer _errno, int fd, int opcode, long arg, int nr_args): The io_uring_register() system call registers resources (e.g. ...
- static int io_uring_setup(@Nullable IntBuffer _errno, int entries, IOURingParams p): The io_uring_setup() system call sets up a submission queue (SQ) and completion queue (CQ) with at least entries entries, and returns a file descriptor which can be used to perform subsequent operations on the io_uring instance.
- static int nio_uring_enter(long _errno, int fd, int to_submit, int min_complete, int flags, long sig): Unsafe version of: io_uring_enter
- static int nio_uring_enter2(long _errno, int fd, int to_submit, int min_complete, int flags, long sig, int sz): Unsafe version of: io_uring_enter2
- static int nio_uring_register(long _errno, int fd, int opcode, long arg, int nr_args): Unsafe version of: io_uring_register
- static int nio_uring_setup(long _errno, int entries, long p): Unsafe version of: io_uring_setup
Field Details
IORING_FILE_INDEX_ALLOC
public static final int IORING_FILE_INDEX_ALLOC
If sqe->file_index is set to this for opcodes that instantiate a new direct descriptor (like openat/openat2/accept), then io_uring will allocate an available direct descriptor instead of having the application pass one in.
The picked direct descriptor will be returned in cqe->res, or -ENFILE if the space is full.

IORING_MAX_ENTRIES
public static final int IORING_MAX_ENTRIES

IOSQE_FIXED_FILE_BIT
public static final int IOSQE_FIXED_FILE_BIT
io_uring_sqe_flags_bit

IOSQE_IO_DRAIN_BIT
public static final int IOSQE_IO_DRAIN_BIT
io_uring_sqe_flags_bit

IOSQE_IO_LINK_BIT
public static final int IOSQE_IO_LINK_BIT
io_uring_sqe_flags_bit

IOSQE_IO_HARDLINK_BIT
public static final int IOSQE_IO_HARDLINK_BIT
io_uring_sqe_flags_bit

IOSQE_ASYNC_BIT
public static final int IOSQE_ASYNC_BIT
io_uring_sqe_flags_bit

IOSQE_BUFFER_SELECT_BIT
public static final int IOSQE_BUFFER_SELECT_BIT
io_uring_sqe_flags_bit

IOSQE_CQE_SKIP_SUCCESS_BIT
public static final int IOSQE_CQE_SKIP_SUCCESS_BIT
io_uring_sqe_flags_bit

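The *_BIT constants are bit positions, and each IOSQE_* flag is the corresponding single-bit mask. The sketch below reproduces that relationship with values mirroring the kernel header; the helper sqe_has_flag is a hypothetical name added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions; values mirror the kernel's io_uring_sqe_flags_bit enum. */
enum {
    IOSQE_FIXED_FILE_BIT,       /* 0 */
    IOSQE_IO_DRAIN_BIT,         /* 1 */
    IOSQE_IO_LINK_BIT,          /* 2 */
    IOSQE_IO_HARDLINK_BIT,      /* 3 */
    IOSQE_ASYNC_BIT,            /* 4 */
    IOSQE_BUFFER_SELECT_BIT,    /* 5 */
    IOSQE_CQE_SKIP_SUCCESS_BIT  /* 6 */
};

/* Flag masks derived from the bit positions. */
#define IOSQE_FIXED_FILE       (1U << IOSQE_FIXED_FILE_BIT)
#define IOSQE_IO_DRAIN         (1U << IOSQE_IO_DRAIN_BIT)
#define IOSQE_IO_LINK          (1U << IOSQE_IO_LINK_BIT)
#define IOSQE_IO_HARDLINK      (1U << IOSQE_IO_HARDLINK_BIT)
#define IOSQE_ASYNC            (1U << IOSQE_ASYNC_BIT)
#define IOSQE_BUFFER_SELECT    (1U << IOSQE_BUFFER_SELECT_BIT)
#define IOSQE_CQE_SKIP_SUCCESS (1U << IOSQE_CQE_SKIP_SUCCESS_BIT)

static_assert(IOSQE_CQE_SKIP_SUCCESS == 0x40, "all flags fit in the 8-bit sqe->flags field");

/* Hypothetical helper: tests whether a flag is set in an sqe->flags value. */
static int sqe_has_flag(uint8_t sqe_flags, unsigned flag) {
    return (sqe_flags & flag) != 0;
}
```

Several flags can be combined with bitwise OR before being stored in sqe->flags, for example IOSQE_IO_LINK | IOSQE_ASYNC.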
IOSQE_FIXED_FILE
public static final int IOSQE_FIXED_FILE
io_uring_sqe->flags bitfield values
Enum values:
- IOSQE_FIXED_FILE - When this flag is specified, fd is an index into the files array registered with the io_uring instance (see the REGISTER_FILES section of the io_uring_register(2) man page).
  Note that this isn't always available for all commands. If used on a command that doesn't support fixed files, the SQE will error with -EBADF.
  Available since 5.1.
- IOSQE_IO_DRAIN - When this flag is specified, the SQE will not be started before previously submitted SQEs have completed, and new SQEs will not be started before this one completes.
  Available since 5.2.
- IOSQE_IO_LINK - When this flag is specified, it forms a link with the next SQE in the submission ring.
  That next SQE will not be started before the previous request completes. This, in effect, forms a chain of SQEs, which can be arbitrarily long. The tail of the chain is denoted by the first SQE that does not have this flag set. Chains are not supported across submission boundaries. Even if the last SQE in a submission has this flag set, it will still terminate the current chain. This flag has no effect on previous SQE submissions, nor does it impact SQEs that are outside of the chain tail. This means that multiple chains can be executing in parallel, or chains and individual SQEs. Only members inside the chain are serialized. A chain of SQEs will be broken if any request in that chain ends in error.
  io_uring considers any unexpected result an error. This means that, e.g., a short read will also terminate the remainder of the chain. If a chain of SQE links is broken, the remaining unstarted part of the chain will be terminated and completed with -ECANCELED as the error code.
  Available since 5.3.
- IOSQE_IO_HARDLINK - Like IOSQE_IO_LINK, but it doesn't sever regardless of the completion result.
  Note that the link will still sever if we fail submitting the parent request; hard links are only resilient in the presence of completion results for requests that did submit correctly. IOSQE_IO_HARDLINK implies IOSQE_IO_LINK.
  Available since 5.5.
- IOSQE_ASYNC - Normal operation for io_uring is to try and issue an sqe as non-blocking first, and if that fails, execute it in an async manner.
  To support more efficient overlapped operation of requests that the application knows/assumes will always (or most of the time) block, the application can ask for an sqe to be issued async from the start.
  Available since 5.6.
- IOSQE_BUFFER_SELECT - Used in conjunction with the OP_PROVIDE_BUFFERS command, which registers a pool of buffers to be used by commands that read or receive data.
  When buffers are registered for this use case, and this flag is set in the command, io_uring will grab a buffer from this pool when the request is ready to receive or read data. If successful, the resulting CQE will have CQE_F_BUFFER set in the flags part of the struct, and the upper CQE_BUFFER_SHIFT bits will contain the ID of the selected buffers. This allows the application to know exactly which buffer was selected for the operation. If no buffers are available and this flag is set, then the request will fail with -ENOBUFS as the error code. Once a buffer has been used, it is no longer available in the kernel pool. The application must re-register the given buffer again when it is ready to recycle it (e.g. has completed using it).
  Available since 5.7.
- IOSQE_CQE_SKIP_SUCCESS - Don't generate a CQE if the request completes successfully.
  If the request fails, an appropriate CQE will be posted as usual and if there is no IOSQE_IO_HARDLINK, CQEs for all linked requests will be omitted. The notion of failure/success is opcode specific and is the same as with breaking chains of IOSQE_IO_LINK. One special case is when the request has a linked timeout, then the CQE generation for the linked timeout is decided solely by whether it has IOSQE_CQE_SKIP_SUCCESS set, regardless of whether it timed out or was cancelled. In other words, if a linked timeout has the flag set, it's guaranteed to not post a CQE.
  The semantics are chosen to accommodate several use cases. First, when all but the last request of a normal link without linked timeouts are marked with the flag, only one CQE per link is posted. Additionally, it enables suppression of CQEs in cases where the side effects of a successfully executed operation are enough for userspace to know the state of the system. One such example would be writing to a synchronisation file.
  This flag is incompatible with IOSQE_IO_DRAIN. Using both of them in a single ring is undefined behavior, even when they are not used together in a single request. Currently, after the first request with IOSQE_CQE_SKIP_SUCCESS, all subsequent requests marked with drain will be failed at submission time. Note that the error reporting is best effort only, and restrictions may change in the future.
  Available since 5.17.

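The buffer ID mentioned under IOSQE_BUFFER_SELECT is recovered from the CQE flags with a shift. The sketch below assumes the kernel header values IORING_CQE_F_BUFFER = 1 << 0 and IORING_CQE_BUFFER_SHIFT = 16; the function name cqe_buffer_id is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CQE_F_BUFFER     (1U << 0) /* mirrors IORING_CQE_F_BUFFER */
#define CQE_BUFFER_SHIFT 16        /* mirrors IORING_CQE_BUFFER_SHIFT */

static_assert(CQE_BUFFER_SHIFT < 32, "the shift must stay inside the 32-bit flags field");

/* Hypothetical helper: returns the ID of the buffer the kernel selected,
 * or -1 if the CQE does not carry a buffer ID. */
static int cqe_buffer_id(uint32_t cqe_flags) {
    if (!(cqe_flags & CQE_F_BUFFER))
        return -1;
    return (int)(cqe_flags >> CQE_BUFFER_SHIFT);
}
```

The flag has to be checked first, because the upper bits are only meaningful when CQE_F_BUFFER is set.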
IOSQE_IO_DRAIN
public static final int IOSQE_IO_DRAIN
io_uring_sqe->flags bitfield values
Enum values: see the list under IOSQE_FIXED_FILE.

IOSQE_IO_LINK
public static final int IOSQE_IO_LINK
io_uring_sqe->flags bitfield values
Enum values:
IOSQE_FIXED_FILE - When this flag is specified, fd is an index into the files array registered with the io_uring instance (see the REGISTER_FILES section of the register man page).
Note that this isn't always available for all commands. If used on a command that doesn't support fixed files, the SQE will error with -EBADF.
Available since 5.1.
IOSQE_IO_DRAIN - When this flag is specified, the SQE will not be started before previously submitted SQEs have completed, and new SQEs will not be started before this one completes.
Available since 5.2.
IOSQE_IO_LINK - When this flag is specified, it forms a link with the next SQE in the submission ring.
That next SQE will not be started before the previous request completes. This, in effect, forms a chain of SQEs, which can be arbitrarily long. The tail of the chain is denoted by the first SQE that does not have this flag set. Chains are not supported across submission boundaries. Even if the last SQE in a submission has this flag set, it will still terminate the current chain. This flag has no effect on previous SQE submissions, nor does it impact SQEs that are outside of the chain tail. This means that multiple chains can be executing in parallel, or chains and individual SQEs. Only members inside the chain are serialized. A chain of SQEs will be broken if any request in that chain ends in error. io_uring considers any unexpected result an error. This means that, e.g., a short read will also terminate the remainder of the chain. If a chain of SQE links is broken, the remaining unstarted part of the chain will be terminated and completed with -ECANCELED as the error code.
Available since 5.3.
IOSQE_IO_HARDLINK - Like IOSQE_IO_LINK, but it doesn't sever regardless of the completion result.
Note that the link will still sever if submitting the parent request fails; hard links are only resilient in the presence of completion results for requests that did submit correctly. IOSQE_IO_HARDLINK implies IOSQE_IO_LINK.
Available since 5.5.
IOSQE_ASYNC - Normal operation for io_uring is to try and issue an sqe as non-blocking first, and if that fails, execute it in an async manner.
To support more efficient overlapped operation of requests that the application knows/assumes will always (or most of the time) block, the application can ask for an sqe to be issued async from the start.
Available since 5.6.
IOSQE_BUFFER_SELECT - Used in conjunction with the OP_PROVIDE_BUFFERS command, which registers a pool of buffers to be used by commands that read or receive data.
When buffers are registered for this use case, and this flag is set in the command, io_uring will grab a buffer from this pool when the request is ready to receive or read data. If successful, the resulting CQE will have CQE_F_BUFFER set in the flags part of the struct, and the upper CQE_BUFFER_SHIFT bits will contain the ID of the selected buffer. This allows the application to know exactly which buffer was selected for the operation. If no buffers are available and this flag is set, then the request will fail with -ENOBUFS as the error code. Once a buffer has been used, it is no longer available in the kernel pool. The application must re-register the given buffer when it is ready to recycle it (e.g. once it has finished using it).
Available since 5.7.
IOSQE_CQE_SKIP_SUCCESS - Don't generate a CQE if the request completes successfully.
If the request fails, an appropriate CQE will be posted as usual and, if there is no IOSQE_IO_HARDLINK, CQEs for all linked requests will be omitted. The notion of failure/success is opcode specific and is the same as with breaking chains of IOSQE_IO_LINK. One special case is when the request has a linked timeout: the CQE generation for the linked timeout is then decided solely by whether it has IOSQE_CQE_SKIP_SUCCESS set, regardless of whether it timed out or was cancelled. In other words, if a linked timeout has the flag set, it is guaranteed not to post a CQE.
The semantics are chosen to accommodate several use cases. First, when all but the last request of a normal link without linked timeouts are marked with the flag, only one CQE per link is posted. Additionally, it enables suppression of CQEs in cases where the side effects of a successfully executed operation are enough for userspace to know the state of the system. One such example would be writing to a synchronisation file.
This flag is incompatible with IOSQE_IO_DRAIN. Using both of them in a single ring is undefined behavior, even when they are not used together in a single request. Currently, after the first request with IOSQE_CQE_SKIP_SUCCESS, all subsequent requests marked with drain will be failed at submission time. Note that the error reporting is best effort only, and restrictions may change in the future.
Available since 5.17.
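The chain rules given for IOSQE_IO_LINK (a chain ends at the first SQE without the flag, and never crosses a submission boundary) can be illustrated with a small model. The following Java sketch only groups the SQE indices of one submission into chains; it does not talk to the kernel. The class and method names are hypothetical, and the flag bit values are assumed to match the kernel headers.

```java
import java.util.ArrayList;
import java.util.List;

final class SqeChains {
    // Assumed to mirror the kernel's io_uring UAPI header values.
    static final int IOSQE_IO_LINK = 1 << 2;
    static final int IOSQE_IO_HARDLINK = 1 << 3;

    /**
     * Groups the SQEs of a single submission into chains.
     * Each inner list holds the indices of the SQEs that run serialized.
     */
    static List<List<Integer>> chains(int[] sqeFlags) {
        List<List<Integer>> result = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (int i = 0; i < sqeFlags.length; i++) {
            current.add(i);
            // HARDLINK implies LINK, so either bit links to the next SQE.
            boolean linksToNext =
                (sqeFlags[i] & (IOSQE_IO_LINK | IOSQE_IO_HARDLINK)) != 0;
            // A chain never continues past the end of the submission.
            boolean lastInSubmission = i == sqeFlags.length - 1;
            if (!linksToNext || lastInSubmission) {
                result.add(current);
                current = new ArrayList<>();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] flags = {IOSQE_IO_LINK, IOSQE_IO_LINK, 0, 0, IOSQE_IO_LINK};
        System.out.println(chains(flags));
    }
}
```

In this model, a standalone SQE is simply a chain of length one, which matches the statement that chains and individual SQEs can execute in parallel.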
- See Also:
-
IOSQE_IO_HARDLINK
public static final int IOSQE_IO_HARDLINK
io_uring_sqe->flags bitfield values
Enum values: the same list as given under IOSQE_IO_LINK.
- See Also:
-
IOSQE_ASYNC
public static final int IOSQE_ASYNC
io_uring_sqe->flags bitfield values
Enum values: the same list as given under IOSQE_IO_LINK.
- See Also:
-
IOSQE_BUFFER_SELECT
public static final int IOSQE_BUFFER_SELECT
io_uring_sqe->flags bitfield values
Enum values: the same list as given under IOSQE_IO_LINK.
- See Also:
-
IOSQE_CQE_SKIP_SUCCESS
public static final int IOSQE_CQE_SKIP_SUCCESS
io_uring_sqe->flags bitfield values
Enum values: the same list as given under IOSQE_IO_LINK.
- See Also:
-
IORING_SETUP_IOPOLL
public static final int IORING_SETUP_IOPOLL
io_uring_setup() flags
Enum values:
SETUP_IOPOLL - Perform busy-waiting for an I/O completion, as opposed to getting notifications via an asynchronous IRQ (Interrupt Request).
The file system (if any) and block device must support polling in order for this to work. Busy-waiting provides lower latency, but may consume more CPU resources than interrupt driven I/O. Currently, this feature is usable only on a file descriptor opened using the O_DIRECT flag. When a read or write is submitted to a polled context, the application must poll for completions on the CQ ring by calling enter. It is illegal to mix and match polled and non-polled I/O on an io_uring instance.
SETUP_SQPOLL - When this flag is specified, a kernel thread is created to perform submission queue polling.
An io_uring instance configured in this way enables an application to issue I/O without ever context switching into the kernel. By using the submission queue to fill in new submission queue entries and watching for completions on the completion queue, the application can submit and reap I/Os without doing a single system call.
If the kernel thread is idle for more than sq_thread_idle milliseconds, it will set the SQ_NEED_WAKEUP bit in the flags field of the struct io_sq_ring. When this happens, the application must call enter to wake the kernel thread. If I/O is kept busy, the kernel thread will never sleep. An application making use of this feature will need to guard the io_uring_enter() call with the following code sequence:
    // Ensure that the wakeup flag is read after the tail pointer
    // has been written. It's important to use memory load acquire
    // semantics for the flags read, as otherwise the application
    // and the kernel might not agree on the consistency of the
    // wakeup flag.
    unsigned flags = atomic_load_relaxed(sq_ring->flags);
    if (flags & IORING_SQ_NEED_WAKEUP)
        io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP);
where sq_ring is a submission queue ring setup using the struct io_sqring_offsets described below.
Before version 5.11 of the Linux kernel, to successfully use this feature, the application must register a set of files to be used for IO through register using the REGISTER_FILES opcode. Failure to do so will result in submitted IO being errored with EBADF. The presence of this feature can be detected by the FEAT_SQPOLL_NONFIXED feature flag. In version 5.11 and later, it is no longer necessary to register files to use this feature. 5.11 also allows using this as non-root, if the user has the CAP_SYS_NICE capability.
SETUP_SQ_AFF - If this flag is specified, then the poll thread will be bound to the cpu set in the sq_thread_cpu field of the struct io_uring_params. This flag is only meaningful when SETUP_SQPOLL is specified. When the cgroup setting cpuset.cpus changes (typically in a container environment), the bounded cpu set may be changed as well.
SETUP_CQSIZE - Create the completion queue with struct io_uring_params.cq_entries entries.
The value must be greater than entries, and may be rounded up to the next power-of-two.
SETUP_CLAMP - If this flag is specified, and if entries exceeds IORING_MAX_ENTRIES, then entries will be clamped at IORING_MAX_ENTRIES.
If the flag SETUP_CQSIZE is set, and if the value of struct io_uring_params.cq_entries exceeds IORING_MAX_CQ_ENTRIES, then it will be clamped at IORING_MAX_CQ_ENTRIES.
SETUP_ATTACH_WQ - This flag should be set in conjunction with struct io_uring_params.wq_fd being set to an existing io_uring ring file descriptor.
When set, the io_uring instance being created will share the asynchronous worker thread backend of the specified io_uring ring, rather than create a new separate thread pool.
SETUP_R_DISABLED - If this flag is specified, the io_uring ring starts in a disabled state.
In this state, restrictions can be registered, but submissions are not allowed. See register for details on how to enable the ring.
Available since 5.10.
SETUP_SUBMIT_ALL - Continue submit on error.
Normally io_uring stops submitting a batch of requests if one of these requests results in an error. This can cause submission of less than what is expected, if a request ends in error while being submitted. If the ring is created with this flag, enter will continue submitting requests even if it encounters an error submitting a request. CQEs are still posted for errored requests regardless of whether or not this flag is set at ring creation time; the only difference is whether the submit sequence is halted or continued when an error is observed.
Available since 5.18.
SETUP_COOP_TASKRUN - Cooperative task running.
By default, io_uring will interrupt a task running in userspace when a completion event comes in. This is to ensure that completions run in a timely manner. For a lot of use cases, this is overkill and can reduce performance through the inter-processor interrupt used to do this, the kernel/user transition, the needless interruption of the task's userspace activities, and reduced batching if completions come in at a rapid rate. Most applications don't need the forceful interruption, as the events are processed at any kernel/user transition. The exceptions are setups where the application uses multiple threads operating on the same ring, where the application waiting on completions isn't the one that submitted them. For most other use cases, setting this flag will improve performance.
Available since 5.19.
SETUP_TASKRUN_FLAG - Used in conjunction with SETUP_COOP_TASKRUN, this provides a flag, SQ_TASKRUN, which is set in the SQ ring flags whenever completions are pending that should be processed. liburing will check for this flag even when doing peek_cqe and enter the kernel to process them, and applications can do the same. This makes IORING_SETUP_TASKRUN_FLAG safe to use even when applications rely on a peek style operation on the CQ ring to see if anything might be pending to reap.
Available since 5.19.
SETUP_SQE128 - If set, io_uring will use 128-byte SQEs rather than the normal 64-byte sized variant.
This is a requirement for using certain request types; as of 5.19, only the OP_URING_CMD passthrough command for NVMe passthrough needs this.
Available since 5.19.
SETUP_CQE32 - If set, io_uring will use 32-byte CQEs rather than the normal 16-byte sized variant.
This is a requirement for using certain request types; as of 5.19, only the OP_URING_CMD passthrough command for NVMe passthrough needs this.
Available since 5.19.
SETUP_SINGLE_ISSUER - A hint to the kernel that only a single task can submit requests, which is used for internal optimisations.
The kernel enforces the rule, which only affects enter calls submitting requests and will fail them with -EEXIST if the restriction is violated. The submitter task may differ from the task that created the ring. Note that when SETUP_SQPOLL is set, the polling task is considered to be doing all submissions on behalf of userspace, so it always complies with the rule regardless of how many userspace tasks call io_uring_enter.
Available since 6.0 (the kernel release originally numbered 5.20).
SETUP_DEFER_TASKRUN - Defer running task work to get events.
By default, io_uring will process all outstanding work at the end of any system call or thread interrupt. This can delay the application from making other progress. Setting this flag will hint to io_uring that it should defer work until an enter call with the ENTER_GETEVENTS flag set. This allows the application to request work to run just before it wants to process completions. This flag requires the SETUP_SINGLE_ISSUER flag to be set, and also enforces that the call to io_uring_enter is made from the same thread that submitted requests. Note that if this flag is set, then it is the application's responsibility to periodically trigger work (for example via any of the CQE waiting functions), or else completions may not be delivered.
Available since 6.1.
SETUP_NO_MMAP- Application provides the memory for the rings.SETUP_REGISTERED_FD_ONLY- Register the ring fd in itself for use withREGISTER_USE_REGISTERED_RING; return a registered fd index rather than an fd.SETUP_NO_SQARRAY- Removes indirection through the SQ index array.
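The flag dependencies described above can be checked before calling setup. The following is a minimal sketch, not part of this binding: the class name SetupFlagCheck and the method check are hypothetical, and the constant values are copied from linux/io_uring.h so that the example runs without LWJGL on the classpath. It follows the wording of the descriptions above; the kernel's own acceptance rules may differ in detail.

```java
final class SetupFlagCheck {
    // Bit values mirrored from linux/io_uring.h (assumption: they match the
    // IORING_SETUP_* constants exposed by the binding).
    static final int SETUP_COOP_TASKRUN  = 1 << 8;
    static final int SETUP_TASKRUN_FLAG  = 1 << 9;
    static final int SETUP_SINGLE_ISSUER = 1 << 12;
    static final int SETUP_DEFER_TASKRUN = 1 << 13;

    /**
     * Returns null when the combination matches the documented pairings,
     * otherwise a short description of the first problem found.
     */
    static String check(int flags) {
        boolean coop     = (flags & SETUP_COOP_TASKRUN) != 0;
        boolean taskrun  = (flags & SETUP_TASKRUN_FLAG) != 0;
        boolean single   = (flags & SETUP_SINGLE_ISSUER) != 0;
        boolean deferred = (flags & SETUP_DEFER_TASKRUN) != 0;

        if (taskrun && !coop) {
            return "SETUP_TASKRUN_FLAG is documented for use with SETUP_COOP_TASKRUN";
        }
        if (deferred && !single) {
            return "SETUP_DEFER_TASKRUN requires SETUP_SINGLE_ISSUER";
        }
        return null;
    }

    public static void main(String[] args) {
        int ok  = SETUP_COOP_TASKRUN | SETUP_TASKRUN_FLAG;
        int bad = SETUP_DEFER_TASKRUN; // SETUP_SINGLE_ISSUER is missing
        System.out.println(check(ok));
        System.out.println(check(bad));
    }
}
```

Running such a check in user space only gives an earlier and more readable error; the kernel remains the authority on which combinations it accepts.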
IORING_SETUP_SQPOLL
public static final int IORING_SETUP_SQPOLL
io_uring_setup() flags
Enum values:
SETUP_IOPOLL - Perform busy-waiting for an I/O completion, as opposed to getting notifications via an asynchronous IRQ (Interrupt Request). The file system (if any) and block device must support polling in order for this to work. Busy-waiting provides lower latency, but may consume more CPU resources than interrupt-driven I/O. Currently, this feature is usable only on a file descriptor opened using the O_DIRECT flag. When a read or write is submitted to a polled context, the application must poll for completions on the CQ ring by calling enter. It is illegal to mix and match polled and non-polled I/O on an io_uring instance.
SETUP_SQPOLL - When this flag is specified, a kernel thread is created to perform submission queue polling. An io_uring instance configured in this way enables an application to issue I/O without ever context switching into the kernel. By using the submission queue to fill in new submission queue entries and watching for completions on the completion queue, the application can submit and reap I/Os without doing a single system call.
If the kernel thread is idle for more than sq_thread_idle milliseconds, it will set the SQ_NEED_WAKEUP bit in the flags field of the struct io_sq_ring. When this happens, the application must call enter to wake the kernel thread. If I/O is kept busy, the kernel thread will never sleep. An application making use of this feature will need to guard the io_uring_enter() call with the following code sequence:
    // Ensure that the wakeup flag is read after the tail pointer
    // has been written. It's important to use memory load acquire
    // semantics for the flags read, as otherwise the application
    // and the kernel might not agree on the consistency of the
    // wakeup flag.
    unsigned flags = atomic_load_relaxed(sq_ring->flags);
    if (flags & IORING_SQ_NEED_WAKEUP)
        io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP);
where sq_ring is a submission queue ring setup using the struct io_sqring_offsets described below.
Before version 5.11 of the Linux kernel, to successfully use this feature, the application must register a set of files to be used for IO through register using the REGISTER_FILES opcode. Failure to do so will result in submitted IO being errored with EBADF. The presence of this feature can be detected by the FEAT_SQPOLL_NONFIXED feature flag. In version 5.11 and later, it is no longer necessary to register files to use this feature. 5.11 also allows using this as non-root, if the user has the CAP_SYS_NICE capability.
SETUP_SQ_AFF - If this flag is specified, then the poll thread will be bound to the CPU set in the sq_thread_cpu field of the struct io_uring_params. This flag is only meaningful when SETUP_SQPOLL is specified. When the cgroup setting cpuset.cpus changes (typically in a container environment), the bound CPU set may change as well.
SETUP_CQSIZE - Create the completion queue with struct io_uring_params.cq_entries entries. The value must be greater than entries, and may be rounded up to the next power-of-two.
SETUP_CLAMP - If this flag is specified, and if entries exceeds MAX_ENTRIES, then entries will be clamped at IORING_MAX_ENTRIES. If the flag SETUP_CQSIZE is set, and if the value of struct io_uring_params.cq_entries exceeds IORING_MAX_CQ_ENTRIES, then it will be clamped at IORING_MAX_CQ_ENTRIES.
SETUP_ATTACH_WQ - This flag should be set in conjunction with struct io_uring_params.wq_fd being set to an existing io_uring ring file descriptor. When set, the io_uring instance being created will share the asynchronous worker thread backend of the specified io_uring ring, rather than create a new separate thread pool.
SETUP_R_DISABLED - If this flag is specified, the io_uring ring starts in a disabled state. In this state, restrictions can be registered, but submissions are not allowed. See register for details on how to enable the ring. Available since 5.10.
SETUP_SUBMIT_ALL - Continue submit on error. Normally io_uring stops submitting a batch of requests if one of these requests results in an error. This can cause fewer requests to be submitted than expected, if a request ends in error while being submitted. If the ring is created with this flag, enter will continue submitting requests even if it encounters an error submitting a request. CQEs are still posted for errored requests regardless of whether or not this flag is set at ring creation time; the only difference is whether the submit sequence is halted or continued when an error is observed. Available since 5.18.
SETUP_COOP_TASKRUN - Cooperative task running. By default, io_uring will interrupt a task running in userspace when a completion event comes in. This is to ensure that completions run in a timely manner. For a lot of use cases, this is overkill and can reduce performance because of the inter-processor interrupt used to do this, the kernel/user transition, the needless interruption of the task's userspace activities, and reduced batching if completions come in at a rapid rate. Most applications don't need the forceful interruption, as the events are processed at any kernel/user transition. The exception is setups where the application uses multiple threads operating on the same ring, where the application waiting on completions isn't the one that submitted them. For most other use cases, setting this flag will improve performance. Available since 5.19.
SETUP_TASKRUN_FLAG - Used in conjunction with SETUP_COOP_TASKRUN, this provides a flag, SQ_TASKRUN, which is set in the SQ ring flags whenever completions are pending that should be processed. liburing will check for this flag even when doing peek_cqe and enter the kernel to process them, and applications can do the same. This makes IORING_SETUP_TASKRUN_FLAG safe to use even when applications rely on a peek style operation on the CQ ring to see if anything might be pending to reap. Available since 5.19.
SETUP_SQE128 - If set, io_uring will use 128-byte SQEs rather than the normal 64-byte variant. This is a requirement for certain request types; as of 5.19, only the OP_URING_CMD passthrough command for NVMe passthrough needs this. Available since 5.19.
SETUP_CQE32 - If set, io_uring will use 32-byte CQEs rather than the normal 16-byte variant. This is a requirement for certain request types; as of 5.19, only the OP_URING_CMD passthrough command for NVMe passthrough needs this. Available since 5.19.
SETUP_SINGLE_ISSUER - A hint to the kernel that only a single task can submit requests, which is used for internal optimisations. The kernel enforces the rule, which only affects enter calls submitting requests, and will fail them with -EEXIST if the restriction is violated. The submitter task may differ from the task that created the ring. Note that when SETUP_SQPOLL is set, the polling task is considered to be doing all submissions on behalf of userspace, so it always complies with the rule regardless of how many userspace tasks call io_uring_enter. Available since 6.0 (numbered 5.20 during development).
SETUP_DEFER_TASKRUN - Defer running task work to get events. By default, io_uring will process all outstanding work at the end of any system call or thread interrupt. This can delay the application from making other progress. Setting this flag hints to io_uring that it should defer work until an enter call with the ENTER_GETEVENTS flag set. This allows the application to request work to run just before it wants to process completions. This flag requires the SETUP_SINGLE_ISSUER flag to be set, and also enforces that the call to io_uring_enter is made from the same thread that submitted requests. Note that if this flag is set, it is the application's responsibility to periodically trigger work (for example via any of the CQE waiting functions), or else completions may not be delivered. Available since 6.1.
SETUP_NO_MMAP - Application provides the memory for the rings.
SETUP_REGISTERED_FD_ONLY - Register the ring fd in itself for use with REGISTER_USE_REGISTERED_RING; return a registered fd index rather than an fd.
SETUP_NO_SQARRAY - Removes indirection through the SQ index array.
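The sizing rules given for SETUP_CQSIZE and SETUP_CLAMP can be illustrated with a small calculation. This is only a sketch under stated assumptions: the class RingSizing and its methods are hypothetical, the two limits are placeholder values standing in for IORING_MAX_ENTRIES and IORING_MAX_CQ_ENTRIES, and rounding every queue size up to a power of two is assumed here for simplicity. The kernel performs the real computation during setup.

```java
final class RingSizing {
    // Placeholder limits for illustration only (assumption); consult the
    // IORING_MAX_ENTRIES and IORING_MAX_CQ_ENTRIES constants for real values.
    static final int MAX_ENTRIES = 32768;
    static final int MAX_CQ_ENTRIES = 2 * MAX_ENTRIES;

    /** Rounds n (which must be at least 1) up to the next power of two. */
    static int roundUpPow2(int n) {
        int highest = Integer.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    /** Submission queue size; clamp models SETUP_CLAMP. */
    static int sqEntries(int entries, boolean clamp) {
        if (entries < 1) {
            throw new IllegalArgumentException("entries must be positive");
        }
        if (entries > MAX_ENTRIES) {
            if (!clamp) {
                throw new IllegalArgumentException("entries exceeds the maximum");
            }
            entries = MAX_ENTRIES; // clamped instead of rejected
        }
        return roundUpPow2(entries);
    }

    /** Completion queue size when SETUP_CQSIZE is set; clamp models SETUP_CLAMP. */
    static int cqEntries(int entries, int cqEntries, boolean clamp) {
        if (cqEntries <= entries) {
            throw new IllegalArgumentException("cq_entries must be greater than entries");
        }
        if (cqEntries > MAX_CQ_ENTRIES) {
            if (!clamp) {
                throw new IllegalArgumentException("cq_entries exceeds the maximum");
            }
            cqEntries = MAX_CQ_ENTRIES; // clamped instead of rejected
        }
        return roundUpPow2(cqEntries);
    }

    public static void main(String[] args) {
        System.out.println(sqEntries(100, false));
        System.out.println(cqEntries(100, 300, false));
        System.out.println(cqEntries(100, 100_000, true));
    }
}
```

Without the clamp argument the sketch rejects oversized requests, which mirrors the difference SETUP_CLAMP makes: the same request either fails or is silently reduced to the limit.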
IORING_SETUP_SQ_AFF
public static final int IORING_SETUP_SQ_AFF
io_uring_setup() flags. The enum values are the same as those listed for IORING_SETUP_SQPOLL.
IORING_SETUP_CQSIZE
public static final int IORING_SETUP_CQSIZE
io_uring_setup() flags. The enum values are the same as those listed for IORING_SETUP_SQPOLL.
IORING_SETUP_CLAMP
public static final int IORING_SETUP_CLAMPio_uring_setup()flagsEnum values:
SETUP_IOPOLL- Perform busy-waiting for an I/O completion, as opposed to getting notifications via an asynchronous IRQ (Interrupt Request).The file system (if any) and block device must support polling in order for this to work. Busy-waiting provides lower latency, but may consume more CPU resources than interrupt driven I/O. Currently, this feature is usable only on a file descriptor opened using the
O_DIRECTflag. When a read or write is submitted to a polled context, the application must poll for completions on the CQ ring by callingenter. It is illegal to mix and match polled and non-polled I/O on an io_uring instance.SETUP_SQPOLL- When this flag is specified, a kernel thread is created to perform submission queue polling.An
io_uringinstance configured in this way enables an application to issue I/O without ever context switching into the kernel. By using the submission queue to fill in new submission queue entries and watching for completions on the completion queue, the application can submit and reap I/Os without doing a single system call.If the kernel thread is idle for more than
sq_thread_idlemilliseconds, it will set theSQ_NEED_WAKEUPbit in the flags field of the structio_sq_ring. When this happens, the application must callenterto wake the kernel thread. If I/O is kept busy, the kernel thread will never sleep. An application making use of this feature will need to guard theio_uring_enter()call with the following code sequence:// Ensure that the wakeup flag is read after the tail pointer // has been written. It's important to use memory load acquire // semantics for the flags read, as otherwise the application // and the kernel might not agree on the consistency of the // wakeup flag. unsigned flags = atomic_load_relaxed(sq_ring->flags); if (flags & IORING_SQ_NEED_WAKEUP) io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP);where
sq_ringis a submission queue ring setup using the structio_sqring_offsetsdescribed below.Before version 5.11 of the Linux kernel, to successfully use this feature, the application must register a set of files to be used for IO through
registerusing theREGISTER_FILESopcode. Failure to do so will result in submitted IO being errored withEBADF. The presence of this feature can be detected by theFEAT_SQPOLL_NONFIXEDfeature flag. In version 5.11 and later, it is no longer necessary to register files to use this feature. 5.11 also allows using this as non-root, if the user has theCAP_SYS_NICEcapability.SETUP_SQ_AFF- If this flag is specified, then the poll thread will be bound to the cpu set in thesq_thread_cpufield of the structio_uring_params. This flag is only meaningful whenSETUP_SQPOLLis specified. Whencgroupsettingcpuset.cpuschanges (typically in container environment), the bounded cpu set may be changed as well.SETUP_CQSIZE- Create the completion queue with structio_uring_params.cq_entriesentries.The value must be greater than entries, and may be rounded up to the next power-of-two.
SETUP_CLAMP- If this flag is specified, and if entries exceedsMAX_ENTRIES, then entries will be clamped atIORING_MAX_ENTRIES.If the flag
SETUP_SQPOLLis set, and if the value of structio_uring_params.cq_entriesexceedsIORING_MAX_CQ_ENTRIES, then it will be clamped atIORING_MAX_CQ_ENTRIES.SETUP_ATTACH_WQ- This flag should be set in conjunction with structio_uring_params.wq_fdbeing set to an existingio_uringring file descriptor.When set, the
io_uringinstance being created will share the asynchronous worker thread backend of the specifiedio_uringring, rather than create a new separate thread pool.SETUP_R_DISABLED- If this flag is specified, the io_uring ring starts in a disabled state.In this state, restrictions can be registered, but submissions are not allowed. See
registerfor details on how to enable the ring.Available since 5.10.
SETUP_SUBMIT_ALL- Continue submit on error.Normally io_uring stops submitting a batch of request, if one of these requests results in an error. This can cause submission of less than what is expected, if a request ends in error while being submitted. If the ring is created with this flag,
enterwill continue submitting requests even if it encounters an error submitting a request. CQEs are still posted for errored request regardless of whether or not this flag is set at ring creation time, the only difference is if the submit sequence is halted or continued when an error is observed.Available since 5.18.
SETUP_COOP_TASKRUN- Cooperative task running.By default, io_uring will interrupt a task running in userspace when a completion event comes in. This is to ensure that completions run in a timely manner. For a lot of use cases, this is overkill and can cause reduced performance from both the inter-processor interrupt used to do this, the kernel/user transition, the needless interruption of the tasks userspace activities, and reduced batching if completions come in at a rapid rate. Most applications don't need the forceful interruption, as the events are processed at any kernel/user transition. The exception are setups where the application uses multiple threads operating on the same ring, where the application waiting on completions isn't the one that submitted them. For most other use cases, setting this flag will improve performance.
Available since 5.19.
SETUP_TASKRUN_FLAG - Used in conjunction with SETUP_COOP_TASKRUN, this provides a flag, SQ_TASKRUN, which is set in the SQ ring flags whenever completions are pending that should be processed. liburing will check for this flag even when doing peek_cqe and enter the kernel to process them, and applications can do the same. This makes IORING_SETUP_TASKRUN_FLAG safe to use even when applications rely on a peek-style operation on the CQ ring to see if anything might be pending to reap.
Available since 5.19.
SETUP_SQE128 - If set, io_uring will use 128-byte SQEs rather than the normal 64-byte sized variant.
This is a requirement for using certain request types. As of 5.19, only the OP_URING_CMD passthrough command for NVMe passthrough needs this.
Available since 5.19.
SETUP_CQE32 - If set, io_uring will use 32-byte CQEs rather than the normal 16-byte sized variant.
This is a requirement for using certain request types. As of 5.19, only the OP_URING_CMD passthrough command for NVMe passthrough needs this.
Available since 5.19.
SETUP_SINGLE_ISSUER - A hint to the kernel that only a single task can submit requests, which is used for internal optimisations.
The kernel enforces the rule, which only affects enter calls submitting requests and will fail them with -EEXIST if the restriction is violated. The submitter task may differ from the task that created the ring. Note that when SETUP_SQPOLL is set, the polling task is considered to be doing all submissions on behalf of userspace, so it always complies with the rule regardless of how many userspace tasks call io_uring_enter.
Available since 6.0.
SETUP_DEFER_TASKRUN - Defer running task work to get events.
By default, io_uring will process all outstanding work at the end of any system call or thread interrupt. This can delay the application from making other progress. Setting this flag will hint to io_uring that it should defer work until an enter call with the ENTER_GETEVENTS flag set. This allows the application to request work to run just before it wants to process completions. This flag requires the SETUP_SINGLE_ISSUER flag to be set, and also enforces that io_uring_enter is called from the same thread that submitted requests. Note that if this flag is set, it is the application's responsibility to periodically trigger work (for example via any of the CQE waiting functions), or else completions may not be delivered.
Available since 6.1.
SETUP_NO_MMAP - Application provides the memory for the rings.
SETUP_REGISTERED_FD_ONLY - Register the ring fd in itself for use with REGISTER_USE_REGISTERED_RING; return a registered fd index rather than an fd.
SETUP_NO_SQARRAY - Removes indirection through the SQ index array.
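The flag pairings described in the list above can be checked before calling setup. The following is a minimal sketch, not part of this binding: the class SetupFlagCheck and its problems method are hypothetical names, and the constants are redeclared locally with the values from the kernel's linux/io_uring.h header so the example is self-contained. Accepting SETUP_DEFER_TASKRUN as a partner for SETUP_TASKRUN_FLAG is an assumption about kernel behaviour that the text above does not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: checks the flag pairings described in the enum list. Hypothetical helper. */
final class SetupFlagCheck {
    // Values mirror the kernel's linux/io_uring.h UAPI header.
    static final int IORING_SETUP_COOP_TASKRUN  = 1 << 8;
    static final int IORING_SETUP_TASKRUN_FLAG  = 1 << 9;
    static final int IORING_SETUP_SINGLE_ISSUER = 1 << 12;
    static final int IORING_SETUP_DEFER_TASKRUN = 1 << 13;

    static boolean has(int flags, int bit) {
        return (flags & bit) != 0;
    }

    /** Returns readable problems; an empty list means the pairings hold. */
    static List<String> problems(int flags) {
        List<String> out = new ArrayList<>();
        // Documented: SETUP_DEFER_TASKRUN requires SETUP_SINGLE_ISSUER.
        if (has(flags, IORING_SETUP_DEFER_TASKRUN)
                && !has(flags, IORING_SETUP_SINGLE_ISSUER)) {
            out.add("SETUP_DEFER_TASKRUN requires SETUP_SINGLE_ISSUER");
        }
        // Documented: SETUP_TASKRUN_FLAG is used with SETUP_COOP_TASKRUN.
        // Assumption (not in the text): SETUP_DEFER_TASKRUN is accepted too.
        if (has(flags, IORING_SETUP_TASKRUN_FLAG)
                && !has(flags, IORING_SETUP_COOP_TASKRUN)
                && !has(flags, IORING_SETUP_DEFER_TASKRUN)) {
            out.add("SETUP_TASKRUN_FLAG needs a task-run mode flag");
        }
        return out;
    }

    public static void main(String[] args) {
        int flags = IORING_SETUP_DEFER_TASKRUN | IORING_SETUP_TASKRUN_FLAG;
        // Prints the list of violated pairings for this combination.
        System.out.println(problems(flags));
    }
}
```

Running such a check up front gives a readable message instead of a bare error code from the setup call; the kernel remains the authority on which combinations it accepts.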
IORING_SETUP_ATTACH_WQ
public static final int IORING_SETUP_ATTACH_WQ
io_uring_setup() flags
Enum values:
SETUP_IOPOLL - Perform busy-waiting for an I/O completion, as opposed to getting notifications via an asynchronous IRQ (Interrupt Request).
The file system (if any) and block device must support polling in order for this to work. Busy-waiting provides lower latency, but may consume more CPU resources than interrupt-driven I/O. Currently, this feature is usable only on a file descriptor opened using the O_DIRECT flag. When a read or write is submitted to a polled context, the application must poll for completions on the CQ ring by calling enter. It is illegal to mix and match polled and non-polled I/O on an io_uring instance.
SETUP_SQPOLL - When this flag is specified, a kernel thread is created to perform submission queue polling.
An io_uring instance configured in this way enables an application to issue I/O without ever context switching into the kernel. By using the submission queue to fill in new submission queue entries and watching for completions on the completion queue, the application can submit and reap I/Os without doing a single system call.
If the kernel thread is idle for more than sq_thread_idle milliseconds, it will set the SQ_NEED_WAKEUP bit in the flags field of the struct io_sq_ring. When this happens, the application must call enter to wake the kernel thread. If I/O is kept busy, the kernel thread will never sleep. An application making use of this feature will need to guard the io_uring_enter() call with the following code sequence:
    // Ensure that the wakeup flag is read after the tail pointer
    // has been written. It's important to use memory load acquire
    // semantics for the flags read, as otherwise the application
    // and the kernel might not agree on the consistency of the
    // wakeup flag.
    unsigned flags = atomic_load_relaxed(sq_ring->flags);
    if (flags & IORING_SQ_NEED_WAKEUP)
        io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP);
where sq_ring is a submission queue ring setup using the struct io_sqring_offsets described below.
Before version 5.11 of the Linux kernel, to successfully use this feature, the application must register a set of files to be used for IO through register using the REGISTER_FILES opcode. Failure to do so will result in submitted IO being errored with EBADF. The presence of this feature can be detected by the FEAT_SQPOLL_NONFIXED feature flag. In version 5.11 and later, it is no longer necessary to register files to use this feature. 5.11 also allows using this as non-root, if the user has the CAP_SYS_NICE capability.
SETUP_SQ_AFF - If this flag is specified, then the poll thread will be bound to the CPU set in the sq_thread_cpu field of the struct io_uring_params. This flag is only meaningful when SETUP_SQPOLL is specified. When the cgroup setting cpuset.cpus changes (typically in a container environment), the bound CPU set may be changed as well.
SETUP_CQSIZE - Create the completion queue with struct io_uring_params.cq_entries entries.
The value must be greater than entries, and may be rounded up to the next power-of-two.
SETUP_CLAMP - If this flag is specified, and if entries exceeds MAX_ENTRIES, then entries will be clamped at IORING_MAX_ENTRIES.
If the flag SETUP_CQSIZE is set, and if the value of struct io_uring_params.cq_entries exceeds IORING_MAX_CQ_ENTRIES, then it will be clamped at IORING_MAX_CQ_ENTRIES.
SETUP_ATTACH_WQ - This flag should be set in conjunction with struct io_uring_params.wq_fd being set to an existing io_uring ring file descriptor.
When set, the io_uring instance being created will share the asynchronous worker thread backend of the specified io_uring ring, rather than create a new separate thread pool.
SETUP_R_DISABLED - If this flag is specified, the io_uring ring starts in a disabled state.
In this state, restrictions can be registered, but submissions are not allowed. See register for details on how to enable the ring.
Available since 5.10.
SETUP_SUBMIT_ALL - Continue submit on error.
Normally io_uring stops submitting a batch of requests if one of these requests results in an error. This can cause fewer requests to be submitted than expected if a request ends in error while being submitted. If the ring is created with this flag, enter will continue submitting requests even if it encounters an error submitting a request. CQEs are still posted for errored requests regardless of whether or not this flag is set at ring creation time; the only difference is whether the submit sequence is halted or continued when an error is observed.
Available since 5.18.
SETUP_COOP_TASKRUN - Cooperative task running.
By default, io_uring will interrupt a task running in userspace when a completion event comes in. This is to ensure that completions run in a timely manner. For a lot of use cases, this is overkill and can reduce performance through the inter-processor interrupt used to do this, the kernel/user transition, the needless interruption of the task's userspace activities, and reduced batching if completions come in at a rapid rate. Most applications don't need the forceful interruption, as the events are processed at any kernel/user transition. The exceptions are setups where the application uses multiple threads operating on the same ring, where the application waiting on completions isn't the one that submitted them. For most other use cases, setting this flag will improve performance.
Available since 5.19.
SETUP_TASKRUN_FLAG - Used in conjunction with SETUP_COOP_TASKRUN, this provides a flag, SQ_TASKRUN, which is set in the SQ ring flags whenever completions are pending that should be processed. liburing will check for this flag even when doing peek_cqe and enter the kernel to process them, and applications can do the same. This makes IORING_SETUP_TASKRUN_FLAG safe to use even when applications rely on a peek-style operation on the CQ ring to see if anything might be pending to reap.
Available since 5.19.
SETUP_SQE128 - If set, io_uring will use 128-byte SQEs rather than the normal 64-byte sized variant.
This is a requirement for using certain request types. As of 5.19, only the OP_URING_CMD passthrough command for NVMe passthrough needs this.
Available since 5.19.
SETUP_CQE32 - If set, io_uring will use 32-byte CQEs rather than the normal 16-byte sized variant.
This is a requirement for using certain request types. As of 5.19, only the OP_URING_CMD passthrough command for NVMe passthrough needs this.
Available since 5.19.
SETUP_SINGLE_ISSUER - A hint to the kernel that only a single task can submit requests, which is used for internal optimisations.
The kernel enforces the rule, which only affects enter calls submitting requests and will fail them with -EEXIST if the restriction is violated. The submitter task may differ from the task that created the ring. Note that when SETUP_SQPOLL is set, the polling task is considered to be doing all submissions on behalf of userspace, so it always complies with the rule regardless of how many userspace tasks call io_uring_enter.
Available since 6.0.
SETUP_DEFER_TASKRUN - Defer running task work to get events.
By default, io_uring will process all outstanding work at the end of any system call or thread interrupt. This can delay the application from making other progress. Setting this flag will hint to io_uring that it should defer work until an enter call with the ENTER_GETEVENTS flag set. This allows the application to request work to run just before it wants to process completions. This flag requires the SETUP_SINGLE_ISSUER flag to be set, and also enforces that io_uring_enter is called from the same thread that submitted requests. Note that if this flag is set, it is the application's responsibility to periodically trigger work (for example via any of the CQE waiting functions), or else completions may not be delivered.
Available since 6.1.
SETUP_NO_MMAP - Application provides the memory for the rings.
SETUP_REGISTERED_FD_ONLY - Register the ring fd in itself for use with REGISTER_USE_REGISTERED_RING; return a registered fd index rather than an fd.
SETUP_NO_SQARRAY - Removes indirection through the SQ index array.
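The SETUP_SQPOLL wakeup guard from the list above can be expressed in Java along the following lines. This is a sketch under stated assumptions rather than the binding's own API: SqPollWakeup and enterFlagsFor are hypothetical names, the AtomicInteger stands in for the flags word of the mapped SQ ring, and the two constants are redeclared locally with the values from the kernel's linux/io_uring.h header.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of the SQPOLL wakeup guard; the SQ ring flags word is simulated. */
final class SqPollWakeup {
    // Values mirror the kernel's linux/io_uring.h UAPI header.
    static final int IORING_SQ_NEED_WAKEUP  = 1 << 0;
    static final int IORING_ENTER_SQ_WAKEUP = 1 << 1;

    /**
     * Hypothetical helper: called after the application has published a new
     * SQ tail. Returns the flags an enter call should carry, or 0 when the
     * poll thread is awake and no system call is needed.
     */
    static int enterFlagsFor(AtomicInteger sqRingFlags) {
        // get() is a volatile read, which is at least as strong as the
        // acquire load the original comment asks for.
        int flags = sqRingFlags.get();
        return (flags & IORING_SQ_NEED_WAKEUP) != 0 ? IORING_ENTER_SQ_WAKEUP : 0;
    }

    public static void main(String[] args) {
        // Stand-in for the flags word inside the mapped SQ ring.
        AtomicInteger sqRingFlags = new AtomicInteger(0);
        System.out.println(enterFlagsFor(sqRingFlags)); // poll thread still running

        sqRingFlags.set(IORING_SQ_NEED_WAKEUP);         // poll thread went idle
        System.out.println(enterFlagsFor(sqRingFlags)); // wakeup is required
    }
}
```

In a real application the flags word lives in the memory mapped from the ring file descriptor, so the read would go through whatever memory-access helper the application uses for that mapping; only the bit test and the resulting enter flag are meant to carry over.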
IORING_SETUP_R_DISABLED
public static final int IORING_SETUP_R_DISABLED
io_uring_setup() flags. The enum values are the same as those listed under IORING_SETUP_ATTACH_WQ.
IORING_SETUP_SUBMIT_ALL
public static final int IORING_SETUP_SUBMIT_ALL
io_uring_setup() flags. The enum values are the same as those listed under IORING_SETUP_ATTACH_WQ.
IORING_SETUP_COOP_TASKRUN
public static final int IORING_SETUP_COOP_TASKRUNio_uring_setup()flagsEnum values:
SETUP_IOPOLL- Perform busy-waiting for an I/O completion, as opposed to getting notifications via an asynchronous IRQ (Interrupt Request).The file system (if any) and block device must support polling in order for this to work. Busy-waiting provides lower latency, but may consume more CPU resources than interrupt driven I/O. Currently, this feature is usable only on a file descriptor opened using the
O_DIRECTflag. When a read or write is submitted to a polled context, the application must poll for completions on the CQ ring by callingenter. It is illegal to mix and match polled and non-polled I/O on an io_uring instance.SETUP_SQPOLL- When this flag is specified, a kernel thread is created to perform submission queue polling.An
io_uringinstance configured in this way enables an application to issue I/O without ever context switching into the kernel. By using the submission queue to fill in new submission queue entries and watching for completions on the completion queue, the application can submit and reap I/Os without doing a single system call.If the kernel thread is idle for more than
sq_thread_idlemilliseconds, it will set theSQ_NEED_WAKEUPbit in the flags field of the structio_sq_ring. When this happens, the application must callenterto wake the kernel thread. If I/O is kept busy, the kernel thread will never sleep. An application making use of this feature will need to guard theio_uring_enter()call with the following code sequence:// Ensure that the wakeup flag is read after the tail pointer // has been written. It's important to use memory load acquire // semantics for the flags read, as otherwise the application // and the kernel might not agree on the consistency of the // wakeup flag. unsigned flags = atomic_load_relaxed(sq_ring->flags); if (flags & IORING_SQ_NEED_WAKEUP) io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP);where
sq_ringis a submission queue ring setup using the structio_sqring_offsetsdescribed below.Before version 5.11 of the Linux kernel, to successfully use this feature, the application must register a set of files to be used for IO through
registerusing theREGISTER_FILESopcode. Failure to do so will result in submitted IO being errored withEBADF. The presence of this feature can be detected by theFEAT_SQPOLL_NONFIXEDfeature flag. In version 5.11 and later, it is no longer necessary to register files to use this feature. 5.11 also allows using this as non-root, if the user has theCAP_SYS_NICEcapability.SETUP_SQ_AFF- If this flag is specified, then the poll thread will be bound to the cpu set in thesq_thread_cpufield of the structio_uring_params. This flag is only meaningful whenSETUP_SQPOLLis specified. Whencgroupsettingcpuset.cpuschanges (typically in container environment), the bounded cpu set may be changed as well.SETUP_CQSIZE- Create the completion queue with structio_uring_params.cq_entriesentries.The value must be greater than entries, and may be rounded up to the next power-of-two.
SETUP_CLAMP- If this flag is specified, and if entries exceedsMAX_ENTRIES, then entries will be clamped atIORING_MAX_ENTRIES.If the flag
SETUP_SQPOLLis set, and if the value of structio_uring_params.cq_entriesexceedsIORING_MAX_CQ_ENTRIES, then it will be clamped atIORING_MAX_CQ_ENTRIES.SETUP_ATTACH_WQ- This flag should be set in conjunction with structio_uring_params.wq_fdbeing set to an existingio_uringring file descriptor.When set, the
io_uringinstance being created will share the asynchronous worker thread backend of the specifiedio_uringring, rather than create a new separate thread pool.SETUP_R_DISABLED- If this flag is specified, the io_uring ring starts in a disabled state.In this state, restrictions can be registered, but submissions are not allowed. See
registerfor details on how to enable the ring.Available since 5.10.
SETUP_SUBMIT_ALL- Continue submit on error.Normally io_uring stops submitting a batch of request, if one of these requests results in an error. This can cause submission of less than what is expected, if a request ends in error while being submitted. If the ring is created with this flag,
enterwill continue submitting requests even if it encounters an error submitting a request. CQEs are still posted for errored request regardless of whether or not this flag is set at ring creation time, the only difference is if the submit sequence is halted or continued when an error is observed.Available since 5.18.
SETUP_COOP_TASKRUN- Cooperative task running.By default, io_uring will interrupt a task running in userspace when a completion event comes in. This is to ensure that completions run in a timely manner. For a lot of use cases, this is overkill and can cause reduced performance from both the inter-processor interrupt used to do this, the kernel/user transition, the needless interruption of the tasks userspace activities, and reduced batching if completions come in at a rapid rate. Most applications don't need the forceful interruption, as the events are processed at any kernel/user transition. The exception are setups where the application uses multiple threads operating on the same ring, where the application waiting on completions isn't the one that submitted them. For most other use cases, setting this flag will improve performance.
Available since 5.19.
SETUP_TASKRUN_FLAG- Used in conjunction withSETUP_COOP_TASKRUN, this provides a flag,SQ_TASKRUN, which is set in the SQ ringflagswhenever completions are pending that should be processed. liburing will check for this flag even when doingpeek_cqeand enter the kernel to process them, and applications can do the same. This makesIORING_SETUP_TASKRUN_FLAGsafe to use even when applications rely on a peek style operation on the CQ ring to see if anything might be pending to reap.Available since 5.19.
SETUP_SQE128- If set, io_uring will use 128-byte SQEs rather than the normal 64-byte sized variant.This is a requirement for using certain request types, as of 5.19 only the
OP_URING_CMDpassthrough command for NVMe passthrough needs this.Available since 5.19.
SETUP_CQE32- If set, io_uring will use 32-byte CQEs rather than the normal 32-byte sized variant.This is a requirement for using certain request types, as of 5.19 only the
OP_URING_CMDpassthrough command for NVMe passthrough needs this.Available since 5.19.
SETUP_SINGLE_ISSUER- A hint to the kernel that only a single task can submit requests, which is used for internal optimisations.The kernel enforces the rule, which only affects
entercalls submitting requests and will fail them with-EEXISTif the restriction is violated. The submitter task may differ from the task that created the ring. Note that whenSETUP_SQPOLLis set it is considered that the polling task is doing all submissions on behalf of the userspace and so it always complies with the rule disregarding how many userspace tasks doio_uring_enter.Available since 5.20.
SETUP_DEFER_TASKRUN- Defer running task work to get events.By default, io_uring will process all outstanding work at the end of any system call or thread interrupt. This can delay the application from making other progress. Setting this flag will hint to io_uring that it should defer work until an
entercall with theENTER_GETEVENTSflag set. This allows the application to request work to run just before it wants to process completions. This flag requires theSETUP_SINGLE_ISSUERflag to be set, and also enforces that the call toio_uring_enteris called from the same thread that submitted requests. Note that if this flag is set then it is the application's responsibility to periodically trigger work (for example via any of the CQE waiting functions) or else completions may not be delivered.Available since 6.1.
SETUP_NO_MMAP - Application provides the memory for the rings.
SETUP_REGISTERED_FD_ONLY - Register the ring fd in itself for use with REGISTER_USE_REGISTERED_RING; return a registered fd index rather than an fd.
SETUP_NO_SQARRAY - Removes indirection through the SQ index array.
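The "Available since" notes above can be turned into a small availability check. The following is only a sketch: the class name SetupFlagAvailability and the method supportedOn are hypothetical, the bit values are the ones defined in the Linux UAPI header linux/io_uring.h, and flags for which the text gives no version are simply ignored. SETUP_SINGLE_ISSUER is listed under 6.0 here on the assumption that the 5.20 development cycle was released as Linux 6.0.

```java
final class SetupFlagAvailability {
    // Bit values as defined in the Linux UAPI header <linux/io_uring.h>.
    static final int IORING_SETUP_R_DISABLED    = 1 << 6;
    static final int IORING_SETUP_SUBMIT_ALL    = 1 << 7;
    static final int IORING_SETUP_COOP_TASKRUN  = 1 << 8;
    static final int IORING_SETUP_TASKRUN_FLAG  = 1 << 9;
    static final int IORING_SETUP_SQE128        = 1 << 10;
    static final int IORING_SETUP_CQE32         = 1 << 11;
    static final int IORING_SETUP_SINGLE_ISSUER = 1 << 12;
    static final int IORING_SETUP_DEFER_TASKRUN = 1 << 13;

    // Each row is {flag, major, minor}, taken from the "Available since" notes.
    private static final int[][] MIN_KERNEL = {
        {IORING_SETUP_R_DISABLED,    5, 10},
        {IORING_SETUP_SUBMIT_ALL,    5, 18},
        {IORING_SETUP_COOP_TASKRUN,  5, 19},
        {IORING_SETUP_TASKRUN_FLAG,  5, 19},
        {IORING_SETUP_SQE128,        5, 19},
        {IORING_SETUP_CQE32,         5, 19},
        {IORING_SETUP_SINGLE_ISSUER, 6, 0},  // assumption: 5.20 shipped as 6.0
        {IORING_SETUP_DEFER_TASKRUN, 6, 1},
    };

    /**
     * Returns true if every flag from the table that is set in {@code flags}
     * is available on kernel {@code major.minor}. Flags that are not in the
     * table are not checked.
     */
    static boolean supportedOn(int flags, int major, int minor) {
        for (int[] row : MIN_KERNEL) {
            if ((flags & row[0]) == 0) {
                continue; // flag not requested
            }
            if (major < row[1] || (major == row[1] && minor < row[2])) {
                return false; // kernel is older than the flag
            }
        }
        return true;
    }
}
```

A caller might use supportedOn(flags, major, minor) to drop optional flags such as SETUP_DEFER_TASKRUN before calling setup on an older kernel; probing the kernel by attempting setup and handling the error is an equally reasonable design.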
IORING_SETUP_TASKRUN_FLAG
public static final int IORING_SETUP_TASKRUN_FLAG
io_uring_setup() flags
Enum values:
SETUP_IOPOLL - Perform busy-waiting for an I/O completion, as opposed to getting notifications via an asynchronous IRQ (Interrupt Request). The file system (if any) and block device must support polling in order for this to work. Busy-waiting provides lower latency, but may consume more CPU resources than interrupt-driven I/O. Currently, this feature is usable only on a file descriptor opened using the O_DIRECT flag. When a read or write is submitted to a polled context, the application must poll for completions on the CQ ring by calling enter. It is illegal to mix and match polled and non-polled I/O on an io_uring instance.
SETUP_SQPOLL - When this flag is specified, a kernel thread is created to perform submission queue polling. An io_uring instance configured in this way enables an application to issue I/O without ever context switching into the kernel. By using the submission queue to fill in new submission queue entries and watching for completions on the completion queue, the application can submit and reap I/Os without doing a single system call.
If the kernel thread is idle for more than sq_thread_idle milliseconds, it will set the SQ_NEED_WAKEUP bit in the flags field of the struct io_sq_ring. When this happens, the application must call enter to wake the kernel thread. If I/O is kept busy, the kernel thread will never sleep. An application making use of this feature will need to guard the io_uring_enter() call with the following code sequence:
    // Ensure that the wakeup flag is read after the tail pointer
    // has been written. It's important to use memory load acquire
    // semantics for the flags read, as otherwise the application
    // and the kernel might not agree on the consistency of the
    // wakeup flag.
    unsigned flags = atomic_load_acquire(sq_ring->flags);
    if (flags & IORING_SQ_NEED_WAKEUP)
        io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP);
where sq_ring is a submission queue ring setup using the struct io_sqring_offsets described below.
Before version 5.11 of the Linux kernel, to successfully use this feature, the application must register a set of files to be used for IO through register using the REGISTER_FILES opcode. Failure to do so will result in submitted IO being errored with EBADF. The presence of this feature can be detected by the FEAT_SQPOLL_NONFIXED feature flag. In version 5.11 and later, it is no longer necessary to register files to use this feature. 5.11 also allows using this as non-root, if the user has the CAP_SYS_NICE capability.
SETUP_SQ_AFF - If this flag is specified, then the poll thread will be bound to the CPU set in the sq_thread_cpu field of the struct io_uring_params. This flag is only meaningful when SETUP_SQPOLL is specified. When the cgroup setting cpuset.cpus changes (typically in a container environment), the bound CPU set may be changed as well.
SETUP_CQSIZE - Create the completion queue with struct io_uring_params.cq_entries entries. The value must be greater than entries, and may be rounded up to the next power-of-two.
SETUP_CLAMP - If this flag is specified, and if entries exceeds IORING_MAX_ENTRIES, then entries will be clamped at IORING_MAX_ENTRIES. If the flag SETUP_CQSIZE is set, and if the value of struct io_uring_params.cq_entries exceeds IORING_MAX_CQ_ENTRIES, then it will be clamped at IORING_MAX_CQ_ENTRIES.
SETUP_ATTACH_WQ - This flag should be set in conjunction with struct io_uring_params.wq_fd being set to an existing io_uring ring file descriptor. When set, the io_uring instance being created will share the asynchronous worker thread backend of the specified io_uring ring, rather than create a new separate thread pool.
SETUP_R_DISABLED - If this flag is specified, the io_uring ring starts in a disabled state. In this state, restrictions can be registered, but submissions are not allowed. See register for details on how to enable the ring.
Available since 5.10.
SETUP_SUBMIT_ALL - Continue submit on error. Normally io_uring stops submitting a batch of requests if one of these requests results in an error. This can cause submission of fewer requests than expected if a request ends in error while being submitted. If the ring is created with this flag, enter will continue submitting requests even if it encounters an error submitting a request. CQEs are still posted for errored requests regardless of whether or not this flag is set at ring creation time; the only difference is whether the submit sequence is halted or continued when an error is observed.
Available since 5.18.
SETUP_COOP_TASKRUN - Cooperative task running. By default, io_uring will interrupt a task running in userspace when a completion event comes in. This is to ensure that completions run in a timely manner. For a lot of use cases, this is overkill and can cause reduced performance from the inter-processor interrupt used to do this, the kernel/user transition, the needless interruption of the task's userspace activities, and reduced batching if completions come in at a rapid rate. Most applications don't need the forceful interruption, as the events are processed at any kernel/user transition. The exceptions are setups where the application uses multiple threads operating on the same ring, where the application waiting on completions isn't the one that submitted them. For most other use cases, setting this flag will improve performance.
Available since 5.19.
SETUP_TASKRUN_FLAG - Used in conjunction with SETUP_COOP_TASKRUN, this provides a flag, SQ_TASKRUN, which is set in the SQ ring flags whenever completions are pending that should be processed. liburing will check for this flag even when doing peek_cqe and enter the kernel to process them, and applications can do the same. This makes IORING_SETUP_TASKRUN_FLAG safe to use even when applications rely on a peek style operation on the CQ ring to see if anything might be pending to reap.
Available since 5.19.
SETUP_SQE128 - If set, io_uring will use 128-byte SQEs rather than the normal 64-byte sized variant. This is a requirement for using certain request types; as of 5.19, only the OP_URING_CMD passthrough command for NVMe passthrough needs this.
Available since 5.19.
SETUP_CQE32 - If set, io_uring will use 32-byte CQEs rather than the normal 16-byte sized variant. This is a requirement for using certain request types; as of 5.19, only the OP_URING_CMD passthrough command for NVMe passthrough needs this.
Available since 5.19.
SETUP_SINGLE_ISSUER - A hint to the kernel that only a single task can submit requests, which is used for internal optimisations. The kernel enforces the rule, which only affects enter calls submitting requests and will fail them with -EEXIST if the restriction is violated. The submitter task may differ from the task that created the ring. Note that when SETUP_SQPOLL is set, the polling task is considered to be doing all submissions on behalf of userspace, so it always complies with the rule regardless of how many userspace tasks call io_uring_enter.
Available since 6.0 (the release that was developed as 5.20).
SETUP_DEFER_TASKRUN - Defer running task work to get events. By default, io_uring will process all outstanding work at the end of any system call or thread interrupt. This can delay the application from making other progress. Setting this flag will hint to io_uring that it should defer work until an enter call with the ENTER_GETEVENTS flag set. This allows the application to request work to run just before it wants to process completions. This flag requires the SETUP_SINGLE_ISSUER flag to be set, and also enforces that the call to io_uring_enter is made from the same thread that submitted requests. Note that if this flag is set, it is the application's responsibility to periodically trigger work (for example via any of the CQE waiting functions), or else completions may not be delivered.
Available since 6.1.
SETUP_NO_MMAP - Application provides the memory for the rings.
SETUP_REGISTERED_FD_ONLY - Register the ring fd in itself for use with REGISTER_USE_REGISTERED_RING; return a registered fd index rather than an fd.
SETUP_NO_SQARRAY - Removes indirection through the SQ index array.
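The SQ ring flags word carries both the SQ_NEED_WAKEUP bit described under SETUP_SQPOLL and the SQ_TASKRUN bit described under SETUP_TASKRUN_FLAG. A minimal Java sketch of reading that word with acquire semantics, as the comment in the C sequence asks for, might look as follows. It assumes Java 9 or later, that sqRing is the mmap'd SQ ring exposed as a direct ByteBuffer, and that flagsOffset is the byte offset reported in io_sqring_offsets; the class and method names are hypothetical, and the bit values are those of the Linux UAPI header linux/io_uring.h.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class SqRingFlags {
    // Bit values as defined in the Linux UAPI header <linux/io_uring.h>.
    static final int IORING_SQ_NEED_WAKEUP = 1 << 0;
    static final int IORING_SQ_TASKRUN     = 1 << 2;

    // Views a ByteBuffer as native-endian ints; the index coordinate is a byte offset.
    private static final VarHandle INT_VIEW =
        MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    /** Loads the SQ ring flags word with acquire semantics. */
    static int loadFlags(ByteBuffer sqRing, int flagsOffset) {
        return (int) INT_VIEW.getAcquire(sqRing, flagsOffset);
    }

    /** True if the SQPOLL kernel thread went idle and must be woken through enter. */
    static boolean needsWakeup(ByteBuffer sqRing, int flagsOffset) {
        return (loadFlags(sqRing, flagsOffset) & IORING_SQ_NEED_WAKEUP) != 0;
    }

    /** True if completions are pending that require entering the kernel to process. */
    static boolean hasPendingTaskWork(ByteBuffer sqRing, int flagsOffset) {
        return (loadFlags(sqRing, flagsOffset) & IORING_SQ_TASKRUN) != 0;
    }
}
```

When needsWakeup returns true, the application would call enter with ENTER_SQ_WAKEUP; when hasPendingTaskWork returns true, it would call enter with ENTER_GETEVENTS before peeking at the CQ ring.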
IORING_SETUP_SQE128
public static final int IORING_SETUP_SQE128
io_uring_setup() flags. The enum values are identical to those listed under IORING_SETUP_TASKRUN_FLAG above.
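Because SETUP_SQE128 doubles the size of every submission queue entry, code that maps or indexes the SQE array has to take the flag into account. The sketch below shows the arithmetic only; the class and method names are hypothetical, and the bit value is the one defined in the Linux UAPI header linux/io_uring.h.

```java
final class SqeLayout {
    // Bit value as defined in the Linux UAPI header <linux/io_uring.h>.
    static final int IORING_SETUP_SQE128 = 1 << 10;

    /** Size in bytes of one SQE for a ring created with the given setup flags. */
    static int sqeBytes(int setupFlags) {
        return (setupFlags & IORING_SETUP_SQE128) != 0 ? 128 : 64;
    }

    /** Total size in bytes of the SQE array holding sqEntries entries. */
    static long sqeArrayBytes(int setupFlags, int sqEntries) {
        return (long) sqEntries * sqeBytes(setupFlags);
    }

    /** Byte offset of the SQE at the given index within the SQE array. */
    static long sqeOffset(int setupFlags, int index) {
        return (long) index * sqeBytes(setupFlags);
    }
}
```

For example, a ring with 256 entries needs 16384 bytes for its SQE array by default and 32768 bytes when SETUP_SQE128 is set.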
IORING_SETUP_CQE32
public static final int IORING_SETUP_CQE32
io_uring_setup() flags. The enum values are identical to those listed under IORING_SETUP_TASKRUN_FLAG above.
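The size of the completion queue depends on several of the flags above: SETUP_CQSIZE selects an explicit entry count, SETUP_CLAMP caps oversized requests, and SETUP_CQE32 doubles the size of each entry. The following sketch mirrors that sizing logic in plain Java. It is not the kernel's implementation: the class and method names are hypothetical, the limits 32768 and 65536 and the default of twice the SQ size are assumptions taken from the kernel sources at the time of writing, and the check that the CQ is not smaller than the SQ is applied after rounding.

```java
final class CqSizing {
    // Bit values as defined in the Linux UAPI header <linux/io_uring.h>.
    static final int IORING_SETUP_CQSIZE = 1 << 3;
    static final int IORING_SETUP_CLAMP  = 1 << 4;
    static final int IORING_SETUP_CQE32  = 1 << 11;

    // Assumed limits; they are kernel internals and may change.
    static final int IORING_MAX_ENTRIES    = 32768;
    static final int IORING_MAX_CQ_ENTRIES = 2 * IORING_MAX_ENTRIES;

    /** Rounds n up to the next power of two (n itself if it already is one). */
    static int roundUpPow2(int n) {
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    /** Clamps (if allowed) and rounds a requested entry count. */
    private static int size(int requested, int max, boolean clamp, String what) {
        if (requested <= 0) {
            throw new IllegalArgumentException(what + " must be positive");
        }
        if (requested > max) {
            if (!clamp) {
                throw new IllegalArgumentException(what + " exceeds " + max);
            }
            requested = max; // SETUP_CLAMP: cap instead of failing
        }
        return roundUpPow2(requested);
    }

    /** Number of SQ entries the ring would get for the requested count. */
    static int sqEntries(int flags, int entries) {
        return size(entries, IORING_MAX_ENTRIES, (flags & IORING_SETUP_CLAMP) != 0, "entries");
    }

    /** Number of CQ entries; cqRequested is only used with SETUP_CQSIZE. */
    static int cqEntries(int flags, int entries, int cqRequested) {
        int sq = sqEntries(flags, entries);
        if ((flags & IORING_SETUP_CQSIZE) == 0) {
            return 2 * sq; // assumed default: CQ is twice the SQ
        }
        int cq = size(cqRequested, IORING_MAX_CQ_ENTRIES,
                      (flags & IORING_SETUP_CLAMP) != 0, "cq_entries");
        if (cq < sq) {
            throw new IllegalArgumentException("cq_entries smaller than entries");
        }
        return cq;
    }

    /** Size in bytes of the CQE array: 16 bytes per entry, 32 with SETUP_CQE32. */
    static long cqeArrayBytes(int flags, int cqEntries) {
        return (long) cqEntries * ((flags & IORING_SETUP_CQE32) != 0 ? 32 : 16);
    }
}
```

The values actually chosen by the kernel are reported back in io_uring_params after setup returns, so an application should read them from there rather than rely on a calculation like this one.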
IORING_SETUP_SINGLE_ISSUER
public static final int IORING_SETUP_SINGLE_ISSUER
io_uring_setup() flags. The enum values are identical to those listed under IORING_SETUP_TASKRUN_FLAG above.
- See Also:
-
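The SETUP_SINGLE_ISSUER rule above can be mirrored in user space to catch violations before they reach the kernel. The following is a minimal sketch, not part of this binding: the class SingleIssuerGuard and its method tryEnter are hypothetical names. It binds the ring to the first thread that submits, not to the thread that created the ring, matching the description above.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper (not part of LibIOURing): a user-space check that
// mirrors the single-submitter rule the kernel enforces with -EEXIST.
final class SingleIssuerGuard {
    private final AtomicReference<Thread> issuer = new AtomicReference<>();

    /** Returns true if the calling thread is, or just became, the sole submitter. */
    boolean tryEnter() {
        Thread me = Thread.currentThread();
        // The first thread to submit claims the ring; the creating thread is irrelevant.
        return issuer.compareAndSet(null, me) || issuer.get() == me;
    }

    public static void main(String[] args) throws InterruptedException {
        SingleIssuerGuard guard = new SingleIssuerGuard();
        System.out.println("first submitter allowed: " + guard.tryEnter());

        boolean[] other = new boolean[1];
        Thread t = new Thread(() -> other[0] = guard.tryEnter());
        t.start();
        t.join();
        System.out.println("second thread allowed: " + other[0]);
    }
}
```

An application would call tryEnter() before each submitting enter call and treat a false result as a programming error.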
IORING_SETUP_DEFER_TASKRUN
public static final int IORING_SETUP_DEFER_TASKRUN
io_uring_setup() flags
Enum values:
SETUP_IOPOLL - Perform busy-waiting for an I/O completion, as opposed to getting notifications via an asynchronous IRQ (Interrupt Request). The file system (if any) and block device must support polling in order for this to work. Busy-waiting provides lower latency, but may consume more CPU resources than interrupt-driven I/O. Currently, this feature is usable only on a file descriptor opened using the O_DIRECT flag. When a read or write is submitted to a polled context, the application must poll for completions on the CQ ring by calling enter. It is illegal to mix and match polled and non-polled I/O on an io_uring instance.
SETUP_SQPOLL - When this flag is specified, a kernel thread is created to perform submission queue polling. An io_uring instance configured in this way enables an application to issue I/O without ever context switching into the kernel. By using the submission queue to fill in new submission queue entries and watching for completions on the completion queue, the application can submit and reap I/Os without doing a single system call.
If the kernel thread is idle for more than sq_thread_idle milliseconds, it will set the SQ_NEED_WAKEUP bit in the flags field of the struct io_sq_ring. When this happens, the application must call enter to wake the kernel thread. If I/O is kept busy, the kernel thread will never sleep. An application making use of this feature will need to guard the io_uring_enter() call with the following code sequence:
    // Ensure that the wakeup flag is read after the tail pointer
    // has been written. It's important to use memory load acquire
    // semantics for the flags read, as otherwise the application
    // and the kernel might not agree on the consistency of the
    // wakeup flag.
    unsigned flags = atomic_load_relaxed(sq_ring->flags);
    if (flags & IORING_SQ_NEED_WAKEUP)
        io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP);
where sq_ring is a submission queue ring set up using the struct io_sqring_offsets described below.
Before version 5.11 of the Linux kernel, to successfully use this feature, the application had to register a set of files to be used for I/O through register using the REGISTER_FILES opcode. Failure to do so results in submitted I/O being errored with EBADF. Whether registration is still required can be detected through the FEAT_SQPOLL_NONFIXED feature flag. In version 5.11 and later, it is no longer necessary to register files to use this feature. 5.11 also allows using this as non-root, if the user has the CAP_SYS_NICE capability.
SETUP_SQ_AFF - If this flag is specified, then the poll thread will be bound to the CPU set in the sq_thread_cpu field of the struct io_uring_params. This flag is only meaningful when SETUP_SQPOLL is specified. When the cgroup setting cpuset.cpus changes (typically in a container environment), the bound CPU set may change as well.
SETUP_CQSIZE - Create the completion queue with struct io_uring_params.cq_entries entries. The value must be greater than entries, and may be rounded up to the next power-of-two.
SETUP_CLAMP - If this flag is specified, and if entries exceeds IORING_MAX_ENTRIES, then entries will be clamped at IORING_MAX_ENTRIES. If the flag SETUP_CQSIZE is set, and if the value of struct io_uring_params.cq_entries exceeds IORING_MAX_CQ_ENTRIES, then it will be clamped at IORING_MAX_CQ_ENTRIES.
SETUP_ATTACH_WQ - This flag should be set in conjunction with struct io_uring_params.wq_fd being set to an existing io_uring ring file descriptor. When set, the io_uring instance being created will share the asynchronous worker thread backend of the specified io_uring ring, rather than create a new separate thread pool.
SETUP_R_DISABLED - If this flag is specified, the io_uring ring starts in a disabled state. In this state, restrictions can be registered, but submissions are not allowed. See register for details on how to enable the ring.
Available since 5.10.
SETUP_SUBMIT_ALL - Continue submit on error. Normally io_uring stops submitting a batch of requests if one of these requests results in an error. This can cause fewer requests to be submitted than expected, if a request ends in error while being submitted. If the ring is created with this flag, enter will continue submitting requests even if it encounters an error submitting a request. CQEs are still posted for errored requests regardless of whether or not this flag is set at ring creation time; the only difference is whether the submit sequence is halted or continued when an error is observed.
Available since 5.18.
SETUP_COOP_TASKRUN - Cooperative task running. By default, io_uring will interrupt a task running in userspace when a completion event comes in. This is to ensure that completions run in a timely manner. For a lot of use cases, this is overkill and can cause reduced performance from the inter-processor interrupt used to do this, the kernel/user transition, the needless interruption of the task's userspace activities, and reduced batching if completions come in at a rapid rate. Most applications don't need the forceful interruption, as the events are processed at any kernel/user transition. The exceptions are setups where the application uses multiple threads operating on the same ring, where the thread waiting on completions isn't the one that submitted them. For most other use cases, setting this flag will improve performance.
Available since 5.19.
SETUP_TASKRUN_FLAG - Used in conjunction with SETUP_COOP_TASKRUN, this provides a flag, SQ_TASKRUN, which is set in the SQ ring flags whenever completions are pending that should be processed. liburing will check for this flag even when doing peek_cqe and enter the kernel to process them, and applications can do the same. This makes IORING_SETUP_TASKRUN_FLAG safe to use even when applications rely on a peek-style operation on the CQ ring to see if anything might be pending to reap.
Available since 5.19.
SETUP_SQE128 - If set, io_uring will use 128-byte SQEs rather than the normal 64-byte variant. This is a requirement for certain request types; as of 5.19, only the OP_URING_CMD passthrough command for NVMe passthrough needs this.
Available since 5.19.
SETUP_CQE32 - If set, io_uring will use 32-byte CQEs rather than the normal 16-byte variant. This is a requirement for certain request types; as of 5.19, only the OP_URING_CMD passthrough command for NVMe passthrough needs this.
Available since 5.19.
SETUP_SINGLE_ISSUER - A hint to the kernel that only a single task can submit requests, which is used for internal optimisations. The kernel enforces the rule, which only affects enter calls submitting requests, and will fail them with -EEXIST if the restriction is violated. The submitter task may differ from the task that created the ring. Note that when SETUP_SQPOLL is set, the polling task is considered to be doing all submissions on behalf of userspace, so it always complies with the rule regardless of how many userspace tasks call io_uring_enter.
Available since 6.0.
SETUP_DEFER_TASKRUN - Defer running task work to get events. By default, io_uring will process all outstanding work at the end of any system call or thread interrupt. This can delay the application from making other progress. Setting this flag hints to io_uring that it should defer work until an enter call with the ENTER_GETEVENTS flag set. This allows the application to request work to run just before it wants to process completions. This flag requires the SETUP_SINGLE_ISSUER flag to be set, and also enforces that the call to io_uring_enter is made from the same thread that submitted requests. Note that if this flag is set, it is the application's responsibility to periodically trigger work (for example via any of the CQE waiting functions), or else completions may not be delivered.
Available since 6.1.
SETUP_NO_MMAP - Application provides the memory for the rings.
SETUP_REGISTERED_FD_ONLY - Register the ring fd in itself for use with REGISTER_USE_REGISTERED_RING; return a registered fd index rather than an fd.
SETUP_NO_SQARRAY - Removes indirection through the SQ index array.
- See Also:
-
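Two of the flag descriptions above state dependencies between flags: SETUP_DEFER_TASKRUN requires SETUP_SINGLE_ISSUER, and SETUP_SQ_AFF is only meaningful together with SETUP_SQPOLL. As a rough sketch, an application could check a flag mask for these before calling setup. The class SetupFlagCheck and its method problems are hypothetical; the bit values are copied from the Linux UAPI header and redeclared locally so the example compiles on its own. A real program would use the constants of this class instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports flag combinations that the descriptions
// above call out as invalid or pointless.
final class SetupFlagCheck {
    // Bit values as in <linux/io_uring.h>, redeclared for a standalone example.
    static final int SETUP_SQPOLL        = 1 << 1;
    static final int SETUP_SQ_AFF        = 1 << 2;
    static final int SETUP_SINGLE_ISSUER = 1 << 12;
    static final int SETUP_DEFER_TASKRUN = 1 << 13;

    /** Returns one message per problem found; an empty list means none. */
    static List<String> problems(int flags) {
        List<String> out = new ArrayList<>();
        if ((flags & SETUP_DEFER_TASKRUN) != 0 && (flags & SETUP_SINGLE_ISSUER) == 0) {
            out.add("SETUP_DEFER_TASKRUN requires SETUP_SINGLE_ISSUER");
        }
        if ((flags & SETUP_SQ_AFF) != 0 && (flags & SETUP_SQPOLL) == 0) {
            out.add("SETUP_SQ_AFF is only meaningful with SETUP_SQPOLL");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(problems(SETUP_DEFER_TASKRUN));
        System.out.println(problems(SETUP_SINGLE_ISSUER | SETUP_DEFER_TASKRUN));
    }
}
```

The check only covers the two rules quoted above; the kernel remains the authority on which combinations it accepts.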
IORING_SETUP_NO_MMAP
public static final int IORING_SETUP_NO_MMAP
io_uring_setup() flags
SETUP_NO_MMAP - Application provides the memory for the rings.
The full list of io_uring_setup() flag values is given under IORING_SETUP_DEFER_TASKRUN.
- See Also:
-
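When the application supplies ring memory itself, the entry sizes selected by SETUP_SQE128 and SETUP_CQE32 determine how large the entry arrays are. The sketch below only computes the size of the SQE and CQE arrays; it deliberately says nothing about ring headers or the exact memory layout the kernel expects. The class RingEntrySizes is hypothetical, the bit values are redeclared from the Linux UAPI header, and the default CQE size of 16 bytes is taken from the kernel's struct io_uring_cqe.

```java
// Hypothetical helper: per-array byte counts for a given ring configuration.
final class RingEntrySizes {
    // Bit values as in <linux/io_uring.h>, redeclared for a standalone example.
    static final int SETUP_SQE128 = 1 << 10;
    static final int SETUP_CQE32  = 1 << 11;

    /** Bytes occupied by the SQE array alone (no ring header). */
    static long sqeArrayBytes(int sqEntries, int flags) {
        int sqeSize = (flags & SETUP_SQE128) != 0 ? 128 : 64;
        return (long) sqEntries * sqeSize;
    }

    /** Bytes occupied by the CQE array alone (no ring header). */
    static long cqeArrayBytes(int cqEntries, int flags) {
        int cqeSize = (flags & SETUP_CQE32) != 0 ? 32 : 16;
        return (long) cqEntries * cqeSize;
    }

    public static void main(String[] args) {
        System.out.println(sqeArrayBytes(256, 0));            // 16384
        System.out.println(sqeArrayBytes(256, SETUP_SQE128)); // 32768
        System.out.println(cqeArrayBytes(512, SETUP_CQE32));  // 16384
    }
}
```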
IORING_SETUP_REGISTERED_FD_ONLY
public static final int IORING_SETUP_REGISTERED_FD_ONLY
io_uring_setup() flags
SETUP_REGISTERED_FD_ONLY - Register the ring fd in itself for use with REGISTER_USE_REGISTERED_RING; return a registered fd index rather than an fd.
The full list of io_uring_setup() flag values is given under IORING_SETUP_DEFER_TASKRUN.
- See Also:
-
IORING_SETUP_NO_SQARRAY
public static final int IORING_SETUP_NO_SQARRAYio_uring_setup()flagsEnum values:
SETUP_IOPOLL- Perform busy-waiting for an I/O completion, as opposed to getting notifications via an asynchronous IRQ (Interrupt Request).The file system (if any) and block device must support polling in order for this to work. Busy-waiting provides lower latency, but may consume more CPU resources than interrupt driven I/O. Currently, this feature is usable only on a file descriptor opened using the
O_DIRECTflag. When a read or write is submitted to a polled context, the application must poll for completions on the CQ ring by callingenter. It is illegal to mix and match polled and non-polled I/O on an io_uring instance.SETUP_SQPOLL- When this flag is specified, a kernel thread is created to perform submission queue polling.An
io_uringinstance configured in this way enables an application to issue I/O without ever context switching into the kernel. By using the submission queue to fill in new submission queue entries and watching for completions on the completion queue, the application can submit and reap I/Os without doing a single system call.If the kernel thread is idle for more than
sq_thread_idlemilliseconds, it will set theSQ_NEED_WAKEUPbit in the flags field of the structio_sq_ring. When this happens, the application must callenterto wake the kernel thread. If I/O is kept busy, the kernel thread will never sleep. An application making use of this feature will need to guard theio_uring_enter()call with the following code sequence:// Ensure that the wakeup flag is read after the tail pointer // has been written. It's important to use memory load acquire // semantics for the flags read, as otherwise the application // and the kernel might not agree on the consistency of the // wakeup flag. unsigned flags = atomic_load_relaxed(sq_ring->flags); if (flags & IORING_SQ_NEED_WAKEUP) io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP);where
sq_ringis a submission queue ring setup using the structio_sqring_offsetsdescribed below.Before version 5.11 of the Linux kernel, to successfully use this feature, the application must register a set of files to be used for IO through
registerusing theREGISTER_FILESopcode. Failure to do so will result in submitted IO being errored withEBADF. The presence of this feature can be detected by theFEAT_SQPOLL_NONFIXEDfeature flag. In version 5.11 and later, it is no longer necessary to register files to use this feature. 5.11 also allows using this as non-root, if the user has theCAP_SYS_NICEcapability.SETUP_SQ_AFF- If this flag is specified, then the poll thread will be bound to the cpu set in thesq_thread_cpufield of the structio_uring_params. This flag is only meaningful whenSETUP_SQPOLLis specified. Whencgroupsettingcpuset.cpuschanges (typically in container environment), the bounded cpu set may be changed as well.SETUP_CQSIZE- Create the completion queue with structio_uring_params.cq_entriesentries.The value must be greater than entries, and may be rounded up to the next power-of-two.
SETUP_CLAMP- If this flag is specified, and if entries exceedsMAX_ENTRIES, then entries will be clamped atIORING_MAX_ENTRIES.If the flag
SETUP_SQPOLLis set, and if the value of structio_uring_params.cq_entriesexceedsIORING_MAX_CQ_ENTRIES, then it will be clamped atIORING_MAX_CQ_ENTRIES.SETUP_ATTACH_WQ- This flag should be set in conjunction with structio_uring_params.wq_fdbeing set to an existingio_uringring file descriptor.When set, the
io_uringinstance being created will share the asynchronous worker thread backend of the specifiedio_uringring, rather than create a new separate thread pool.SETUP_R_DISABLED- If this flag is specified, the io_uring ring starts in a disabled state.In this state, restrictions can be registered, but submissions are not allowed. See
registerfor details on how to enable the ring.Available since 5.10.
SETUP_SUBMIT_ALL- Continue submit on error.Normally io_uring stops submitting a batch of request, if one of these requests results in an error. This can cause submission of less than what is expected, if a request ends in error while being submitted. If the ring is created with this flag,
enterwill continue submitting requests even if it encounters an error submitting a request. CQEs are still posted for errored request regardless of whether or not this flag is set at ring creation time, the only difference is if the submit sequence is halted or continued when an error is observed.Available since 5.18.
SETUP_COOP_TASKRUN- Cooperative task running.By default, io_uring will interrupt a task running in userspace when a completion event comes in. This is to ensure that completions run in a timely manner. For a lot of use cases, this is overkill and can cause reduced performance from both the inter-processor interrupt used to do this, the kernel/user transition, the needless interruption of the tasks userspace activities, and reduced batching if completions come in at a rapid rate. Most applications don't need the forceful interruption, as the events are processed at any kernel/user transition. The exception are setups where the application uses multiple threads operating on the same ring, where the application waiting on completions isn't the one that submitted them. For most other use cases, setting this flag will improve performance.
Available since 5.19.
SETUP_TASKRUN_FLAG- Used in conjunction withSETUP_COOP_TASKRUN, this provides a flag,SQ_TASKRUN, which is set in the SQ ringflagswhenever completions are pending that should be processed. liburing will check for this flag even when doingpeek_cqeand enter the kernel to process them, and applications can do the same. This makesIORING_SETUP_TASKRUN_FLAGsafe to use even when applications rely on a peek style operation on the CQ ring to see if anything might be pending to reap.Available since 5.19.
SETUP_SQE128- If set, io_uring will use 128-byte SQEs rather than the normal 64-byte sized variant.This is a requirement for using certain request types, as of 5.19 only the
OP_URING_CMDpassthrough command for NVMe passthrough needs this.Available since 5.19.
SETUP_CQE32- If set, io_uring will use 32-byte CQEs rather than the normal 32-byte sized variant.This is a requirement for using certain request types, as of 5.19 only the
OP_URING_CMDpassthrough command for NVMe passthrough needs this.Available since 5.19.
SETUP_SINGLE_ISSUER- A hint to the kernel that only a single task can submit requests, which is used for internal optimisations.The kernel enforces the rule, which only affects
entercalls submitting requests and will fail them with-EEXISTif the restriction is violated. The submitter task may differ from the task that created the ring. Note that whenSETUP_SQPOLLis set it is considered that the polling task is doing all submissions on behalf of the userspace and so it always complies with the rule disregarding how many userspace tasks doio_uring_enter.Available since 5.20.
SETUP_DEFER_TASKRUN - Defer running task work to get events. By default, io_uring will process all outstanding work at the end of any system call or thread interrupt. This can delay the application from making other progress. Setting this flag will hint to io_uring that it should defer work until an enter call with the ENTER_GETEVENTS flag set. This allows the application to request work to run just before it wants to process completions. This flag requires the SETUP_SINGLE_ISSUER flag to be set, and also enforces that io_uring_enter is called from the same thread that submitted requests. Note that if this flag is set then it is the application's responsibility to periodically trigger work (for example via any of the CQE waiting functions) or else completions may not be delivered. Available since 6.1.
SETUP_NO_MMAP - Application provides the memory for the rings.
SETUP_REGISTERED_FD_ONLY - Register the ring fd in itself for use with REGISTER_USE_REGISTERED_RING; return a registered fd index rather than an fd.
SETUP_NO_SQARRAY - Removes indirection through the SQ index array.
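The dependencies between the setup flags above can be checked before calling setup. The following is a minimal sketch, not part of this class: the helper name SetupFlagsSketch is hypothetical, and the bit values are assumed to mirror the Linux UAPI header linux/io_uring.h. In real code, prefer the constants exported by the binding in use. It checks only the one hard requirement stated above (SETUP_DEFER_TASKRUN needs SETUP_SINGLE_ISSUER) and reports the queue entry sizes implied by SETUP_SQE128 and SETUP_CQE32.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper only; it is not part of LWJGL or liburing.
final class SetupFlagsSketch {
    // Assumed bit values, mirroring the Linux UAPI header <linux/io_uring.h>.
    static final int SETUP_SQE128        = 1 << 10;
    static final int SETUP_CQE32         = 1 << 11;
    static final int SETUP_SINGLE_ISSUER = 1 << 12;
    static final int SETUP_DEFER_TASKRUN = 1 << 13;

    /** Returns human-readable problems with a flag combination (empty if none found). */
    static List<String> check(int flags) {
        List<String> problems = new ArrayList<>();
        boolean defer  = (flags & SETUP_DEFER_TASKRUN) != 0;
        boolean single = (flags & SETUP_SINGLE_ISSUER) != 0;
        if (defer && !single) {
            problems.add("SETUP_DEFER_TASKRUN requires SETUP_SINGLE_ISSUER");
        }
        return problems;
    }

    /** Size in bytes of one submission queue entry for the given setup flags. */
    static int sqeSize(int flags) {
        return (flags & SETUP_SQE128) != 0 ? 128 : 64;
    }

    /** Size in bytes of one completion queue entry for the given setup flags. */
    static int cqeSize(int flags) {
        return (flags & SETUP_CQE32) != 0 ? 32 : 16;
    }

    public static void main(String[] args) {
        int flags = SETUP_SINGLE_ISSUER | SETUP_DEFER_TASKRUN | SETUP_SQE128 | SETUP_CQE32;
        System.out.println("problems: " + check(flags));
        System.out.println("SQE bytes: " + sqeSize(flags) + ", CQE bytes: " + cqeSize(flags));
    }
}
```

A check like this only catches mistakes early; the kernel remains the authority and may reject combinations this sketch does not know about.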
IORING_OP_NOP
public static final byte IORING_OP_NOP
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
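The result codes described for OP_TIMEOUT and OP_TIMEOUT_REMOVE can be turned into readable outcomes when reaping CQEs. The following is a minimal sketch under stated assumptions: the class TimeoutResultSketch and its method names are hypothetical, and the errno values are the ones used on most Linux architectures (they differ on a few, so real code should take them from the platform bindings).

```java
// Illustrative helper only; it is not part of LWJGL or liburing.
final class TimeoutResultSketch {
    // Assumed errno values (asm-generic Linux); verify them for the target platform.
    static final int ENOENT    = 2;
    static final int EBUSY     = 16;
    static final int ETIME     = 62;
    static final int ECANCELED = 125;

    /** Interprets the CQE res field of a completed OP_TIMEOUT request. */
    static String describeTimeout(int res) {
        if (res == 0)          return "requested number of completions was reached";
        if (res == -ETIME)     return "timer expired";
        if (res == -ECANCELED) return "timeout was cancelled before it expired";
        return "unexpected result " + res;
    }

    /** Interprets the CQE res field of a completed OP_TIMEOUT_REMOVE request. */
    static String describeTimeoutRemove(int res) {
        if (res == 0)       return "timeout found and removed (or updated)";
        if (res == -EBUSY)  return "timeout found, but expiration was already in progress";
        if (res == -ENOENT) return "no matching timeout request";
        return "unexpected result " + res;
    }

    public static void main(String[] args) {
        System.out.println(describeTimeout(-ETIME));
        System.out.println(describeTimeoutRemove(-ENOENT));
    }
}
```

Note that -ETIME is the normal outcome for a pure timer (off set to 0), so callers usually should not treat it as a failure.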
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If the request was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
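The sizing rule for OP_PROVIDE_BUFFERS (len * fd bytes starting at addr) can be made concrete with a small calculation. This is a minimal sketch, not an API of this class: ProvidedBufferLayoutSketch and its methods are hypothetical names, and it assumes that buffer IDs are assigned consecutively from the starting ID passed in off, with each buffer placed directly after the previous one.

```java
// Illustrative helper only; it is not part of LWJGL or liburing.
final class ProvidedBufferLayoutSketch {
    /** Total bytes the application must make available at addr: len * fd. */
    static long regionBytes(int bufferCount, int bufferLen) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact((long) bufferCount, (long) bufferLen);
    }

    /**
     * Address of the buffer with the given ID, assuming IDs start at startBid
     * and buffers are laid out back to back from baseAddr.
     */
    static long bufferAddress(long baseAddr, int bufferLen, int startBid, int bid) {
        if (bid < startBid) {
            throw new IllegalArgumentException("buffer ID precedes the provided range");
        }
        return baseAddr + (long) (bid - startBid) * bufferLen;
    }

    public static void main(String[] args) {
        int count = 8;       // goes into the SQE fd field
        int len = 4096;      // goes into the SQE len field
        int startBid = 10;   // goes into the SQE off field
        long base = 0x1000L; // stand-in for a real native address in the SQE addr field
        System.out.println("bytes needed: " + regionBytes(count, len));
        System.out.println("address of buffer 12: 0x"
                + Long.toHexString(bufferAddress(base, len, startBid, 12)));
    }
}
```

The address helper is useful on the completion side, where the kernel reports which buffer ID it picked and the application has to find the matching memory.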
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
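The two step completion model described for OP_SEND_ZC means an application has to track when a send buffer becomes reusable. The following is a minimal sketch of that bookkeeping, not an API of this class: ZeroCopySendTrackerSketch is a hypothetical name, the flag values are assumed to mirror the Linux UAPI header linux/io_uring.h, and it assumes each in-flight send uses a unique user_data value. It also assumes that a first CQE without CQE_F_MORE means no notification will follow, so the buffer can be released at that point.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative helper only; it is not part of LWJGL or liburing.
final class ZeroCopySendTrackerSketch {
    // Assumed bit values, mirroring the Linux UAPI header <linux/io_uring.h>.
    static final int CQE_F_MORE  = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    // user_data values of sends whose buffers the kernel may still be using.
    private final Set<Long> buffersInFlight = new HashSet<>();

    /** Call when an OP_SEND_ZC SQE with this user_data has been submitted. */
    void submitted(long userData) {
        buffersInFlight.add(userData);
    }

    /**
     * Call for every CQE carrying this user_data.
     * Returns true once the buffer tied to the request may be modified or freed.
     */
    boolean onCqe(long userData, int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            // Second CQE: the kernel no longer references the buffer.
            buffersInFlight.remove(userData);
            return true;
        }
        if ((flags & CQE_F_MORE) == 0) {
            // First CQE without CQE_F_MORE: assumed to mean no notification follows.
            buffersInFlight.remove(userData);
            return true;
        }
        // First CQE with CQE_F_MORE: res holds bytes sent or a negative error,
        // but the buffer must stay untouched until the notification arrives.
        return false;
    }

    int pending() {
        return buffersInFlight.size();
    }

    public static void main(String[] args) {
        ZeroCopySendTrackerSketch tracker = new ZeroCopySendTrackerSketch();
        tracker.submitted(7L);
        System.out.println("after first CQE, reusable: " + tracker.onCqe(7L, 100, CQE_F_MORE));
        System.out.println("after notification, reusable: " + tracker.onCqe(7L, 0, CQE_F_NOTIF));
    }
}
```

As the description above notes, the notification only governs buffer lifetime; the byte count or error of the send itself comes from the first CQE.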
IORING_OP_READV
public static final byte IORING_OP_READV
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If the request was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ- Issue the equivalent of apread(2)orpwrite(2)system call.fdis the file descriptor to be operated on,addrcontains the buffer in question,lencontains the length of the IO operation, andoffscontains the read or write offset. Iffddoes not refer to a seekable file,offmust be set to zero. Ifoffsis set to -1, the offset will use (and advance) the file position, like theread(2)andwrite(2)system calls. These are non-vectored versions of theOP_READVandOP_WRITEVopcodes. See alsoread(2)andwrite(2)for the general description of the related system call.Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
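The off-by-one between file_index and the fixed file table slot is easy to get wrong, so here is a minimal sketch of the mapping described above. DirectSlot and its methods are hypothetical helper names used only for illustration; they are not part of this library and do not call io_uring.

```java
// Illustrative helper for the file_index convention described above.
// DirectSlot is a hypothetical name, not an API of this library.
final class DirectSlot {
    // Marker returned when file_index is 0, meaning a normal file descriptor is installed.
    static final int NORMAL_FD = -1;

    // Maps the SQE file_index value to the fixed file table slot it selects.
    static int slotFor(int fileIndex) {
        if (fileIndex < 0) {
            throw new IllegalArgumentException("file_index must not be negative");
        }
        return fileIndex == 0 ? NORMAL_FD : fileIndex - 1;
    }

    // Inverse mapping: the file_index value to put in the SQE for a given slot.
    static int fileIndexFor(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must not be negative");
        }
        return slot + 1;
    }

    public static void main(String[] args) {
        System.out.println("file_index 1 selects slot " + slotFor(1));
        System.out.println("slot 7 needs file_index " + fileIndexFor(7));
    }
}
```

Keeping the conversion in one place avoids accidentally passing a slot number where file_index is expected, which would target the neighbouring slot.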
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
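The sizing rule above (len * fd bytes starting at addr) can be checked with a small calculation. The sketch below is an illustration under one stated assumption: buffer IDs are taken to be consecutive, starting at the value in off, with each buffer following the previous one in memory. ProvidedBufferLayout is a hypothetical name and the code does not register anything with the kernel.

```java
// Illustrative arithmetic for the OP_PROVIDE_BUFFERS layout described above.
// ProvidedBufferLayout is a hypothetical helper, not an API of this library.
final class ProvidedBufferLayout {
    // Total memory the application must provide: len * fd (number of buffers).
    static long totalBytes(int bufferCount, int bufferLen) {
        return Math.multiplyExact((long) bufferCount, (long) bufferLen);
    }

    // Address of the buffer with the given ID.
    // Assumption: IDs are consecutive from startBufferId and buffers are contiguous.
    static long addressOf(long baseAddr, int bufferLen, int startBufferId,
                          int bufferCount, int bufferId) {
        int position = bufferId - startBufferId;
        if (position < 0 || position >= bufferCount) {
            throw new IllegalArgumentException("buffer ID outside the provided range");
        }
        return baseAddr + (long) position * bufferLen;
    }

    public static void main(String[] args) {
        // 8 buffers of 4096 bytes each, IDs 10..17, region starting at 0x1000.
        System.out.println(totalBytes(8, 4096));
        System.out.println(Long.toHexString(addressOf(0x1000L, 4096, 10, 8, 12)));
    }
}
```

A helper like addressOf is what an application would use to find the data after a completion reports which buffer was selected.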
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
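To make the field mapping concrete, the following sketch models how the two message values travel from the sender's SQE to the CQE posted on the target ring. MsgRingModel and its nested Cqe class are hypothetical names for illustration; no ring is created and no kernel call is made.

```java
// Illustrative model of the OP_MSG_RING field mapping described above.
// MsgRingModel and Cqe are hypothetical names, not types of this library.
final class MsgRingModel {
    // Minimal stand-in for the two CQE fields that carry the message.
    static final class Cqe {
        final long userData;
        final int res;

        Cqe(long userData, int res) {
            this.userData = userData;
            this.res = res;
        }
    }

    // The sender sets len (32-bit) and off (64-bit) in its SQE;
    // the target ring sees them as res and user_data respectively.
    static Cqe cqeOnTargetRing(int sqeLen, long sqeOff) {
        return new Cqe(sqeOff, sqeLen);
    }

    public static void main(String[] args) {
        Cqe cqe = cqeOnTargetRing(42, 0xCAFEL);
        System.out.println("res=" + cqe.res + " user_data=0x" + Long.toHexString(cqe.userData));
    }
}
```

Because user_data on the target ring is fully controlled by the sender, a receiver would typically reserve a range of user_data values for such messages so they cannot be confused with completions of its own requests.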
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
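The two-step completion model above can be sketched as a small classifier over the CQE flags. This is an illustration, not this library's API: ZcCompletion is a hypothetical name, and the two flag values are assumptions copied from the Linux io_uring header (IORING_CQE_F_MORE as bit 1, IORING_CQE_F_NOTIF as bit 3); real code should use the constants this library exposes.

```java
// Illustrative classifier for the two CQEs an OP_SEND_ZC request can produce.
// ZcCompletion is a hypothetical name; the flag values are assumed to match
// IORING_CQE_F_MORE and IORING_CQE_F_NOTIF in the kernel header.
final class ZcCompletion {
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    enum Kind {
        SEND_RESULT_NOTIFICATION_PENDING, // first CQE, a notification will follow
        SEND_RESULT_FINAL,                // only CQE, no notification will follow
        NOTIFICATION                      // second CQE, buffer lifetime has ended
    }

    static Kind classify(int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            return Kind.NOTIFICATION;
        }
        return (flags & CQE_F_MORE) != 0
                ? Kind.SEND_RESULT_NOTIFICATION_PENDING
                : Kind.SEND_RESULT_FINAL;
    }

    // The data buffer may be modified again only once no notification is outstanding.
    static boolean bufferReusableAfter(int flags) {
        return classify(flags) != Kind.SEND_RESULT_NOTIFICATION_PENDING;
    }

    public static void main(String[] args) {
        System.out.println(classify(CQE_F_MORE));
        System.out.println(classify(CQE_F_NOTIF));
        System.out.println(bufferReusableAfter(CQE_F_MORE));
    }
}
```

Since both CQEs carry the same user_data, an application would track per-request state and release the buffer only when the classifier reports a notification or a final result.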
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_WRITEV
public static final byte IORING_OP_WRITEV
io_uring_op Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
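The three documented outcomes of an OP_TIMEOUT completion can be told apart from the CQE res field alone. The sketch below shows one way to do that; TimeoutOutcome is a hypothetical name, and the errno numbers (ETIME = 62, ECANCELED = 125) are assumptions that hold for Linux on x86-64 and most other architectures, so real code should take them from the platform.

```java
// Illustrative decoding of the res field of an OP_TIMEOUT completion.
// TimeoutOutcome is a hypothetical name; errno values are assumed (Linux, x86-64).
final class TimeoutOutcome {
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome { TIMER_EXPIRED, EVENT_COUNT_REACHED, CANCELLED, OTHER_ERROR }

    static Outcome decode(int res) {
        if (res == -ETIME) {
            return Outcome.TIMER_EXPIRED;        // the timer ran out
        }
        if (res == 0) {
            return Outcome.EVENT_COUNT_REACHED;  // enough requests completed first
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;            // removed before it expired
        }
        return Outcome.OTHER_ERROR;              // anything else, e.g. invalid arguments
    }

    public static void main(String[] args) {
        System.out.println(decode(-ETIME));
        System.out.println(decode(0));
        System.out.println(decode(-ECANCELED));
    }
}
```

Note that -ETIME is the normal result for a pure timer (off set to 0), so callers should not treat it as a failure.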
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
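A caller usually needs to know whether it must still expect a completion for the original request. The sketch below maps the three result codes listed above to that question. CancelResult is a hypothetical name, and the errno numbers (ENOENT = 2, EALREADY = 114) are assumptions valid for Linux on x86-64 and most other architectures.

```java
// Illustrative interpretation of the res field of an OP_ASYNC_CANCEL completion.
// CancelResult is a hypothetical name; errno values are assumed (Linux, x86-64).
final class CancelResult {
    static final int ENOENT = 2;
    static final int EALREADY = 114;

    enum Status { CANCELLED, NOT_FOUND, ALREADY_RUNNING, OTHER_ERROR }

    static Status decode(int res) {
        if (res == 0) {
            return Status.CANCELLED;        // request was found and cancelled
        }
        if (res == -ENOENT) {
            return Status.NOT_FOUND;        // no request with that user_data
        }
        if (res == -EALREADY) {
            return Status.ALREADY_RUNNING;  // cancellation attempted, outcome open
        }
        return Status.OTHER_ERROR;
    }

    // True when the target request may still run to completion normally,
    // so its buffers must stay valid until its own CQE arrives.
    static boolean targetMayStillComplete(int res) {
        return decode(res) == Status.ALREADY_RUNNING;
    }

    public static void main(String[] args) {
        System.out.println(decode(0));
        System.out.println(decode(-ENOENT));
        System.out.println(targetMayStillComplete(-EALREADY));
    }
}
```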
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system calls. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_FSYNC
public static final byte IORING_OP_FSYNC
io_uring_op Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
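The one-shot versus multi-shot poll behaviour described under OP_POLL_ADD can be sketched as a small decision helper. This is only an illustration: PollCqeSketch and its methods are hypothetical names that are not part of this class, and the numeric value of CQE_F_MORE is assumed from the Linux io_uring UAPI header, so real code should use the constant exported by the binding.

```java
/** Hypothetical helper (not part of LibIOURing): decides what to do after a poll CQE arrives. */
final class PollCqeSketch {
    // Assumed value of IORING_CQE_F_MORE; prefer the binding's own constant in real code.
    static final int CQE_F_MORE = 1 << 1;

    private PollCqeSketch() {}

    /** True if the kernel will post further CQEs for the same poll SQE (multi-shot still armed). */
    static boolean stillArmed(int cqeFlags) {
        return (cqeFlags & CQE_F_MORE) != 0;
    }

    /**
     * True if the application has to submit a new OP_POLL_ADD to keep watching the fd.
     * That is the case for every one-shot completion and for a terminated multi-shot poll.
     */
    static boolean needsResubmit(int cqeFlags) {
        return !stillArmed(cqeFlags);
    }

    /** For a successful poll completion, res carries the returned event mask; negative values are -errno. */
    static boolean isError(int res) {
        return res < 0;
    }
}
```

A completion loop would call needsResubmit(flags) for each poll CQE and only queue a fresh poll request when it returns true.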
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64, len must contain 1 to signify one timespec64 structure, and timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
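The three possible results of an OP_TIMEOUT request described above can be told apart with a small classifier. This is a minimal sketch under stated assumptions: TimeoutResultSketch is a hypothetical name, and the errno numbers are the usual Linux values (ETIME = 62, ECANCELED = 125) rather than constants read from this binding.

```java
/** Hypothetical helper (not part of LibIOURing): maps the res field of a timeout CQE to its meaning. */
final class TimeoutResultSketch {
    // Assumed Linux errno values; real code should take them from the platform or the binding.
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome { TIMER_EXPIRED, EVENT_COUNT_REACHED, CANCELLED, OTHER_ERROR }

    private TimeoutResultSketch() {}

    static Outcome classify(int res) {
        if (res == -ETIME) {
            return Outcome.TIMER_EXPIRED;        // the timer itself ran out
        }
        if (res == 0) {
            return Outcome.EVENT_COUNT_REACHED;  // enough other requests completed first
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;            // the timeout was cancelled before it expired
        }
        return Outcome.OTHER_ERROR;              // any other negative errno
    }
}
```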
OP_TIMEOUT_REMOVE - If timeout_flags is zero, this command attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, the command updates the existing operation instead of removing it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
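The off-by-one relationship between file_index and the fixed file table slot is easy to get wrong, so here is a minimal sketch of the conversion. DirectSlotSketch and its method names are hypothetical and only encode the rule stated above: a positive file_index selects slot file_index - 1, while 0 requests a normal file descriptor.

```java
/** Hypothetical helper (not part of LibIOURing): converts between fixed-table slots and file_index values. */
final class DirectSlotSketch {
    /** Marker returned when file_index does not select a fixed-table slot. */
    static final int NO_SLOT = -1;

    private DirectSlotSketch() {}

    /** Value to store in the SQE file_index field so that the file lands in the given slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0 || slot == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
        return slot + 1;
    }

    /** Slot selected by a file_index value, or NO_SLOT when the normal file table is used. */
    static int slotForFileIndex(int fileIndex) {
        return fileIndex > 0 ? fileIndex - 1 : NO_SLOT;
    }
}
```

Keeping the conversion in one place avoids passing a raw slot number as file_index, which would silently target the previous slot.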
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain a const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE that works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the struct epoll_event. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data.
Using buffers in this manner can eliminate the need to separate the poll + read, where the gap between the two steps provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd bytes of memory at addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
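The memory layout implied by the OP_PROVIDE_BUFFERS arguments can be worked through with a little arithmetic. The sketch below is illustrative only: ProvidedBuffersSketch and its methods are hypothetical names, and it assumes that buffer IDs increase by one per buffer starting at the ID passed in off, with buffers laid out back to back from addr.

```java
/** Hypothetical helper (not part of LibIOURing): arithmetic for a contiguous range of provided buffers. */
final class ProvidedBuffersSketch {
    private ProvidedBuffersSketch() {}

    /** Bytes the application must make available at addr: len * fd in the terms used above. */
    static long totalBytes(int bufferCount, int bufferLength) {
        if (bufferCount <= 0 || bufferLength <= 0) {
            throw new IllegalArgumentException("count and length must be positive");
        }
        return Math.multiplyExact((long) bufferCount, (long) bufferLength);
    }

    /**
     * Address of the buffer with the given ID, assuming IDs are consecutive from startBufferId
     * and each buffer occupies bufferLength bytes directly after the previous one.
     */
    static long bufferAddress(long baseAddress, int bufferLength, int startBufferId, int bufferId) {
        int index = bufferId - startBufferId;
        if (index < 0) {
            throw new IllegalArgumentException("buffer ID precedes the start of the range");
        }
        return baseAddress + (long) index * bufferLength;
    }
}
```

For example, 8 buffers of 4096 bytes need totalBytes(8, 4096) bytes of backing memory, and the application can use bufferAddress to locate the data once a completion reports which buffer ID was selected.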
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call.
Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will usually contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
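The two-step completion model of OP_SEND_ZC can be tracked with a small per-request state object. This is a sketch under stated assumptions, not this binding's API: SendZcSketch is a hypothetical class, the flag values are assumed from the Linux io_uring UAPI header, and it assumes that a first CQE without CQE_F_MORE means no notification will follow, so the buffer is treated as released at that point.

```java
/** Hypothetical per-request tracker (not part of LibIOURing) for the two CQEs of a zerocopy send. */
final class SendZcSketch {
    // Assumed values of IORING_CQE_F_MORE and IORING_CQE_F_NOTIF; prefer the binding's constants.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    private boolean resultSeen;
    private boolean bufferReleased;
    private int result;

    /**
     * Feed one CQE whose user_data matches this request.
     * Returns true once the application may modify or free the send buffer.
     */
    boolean onCqe(int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            // Notification CQE: res is zero, the kernel no longer references the buffer.
            bufferReleased = true;
        } else {
            // First CQE: bytes sent or a negative errno.
            resultSeen = true;
            result = res;
            if ((flags & CQE_F_MORE) == 0) {
                // Assumption: without F_MORE no notification is coming.
                bufferReleased = true;
            }
        }
        return bufferReleased;
    }

    /** True once the send result (not the notification) has been observed. */
    boolean hasResult() {
        return resultSeen;
    }

    /** Bytes sent or negative errno from the first CQE; only meaningful when hasResult() is true. */
    int result() {
        return result;
    }
}
```

Separating hasResult() from the return value of onCqe mirrors the point made above: the send outcome and the buffer lifetime are reported independently.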
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_READ_FIXED
public static final byte IORING_OP_READ_FIXED
io_uring_op
Enum values:
OP_NOP- Do not perform any I/O. This is useful for testing the performance of theio_uringimplementation itself.OP_READV- Vectored read operation, similar topreadv2(2). If the file is not seekable,offmust be set to zero.OP_WRITEV- Vectored write operation, similar topwritev2(2). If the file is not seekable,offmust be set to zero.OP_FSYNC- File sync. See alsofsync(2).Note that, while I/O is initiated in the order in which it appears inthe submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED- Read from pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed reads.OP_WRITE_FIXED- Write to pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed writes.OP_POLL_ADD- Poll thefdspecified in the submission queue entry for the events specified in thepoll_eventsfield.Unlike poll or epoll without
EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.If
POLL_ADD_MULTIis set in the SQElenfield, then the poll will work in multi shot mode instead. That means it'll repatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQEflagsfield will haveCQE_F_MOREset on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.If
POLL_UPDATE_EVENTSis set in the SQElenfield, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on theuser_datafield of the original SQE submitted, and this values is passed in theaddrfield of the SQE. This mode is available since 5.13.If
POLL_UPDATE_USER_DATAis set in the SQElenfield, then the request will update theuser_dataof an existing poll request based on the value passed in theofffield. This mode is available since 5.13.This command works like an
asyncpoll(2)and the completion event result is the returned mask of events. For the variants that updateuser_dataorevents, the completion result will be similar toOP_POLL_REMOVE.OP_POLL_REMOVE- Remove an existing poll request.If found, the
resfield of the structio_uring_cqewill contain 0. If not found,reswill contain-ENOENT, or-EALREADYif the poll request was in the process of completing already.OP_SYNC_FILE_RANGE- Issue the equivalent of async_file_range(2)on the file descriptor.The
fdfield is the file descriptor to sync, theofffield holds the offset in bytes, thelenfield holds the length in bytes, and thesync_range_flagsfield holds the flags for the command. See alsosync_file_range(2)for the general description of the related system call.Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in
timeout_flags:TIMEOUT_BOOTTIME: If set, then the clock source used isCLOCK_BOOTTIMEinstead ofCLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspend while having a timeout request in-flight.TIMEOUT_REALTIME: If set, then the clock source used isCLOCK_BOOTTIMEinstead ofCLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flagscontainTIMEOUT_UPDATE, instead of removing an existing operation, it updates it.addrand return values are same as before.addr2field must contain a pointer to astruct timespec64structure.timeout_flagsmay also containTIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar toOP_FILES_UPDATE. Please note that onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.5.
OP_ASYNC_CANCEL- Attempt to cancel an already issued request.addrmust contain theuser_datafield of the request that should be cancelled. The cancellation request will complete with one of the following results codes. If found, theresfield of the cqe will contain 0. If not found,reswill contain-ENOENT. If found and attempted cancelled, theresfield will contain-EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started.Available since 5.5.
OP_LINK_TIMEOUT- This request must be linked with another request throughIOSQE_IO_LINKwhich is described below.Unlike
OP_TIMEOUT,IORING_OP_LINK_TIMEOUTacts on the linked request, not the completion queue. The format of the command is otherwise likeIORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with-ETIMEif the timer expired and the linked request was attempted cancelled, or-ECANCELEDif the timer got cancelled because of completion of the linked request. LikeIORING_OP_TIMEOUTthe clock source used isCLOCK_MONOTONIC.Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT- Issue the equivalent of aopenat(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,open_flagsshould contain any flags passed in, andlenis access mode of the file. See alsoopenat(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, this command can be used to close files that were direct opened throughOP_OPENAT,OP_OPENAT2, orOP_ACCEPTusing theio_uringspecific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.OP_FILES_UPDATE- This command is an alternative to usingREGISTER_FILES_UPDATEwhich then works in an async fashion, like the rest of theio_uringcommands.The arguments passed in are the same.
addrmust contain a pointer to the array of file descriptors,lenmust contain the length of the array, andoffmust contain the offset at which to operate. Note that the array of file descriptors pointed to inaddrmust remain valid until this operation has completed.Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ- Issue the equivalent of apread(2)orpwrite(2)system call.fdis the file descriptor to be operated on,addrcontains the buffer in question,lencontains the length of the IO operation, andoffscontains the read or write offset. Iffddoes not refer to a seekable file,offmust be set to zero. Ifoffsis set to -1, the offset will use (and advance) the file position, like theread(2)andwrite(2)system calls. These are non-vectored versions of theOP_READVandOP_WRITEVopcodes. See alsoread(2)andwrite(2)for the general description of the related system call.Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL- Add, remove or modify entries in the interest list ofepoll(7). Seeepoll_ctl(2)for details of the system call.fdholds the file descriptor that represents the epoll instance,addrholds the file descriptor to add, remove or modify,lenholds the operation (EPOLL_CTL_ADD,EPOLL_CTL_DEL,EPOLL_CTL_MOD) to perform and,offholds a pointer to theepoll_eventsstructure.Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
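The arithmetic behind OP_PROVIDE_BUFFERS can be sketched in plain Java. This is a minimal illustration, not part of LibIOURing: the class and method names are hypothetical, the base address is made up, and the sketch assumes buffer IDs count up by one from the starting ID passed in off.

```java
final class ProvidedBufferLayout {
    /** Bytes the application must back at addr: len * fd (fd carries the buffer count). */
    public static long totalBytes(int bufferCount, int bufferLen) {
        return Math.multiplyExact((long) bufferCount, (long) bufferLen);
    }

    /** Start address of the buffer at position index (0-based) within the range. */
    public static long bufferAddress(long addr, int bufferLen, int index) {
        return addr + (long) index * bufferLen;
    }

    /** Buffer ID at position index, assuming IDs count up from the starting ID in off. */
    public static int bufferId(int startId, int index) {
        return startId + index;
    }

    public static void main(String[] args) {
        long addr = 0x10000L;                 // hypothetical base address of the range
        int count = 4, len = 4096, startId = 10;
        System.out.println("total bytes: " + totalBytes(count, len));
        for (int i = 0; i < count; i++) {
            System.out.printf("buffer id %d at 0x%x%n",
                bufferId(startId, i), bufferAddress(addr, len, i));
        }
    }
}
```

Keeping every buffer in a group the same size is what makes the address of any buffer a simple multiple of len from the base.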
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING- Send a message to an io_uring.fdmust be set to a file descriptor of a ring that the application has access to,lencan be set to any 32-bit value that the application wishes to pass on, andoffshould be set any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with theresfield matching thelenset, and auser_datafield matching theoffvalue being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields.Available since 5.18.
OP_FSETXATTROP_SETXATTROP_FGETXATTROP_GETXATTROP_SOCKETOP_URING_CMDOP_SEND_ZC- Issue the zerocopy equivalent of asend(2)system call.Similar to
OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.The
flags field of the first "struct io_uring_cqe" will usually contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and its flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
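The two-step completion model of OP_SEND_ZC can be sketched as a small classifier over the CQE flags field. The class and enum names are hypothetical. The flag values are taken from the Linux io_uring UAPI header (IORING_CQE_F_MORE is bit 1, IORING_CQE_F_NOTIF is bit 3). The sketch also assumes that a first CQE without CQE_F_MORE means no notification will follow, so the buffer is already released.

```java
final class SendZcCompletions {
    public static final int CQE_F_MORE  = 1 << 1; // a further CQE will follow
    public static final int CQE_F_NOTIF = 1 << 3; // this CQE is the buffer notification

    enum Kind { RESULT_NOTIFICATION_PENDING, RESULT_FINAL, NOTIFICATION }

    /** Classifies a CQE belonging to an OP_SEND_ZC request by its flags. */
    public static Kind classify(int cqeFlags) {
        if ((cqeFlags & CQE_F_NOTIF) != 0) {
            return Kind.NOTIFICATION;
        }
        return (cqeFlags & CQE_F_MORE) != 0
            ? Kind.RESULT_NOTIFICATION_PENDING
            : Kind.RESULT_FINAL;
    }

    /** True once the application may modify or reuse the send buffer. */
    public static boolean bufferReusable(int cqeFlags) {
        return classify(cqeFlags) != Kind.RESULT_NOTIFICATION_PENDING;
    }

    public static void main(String[] args) {
        int[] observedFlags = { CQE_F_MORE, CQE_F_NOTIF }; // typical sequence for one request
        for (int flags : observedFlags) {
            System.out.println(classify(flags) + ", buffer reusable: " + bufferReusable(flags));
        }
    }
}
```

Only the first CQE's res carries the byte count or error; the notification exists purely to mark the end of the buffer's lifetime in the kernel.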
OP_SENDMSG_ZCOP_READ_MULTISHOTOP_WAITIDOP_FUTEX_WAITOP_FUTEX_WAKEOP_FUTEX_WAITVOP_FIXED_FD_INSTALLOP_FTRUNCATEOP_BINDOP_LISTENOP_LAST
- See Also:
-
IORING_OP_WRITE_FIXED
public static final byte IORING_OP_WRITE_FIXEDio_uring_opEnum values:
OP_NOP- Do not perform any I/O. This is useful for testing the performance of theio_uringimplementation itself.OP_READV- Vectored read operation, similar topreadv2(2). If the file is not seekable,offmust be set to zero.OP_WRITEV- Vectored write operation, similar topwritev2(2). If the file is not seekable,offmust be set to zero.OP_FSYNC- File sync. See alsofsync(2).Note that, while I/O is initiated in the order in which it appears inthe submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED- Read from pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed reads.OP_WRITE_FIXED- Write to pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed writes.OP_POLL_ADD- Poll thefdspecified in the submission queue entry for the events specified in thepoll_eventsfield.Unlike poll or epoll without
EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.If
POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13. If
POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13. If
POLL_UPDATE_USER_DATAis set in the SQElenfield, then the request will update theuser_dataof an existing poll request based on the value passed in theofffield. This mode is available since 5.13.This command works like an
asyncpoll(2)and the completion event result is the returned mask of events. For the variants that updateuser_dataorevents, the completion result will be similar toOP_POLL_REMOVE.OP_POLL_REMOVE- Remove an existing poll request.If found, the
resfield of the structio_uring_cqewill contain 0. If not found,reswill contain-ENOENT, or-EALREADYif the poll request was in the process of completing already.OP_SYNC_FILE_RANGE- Issue the equivalent of async_file_range(2)on the file descriptor.The
fdfield is the file descriptor to sync, theofffield holds the offset in bytes, thelenfield holds the length in bytes, and thesync_range_flagsfield holds the flags for the command. See alsosync_file_range(2)for the general description of the related system call.Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in
timeout_flags: TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight. TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
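The completion rule for OP_TIMEOUT (timer expiry or a completion count, whichever comes first) can be modelled in a few lines. This is only a model of the documented behaviour, not kernel code: the class name, the PENDING sentinel, and the choice to check the event count before the timer when both hold are assumptions of this sketch. ETIME is 62 in the generic Linux errno table.

```java
final class TimeoutModel {
    public static final int ETIME = 62;                  // generic Linux errno value
    public static final int PENDING = Integer.MIN_VALUE; // sentinel of this sketch, not a kernel value

    /**
     * Returns 0 if the requested number of completions was reached, -ETIME if the
     * timer expired, or PENDING if neither condition holds yet. An eventCount of 0
     * disables event counting, so the request behaves as a pure timer.
     */
    public static int evaluate(long elapsedNanos, long timeoutNanos,
                               long completedEvents, long eventCount) {
        if (eventCount != 0 && completedEvents >= eventCount) {
            return 0;
        }
        if (elapsedNanos >= timeoutNanos) {
            return -ETIME;
        }
        return PENDING;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(5, 10, 0, 0));  // still pending
        System.out.println(evaluate(10, 10, 0, 0)); // timer expired: -ETIME
        System.out.println(evaluate(1, 10, 3, 3));  // count reached: 0
    }
}
```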
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flagscontainTIMEOUT_UPDATE, instead of removing an existing operation, it updates it.addrand return values are same as before.addr2field must contain a pointer to astruct timespec64structure.timeout_flagsmay also containTIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the
file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
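The three result codes of OP_ASYNC_CANCEL map naturally onto a small decoder. The class and method names below are hypothetical; the errno values (ENOENT = 2, EALREADY = 114) are the generic Linux ones and are hard-coded here only to keep the sketch self-contained.

```java
final class CancelResult {
    public static final int ENOENT = 2;     // generic Linux errno value
    public static final int EALREADY = 114; // generic Linux errno value

    /** Translates the res field of a cancellation CQE into a readable outcome. */
    public static String describe(int res) {
        if (res == 0) {
            return "cancelled";
        }
        if (res == -ENOENT) {
            return "not found";
        }
        if (res == -EALREADY) {
            return "cancellation attempted, request may still complete";
        }
        return "unexpected result " + res;
    }

    public static void main(String[] args) {
        for (int res : new int[] { 0, -ENOENT, -EALREADY }) {
            System.out.println(res + " -> " + describe(res));
        }
    }
}
```

In the -EALREADY case the application still has to wait for the CQE of the original request before releasing any resources tied to it.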
OP_LINK_TIMEOUT- This request must be linked with another request throughIOSQE_IO_LINKwhich is described below.Unlike
OP_TIMEOUT,IORING_OP_LINK_TIMEOUTacts on the linked request, not the completion queue. The format of the command is otherwise likeIORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with-ETIMEif the timer expired and the linked request was attempted cancelled, or-ECANCELEDif the timer got cancelled because of completion of the linked request. LikeIORING_OP_TIMEOUTthe clock source used isCLOCK_MONOTONIC.Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT- Issue the equivalent of aopenat(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,open_flagsshould contain any flags passed in, andlenis access mode of the file. See alsoopenat(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
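The file_index convention for direct opens (a positive value selects fixed-table slot file_index - 1, and an occupied slot fails with -EBADF) can be modelled as follows. This is a toy model with hypothetical names. Throwing an exception for an out-of-range index is a choice of this sketch, since the text above does not say what the kernel returns in that case.

```java
final class FixedFileSlots {
    public static final int EBADF = 9; // generic Linux errno value

    private final boolean[] occupied;

    FixedFileSlots(int tableSize) {
        occupied = new boolean[tableSize];
    }

    /** Slot selected by file_index, or -1 when the normal file table is used. */
    public static int slotFor(int fileIndex) {
        return fileIndex > 0 ? fileIndex - 1 : -1;
    }

    /** Models a direct OP_OPENAT: returns 0 on success, -EBADF if the slot is taken. */
    public int install(int fileIndex) {
        int slot = slotFor(fileIndex);
        if (slot < 0 || slot >= occupied.length) {
            throw new IllegalArgumentException("file_index outside the modelled table: " + fileIndex);
        }
        if (occupied[slot]) {
            return -EBADF;
        }
        occupied[slot] = true;
        return 0;
    }

    public static void main(String[] args) {
        FixedFileSlots table = new FixedFileSlots(8);
        System.out.println(table.install(3)); // first install into slot 2 succeeds
        System.out.println(table.install(3)); // slot 2 is now occupied
    }
}
```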
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, this command can be used to close files that were direct opened throughOP_OPENAT,OP_OPENAT2, orOP_ACCEPTusing theio_uringspecific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.OP_FILES_UPDATE- This command is an alternative to usingREGISTER_FILES_UPDATEwhich then works in an async fashion, like the rest of theio_uringcommands.The arguments passed in are the same.
addrmust contain a pointer to the array of file descriptors,lenmust contain the length of the array, andoffmust contain the offset at which to operate. Note that the array of file descriptors pointed to inaddrmust remain valid until this operation has completed.Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING- Send a message to an io_uring.fdmust be set to a file descriptor of a ring that the application has access to,lencan be set to any 32-bit value that the application wishes to pass on, andoffshould be set any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with theresfield matching thelenset, and auser_datafield matching theoffvalue being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields.Available since 5.18.
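How the two OP_MSG_RING fields arrive on the target ring can be shown with a tiny model. The class names and the idea of using len as a message tag are illustrative assumptions; the only behaviour taken from the description above is that the target CQE carries len in res and off in user_data.

```java
final class MsgRingSketch {
    /** The two fields of the CQE posted on the target ring. */
    static final class TargetCqe {
        final int res;
        final long userData;

        TargetCqe(int res, long userData) {
            this.res = res;
            this.userData = userData;
        }
    }

    /** Models delivery: len (32-bit) becomes res, off (64-bit) becomes user_data. */
    public static TargetCqe deliver(int len, long off) {
        return new TargetCqe(len, off);
    }

    public static void main(String[] args) {
        int wakeTag = 1;       // hypothetical application-defined tag carried in len
        long payload = 0xCAFEL; // hypothetical 64-bit payload carried in off
        TargetCqe cqe = deliver(wakeTag, payload);
        System.out.printf("res=%d user_data=0x%x%n", cqe.res, cqe.userData);
    }
}
```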
OP_FSETXATTROP_SETXATTROP_FGETXATTROP_GETXATTROP_SOCKETOP_URING_CMDOP_SEND_ZC- Issue the zerocopy equivalent of asend(2)system call.Similar to
OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.The
flags field of the first "struct io_uring_cqe" will usually contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and its flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZCOP_READ_MULTISHOTOP_WAITIDOP_FUTEX_WAITOP_FUTEX_WAKEOP_FUTEX_WAITVOP_FIXED_FD_INSTALLOP_FTRUNCATEOP_BINDOP_LISTENOP_LAST
- See Also:
-
IORING_OP_POLL_ADD
public static final byte IORING_OP_POLL_ADDio_uring_opEnum values:
OP_NOP- Do not perform any I/O. This is useful for testing the performance of theio_uringimplementation itself.OP_READV- Vectored read operation, similar topreadv2(2). If the file is not seekable,offmust be set to zero.OP_WRITEV- Vectored write operation, similar topwritev2(2). If the file is not seekable,offmust be set to zero.OP_FSYNC- File sync. See alsofsync(2).Note that, while I/O is initiated in the order in which it appears inthe submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED- Read from pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed reads.OP_WRITE_FIXED- Write to pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed writes.OP_POLL_ADD- Poll thefdspecified in the submission queue entry for the events specified in thepoll_eventsfield.Unlike poll or epoll without
EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.If
POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13. If
POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13. If
POLL_UPDATE_USER_DATAis set in the SQElenfield, then the request will update theuser_dataof an existing poll request based on the value passed in theofffield. This mode is available since 5.13.This command works like an
asyncpoll(2)and the completion event result is the returned mask of events. For the variants that updateuser_dataorevents, the completion result will be similar toOP_POLL_REMOVE.OP_POLL_REMOVE- Remove an existing poll request.If found, the
resfield of the structio_uring_cqewill contain 0. If not found,reswill contain-ENOENT, or-EALREADYif the poll request was in the process of completing already.OP_SYNC_FILE_RANGE- Issue the equivalent of async_file_range(2)on the file descriptor.The
fdfield is the file descriptor to sync, theofffield holds the offset in bytes, thelenfield holds the length in bytes, and thesync_range_flagsfield holds the flags for the command. See alsosync_file_range(2)for the general description of the related system call.Available since 5.2.
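For the OP_POLL_ADD modes described earlier in this list, the decision whether a poll request must be resubmitted depends only on CQE_F_MORE. A minimal sketch, with a hypothetical class name and the flag value taken from the Linux io_uring UAPI header (IORING_CQE_F_MORE is bit 1):

```java
final class PollCompletion {
    public static final int CQE_F_MORE = 1 << 1; // further CQEs will follow from the same SQE

    /**
     * True when the poll request has terminated. This is always the case in
     * one-shot mode, and in multi-shot mode once CQE_F_MORE is no longer set.
     */
    public static boolean mustResubmit(int cqeFlags) {
        return (cqeFlags & CQE_F_MORE) == 0;
    }

    public static void main(String[] args) {
        int[] multiShotFlags = { CQE_F_MORE, CQE_F_MORE, 0 }; // hypothetical CQE sequence
        for (int flags : multiShotFlags) {
            System.out.println(mustResubmit(flags) ? "terminated, resubmit" : "still armed");
        }
    }
}
```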
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in
timeout_flags: TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight. TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flagscontainTIMEOUT_UPDATE, instead of removing an existing operation, it updates it.addrand return values are same as before.addr2field must contain a pointer to astruct timespec64structure.timeout_flagsmay also containTIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the
file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_events structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
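The sizing rule above (len * fd bytes starting at addr) can be checked with a few lines of arithmetic. This is a minimal sketch with hypothetical names, not an API of this class; it assumes the buffers are laid out back to back and that buffer IDs are assigned consecutively starting at off, which the text implies but does not spell out.

```java
final class ProvidedBufferLayout {
    /** Total bytes the application must back at addr: len * fd. */
    static long totalBytes(int nbufs, int len) {
        return (long) nbufs * len;
    }

    /** Start address of the i-th buffer in the range (assumed contiguous). */
    static long bufferAddress(long addr, int len, int i) {
        return addr + (long) i * len;
    }

    /** Buffer ID of the i-th buffer (assumed consecutive from off). */
    static int bufferId(int off, int i) {
        return off + i;
    }

    public static void main(String[] args) {
        int nbufs = 4;          // value placed in fd
        int len = 4096;         // value placed in len
        long addr = 0x10000L;   // hypothetical base address
        int firstId = 0;        // value placed in off
        System.out.println("bytes needed: " + totalBytes(nbufs, len));
        for (int i = 0; i < nbufs; i++) {
            System.out.println("id " + bufferId(firstId, i)
                + " at 0x" + Long.toHexString(bufferAddress(addr, len, i)));
        }
    }
}
```

Widening to long before multiplying avoids int overflow when many large buffers are provided.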
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN- Issue the equivalent of ashutdown(2)system call.fdis the file descriptor to the socket being shutdown andlenmust be set to thehowargument. No other fields should be set.Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe is likely to contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
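The two-step completion model for OP_SEND_ZC amounts to keeping each send buffer pinned until its notification arrives. The following is a minimal sketch of that bookkeeping, assuming the flag bit values from the kernel UAPI header (CQE_F_MORE = 1 << 1, CQE_F_NOTIF = 1 << 3) and assuming that a first CQE without CQE_F_MORE means no notification will follow. The class and its methods are hypothetical and independent of LibIOURing.

```java
import java.util.HashSet;
import java.util.Set;

final class ZeroCopySendTracker {
    // Assumed bit values; prefer the constants exposed by your binding.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    // user_data values whose buffers the kernel may still be using.
    private final Set<Long> buffersInFlight = new HashSet<>();

    /** Call when the SQE is queued; the buffer must stay untouched from here on. */
    void onSubmit(long userData) {
        buffersInFlight.add(userData);
    }

    /**
     * Call for every CQE carrying this user_data.
     * Returns true once the buffer may be modified or reused.
     */
    boolean onCompletion(long userData, int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            // Second CQE: the kernel no longer references the buffer.
            buffersInFlight.remove(userData);
            return true;
        }
        if ((flags & CQE_F_MORE) == 0) {
            // First CQE with no notification announced (assumption: e.g. early failure).
            buffersInFlight.remove(userData);
            return true;
        }
        // First CQE: res holds bytes sent or a negative error; keep the buffer pinned.
        return false;
    }

    boolean isInFlight(long userData) {
        return buffersInFlight.contains(userData);
    }
}
```

A caller would typically release or recycle the buffer only when onCompletion returns true, regardless of what res reported in the first CQE.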
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_POLL_REMOVE
public static final byte IORING_OP_POLL_REMOVE
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to setup a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to setup a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in
timeout_flags: TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight. TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flagscontainTIMEOUT_UPDATE, instead of removing an existing operation, it updates it.addrand return values are same as before.addr2field must contain a pointer to astruct timespec64structure.timeout_flagsmay also containTIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the
file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_events structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used either to just wake or interrupt anyone waiting for completions on the target ring, or to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to
OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying. The
flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event (a notification) for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and its flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. while waiting for a TCP ACK, and having a separate CQE for request completion allows userspace to push more data without extra delays. Note that notifications only control the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_SYNC_FILE_RANGE
public static final byte IORING_OP_SYNC_FILE_RANGE
io_uring_op enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED- Read from pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed reads.OP_WRITE_FIXED- Write to pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed writes.OP_POLL_ADD- Poll thefdspecified in the submission queue entry for the events specified in thepoll_eventsfield.Unlike poll or epoll without
EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.If
POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13. If
POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13. If
POLL_UPDATE_USER_DATAis set in the SQElenfield, then the request will update theuser_dataof an existing poll request based on the value passed in theofffield. This mode is available since 5.13.This command works like an
asyncpoll(2)and the completion event result is the returned mask of events. For the variants that updateuser_dataorevents, the completion result will be similar toOP_POLL_REMOVE.OP_POLL_REMOVE- Remove an existing poll request.If found, the
resfield of the structio_uring_cqewill contain 0. If not found,reswill contain-ENOENT, or-EALREADYif the poll request was in the process of completing already.OP_SYNC_FILE_RANGE- Issue the equivalent of async_file_range(2)on the file descriptor.The
fdfield is the file descriptor to sync, theofffield holds the offset in bytes, thelenfield holds the length in bytes, and thesync_range_flagsfield holds the flags for the command. See alsosync_file_range(2)for the general description of the related system call.Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in
timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while a timeout request was in flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
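The result codes described for OP_TIMEOUT can be told apart with a small helper. The following is a minimal sketch, not part of this class: the class name TimeoutResultSketch and its Outcome enum are hypothetical, and the errno constants are hard-coded on the assumption of the generic Linux values (ETIME = 62, ECANCELED = 125).

```java
public final class TimeoutResultSketch {
    // Assumed generic Linux errno values; not taken from this library.
    public static final int ETIME = 62;
    public static final int ECANCELED = 125;

    /** Hypothetical classification of the res field of a timeout CQE. */
    public enum Outcome { EXPIRED, EVENT_COUNT_REACHED, CANCELLED, OTHER_ERROR }

    /** Maps the CQE res value of an OP_TIMEOUT request to an outcome. */
    public static Outcome classify(int res) {
        if (res == -ETIME) {
            return Outcome.EXPIRED;              // the timer ran out
        }
        if (res == 0) {
            return Outcome.EVENT_COUNT_REACHED;  // enough requests completed first
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;            // removed before it expired
        }
        return Outcome.OTHER_ERROR;              // any other negative errno
    }

    public static void main(String[] args) {
        System.out.println(classify(-ETIME));
        System.out.println(classify(0));
        System.out.println(classify(-ECANCELED));
    }
}
```

A completion handler would typically call classify on the res value of the CQE whose user_data matches the timeout request.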
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the
file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
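The off-by-one between file_index and the fixed file table slot is easy to get wrong, so it may help to isolate it. The sketch below is illustrative only: DirectDescriptorSketch and its two methods are hypothetical helpers, not part of this class, and they encode nothing beyond the "index file_index - 1" rule stated above.

```java
public final class DirectDescriptorSketch {
    private DirectDescriptorSketch() { }

    /** Returns the file_index value that targets the given fixed file table slot. */
    public static int fileIndexForSlot(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be >= 0");
        }
        // Slot N is addressed as N + 1, because 0 means "use the normal file table".
        return Math.addExact(slot, 1);
    }

    /** Returns the fixed file table slot addressed by a positive file_index. */
    public static int slotForFileIndex(int fileIndex) {
        if (fileIndex <= 0) {
            throw new IllegalArgumentException("file_index must be positive");
        }
        return fileIndex - 1;
    }

    public static void main(String[] args) {
        System.out.println(fileIndexForSlot(0));
        System.out.println(slotForFileIndex(5));
    }
}
```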
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found but cancellation could only be attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT- This request must be linked with another request throughIOSQE_IO_LINKwhich is described below.Unlike
OP_TIMEOUT,IORING_OP_LINK_TIMEOUTacts on the linked request, not the completion queue. The format of the command is otherwise likeIORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with-ETIMEif the timer expired and the linked request was attempted cancelled, or-ECANCELEDif the timer got cancelled because of completion of the linked request. LikeIORING_OP_TIMEOUTthe clock source used isCLOCK_MONOTONIC.Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, this command can be used to close files that were direct opened throughOP_OPENAT,OP_OPENAT2, orOP_ACCEPTusing theio_uringspecific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.OP_FILES_UPDATE- This command is an alternative to usingREGISTER_FILES_UPDATEwhich then works in an async fashion, like the rest of theio_uringcommands.The arguments passed in are the same.
addrmust contain a pointer to the array of file descriptors,lenmust contain the length of the array, andoffmust contain the offset at which to operate. Note that the array of file descriptors pointed to inaddrmust remain valid until this operation has completed.Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL- Add, remove or modify entries in the interest list ofepoll(7). Seeepoll_ctl(2)for details of the system call.fdholds the file descriptor that represents the epoll instance,addrholds the file descriptor to add, remove or modify,lenholds the operation (EPOLL_CTL_ADD,EPOLL_CTL_DEL,EPOLL_CTL_MOD) to perform and,offholds a pointer to theepoll_eventsstructure.Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
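The memory layout implied by these arguments can be sketched as follows. This is a hedged illustration rather than an API of this class: ProvideBuffersSketch and its methods are hypothetical, and the sketch assumes that buffers are laid out back to back from addr and that buffer IDs are assigned consecutively starting at the value passed in off.

```java
public final class ProvideBuffersSketch {
    private ProvideBuffersSketch() { }

    /** Total bytes the application must make available at addr: len * fd. */
    public static long requiredBytes(int bufferCount, int bufferLen) {
        if (bufferCount <= 0 || bufferLen <= 0) {
            throw new IllegalArgumentException("count and length must be positive");
        }
        return Math.multiplyExact((long) bufferCount, (long) bufferLen);
    }

    /** Address of the i-th buffer, assuming buffers are contiguous from baseAddr. */
    public static long bufferAddress(long baseAddr, int bufferLen, int i) {
        return Math.addExact(baseAddr, Math.multiplyExact((long) i, (long) bufferLen));
    }

    /** ID of the i-th buffer, assuming IDs count up from the starting ID in off. */
    public static int bufferId(int startId, int i) {
        return Math.addExact(startId, i);
    }

    public static void main(String[] args) {
        // Example: 8 buffers of 4096 bytes each, first buffer ID 100.
        System.out.println(requiredBytes(8, 4096));
        System.out.println(bufferAddress(0x10000L, 4096, 3));
        System.out.println(bufferId(100, 3));
    }
}
```

Computing the size with multiplyExact avoids a silent int overflow when many large buffers are provided in one request.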
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used either to just wake or interrupt anyone waiting for completions on the target ring, or to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to
OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying. The
flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event (a notification) for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and its flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. while waiting for a TCP ACK, and having a separate CQE for request completion allows userspace to push more data without extra delays. Note that notifications only control the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
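The two-completion model of OP_SEND_ZC can be tracked per request with a small state object. The following is a sketch under stated assumptions: SendZcSketch is a hypothetical class, the flag bit positions are assumed to mirror the Linux UAPI header (CQE_F_MORE as bit 1, CQE_F_NOTIF as bit 3), and it treats a first CQE without CQE_F_MORE as meaning that no notification will follow.

```java
public final class SendZcSketch {
    // Assumed to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF from the Linux UAPI header.
    public static final int CQE_F_MORE = 1 << 1;
    public static final int CQE_F_NOTIF = 1 << 3;

    private boolean bufferBusy = true; // the kernel may own the buffer from submission on
    private boolean haveResult;
    private int result;

    /** Feed every CQE whose user_data matches this request. */
    public void onCqe(int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            // Notification CQE: the kernel no longer references the buffer.
            bufferBusy = false;
            return;
        }
        // First CQE: bytes sent or a negative error code.
        result = res;
        haveResult = true;
        if ((flags & CQE_F_MORE) == 0) {
            // No second CQE announced, so nothing else will release the buffer.
            bufferBusy = false;
        }
    }

    /** True once the data buffer may be modified or reused. */
    public boolean bufferReusable() {
        return !bufferBusy;
    }

    /** Result of the send; only meaningful after the first CQE has been seen. */
    public int result() {
        if (!haveResult) {
            throw new IllegalStateException("first CQE not seen yet");
        }
        return result;
    }

    public static void main(String[] args) {
        SendZcSketch req = new SendZcSketch();
        req.onCqe(512, CQE_F_MORE);               // send result, notification pending
        System.out.println(req.bufferReusable());
        req.onCqe(0, CQE_F_NOTIF);                // notification arrives
        System.out.println(req.bufferReusable());
    }
}
```

Keeping the send result and the buffer lifetime as separate pieces of state reflects the point made above: the notification says nothing about delivery, only that the buffer is free again.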
- See Also:
-
IORING_OP_SENDMSG
public static final byte IORING_OP_SENDMSG
io_uring_op enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED- Read from pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed reads.OP_WRITE_FIXED- Write to pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed writes.OP_POLL_ADD- Poll thefdspecified in the submission queue entry for the events specified in thepoll_eventsfield.Unlike poll or epoll without
EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.If
POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13. If
POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13. If
POLL_UPDATE_USER_DATAis set in the SQElenfield, then the request will update theuser_dataof an existing poll request based on the value passed in theofffield. This mode is available since 5.13.This command works like an
asyncpoll(2)and the completion event result is the returned mask of events. For the variants that updateuser_dataorevents, the completion result will be similar toOP_POLL_REMOVE.OP_POLL_REMOVE- Remove an existing poll request.If found, the
resfield of the structio_uring_cqewill contain 0. If not found,reswill contain-ENOENT, or-EALREADYif the poll request was in the process of completing already.OP_SYNC_FILE_RANGE- Issue the equivalent of async_file_range(2)on the file descriptor.The
fdfield is the file descriptor to sync, theofffield holds the offset in bytes, thelenfield holds the length in bytes, and thesync_range_flagsfield holds the flags for the command. See alsosync_file_range(2)for the general description of the related system call.Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in
timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while a timeout request was in flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the
file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found but cancellation could only be attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT- This request must be linked with another request throughIOSQE_IO_LINKwhich is described below.Unlike
OP_TIMEOUT,IORING_OP_LINK_TIMEOUTacts on the linked request, not the completion queue. The format of the command is otherwise likeIORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with-ETIMEif the timer expired and the linked request was attempted cancelled, or-ECANCELEDif the timer got cancelled because of completion of the linked request. LikeIORING_OP_TIMEOUTthe clock source used isCLOCK_MONOTONIC.Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT- Issue the equivalent of aopenat(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,open_flagsshould contain any flags passed in, andlenis access mode of the file. See alsoopenat(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, this command can be used to close files that were direct opened throughOP_OPENAT,OP_OPENAT2, orOP_ACCEPTusing theio_uringspecific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.OP_FILES_UPDATE- This command is an alternative to usingREGISTER_FILES_UPDATEwhich then works in an async fashion, like the rest of theio_uringcommands.The arguments passed in are the same.
addrmust contain a pointer to the array of file descriptors,lenmust contain the length of the array, andoffmust contain the offset at which to operate. Note that the array of file descriptors pointed to inaddrmust remain valid until this operation has completed.Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ- Issue the equivalent of apread(2)orpwrite(2)system call.fdis the file descriptor to be operated on,addrcontains the buffer in question,lencontains the length of the IO operation, andoffscontains the read or write offset. Iffddoes not refer to a seekable file,offmust be set to zero. Ifoffsis set to -1, the offset will use (and advance) the file position, like theread(2)andwrite(2)system calls. These are non-vectored versions of theOP_READVandOP_WRITEVopcodes. See alsoread(2)andwrite(2)for the general description of the related system call.Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL- Add, remove or modify entries in the interest list ofepoll(7). Seeepoll_ctl(2)for details of the system call.fdholds the file descriptor that represents the epoll instance,addrholds the file descriptor to add, remove or modify,lenholds the operation (EPOLL_CTL_ADD,EPOLL_CTL_DEL,EPOLL_CTL_MOD) to perform and,offholds a pointer to theepoll_eventsstructure.Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
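The sizing rule for OP_PROVIDE_BUFFERS (len * fd bytes starting at addr) can be illustrated with a little arithmetic. This is a sketch only: ProvidedBufferLayout is a hypothetical class that is not part of LibIOURing, and it assumes that buffer IDs are assigned consecutively from the starting ID in off, which the text above implies but does not state outright.

```java
/** Hypothetical helper, not part of LibIOURing: models the memory layout of one provided-buffer range. */
final class ProvidedBufferLayout {
    final long baseAddr;  // value placed in addr
    final int bufLen;     // value placed in len (size of each buffer)
    final int count;      // value placed in fd (number of buffers)
    final int startBid;   // value placed in off (first buffer ID)

    ProvidedBufferLayout(long baseAddr, int bufLen, int count, int startBid) {
        if (bufLen <= 0 || count <= 0 || startBid < 0) {
            throw new IllegalArgumentException("bufLen and count must be positive, startBid non-negative");
        }
        this.baseAddr = baseAddr;
        this.bufLen = bufLen;
        this.count = count;
        this.startBid = startBid;
    }

    /** Total memory the application must supply at addr: len * fd. */
    long totalBytes() {
        return (long) bufLen * count;
    }

    /** Address of the buffer with the given ID, assuming consecutive IDs starting at startBid. */
    long addressOf(int bid) {
        int index = bid - startBid;
        if (index < 0 || index >= count) {
            throw new IllegalArgumentException("buffer ID " + bid + " is outside this range");
        }
        return baseAddr + (long) index * bufLen;
    }
}
```

A completion that reports a selected buffer ID can then be mapped back to the memory it refers to with addressOf.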
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN- Issue the equivalent of ashutdown(2)system call.fdis the file descriptor to the socket being shutdown andlenmust be set to thehowargument. No other fields should be set.Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_RECVMSG
public static final byte IORING_OP_RECVMSG
io_uring_op Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, and timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
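The completion codes listed for OP_TIMEOUT distinguish three ways a timeout request can end. A minimal sketch of telling them apart follows. TimeoutResult is a hypothetical name that is not part of LibIOURing, and the errno numbers are the usual Linux values, hard-coded as an assumption.

```java
/** Hypothetical helper, not part of LibIOURing: interprets the res field of an OP_TIMEOUT completion. */
final class TimeoutResult {
    // Assumed Linux errno values (generic numbering).
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome {
        TIMER_EXPIRED,        // res == -ETIME: the timer ran out
        EVENT_COUNT_REACHED,  // res == 0: the requested number of completions arrived first
        CANCELLED,            // res == -ECANCELED: removed before expiring
        UNEXPECTED            // any other value
    }

    static Outcome decode(int res) {
        if (res == -ETIME) {
            return Outcome.TIMER_EXPIRED;
        }
        if (res == 0) {
            return Outcome.EVENT_COUNT_REACHED;
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;
        }
        return Outcome.UNEXPECTED;
    }
}
```

Note that -ETIME is the normal outcome for a pure timer (completion count of 0), so callers should not treat it as a failure.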
OP_TIMEOUT_REMOVE - If timeout_flags is zero, this attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, the request updates the existing operation instead of removing it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
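Because file_index is offset by one relative to the fixed file table slot, it is easy to be off by one when filling in the SQE. The sketch below shows the conversion in both directions. DirectDescriptor is a hypothetical helper that is not part of LibIOURing, and rejecting negative values is a choice made for this example only.

```java
import java.util.OptionalInt;

/** Hypothetical helper, not part of LibIOURing: converts between fixed-file slots and file_index values. */
final class DirectDescriptor {
    /** Value to store in file_index so that the file lands in the given fixed file table slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be non-negative");
        }
        return Math.addExact(slot, 1);
    }

    /**
     * Slot targeted by a file_index value. A file_index of 0 means the file is
     * installed in the normal file table, so there is no slot to report.
     */
    static OptionalInt slotForFileIndex(int fileIndex) {
        if (fileIndex < 0) {
            throw new IllegalArgumentException("file_index must be non-negative");
        }
        return fileIndex == 0 ? OptionalInt.empty() : OptionalInt.of(fileIndex - 1);
    }
}
```

For example, to have an accepted connection placed in slot 0 of the fixed file table, file_index would be set to 1, and a successful completion would then report 0 rather than a file descriptor.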
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request was found and cancelled, the res field of the cqe will contain 0. If it was not found, res will contain -ENOENT. If it was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT- This request must be linked with another request throughIOSQE_IO_LINKwhich is described below.Unlike
OP_TIMEOUT,IORING_OP_LINK_TIMEOUTacts on the linked request, not the completion queue. The format of the command is otherwise likeIORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with-ETIMEif the timer expired and the linked request was attempted cancelled, or-ECANCELEDif the timer got cancelled because of completion of the linked request. LikeIORING_OP_TIMEOUTthe clock source used isCLOCK_MONOTONIC.Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT- Issue the equivalent of aopenat(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,open_flagsshould contain any flags passed in, andlenis access mode of the file. See alsoopenat(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, this command can be used to close files that were direct opened throughOP_OPENAT,OP_OPENAT2, orOP_ACCEPTusing theio_uringspecific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.OP_FILES_UPDATE- This command is an alternative to usingREGISTER_FILES_UPDATEwhich then works in an async fashion, like the rest of theio_uringcommands.The arguments passed in are the same.
addrmust contain a pointer to the array of file descriptors,lenmust contain the length of the array, andoffmust contain the offset at which to operate. Note that the array of file descriptors pointed to inaddrmust remain valid until this operation has completed.Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ- Issue the equivalent of apread(2)orpwrite(2)system call.fdis the file descriptor to be operated on,addrcontains the buffer in question,lencontains the length of the IO operation, andoffscontains the read or write offset. Iffddoes not refer to a seekable file,offmust be set to zero. Ifoffsis set to -1, the offset will use (and advance) the file position, like theread(2)andwrite(2)system calls. These are non-vectored versions of theOP_READVandOP_WRITEVopcodes. See alsoread(2)andwrite(2)for the general description of the related system call.Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL- Add, remove or modify entries in the interest list ofepoll(7). Seeepoll_ctl(2)for details of the system call.fdholds the file descriptor that represents the epoll instance,addrholds the file descriptor to add, remove or modify,lenholds the operation (EPOLL_CTL_ADD,EPOLL_CTL_DEL,EPOLL_CTL_MOD) to perform and,offholds a pointer to theepoll_eventsstructure.Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN- Issue the equivalent of ashutdown(2)system call.fdis the file descriptor to the socket being shutdown andlenmust be set to thehowargument. No other fields should be set.Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR OP_SETXATTR OP_FGETXATTR OP_GETXATTR OP_SOCKET OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC OP_READ_MULTISHOT OP_WAITID OP_FUTEX_WAIT OP_FUTEX_WAKE OP_FUTEX_WAITV OP_FIXED_FD_INSTALL OP_FTRUNCATE OP_BIND OP_LISTEN OP_LAST
- See Also:
-
IORING_OP_TIMEOUT
public static final byte IORING_OP_TIMEOUT
io_uring_op Enum values:
OP_NOP- Do not perform any I/O. This is useful for testing the performance of theio_uringimplementation itself.OP_READV- Vectored read operation, similar topreadv2(2). If the file is not seekable,offmust be set to zero.OP_WRITEV- Vectored write operation, similar topwritev2(2). If the file is not seekable,offmust be set to zero.OP_FSYNC- File sync. See alsofsync(2).Note that, while I/O is initiated in the order in which it appears inthe submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED- Read from pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed reads.OP_WRITE_FIXED- Write to pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed writes.OP_POLL_ADD- Poll thefdspecified in the submission queue entry for the events specified in thepoll_eventsfield.Unlike poll or epoll without
EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.If
POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13. If
POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13. If
POLL_UPDATE_USER_DATAis set in the SQElenfield, then the request will update theuser_dataof an existing poll request based on the value passed in theofffield. This mode is available since 5.13.This command works like an
asyncpoll(2)and the completion event result is the returned mask of events. For the variants that updateuser_dataorevents, the completion result will be similar toOP_POLL_REMOVE.OP_POLL_REMOVE- Remove an existing poll request.If found, the
resfield of the structio_uring_cqewill contain 0. If not found,reswill contain-ENOENT, or-EALREADYif the poll request was in the process of completing already.OP_SYNC_FILE_RANGE- Issue the equivalent of async_file_range(2)on the file descriptor.The
fdfield is the file descriptor to sync, theofffield holds the offset in bytes, thelenfield holds the length in bytes, and thesync_range_flagsfield holds the flags for the command. See alsosync_file_range(2)for the general description of the related system call.Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in
timeout_flags: TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight. TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
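The three completion results described for OP_TIMEOUT (timer expiry, event count reached, cancellation) can be told apart from the CQE res value alone. The following is a minimal sketch, not part of LibIOURing: the class, enum and method names are hypothetical, and the errno numbers are the usual Linux generic values, which are assumed here and should be checked against the target platform.

```java
// Sketch only: names are hypothetical and not taken from LibIOURing.
final class TimeoutResults {
    // Assumed Linux errno values (asm-generic/errno.h).
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome { TIMER_EXPIRED, EVENT_COUNT_REACHED, CANCELLED, OTHER_ERROR }

    /** Classifies the res field of the CQE posted for an OP_TIMEOUT request. */
    static Outcome classify(int res) {
        if (res == -ETIME) return Outcome.TIMER_EXPIRED;       // the timer ran out
        if (res == 0) return Outcome.EVENT_COUNT_REACHED;      // enough requests completed first
        if (res == -ECANCELED) return Outcome.CANCELLED;       // removed before it expired
        return Outcome.OTHER_ERROR;                            // anything else, e.g. bad arguments
    }

    public static void main(String[] args) {
        for (int res : new int[] { -62, 0, -125, -22 }) {
            System.out.println(res + " -> " + classify(res));
        }
    }
}
```

Note that -ETIME is the normal outcome for a pure timer (off set to 0), so callers would usually not treat it as a failure.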
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flagscontainTIMEOUT_UPDATE, instead of removing an existing operation, it updates it.addrand return values are same as before.addr2field must contain a pointer to astruct timespec64structure.timeout_flagsmay also containTIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the
file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15, where direct descriptors were introduced.
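The off-by-one encoding of file_index is easy to get wrong, so here is a small sketch of the conversion in both directions. The helper names and the NORMAL_TABLE marker are hypothetical and exist only for this illustration; the one fact taken from the text above is that a positive file_index selects fixed file table slot file_index - 1, while zero leaves the normal file table in use.

```java
// Sketch only: helper names are hypothetical, not part of LibIOURing.
final class DirectDescriptorIndex {
    /** Hypothetical marker meaning "install into the normal file table". */
    static final int NORMAL_TABLE = -1;

    /** Returns the fixed file table slot selected by a file_index value. */
    static int fixedTableSlot(int fileIndex) {
        if (fileIndex < 0) throw new IllegalArgumentException("file_index is unsigned");
        return fileIndex == 0 ? NORMAL_TABLE : fileIndex - 1;
    }

    /** Returns the file_index value to put in the SQE for a given slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0) throw new IllegalArgumentException("slot must be >= 0");
        return Math.addExact(slot, 1);
    }

    public static void main(String[] args) {
        System.out.println("slot 0 is requested with file_index " + fileIndexForSlot(0));
        System.out.println("file_index 5 selects slot " + fixedTableSlot(5));
    }
}
```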
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request is found and cancelled, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found but cancellation could only be attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT- This request must be linked with another request throughIOSQE_IO_LINKwhich is described below.Unlike
OP_TIMEOUT,IORING_OP_LINK_TIMEOUTacts on the linked request, not the completion queue. The format of the command is otherwise likeIORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with-ETIMEif the timer expired and the linked request was attempted cancelled, or-ECANCELEDif the timer got cancelled because of completion of the linked request. LikeIORING_OP_TIMEOUTthe clock source used isCLOCK_MONOTONIC.Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, this command can be used to close files that were direct opened throughOP_OPENAT,OP_OPENAT2, orOP_ACCEPTusing theio_uringspecific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.OP_FILES_UPDATE- This command is an alternative to usingREGISTER_FILES_UPDATEwhich then works in an async fashion, like the rest of theio_uringcommands.The arguments passed in are the same.
addrmust contain a pointer to the array of file descriptors,lenmust contain the length of the array, andoffmust contain the offset at which to operate. Note that the array of file descriptors pointed to inaddrmust remain valid until this operation has completed.Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system calls. Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
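The arithmetic behind OP_PROVIDE_BUFFERS can be sketched as follows: how much memory the application must back addr with (len * fd), and where each buffer in the range starts. The class and method names are hypothetical. The sketch also assumes that buffer IDs increase by one per buffer starting from the value in off, which the text above implies but does not state outright.

```java
// Sketch only: names are hypothetical, not part of LibIOURing.
final class ProvidedBufferLayout {
    /** Total bytes the application must provide at addr: len * fd. */
    static long totalBytes(int nbufs, int len) {
        return Math.multiplyExact((long) nbufs, (long) len);
    }

    /** Start address of the i-th buffer in the range (i counts from 0). */
    static long bufferAddress(long addr, int len, int i) {
        return Math.addExact(addr, Math.multiplyExact((long) i, (long) len));
    }

    /** Buffer ID of the i-th buffer, assuming IDs count up from the starting ID in off. */
    static int bufferId(int startId, int i) {
        return Math.addExact(startId, i);
    }

    public static void main(String[] args) {
        int nbufs = 8;          // goes into the SQE fd field
        int len = 4096;         // goes into the SQE len field
        long addr = 0x10000L;   // made-up base address for illustration
        System.out.println("memory needed: " + totalBytes(nbufs, len) + " bytes");
        System.out.println("buffer 3 starts at 0x" + Long.toHexString(bufferAddress(addr, len, 3)));
    }
}
```

Using separate groups for, say, small and large buffers only changes buf_group and len; the same layout rule applies within each group.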
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used either to just wake or interrupt anyone waiting for completions on the target ring, or to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR OP_SETXATTR OP_FGETXATTR OP_GETXATTR OP_SOCKET OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
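The two-step completion model of OP_SEND_ZC means the application has to track when a send buffer becomes reusable. Below is a minimal sketch of such bookkeeping. The tracker class is hypothetical and not part of LibIOURing. The flag bit values are assumed from the kernel's io_uring header (CQE_F_MORE as bit 1, CQE_F_NOTIF as bit 3). The sketch also assumes that a first CQE without CQE_F_MORE means no notification will follow, so the buffer can be released right away.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: this tracker is hypothetical, not part of LibIOURing.
final class ZeroCopySendTracker {
    // Assumed flag bits, mirroring IORING_CQE_F_MORE and IORING_CQE_F_NOTIF.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    // user_data values whose buffers the kernel may still be using.
    private final Set<Long> busy = new HashSet<>();

    /** Call when an OP_SEND_ZC SQE with this user_data has been submitted. */
    void submitted(long userData) {
        busy.add(userData);
    }

    /** Feeds one CQE to the tracker; returns true once the buffer may be reused. */
    boolean onCompletion(long userData, int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            busy.remove(userData);   // second CQE: the kernel has let go of the buffer
            return true;
        }
        if ((flags & CQE_F_MORE) == 0) {
            busy.remove(userData);   // first CQE and no notification announced
            return true;
        }
        return false;                // first CQE; res holds bytes sent, notification pending
    }

    boolean isBufferBusy(long userData) {
        return busy.contains(userData);
    }

    public static void main(String[] args) {
        ZeroCopySendTracker tracker = new ZeroCopySendTracker();
        tracker.submitted(1L);
        System.out.println("after first cqe, reusable: " + tracker.onCompletion(1L, 512, CQE_F_MORE));
        System.out.println("after notification, reusable: " + tracker.onCompletion(1L, 0, CQE_F_NOTIF));
    }
}
```

As the text above points out, the notification only governs buffer lifetime; it is not an acknowledgement that the peer received the data.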
OP_SENDMSG_ZC OP_READ_MULTISHOT OP_WAITID OP_FUTEX_WAIT OP_FUTEX_WAKE OP_FUTEX_WAITV OP_FIXED_FD_INSTALL OP_FTRUNCATE OP_BIND OP_LISTEN OP_LAST
- See Also:
-
IORING_OP_TIMEOUT_REMOVE
public static final byte IORING_OP_TIMEOUT_REMOVE
io_uring_op Enum values:
OP_NOP- Do not perform any I/O. This is useful for testing the performance of theio_uringimplementation itself.OP_READV- Vectored read operation, similar topreadv2(2). If the file is not seekable,offmust be set to zero.OP_WRITEV- Vectored write operation, similar topwritev2(2). If the file is not seekable,offmust be set to zero.OP_FSYNC- File sync. See alsofsync(2).Note that, while I/O is initiated in the order in which it appears inthe submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED- Read from pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed reads.OP_WRITE_FIXED- Write to pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed writes.OP_POLL_ADD- Poll thefdspecified in the submission queue entry for the events specified in thepoll_eventsfield.Unlike poll or epoll without
EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.If
POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13. If
POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13. If
POLL_UPDATE_USER_DATAis set in the SQElenfield, then the request will update theuser_dataof an existing poll request based on the value passed in theofffield. This mode is available since 5.13.This command works like an
asyncpoll(2)and the completion event result is the returned mask of events. For the variants that updateuser_dataorevents, the completion result will be similar toOP_POLL_REMOVE.OP_POLL_REMOVE- Remove an existing poll request.If found, the
resfield of the structio_uring_cqewill contain 0. If not found,reswill contain-ENOENT, or-EALREADYif the poll request was in the process of completing already.OP_SYNC_FILE_RANGE- Issue the equivalent of async_file_range(2)on the file descriptor.The
fdfield is the file descriptor to sync, theofffield holds the offset in bytes, thelenfield holds the length in bytes, and thesync_range_flagsfield holds the flags for the command. See alsosync_file_range(2)for the general description of the related system call.Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in
timeout_flags: TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight. TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flagscontainTIMEOUT_UPDATE, instead of removing an existing operation, it updates it.addrand return values are same as before.addr2field must contain a pointer to astruct timespec64structure.timeout_flagsmay also containTIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the
file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15, where direct descriptors were introduced.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request is found and cancelled, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found but cancellation could only be attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
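The three result codes listed for OP_ASYNC_CANCEL map naturally onto an enum. This is a sketch under assumptions: the class and enum are hypothetical names, not part of LibIOURing, and the errno numbers are the usual Linux generic values.

```java
// Sketch only: names are hypothetical, not part of LibIOURing.
final class CancelResults {
    // Assumed Linux errno values (asm-generic/errno-base.h and errno.h).
    static final int ENOENT = 2;
    static final int EALREADY = 114;

    enum Outcome { CANCELLED, NOT_FOUND, ATTEMPTED, OTHER_ERROR }

    /** Classifies the res field of the CQE posted for the cancel request itself. */
    static Outcome classify(int res) {
        if (res == 0) return Outcome.CANCELLED;            // target found and cancelled
        if (res == -ENOENT) return Outcome.NOT_FOUND;      // no request with that user_data
        if (res == -EALREADY) return Outcome.ATTEMPTED;    // target may or may not terminate
        return Outcome.OTHER_ERROR;
    }

    public static void main(String[] args) {
        for (int res : new int[] { 0, -2, -114 }) {
            System.out.println(res + " -> " + classify(res));
        }
    }
}
```

In the ATTEMPTED case the caller cannot assume the target request is gone and should keep its resources alive until the target's own CQE arrives.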
OP_LINK_TIMEOUT- This request must be linked with another request throughIOSQE_IO_LINKwhich is described below.Unlike
OP_TIMEOUT,IORING_OP_LINK_TIMEOUTacts on the linked request, not the completion queue. The format of the command is otherwise likeIORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with-ETIMEif the timer expired and the linked request was attempted cancelled, or-ECANCELEDif the timer got cancelled because of completion of the linked request. LikeIORING_OP_TIMEOUTthe clock source used isCLOCK_MONOTONIC.Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_ACCEPT
public static final byte IORING_OP_ACCEPT
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
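As a rough illustration of the addr argument to OP_TIMEOUT, the sketch below lays out a relative timeout in a direct buffer. It assumes the structure consists of two native-endian signed 64-bit fields (seconds, then nanoseconds), as in the kernel's struct __kernel_timespec. TimeoutSpec and encode are hypothetical names, not part of this binding, and obtaining the buffer's native address for the SQE is left to whatever binding is in use.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Duration;

/** Hypothetical helper: encodes a relative timeout for OP_TIMEOUT's addr field. */
final class TimeoutSpec {
    /** Assumed size of the structure: two 64-bit fields. */
    static final int SIZE = 16;

    private TimeoutSpec() {}

    /**
     * Writes seconds at offset 0 and nanoseconds at offset 8, in native byte order.
     * The layout is an assumption based on struct __kernel_timespec.
     */
    static ByteBuffer encode(Duration timeout) {
        if (timeout.isNegative()) {
            throw new IllegalArgumentException("timeout must not be negative");
        }
        // Direct memory, so a native binding could pass its address to the kernel.
        ByteBuffer buf = ByteBuffer.allocateDirect(SIZE).order(ByteOrder.nativeOrder());
        buf.putLong(0, timeout.getSeconds()); // whole seconds
        buf.putLong(8, timeout.getNano());    // remaining nanoseconds, 0..999_999_999
        return buf;
    }
}
```

The buffer must stay reachable until the timeout request completes, since the kernel reads the structure through the pointer placed in addr.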
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
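The three result codes described for OP_ASYNC_CANCEL can be mapped to an enum when a completion is reaped. The sketch below is only an illustration: CancelResult and its members are hypothetical names, and the errno numbers are the usual Linux values (ENOENT = 2, EALREADY = 114) restated here as an assumption instead of being taken from a binding.

```java
/** Hypothetical helper: interprets the res field of an OP_ASYNC_CANCEL completion. */
final class CancelResult {
    // Assumed Linux errno values; a binding may expose its own constants.
    static final int ENOENT = 2;
    static final int EALREADY = 114;

    enum Outcome {
        FOUND,             // res == 0
        NOT_FOUND,         // res == -ENOENT
        CANCEL_ATTEMPTED,  // res == -EALREADY: the target may or may not terminate
        UNEXPECTED         // any other value
    }

    private CancelResult() {}

    static Outcome classify(int res) {
        if (res == 0) {
            return Outcome.FOUND;
        }
        if (res == -ENOENT) {
            return Outcome.NOT_FOUND;
        }
        if (res == -EALREADY) {
            return Outcome.CANCEL_ATTEMPTED;
        }
        return Outcome.UNEXPECTED;
    }
}
```

For CANCEL_ATTEMPTED the caller still has to wait for the target request's own completion, because the text above leaves open whether that request terminates.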
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer was cancelled because the linked request completed. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
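The sizing rule for OP_PROVIDE_BUFFERS (len * fd bytes starting at addr) can be sketched as plain arithmetic. ProvidedBufferPool is a hypothetical helper, not part of this binding. Mapping a buffer ID back to an address assumes that IDs are assigned consecutively from the starting ID in off, which is how the paragraph above reads but is stated here as an assumption.

```java
/** Hypothetical model of one range passed to OP_PROVIDE_BUFFERS. */
final class ProvidedBufferPool {
    final long baseAddr; // SQE addr: start of the memory range
    final int bufLen;    // SQE len: size of each buffer in bytes
    final int count;     // SQE fd: number of buffers in the range
    final int startBid;  // SQE off: ID of the first buffer

    ProvidedBufferPool(long baseAddr, int bufLen, int count, int startBid) {
        if (bufLen <= 0 || count <= 0 || startBid < 0) {
            throw new IllegalArgumentException("invalid pool geometry");
        }
        this.baseAddr = baseAddr;
        this.bufLen = bufLen;
        this.count = count;
        this.startBid = startBid;
    }

    /** Memory the application must back at baseAddr: len * fd bytes. */
    long totalBytes() {
        return (long) bufLen * count;
    }

    /** Address of the buffer with the given ID, assuming consecutive IDs. */
    long addressOf(int bid) {
        int index = bid - startBid;
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("buffer ID " + bid + " not in this range");
        }
        return baseAddr + (long) index * bufLen;
    }
}
```

With a base address of 0x1000, eight buffers of 4096 bytes and a starting ID of 100, the range covers 32768 bytes and buffer 103 starts at 0x4000.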
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
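One way to picture the two message fields of OP_MSG_RING is as a small value type that is split into len and off on the sending side and rebuilt from res and user_data on the target ring. RingMessage below is a hypothetical sketch, not part of this binding. It assumes res is a signed 32-bit field, so a tag of 2^31 or more shows up as a negative res and has to be read back as unsigned.

```java
/** Hypothetical value type for a message passed with OP_MSG_RING. */
final class RingMessage {
    final long tag;     // 32-bit value: travels in SQE len, arrives in CQE res
    final long payload; // 64-bit value: travels in SQE off, arrives in CQE user_data

    RingMessage(long tag, long payload) {
        if (tag < 0 || tag > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("tag must fit in 32 bits");
        }
        this.tag = tag;
        this.payload = payload;
    }

    /** Value to place in the SQE len field (bit pattern of the 32-bit tag). */
    int sqeLen() {
        return (int) tag;
    }

    /** Value to place in the SQE off field. */
    long sqeOff() {
        return payload;
    }

    /** Rebuilds the message from the CQE posted on the target ring. */
    static RingMessage fromCqe(int res, long userData) {
        // res is assumed to be signed; reinterpret it as the unsigned 32-bit tag.
        return new RingMessage(Integer.toUnsignedLong(res), userData);
    }
}
```

Because large tags look like negative results, a receiver that also handles ordinary completions may prefer to keep tags below 2^31 so they cannot be mistaken for error codes.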
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
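The two-step completion model of OP_SEND_ZC can be tracked per request with a small state object. The sketch below is an illustration under assumptions: ZcSendState is a hypothetical name, the flag values (CQE_F_MORE = 1 << 1, CQE_F_NOTIF = 1 << 3) are restated from the Linux io_uring UAPI header and not taken from this binding, and treating a first CQE without CQE_F_MORE as "no notification will follow" is an inference from the description above.

```java
/** Hypothetical per-request tracker for OP_SEND_ZC completions. */
final class ZcSendState {
    // Assumed flag values from the Linux io_uring UAPI header.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    private boolean resultSeen;
    private int result;
    private boolean bufferReusable;

    /** Feed every CQE whose user_data matches this request. */
    void onCqe(int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            // Notification: the kernel no longer references the data buffer.
            bufferReusable = true;
            return;
        }
        // First CQE: bytes sent or a negative error code.
        resultSeen = true;
        result = res;
        if ((flags & CQE_F_MORE) == 0) {
            // No second CQE is expected, so nothing else will release the buffer.
            bufferReusable = true;
        }
    }

    boolean hasResult() {
        return resultSeen;
    }

    int result() {
        if (!resultSeen) {
            throw new IllegalStateException("send result not yet received");
        }
        return result;
    }

    /** True once the data buffer may be modified or freed. */
    boolean bufferReusable() {
        return bufferReusable;
    }
}
```

Keeping the send result and the buffer lifetime as separate pieces of state mirrors the text: the first CQE reports how much was sent, while only the notification says when the buffer may be touched again.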
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_ASYNC_CANCEL
public static final byte IORING_OP_ASYNC_CANCEL
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call.
Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call.
fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call.
Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG.
Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout completed through expiration of the timer, or 0 if the timeout completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED.
Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
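The three documented outcomes of an OP_TIMEOUT request can be told apart from the CQE res value alone. The following is a minimal sketch, not part of this binding: the class TimeoutResult and its method are hypothetical, and the errno values 62 (ETIME) and 125 (ECANCELED) are assumed to match the common Linux definitions.

```java
// Hypothetical helper: classifies the res field of an OP_TIMEOUT completion.
final class TimeoutResult {
    // Assumed Linux errno values; verify against the target platform.
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    static String describe(int res) {
        if (res == 0) {
            // The requested number of completions arrived before the timer fired.
            return "event count reached";
        }
        if (res == -ETIME) {
            return "timer expired";
        }
        if (res == -ECANCELED) {
            return "cancelled before expiry";
        }
        // Any other negative value is an ordinary error code.
        return "error " + (-res);
    }

    public static void main(String[] args) {
        System.out.println(describe(-ETIME));
        System.out.println(describe(0));
    }
}
```

Note that -ETIME is the normal outcome for a pure timer (off set to 0), so callers would usually not treat it as a failure.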
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT.
Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.
Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call.
fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call.
Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES.
Available since 5.15.
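Because file_index is offset by one (0 means "use a normal file descriptor", and a positive value selects fixed-file slot file_index - 1), it is easy to be off by one when filling in the SQE. A minimal sketch of the conversion follows; the class DirectDescriptorIndex and its methods are hypothetical names used only for illustration and do not exist in this binding.

```java
// Hypothetical helper: converts between fixed-file table slots and file_index values.
final class DirectDescriptorIndex {
    /** Returns the file_index value that targets the given fixed-file table slot. */
    static int toFileIndex(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be non-negative");
        }
        // Slot 0 is encoded as 1, because 0 is reserved for "not a direct descriptor".
        return Math.addExact(slot, 1);
    }

    /** Returns the slot encoded in file_index, or -1 if the request uses a normal descriptor. */
    static int toSlot(int fileIndex) {
        return fileIndex > 0 ? fileIndex - 1 : -1;
    }

    public static void main(String[] args) {
        System.out.println(toFileIndex(0));
        System.out.println(toSlot(5));
    }
}
```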
OP_ASYNC_CANCEL - Attempt to cancel an already issued request.
addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request was found and cancelled, the res field of the CQE will contain 0. If it was not found, res will contain -ENOENT. If it was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket I/O) will get cancelled, while disk I/O requests cannot be cancelled if already started.
Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer was cancelled because the linked request completed. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC.
Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call.
fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call.
Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call.
fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call.
Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call.
fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call.
Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES.
Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call.
fd is the file descriptor to be closed. See also close(2) for the general description of the related system call.
Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE that works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed.
Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call.
fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call.
Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call.
fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the I/O operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call.
Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call.
fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call.
Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call.
addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call.
Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call.
Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND.
Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call.
fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call.
Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES.
Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call.
fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure.
Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call.
splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset from which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call.
Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data.
Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from.
Available since 5.7.
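The sizing rule above (len * fd bytes starting at addr) can be checked with plain arithmetic before the SQE is filled in. The sketch below is illustrative only: the class ProvidedBufferLayout and its methods are hypothetical, and it assumes that buffer IDs are assigned consecutively from the starting ID passed in off, one per len-sized slice.

```java
// Hypothetical helper: computes the memory layout of one OP_PROVIDE_BUFFERS range.
final class ProvidedBufferLayout {
    /** Total bytes the application must provide at addr: len * nbufs. */
    static long totalBytes(int nbufs, int len) {
        if (nbufs <= 0 || len <= 0) {
            throw new IllegalArgumentException("nbufs and len must be positive");
        }
        return Math.multiplyExact((long) nbufs, (long) len);
    }

    /** Address of the index-th buffer in the range that starts at base. */
    static long bufferAddress(long base, int len, int index) {
        return base + (long) index * len;
    }

    /** Buffer ID of the index-th buffer, assuming IDs count up from startId. */
    static int bufferId(int startId, int index) {
        return startId + index;
    }

    public static void main(String[] args) {
        // Eight 4 KiB buffers need 32768 bytes of backing memory.
        System.out.println(totalBytes(8, 4096));
    }
}
```

A completion that used a provided buffer reports which buffer ID was selected, so a mapping like bufferAddress is what turns that ID back into a memory location.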
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS.
fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers.
Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call.
splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call.
Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call.
fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set.
Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call.
fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2).
Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call.
fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2).
Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call.
fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2).
Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call.
fd should be set to the newdirfd, addr should be set to the target and addr2 should be set to the linkpath being passed in to symlinkat(2).
Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call.
fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2).
Available since 5.15.
OP_MSG_RING - Send a message to an io_uring.
fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields.
Available since 5.18.
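Since the 32-bit len value arrives on the target ring in the CQE res field, and Java's int is signed, a payload with the top bit set reads back as a negative number. The sketch below shows one way to round-trip such a value; MsgRingPayload and its methods are hypothetical names, and the sketch assumes the receiver knows the CQE came from OP_MSG_RING so that a negative res is not mistaken for an error code.

```java
// Hypothetical helper: maps an unsigned 32-bit OP_MSG_RING payload to and from a Java int.
final class MsgRingPayload {
    /** Converts an unsigned 32-bit payload to the int stored in the SQE len field. */
    static int toLen(long payload) {
        if (payload < 0 || payload > 0xFFFF_FFFFL) {
            throw new IllegalArgumentException("payload must fit in 32 bits");
        }
        return (int) payload;
    }

    /** Recovers the unsigned payload from the res field of the CQE on the target ring. */
    static long fromRes(int res) {
        return Integer.toUnsignedLong(res);
    }

    public static void main(String[] args) {
        int len = toLen(0x8000_0001L);
        // The int is negative, but the unsigned view restores the original payload.
        System.out.println(len < 0);
        System.out.println(fromRes(len) == 0x8000_0001L);
    }
}
```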
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call.
Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event (a notification) for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and its flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. while waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications only control the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent.
Available since 6.0.
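The two-step completion model means the application has to remember, per user_data, whether the kernel may still be holding the send buffer. The following is a minimal bookkeeping sketch under stated assumptions: ZeroCopySendTracker is a hypothetical class, and the flag values 1 << 1 (CQE_F_MORE) and 1 << 3 (CQE_F_NOTIF) are assumed to match the kernel's IORING_CQE_F_* definitions. Real code would take them from the binding's constants instead.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: tracks which zerocopy send buffers are still owned by the kernel.
final class ZeroCopySendTracker {
    // Assumed flag values; use the binding's constants in real code.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    // user_data values whose buffers must not be modified yet.
    private final Set<Long> pinned = new HashSet<>();

    /** Call when an OP_SEND_ZC request identified by userData is submitted. */
    void onSubmit(long userData) {
        pinned.add(userData);
    }

    /** Processes one CQE; returns true once the buffer may be reused. */
    boolean onCompletion(long userData, int flags) {
        boolean isNotification = (flags & CQE_F_NOTIF) != 0;
        boolean moreToCome = (flags & CQE_F_MORE) != 0;
        if (isNotification || !moreToCome) {
            // Either the notification arrived, or no notification will follow.
            pinned.remove(userData);
            return true;
        }
        // First CQE with CQE_F_MORE: the result is known, but the buffer is still in use.
        return false;
    }

    boolean isPinned(long userData) {
        return pinned.contains(userData);
    }

    public static void main(String[] args) {
        ZeroCopySendTracker tracker = new ZeroCopySendTracker();
        tracker.onSubmit(7L);
        System.out.println(tracker.onCompletion(7L, CQE_F_MORE));
        System.out.println(tracker.onCompletion(7L, CQE_F_NOTIF));
    }
}
```

The res value of the first CQE (bytes sent or an error) would be handled separately; the tracker only answers the buffer-lifetime question.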
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_LINK_TIMEOUT
public static final byte IORING_OP_LINK_TIMEOUT
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2).
Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call.
Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call.
fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call.
Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG.
Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout completed through expiration of the timer, or 0 if the timeout completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED.
Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT.
Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.
Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call.
fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call.
Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES.
Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request.
addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request was found and cancelled, the res field of the CQE will contain 0. If it was not found, res will contain -ENOENT. If it was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket I/O) will get cancelled, while disk I/O requests cannot be cancelled if already started.
Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer was cancelled because the linked request completed. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC.
Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call.
fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call.
Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call.
fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call.
Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call.
fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call.
Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES.
Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call.
fd is the file descriptor to be closed. See also close(2) for the general description of the related system call.
Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE that works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed.
Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call.
fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call.
Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call.
fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the I/O operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call.
Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call.
fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call.
Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, off is an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
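The two step completion model described for OP_SEND_ZC can be illustrated with a small plain-Java model. This is only a sketch and does not use the binding itself: the class ZcSendTracker and its methods are hypothetical, and the two flag values are assumed to match IORING_CQE_F_MORE (1 << 1) and IORING_CQE_F_NOTIF (1 << 3) from the kernel's io_uring header. Real code should take the constants from the binding rather than hard-coding them.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: tracks when a zerocopy send buffer may be reused. */
final class ZcSendTracker {
    // Assumed to mirror the kernel UAPI values; used here for illustration only.
    static final int CQE_F_MORE  = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    // user_data -> res of the first CQE (bytes sent or negative error code)
    private final Map<Long, Integer> awaitingNotif = new HashMap<>();

    /**
     * Feeds one completion into the tracker.
     *
     * @return true when the buffer tied to userData may be modified or freed
     */
    boolean onCqe(long userData, int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            // Notification CQE: the kernel no longer holds on to the buffer.
            awaitingNotif.remove(userData);
            return true;
        }
        if ((flags & CQE_F_MORE) != 0) {
            // First CQE: res is the send result, a notification will follow.
            awaitingNotif.put(userData, res);
            return false;
        }
        // CQE_F_MORE is not set, so no notification was announced.
        return true;
    }

    /** Number of sends whose buffers are still owned by the kernel. */
    int pending() {
        return awaitingNotif.size();
    }
}
```

A completion loop would call onCqe for every CQE it reaps and release or reuse the buffer only when the method returns true.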
IORING_OP_CONNECT
public static final byte IORING_OP_CONNECT
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
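The completion results listed for OP_TIMEOUT can be turned into a small decoding helper. The following is a minimal sketch in plain Java: the class TimeoutResult and its Outcome enum are hypothetical names, and the errno numbers (ETIME = 62, ECANCELED = 125) are the generic Linux values, which is an assumption that does not hold on every architecture.

```java
/** Hypothetical helper: interprets the res field of an OP_TIMEOUT completion. */
final class TimeoutResult {
    // Generic Linux errno numbers (assumption; a few architectures differ).
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome { EXPIRED, EVENTS_COMPLETED, CANCELLED, UNEXPECTED }

    static Outcome classify(int res) {
        if (res == -ETIME) {
            return Outcome.EXPIRED;           // the timer ran out
        }
        if (res == 0) {
            return Outcome.EVENTS_COMPLETED;  // the requested number of events completed
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;         // the timeout was cancelled before it expired
        }
        return Outcome.UNEXPECTED;            // any other value, e.g. a setup error
    }

    public static void main(String[] args) {
        System.out.println(classify(-ETIME));
        System.out.println(classify(0));
        System.out.println(classify(-ECANCELED));
    }
}
```

Note that -ETIME is the normal result for a pure timer (off set to 0), so callers usually should not treat it as a failure.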
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
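The file_index convention used by the direct-descriptor variants of OP_ACCEPT, OP_OPENAT and OP_OPENAT2 is off by one: a value of 0 requests a normal file descriptor, while a positive value N targets slot N - 1 of the fixed file table. A minimal plain-Java sketch of that mapping follows; the class DirectDescriptor and its method names are hypothetical and not part of the binding.

```java
/** Hypothetical helper for the file_index <-> fixed file table slot mapping. */
final class DirectDescriptor {
    /** Marker returned when file_index does not select a fixed file slot. */
    static final int NO_SLOT = -1;

    /** Returns the value to store in file_index to target the given 0-based slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0 || slot == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
        return slot + 1; // 0 is reserved for "install a normal file descriptor"
    }

    /** Returns the 0-based slot selected by file_index, or NO_SLOT if it is 0. */
    static int slotForFileIndex(int fileIndex) {
        if (fileIndex < 0) {
            throw new IllegalArgumentException("file_index must not be negative");
        }
        return fileIndex == 0 ? NO_SLOT : fileIndex - 1;
    }

    public static void main(String[] args) {
        System.out.println(fileIndexForSlot(0));  // first slot of the fixed file table
        System.out.println(slotForFileIndex(5));
    }
}
```

With a direct descriptor, the res field of the CQE reports 0 or an error rather than a descriptor number, so the application has to remember which slot it asked for.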
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like OP_TIMEOUT the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same.
addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, off is an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
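The memory layout implied by the OP_PROVIDE_BUFFERS description (fd buffers of len bytes each, laid out back to back from addr, with IDs starting at off) can be sketched in plain Java. The class ProvidedBufferGroup and its fields are hypothetical, and the sketch assumes that buffer IDs are assigned consecutively in address order; it performs no I/O and only shows the arithmetic.

```java
/** Hypothetical model of one group of buffers handed over with OP_PROVIDE_BUFFERS. */
final class ProvidedBufferGroup {
    final long baseAddr;  // value placed in addr
    final int bufLen;     // value placed in len (size of each buffer)
    final int count;      // value placed in fd (number of buffers)
    final int firstBid;   // value placed in off (starting buffer ID)
    final int groupId;    // value placed in buf_group

    ProvidedBufferGroup(long baseAddr, int bufLen, int count, int firstBid, int groupId) {
        if (bufLen <= 0 || count <= 0 || firstBid < 0) {
            throw new IllegalArgumentException("invalid buffer group parameters");
        }
        this.baseAddr = baseAddr;
        this.bufLen = bufLen;
        this.count = count;
        this.firstBid = firstBid;
        this.groupId = groupId;
    }

    /** Memory the application must provide at addr: len * fd bytes. */
    long totalBytes() {
        return (long) bufLen * count;
    }

    /** Address of the buffer with the given ID, assuming consecutive IDs. */
    long addressOf(int bid) {
        int index = bid - firstBid;
        if (index < 0 || index >= count) {
            throw new IllegalArgumentException("buffer ID " + bid + " is not in this group");
        }
        return baseAddr + (long) index * bufLen;
    }
}
```

For example, a group created with a base address of 0x10000, 4096-byte buffers, 8 buffers and a starting ID of 100 needs 32768 bytes of memory, and buffer ID 102 starts at 0x12000.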
IORING_OP_FALLOCATE
public static final byte IORING_OP_FALLOCATE
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
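The three documented outcomes of an OP_TIMEOUT completion can be told apart from the CQE res field alone. The sketch below shows one way to do that in plain Java. The class, enum and method names are hypothetical, and the errno constants are redefined locally on the assumption of the generic Linux values (ETIME = 62, ECANCELED = 125), which hold on x86-64 and arm64 but not on every architecture.

```java
final class TimeoutResultSketch {
    // Assumption: generic Linux errno values; some architectures use different numbers.
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome { TIMER_EXPIRED, EVENT_COUNT_REACHED, CANCELLED, OTHER_ERROR }

    /** Maps the res field of an OP_TIMEOUT completion to one of the documented outcomes. */
    static Outcome classify(int res) {
        if (res == -ETIME) {
            return Outcome.TIMER_EXPIRED;        // the timer ran out
        }
        if (res == 0) {
            return Outcome.EVENT_COUNT_REACHED;  // enough requests completed on their own
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;            // removed before it expired
        }
        return Outcome.OTHER_ERROR;              // e.g. invalid arguments
    }

    public static void main(String[] args) {
        for (int res : new int[] { -ETIME, 0, -ECANCELED, -22 }) {
            System.out.println("res=" + res + " -> " + classify(res));
        }
    }
}
```

Note that -ETIME is the expected result for a pure timer (off set to 0), so callers usually should not treat it as a failure.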
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
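Because file_index is offset by one relative to the fixed file table slot (0 means "use the normal file table"), it is easy to be off by one when filling in the SQE. A small pair of helpers, sketched here in plain Java with hypothetical names, keeps the conversion in one place.

```java
final class DirectDescriptorSketch {
    /** Value to store in file_index so that the new file lands in the given fixed-table slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0 || slot == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
        return slot + 1; // file_index is the slot number plus one
    }

    /** Fixed-table slot addressed by file_index, or -1 when file_index selects the normal file table. */
    static int slotForFileIndex(int fileIndex) {
        return fileIndex > 0 ? fileIndex - 1 : -1;
    }

    public static void main(String[] args) {
        int fileIndex = fileIndexForSlot(0);
        System.out.println("slot 0 -> file_index " + fileIndex);
        System.out.println("file_index 0 -> slot " + slotForFileIndex(0));
    }
}
```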
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and an attempt was made to cancel the linked request, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data.
Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
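The arithmetic implied by the OP_PROVIDE_BUFFERS description can be sketched in plain Java as shown below. The class and method names are hypothetical. The address helper assumes that buffer IDs are assigned consecutively from the starting ID in off, one per len-sized slice of the region at addr, which is how the description above reads.

```java
final class ProvidedBuffersSketch {
    /** Total bytes the application must make available at addr: len * fd (number of buffers). */
    static long requiredBytes(int bufferCount, int bufferLength) {
        return Math.multiplyExact((long) bufferCount, (long) bufferLength);
    }

    /**
     * Address of the buffer with the given ID, assuming IDs run consecutively
     * from startBufferId and each buffer occupies bufferLength bytes.
     */
    static long bufferAddress(long baseAddress, int bufferLength, int startBufferId, int bufferId) {
        if (bufferId < startBufferId) {
            throw new IllegalArgumentException("buffer ID precedes the start of the range");
        }
        return baseAddress + (long) (bufferId - startBufferId) * bufferLength;
    }

    public static void main(String[] args) {
        long base = 0x1000L;   // hypothetical address of the backing memory
        int count = 8;         // goes into the SQE fd field
        int length = 4096;     // goes into the SQE len field
        int startId = 10;      // goes into the SQE off field
        System.out.println("bytes needed: " + requiredBytes(count, length));
        System.out.println("buffer 12 at: 0x" + Long.toHexString(bufferAddress(base, length, startId, 12)));
    }
}
```

When a request completes with a selected buffer, a helper like bufferAddress lets the application map the reported buffer ID back to its own memory without keeping a separate lookup table.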
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call.
Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
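The two-step completion model of OP_SEND_ZC means an application has to track when each send buffer becomes reusable. The following plain-Java sketch shows one possible bookkeeping scheme keyed by user_data. The class and method names are hypothetical, and the flag constants are redefined locally on the assumption that they mirror the kernel's IORING_CQE_F_MORE (1 << 1) and IORING_CQE_F_NOTIF (1 << 3) bits.

```java
import java.util.HashSet;
import java.util.Set;

final class ZeroCopySendSketch {
    // Assumption: values mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF in the Linux UAPI header.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    // user_data values whose buffers may still be referenced by the kernel.
    private final Set<Long> buffersInFlight = new HashSet<>();

    /** Call when an OP_SEND_ZC request is submitted with this user_data. */
    void onSubmit(long userData) {
        buffersInFlight.add(userData);
    }

    /** Feeds one CQE into the tracker. Returns true once the buffer may be modified or freed. */
    boolean onCompletion(long userData, int res, int flags) {
        boolean isNotification = (flags & CQE_F_NOTIF) != 0;
        boolean moreExpected = (flags & CQE_F_MORE) != 0;
        if (isNotification || !moreExpected) {
            // Either the notification arrived, or the first CQE says none will follow.
            buffersInFlight.remove(userData);
            return true;
        }
        // First CQE with CQE_F_MORE set: res holds the byte count or a negative error,
        // but the buffer must stay untouched until the notification is posted.
        return false;
    }

    boolean isInFlight(long userData) {
        return buffersInFlight.contains(userData);
    }

    public static void main(String[] args) {
        ZeroCopySendSketch tracker = new ZeroCopySendSketch();
        tracker.onSubmit(42L);
        System.out.println("after first CQE, reusable: " + tracker.onCompletion(42L, 1024, CQE_F_MORE));
        System.out.println("after notification, reusable: " + tracker.onCompletion(42L, 0, CQE_F_NOTIF));
    }
}
```

Keying on user_data works because both CQEs of one request carry the same value, so each in-flight zerocopy send needs a unique user_data for this scheme to be unambiguous.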
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_OPENAT
public static final byte IORING_OP_OPENAT
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2).
Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and an attempt was made to cancel the linked request, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd bytes of memory at addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe is likely to contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
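The two-step completion model described for OP_SEND_ZC can be sketched in plain Java as a small state tracker. This is only an illustration and not part of this binding: the class ZeroCopySendTracker and its methods are hypothetical, and the two flag values are assumptions copied from the kernel's io_uring.h (IORING_CQE_F_MORE and IORING_CQE_F_NOTIF), so prefer the constants exposed by this class in real code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: tracks when a zerocopy send buffer may be modified again. */
final class ZeroCopySendTracker {
    // Assumed values, mirroring IORING_CQE_F_MORE and IORING_CQE_F_NOTIF in io_uring.h.
    static final int CQE_F_MORE  = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    enum BufferState { IN_FLIGHT, AWAITING_NOTIFICATION, REUSABLE }

    // Keyed by the user_data value that was placed in the SQE.
    private final Map<Long, BufferState> states = new HashMap<>();

    /** Call when the OP_SEND_ZC request has been submitted. */
    void submitted(long userData) {
        states.put(userData, BufferState.IN_FLIGHT);
    }

    /** Call for every CQE carrying this user_data; returns the new buffer state. */
    BufferState onCompletion(long userData, int cqeFlags) {
        BufferState next;
        if ((cqeFlags & CQE_F_NOTIF) != 0) {
            // Second CQE: the kernel no longer references the buffer.
            next = BufferState.REUSABLE;
        } else if ((cqeFlags & CQE_F_MORE) != 0) {
            // First CQE: res is valid, but a notification is still pending.
            next = BufferState.AWAITING_NOTIFICATION;
        } else {
            // First CQE without CQE_F_MORE: no notification will follow.
            next = BufferState.REUSABLE;
        }
        states.put(userData, next);
        return next;
    }

    boolean isReusable(long userData) {
        return states.get(userData) == BufferState.REUSABLE;
    }

    public static void main(String[] args) {
        ZeroCopySendTracker tracker = new ZeroCopySendTracker();
        tracker.submitted(42L);
        System.out.println(tracker.onCompletion(42L, CQE_F_MORE));
        System.out.println(tracker.onCompletion(42L, CQE_F_NOTIF));
    }
}
```

The buffer is treated as reusable only after the notification CQE arrives, or immediately when the first CQE does not announce one. As the text notes, neither event says anything about whether the peer has received the data.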
IORING_OP_CLOSE
public static final byte IORING_OP_CLOSE
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd bytes of memory at addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe is likely to contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
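The memory layout implied by the OP_PROVIDE_BUFFERS entry above (fd buffers of len bytes each, laid out back to back from addr, with buffer IDs starting at off) can be made concrete with a short sketch. The class ProvidedBufferLayout and its method names are hypothetical and exist only for this illustration. It assumes buffer IDs are assigned sequentially in address order and does not call into io_uring.

```java
/** Hypothetical helper that mirrors the OP_PROVIDE_BUFFERS arguments described above. */
final class ProvidedBufferLayout {
    private final long addr;    // SQE addr: start of the contiguous region
    private final int len;      // SQE len: size of each buffer in bytes
    private final int count;    // SQE fd: number of buffers provided
    private final int firstId;  // SQE off: ID of the first buffer in the range

    ProvidedBufferLayout(long addr, int len, int count, int firstId) {
        if (len <= 0 || count <= 0) {
            throw new IllegalArgumentException("len and count must be positive");
        }
        this.addr = addr;
        this.len = len;
        this.count = count;
        this.firstId = firstId;
    }

    /** Total memory the application must make available at addr: len * fd bytes. */
    long totalBytes() {
        return (long) len * count;
    }

    /** Address of the buffer with the given ID, assuming IDs follow address order. */
    long addressOf(int bufferId) {
        int index = bufferId - firstId;
        if (index < 0 || index >= count) {
            throw new IllegalArgumentException("buffer ID outside provided range: " + bufferId);
        }
        return addr + (long) index * len;
    }

    public static void main(String[] args) {
        // Eight 4 KiB buffers with IDs 100..107, starting at a made-up address.
        ProvidedBufferLayout layout = new ProvidedBufferLayout(0x1000L, 4096, 8, 100);
        System.out.println("bytes required: " + layout.totalBytes());
        System.out.println("buffer 103 at: 0x" + Long.toHexString(layout.addressOf(103)));
    }
}
```

A helper like this is useful on the completion side, where the application has to map the buffer ID chosen by the kernel back to the memory it provided.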
IORING_OP_FILES_UPDATE
public static final byte IORING_OP_FILES_UPDATE
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15, the kernel in which direct descriptors were introduced.
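The off-by-one between file_index and the fixed file table slot is easy to get wrong, so a small sketch may help. The class and method names below are hypothetical helpers, not part of this class; they only encode the rule stated above that a positive file_index selects slot file_index - 1, while 0 means the normal file table is used.

```java
final class DirectSlotSketch {
    /** Value to store in file_index so that the file lands in the given fixed-table slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be >= 0");
        }
        return Math.addExact(slot, 1); // slot 0 is requested with file_index 1
    }

    /** Fixed-table slot addressed by a positive file_index. */
    static int slotForFileIndex(int fileIndex) {
        if (fileIndex <= 0) {
            throw new IllegalArgumentException("file_index 0 means: use the normal file table");
        }
        return fileIndex - 1;
    }

    /** True when the request asks for a direct descriptor rather than a regular one. */
    static boolean usesFixedTable(int fileIndex) {
        return fileIndex > 0;
    }

    public static void main(String[] args) {
        System.out.println(fileIndexForSlot(0));
        System.out.println(slotForFileIndex(5));
    }
}
```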
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request was found and cancelled, the res field of the cqe will contain 0. If it was not found, res will contain -ENOENT. If it was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
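The three result codes listed for OP_ASYNC_CANCEL map naturally onto a small classification helper. The sketch below is illustrative only: the enum, the method name and the errno numbers (assumed generic Linux numbering) are assumptions, not part of this class.

```java
final class CancelResultSketch {
    // Assumed errno values (generic Linux numbering).
    static final int ENOENT   = 2;
    static final int EALREADY = 114;

    enum Outcome { CANCELLED, NOT_FOUND, ALREADY_RUNNING, OTHER_ERROR }

    /** Classifies the res field of the CQE that answers a cancellation request. */
    static Outcome classify(int res) {
        if (res == 0)         return Outcome.CANCELLED;       // found and cancelled
        if (res == -ENOENT)   return Outcome.NOT_FOUND;       // no request with that user_data
        if (res == -EALREADY) return Outcome.ALREADY_RUNNING; // may or may not terminate
        return Outcome.OTHER_ERROR;
    }

    public static void main(String[] args) {
        System.out.println(classify(-EALREADY));
    }
}
```

For ALREADY_RUNNING the caller still has to wait for the target request's own CQE to learn whether it was interrupted or ran to completion.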
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring-specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE, which then works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, off holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and addr holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, and off is an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data.
Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd bytes of memory at addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
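The sizing rule for OP_PROVIDE_BUFFERS (len * fd bytes, with consecutive buffer IDs laid out back to back from addr) can be checked with a little arithmetic. The sketch below is a hypothetical helper, not part of this class; it assumes that buffer IDs increase by one per buffer starting at the ID passed in off, as the description above implies.

```java
final class ProvidedBuffersSketch {
    /** Bytes the application must make available at addr: len * fd. */
    static long totalBytes(int nbufs, int len) {
        if (nbufs <= 0 || len <= 0) {
            throw new IllegalArgumentException("nbufs and len must be positive");
        }
        return Math.multiplyExact((long) nbufs, (long) len);
    }

    /**
     * Address of the buffer with ID bid, given the range that was provided.
     * addr: starting address, len: size of each buffer,
     * startBid: the starting buffer ID passed in off, nbufs: the count passed in fd.
     */
    static long bufferAddress(long addr, int len, int startBid, int nbufs, int bid) {
        int index = bid - startBid;
        if (index < 0 || index >= nbufs) {
            throw new IllegalArgumentException("bid " + bid + " is outside the provided range");
        }
        return addr + (long) index * len;
    }

    public static void main(String[] args) {
        // Eight 4 KiB buffers with IDs 10..17, starting at a made-up address.
        System.out.println(totalBytes(8, 4096));
        System.out.println(Long.toHexString(bufferAddress(0x1000L, 4096, 10, 8, 12)));
    }
}
```

A helper of this kind is typically used on the completion side, where the selected buffer ID reported for a request has to be turned back into the memory that holds the data.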
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
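To make the field mapping of OP_MSG_RING concrete, the sketch below models the CQE that the target ring would observe: res carries the 32-bit len and user_data carries the 64-bit off. Everything here is hypothetical illustration (the Cqe class, the helper names, and the application-level convention of packing a message kind and a payload into the 64-bit value); it does not call into io_uring.

```java
final class MsgRingSketch {
    /** Minimal stand-in for the two CQE fields that carry the message. */
    static final class Cqe {
        final long userData;
        final int res;
        Cqe(long userData, int res) {
            this.userData = userData;
            this.res = res;
        }
    }

    /** CQE seen on the target ring for a message sent with the given len and off. */
    static Cqe postedOnTarget(int len, long off) {
        return new Cqe(off, len);
    }

    // Hypothetical application convention: upper 32 bits = kind, lower 32 bits = payload.
    static long pack(int kind, int payload) {
        return ((long) kind << 32) | (payload & 0xFFFFFFFFL);
    }

    static int kindOf(long userData) {
        return (int) (userData >>> 32);
    }

    static int payloadOf(long userData) {
        return (int) userData;
    }

    public static void main(String[] args) {
        Cqe cqe = postedOnTarget(7, pack(3, -5));
        System.out.println(cqe.res + " " + kindOf(cqe.userData) + " " + payloadOf(cqe.userData));
    }
}
```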
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call.
Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will usually contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
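The two-step completion model of OP_SEND_ZC can be followed with a small state tracker, sketched below. It is an illustration under assumptions: the Tracker class is hypothetical, the flag bit values are assumed to mirror the kernel's IORING_CQE_F_MORE and IORING_CQE_F_NOTIF definitions, and a first CQE without CQE_F_MORE is taken to mean that no notification will follow.

```java
final class SendZcSketch {
    // Assumed bit values, mirroring IORING_CQE_F_MORE and IORING_CQE_F_NOTIF.
    static final int CQE_F_MORE  = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    /** Tracks one zerocopy send request across its completion and its notification. */
    static final class Tracker {
        private boolean hasResult;
        private int sendResult;
        private boolean bufferReusable;

        /** Feed every CQE whose user_data matches the request. */
        void onCqe(int res, int flags) {
            if ((flags & CQE_F_NOTIF) != 0) {
                // Notification: the kernel no longer references the buffer.
                bufferReusable = true;
                return;
            }
            // First CQE: bytes sent or a negative error code.
            hasResult = true;
            sendResult = res;
            if ((flags & CQE_F_MORE) == 0) {
                // Assumption: without F_MORE no notification follows.
                bufferReusable = true;
            }
        }

        boolean hasResult()      { return hasResult; }
        int sendResult()         { return sendResult; }
        boolean bufferReusable() { return bufferReusable; }
    }

    public static void main(String[] args) {
        Tracker t = new Tracker();
        t.onCqe(512, CQE_F_MORE);   // completion: 512 bytes queued, notification pending
        System.out.println(t.bufferReusable());
        t.onCqe(0, CQE_F_NOTIF);    // notification: buffer may be modified again
        System.out.println(t.bufferReusable());
    }
}
```

Keeping the send result and the buffer state separate reflects the point made above: the notification says nothing about delivery, only about when the buffer may be reused.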
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_STATX
public static final byte IORING_OP_STATX
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2).
Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, and timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If the event count in off is set to 0, completed events are not counted, which effectively makes the request act like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15, the kernel in which direct descriptors were introduced.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request was found and cancelled, the res field of the cqe will contain 0. If it was not found, res will contain -ENOENT. If it was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring-specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE, which then works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, off holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and addr holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, and off is an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data.
Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd bytes of memory at addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT- Issue the equivalent of aunlinkat2(2)system call.fdshould be set to thedirfd,addrshould be set to thepathname, andunlink_flagsshould be set to theflagsbeing passed in tounlinkat(2).Available since 5.11.
OP_MKDIRAT- Issue the equivalent of amkdirat2(2)system call.fdshould be set to thedirfd,addrshould be set to thepathname, andlenshould be set to themodebeing passed in tomkdirat(2).Available since 5.15.
OP_SYMLINKAT- Issue the equivalent of asymlinkat2(2)system call.fdshould be set to thenewdirfd,addrshould be set to thetargetandaddr2should be set to thelinkpathbeing passed in tosymlinkat(2).Available since 5.15.
OP_LINKAT- Issue the equivalent of alinkat2(2)system call.fdshould be set to theolddirfd,addrshould be set to theoldpath,lenshould be set to thenewdirfd,addr2should be set to thenewpath, andhardlink_flagsshould be set to theflagsbeing passed intolinkat(2).Available since 5.15.
OP_MSG_RING- Send a message to an io_uring.fdmust be set to a file descriptor of a ring that the application has access to,lencan be set to any 32-bit value that the application wishes to pass on, andoffshould be set any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with theresfield matching thelenset, and auser_datafield matching theoffvalue being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields.Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe is likely to contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_READ
public static final byte IORING_OP_READ
io_uring_op Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
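The completion results described for OP_TIMEOUT can be decoded as in the following minimal Java sketch. The class, enum, and method names are hypothetical and are not part of LibIOURing; the errno values (ETIME = 62, ECANCELED = 125) are the usual Linux values and are an assumption here.

```java
public final class TimeoutResult {
    // Assumed Linux errno values; not taken from LibIOURing.
    public static final int ETIME = 62;
    public static final int ECANCELED = 125;

    public enum Outcome { TIMER_EXPIRED, EVENT_COUNT_REACHED, CANCELLED, OTHER_ERROR }

    /** Maps the res field of an OP_TIMEOUT completion to its meaning. */
    public static Outcome classify(int res) {
        if (res == -ETIME) {
            return Outcome.TIMER_EXPIRED;        // the timer ran out
        }
        if (res == 0) {
            return Outcome.EVENT_COUNT_REACHED;  // requested number of completions arrived first
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;            // the timeout was cancelled before expiring
        }
        return Outcome.OTHER_ERROR;              // any other negative errno
    }

    public static void main(String[] args) {
        System.out.println(classify(-ETIME));
        System.out.println(classify(0));
        System.out.println(classify(-ECANCELED));
    }
}
```

Note that -ETIME is the normal outcome for a pure timer (off set to 0), so callers would typically not treat it as a failure.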
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flagscontainTIMEOUT_UPDATE, instead of removing an existing operation, it updates it.addrand return values are same as before.addr2field must contain a pointer to astruct timespec64structure.timeout_flagsmay also containTIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
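The off-by-one relationship between file_index and the fixed file table slot is easy to get wrong, so here is a small Java sketch of the conversion in both directions. FixedFileIndex and its methods are hypothetical helpers for illustration only; they are not part of LibIOURing.

```java
import java.util.OptionalInt;

public final class FixedFileIndex {
    /** A file_index of 0 means "use the normal file table". */
    public static final int NORMAL_FILE_TABLE = 0;

    /** Encodes a fixed file table slot (0-based) as the file_index value to put in the SQE. */
    public static int forSlot(int slot) {
        if (slot < 0 || slot == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
        return slot + 1;
    }

    /** Decodes file_index back to a slot, or empty when the normal file table is used. */
    public static OptionalInt slotOf(int fileIndex) {
        return fileIndex > 0 ? OptionalInt.of(fileIndex - 1) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        int fileIndex = forSlot(0);             // first slot of the fixed file table
        System.out.println(fileIndex);
        System.out.println(slotOf(fileIndex));
        System.out.println(slotOf(NORMAL_FILE_TABLE));
    }
}
```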
OP_ASYNC_CANCEL- Attempt to cancel an already issued request.addrmust contain theuser_datafield of the request that should be cancelled. The cancellation request will complete with one of the following results codes. If found, theresfield of the cqe will contain 0. If not found,reswill contain-ENOENT. If found and attempted cancelled, theresfield will contain-EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started.Available since 5.5.
OP_LINK_TIMEOUT- This request must be linked with another request throughIOSQE_IO_LINKwhich is described below.Unlike
OP_TIMEOUT,IORING_OP_LINK_TIMEOUTacts on the linked request, not the completion queue. The format of the command is otherwise likeIORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with-ETIMEif the timer expired and the linked request was attempted cancelled, or-ECANCELEDif the timer got cancelled because of completion of the linked request. LikeIORING_OP_TIMEOUTthe clock source used isCLOCK_MONOTONIC.Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT- Issue the equivalent of aopenat(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,open_flagsshould contain any flags passed in, andlenis access mode of the file. See alsoopenat(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
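The sizing rule for OP_PROVIDE_BUFFERS (len * fd bytes starting at addr) can be sketched in Java as follows. ProvidedBufferLayout is a hypothetical helper, not part of LibIOURing, and the sketch assumes that buffer IDs are assigned consecutively from the starting ID in off, in address order.

```java
public final class ProvidedBufferLayout {
    /** Total bytes the application must back at addr: len * fd. */
    public static long totalBytes(int bufferCount, int bufferLen) {
        if (bufferCount <= 0 || bufferLen <= 0) {
            throw new IllegalArgumentException("count and length must be positive");
        }
        return Math.multiplyExact((long) bufferCount, (long) bufferLen);
    }

    /**
     * Address of the buffer with the given ID, assuming IDs run consecutively
     * from startId (the SQE off field) in address order.
     */
    public static long bufferAddress(long baseAddr, int bufferLen, int bufferCount,
                                     int startId, int bufferId) {
        int index = bufferId - startId;
        if (index < 0 || index >= bufferCount) {
            throw new IllegalArgumentException("buffer ID not in this range: " + bufferId);
        }
        return baseAddr + (long) index * bufferLen;
    }

    public static void main(String[] args) {
        // Eight 4 KiB buffers in one group, with IDs starting at 100.
        System.out.println(totalBytes(8, 4096));
        System.out.println(Long.toHexString(bufferAddress(0x10000L, 4096, 8, 100, 102)));
    }
}
```

A completion for a request that used IOSQE_BUFFER_SELECT reports which buffer was picked, and a lookup like bufferAddress then locates the data inside the application's pool.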
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe is likely to contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
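The two-step completion model of OP_SEND_ZC can be illustrated with a short Java sketch that classifies a CQE by its flags field. The class and method names are hypothetical and not part of LibIOURing; the bit values (CQE_F_MORE = 1 << 1, CQE_F_NOTIF = 1 << 3) mirror the Linux UAPI header and should be treated as assumptions here. The sketch also assumes that a first CQE without CQE_F_MORE means no notification will follow.

```java
public final class ZeroCopyCqe {
    // Assumed to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF from the Linux UAPI header.
    public static final int CQE_F_MORE = 1 << 1;
    public static final int CQE_F_NOTIF = 1 << 3;

    public enum Kind {
        RESULT_NOTIFICATION_PENDING, // res holds bytes sent or -errno; the buffer is still in use
        RESULT_FINAL,                // res holds bytes sent or -errno; no notification will follow
        NOTIFICATION                 // res is 0; the kernel has released the buffer
    }

    /** Classifies a CQE that belongs to an OP_SEND_ZC request. */
    public static Kind classify(int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            return Kind.NOTIFICATION;
        }
        return (flags & CQE_F_MORE) != 0 ? Kind.RESULT_NOTIFICATION_PENDING : Kind.RESULT_FINAL;
    }

    /** True once the application may modify or free the send buffer. */
    public static boolean bufferReusable(int flags) {
        return classify(flags) != Kind.RESULT_NOTIFICATION_PENDING;
    }

    public static void main(String[] args) {
        System.out.println(classify(CQE_F_MORE) + " reusable=" + bufferReusable(CQE_F_MORE));
        System.out.println(classify(CQE_F_NOTIF) + " reusable=" + bufferReusable(CQE_F_NOTIF));
    }
}
```

Because both CQEs carry the same user_data, an application would typically keep the buffer attached to that user_data and release it only when bufferReusable returns true.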
- See Also:
-
IORING_OP_WRITE
public static final byte IORING_OP_WRITE
io_uring_op Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flagscontainTIMEOUT_UPDATE, instead of removing an existing operation, it updates it.addrand return values are same as before.addr2field must contain a pointer to astruct timespec64structure.timeout_flagsmay also containTIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL- Attempt to cancel an already issued request.addrmust contain theuser_datafield of the request that should be cancelled. The cancellation request will complete with one of the following results codes. If found, theresfield of the cqe will contain 0. If not found,reswill contain-ENOENT. If found and attempted cancelled, theresfield will contain-EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started.Available since 5.5.
OP_LINK_TIMEOUT- This request must be linked with another request throughIOSQE_IO_LINKwhich is described below.Unlike
OP_TIMEOUT,IORING_OP_LINK_TIMEOUTacts on the linked request, not the completion queue. The format of the command is otherwise likeIORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with-ETIMEif the timer expired and the linked request was attempted cancelled, or-ECANCELEDif the timer got cancelled because of completion of the linked request. LikeIORING_OP_TIMEOUTthe clock source used isCLOCK_MONOTONIC.Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
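Because file_index is offset by one from the fixed file table slot, an off-by-one mistake is easy to make. The sketch below shows the mapping and how the result of a direct-descriptor request would be read. The class and method names are hypothetical and not part of LibIOURing.

```java
// Hypothetical helper, not part of LibIOURing: converts between a fixed file
// table slot and the file_index value placed in the SQE, and reads the CQE
// result of a request that used a direct descriptor.
final class DirectDescriptorSketch {
    /** file_index value that targets the given zero-based fixed-table slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be >= 0");
        }
        return Math.addExact(slot, 1); // 0 is reserved for "use the normal file table"
    }

    /** Describes res for a request that was submitted with file_index > 0. */
    static String describeDirectResult(int res, int fileIndex) {
        if (fileIndex <= 0) {
            throw new IllegalArgumentException("file_index must be positive");
        }
        if (res == 0) {
            return "installed in fixed slot " + (fileIndex - 1);
        }
        if (res < 0) {
            return "failed with errno " + (-res);
        }
        return "unexpected positive result " + res;
    }

    public static void main(String[] args) {
        int fileIndex = fileIndexForSlot(3);
        System.out.println(fileIndex);
        System.out.println(describeDirectResult(0, fileIndex));
        System.out.println(describeDirectResult(-9, fileIndex));
    }
}
```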
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd bytes of memory starting at addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
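The sizing rule above (len * fd bytes, carved into equally sized buffers with consecutive IDs starting at off) can be sketched as plain arithmetic. The class and method names below are hypothetical and not part of LibIOURing. The sketch assumes buffer IDs increase by one per buffer in address order, which is how the description reads, and it treats addresses as plain long values.

```java
// Hypothetical helper, not part of LibIOURing: computes how much memory an
// OP_PROVIDE_BUFFERS request covers and where a given buffer ID lives in it.
final class ProvidedBufferLayoutSketch {
    /** Total bytes the application must supply: len * fd in SQE terms. */
    static long totalBytes(int bufferCount, int bufferLen) {
        if (bufferCount <= 0 || bufferLen <= 0) {
            throw new IllegalArgumentException("count and length must be positive");
        }
        return Math.multiplyExact((long) bufferCount, (long) bufferLen);
    }

    /** Address of the buffer with the given ID, assuming consecutive IDs from startId. */
    static long addressOf(long baseAddr, int bufferLen, int startId, int bufferCount, int bufferId) {
        int index = bufferId - startId;
        if (index < 0 || index >= bufferCount) {
            throw new IllegalArgumentException("buffer ID outside the provided range");
        }
        return baseAddr + (long) index * bufferLen;
    }

    public static void main(String[] args) {
        long base = 0x10000L;            // illustrative address only
        System.out.println(totalBytes(8, 4096));
        System.out.println(Long.toHexString(addressOf(base, 4096, 100, 8, 102)));
    }
}
```

When a completion reports which buffer ID was selected, a mapping like addressOf lets the application find the data without keeping a separate lookup table.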
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe is likely to contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
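The two-step completion model for OP_SEND_ZC means a send buffer is only safe to reuse after the notification CQE, or immediately if the first CQE arrives without CQE_F_MORE. A minimal sketch of that bookkeeping follows. The class is hypothetical and not part of LibIOURing, and the two flag values are assumptions copied from the Linux UAPI header (IORING_CQE_F_MORE and IORING_CQE_F_NOTIF); real code should use whatever constants the binding exposes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker, not part of LibIOURing: remembers which zerocopy
// sends still have their buffer owned by the kernel.
final class ZeroCopySendSketch {
    // Assumption: values of IORING_CQE_F_MORE / IORING_CQE_F_NOTIF from the kernel header.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    private final Set<Long> buffersInFlight = new HashSet<>();

    /** Call when an OP_SEND_ZC SQE with this user_data has been submitted. */
    void onSubmit(long userData) {
        buffersInFlight.add(userData);
    }

    /** Processes one CQE; returns true once the buffer may be modified or reused. */
    boolean onCompletion(long userData, int res, int flags) {
        boolean isNotification = (flags & CQE_F_NOTIF) != 0;
        boolean moreToCome = (flags & CQE_F_MORE) != 0;
        if (isNotification || !moreToCome) {
            // Either the notification arrived, or no notification will follow.
            buffersInFlight.remove(userData);
            return true;
        }
        // First CQE with CQE_F_MORE: res holds bytes sent or a negative errno,
        // but the kernel may still reference the buffer.
        return false;
    }

    boolean isInFlight(long userData) {
        return buffersInFlight.contains(userData);
    }

    public static void main(String[] args) {
        ZeroCopySendSketch tracker = new ZeroCopySendSketch();
        tracker.onSubmit(42L);
        System.out.println(tracker.onCompletion(42L, 1024, CQE_F_MORE));
        System.out.println(tracker.onCompletion(42L, 0, CQE_F_NOTIF));
    }
}
```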
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_FADVISE
public static final byte IORING_OP_FADVISE
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If the count is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
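The three ways an OP_TIMEOUT request can complete map directly onto its res value. The sketch below is illustrative only: the class, enum, and method are hypothetical and not part of LibIOURing, and the errno numbers are the generic Linux values, assumed here.

```java
// Hypothetical helper, not part of LibIOURing: interprets the res field of
// the CQE posted for an OP_TIMEOUT request.
final class TimeoutResultSketch {
    // Assumption: generic Linux errno values.
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome { TIMER_EXPIRED, EVENT_COUNT_REACHED, CANCELLED, OTHER_ERROR }

    static Outcome classify(int res) {
        if (res == -ETIME) {
            return Outcome.TIMER_EXPIRED;        // the timespec elapsed
        }
        if (res == 0) {
            return Outcome.EVENT_COUNT_REACHED;  // enough requests completed first
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;            // removed before expiring
        }
        return Outcome.OTHER_ERROR;
    }

    public static void main(String[] args) {
        System.out.println(classify(-ETIME));
        System.out.println(classify(0));
        System.out.println(classify(-ECANCELED));
    }
}
```

Note that -ETIME is the normal outcome for a pure timer (completion count of 0), so callers should not treat it as a failure.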
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
- See Also:
-
IORING_OP_MADVISE
public static final byte IORING_OP_MADVISE
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2).
Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, and timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Direct descriptors are available since 5.15.
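The file_index convention described for direct descriptors (zero requests a normal file descriptor, a positive value selects fixed file table slot file_index - 1) can be sketched in plain Java as follows. This is only an illustration of the off-by-one encoding; the class DirectSlot and its methods are hypothetical helpers and are not part of LibIOURing.

```java
/** Hypothetical helper illustrating the file_index convention; not part of LibIOURing. */
final class DirectSlot {
    private DirectSlot() {}

    /** Encodes a zero-based fixed file table slot as the value to store in the SQE file_index field. */
    static int toFileIndex(int slot) {
        if (slot < 0 || slot == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
        // 0 is reserved for "install into the normal file table", so slots are shifted by one.
        return slot + 1;
    }

    /** Decodes file_index back to a slot; returns -1 when no direct descriptor was requested. */
    static int toSlot(int fileIndex) {
        return fileIndex > 0 ? fileIndex - 1 : -1;
    }

    public static void main(String[] args) {
        System.out.println("slot 0 -> file_index " + toFileIndex(0));
        System.out.println("file_index 5 -> slot " + toSlot(5));
        System.out.println("file_index 0 -> slot " + toSlot(0));
    }
}
```

Keeping the shift in one place avoids scattering "+ 1" and "- 1" adjustments across the code that prepares SQEs and the code that later refers to the fixed file slot.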
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data.
Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd bytes of memory at addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
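The arithmetic behind OP_PROVIDE_BUFFERS (a region of len * fd bytes, split into equally sized buffers whose IDs start at off) can be sketched as below. This is a plain-Java model of the layout only, assuming buffer IDs increase by one per buffer from the starting ID; ProvidedBufferLayout and its members are hypothetical names, not LibIOURing API.

```java
/** Hypothetical model of one OP_PROVIDE_BUFFERS range; not part of LibIOURing. */
final class ProvidedBufferLayout {
    final long addr;     // SQE addr: start address of the memory region
    final int len;       // SQE len: size of each buffer in bytes
    final int count;     // SQE fd: number of buffers provided
    final int firstBid;  // SQE off: buffer ID assigned to the first buffer

    ProvidedBufferLayout(long addr, int len, int count, int firstBid) {
        if (len <= 0 || count <= 0 || firstBid < 0) {
            throw new IllegalArgumentException("len and count must be positive, firstBid non-negative");
        }
        this.addr = addr;
        this.len = len;
        this.count = count;
        this.firstBid = firstBid;
    }

    /** Memory the application must make available at addr: len * fd bytes. */
    long totalBytes() {
        return (long) len * count;
    }

    /** Start address of the buffer with the given ID (assumes consecutive IDs from firstBid). */
    long addressOf(int bid) {
        int index = bid - firstBid;
        if (index < 0 || index >= count) {
            throw new IllegalArgumentException("buffer ID not in this range: " + bid);
        }
        return addr + (long) index * len;
    }

    public static void main(String[] args) {
        ProvidedBufferLayout layout = new ProvidedBufferLayout(0x1000L, 4096, 8, 10);
        System.out.println("total bytes: " + layout.totalBytes());
        System.out.println("buffer 12 starts at 0x" + Long.toHexString(layout.addressOf(12)));
    }
}
```

An application that keeps such a record per buffer group can map the buffer ID reported for a completed request back to the memory that holds the received data.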
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor to the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call.
Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will usually contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
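The two-step completion model described for OP_SEND_ZC can be sketched as a small classifier over the CQE flags field. This is a plain-Java illustration, not LibIOURing code: the class ZeroCopyCqe is hypothetical, and the flag values are assumed to match the kernel UAPI header (bit 1 for CQE_F_MORE, bit 3 for CQE_F_NOTIF); real code should use the constants exposed by the binding.

```java
/** Hypothetical classifier for OP_SEND_ZC completions; not part of LibIOURing. */
final class ZeroCopyCqe {
    // Assumed to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF from the kernel header.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    enum Kind {
        RESULT_NOTIF_PENDING, // first CQE: res holds bytes sent or -errno, notification follows
        RESULT_FINAL,         // first CQE without CQE_F_MORE: no notification will follow
        NOTIFICATION          // second CQE: res is 0, kernel no longer uses the buffer
    }

    private ZeroCopyCqe() {}

    static Kind classify(int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            return Kind.NOTIFICATION;
        }
        return (flags & CQE_F_MORE) != 0 ? Kind.RESULT_NOTIF_PENDING : Kind.RESULT_FINAL;
    }

    /** The send buffer may be modified again once no further CQE is expected for the request. */
    static boolean bufferReusable(int flags) {
        return classify(flags) != Kind.RESULT_NOTIF_PENDING;
    }

    public static void main(String[] args) {
        System.out.println(classify(CQE_F_MORE) + " reusable=" + bufferReusable(CQE_F_MORE));
        System.out.println(classify(CQE_F_NOTIF) + " reusable=" + bufferReusable(CQE_F_NOTIF));
        System.out.println(classify(0) + " reusable=" + bufferReusable(0));
    }
}
```

Both CQEs of one request carry the same user_data, so an application would typically look up the request by user_data and release or reuse the buffer only when bufferReusable returns true.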
IORING_OP_SEND
public static final byte IORING_OP_SENDio_uring_opEnum values:
OP_NOP- Do not perform any I/O. This is useful for testing the performance of theio_uringimplementation itself.OP_READV- Vectored read operation, similar topreadv2(2). If the file is not seekable,offmust be set to zero.OP_WRITEV- Vectored write operation, similar topwritev2(2). If the file is not seekable,offmust be set to zero.OP_FSYNC- File sync. See alsofsync(2).Note that, while I/O is initiated in the order in which it appears inthe submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED- Read from pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed reads.OP_WRITE_FIXED- Write to pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed writes.OP_POLL_ADD- Poll thefdspecified in the submission queue entry for the events specified in thepoll_eventsfield.Unlike poll or epoll without
EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.If
POLL_ADD_MULTIis set in the SQElenfield, then the poll will work in multi shot mode instead. That means it'll repatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQEflagsfield will haveCQE_F_MOREset on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.If
POLL_UPDATE_EVENTSis set in the SQElenfield, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on theuser_datafield of the original SQE submitted, and this values is passed in theaddrfield of the SQE. This mode is available since 5.13.If
POLL_UPDATE_USER_DATAis set in the SQElenfield, then the request will update theuser_dataof an existing poll request based on the value passed in theofffield. This mode is available since 5.13.This command works like an
asyncpoll(2)and the completion event result is the returned mask of events. For the variants that updateuser_dataorevents, the completion result will be similar toOP_POLL_REMOVE.OP_POLL_REMOVE- Remove an existing poll request.If found, the
resfield of the structio_uring_cqewill contain 0. If not found,reswill contain-ENOENT, or-EALREADYif the poll request was in the process of completing already.OP_SYNC_FILE_RANGE- Issue the equivalent of async_file_range(2)on the file descriptor.The
fdfield is the file descriptor to sync, theofffield holds the offset in bytes, thelenfield holds the length in bytes, and thesync_range_flagsfield holds the flags for the command. See alsosync_file_range(2)for the general description of the related system call.Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in
timeout_flags:TIMEOUT_BOOTTIME: If set, then the clock source used isCLOCK_BOOTTIMEinstead ofCLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspend while having a timeout request in-flight.TIMEOUT_REALTIME: If set, then the clock source used isCLOCK_BOOTTIMEinstead ofCLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flagscontainTIMEOUT_UPDATE, instead of removing an existing operation, it updates it.addrand return values are same as before.addr2field must contain a pointer to astruct timespec64structure.timeout_flagsmay also containTIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar toOP_FILES_UPDATE. Please note that onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.5.
OP_ASYNC_CANCEL- Attempt to cancel an already issued request.addrmust contain theuser_datafield of the request that should be cancelled. The cancellation request will complete with one of the following results codes. If found, theresfield of the cqe will contain 0. If not found,reswill contain-ENOENT. If found and attempted cancelled, theresfield will contain-EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started.Available since 5.5.
OP_LINK_TIMEOUT- This request must be linked with another request throughIOSQE_IO_LINKwhich is described below.Unlike
OP_TIMEOUT,IORING_OP_LINK_TIMEOUTacts on the linked request, not the completion queue. The format of the command is otherwise likeIORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with-ETIMEif the timer expired and the linked request was attempted cancelled, or-ECANCELEDif the timer got cancelled because of completion of the linked request. LikeIORING_OP_TIMEOUTthe clock source used isCLOCK_MONOTONIC.Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT- Issue the equivalent of aopenat(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,open_flagsshould contain any flags passed in, andlenis access mode of the file. See alsoopenat(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, this command can be used to close files that were direct opened throughOP_OPENAT,OP_OPENAT2, orOP_ACCEPTusing theio_uringspecific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.OP_FILES_UPDATE- This command is an alternative to usingREGISTER_FILES_UPDATEwhich then works in an async fashion, like the rest of theio_uringcommands.The arguments passed in are the same.
addrmust contain a pointer to the array of file descriptors,lenmust contain the length of the array, andoffmust contain the offset at which to operate. Note that the array of file descriptors pointed to inaddrmust remain valid until this operation has completed.Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, off holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and addr holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN- Issue the equivalent of ashutdown(2)system call.fdis the file descriptor to the socket being shutdown andlenmust be set to thehowargument. No other fields should be set.Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will usually contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
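The two-step completion model described for OP_SEND_ZC can be sketched as a small bookkeeping class. This is only an illustration under stated assumptions: ZeroCopySendTracker and its methods are hypothetical names and not part of LibIOURing, and the flag values are local copies assumed to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF from the kernel UAPI header. The sketch tracks, per user_data value, whether the send buffer may be reused yet.

```java
import java.util.HashSet;
import java.util.Set;

public class ZeroCopySendTracker {
    // Local copies of the CQE flag bits (assumed to match the kernel header).
    static final int CQE_F_MORE  = 1 << 1; // another CQE will follow for this request
    static final int CQE_F_NOTIF = 1 << 3; // this CQE is the buffer-release notification

    // user_data values whose buffers the kernel may still be referencing.
    private final Set<Long> inFlight = new HashSet<>();

    // Call when an OP_SEND_ZC SQE with this user_data has been queued.
    void submitted(long userData) {
        inFlight.add(userData);
    }

    // Call for every CQE carrying this user_data.
    // Returns true once the application may modify or reuse the buffer.
    boolean onCompletion(long userData, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            // Second CQE: the kernel no longer holds the buffer.
            inFlight.remove(userData);
            return true;
        }
        if ((flags & CQE_F_MORE) == 0) {
            // First CQE without CQE_F_MORE: no notification will follow.
            inFlight.remove(userData);
            return true;
        }
        // First CQE with CQE_F_MORE: keep the buffer untouched for now.
        return false;
    }

    boolean isBusy(long userData) {
        return inFlight.contains(userData);
    }

    public static void main(String[] args) {
        ZeroCopySendTracker tracker = new ZeroCopySendTracker();
        tracker.submitted(42L);
        System.out.println("reusable after first CQE: " + tracker.onCompletion(42L, CQE_F_MORE));
        System.out.println("reusable after notification: " + tracker.onCompletion(42L, CQE_F_NOTIF));
    }
}
```

The res field of the first CQE (bytes sent or a negative error) is deliberately left out of the tracker, since it does not affect when the buffer becomes reusable.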
- See Also:
-
IORING_OP_RECV
public static final byte IORING_OP_RECV
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If the count is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found and a cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and an attempt was made to cancel the linked request, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT- Issue the equivalent of aopenat(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,open_flagsshould contain any flags passed in, andlenis access mode of the file. See alsoopenat(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, off holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and addr holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN- Issue the equivalent of ashutdown(2)system call.fdis the file descriptor to the socket being shutdown andlenmust be set to thehowargument. No other fields should be set.Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will usually contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
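The memory layout implied by the OP_PROVIDE_BUFFERS description (fd buffers of len bytes each, starting at addr, with buffer IDs starting at off) can be illustrated with plain arithmetic. This is a minimal sketch, not part of LibIOURing: the class ProvidedBufferLayout and its method names are hypothetical, and it assumes buffer IDs are assigned consecutively in address order starting from the ID given in off.

```java
public class ProvidedBufferLayout {
    // Total memory the application must back at addr: len * fd,
    // where fd carries the number of buffers for this opcode.
    static long totalBytes(int bufferCount, int bufferLen) {
        return (long) bufferCount * bufferLen;
    }

    // Address of the buffer with the given ID, assuming IDs are assigned
    // consecutively from startBufferId (the value placed in off).
    static long bufferAddress(long baseAddr, int bufferLen, int startBufferId, int bufferId) {
        if (bufferId < startBufferId) {
            throw new IllegalArgumentException("buffer ID precedes the provided range");
        }
        return baseAddr + (long) (bufferId - startBufferId) * bufferLen;
    }

    public static void main(String[] args) {
        int count = 8;          // would go in the SQE fd field
        int len = 4096;         // would go in the SQE len field
        int startId = 16;       // would go in the SQE off field
        long base = 0x10000L;   // placeholder for the native address in addr

        System.out.println("bytes to provide: " + totalBytes(count, len));
        System.out.println("address of buffer 18: 0x"
                + Long.toHexString(bufferAddress(base, len, startId, 18)));
    }
}
```

A helper like this is mainly useful on the completion side: when a request submitted with IOSQE_BUFFER_SELECT completes, the application can map the buffer ID it is told about back to the memory it originally provided.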
- See Also:
-
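The result codes listed for OP_TIMEOUT in the enum values above can be decoded from the CQE res field roughly as follows. This is a hedged sketch: TimeoutResults is a hypothetical helper, and the numeric errno values are assumptions that hold for common Linux architectures such as x86-64 and arm64 but are not defined by this class.

```java
public class TimeoutResults {
    // errno values as found on common Linux architectures (assumption).
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    // Maps the res field of an OP_TIMEOUT completion to a short description.
    static String describe(int res) {
        if (res == 0) {
            return "completion count reached";   // requested number of events completed
        }
        if (res == -ETIME) {
            return "timer expired";              // the timespec elapsed first
        }
        if (res == -ECANCELED) {
            return "cancelled";                  // removed before it expired
        }
        return "unexpected result " + res;
    }

    public static void main(String[] args) {
        System.out.println(describe(0));
        System.out.println(describe(-ETIME));
        System.out.println(describe(-ECANCELED));
    }
}
```

Note that -ETIME is the normal outcome for a pure timer (event count of 0), so callers typically should not treat it as a failure.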
IORING_OP_OPENAT2
public static final byte IORING_OP_OPENAT2
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2).
Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
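The multi-shot rule described under OP_POLL_ADD can be sketched as a small check on the CQE flags field. This is only an illustration: the class and method names are hypothetical, and the flag value is assumed to mirror IORING_CQE_F_MORE from the Linux UAPI header rather than being read from this binding.

```java
// Hypothetical helper, not part of LibIOURing.
final class PollCompletion {
    // Assumed to match IORING_CQE_F_MORE (1 << 1) in the kernel UAPI header.
    static final int CQE_F_MORE = 1 << 1;

    /**
     * Returns true when the poll request has terminated, i.e. the CQE does
     * not carry CQE_F_MORE, so the application must submit a new poll SQE
     * if it still wants events.
     */
    static boolean needsRearm(int cqeFlags) {
        return (cqeFlags & CQE_F_MORE) == 0;
    }

    public static void main(String[] args) {
        // Multi-shot poll that is still armed: no resubmission needed.
        System.out.println(needsRearm(CQE_F_MORE));
        // One-shot poll, or a multi-shot poll that has terminated.
        System.out.println(needsRearm(0));
    }
}
```

A completion loop would typically call such a check for every poll CQE and resubmit only when it returns true.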
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, and timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
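The three OP_TIMEOUT outcomes listed above can be told apart from the CQE res field alone. The following is a minimal sketch: the class name is hypothetical and the errno numbers are assumed to be the usual Linux values (ETIME = 62, ECANCELED = 125), which this page does not state.

```java
// Hypothetical helper, not part of LibIOURing.
final class TimeoutResult {
    // Assumed Linux errno values; they are not defined on this page.
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    /** Maps the res field of an OP_TIMEOUT completion to a readable outcome. */
    static String describe(int res) {
        if (res == -ETIME) {
            return "timer expired";
        }
        if (res == 0) {
            return "requested number of completions reached";
        }
        if (res == -ECANCELED) {
            return "timeout cancelled before expiry";
        }
        return "unexpected result " + res;
    }

    public static void main(String[] args) {
        System.out.println(describe(-ETIME));
        System.out.println(describe(0));
        System.out.println(describe(-ECANCELED));
    }
}
```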
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
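The off-by-one convention for file_index (0 means "use the normal file table", a positive value selects fixed-table slot file_index - 1) is easy to get wrong, so a small conversion helper can make it explicit. The class and method names below are hypothetical and only illustrate the arithmetic described above.

```java
// Hypothetical helper, not part of LibIOURing.
final class DirectDescriptor {
    /** Encodes a zero-based fixed file table slot as a file_index value (slot + 1). */
    static int toFileIndex(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be non-negative: " + slot);
        }
        return Math.addExact(slot, 1);
    }

    /** Decodes a positive file_index value back to its zero-based slot. */
    static int toSlot(int fileIndex) {
        if (fileIndex <= 0) {
            throw new IllegalArgumentException("file_index 0 means no direct descriptor");
        }
        return fileIndex - 1;
    }

    public static void main(String[] args) {
        int fileIndex = toFileIndex(0);        // first slot of the fixed file table
        System.out.println(fileIndex);         // value to place in the SQE file_index field
        System.out.println(toSlot(fileIndex)); // slot the kernel will install into
    }
}
```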
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request was found and cancelled, the res field of the CQE will contain 0. If it was not found, res will contain -ENOENT. If it was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data.
Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
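The memory layout implied by the OP_PROVIDE_BUFFERS arguments can be worked through with plain arithmetic. The sketch below uses hypothetical names and assumes that the buffers are laid out back to back from addr and that buffer IDs increase by one from the starting ID in off; the text above states the starting address, per-buffer length and starting ID, and the consecutive numbering is an assumption of this example.

```java
// Hypothetical helper, not part of LibIOURing.
final class ProvidedBuffers {
    /** Total bytes the application must make available at addr: len * fd. */
    static long totalBytes(int bufferCount, int bufferLen) {
        return (long) bufferCount * bufferLen;
    }

    /** Address of the index-th buffer, assuming buffers are contiguous from addr. */
    static long bufferAddress(long addr, int bufferLen, int index) {
        return addr + (long) index * bufferLen;
    }

    /** Buffer ID of the index-th buffer, assuming IDs count up from the value in off. */
    static int bufferId(int startingId, int index) {
        return startingId + index;
    }

    public static void main(String[] args) {
        int count = 8;        // value for the SQE fd field
        int len = 4096;       // value for the SQE len field
        long addr = 0x10000L; // illustrative base address only
        System.out.println(totalBytes(count, len));
        System.out.println(Long.toHexString(bufferAddress(addr, len, 2)));
        System.out.println(bufferId(100, 2));
    }
}
```

Using a long for the byte count avoids int overflow when many large buffers are provided.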
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call.
Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will usually contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
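The two-step completion model of OP_SEND_ZC can be sketched as a classifier over the CQE flags field. The names below are hypothetical, and the flag values are assumed to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF from the Linux UAPI header. Treating a first CQE without CQE_F_MORE as final is an inference from the description above (no notification is announced, so none is awaited), not something this page states outright.

```java
// Hypothetical helper, not part of LibIOURing.
final class ZeroCopySend {
    // Assumed to match the kernel UAPI values.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    enum Kind {
        /** First CQE; a notification with the same user_data will follow. */
        RESULT_NOTIFICATION_PENDING,
        /** First CQE with no notification announced. */
        RESULT_FINAL,
        /** Second CQE; the kernel no longer needs the buffer. */
        NOTIFICATION
    }

    static Kind classify(int cqeFlags) {
        if ((cqeFlags & CQE_F_NOTIF) != 0) {
            return Kind.NOTIFICATION;
        }
        return (cqeFlags & CQE_F_MORE) != 0
                ? Kind.RESULT_NOTIFICATION_PENDING
                : Kind.RESULT_FINAL;
    }

    /** The buffer may be modified again unless a notification is still outstanding. */
    static boolean bufferReusable(int cqeFlags) {
        return classify(cqeFlags) != Kind.RESULT_NOTIFICATION_PENDING;
    }

    public static void main(String[] args) {
        System.out.println(classify(CQE_F_MORE));        // result, buffer still owned by kernel
        System.out.println(classify(CQE_F_NOTIF));       // notification, buffer released
        System.out.println(bufferReusable(CQE_F_MORE));
    }
}
```

Only the first CQE carries a meaningful res value (bytes sent or a negative error), so an application would read res for the result kinds and ignore it for the notification.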
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_EPOLL_CTL
public static final byte IORING_OP_EPOLL_CTL
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2).
Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, and timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request was found and cancelled, the res field of the CQE will contain 0. If it was not found, res will contain -ENOENT. If it was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT- This request must be linked with another request throughIOSQE_IO_LINKwhich is described below.Unlike
OP_TIMEOUT,IORING_OP_LINK_TIMEOUTacts on the linked request, not the completion queue. The format of the command is otherwise likeIORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with-ETIMEif the timer expired and the linked request was attempted cancelled, or-ECANCELEDif the timer got cancelled because of completion of the linked request. LikeIORING_OP_TIMEOUTthe clock source used isCLOCK_MONOTONIC.Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT- Issue the equivalent of aopenat(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,open_flagsshould contain any flags passed in, andlenis access mode of the file. See alsoopenat(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, this command can be used to close files that were direct opened throughOP_OPENAT,OP_OPENAT2, orOP_ACCEPTusing theio_uringspecific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.OP_FILES_UPDATE- This command is an alternative to usingREGISTER_FILES_UPDATEwhich then works in an async fashion, like the rest of theio_uringcommands.The arguments passed in are the same.
addrmust contain a pointer to the array of file descriptors,lenmust contain the length of the array, andoffmust contain the offset at which to operate. Note that the array of file descriptors pointed to inaddrmust remain valid until this operation has completed.Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
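The memory layout described above can be sketched in plain Java. This is only an illustration: `ProvidedBufferLayout` and its methods are hypothetical names, not part of LibIOURing, and the sketch assumes that buffer IDs increase by one per buffer starting from the value in `off`.

```java
/** Hypothetical helper, not part of LibIOURing: models the layout OP_PROVIDE_BUFFERS describes. */
final class ProvidedBufferLayout {
    private ProvidedBufferLayout() {}

    /** Bytes the application must back at addr: len * fd (fd carries the buffer count). */
    static long totalBytes(int count, int len) {
        return (long) count * len;
    }

    /** Start address of the index-th buffer; buffers are laid out back to back from addr. */
    static long addressOf(long addr, int len, int index) {
        return addr + (long) index * len;
    }

    /** Buffer ID of the index-th buffer, assuming IDs count up from the starting ID in off. */
    static int bufferIdOf(int startId, int index) {
        return startId + index;
    }

    public static void main(String[] args) {
        long addr = 0x1000L; // made-up base address, for illustration only
        int len = 4096;      // size of each buffer
        int count = 8;       // value that would go into the fd field
        System.out.println("bytes needed: " + totalBytes(count, len));
        for (int i = 0; i < count; i++) {
            System.out.println("id " + bufferIdOf(0, i) + " at 0x" + Long.toHexString(addressOf(addr, len, i)));
        }
    }
}
```

A real submission would place `count` in `fd`, the base address in `addr`, the per-buffer size in `len`, the starting ID in `off`, and the group in `buf_group`.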
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down, and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event (the notification) for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and its flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
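The two-step completion model can be sketched as a small classifier over the CQE `flags` field. `ZeroCopySendCqe` is a hypothetical helper, not part of LibIOURing; the two bit values are taken from the Linux UAPI header and are stated here as assumptions. The rule that a first CQE without `CQE_F_MORE` leaves the buffer free is an inference from the text above (no notification will follow).

```java
/** Hypothetical helper, not part of LibIOURing: classifies the CQEs of one OP_SEND_ZC request. */
final class ZeroCopySendCqe {
    private ZeroCopySendCqe() {}

    // Bit values as assumed from <linux/io_uring.h>.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    enum Kind {
        RESULT_MORE_TO_COME, // first CQE: res is bytes sent or -errno, a notification will follow
        RESULT_FINAL,        // first CQE without CQE_F_MORE: no notification will follow
        BUFFER_RELEASED      // notification CQE: res is 0, buffer lifetime has ended
    }

    static Kind classify(int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            return Kind.BUFFER_RELEASED;
        }
        return (flags & CQE_F_MORE) != 0 ? Kind.RESULT_MORE_TO_COME : Kind.RESULT_FINAL;
    }

    /** True once the application may modify or reuse the send buffer. */
    static boolean bufferReusable(int flags) {
        return classify(flags) != Kind.RESULT_MORE_TO_COME;
    }

    public static void main(String[] args) {
        System.out.println(classify(CQE_F_MORE) + " reusable=" + bufferReusable(CQE_F_MORE));
        System.out.println(classify(CQE_F_NOTIF) + " reusable=" + bufferReusable(CQE_F_NOTIF));
    }
}
```

Both CQEs carry the same `user_data`, so an application would typically key its buffer bookkeeping on that value and release the buffer only when `bufferReusable` reports true.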
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_SPLICE
public static final byte IORING_OP_SPLICEio_uring_opEnum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED- Read from pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed reads.OP_WRITE_FIXED- Write to pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed writes.OP_POLL_ADD- Poll thefdspecified in the submission queue entry for the events specified in thepoll_eventsfield.Unlike poll or epoll without
EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.If
POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13. If
POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13. If
POLL_UPDATE_USER_DATAis set in the SQElenfield, then the request will update theuser_dataof an existing poll request based on the value passed in theofffield. This mode is available since 5.13.This command works like an
asyncpoll(2)and the completion event result is the returned mask of events. For the variants that updateuser_dataorevents, the completion result will be similar toOP_POLL_REMOVE.OP_POLL_REMOVE- Remove an existing poll request.If found, the
resfield of the structio_uring_cqewill contain 0. If not found,reswill contain-ENOENT, or-EALREADYif the poll request was in the process of completing already.OP_SYNC_FILE_RANGE- Issue the equivalent of async_file_range(2)on the file descriptor.The
fdfield is the file descriptor to sync, theofffield holds the offset in bytes, thelenfield holds the length in bytes, and thesync_range_flagsfield holds the flags for the command. See alsosync_file_range(2)for the general description of the related system call.Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in
timeout_flags: TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight. TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
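The three completion results described for OP_TIMEOUT can be told apart with a small mapping like the one below. `TimeoutCompletion` is a hypothetical helper, not part of LibIOURing, and the errno numbers are the usual Linux values written out as assumptions.

```java
/** Hypothetical helper, not part of LibIOURing: maps an OP_TIMEOUT CQE result to its meaning. */
final class TimeoutCompletion {
    private TimeoutCompletion() {}

    // Linux errno values, written out here as assumptions rather than read from a binding.
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome { TIMER_EXPIRED, EVENT_COUNT_REACHED, CANCELLED, OTHER_ERROR }

    static Outcome of(int res) {
        if (res == -ETIME) {
            return Outcome.TIMER_EXPIRED;       // the timer ran out
        }
        if (res == 0) {
            return Outcome.EVENT_COUNT_REACHED; // the requested number of completions arrived first
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;           // the timeout was cancelled before it expired
        }
        return Outcome.OTHER_ERROR;             // any other negative errno, e.g. an invalid argument
    }

    public static void main(String[] args) {
        System.out.println(of(-ETIME));
        System.out.println(of(0));
        System.out.println(of(-ECANCELED));
    }
}
```

Note that `-ETIME` is the normal outcome for a pure timer (completion count of 0), so callers usually should not treat it as a failure.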
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flagscontainTIMEOUT_UPDATE, instead of removing an existing operation, it updates it.addrand return values are same as before.addr2field must contain a pointer to astruct timespec64structure.timeout_flagsmay also containTIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the
file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If the request was found and cancellation was attempted, res will contain -EALREADY; in this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT- This request must be linked with another request throughIOSQE_IO_LINKwhich is described below.Unlike
OP_TIMEOUT,IORING_OP_LINK_TIMEOUTacts on the linked request, not the completion queue. The format of the command is otherwise likeIORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with-ETIMEif the timer expired and the linked request was attempted cancelled, or-ECANCELEDif the timer got cancelled because of completion of the linked request. LikeIORING_OP_TIMEOUTthe clock source used isCLOCK_MONOTONIC.Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT- Issue the equivalent of aopenat(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,open_flagsshould contain any flags passed in, andlenis access mode of the file. See alsoopenat(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
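The off-by-one between file_index and the fixed file table slot is easy to get wrong, so a minimal sketch of the conversion follows. `DirectDescriptorIndex` is a hypothetical helper, not part of LibIOURing; it only encodes the rule stated above that a positive file_index selects slot file_index - 1, while 0 means a normal file descriptor is returned.

```java
/** Hypothetical helper, not part of LibIOURing: converts between fixed-table slots and file_index. */
final class DirectDescriptorIndex {
    private DirectDescriptorIndex() {}

    /** Value to store in file_index so the file lands in the given fixed file table slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be >= 0: " + slot);
        }
        return Math.addExact(slot, 1); // slot 0 is requested with file_index 1
    }

    /** Fixed file table slot selected by a positive file_index. */
    static int slotForFileIndex(int fileIndex) {
        if (fileIndex <= 0) {
            throw new IllegalArgumentException("file_index must be positive for a direct descriptor");
        }
        return fileIndex - 1;
    }

    public static void main(String[] args) {
        System.out.println("slot 0 -> file_index " + fileIndexForSlot(0));
        System.out.println("file_index 5 -> slot " + slotForFileIndex(5));
    }
}
```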
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, this command can be used to close files that were direct opened throughOP_OPENAT,OP_OPENAT2, orOP_ACCEPTusing theio_uringspecific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.OP_FILES_UPDATE- This command is an alternative to usingREGISTER_FILES_UPDATEwhich then works in an async fashion, like the rest of theio_uringcommands.The arguments passed in are the same.
addrmust contain a pointer to the array of file descriptors,lenmust contain the length of the array, andoffmust contain the offset at which to operate. Note that the array of file descriptors pointed to inaddrmust remain valid until this operation has completed.Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL- Add, remove or modify entries in the interest list ofepoll(7). Seeepoll_ctl(2)for details of the system call.fdholds the file descriptor that represents the epoll instance,addrholds the file descriptor to add, remove or modify,lenholds the operation (EPOLL_CTL_ADD,EPOLL_CTL_DEL,EPOLL_CTL_MOD) to perform and,offholds a pointer to theepoll_eventsstructure.Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down, and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event (the notification) for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and its flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_PROVIDE_BUFFERS
public static final byte IORING_OP_PROVIDE_BUFFERSio_uring_opEnum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED- Read from pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed reads.OP_WRITE_FIXED- Write to pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed writes.OP_POLL_ADD- Poll thefdspecified in the submission queue entry for the events specified in thepoll_eventsfield.Unlike poll or epoll without
EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.If
POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13. If
POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13. If
POLL_UPDATE_USER_DATAis set in the SQElenfield, then the request will update theuser_dataof an existing poll request based on the value passed in theofffield. This mode is available since 5.13.This command works like an
asyncpoll(2)and the completion event result is the returned mask of events. For the variants that updateuser_dataorevents, the completion result will be similar toOP_POLL_REMOVE.OP_POLL_REMOVE- Remove an existing poll request.If found, the
resfield of the structio_uring_cqewill contain 0. If not found,reswill contain-ENOENT, or-EALREADYif the poll request was in the process of completing already.OP_SYNC_FILE_RANGE- Issue the equivalent of async_file_range(2)on the file descriptor.The
fdfield is the file descriptor to sync, theofffield holds the offset in bytes, thelenfield holds the length in bytes, and thesync_range_flagsfield holds the flags for the command. See alsosync_file_range(2)for the general description of the related system call.Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in
timeout_flags: TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight. TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, the request updates an existing timeout operation instead of removing it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15, where direct descriptors were introduced.
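Because file_index is offset by one from the fixed file table slot, it is easy to be off by one when filling in the SQE. A minimal sketch of the conversion follows; the class and method names are hypothetical and chosen only for this example.

```java
// Hypothetical helper for illustration; not part of the binding.
final class DirectSlot {
    // file_index is one-based so that 0 keeps the normal file table behaviour.
    static int toFileIndex(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be >= 0");
        }
        return Math.addExact(slot, 1);
    }

    // Returns the fixed file table slot, or -1 when a regular descriptor is requested.
    static int toSlot(int fileIndex) {
        return fileIndex > 0 ? fileIndex - 1 : -1;
    }

    public static void main(String[] args) {
        System.out.println("slot 0 -> file_index " + toFileIndex(0));
        System.out.println("file_index 5 -> slot " + toSlot(5));
    }
}
```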
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request was found and cancelled, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If the request was found and cancellation was attempted, the res field will contain -EALREADY; in this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
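The difference between an explicit offset and the -1 sentinel can be modelled without touching a real file. The sketch below is a plain in-memory model with hypothetical names; it only mirrors the rule stated above (explicit offsets behave like pread(2)/pwrite(2), -1 uses and advances the file position) and does not issue any I/O.

```java
// Hypothetical in-memory model for illustration; it performs no real I/O.
final class ReadOffsetModel {
    private long filePosition;

    // Returns the offset at which the transfer starts.
    // off == -1 uses the current file position and advances it by the bytes transferred;
    // any other value is used as-is and leaves the file position untouched.
    long resolve(long off, int bytesTransferred) {
        if (off == -1) {
            long start = filePosition;
            filePosition += bytesTransferred;
            return start;
        }
        return off;
    }

    public static void main(String[] args) {
        ReadOffsetModel file = new ReadOffsetModel();
        System.out.println(file.resolve(-1, 100));  // starts at the current position
        System.out.println(file.resolve(4096, 10)); // explicit offset, position unchanged
        System.out.println(file.resolve(-1, 50));   // continues after the first transfer
    }
}
```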
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, and off is an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, a split that otherwise provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
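The layout implied by these fields can be worked out with plain arithmetic. The sketch below uses hypothetical names and assumes that buffer IDs within a range increase by one starting at off; it only computes sizes and addresses and does not register anything with the kernel.

```java
// Hypothetical helper for illustration; it only does the arithmetic.
final class ProvidedBufferLayout {
    // Total memory the application must back at addr: len * fd.
    static long totalBytes(int bufferLen, int bufferCount) {
        return Math.multiplyExact((long) bufferLen, (long) bufferCount);
    }

    // Start address of the i-th buffer in the range (i counts from 0).
    static long bufferAddress(long addr, int bufferLen, int i) {
        return addr + (long) i * bufferLen;
    }

    // Buffer ID of the i-th buffer, assuming IDs increase by one from the starting ID in off.
    static int bufferId(int startId, int i) {
        return startId + i;
    }

    public static void main(String[] args) {
        long addr = 0x10000L; // made-up base address
        int len = 4096;       // length of each buffer (SQE len)
        int count = 4;        // number of buffers (SQE fd)
        int startId = 10;     // starting buffer ID (SQE off)
        System.out.println("total bytes: " + totalBytes(len, count));
        for (int i = 0; i < count; i++) {
            System.out.println("bid " + bufferId(startId, i) + " at " + bufferAddress(addr, len, i));
        }
    }
}
```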
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
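The two step completion model for OP_SEND_ZC means the application has to track when a send buffer becomes reusable. The following sketch shows one way to do that bookkeeping. The class and its methods are hypothetical, the flag values are the ones from the kernel's io_uring header (the binding's own constants should be preferred in real code), and CQEs are fed in by hand rather than read from a ring.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical bookkeeping class for illustration; it does not talk to a ring.
final class ZeroCopySendTracker {
    // Values of IORING_CQE_F_MORE and IORING_CQE_F_NOTIF in the kernel header.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    // user_data values whose buffers the kernel may still reference.
    private final Set<Long> buffersInUse = new HashSet<>();

    void onSubmit(long userData) {
        buffersInUse.add(userData);
    }

    // Processes one CQE and returns true once the buffer may be modified or reused.
    boolean onCompletion(long userData, int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            // Second CQE: the kernel no longer holds the buffer.
            buffersInUse.remove(userData);
            return true;
        }
        if ((flags & CQE_F_MORE) == 0) {
            // No notification will follow, so this CQE is also the last one.
            buffersInUse.remove(userData);
            return true;
        }
        // First CQE: res holds bytes sent or a negative error, buffer still in use.
        return false;
    }

    boolean isBufferInUse(long userData) {
        return buffersInUse.contains(userData);
    }

    public static void main(String[] args) {
        ZeroCopySendTracker tracker = new ZeroCopySendTracker();
        tracker.onSubmit(42L);
        System.out.println(tracker.onCompletion(42L, 1024, CQE_F_MORE));
        System.out.println(tracker.onCompletion(42L, 0, CQE_F_NOTIF));
    }
}
```

Keeping the buffer state keyed by user_data works here because both CQEs of a request carry the same user_data value.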
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_REMOVE_BUFFERS
public static final byte IORING_OP_REMOVE_BUFFERS
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, and timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires or the specified number of events have completed; either condition will trigger the event. If off is set to 0, completed events are not counted, so the request effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout completed through expiration of the timer, or 0 if it completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, the request updates an existing timeout operation instead of removing it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15, where direct descriptors were introduced.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request was found and cancelled, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If the request was found and cancellation was attempted, the res field will contain -EALREADY; in this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
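The buffer arithmetic described for OP_PROVIDE_BUFFERS can be sketched in plain Java. This is only an illustration of the layout, not part of LibIOURing: the class and method names are hypothetical, the address is a placeholder, and the sketch assumes that buffer IDs increase by one per buffer starting from the value in off.

```java
// Hypothetical helper, not part of LibIOURing: models how one contiguous
// memory region is carved into equally sized provided buffers.
final class ProvidedBufferLayout {

    /** Total bytes the application must supply: len * fd (fd holds the buffer count). */
    static long totalBytes(int bufferCount, int bufferLen) {
        return (long) bufferCount * bufferLen;
    }

    /** Address of the buffer at position index, counted from the start address in addr. */
    static long bufferAddress(long addr, int bufferLen, int index) {
        return addr + (long) index * bufferLen;
    }

    /** Buffer ID at position index (assumption: IDs are sequential, starting at off). */
    static int bufferId(int startId, int index) {
        return startId + index;
    }

    public static void main(String[] args) {
        long addr = 0x10000L;   // placeholder address, for illustration only
        int count = 4;          // value that would go into fd
        int len = 4096;         // value that would go into len
        int startId = 10;       // value that would go into off

        System.out.println("bytes to provide: " + totalBytes(count, len));
        for (int i = 0; i < count; i++) {
            System.out.printf("buffer id=%d addr=0x%x%n",
                    bufferId(startId, i), bufferAddress(addr, len, i));
        }
    }
}
```

Keeping every buffer in a group the same size is what makes this index arithmetic possible; differently sized buffers would go into a separate group with its own buf_group ID.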
IORING_OP_TEE
public static final byte IORING_OP_TEE
io_uring_op Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.5.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system calls. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd bytes of memory at addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
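The two-step completion model of OP_SEND_ZC can be sketched as a small bookkeeping class that decides when a send buffer may be reused. This is a simplified model under stated assumptions, not LibIOURing code: the class and its methods are hypothetical, completions are passed in as plain values rather than read from a real completion queue, and the two flag constants are assumed to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF from the Linux UAPI header (real code should use the constants exposed by the binding).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker, not part of LibIOURing: records which zero-copy
// send buffers are still owned by the kernel, keyed by user_data.
final class ZeroCopySendTracker {
    // Assumed to match IORING_CQE_F_MORE / IORING_CQE_F_NOTIF in the kernel header.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    private final Set<Long> buffersInUse = new HashSet<>();

    /** Call when an OP_SEND_ZC SQE with this user_data is submitted. */
    void onSubmit(long userData) {
        buffersInUse.add(userData);
    }

    /**
     * Call for every CQE carrying this user_data.
     * Returns true once the buffer may be modified or reused.
     */
    boolean onCompletion(long userData, int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            // Second CQE: the kernel no longer references the buffer.
            buffersInUse.remove(userData);
            return true;
        }
        if ((flags & CQE_F_MORE) == 0) {
            // First CQE without CQE_F_MORE: no notification will follow.
            buffersInUse.remove(userData);
            return true;
        }
        // First CQE with CQE_F_MORE: res holds bytes sent or a negative
        // error code, but the buffer must stay untouched for now.
        return false;
    }

    boolean isBufferInUse(long userData) {
        return buffersInUse.contains(userData);
    }

    public static void main(String[] args) {
        ZeroCopySendTracker tracker = new ZeroCopySendTracker();
        tracker.onSubmit(42L);
        System.out.println("reusable after first CQE: "
                + tracker.onCompletion(42L, 512, CQE_F_MORE));
        System.out.println("reusable after notification: "
                + tracker.onCompletion(42L, 0, CQE_F_NOTIF));
    }
}
```

Keying on user_data works here because both completions of one request carry the same value; an application that reuses user_data values across in-flight requests would need a different key.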
IORING_OP_SHUTDOWN
public static final byte IORING_OP_SHUTDOWN
io_uring_op Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.5.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were opened as direct descriptors through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd bytes of memory at addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first "struct io_uring_cqe" is likely to contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
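The two-step completion model described for OP_SEND_ZC can be sketched in plain Java as follows. This is a minimal illustration, not part of LibIOURing: the class name, the enum and the helper methods are hypothetical, and the two flag values are assumed to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF from the kernel's linux/io_uring.h rather than being read from this library. It also assumes that a first CQE without CQE_F_MORE means no notification will follow.

```java
public class SendZcCompletions {
    // Assumed to match IORING_CQE_F_MORE / IORING_CQE_F_NOTIF in linux/io_uring.h.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    /** Hypothetical classification of a CQE that belongs to a zerocopy send. */
    enum Kind {
        RESULT_NOTIF_PENDING, // first CQE: res is valid, a notification will follow
        RESULT_FINAL,         // first CQE without CQE_F_MORE: no notification expected
        BUFFER_RELEASED       // notification CQE: res is 0, buffer may be reused
    }

    static Kind classify(int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            return Kind.BUFFER_RELEASED;
        }
        if ((flags & CQE_F_MORE) != 0) {
            return Kind.RESULT_NOTIF_PENDING;
        }
        return Kind.RESULT_FINAL;
    }

    /** True once the send buffer may be modified again. */
    static boolean bufferReusable(int flags) {
        return classify(flags) != Kind.RESULT_NOTIF_PENDING;
    }

    public static void main(String[] args) {
        System.out.println(classify(CQE_F_MORE));
        System.out.println(classify(CQE_F_NOTIF));
        System.out.println(bufferReusable(CQE_F_MORE));
    }
}
```

An application would typically key both CQEs by their shared user_data value and release the buffer only when bufferReusable returns true.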
IORING_OP_RENAMEAT
public static final byte IORING_OP_RENAMEAT
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, and timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while a timeout request was in flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were opened as direct descriptors through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd bytes of memory at addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event (a notification) for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and its flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
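The two-step completion model described for OP_SEND_ZC above can be sketched as a small classifier over the CQE flags field. This is only an illustration, not part of this class: the names SendZcCqeSketch, Kind, classify and bufferReusable are hypothetical, and the two flag constants are redefined locally with the bit values from the kernel's io_uring UAPI header so that the example is self-contained. It assumes that a first CQE without CQE_F_MORE means no notification will follow.

```java
public class SendZcCqeSketch {
    // Flag bits as defined in the kernel's io_uring UAPI header (IORING_CQE_F_*),
    // copied here so the sketch does not depend on any binding library.
    static final int CQE_F_MORE  = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    /** Hypothetical classification of the CQEs a single OP_SEND_ZC request can produce. */
    enum Kind { RESULT_NOTIFICATION_PENDING, RESULT_FINAL, NOTIFICATION }

    /** Decides which of the two completion steps a CQE belongs to, based on its flags. */
    static Kind classify(int cqeFlags) {
        if ((cqeFlags & CQE_F_NOTIF) != 0) {
            // Second step: the kernel no longer references the data buffer.
            return Kind.NOTIFICATION;
        }
        // First step: res carries bytes sent or a negative error code.
        return (cqeFlags & CQE_F_MORE) != 0
                ? Kind.RESULT_NOTIFICATION_PENDING
                : Kind.RESULT_FINAL;
    }

    /** Assumption: the buffer may be modified once no further CQE is expected for it. */
    static boolean bufferReusable(int cqeFlags) {
        return classify(cqeFlags) != Kind.RESULT_NOTIFICATION_PENDING;
    }

    public static void main(String[] args) {
        int[] samples = { CQE_F_MORE, CQE_F_NOTIF, 0 };
        for (int flags : samples) {
            System.out.println(classify(flags) + " reusable=" + bufferReusable(flags));
        }
    }
}
```

In a real completion loop the same check would be applied to the flags of each CQE whose user_data matches the zerocopy send, and the buffer would be returned to the application's pool only when bufferReusable reports true.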
- See Also:
-
IORING_OP_UNLINKAT
public static final byte IORING_OP_UNLINKAT
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2).
Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, and timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and an attempt was made to cancel the linked request, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_events structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, and off is an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data.
Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used either to wake or interrupt anyone waiting for completions on the target ring, or to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event (a notification) for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and its flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
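The memory layout implied by the OP_PROVIDE_BUFFERS entry above (fd buffers of len bytes each, laid out back to back from addr, with buffer IDs counting up from off) can be sketched with plain arithmetic. The class and method names below are hypothetical helpers for illustration only, and the sketch assumes that buffer IDs are assigned sequentially in address order, as the description of the starting buffer ID suggests.

```java
public class ProvidedBuffersSketch {
    /** Total bytes the application must supply at addr: len * fd in SQE terms. */
    static long requiredBytes(int bufferCount, int bufferLen) {
        if (bufferCount <= 0 || bufferLen <= 0) {
            throw new IllegalArgumentException("count and length must be positive");
        }
        // Widen before multiplying so large pools do not overflow an int.
        return (long) bufferCount * bufferLen;
    }

    /**
     * Address of the buffer with the given ID, assuming IDs start at startBufferId
     * (the SQE off field) and each buffer follows the previous one in memory.
     */
    static long bufferAddress(long baseAddr, int bufferLen, int startBufferId, int bufferId) {
        if (bufferId < startBufferId) {
            throw new IllegalArgumentException("buffer ID precedes the provided range");
        }
        return baseAddr + (long) (bufferId - startBufferId) * bufferLen;
    }

    public static void main(String[] args) {
        long base = 0x10000L;      // hypothetical address of the application's pool
        int count = 8;             // value placed in the SQE fd field
        int len = 4096;            // value placed in the SQE len field
        int firstId = 100;         // value placed in the SQE off field

        System.out.println("pool bytes: " + requiredBytes(count, len));
        System.out.println("buffer 103 at: 0x"
                + Long.toHexString(bufferAddress(base, len, firstId, 103)));
    }
}
```

When a completion for a request submitted with IOSQE_BUFFER_SELECT reports which buffer ID was chosen, a helper like bufferAddress lets the application locate the data inside the pool it allocated.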
- See Also:
-
IORING_OP_MKDIRAT
public static final byte IORING_OP_MKDIRAT
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2).
Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, and timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and an attempt was made to cancel the linked request, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, this command can be used to close files that were direct opened throughOP_OPENAT,OP_OPENAT2, orOP_ACCEPTusing theio_uringspecific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.OP_FILES_UPDATE- This command is an alternative to usingREGISTER_FILES_UPDATEwhich then works in an async fashion, like the rest of theio_uringcommands.The arguments passed in are the same.
addrmust contain a pointer to the array of file descriptors,lenmust contain the length of the array, andoffmust contain the offset at which to operate. Note that the array of file descriptors pointed to inaddrmust remain valid until this operation has completed.Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the I/O operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system calls. Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
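The sizing rule above (len * fd bytes starting at addr) can be sketched as plain arithmetic. This is only an illustration under stated assumptions: the class ProvidedBufferLayout and its methods are hypothetical, no binding calls are made, and it assumes buffer IDs are assigned sequentially from the starting ID passed in off.

```java
// Hypothetical helper (not part of the binding): models how a contiguous
// region is carved into equally sized provided buffers.
final class ProvidedBufferLayout {
    private ProvidedBufferLayout() {}

    /** Total bytes the application must make available at addr: len * count. */
    static long totalBytes(int count, int len) {
        if (count <= 0 || len <= 0) {
            throw new IllegalArgumentException("count and len must be positive");
        }
        return Math.multiplyExact((long) count, (long) len);
    }

    /** Address of the index-th buffer (0-based) within the region. */
    static long bufferAddress(long base, int len, int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be >= 0");
        }
        return base + (long) index * len;
    }

    /** Buffer ID of the index-th buffer, assuming IDs increase by one from startId. */
    static int bufferId(int startId, int index) {
        return startId + index;
    }

    public static void main(String[] args) {
        int count = 4;      // value placed in fd
        int len = 4096;     // value placed in len
        long base = 0x10000L; // value placed in addr (made-up address)
        System.out.println("region size: " + totalBytes(count, len));
        for (int i = 0; i < count; i++) {
            System.out.printf("buffer id %d at 0x%x%n",
                    bufferId(0, i), bufferAddress(base, len, i));
        }
    }
}
```

Keeping each group to one buffer size, as described above, lets a request choose a size class just by setting buf_group.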
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor to the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event (a notification) for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
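The two-step completion model for OP_SEND_ZC can be sketched as a small bookkeeping class. This is a hedged illustration only: ZeroCopySendTracker is a hypothetical name, it makes no binding calls, and the flag values are assumptions copied from the Linux uapi header (IORING_CQE_F_MORE = 1 << 1, IORING_CQE_F_NOTIF = 1 << 3). Real code should use the constants the binding exposes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of the binding): tracks when the buffer of a
// zerocopy send may be reused, keyed by the request's user_data.
final class ZeroCopySendTracker {
    // Assumed values, mirroring the kernel's IORING_CQE_F_* definitions.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    private final Set<Long> buffersInUse = new HashSet<>();

    /** Call when an OP_SEND_ZC request with this user_data is submitted. */
    void submitted(long userData) {
        buffersInUse.add(userData);
    }

    /**
     * Call for every CQE carrying this user_data. The res value (bytes sent
     * or a negative error on the first CQE) is accepted for completeness but
     * does not influence buffer lifetime.
     * Returns true once the buffer may be modified or freed.
     */
    boolean onCompletion(long userData, int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            // Second CQE: the kernel no longer references the buffer.
            buffersInUse.remove(userData);
            return true;
        }
        if ((flags & CQE_F_MORE) != 0) {
            // First CQE with a notification still to come: keep the buffer.
            return false;
        }
        // First CQE without CQE_F_MORE: no notification will follow.
        buffersInUse.remove(userData);
        return true;
    }

    boolean isBufferBusy(long userData) {
        return buffersInUse.contains(userData);
    }

    public static void main(String[] args) {
        ZeroCopySendTracker tracker = new ZeroCopySendTracker();
        tracker.submitted(42L);
        System.out.println(tracker.onCompletion(42L, 1024, CQE_F_MORE));
        System.out.println(tracker.onCompletion(42L, 0, CQE_F_NOTIF));
    }
}
```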
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_SYMLINKAT
public static final byte IORING_OP_SYMLINKAT
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED- Read from pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed reads.OP_WRITE_FIXED- Write to pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed writes.OP_POLL_ADD- Poll thefdspecified in the submission queue entry for the events specified in thepoll_eventsfield.Unlike poll or epoll without
EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.If
POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13. If
POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13. If
POLL_UPDATE_USER_DATAis set in the SQElenfield, then the request will update theuser_dataof an existing poll request based on the value passed in theofffield. This mode is available since 5.13.This command works like an
asyncpoll(2)and the completion event result is the returned mask of events. For the variants that updateuser_dataorevents, the completion result will be similar toOP_POLL_REMOVE.OP_POLL_REMOVE- Remove an existing poll request.If found, the
resfield of the structio_uring_cqewill contain 0. If not found,reswill contain-ENOENT, or-EALREADYif the poll request was in the process of completing already.OP_SYNC_FILE_RANGE- Issue the equivalent of async_file_range(2)on the file descriptor.The
fdfield is the file descriptor to sync, theofffield holds the offset in bytes, thelenfield holds the length in bytes, and thesync_range_flagsfield holds the flags for the command. See alsosync_file_range(2)for the general description of the related system call.Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT- This command will register a timeout operation.The
addrfield must contain a pointer to astruct timespec64structure,lenmust contain 1 to signify onetimespec64structure,timeout_flagsmay containTIMEOUT_ABSfor an absolute timeout value, or 0 for a relative timeout.offmay contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If set to 0, completed events are not counted, which effectively acts like a timer.io_uringtimeouts use theCLOCK_MONOTONICclock source. The request will complete with-ETIMEif the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with-ECANCELED.Available since 5.4.
Since 5.15, this command also supports the following modifiers in
timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
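The three documented outcomes of an OP_TIMEOUT request can be told apart from the CQE res field alone. The sketch below is plain Java with no binding calls; the class TimeoutResult is hypothetical, and the errno numbers (ETIME = 62, ECANCELED = 125) are assumptions taken from the generic Linux errno table, so they should be replaced by the platform's own constants where available.

```java
// Hypothetical helper (not part of the binding): classifies the res value of
// a CQE produced by an OP_TIMEOUT request.
final class TimeoutResult {
    // Assumed Linux errno values; not defined by this class's callers.
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome {
        /** The timer expired (res == -ETIME). */
        EXPIRED,
        /** The requested number of completions arrived first (res == 0). */
        EVENTS_COMPLETED,
        /** The timeout was cancelled before it expired (res == -ECANCELED). */
        CANCELLED,
        /** Any other value, e.g. an error from a malformed request. */
        UNEXPECTED
    }

    private TimeoutResult() {}

    static Outcome classify(int res) {
        if (res == 0) {
            return Outcome.EVENTS_COMPLETED;
        }
        if (res == -ETIME) {
            return Outcome.EXPIRED;
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;
        }
        return Outcome.UNEXPECTED;
    }

    public static void main(String[] args) {
        System.out.println(classify(-ETIME));
        System.out.println(classify(0));
        System.out.println(classify(-ECANCELED));
    }
}
```

Note that -ETIME is the normal result for a pure timer (off set to 0), so callers usually should not treat it as a failure.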
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flagscontainTIMEOUT_UPDATE, instead of removing an existing operation, it updates it.addrand return values are same as before.addr2field must contain a pointer to astruct timespec64structure.timeout_flagsmay also containTIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the
file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket I/O) will get cancelled, while disk I/O requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT- This request must be linked with another request throughIOSQE_IO_LINKwhich is described below.Unlike
OP_TIMEOUT,IORING_OP_LINK_TIMEOUTacts on the linked request, not the completion queue. The format of the command is otherwise likeIORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with-ETIMEif the timer expired and the linked request was attempted cancelled, or-ECANCELEDif the timer got cancelled because of completion of the linked request. LikeIORING_OP_TIMEOUTthe clock source used isCLOCK_MONOTONIC.Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT- Issue the equivalent of aopenat(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,open_flagsshould contain any flags passed in, andlenis access mode of the file. See alsoopenat(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, this command can be used to close files that were direct opened throughOP_OPENAT,OP_OPENAT2, orOP_ACCEPTusing theio_uringspecific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.OP_FILES_UPDATE- This command is an alternative to usingREGISTER_FILES_UPDATEwhich then works in an async fashion, like the rest of theio_uringcommands.The arguments passed in are the same.
addrmust contain a pointer to the array of file descriptors,lenmust contain the length of the array, andoffmust contain the offset at which to operate. Note that the array of file descriptors pointed to inaddrmust remain valid until this operation has completed.Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the I/O operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system calls. Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor to the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event (a notification) for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_LINKAT
public static final byte IORING_OP_LINKAT
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED- Read from pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed reads.OP_WRITE_FIXED- Write to pre-mapped buffers. Seeregisterfor details on how to setup a context for fixed writes.OP_POLL_ADD- Poll thefdspecified in the submission queue entry for the events specified in thepoll_eventsfield.Unlike poll or epoll without
EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.If
POLL_ADD_MULTIis set in the SQElenfield, then the poll will work in multi shot mode instead. That means it'll repatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQEflagsfield will haveCQE_F_MOREset on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.If
POLL_UPDATE_EVENTSis set in the SQElenfield, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on theuser_datafield of the original SQE submitted, and this values is passed in theaddrfield of the SQE. This mode is available since 5.13.If
POLL_UPDATE_USER_DATAis set in the SQElenfield, then the request will update theuser_dataof an existing poll request based on the value passed in theofffield. This mode is available since 5.13.This command works like an
asyncpoll(2)and the completion event result is the returned mask of events. For the variants that updateuser_dataorevents, the completion result will be similar toOP_POLL_REMOVE.OP_POLL_REMOVE- Remove an existing poll request.If found, the
resfield of the structio_uring_cqewill contain 0. If not found,reswill contain-ENOENT, or-EALREADYif the poll request was in the process of completing already.OP_SYNC_FILE_RANGE- Issue the equivalent of async_file_range(2)on the file descriptor.The
fdfield is the file descriptor to sync, theofffield holds the offset in bytes, thelenfield holds the length in bytes, and thesync_range_flagsfield holds the flags for the command. See alsosync_file_range(2)for the general description of the related system call.Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
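The clock selection described for OP_TIMEOUT can be sketched in plain Java as follows. This is only an illustration, not part of the binding: the class name is hypothetical, the numeric flag values are assumed to mirror IORING_TIMEOUT_* in the Linux UAPI header, and rejecting both clock modifiers at once is an assumption about kernel behavior rather than something stated above.

```java
final class TimeoutClockSketch {
    // Assumed values, mirroring IORING_TIMEOUT_* in <linux/io_uring.h>.
    static final int TIMEOUT_ABS      = 1 << 0;
    static final int TIMEOUT_BOOTTIME = 1 << 2;
    static final int TIMEOUT_REALTIME = 1 << 3;

    /** Returns the name of the clock source a timeout with these flags would use. */
    static String clockFor(int timeoutFlags) {
        boolean boot = (timeoutFlags & TIMEOUT_BOOTTIME) != 0;
        boolean real = (timeoutFlags & TIMEOUT_REALTIME) != 0;
        if (boot && real) {
            // Assumption: at most one clock modifier is meaningful.
            throw new IllegalArgumentException("only one clock modifier may be set");
        }
        if (boot) return "CLOCK_BOOTTIME";
        if (real) return "CLOCK_REALTIME";
        return "CLOCK_MONOTONIC"; // the default clock source
    }

    /** True when the timespec is interpreted as an absolute point in time. */
    static boolean isAbsolute(int timeoutFlags) {
        return (timeoutFlags & TIMEOUT_ABS) != 0;
    }

    public static void main(String[] args) {
        int flags = TIMEOUT_ABS | TIMEOUT_REALTIME;
        System.out.println(clockFor(flags) + ", absolute=" + isAbsolute(flags));
    }
}
```

TIMEOUT_ABS is independent of the clock modifiers, so an absolute deadline can be combined with any one of the three clocks.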
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
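The off-by-one relationship between file_index and the fixed file table slot is easy to get wrong, so here is a minimal plain-Java sketch of it. The class and method names are hypothetical and nothing here calls into io_uring; it only models the rule stated above (a positive file_index selects slot file_index - 1, and the result is then 0 or a negative error instead of a file descriptor).

```java
final class DirectDescriptorSketch {
    /** file_index value that requests installation into the 0-based fixed-table slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be >= 0");
        }
        return slot + 1; // 0 is reserved for "install into the normal file table"
    }

    /** Human-readable interpretation of a CQE res value for an accept/open request. */
    static String describeResult(int fileIndex, int res) {
        if (res < 0) {
            return "error " + (-res);
        }
        if (fileIndex > 0) {
            // Direct descriptor: res is 0 on success, the slot is implied by file_index.
            return "installed in fixed slot " + (fileIndex - 1);
        }
        return "new file descriptor " + res;
    }

    public static void main(String[] args) {
        int fileIndex = fileIndexForSlot(3);
        System.out.println(describeResult(fileIndex, 0));
        System.out.println(describeResult(0, 7));
    }
}
```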
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
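A caller typically has to branch on the three result codes listed for OP_ASYNC_CANCEL. The following plain-Java sketch shows one way to classify them. The class and enum names are hypothetical, and the errno numbers are assumed to be the generic Linux values (they differ on a few architectures), so a real program should take them from its platform bindings.

```java
final class CancelResultSketch {
    // Assumed generic Linux errno values; not taken from the text above.
    static final int ENOENT   = 2;
    static final int EALREADY = 114;

    enum Outcome { CANCELLED, NOT_FOUND, ALREADY_RUNNING, OTHER_ERROR }

    /** Maps the res field of the cancellation CQE to an outcome. */
    static Outcome classify(int res) {
        if (res == 0) {
            return Outcome.CANCELLED;       // request was found and cancelled
        }
        if (res == -ENOENT) {
            return Outcome.NOT_FOUND;       // no request with that user_data
        }
        if (res == -EALREADY) {
            return Outcome.ALREADY_RUNNING; // cancel attempted, request may still complete
        }
        return Outcome.OTHER_ERROR;
    }

    public static void main(String[] args) {
        System.out.println(classify(0));
        System.out.println(classify(-EALREADY));
    }
}
```

For ALREADY_RUNNING the original request may still post its own CQE, so its user_data should stay registered until that CQE arrives.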
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, off is an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation. fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
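The memory layout implied by OP_PROVIDE_BUFFERS can be worked out with a little arithmetic. The plain-Java sketch below uses hypothetical names and assumes that buffer IDs are assigned consecutively starting from the value passed in off, which the description above suggests but does not spell out.

```java
final class ProvidedBuffersSketch {
    /** Total bytes the application must back with memory: len * fd. */
    static long totalBytes(int nbufs, int len) {
        return Math.multiplyExact((long) nbufs, (long) len);
    }

    /**
     * Address of the buffer with the given ID, assuming IDs run consecutively
     * from startBid and buffers are laid out back to back from addr.
     */
    static long bufferAddress(long addr, int len, int nbufs, int startBid, int bid) {
        int index = bid - startBid;
        if (index < 0 || index >= nbufs) {
            throw new IllegalArgumentException("buffer ID outside the provided range");
        }
        return addr + (long) index * len;
    }

    public static void main(String[] args) {
        long base = 0x1000L; // placeholder address, for illustration only
        System.out.println(totalBytes(8, 4096));
        System.out.println(Long.toHexString(bufferAddress(base, 4096, 8, 10, 12)));
    }
}
```

This mapping is what lets a completion handler find the filled buffer once it knows which buffer ID the kernel selected.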
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe is likely to contain CQE_F_MORE, which means that there will be a second completion event (a notification) for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and its flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. while waiting for a TCP ACK, and having a separate CQE for the request completion allows userspace to push more data without extra delays. Note that notifications only control the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
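The two-step completion model of OP_SEND_ZC means the application has to track when a send buffer becomes reusable. The following plain-Java sketch models that bookkeeping. The class name is hypothetical, the flag values are assumed to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF in the Linux UAPI header, and treating a first CQE without CQE_F_MORE as "no notification will follow" is an inference from the description above.

```java
import java.util.HashSet;
import java.util.Set;

final class ZeroCopySendTracker {
    // Assumed values, mirroring IORING_CQE_F_* in <linux/io_uring.h>.
    static final int CQE_F_MORE  = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    // user_data values whose buffers the kernel may still be reading from.
    private final Set<Long> pinned = new HashSet<>();

    /** Call when the SQE is queued: the buffer must not be modified from now on. */
    void onSubmit(long userData) {
        pinned.add(userData);
    }

    /** Feed every CQE here; returns true once the buffer may be reused. */
    boolean onCompletion(long userData, int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            pinned.remove(userData);   // notification: kernel released the buffer
            return true;
        }
        if ((flags & CQE_F_MORE) == 0) {
            pinned.remove(userData);   // no notification expected (assumption)
            return true;
        }
        return false;                  // send result only; notification still pending
    }

    boolean isPinned(long userData) {
        return pinned.contains(userData);
    }

    public static void main(String[] args) {
        ZeroCopySendTracker tracker = new ZeroCopySendTracker();
        tracker.onSubmit(42L);
        System.out.println(tracker.onCompletion(42L, 512, CQE_F_MORE));
        System.out.println(tracker.onCompletion(42L, 0, CQE_F_NOTIF));
    }
}
```

The res value of the first CQE still has to be checked separately for short sends or errors; the tracker only answers the buffer-lifetime question.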
IORING_OP_MSG_RING
public static final byte IORING_OP_MSG_RING
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ- Issue the equivalent of apread(2)orpwrite(2)system call.fdis the file descriptor to be operated on,addrcontains the buffer in question,lencontains the length of the IO operation, andoffscontains the read or write offset. Iffddoes not refer to a seekable file,offmust be set to zero. Ifoffsis set to -1, the offset will use (and advance) the file position, like theread(2)andwrite(2)system calls. These are non-vectored versions of theOP_READVandOP_WRITEVopcodes. See alsoread(2)andwrite(2)for the general description of the related system call.Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, off holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and addr holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
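The sizing rule above (len * fd bytes starting at addr) can be sketched as plain address arithmetic. This is a minimal illustration, not part of LibIOURing: the class ProvidedBufferLayout and its methods are hypothetical, the base address is made up, and the sketch assumes buffer IDs count up by one from the starting ID passed in off.

```java
/** Hypothetical helper (not part of LibIOURing): layout of one OP_PROVIDE_BUFFERS range. */
final class ProvidedBufferLayout {
    private ProvidedBufferLayout() {}

    /** Bytes the application must back at addr: len * fd, computed in 64 bits. */
    static long totalBytes(int count, int bufLen) {
        requirePositive(count, bufLen);
        return Math.multiplyExact((long) count, (long) bufLen);
    }

    /** Start address of the index-th buffer: addr + index * len. */
    static long bufferAddress(long baseAddr, int index, int count, int bufLen) {
        requirePositive(count, bufLen);
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return baseAddr + (long) index * bufLen;
    }

    /** Buffer ID of the index-th buffer, assuming IDs increase by one from the starting ID. */
    static int bufferId(int startId, int index) {
        return startId + index;
    }

    private static void requirePositive(int count, int bufLen) {
        if (count <= 0 || bufLen <= 0) {
            throw new IllegalArgumentException("count and bufLen must be positive");
        }
    }

    public static void main(String[] args) {
        long base = 0x7f0000000000L; // made-up address, for illustration only
        int count = 4, bufLen = 4096, startId = 10;
        System.out.println("bytes needed: " + totalBytes(count, bufLen));
        for (int i = 0; i < count; i++) {
            System.out.printf("id=%d addr=0x%x len=%d%n",
                bufferId(startId, i), bufferAddress(base, i, count, bufLen), bufLen);
        }
    }
}
```

A helper like this is mainly useful for checking that the memory region handed to the kernel is large enough before the SQE is filled in.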
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN- Issue the equivalent of ashutdown(2)system call.fdis the file descriptor to the socket being shutdown andlenmust be set to thehowargument. No other fields should be set.Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call.
Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
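The two-step completion model for OP_SEND_ZC can be sketched as a small classifier over the CQE flags field. This is an illustration only: the class ZeroCopySendCqe is hypothetical, and the two flag values are assumptions mirroring IORING_CQE_F_MORE and IORING_CQE_F_NOTIF from the Linux UAPI header; real code should use the constants exposed by the binding. The sketch also assumes that a first CQE without CQE_F_MORE means no notification will follow, so the buffer is already free.

```java
/** Hypothetical helper (not part of LibIOURing): interprets CQE flags for a zerocopy send. */
final class ZeroCopySendCqe {
    // Assumed values, mirroring IORING_CQE_F_MORE and IORING_CQE_F_NOTIF in the Linux UAPI header.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    enum Kind {
        /** First CQE: res holds bytes sent or a negative error; a notification will follow. */
        RESULT_NOTIF_PENDING,
        /** First CQE with no notification to follow. */
        RESULT_FINAL,
        /** Second CQE: the kernel no longer references the buffer. */
        NOTIFICATION
    }

    private ZeroCopySendCqe() {}

    static Kind classify(int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            return Kind.NOTIFICATION;
        }
        return (flags & CQE_F_MORE) != 0 ? Kind.RESULT_NOTIF_PENDING : Kind.RESULT_FINAL;
    }

    /** True once the application may modify or reuse the send buffer. */
    static boolean bufferReusable(int flags) {
        return classify(flags) != Kind.RESULT_NOTIF_PENDING;
    }
}
```

Because both CQEs carry the same user_data, a completion loop would typically look up the pending send by user_data and release its buffer only when bufferReusable returns true.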
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_FSETXATTR
public static final byte IORING_OP_FSETXATTRio_uring_opEnum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2).
Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If the event count is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
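The three completion results described for OP_TIMEOUT can be told apart with a small decoder over the CQE res field. This is a sketch, not part of LibIOURing: the class TimeoutResult is hypothetical, and the errno numbers are assumptions taken from the generic Linux errno headers (they differ on a few architectures), so production code should take them from the platform bindings.

```java
/** Hypothetical helper (not part of LibIOURing): decodes the res field of an OP_TIMEOUT CQE. */
final class TimeoutResult {
    // Assumed errno values from the generic Linux headers; some architectures use other numbers.
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome { EVENT_COUNT_REACHED, TIMER_EXPIRED, CANCELLED, OTHER_ERROR }

    private TimeoutResult() {}

    static Outcome decode(int res) {
        if (res == 0) {
            return Outcome.EVENT_COUNT_REACHED; // requests completed on their own
        }
        if (res == -ETIME) {
            return Outcome.TIMER_EXPIRED;       // the timer ran out
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;           // cancelled before it expired
        }
        return Outcome.OTHER_ERROR;             // any other negative errno
    }
}
```

Note that -ETIME is the normal result for a pure timer (event count 0), so callers usually should not treat it as a failure.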
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flagscontainTIMEOUT_UPDATE, instead of removing an existing operation, it updates it.addrand return values are same as before.addr2field must contain a pointer to astruct timespec64structure.timeout_flagsmay also containTIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
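The off-by-one convention for file_index (0 means a normal descriptor, a positive value n selects fixed file table slot n - 1) is easy to get wrong, so a tiny conversion helper can make the intent explicit. The class DirectDescriptorIndex below is hypothetical and only models the arithmetic stated above; it does not touch any ring.

```java
/** Hypothetical helper (not part of LibIOURing): converts between fixed-table slots and file_index. */
final class DirectDescriptorIndex {
    private DirectDescriptorIndex() {}

    /** True if this file_index value requests a direct (fixed table) descriptor. */
    static boolean isDirect(int fileIndex) {
        return fileIndex > 0;
    }

    /** Value to store in file_index to target the given zero-based fixed file table slot. */
    static int toFileIndex(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be >= 0: " + slot);
        }
        return Math.addExact(slot, 1);
    }

    /** Zero-based fixed file table slot selected by a positive file_index. */
    static int toSlot(int fileIndex) {
        if (!isDirect(fileIndex)) {
            throw new IllegalArgumentException("file_index must be positive: " + fileIndex);
        }
        return fileIndex - 1;
    }
}
```

With this convention, targeting slot 0 requires file_index 1; leaving file_index at 0 falls back to the normal file table.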
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request was found and cancelled, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If the request was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT- Issue the equivalent of aopenat(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,open_flagsshould contain any flags passed in, andlenis access mode of the file. See alsoopenat(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, this command can be used to close files that were direct opened throughOP_OPENAT,OP_OPENAT2, orOP_ACCEPTusing theio_uringspecific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.OP_FILES_UPDATE- This command is an alternative to usingREGISTER_FILES_UPDATEwhich then works in an async fashion, like the rest of theio_uringcommands.The arguments passed in are the same.
addrmust contain a pointer to the array of file descriptors,lenmust contain the length of the array, andoffmust contain the offset at which to operate. Note that the array of file descriptors pointed to inaddrmust remain valid until this operation has completed.Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, off holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and addr holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN- Issue the equivalent of ashutdown(2)system call.fdis the file descriptor to the socket being shutdown andlenmust be set to thehowargument. No other fields should be set.Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call.
Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_SETXATTR
public static final byte IORING_OP_SETXATTRio_uring_opEnum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2).
Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
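As an illustration of the three OP_TIMEOUT completion results described above, the following is a minimal sketch in plain Java. It does not use this class's bindings; the class name `TimeoutResult`, the `Outcome` enum, and the hard-coded errno values (62 for ETIME, 125 for ECANCELED, as on the generic Linux ABI used by x86-64 and arm64) are assumptions made for the example.

```java
// Hypothetical helper: maps the res field of an OP_TIMEOUT completion to a readable outcome.
final class TimeoutResult {
    // Assumed errno values for the generic Linux ABI (x86-64, arm64); other ABIs may differ.
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome { EXPIRED, EVENT_COUNT_REACHED, CANCELLED, OTHER_ERROR }

    // res is the value taken from the CQE res field of the timeout request.
    static Outcome classify(int res) {
        if (res == -ETIME) {
            return Outcome.EXPIRED;              // the timer ran out
        }
        if (res == 0) {
            return Outcome.EVENT_COUNT_REACHED;  // enough requests completed on their own
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;            // cancelled before it expired
        }
        return Outcome.OTHER_ERROR;              // any other negative errno
    }

    public static void main(String[] args) {
        System.out.println(classify(-ETIME));
        System.out.println(classify(0));
        System.out.println(classify(-ECANCELED));
    }
}
```

Note that -ETIME is the expected result for a pure timer (off set to 0), so callers would normally not treat it as a failure.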
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
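The off-by-one convention for file_index (0 means a regular file descriptor is returned, a positive value selects fixed file table slot file_index - 1) can be sketched as follows. This is plain Java for illustration only; the class `DirectDescriptor` and its method names are hypothetical and are not part of this class.

```java
// Hypothetical helper illustrating the file_index <-> fixed-file-table slot convention.
final class DirectDescriptor {
    // Value to store in the SQE file_index field so that the file lands in the given slot.
    static int fileIndexForSlot(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be >= 0");
        }
        return Math.addExact(slot, 1); // slot 0 is encoded as file_index 1
    }

    // A file_index of 0 requests a normal file descriptor instead of a direct one.
    static boolean isDirect(int fileIndex) {
        return fileIndex > 0;
    }

    // Fixed file table slot addressed by a positive file_index.
    static int slotForFileIndex(int fileIndex) {
        if (!isDirect(fileIndex)) {
            throw new IllegalArgumentException("file_index 0 does not address a slot");
        }
        return fileIndex - 1;
    }

    public static void main(String[] args) {
        int fileIndex = fileIndexForSlot(0);
        System.out.println(fileIndex + " -> slot " + slotForFileIndex(fileIndex));
    }
}
```

With a direct open, a successful completion carries 0 in res rather than a file descriptor, so the application has to remember the slot it asked for.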
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
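The memory layout implied by OP_PROVIDE_BUFFERS (fd buffers of len bytes each, laid out back to back from addr) can be sketched in plain Java as below. The class `ProvidedBufferLayout` and its methods are hypothetical names for this example. The assumption that buffer IDs are assigned consecutively starting at off is this sketch's reading of "starting buffer ID", not something stated explicitly above.

```java
// Hypothetical helper: computes sizes and addresses for a provided-buffer range.
final class ProvidedBufferLayout {
    // Total bytes the application must make available at addr: len * fd.
    static long totalBytes(int bufferCount, int bufferLen) {
        return (long) bufferCount * bufferLen; // widen first to avoid int overflow
    }

    // Address of the index-th buffer when buffers are laid out back to back from baseAddr.
    static long bufferAddress(long baseAddr, int bufferLen, int index) {
        return baseAddr + (long) index * bufferLen;
    }

    // Assumed: the index-th buffer of the range receives ID startId + index.
    static int bufferId(int startId, int index) {
        return startId + index;
    }

    public static void main(String[] args) {
        int count = 8;            // goes into the SQE fd field
        int len = 4096;           // goes into the SQE len field
        long base = 0x10000L;     // stand-in for a real native address in addr
        System.out.println("bytes needed: " + totalBytes(count, len));
        System.out.println("buffer 3 at: 0x" + Long.toHexString(bufferAddress(base, len, 3)));
    }
}
```

Widening to long before multiplying matters here because both fd and len are 32-bit fields, and their product can exceed the int range for large pools.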
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN- Issue the equivalent of ashutdown(2)system call.fdis the file descriptor to the socket being shutdown andlenmust be set to thehowargument. No other fields should be set.Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
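The two-step completion model of OP_SEND_ZC can be sketched as a small classifier over the CQE flags field. This is plain Java for illustration; the class `ZeroCopyCqe` and its names are hypothetical, and the flag bit values (bit 1 for CQE_F_MORE, bit 3 for CQE_F_NOTIF) are taken from the kernel's io_uring header as an assumption rather than from this class. Treating a first CQE without CQE_F_MORE as the final event is an inference from the description above.

```java
// Hypothetical helper: decides what a CQE for a zerocopy send means for the data buffer.
final class ZeroCopyCqe {
    // Assumed bit values, mirroring IORING_CQE_F_MORE and IORING_CQE_F_NOTIF in the kernel header.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    enum Kind {
        RESULT_NOTIFICATION_PENDING, // first CQE, res holds bytes sent or -errno, buffer still in use
        RESULT_FINAL,                // first CQE with no notification to follow
        NOTIFICATION                 // second CQE, res is 0, buffer may be reused
    }

    static Kind classify(int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            return Kind.NOTIFICATION;
        }
        if ((flags & CQE_F_MORE) != 0) {
            return Kind.RESULT_NOTIFICATION_PENDING;
        }
        return Kind.RESULT_FINAL;
    }

    // The buffer must stay untouched while a notification is still outstanding.
    static boolean bufferReusable(int flags) {
        return classify(flags) != Kind.RESULT_NOTIFICATION_PENDING;
    }

    public static void main(String[] args) {
        System.out.println(classify(CQE_F_MORE));
        System.out.println(classify(CQE_F_NOTIF));
        System.out.println(bufferReusable(CQE_F_MORE));
    }
}
```

Both CQEs carry the same user_data, so an application typically keys its buffer bookkeeping on user_data and releases the buffer only when the notification arrives.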
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_FGETXATTR
public static final byte IORING_OP_FGETXATTR
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
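The sizing rule above (len * fd bytes of memory starting at addr) can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical and not part of LibIOURing, the base address is made up, and the sketch assumes that buffer IDs are assigned consecutively starting from the ID passed in off.

```java
final class ProvidedBufferLayout {
    private ProvidedBufferLayout() {}

    /** Total bytes the application must back at addr: len * fd. */
    static long totalBytes(int bufferCount, int bufferLength) {
        if (bufferCount <= 0 || bufferLength <= 0) {
            throw new IllegalArgumentException("count and length must be positive");
        }
        return Math.multiplyExact((long) bufferCount, (long) bufferLength);
    }

    /** Address of the buffer at 0-based position index within the range. */
    static long bufferAddress(long baseAddress, int bufferLength, int index) {
        return baseAddress + (long) index * bufferLength;
    }

    /** Buffer ID at position index, ASSUMING consecutive IDs from startId. */
    static int bufferId(int startId, int index) {
        return startId + index;
    }

    public static void main(String[] args) {
        int count = 8;          // would go into the SQE fd field
        int length = 4096;      // would go into the SQE len field
        int startId = 100;      // would go into the SQE off field
        long base = 0x10000L;   // hypothetical address for the SQE addr field
        System.out.println("bytes needed: " + totalBytes(count, length));
        for (int i = 0; i < count; i++) {
            System.out.printf("bid=%d addr=0x%x%n",
                    bufferId(startId, i), bufferAddress(base, length, i));
        }
    }
}
```

Keeping the arithmetic in one place makes it harder to hand the kernel a region smaller than count * length bytes.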
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
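The two-step completion model for OP_SEND_ZC can be sketched as a small tracker that answers one question: when may the send buffer be touched again? This is a hedged illustration in plain Java, not part of LibIOURing. The class and method names are hypothetical, and the two flag values are copied from the kernel's io_uring UAPI header as an assumption; real code should use the binding's CQE_F_MORE and CQE_F_NOTIF constants.

```java
import java.util.HashSet;
import java.util.Set;

final class ZeroCopySendTracker {
    // Assumed bit values (mirroring IORING_CQE_F_MORE / IORING_CQE_F_NOTIF).
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    // user_data values whose buffers the kernel may still be holding.
    private final Set<Long> heldByKernel = new HashSet<>();

    /** Record that a zerocopy send with this user_data was submitted. */
    void onSubmit(long userData) {
        heldByKernel.add(userData);
    }

    /**
     * Feed one CQE. Returns true once the buffer may be modified or reused.
     * res is the byte count or negative error on the first CQE and 0 on the
     * notification; it is not needed for lifetime tracking.
     */
    boolean onCqe(long userData, int res, int flags) {
        boolean isNotification = (flags & CQE_F_NOTIF) != 0;
        boolean moreToCome = (flags & CQE_F_MORE) != 0;
        if (isNotification || !moreToCome) {
            // Either the notification arrived, or the first CQE announced
            // that no notification will follow.
            heldByKernel.remove(userData);
            return true;
        }
        return false;
    }

    boolean isHeld(long userData) {
        return heldByKernel.contains(userData);
    }

    public static void main(String[] args) {
        ZeroCopySendTracker tracker = new ZeroCopySendTracker();
        tracker.onSubmit(7L);
        System.out.println(tracker.onCqe(7L, 512, CQE_F_MORE)); // first CQE
        System.out.println(tracker.onCqe(7L, 0, CQE_F_NOTIF));  // notification
    }
}
```

The first CQE reports how much was sent; only the notification (or a first CQE without CQE_F_MORE) releases the buffer.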
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_GETXATTR
public static final byte IORING_OP_GETXATTR
io_uring_op Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
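The three documented completion results of OP_TIMEOUT can be told apart with a small helper. The following is a sketch in plain Java under stated assumptions: the class, enum, and method names are hypothetical and not part of LibIOURing, and the errno numbers are the values used on common Linux architectures such as x86-64 and arm64 (some architectures use different numbers).

```java
final class TimeoutResult {
    private TimeoutResult() {}

    enum Outcome { EXPIRED, EVENTS_COMPLETED, CANCELLED, UNEXPECTED }

    // Assumed errno values (generic Linux numbering).
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    /** Map the res field of a timeout CQE to the documented outcome. */
    static Outcome classify(int res) {
        if (res == 0) {
            return Outcome.EVENTS_COMPLETED; // requested event count reached
        }
        if (res == -ETIME) {
            return Outcome.EXPIRED;          // the timer ran out
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;        // removed before it expired
        }
        return Outcome.UNEXPECTED;           // anything else, e.g. a bad SQE
    }

    public static void main(String[] args) {
        System.out.println(classify(-ETIME));
        System.out.println(classify(0));
        System.out.println(classify(-ECANCELED));
    }
}
```

Note that -ETIME is the normal result of a pure timer (off set to 0), so callers usually should not treat it as a failure.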
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
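The off-by-one between file_index and the fixed file table slot is easy to get wrong, so here is a minimal sketch of the conversion in plain Java. The class and method names are hypothetical and not part of LibIOURing; the sketch also assumes the value fits in a Java int, although the underlying SQE field is an unsigned 32-bit quantity.

```java
final class DirectDescriptorIndex {
    private DirectDescriptorIndex() {}

    /** Value to store in file_index so the file lands in the given slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be >= 0");
        }
        return Math.addExact(slot, 1); // 0 is reserved for "normal file table"
    }

    /** True if this file_index requests installation into the fixed table. */
    static boolean usesFixedTable(int fileIndex) {
        return fileIndex > 0;
    }

    /** Fixed file table slot addressed by a positive file_index. */
    static int slotForFileIndex(int fileIndex) {
        if (fileIndex <= 0) {
            throw new IllegalArgumentException("file_index must be positive");
        }
        return fileIndex - 1;
    }

    public static void main(String[] args) {
        int fileIndex = fileIndexForSlot(0);
        System.out.println("file_index for slot 0: " + fileIndex);
        System.out.println("slot for file_index 5: " + slotForFileIndex(5));
    }
}
```

With a helper like this, slot 0 of the fixed file table remains usable even though a file_index of 0 means "use the normal file table".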
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system calls. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, off holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and addr holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
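Because OP_MSG_RING delivers exactly two values (len arrives as res, off arrives as user_data), a sender and receiver can agree on a tiny message format. The sketch below is plain Java with hypothetical names that are not part of LibIOURing; it assumes the application uses the 32-bit value as a message tag and the 64-bit value as a payload. Since Java's int is signed, the tag is read back as an unsigned quantity where that matters.

```java
final class RingMessage {
    private final int tag;      // would be placed in the SQE len field (32 bits)
    private final long payload; // would be placed in the SQE off field (64 bits)

    RingMessage(int tag, long payload) {
        this.tag = tag;
        this.payload = payload;
    }

    /** Rebuild a message from the res and user_data of the CQE on the target ring. */
    static RingMessage fromCqe(int res, long userData) {
        return new RingMessage(res, userData);
    }

    int sqeLen() { return tag; }

    long sqeOff() { return payload; }

    /** The tag interpreted as an unsigned 32-bit value. */
    long unsignedTag() { return Integer.toUnsignedLong(tag); }

    public static void main(String[] args) {
        RingMessage sent = new RingMessage(0xCAFE, 42L);
        // On the target ring, res == len and user_data == off.
        RingMessage received = RingMessage.fromCqe(sent.sqeLen(), sent.sqeOff());
        System.out.println(received.unsignedTag() + " " + received.sqeOff());
    }
}
```

A receiver that also issues its own requests on the same ring needs some way to tell message CQEs apart from ordinary completions, for example by reserving a range of user_data values; that convention is up to the application.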
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_SOCKET
public static final byte IORING_OP_SOCKET
io_uring_op Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar toOP_FILES_UPDATE. Please note that onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.5.
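The off-by-one between file_index and the fixed file table slot is easy to get wrong. A minimal sketch of the conversion follows, with hypothetical helper names that are not part of this binding:

```java
/** Hypothetical helpers for the file_index <-> fixed file table slot mapping. */
final class DirectSlotSketch {
    /** Value to store in file_index so that the file lands in the given table slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0 || slot == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
        return slot + 1; // the file is placed at index file_index - 1
    }

    /** Table slot addressed by a positive file_index. */
    static int slotForFileIndex(int fileIndex) {
        if (fileIndex <= 0) {
            // Only a positive file_index selects the fixed file table.
            throw new IllegalArgumentException("file_index must be positive: " + fileIndex);
        }
        return fileIndex - 1;
    }

    public static void main(String[] args) {
        System.out.println(fileIndexForSlot(0));
        System.out.println(slotForFileIndex(5));
    }
}
```

Keeping the conversion in one place avoids mixing zero-based slots with the one-based value that the SQE field expects.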
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request was found and cancelled, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If the request was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, off holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and addr holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
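The memory layout implied by OP_PROVIDE_BUFFERS can be sketched with plain arithmetic. The helper names below are hypothetical, and the per-buffer address calculation assumes that buffer IDs are assigned consecutively from the starting ID given in off, which is how the description above reads.

```java
/** Hypothetical arithmetic helpers for an OP_PROVIDE_BUFFERS range. */
final class ProvidedBuffersSketch {
    /** Bytes the application must make available at addr: len * nbufs (nbufs goes in the fd field). */
    static long totalBytes(int len, int nbufs) {
        if (len <= 0 || nbufs <= 0) {
            throw new IllegalArgumentException("len and nbufs must be positive");
        }
        return (long) len * nbufs; // widen first to avoid int overflow
    }

    /** Address of the buffer with ID bid, assuming IDs run consecutively from startBid. */
    static long bufferAddress(long addr, int len, int nbufs, int startBid, int bid) {
        int index = bid - startBid;
        if (index < 0 || index >= nbufs) {
            throw new IllegalArgumentException("buffer ID outside the provided range: " + bid);
        }
        return addr + (long) index * len;
    }

    public static void main(String[] args) {
        // Example: 8 buffers of 4096 bytes each, IDs 100..107, at an arbitrary base address.
        long base = 0x10000L;
        System.out.println(totalBytes(4096, 8));
        System.out.println(Long.toHexString(bufferAddress(base, 4096, 8, 100, 102)));
    }
}
```

A request submitted with IOSQE_BUFFER_SELECT reports which buffer the kernel picked, and a calculation like bufferAddress is one way to map that ID back to application memory.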
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN- Issue the equivalent of ashutdown(2)system call.fdis the file descriptor to the socket being shutdown andlenmust be set to thehowargument. No other fields should be set.Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe is likely to contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
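The two-step completion model of OP_SEND_ZC can be illustrated by classifying each CQE from its flags field. This is only a sketch: the class is hypothetical, and the flag values are copied by hand from the kernel UAPI header (IORING_CQE_F_MORE and IORING_CQE_F_NOTIF) rather than read from this binding.

```java
/** Hypothetical classifier for completions of a zerocopy send. */
final class SendZcCqeSketch {
    // Values as defined in the kernel's io_uring UAPI header; stated here as an assumption.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    enum Kind {
        RESULT_NOTIFICATION_PENDING, // first CQE; a notification will follow
        RESULT_FINAL,                // first CQE; no notification will follow
        NOTIFICATION                 // second CQE; the buffer may be reused
    }

    static Kind classify(int cqeFlags) {
        if ((cqeFlags & CQE_F_NOTIF) != 0) {
            return Kind.NOTIFICATION;
        }
        return (cqeFlags & CQE_F_MORE) != 0
                ? Kind.RESULT_NOTIFICATION_PENDING
                : Kind.RESULT_FINAL;
    }

    /** True once the application may modify or free the send buffer. */
    static boolean bufferReusable(int cqeFlags) {
        return classify(cqeFlags) != Kind.RESULT_NOTIFICATION_PENDING;
    }

    public static void main(String[] args) {
        System.out.println(classify(CQE_F_MORE));
        System.out.println(classify(CQE_F_NOTIF));
        System.out.println(bufferReusable(CQE_F_MORE));
    }
}
```

Both completions carry the same user_data, so an application would typically keep the buffer pinned under that key until bufferReusable reports true.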
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_URING_CMD
public static final byte IORING_OP_URING_CMD
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
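For the multi shot poll mode described under OP_POLL_ADD, the application decides whether a poll is still armed by inspecting CQE_F_MORE on each completion. The sketch below uses hypothetical names, a flag value copied by hand from the kernel UAPI header, and the Linux value of POLLIN; none of these are taken from this binding.

```java
/** Hypothetical helpers for handling OP_POLL_ADD completions. */
final class PollCompletionSketch {
    static final int CQE_F_MORE = 1 << 1; // IORING_CQE_F_MORE, assumed from the kernel header
    static final int POLLIN = 0x0001;     // Linux <poll.h> value, assumed

    /** True if the kernel will post further CQEs for this poll request. */
    static boolean stillArmed(int cqeFlags) {
        return (cqeFlags & CQE_F_MORE) != 0;
    }

    /** True if the completion result is an event mask that includes POLLIN. */
    static boolean readable(int res) {
        return res >= 0 && (res & POLLIN) != 0; // negative res is an errno, not a mask
    }

    public static void main(String[] args) {
        int res = POLLIN;
        int flags = 0; // one shot mode, or a multi shot poll that has terminated
        if (readable(res)) {
            System.out.println("fd is readable");
        }
        if (!stillArmed(flags)) {
            System.out.println("poll must be resubmitted to keep watching");
        }
    }
}
```

In one shot mode CQE_F_MORE is never set, so the same check covers both modes.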
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, and the request effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout completed through expiration of the timer, or 0 if it completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed while the system was suspended with a timeout request in flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE- Iftimeout_flagsare zero, then it attempts to remove an existing timeout operation.addrmust contain theuser_datafield of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of-EBUSY. If the timeout request wasn't found, the request will terminate with a result value of-ENOENT.Available since 5.5.
If
timeout_flagscontainTIMEOUT_UPDATE, instead of removing an existing operation, it updates it.addrand return values are same as before.addr2field must contain a pointer to astruct timespec64structure.timeout_flagsmay also containTIMEOUT_ABS, in which case the value given is an absolute one, not a relative one.Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar toOP_FILES_UPDATE. Please note that onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.5.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request was found and cancelled, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If the request was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, off holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and addr holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
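The two-step completion model described for OP_SEND_ZC in the list above can be sketched as a small decision helper. This is only an illustration and not part of LibIOURing: the class and method names are hypothetical, and the two flag values are assumed to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF from the kernel's io_uring.h (real code would use the constants exposed by the binding). It also assumes that a first CQE without CQE_F_MORE means no notification will follow.

```java
// Hypothetical helper, not part of LibIOURing.
final class ZeroCopySendCompletion {
    // Assumed to match IORING_CQE_F_MORE and IORING_CQE_F_NOTIF in the kernel header.
    static final int CQE_F_MORE  = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    /** True if this CQE is the second-step notification of a zerocopy send. */
    static boolean isNotification(int flags) {
        return (flags & CQE_F_NOTIF) != 0;
    }

    /**
     * True once the data buffer may be modified again: either the notification
     * has arrived, or the first CQE did not announce one (assumption: no
     * CQE_F_MORE means the kernel no longer holds the buffer).
     */
    static boolean bufferReusable(int flags) {
        return isNotification(flags) || (flags & CQE_F_MORE) == 0;
    }

    /** Bytes sent as reported by the first CQE, or -1 if this CQE carries no byte count. */
    static int bytesSent(int res, int flags) {
        return (!isNotification(flags) && res >= 0) ? res : -1;
    }
}
```

A completion loop would call bufferReusable(flags) for every CQE that carries the send's user_data and only recycle the buffer once it returns true; the notification itself says nothing about delivery to the peer.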
IORING_OP_SEND_ZC
public static final byte IORING_OP_SEND_ZC
io_uring_op Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If the request was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and an attempt was made to cancel the linked request, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, off is an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
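The memory layout implied by the OP_PROVIDE_BUFFERS entry in the list above (fd buffers of len bytes each, starting at addr, with IDs counting up from off) can be worked through in a few lines of arithmetic. This is a sketch under stated assumptions and not part of LibIOURing: the class and method names are hypothetical, and it assumes the buffers are packed back to back with consecutive IDs, as the description suggests.

```java
// Hypothetical helper, not part of LibIOURing.
final class ProvidedBufferLayout {
    /** Total memory the application must supply: len * fd, where fd holds the buffer count. */
    static long requiredBytes(int count, int len) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping.
        return Math.multiplyExact((long) count, (long) len);
    }

    /** Address of the index-th buffer (0-based) in a region starting at base (the SQE addr). */
    static long bufferAddress(long base, int len, int index) {
        return base + (long) index * len;
    }

    /** Buffer ID of the index-th buffer, given the starting ID passed in off. */
    static int bufferId(int startId, int index) {
        return startId + index;
    }
}
```

For example, providing 8 buffers of 4096 bytes each requires requiredBytes(8, 4096) bytes at addr, and with a starting ID of 10 the fourth buffer is the one with ID bufferId(10, 3).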
IORING_OP_SENDMSG_ZC
public static final byte IORING_OP_SENDMSG_ZC
io_uring_op Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If the request was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK, which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and an attempt was made to cancel the linked request, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT- Issue the equivalent of aopenat(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,open_flagsshould contain any flags passed in, andlenis access mode of the file. See alsoopenat(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, this command can be used to close files that were direct opened throughOP_OPENAT,OP_OPENAT2, orOP_ACCEPTusing theio_uringspecific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.OP_FILES_UPDATE- This command is an alternative to usingREGISTER_FILES_UPDATEwhich then works in an async fashion, like the rest of theio_uringcommands.The arguments passed in are the same.
addrmust contain a pointer to the array of file descriptors,lenmust contain the length of the array, andoffmust contain the offset at which to operate. Note that the array of file descriptors pointed to inaddrmust remain valid until this operation has completed.Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the
file_indexfield is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at indexfile_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with-EBADF. Onlyio_uringhas access to such files and no other syscall can use them. SeeIOSQE_FIXED_FILEandREGISTER_FILES.Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, off holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and addr holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN- Issue the equivalent of ashutdown(2)system call.fdis the file descriptor to the socket being shutdown andlenmust be set to thehowargument. No other fields should be set.Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING- Send a message to an io_uring.fdmust be set to a file descriptor of a ring that the application has access to,lencan be set to any 32-bit value that the application wishes to pass on, andoffshould be set any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with theresfield matching thelenset, and auser_datafield matching theoffvalue being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields.Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe is likely to contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
IORING_OP_READ_MULTISHOT
public static final byte IORING_OP_READ_MULTISHOT
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
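For the multi-shot mode of OP_POLL_ADD described above, the application decides whether a poll is still armed by testing CQE_F_MORE in each CQE's flags. The following is a minimal sketch of that check in plain Java, not LWJGL code: the class name and the sample flag sequence are hypothetical, and the constant value is assumed to match IORING_CQE_F_MORE (1 << 1) from the Linux UAPI header.

```java
final class PollCqeFlags {
    // Assumption: mirrors IORING_CQE_F_MORE (1 << 1) from the Linux UAPI header.
    static final int CQE_F_MORE = 1 << 1;

    /** Returns true if further CQEs are expected from the same multi-shot poll SQE. */
    static boolean stillArmed(int cqeFlags) {
        return (cqeFlags & CQE_F_MORE) != 0;
    }

    public static void main(String[] args) {
        // Hypothetical sequence of CQE flags for one multi-shot poll request.
        int[] flagsSeen = {CQE_F_MORE, CQE_F_MORE, 0};
        for (int flags : flagsSeen) {
            System.out.println(stillArmed(flags)
                ? "more CQEs expected"
                : "poll terminated; resubmit to keep polling");
        }
    }
}
```

In a real completion loop the flags value would come from the CQE structure rather than from an array.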
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
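The three completion outcomes of OP_TIMEOUT listed above can be told apart from the CQE res field alone. The sketch below shows one way to classify them in plain Java. The class and enum names are hypothetical, and the errno values are the usual x86-64 Linux numbers, assumed here for illustration.

```java
final class TimeoutResult {
    // Assumption: errno values as defined on x86-64 Linux.
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome { TIMER_EXPIRED, EVENT_COUNT_REACHED, CANCELLED, OTHER_ERROR }

    /** Maps the res field of an OP_TIMEOUT completion to one of the documented outcomes. */
    static Outcome classify(int res) {
        if (res == -ETIME) return Outcome.TIMER_EXPIRED;       // timer ran out
        if (res == 0) return Outcome.EVENT_COUNT_REACHED;      // requested completions arrived first
        if (res == -ECANCELED) return Outcome.CANCELLED;       // removed before expiring
        return Outcome.OTHER_ERROR;                            // e.g. invalid arguments
    }

    public static void main(String[] args) {
        System.out.println(classify(-ETIME));
    }
}
```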
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
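Because file_index is offset by one relative to the fixed file table slot, it is easy to be off by one when filling in the SQE. The sketch below shows the conversion in both directions; it is plain Java with hypothetical helper names, not part of the LWJGL API, and it treats a file_index of 0 as "use the normal file table", as the text above implies.

```java
final class DirectDescriptor {
    /** Value to store in file_index so that the file lands in the given fixed-table slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be >= 0");
        }
        return Math.addExact(slot, 1); // slot 0 is requested with file_index 1
    }

    /** Fixed-table slot selected by a file_index, or -1 when the normal file table is used. */
    static int slotForFileIndex(int fileIndex) {
        if (fileIndex < 0) {
            throw new IllegalArgumentException("file_index must be >= 0");
        }
        return fileIndex - 1; // file_index 0 yields -1: no direct descriptor requested
    }

    public static void main(String[] args) {
        System.out.println(fileIndexForSlot(0));
    }
}
```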
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request was found and cancelled, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
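A caller usually needs to translate the result codes of OP_ASYNC_CANCEL into a decision about whether the target request may still complete. The following plain-Java sketch illustrates that mapping. The names are hypothetical and the errno values are the usual x86-64 Linux numbers, assumed for illustration.

```java
final class CancelResult {
    // Assumption: errno values as defined on x86-64 Linux.
    static final int ENOENT = 2;
    static final int EALREADY = 114;

    /** Returns a short description of an OP_ASYNC_CANCEL completion result. */
    static String describe(int res) {
        if (res == 0) return "cancelled";
        if (res == -ENOENT) return "not found";
        if (res == -EALREADY) return "cancellation attempted; request may still complete";
        return "error " + (-res);
    }

    /** True if the target request may still post its own CQE with a non-cancelled result. */
    static boolean targetMayStillComplete(int res) {
        return res == -EALREADY;
    }

    public static void main(String[] args) {
        System.out.println(describe(-EALREADY));
    }
}
```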
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and an attempt was made to cancel the linked request, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, off holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and addr holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, off is an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
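The sizing rule above (len * fd bytes starting at addr) can be made concrete with a little arithmetic. The sketch below is plain Java with hypothetical helper names. It computes the total memory to provide and the address of each buffer in the range. The bufferId helper assumes buffer IDs are assigned sequentially from the starting ID in off, which the text above implies but does not state outright.

```java
final class ProvidedBufferLayout {
    /** Total bytes the application must make available at addr: len * number of buffers. */
    static long totalBytes(int bufferCount, int bufferLen) {
        return Math.multiplyExact((long) bufferCount, (long) bufferLen);
    }

    /** Address of the index-th buffer in the range that starts at addr. */
    static long bufferAddress(long addr, int bufferLen, int index) {
        return addr + (long) index * bufferLen;
    }

    /** Buffer ID of the index-th buffer, assuming sequential IDs from startId. */
    static int bufferId(int startId, int index) {
        return startId + index;
    }

    public static void main(String[] args) {
        // Hypothetical pool: 8 buffers of 4096 bytes each.
        System.out.println(totalBytes(8, 4096));
    }
}
```

Keeping one buffer size per group, as the text describes, is what makes this fixed-stride addressing possible.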
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor to the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe is likely to contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
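The two-step completion model of OP_SEND_ZC means the application has to track, per request, when the send result arrived and when the buffer may be reused. The following is a minimal plain-Java sketch of such bookkeeping, not LWJGL code. The class name is hypothetical, and the flag values are assumed to match IORING_CQE_F_MORE (1 << 1) and IORING_CQE_F_NOTIF (1 << 3) from the Linux UAPI header. It also assumes that a first CQE without CQE_F_MORE means no notification will follow, so the buffer is free at that point.

```java
final class ZeroCopySendTracker {
    // Assumptions: values mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF in the Linux UAPI header.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    private boolean resultSeen;
    private boolean bufferReleased;
    private int result;

    /** Feed every CQE whose user_data identifies this zerocopy send. */
    void onCqe(int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            // Notification CQE: the kernel no longer references the buffer.
            bufferReleased = true;
            return;
        }
        // First CQE: carries the bytes sent or a negative error code.
        resultSeen = true;
        result = res;
        if ((flags & CQE_F_MORE) == 0) {
            // Assumed: without CQE_F_MORE no notification follows.
            bufferReleased = true;
        }
    }

    boolean resultSeen() { return resultSeen; }
    int result() { return result; }

    /** True once the data buffer may be modified or freed. */
    boolean bufferReusable() { return bufferReleased; }
}
```

Note that bufferReusable() says nothing about delivery to the peer; it only reflects buffer lifetime, as the text above points out.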
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_WAITID
public static final byte IORING_OP_WAITID
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2).
Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multishot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
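The three OP_TIMEOUT completion results described above can be told apart from the CQE res field alone. The following is a minimal sketch, not part of this binding: the class, enum and method names are hypothetical, and the errno values (ETIME = 62, ECANCELED = 125) are the ones used on most Linux architectures and are assumed here.

```java
final class TimeoutResultSketch {
    // Assumed Linux errno values; they appear negated in cqe.res.
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome { TIMER_EXPIRED, EVENT_COUNT_REACHED, CANCELLED, OTHER_ERROR }

    /** Maps the res field of an OP_TIMEOUT completion to one of the documented outcomes. */
    static Outcome classify(int res) {
        if (res == 0) return Outcome.EVENT_COUNT_REACHED; // requests completed on their own
        if (res == -ETIME) return Outcome.TIMER_EXPIRED;  // the timer ran out
        if (res == -ECANCELED) return Outcome.CANCELLED;  // removed before it expired
        return Outcome.OTHER_ERROR;                       // anything else is unexpected here
    }

    public static void main(String[] args) {
        System.out.println(classify(-ETIME));
        System.out.println(classify(0));
        System.out.println(classify(-ECANCELED));
    }
}
```

A pure timer (off set to 0) would normally be expected to end in TIMER_EXPIRED or CANCELLED, since completed events are not counted in that case.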
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
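The off-by-one between file_index and the fixed file table slot is easy to get wrong, so a small sketch may help. It only models the arithmetic and the result interpretation described above; the class and method names are hypothetical and nothing here calls into io_uring.

```java
final class DirectDescriptorSketch {
    /** Value to store in the SQE file_index field in order to target a fixed-file slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0) throw new IllegalArgumentException("slot must be >= 0");
        return Math.addExact(slot, 1); // 0 is reserved for "use the normal file table"
    }

    /** Fixed-file slot selected by a positive file_index. */
    static int slotForFileIndex(int fileIndex) {
        if (fileIndex <= 0) throw new IllegalArgumentException("file_index must be positive");
        return fileIndex - 1;
    }

    /** Reads cqe.res according to whether the request asked for a direct descriptor. */
    static String describeResult(int fileIndex, int res) {
        if (res < 0) return "error, errno " + (-res);
        if (fileIndex > 0) return "installed in fixed slot " + slotForFileIndex(fileIndex);
        return "new file descriptor " + res;
    }

    public static void main(String[] args) {
        System.out.println(describeResult(fileIndexForSlot(2), 0));
        System.out.println(describeResult(0, 7));
    }
}
```

The same mapping applies to the direct variants of OP_OPENAT and OP_OPENAT2, which use the file_index field in the same way.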
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below.
Unlike OP_TIMEOUT, OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
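The difference between an explicit offset and an offset of -1 can be illustrated without io_uring at all. The sketch below is only an analogy using java.nio.channels.FileChannel from the Java standard library: a positional read behaves like pread(2) and leaves the file position alone, while a plain read uses and advances it, like read(2). The class name and the temporary file are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class OffsetSemanticsSketch {
    /** Returns { bytes read at explicit offset 4, bytes read at the file position, position before->after }. */
    static String[] demo() {
        try {
            Path tmp = Files.createTempFile("offset-sketch", ".txt");
            try {
                Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    ByteBuffer explicit = ByteBuffer.allocate(3);
                    ch.read(explicit, 4);             // like off = 4: file position is not touched
                    long before = ch.position();
                    ByteBuffer implicit = ByteBuffer.allocate(3);
                    ch.read(implicit);                // like off = -1: uses and advances the position
                    return new String[] { text(explicit), text(implicit), before + "->" + ch.position() };
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String text(ByteBuffer buf) {
        buf.flip();
        return StandardCharsets.US_ASCII.decode(buf).toString();
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", demo()));
    }
}
```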
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, off is an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data.
Using buffers in this manner can eliminate the need to split an operation into a poll followed by a read, a split that otherwise provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
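The layout implied by these fields is simple arithmetic: buffer i of the range starts at addr + i * len and carries buffer ID off + i, and the whole range needs len * fd bytes. A minimal sketch of that arithmetic follows; the class and method names are hypothetical and the addresses are plain numbers, not real pointers.

```java
final class ProvidedBufferLayoutSketch {
    /** Total bytes the application must make available at addr: len * fd. */
    static long totalBytes(int nbufs, int len) {
        return Math.multiplyExact((long) nbufs, (long) len);
    }

    /** Start address of the index-th buffer in the range beginning at addr. */
    static long bufferAddress(long addr, int len, int index) {
        return addr + (long) index * len;
    }

    /** Buffer ID of the index-th buffer, given the starting buffer ID passed in off. */
    static int bufferId(int startId, int index) {
        return startId + index;
    }

    public static void main(String[] args) {
        int nbufs = 8;        // goes into the SQE fd field
        int len = 4096;       // goes into the SQE len field
        long addr = 0x10000L; // illustrative base address
        System.out.println(totalBytes(nbufs, len));                          // 32768
        System.out.println(Long.toHexString(bufferAddress(addr, len, 3)));   // 13000
        System.out.println(bufferId(100, 3));                                // 103
    }
}
```

Keeping one group per buffer size, as the text suggests, means each group can be described by a single (addr, len, fd, off) tuple like the one above.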
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor to the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call.
Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe may contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two-step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
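The two-step completion model of OP_SEND_ZC described above comes down to one rule for the application: keep the send buffer untouched until the CQE carrying CQE_F_NOTIF arrives, unless the first CQE did not announce a notification at all. Below is a minimal bookkeeping sketch of that rule. It does not touch io_uring; the class and method names are hypothetical, and the flag values are assumed to match the kernel's IORING_CQE_F_MORE (bit 1) and IORING_CQE_F_NOTIF (bit 3).

```java
import java.util.HashSet;
import java.util.Set;

final class ZeroCopySendTrackerSketch {
    // Assumed to mirror the kernel's CQE flag bits.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    // user_data values whose buffers may still be referenced by the kernel.
    private final Set<Long> busy = new HashSet<>();

    /** Call when a zerocopy send with this user_data has been submitted. */
    void submitted(long userData) {
        busy.add(userData);
    }

    /** Feeds one CQE; returns true if the buffer for userData may be modified or reused afterwards. */
    boolean onCqe(long userData, int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {   // second CQE: kernel is done with the buffer
            busy.remove(userData);
            return true;
        }
        if ((flags & CQE_F_MORE) != 0) {    // first CQE: res is bytes sent or -errno, notification pending
            return false;
        }
        busy.remove(userData);              // first CQE without CQE_F_MORE: no notification will follow
        return true;
    }

    boolean isBusy(long userData) {
        return busy.contains(userData);
    }

    public static void main(String[] args) {
        ZeroCopySendTrackerSketch t = new ZeroCopySendTrackerSketch();
        t.submitted(42L);
        System.out.println(t.onCqe(42L, 1024, CQE_F_MORE)); // false: keep the buffer alive
        System.out.println(t.onCqe(42L, 0, CQE_F_NOTIF));   // true: buffer can be reused
    }
}
```

Note that a true return only says the buffer is free again; as the text points out, it says nothing about whether the peer has received the data.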
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_FUTEX_WAIT
public static final byte IORING_OP_FUTEX_WAIT
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2).
Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multishot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below.
Unlike OP_TIMEOUT, OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, off holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and addr holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN- Issue the equivalent of ashutdown(2)system call.fdis the file descriptor to the socket being shutdown andlenmust be set to thehowargument. No other fields should be set.Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_FUTEX_WAKE
public static final byte IORING_OP_FUTEX_WAKE
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG- Issue the equivalent of asendmsg(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to themsghdrstructure, andmsg_flagsholds the flags associated with the system call. See alsosendmsg(2)for the general description of the related system call.Available since 5.3.
OP_RECVMSG- Works just likeOP_SENDMSG, except forrecvmsg(2)instead. See the description ofIORING_OP_SENDMSG.Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
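The three outcomes of an OP_TIMEOUT request described above can be told apart from the CQE res value alone. The following is a minimal sketch, not part of LibIOURing: the class name TimeoutResult and its method are hypothetical, and the errno numbers are the values used on common Linux architectures such as x86-64 and arm64 (an assumption; a few architectures number them differently).

```java
final class TimeoutResult {
    // Linux errno values on x86-64/arm64 (assumption: other architectures may differ).
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    /** Maps the res field of a timeout CQE to the condition that completed it. */
    static String describe(int res) {
        if (res == 0) {
            return "completion count reached";   // requests completed on their own
        }
        if (res == -ETIME) {
            return "timer expired";
        }
        if (res == -ECANCELED) {
            return "cancelled before expiry";
        }
        return "unexpected result " + res;
    }

    public static void main(String[] args) {
        for (int res : new int[] {0, -ETIME, -ECANCELED}) {
            System.out.println(res + " -> " + describe(res));
        }
    }
}
```

Note that -ETIME is the normal outcome for a pure timer (off set to 0), so callers would usually not treat it as a failure.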
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT- Issue the equivalent of anaccept4(2)system call.fdmust be set to the socket file descriptor,addrmust contain the pointer to thesockaddrstructure, andaddr2must contain a pointer to thesocklen_taddrlenfield. Flags can be passed using theaccept_flagsfield. See alsoaccept4(2)for the general description of the related system call.Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15, where direct descriptors were introduced.
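Because file_index is offset by one (zero means "use the normal file table"), it is easy to be off by one when choosing a fixed-table slot. A small sketch of the conversion follows; DirectDescriptor and both method names are hypothetical helpers for illustration, not LibIOURing API.

```java
final class DirectDescriptor {
    /** Value to store in the SQE file_index field so the file lands in the given fixed-table slot. */
    static int fileIndexForSlot(int slot) {
        if (slot < 0 || slot == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
        return slot + 1; // 0 is reserved for "install into the normal file table"
    }

    /** Fixed-table slot addressed by a positive file_index. */
    static int slotForFileIndex(int fileIndex) {
        if (fileIndex <= 0) {
            throw new IllegalArgumentException("file_index must be positive for direct descriptors");
        }
        return fileIndex - 1;
    }

    public static void main(String[] args) {
        int fileIndex = fileIndexForSlot(0);
        System.out.println("slot 0 is requested with file_index " + fileIndex);
        System.out.println("file_index 5 refers to slot " + slotForFileIndex(5));
    }
}
```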
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If the request was found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT- Issue the equivalent of aconnect(2)system call.fdmust be set to the socket file descriptor,addrmust contain the const pointer to thesockaddrstructure, andoffmust contain thesocklen_taddrlenfield. See alsoconnect(2)for the general description of the related system call.Available since 5.5.
OP_FALLOCATE- Issue the equivalent of afallocate(2)system call.fdmust be set to the file descriptor,lenmust contain the mode associated with the operation,offmust contain the offset on which to operate, andaddrmust contain the length. See alsofallocate(2)for the general description of the related system call.Available since 5.6.
OP_OPENAT- Issue the equivalent of aopenat(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,open_flagsshould contain any flags passed in, andlenis access mode of the file. See alsoopenat(2)for the general description of the related system call.Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE- Issue the equivalent of aclose(2)system call.fdis the file descriptor to be closed. See alsoclose(2)for the general description of the related system call.Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX- Issue the equivalent of astatx(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnamestring,statx_flagsis theflagsargument,lenshould be themaskargument, andoffmust contain a pointer to thestatxbufto be filled in. See alsostatx(2)for the general description of the related system call.Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
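The difference between an explicit off and off set to -1 mirrors the difference between pread(2) and read(2). As an analogy only (this sketch does not use io_uring or LibIOURing at all), the standard java.nio FileChannel exposes the same two behaviours: a positional read leaves the channel position untouched, while a plain read uses and advances it. The class name OffsetSemantics is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class OffsetSemantics {
    /** Reads 3 bytes at an explicit offset, then 3 bytes at the current position. */
    static String demo() {
        try {
            Path tmp = Files.createTempFile("offset-demo", ".txt");
            try {
                Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                    ByteBuffer explicit = ByteBuffer.allocate(3);
                    ch.read(explicit, 4);              // like off = 4: position is not moved
                    long afterExplicit = ch.position();

                    ByteBuffer implicit = ByteBuffer.allocate(3);
                    ch.read(implicit);                 // like off = -1: uses and advances position
                    long afterImplicit = ch.position();

                    return text(explicit) + "@" + afterExplicit + " " + text(implicit) + "@" + afterImplicit;
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String text(ByteBuffer buf) {
        return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The returned string pairs each chunk with the channel position observed after the read, which makes the "advance" side effect of the second read visible.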
OP_WRITE- SeeOP_READ.OP_FADVISE- Issue the equivalent of aposix_fadvise(2)system call.fdmust be set to the file descriptor,offmust contain the offset on which to operate,lenmust contain the length, andfadvise_advicemust contain the advice associated with the operation. See alsoposix_fadvise(2)for the general description of the related system call.Available since 5.6.
OP_MADVISE- Issue the equivalent of amadvise(2)system call.addrmust contain the address to operate on,lenmust contain the length on which to operate, andfadvise_advicemust contain the advice associated with the operation. See alsomadvise(2)for the general description of the related system call.Available since 5.6.
OP_SEND- Issue the equivalent of asend(2)system call.fdmust be set to the socket file descriptor,addrmust contain a pointer to the buffer,lendenotes the length of the buffer to send, andmsg_flagsholds the flags associated with the system call. See alsosend(2)for the general description of the related system call.Available since 5.6.
OP_RECV- Works just likeOP_SEND, except forrecv(2)instead. See the description ofIORING_OP_SEND.Available since 5.6.
OP_OPENAT2- Issue the equivalent of aopenat2(2)system call.fdis thedirfdargument,addrmust contain a pointer to the*pathnameargument,lenshould contain the size of theopen_howstructure, andoffshould be set to the address of theopen_howstructure. See alsoopenat2(2)for the general description of the related system call.Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, off holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and addr holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE- Issue the equivalent of asplice(2)system call.splice_fd_inis the file descriptor to read from,splice_off_inis an offset to read from,fdis the file descriptor to write to,offis an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of aNULLfor the offsets tosplice(2).lencontains the number of bytes to copy.splice_flagscontains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See alsosplice(2)for the general description of the related system call.Available since 5.7.
OP_PROVIDE_BUFFERS- This command allows an application to register a group of buffers to be used by commands that read/receive data.Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receive. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fdmust contain the number of buffers to provide,addrmust contain the starting address to add buffers from,lenmust contain the length of each buffer to add from the range,buf_groupmust contain the group ID of this range of buffers, andoffmust contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address inaddr, each with a length oflen. Hence the application should providelen * fdworth of memory inaddr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, theIOSQE_BUFFER_SELECTflag must be set, andbuf_groupmust be set to the desired buffer group ID where the buffer should be selected from.Available since 5.7.
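The sizing rule for OP_PROVIDE_BUFFERS (len * fd bytes starting at addr) can be sketched as plain arithmetic. The helper below is hypothetical and not part of LibIOURing; it also assumes that buffer IDs are assigned consecutively starting from the value in off, which is how the description above reads.

```java
final class ProvidedBuffers {
    /** Total bytes the application must make available at addr: len * number of buffers. */
    static long totalBytes(int len, int count) {
        if (len <= 0 || count <= 0) {
            throw new IllegalArgumentException("len and count must be positive");
        }
        return Math.multiplyExact((long) len, (long) count);
    }

    /**
     * Byte offset from addr of the buffer with the given ID, assuming IDs run
     * consecutively from startBid (assumption, see the lead-in).
     */
    static long offsetOf(int len, int count, int startBid, int bid) {
        int index = bid - startBid;
        if (index < 0 || index >= count) {
            throw new IllegalArgumentException("buffer ID " + bid + " is outside the provided range");
        }
        return (long) index * len;
    }

    public static void main(String[] args) {
        int len = 4096, count = 8, startBid = 100;
        System.out.println("bytes to allocate: " + totalBytes(len, count));
        System.out.println("offset of buffer 103: " + offsetOf(len, count, startBid, 103));
    }
}
```

Using long arithmetic with Math.multiplyExact avoids a silent int overflow when many large buffers are provided at once.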
OP_REMOVE_BUFFERS- Remove buffers previously registered withOP_PROVIDE_BUFFERS.fdmust contain the number of buffers to remove, andbuf_groupmust contain the buffer group ID from which to remove the buffers.Available since 5.7.
OP_TEE- Issue the equivalent of atee(2)system call.splice_fd_inis the file descriptor to read from,fdis the file descriptor to write to,lencontains the number of bytes to copy, andsplice_flagscontains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See alsotee(2)for the general description of the related system call.Available since 5.8.
OP_SHUTDOWN- Issue the equivalent of ashutdown(2)system call.fdis the file descriptor to the socket being shutdown andlenmust be set to thehowargument. No other fields should be set.Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
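The two-step completion model of OP_SEND_ZC means a consumer has to look at the CQE flags to decide whether the send buffer may be touched again. The sketch below is illustrative only: ZeroCopyCompletions and its members are hypothetical names, the bit values are those of IORING_CQE_F_MORE and IORING_CQE_F_NOTIF in the Linux UAPI header, and treating a first CQE without CQE_F_MORE as "no notification will follow" is a reading of the text above rather than a documented guarantee of this library.

```java
final class ZeroCopyCompletions {
    // Bit values as defined in the Linux UAPI header linux/io_uring.h.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    enum Kind {
        RESULT_NOTIFICATION_PENDING, // res holds bytes sent or -errno; buffer still owned by the kernel
        RESULT_FINAL,                // res holds bytes sent or -errno; no notification expected
        NOTIFICATION                 // res is 0; buffer lifetime has ended
    }

    /** Classifies a CQE that belongs to an OP_SEND_ZC request from its flags field. */
    static Kind classify(int cqeFlags) {
        if ((cqeFlags & CQE_F_NOTIF) != 0) {
            return Kind.NOTIFICATION;
        }
        return (cqeFlags & CQE_F_MORE) != 0 ? Kind.RESULT_NOTIFICATION_PENDING : Kind.RESULT_FINAL;
    }

    /** True once the application may modify or free the send buffer. */
    static boolean bufferReusable(int cqeFlags) {
        return classify(cqeFlags) != Kind.RESULT_NOTIFICATION_PENDING;
    }

    public static void main(String[] args) {
        System.out.println(classify(CQE_F_MORE) + " reusable=" + bufferReusable(CQE_F_MORE));
        System.out.println(classify(CQE_F_NOTIF) + " reusable=" + bufferReusable(CQE_F_NOTIF));
    }
}
```

Since both CQEs carry the same user_data, an application would typically keep the buffer keyed by user_data and release it only when bufferReusable returns true.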
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_FUTEX_WAITV
public static final byte IORING_OP_FUTEX_WAITV
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
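The CQE_F_MORE rule for multi shot polls described under OP_POLL_ADD can be sketched as a small check applied to each completion. This is only an illustration: the class PollCompletion and its methods are hypothetical and not part of LibIOURing, and the flag value is assumed to mirror IORING_CQE_F_MORE (1 << 1) from the kernel's io_uring header.

```java
// Hypothetical helper for illustration; not part of LibIOURing.
final class PollCompletion {
    // Assumption: same bit as IORING_CQE_F_MORE in the kernel header.
    static final int CQE_F_MORE = 1 << 1;

    /** True if the poll request is still armed and further CQEs will follow. */
    static boolean stillArmed(int cqeFlags) {
        return (cqeFlags & CQE_F_MORE) != 0;
    }

    /**
     * True if a new OP_POLL_ADD has to be submitted to keep polling.
     * One shot polls never set CQE_F_MORE; a multi shot poll clears it
     * on the final CQE, after which no further events are generated.
     */
    static boolean needsResubmit(int cqeFlags) {
        return !stillArmed(cqeFlags);
    }
}
```

A completion loop would typically call needsResubmit on every poll CQE and queue a fresh SQE only when it returns true.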
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
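The three completion results listed for OP_TIMEOUT can be told apart with a simple mapping on the CQE res field. The sketch below is an assumption-laden illustration: TimeoutResult and Outcome are hypothetical names, and the errno numbers are the ones used by Linux on x86-64 and other architectures that follow the generic errno table (some architectures use different values).

```java
// Hypothetical helper for illustration; not part of LibIOURing.
final class TimeoutResult {
    // Assumption: generic Linux errno numbering (x86-64, arm64, ...).
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome { TIMER_EXPIRED, EVENT_COUNT_REACHED, CANCELLED, OTHER_ERROR }

    /** Maps the res field of an OP_TIMEOUT completion to its meaning. */
    static Outcome classify(int res) {
        if (res == -ETIME) {
            return Outcome.TIMER_EXPIRED;        // the timer ran out
        }
        if (res == 0) {
            return Outcome.EVENT_COUNT_REACHED;  // enough requests completed first
        }
        if (res == -ECANCELED) {
            return Outcome.CANCELLED;            // removed before it expired
        }
        return Outcome.OTHER_ERROR;              // e.g. invalid arguments
    }
}
```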
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
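Because file_index is offset by one relative to the fixed file table slot, it is easy to be off by one when filling in an SQE. A minimal sketch of the conversion follows; DirectDescriptor and its methods are hypothetical names chosen for illustration and are not part of LibIOURing.

```java
// Hypothetical helper for illustration; not part of LibIOURing.
final class DirectDescriptor {
    /**
     * Value to store in file_index so the new file lands in the given
     * zero-based slot of the fixed file table. A file_index of 0 keeps
     * the normal behaviour of returning a regular file descriptor.
     */
    static int fileIndexForSlot(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be >= 0: " + slot);
        }
        return Math.addExact(slot, 1);
    }

    /** Inverse mapping: the fixed file table slot a positive file_index refers to. */
    static int slotForFileIndex(int fileIndex) {
        if (fileIndex <= 0) {
            throw new IllegalArgumentException("file_index must be positive: " + fileIndex);
        }
        return fileIndex - 1;
    }
}
```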
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, and off is an offset from which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
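The memory layout implied by the OP_PROVIDE_BUFFERS arguments (fd buffers of len bytes each, laid out back to back from addr, with IDs counting up from off) can be sketched as below. ProvidedBufferLayout is a hypothetical class written for illustration, not part of LibIOURing, and it assumes buffer IDs are assigned consecutively starting at the value given in off.

```java
// Hypothetical helper for illustration; not part of LibIOURing.
final class ProvidedBufferLayout {
    private final long baseAddr;  // value placed in addr
    private final int bufLen;     // value placed in len
    private final int count;      // value placed in fd
    private final int firstBid;   // value placed in off

    ProvidedBufferLayout(long baseAddr, int bufLen, int count, int firstBid) {
        if (bufLen <= 0 || count <= 0 || firstBid < 0) {
            throw new IllegalArgumentException("bufLen and count must be positive, firstBid non-negative");
        }
        this.baseAddr = baseAddr;
        this.bufLen = bufLen;
        this.count = count;
        this.firstBid = firstBid;
    }

    /** Bytes the application must make available at addr: len * fd. */
    long totalBytes() {
        return (long) bufLen * count;
    }

    /** Start address of the buffer with the given ID within this range. */
    long addressOf(int bid) {
        long index = (long) bid - firstBid;
        if (index < 0 || index >= count) {
            throw new IllegalArgumentException("buffer ID outside this range: " + bid);
        }
        return baseAddr + index * bufLen;
    }
}
```

For example, eight 4096-byte buffers starting at ID 100 need 32768 bytes of memory, and buffer 103 starts 3 * 4096 bytes after the base address.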
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor to the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will likely contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
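The two step completion model of OP_SEND_ZC boils down to deciding, for each CQE, whether the send buffer may be reused yet. The following is a minimal sketch under stated assumptions: ZeroCopySendCqe and its Step values are hypothetical names, not part of LibIOURing, and the flag values are assumed to mirror IORING_CQE_F_MORE (1 << 1) and IORING_CQE_F_NOTIF (1 << 3) from the kernel's io_uring header.

```java
// Hypothetical helper for illustration; not part of LibIOURing.
final class ZeroCopySendCqe {
    // Assumptions: same bits as the kernel's IORING_CQE_F_* flags.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    enum Step {
        RESULT_BUFFER_STILL_IN_USE,  // first CQE, a notification will follow
        RESULT_BUFFER_FREE,          // first CQE, no notification expected
        NOTIFICATION_BUFFER_FREE     // second CQE, buffer may be reused
    }

    /** Classifies one CQE that belongs to an OP_SEND_ZC request. */
    static Step classify(int cqeFlags) {
        if ((cqeFlags & CQE_F_NOTIF) != 0) {
            return Step.NOTIFICATION_BUFFER_FREE;
        }
        if ((cqeFlags & CQE_F_MORE) != 0) {
            return Step.RESULT_BUFFER_STILL_IN_USE;
        }
        return Step.RESULT_BUFFER_FREE;
    }

    /** True once the application may modify or release the data buffer. */
    static boolean bufferReusable(int cqeFlags) {
        return classify(cqeFlags) != Step.RESULT_BUFFER_STILL_IN_USE;
    }
}
```

Only the first CQE carries the byte count or error in res; the notification merely signals that the kernel no longer references the buffer.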
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_FIXED_FD_INSTALL
public static final byte IORING_OP_FIXED_FD_INSTALL
io_uring_op enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL- Add, remove or modify entries in the interest list ofepoll(7). Seeepoll_ctl(2)for details of the system call.fdholds the file descriptor that represents the epoll instance,addrholds the file descriptor to add, remove or modify,lenholds the operation (EPOLL_CTL_ADD,EPOLL_CTL_DEL,EPOLL_CTL_MOD) to perform and,offholds a pointer to theepoll_eventsstructure.Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset from which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
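The sizing rule for OP_PROVIDE_BUFFERS (len * fd bytes of memory behind addr) can be sketched in plain Java. This is an illustrative model only: the class and method names are hypothetical and not part of LibIOURing, the base address is made up, and the sketch assumes buffer IDs are assigned consecutively from the starting ID given in off.

```java
// Illustrative model of the OP_PROVIDE_BUFFERS memory layout.
// All names here are hypothetical helpers, not LibIOURing API.
final class ProvidedBufferLayout {
    /** Total bytes the application must back at addr: len * fd. */
    static long totalBytes(int bufferCount, int bufferLen) {
        return Math.multiplyExact((long) bufferCount, (long) bufferLen);
    }

    /** Address of the buffer at the given 0-based position in the range. */
    static long bufferAddress(long baseAddr, int bufferLen, int position) {
        return baseAddr + (long) position * bufferLen;
    }

    /** Buffer ID at the given position, assuming consecutive IDs from startId (the off field). */
    static int bufferId(int startId, int position) {
        return startId + position;
    }

    public static void main(String[] args) {
        int count = 8, len = 4096, startId = 100;
        long base = 0x10000L; // hypothetical address of the backing memory
        System.out.println("bytes needed: " + totalBytes(count, len));
        System.out.println("buffer 3 at: 0x" + Long.toHexString(bufferAddress(base, len, 3)));
        System.out.println("buffer 3 id: " + bufferId(startId, 3));
    }
}
```

Using one group per buffer size keeps the arithmetic this simple, since every buffer in a group has the same length.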
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor to the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe may contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
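The two-step completion model of OP_SEND_ZC can be modelled as a small bookkeeping class that decides when a send buffer may be reused. This is a sketch under stated assumptions: the class is hypothetical, it processes (user_data, flags) pairs that the caller has already read from CQEs, and the two flag values are assumed to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF from the kernel header.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: tracks which zerocopy send buffers the kernel may still hold.
final class ZeroCopySendTracker {
    // Assumed to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    private final Set<Long> buffersInUse = new HashSet<>();

    /** Call when an OP_SEND_ZC request with this user_data is submitted. */
    void onSubmit(long userData) {
        buffersInUse.add(userData);
    }

    /** Call for every CQE of the request; returns true once the buffer may be modified again. */
    boolean onCompletion(long userData, int flags) {
        boolean isNotification = (flags & CQE_F_NOTIF) != 0;
        boolean moreToCome = (flags & CQE_F_MORE) != 0;
        if (isNotification || !moreToCome) {
            // Either the notification arrived, or no notification will follow.
            buffersInUse.remove(userData);
            return true;
        }
        return false; // first CQE with CQE_F_MORE: keep the buffer untouched
    }

    boolean isInUse(long userData) {
        return buffersInUse.contains(userData);
    }

    public static void main(String[] args) {
        ZeroCopySendTracker tracker = new ZeroCopySendTracker();
        tracker.onSubmit(7L);
        System.out.println("after first CQE, reusable: " + tracker.onCompletion(7L, CQE_F_MORE));
        System.out.println("after notification, reusable: " + tracker.onCompletion(7L, CQE_F_NOTIF));
    }
}
```

Keying the bookkeeping on user_data works because both CQEs of a request carry the same user_data value.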
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_FTRUNCATE
public static final byte IORING_OP_FTRUNCATE
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, and timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
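The three documented OP_TIMEOUT results (-ETIME, 0, -ECANCELED) can be told apart with a small classifier over the CQE res value. The following is a minimal sketch: the class and enum are hypothetical, and the errno numbers are the values used on common Linux architectures such as x86-64 and arm64, hard-coded here as an assumption rather than taken from LibIOURing.

```java
// Hypothetical helper that interprets the res field of an OP_TIMEOUT completion.
final class TimeoutResult {
    // Assumed Linux errno values (x86-64 / arm64); other architectures may differ.
    static final int ETIME = 62;
    static final int ECANCELED = 125;

    enum Outcome { EXPIRED, EVENT_COUNT_REACHED, CANCELLED, OTHER_ERROR }

    static Outcome classify(int res) {
        if (res == -ETIME) return Outcome.EXPIRED;            // timer ran out
        if (res == 0) return Outcome.EVENT_COUNT_REACHED;     // requested completions arrived first
        if (res == -ECANCELED) return Outcome.CANCELLED;      // timeout removed before expiring
        return Outcome.OTHER_ERROR;                           // any other negative errno
    }

    public static void main(String[] args) {
        for (int res : new int[] { -ETIME, 0, -ECANCELED, -22 }) {
            System.out.println(res + " -> " + classify(res));
        }
    }
}
```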
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
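The off-by-one relationship between file_index and the fixed file table slot (slot = file_index - 1, with 0 meaning "use the normal file table") is easy to get wrong, so a tiny conversion helper may be useful. The class below is purely illustrative; its name and methods are hypothetical and not part of LibIOURing.

```java
// Hypothetical helper for the file_index encoding used by direct descriptors.
final class DirectDescriptorIndex {
    /** Value meaning "install into the normal file table". */
    static final int NORMAL_FILE_TABLE = 0;

    /** Encodes a 0-based fixed file table slot as the value to store in file_index. */
    static int toFileIndex(int slot) {
        if (slot < 0) {
            throw new IllegalArgumentException("slot must be non-negative: " + slot);
        }
        return Math.addExact(slot, 1);
    }

    /** Decodes file_index back to a slot, or -1 when the normal file table is used. */
    static int toSlot(int fileIndex) {
        return fileIndex > 0 ? fileIndex - 1 : -1;
    }

    public static void main(String[] args) {
        System.out.println("slot 0 -> file_index " + toFileIndex(0));
        System.out.println("file_index 5 -> slot " + toSlot(5));
        System.out.println("file_index 0 -> slot " + toSlot(NORMAL_FILE_TABLE));
    }
}
```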
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system calls. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset from which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor to the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe may contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_BIND
public static final byte IORING_OP_BIND
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi-shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, and timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
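The off-by-one convention for file_index (a positive value selects fixed-file slot file_index - 1, while 0 keeps the normal file table) is easy to get wrong. A minimal sketch of the conversion follows; the class DirectSlot and its method names are hypothetical and not part of this binding.

```java
public final class DirectSlot {
    // Converts a zero-based fixed-file table slot to the value stored in file_index.
    static int toFileIndex(int slot) {
        if (slot < 0 || slot == Integer.MAX_VALUE) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
        return slot + 1;
    }

    // A file_index of 0 means "install into the normal file table".
    static boolean isDirect(int fileIndex) {
        return fileIndex > 0;
    }

    // Recovers the zero-based slot from a positive file_index.
    static int toSlot(int fileIndex) {
        if (!isDirect(fileIndex)) {
            throw new IllegalArgumentException("not a direct descriptor request: " + fileIndex);
        }
        return fileIndex - 1;
    }

    public static void main(String[] args) {
        int fileIndex = toFileIndex(0);
        System.out.println("slot 0 is requested with file_index = " + fileIndex);
        System.out.println("file_index " + fileIndex + " maps back to slot " + toSlot(fileIndex));
    }
}
```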
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, off is an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data.
Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
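The arithmetic behind the OP_PROVIDE_BUFFERS arguments can be sketched as follows. This is an illustration only: the class ProvidedBufferLayout and its methods are hypothetical, and it assumes that the buffers are laid out back to back from addr and receive consecutive IDs starting at the value passed in off.

```java
public final class ProvidedBufferLayout {
    // Total memory the application must supply at addr: len * fd bytes.
    static long totalBytes(int count, int len) {
        return Math.multiplyExact((long) count, (long) len);
    }

    // Address of the index-th buffer, assuming buffers are contiguous from base.
    static long bufferAddress(long base, int len, int index) {
        return base + (long) index * len;
    }

    // Buffer ID of the index-th buffer, assuming IDs count up from firstId (the off field).
    static int bufferId(int firstId, int index) {
        return firstId + index;
    }

    public static void main(String[] args) {
        int count = 8;          // value for the fd field
        int len = 4096;         // value for the len field
        long base = 0x10000L;   // placeholder address for the addr field
        int firstId = 0;        // value for the off field

        System.out.println("bytes to provide: " + totalBytes(count, len));
        for (int i = 0; i < count; i++) {
            System.out.println("buffer id " + bufferId(firstId, i)
                    + " at 0x" + Long.toHexString(bufferAddress(base, len, i)));
        }
    }
}
```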
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor to the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call.
Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe will usually contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first cqe follows the usual rules and so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate cqe for request completions allows userspace to push more data without extra delays. Note, notifications are only responsible for controlling the lifetime of the buffers, and as such don't mean anything about whether the data has actually been sent out or received by the other end. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
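The two step completion model of OP_SEND_ZC can be tracked per request with a small state holder. The sketch below is an assumption-laden illustration: the class ZeroCopySendTracker is hypothetical, the flag values are assumed to mirror the Linux UAPI header (CQE_F_MORE = 1 << 1, CQE_F_NOTIF = 1 << 3), and it treats a first CQE without CQE_F_MORE as meaning that no notification will follow.

```java
public final class ZeroCopySendTracker {
    // Assumed to match the kernel UAPI values; prefer the constants from the binding.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    private boolean resultSeen;
    private boolean bufferReusable;
    private int result;

    // Feed every CQE whose user_data belongs to this send request.
    void onCqe(int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            // Second CQE: the kernel no longer holds the buffer.
            bufferReusable = true;
            return;
        }
        // First CQE: carries the byte count or a negative error code.
        resultSeen = true;
        result = res;
        if ((flags & CQE_F_MORE) == 0) {
            // Assumption: without CQE_F_MORE no notification is coming.
            bufferReusable = true;
        }
    }

    boolean isResultSeen() { return resultSeen; }
    boolean isBufferReusable() { return bufferReusable; }
    int result() { return result; }

    public static void main(String[] args) {
        ZeroCopySendTracker t = new ZeroCopySendTracker();
        t.onCqe(1024, CQE_F_MORE);
        System.out.println("after first cqe, reusable = " + t.isBufferReusable());
        t.onCqe(0, CQE_F_NOTIF);
        System.out.println("after notification, reusable = " + t.isBufferReusable());
    }
}
```

Keeping the send result and the buffer lifetime as separate flags reflects the text: the notification says nothing about delivery, only that the buffer may be modified again.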
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
IORING_OP_LISTEN
public static final byte IORING_OP_LISTEN
io_uring_op
Enum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field.
Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it'll repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request.
If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor.
The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation.
The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, and the request effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags: TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight. TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If found, the res field of the cqe will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below.
Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands.
The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is an offset to read from, fd is the file descriptor to write to, off is an offset from which to start writing to. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data.
Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd worth of memory in addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor to the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe is likely to contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
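The two-step completion model described for OP_SEND_ZC can be sketched as a small bookkeeping helper. This is only an illustration, not part of the binding: the class name ZeroCopySendTracker is hypothetical, the two bit values are assumed to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF in the kernel header (real code should use the binding's own constants), and it assumes the request completion arrives before its notification.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: tracks when a zerocopy send buffer may be reused. */
final class ZeroCopySendTracker {
    // Assumed to mirror IORING_CQE_F_MORE and IORING_CQE_F_NOTIF from the
    // kernel's io_uring header; prefer the binding's constants in real code.
    static final int CQE_F_MORE = 1 << 1;
    static final int CQE_F_NOTIF = 1 << 3;

    // user_data -> res of the first CQE, kept until the notification arrives
    private final Map<Long, Integer> awaitingNotification = new HashMap<>();

    /**
     * Feed one completion. Returns true when the buffer identified by
     * userData may be modified or freed again.
     */
    boolean onCompletion(long userData, int res, int flags) {
        if ((flags & CQE_F_NOTIF) != 0) {
            // Second CQE: the kernel no longer holds on to the buffer.
            awaitingNotification.remove(userData);
            return true;
        }
        if ((flags & CQE_F_MORE) != 0) {
            // First CQE: res is bytes sent or a negative error code, but a
            // notification is still pending, so the buffer must stay untouched.
            awaitingNotification.put(userData, res);
            return false;
        }
        // No CQE_F_MORE: no notification will follow for this request.
        return true;
    }

    /** Number of sends whose buffers are still owned by the kernel. */
    int pending() {
        return awaitingNotification.size();
    }

    public static void main(String[] args) {
        ZeroCopySendTracker tracker = new ZeroCopySendTracker();
        System.out.println(tracker.onCompletion(42L, 1024, CQE_F_MORE));
        System.out.println(tracker.onCompletion(42L, 0, CQE_F_NOTIF));
    }
}
```

A completion loop would call onCompletion for every CQE it reaps and only recycle a send buffer once the method returns true for that user_data.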
IORING_OP_LAST
public static final byte IORING_OP_LASTio_uring_opEnum values:
OP_NOP - Do not perform any I/O. This is useful for testing the performance of the io_uring implementation itself.
OP_READV - Vectored read operation, similar to preadv2(2). If the file is not seekable, off must be set to zero.
OP_WRITEV - Vectored write operation, similar to pwritev2(2). If the file is not seekable, off must be set to zero.
OP_FSYNC - File sync. See also fsync(2). Note that, while I/O is initiated in the order in which it appears in the submission queue, completions are unordered. For example, an application which places a write I/O followed by an fsync in the submission queue cannot expect the fsync to apply to the write. The two operations execute in parallel, so the fsync may complete before the write is issued to the storage. The same is also true for previously issued writes that have not completed prior to the fsync.
OP_READ_FIXED - Read from pre-mapped buffers. See register for details on how to set up a context for fixed reads.
OP_WRITE_FIXED - Write to pre-mapped buffers. See register for details on how to set up a context for fixed writes.
OP_POLL_ADD - Poll the fd specified in the submission queue entry for the events specified in the poll_events field. Unlike poll or epoll without EPOLLONESHOT, by default this interface always works in one shot mode. That is, once the poll operation is completed, it will have to be resubmitted.
If POLL_ADD_MULTI is set in the SQE len field, then the poll will work in multi shot mode instead. That means it will repeatedly trigger when the requested event becomes true, and hence multiple CQEs can be generated from this single SQE. The CQE flags field will have CQE_F_MORE set on completion if the application should expect further CQE entries from the original request. If this flag isn't set on completion, then the poll request has been terminated and no further events will be generated. This mode is available since 5.13.
If POLL_UPDATE_EVENTS is set in the SQE len field, then the request will update an existing poll request with the mask of events passed in with this request. The lookup is based on the user_data field of the original SQE submitted, and this value is passed in the addr field of the SQE. This mode is available since 5.13.
If POLL_UPDATE_USER_DATA is set in the SQE len field, then the request will update the user_data of an existing poll request based on the value passed in the off field. This mode is available since 5.13.
This command works like an async poll(2) and the completion event result is the returned mask of events. For the variants that update user_data or events, the completion result will be similar to OP_POLL_REMOVE.
OP_POLL_REMOVE - Remove an existing poll request. If found, the res field of the struct io_uring_cqe will contain 0. If not found, res will contain -ENOENT, or -EALREADY if the poll request was in the process of completing already.
OP_SYNC_FILE_RANGE - Issue the equivalent of a sync_file_range(2) on the file descriptor. The fd field is the file descriptor to sync, the off field holds the offset in bytes, the len field holds the length in bytes, and the sync_range_flags field holds the flags for the command. See also sync_file_range(2) for the general description of the related system call. Available since 5.2.
OP_SENDMSG - Issue the equivalent of a sendmsg(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the msghdr structure, and msg_flags holds the flags associated with the system call. See also sendmsg(2) for the general description of the related system call. Available since 5.3.
OP_RECVMSG - Works just like OP_SENDMSG, except for recvmsg(2) instead. See the description of IORING_OP_SENDMSG. Available since 5.3.
OP_TIMEOUT - This command will register a timeout operation. The addr field must contain a pointer to a struct timespec64 structure, len must contain 1 to signify one timespec64 structure, and timeout_flags may contain TIMEOUT_ABS for an absolute timeout value, or 0 for a relative timeout. off may contain a completion event count. A timeout will trigger a wakeup event on the completion ring for anyone waiting for events. A timeout condition is met when either the specified timeout expires, or the specified number of events have completed. Either condition will trigger the event. If off is set to 0, completed events are not counted, which effectively acts like a timer. io_uring timeouts use the CLOCK_MONOTONIC clock source. The request will complete with -ETIME if the timeout got completed through expiration of the timer, or 0 if the timeout got completed through requests completing on their own. If the timeout was cancelled before it expired, the request will complete with -ECANCELED. Available since 5.4.
Since 5.15, this command also supports the following modifiers in timeout_flags:
TIMEOUT_BOOTTIME: If set, then the clock source used is CLOCK_BOOTTIME instead of CLOCK_MONOTONIC. This clock source differs in that it includes time elapsed if the system was suspended while having a timeout request in-flight.
TIMEOUT_REALTIME: If set, then the clock source used is CLOCK_REALTIME instead of CLOCK_MONOTONIC.
OP_TIMEOUT_REMOVE - If timeout_flags is zero, then it attempts to remove an existing timeout operation. addr must contain the user_data field of the previously issued timeout operation. If the specified timeout request is found and cancelled successfully, this request will terminate with a result value of 0. If the timeout request was found but expiration was already in progress, this request will terminate with a result value of -EBUSY. If the timeout request wasn't found, the request will terminate with a result value of -ENOENT. Available since 5.5.
If timeout_flags contains TIMEOUT_UPDATE, instead of removing an existing operation, it updates it. addr and the return values are the same as before. The addr2 field must contain a pointer to a struct timespec64 structure. timeout_flags may also contain TIMEOUT_ABS, in which case the value given is an absolute one, not a relative one. Available since 5.11.
OP_ACCEPT - Issue the equivalent of an accept4(2) system call. fd must be set to the socket file descriptor, addr must contain the pointer to the sockaddr structure, and addr2 must contain a pointer to the socklen_t addrlen field. Flags can be passed using the accept_flags field. See also accept4(2) for the general description of the related system call. Available since 5.5.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If the index points to a valid empty slot, the installation is guaranteed to not fail. If there is already a file in the slot, it will be replaced, similar to OP_FILES_UPDATE. Please note that only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_ASYNC_CANCEL - Attempt to cancel an already issued request. addr must contain the user_data field of the request that should be cancelled. The cancellation request will complete with one of the following result codes. If the request is found and cancelled, the res field of the CQE will contain 0. If not found, res will contain -ENOENT. If found and cancellation was attempted, the res field will contain -EALREADY. In this case, the request may or may not terminate. In general, requests that are interruptible (like socket IO) will get cancelled, while disk IO requests cannot be cancelled if already started. Available since 5.5.
OP_LINK_TIMEOUT - This request must be linked with another request through IOSQE_IO_LINK which is described below. Unlike OP_TIMEOUT, IORING_OP_LINK_TIMEOUT acts on the linked request, not the completion queue. The format of the command is otherwise like IORING_OP_TIMEOUT, except there's no completion event count as it's tied to a specific request. If used, the timeout specified in the command will cancel the linked command, unless the linked command completes before the timeout. The timeout will complete with -ETIME if the timer expired and cancellation of the linked request was attempted, or -ECANCELED if the timer got cancelled because of completion of the linked request. Like IORING_OP_TIMEOUT, the clock source used is CLOCK_MONOTONIC. Available since 5.5.
OP_CONNECT - Issue the equivalent of a connect(2) system call. fd must be set to the socket file descriptor, addr must contain the const pointer to the sockaddr structure, and off must contain the socklen_t addrlen field. See also connect(2) for the general description of the related system call. Available since 5.5.
OP_FALLOCATE - Issue the equivalent of a fallocate(2) system call. fd must be set to the file descriptor, len must contain the mode associated with the operation, off must contain the offset on which to operate, and addr must contain the length. See also fallocate(2) for the general description of the related system call. Available since 5.6.
OP_OPENAT - Issue the equivalent of an openat(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, open_flags should contain any flags passed in, and len is the access mode of the file. See also openat(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_CLOSE - Issue the equivalent of a close(2) system call. fd is the file descriptor to be closed. See also close(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, this command can be used to close files that were direct opened through OP_OPENAT, OP_OPENAT2, or OP_ACCEPT using the io_uring specific direct descriptors. Note that only one of the descriptor fields may be set. The direct close feature is available since the 5.15 kernel, where direct descriptors were introduced.
OP_FILES_UPDATE - This command is an alternative to using REGISTER_FILES_UPDATE which then works in an async fashion, like the rest of the io_uring commands. The arguments passed in are the same. addr must contain a pointer to the array of file descriptors, len must contain the length of the array, and off must contain the offset at which to operate. Note that the array of file descriptors pointed to in addr must remain valid until this operation has completed. Available since 5.6.
OP_STATX - Issue the equivalent of a statx(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname string, statx_flags is the flags argument, len should be the mask argument, and off must contain a pointer to the statxbuf to be filled in. See also statx(2) for the general description of the related system call. Available since 5.6.
OP_READ - Issue the equivalent of a pread(2) or pwrite(2) system call. fd is the file descriptor to be operated on, addr contains the buffer in question, len contains the length of the IO operation, and off contains the read or write offset. If fd does not refer to a seekable file, off must be set to zero. If off is set to -1, the offset will use (and advance) the file position, like the read(2) and write(2) system calls. These are non-vectored versions of the OP_READV and OP_WRITEV opcodes. See also read(2) and write(2) for the general description of the related system call. Available since 5.6.
OP_WRITE - See OP_READ.
OP_FADVISE - Issue the equivalent of a posix_fadvise(2) system call. fd must be set to the file descriptor, off must contain the offset on which to operate, len must contain the length, and fadvise_advice must contain the advice associated with the operation. See also posix_fadvise(2) for the general description of the related system call. Available since 5.6.
OP_MADVISE - Issue the equivalent of a madvise(2) system call. addr must contain the address to operate on, len must contain the length on which to operate, and fadvise_advice must contain the advice associated with the operation. See also madvise(2) for the general description of the related system call. Available since 5.6.
OP_SEND - Issue the equivalent of a send(2) system call. fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. See also send(2) for the general description of the related system call. Available since 5.6.
OP_RECV - Works just like OP_SEND, except for recv(2) instead. See the description of IORING_OP_SEND. Available since 5.6.
OP_OPENAT2 - Issue the equivalent of an openat2(2) system call. fd is the dirfd argument, addr must contain a pointer to the *pathname argument, len should contain the size of the open_how structure, and off should be set to the address of the open_how structure. See also openat2(2) for the general description of the related system call. Available since 5.6.
If the file_index field is set to a positive number, the file won't be installed into the normal file table as usual but will be placed into the fixed file table at index file_index - 1. In this case, instead of returning a file descriptor, the result will contain either 0 on success or an error. If there is already a file registered at this index, the request will fail with -EBADF. Only io_uring has access to such files and no other syscall can use them. See IOSQE_FIXED_FILE and REGISTER_FILES. Available since 5.15.
OP_EPOLL_CTL - Add, remove or modify entries in the interest list of epoll(7). See epoll_ctl(2) for details of the system call. fd holds the file descriptor that represents the epoll instance, addr holds the file descriptor to add, remove or modify, len holds the operation (EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD) to perform, and off holds a pointer to the epoll_event structure. Available since 5.6.
OP_SPLICE - Issue the equivalent of a splice(2) system call. splice_fd_in is the file descriptor to read from, splice_off_in is the offset to read from, fd is the file descriptor to write to, and off is the offset at which to start writing. A sentinel value of -1 is used to pass the equivalent of a NULL for the offsets to splice(2). len contains the number of bytes to copy. splice_flags contains a bit mask for the flag field associated with the system call. Please note that one of the file descriptors must refer to a pipe. See also splice(2) for the general description of the related system call. Available since 5.7.
OP_PROVIDE_BUFFERS - This command allows an application to register a group of buffers to be used by commands that read/receive data. Using buffers in this manner can eliminate the need to separate the poll + read, which provides a convenient point in time to allocate a buffer for a given request. It's often infeasible to have as many buffers available as pending reads or receives. With this feature, the application can have its pool of buffers ready in the kernel, and when the file or socket is ready to read/receive data, a buffer can be selected for the operation.
fd must contain the number of buffers to provide, addr must contain the starting address to add buffers from, len must contain the length of each buffer to add from the range, buf_group must contain the group ID of this range of buffers, and off must contain the starting buffer ID of this range of buffers. With that set, the kernel adds buffers starting with the memory address in addr, each with a length of len. Hence the application should provide len * fd bytes of memory at addr. Buffers are grouped by the group ID, and each buffer within this group will be identical in size according to the above arguments. This allows the application to provide different groups of buffers, and this is often used to have differently sized buffers available depending on what the expectations are of the individual request. When submitting a request that should use a provided buffer, the IOSQE_BUFFER_SELECT flag must be set, and buf_group must be set to the desired buffer group ID where the buffer should be selected from. Available since 5.7.
OP_REMOVE_BUFFERS - Remove buffers previously registered with OP_PROVIDE_BUFFERS. fd must contain the number of buffers to remove, and buf_group must contain the buffer group ID from which to remove the buffers. Available since 5.7.
OP_TEE - Issue the equivalent of a tee(2) system call. splice_fd_in is the file descriptor to read from, fd is the file descriptor to write to, len contains the number of bytes to copy, and splice_flags contains a bit mask for the flag field associated with the system call. Please note that both of the file descriptors must refer to a pipe. See also tee(2) for the general description of the related system call. Available since 5.8.
OP_SHUTDOWN - Issue the equivalent of a shutdown(2) system call. fd is the file descriptor of the socket being shut down and len must be set to the how argument. No other fields should be set. Available since 5.11.
OP_RENAMEAT - Issue the equivalent of a renameat2(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and finally rename_flags should be set to the flags passed in to renameat2(2). Available since 5.11.
OP_UNLINKAT - Issue the equivalent of an unlinkat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and unlink_flags should be set to the flags being passed in to unlinkat(2). Available since 5.11.
OP_MKDIRAT - Issue the equivalent of a mkdirat(2) system call. fd should be set to the dirfd, addr should be set to the pathname, and len should be set to the mode being passed in to mkdirat(2). Available since 5.15.
OP_SYMLINKAT - Issue the equivalent of a symlinkat(2) system call. fd should be set to the newdirfd, addr should be set to the target, and addr2 should be set to the linkpath being passed in to symlinkat(2). Available since 5.15.
OP_LINKAT - Issue the equivalent of a linkat(2) system call. fd should be set to the olddirfd, addr should be set to the oldpath, len should be set to the newdirfd, addr2 should be set to the newpath, and hardlink_flags should be set to the flags being passed in to linkat(2). Available since 5.15.
OP_MSG_RING - Send a message to an io_uring. fd must be set to a file descriptor of a ring that the application has access to, len can be set to any 32-bit value that the application wishes to pass on, and off should be set to any 64-bit value that the application wishes to send. On the target ring, a CQE will be posted with the res field matching the len set, and a user_data field matching the off value being passed in. This request type can be used to either just wake or interrupt anyone waiting for completions on the target ring, or it can be used to pass messages via the two fields. Available since 5.18.
OP_FSETXATTR
OP_SETXATTR
OP_FGETXATTR
OP_GETXATTR
OP_SOCKET
OP_URING_CMD
OP_SEND_ZC - Issue the zerocopy equivalent of a send(2) system call. Similar to OP_SEND, but tries to avoid making intermediate copies of data. Zerocopy execution is not guaranteed and it may fall back to copying.
The flags field of the first struct io_uring_cqe is likely to contain CQE_F_MORE, which means that there will be a second completion event / notification for the request, with the user_data field set to the same value. The user must not modify the data buffer until the notification is posted. The first CQE follows the usual rules, so its res field will contain the number of bytes sent or a negative error code. The notification's res field will be set to zero and the flags field will contain CQE_F_NOTIF. The two step model is needed because the kernel may hold on to buffers for a long time, e.g. waiting for a TCP ACK, and having a separate CQE for request completions allows userspace to push more data without extra delays. Note that notifications are only responsible for controlling the lifetime of the buffers, and as such say nothing about whether the data has actually been sent out or received by the other end.
fd must be set to the socket file descriptor, addr must contain a pointer to the buffer, len denotes the length of the buffer to send, and msg_flags holds the flags associated with the system call. When addr2 is non-zero it points to the address of the target with addr_len specifying its size, turning the request into a sendto(2) system call equivalent. Available since 6.0.
OP_SENDMSG_ZC
OP_READ_MULTISHOT
OP_WAITID
OP_FUTEX_WAIT
OP_FUTEX_WAKE
OP_FUTEX_WAITV
OP_FIXED_FD_INSTALL
OP_FTRUNCATE
OP_BIND
OP_LISTEN
OP_LAST
- See Also:
-
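The OP_PROVIDE_BUFFERS field mapping described above (fd as buffer count, addr as base address, len as per-buffer length, off as first buffer ID) can be checked with a little arithmetic. The sketch below is illustrative only: ProvidedBufferLayout is a hypothetical helper, not part of the binding, and it assumes buffer IDs are assigned consecutively starting at off, one per len-sized slice of the region.

```java
/** Hypothetical helper describing the memory one OP_PROVIDE_BUFFERS request covers. */
final class ProvidedBufferLayout {
    final long baseAddress;   // value placed in the SQE addr field
    final int bufferLength;   // value placed in the SQE len field
    final int bufferCount;    // value placed in the SQE fd field
    final int firstBufferId;  // value placed in the SQE off field

    ProvidedBufferLayout(long baseAddress, int bufferLength, int bufferCount, int firstBufferId) {
        if (bufferLength <= 0 || bufferCount <= 0) {
            throw new IllegalArgumentException("length and count must be positive");
        }
        this.baseAddress = baseAddress;
        this.bufferLength = bufferLength;
        this.bufferCount = bufferCount;
        this.firstBufferId = firstBufferId;
    }

    /** Total bytes the application must own at addr: len * fd. */
    long totalBytes() {
        return (long) bufferLength * bufferCount;
    }

    /** Address of the buffer with the given ID (assumes consecutive IDs). */
    long addressOf(int bufferId) {
        int index = bufferId - firstBufferId;
        if (index < 0 || index >= bufferCount) {
            throw new IllegalArgumentException("buffer ID not in this range");
        }
        return baseAddress + (long) index * bufferLength;
    }

    public static void main(String[] args) {
        // Eight 4 KiB buffers with IDs 100..107 at an example address.
        ProvidedBufferLayout layout = new ProvidedBufferLayout(0x10000L, 4096, 8, 100);
        System.out.println("bytes required: " + layout.totalBytes());
        System.out.println("buffer 103 at: 0x" + Long.toHexString(layout.addressOf(103)));
    }
}
```

When a request submitted with IOSQE_BUFFER_SELECT completes, the application can map the buffer ID chosen by the kernel back to its memory in the same way.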
IORING_URING_CMD_FIXED
public static final int IORING_URING_CMD_FIXED
Use registered buffer; pass this flag along with setting sqe->buf_index.
- See Also:
-
IORING_URING_CMD_MASK
public static final int IORING_URING_CMD_MASK
- See Also:
-
IORING_FSYNC_DATASYNC
public static final int IORING_FSYNC_DATASYNC
sqe->fsync_flags
- See Also:
-
IORING_TIMEOUT_ABS
public static final int IORING_TIMEOUT_ABS
sqe->timeout_flags
- See Also:
-
IORING_TIMEOUT_UPDATE
public static final int IORING_TIMEOUT_UPDATE
sqe->timeout_flags
- See Also:
-
IORING_TIMEOUT_BOOTTIME
public static final int IORING_TIMEOUT_BOOTTIME
sqe->timeout_flags
- See Also:
-
IORING_TIMEOUT_REALTIME
public static final int IORING_TIMEOUT_REALTIME
sqe->timeout_flags
- See Also:
-
IORING_LINK_TIMEOUT_UPDATE
public static final int IORING_LINK_TIMEOUT_UPDATE
sqe->timeout_flags
- See Also:
-
IORING_TIMEOUT_ETIME_SUCCESS
public static final int IORING_TIMEOUT_ETIME_SUCCESS
sqe->timeout_flags
- See Also:
-
IORING_TIMEOUT_MULTISHOT
public static final int IORING_TIMEOUT_MULTISHOT
sqe->timeout_flags
- See Also:
-
IORING_TIMEOUT_CLOCK_MASK
public static final int IORING_TIMEOUT_CLOCK_MASK
sqe->timeout_flags
- See Also:
-
IORING_TIMEOUT_UPDATE_MASK
public static final int IORING_TIMEOUT_UPDATE_MASK
sqe->timeout_flags
- See Also:
-
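How the clock-related bits of timeout_flags select a clock source can be sketched as follows. This is an illustration under stated assumptions rather than the binding's API: TimeoutClock is a hypothetical class, the bit values are assumed to mirror the kernel's IORING_TIMEOUT_* definitions (use the constants documented above in real code), and the mapping of TIMEOUT_REALTIME to CLOCK_REALTIME and the rejection of both clock bits at once reflect the author's understanding of kernel behaviour.

```java
/** Hypothetical decoder for the clock-related bits of sqe->timeout_flags. */
final class TimeoutClock {
    // Assumed to mirror the kernel's IORING_TIMEOUT_* bit values.
    static final int TIMEOUT_ABS = 1 << 0;
    static final int TIMEOUT_BOOTTIME = 1 << 2;
    static final int TIMEOUT_REALTIME = 1 << 3;
    static final int TIMEOUT_CLOCK_MASK = TIMEOUT_BOOTTIME | TIMEOUT_REALTIME;

    /** Name of the clock source a timeout with these flags would use. */
    static String clockSource(int timeoutFlags) {
        switch (timeoutFlags & TIMEOUT_CLOCK_MASK) {
            case 0:
                return "CLOCK_MONOTONIC"; // default when no modifier is set
            case TIMEOUT_BOOTTIME:
                return "CLOCK_BOOTTIME";  // keeps counting across suspend
            case TIMEOUT_REALTIME:
                return "CLOCK_REALTIME";
            default:
                // Both clock bits set: assumed to be an invalid combination.
                throw new IllegalArgumentException("only one clock modifier may be set");
        }
    }

    /** True when the timespec is an absolute time rather than a relative one. */
    static boolean isAbsolute(int timeoutFlags) {
        return (timeoutFlags & TIMEOUT_ABS) != 0;
    }

    public static void main(String[] args) {
        int flags = TIMEOUT_ABS | TIMEOUT_BOOTTIME;
        System.out.println(clockSource(flags) + ", absolute=" + isAbsolute(flags));
    }
}
```

Masking with TIMEOUT_CLOCK_MASK before switching keeps unrelated bits such as TIMEOUT_ABS from affecting the clock choice.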
IORING_SPLICE_F_FD_IN_FIXED
public static final int IORING_SPLICE_F_FD_IN_FIXED
sqe->splice_flags, extends splice(2) flags
- See Also:
-
IORING_POLL_ADD_MULTI
public static final int IORING_POLL_ADD_MULTI
POLL_ADD flags. Note that since sqe->poll_events is the flag space, the command flags for POLL_ADD are stored in sqe->len. IORING_POLL_UPDATE: Update existing poll request, matching sqe->addr as the old user_data field. IORING_POLL_LEVEL: Level triggered poll.
Enum values:
POLL_ADD_MULTI - Multishot poll. Sets IORING_CQE_F_MORE if the poll handler will continue to report CQEs on behalf of the same SQE.
POLL_UPDATE_EVENTS
POLL_UPDATE_USER_DATA
POLL_ADD_LEVEL
- See Also:
-
IORING_POLL_UPDATE_EVENTS
public static final int IORING_POLL_UPDATE_EVENTS
POLL_ADD flags. The enum values are listed under IORING_POLL_ADD_MULTI.
- See Also:
-
IORING_POLL_UPDATE_USER_DATA
public static final int IORING_POLL_UPDATE_USER_DATA
POLL_ADD flags. The enum values are listed under IORING_POLL_ADD_MULTI.
- See Also:
-
IORING_POLL_ADD_LEVEL
public static final int IORING_POLL_ADD_LEVEL
POLL_ADD flags. The enum values are listed under IORING_POLL_ADD_MULTI.
- See Also:
-
IORING_ASYNC_CANCEL_ALL
public static final int IORING_ASYNC_CANCEL_ALL
ASYNC_CANCEL flags.
Enum values:
ASYNC_CANCEL_ALL - Cancel all requests that match the given criteria, rather than just canceling the first one found. Available since 5.19.
ASYNC_CANCEL_FD - Match based on the file descriptor used in the original request rather than the user_data. This is what prep_cancel_fd sets up. Available since 5.19.
ASYNC_CANCEL_ANY - Match any request in the ring, regardless of user_data or file descriptor. Can be used to cancel any pending request in the ring. Available since 5.19.
ASYNC_CANCEL_FD_FIXED - fd passed in is a fixed descriptor
ASYNC_CANCEL_USERDATA - Match on user_data, default for no other key
ASYNC_CANCEL_OP - Match request based on opcode
- See Also:
-
IORING_ASYNC_CANCEL_FD
public static final int IORING_ASYNC_CANCEL_FD
ASYNC_CANCEL flags. The enum values are listed under IORING_ASYNC_CANCEL_ALL.
- See Also:
-
IORING_ASYNC_CANCEL_ANY
public static final int IORING_ASYNC_CANCEL_ANY
ASYNC_CANCEL flags. The enum values are listed under IORING_ASYNC_CANCEL_ALL.
- See Also:
-
IORING_ASYNC_CANCEL_FD_FIXED
public static final int IORING_ASYNC_CANCEL_FD_FIXED
ASYNC_CANCEL flags. The enum values are listed under IORING_ASYNC_CANCEL_ALL.
- See Also:
-
IORING_ASYNC_CANCEL_USERDATA
public static final int IORING_ASYNC_CANCEL_USERDATA
ASYNC_CANCEL flags. The enum values are listed under IORING_ASYNC_CANCEL_ALL.
- See Also:
-
IORING_ASYNC_CANCEL_OP
public static final int IORING_ASYNC_CANCEL_OP
ASYNC_CANCEL flags. Enum values: see IORING_ASYNC_CANCEL_ANY.
IORING_RECVSEND_POLL_FIRST
public static final int IORING_RECVSEND_POLL_FIRST
send/sendmsg and recv/recvmsg flags (sqe->ioprio).
Enum values:
- RECVSEND_POLL_FIRST - If set, io_uring will assume the socket is currently empty and that attempting to receive data will be unsuccessful. For this case, io_uring will arm internal poll and trigger a receive of the data when the socket has data to be read. The initial receive attempt can be wasteful when the socket is expected to be empty; setting this flag bypasses the initial receive attempt and goes straight to arming poll. If poll does indicate that data is ready to be received, the operation will proceed.
  Can be used with the CQE CQE_F_SOCK_NONEMPTY flag, which io_uring will set on CQEs after a recv(2) or recvmsg(2) operation. If set, the socket still had data to be read after the operation completed. Both these flags are available since 5.19.
- RECV_MULTISHOT - Multishot recv. Sets CQE_F_MORE if the handler will continue to report CQEs on behalf of the same SQE.
- RECVSEND_FIXED_BUF - Use registered buffers; the index is stored in the buf_index field.
- SEND_ZC_REPORT_USAGE - If set, SEND[MSG]_ZC should report the zerocopy usage in cqe.res for the CQE_F_NOTIF cqe. 0 is reported if zerocopy was actually possible, NOTIF_USAGE_ZC_COPIED if data was copied (at least partially).
- RECVSEND_BUNDLE - Used with IOSQE_BUFFER_SELECT. If set, send or recv will grab as many buffers as it can from the given buffer group ID and send them all. The completion result will be the number of buffers sent, with the starting buffer ID in cqe->flags as usual for provided buffer usage. The buffers will be contiguous from the starting buffer ID.
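The following sketch shows one way the sqe->ioprio value for a receive might be composed, including the documented requirement that RECVSEND_BUNDLE is used together with IOSQE_BUFFER_SELECT. The numeric values are assumptions taken from linux/io_uring.h, and the helper class and its method are hypothetical; real code would use the LibIOURing constants.

```java
// Sketch only: values assumed from linux/io_uring.h; names are hypothetical.
final class RecvSendFlagsSketch {
    static final int IORING_RECVSEND_POLL_FIRST = 1 << 0;
    static final int IORING_RECV_MULTISHOT      = 1 << 1;
    static final int IORING_RECVSEND_BUNDLE     = 1 << 4;
    static final int IOSQE_BUFFER_SELECT        = 1 << 5; // sqe->flags bit

    /** Builds the sqe->ioprio value for a recv and checks the BUNDLE precondition. */
    static int recvIoprio(boolean pollFirst, boolean multishot, boolean bundle, int sqeFlags) {
        int ioprio = 0;
        if (pollFirst) {
            ioprio |= IORING_RECVSEND_POLL_FIRST; // skip the initial receive attempt
        }
        if (multishot) {
            ioprio |= IORING_RECV_MULTISHOT;
        }
        if (bundle) {
            if ((sqeFlags & IOSQE_BUFFER_SELECT) == 0) {
                throw new IllegalArgumentException("RECVSEND_BUNDLE needs IOSQE_BUFFER_SELECT");
            }
            ioprio |= IORING_RECVSEND_BUNDLE;
        }
        return ioprio;
    }

    public static void main(String[] args) {
        System.out.println(recvIoprio(true, true, false, 0));
    }
}
```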
IORING_RECV_MULTISHOT
public static final int IORING_RECV_MULTISHOT
send/sendmsg and recv/recvmsg flags (sqe->ioprio). Enum values: see IORING_RECVSEND_POLL_FIRST.
IORING_RECVSEND_FIXED_BUF
public static final int IORING_RECVSEND_FIXED_BUF
send/sendmsg and recv/recvmsg flags (sqe->ioprio). Enum values: see IORING_RECVSEND_POLL_FIRST.
IORING_SEND_ZC_REPORT_USAGE
public static final int IORING_SEND_ZC_REPORT_USAGE
send/sendmsg and recv/recvmsg flags (sqe->ioprio). Enum values: see IORING_RECVSEND_POLL_FIRST.
IORING_RECVSEND_BUNDLE
public static final int IORING_RECVSEND_BUNDLE
send/sendmsg and recv/recvmsg flags (sqe->ioprio). Enum values: see IORING_RECVSEND_POLL_FIRST.
IORING_NOTIF_USAGE_ZC_COPIED
public static final int IORING_NOTIF_USAGE_ZC_COPIED
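A minimal sketch of how a zerocopy notification CQE could be interpreted when SEND_ZC_REPORT_USAGE was set on the SQE: cqe.res is 0 if zerocopy was possible, and carries NOTIF_USAGE_ZC_COPIED if data was copied. The numeric values are assumptions from linux/io_uring.h, and the class, enum and method names are hypothetical.

```java
// Sketch only: values assumed from linux/io_uring.h; names are hypothetical.
final class ZcNotifSketch {
    static final int IORING_CQE_F_NOTIF           = 1 << 3;
    static final int IORING_NOTIF_USAGE_ZC_COPIED = 1 << 31;

    enum ZcUsage { NOT_A_NOTIFICATION, ZEROCOPY, COPIED }

    /** Classifies a CQE, assuming its SQE was submitted with SEND_ZC_REPORT_USAGE. */
    static ZcUsage classify(int cqeRes, int cqeFlags) {
        if ((cqeFlags & IORING_CQE_F_NOTIF) == 0) {
            return ZcUsage.NOT_A_NOTIFICATION; // the regular send completion
        }
        return (cqeRes & IORING_NOTIF_USAGE_ZC_COPIED) != 0 ? ZcUsage.COPIED : ZcUsage.ZEROCOPY;
    }

    public static void main(String[] args) {
        System.out.println(classify(0, IORING_CQE_F_NOTIF));
    }
}
```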
IORING_ACCEPT_MULTISHOT
public static final int IORING_ACCEPT_MULTISHOT
IORING_ACCEPT_DONTWAIT
public static final int IORING_ACCEPT_DONTWAIT
IORING_ACCEPT_POLL_FIRST
public static final int IORING_ACCEPT_POLL_FIRST
IORING_MSG_DATA
public static final int IORING_MSG_DATA
io_uring_msg_ring_flags.
Enum values:
- MSG_DATA - pass sqe->len as res and off as user_data
- MSG_SEND_FD - send a registered fd to another ring
IORING_MSG_SEND_FD
public static final int IORING_MSG_SEND_FD
io_uring_msg_ring_flags. Enum values: see IORING_MSG_DATA.
IORING_MSG_RING_CQE_SKIP
public static final int IORING_MSG_RING_CQE_SKIP
OP_MSG_RING flags (sqe->msg_ring_flags).
Enum values:
- MSG_RING_CQE_SKIP - Don't post a CQE to the target ring. Not applicable for MSG_DATA, obviously.
- MSG_RING_FLAGS_PASS
IORING_MSG_RING_FLAGS_PASS
public static final int IORING_MSG_RING_FLAGS_PASS
OP_MSG_RING flags (sqe->msg_ring_flags). Enum values: see IORING_MSG_RING_CQE_SKIP.
IORING_FIXED_FD_NO_CLOEXEC
public static final int IORING_FIXED_FD_NO_CLOEXEC
OP_FIXED_FD_INSTALL flags (sqe->install_fd_flags).
Enum values:
- FIXED_FD_NO_CLOEXEC - Don't mark the fd as O_CLOEXEC.
IORING_NOP_INJECT_RESULT
public static final int IORING_NOP_INJECT_RESULT
IORING_CQE_F_BUFFER
public static final int IORING_CQE_F_BUFFER
cqe->flags.
Enum values:
- CQE_F_BUFFER - If set, the upper 16 bits are the buffer ID.
- CQE_F_MORE - If set, the parent SQE will generate more CQE entries.
- CQE_F_SOCK_NONEMPTY - If set, there is more data to read after a socket recv.
- CQE_F_NOTIF - Set for notification CQEs. Can be used to distinguish them from sends.
- CQE_F_BUF_MORE - If set, the buffer ID set in the completion will get more completions. In other words, the buffer is being partially consumed, and will be used by the kernel for more completions. This is only set for buffers used via incremental buffer consumption, as provided by a ring buffer set up with IOU_PBUF_RING_INC. For any other provided buffer type, a buffer passed back in a completion is automatically returned to the application.
IORING_CQE_F_MORE
public static final int IORING_CQE_F_MORE
cqe->flags. Enum values: see IORING_CQE_F_BUFFER.
IORING_CQE_F_SOCK_NONEMPTY
public static final int IORING_CQE_F_SOCK_NONEMPTY
cqe->flags. Enum values: see IORING_CQE_F_BUFFER.
IORING_CQE_F_NOTIF
public static final int IORING_CQE_F_NOTIF
cqe->flags. Enum values: see IORING_CQE_F_BUFFER.
IORING_CQE_F_BUF_MORE
public static final int IORING_CQE_F_BUF_MORE
cqe->flags. Enum values: see IORING_CQE_F_BUFFER.
IORING_CQE_BUFFER_SHIFT
public static final int IORING_CQE_BUFFER_SHIFT
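Since CQE_F_BUFFER says the buffer ID lives in the upper 16 bits of cqe->flags, decoding it is a shift by IORING_CQE_BUFFER_SHIFT. The sketch below assumes the values from linux/io_uring.h (shift of 16, CQE_F_BUFFER in bit 0, CQE_F_MORE in bit 1); the class and method names are hypothetical.

```java
// Sketch only: values assumed from linux/io_uring.h; names are hypothetical.
final class CqeFlagsSketch {
    static final int IORING_CQE_F_BUFFER     = 1 << 0;
    static final int IORING_CQE_F_MORE       = 1 << 1;
    static final int IORING_CQE_BUFFER_SHIFT = 16;

    /** Returns the provided-buffer ID, or -1 if the CQE carries no buffer. */
    static int bufferId(int cqeFlags) {
        if ((cqeFlags & IORING_CQE_F_BUFFER) == 0) {
            return -1;
        }
        // Unsigned shift: Java ints are signed, and the ID may use the top bit.
        return cqeFlags >>> IORING_CQE_BUFFER_SHIFT;
    }

    /** True if the parent SQE (for example a multishot recv) will post more CQEs. */
    static boolean moreCompletionsExpected(int cqeFlags) {
        return (cqeFlags & IORING_CQE_F_MORE) != 0;
    }

    public static void main(String[] args) {
        int flags = (7 << IORING_CQE_BUFFER_SHIFT) | IORING_CQE_F_BUFFER | IORING_CQE_F_MORE;
        System.out.println(bufferId(flags) + " " + moreCompletionsExpected(flags));
    }
}
```

When moreCompletionsExpected returns false for a multishot request, the application would typically need to submit a new SQE to keep receiving.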
IORING_OFF_SQ_RING
public static final long IORING_OFF_SQ_RING
Magic offsets for the application to mmap the data it needs.
IORING_OFF_CQ_RING
public static final long IORING_OFF_CQ_RING
Magic offsets for the application to mmap the data it needs.
IORING_OFF_SQES
public static final long IORING_OFF_SQES
Magic offsets for the application to mmap the data it needs.
IORING_OFF_PBUF_RING
public static final long IORING_OFF_PBUF_RING
Magic offsets for the application to mmap the data it needs.
IORING_OFF_PBUF_SHIFT
public static final long IORING_OFF_PBUF_SHIFT
Magic offsets for the application to mmap the data it needs.
IORING_OFF_MMAP_MASK
public static final long IORING_OFF_MMAP_MASK
Magic offsets for the application to mmap the data it needs.
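To make the role of IORING_OFF_PBUF_SHIFT and IORING_OFF_MMAP_MASK more concrete, the sketch below builds the mmap offset for a provided-buffer ring of a given buffer group and splits it apart again. The numeric values are assumptions taken from linux/io_uring.h, the encoding follows my understanding of how liburing and the kernel combine them, and all names are hypothetical.

```java
// Sketch only: values assumed from linux/io_uring.h; names are hypothetical.
final class MmapOffsetSketch {
    static final long IORING_OFF_SQ_RING    = 0L;
    static final long IORING_OFF_CQ_RING    = 0x8000000L;
    static final long IORING_OFF_SQES       = 0x10000000L;
    static final long IORING_OFF_PBUF_RING  = 0x80000000L;
    static final long IORING_OFF_PBUF_SHIFT = 16L;
    static final long IORING_OFF_MMAP_MASK  = 0xf8000000L;

    /** mmap offset for the provided-buffer ring of buffer group bgid. */
    static long pbufRingOffset(int bgid) {
        long shifted = (long) bgid << IORING_OFF_PBUF_SHIFT;
        // This sketch rejects IDs whose shifted bits would overlap the region mask.
        if (bgid < 0 || (shifted & IORING_OFF_MMAP_MASK) != 0) {
            throw new IllegalArgumentException("bgid out of range for this sketch: " + bgid);
        }
        return IORING_OFF_PBUF_RING | shifted;
    }

    /** The region selector part of an offset (SQ ring, CQ ring, SQEs, PBUF ring). */
    static long region(long offset) {
        return offset & IORING_OFF_MMAP_MASK;
    }

    /** The buffer group ID encoded in a PBUF ring offset. */
    static int bufferGroup(long offset) {
        return (int) ((offset & ~IORING_OFF_MMAP_MASK) >>> IORING_OFF_PBUF_SHIFT);
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(pbufRingOffset(5)));
    }
}
```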
IORING_SQ_NEED_WAKEUP
public static final int IORING_SQ_NEED_WAKEUP
sq_ring->flags.
Enum values:
- SQ_NEED_WAKEUP - needs io_uring_enter wakeup
- SQ_CQ_OVERFLOW - CQ ring is overflown
- SQ_TASKRUN - task should enter the kernel
IORING_SQ_CQ_OVERFLOW
public static final int IORING_SQ_CQ_OVERFLOW
sq_ring->flags. Enum values: see IORING_SQ_NEED_WAKEUP.
IORING_SQ_TASKRUN
public static final int IORING_SQ_TASKRUN
sq_ring->flags. Enum values: see IORING_SQ_NEED_WAKEUP.
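The sq_ring->flags bits are written by the kernel and tell the application when it has to call enter. The sketch below maps them to enter flags in a simplified way, loosely modelled on what liburing does; the numeric values are assumptions from linux/io_uring.h and the names are hypothetical. In real code the flags word would be read from the shared ring with acquire semantics.

```java
// Sketch only: values assumed from linux/io_uring.h; names are hypothetical.
final class SqFlagsSketch {
    static final int IORING_SQ_NEED_WAKEUP  = 1 << 0;
    static final int IORING_SQ_CQ_OVERFLOW  = 1 << 1;
    static final int IORING_SQ_TASKRUN      = 1 << 2;
    static final int IORING_ENTER_GETEVENTS = 1 << 0;
    static final int IORING_ENTER_SQ_WAKEUP = 1 << 1;

    /** Returns the enter flags implied by sq_ring->flags, or 0 if no call is implied. */
    static int enterFlagsFor(int sqRingFlags) {
        int enterFlags = 0;
        if ((sqRingFlags & IORING_SQ_NEED_WAKEUP) != 0) {
            // The SQPOLL thread went idle and must be woken to submit IO.
            enterFlags |= IORING_ENTER_SQ_WAKEUP;
        }
        if ((sqRingFlags & (IORING_SQ_CQ_OVERFLOW | IORING_SQ_TASKRUN)) != 0) {
            // Overflowed CQEs or pending task work are only handled inside the kernel.
            enterFlags |= IORING_ENTER_GETEVENTS;
        }
        return enterFlags;
    }

    public static void main(String[] args) {
        System.out.println(enterFlagsFor(IORING_SQ_NEED_WAKEUP | IORING_SQ_TASKRUN));
    }
}
```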
IORING_CQ_EVENTFD_DISABLED
public static final int IORING_CQ_EVENTFD_DISABLED
IORING_ENTER_GETEVENTS
public static final int IORING_ENTER_GETEVENTS
io_uring_enter(2) flags.
Enum values:
- ENTER_GETEVENTS - If this flag is set, then the system call will wait for the specified number of events in min_complete before returning. This flag can be set along with to_submit to both submit and complete events in a single system call.
- ENTER_SQ_WAKEUP - If the ring has been created with SETUP_SQPOLL, then this flag asks the kernel to wake up the SQ kernel thread to submit IO.
- ENTER_SQ_WAIT - If the ring has been created with SETUP_SQPOLL, then the application has no real insight into when the SQ kernel thread has consumed entries from the SQ ring. This can lead to a situation where the application can no longer get a free SQE entry to submit, without knowing when one becomes available as the SQ kernel thread consumes them. If the system call is used with this flag set, then it will wait until at least one entry is free in the SQ ring.
- ENTER_EXT_ARG - Since kernel 5.11, the system call's arguments have been modified to look like the following:
  int io_uring_enter(unsigned int fd, unsigned int to_submit, unsigned int min_complete, unsigned int flags, const void *arg, size_t argsz);
  which behaves just like the original definition by default. However, if IORING_ENTER_EXT_ARG is set, then instead of a sigset_t being passed in, a pointer to a struct io_uring_getevents_arg is used and argsz must be set to the size of this structure. The definition is IOURingGeteventsArg, which allows passing in both a signal mask and a pointer to a struct __kernel_timespec timeout value. If ts is set to a valid pointer, then this time value indicates the timeout for waiting on events. If an application is waiting on events and wishes to stop waiting after a specified amount of time, then this can be accomplished directly in version 5.11 and newer by using this feature.
- ENTER_REGISTERED_RING - If the ring file descriptor has been registered through use of REGISTER_RING_FDS, then setting this flag will tell the kernel that the ring_fd passed in is the registered ring offset rather than a normal file descriptor.
- ENTER_ABS_TIMER
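As a small illustration of how these flags are combined for a wait call, the sketch below composes the flags argument from three decisions described above. The numeric values are assumptions from linux/io_uring.h, and the helper class and method are hypothetical; real code would pass the result to enter together with the LibIOURing constants.

```java
// Sketch only: values assumed from linux/io_uring.h; names are hypothetical.
final class EnterFlagsSketch {
    static final int IORING_ENTER_GETEVENTS       = 1 << 0;
    static final int IORING_ENTER_EXT_ARG         = 1 << 3;
    static final int IORING_ENTER_REGISTERED_RING = 1 << 4;

    /** Composes io_uring_enter(2) flags for a submit-and-wait style call. */
    static int waitFlags(int minComplete, boolean hasTimeout, boolean ringFdRegistered) {
        int flags = 0;
        if (minComplete > 0) {
            flags |= IORING_ENTER_GETEVENTS; // wait for min_complete events
        }
        if (hasTimeout) {
            // arg must then point at an io_uring_getevents_arg and argsz be its size.
            flags |= IORING_ENTER_EXT_ARG;
        }
        if (ringFdRegistered) {
            // fd is then the registered ring offset, not a normal file descriptor.
            flags |= IORING_ENTER_REGISTERED_RING;
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(waitFlags(1, true, false));
    }
}
```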
IORING_ENTER_SQ_WAKEUP
public static final int IORING_ENTER_SQ_WAKEUP
io_uring_enter(2) flags. Enum values: see IORING_ENTER_GETEVENTS.
IORING_ENTER_SQ_WAIT
public static final int IORING_ENTER_SQ_WAIT
io_uring_enter(2) flags. Enum values: see IORING_ENTER_GETEVENTS.
IORING_ENTER_EXT_ARG
public static final int IORING_ENTER_EXT_ARG
io_uring_enter(2) flags. Enum values: see IORING_ENTER_GETEVENTS.
IORING_ENTER_REGISTERED_RING
public static final int IORING_ENTER_REGISTERED_RING
io_uring_enter(2) flags. Enum values: see IORING_ENTER_GETEVENTS.
IORING_ENTER_ABS_TIMER
public static final int IORING_ENTER_ABS_TIMER
io_uring_enter(2) flags. Enum values: see IORING_ENTER_GETEVENTS.
IORING_FEAT_SINGLE_MMAP
public static final int IORING_FEAT_SINGLE_MMAP
io_uring_params->features flags.
Enum values:
- FEAT_SINGLE_MMAP - If this flag is set, the SQ and CQ rings can be mapped with a single mmap(2) call. The SQEs must still be allocated separately. This brings the necessary mmap(2) calls down from three to two. Available since kernel 5.4.
- FEAT_NODROP - If this flag is set, io_uring supports never dropping completion events. If a completion event occurs and the CQ ring is full, the kernel stores the event internally until such a time that the CQ ring has room for more entries. If this overflow condition is entered, attempting to submit more IO will fail with the -EBUSY error value, if it can't flush the overflown events to the CQ ring. If this happens, the application must reap events from the CQ ring and attempt the submit again. Available since kernel 5.5.
- FEAT_SUBMIT_STABLE - If this flag is set, applications can be certain that any data for async offload has been consumed when the kernel has consumed the SQE. Available since kernel 5.5.
- FEAT_RW_CUR_POS - If this flag is set, applications can specify offset == -1 with IORING_OP_{READV,WRITEV}, IORING_OP_{READ,WRITE}_FIXED, and IORING_OP_{READ,WRITE} to mean current file position, which behaves like preadv2(2) and pwritev2(2) with offset == -1. It'll use (and update) the current file position. This obviously comes with the caveat that if the application has multiple reads or writes in flight, then the end result will not be as expected. This is similar to threads sharing a file descriptor and doing IO using the current file position. Available since kernel 5.6.
- FEAT_CUR_PERSONALITY - If this flag is set, then io_uring guarantees that both sync and async execution of a request assumes the credentials of the task that called enter to queue the requests. If this flag isn't set, then requests are issued with the credentials of the task that originally registered the io_uring. If only one task is using a ring, then this flag doesn't matter as the credentials will always be the same. Note that this is the default behavior; tasks can still register different personalities through register with REGISTER_PERSONALITY and specify the personality to use in the sqe. Available since kernel 5.6.
- FEAT_FAST_POLL - If this flag is set, then io_uring supports using an internal poll mechanism to drive data/space readiness. This means that requests that cannot read or write data to a file no longer need to be punted to an async thread for handling; instead they will begin operation when the file is ready. This is similar to doing poll + read/write in userspace, but eliminates the need to do so. If this flag is set, requests waiting on space/data consume far fewer resources doing so, as they are not blocking a thread. Available since kernel 5.7.
- FEAT_POLL_32BITS - If this flag is set, the OP_POLL_ADD command accepts the full 32-bit range of epoll based flags. Most notably EPOLLEXCLUSIVE, which allows exclusive (waking single waiters) behavior. Available since kernel 5.9.
- FEAT_SQPOLL_NONFIXED - If this flag is set, the SETUP_SQPOLL feature no longer requires the use of fixed files. Any normal file descriptor can be used for IO commands without needing registration. Available since kernel 5.11.
- FEAT_EXT_ARG - If this flag is set, then the enter system call supports passing in an extended argument instead of just the sigset_t of earlier kernels. This extended argument is of type struct io_uring_getevents_arg and allows the caller to pass in both a sigset_t and a timeout argument for waiting on events. A pointer to this struct must be passed in if ENTER_EXT_ARG is set in the flags for the enter system call. Available since kernel 5.11.
- FEAT_NATIVE_WORKERS - If this flag is set, io_uring is using native workers for its async helpers. Previous kernels used kernel threads that assumed the identity of the original io_uring owning task, but later kernels will actively create what looks more like regular process threads instead. Available since kernel 5.12.
- FEAT_RSRC_TAGS - If this flag is set, then io_uring supports a variety of features related to fixed files and buffers. In particular, it indicates that registered buffers can be updated in-place, whereas before the full set would have to be unregistered first. Available since kernel 5.13.
- FEAT_CQE_SKIP - If this flag is set, then io_uring supports setting IOSQE_CQE_SKIP_SUCCESS in the submitted SQE, indicating that no CQE should be generated for this SQE if it executes normally. If an error happens processing the SQE, a CQE with the appropriate error value will still be generated. Available since kernel 5.17.
- FEAT_LINKED_FILE - If this flag is set, then io_uring supports sane assignment of files for SQEs that have dependencies. For example, if a chain of SQEs is submitted with IOSQE_IO_LINK, then kernels without this flag will prepare the file for each link upfront. If a previous link opens a file with a known index, e.g. if direct descriptors are used with open or accept, then file assignment needs to happen post execution of that SQE. If this flag is set, then the kernel will defer file assignment until execution of a given request is started. Available since kernel 5.17.
FEAT_REG_REG_RINGFEAT_RECVSEND_BUNDLEFEAT_MIN_TIMEOUT
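The feature flags listed above arrive as a single bitmask in io_uring_params->features, so an application typically tests individual bits before relying on a behavior. The following is a minimal sketch of that check in plain Java. The class name IoUringFeatures and its helper methods are hypothetical and not part of this binding; the bit positions are assumed to match the Linux UAPI header linux/io_uring.h, and the sample mask in main is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class IoUringFeatures {
    // Names indexed by bit position, assumed to follow <linux/io_uring.h>.
    private static final String[] NAMES = {
        "FEAT_SINGLE_MMAP", "FEAT_NODROP", "FEAT_SUBMIT_STABLE", "FEAT_RW_CUR_POS",
        "FEAT_CUR_PERSONALITY", "FEAT_FAST_POLL", "FEAT_POLL_32BITS", "FEAT_SQPOLL_NONFIXED",
        "FEAT_EXT_ARG", "FEAT_NATIVE_WORKERS", "FEAT_RSRC_TAGS", "FEAT_CQE_SKIP",
        "FEAT_LINKED_FILE", "FEAT_REG_REG_RING", "FEAT_RECVSEND_BUNDLE", "FEAT_MIN_TIMEOUT"
    };

    // A few named bits for direct tests.
    static final int FEAT_SINGLE_MMAP = 1 << 0;
    static final int FEAT_NODROP      = 1 << 1;
    static final int FEAT_FAST_POLL   = 1 << 5;
    static final int FEAT_EXT_ARG     = 1 << 8;

    /** Returns true if the given feature bit is set in the mask. */
    static boolean has(int features, int flag) {
        return (features & flag) != 0;
    }

    /** Lists the names of all known feature bits set in the mask, lowest bit first. */
    static List<String> decode(int features) {
        List<String> out = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((features & (1 << bit)) != 0) {
                out.add(NAMES[bit]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample mask for illustration; a real value comes from the params struct after setup.
        int features = FEAT_SINGLE_MMAP | FEAT_NODROP | FEAT_FAST_POLL;
        System.out.println(decode(features));
        System.out.println("EXT_ARG supported: " + has(features, FEAT_EXT_ARG));
    }
}
```

In real code the mask would be read from the params structure filled in by setup, and the binding's own IORING_FEAT_* constants would replace the local ones.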
IORING_FEAT_NODROP
public static final int IORING_FEAT_NODROP
io_uring_params->features flags
Enum values:
FEAT_NODROP - If this flag is set, io_uring supports never dropping completion events. If a completion event occurs and the CQ ring is full, the kernel stores the event internally until such a time that the CQ ring has room for more entries. If this overflow condition is entered, attempting to submit more IO will fail with the -EBUSY error value, if it can't flush the overflown events to the CQ ring. If this happens, the application must reap events from the CQ ring and attempt the submit again. Available since kernel 5.5.
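The FEAT_NODROP entry above describes a recovery rule: when a submit fails with -EBUSY because of CQ overflow, reap completions and submit again. A minimal sketch of that loop follows. The Ring interface, FakeRing, and submitWithRetry are hypothetical names introduced only for this illustration and do not exist in this binding; EBUSY is assumed to be 16, its value on Linux.

```java
final class NoDropRetry {
    static final int EBUSY = 16; // Linux errno value (assumption stated above)

    /** Hypothetical minimal view of a ring; not part of any real binding. */
    interface Ring {
        int submit();           // number of SQEs submitted, or a negative errno
        int reapCompletions();  // drains the CQ ring, returns the number of CQEs consumed
    }

    /** Fake ring that reports overflow until completions have been reaped once. */
    static final class FakeRing implements Ring {
        boolean overflowed = true;
        int reaped = 0;

        @Override public int submit() {
            return overflowed ? -EBUSY : 1;
        }

        @Override public int reapCompletions() {
            overflowed = false;
            reaped++;
            return 4;
        }
    }

    /** Submits, and on -EBUSY reaps completions and retries, up to maxAttempts submits. */
    static int submitWithRetry(Ring ring, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int res = ring.submit();
            if (res != -EBUSY) {
                return res; // success, or an error that reaping will not fix
            }
            ring.reapCompletions(); // make room so the kernel can flush overflown events
        }
        return -EBUSY;
    }

    public static void main(String[] args) {
        FakeRing ring = new FakeRing();
        System.out.println("submit result: " + submitWithRetry(ring, 3));
        System.out.println("reap calls: " + ring.reaped);
    }
}
```

Bounding the number of attempts is a design choice of this sketch; an application could instead block waiting for completions before retrying.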
IORING_FEAT_SUBMIT_STABLE
public static final int IORING_FEAT_SUBMIT_STABLE
io_uring_params->features flags
Enum values:
FEAT_SUBMIT_STABLE - If this flag is set, applications can be certain that any data for async offload has been consumed when the kernel has consumed the SQE. Available since kernel 5.5.
IORING_FEAT_RW_CUR_POS
public static final int IORING_FEAT_RW_CUR_POS
io_uring_params->features flags
Enum values:
FEAT_RW_CUR_POS - If this flag is set, applications can specify offset == -1 with IORING_OP_{READV,WRITEV}, IORING_OP_{READ,WRITE}_FIXED, and IORING_OP_{READ,WRITE} to mean current file position, which behaves like preadv2(2) and pwritev2(2) with offset == -1. It will use (and update) the current file position. The caveat is that if the application has multiple reads or writes in flight, the end result will not be as expected. This is similar to threads sharing a file descriptor and doing IO using the current file position. Available since kernel 5.6.
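The difference between reading at the current file position and reading at an explicit offset, which FEAT_RW_CUR_POS exposes through offset == -1, can be illustrated with an analogy from standard Java. This sketch does not use io_uring at all: it relies on java.nio FileChannel, where read(ByteBuffer) uses and advances the channel position while read(ByteBuffer, long) leaves it untouched. The class name CurrentPositionDemo and the temporary file contents are made up for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class CurrentPositionDemo {
    /** Flips the buffer and decodes its contents as ASCII text. */
    private static String text(ByteBuffer buf) {
        buf.flip();
        return StandardCharsets.US_ASCII.decode(buf).toString();
    }

    /** Returns the three chunks read and the final channel position. */
    static String[] demo() throws IOException {
        Path tmp = Files.createTempFile("curpos", ".txt");
        try {
            Files.write(tmp, "abcdefgh".getBytes(StandardCharsets.US_ASCII));
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                ByteBuffer first = ByteBuffer.allocate(4);
                ByteBuffer second = ByteBuffer.allocate(4);
                ByteBuffer absolute = ByteBuffer.allocate(4);
                ch.read(first);        // relative read: uses and advances the position
                ch.read(second);       // relative read: continues where the first one stopped
                ch.read(absolute, 0L); // positional read: does not touch the position
                return new String[] {
                    text(first), text(second), text(absolute), Long.toString(ch.position())
                };
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String s : demo()) {
            System.out.println(s);
        }
    }
}
```

The two relative reads depend on each other through the shared position, which is why having several such requests in flight at once gives results that depend on execution order; the positional read has no such dependency.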
IORING_FEAT_CUR_PERSONALITY
public static final int IORING_FEAT_CUR_PERSONALITY
io_uring_params->features flags
Enum values:
FEAT_CUR_PERSONALITY - If this flag is set, then io_uring guarantees that both sync and async execution of a request assumes the credentials of the task that called enter to queue the requests. If this flag isn't set, then requests are issued with the credentials of the task that originally registered the io_uring. If only one task is using a ring, then this flag doesn't matter as the credentials will always be the same. Note that this is the default behavior; tasks can still register different personalities through register with REGISTER_PERSONALITY and specify the personality to use in the SQE. Available since kernel 5.6.
IORING_FEAT_FAST_POLL
public static final int IORING_FEAT_FAST_POLL
io_uring_params->features flags
Enum values:
FEAT_FAST_POLL - If this flag is set, then io_uring supports using an internal poll mechanism to drive data/space readiness. This means that requests that cannot read or write data to a file no longer need to be punted to an async thread for handling; instead they will begin operation when the file is ready. This is similar to doing poll + read/write in userspace, but eliminates the need to do so. If this flag is set, requests waiting on space/data consume far fewer resources doing so, as they are not blocking a thread. Available since kernel 5.7.
IORING_FEAT_POLL_32BITS
public static final int IORING_FEAT_POLL_32BITS
io_uring_params->features flags
Enum values:
FEAT_POLL_32BITS - If this flag is set, the OP_POLL_ADD command accepts the full 32-bit range of epoll based flags. Most notably EPOLLEXCLUSIVE, which allows exclusive (waking single waiters) behavior. Available since kernel 5.9.
FEAT_SQPOLL_NONFIXED- If this flag is set, theSETUP_SQPOLLfeature no longer requires the use of fixed files.Any normal file descriptor can be used for IO commands without needing registration.
Available since kernel 5.11.
FEAT_EXT_ARG- If this flag is set, then theentersystem call supports passing in an extended argument instead of just thesigset_tof earlier kernels.This extended argument is of type
struct io_uring_getevents_argand allows the caller to pass in both asigset_tand a timeout argument for waiting on events. A pointer to this struct must be passed in ifENTER_EXT_ARGis set in the flags for the enter system call.Available since kernel 5.11.
FEAT_NATIVE_WORKERS- If this flag is set,io_uringis using native workers for its async helpers.Previous kernels used kernel threads that assumed the identity of the original
io_uringowning task, but later kernels will actively create what looks more like regular process threads instead.Available since kernel 5.12.
FEAT_RSRC_TAGS- If this flag is set, thenio_uringsupports a variety of features related to fixed files and buffers.In particular, it indicates that registered buffers can be updated in-place, whereas before the full set would have to be unregistered first.
Available since kernel 5.13.
FEAT_CQE_SKIP- If this flag is set, then io_uring supports settingIOSQE_CQE_SKIP_SUCCESSin the submitted SQE, indicating that no CQE should be generated for this SQE if it executes normally. If an error happens processing the SQE, a CQE with the appropriate error value will still be generated.Available since kernel 5.17.
FEAT_LINKED_FILE- If this flag is set, then io_uring supports sane assignment of files for SQEs that have dependencies. For example, if a chain of SQEs are submitted withIOSQE_IO_LINK, then kernels without this flag will prepare the file for each link upfront. If a previous link opens a file with a known index, eg if direct descriptors are used with open or accept, then file assignment needs to happen post execution of that SQE. If this flag is set, then the kernel will defer file assignment until execution of a given request is started.Available since kernel 5.17.
FEAT_REG_REG_RINGFEAT_RECVSEND_BUNDLEFEAT_MIN_TIMEOUT
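The FEAT_CQE_SKIP rule above changes how many completions an application should wait for. The following is a minimal sketch of that bookkeeping, assuming IOSQE_CQE_SKIP_SUCCESS is bit 6 of the SQE flags as in the Linux header linux/io_uring.h. The class CqeSkipSketch, the method expectedCqes and the input arrays are hypothetical names used only for illustration; they are not part of LibIOURing, and the sketch ignores multishot requests.

```java
/** Sketch: count the CQEs to expect for a batch in which some SQEs set IOSQE_CQE_SKIP_SUCCESS. */
final class CqeSkipSketch {
    // Assumption: IOSQE_CQE_SKIP_SUCCESS is bit 6 of sqe->flags, as in <linux/io_uring.h>.
    static final int IOSQE_CQE_SKIP_SUCCESS = 1 << 6;

    /**
     * @param sqeFlags flags of each submitted SQE (hypothetical input)
     * @param results  outcome of each request; a negative value stands for -errno (hypothetical input)
     * @return number of CQEs the kernel is expected to post for this batch
     */
    static int expectedCqes(int[] sqeFlags, int[] results) {
        int count = 0;
        for (int i = 0; i < sqeFlags.length; i++) {
            boolean skipOnSuccess = (sqeFlags[i] & IOSQE_CQE_SKIP_SUCCESS) != 0;
            boolean failed = results[i] < 0;
            // A CQE is posted unless the SQE asked to skip it AND the request succeeded.
            if (!skipOnSuccess || failed) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] flags = {0, IOSQE_CQE_SKIP_SUCCESS, IOSQE_CQE_SKIP_SUCCESS};
        int[] results = {4096, 4096, -5}; // the third request fails
        System.out.println(expectedCqes(flags, results));
    }
}
```

In a real program the outcome of each request is not known in advance, so an application would typically track only the SQEs submitted without the skip flag and treat any additional CQE as an error report.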
IORING_FEAT_SQPOLL_NONFIXED
public static final int IORING_FEAT_SQPOLL_NONFIXED
io_uring_params->features flags
Enum values:
FEAT_SINGLE_MMAP - If this flag is set, the two SQ and CQ rings can be mapped with a single mmap(2) call. The SQEs must still be allocated separately. This brings the necessary mmap(2) calls down from three to two. Available since kernel 5.4.
FEAT_NODROP - If this flag is set, io_uring supports never dropping completion events. If a completion event occurs and the CQ ring is full, the kernel stores the event internally until the CQ ring has room for more entries. If this overflow condition is entered, attempting to submit more IO will fail with the -EBUSY error value if the kernel can't flush the overflowed events to the CQ ring. If this happens, the application must reap events from the CQ ring and attempt the submit again. Available since kernel 5.5.
FEAT_SUBMIT_STABLE - If this flag is set, applications can be certain that any data for async offload has been consumed when the kernel has consumed the SQE. Available since kernel 5.5.
FEAT_RW_CUR_POS - If this flag is set, applications can specify offset == -1 with IORING_OP_{READV,WRITEV}, IORING_OP_{READ,WRITE}_FIXED, and IORING_OP_{READ,WRITE} to mean the current file position, which behaves like preadv2(2) and pwritev2(2) with offset == -1. The request uses (and updates) the current file position. This comes with the caveat that if the application has multiple reads or writes in flight, the end result will not be as expected. This is similar to threads sharing a file descriptor and doing IO using the current file position. Available since kernel 5.6.
FEAT_CUR_PERSONALITY - If this flag is set, then io_uring guarantees that both sync and async execution of a request assume the credentials of the task that called enter to queue the requests. If this flag isn't set, then requests are issued with the credentials of the task that originally registered the io_uring. If only one task is using a ring, then this flag doesn't matter, as the credentials will always be the same. Note that this is the default behavior; tasks can still register different personalities through register with REGISTER_PERSONALITY and specify the personality to use in the SQE. Available since kernel 5.6.
FEAT_FAST_POLL - If this flag is set, then io_uring supports using an internal poll mechanism to drive data/space readiness. This means that requests that cannot read or write data to a file no longer need to be punted to an async thread for handling; instead, they begin operation when the file is ready. This is similar to doing poll + read/write in userspace, but eliminates the need to do so. If this flag is set, requests waiting on space/data consume far fewer resources, as they are not blocking a thread. Available since kernel 5.7.
FEAT_POLL_32BITS - If this flag is set, the OP_POLL_ADD command accepts the full 32-bit range of epoll based flags, most notably EPOLLEXCLUSIVE, which allows exclusive (waking single waiters) behavior. Available since kernel 5.9.
FEAT_SQPOLL_NONFIXED - If this flag is set, the SETUP_SQPOLL feature no longer requires the use of fixed files. Any normal file descriptor can be used for IO commands without needing registration. Available since kernel 5.11.
FEAT_EXT_ARG - If this flag is set, then the enter system call supports passing in an extended argument instead of just the sigset_t of earlier kernels. This extended argument is of type struct io_uring_getevents_arg and allows the caller to pass in both a sigset_t and a timeout argument for waiting on events. A pointer to this struct must be passed in if ENTER_EXT_ARG is set in the flags for the enter system call. Available since kernel 5.11.
FEAT_NATIVE_WORKERS - If this flag is set, io_uring is using native workers for its async helpers. Previous kernels used kernel threads that assumed the identity of the original io_uring owning task, but later kernels actively create what looks more like regular process threads instead. Available since kernel 5.12.
FEAT_RSRC_TAGS - If this flag is set, then io_uring supports a variety of features related to fixed files and buffers. In particular, it indicates that registered buffers can be updated in-place, whereas before the full set would have to be unregistered first. Available since kernel 5.13.
FEAT_CQE_SKIP - If this flag is set, then io_uring supports setting IOSQE_CQE_SKIP_SUCCESS in the submitted SQE, indicating that no CQE should be generated for this SQE if it executes normally. If an error happens while processing the SQE, a CQE with the appropriate error value will still be generated. Available since kernel 5.17.
FEAT_LINKED_FILE - If this flag is set, then io_uring supports sane assignment of files for SQEs that have dependencies. For example, if a chain of SQEs is submitted with IOSQE_IO_LINK, then kernels without this flag will prepare the file for each link upfront. If a previous link opens a file with a known index, e.g. if direct descriptors are used with open or accept, then file assignment needs to happen after execution of that SQE. If this flag is set, then the kernel will defer file assignment until execution of a given request is started. Available since kernel 5.17.
FEAT_REG_REG_RING
FEAT_RECVSEND_BUNDLE
FEAT_MIN_TIMEOUT
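Because these values are bits in the io_uring_params->features mask, an application usually tests or lists them before relying on a capability. The following is a minimal sketch of that check, assuming the bit positions from the Linux header linux/io_uring.h (FEAT_SINGLE_MMAP is bit 0, continuing in the order listed above). The class FeatureFlagsSketch and its methods has and describe are hypothetical helpers written for illustration; they are not part of LibIOURing, and only a few constants are spelled out.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: decode an io_uring_params->features bitmask into flag names. */
final class FeatureFlagsSketch {
    // Assumption: bit positions follow <linux/io_uring.h>, in the order the flags are listed.
    static final int IORING_FEAT_SINGLE_MMAP = 1 << 0;
    static final int IORING_FEAT_NODROP = 1 << 1;
    static final int IORING_FEAT_FAST_POLL = 1 << 5;
    static final int IORING_FEAT_EXT_ARG = 1 << 8;

    // Index in this array == bit position in the features mask.
    private static final String[] NAMES = {
        "FEAT_SINGLE_MMAP", "FEAT_NODROP", "FEAT_SUBMIT_STABLE", "FEAT_RW_CUR_POS",
        "FEAT_CUR_PERSONALITY", "FEAT_FAST_POLL", "FEAT_POLL_32BITS", "FEAT_SQPOLL_NONFIXED",
        "FEAT_EXT_ARG", "FEAT_NATIVE_WORKERS", "FEAT_RSRC_TAGS", "FEAT_CQE_SKIP",
        "FEAT_LINKED_FILE", "FEAT_REG_REG_RING", "FEAT_RECVSEND_BUNDLE", "FEAT_MIN_TIMEOUT"
    };

    /** Returns true if the given feature bit is set in the mask. */
    static boolean has(int features, int flag) {
        return (features & flag) != 0;
    }

    /** Lists the names of all known feature bits set in the mask, lowest bit first. */
    static List<String> describe(int features) {
        List<String> out = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((features & (1 << bit)) != 0) {
                out.add(NAMES[bit]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical mask; a real one would be read from io_uring_params after setup.
        int features = IORING_FEAT_SINGLE_MMAP | IORING_FEAT_NODROP | IORING_FEAT_EXT_ARG;
        System.out.println(describe(features));
        System.out.println("fast poll supported: " + has(features, IORING_FEAT_FAST_POLL));
    }
}
```

Testing individual bits rather than comparing kernel version numbers is the safer design, since distribution kernels may backport features.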
IORING_FEAT_EXT_ARG
public static final int IORING_FEAT_EXT_ARG
io_uring_params->features flags
Enum values: see the list under IORING_FEAT_SQPOLL_NONFIXED.
IORING_FEAT_NATIVE_WORKERS
public static final int IORING_FEAT_NATIVE_WORKERS
io_uring_params->features flags
Enum values: see the list under IORING_FEAT_SQPOLL_NONFIXED.
IORING_FEAT_RSRC_TAGS
public static final int IORING_FEAT_RSRC_TAGS
io_uring_params->features flags
Enum values: see the list under IORING_FEAT_SQPOLL_NONFIXED.
IORING_FEAT_CQE_SKIP
public static final int IORING_FEAT_CQE_SKIP
io_uring_params->features flags
Enum values: see the list under IORING_FEAT_SQPOLL_NONFIXED.
IORING_FEAT_LINKED_FILE
public static final int IORING_FEAT_LINKED_FILE
io_uring_params->features flags
Enum values: see the list under IORING_FEAT_SQPOLL_NONFIXED.
IORING_FEAT_REG_REG_RING
public static final int IORING_FEAT_REG_REG_RING
io_uring_params->features flags
Enum values:
FEAT_SINGLE_MMAP - If this flag is set, the two SQ and CQ rings can be mapped with a single mmap(2) call.
The SQEs must still be allocated separately. This brings the necessary mmap(2) calls down from three to two.
Available since kernel 5.4.
FEAT_NODROP - If this flag is set, io_uring supports never dropping completion events.
If a completion event occurs and the CQ ring is full, the kernel stores the event internally until such a time that the CQ ring has room for more entries. If this overflow condition is entered, attempting to submit more IO will fail with the -EBUSY error value if the kernel can't flush the overflowed events to the CQ ring. If this happens, the application must reap events from the CQ ring and attempt the submit again.
Available since kernel 5.5.
FEAT_SUBMIT_STABLE - If this flag is set, applications can be certain that any data for async offload has been consumed when the kernel has consumed the SQE.
Available since kernel 5.5.
FEAT_RW_CUR_POS - If this flag is set, applications can specify offset == -1 with IORING_OP_{READV,WRITEV}, IORING_OP_{READ,WRITE}_FIXED, and IORING_OP_{READ,WRITE} to mean current file position, which behaves like preadv2(2) and pwritev2(2) with offset == -1.
It'll use (and update) the current file position. This comes with the caveat that if the application has multiple reads or writes in flight, then the end result will not be as expected. This is similar to threads sharing a file descriptor and doing IO using the current file position.
Available since kernel 5.6.
FEAT_CUR_PERSONALITY - If this flag is set, then io_uring guarantees that both sync and async execution of a request assumes the credentials of the task that called enter to queue the requests.
If this flag isn't set, then requests are issued with the credentials of the task that originally registered the io_uring. If only one task is using a ring, then this flag doesn't matter as the credentials will always be the same. Note that this is the default behavior; tasks can still register different personalities through register with REGISTER_PERSONALITY and specify the personality to use in the sqe.
Available since kernel 5.6.
FEAT_FAST_POLL - If this flag is set, then io_uring supports using an internal poll mechanism to drive data/space readiness.
This means that requests that cannot read or write data to a file no longer need to be punted to an async thread for handling; instead they will begin operation when the file is ready. This is similar to doing poll + read/write in userspace, but eliminates the need to do so. If this flag is set, requests waiting on space/data consume far fewer resources while doing so, as they are not blocking a thread.
Available since kernel 5.7.
FEAT_POLL_32BITS - If this flag is set, the OP_POLL_ADD command accepts the full 32-bit range of epoll based flags.
Most notably EPOLLEXCLUSIVE, which allows exclusive (waking single waiters) behavior.
Available since kernel 5.9.
FEAT_SQPOLL_NONFIXED - If this flag is set, the SETUP_SQPOLL feature no longer requires the use of fixed files.
Any normal file descriptor can be used for IO commands without needing registration.
Available since kernel 5.11.
FEAT_EXT_ARG - If this flag is set, then the enter system call supports passing in an extended argument instead of just the sigset_t of earlier kernels.
This extended argument is of type struct io_uring_getevents_arg and allows the caller to pass in both a sigset_t and a timeout argument for waiting on events. A pointer to this struct must be passed in if ENTER_EXT_ARG is set in the flags for the enter system call.
Available since kernel 5.11.
FEAT_NATIVE_WORKERS - If this flag is set, io_uring is using native workers for its async helpers.
Previous kernels used kernel threads that assumed the identity of the original io_uring owning task, but later kernels will actively create what looks more like regular process threads instead.
Available since kernel 5.12.
FEAT_RSRC_TAGS - If this flag is set, then io_uring supports a variety of features related to fixed files and buffers.
In particular, it indicates that registered buffers can be updated in-place, whereas before the full set would have to be unregistered first.
Available since kernel 5.13.
FEAT_CQE_SKIP - If this flag is set, then io_uring supports setting IOSQE_CQE_SKIP_SUCCESS in the submitted SQE, indicating that no CQE should be generated for this SQE if it executes normally. If an error happens processing the SQE, a CQE with the appropriate error value will still be generated.
Available since kernel 5.17.
FEAT_LINKED_FILE - If this flag is set, then io_uring supports sane assignment of files for SQEs that have dependencies. For example, if a chain of SQEs is submitted with IOSQE_IO_LINK, then kernels without this flag will prepare the file for each link upfront. If a previous link opens a file with a known index, e.g. if direct descriptors are used with open or accept, then file assignment needs to happen post execution of that SQE. If this flag is set, then the kernel will defer file assignment until execution of a given request is started.
Available since kernel 5.17.
FEAT_REG_REG_RING
FEAT_RECVSEND_BUNDLE
FEAT_MIN_TIMEOUT
- See Also:
-
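The feature flags listed above are bits in the features field that setup fills in, so an application normally tests them with a bitwise AND before relying on a capability. The following is a minimal, self-contained sketch of that check. The class FeatureFlags and its helper methods are hypothetical names introduced for illustration, and the bit positions are restated locally on the assumption that they follow the order of the list above (as in the kernel's io_uring header); in real code the constants would come from this class instead.

```java
import java.util.ArrayList;
import java.util.List;

final class FeatureFlags {
    // Assumed bit values, restated here so the sketch compiles on its own.
    static final int FEAT_SINGLE_MMAP = 1 << 0;
    static final int FEAT_NODROP      = 1 << 1;
    static final int FEAT_EXT_ARG     = 1 << 8;
    static final int FEAT_MIN_TIMEOUT = 1 << 15;

    // Names indexed by bit position, in the order the documentation lists them.
    private static final String[] NAMES = {
        "FEAT_SINGLE_MMAP", "FEAT_NODROP", "FEAT_SUBMIT_STABLE", "FEAT_RW_CUR_POS",
        "FEAT_CUR_PERSONALITY", "FEAT_FAST_POLL", "FEAT_POLL_32BITS", "FEAT_SQPOLL_NONFIXED",
        "FEAT_EXT_ARG", "FEAT_NATIVE_WORKERS", "FEAT_RSRC_TAGS", "FEAT_CQE_SKIP",
        "FEAT_LINKED_FILE", "FEAT_REG_REG_RING", "FEAT_RECVSEND_BUNDLE", "FEAT_MIN_TIMEOUT"
    };

    /** Returns true if every bit of {@code flag} is set in {@code features}. */
    static boolean has(int features, int flag) {
        return (features & flag) == flag;
    }

    /** Lists the names of all known feature bits set in {@code features}. */
    static List<String> describe(int features) {
        List<String> out = new ArrayList<>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((features & (1 << bit)) != 0) {
                out.add(NAMES[bit]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical value; a real one is read from io_uring_params after setup.
        int features = FEAT_SINGLE_MMAP | FEAT_NODROP | FEAT_EXT_ARG;
        System.out.println(describe(features));
        System.out.println(has(features, FEAT_MIN_TIMEOUT));
    }
}
```

Checking the bit at startup and falling back to the older behavior (for example, three mmap(2) calls when FEAT_SINGLE_MMAP is absent) keeps the same program usable across kernel versions.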
IORING_FEAT_RECVSEND_BUNDLE
public static final int IORING_FEAT_RECVSEND_BUNDLE
io_uring_params->features flags
Enum values: the same list as under IORING_FEAT_REG_REG_RING above.
- See Also:
-
IORING_FEAT_MIN_TIMEOUT
public static final int IORING_FEAT_MIN_TIMEOUT
io_uring_params->features flags
Enum values: the same list as under IORING_FEAT_REG_REG_RING above.
- See Also:
-
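The FEAT_NODROP description implies a reap-and-retry loop around submission: when the kernel is holding overflowed completions it cannot flush, submit fails with -EBUSY until the application drains the CQ ring. The sketch below is a toy, in-memory model of that behavior, not the kernel's implementation. NoDropModel, its methods, the immediate completion of every request, and the choice to flush the backlog on reap are all simplifying assumptions made for illustration; only the EBUSY errno value (16 on Linux) is taken from the platform.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class NoDropModel {
    static final int EBUSY = 16; // Linux errno value

    private final int cqCapacity;
    private final Deque<Integer> cq = new ArrayDeque<>();       // completions visible to the application
    private final Deque<Integer> overflow = new ArrayDeque<>(); // completions held back internally

    NoDropModel(int cqCapacity) {
        this.cqCapacity = cqCapacity;
    }

    // Move held-back completions into the CQ ring while there is room.
    private void flushOverflow() {
        while (!overflow.isEmpty() && cq.size() < cqCapacity) {
            cq.add(overflow.poll());
        }
    }

    /** Returns 0 on success, or -EBUSY if held-back completions could not be flushed. */
    int submit(int userData) {
        flushOverflow();
        if (!overflow.isEmpty()) {
            return -EBUSY;
        }
        // Toy assumption: the request completes immediately.
        if (cq.size() < cqCapacity) {
            cq.add(userData);
        } else {
            overflow.add(userData); // the ring is full, so the event is kept rather than dropped
        }
        return 0;
    }

    /** Reaps one completion, or returns null if the CQ ring is empty. */
    Integer reap() {
        Integer userData = cq.poll();
        flushOverflow();
        return userData;
    }

    public static void main(String[] args) {
        NoDropModel ring = new NoDropModel(2);
        for (int request = 1; request <= 5; request++) {
            // Reap-and-retry loop: drain one completion whenever submission reports -EBUSY.
            while (ring.submit(request) == -EBUSY) {
                System.out.println("reaped " + ring.reap() + " before retrying " + request);
            }
        }
    }
}
```

The point of the model is the control flow: no completion is lost, and the cost of a full CQ ring is a failed submission that the application resolves by reaping.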
IORING_REGISTER_BUFFERS
public static final int IORING_REGISTER_BUFFERS
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL.
All previously registered buffers associated with the io_uring instance will be released.
Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application need only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL.
All previously registered files associated with the io_uring instance will be unregistered.
Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1.
Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring.
Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications.
Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero.
Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array.
Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.
Available since 5.12.
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.
This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD.
Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel.
Available since 5.6.
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials.
Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL.
Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup).
Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup).
While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument and must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O.
Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed.
Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete.
Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent.
This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args must be the byte size of that mask.
Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF.
Must not have arg or nr_args set.
Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call. arg must be set to a pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. The data field of this struct must contain an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to a pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
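The REGISTER_BUFFERS entry above requires that the range given by an SQE's addr and len fall inside the buffer selected by buf_index. An application can check this before queueing a fixed read or write. The following is a minimal sketch of such a check, assuming addresses are carried as Java long values; FixedBufferRange, Iovec and fits are hypothetical names introduced here and are not part of the binding.

```java
final class FixedBufferRange {
    /** Base address and byte length of one registered buffer, mirroring one iovec entry. */
    static final class Iovec {
        final long base;
        final long len;

        Iovec(long base, long len) {
            this.base = base;
            this.len = len;
        }
    }

    /** Returns true if [addr, addr + len) lies entirely inside registered[bufIndex]. */
    static boolean fits(Iovec[] registered, int bufIndex, long addr, long len) {
        if (bufIndex < 0 || bufIndex >= registered.length) {
            return false; // no such registered buffer
        }
        Iovec buf = registered[bufIndex];
        if (len < 0 || len > buf.len) {
            return false; // request is longer than the whole buffer
        }
        if (Long.compareUnsigned(addr, buf.base) < 0) {
            return false; // starts before the buffer (addresses compared as unsigned)
        }
        long offset = addr - buf.base;
        // len <= buf.len was checked above, so the subtraction cannot underflow.
        return Long.compareUnsigned(offset, buf.len - len) <= 0;
    }

    public static void main(String[] args) {
        // Hypothetical registration: one 4 KiB buffer at address 0x1000.
        Iovec[] registered = { new Iovec(0x1000L, 4096L) };
        System.out.println(fits(registered, 0, 0x1800L, 2048L));
        System.out.println(fits(registered, 0, 0x1800L, 2049L));
    }
}
```

Using part of a large registered buffer is valid, as the text notes, so the check only rejects ranges that start before the buffer or run past its end.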
- See Also:
-
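The REGISTER_FILES_UPDATE rules described above (a sparse slot holds -1, writing -1 removes an entry, and REGISTER_FILES_SKIP leaves a slot untouched) can be modelled on a plain int array. This is a sketch of the semantics only and performs no system call. FilesUpdateModel and applyUpdate are hypothetical names, and the value -2 for the skip marker is an assumption based on the kernel header's IORING_REGISTER_FILES_SKIP definition.

```java
import java.util.Arrays;

final class FilesUpdateModel {
    static final int SPARSE = -1;              // an empty slot in the registered file table
    static final int REGISTER_FILES_SKIP = -2; // assumed value of the skip marker

    /**
     * Applies an update of fds.length descriptors starting at offset.
     * Returns the number of slots covered by the update.
     */
    static int applyUpdate(int[] table, int offset, int[] fds) {
        if (offset < 0 || offset > table.length - fds.length) {
            throw new IndexOutOfBoundsException("update does not fit in the registered table");
        }
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] == REGISTER_FILES_SKIP) {
                continue; // leave the file currently registered at this index alone
            }
            // A real fd fills or replaces the slot; SPARSE (-1) removes the entry.
            table[offset + i] = fds[i];
        }
        return fds.length;
    }

    public static void main(String[] args) {
        // Hypothetical table: slot 1 is sparse, the others hold open descriptors.
        int[] table = { 3, SPARSE, 5, 7 };
        applyUpdate(table, 1, new int[] { 9, REGISTER_FILES_SKIP, SPARSE });
        System.out.println(Arrays.toString(table));
    }
}
```

In the example, slot 1 goes from sparse to a real descriptor, slot 2 is skipped, and slot 3 is removed, which covers the three cases the entry distinguishes.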
IORING_UNREGISTER_BUFFERS
public static final int IORING_UNREGISTER_BUFFERS
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL.
All previously registered buffers associated with the io_uring instance will be released.
Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application need only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL.
All previously registered files associated with the io_uring instance will be unregistered.
Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1.
Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the
CQ_EVENTFD_DISABLEDbit in theflagsfield of the CQ ring.Available since 5.8.
UNREGISTER_EVENTFD- Unregister an eventfd file descriptor to stop notifications.Since only one eventfd descriptor is currently supported, this operation takes no argument, and
argmust be passed asNULLandnr_argsmust be zero.Available since 5.2.
REGISTER_FILES_UPDATE- This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one wherefdis equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update.nr_argsmust contain the number of descriptors in the passed in array.Available since 5.5.
File descriptors can be skipped if they are set to
REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.Available since 5.12.
REGISTER_EVENTFD_ASYNC- This works just likeREGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for
IORING_REGISTER_EVENTFD.Available since 5.6.
REGISTER_PROBE- This operation returns a structure,io_uring_probe, which contains information about theopcodessupported byio_uringon the running kernel.argmust contain a pointer to a structio_uring_probe, andnr_argsmust contain the size of the ops array in that probe struct. Theopsarray is of the typeio_uring_probe_op, which holds the value of theopcodeand aflagsfield. If the flags field hasIO_URING_OP_SUPPORTEDset, then this opcode is supported on the running kernel.Available since 5.6.
REGISTER_PERSONALITY- This operation registers credentials of the running application withio_uring, and returns an id associated with these credentials.Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with
argset toNULLandnr_argsset to zero.Available since 5.6.
UNREGISTER_PERSONALITY- This operation unregisters a previously registered personality withio_uring.nr_argsmust be set to the id in question, andargmust be set toNULL.Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries. With an entry it is possible to allow a
register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry). All the restrictions must be submitted with a single
io_uring_register()call and they are handled as an allowlist (opcodesand flags not registered, are not allowed).Restrictions can be registered only if the
io_uringring started in a disabled state (SETUP_R_DISABLEDmust be specified in the call tosetup).Available since 5.10.
REGISTER_ENABLE_RINGS- This operation enables anio_uringring started in a disabled state (SETUP_R_DISABLEDwas specified in the call tosetup).While the
io_uringring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, theio_uringring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked withargset toNULLandnr_argsset to zero.Available since 5.10.
REGISTER_FILES2- Register files for I/O. Similar toREGISTER_FILES.argpoints to a structio_uring_rsrc_register, andnr_argsshould be set to the number of bytes in the structure.The
data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging. Note that resource updates, e.g.
REGISTER_FILES_UPDATE, don't necessarily deallocate resources, they might be held until all requests using that resource complete.Available since 5.13.
REGISTER_FILES_UPDATE2- Similar toREGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_BUFFERS2- Register buffers for I/O.Similar to
REGISTER_BUFFERSbut aims to have a more extensible ABI.argpoints to a structio_uring_rsrc_register, andnr_argsshould be set to the number of bytes in the structure.The data field contains a pointer to a struct
iovecarray ofnrentries. Thetagsfield should either be 0, then tagging is disabled, or point to an array ofnr"tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource had been unregistered and it's not used anymore, a CQE will be posted withuser_dataset to the specified tag and all other fields zeroed.Note that resource updates, e.g.
REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time it returns, but they might be held alive until all requests using it complete.Available since 5.13.
REGISTER_BUFFERS_UPDATE- Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of structiovec.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_IOWQ_AFF- By default, async workers created byio_uringwill inherit the CPU mask of its parent.This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell
io_uringwhat CPUs the async workers may run on.argmust point to acpu_set_tmask, andnr_argsthe byte size of that mask.Available since 5.14.
UNREGISTER_IOWQ_AFF- Undoes a CPU mask previously set withREGISTER_IOWQ_AFF.Must not have
argornr_argsset.Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values. Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application. Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the
io_uring_enter(2) system call. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. The data field of this struct must contain an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls. On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for
io_uring_enter (2), provided thatIORING_ENTER_REGISTERED_RINGis set in theflagsfor the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to
16.Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister. Note that this isn't done automatically on ring exit if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
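The REGISTER_RING_FDS entry above refers to an offset field and a data field in each array element. As a rough illustration, the sketch below encodes one such element, assuming it follows the 16 byte layout of the kernel's struct io_uring_rsrc_update (a 32 bit offset, a 32 bit reserved field, then a 64 bit data field). That layout is an assumption taken from the kernel UAPI header rather than from this page, and the RingFdUpdate class is hypothetical, not part of this binding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of LibIOURing. Assumed element layout:
//   bytes 0..3   offset (u32), -1 asks the kernel to pick a slot
//   bytes 4..7   reserved, must be zero
//   bytes 8..15  data (u64), holds the ring file descriptor
final class RingFdUpdate {
    static final int SIZE = 16;
    static final int ANY_OFFSET = -1;

    // Encodes a single registration entry in native byte order.
    static ByteBuffer encode(int offset, int ringFd) {
        ByteBuffer buf = ByteBuffer.allocateDirect(SIZE).order(ByteOrder.nativeOrder());
        buf.putInt(0, offset);
        buf.putInt(4, 0);
        buf.putLong(8, ringFd & 0xFFFFFFFFL); // widen without sign extension
        return buf;
    }

    // After a successful call the kernel writes the chosen slot into offset.
    static int registeredOffset(ByteBuffer entry) {
        return entry.getInt(0);
    }
}
```

With offset set to -1, the value read back by `registeredOffset` after a successful registration is the descriptor to pass to later enter calls together with IORING_ENTER_REGISTERED_RING. The same value goes into the offset field when unregistering.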
- See Also:
-
IORING_REGISTER_FILES
public static final int IORING_REGISTER_FILES
io_uring_register_op
Enum values:
REGISTER_BUFFERS-argpoints to a structiovecarray ofnr_argsentries.The buffers associated with the
iovecswill be locked in memory and charged against the user'sRLIMIT_MEMLOCKresource limit. Seegetrlimit(2)for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned bymalloc(3)ormmap(2)with theMAP_ANONYMOUSflag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the
OP_READ_FIXEDorOP_WRITE_FIXEDopcodesin the submission queue entry (see the structio_uring_sqedefinition inenter), and set thebuf_indexfield to the desired buffer index. The memory range described by the submission queue entry'saddrandlenfields must fall within the indexed buffer.It is perfectly valid to setup a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to
io_uring_register()with the new buffers.Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS- This operation takes no argument, andargmust be passed asNULL.All previously registered buffers associated with the
io_uringinstance will be released.Available since 5.1.
REGISTER_FILES- Register files for I/O.argcontains a pointer to an array ofnr_argsfile descriptors (signed 32 bit integers). To make use of the registered files, theIOSQE_FIXED_FILEflag must be set in theflagsmember of the structio_uring_sqe, and thefdmember is set to the index of the file in the file descriptor array.The file set may be sparse, meaning that the
fdfield in the array may be set to -1. SeeREGISTER_FILES_UPDATEfor how to update files in place.Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See
REGISTER_FILES_UPDATEfor how to update an existing set without that limitation.Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES- This operation requires no argument, andargmust be passed asNULL.All previously registered files associated with the
io_uringinstance will be unregistered.Available since 5.1.
REGISTER_EVENTFD- It's possible to useeventfd(2)to get notified of completion events on anio_uringinstance. If this is desired, an eventfd file descriptor can be registered through this operation.argmust contain a pointer to the eventfd file descriptor, andnr_argsmust be 1.Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the
CQ_EVENTFD_DISABLEDbit in theflagsfield of the CQ ring.Available since 5.8.
UNREGISTER_EVENTFD- Unregister an eventfd file descriptor to stop notifications.Since only one eventfd descriptor is currently supported, this operation takes no argument, and
argmust be passed asNULLandnr_argsmust be zero.Available since 5.2.
REGISTER_FILES_UPDATE- This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one wherefdis equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update.nr_argsmust contain the number of descriptors in the passed in array.Available since 5.5.
File descriptors can be skipped if they are set to
REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.Available since 5.12.
REGISTER_EVENTFD_ASYNC- This works just likeREGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for
IORING_REGISTER_EVENTFD.Available since 5.6.
REGISTER_PROBE- This operation returns a structure,io_uring_probe, which contains information about theopcodessupported byio_uringon the running kernel.argmust contain a pointer to a structio_uring_probe, andnr_argsmust contain the size of the ops array in that probe struct. Theopsarray is of the typeio_uring_probe_op, which holds the value of theopcodeand aflagsfield. If the flags field hasIO_URING_OP_SUPPORTEDset, then this opcode is supported on the running kernel.Available since 5.6.
REGISTER_PERSONALITY- This operation registers credentials of the running application withio_uring, and returns an id associated with these credentials.Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with
argset toNULLandnr_argsset to zero.Available since 5.6.
UNREGISTER_PERSONALITY- This operation unregisters a previously registered personality withio_uring.nr_argsmust be set to the id in question, andargmust be set toNULL.Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries. With an entry it is possible to allow a
register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry). All the restrictions must be submitted with a single
io_uring_register()call and they are handled as an allowlist (opcodesand flags not registered, are not allowed).Restrictions can be registered only if the
io_uringring started in a disabled state (SETUP_R_DISABLEDmust be specified in the call tosetup).Available since 5.10.
REGISTER_ENABLE_RINGS- This operation enables anio_uringring started in a disabled state (SETUP_R_DISABLEDwas specified in the call tosetup).While the
io_uringring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, theio_uringring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked withargset toNULLandnr_argsset to zero.Available since 5.10.
REGISTER_FILES2- Register files for I/O. Similar toREGISTER_FILES.argpoints to a structio_uring_rsrc_register, andnr_argsshould be set to the number of bytes in the structure.The
data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging. Note that resource updates, e.g.
REGISTER_FILES_UPDATE, don't necessarily deallocate resources, they might be held until all requests using that resource complete.Available since 5.13.
REGISTER_FILES_UPDATE2- Similar toREGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_BUFFERS2- Register buffers for I/O.Similar to
REGISTER_BUFFERSbut aims to have a more extensible ABI.argpoints to a structio_uring_rsrc_register, andnr_argsshould be set to the number of bytes in the structure.The data field contains a pointer to a struct
iovecarray ofnrentries. Thetagsfield should either be 0, then tagging is disabled, or point to an array ofnr"tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource had been unregistered and it's not used anymore, a CQE will be posted withuser_dataset to the specified tag and all other fields zeroed.Note that resource updates, e.g.
REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time it returns, but they might be held alive until all requests using it complete.Available since 5.13.
REGISTER_BUFFERS_UPDATE- Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of structiovec.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_IOWQ_AFF- By default, async workers created byio_uringwill inherit the CPU mask of its parent.This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell
io_uringwhat CPUs the async workers may run on.argmust point to acpu_set_tmask, andnr_argsthe byte size of that mask.Available since 5.14.
UNREGISTER_IOWQ_AFF- Undoes a CPU mask previously set withREGISTER_IOWQ_AFF.Must not have
argornr_argsset.Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values. Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application. Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the
io_uring_enter(2) system call. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. The data field of this struct must contain an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls. On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for
io_uring_enter (2), provided thatIORING_ENTER_REGISTERED_RINGis set in theflagsfor the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to
16.Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister. Note that this isn't done automatically on ring exit if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
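The REGISTER_PROBE entry above says an opcode is supported when its io_uring_probe_op entry has IO_URING_OP_SUPPORTED set in flags. The sketch below shows one way that check might look once the kernel has filled the probe buffer. It assumes the layout from the kernel UAPI header as commonly documented: a 16 byte io_uring_probe header with ops_len at byte 1, followed by 8 byte entries holding op at byte 0 and a 16 bit flags field at byte 2, with IO_URING_OP_SUPPORTED equal to 1. The ProbeReader class is hypothetical and is not part of this binding.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of LibIOURing. Reads a probe result that
// the kernel has already written into the buffer passed as arg.
final class ProbeReader {
    static final int IO_URING_OP_SUPPORTED = 1; // assumed value: 1U << 0
    static final int HEADER_SIZE = 16;          // assumed sizeof(struct io_uring_probe)
    static final int OP_SIZE = 8;               // assumed sizeof(struct io_uring_probe_op)

    // Returns true if the probe lists the opcode and marks it as supported.
    // The buffer must use native byte order so the 16 bit flags read correctly.
    static boolean isSupported(ByteBuffer probe, int opcode) {
        int opsLen = probe.get(1) & 0xFF; // ops_len: number of filled entries
        for (int i = 0; i < opsLen; i++) {
            int base = HEADER_SIZE + i * OP_SIZE;
            if (base + OP_SIZE > probe.limit()) {
                break; // entry would run past the end of the buffer
            }
            int op = probe.get(base) & 0xFF;
            int flags = probe.getShort(base + 2) & 0xFFFF;
            if (op == opcode) {
                return (flags & IO_URING_OP_SUPPORTED) != 0;
            }
        }
        return false; // opcode unknown to the running kernel
    }
}
```

Treating an opcode that is missing from the ops array as unsupported matches the intent of the probe: kernels older than the opcode simply do not report it.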
- See Also:
-
IORING_UNREGISTER_FILES
public static final int IORING_UNREGISTER_FILES
io_uring_register_op
Enum values:
REGISTER_BUFFERS-argpoints to a structiovecarray ofnr_argsentries.The buffers associated with the
iovecswill be locked in memory and charged against the user'sRLIMIT_MEMLOCKresource limit. Seegetrlimit(2)for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned bymalloc(3)ormmap(2)with theMAP_ANONYMOUSflag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the
OP_READ_FIXEDorOP_WRITE_FIXEDopcodesin the submission queue entry (see the structio_uring_sqedefinition inenter), and set thebuf_indexfield to the desired buffer index. The memory range described by the submission queue entry'saddrandlenfields must fall within the indexed buffer.It is perfectly valid to setup a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to
io_uring_register()with the new buffers.Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS- This operation takes no argument, andargmust be passed asNULL.All previously registered buffers associated with the
io_uringinstance will be released.Available since 5.1.
REGISTER_FILES- Register files for I/O.argcontains a pointer to an array ofnr_argsfile descriptors (signed 32 bit integers). To make use of the registered files, theIOSQE_FIXED_FILEflag must be set in theflagsmember of the structio_uring_sqe, and thefdmember is set to the index of the file in the file descriptor array.The file set may be sparse, meaning that the
fdfield in the array may be set to -1. SeeREGISTER_FILES_UPDATEfor how to update files in place.Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See
REGISTER_FILES_UPDATEfor how to update an existing set without that limitation.Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES- This operation requires no argument, andargmust be passed asNULL.All previously registered files associated with the
io_uringinstance will be unregistered.Available since 5.1.
REGISTER_EVENTFD- It's possible to useeventfd(2)to get notified of completion events on anio_uringinstance. If this is desired, an eventfd file descriptor can be registered through this operation.argmust contain a pointer to the eventfd file descriptor, andnr_argsmust be 1.Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the
CQ_EVENTFD_DISABLEDbit in theflagsfield of the CQ ring.Available since 5.8.
UNREGISTER_EVENTFD- Unregister an eventfd file descriptor to stop notifications.Since only one eventfd descriptor is currently supported, this operation takes no argument, and
argmust be passed asNULLandnr_argsmust be zero.Available since 5.2.
REGISTER_FILES_UPDATE- This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one wherefdis equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update.nr_argsmust contain the number of descriptors in the passed in array.Available since 5.5.
File descriptors can be skipped if they are set to
REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.Available since 5.12.
REGISTER_EVENTFD_ASYNC- This works just likeREGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for
IORING_REGISTER_EVENTFD.Available since 5.6.
REGISTER_PROBE- This operation returns a structure,io_uring_probe, which contains information about theopcodessupported byio_uringon the running kernel.argmust contain a pointer to a structio_uring_probe, andnr_argsmust contain the size of the ops array in that probe struct. Theopsarray is of the typeio_uring_probe_op, which holds the value of theopcodeand aflagsfield. If the flags field hasIO_URING_OP_SUPPORTEDset, then this opcode is supported on the running kernel.Available since 5.6.
REGISTER_PERSONALITY- This operation registers credentials of the running application withio_uring, and returns an id associated with these credentials.Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with
argset toNULLandnr_argsset to zero.Available since 5.6.
UNREGISTER_PERSONALITY- This operation unregisters a previously registered personality withio_uring.nr_argsmust be set to the id in question, andargmust be set toNULL.Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup). Available since 5.10.
REGISTER_ENABLE_RINGS- This operation enables anio_uringring started in a disabled state (SETUP_R_DISABLEDwas specified in the call tosetup).While the
io_uringring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, theio_uringring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked withargset toNULLandnr_argsset to zero.Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete. Available since 5.13.
REGISTER_FILES_UPDATE2- Similar toREGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_BUFFERS2- Register buffers for I/O.Similar to
REGISTER_BUFFERSbut aims to have a more extensible ABI.argpoints to a structio_uring_rsrc_register, andnr_argsshould be set to the number of bytes in the structure.The data field contains a pointer to a struct
iovecarray ofnrentries. Thetagsfield should either be 0, then tagging is disabled, or point to an array ofnr"tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource had been unregistered and it's not used anymore, a CQE will be posted withuser_dataset to the specified tag and all other fields zeroed.Note that resource updates, e.g.
REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time it returns, but they might be held alive until all requests using it complete.Available since 5.13.
REGISTER_BUFFERS_UPDATE- Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of structiovec.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_IOWQ_AFF- By default, async workers created byio_uringwill inherit the CPU mask of its parent.This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell
io_uringwhat CPUs the async workers may run on.argmust point to acpu_set_tmask, andnr_argsthe byte size of that mask.Available since 5.14.
UNREGISTER_IOWQ_AFF- Undoes a CPU mask previously set withREGISTER_IOWQ_AFF.Must not have
argornr_argsset.Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values. Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16. Available since 5.18.
UNREGISTER_RING_FDS- Unregister descriptors previously registered withREGISTER_RING_FDS.argmust be set to an unsigned int pointer to an array of typestruct io_uring_rsrc_registerofnr_argsnumber of entries. Only theoffsetfield should be set in the structure, containing the registered file descriptor offset previously returned fromIORING_REGISTER_RING_FDSthat the application wishes to unregister.Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
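The slot semantics described for REGISTER_FILES_UPDATE (sparse entries, removal, replacement, and skipping) can be illustrated with a small model in plain Java. This is only a sketch of the documented behavior and does not call into io_uring: the class name, the method name, and the choice to return the number of slots written are all hypothetical, and the value -2 for REGISTER_FILES_SKIP is an assumption taken from the kernel header that should be checked against the constant exposed by this class.

```java
import java.util.Arrays;

public class FilesUpdateSketch {
    static final int SPARSE = -1; // an empty (sparse) slot, as described above
    static final int SKIP = -2;   // assumed value of REGISTER_FILES_SKIP

    /**
     * Models the effect of a files update on a registered file table.
     * Entries equal to SKIP leave the slot untouched; every other value,
     * including SPARSE, overwrites the slot at offset + i.
     * Returns the number of slots written (a modeling choice only).
     */
    public static int applyUpdate(int[] table, int offset, int[] fds) {
        if (offset < 0 || offset > table.length - fds.length) {
            throw new IllegalArgumentException("update range exceeds the registered table");
        }
        int written = 0;
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] == SKIP) {
                continue; // keep whatever file is already registered here
            }
            table[offset + i] = fds[i];
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        int[] table = {3, SPARSE, 5, 7};
        // Fill the sparse slot 1, leave slot 2 alone, and clear slot 3.
        int written = applyUpdate(table, 1, new int[] {9, SKIP, SPARSE});
        System.out.println(written + " " + Arrays.toString(table));
    }
}
```

In a real program the table lives in the kernel; the model is only meant to make the three cases (fill, skip, remove) easy to compare side by side.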
IORING_REGISTER_EVENTFD
public static final int IORING_REGISTER_EVENTFD
io_uring_register_op
Enum values:
REGISTER_BUFFERS-argpoints to a structiovecarray ofnr_argsentries.The buffers associated with the
iovecswill be locked in memory and charged against the user'sRLIMIT_MEMLOCKresource limit. Seegetrlimit(2)for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned bymalloc(3)ormmap(2)with theMAP_ANONYMOUSflag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the
OP_READ_FIXEDorOP_WRITE_FIXEDopcodesin the submission queue entry (see the structio_uring_sqedefinition inenter), and set thebuf_indexfield to the desired buffer index. The memory range described by the submission queue entry'saddrandlenfields must fall within the indexed buffer.It is perfectly valid to setup a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to
io_uring_register()with the new buffers.Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS- This operation takes no argument, andargmust be passed asNULL.All previously registered buffers associated with the
io_uringinstance will be released.Available since 5.1.
REGISTER_FILES- Register files for I/O.argcontains a pointer to an array ofnr_argsfile descriptors (signed 32 bit integers). To make use of the registered files, theIOSQE_FIXED_FILEflag must be set in theflagsmember of the structio_uring_sqe, and thefdmember is set to the index of the file in the file descriptor array.The file set may be sparse, meaning that the
fdfield in the array may be set to -1. SeeREGISTER_FILES_UPDATEfor how to update files in place.Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See
REGISTER_FILES_UPDATEfor how to update an existing set without that limitation.Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES- This operation requires no argument, andargmust be passed asNULL.All previously registered files associated with the
io_uringinstance will be unregistered.Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1. Available since 5.2.
An application can temporarily disable notifications coming through the registered eventfd by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring. Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications. Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero. Available since 5.2.
REGISTER_FILES_UPDATE- This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one wherefdis equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update.nr_argsmust contain the number of descriptors in the passed in array.Available since 5.5.
File descriptors can be skipped if they are set to
REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.Available since 5.12.
REGISTER_EVENTFD_ASYNC- This works just likeREGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for
IORING_REGISTER_EVENTFD.Available since 5.6.
REGISTER_PROBE- This operation returns a structure,io_uring_probe, which contains information about theopcodessupported byio_uringon the running kernel.argmust contain a pointer to a structio_uring_probe, andnr_argsmust contain the size of the ops array in that probe struct. Theopsarray is of the typeio_uring_probe_op, which holds the value of theopcodeand aflagsfield. If the flags field hasIO_URING_OP_SUPPORTEDset, then this opcode is supported on the running kernel.Available since 5.6.
REGISTER_PERSONALITY- This operation registers credentials of the running application withio_uring, and returns an id associated with these credentials.Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with
argset toNULLandnr_argsset to zero.Available since 5.6.
UNREGISTER_PERSONALITY- This operation unregisters a previously registered personality withio_uring.nr_argsmust be set to the id in question, andargmust be set toNULL.Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup). Available since 5.10.
REGISTER_ENABLE_RINGS- This operation enables anio_uringring started in a disabled state (SETUP_R_DISABLEDwas specified in the call tosetup).While the
io_uringring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, theio_uringring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked withargset toNULLandnr_argsset to zero.Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete. Available since 5.13.
REGISTER_FILES_UPDATE2- Similar toREGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_BUFFERS2- Register buffers for I/O.Similar to
REGISTER_BUFFERSbut aims to have a more extensible ABI.argpoints to a structio_uring_rsrc_register, andnr_argsshould be set to the number of bytes in the structure.The data field contains a pointer to a struct
iovecarray ofnrentries. Thetagsfield should either be 0, then tagging is disabled, or point to an array ofnr"tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource had been unregistered and it's not used anymore, a CQE will be posted withuser_dataset to the specified tag and all other fields zeroed.Note that resource updates, e.g.
REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time it returns, but they might be held alive until all requests using it complete.Available since 5.13.
REGISTER_BUFFERS_UPDATE- Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of structiovec.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_IOWQ_AFF- By default, async workers created byio_uringwill inherit the CPU mask of its parent.This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell
io_uringwhat CPUs the async workers may run on.argmust point to acpu_set_tmask, andnr_argsthe byte size of that mask.Available since 5.14.
UNREGISTER_IOWQ_AFF- Undoes a CPU mask previously set withREGISTER_IOWQ_AFF.Must not have
argornr_argsset.Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values. Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16. Available since 5.18.
UNREGISTER_RING_FDS- Unregister descriptors previously registered withREGISTER_RING_FDS.argmust be set to an unsigned int pointer to an array of typestruct io_uring_rsrc_registerofnr_argsnumber of entries. Only theoffsetfield should be set in the structure, containing the registered file descriptor offset previously returned fromIORING_REGISTER_RING_FDSthat the application wishes to unregister.Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
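The in/out behavior of the two-element array described for REGISTER_IOWQ_MAX_WORKERS is easy to get wrong, so here is a minimal model of it in plain Java. It does not touch io_uring; the class name, the method name, and the starting limits of 8 and 64 are hypothetical and chosen only for illustration. The sketch also assumes the zero rule applies to each index separately, which the text above does not state explicitly.

```java
import java.util.Arrays;

public class MaxWorkersSketch {
    // limits[0] = bounded worker maximum, limits[1] = unbounded worker maximum
    private final int[] limits;

    public MaxWorkersSketch(int bounded, int unbounded) {
        this.limits = new int[] {bounded, unbounded};
    }

    /**
     * Models the documented contract: a non-zero entry installs a new
     * maximum, a zero entry only queries, and on return the caller's
     * array holds the values that were in effect before the call.
     */
    public void registerMaxWorkers(int[] values) {
        if (values.length != 2) {
            throw new IllegalArgumentException("exactly two values are expected");
        }
        for (int i = 0; i < 2; i++) {
            int previous = limits[i];
            if (values[i] != 0) {
                limits[i] = values[i];
            }
            values[i] = previous;
        }
    }

    public static void main(String[] args) {
        MaxWorkersSketch ring = new MaxWorkersSketch(8, 64); // hypothetical defaults
        int[] change = {4, 0};   // lower the bounded maximum, only query the unbounded one
        ring.registerMaxWorkers(change);
        System.out.println("previous: " + Arrays.toString(change));
        int[] query = {0, 0};    // pure query, nothing is modified
        ring.registerMaxWorkers(query);
        System.out.println("current:  " + Arrays.toString(query));
    }
}
```

Because the array is overwritten with the previous maximums, a caller that wants to restore the old setting later can pass the same array back unchanged.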
IORING_UNREGISTER_EVENTFD
public static final int IORING_UNREGISTER_EVENTFD
io_uring_register_op
Enum values:
REGISTER_BUFFERS-argpoints to a structiovecarray ofnr_argsentries.The buffers associated with the
iovecswill be locked in memory and charged against the user'sRLIMIT_MEMLOCKresource limit. Seegetrlimit(2)for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned bymalloc(3)ormmap(2)with theMAP_ANONYMOUSflag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the
OP_READ_FIXEDorOP_WRITE_FIXEDopcodesin the submission queue entry (see the structio_uring_sqedefinition inenter), and set thebuf_indexfield to the desired buffer index. The memory range described by the submission queue entry'saddrandlenfields must fall within the indexed buffer.It is perfectly valid to setup a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to
io_uring_register()with the new buffers.Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS- This operation takes no argument, andargmust be passed asNULL.All previously registered buffers associated with the
io_uringinstance will be released.Available since 5.1.
REGISTER_FILES- Register files for I/O.argcontains a pointer to an array ofnr_argsfile descriptors (signed 32 bit integers). To make use of the registered files, theIOSQE_FIXED_FILEflag must be set in theflagsmember of the structio_uring_sqe, and thefdmember is set to the index of the file in the file descriptor array.The file set may be sparse, meaning that the
fdfield in the array may be set to -1. SeeREGISTER_FILES_UPDATEfor how to update files in place.Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See
REGISTER_FILES_UPDATEfor how to update an existing set without that limitation.Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES- This operation requires no argument, andargmust be passed asNULL.All previously registered files associated with the
io_uringinstance will be unregistered.Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1. Available since 5.2.
An application can temporarily disable notifications coming through the registered eventfd by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring. Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications. Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero. Available since 5.2.
REGISTER_FILES_UPDATE- This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one wherefdis equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update.nr_argsmust contain the number of descriptors in the passed in array.Available since 5.5.
File descriptors can be skipped if they are set to
REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.Available since 5.12.
REGISTER_EVENTFD_ASYNC- This works just likeREGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for
IORING_REGISTER_EVENTFD.Available since 5.6.
REGISTER_PROBE- This operation returns a structure,io_uring_probe, which contains information about theopcodessupported byio_uringon the running kernel.argmust contain a pointer to a structio_uring_probe, andnr_argsmust contain the size of the ops array in that probe struct. Theopsarray is of the typeio_uring_probe_op, which holds the value of theopcodeand aflagsfield. If the flags field hasIO_URING_OP_SUPPORTEDset, then this opcode is supported on the running kernel.Available since 5.6.
REGISTER_PERSONALITY- This operation registers credentials of the running application withio_uring, and returns an id associated with these credentials.Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with
argset toNULLandnr_argsset to zero.Available since 5.6.
UNREGISTER_PERSONALITY- This operation unregisters a previously registered personality withio_uring.nr_argsmust be set to the id in question, andargmust be set toNULL.Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup). Available since 5.10.
REGISTER_ENABLE_RINGS- This operation enables anio_uringring started in a disabled state (SETUP_R_DISABLEDwas specified in the call tosetup).While the
io_uringring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, theio_uringring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked withargset toNULLandnr_argsset to zero.Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete. Available since 5.13.
REGISTER_FILES_UPDATE2- Similar toREGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_BUFFERS2- Register buffers for I/O.Similar to
REGISTER_BUFFERSbut aims to have a more extensible ABI.argpoints to a structio_uring_rsrc_register, andnr_argsshould be set to the number of bytes in the structure.The data field contains a pointer to a struct
iovecarray ofnrentries. Thetagsfield should either be 0, then tagging is disabled, or point to an array ofnr"tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource had been unregistered and it's not used anymore, a CQE will be posted withuser_dataset to the specified tag and all other fields zeroed.Note that resource updates, e.g.
REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time it returns, but they might be held alive until all requests using it complete.Available since 5.13.
REGISTER_BUFFERS_UPDATE- Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of structiovec.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent.
This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args the byte size of that mask.
Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF.
Must not have arg or nr_args set.
Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
IORING_REGISTER_FILES_UPDATE
public static final int IORING_REGISTER_FILES_UPDATE
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
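The requirement that addr and len fall within the indexed buffer can be checked in user space before a fixed read or write is queued. The following is a minimal sketch of that check only. The class and method names are hypothetical and are not part of this binding, and addresses are modelled as plain Java longs treated as unsigned values.

```java
/** Illustrative model only: validates a fixed-buffer I/O range in user space. */
final class FixedBufferRange {
    private FixedBufferRange() {}

    /**
     * Returns true if [addr, addr + len) lies inside the registered buffer
     * [bufBase, bufBase + bufLen). Addresses are compared as unsigned 64-bit values.
     */
    static boolean fits(long bufBase, long bufLen, long addr, long len) {
        if (bufLen < 0 || len < 0) {
            return false; // lengths are modelled as non-negative Java longs
        }
        if (Long.compareUnsigned(addr, bufBase) < 0) {
            return false; // range starts before the buffer
        }
        long offset = addr - bufBase; // distance from the start of the buffer
        if (Long.compareUnsigned(offset, bufLen) > 0) {
            return false; // range starts past the end of the buffer
        }
        return len <= bufLen - offset; // written this way to avoid overflow
    }
}
```

For example, with a 4096-byte buffer registered at a hypothetical address 0x1000, a 2048-byte range starting at 0x1800 fits, while a 2049-byte range starting there does not.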
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL.
All previously registered buffers associated with the io_uring instance will be released.
Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL.
All previously registered files associated with the io_uring instance will be unregistered.
Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1.
Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring.
Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications.
Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero.
Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array.
Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.
Available since 5.12.
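The update rules described above (fill a sparse slot, clear a slot with -1, leave a slot alone with the skip marker) can be illustrated with a small in-memory model. This sketch does not call the kernel. The class name is hypothetical, and the skip value of -2 is an assumption based on the kernel UAPI header's IORING_REGISTER_FILES_SKIP definition.

```java
import java.util.Arrays;

/** Illustrative model only: mimics how an update is applied to a sparse file table. */
final class RegisteredFileTableModel {
    static final int SPARSE = -1; // marks an empty (sparse) slot
    static final int SKIP = -2;   // assumed value of IORING_REGISTER_FILES_SKIP

    private final int[] slots;

    RegisteredFileTableModel(int size) {
        slots = new int[size];
        Arrays.fill(slots, SPARSE); // start fully sparse
    }

    /** Applies fds to the table starting at offset; returns the number of entries processed. */
    int update(int offset, int[] fds) {
        if (offset < 0 || fds.length > slots.length - offset) {
            throw new IndexOutOfBoundsException("update does not fit in the table");
        }
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] == SKIP) {
                continue; // leave the existing entry untouched
            }
            slots[offset + i] = fds[i]; // a real fd replaces the entry, -1 clears it
        }
        return fds.length;
    }

    /** Returns a copy of the current table contents. */
    int[] snapshot() {
        return slots.clone();
    }
}
```

Starting from a table holding {10, 11, 12, 13}, an update at offset 1 with {-1, -2, 20} clears slot 1, leaves slot 2 as it was, and replaces slot 3.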
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.
This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD.
Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel.
Available since 5.6.
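Once the kernel has filled in the probe structure, checking for an opcode reduces to scanning the ops array and testing the IO_URING_OP_SUPPORTED bit. The sketch below reads a probe result out of a ByteBuffer. The class name is hypothetical, and the byte offsets (a 16-byte header with ops_len at byte 1, followed by 8-byte entries with op at byte 0 and a 16-bit flags field at byte 2) are assumptions taken from the kernel UAPI header, not from this binding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative model only: reads opcode support out of a filled-in probe buffer. */
final class ProbeModel {
    static final int IO_URING_OP_SUPPORTED = 1; // bit 0 of io_uring_probe_op.flags
    static final int HEADER_BYTES = 16;         // assumed size of the io_uring_probe header
    static final int OP_BYTES = 8;              // assumed size of one io_uring_probe_op

    private ProbeModel() {}

    /** Returns true if the probe lists the opcode with the supported flag set. */
    static boolean isSupported(ByteBuffer probe, int opcode) {
        // Work on a duplicate so the caller's position and byte order are untouched.
        ByteBuffer b = probe.duplicate().order(ByteOrder.nativeOrder());
        int opsLen = Byte.toUnsignedInt(b.get(1)); // ops_len: number of filled entries
        for (int i = 0; i < opsLen; i++) {
            int base = HEADER_BYTES + i * OP_BYTES;
            int op = Byte.toUnsignedInt(b.get(base));
            int flags = Short.toUnsignedInt(b.getShort(base + 2));
            if (op == opcode) {
                return (flags & IO_URING_OP_SUPPORTED) != 0;
            }
        }
        return false; // opcode not reported by this kernel
    }
}
```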
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials.
Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL.
Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup).
Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup).
While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O.
Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed.
Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete.
Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
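For the REGISTER_FILES2 and REGISTER_BUFFERS2 entries above, nr_args carries the byte size of the io_uring_rsrc_register structure rather than an element count. The sketch below lays that structure out in a ByteBuffer to make the size and the nr, data and tags fields concrete. The class name is hypothetical, and the field offsets (nr at 0, a reserved or flags word at 4, a reserved 64-bit word at 8, data at 16, tags at 24, 32 bytes in total) are assumptions taken from the kernel UAPI header. A real call would need native memory and real addresses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative model only: byte layout assumed for struct io_uring_rsrc_register. */
final class RsrcRegisterLayout {
    static final int SIZE_BYTES = 32; // value to pass as nr_args under the assumed layout

    private RsrcRegisterLayout() {}

    /** Encodes the structure; a tagsAddress of 0 means tagging is disabled. */
    static ByteBuffer encode(int nr, long dataAddress, long tagsAddress) {
        ByteBuffer b = ByteBuffer.allocate(SIZE_BYTES).order(ByteOrder.nativeOrder());
        b.putInt(0, nr);            // nr: number of entries in the data and tags arrays
        b.putInt(4, 0);             // reserved or flags word, left as zero here
        b.putLong(8, 0L);           // reserved, left as zero
        b.putLong(16, dataAddress); // data: address of the fd array or iovec array
        b.putLong(24, tagsAddress); // tags: address of the u64 tag array, or 0
        return b;
    }
}
```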
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent.
This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args the byte size of that mask.
Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF.
Must not have arg or nr_args set.
Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
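The two-element array passed to REGISTER_IOWQ_MAX_WORKERS is both input and output, which is easy to get wrong. The following in-memory model shows the round trip. It does not call the kernel, the class name is hypothetical, and it assumes that the "0 means query only" rule applies to each of the two entries independently.

```java
/** Illustrative model only: in/out semantics of the two-element max-workers array. */
final class IowqMaxWorkersModel {
    private final int[] max = new int[2]; // [0] bounded, [1] unbounded

    IowqMaxWorkersModel(int bounded, int unbounded) {
        max[0] = bounded;
        max[1] = unbounded;
    }

    /**
     * Non-zero entries replace the corresponding limit; zero entries only query it.
     * On return, values holds the limits that were in force before the call.
     */
    void apply(int[] values) {
        if (values.length != 2) {
            throw new IllegalArgumentException("exactly two values are required");
        }
        for (int i = 0; i < 2; i++) {
            int previous = max[i];
            if (values[i] != 0) {
                max[i] = values[i]; // assumed per-entry: only non-zero entries change a limit
            }
            values[i] = previous;   // previous maximum is written back to the caller
        }
    }

    int bounded() {
        return max[0];
    }

    int unbounded() {
        return max[1];
    }
}
```

Passing {4, 0} against limits of 8 bounded and 64 unbounded workers lowers the bounded limit to 4, leaves the unbounded limit at 64, and hands {8, 64} back in the array.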
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
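The slot behaviour described for REGISTER_RING_FDS and UNREGISTER_RING_FDS (an offset of -1 picks a free slot, an explicit offset requests a specific one, and unregistering frees the slot for reuse) can be sketched as a small table with the 16-entry limit quoted above. This is an in-memory illustration with a hypothetical class name. The choice of the lowest free slot and the exceptions thrown are assumptions, not documented kernel behaviour.

```java
import java.util.Arrays;

/** Illustrative model only: per-task table of registered ring descriptors. */
final class RingFdSlotsModel {
    static final int MAX_SLOTS = 16;     // limit quoted in the text
    private static final int FREE = -1;  // marks an unused slot

    private final int[] slots = new int[MAX_SLOTS];

    RingFdSlotsModel() {
        Arrays.fill(slots, FREE);
    }

    /** An offset of -1 asks for any free slot; returns the slot actually used. */
    int register(int offset, int ringFd) {
        if (ringFd < 0) {
            throw new IllegalArgumentException("ring fd must be non-negative");
        }
        if (offset == -1) {
            for (int i = 0; i < MAX_SLOTS; i++) {
                if (slots[i] == FREE) {
                    slots[i] = ringFd;
                    return i; // assumed: lowest free slot is chosen
                }
            }
            throw new IllegalStateException("no free registration slot");
        }
        if (offset < 0 || offset >= MAX_SLOTS || slots[offset] != FREE) {
            throw new IllegalArgumentException("slot unavailable: " + offset);
        }
        slots[offset] = ringFd;
        return offset;
    }

    /** Frees a slot so that a later registration can reuse it. */
    void unregister(int offset) {
        if (offset < 0 || offset >= MAX_SLOTS) {
            throw new IllegalArgumentException("bad slot: " + offset);
        }
        slots[offset] = FREE;
    }
}
```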
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
IORING_REGISTER_EVENTFD_ASYNC
public static final int IORING_REGISTER_EVENTFD_ASYNC
io_uring_register_op
Enum values: identical to the list given for IORING_REGISTER_FILES_UPDATE above.
IORING_REGISTER_PROBE
public static final int IORING_REGISTER_PROBE
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
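The range rule for fixed buffers (addr and len must fall within the indexed buffer) can be sketched as a plain-Java check. This is only a model of the rule: it does not call the kernel or LibIOURing, and the class FixedBufferRange, its fits method, and the sample addresses are hypothetical names chosen for illustration.

```java
/** Illustrative model only: it never touches io_uring or native memory. */
final class FixedBufferRange {
    /**
     * Returns true if [addr, addr + len) lies inside registered buffer bufIndex.
     * bases[i] and lens[i] stand in for iov_base and iov_len of the i-th iovec;
     * addresses are assumed to be non-negative when stored in a Java long.
     */
    static boolean fits(long[] bases, long[] lens, int bufIndex, long addr, long len) {
        if (bufIndex < 0 || bufIndex >= bases.length || len < 0 || addr < bases[bufIndex]) {
            return false;
        }
        long offset = addr - bases[bufIndex];
        // Compare against the remaining space so that addr + len cannot overflow.
        return offset <= lens[bufIndex] && len <= lens[bufIndex] - offset;
    }

    public static void main(String[] args) {
        long[] bases = {0x1000L, 0x10000L};
        long[] lens = {4096L, 65536L};
        // A 1 KiB window in the middle of buffer 1 is inside the registered region.
        System.out.println(fits(bases, lens, 1, 0x10000L + 512, 1024)); // true
        // 200 bytes starting 96 bytes before the end of buffer 0 run past it.
        System.out.println(fits(bases, lens, 0, 0x1000L + 4000, 200));  // false
    }
}
```

A check of this kind could run before filling in buf_index, addr and len on a submission queue entry, so that an out-of-range request is caught in user space rather than reported through a CQE.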
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL.
All previously registered buffers associated with the io_uring instance will be released.
Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL.
All previously registered files associated with the io_uring instance will be unregistered.
Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1.
Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring.
Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications.
Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero.
Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array.
Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.
Available since 5.12.
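The update rules for the registered file set (fill a sparse slot, clear a slot with -1, leave a slot alone with the skip value) can be modelled on an ordinary int array. The sketch below is an assumption-laden illustration, not the kernel's logic: FileTableModel and update are hypothetical names, no system call is made, and the skip value is assumed to be -2 as in the Linux UAPI header.

```java
import java.util.Arrays;

/** Illustrative model of the registered file table; no system calls are made. */
final class FileTableModel {
    static final int SPARSE = -1; // empty slot, as described in the text
    static final int SKIP = -2;   // assumed numeric value of REGISTER_FILES_SKIP

    /** Applies fds to table starting at offset and returns the number of entries consumed. */
    static int update(int[] table, int offset, int[] fds) {
        if (offset < 0 || fds.length > table.length - offset) {
            throw new IllegalArgumentException("update does not fit in the registered set");
        }
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] == SKIP) {
                continue; // leave the file at this index untouched
            }
            table[offset + i] = fds[i]; // a real fd fills or replaces, SPARSE clears
        }
        return fds.length;
    }

    public static void main(String[] args) {
        int[] table = {3, SPARSE, 5, 7};
        // Fill slot 1, skip slot 2, clear slot 3.
        update(table, 1, new int[] {9, SKIP, SPARSE});
        System.out.println(Arrays.toString(table)); // [3, 9, 5, -1]
    }
}
```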
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.
This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD.
Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel.
Available since 5.6.
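Reading the probe result amounts to walking the ops array and testing one flag bit per entry. The following sketch decodes a hand-built buffer rather than one returned by the kernel. The byte offsets (a 16-byte header followed by 8-byte entries) and the value of IO_URING_OP_SUPPORTED are assumptions based on the Linux UAPI header, and ProbeDecode, isSupported and sample are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative decoder for a probe-shaped buffer; the buffer here is filled by hand. */
final class ProbeDecode {
    static final int IO_URING_OP_SUPPORTED = 1; // assumed to be bit 0 (1U << 0)
    static final int HEADER_BYTES = 16; // assumed: last_op u8, ops_len u8, resv u16, resv2 u32[3]
    static final int OP_BYTES = 8;      // assumed: op u8, resv u8, flags u16, resv2 u32

    /** Returns true if the ops array lists opcode with IO_URING_OP_SUPPORTED set. */
    static boolean isSupported(ByteBuffer probe, int opcode) {
        int opsLen = probe.get(1) & 0xFF;
        for (int i = 0; i < opsLen; i++) {
            int base = HEADER_BYTES + i * OP_BYTES;
            int op = probe.get(base) & 0xFF;
            int flags = probe.getShort(base + 2) & 0xFFFF;
            if (op == opcode) {
                return (flags & IO_URING_OP_SUPPORTED) != 0;
            }
        }
        return false; // opcode not reported at all
    }

    /** Builds a sample buffer in which opcode i carries flags[i]. */
    static ByteBuffer sample(int... flags) {
        ByteBuffer b = ByteBuffer.allocate(HEADER_BYTES + flags.length * OP_BYTES)
                .order(ByteOrder.nativeOrder());
        b.put(0, (byte) (flags.length - 1)); // last_op
        b.put(1, (byte) flags.length);       // ops_len
        for (int i = 0; i < flags.length; i++) {
            int base = HEADER_BYTES + i * OP_BYTES;
            b.put(base, (byte) i);
            b.putShort(base + 2, (short) flags[i]);
        }
        return b;
    }

    public static void main(String[] args) {
        ByteBuffer probe = sample(1, 1, 0);
        System.out.println(isSupported(probe, 1)); // true
        System.out.println(isSupported(probe, 2)); // false: listed, flag clear
        System.out.println(isSupported(probe, 9)); // false: not listed
    }
}
```

With a real ring, the buffer would instead be the native memory passed as arg to the register call; the decoding loop would stay the same if the layout assumptions hold.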
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials.
Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL.
Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup).
Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup).
While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.10.
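The ordering described for REGISTER_RESTRICTIONS and REGISTER_ENABLE_RINGS (start disabled, register the allowlist once, enable, then validate submissions) can be pictured as a small state machine. This is a simplified sketch that tracks SQE opcodes only: RestrictedRingModel and its methods are hypothetical, the opcode numbers are arbitrary, and nothing here talks to the kernel.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative state machine for a ring created disabled; not a binding to io_uring. */
final class RestrictedRingModel {
    private boolean enabled = false;           // ring starts disabled (SETUP_R_DISABLED)
    private Set<Integer> allowedSqeOps = null; // null means no restrictions registered

    /** Registers the SQE opcode allowlist; permitted once, and only while disabled. */
    void registerRestrictions(int... sqeOpcodes) {
        if (enabled) {
            throw new IllegalStateException("ring already enabled");
        }
        if (allowedSqeOps != null) {
            throw new IllegalStateException("restrictions already registered");
        }
        allowedSqeOps = new HashSet<>();
        for (int op : sqeOpcodes) {
            allowedSqeOps.add(op);
        }
    }

    /** Models REGISTER_ENABLE_RINGS. */
    void enableRings() {
        enabled = true;
    }

    /** Submissions are refused while disabled, then checked against the allowlist. */
    boolean maySubmit(int opcode) {
        if (!enabled) {
            return false;
        }
        return allowedSqeOps == null || allowedSqeOps.contains(opcode);
    }

    public static void main(String[] args) {
        RestrictedRingModel ring = new RestrictedRingModel();
        ring.registerRestrictions(1);          // hypothetical opcode number
        System.out.println(ring.maySubmit(1)); // false: still disabled
        ring.enableRings();
        System.out.println(ring.maySubmit(1)); // true: on the allowlist
        System.out.println(ring.maySubmit(2)); // false: not on the allowlist
    }
}
```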
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O.
Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and it's not used anymore, a CQE will be posted with user_data set to the specified tag and all other fields zeroed.
Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete.
Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent.
This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args the byte size of that mask.
Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF.
Must not have arg or nr_args set.
Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
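The in/out behaviour of the two-element array for REGISTER_IOWQ_MAX_WORKERS (a non-zero entry sets a new limit, a zero entry only queries, and the array comes back holding the previous limits) is easy to get backwards, so here is a minimal plain-Java model of it. MaxWorkersModel, register and the starting limits are hypothetical; the sketch assumes each index is handled independently and performs no system call.

```java
import java.util.Arrays;

/** Illustrative model of the two-value in/out argument; no kernel interaction. */
final class MaxWorkersModel {
    private final int[] current; // index 0: bounded, index 1: unbounded

    MaxWorkersModel(int bounded, int unbounded) {
        current = new int[] {bounded, unbounded};
    }

    /** Non-zero entries become the new limit; on return the array holds the previous limits. */
    void register(int[] values) {
        if (values.length != 2) {
            throw new IllegalArgumentException("expected exactly two values");
        }
        for (int i = 0; i < 2; i++) {
            int previous = current[i];
            if (values[i] != 0) {
                current[i] = values[i]; // zero means "query only"
            }
            values[i] = previous;
        }
    }

    public static void main(String[] args) {
        MaxWorkersModel ring = new MaxWorkersModel(8, 64); // hypothetical starting limits
        int[] arg = {4, 0};       // lower the bounded limit, leave unbounded alone
        ring.register(arg);
        System.out.println(Arrays.toString(arg));   // [8, 64]
        int[] query = {0, 0};     // read back without changing anything
        ring.register(query);
        System.out.println(Arrays.toString(query)); // [4, 64]
    }
}
```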
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
IORING_REGISTER_PERSONALITY
public static final int IORING_REGISTER_PERSONALITY
io_uring_register_op
Enum values: same as listed under IORING_REGISTER_PROBE.
IORING_UNREGISTER_PERSONALITY
public static final int IORING_UNREGISTER_PERSONALITY
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS- This operation takes no argument, andargmust be passed asNULL.All previously registered buffers associated with the
io_uringinstance will be released.Available since 5.1.
REGISTER_FILES- Register files for I/O.argcontains a pointer to an array ofnr_argsfile descriptors (signed 32 bit integers). To make use of the registered files, theIOSQE_FIXED_FILEflag must be set in theflagsmember of the structio_uring_sqe, and thefdmember is set to the index of the file in the file descriptor array.The file set may be sparse, meaning that the
fdfield in the array may be set to -1. SeeREGISTER_FILES_UPDATEfor how to update files in place.Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See
REGISTER_FILES_UPDATEfor how to update an existing set without that limitation.Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL. All previously registered files associated with the io_uring instance will be unregistered. Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1. Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring. Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications. Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero. Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array. Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index. Available since 5.12.
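The slot semantics described for REGISTER_FILES_UPDATE can be modelled in plain Java without involving the kernel. The following is only an illustrative sketch, not the LWJGL or kernel implementation: FilesUpdateSketch and applyUpdate are hypothetical names, and the value -2 for the skip marker is an assumption taken from the Linux uapi header (IORING_REGISTER_FILES_SKIP).

```java
import java.util.Arrays;

public class FilesUpdateSketch {
    // -1 marks a sparse (empty) slot, as described in the text above.
    static final int SPARSE = -1;
    // Assumed value of IORING_REGISTER_FILES_SKIP from the Linux uapi header.
    static final int SKIP = -2;

    // Applies fds to table starting at offset. A SKIP entry leaves the
    // existing slot untouched; SPARSE clears the slot; any other value
    // replaces whatever was registered there before.
    static void applyUpdate(int[] table, int offset, int[] fds) {
        if (offset < 0 || offset + fds.length > table.length) {
            throw new IllegalArgumentException("update range outside the registered table");
        }
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] == SKIP) {
                continue; // keep the file currently registered at this index
            }
            table[offset + i] = fds[i];
        }
    }

    public static void main(String[] args) {
        int[] table = {3, SPARSE, 5, 6};
        // Fill the sparse slot 1, skip slot 2, remove the file in slot 3.
        applyUpdate(table, 1, new int[] {7, SKIP, SPARSE});
        System.out.println(Arrays.toString(table)); // prints [3, 7, 5, -1]
    }
}
```

In a real call the table lives in the kernel; only the offset and the array of descriptors are passed through io_uring_files_update.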
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner. This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD. Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel. Available since 5.6.
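Interpreting a probe result comes down to a bit test on each entry's flags field. The sketch below shows that test on a hand-built array rather than on a real probe: ProbeSketch, ProbeOp and isSupported are hypothetical names, and the value of IO_URING_OP_SUPPORTED (bit 0) is an assumption taken from the Linux uapi header.

```java
public class ProbeSketch {
    // Assumed value of IO_URING_OP_SUPPORTED from the Linux uapi header.
    static final int IO_URING_OP_SUPPORTED = 1 << 0;

    // Hypothetical stand-in for one io_uring_probe_op entry.
    static final class ProbeOp {
        final int op;
        final int flags;

        ProbeOp(int op, int flags) {
            this.op = op;
            this.flags = flags;
        }
    }

    // Returns true only if the opcode is listed and its flags carry the
    // supported bit. An opcode missing from the array is treated as
    // unsupported, since the running kernel did not report it.
    static boolean isSupported(ProbeOp[] ops, int opcode) {
        for (ProbeOp p : ops) {
            if (p.op == opcode) {
                return (p.flags & IO_URING_OP_SUPPORTED) != 0;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ProbeOp[] ops = {
            new ProbeOp(0, IO_URING_OP_SUPPORTED),
            new ProbeOp(1, 0)
        };
        System.out.println(isSupported(ops, 0)); // prints true
        System.out.println(isSupported(ops, 1)); // prints false
        System.out.println(isSupported(ops, 9)); // prints false
    }
}
```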
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials. Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero. Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL. Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries. With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup). Available since 5.10.
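The allowlist behaviour described for REGISTER_RESTRICTIONS can be pictured as three checks applied to every submission queue entry. The sketch below is a plain Java model under stated assumptions, not the kernel's code: RestrictionSketch and its methods are hypothetical names, the opcode and flag numbers in main are arbitrary, and the exact order and form of the checks is an assumption about how such an allowlist would be evaluated.

```java
import java.util.BitSet;

public class RestrictionSketch {
    private final BitSet allowedOps = new BitSet(); // allowlisted SQE opcodes
    private int allowedFlags;                       // SQE flags that may be set
    private int requiredFlags;                      // SQE flags that must be set

    void allowOp(int opcode) { allowedOps.set(opcode); }
    void allowFlags(int mask) { allowedFlags |= mask; }
    void requireFlags(int mask) { requiredFlags |= mask; }

    // An SQE passes only if its opcode is allowlisted, every required flag
    // is present, and no flag outside allowed-or-required is set.
    boolean permits(int opcode, int sqeFlags) {
        if (!allowedOps.get(opcode)) {
            return false;
        }
        if ((sqeFlags & requiredFlags) != requiredFlags) {
            return false;
        }
        return (sqeFlags & ~(allowedFlags | requiredFlags)) == 0;
    }

    public static void main(String[] args) {
        RestrictionSketch r = new RestrictionSketch();
        r.allowOp(22);          // arbitrary opcode number for illustration
        r.requireFlags(1 << 0); // arbitrary flag bit that must always be set
        r.allowFlags(1 << 2);   // arbitrary flag bit that may be set
        System.out.println(r.permits(22, 0b001)); // prints true
        System.out.println(r.permits(22, 0b101)); // prints true
        System.out.println(r.permits(22, 0b100)); // prints false
        System.out.println(r.permits(22, 0b011)); // prints false
        System.out.println(r.permits(23, 0b001)); // prints false
    }
}
```

Anything not explicitly registered is rejected, which is why all restrictions have to be supplied in one call before the ring is enabled.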
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup). While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted.
After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked with arg set to NULL and nr_args set to zero. Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete. Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O. Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed.
Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete. Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent. This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args the byte size of that mask. Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF. Must not have arg or nr_args set. Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values. Available since 5.15.
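The in/out behaviour of the two-element array can be easy to misread, so here is a small Java model of it. This is a sketch of the semantics as described above, not a call into the kernel: MaxWorkersSketch, exchange and current are hypothetical names, and the starting limits are made-up numbers.

```java
import java.util.Arrays;

public class MaxWorkersSketch {
    // Hypothetical per-ring limits: index 0 = bounded, index 1 = unbounded.
    private final int[] limits;

    MaxWorkersSketch(int bounded, int unbounded) {
        this.limits = new int[] {bounded, unbounded};
    }

    // Mimics the described exchange: a non-zero entry becomes the new
    // limit, a zero entry leaves the limit unchanged, and in both cases
    // the array slot is overwritten with the previous limit.
    void exchange(int[] values) {
        if (values.length != 2) {
            throw new IllegalArgumentException("expected exactly two values");
        }
        for (int i = 0; i < 2; i++) {
            int previous = limits[i];
            if (values[i] != 0) {
                limits[i] = values[i];
            }
            values[i] = previous;
        }
    }

    int[] current() {
        return limits.clone();
    }

    public static void main(String[] args) {
        MaxWorkersSketch ring = new MaxWorkersSketch(8, 64);
        int[] values = {4, 0}; // lower the bounded limit, only query unbounded
        ring.exchange(values);
        System.out.println(Arrays.toString(values));         // prints [8, 64]
        System.out.println(Arrays.toString(ring.current())); // prints [4, 64]
    }
}
```

Passing {0, 0} therefore acts as a pure query of the current limits.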
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16. Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
- See Also:
-
IORING_REGISTER_RESTRICTIONS
public static final int IORING_REGISTER_RESTRICTIONS
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries. The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL. All previously registered buffers associated with the io_uring instance will be released. Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL. All previously registered files associated with the io_uring instance will be unregistered. Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1. Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring. Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications. Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero. Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array. Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index. Available since 5.12.
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner. This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD. Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel. Available since 5.6.
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials. Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero. Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL. Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries. With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup). Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup). While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted.
After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked with arg set to NULL and nr_args set to zero. Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete. Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O. Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed.
Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete. Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent. This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args the byte size of that mask. Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF. Must not have arg or nr_args set. Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values. Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16. Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
- See Also:
-
IORING_REGISTER_ENABLE_RINGS
public static final int IORING_REGISTER_ENABLE_RINGS
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries. The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL. All previously registered buffers associated with the io_uring instance will be released. Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL. All previously registered files associated with the io_uring instance will be unregistered. Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1. Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring. Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications. Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero. Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array. Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index. Available since 5.12.
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner. This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD. Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel. Available since 5.6.
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials. Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero. Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL. Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries. With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup). Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup). While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted.
After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked with arg set to NULL and nr_args set to zero. Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2- Similar toREGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_BUFFERS2- Register buffers for I/O.Similar to
REGISTER_BUFFERSbut aims to have a more extensible ABI.argpoints to a structio_uring_rsrc_register, andnr_argsshould be set to the number of bytes in the structure.The data field contains a pointer to a struct
iovecarray ofnrentries. Thetagsfield should either be 0, then tagging is disabled, or point to an array ofnr"tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource had been unregistered and it's not used anymore, a CQE will be posted withuser_dataset to the specified tag and all other fields zeroed.Note that resource updates, e.g.
REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time it returns, but they might be held alive until all requests using it complete.Available since 5.13.
REGISTER_BUFFERS_UPDATE- Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of structiovec.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_IOWQ_AFF- By default, async workers created byio_uringwill inherit the CPU mask of its parent.This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell
io_uringwhat CPUs the async workers may run on.argmust point to acpu_set_tmask, andnr_argsthe byte size of that mask.Available since 5.14.
UNREGISTER_IOWQ_AFF- Undoes a CPU mask previously set withREGISTER_IOWQ_AFF.Must not have
argornr_argsset.Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum process count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead.
arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. The data field of this struct must contain an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS- Unregister descriptors previously registered withREGISTER_RING_FDS.argmust be set to an unsigned int pointer to an array of typestruct io_uring_rsrc_registerofnr_argsnumber of entries. Only theoffsetfield should be set in the structure, containing the registered file descriptor offset previously returned fromIORING_REGISTER_RING_FDSthat the application wishes to unregister.Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
- See Also:
-
IORING_REGISTER_FILES2
public static final int IORING_REGISTER_FILES2
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
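The rule that the addr and len fields must fall within the indexed buffer can be checked before a fixed read or write is queued. The following is a minimal sketch of that check in plain Java; the class FixedBufferCheck and its method are hypothetical helpers, not part of LibIOURing, and addresses are modelled as non-negative long values for simplicity.

```java
/** Hypothetical helper, not part of LibIOURing: validates a fixed-buffer I/O range. */
final class FixedBufferCheck {
    /** Per-buffer size limit stated in the text above: 1 GiB. */
    static final long MAX_BUFFER_BYTES = 1L << 30;

    /**
     * Returns true if [addr, addr + len) lies entirely inside the registered
     * buffer that starts at bufBase and is bufLen bytes long.
     * Assumption: all values are non-negative and fit in a signed long.
     */
    static boolean fitsInRegisteredBuffer(long bufBase, long bufLen, long addr, long len) {
        if (bufLen <= 0 || bufLen > MAX_BUFFER_BYTES || len < 0) {
            return false;
        }
        long offset = addr - bufBase;          // where the I/O starts inside the buffer
        if (offset < 0 || offset > bufLen) {
            return false;
        }
        return len <= bufLen - offset;         // written this way to avoid overflow
    }

    public static void main(String[] args) {
        // A 4 KiB buffer registered at (made-up) address 0x1000.
        System.out.println(fitsInRegisteredBuffer(0x1000, 4096, 0x1800, 1024));
        System.out.println(fitsInRegisteredBuffer(0x1000, 4096, 0x1800, 4096));
    }
}
```

Using only part of a large registered buffer passes the check, while a range that runs past the end of the mapped region does not.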
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL. All previously registered buffers associated with the io_uring instance will be released.
Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs to unregister only if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL. All previously registered files associated with the io_uring instance will be unregistered.
Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1.
Available since 5.2.
An application can temporarily disable notifications coming through the registered eventfd by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring.
Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications. Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero.
Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array.
Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.
Available since 5.12.
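The update rules above (sparse slots, removal with -1, and skipping) can be illustrated with a small user-space model of the registered file table. This is only a sketch: FileTableModel is a hypothetical class, nothing here calls into the kernel, and the skip sentinel is assumed to be -2, the value of IORING_REGISTER_FILES_SKIP in the kernel header.

```java
import java.util.Arrays;

/** Hypothetical user-space model of a registered file table; not part of LibIOURing. */
final class FileTableModel {
    /** A sparse (empty) slot, as described in the text above. */
    static final int SPARSE = -1;
    /** Assumption: mirrors IORING_REGISTER_FILES_SKIP (-2) from the kernel header. */
    static final int FILES_SKIP = -2;

    /**
     * Returns a copy of table with fds applied starting at offset.
     * Entries equal to FILES_SKIP leave the existing slot untouched;
     * entries equal to SPARSE remove the existing file from the slot.
     */
    static int[] applyUpdate(int[] table, int offset, int[] fds) {
        if (offset < 0 || offset > table.length || fds.length > table.length - offset) {
            throw new IllegalArgumentException("update does not fit in the registered table");
        }
        int[] result = table.clone();
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] != FILES_SKIP) {
                result[offset + i] = fds[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] table = {3, SPARSE, 5, 7};
        // Fill slot 1, leave slot 2 alone, clear slot 3.
        int[] updated = applyUpdate(table, 1, new int[] {9, FILES_SKIP, SPARSE});
        System.out.println(Arrays.toString(updated));
    }
}
```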
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner. This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for REGISTER_EVENTFD.
Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel.
Available since 5.6.
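Once the probe structure has been filled in, checking for an opcode reduces to testing one flag bit per ops entry. The sketch below shows that test on a plain Java array; ProbeModel and its layout (one {opcode, flags} pair per row) are illustrative assumptions, and the flag value is assumed to match IO_URING_OP_SUPPORTED (bit 0) in the kernel header.

```java
/** Hypothetical model of the probe result; not part of LibIOURing. */
final class ProbeModel {
    /** Assumption: mirrors IO_URING_OP_SUPPORTED (1 << 0) from the kernel header. */
    static final int OP_SUPPORTED = 1;

    /**
     * Each row of ops is {opcode, flags}, standing in for one io_uring_probe_op.
     * Returns true only if the opcode is listed and has the supported bit set.
     */
    static boolean isSupported(int[][] ops, int opcode) {
        for (int[] op : ops) {
            if (op[0] == opcode) {
                return (op[1] & OP_SUPPORTED) != 0;
            }
        }
        return false; // opcode unknown to this kernel
    }

    public static void main(String[] args) {
        int[][] ops = { {0, OP_SUPPORTED}, {1, OP_SUPPORTED}, {2, 0} };
        System.out.println(isSupported(ops, 1));
        System.out.println(isSupported(ops, 2));
    }
}
```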
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials. Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL.
Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags that are not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup).
Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup). While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument and must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O. Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed.
Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete.
Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent. This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args must be the byte size of that mask.
Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF. Must not have arg or nr_args set.
Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum process count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead.
arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
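The in/out behaviour of the two-element array is easy to get wrong, so here is a small user-space model of it. MaxWorkersModel is a hypothetical class that only mimics the semantics described above (index 0 bounded, index 1 unbounded, zero meaning "query only"); it does not call the kernel.

```java
/** Hypothetical model of the max-workers in/out array; not part of LibIOURing. */
final class MaxWorkersModel {
    static final int BOUNDED = 0;
    static final int UNBOUNDED = 1;

    private final int[] limits;

    MaxWorkersModel(int bounded, int unbounded) {
        this.limits = new int[] {bounded, unbounded};
    }

    /**
     * Mimics the call: values holds the requested limits on entry and the
     * previous limits on return. A requested value of 0 leaves that limit as is.
     */
    void register(int[] values) {
        if (values.length != 2) {
            throw new IllegalArgumentException("exactly two values are expected");
        }
        for (int i = 0; i < 2; i++) {
            int previous = limits[i];
            if (values[i] != 0) {
                limits[i] = values[i];
            }
            values[i] = previous;
        }
    }

    int limit(int index) {
        return limits[index];
    }

    public static void main(String[] args) {
        MaxWorkersModel model = new MaxWorkersModel(16, 1024);
        int[] values = {4, 0}; // lower the bounded limit, only query the unbounded one
        model.register(values);
        System.out.println(values[BOUNDED] + " " + values[UNBOUNDED]);
        System.out.println(model.limit(BOUNDED) + " " + model.limit(UNBOUNDED));
    }
}
```

Passing {0, 0} therefore acts as a pure query of the current limits.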
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. The data field of this struct must contain an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
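The slot handling described for REGISTER_RING_FDS and UNREGISTER_RING_FDS (an offset of -1 meaning "pick a slot for me", a limit of 16 slots, and unregistering to free a slot) can be pictured with the following user-space model. RingFdSlots is hypothetical and simplified: it returns -1 where the real call would report an error, and it never touches the kernel.

```java
import java.util.Arrays;

/** Hypothetical per-task model of registered ring descriptors; not part of LibIOURing. */
final class RingFdSlots {
    /** Limit stated in the text above. */
    static final int MAX_SLOTS = 16;
    private static final int FREE = -1;

    private final int[] slots = new int[MAX_SLOTS];

    RingFdSlots() {
        Arrays.fill(slots, FREE);
    }

    /**
     * Registers ringFd. An offset of -1 selects the first free slot; otherwise the
     * given slot is used. Returns the slot index, or -1 if no slot could be used.
     */
    int register(int ringFd, int offset) {
        if (offset == -1) {
            for (int i = 0; i < MAX_SLOTS; i++) {
                if (slots[i] == FREE) {
                    slots[i] = ringFd;
                    return i;
                }
            }
            return -1; // all 16 slots are taken
        }
        if (offset < 0 || offset >= MAX_SLOTS || slots[offset] != FREE) {
            return -1; // out of range or already in use
        }
        slots[offset] = ringFd;
        return offset;
    }

    /** Frees a slot so it can be reused; returns false if it was not in use. */
    boolean unregister(int offset) {
        if (offset < 0 || offset >= MAX_SLOTS || slots[offset] == FREE) {
            return false;
        }
        slots[offset] = FREE;
        return true;
    }

    public static void main(String[] args) {
        RingFdSlots task = new RingFdSlots();
        int slot = task.register(7, -1);   // 7 is a made-up ring file descriptor
        System.out.println(slot);
        System.out.println(task.unregister(slot));
    }
}
```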
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
- See Also:
-
IORING_REGISTER_FILES_UPDATE2
public static final int IORING_REGISTER_FILES_UPDATE2
io_uring_register_op
Enum values:
REGISTER_BUFFERS-argpoints to a structiovecarray ofnr_argsentries.The buffers associated with the
iovecswill be locked in memory and charged against the user'sRLIMIT_MEMLOCKresource limit. Seegetrlimit(2)for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned bymalloc(3)ormmap(2)with theMAP_ANONYMOUSflag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the
OP_READ_FIXEDorOP_WRITE_FIXEDopcodesin the submission queue entry (see the structio_uring_sqedefinition inenter), and set thebuf_indexfield to the desired buffer index. The memory range described by the submission queue entry'saddrandlenfields must fall within the indexed buffer.It is perfectly valid to setup a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to
io_uring_register()with the new buffers.Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS- This operation takes no argument, andargmust be passed asNULL.All previously registered buffers associated with the
io_uringinstance will be released.Available since 5.1.
REGISTER_FILES- Register files for I/O.argcontains a pointer to an array ofnr_argsfile descriptors (signed 32 bit integers). To make use of the registered files, theIOSQE_FIXED_FILEflag must be set in theflagsmember of the structio_uring_sqe, and thefdmember is set to the index of the file in the file descriptor array.The file set may be sparse, meaning that the
fdfield in the array may be set to -1. SeeREGISTER_FILES_UPDATEfor how to update files in place.Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See
REGISTER_FILES_UPDATEfor how to update an existing set without that limitation.Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES- This operation requires no argument, andargmust be passed asNULL.All previously registered files associated with the
io_uringinstance will be unregistered.Available since 5.1.
REGISTER_EVENTFD- It's possible to useeventfd(2)to get notified of completion events on anio_uringinstance. If this is desired, an eventfd file descriptor can be registered through this operation.argmust contain a pointer to the eventfd file descriptor, andnr_argsmust be 1.Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the
CQ_EVENTFD_DISABLEDbit in theflagsfield of the CQ ring.Available since 5.8.
UNREGISTER_EVENTFD- Unregister an eventfd file descriptor to stop notifications.Since only one eventfd descriptor is currently supported, this operation takes no argument, and
argmust be passed asNULLandnr_argsmust be zero.Available since 5.2.
REGISTER_FILES_UPDATE- This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one wherefdis equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update.nr_argsmust contain the number of descriptors in the passed in array.Available since 5.5.
File descriptors can be skipped if they are set to
REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.Available since 5.12.
REGISTER_EVENTFD_ASYNC- This works just likeREGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for
IORING_REGISTER_EVENTFD.Available since 5.6.
REGISTER_PROBE- This operation returns a structure,io_uring_probe, which contains information about theopcodessupported byio_uringon the running kernel.argmust contain a pointer to a structio_uring_probe, andnr_argsmust contain the size of the ops array in that probe struct. Theopsarray is of the typeio_uring_probe_op, which holds the value of theopcodeand aflagsfield. If the flags field hasIO_URING_OP_SUPPORTEDset, then this opcode is supported on the running kernel.Available since 5.6.
REGISTER_PERSONALITY- This operation registers credentials of the running application withio_uring, and returns an id associated with these credentials.Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with
argset toNULLandnr_argsset to zero.Available since 5.6.
UNREGISTER_PERSONALITY- This operation unregisters a previously registered personality withio_uring.nr_argsmust be set to the id in question, andargmust be set toNULL.Available since 5.6.
REGISTER_RESTRICTIONS-argpoints to a structio_uring_restrictionarray ofnr_argsentries.With an entry it is possible to allow an
registeropcode, or specify whichopcodeand flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).All the restrictions must be submitted with a single
io_uring_register()call and they are handled as an allowlist (opcodesand flags not registered, are not allowed).Restrictions can be registered only if the
io_uringring started in a disabled state (SETUP_R_DISABLEDmust be specified in the call tosetup).Available since 5.10.
REGISTER_ENABLE_RINGS- This operation enables anio_uringring started in a disabled state (SETUP_R_DISABLEDwas specified in the call tosetup).While the
io_uringring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, theio_uringring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked withargset toNULLandnr_argsset to zero.Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2- Similar toREGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_BUFFERS2- Register buffers for I/O.Similar to
REGISTER_BUFFERSbut aims to have a more extensible ABI.argpoints to a structio_uring_rsrc_register, andnr_argsshould be set to the number of bytes in the structure.The data field contains a pointer to a struct
iovecarray ofnrentries. Thetagsfield should either be 0, then tagging is disabled, or point to an array ofnr"tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource had been unregistered and it's not used anymore, a CQE will be posted withuser_dataset to the specified tag and all other fields zeroed.Note that resource updates, e.g.
REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time it returns, but they might be held alive until all requests using it complete.Available since 5.13.
REGISTER_BUFFERS_UPDATE- Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of structiovec.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_IOWQ_AFF- By default, async workers created byio_uringwill inherit the CPU mask of its parent.This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell
io_uringwhat CPUs the async workers may run on.argmust point to acpu_set_tmask, andnr_argsthe byte size of that mask.Available since 5.14.
UNREGISTER_IOWQ_AFF- Undoes a CPU mask previously set withREGISTER_IOWQ_AFF.Must not have
argornr_argsset.Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum process count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead.
arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. The data field of this struct must contain an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
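The in/out array used by REGISTER_IOWQ_MAX_WORKERS in the list above is easy to misread, so here is a minimal sketch of that contract as a plain-Java model. It does not call the kernel or any binding; the class name IowqMaxWorkersModel and the starting limits are hypothetical, and applying the zero rule per entry is an assumption drawn from the description rather than a statement about the kernel implementation.

```java
/** Plain-Java model of the REGISTER_IOWQ_MAX_WORKERS in/out array; it never calls the kernel. */
final class IowqMaxWorkersModel {
    // Index 0 = bounded workers, index 1 = unbounded workers, as in the entry above.
    private final int[] limits;

    IowqMaxWorkersModel(int bounded, int unbounded) {
        this.limits = new int[] {bounded, unbounded};
    }

    /**
     * Applies the described contract to a two-element array:
     * a non-zero entry becomes the new limit, a zero entry leaves the limit alone,
     * and in both cases the entry is overwritten with the previous limit.
     */
    void register(int[] values) {
        if (values.length != 2) {
            throw new IllegalArgumentException("nr_args must be 2");
        }
        for (int i = 0; i < 2; i++) {
            int previous = limits[i];
            if (values[i] != 0) {
                limits[i] = values[i];
            }
            values[i] = previous;
        }
    }

    public static void main(String[] args) {
        IowqMaxWorkersModel ring = new IowqMaxWorkersModel(8, 64); // hypothetical starting limits
        int[] request = {4, 0}; // lower the bounded limit, only query the unbounded one
        ring.register(request);
        System.out.println(request[0] + " " + request[1]); // the limits that were in effect before the call
    }
}
```

Passing {0, 0} therefore acts as a pure query: the array comes back holding the current limits and nothing changes.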
IORING_REGISTER_BUFFERS2
public static final int IORING_REGISTER_BUFFERS2
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL. All previously registered buffers associated with the io_uring instance will be released.
Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL. All previously registered files associated with the io_uring instance will be unregistered.
Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1.
Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring.
Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications. Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero.
Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new one. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array.
Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.
Available since 5.12.
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner. This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD.
Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel.
Available since 5.6.
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials. Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL.
Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags that are not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup).
Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup).
While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registrations are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument and must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new one. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O. Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed.
Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete.
Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent. This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args must be the byte size of that mask.
Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF. Must not have arg or nr_args set.
Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum process count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead.
arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. The data field of this struct must contain an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
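The sparse file set described under REGISTER_FILES and REGISTER_FILES_UPDATE in the list above can be pictured as an array of slots. The following is a minimal sketch of those update rules as a plain-Java model; it does not call the kernel or any binding. The class name FixedFileTableModel, the bounds check and the returned count are conventions of this sketch, and the value -2 for REGISTER_FILES_SKIP is an assumption that should be checked against the headers or bindings in use.

```java
/** Plain-Java model of a sparse registered file set; it never calls the kernel. */
final class FixedFileTableModel {
    static final int EMPTY = -1; // sparse slot, as described for REGISTER_FILES
    static final int SKIP = -2;  // assumed value of REGISTER_FILES_SKIP; verify before relying on it

    private final int[] slots;

    FixedFileTableModel(int[] fds) {
        this.slots = fds.clone();
    }

    /**
     * Mirrors the described update starting at the given offset:
     * SKIP leaves the slot untouched, EMPTY removes the entry, any other value replaces it.
     * Returns how many slots were rewritten (a convention of this model only).
     */
    int update(int offset, int[] fds) {
        if (offset < 0 || fds.length > slots.length - offset) {
            throw new IndexOutOfBoundsException("update range exceeds the registered set");
        }
        int rewritten = 0;
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] == SKIP) {
                continue; // keep whatever file is already registered at this index
            }
            slots[offset + i] = fds[i];
            rewritten++;
        }
        return rewritten;
    }

    int slot(int index) {
        return slots[index];
    }
}
```

For example, starting from the set {5, -1, 7, 9}, an update at offset 1 with {6, SKIP, -1} fills the sparse slot 1 with 6, leaves slot 2 alone, and turns slot 3 into a sparse entry.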
IORING_REGISTER_BUFFERS_UPDATE
public static final int IORING_REGISTER_BUFFERS_UPDATE
io_uring_register_op
Enum values: identical to the list given under IORING_REGISTER_BUFFERS2 above.
IORING_REGISTER_IOWQ_AFF
public static final int IORING_REGISTER_IOWQ_AFF
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
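The requirement that addr and len fall within the indexed buffer can be pictured with a small range check. This is only a sketch of the rule stated above, not the kernel's code and not part of this binding; the class name FixedBufferRange and the method fitsWithin are hypothetical, and addresses are modelled as unsigned 64-bit values held in a Java long.

```java
/** Illustrative model only: checks the "range must fall within the registered buffer" rule. */
final class FixedBufferRange {
    /**
     * Returns true if [addr, addr + len) lies inside [bufBase, bufBase + bufLen).
     * Addresses are treated as unsigned; lengths must be non-negative.
     */
    static boolean fitsWithin(long bufBase, long bufLen, long addr, long len) {
        if (bufLen < 0 || len < 0) {
            return false;
        }
        if (Long.compareUnsigned(addr, bufBase) < 0) {
            return false; // starts before the registered buffer
        }
        long offset = addr - bufBase; // distance into the buffer
        // Compare without computing addr + len, which could overflow.
        return Long.compareUnsigned(offset, bufLen) <= 0 && len <= bufLen - offset;
    }

    public static void main(String[] args) {
        // A hypothetical 8 KiB registered buffer at address 4096.
        System.out.println(fitsWithin(4096, 8192, 8192, 4096)); // second half: allowed
        System.out.println(fitsWithin(4096, 8192, 8192, 4097)); // one byte too long: rejected
    }
}
```

Using only part of a large registered buffer, as the text permits, corresponds to any addr and len pair for which this check holds.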
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL.
All previously registered buffers associated with the io_uring instance will be released.
Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL.
All previously registered files associated with the io_uring instance will be unregistered.
Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1.
Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring.
Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications.
Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero.
Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array.
Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.
Available since 5.12.
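The update rules for REGISTER_FILES_UPDATE (fill a sparse slot, clear a slot with -1, leave a slot alone with REGISTER_FILES_SKIP) can be illustrated with a plain array. The following is a minimal model of those slot semantics only; it does not call the kernel or this binding. The class FixedFileTableModel is hypothetical, and the SKIP value of -2 is an assumption that should be checked against the REGISTER_FILES_SKIP constant actually exposed by the library.

```java
import java.util.Arrays;

/** Illustrative model only: mimics the slot semantics described above, not the kernel. */
final class FixedFileTableModel {
    /** Marker for a sparse (empty) slot, as described in the text. */
    static final int SPARSE = -1;
    /** Assumed value of REGISTER_FILES_SKIP; verify against the binding's constant. */
    static final int SKIP = -2;

    /** Applies fds to table starting at offset and returns the number of slots visited. */
    static int applyUpdate(int[] table, int offset, int[] fds) {
        if (offset < 0 || (long) offset + fds.length > table.length) {
            throw new IllegalArgumentException("update range outside registered table");
        }
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] == SKIP) {
                continue; // leave the existing file at this index untouched
            }
            table[offset + i] = fds[i]; // install a new fd, or SPARSE to clear the slot
        }
        return fds.length;
    }

    public static void main(String[] args) {
        int[] table = {3, SPARSE, 7};
        applyUpdate(table, 0, new int[] {SKIP, 9, SPARSE});
        System.out.println(Arrays.toString(table)); // prints [3, 9, -1]
    }
}
```

In this model slot 0 is skipped, the sparse slot 1 becomes a real entry, and slot 2 is removed.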
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.
This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD.
Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel.
Available since 5.6.
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials.
Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL.
Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup).
Available since 5.10.
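The allowlist behaviour of REGISTER_RESTRICTIONS can be sketched as a small predicate over a submission's opcode and flags. This is a simplified model under stated assumptions, not the kernel's implementation: the class RestrictionModel and its methods are hypothetical, opcodes are assumed to be non-negative, and required flags are assumed to count as allowed.

```java
import java.util.BitSet;

/** Illustrative allowlist model for SQE restrictions; names are hypothetical. */
final class RestrictionModel {
    private final BitSet allowedSqeOps = new BitSet();
    private int allowedFlags;  // flags that may be set on an SQE
    private int requiredFlags; // flags that must be set on every SQE

    void allowSqeOp(int opcode) {
        allowedSqeOps.set(opcode);
    }

    void allowSqeFlags(int flags) {
        allowedFlags |= flags;
    }

    void requireSqeFlags(int flags) {
        requiredFlags |= flags;
    }

    /** Returns true if an SQE with this opcode and flags passes the allowlist. */
    boolean permits(int opcode, int sqeFlags) {
        if (!allowedSqeOps.get(opcode)) {
            return false; // opcode was never registered as allowed
        }
        if ((sqeFlags & requiredFlags) != requiredFlags) {
            return false; // a required flag is missing
        }
        // Any flag outside the allowed and required sets is rejected.
        return (sqeFlags & ~(allowedFlags | requiredFlags)) == 0;
    }
}
```

Because everything is denied unless listed, the whole set has to be decided up front, which matches the requirement that all restrictions are submitted in a single call.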
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup).
While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument and must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O.
Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed.
Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete.
Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent.
This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args must be the byte size of that mask.
Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF.
Must not have arg or nr_args set.
Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
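The in/out behaviour of the two-element array used by REGISTER_IOWQ_MAX_WORKERS is easy to get wrong, so here is a small model of it. It is a sketch of the semantics described above, not a call into the kernel or this binding; the class WorkerLimitModel is hypothetical, and treating a zero as "leave this entry unchanged" on a per-entry basis is an assumption about how the rule applies to each index.

```java
/** Illustrative model of the two-value in/out array; names are hypothetical. */
final class WorkerLimitModel {
    // Index 0 holds the bounded maximum, index 1 the unbounded maximum.
    private final int[] max = new int[2];

    WorkerLimitModel(int bounded, int unbounded) {
        max[0] = bounded;
        max[1] = unbounded;
    }

    /**
     * Applies the requested limits and overwrites values with the previous
     * maxima. A zero entry only queries: that limit is left unchanged.
     */
    void exchange(int[] values) {
        if (values.length != 2) {
            throw new IllegalArgumentException("exactly two values are expected");
        }
        for (int i = 0; i < 2; i++) {
            int previous = max[i];
            if (values[i] != 0) {
                max[i] = values[i];
            }
            values[i] = previous;
        }
    }
}
```

A caller that only wants to read the current limits would pass an array of two zeros and inspect it afterwards.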
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
- See Also:
-
IORING_UNREGISTER_IOWQ_AFF
public static final int IORING_UNREGISTER_IOWQ_AFF
io_uring_register_op
Enum values:
- See Also:
-
IORING_REGISTER_IOWQ_MAX_WORKERS
public static final int IORING_REGISTER_IOWQ_MAX_WORKERS
io_uring_register_op
Enum values:
REGISTER_BUFFERS-argpoints to a structiovecarray ofnr_argsentries.The buffers associated with the
iovecswill be locked in memory and charged against the user'sRLIMIT_MEMLOCKresource limit. Seegetrlimit(2)for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned bymalloc(3)ormmap(2)with theMAP_ANONYMOUSflag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the
OP_READ_FIXEDorOP_WRITE_FIXEDopcodesin the submission queue entry (see the structio_uring_sqedefinition inenter), and set thebuf_indexfield to the desired buffer index. The memory range described by the submission queue entry'saddrandlenfields must fall within the indexed buffer.It is perfectly valid to setup a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to
io_uring_register()with the new buffers.Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS- This operation takes no argument, andargmust be passed asNULL.All previously registered buffers associated with the
io_uringinstance will be released.Available since 5.1.
REGISTER_FILES- Register files for I/O.argcontains a pointer to an array ofnr_argsfile descriptors (signed 32 bit integers). To make use of the registered files, theIOSQE_FIXED_FILEflag must be set in theflagsmember of the structio_uring_sqe, and thefdmember is set to the index of the file in the file descriptor array.The file set may be sparse, meaning that the
fdfield in the array may be set to -1. SeeREGISTER_FILES_UPDATEfor how to update files in place.Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See
REGISTER_FILES_UPDATEfor how to update an existing set without that limitation.Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES- This operation requires no argument, andargmust be passed asNULL.All previously registered files associated with the
io_uringinstance will be unregistered.Available since 5.1.
REGISTER_EVENTFD- It's possible to useeventfd(2)to get notified of completion events on anio_uringinstance. If this is desired, an eventfd file descriptor can be registered through this operation.argmust contain a pointer to the eventfd file descriptor, andnr_argsmust be 1.Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the
CQ_EVENTFD_DISABLEDbit in theflagsfield of the CQ ring.Available since 5.8.
UNREGISTER_EVENTFD- Unregister an eventfd file descriptor to stop notifications.Since only one eventfd descriptor is currently supported, this operation takes no argument, and
argmust be passed asNULLandnr_argsmust be zero.Available since 5.2.
REGISTER_FILES_UPDATE- This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one wherefdis equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update.nr_argsmust contain the number of descriptors in the passed in array.Available since 5.5.
File descriptors can be skipped if they are set to
REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.Available since 5.12.
REGISTER_EVENTFD_ASYNC- This works just likeREGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for
IORING_REGISTER_EVENTFD.Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel. Available since 5.6.
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials. Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero. Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL. Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries. With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry). All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed). Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup). Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup). While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument and must be invoked with arg set to NULL and nr_args set to zero. Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure. The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging. Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete. Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O. Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure. The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed. Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete. Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent. This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args the byte size of that mask. Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF. Must not have arg or nr_args set. Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values. Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application. Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls. On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor. Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16. Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister. Note that this isn't done automatically on ring exit if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
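The update rules described for REGISTER_FILES_UPDATE (sparse slots, removal, replacement, and skipping) can be illustrated with a small plain-Java model. This is only a sketch of the documented semantics, not a call into the kernel or into this class: FilesUpdateModel, applyUpdate, and the SPARSE and SKIP constants are hypothetical names, and the value -2 for SKIP is an assumption taken from the kernel header's IORING_REGISTER_FILES_SKIP definition.

```java
import java.util.Arrays;

/** Hypothetical in-memory model of a registered file set; it performs no system calls. */
final class FilesUpdateModel {
    /** A slot holding -1 is sparse (no file registered at that index). */
    static final int SPARSE = -1;
    /** Assumed sentinel mirroring IORING_REGISTER_FILES_SKIP: leave the slot untouched. */
    static final int SKIP = -2;

    /**
     * Applies an update to the modelled table, starting at the given offset.
     * Each entry in fds either replaces the slot, clears it (SPARSE), or is skipped (SKIP).
     */
    static void applyUpdate(int[] table, int offset, int[] fds) {
        if (offset < 0 || offset + fds.length > table.length) {
            throw new IllegalArgumentException("update range outside the registered set");
        }
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] == SKIP) {
                continue; // the file previously registered at this index stays in place
            }
            table[offset + i] = fds[i];
        }
    }

    public static void main(String[] args) {
        int[] table = {3, SPARSE, 5, 6};
        // Fill the sparse slot 1, skip slot 2, and remove the entry in slot 3.
        applyUpdate(table, 1, new int[] {7, SKIP, SPARSE});
        System.out.println(Arrays.toString(table)); // prints [3, 7, 5, -1]
    }
}
```

In a real program the table lives in the kernel and the update is submitted through the register call; the model only shows which slots change for a given update array.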
- See Also:
-
IORING_REGISTER_RING_FDS
public static final int IORING_REGISTER_RING_FDS
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries. The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL. All previously registered buffers associated with the io_uring instance will be released. Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL. All previously registered files associated with the io_uring instance will be unregistered. Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1. Available since 5.2.
An application can temporarily disable notifications coming through the registered eventfd by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring. Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications. Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero. Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array. Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index. Available since 5.12.
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner. This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD. Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel. Available since 5.6.
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials. Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero. Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL. Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries. With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry). All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed). Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup). Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup). While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument and must be invoked with arg set to NULL and nr_args set to zero. Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure. The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging. Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete. Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O. Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure. The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed. Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete. Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent. This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args the byte size of that mask. Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF. Must not have arg or nr_args set. Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values. Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application. Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls. On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor. Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16. Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister. Note that this isn't done automatically on ring exit if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
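The in/out behaviour of the two-element array described for REGISTER_IOWQ_MAX_WORKERS can be modelled in plain Java as follows. This is a sketch of the documented contract only and makes no system call: IowqMaxWorkersModel and its register method are hypothetical names, the starting limits are arbitrary, and treating each index independently when one count is 0 is an assumption of this model.

```java
import java.util.Arrays;

/** Hypothetical model of the per-ring worker limits; it does not talk to the kernel. */
final class IowqMaxWorkersModel {
    /** Index 0 holds the bounded worker limit, index 1 the unbounded worker limit. */
    private final int[] limits;

    IowqMaxWorkersModel(int bounded, int unbounded) {
        this.limits = new int[] {bounded, unbounded};
    }

    /**
     * Models one registration call. On return, counts holds the previous limits.
     * A count of 0 leaves the corresponding limit unchanged, so {0, 0} is a pure query.
     */
    void register(int[] counts) {
        if (counts.length != 2) {
            throw new IllegalArgumentException("exactly two values are expected");
        }
        for (int i = 0; i < 2; i++) {
            int previous = limits[i];
            if (counts[i] != 0) {
                limits[i] = counts[i];
            }
            counts[i] = previous;
        }
    }

    public static void main(String[] args) {
        IowqMaxWorkersModel ring = new IowqMaxWorkersModel(4, 128); // arbitrary starting limits
        int[] counts = {8, 0};          // raise the bounded limit, keep the unbounded one
        ring.register(counts);
        System.out.println(Arrays.toString(counts)); // prints [4, 128], the previous limits
        int[] query = {0, 0};           // query without modifying anything
        ring.register(query);
        System.out.println(Arrays.toString(query));  // prints [8, 128]
    }
}
```

The point to take away is that the caller's array is overwritten, so an application that wants to keep its requested values has to copy them before the call.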
- See Also:
-
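The REGISTER_PROBE entry above says an opcode is supported when IO_URING_OP_SUPPORTED is set in the flags field of its io_uring_probe_op. A minimal sketch of that check, assuming the probe results have already been copied into a plain int array indexed by opcode: ProbeFlagsModel and isSupported are hypothetical helpers, and the bit value 1 << 0 is an assumption taken from the kernel header rather than from this class.

```java
/** Hypothetical helper that interprets probe flags; it does not issue the probe itself. */
final class ProbeFlagsModel {
    /** Assumed to mirror the kernel's IO_URING_OP_SUPPORTED bit (1 << 0). */
    static final int IO_URING_OP_SUPPORTED = 1 << 0;

    /**
     * Returns true if the flags recorded for the opcode have the supported bit set.
     * Opcodes beyond the end of the array are treated as unsupported.
     */
    static boolean isSupported(int[] opFlags, int opcode) {
        return opcode >= 0
            && opcode < opFlags.length
            && (opFlags[opcode] & IO_URING_OP_SUPPORTED) != 0;
    }

    public static void main(String[] args) {
        int[] opFlags = {1, 1, 0}; // made-up probe result for three opcodes
        System.out.println(isSupported(opFlags, 1)); // prints true
        System.out.println(isSupported(opFlags, 2)); // prints false
        System.out.println(isSupported(opFlags, 9)); // prints false (not reported by the probe)
    }
}
```

Treating out-of-range opcodes as unsupported is a design choice of the sketch: an older kernel reports fewer ops entries, so anything it does not list cannot be relied on.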
IORING_UNREGISTER_RING_FDS
public static final int IORING_UNREGISTER_RING_FDS
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries. The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL. All previously registered buffers associated with the io_uring instance will be released. Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL. All previously registered files associated with the io_uring instance will be unregistered. Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1. Available since 5.2.
An application can temporarily disable notifications coming through the registered eventfd by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring. Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications. Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero. Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array. Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index. Available since 5.12.
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner. This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD. Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel. Available since 5.6.
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials. Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero. Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL. Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries. With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry). All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed). Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup). Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup). While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument and must be invoked with arg set to NULL and nr_args set to zero. Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure. The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging. Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete. Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O. Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure. The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed. Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete. Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_IOWQ_AFF- By default, async workers created byio_uringwill inherit the CPU mask of its parent.This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell
io_uringwhat CPUs the async workers may run on.argmust point to acpu_set_tmask, andnr_argsthe byte size of that mask.Available since 5.14.
UNREGISTER_IOWQ_AFF- Undoes a CPU mask previously set withREGISTER_IOWQ_AFF.Must not have
argornr_argsset.Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS- By default,io_uringlimits the unbounded workers created to the maximum processor count set byRLIMIT_NPROCand the bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead.argmust be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum valyes for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting.nr_argsmust be set to 2, as the command takes two values.Available since 5.15.
REGISTER_RING_FDS- Wheneverenteris called to submit request or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.Similarly to how io_uring allows registration of files, this allow registration of the ring file descriptor itself. This reduces the overhead of the
io_uring_enter (2)system call.argmust be set to an unsigned int pointer to an array of typestruct io_uring_rsrc_registerofnr_argsnumber of entries. Thedatafield of this struct must point to an io_uring file descriptor, and theoffsetfield can be either-1or an explicit offset desired for the registered file descriptor value. If-1is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for futureio_uring_enter (2)system calls.On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for
io_uring_enter (2), provided thatIORING_ENTER_REGISTERED_RINGis set in theflagsfor the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to
16.Available since 5.18.
UNREGISTER_RING_FDS- Unregister descriptors previously registered withREGISTER_RING_FDS.argmust be set to an unsigned int pointer to an array of typestruct io_uring_rsrc_registerofnr_argsnumber of entries. Only theoffsetfield should be set in the structure, containing the registered file descriptor offset previously returned fromIORING_REGISTER_RING_FDSthat the application wishes to unregister.Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING- register ring based provide buffer groupUNREGISTER_PBUF_RING- unregister ring based provide buffer groupREGISTER_SYNC_CANCEL- sync cancelation APIREGISTER_FILE_ALLOC_RANGE- register a range of fixed file slots for automatic slot allocationREGISTER_PBUF_STATUS- return status information for a buffer groupREGISTER_NAPI- set busy poll settingsUNREGISTER_NAPI- clear busy poll settingsREGISTER_CLOCKREGISTER_CLONE_BUFFERS- clone registered buffers from source ring to current ringREGISTER_LASTREGISTER_USE_REGISTERED_RING
IORING_REGISTER_PBUF_RING
public static final int IORING_REGISTER_PBUF_RING
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
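The requirement that an SQE's addr and len fall within the indexed buffer can be checked in user space before submitting. The following is only a sketch: FixedBufferRange and fitsWithin are hypothetical names that are not part of LibIOURing, and the registered buffer is assumed to be described by its base address and length as stored in the iovec.

```java
// Hypothetical helper, not part of LibIOURing: checks whether the range an
// SQE describes (addr, len) lies inside one registered buffer (base, bufLen).
final class FixedBufferRange {
    /** Returns true if [addr, addr + len) lies within [base, base + bufLen). */
    static boolean fitsWithin(long base, long bufLen, long addr, long len) {
        if (len < 0 || bufLen < 0) return false;
        // Addresses are compared as unsigned so high addresses do not look negative.
        if (Long.compareUnsigned(addr, base) < 0) return false;
        long offset = addr - base; // distance from the start of the buffer
        return Long.compareUnsigned(offset, bufLen) <= 0
            && Long.compareUnsigned(len, bufLen - offset) <= 0;
    }

    public static void main(String[] args) {
        long base = 0x1000L, size = 4096L;
        System.out.println(fitsWithin(base, size, base + 1024, 512)); // range inside the buffer
        System.out.println(fitsWithin(base, size, base + 4000, 512)); // range runs past the end
    }
}
```

A check like this only mirrors the rule stated above; the kernel performs its own validation when the request is submitted.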
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL.
All previously registered buffers associated with the io_uring instance will be released.
Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL.
All previously registered files associated with the io_uring instance will be unregistered.
Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1.
Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring.
Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications.
Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero.
Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array.
Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.
Available since 5.12.
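The update rules for the registered file set (fill a sparse slot, clear a slot with -1, replace a slot, or skip it) can be pictured with a small in-memory model. This is a sketch only: FilesUpdateModel and applyUpdate are hypothetical names, no system call is made, and the value -2 for REGISTER_FILES_SKIP is an assumption taken from the kernel header rather than from the text above.

```java
import java.util.Arrays;

// Hypothetical in-memory model of the registered file table; it does not
// call into the kernel and is not part of LibIOURing.
final class FilesUpdateModel {
    static final int SPARSE = -1; // empty slot, as described above
    static final int SKIP = -2;   // assumed value of REGISTER_FILES_SKIP

    /** Applies fds to table starting at offset; returns the number of slots visited. */
    static int applyUpdate(int[] table, int offset, int[] fds) {
        if (offset < 0 || offset + fds.length > table.length) {
            throw new IllegalArgumentException("update range outside the registered set");
        }
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] == SKIP) continue; // leave the existing entry untouched
            table[offset + i] = fds[i];   // -1 clears the slot, other values replace it
        }
        return fds.length;
    }

    public static void main(String[] args) {
        int[] table = {3, SPARSE, 5, 6};
        applyUpdate(table, 1, new int[] {7, SKIP, SPARSE});
        System.out.println(Arrays.toString(table)); // [3, 7, 5, -1]
    }
}
```

In the example, slot 1 goes from sparse to fd 7, slot 2 is skipped and keeps fd 5, and slot 3 is cleared.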
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.
This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD.
Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel.
Available since 5.6.
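Once the probe has been filled in, deciding whether an opcode is usable comes down to one bit test on the flags of the matching ops entry. The sketch below assumes the flags have already been copied out of the probe struct into a short array indexed by opcode; ProbeFlags and isSupported are hypothetical names, and the value 1 << 0 for IO_URING_OP_SUPPORTED is assumed from the kernel header.

```java
// Hypothetical helper, not part of LibIOURing: interprets the flags field
// of probe entries that were already read from a struct io_uring_probe.
final class ProbeFlags {
    static final int IO_URING_OP_SUPPORTED = 1 << 0; // assumed to match the kernel header

    /** opFlags[i] holds the flags reported for opcode i. */
    static boolean isSupported(short[] opFlags, int opcode) {
        if (opcode < 0 || opcode >= opFlags.length) {
            return false; // the kernel did not report this opcode at all
        }
        return (opFlags[opcode] & IO_URING_OP_SUPPORTED) != 0;
    }

    public static void main(String[] args) {
        short[] flags = {1, 1, 0}; // made-up probe result for three opcodes
        System.out.println(isSupported(flags, 1)); // flag bit set
        System.out.println(isSupported(flags, 2)); // flag bit clear
        System.out.println(isSupported(flags, 9)); // beyond what was reported
    }
}
```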
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials.
Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL.
Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags that are not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup).
Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup).
While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O.
Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed.
Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete.
Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent.
This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args must be the byte size of that mask.
Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF.
Must not have arg or nr_args set.
Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the number of unbounded workers to the maximum process count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
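The in/out behaviour of the two-element array (new limits go in, previous limits come back, and 0 means "query only") is easy to get wrong, so a small in-memory model may help. This is a sketch under assumptions: MaxWorkersModel is a hypothetical class that makes no system call, and it treats the 0 rule independently for each index, which is an interpretation of the text above.

```java
import java.util.Arrays;

// Hypothetical in-memory model of the two worker limits; it does not call
// into the kernel and is not part of LibIOURing.
final class MaxWorkersModel {
    private final int[] limits; // [0] bounded workers, [1] unbounded workers

    MaxWorkersModel(int bounded, int unbounded) {
        limits = new int[] {bounded, unbounded};
    }

    /** New limits in, previous limits out; a 0 entry leaves that limit unchanged. */
    void register(int[] values) {
        if (values.length != 2) {
            throw new IllegalArgumentException("nr_args must be 2");
        }
        for (int i = 0; i < 2; i++) {
            int previous = limits[i];
            if (values[i] != 0) limits[i] = values[i];
            values[i] = previous; // the caller's array is overwritten
        }
    }

    public static void main(String[] args) {
        MaxWorkersModel ring = new MaxWorkersModel(8, 64);
        int[] request = {4, 0};  // lower the bounded limit, only query the unbounded one
        ring.register(request);
        System.out.println(Arrays.toString(request)); // [8, 64], the previous limits
        int[] query = {0, 0};    // pure query
        ring.register(query);
        System.out.println(Arrays.toString(query));   // [4, 64], the current limits
    }
}
```

Because the array is overwritten on return, a caller that wants to reuse the requested values needs to keep its own copy.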
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to a pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. The data field of this struct must contain an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
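The offset handling described above (pass -1 to let the kernel pick a slot, or name a slot explicitly, with at most 16 slots) can be illustrated with a small model. This is a sketch only: RingFdSlots is a hypothetical class that makes no system call, and returning -1 for a full table or an occupied slot is a simplification of the error codes the kernel would report.

```java
import java.util.Arrays;

// Hypothetical in-memory model of the per-task registered ring descriptor
// slots; it does not call into the kernel and is not part of LibIOURing.
final class RingFdSlots {
    static final int MAX_SLOTS = 16; // limit stated above
    private final int[] slots = new int[MAX_SLOTS];

    RingFdSlots() {
        Arrays.fill(slots, -1); // -1 marks a free slot in this model
    }

    /** Returns the slot assigned to ringFd, or -1 if registration fails. */
    int register(int ringFd, int offset) {
        if (offset == -1) { // let the model pick the first free slot
            for (int i = 0; i < MAX_SLOTS; i++) {
                if (slots[i] == -1) {
                    slots[i] = ringFd;
                    return i;
                }
            }
            return -1; // every slot is in use
        }
        if (offset < 0 || offset >= MAX_SLOTS || slots[offset] != -1) {
            return -1; // invalid or occupied slot
        }
        slots[offset] = ringFd;
        return offset;
    }

    /** Frees a slot so it can be reused; returns false if it was not in use. */
    boolean unregister(int offset) {
        if (offset < 0 || offset >= MAX_SLOTS || slots[offset] == -1) return false;
        slots[offset] = -1;
        return true;
    }

    public static void main(String[] args) {
        RingFdSlots task = new RingFdSlots();
        System.out.println(task.register(42, -1)); // first free slot: 0
        System.out.println(task.register(43, 5));  // explicit slot: 5
        System.out.println(task.register(44, 5));  // slot 5 is taken: -1
    }
}
```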
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to a pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
IORING_UNREGISTER_PBUF_RING
public static final int IORING_UNREGISTER_PBUF_RING
io_uring_register_op
Enum values: see the io_uring_register_op list under IORING_REGISTER_PBUF_RING.
- See Also:
-
IORING_REGISTER_SYNC_CANCEL
public static final int IORING_REGISTER_SYNC_CANCEL
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then use only part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13, registering buffers would wait for the ring to idle: if the application had requests in flight, the registration waited for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
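The rule that a fixed read or write must stay inside the indexed buffer can be modelled in a few lines. The following is a minimal Java sketch and not part of LibIOURing or of the kernel interface: the class FixedBufferRange and the method fitsWithin are hypothetical names, and addresses are treated as plain long values.

```java
/** Hypothetical helper: models the addr/len range rule for registered buffers. */
final class FixedBufferRange {
    private FixedBufferRange() {}

    /**
     * Returns true when [addr, addr + len) lies entirely inside the
     * registered buffer [bufBase, bufBase + bufLen).
     */
    static boolean fitsWithin(long bufBase, long bufLen, long addr, long len) {
        if (bufLen < 0 || len < 0 || addr < bufBase) {
            return false;
        }
        long offset = addr - bufBase;
        // Compare against the remaining space to avoid overflowing addr + len.
        return offset <= bufLen && len <= bufLen - offset;
    }

    public static void main(String[] args) {
        long base = 0x10000L;      // assumed base address of the registered buffer
        long size = 64 * 1024;     // assumed registered length: 64 KiB
        System.out.println(fitsWithin(base, size, base + 4096, 8192));      // true
        System.out.println(fitsWithin(base, size, base + 60 * 1024, 8192)); // false
    }
}
```

A binding could run a check like this before filling in addr, len and buf_index, so that an out-of-range request is caught in user space rather than surfacing as an error in the completion queue.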
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL.
All previously registered buffers associated with the io_uring instance will be released.
Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13, registering files would wait for the ring to idle: if the application had requests in flight, the registration waited for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs to unregister only if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL.
All previously registered files associated with the io_uring instance will be unregistered.
Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1.
Available since 5.2.
An application can temporarily disable notifications coming through the registered eventfd by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring.
Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications.
Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero.
Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array.
Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.
Available since 5.12.
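The three cases handled by REGISTER_FILES_UPDATE (fill a sparse slot, clear a slot with -1, leave a slot alone with the skip value) can be illustrated with a small user-space model. This is a sketch only: RegisteredFileTable and applyUpdate are hypothetical names, the int array stands in for the kernel's registered file set, and the skip value of -2 is taken from the Linux header definition of IORING_REGISTER_FILES_SKIP rather than from the text above. Rejecting an out-of-range update with an exception is a choice made for this model.

```java
import java.util.Arrays;

/** Hypothetical model of the registered file set; it does not call the kernel. */
final class RegisteredFileTable {
    static final int SPARSE = -1;      // empty slot, as described for sparse file sets
    static final int FILES_SKIP = -2;  // assumed value of IORING_REGISTER_FILES_SKIP

    private RegisteredFileTable() {}

    /** Applies fds to table starting at offset, honouring SPARSE and FILES_SKIP. */
    static void applyUpdate(int[] table, int offset, int[] fds) {
        if (offset < 0 || offset > table.length - fds.length) {
            throw new IllegalArgumentException("update does not fit in the registered set");
        }
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] == FILES_SKIP) {
                continue;               // keep whatever file is already in this slot
            }
            table[offset + i] = fds[i]; // a real fd fills the slot, SPARSE clears it
        }
    }

    public static void main(String[] args) {
        int[] table = {3, SPARSE, 5, 7};
        applyUpdate(table, 1, new int[] {9, FILES_SKIP, SPARSE});
        System.out.println(Arrays.toString(table)); // prints [3, 9, 5, -1]
    }
}
```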
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.
This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD.
Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel.
Available since 5.6.
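Once the kernel has filled in the probe structure, reading it comes down to walking the ops array and testing one flag bit. The sketch below decodes such a structure from a ByteBuffer. It assumes the layout found in the Linux uapi header (a 16-byte header with ops_len at byte 1, followed by 8-byte entries holding op at byte 0 and a 16-bit flags field at byte 2) and IO_URING_OP_SUPPORTED equal to bit 0; none of these offsets are stated in the text above, and ProbeParser is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.BitSet;

/** Hypothetical decoder for a filled-in struct io_uring_probe. */
final class ProbeParser {
    static final int HEADER_BYTES = 16;        // assumed size of the probe header
    static final int OP_BYTES = 8;             // assumed size of one io_uring_probe_op
    static final int IO_URING_OP_SUPPORTED = 1; // assumed flag value (bit 0)

    private ProbeParser() {}

    /** Returns the set of opcodes whose flags field has IO_URING_OP_SUPPORTED set. */
    static BitSet supportedOpcodes(ByteBuffer probe) {
        int opsLen = Byte.toUnsignedInt(probe.get(1)); // ops_len field
        BitSet supported = new BitSet();
        for (int i = 0; i < opsLen; i++) {
            int base = HEADER_BYTES + i * OP_BYTES;
            int op = Byte.toUnsignedInt(probe.get(base));
            int flags = Short.toUnsignedInt(probe.getShort(base + 2));
            if ((flags & IO_URING_OP_SUPPORTED) != 0) {
                supported.set(op);
            }
        }
        return supported;
    }

    public static void main(String[] args) {
        // Synthetic probe with two entries: opcode 0 supported, opcode 1 not.
        ByteBuffer probe = ByteBuffer.allocate(HEADER_BYTES + 2 * OP_BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        probe.put(1, (byte) 2);            // ops_len = 2
        probe.put(16, (byte) 0);           // ops[0].op
        probe.putShort(18, (short) 1);     // ops[0].flags = IO_URING_OP_SUPPORTED
        probe.put(24, (byte) 1);           // ops[1].op, flags left at 0
        System.out.println(supportedOpcodes(probe)); // prints {0}
    }
}
```

A buffer that really came from the kernel would use the platform's native byte order; the synthetic buffer here sets little-endian explicitly so the example behaves the same everywhere.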
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials.
Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL.
Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup).
Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup).
While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument and must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O.
Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed.
Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete.
Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent.
This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args must be set to the byte size of that mask.
Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF.
Must not have arg or nr_args set.
Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
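The in/out behaviour of the two-element array is easy to get wrong, so a small model may help. The sketch below is an assumption-laden illustration, not a call into the kernel: IowqMaxWorkersModel is a hypothetical class, the starting limits of 8 and 64 are arbitrary, and each index is treated independently (a zero at one index leaves only that limit unchanged), which is how the description above is read here.

```java
/** Hypothetical user-space model of REGISTER_IOWQ_MAX_WORKERS semantics. */
final class IowqMaxWorkersModel {
    static final int BOUNDED = 0;    // index 0: bounded worker count
    static final int UNBOUNDED = 1;  // index 1: unbounded worker count

    private final int[] limits = new int[2];

    IowqMaxWorkersModel(int bounded, int unbounded) {
        limits[BOUNDED] = bounded;
        limits[UNBOUNDED] = unbounded;
    }

    /**
     * Applies the requested limits and overwrites values with the previous
     * limits. A zero entry only queries and leaves that limit unchanged.
     */
    void register(int[] values) {
        if (values.length != 2) {
            throw new IllegalArgumentException("nr_args must be 2");
        }
        for (int i = 0; i < 2; i++) {
            int previous = limits[i];
            if (values[i] != 0) {
                limits[i] = values[i];
            }
            values[i] = previous;
        }
    }

    public static void main(String[] args) {
        IowqMaxWorkersModel ring = new IowqMaxWorkersModel(8, 64);
        int[] change = {2, 0};           // lower bounded workers, keep unbounded
        ring.register(change);
        System.out.println(change[0] + " " + change[1]); // prints 8 64
        int[] query = {0, 0};            // query only
        ring.register(query);
        System.out.println(query[0] + " " + query[1]);   // prints 2 64
    }
}
```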
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provided buffer group
UNREGISTER_PBUF_RING - unregister ring based provided buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
IORING_REGISTER_FILE_ALLOC_RANGE
public static final int IORING_REGISTER_FILE_ALLOC_RANGE
io_uring_register_op enum value. REGISTER_FILE_ALLOC_RANGE registers a range of fixed file slots for automatic slot allocation. The full set of io_uring_register_op enum values is described under IORING_REGISTER_SYNC_CANCEL above.
IORING_REGISTER_PBUF_STATUS
public static final int IORING_REGISTER_PBUF_STATUS
io_uring_register_op enum value. REGISTER_PBUF_STATUS returns status information for a buffer group. The full set of io_uring_register_op enum values is described under IORING_REGISTER_SYNC_CANCEL above.
argset toNULLandnr_argsset to zero.Available since 5.6.
UNREGISTER_PERSONALITY- This operation unregisters a previously registered personality withio_uring.nr_argsmust be set to the id in question, andargmust be set toNULL.Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup).
Available since 5.10.
REGISTER_ENABLE_RINGS- This operation enables anio_uringring started in a disabled state (SETUP_R_DISABLEDwas specified in the call tosetup).While the
io_uringring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, theio_uringring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked withargset toNULLandnr_argsset to zero.Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2- Similar toREGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_BUFFERS2- Register buffers for I/O.Similar to
REGISTER_BUFFERSbut aims to have a more extensible ABI.argpoints to a structio_uring_rsrc_register, andnr_argsshould be set to the number of bytes in the structure.The data field contains a pointer to a struct
iovecarray ofnrentries. Thetagsfield should either be 0, then tagging is disabled, or point to an array ofnr"tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource had been unregistered and it's not used anymore, a CQE will be posted withuser_dataset to the specified tag and all other fields zeroed.Note that resource updates, e.g.
REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time it returns, but they might be held alive until all requests using it complete.Available since 5.13.
REGISTER_BUFFERS_UPDATE- Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of structiovec.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent.
This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args must be the byte size of that mask.
Available since 5.14.
UNREGISTER_IOWQ_AFF- Undoes a CPU mask previously set withREGISTER_IOWQ_AFF.Must not have
argornr_argsset.Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS- Unregister descriptors previously registered withREGISTER_RING_FDS.argmust be set to an unsigned int pointer to an array of typestruct io_uring_rsrc_registerofnr_argsnumber of entries. Only theoffsetfield should be set in the structure, containing the registered file descriptor offset previously returned fromIORING_REGISTER_RING_FDSthat the application wishes to unregister.Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
IORING_REGISTER_NAPI
public static final int IORING_REGISTER_NAPI
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
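The range rule for fixed buffers can be checked on the application side before an SQE is filled in. The following is a minimal sketch in plain Java, not a call into the kernel or into LibIOURing: the class FixedBufferRange and its fits method are hypothetical names, and the 1 GiB constant simply restates the per-buffer limit quoted above.

```java
// Hypothetical helper, not part of LibIOURing: models the rule that an SQE's
// addr/len range must lie inside the registered buffer chosen by buf_index.
final class FixedBufferRange {
    /** Per-buffer size limit quoted in the REGISTER_BUFFERS text: 1 GiB. */
    static final long MAX_BUFFER_BYTES = 1L << 30;

    /**
     * @param bufBase start address of the registered buffer (iov_base)
     * @param bufLen  length of the registered buffer in bytes (iov_len)
     * @param addr    start address the SQE would use
     * @param len     number of bytes the SQE would read or write
     */
    static boolean fits(long bufBase, long bufLen, long addr, long len) {
        if (bufLen < 0 || bufLen > MAX_BUFFER_BYTES || len < 0) {
            return false;
        }
        // Native addresses may have the top bit set, so compare as unsigned.
        if (Long.compareUnsigned(addr, bufBase) < 0) {
            return false;
        }
        long offset = addr - bufBase; // distance from the start of the buffer
        return Long.compareUnsigned(offset, bufLen) <= 0 && len <= bufLen - offset;
    }

    public static void main(String[] args) {
        long base = 0x7f00_0000_0000L; // made-up address for illustration
        long size = 64 * 1024;
        System.out.println(fits(base, size, base + 4096, 8192));   // inside the buffer
        System.out.println(fits(base, size, base + 60_000, 8192)); // runs past the end
    }
}
```

Using only part of a large registered buffer passes this check, which matches the statement that partial use is valid as long as the range stays within the originally mapped region.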
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL.
All previously registered buffers associated with the io_uring instance will be released.
Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application only needs to unregister files if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL.
All previously registered files associated with the io_uring instance will be unregistered.
Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1.
Available since 5.2.
An application can temporarily disable notifications coming through the registered eventfd by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring.
Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications.
Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero.
Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array.
Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.
Available since 5.12.
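The three kinds of update (fill a sparse slot, clear a slot, replace a slot) plus the skip marker can be illustrated with a small model of the registered file table. This is a sketch in plain Java that only mimics the described semantics; FileTableModel and applyUpdate are hypothetical names, and the value -2 for the skip marker is taken from the kernel header definition of IORING_REGISTER_FILES_SKIP, which should be confirmed against the constant exposed by the binding in use.

```java
import java.util.Arrays;

// Hypothetical model, not part of LibIOURing: mimics how a file table update
// starting at `offset` treats -1 (sparse) and the skip marker.
final class FileTableModel {
    static final int SPARSE = -1; // empty slot in the registered file set
    static final int SKIP = -2;   // assumed value of IORING_REGISTER_FILES_SKIP

    /** Returns a copy of `table` with `fds` applied starting at `offset`. */
    static int[] applyUpdate(int[] table, int offset, int[] fds) {
        if (offset < 0 || fds.length > table.length - offset) {
            throw new IllegalArgumentException("update does not fit in the table");
        }
        int[] result = table.clone();
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] != SKIP) {
                // A real fd fills or replaces the slot; -1 turns it sparse.
                result[offset + i] = fds[i];
            }
            // SKIP leaves the previously registered file at this index untouched.
        }
        return result;
    }

    public static void main(String[] args) {
        int[] table = {3, SPARSE, 7, 9};
        int[] updated = applyUpdate(table, 1, new int[] {5, SKIP, SPARSE});
        System.out.println(Arrays.toString(updated)); // [3, 5, 7, -1]
    }
}
```

In the usage above, index 1 goes from sparse to fd 5, index 2 keeps fd 7 because it is skipped, and index 3 is removed by writing -1.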
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.
This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD.
Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel.
Available since 5.6.
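Once the probe has been filled in, checking an opcode comes down to a bit test on the flags of its ops entry. The sketch below shows that test in plain Java on an array of flag values copied out of the probe. ProbeModel and isSupported are hypothetical names, the array stands in for the native ops array, and the bit value 1 << 0 for IO_URING_OP_SUPPORTED follows the kernel header.

```java
// Hypothetical model, not part of LibIOURing: `opFlags[i]` stands in for the
// flags field of the io_uring_probe_op entry describing opcode i.
final class ProbeModel {
    static final int IO_URING_OP_SUPPORTED = 1 << 0; // bit value from the kernel header

    static boolean isSupported(short[] opFlags, int opcode) {
        // Opcodes beyond the probed range are unknown to the running kernel.
        if (opcode < 0 || opcode >= opFlags.length) {
            return false;
        }
        return (opFlags[opcode] & IO_URING_OP_SUPPORTED) != 0;
    }

    public static void main(String[] args) {
        short[] opFlags = {1, 0, 1}; // made-up probe result for three opcodes
        System.out.println(isSupported(opFlags, 0)); // flag set
        System.out.println(isSupported(opFlags, 1)); // flag clear
        System.out.println(isSupported(opFlags, 5)); // outside the probed range
    }
}
```

Treating an out-of-range opcode as unsupported is a design choice of this sketch: a kernel older than the opcode cannot report on it at all.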
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials.
Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL.
Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup).
Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup).
While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument and must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O.
Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed.
Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete.
Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent.
This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args must be the byte size of that mask.
Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF.
Must not have arg or nr_args set.
Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
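The in/out behaviour of the two-element array is easy to get wrong, so the following plain-Java model walks through it. It is a sketch of the semantics described above, not a kernel call: WorkerLimitsModel and its register method are hypothetical names, and the starting limits are made-up numbers.

```java
import java.util.Arrays;

// Hypothetical model, not part of LibIOURing: mimics the described behaviour
// of the two-element array passed with REGISTER_IOWQ_MAX_WORKERS.
final class WorkerLimitsModel {
    // Index 0: bounded worker limit, index 1: unbounded worker limit.
    private final int[] limits;

    WorkerLimitsModel(int bounded, int unbounded) {
        this.limits = new int[] {bounded, unbounded};
    }

    /** Applies non-zero entries and overwrites `values` with the previous limits. */
    void register(int[] values) {
        if (values.length != 2) {
            throw new IllegalArgumentException("nr_args must be 2");
        }
        for (int i = 0; i < 2; i++) {
            int previous = limits[i];
            if (values[i] != 0) {
                limits[i] = values[i]; // zero means: query only, leave unchanged
            }
            values[i] = previous;
        }
    }

    int[] current() {
        return limits.clone();
    }

    public static void main(String[] args) {
        WorkerLimitsModel ring = new WorkerLimitsModel(8, 64); // made-up defaults
        int[] values = {4, 0}; // lower the bounded limit, only query the unbounded one
        ring.register(values);
        System.out.println(Arrays.toString(values));         // [8, 64]
        System.out.println(Arrays.toString(ring.current())); // [4, 64]
    }
}
```

Passing {0, 0} therefore acts as a pure query: both slots come back holding the current limits and nothing changes.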
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
IORING_UNREGISTER_NAPI
public static final int IORING_UNREGISTER_NAPI
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS- This operation takes no argument, andargmust be passed asNULL.All previously registered buffers associated with the
io_uringinstance will be released.Available since 5.1.
REGISTER_FILES- Register files for I/O.argcontains a pointer to an array ofnr_argsfile descriptors (signed 32 bit integers). To make use of the registered files, theIOSQE_FIXED_FILEflag must be set in theflagsmember of the structio_uring_sqe, and thefdmember is set to the index of the file in the file descriptor array.The file set may be sparse, meaning that the
fdfield in the array may be set to -1. SeeREGISTER_FILES_UPDATEfor how to update files in place.Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See
REGISTER_FILES_UPDATEfor how to update an existing set without that limitation.Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES- This operation requires no argument, andargmust be passed asNULL.All previously registered files associated with the
io_uringinstance will be unregistered.Available since 5.1.
REGISTER_EVENTFD- It's possible to useeventfd(2)to get notified of completion events on anio_uringinstance. If this is desired, an eventfd file descriptor can be registered through this operation.argmust contain a pointer to the eventfd file descriptor, andnr_argsmust be 1.Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the
CQ_EVENTFD_DISABLEDbit in theflagsfield of the CQ ring.Available since 5.8.
UNREGISTER_EVENTFD- Unregister an eventfd file descriptor to stop notifications.Since only one eventfd descriptor is currently supported, this operation takes no argument, and
argmust be passed asNULLandnr_argsmust be zero.Available since 5.2.
REGISTER_FILES_UPDATE- This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one wherefdis equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update.nr_argsmust contain the number of descriptors in the passed in array.Available since 5.5.
File descriptors can be skipped if they are set to
REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.Available since 5.12.
REGISTER_EVENTFD_ASYNC- This works just likeREGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for
IORING_REGISTER_EVENTFD.Available since 5.6.
REGISTER_PROBE- This operation returns a structure,io_uring_probe, which contains information about theopcodessupported byio_uringon the running kernel.argmust contain a pointer to a structio_uring_probe, andnr_argsmust contain the size of the ops array in that probe struct. Theopsarray is of the typeio_uring_probe_op, which holds the value of theopcodeand aflagsfield. If the flags field hasIO_URING_OP_SUPPORTEDset, then this opcode is supported on the running kernel.Available since 5.6.
REGISTER_PERSONALITY- This operation registers credentials of the running application withio_uring, and returns an id associated with these credentials.Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with
argset toNULLandnr_argsset to zero.Available since 5.6.
UNREGISTER_PERSONALITY- This operation unregisters a previously registered personality withio_uring.nr_argsmust be set to the id in question, andargmust be set toNULL.Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup).
Available since 5.10.
REGISTER_ENABLE_RINGS- This operation enables anio_uringring started in a disabled state (SETUP_R_DISABLEDwas specified in the call tosetup).While the
io_uringring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, theio_uringring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument, must be invoked withargset toNULLandnr_argsset to zero.Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2- Similar toREGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry.argmust contain a pointer to a structio_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data.tagspoints to an array of tags.nrmust contain the number of descriptors in the passed in arrays. SeeREGISTER_BUFFERS2for the resource tagging description.Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O. Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure. The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed. Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete. Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring inherit the CPU mask of their parent. This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args must be the byte size of that mask. Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF. Must not have arg or nr_args set. Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values. Available since 5.15.
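The in/out contract of the two-element array is easy to get wrong, so here is a small Java model of it. This is only a sketch of the behaviour described above, assuming a simple in-memory stand-in for the ring's worker limits; IowqMaxWorkersSketch is a hypothetical class and nothing here calls the kernel or LibIOURing.

```java
// Hypothetical in-memory model of the REGISTER_IOWQ_MAX_WORKERS array contract.
final class IowqMaxWorkersSketch {
    // Index 0: bounded worker limit, index 1: unbounded worker limit.
    private final int[] max;

    IowqMaxWorkersSketch(int bounded, int unbounded) {
        this.max = new int[]{bounded, unbounded};
    }

    // values is both input and output: a non-zero entry sets a new limit,
    // a zero entry leaves the limit alone, and on return every entry holds
    // the limit that was in effect before the call.
    void register(int[] values) {
        if (values.length != 2) {
            throw new IllegalArgumentException("nr_args must be 2");
        }
        for (int i = 0; i < 2; i++) {
            int previous = max[i];
            if (values[i] != 0) {
                max[i] = values[i];
            }
            values[i] = previous;
        }
    }

    int[] current() {
        return max.clone();
    }

    public static void main(String[] args) {
        IowqMaxWorkersSketch ring = new IowqMaxWorkersSketch(8, 64);
        int[] values = {4, 0};   // lower the bounded limit, only query the unbounded one
        ring.register(values);
        System.out.println(java.util.Arrays.toString(values));
        System.out.println(java.util.Arrays.toString(ring.current()));
    }
}
```

Passing {0, 0} therefore acts as a pure query, which is a convenient way to read the limits without changing them.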
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application. Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. The data field of this struct must contain an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls. On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor. Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16. Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister. Note that this isn't done automatically on ring exit if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
IORING_REGISTER_CLOCK
public static final int IORING_REGISTER_CLOCK
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries. The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
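The rule that addr and len must fall within the indexed buffer can be checked before a request is queued. The following Java sketch shows one way to express that check, assuming addresses are carried as non-negative long values; FixedBufferRangeSketch, its method, and the bases/lengths arrays (standing in for iov_base and iov_len of each registered iovec) are hypothetical and not part of LibIOURing.

```java
// Hypothetical pre-submission check for OP_READ_FIXED / OP_WRITE_FIXED requests.
final class FixedBufferRangeSketch {
    // Per-buffer size limit stated in the text above (1 GiB).
    static final long MAX_BUFFER_BYTES = 1L << 30;

    // bases[i] and lengths[i] model iov_base and iov_len of the i-th registered buffer.
    static boolean fitsRegisteredBuffer(long[] bases, long[] lengths,
                                        int bufIndex, long addr, long len) {
        if (bufIndex < 0 || bufIndex >= bases.length) {
            return false; // buf_index does not name a registered buffer
        }
        long start = bases[bufIndex];
        long end = start + lengths[bufIndex]; // one past the last byte of the buffer
        // The I/O range [addr, addr + len) must lie inside [start, end).
        return len >= 0 && addr >= start && addr <= end && len <= end - addr;
    }

    public static void main(String[] args) {
        long[] bases = {0x1000L};
        long[] lengths = {4096L};
        // Using only part of a larger registered buffer is fine.
        System.out.println(fitsRegisteredBuffer(bases, lengths, 0, 0x1000L + 1024, 1024));
        // A range that runs past the end of the buffer is not.
        System.out.println(fitsRegisteredBuffer(bases, lengths, 0, 0x1000L + 4000, 200));
    }
}
```

The kernel performs its own validation; a check like this only helps an application fail early with a clearer error.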
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL. All previously registered buffers associated with the io_uring instance will be released. Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL. All previously registered files associated with the io_uring instance will be unregistered. Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1. Available since 5.2.
An application can temporarily disable notifications coming through the registered eventfd by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring. Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications. Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero. Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array. Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index. Available since 5.12.
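The three outcomes of an update (fill a sparse slot, remove an entry, leave a slot untouched) can be modelled on a plain int array. The Java sketch below is an in-memory illustration only: FilesUpdateSketch and applyUpdate are hypothetical names, the bounds check is an assumption about how an out-of-range update should be treated, and no kernel call is made. The value -2 is the one the kernel header defines for IORING_REGISTER_FILES_SKIP.

```java
import java.util.Arrays;

// Hypothetical in-memory model of a registered file table update.
final class FilesUpdateSketch {
    static final int SPARSE = -1;              // empty slot, or "remove this entry"
    static final int REGISTER_FILES_SKIP = -2; // leave the slot as it is

    // Returns a copy of table with fds applied starting at offset.
    static int[] applyUpdate(int[] table, int offset, int[] fds) {
        if (offset < 0 || offset + fds.length > table.length) {
            // Assumption of this sketch: reject updates that leave the table.
            throw new IllegalArgumentException("update outside registered table");
        }
        int[] updated = table.clone();
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] == REGISTER_FILES_SKIP) {
                continue; // the previously registered file stays in place
            }
            updated[offset + i] = fds[i]; // install a new fd or clear with -1
        }
        return updated;
    }

    public static void main(String[] args) {
        int[] table = {3, SPARSE, 5, 9};
        // Slot 1: sparse -> 7, slot 2: skipped, slot 3: removed.
        int[] updated = applyUpdate(table, 1, new int[]{7, REGISTER_FILES_SKIP, SPARSE});
        System.out.println(Arrays.toString(updated)); // prints [3, 7, 5, -1]
    }
}
```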
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner. This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD. Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel. Available since 5.6.
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials. Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero. Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL. Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries. With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry). All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags that are not registered are not allowed). Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup). Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup). While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registrations are allowed, but they will be validated against the registered restrictions (if any). This operation takes no argument and must be invoked with arg set to NULL and nr_args set to zero. Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure. The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging. Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete. Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O. Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure. The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed. Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete. Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring inherit the CPU mask of their parent. This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args must be the byte size of that mask. Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF. Must not have arg or nr_args set. Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values. Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application. Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. The data field of this struct must contain an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls. On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor. Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16. Available since 5.18.
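The offset handling (explicit slot versus -1 for "any free slot") and the 16-slot limit can be pictured as a small per-thread table. The Java sketch below models only that bookkeeping; RingFdSlotsSketch is a hypothetical class, the exceptions are this sketch's own way of signalling failure, and nothing here reflects how LibIOURing or the kernel report such errors.

```java
// Hypothetical model of one thread's registered ring descriptor slots.
final class RingFdSlotsSketch {
    static final int MAX_SLOTS = 16;  // limit stated in the text above
    static final int ANY_OFFSET = -1; // ask for the first free slot

    private final int[] fds = new int[MAX_SLOTS];
    private final boolean[] used = new boolean[MAX_SLOTS]; // 0 is a valid fd, so track use separately

    // Stores ringFd and returns the slot index to pass to later enter calls.
    int register(int ringFd, int offset) {
        if (offset == ANY_OFFSET) {
            for (int i = 0; i < MAX_SLOTS; i++) {
                if (!used[i]) {
                    return store(i, ringFd);
                }
            }
            throw new IllegalStateException("all " + MAX_SLOTS + " slots are in use");
        }
        checkRange(offset);
        if (used[offset]) {
            throw new IllegalStateException("slot " + offset + " is already in use");
        }
        return store(offset, ringFd);
    }

    // Frees a slot so that a later registration can reuse it.
    void unregister(int offset) {
        checkRange(offset);
        used[offset] = false;
    }

    private int store(int slot, int ringFd) {
        fds[slot] = ringFd;
        used[slot] = true;
        return slot;
    }

    private static void checkRange(int offset) {
        if (offset < 0 || offset >= MAX_SLOTS) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
    }

    public static void main(String[] args) {
        RingFdSlotsSketch slots = new RingFdSlotsSketch();
        System.out.println(slots.register(5, ANY_OFFSET)); // first free slot
        System.out.println(slots.register(6, 3));          // explicit slot
    }
}
```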
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_update of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister. Note that this isn't done automatically on ring exit if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provide buffer group
UNREGISTER_PBUF_RING - unregister ring based provide buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
IORING_REGISTER_CLONE_BUFFERS
public static final int IORING_REGISTER_CLONE_BUFFERS
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries. The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL. All previously registered buffers associated with the io_uring instance will be released. Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL. All previously registered files associated with the io_uring instance will be unregistered. Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1. Available since 5.2.
An application can temporarily disable notifications coming through the registered eventfd by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring. Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications. Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero. Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array. Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index. Available since 5.12.
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner. This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD. Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel. Available since 5.6.
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials. Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero. Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL. Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries. With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry). All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags that are not registered are not allowed). Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup). Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup). While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registrations are allowed, but they will be validated against the registered restrictions (if any). This operation takes no argument and must be invoked with arg set to NULL and nr_args set to zero. Available since 5.10.
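The interplay between a disabled ring, a one-shot restriction registration, and enabling can be summarised as a small state machine. The Java sketch below covers only the allowlist of submission opcodes (not register opcodes or flag restrictions) and is a simplified model under that assumption; RestrictionSketch and its methods are hypothetical names, and the exceptions are the sketch's own error signalling.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a ring created disabled, restricted once, then enabled.
final class RestrictionSketch {
    private final Set<Integer> allowedSqeOps = new HashSet<>();
    private boolean restrictionsRegistered;
    private boolean enabled; // false models a ring created with SETUP_R_DISABLED

    // All restrictions arrive in a single call, and only while the ring is disabled.
    void registerRestrictions(Set<Integer> sqeOps) {
        if (enabled) {
            throw new IllegalStateException("ring is already enabled");
        }
        if (restrictionsRegistered) {
            throw new IllegalStateException("restrictions were already registered");
        }
        allowedSqeOps.addAll(sqeOps);
        restrictionsRegistered = true;
    }

    void enableRings() {
        enabled = true;
    }

    // Submissions are refused while disabled; afterwards the allowlist applies, if any.
    boolean maySubmit(int opcode) {
        if (!enabled) {
            return false;
        }
        return !restrictionsRegistered || allowedSqeOps.contains(opcode);
    }

    public static void main(String[] args) {
        RestrictionSketch ring = new RestrictionSketch();
        ring.registerRestrictions(Set.of(1, 2)); // opcode numbers are placeholders
        ring.enableRings();
        System.out.println(ring.maySubmit(1)); // on the allowlist
        System.out.println(ring.maySubmit(3)); // not on the allowlist
    }
}
```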
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure. The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging. Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete. Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O. Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure. The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed. Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete. Available since 5.13.
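The timing of the tag notification (only after the resource is both unregistered and no longer used by any request) is the subtle part of resource tagging. The Java sketch below models that rule with a per-resource in-flight counter. It is an illustration under simplified assumptions, not the kernel's bookkeeping; ResourceTagSketch and its methods are hypothetical names, and the completions list stands in for the user_data values of CQEs posted to the CQ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of when a tagged resource produces its notification CQE.
final class ResourceTagSketch {
    // user_data values of the CQEs this model has posted, in order.
    final List<Long> completions = new ArrayList<>();

    private final long[] tags;            // 0 means tagging is disabled for that resource
    private final int[] inFlight;         // requests currently using each resource
    private final boolean[] unregistered; // set once the resource was unregistered or replaced

    ResourceTagSketch(long[] tags) {
        this.tags = tags.clone();
        this.inFlight = new int[tags.length];
        this.unregistered = new boolean[tags.length];
    }

    void startRequest(int index) {
        inFlight[index]++;
    }

    void finishRequest(int index) {
        inFlight[index]--;
        maybePost(index);
    }

    void unregister(int index) {
        unregistered[index] = true;
        maybePost(index);
    }

    // Post the tag once the resource is unregistered and idle; untagged resources post nothing.
    private void maybePost(int index) {
        if (unregistered[index] && inFlight[index] == 0 && tags[index] != 0) {
            completions.add(tags[index]);
            tags[index] = 0; // never post the same tag twice
        }
    }

    public static void main(String[] args) {
        ResourceTagSketch rsrc = new ResourceTagSketch(new long[]{10L, 0L});
        rsrc.startRequest(0);
        rsrc.unregister(0);                   // still in use: nothing posted yet
        System.out.println(rsrc.completions);
        rsrc.finishRequest(0);                // last user gone: tag 10 is posted
        System.out.println(rsrc.completions);
    }
}
```

An application can use the tag to learn when it is safe to free or reuse the memory behind a buffer that was replaced or unregistered.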
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description. Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring inherit the CPU mask of their parent. This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args must be the byte size of that mask. Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF. Must not have arg or nr_args set. Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values. Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provided buffer group
UNREGISTER_PBUF_RING - unregister ring based provided buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
- See Also:
-
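The in/out behaviour of the REGISTER_IOWQ_MAX_WORKERS array described above can be pictured with a small in-memory model. This is only a sketch that never talks to the kernel: the class name MaxWorkersModel, its register method, and the starting limits are hypothetical, and it assumes the "zero means query only" rule applies to each array entry separately.

```java
import java.util.Arrays;

/** Toy model of the REGISTER_IOWQ_MAX_WORKERS in/out array contract; no kernel involved. */
final class MaxWorkersModel {
    static final int IO_WQ_BOUND = 0;   // array index of the bounded worker limit
    static final int IO_WQ_UNBOUND = 1; // array index of the unbounded worker limit

    // Hypothetical starting limits; a real kernel derives its own defaults.
    private final int[] current = {64, 1024};

    /** Applies each non-zero entry and writes the previous limit back into the array. */
    void register(int[] values) {
        if (values.length != 2) {
            throw new IllegalArgumentException("nr_args must be 2");
        }
        for (int i = 0; i < values.length; i++) {
            int previous = current[i];
            // A zero entry only queries the limit and leaves it unchanged (assumed per entry).
            if (values[i] != 0) {
                current[i] = values[i];
            }
            values[i] = previous;
        }
    }

    public static void main(String[] args) {
        MaxWorkersModel ring = new MaxWorkersModel();
        int[] limits = new int[2];
        limits[IO_WQ_BOUND] = 8;   // lower the bounded limit
        limits[IO_WQ_UNBOUND] = 0; // only query the unbounded limit
        ring.register(limits);
        System.out.println("previous limits: " + Arrays.toString(limits));
    }
}
```

After the call the caller's array holds the old limits, so an application that wants to restore them later can pass the same array back unchanged.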
IORING_REGISTER_LAST
public static final int IORING_REGISTER_LAST
io_uring_register_op
Enum values:
REGISTER_BUFFERS - arg points to a struct iovec array of nr_args entries.
The buffers associated with the iovecs will be locked in memory and charged against the user's RLIMIT_MEMLOCK resource limit. See getrlimit(2) for more information. Additionally, there is a size limit of 1GiB per buffer. Currently, the buffers must be anonymous, non-file-backed memory, such as that returned by malloc(3) or mmap(2) with the MAP_ANONYMOUS flag set. It is expected that this limitation will be lifted in the future. Huge pages are supported as well. Note that the entire huge page will be pinned in the kernel, even if only a portion of it is used.
After a successful call, the supplied buffers are mapped into the kernel and eligible for I/O. To make use of them, the application must specify the OP_READ_FIXED or OP_WRITE_FIXED opcodes in the submission queue entry (see the struct io_uring_sqe definition in enter), and set the buf_index field to the desired buffer index. The memory range described by the submission queue entry's addr and len fields must fall within the indexed buffer.
It is perfectly valid to set up a large buffer and then only use part of it for an I/O, as long as the range is within the originally mapped region.
An application can increase or decrease the size or number of registered buffers by first unregistering the existing buffers, and then issuing a new call to io_uring_register() with the new buffers.
Note that before 5.13 registering buffers would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding.
An application need not unregister buffers explicitly before shutting down the io_uring instance.
Available since 5.1.
UNREGISTER_BUFFERS - This operation takes no argument, and arg must be passed as NULL.
All previously registered buffers associated with the io_uring instance will be released.
Available since 5.1.
REGISTER_FILES - Register files for I/O. arg contains a pointer to an array of nr_args file descriptors (signed 32 bit integers). To make use of the registered files, the IOSQE_FIXED_FILE flag must be set in the flags member of the struct io_uring_sqe, and the fd member is set to the index of the file in the file descriptor array.
The file set may be sparse, meaning that the fd field in the array may be set to -1. See REGISTER_FILES_UPDATE for how to update files in place.
Note that before 5.13 registering files would wait for the ring to idle. If the application currently has requests in-flight, the registration will wait for those to finish before proceeding. See REGISTER_FILES_UPDATE for how to update an existing set without that limitation.
Files are automatically unregistered when the io_uring instance is torn down. An application needs only unregister if it wishes to register a new set of fds.
Available since 5.1.
UNREGISTER_FILES - This operation requires no argument, and arg must be passed as NULL.
All previously registered files associated with the io_uring instance will be unregistered.
Available since 5.1.
REGISTER_EVENTFD - It's possible to use eventfd(2) to get notified of completion events on an io_uring instance. If this is desired, an eventfd file descriptor can be registered through this operation. arg must contain a pointer to the eventfd file descriptor, and nr_args must be 1.
Available since 5.2.
An application can temporarily disable notifications, coming through the registered eventfd, by setting the CQ_EVENTFD_DISABLED bit in the flags field of the CQ ring.
Available since 5.8.
UNREGISTER_EVENTFD - Unregister an eventfd file descriptor to stop notifications.
Since only one eventfd descriptor is currently supported, this operation takes no argument, and arg must be passed as NULL and nr_args must be zero.
Available since 5.2.
REGISTER_FILES_UPDATE - This operation replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_files_update, which contains an offset on which to start the update, and an array of file descriptors to use for the update. nr_args must contain the number of descriptors in the passed in array.
Available since 5.5.
File descriptors can be skipped if they are set to REGISTER_FILES_SKIP. Skipping an fd will not touch the file associated with the previous fd at that index.
Available since 5.12.
REGISTER_EVENTFD_ASYNC - This works just like REGISTER_EVENTFD, except notifications are only posted for events that complete in an async manner.
This means that events that complete inline while being submitted do not trigger a notification event. The arguments supplied are the same as for IORING_REGISTER_EVENTFD.
Available since 5.6.
REGISTER_PROBE - This operation returns a structure, io_uring_probe, which contains information about the opcodes supported by io_uring on the running kernel. arg must contain a pointer to a struct io_uring_probe, and nr_args must contain the size of the ops array in that probe struct. The ops array is of the type io_uring_probe_op, which holds the value of the opcode and a flags field. If the flags field has IO_URING_OP_SUPPORTED set, then this opcode is supported on the running kernel.
Available since 5.6.
REGISTER_PERSONALITY - This operation registers credentials of the running application with io_uring, and returns an id associated with these credentials.
Applications wishing to share a ring between separate users/processes can pass in this credential id in the sqe personality field. If set, that particular sqe will be issued with these credentials. Must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.6.
UNREGISTER_PERSONALITY - This operation unregisters a previously registered personality with io_uring. nr_args must be set to the id in question, and arg must be set to NULL.
Available since 5.6.
REGISTER_RESTRICTIONS - arg points to a struct io_uring_restriction array of nr_args entries.
With an entry it is possible to allow a register opcode, or specify which opcode and flags of the submission queue entry are allowed, or require certain flags to be specified (these flags must be set on each submission queue entry).
All the restrictions must be submitted with a single io_uring_register() call and they are handled as an allowlist (opcodes and flags that are not registered are not allowed).
Restrictions can be registered only if the io_uring ring started in a disabled state (SETUP_R_DISABLED must be specified in the call to setup).
Available since 5.10.
REGISTER_ENABLE_RINGS - This operation enables an io_uring ring started in a disabled state (SETUP_R_DISABLED was specified in the call to setup).
While the io_uring ring is disabled, submissions are not allowed and registrations are not restricted. After the execution of this operation, the io_uring ring is enabled: submissions and registration are allowed, but they will be validated following the registered restrictions (if any). This operation takes no argument and must be invoked with arg set to NULL and nr_args set to zero.
Available since 5.10.
REGISTER_FILES2 - Register files for I/O. Similar to REGISTER_FILES. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to an array of nr file descriptors (signed 32 bit integers). The tags field should either be 0 or point to an array of nr "tags" (unsigned 64 bit integers). See REGISTER_BUFFERS2 for more info on resource tagging.
Note that resource updates, e.g. REGISTER_FILES_UPDATE, don't necessarily deallocate resources; they might be held until all requests using that resource complete.
Available since 5.13.
REGISTER_FILES_UPDATE2 - Similar to REGISTER_FILES_UPDATE, replaces existing files in the registered file set with new ones, either turning a sparse entry (one where fd is equal to -1) into a real one, removing an existing entry (new one is set to -1), or replacing an existing entry with a new existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of file descriptors to use for the update stored in data. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_BUFFERS2 - Register buffers for I/O.
Similar to REGISTER_BUFFERS but aims to have a more extensible ABI. arg points to a struct io_uring_rsrc_register, and nr_args should be set to the number of bytes in the structure.
The data field contains a pointer to a struct iovec array of nr entries. The tags field should either be 0, in which case tagging is disabled, or point to an array of nr "tags" (unsigned 64 bit integers). If a tag is zero, then tagging for this particular resource (a buffer in this case) is disabled. Otherwise, after the resource has been unregistered and is no longer in use, a CQE will be posted with user_data set to the specified tag and all other fields zeroed.
Note that resource updates, e.g. REGISTER_BUFFERS_UPDATE, don't necessarily deallocate resources by the time they return; the resources might be held alive until all requests using them complete.
Available since 5.13.
REGISTER_BUFFERS_UPDATE - Updates registered buffers with new ones, either turning a sparse entry into a real one, or replacing an existing entry. arg must contain a pointer to a struct io_uring_rsrc_update2, which contains an offset on which to start the update, and an array of struct iovec. tags points to an array of tags. nr must contain the number of descriptors in the passed in arrays. See REGISTER_BUFFERS2 for the resource tagging description.
Available since 5.13.
REGISTER_IOWQ_AFF - By default, async workers created by io_uring will inherit the CPU mask of their parent.
This is usually all the CPUs in the system, unless the parent is being run with a limited set. If this isn't the desired outcome, the application may explicitly tell io_uring what CPUs the async workers may run on. arg must point to a cpu_set_t mask, and nr_args the byte size of that mask.
Available since 5.14.
UNREGISTER_IOWQ_AFF - Undoes a CPU mask previously set with REGISTER_IOWQ_AFF. Must not have arg or nr_args set.
Available since 5.14.
REGISTER_IOWQ_MAX_WORKERS - By default, io_uring limits the unbounded workers created to the maximum processor count set by RLIMIT_NPROC, and the number of bounded workers is a function of the SQ ring size and the number of CPUs in the system. Sometimes this can be excessive (or too little, for bounded), and this command provides a way to change the count per ring (per NUMA node) instead. arg must be set to an unsigned int pointer to an array of two values, with the values in the array being set to the maximum count of workers per NUMA node. Index 0 holds the bounded worker count, and index 1 holds the unbounded worker count. On successful return, the passed in array will contain the previous maximum values for each type. If the count being passed in is 0, then this command returns the current maximum values and doesn't modify the current setting. nr_args must be set to 2, as the command takes two values.
Available since 5.15.
REGISTER_RING_FDS - Whenever enter is called to submit requests or wait for completions, the kernel must grab a reference to the file descriptor. If the application using io_uring is threaded, the file table is marked as shared, and the reference grab and put of the file descriptor count is more expensive than it is for a non-threaded application.
Similarly to how io_uring allows registration of files, this allows registration of the ring file descriptor itself. This reduces the overhead of the io_uring_enter(2) system call.
arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. The data field of this struct must point to an io_uring file descriptor, and the offset field can be either -1 or an explicit offset desired for the registered file descriptor value. If -1 is used, then upon successful return of this system call, the field will contain the value of the registered file descriptor to be used for future io_uring_enter(2) system calls.
On successful completion of this request, the returned descriptors may be used instead of the real file descriptor for io_uring_enter(2), provided that IORING_ENTER_REGISTERED_RING is set in the flags for the system call. This flag tells the kernel that a registered descriptor is used rather than a real file descriptor.
Each thread or process using a ring must register the file descriptor directly by issuing this request.
The maximum number of supported registered ring descriptors is currently limited to 16.
Available since 5.18.
UNREGISTER_RING_FDS - Unregister descriptors previously registered with REGISTER_RING_FDS. arg must be set to an unsigned int pointer to an array of type struct io_uring_rsrc_register of nr_args number of entries. Only the offset field should be set in the structure, containing the registered file descriptor offset previously returned from IORING_REGISTER_RING_FDS that the application wishes to unregister.
Note that this isn't done automatically on ring exit, if the thread or task that previously registered a ring file descriptor isn't exiting. It is recommended to manually unregister any previously registered ring descriptors if the ring is closed and the task persists. This will free up a registration slot, making it available for future use.
Available since 5.18.
REGISTER_PBUF_RING - register ring based provided buffer group
UNREGISTER_PBUF_RING - unregister ring based provided buffer group
REGISTER_SYNC_CANCEL - sync cancelation API
REGISTER_FILE_ALLOC_RANGE - register a range of fixed file slots for automatic slot allocation
REGISTER_PBUF_STATUS - return status information for a buffer group
REGISTER_NAPI - set busy poll settings
UNREGISTER_NAPI - clear busy poll settings
REGISTER_CLOCK
REGISTER_CLONE_BUFFERS - clone registered buffers from source ring to current ring
REGISTER_LAST
REGISTER_USE_REGISTERED_RING
- See Also:
-
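The REGISTER_BUFFERS entry above requires that the range given by an SQE's addr and len fields fall within the registered buffer selected by buf_index. A minimal sketch of that containment check follows. The class FixedBufferRange and its fits method are hypothetical helpers, not part of this binding, and addresses are treated as unsigned 64-bit values held in a Java long.

```java
/** Sketch of the "addr/len must fall within the indexed buffer" rule for fixed buffers. */
final class FixedBufferRange {

    /**
     * Returns true if [addr, addr + len) lies inside [bufBase, bufBase + bufLen).
     * All arguments are interpreted as unsigned 64-bit quantities.
     */
    static boolean fits(long bufBase, long bufLen, long addr, long len) {
        if (Long.compareUnsigned(addr, bufBase) < 0) {
            return false; // request starts before the registered buffer
        }
        long offset = addr - bufBase;
        if (Long.compareUnsigned(offset, bufLen) > 0) {
            return false; // request starts past the end of the buffer
        }
        // Compare against the remaining space to avoid overflowing addr + len.
        return Long.compareUnsigned(len, bufLen - offset) <= 0;
    }

    public static void main(String[] args) {
        long base = 0x1000L;
        long size = 4096L;
        // Using only the second half of a registered buffer is valid.
        System.out.println(fits(base, size, base + 2048, 2048));
        // One byte too many is rejected.
        System.out.println(fits(base, size, base + 2048, 2049));
    }
}
```

Checking the range before filling in the SQE lets an application catch an out-of-bounds request locally instead of waiting for the completion to report an error.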
IORING_REGISTER_USE_REGISTERED_RING
public static final int IORING_REGISTER_USE_REGISTERED_RING
io_uring_register_op. The enum values are identical to those documented under IORING_REGISTER_LAST above.
- See Also:
-
IORING_RSRC_REGISTER_SPARSE
public static final int IORING_RSRC_REGISTER_SPARSE
Register a fully sparse file space, rather than pass in an array of all -1 file descriptors.
- See Also:
-
IO_WQ_BOUND
public static final int IO_WQ_BOUND
- See Also:
-
IO_WQ_UNBOUND
public static final int IO_WQ_UNBOUND
- See Also:
-
IORING_REGISTER_FILES_SKIP
public static final int IORING_REGISTER_FILES_SKIP
Skip updating fd indexes set to this value in the fd table.
- See Also:
-
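The update semantics tied to this constant (replace an entry, clear it with -1, or leave it untouched with the skip value) can be illustrated with a plain array standing in for the registered file table. This is a sketch only: FileTableUpdate and its update method are hypothetical, and the skip value of -2 is taken from the Linux UAPI header and should be treated as an assumption here.

```java
import java.util.Arrays;

/** In-memory illustration of REGISTER_FILES_UPDATE semantics; not a kernel call. */
final class FileTableUpdate {
    // Assumed value of IORING_REGISTER_FILES_SKIP from linux/io_uring.h.
    static final int IORING_REGISTER_FILES_SKIP = -2;

    /** Applies fds to table starting at offset, honouring -1 (clear) and the skip value. */
    static void update(int[] table, int offset, int[] fds) {
        if (offset < 0 || offset + fds.length > table.length) {
            throw new IllegalArgumentException("update range outside the registered table");
        }
        for (int i = 0; i < fds.length; i++) {
            if (fds[i] == IORING_REGISTER_FILES_SKIP) {
                continue; // leave the existing entry at this index untouched
            }
            table[offset + i] = fds[i]; // -1 turns the slot into a sparse entry
        }
    }

    public static void main(String[] args) {
        int[] table = {3, -1, 5, 6};
        // Fill the sparse slot 1, keep slot 2, clear slot 3.
        update(table, 1, new int[] {7, IORING_REGISTER_FILES_SKIP, -1});
        System.out.println(Arrays.toString(table)); // prints [3, 7, 5, -1]
    }
}
```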
IO_URING_OP_SUPPORTED
public static final int IO_URING_OP_SUPPORTED
- See Also:
-
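As described under REGISTER_PROBE, an opcode is supported when its probe entry has this bit set in the flags field. The sketch below shows only the bit test. The class ProbeCheck is hypothetical, the flags of each io_uring_probe_op are modelled as an int array indexed by opcode, and the bit value 1 << 0 is taken from the Linux UAPI header as an assumption.

```java
/** Sketch of testing IO_URING_OP_SUPPORTED on probe results copied into a flags array. */
final class ProbeCheck {
    // Assumed value from linux/io_uring.h.
    static final int IO_URING_OP_SUPPORTED = 1 << 0;

    /**
     * opFlags[i] is assumed to hold the flags field of the probe entry for opcode i.
     * Opcodes beyond the probed range are reported as unsupported.
     */
    static boolean isSupported(int[] opFlags, int opcode) {
        return opcode >= 0
            && opcode < opFlags.length
            && (opFlags[opcode] & IO_URING_OP_SUPPORTED) != 0;
    }

    public static void main(String[] args) {
        // Hypothetical probe result: opcodes 0 and 1 supported, opcode 2 not.
        int[] opFlags = {IO_URING_OP_SUPPORTED, IO_URING_OP_SUPPORTED, 0};
        for (int op = 0; op < 4; op++) {
            System.out.println("opcode " + op + " supported: " + isSupported(opFlags, op));
        }
    }
}
```

Treating out-of-range opcodes as unsupported matches the common case where the running kernel is older than the headers the application was built against.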
IORING_REGISTER_SRC_REGISTERED
public static final int IORING_REGISTER_SRC_REGISTERED
- See Also:
-
IOU_PBUF_RING_MMAP
public static final int IOU_PBUF_RING_MMAP
Flags for REGISTER_PBUF_RING.
Enum values:
IOU_PBUF_RING_MMAP - If set, the kernel will allocate the memory for the ring.
The application must not set a ring_addr in struct io_uring_buf_reg; instead it must subsequently call mmap(2) with the offset set as: IORING_OFF_PBUF_RING | (bgid << IORING_OFF_PBUF_SHIFT) to get a virtual mapping for the ring.
IOU_PBUF_RING_INC - If set, buffers consumed from this buffer ring can be consumed incrementally.
Normally one (or more) buffers are fully consumed. With incremental consumption, it's feasible to register big ranges of buffers, and each use of it will consume only as much as it needs. This requires that both the kernel and application keep track of where the current read/recv index is at.
- See Also:
-
IOU_PBUF_RING_INC
public static final int IOU_PBUF_RING_INC
Flags for REGISTER_PBUF_RING.
Enum values:
- IOU_PBUF_RING_MMAP - If set, kernel will allocate the memory for the ring. The application must not set a ring_addr in struct io_uring_buf_reg, instead it must subsequently call mmap(2) with the offset set as: IORING_OFF_PBUF_RING | (bgid << IORING_OFF_PBUF_SHIFT) to get a virtual mapping for the ring.
- IOU_PBUF_RING_INC - If set, buffers consumed from this buffer ring can be consumed incrementally. Normally one (or more) buffers are fully consumed. With incremental consumption, it's feasible to register big ranges of buffers, and each use of it will consume only as much as it needs. This requires that both the kernel and application keep track of where the current read/recv index is at.
- See Also:
-
IORING_RESTRICTION_REGISTER_OP
public static final int IORING_RESTRICTION_REGISTER_OP
io_uring_register_restriction_op
Enum values:
- RESTRICTION_REGISTER_OP - Allow an io_uring_register(2) opcode
- RESTRICTION_SQE_OP - Allow an sqe opcode
- RESTRICTION_SQE_FLAGS_ALLOWED - Allow sqe flags
- RESTRICTION_SQE_FLAGS_REQUIRED - Require sqe flags (these flags must be set on each submission)
- RESTRICTION_LAST
- See Also:
-
IORING_RESTRICTION_SQE_OP
public static final int IORING_RESTRICTION_SQE_OP
io_uring_register_restriction_op
Enum values:
- RESTRICTION_REGISTER_OP - Allow an io_uring_register(2) opcode
- RESTRICTION_SQE_OP - Allow an sqe opcode
- RESTRICTION_SQE_FLAGS_ALLOWED - Allow sqe flags
- RESTRICTION_SQE_FLAGS_REQUIRED - Require sqe flags (these flags must be set on each submission)
- RESTRICTION_LAST
- See Also:
-
IORING_RESTRICTION_SQE_FLAGS_ALLOWED
public static final int IORING_RESTRICTION_SQE_FLAGS_ALLOWED
io_uring_register_restriction_op
Enum values:
- RESTRICTION_REGISTER_OP - Allow an io_uring_register(2) opcode
- RESTRICTION_SQE_OP - Allow an sqe opcode
- RESTRICTION_SQE_FLAGS_ALLOWED - Allow sqe flags
- RESTRICTION_SQE_FLAGS_REQUIRED - Require sqe flags (these flags must be set on each submission)
- RESTRICTION_LAST
- See Also:
-
IORING_RESTRICTION_SQE_FLAGS_REQUIRED
public static final int IORING_RESTRICTION_SQE_FLAGS_REQUIRED
io_uring_register_restriction_op
Enum values:
- RESTRICTION_REGISTER_OP - Allow an io_uring_register(2) opcode
- RESTRICTION_SQE_OP - Allow an sqe opcode
- RESTRICTION_SQE_FLAGS_ALLOWED - Allow sqe flags
- RESTRICTION_SQE_FLAGS_REQUIRED - Require sqe flags (these flags must be set on each submission)
- RESTRICTION_LAST
- See Also:
-
IORING_RESTRICTION_LAST
public static final int IORING_RESTRICTION_LAST
io_uring_register_restriction_op
Enum values:
- RESTRICTION_REGISTER_OP - Allow an io_uring_register(2) opcode
- RESTRICTION_SQE_OP - Allow an sqe opcode
- RESTRICTION_SQE_FLAGS_ALLOWED - Allow sqe flags
- RESTRICTION_SQE_FLAGS_REQUIRED - Require sqe flags (these flags must be set on each submission)
- RESTRICTION_LAST
- See Also:
-
SOCKET_URING_OP_SIOCINQ
public static final int SOCKET_URING_OP_SIOCINQ
io_uring_socket_op
- See Also:
-
SOCKET_URING_OP_SIOCOUTQ
public static final int SOCKET_URING_OP_SIOCOUTQ
io_uring_socket_op
- See Also:
-
SOCKET_URING_OP_GETSOCKOPT
public static final int SOCKET_URING_OP_GETSOCKOPT
io_uring_socket_op
- See Also:
-
SOCKET_URING_OP_SETSOCKOPT
public static final int SOCKET_URING_OP_SETSOCKOPT
io_uring_socket_op
- See Also:
-
-
Method Details
-
nio_uring_setup
public static int nio_uring_setup(long _errno, int entries, long p)
Unsafe version of: setup
-
io_uring_setup
The io_uring_setup() system call sets up a submission queue (SQ) and completion queue (CQ) with at least entries entries, and returns a file descriptor which can be used to perform subsequent operations on the io_uring instance.
The submission and completion queues are shared between userspace and the kernel, which eliminates the need to copy data when initiating and completing I/O.
Closing the file descriptor returned by io_uring_setup(2) will free all resources associated with the io_uring context.
- Parameters:
_errno - optionally returns the errno value after this function is called
p - used by the application to pass options to the kernel, and by the kernel to convey information about the ring buffers
- Returns:
- a new file descriptor on success.
The application may then provide the file descriptor in a subsequent mmap(2) call to map the submission and completion queues, or to the register or enter system calls.
On error, -1 is returned and errno is set appropriately.
-
nio_uring_register
public static int nio_uring_register(long _errno, int fd, int opcode, long arg, int nr_args)
Unsafe version of: register
-
io_uring_register
public static int io_uring_register(@Nullable IntBuffer _errno, int fd, int opcode, long arg, int nr_args)
The io_uring_register() system call registers resources (e.g. user buffers, files, eventfd, personality, restrictions) for use in an io_uring instance referenced by fd.
Registering files or user buffers allows the kernel to take long term references to internal data structures or create long term mappings of application memory, greatly reducing per-I/O overhead.
- Parameters:
_errno - optionally returns the errno value after this function is called
fd - the file descriptor returned by a call to setup
opcode - one of:
- Returns:
- on success, returns 0. On error, -1 is returned, and errno is set accordingly.
-
nio_uring_enter2
public static int nio_uring_enter2(long _errno, int fd, int to_submit, int min_complete, int flags, long sig, int sz)
Unsafe version of: enter2
-
io_uring_enter2
public static int io_uring_enter2(@Nullable IntBuffer _errno, int fd, int to_submit, int min_complete, int flags, long sig, int sz)
- Parameters:
_errno - optionally returns the errno value after this function is called
-
nio_uring_enter
public static int nio_uring_enter(long _errno, int fd, int to_submit, int min_complete, int flags, long sig)
Unsafe version of: enter
-
io_uring_enter
public static int io_uring_enter(@Nullable IntBuffer _errno, int fd, int to_submit, int min_complete, int flags, long sig)
io_uring_enter() is used to initiate and complete I/O using the shared submission and completion queues set up by a call to setup.
A single call can both submit new I/O and wait for completions of I/O initiated by this call or previous calls to io_uring_enter().
If the io_uring instance was configured for polling, by specifying SETUP_IOPOLL in the call to io_uring_setup(), then min_complete has a slightly different meaning. Passing a value of 0 instructs the kernel to return any events which are already complete, without blocking. If min_complete is a non-zero value, the kernel will still return immediately if any completion events are available. If no event completions are available, then the call will poll either until one or more completions become available, or until the process has exceeded its scheduler time slice.
Note that, for interrupt driven I/O (where IORING_SETUP_IOPOLL was not specified in the call to io_uring_setup()), an application may check the completion queue for event completions without entering the kernel at all.
When the system call returns that a certain number of SQEs have been consumed and submitted, it's safe to reuse SQE entries in the ring. This is true even if the actual IO submission had to be punted to async context, which means that the SQE may in fact not have been submitted yet. If the kernel requires later use of a particular SQE entry, it will have made a private copy of it.
- Parameters:
_errno - optionally returns the errno value after this function is called
fd - the file descriptor returned by setup
to_submit - the number of I/Os to submit from the submission queue
flags - one or more of: ENTER_GETEVENTS, ENTER_SQ_WAKEUP, ENTER_SQ_WAIT, ENTER_EXT_ARG, ENTER_REGISTERED_RING, ENTER_ABS_TIMER
sig - a pointer to a signal mask (see sigprocmask(2)); if sig is not NULL, io_uring_enter() first replaces the current signal mask by the one pointed to by sig, then waits for events to become available in the completion queue, and then restores the original signal mask. The following io_uring_enter() call:
    ret = io_uring_enter(fd, 0, 1, IORING_ENTER_GETEVENTS, &sig);
is equivalent to atomically executing the following calls:
    pthread_sigmask(SIG_SETMASK, &sig, &orig);
    ret = io_uring_enter(fd, 0, 1, IORING_ENTER_GETEVENTS, NULL);
    pthread_sigmask(SIG_SETMASK, &orig, NULL);
See the description of pselect(2) for an explanation of why the sig parameter is necessary.
- Returns:
- the number of I/Os successfully consumed.
This can be zero if to_submit was zero or if the submission queue was empty. Note that if the ring was created with SETUP_SQPOLL specified, then the return value will generally be the same as to_submit as submission happens outside the context of the system call.
The errors related to a submission queue entry will be returned through a completion queue entry, rather than through the system call itself.
Errors that occur not on behalf of a submission queue entry are returned via the system call directly. On such an error, -1 is returned and errno is set appropriately.
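The two error channels described above can be told apart roughly as in the following sketch. It does not call into this class: the class and method names are hypothetical, the return value and errno are passed in as plain arguments, and a heap IntBuffer stands in for the _errno parameter. The convention that a failed request reports a negated errno in the CQE res field comes from the io_uring man pages, not from the text above.

```java
import java.nio.IntBuffer;

final class EnterOutcomeSketch {
    /** Interprets the return value of one enter call: -1 means the call itself failed. */
    static String describeEnter(int ret, IntBuffer errno) {
        if (ret == -1) {
            return "enter failed, errno=" + errno.get(0);
        }
        return ret + " SQE(s) consumed";
    }

    /** Interprets the res field of one CQE: negative values are -errno for that request. */
    static String describeCompletion(int res) {
        return res < 0
            ? "request failed, errno=" + (-res)
            : "request ok, result=" + res;
    }

    public static void main(String[] args) {
        IntBuffer errno = IntBuffer.allocate(1);
        errno.put(0, 16); // hypothetical errno value captured by a failed call

        System.out.println(describeEnter(-1, errno));  // system call level failure
        System.out.println(describeEnter(2, errno));   // two SQEs consumed
        System.out.println(describeCompletion(-11));   // one request failed
        System.out.println(describeCompletion(4096));  // one request succeeded
    }
}
```

A successful return from the system call therefore says nothing about individual requests; each CQE still has to be checked on its own.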
-