diff options
Diffstat (limited to 'Documentation/core-api')
-rw-r--r-- | Documentation/core-api/index.rst | 1 | ||||
-rw-r--r-- | Documentation/core-api/padata.rst | 169 | ||||
-rw-r--r-- | Documentation/core-api/xarray.rst | 70 |
3 files changed, 212 insertions, 28 deletions
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index ab0eae1c153a..ab0b9ec85506 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -39,6 +39,7 @@ Core utilities ../RCU/index gcc-plugins symbol-namespaces + padata Interfaces for kernel debugging diff --git a/Documentation/core-api/padata.rst b/Documentation/core-api/padata.rst new file mode 100644 index 000000000000..9a24c111781d --- /dev/null +++ b/Documentation/core-api/padata.rst @@ -0,0 +1,169 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================================= +The padata parallel execution mechanism +======================================= + +:Date: December 2019 + +Padata is a mechanism by which the kernel can farm jobs out to be done in +parallel on multiple CPUs while retaining their ordering. It was developed for +use with the IPsec code, which needs to be able to perform encryption and +decryption on large numbers of packets without reordering those packets. The +crypto developers made a point of writing padata in a sufficiently general +fashion that it could be put to other uses as well. + +Usage +===== + +Initializing +------------ + +The first step in using padata is to set up a padata_instance structure for +overall control of how jobs are to be run:: + + #include <linux/padata.h> + + struct padata_instance *padata_alloc_possible(const char *name); + +'name' simply identifies the instance. + +There are functions for enabling and disabling the instance:: + + int padata_start(struct padata_instance *pinst); + void padata_stop(struct padata_instance *pinst); + +These functions are setting or clearing the "PADATA_INIT" flag; if that flag is +not set, other functions will refuse to work. padata_start() returns zero on +success (flag set) or -EINVAL if the padata cpumask contains no active CPU +(flag not set). padata_stop() clears the flag and blocks until the padata +instance is unused. + +Finally, complete padata initialization by allocating a padata_shell:: + + struct padata_shell *padata_alloc_shell(struct padata_instance *pinst); + +A padata_shell is used to submit a job to padata and allows a series of such +jobs to be serialized independently. A padata_instance may have one or more +padata_shells associated with it, each allowing a separate series of jobs. + +Modifying cpumasks +------------------ + +The CPUs used to run jobs can be changed in two ways, programatically with +padata_set_cpumask() or via sysfs. The former is defined:: + + int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type, + cpumask_var_t cpumask); + +Here cpumask_type is one of PADATA_CPU_PARALLEL or PADATA_CPU_SERIAL, where a +parallel cpumask describes which processors will be used to execute jobs +submitted to this instance in parallel and a serial cpumask defines which +processors are allowed to be used as the serialization callback processor. +cpumask specifies the new cpumask to use. + +There may be sysfs files for an instance's cpumasks. For example, pcrypt's +live in /sys/kernel/pcrypt/<instance-name>. Within an instance's directory +there are two files, parallel_cpumask and serial_cpumask, and either cpumask +may be changed by echoing a bitmask into the file, for example:: + + echo f > /sys/kernel/pcrypt/pencrypt/parallel_cpumask + +Reading one of these files shows the user-supplied cpumask, which may be +different from the 'usable' cpumask. + +Padata maintains two pairs of cpumasks internally, the user-supplied cpumasks +and the 'usable' cpumasks. (Each pair consists of a parallel and a serial +cpumask.) The user-supplied cpumasks default to all possible CPUs on instance +allocation and may be changed as above. The usable cpumasks are always a +subset of the user-supplied cpumasks and contain only the online CPUs in the +user-supplied masks; these are the cpumasks padata actually uses. So it is +legal to supply a cpumask to padata that contains offline CPUs. Once an +offline CPU in the user-supplied cpumask comes online, padata is going to use +it. + +Changing the CPU masks are expensive operations, so it should not be done with +great frequency. + +Running A Job +------------- + +Actually submitting work to the padata instance requires the creation of a +padata_priv structure, which represents one job:: + + struct padata_priv { + /* Other stuff here... */ + void (*parallel)(struct padata_priv *padata); + void (*serial)(struct padata_priv *padata); + }; + +This structure will almost certainly be embedded within some larger +structure specific to the work to be done. Most of its fields are private to +padata, but the structure should be zeroed at initialisation time, and the +parallel() and serial() functions should be provided. Those functions will +be called in the process of getting the work done as we will see +momentarily. + +The submission of the job is done with:: + + int padata_do_parallel(struct padata_shell *ps, + struct padata_priv *padata, int *cb_cpu); + +The ps and padata structures must be set up as described above; cb_cpu +points to the preferred CPU to be used for the final callback when the job is +done; it must be in the current instance's CPU mask (if not the cb_cpu pointer +is updated to point to the CPU actually chosen). The return value from +padata_do_parallel() is zero on success, indicating that the job is in +progress. -EBUSY means that somebody, somewhere else is messing with the +instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being in the +serial cpumask, no online CPUs in the parallel or serial cpumasks, or a stopped +instance. + +Each job submitted to padata_do_parallel() will, in turn, be passed to +exactly one call to the above-mentioned parallel() function, on one CPU, so +true parallelism is achieved by submitting multiple jobs. parallel() runs with +software interrupts disabled and thus cannot sleep. The parallel() +function gets the padata_priv structure pointer as its lone parameter; +information about the actual work to be done is probably obtained by using +container_of() to find the enclosing structure. + +Note that parallel() has no return value; the padata subsystem assumes that +parallel() will take responsibility for the job from this point. The job +need not be completed during this call, but, if parallel() leaves work +outstanding, it should be prepared to be called again with a new job before +the previous one completes. + +Serializing Jobs +---------------- + +When a job does complete, parallel() (or whatever function actually finishes +the work) should inform padata of the fact with a call to:: + + void padata_do_serial(struct padata_priv *padata); + +At some point in the future, padata_do_serial() will trigger a call to the +serial() function in the padata_priv structure. That call will happen on +the CPU requested in the initial call to padata_do_parallel(); it, too, is +run with local software interrupts disabled. +Note that this call may be deferred for a while since the padata code takes +pains to ensure that jobs are completed in the order in which they were +submitted. + +Destroying +---------- + +Cleaning up a padata instance predictably involves calling the three free +functions that correspond to the allocation in reverse:: + + void padata_free_shell(struct padata_shell *ps); + void padata_stop(struct padata_instance *pinst); + void padata_free(struct padata_instance *pinst); + +It is the user's responsibility to ensure all outstanding jobs are complete +before any of the above are called. + +Interface +========= + +.. kernel-doc:: include/linux/padata.h +.. kernel-doc:: kernel/padata.c diff --git a/Documentation/core-api/xarray.rst b/Documentation/core-api/xarray.rst index fcedc5349ace..640934b6f7b4 100644 --- a/Documentation/core-api/xarray.rst +++ b/Documentation/core-api/xarray.rst @@ -25,10 +25,6 @@ good performance with large indices. If your index can be larger than ``ULONG_MAX`` then the XArray is not the data type for you. The most important user of the XArray is the page cache. -Each non-``NULL`` entry in the array has three bits associated with -it called marks. Each mark may be set or cleared independently of -the others. You can iterate over entries which are marked. - Normal pointers may be stored in the XArray directly. They must be 4-byte aligned, which is true for any pointer returned from kmalloc() and alloc_page(). It isn't true for arbitrary user-space pointers, @@ -41,12 +37,11 @@ When you retrieve an entry from the XArray, you can check whether it is a value entry by calling xa_is_value(), and convert it back to an integer by calling xa_to_value(). -Some users want to store tagged pointers instead of using the marks -described above. They can call xa_tag_pointer() to create an -entry with a tag, xa_untag_pointer() to turn a tagged entry -back into an untagged pointer and xa_pointer_tag() to retrieve -the tag of an entry. Tagged pointers use the same bits that are used -to distinguish value entries from normal pointers, so each user must +Some users want to tag the pointers they store in the XArray. You can +call xa_tag_pointer() to create an entry with a tag, xa_untag_pointer() +to turn a tagged entry back into an untagged pointer and xa_pointer_tag() +to retrieve the tag of an entry. Tagged pointers use the same bits that +are used to distinguish value entries from normal pointers, so you must decide whether they want to store value entries or tagged pointers in any particular XArray. @@ -56,10 +51,9 @@ conflict with value entries or internal entries. An unusual feature of the XArray is the ability to create entries which occupy a range of indices. Once stored to, looking up any index in the range will return the same entry as looking up any other index in -the range. Setting a mark on one index will set it on all of them. -Storing to any index will store to all of them. Multi-index entries can -be explicitly split into smaller entries, or storing ``NULL`` into any -entry will cause the XArray to forget about the range. +the range. Storing to any index will store to all of them. Multi-index +entries can be explicitly split into smaller entries, or storing ``NULL`` +into any entry will cause the XArray to forget about the range. Normal API ========== @@ -87,17 +81,11 @@ If you want to only store a new entry to an index if the current entry at that index is ``NULL``, you can use xa_insert() which returns ``-EBUSY`` if the entry is not empty. -You can enquire whether a mark is set on an entry by using -xa_get_mark(). If the entry is not ``NULL``, you can set a mark -on it by using xa_set_mark() and remove the mark from an entry by -calling xa_clear_mark(). You can ask whether any entry in the -XArray has a particular mark set by calling xa_marked(). - You can copy entries out of the XArray into a plain array by calling -xa_extract(). Or you can iterate over the present entries in -the XArray by calling xa_for_each(). You may prefer to use -xa_find() or xa_find_after() to move to the next present -entry in the XArray. +xa_extract(). Or you can iterate over the present entries in the XArray +by calling xa_for_each(), xa_for_each_start() or xa_for_each_range(). +You may prefer to use xa_find() or xa_find_after() to move to the next +present entry in the XArray. Calling xa_store_range() stores the same entry in a range of indices. If you do this, some of the other operations will behave @@ -124,6 +112,31 @@ xa_destroy(). If the XArray entries are pointers, you may wish to free the entries first. You can do this by iterating over all present entries in the XArray using the xa_for_each() iterator. +Search Marks +------------ + +Each entry in the array has three bits associated with it called marks. +Each mark may be set or cleared independently of the others. You can +iterate over marked entries by using the xa_for_each_marked() iterator. + +You can enquire whether a mark is set on an entry by using +xa_get_mark(). If the entry is not ``NULL``, you can set a mark on it +by using xa_set_mark() and remove the mark from an entry by calling +xa_clear_mark(). You can ask whether any entry in the XArray has a +particular mark set by calling xa_marked(). Erasing an entry from the +XArray causes all marks associated with that entry to be cleared. + +Setting or clearing a mark on any index of a multi-index entry will +affect all indices covered by that entry. Querying the mark on any +index will return the same result. + +There is no way to iterate over entries which are not marked; the data +structure does not allow this to be implemented efficiently. There are +not currently iterators to search for logical combinations of bits (eg +iterate over all entries which have both ``XA_MARK_1`` and ``XA_MARK_2`` +set, or iterate over all entries which have ``XA_MARK_0`` or ``XA_MARK_2`` +set). It would be possible to add these if a user arises. + Allocating XArrays ------------------ @@ -180,6 +193,8 @@ No lock needed: Takes RCU read lock: * xa_load() * xa_for_each() + * xa_for_each_start() + * xa_for_each_range() * xa_find() * xa_find_after() * xa_extract() @@ -419,10 +434,9 @@ you last processed. If you have interrupts disabled while iterating, then it is good manners to pause the iteration and reenable interrupts every ``XA_CHECK_SCHED`` entries. -The xas_get_mark(), xas_set_mark() and -xas_clear_mark() functions require the xa_state cursor to have -been moved to the appropriate location in the xarray; they will do -nothing if you have called xas_pause() or xas_set() +The xas_get_mark(), xas_set_mark() and xas_clear_mark() functions require +the xa_state cursor to have been moved to the appropriate location in the +XArray; they will do nothing if you have called xas_pause() or xas_set() immediately before. You can call xas_set_update() to have a callback function |