diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2019-07-08 16:12:03 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2019-07-08 16:12:03 -0700 |
commit | e1928328699a582a540b105e5f4c160832a7fdcb (patch) | |
tree | f36bb303b8648189d7b5a7feb27e58fe9fe3b9f0 /arch | |
parent | 46f1ec23a46940846f86a91c46f7119d8a8b5de1 (diff) | |
parent | 9156e545765e467e6268c4814cfa609ebb16237e (diff) | |
download | blackbird-op-linux-e1928328699a582a540b105e5f4c160832a7fdcb.tar.gz blackbird-op-linux-e1928328699a582a540b105e5f4c160832a7fdcb.zip |
Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"The main changes in this cycle are:
- rwsem scalability improvements, phase #2, by Waiman Long, which are
rather impressive:
"On a 2-socket 40-core 80-thread Skylake system with 40 reader
and writer locking threads, the min/mean/max locking operations
done in a 5-second testing window before the patchset were:
40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255
After the patchset, they became:
40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"
There's a lot of changes to the locking implementation that makes
it similar to qrwlock, including owner handoff for more fair
locking.
Another microbenchmark shows how across the spectrum the
improvements are:
"With a locking microbenchmark running on 5.1 based kernel, the
total locking rates (in kops/s) on a 2-socket Skylake system
with equal numbers of readers and writers (mixed) before and
after this patchset were:
# of Threads Before Patch After Patch
------------ ------------ -----------
2 2,618 4,193
4 1,202 3,726
8 802 3,622
16 729 3,359
32 319 2,826
64 102 2,744"
The changes are extensive and the patch-set has been through
several iterations addressing various locking workloads. There
might be more regressions, but unless they are pathological I
believe we want to use this new implementation as the baseline
going forward.
- jump-label optimizations by Daniel Bristot de Oliveira: the primary
motivation was to remove IPI disturbance of isolated RT-workload
CPUs, which resulted in the implementation of batched jump-label
updates. Beyond the improvement of the real-time characteristics
kernel, in one test this patchset improved static key update
overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
as well.
- atomic64_t cross-arch type cleanups by Mark Rutland: over the last
~10 years of atomic64_t existence the various types used by the
APIs only had to be self-consistent within each architecture -
which means they became wildly inconsistent across architectures.
Mark puts and end to this by reworking all the atomic64
implementations to use 's64' as the base type for atomic64_t, and
to ensure that this type is consistently used for parameters and
return values in the API, avoiding further problems in this area.
- A large set of small improvements to lockdep by Yuyang Du: type
cleanups, output cleanups, function return type and othr cleanups
all around the place.
- A set of percpu ops cleanups and fixes by Peter Zijlstra.
- Misc other changes - please see the Git log for more details"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
locking/lockdep: increase size of counters for lockdep statistics
locking/atomics: Use sed(1) instead of non-standard head(1) option
locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
x86/jump_label: Make tp_vec_nr static
x86/percpu: Optimize raw_cpu_xchg()
x86/percpu, sched/fair: Avoid local_clock()
x86/percpu, x86/irq: Relax {set,get}_irq_regs()
x86/percpu: Relax smp_processor_id()
x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
locking/rwsem: Guard against making count negative
locking/rwsem: Adaptive disabling of reader optimistic spinning
locking/rwsem: Enable time-based spinning on reader-owned rwsem
locking/rwsem: Make rwsem->owner an atomic_long_t
locking/rwsem: Enable readers spinning on writer
locking/rwsem: Clarify usage of owner's nonspinaable bit
locking/rwsem: Wake up almost all readers in wait queue
locking/rwsem: More optimal RT task handling of null owner
locking/rwsem: Always release wait_lock before waking up tasks
locking/rwsem: Implement lock handoff to prevent lock starvation
locking/rwsem: Make rwsem_spin_on_owner() return owner state
...
Diffstat (limited to 'arch')
24 files changed, 594 insertions, 410 deletions
diff --git a/arch/alpha/include/asm/atomic.h b/arch/alpha/include/asm/atomic.h index 150a1c5d6a2c..2144530d1428 100644 --- a/arch/alpha/include/asm/atomic.h +++ b/arch/alpha/include/asm/atomic.h @@ -93,9 +93,9 @@ static inline int atomic_fetch_##op##_relaxed(int i, atomic_t *v) \ } #define ATOMIC64_OP(op, asm_op) \ -static __inline__ void atomic64_##op(long i, atomic64_t * v) \ +static __inline__ void atomic64_##op(s64 i, atomic64_t * v) \ { \ - unsigned long temp; \ + s64 temp; \ __asm__ __volatile__( \ "1: ldq_l %0,%1\n" \ " " #asm_op " %0,%2,%0\n" \ @@ -109,9 +109,9 @@ static __inline__ void atomic64_##op(long i, atomic64_t * v) \ } \ #define ATOMIC64_OP_RETURN(op, asm_op) \ -static __inline__ long atomic64_##op##_return_relaxed(long i, atomic64_t * v) \ +static __inline__ s64 atomic64_##op##_return_relaxed(s64 i, atomic64_t * v) \ { \ - long temp, result; \ + s64 temp, result; \ __asm__ __volatile__( \ "1: ldq_l %0,%1\n" \ " " #asm_op " %0,%3,%2\n" \ @@ -128,9 +128,9 @@ static __inline__ long atomic64_##op##_return_relaxed(long i, atomic64_t * v) \ } #define ATOMIC64_FETCH_OP(op, asm_op) \ -static __inline__ long atomic64_fetch_##op##_relaxed(long i, atomic64_t * v) \ +static __inline__ s64 atomic64_fetch_##op##_relaxed(s64 i, atomic64_t * v) \ { \ - long temp, result; \ + s64 temp, result; \ __asm__ __volatile__( \ "1: ldq_l %2,%1\n" \ " " #asm_op " %2,%3,%0\n" \ @@ -246,9 +246,9 @@ static __inline__ int atomic_fetch_add_unless(atomic_t *v, int a, int u) * Atomically adds @a to @v, so long as it was not @u. * Returns the old value of @v. */ -static __inline__ long atomic64_fetch_add_unless(atomic64_t *v, long a, long u) +static __inline__ s64 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u) { - long c, new, old; + s64 c, new, old; smp_mb(); __asm__ __volatile__( "1: ldq_l %[old],%[mem]\n" @@ -276,9 +276,9 @@ static __inline__ long atomic64_fetch_add_unless(atomic64_t *v, long a, long u) * The function returns the old value of *v minus 1, even if * the atomic variable, v, was not decremented. */ -static inline long atomic64_dec_if_positive(atomic64_t *v) +static inline s64 atomic64_dec_if_positive(atomic64_t *v) { - long old, tmp; + s64 old, tmp; smp_mb(); __asm__ __volatile__( "1: ldq_l %[old],%[mem]\n" diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h index 17cf1c657cb3..7298ce84762e 100644 --- a/arch/arc/include/asm/atomic.h +++ b/arch/arc/include/asm/atomic.h @@ -321,14 +321,14 @@ ATOMIC_OPS(xor, ^=, CTOP_INST_AXOR_DI_R2_R2_R3) */ typedef struct { - aligned_u64 counter; + s64 __aligned(8) counter; } atomic64_t; #define ATOMIC64_INIT(a) { (a) } -static inline long long atomic64_read(const atomic64_t *v) +static inline s64 atomic64_read(const atomic64_t *v) { - unsigned long long val; + s64 val; __asm__ __volatile__( " ldd %0, [%1] \n" @@ -338,7 +338,7 @@ static inline long long atomic64_read(const atomic64_t *v) return val; } -static inline void atomic64_set(atomic64_t *v, long long a) +static inline void atomic64_set(atomic64_t *v, s64 a) { /* * This could have been a simple assignment in "C" but would need @@ -359,9 +359,9 @@ static inline void atomic64_set(atomic64_t *v, long long a) } #define ATOMIC64_OP(op, op1, op2) \ -static inline void atomic64_##op(long long a, atomic64_t *v) \ +static inline void atomic64_##op(s64 a, atomic64_t *v) \ { \ - unsigned long long val; \ + s64 val; \ \ __asm__ __volatile__( \ "1: \n" \ @@ -372,13 +372,13 @@ static inline void atomic64_##op(long long a, atomic64_t *v) \ " bnz 1b \n" \ : "=&r"(val) \ : "r"(&v->counter), "ir"(a) \ - : "cc"); \ + : "cc"); \ } \ #define ATOMIC64_OP_RETURN(op, op1, op2) \ -static inline long long atomic64_##op##_return(long long a, atomic64_t *v) \ +static inline s64 atomic64_##op##_return(s64 a, atomic64_t *v) \ { \ - unsigned long long val; \ + s64 val; \ \ smp_mb(); \ \ @@ -399,9 +399,9 @@ static inline long long atomic64_##op##_return(long long a, atomic64_t *v) \ } #define ATOMIC64_FETCH_OP(op, op1, op2) \ -static inline long long atomic64_fetch_##op(long long a, atomic64_t *v) \ +static inline s64 atomic64_fetch_##op(s64 a, atomic64_t *v) \ { \ - unsigned long long val, orig; \ + s64 val, orig; \ \ smp_mb(); \ \ @@ -441,10 +441,10 @@ ATOMIC64_OPS(xor, xor, xor) #undef ATOMIC64_OP_RETURN #undef ATOMIC64_OP -static inline long long -atomic64_cmpxchg(atomic64_t *ptr, long long expected, long long new) +static inline s64 +atomic64_cmpxchg(atomic64_t *ptr, s64 expected, s64 new) { - long long prev; + s64 prev; smp_mb(); @@ -464,9 +464,9 @@ atomic64_cmpxchg(atomic64_t *ptr, long long expected, long long new) return prev; } -static inline long long atomic64_xchg(atomic64_t *ptr, long long new) +static inline s64 atomic64_xchg(atomic64_t *ptr, s64 new) { - long long prev; + s64 prev; smp_mb(); @@ -492,9 +492,9 @@ static inline long long atomic64_xchg(atomic64_t *ptr, long long new) * the atomic variable, v, was not decremented. */ -static inline long long atomic64_dec_if_positive(atomic64_t *v) +static inline s64 atomic64_dec_if_positive(atomic64_t *v) { - long long val; + s64 val; smp_mb(); @@ -525,10 +525,9 @@ static inline long long atomic64_dec_if_positive(atomic64_t *v) * Atomically adds @a to @v, if it was not @u. * Returns the old value of @v */ -static inline long long atomic64_fetch_add_unless(atomic64_t *v, long long a, - long long u) +static inline s64 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u) { - long long old, temp; + s64 old, temp; smp_mb(); diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h index 50c3ac5f0809..75bb2c543e59 100644 --- a/arch/arm/include/asm/atomic.h +++ b/arch/arm/include/asm/atomic.h @@ -246,15 +246,15 @@ ATOMIC_OPS(xor, ^=, eor) #ifndef CONFIG_GENERIC_ATOMIC64 typedef struct { - long long counter; + s64 counter; } atomic64_t; #define ATOMIC64_INIT(i) { (i) } #ifdef CONFIG_ARM_LPAE -static inline long long atomic64_read(const atomic64_t *v) +static inline s64 atomic64_read(const atomic64_t *v) { - long long result; + s64 result; __asm__ __volatile__("@ atomic64_read\n" " ldrd %0, %H0, [%1]" @@ -265,7 +265,7 @@ static inline long long atomic64_read(const atomic64_t *v) return result; } -static inline void atomic64_set(atomic64_t *v, long long i) +static inline void atomic64_set(atomic64_t *v, s64 i) { __asm__ __volatile__("@ atomic64_set\n" " strd %2, %H2, [%1]" @@ -274,9 +274,9 @@ static inline void atomic64_set(atomic64_t *v, long long i) ); } #else -static inline long long atomic64_read(const atomic64_t *v) +static inline s64 atomic64_read(const atomic64_t *v) { - long long result; + s64 result; __asm__ __volatile__("@ atomic64_read\n" " ldrexd %0, %H0, [%1]" @@ -287,9 +287,9 @@ static inline long long atomic64_read(const atomic64_t *v) return result; } -static inline void atomic64_set(atomic64_t *v, long long i) +static inline void atomic64_set(atomic64_t *v, s64 i) { - long long tmp; + s64 tmp; prefetchw(&v->counter); __asm__ __volatile__("@ atomic64_set\n" @@ -304,9 +304,9 @@ static inline void atomic64_set(atomic64_t *v, long long i) #endif #define ATOMIC64_OP(op, op1, op2) \ -static inline void atomic64_##op(long long i, atomic64_t *v) \ +static inline void atomic64_##op(s64 i, atomic64_t *v) \ { \ - long long result; \ + s64 result; \ unsigned long tmp; \ \ prefetchw(&v->counter); \ @@ -323,10 +323,10 @@ static inline void atomic64_##op(long long i, atomic64_t *v) \ } \ #define ATOMIC64_OP_RETURN(op, op1, op2) \ -static inline long long \ -atomic64_##op##_return_relaxed(long long i, atomic64_t *v) \ +static inline s64 \ +atomic64_##op##_return_relaxed(s64 i, atomic64_t *v) \ { \ - long long result; \ + s64 result; \ unsigned long tmp; \ \ prefetchw(&v->counter); \ @@ -346,10 +346,10 @@ atomic64_##op##_return_relaxed(long long i, atomic64_t *v) \ } #define ATOMIC64_FETCH_OP(op, op1, op2) \ -static inline long long \ -atomic64_fetch_##op##_relaxed(long long i, atomic64_t *v) \ +static inline s64 \ +atomic64_fetch_##op##_relaxed(s64 i, atomic64_t *v) \ { \ - long long result, val; \ + s64 result, val; \ unsigned long tmp; \ \ prefetchw(&v->counter); \ @@ -403,10 +403,9 @@ ATOMIC64_OPS(xor, eor, eor) #undef ATOMIC64_OP_RETURN #undef ATOMIC64_OP -static inline long long -atomic64_cmpxchg_relaxed(atomic64_t *ptr, long long old, long long new) +static inline s64 atomic64_cmpxchg_relaxed(atomic64_t *ptr, s64 old, s64 new) { - long long oldval; + s64 oldval; unsigned long res; prefetchw(&ptr->counter); @@ -427,9 +426,9 @@ atomic64_cmpxchg_relaxed(atomic64_t *ptr, long long old, long long new) } #define atomic64_cmpxchg_relaxed atomic64_cmpxchg_relaxed -static inline long long atomic64_xchg_relaxed(atomic64_t *ptr, long long new) +static inline s64 atomic64_xchg_relaxed(atomic64_t *ptr, s64 new) { - long long result; + s64 result; unsigned long tmp; prefetchw(&ptr->counter); @@ -447,9 +446,9 @@ static inline long long atomic64_xchg_relaxed(atomic64_t *ptr, long long new) } #define atomic64_xchg_relaxed atomic64_xchg_relaxed -static inline long long atomic64_dec_if_positive(atomic64_t *v) +static inline s64 atomic64_dec_if_positive(atomic64_t *v) { - long long result; + s64 result; unsigned long tmp; smp_mb(); @@ -475,10 +474,9 @@ static inline long long atomic64_dec_if_positive(atomic64_t *v) } #define atomic64_dec_if_positive atomic64_dec_if_positive -static inline long long atomic64_fetch_add_unless(atomic64_t *v, long long a, - long long u) +static inline s64 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u) { - long long oldval, newval; + s64 oldval, newval; unsigned long tmp; smp_mb(); diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h index 23c378606aed..c8c850bc3dfb 100644 --- a/arch/arm64/include/asm/atomic_ll_sc.h +++ b/arch/arm64/include/asm/atomic_ll_sc.h @@ -122,9 +122,9 @@ ATOMIC_OPS(xor, eor) #define ATOMIC64_OP(op, asm_op) \ __LL_SC_INLINE void \ -__LL_SC_PREFIX(arch_atomic64_##op(long i, atomic64_t *v)) \ +__LL_SC_PREFIX(arch_atomic64_##op(s64 i, atomic64_t *v)) \ { \ - long result; \ + s64 result; \ unsigned long tmp; \ \ asm volatile("// atomic64_" #op "\n" \ @@ -139,10 +139,10 @@ __LL_SC_PREFIX(arch_atomic64_##op(long i, atomic64_t *v)) \ __LL_SC_EXPORT(arch_atomic64_##op); #define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op) \ -__LL_SC_INLINE long \ -__LL_SC_PREFIX(arch_atomic64_##op##_return##name(long i, atomic64_t *v))\ +__LL_SC_INLINE s64 \ +__LL_SC_PREFIX(arch_atomic64_##op##_return##name(s64 i, atomic64_t *v))\ { \ - long result; \ + s64 result; \ unsigned long tmp; \ \ asm volatile("// atomic64_" #op "_return" #name "\n" \ @@ -161,10 +161,10 @@ __LL_SC_PREFIX(arch_atomic64_##op##_return##name(long i, atomic64_t *v))\ __LL_SC_EXPORT(arch_atomic64_##op##_return##name); #define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op) \ -__LL_SC_INLINE long \ -__LL_SC_PREFIX(arch_atomic64_fetch_##op##name(long i, atomic64_t *v)) \ +__LL_SC_INLINE s64 \ +__LL_SC_PREFIX(arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v)) \ { \ - long result, val; \ + s64 result, val; \ unsigned long tmp; \ \ asm volatile("// atomic64_fetch_" #op #name "\n" \ @@ -214,10 +214,10 @@ ATOMIC64_OPS(xor, eor) #undef ATOMIC64_OP_RETURN #undef ATOMIC64_OP -__LL_SC_INLINE long +__LL_SC_INLINE s64 __LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v)) { - long result; + s64 result; unsigned long tmp; asm volatile("// atomic64_dec_if_positive\n" diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h index 45e030d54332..69acb1c19a15 100644 --- a/arch/arm64/include/asm/atomic_lse.h +++ b/arch/arm64/include/asm/atomic_lse.h @@ -213,9 +213,9 @@ ATOMIC_FETCH_OP_SUB( , al, "memory") #define __LL_SC_ATOMIC64(op) __LL_SC_CALL(arch_atomic64_##op) #define ATOMIC64_OP(op, asm_op) \ -static inline void arch_atomic64_##op(long i, atomic64_t *v) \ +static inline void arch_atomic64_##op(s64 i, atomic64_t *v) \ { \ - register long x0 asm ("x0") = i; \ + register s64 x0 asm ("x0") = i; \ register atomic64_t *x1 asm ("x1") = v; \ \ asm volatile(ARM64_LSE_ATOMIC_INSN(__LL_SC_ATOMIC64(op), \ @@ -233,9 +233,9 @@ ATOMIC64_OP(add, stadd) #undef ATOMIC64_OP #define ATOMIC64_FETCH_OP(name, mb, op, asm_op, cl...) \ -static inline long arch_atomic64_fetch_##op##name(long i, atomic64_t *v)\ +static inline s64 arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v) \ { \ - register long x0 asm ("x0") = i; \ + register s64 x0 asm ("x0") = i; \ register atomic64_t *x1 asm ("x1") = v; \ \ asm volatile(ARM64_LSE_ATOMIC_INSN( \ @@ -265,9 +265,9 @@ ATOMIC64_FETCH_OPS(add, ldadd) #undef ATOMIC64_FETCH_OPS #define ATOMIC64_OP_ADD_RETURN(name, mb, cl...) \ -static inline long arch_atomic64_add_return##name(long i, atomic64_t *v)\ +static inline s64 arch_atomic64_add_return##name(s64 i, atomic64_t *v) \ { \ - register long x0 asm ("x0") = i; \ + register s64 x0 asm ("x0") = i; \ register atomic64_t *x1 asm ("x1") = v; \ \ asm volatile(ARM64_LSE_ATOMIC_INSN( \ @@ -291,9 +291,9 @@ ATOMIC64_OP_ADD_RETURN( , al, "memory") #undef ATOMIC64_OP_ADD_RETURN -static inline void arch_atomic64_and(long i, atomic64_t *v) +static inline void arch_atomic64_and(s64 i, atomic64_t *v) { - register long x0 asm ("x0") = i; + register s64 x0 asm ("x0") = i; register atomic64_t *x1 asm ("x1") = v; asm volatile(ARM64_LSE_ATOMIC_INSN( @@ -309,9 +309,9 @@ static inline void arch_atomic64_and(long i, atomic64_t *v) } #define ATOMIC64_FETCH_OP_AND(name, mb, cl...) \ -static inline long arch_atomic64_fetch_and##name(long i, atomic64_t *v) \ +static inline s64 arch_atomic64_fetch_and##name(s64 i, atomic64_t *v) \ { \ - register long x0 asm ("x0") = i; \ + register s64 x0 asm ("x0") = i; \ register atomic64_t *x1 asm ("x1") = v; \ \ asm volatile(ARM64_LSE_ATOMIC_INSN( \ @@ -335,9 +335,9 @@ ATOMIC64_FETCH_OP_AND( , al, "memory") #undef ATOMIC64_FETCH_OP_AND -static inline void arch_atomic64_sub(long i, atomic64_t *v) +static inline void arch_atomic64_sub(s64 i, atomic64_t *v) { - register long x0 asm ("x0") = i; + register s64 x0 asm ("x0") = i; register atomic64_t *x1 asm ("x1") = v; asm volatile(ARM64_LSE_ATOMIC_INSN( @@ -353,9 +353,9 @@ static inline void arch_atomic64_sub(long i, atomic64_t *v) } #define ATOMIC64_OP_SUB_RETURN(name, mb, cl...) \ -static inline long arch_atomic64_sub_return##name(long i, atomic64_t *v)\ +static inline s64 arch_atomic64_sub_return##name(s64 i, atomic64_t *v) \ { \ - register long x0 asm ("x0") = i; \ + register s64 x0 asm ("x0") = i; \ register atomic64_t *x1 asm ("x1") = v; \ \ asm volatile(ARM64_LSE_ATOMIC_INSN( \ @@ -381,9 +381,9 @@ ATOMIC64_OP_SUB_RETURN( , al, "memory") #undef ATOMIC64_OP_SUB_RETURN #define ATOMIC64_FETCH_OP_SUB(name, mb, cl...) \ -static inline long arch_atomic64_fetch_sub##name(long i, atomic64_t *v) \ +static inline s64 arch_atomic64_fetch_sub##name(s64 i, atomic64_t *v) \ { \ - register long x0 asm ("x0") = i; \ + register s64 x0 asm ("x0") = i; \ register atomic64_t *x1 asm ("x1") = v; \ \ asm volatile(ARM64_LSE_ATOMIC_INSN( \ @@ -407,7 +407,7 @@ ATOMIC64_FETCH_OP_SUB( , al, "memory") #undef ATOMIC64_FETCH_OP_SUB -static inline long arch_atomic64_dec_if_positive(atomic64_t *v) +static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v) { register long x0 asm ("x0") = (long)v; diff --git a/arch/ia64/include/asm/atomic.h b/arch/ia64/include/asm/atomic.h index 206530d0751b..50440f3ddc43 100644 --- a/arch/ia64/include/asm/atomic.h +++ b/arch/ia64/include/asm/atomic.h @@ -124,10 +124,10 @@ ATOMIC_FETCH_OP(xor, ^) #undef ATOMIC_OP #define ATOMIC64_OP(op, c_op) \ -static __inline__ long \ -ia64_atomic64_##op (__s64 i, atomic64_t *v) \ +static __inline__ s64 \ +ia64_atomic64_##op (s64 i, atomic64_t *v) \ { \ - __s64 old, new; \ + s64 old, new; \ CMPXCHG_BUGCHECK_DECL \ \ do { \ @@ -139,10 +139,10 @@ ia64_atomic64_##op (__s64 i, atomic64_t *v) \ } #define ATOMIC64_FETCH_OP(op, c_op) \ -static __inline__ long \ -ia64_atomic64_fetch_##op (__s64 i, atomic64_t *v) \ +static __inline__ s64 \ +ia64_atomic64_fetch_##op (s64 i, atomic64_t *v) \ { \ - __s64 old, new; \ + s64 old, new; \ CMPXCHG_BUGCHECK_DECL \ \ do { \ @@ -162,7 +162,7 @@ ATOMIC64_OPS(sub, -) #define atomic64_add_return(i,v) \ ({ \ - long __ia64_aar_i = (i); \ + s64 __ia64_aar_i = (i); \ __ia64_atomic_const(i) \ ? ia64_fetch_and_add(__ia64_aar_i, &(v)->counter) \ : ia64_atomic64_add(__ia64_aar_i, v); \ @@ -170,7 +170,7 @@ ATOMIC64_OPS(sub, -) #define atomic64_sub_return(i,v) \ ({ \ - long __ia64_asr_i = (i); \ + s64 __ia64_asr_i = (i); \ __ia64_atomic_const(i) \ ? ia64_fetch_and_add(-__ia64_asr_i, &(v)->counter) \ : ia64_atomic64_sub(__ia64_asr_i, v); \ @@ -178,7 +178,7 @@ ATOMIC64_OPS(sub, -) #define atomic64_fetch_add(i,v) \ ({ \ - long __ia64_aar_i = (i); \ + s64 __ia64_aar_i = (i); \ __ia64_atomic_const(i) \ ? ia64_fetchadd(__ia64_aar_i, &(v)->counter, acq) \ : ia64_atomic64_fetch_add(__ia64_aar_i, v); \ @@ -186,7 +186,7 @@ ATOMIC64_OPS(sub, -) #define atomic64_fetch_sub(i,v) \ ({ \ - long __ia64_asr_i = (i); \ + s64 __ia64_asr_i = (i); \ __ia64_atomic_const(i) \ ? ia64_fetchadd(-__ia64_asr_i, &(v)->counter, acq) \ : ia64_atomic64_fetch_sub(__ia64_asr_i, v); \ diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h index 94096299fc56..9a82dd11c0e9 100644 --- a/arch/mips/include/asm/atomic.h +++ b/arch/mips/include/asm/atomic.h @@ -254,10 +254,10 @@ static __inline__ int atomic_sub_if_positive(int i, atomic_t * v) #define atomic64_set(v, i) WRITE_ONCE((v)->counter, (i)) #define ATOMIC64_OP(op, c_op, asm_op) \ -static __inline__ void atomic64_##op(long i, atomic64_t * v) \ +static __inline__ void atomic64_##op(s64 i, atomic64_t * v) \ { \ if (kernel_uses_llsc) { \ - long temp; \ + s64 temp; \ \ loongson_llsc_mb(); \ __asm__ __volatile__( \ @@ -280,12 +280,12 @@ static __inline__ void atomic64_##op(long i, atomic64_t * v) \ } #define ATOMIC64_OP_RETURN(op, c_op, asm_op) \ -static __inline__ long atomic64_##op##_return_relaxed(long i, atomic64_t * v) \ +static __inline__ s64 atomic64_##op##_return_relaxed(s64 i, atomic64_t * v) \ { \ - long result; \ + s64 result; \ \ if (kernel_uses_llsc) { \ - long temp; \ + s64 temp; \ \ loongson_llsc_mb(); \ __asm__ __volatile__( \ @@ -314,12 +314,12 @@ static __inline__ long atomic64_##op##_return_relaxed(long i, atomic64_t * v) \ } #define ATOMIC64_FETCH_OP(op, c_op, asm_op) \ -static __inline__ long atomic64_fetch_##op##_relaxed(long i, atomic64_t * v) \ +static __inline__ s64 atomic64_fetch_##op##_relaxed(s64 i, atomic64_t * v) \ { \ - long result; \ + s64 result; \ \ if (kernel_uses_llsc) { \ - long temp; \ + s64 temp; \ \ loongson_llsc_mb(); \ __asm__ __volatile__( \ @@ -386,14 +386,14 @@ ATOMIC64_OPS(xor, ^=, xor) * Atomically test @v and subtract @i if @v is greater or equal than @i. * The function returns the old value of @v minus @i. */ -static __inline__ long atomic64_sub_if_positive(long i, atomic64_t * v) +static __inline__ s64 atomic64_sub_if_positive(s64 i, atomic64_t * v) { - long result; + s64 result; smp_mb__before_llsc(); if (kernel_uses_llsc) { - long temp; + s64 temp; __asm__ __volatile__( " .set push \n" diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h index 52eafaf74054..31c231ea56b7 100644 --- a/arch/powerpc/include/asm/atomic.h +++ b/arch/powerpc/include/asm/atomic.h @@ -297,24 +297,24 @@ static __inline__ int atomic_dec_if_positive(atomic_t *v) #define ATOMIC64_INIT(i) { (i) } -static __inline__ long atomic64_read(const atomic64_t *v) +static __inline__ s64 atomic64_read(const atomic64_t *v) { - long t; + s64 t; __asm__ __volatile__("ld%U1%X1 %0,%1" : "=r"(t) : "m"(v->counter)); return t; } -static __inline__ void atomic64_set(atomic64_t *v, long i) +static __inline__ void atomic64_set(atomic64_t *v, s64 i) { __asm__ __volatile__("std%U0%X0 %1,%0" : "=m"(v->counter) : "r"(i)); } #define ATOMIC64_OP(op, asm_op) \ -static __inline__ void atomic64_##op(long a, atomic64_t *v) \ +static __inline__ void atomic64_##op(s64 a, atomic64_t *v) \ { \ - long t; \ + s64 t; \ \ __asm__ __volatile__( \ "1: ldarx %0,0,%3 # atomic64_" #op "\n" \ @@ -327,10 +327,10 @@ static __inline__ void atomic64_##op(long a, atomic64_t *v) \ } #define ATOMIC64_OP_RETURN_RELAXED(op, asm_op) \ -static inline long \ -atomic64_##op##_return_relaxed(long a, atomic64_t *v) \ +static inline s64 \ +atomic64_##op##_return_relaxed(s64 a, atomic64_t *v) \ { \ - long t; \ + s64 t; \ \ __asm__ __volatile__( \ "1: ldarx %0,0,%3 # atomic64_" #op "_return_relaxed\n" \ @@ -345,10 +345,10 @@ atomic64_##op##_return_relaxed(long a, atomic64_t *v) \ } #define ATOMIC64_FETCH_OP_RELAXED(op, asm_op) \ -static inline long \ -atomic64_fetch_##op##_relaxed(long a, atomic64_t *v) \ +static inline s64 \ +atomic64_fetch_##op##_relaxed(s64 a, atomic64_t *v) \ { \ - long res, t; \ + s64 res, t; \ \ __asm__ __volatile__( \ "1: ldarx %0,0,%4 # atomic64_fetch_" #op "_relaxed\n" \ @@ -396,7 +396,7 @@ ATOMIC64_OPS(xor, xor) static __inline__ void atomic64_inc(atomic64_t *v) { - long t; + s64 t; __asm__ __volatile__( "1: ldarx %0,0,%2 # atomic64_inc\n\ @@ -409,9 +409,9 @@ static __inline__ void atomic64_inc(atomic64_t *v) } #define atomic64_inc atomic64_inc -static __inline__ long atomic64_inc_return_relaxed(atomic64_t *v) +static __inline__ s64 atomic64_inc_return_relaxed(atomic64_t *v) { - long t; + s64 t; __asm__ __volatile__( "1: ldarx %0,0,%2 # atomic64_inc_return_relaxed\n" @@ -427,7 +427,7 @@ static __inline__ long atomic64_inc_return_relaxed(atomic64_t *v) static __inline__ void atomic64_dec(atomic64_t *v) { - long t; + s64 t; __asm__ __volatile__( "1: ldarx %0,0,%2 # atomic64_dec\n\ @@ -440,9 +440,9 @@ static __inline__ void atomic64_dec(atomic64_t *v) } #define atomic64_dec atomic64_dec -static __inline__ long atomic64_dec_return_relaxed(atomic64_t *v) +static __inline__ s64 atomic64_dec_return_relaxed(atomic64_t *v) { - long t; + s64 t; __asm__ __volatile__( "1: ldarx %0,0,%2 # atomic64_dec_return_relaxed\n" @@ -463,9 +463,9 @@ static __inline__ long atomic64_dec_return_relaxed(atomic64_t *v) * Atomically test *v and decrement if it is greater than 0. * The function returns the old value of *v minus 1. */ -static __inline__ long atomic64_dec_if_positive(atomic64_t *v) +static __inline__ s64 atomic64_dec_if_positive(atomic64_t *v) { - long t; + s64 t; __asm__ __volatile__( PPC_ATOMIC_ENTRY_BARRIER @@ -502,9 +502,9 @@ static __inline__ long atomic64_dec_if_positive(atomic64_t *v) * Atomically adds @a to @v, so long as it was not @u. * Returns the old value of @v. */ -static __inline__ long atomic64_fetch_add_unless(atomic64_t *v, long a, long u) +static __inline__ s64 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u) { - long t; + s64 t; __asm__ __volatile__ ( PPC_ATOMIC_ENTRY_BARRIER @@ -534,7 +534,7 @@ static __inline__ long atomic64_fetch_add_unless(atomic64_t *v, long a, long u) */ static __inline__ int atomic64_inc_not_zero(atomic64_t *v) { - long t1, t2; + s64 t1, t2; __asm__ __volatile__ ( PPC_ATOMIC_ENTRY_BARRIER diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h index 9038aeb900a6..96f95c9ebd97 100644 --- a/arch/riscv/include/asm/atomic.h +++ b/arch/riscv/include/asm/atomic.h @@ -38,11 +38,11 @@ static __always_inline void atomic_set(atomic_t *v, int i) #ifndef CONFIG_GENERIC_ATOMIC64 #define ATOMIC64_INIT(i) { (i) } -static __always_inline long atomic64_read(const atomic64_t *v) +static __always_inline s64 atomic64_read(const atomic64_t *v) { return READ_ONCE(v->counter); } -static __always_inline void atomic64_set(atomic64_t *v, long i) +static __always_inline void atomic64_set(atomic64_t *v, s64 i) { WRITE_ONCE(v->counter, i); } @@ -66,11 +66,11 @@ void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \ #ifdef CONFIG_GENERIC_ATOMIC64 #define ATOMIC_OPS(op, asm_op, I) \ - ATOMIC_OP (op, asm_op, I, w, int, ) + ATOMIC_OP (op, asm_op, I, w, int, ) #else #define ATOMIC_OPS(op, asm_op, I) \ - ATOMIC_OP (op, asm_op, I, w, int, ) \ - ATOMIC_OP (op, asm_op, I, d, long, 64) + ATOMIC_OP (op, asm_op, I, w, int, ) \ + ATOMIC_OP (op, asm_op, I, d, s64, 64) #endif ATOMIC_OPS(add, add, i) @@ -127,14 +127,14 @@ c_type atomic##prefix##_##op##_return(c_type i, atomic##prefix##_t *v) \ #ifdef CONFIG_GENERIC_ATOMIC64 #define ATOMIC_OPS(op, asm_op, c_op, I) \ - ATOMIC_FETCH_OP( op, asm_op, I, w, int, ) \ - ATOMIC_OP_RETURN(op, asm_op, c_op, I, w, int, ) + ATOMIC_FETCH_OP( op, asm_op, I, w, int, ) \ + ATOMIC_OP_RETURN(op, asm_op, c_op, I, w, int, ) #else #define ATOMIC_OPS(op, asm_op, c_op, I) \ - ATOMIC_FETCH_OP( op, asm_op, I, w, int, ) \ - ATOMIC_OP_RETURN(op, asm_op, c_op, I, w, int, ) \ - ATOMIC_FETCH_OP( op, asm_op, I, d, long, 64) \ - ATOMIC_OP_RETURN(op, asm_op, c_op, I, d, long, 64) + ATOMIC_FETCH_OP( op, asm_op, I, w, int, ) \ + ATOMIC_OP_RETURN(op, asm_op, c_op, I, w, int, ) \ + ATOMIC_FETCH_OP( op, asm_op, I, d, s64, 64) \ + ATOMIC_OP_RETURN(op, asm_op, c_op, I, d, s64, 64) #endif ATOMIC_OPS(add, add, +, i) @@ -166,11 +166,11 @@ ATOMIC_OPS(sub, add, +, -i) #ifdef CONFIG_GENERIC_ATOMIC64 #define ATOMIC_OPS(op, asm_op, I) \ - ATOMIC_FETCH_OP(op, asm_op, I, w, int, ) + ATOMIC_FETCH_OP(op, asm_op, I, w, int, ) #else #define ATOMIC_OPS(op, asm_op, I) \ - ATOMIC_FETCH_OP(op, asm_op, I, w, int, ) \ - ATOMIC_FETCH_OP(op, asm_op, I, d, long, 64) + ATOMIC_FETCH_OP(op, asm_op, I, w, int, ) \ + ATOMIC_FETCH_OP(op, asm_op, I, d, s64, 64) #endif ATOMIC_OPS(and, and, i) @@ -219,9 +219,10 @@ static __always_inline int atomic_fetch_add_unless(atomic_t *v, int a, int u) #define atomic_fetch_add_unless atomic_fetch_add_unless #ifndef CONFIG_GENERIC_ATOMIC64 -static __always_inline long atomic64_fetch_add_unless(atomic64_t *v, long a, long u) +static __always_inline s64 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u) { - long prev, rc; + s64 prev; + long rc; __asm__ __volatile__ ( "0: lr.d %[p], %[c]\n" @@ -290,11 +291,11 @@ c_t atomic##prefix##_cmpxchg(atomic##prefix##_t *v, c_t o, c_t n) \ #ifdef CONFIG_GENERIC_ATOMIC64 #define ATOMIC_OPS() \ - ATOMIC_OP( int, , 4) + ATOMIC_OP(int, , 4) #else #define ATOMIC_OPS() \ - ATOMIC_OP( int, , 4) \ - ATOMIC_OP(long, 64, 8) + ATOMIC_OP(int, , 4) \ + ATOMIC_OP(s64, 64, 8) #endif ATOMIC_OPS() @@ -332,9 +333,10 @@ static __always_inline int atomic_sub_if_positive(atomic_t *v, int offset) #define atomic_dec_if_positive(v) atomic_sub_if_positive(v, 1) #ifndef CONFIG_GENERIC_ATOMIC64 -static __always_inline long atomic64_sub_if_positive(atomic64_t *v, int offset) +static __always_inline s64 atomic64_sub_if_positive(atomic64_t *v, s64 offset) { - long prev, rc; + s64 prev; + long rc; __asm__ __volatile__ ( "0: lr.d %[p], %[c]\n" diff --git a/arch/s390/include/asm/atomic.h b/arch/s390/include/asm/atomic.h index fd20ab5d4cf7..491ad53a0d4e 100644 --- a/arch/s390/include/asm/atomic.h +++ b/arch/s390/include/asm/atomic.h @@ -84,9 +84,9 @@ static inline int atomic_cmpxchg(atomic_t *v, int old, int new) #define ATOMIC64_INIT(i) { (i) } -static inline long atomic64_read(const atomic64_t *v) +static inline s64 atomic64_read(const atomic64_t *v) { - long c; + s64 c; asm volatile( " lg %0,%1\n" @@ -94,49 +94,49 @@ static inline long atomic64_read(const atomic64_t *v) return c; } -static inline void atomic64_set(atomic64_t *v, long i) +static inline void atomic64_set(atomic64_t *v, s64 i) { asm volatile( " stg %1,%0\n" : "=Q" (v->counter) : "d" (i)); } -static inline long atomic64_add_return(long i, atomic64_t *v) +static inline s64 atomic64_add_return(s64 i, atomic64_t *v) { - return __atomic64_add_barrier(i, &v->counter) + i; + return __atomic64_add_barrier(i, (long *)&v->counter) + i; } -static inline long atomic64_fetch_add(long i, atomic64_t *v) +static inline s64 atomic64_fetch_add(s64 i, atomic64_t *v) { - return __atomic64_add_barrier(i, &v->counter); + return __atomic64_add_barrier(i, (long *)&v->counter); } -static inline void atomic64_add(long i, atomic64_t *v) +static inline void atomic64_add(s64 i, atomic64_t *v) { #ifdef CONFIG_HAVE_MARCH_Z196_FEATURES if (__builtin_constant_p(i) && (i > -129) && (i < 128)) { - __atomic64_add_const(i, &v->counter); + __atomic64_add_const(i, (long *)&v->counter); return; } #endif - __atomic64_add(i, &v->counter); + __atomic64_add(i, (long *)&v->counter); } #define atomic64_xchg(v, new) (xchg(&((v)->counter), new)) -static inline long atomic64_cmpxchg(atomic64_t *v, long old, long new) +static inline s64 atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new) { - return __atomic64_cmpxchg(&v->counter, old, new); + return __atomic64_cmpxchg((long *)&v->counter, old, new); } #define ATOMIC64_OPS(op) \ -static inline void atomic64_##op(long i, atomic64_t *v) \ +static inline void atomic64_##op(s64 i, atomic64_t *v) \ { \ - __atomic64_##op(i, &v->counter); \ + __atomic64_##op(i, (long *)&v->counter); \ } \ -static inline long atomic64_fetch_##op(long i, atomic64_t *v) \ +static inline long atomic64_fetch_##op(s64 i, atomic64_t *v) \ { \ - return __atomic64_##op##_barrier(i, &v->counter); \ + return __atomic64_##op##_barrier(i, (long *)&v->counter); \ } ATOMIC64_OPS(and) @@ -145,8 +145,8 @@ ATOMIC64_OPS(xor) #undef ATOMIC64_OPS -#define atomic64_sub_return(_i, _v) atomic64_add_return(-(long)(_i), _v) -#define atomic64_fetch_sub(_i, _v) atomic64_fetch_add(-(long)(_i), _v) -#define atomic64_sub(_i, _v) atomic64_add(-(long)(_i), _v) +#define atomic64_sub_return(_i, _v) atomic64_add_return(-(s64)(_i), _v) +#define atomic64_fetch_sub(_i, _v) atomic64_fetch_add(-(s64)(_i), _v) +#define atomic64_sub(_i, _v) atomic64_add(-(s64)(_i), _v) #endif /* __ARCH_S390_ATOMIC__ */ diff --git a/arch/s390/pci/pci_debug.c b/arch/s390/pci/pci_debug.c index 6b48ca7760a7..3408c0df3ebf 100644 --- a/arch/s390/pci/pci_debug.c +++ b/arch/s390/pci/pci_debug.c @@ -74,7 +74,7 @@ static void pci_sw_counter_show(struct seq_file *m) int i; for (i = 0; i < ARRAY_SIZE(pci_sw_names); i++, counter++) - seq_printf(m, "%26s:\t%lu\n", pci_sw_names[i], + seq_printf(m, "%26s:\t%llu\n", pci_sw_names[i], atomic64_read(counter)); } diff --git a/arch/sparc/include/asm/atomic_64.h b/arch/sparc/include/asm/atomic_64.h index 6963482c81d8..b60448397d4f 100644 --- a/arch/sparc/include/asm/atomic_64.h +++ b/arch/sparc/include/asm/atomic_64.h @@ -23,15 +23,15 @@ #define ATOMIC_OP(op) \ void atomic_##op(int, atomic_t *); \ -void atomic64_##op(long, atomic64_t *); +void atomic64_##op(s64, atomic64_t *); #define ATOMIC_OP_RETURN(op) \ int atomic_##op##_return(int, atomic_t *); \ -long atomic64_##op##_return(long, atomic64_t *); +s64 atomic64_##op##_return(s64, atomic64_t *); #define ATOMIC_FETCH_OP(op) \ int atomic_fetch_##op(int, atomic_t *); \ -long atomic64_fetch_##op(long, atomic64_t *); +s64 atomic64_fetch_##op(s64, atomic64_t *); #define ATOMIC_OPS(op) ATOMIC_OP(op) ATOMIC_OP_RETURN(op) ATOMIC_FETCH_OP(op) @@ -61,7 +61,7 @@ static inline int atomic_xchg(atomic_t *v, int new) ((__typeof__((v)->counter))cmpxchg(&((v)->counter), (o), (n))) #define atomic64_xchg(v, new) (xchg(&((v)->counter), new)) -long atomic64_dec_if_positive(atomic64_t *v); +s64 atomic64_dec_if_positive(atomic64_t *v); #define atomic64_dec_if_positive atomic64_dec_if_positive #endif /* !(__ARCH_SPARC64_ATOMIC__) */ diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 3cd94a21bd53..ceb712b0a1c6 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -2179,7 +2179,7 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm) * For now, this can't happen because all callers hold mmap_sem * for write. If this changes, we'll need a different solution. */ - lockdep_assert_held_exclusive(&mm->mmap_sem); + lockdep_assert_held_write(&mm->mmap_sem); if (atomic_inc_return(&mm->context.perf_rdpmc_allowed) == 1) on_each_cpu_mask(mm_cpumask(mm), refresh_pce, NULL, 1); diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h index ea3d95275b43..115127c7ad28 100644 --- a/arch/x86/include/asm/atomic.h +++ b/arch/x86/include/asm/atomic.h @@ -54,7 +54,7 @@ static __always_inline void arch_atomic_add(int i, atomic_t *v) { asm volatile(LOCK_PREFIX "addl %1,%0" : "+m" (v->counter) - : "ir" (i)); + : "ir" (i) : "memory"); } /** @@ -68,7 +68,7 @@ static __always_inline void arch_atomic_sub(int i, atomic_t *v) { asm volatile(LOCK_PREFIX "subl %1,%0" : "+m" (v->counter) - : "ir" (i)); + : "ir" (i) : "memory"); } /** @@ -95,7 +95,7 @@ static __always_inline bool arch_atomic_sub_and_test(int i, atomic_t *v) static __always_inline void arch_atomic_inc(atomic_t *v) { asm volatile(LOCK_PREFIX "incl %0" - : "+m" (v->counter)); + : "+m" (v->counter) :: "memory"); } #define arch_atomic_inc arch_atomic_inc @@ -108,7 +108,7 @@ static __always_inline void arch_atomic_inc(atomic_t *v) static __always_inline void arch_atomic_dec(atomic_t *v) { asm volatile(LOCK_PREFIX "decl %0" - : "+m" (v->counter)); + : "+m" (v->counter) :: "memory"); } #define arch_atomic_dec arch_atomic_dec diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h index 6a5b0ec460da..52cfaecb13f9 100644 --- a/arch/x86/include/asm/atomic64_32.h +++ b/arch/x86/include/asm/atomic64_32.h @@ -9,7 +9,7 @@ /* An 64bit atomic type */ typedef struct { - u64 __aligned(8) counter; + s64 __aligned(8) counter; } atomic64_t; #define ATOMIC64_INIT(val) { (val) } @@ -71,8 +71,7 @@ ATOMIC64_DECL(add_unless); * the old value. */ -static inline long long arch_atomic64_cmpxchg(atomic64_t *v, long long o, - long long n) +static inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 o, s64 n) { return arch_cmpxchg64(&v->counter, o, n); } @@ -85,9 +84,9 @@ static inline long long arch_atomic64_cmpxchg(atomic64_t *v, long long o, * Atomically xchgs the value of @v to @n and returns * the old value. */ -static inline long long arch_atomic64_xchg(atomic64_t *v, long long n) +static inline s64 arch_atomic64_xchg(atomic64_t *v, s64 n) { - long long o; + s64 o; unsigned high = (unsigned)(n >> 32); unsigned low = (unsigned)n; alternative_atomic64(xchg, "=&A" (o), @@ -103,7 +102,7 @@ static inline long long arch_atomic64_xchg(atomic64_t *v, long long n) * * Atomically sets the value of @v to @n. */ -static inline void arch_atomic64_set(atomic64_t *v, long long i) +static inline void arch_atomic64_set(atomic64_t *v, s64 i) { unsigned high = (unsigned)(i >> 32); unsigned low = (unsigned)i; @@ -118,9 +117,9 @@ static inline void arch_atomic64_set(atomic64_t *v, long long i) * * Atomically reads the value of @v and returns it. */ -static inline long long arch_atomic64_read(const atomic64_t *v) +static inline s64 arch_atomic64_read(const atomic64_t *v) { - long long r; + s64 r; alternative_atomic64(read, "=&A" (r), "c" (v) : "memory"); return r; } @@ -132,7 +131,7 @@ static inline long long arch_atomic64_read(const atomic64_t *v) * * Atomically adds @i to @v and returns @i + *@v */ -static inline long long arch_atomic64_add_return(long long i, atomic64_t *v) +static inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v) { alternative_atomic64(add_return, ASM_OUTPUT2("+A" (i), "+c" (v)), @@ -143,7 +142,7 @@ static inline long long arch_atomic64_add_return(long long i, atomic64_t *v) /* * Other variants with different arithmetic operators: */ -static inline long long arch_atomic64_sub_return(long long i, atomic64_t *v) +static inline s64 arch_atomic64_sub_return(s64 i, atomic64_t *v) { alternative_atomic64(sub_return, ASM_OUTPUT2("+A" (i), "+c" (v)), @@ -151,18 +150,18 @@ static inline long long arch_atomic64_sub_return(long long i, atomic64_t *v) return i; } -static inline long long arch_atomic64_inc_return(atomic64_t *v) +static inline s64 arch_atomic64_inc_return(atomic64_t *v) { - long long a; + s64 a; alternative_atomic64(inc_return, "=&A" (a), "S" (v) : "memory", "ecx"); return a; } #define arch_atomic64_inc_return arch_atomic64_inc_return -static inline long long arch_atomic64_dec_return(atomic64_t *v) +static inline s64 arch_atomic64_dec_return(atomic64_t *v) { - long long a; + s64 a; alternative_atomic64(dec_return, "=&A" (a), "S" (v) : "memory", "ecx"); return a; @@ -176,7 +175,7 @@ static inline long long arch_atomic64_dec_return(atomic64_t *v) * * Atomically adds @i to @v. */ -static inline long long arch_atomic64_add(long long i, atomic64_t *v) +static inline s64 arch_atomic64_add(s64 i, atomic64_t *v) { __alternative_atomic64(add, add_return, ASM_OUTPUT2("+A" (i), "+c" (v)), @@ -191,7 +190,7 @@ static inline long long arch_atomic64_add(long long i, atomic64_t *v) * * Atomically subtracts @i from @v. */ -static inline long long arch_atomic64_sub(long long i, atomic64_t *v) +static inline s64 arch_atomic64_sub(s64 i, atomic64_t *v) { __alternative_atomic64(sub, sub_return, ASM_OUTPUT2("+A" (i), "+c" (v)), @@ -234,8 +233,7 @@ static inline void arch_atomic64_dec(atomic64_t *v) * Atomically adds @a to @v, so long as it was not @u. * Returns non-zero if the add was done, zero otherwise. */ -static inline int arch_atomic64_add_unless(atomic64_t *v, long long a, - long long u) +static inline int arch_atomic64_add_unless(atomic64_t *v, s64 a, s64 u) { unsigned low = (unsigned)u; unsigned high = (unsigned)(u >> 32); @@ -254,9 +252,9 @@ static inline int arch_atomic64_inc_not_zero(atomic64_t *v) } #define arch_atomic64_inc_not_zero arch_atomic64_inc_not_zero -static inline long long arch_atomic64_dec_if_positive(atomic64_t *v) +static inline s64 arch_atomic64_dec_if_positive(atomic64_t *v) { - long long r; + s64 r; alternative_atomic64(dec_if_positive, "=&A" (r), "S" (v) : "ecx", "memory"); return r; @@ -266,17 +264,17 @@ static inline long long arch_atomic64_dec_if_positive(atomic64_t *v) #undef alternative_atomic64 #undef __alternative_atomic64 -static inline void arch_atomic64_and(long long i, atomic64_t *v) +static inline void arch_atomic64_and(s64 i, atomic64_t *v) { - long long old, c = 0; + s64 old, c = 0; while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c) c = old; } -static inline long long arch_atomic64_fetch_and(long long i, atomic64_t *v) +static inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v) { - long long old, c = 0; + s64 old, c = 0; while ((old = arch_atomic64_cmpxchg(v, c, c & i)) != c) c = old; @@ -284,17 +282,17 @@ static inline long long arch_atomic64_fetch_and(long long i, atomic64_t *v) return old; } -static inline void arch_atomic64_or(long long i, atomic64_t *v) +static inline void arch_atomic64_or(s64 i, atomic64_t *v) { - long long old, c = 0; + s64 old, c = 0; while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c) c = old; } -static inline long long arch_atomic64_fetch_or(long long i, atomic64_t *v) +static inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v) { - long long old, c = 0; + s64 old, c = 0; while ((old = arch_atomic64_cmpxchg(v, c, c | i)) != c) c = old; @@ -302,17 +300,17 @@ static inline long long arch_atomic64_fetch_or(long long i, atomic64_t *v) return old; } -static inline void arch_atomic64_xor(long long i, atomic64_t *v) +static inline void arch_atomic64_xor(s64 i, atomic64_t *v) { - long long old, c = 0; + s64 old, c = 0; while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c) c = old; } -static inline long long arch_atomic64_fetch_xor(long long i, atomic64_t *v) +static inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v) { - long long old, c = 0; + s64 old, c = 0; while ((old = arch_atomic64_cmpxchg(v, c, c ^ i)) != c) c = old; @@ -320,9 +318,9 @@ static inline long long arch_atomic64_fetch_xor(long long i, atomic64_t *v) return old; } -static inline long long arch_atomic64_fetch_add(long long i, atomic64_t *v) +static inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v) { - long long old, c = 0; + s64 old, c = 0; while ((old = arch_atomic64_cmpxchg(v, c, c + i)) != c) c = old; diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h index dadc20adba21..95c6ceac66b9 100644 --- a/arch/x86/include/asm/atomic64_64.h +++ b/arch/x86/include/asm/atomic64_64.h @@ -17,7 +17,7 @@ * Atomically reads the value of @v. * Doesn't imply a read memory barrier. */ -static inline long arch_atomic64_read(const atomic64_t *v) +static inline s64 arch_atomic64_read(const atomic64_t *v) { return READ_ONCE((v)->counter); } @@ -29,7 +29,7 @@ static inline long arch_atomic64_read(const atomic64_t *v) * * Atomically sets the value of @v to @i. */ -static inline void arch_atomic64_set(atomic64_t *v, long i) +static inline void arch_atomic64_set(atomic64_t *v, s64 i) { WRITE_ONCE(v->counter, i); } @@ -41,11 +41,11 @@ static inline void arch_atomic64_set(atomic64_t *v, long i) * * Atomically adds @i to @v. */ -static __always_inline void arch_atomic64_add(long i, atomic64_t *v) +static __always_inline void arch_atomic64_add(s64 i, atomic64_t *v) { asm volatile(LOCK_PREFIX "addq %1,%0" : "=m" (v->counter) - : "er" (i), "m" (v->counter)); + : "er" (i), "m" (v->counter) : "memory"); } /** @@ -55,11 +55,11 @@ static __always_inline void arch_atomic64_add(long i, atomic64_t *v) * * Atomically subtracts @i from @v. */ -static inline void arch_atomic64_sub(long i, atomic64_t *v) +static inline void arch_atomic64_sub(s64 i, atomic64_t *v) { asm volatile(LOCK_PREFIX "subq %1,%0" : "=m" (v->counter) - : "er" (i), "m" (v->counter)); + : "er" (i), "m" (v->counter) : "memory"); } /** @@ -71,7 +71,7 @@ static inline void arch_atomic64_sub(long i, atomic64_t *v) * true if the result is zero, or false for all * other cases. */ -static inline bool arch_atomic64_sub_and_test(long i, atomic64_t *v) +static inline bool arch_atomic64_sub_and_test(s64 i, atomic64_t *v) { return GEN_BINARY_RMWcc(LOCK_PREFIX "subq", v->counter, e, "er", i); } @@ -87,7 +87,7 @@ static __always_inline void arch_atomic64_inc(atomic64_t *v) { asm volatile(LOCK_PREFIX "incq %0" : "=m" (v->counter) - : "m" (v->counter)); + : "m" (v->counter) : "memory"); } #define arch_atomic64_inc arch_atomic64_inc @@ -101,7 +101,7 @@ static __always_inline void arch_atomic64_dec(atomic64_t *v) { asm volatile(LOCK_PREFIX "decq %0" : "=m" (v->counter) - : "m" (v->counter)); + : "m" (v->counter) : "memory"); } #define arch_atomic64_dec arch_atomic64_dec @@ -142,7 +142,7 @@ static inline bool arch_atomic64_inc_and_test(atomic64_t *v) * if the result is negative, or false when * result is greater than or equal to zero. */ -static inline bool arch_atomic64_add_negative(long i, atomic64_t *v) +static inline bool arch_atomic64_add_negative(s64 i, atomic64_t *v) { return GEN_BINARY_RMWcc(LOCK_PREFIX "addq", v->counter, s, "er", i); } @@ -155,43 +155,43 @@ static inline bool arch_atomic64_add_negative(long i, atomic64_t *v) * * Atomically adds @i to @v and returns @i + @v */ -static __always_inline long arch_atomic64_add_return(long i, atomic64_t *v) +static __always_inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v) { return i + xadd(&v->counter, i); } -static inline long arch_atomic64_sub_return(long i, atomic64_t *v) +static inline s64 arch_atomic64_sub_return(s64 i, atomic64_t *v) { return arch_atomic64_add_return(-i, v); } -static inline long arch_atomic64_fetch_add(long i, atomic64_t *v) +static inline s64 arch_atomic64_fetch_add(s64 i, atomic64_t *v) { return xadd(&v->counter, i); } -static inline long arch_atomic64_fetch_sub(long i, atomic64_t *v) +static inline s64 arch_atomic64_fetch_sub(s64 i, atomic64_t *v) { return xadd(&v->counter, -i); } -static inline long arch_atomic64_cmpxchg(atomic64_t *v, long old, long new) +static inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new) { return arch_cmpxchg(&v->counter, old, new); } #define arch_atomic64_try_cmpxchg arch_atomic64_try_cmpxchg -static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, long new) +static __always_inline bool arch_atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new) { return try_cmpxchg(&v->counter, old, new); } -static inline long arch_atomic64_xchg(atomic64_t *v, long new) +static inline s64 arch_atomic64_xchg(atomic64_t *v, s64 new) { return arch_xchg(&v->counter, new); } -static inline void arch_atomic64_and(long i, atomic64_t *v) +static inline void arch_atomic64_and(s64 i, atomic64_t *v) { asm volatile(LOCK_PREFIX "andq %1,%0" : "+m" (v->counter) @@ -199,7 +199,7 @@ static inline void arch_atomic64_and(long i, atomic64_t *v) : "memory"); } -static inline long arch_atomic64_fetch_and(long i, atomic64_t *v) +static inline s64 arch_atomic64_fetch_and(s64 i, atomic64_t *v) { s64 val = arch_atomic64_read(v); @@ -208,7 +208,7 @@ static inline long arch_atomic64_fetch_and(long i, atomic64_t *v) return val; } -static inline void arch_atomic64_or(long i, atomic64_t *v) +static inline void arch_atomic64_or(s64 i, atomic64_t *v) { asm volatile(LOCK_PREFIX "orq %1,%0" : "+m" (v->counter) @@ -216,7 +216,7 @@ static inline void arch_atomic64_or(long i, atomic64_t *v) : "memory"); } -static inline long arch_atomic64_fetch_or(long i, atomic64_t *v) +static inline s64 arch_atomic64_fetch_or(s64 i, atomic64_t *v) { s64 val = arch_atomic64_read(v); @@ -225,7 +225,7 @@ static inline long arch_atomic64_fetch_or(long i, atomic64_t *v) return val; } -static inline void arch_atomic64_xor(long i, atomic64_t *v) +static inline void arch_atomic64_xor(s64 i, atomic64_t *v) { asm volatile(LOCK_PREFIX "xorq %1,%0" : "+m" (v->counter) @@ -233,7 +233,7 @@ static inline void arch_atomic64_xor(long i, atomic64_t *v) : "memory"); } -static inline long arch_atomic64_fetch_xor(long i, atomic64_t *v) +static inline s64 arch_atomic64_fetch_xor(s64 i, atomic64_t *v) { s64 val = arch_atomic64_read(v); diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h index 14de0432d288..84f848c2541a 100644 --- a/arch/x86/include/asm/barrier.h +++ b/arch/x86/include/asm/barrier.h @@ -80,8 +80,8 @@ do { \ }) /* Atomic operations are already serializing on x86 */ -#define __smp_mb__before_atomic() barrier() -#define __smp_mb__after_atomic() barrier() +#define __smp_mb__before_atomic() do { } while (0) +#define __smp_mb__after_atomic() do { } while (0) #include <asm-generic/barrier.h> diff --git a/arch/x86/include/asm/irq_regs.h b/arch/x86/include/asm/irq_regs.h index 8f3bee821e6c..187ce59aea28 100644 --- a/arch/x86/include/asm/irq_regs.h +++ b/arch/x86/include/asm/irq_regs.h @@ -16,7 +16,7 @@ DECLARE_PER_CPU(struct pt_regs *, irq_regs); static inline struct pt_regs *get_irq_regs(void) { - return this_cpu_read(irq_regs); + return __this_cpu_read(irq_regs); } static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs) @@ -24,7 +24,7 @@ static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs) struct pt_regs *old_regs; old_regs = get_irq_regs(); - this_cpu_write(irq_regs, new_regs); + __this_cpu_write(irq_regs, new_regs); return old_regs; } diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h index 65191ce8e1cf..06c3cc22a058 100644 --- a/arch/x86/include/asm/jump_label.h +++ b/arch/x86/include/asm/jump_label.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_JUMP_LABEL_H #define _ASM_X86_JUMP_LABEL_H +#define HAVE_JUMP_LABEL_BATCH + #define JUMP_LABEL_NOP_SIZE 5 #ifdef CONFIG_X86_64 diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h index 1a19d11cfbbd..2278797c769d 100644 --- a/arch/x86/include/asm/percpu.h +++ b/arch/x86/include/asm/percpu.h @@ -87,7 +87,7 @@ * don't give an lvalue though). */ extern void __bad_percpu_size(void); -#define percpu_to_op(op, var, val) \ +#define percpu_to_op(qual, op, var, val) \ do { \ typedef typeof(var) pto_T__; \ if (0) { \ @@ -97,22 +97,22 @@ do { \ } \ switch (sizeof(var)) { \ case 1: \ - asm(op "b %1,"__percpu_arg(0) \ + asm qual (op "b %1,"__percpu_arg(0) \ : "+m" (var) \ : "qi" ((pto_T__)(val))); \ break; \ case 2: \ - asm(op "w %1,"__percpu_arg(0) \ + asm qual (op "w %1,"__percpu_arg(0) \ : "+m" (var) \ : "ri" ((pto_T__)(val))); \ break; \ case 4: \ - asm(op "l %1,"__percpu_arg(0) \ + asm qual (op "l %1,"__percpu_arg(0) \ : "+m" (var) \ : "ri" ((pto_T__)(val))); \ break; \ case 8: \ - asm(op "q %1,"__percpu_arg(0) \ + asm qual (op "q %1,"__percpu_arg(0) \ : "+m" (var) \ : "re" ((pto_T__)(val))); \ break; \ @@ -124,7 +124,7 @@ do { \ * Generate a percpu add to memory instruction and optimize code * if one is added or subtracted. */ -#define percpu_add_op(var, val) \ +#define percpu_add_op(qual, var, val) \ do { \ typedef typeof(var) pao_T__; \ const int pao_ID__ = (__builtin_constant_p(val) && \ @@ -138,41 +138,41 @@ do { \ switch (sizeof(var)) { \ case 1: \ if (pao_ID__ == 1) \ - asm("incb "__percpu_arg(0) : "+m" (var)); \ + asm qual ("incb "__percpu_arg(0) : "+m" (var)); \ else if (pao_ID__ == -1) \ - asm("decb "__percpu_arg(0) : "+m" (var)); \ + asm qual ("decb "__percpu_arg(0) : "+m" (var)); \ else \ - asm("addb %1, "__percpu_arg(0) \ + asm qual ("addb %1, "__percpu_arg(0) \ : "+m" (var) \ : "qi" ((pao_T__)(val))); \ break; \ case 2: \ if (pao_ID__ == 1) \ - asm("incw "__percpu_arg(0) : "+m" (var)); \ + asm qual ("incw "__percpu_arg(0) : "+m" (var)); \ else if (pao_ID__ == -1) \ - asm("decw "__percpu_arg(0) : "+m" (var)); \ + asm qual ("decw "__percpu_arg(0) : "+m" (var)); \ else \ - asm("addw %1, "__percpu_arg(0) \ + asm qual ("addw %1, "__percpu_arg(0) \ : "+m" (var) \ : "ri" ((pao_T__)(val))); \ break; \ case 4: \ if (pao_ID__ == 1) \ - asm("incl "__percpu_arg(0) : "+m" (var)); \ + asm qual ("incl "__percpu_arg(0) : "+m" (var)); \ else if (pao_ID__ == -1) \ - asm("decl "__percpu_arg(0) : "+m" (var)); \ + asm qual ("decl "__percpu_arg(0) : "+m" (var)); \ else \ - asm("addl %1, "__percpu_arg(0) \ + asm qual ("addl %1, "__percpu_arg(0) \ : "+m" (var) \ : "ri" ((pao_T__)(val))); \ break; \ case 8: \ if (pao_ID__ == 1) \ - asm("incq "__percpu_arg(0) : "+m" (var)); \ + asm qual ("incq "__percpu_arg(0) : "+m" (var)); \ else if (pao_ID__ == -1) \ - asm("decq "__percpu_arg(0) : "+m" (var)); \ + asm qual ("decq "__percpu_arg(0) : "+m" (var)); \ else \ - asm("addq %1, "__percpu_arg(0) \ + asm qual ("addq %1, "__percpu_arg(0) \ : "+m" (var) \ : "re" ((pao_T__)(val))); \ break; \ @@ -180,27 +180,27 @@ do { \ } \ } while (0) -#define percpu_from_op(op, var) \ +#define percpu_from_op(qual, op, var) \ ({ \ typeof(var) pfo_ret__; \ switch (sizeof(var)) { \ case 1: \ - asm volatile(op "b "__percpu_arg(1)",%0"\ + asm qual (op "b "__percpu_arg(1)",%0" \ : "=q" (pfo_ret__) \ : "m" (var)); \ break; \ case 2: \ - asm volatile(op "w "__percpu_arg(1)",%0"\ + asm qual (op "w "__percpu_arg(1)",%0" \ : "=r" (pfo_ret__) \ : "m" (var)); \ break; \ case 4: \ - asm volatile(op "l "__percpu_arg(1)",%0"\ + asm qual (op "l "__percpu_arg(1)",%0" \ : "=r" (pfo_ret__) \ : "m" (var)); \ break; \ case 8: \ - asm volatile(op "q "__percpu_arg(1)",%0"\ + asm qual (op "q "__percpu_arg(1)",%0" \ : "=r" (pfo_ret__) \ : "m" (var)); \ break; \ @@ -238,23 +238,23 @@ do { \ pfo_ret__; \ }) -#define percpu_unary_op(op, var) \ +#define percpu_unary_op(qual, op, var) \ ({ \ switch (sizeof(var)) { \ case 1: \ - asm(op "b "__percpu_arg(0) \ + asm qual (op "b "__percpu_arg(0) \ : "+m" (var)); \ break; \ case 2: \ - asm(op "w "__percpu_arg(0) \ + asm qual (op "w "__percpu_arg(0) \ : "+m" (var)); \ break; \ case 4: \ - asm(op "l "__percpu_arg(0) \ + asm qual (op "l "__percpu_arg(0) \ : "+m" (var)); \ break; \ case 8: \ - asm(op "q "__percpu_arg(0) \ + asm qual (op "q "__percpu_arg(0) \ : "+m" (var)); \ break; \ default: __bad_percpu_size(); \ @@ -264,27 +264,27 @@ do { \ /* * Add return operation */ -#define percpu_add_return_op(var, val) \ +#define percpu_add_return_op(qual, var, val) \ ({ \ typeof(var) paro_ret__ = val; \ switch (sizeof(var)) { \ case 1: \ - asm("xaddb %0, "__percpu_arg(1) \ + asm qual ("xaddb %0, "__percpu_arg(1) \ : "+q" (paro_ret__), "+m" (var) \ : : "memory"); \ break; \ case 2: \ - asm("xaddw %0, "__percpu_arg(1) \ + asm qual ("xaddw %0, "__percpu_arg(1) \ : "+r" (paro_ret__), "+m" (var) \ : : "memory"); \ break; \ case 4: \ - asm("xaddl %0, "__percpu_arg(1) \ + asm qual ("xaddl %0, "__percpu_arg(1) \ : "+r" (paro_ret__), "+m" (var) \ : : "memory"); \ break; \ case 8: \ - asm("xaddq %0, "__percpu_arg(1) \ + asm qual ("xaddq %0, "__percpu_arg(1) \ : "+re" (paro_ret__), "+m" (var) \ : : "memory"); \ break; \ @@ -299,13 +299,13 @@ do { \ * expensive due to the implied lock prefix. The processor cannot prefetch * cachelines if xchg is used. */ -#define percpu_xchg_op(var, nval) \ +#define percpu_xchg_op(qual, var, nval) \ ({ \ typeof(var) pxo_ret__; \ typeof(var) pxo_new__ = (nval); \ switch (sizeof(var)) { \ case 1: \ - asm("\n\tmov "__percpu_arg(1)",%%al" \ + asm qual ("\n\tmov "__percpu_arg(1)",%%al" \ "\n1:\tcmpxchgb %2, "__percpu_arg(1) \ "\n\tjnz 1b" \ : "=&a" (pxo_ret__), "+m" (var) \ @@ -313,7 +313,7 @@ do { \ : "memory"); \ break; \ case 2: \ - asm("\n\tmov "__percpu_arg(1)",%%ax" \ + asm qual ("\n\tmov "__percpu_arg(1)",%%ax" \ "\n1:\tcmpxchgw %2, "__percpu_arg(1) \ "\n\tjnz 1b" \ : "=&a" (pxo_ret__), "+m" (var) \ @@ -321,7 +321,7 @@ do { \ : "memory"); \ break; \ case 4: \ - asm("\n\tmov "__percpu_arg(1)",%%eax" \ + asm qual ("\n\tmov "__percpu_arg(1)",%%eax" \ "\n1:\tcmpxchgl %2, "__percpu_arg(1) \ "\n\tjnz 1b" \ : "=&a" (pxo_ret__), "+m" (var) \ @@ -329,7 +329,7 @@ do { \ : "memory"); \ break; \ case 8: \ - asm("\n\tmov "__percpu_arg(1)",%%rax" \ + asm qual ("\n\tmov "__percpu_arg(1)",%%rax" \ "\n1:\tcmpxchgq %2, "__percpu_arg(1) \ "\n\tjnz 1b" \ : "=&a" (pxo_ret__), "+m" (var) \ @@ -345,32 +345,32 @@ do { \ * cmpxchg has no such implied lock semantics as a result it is much * more efficient for cpu local operations. */ -#define percpu_cmpxchg_op(var, oval, nval) \ +#define percpu_cmpxchg_op(qual, var, oval, nval) \ ({ \ typeof(var) pco_ret__; \ typeof(var) pco_old__ = (oval); \ typeof(var) pco_new__ = (nval); \ switch (sizeof(var)) { \ case 1: \ - asm("cmpxchgb %2, "__percpu_arg(1) \ + asm qual ("cmpxchgb %2, "__percpu_arg(1) \ : "=a" (pco_ret__), "+m" (var) \ : "q" (pco_new__), "0" (pco_old__) \ : "memory"); \ break; \ case 2: \ - asm("cmpxchgw %2, "__percpu_arg(1) \ + asm qual ("cmpxchgw %2, "__percpu_arg(1) \ : "=a" (pco_ret__), "+m" (var) \ : "r" (pco_new__), "0" (pco_old__) \ : "memory"); \ break; \ case 4: \ - asm("cmpxchgl %2, "__percpu_arg(1) \ + asm qual ("cmpxchgl %2, "__percpu_arg(1) \ : "=a" (pco_ret__), "+m" (var) \ : "r" (pco_new__), "0" (pco_old__) \ : "memory"); \ break; \ case 8: \ - asm("cmpxchgq %2, "__percpu_arg(1) \ + asm qual ("cmpxchgq %2, "__percpu_arg(1) \ : "=a" (pco_ret__), "+m" (var) \ : "r" (pco_new__), "0" (pco_old__) \ : "memory"); \ @@ -391,58 +391,70 @@ do { \ */ #define this_cpu_read_stable(var) percpu_stable_op("mov", var) -#define raw_cpu_read_1(pcp) percpu_from_op("mov", pcp) -#define raw_cpu_read_2(pcp) percpu_from_op("mov", pcp) -#define raw_cpu_read_4(pcp) percpu_from_op("mov", pcp) - -#define raw_cpu_write_1(pcp, val) percpu_to_op("mov", (pcp), val) -#define raw_cpu_write_2(pcp, val) percpu_to_op("mov", (pcp), val) -#define raw_cpu_write_4(pcp, val) percpu_to_op("mov", (pcp), val) -#define raw_cpu_add_1(pcp, val) percpu_add_op((pcp), val) -#define raw_cpu_add_2(pcp, val) percpu_add_op((pcp), val) -#define raw_cpu_add_4(pcp, val) percpu_add_op((pcp), val) -#define raw_cpu_and_1(pcp, val) percpu_to_op("and", (pcp), val) -#define raw_cpu_and_2(pcp, val) percpu_to_op("and", (pcp), val) -#define raw_cpu_and_4(pcp, val) percpu_to_op("and", (pcp), val) -#define raw_cpu_or_1(pcp, val) percpu_to_op("or", (pcp), val) -#define raw_cpu_or_2(pcp, val) percpu_to_op("or", (pcp), val) -#define raw_cpu_or_4(pcp, val) percpu_to_op("or", (pcp), val) -#define raw_cpu_xchg_1(pcp, val) percpu_xchg_op(pcp, val) -#define raw_cpu_xchg_2(pcp, val) percpu_xchg_op(pcp, val) -#define raw_cpu_xchg_4(pcp, val) percpu_xchg_op(pcp, val) - -#define this_cpu_read_1(pcp) percpu_from_op("mov", pcp) -#define this_cpu_read_2(pcp) percpu_from_op("mov", pcp) -#define this_cpu_read_4(pcp) percpu_from_op("mov", pcp) -#define this_cpu_write_1(pcp, val) percpu_to_op("mov", (pcp), val) -#define this_cpu_write_2(pcp, val) percpu_to_op("mov", (pcp), val) -#define this_cpu_write_4(pcp, val) percpu_to_op("mov", (pcp), val) -#define this_cpu_add_1(pcp, val) percpu_add_op((pcp), val) -#define this_cpu_add_2(pcp, val) percpu_add_op((pcp), val) -#define this_cpu_add_4(pcp, val) percpu_add_op((pcp), val) -#define this_cpu_and_1(pcp, val) percpu_to_op("and", (pcp), val) -#define this_cpu_and_2(pcp, val) percpu_to_op("and", (pcp), val) -#define this_cpu_and_4(pcp, val) percpu_to_op("and", (pcp), val) -#define this_cpu_or_1(pcp, val) percpu_to_op("or", (pcp), val) -#define this_cpu_or_2(pcp, val) percpu_to_op("or", (pcp), val) -#define this_cpu_or_4(pcp, val) percpu_to_op("or", (pcp), val) -#define this_cpu_xchg_1(pcp, nval) percpu_xchg_op(pcp, nval) -#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(pcp, nval) -#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(pcp, nval) - -#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(pcp, val) -#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(pcp, val) -#define raw_cpu_add_return_4(pcp, val) percpu_add_return_op(pcp, val) -#define raw_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(pcp, oval, nval) -#define raw_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(pcp, oval, nval) -#define raw_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(pcp, oval, nval) - -#define this_cpu_add_return_1(pcp, val) percpu_add_return_op(pcp, val) -#define this_cpu_add_return_2(pcp, val) percpu_add_return_op(pcp, val) -#define this_cpu_add_return_4(pcp, val) percpu_add_return_op(pcp, val) -#define this_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(pcp, oval, nval) -#define this_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(pcp, oval, nval) -#define this_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(pcp, oval, nval) +#define raw_cpu_read_1(pcp) percpu_from_op(, "mov", pcp) +#define raw_cpu_read_2(pcp) percpu_from_op(, "mov", pcp) +#define raw_cpu_read_4(pcp) percpu_from_op(, "mov", pcp) + +#define raw_cpu_write_1(pcp, val) percpu_to_op(, "mov", (pcp), val) +#define raw_cpu_write_2(pcp, val) percpu_to_op(, "mov", (pcp), val) +#define raw_cpu_write_4(pcp, val) percpu_to_op(, "mov", (pcp), val) +#define raw_cpu_add_1(pcp, val) percpu_add_op(, (pcp), val) +#define raw_cpu_add_2(pcp, val) percpu_add_op(, (pcp), val) +#define raw_cpu_add_4(pcp, val) percpu_add_op(, (pcp), val) +#define raw_cpu_and_1(pcp, val) percpu_to_op(, "and", (pcp), val) +#define raw_cpu_and_2(pcp, val) percpu_to_op(, "and", (pcp), val) +#define raw_cpu_and_4(pcp, val) percpu_to_op(, "and", (pcp), val) +#define raw_cpu_or_1(pcp, val) percpu_to_op(, "or", (pcp), val) +#define raw_cpu_or_2(pcp, val) percpu_to_op(, "or", (pcp), val) +#define raw_cpu_or_4(pcp, val) percpu_to_op(, "or", (pcp), val) + +/* + * raw_cpu_xchg() can use a load-store since it is not required to be + * IRQ-safe. + */ +#define raw_percpu_xchg_op(var, nval) \ +({ \ + typeof(var) pxo_ret__ = raw_cpu_read(var); \ + raw_cpu_write(var, (nval)); \ + pxo_ret__; \ +}) + +#define raw_cpu_xchg_1(pcp, val) raw_percpu_xchg_op(pcp, val) +#define raw_cpu_xchg_2(pcp, val) raw_percpu_xchg_op(pcp, val) +#define raw_cpu_xchg_4(pcp, val) raw_percpu_xchg_op(pcp, val) + +#define this_cpu_read_1(pcp) percpu_from_op(volatile, "mov", pcp) +#define this_cpu_read_2(pcp) percpu_from_op(volatile, "mov", pcp) +#define this_cpu_read_4(pcp) percpu_from_op(volatile, "mov", pcp) +#define this_cpu_write_1(pcp, val) percpu_to_op(volatile, "mov", (pcp), val) +#define this_cpu_write_2(pcp, val) percpu_to_op(volatile, "mov", (pcp), val) +#define this_cpu_write_4(pcp, val) percpu_to_op(volatile, "mov", (pcp), val) +#define this_cpu_add_1(pcp, val) percpu_add_op(volatile, (pcp), val) +#define this_cpu_add_2(pcp, val) percpu_add_op(volatile, (pcp), val) +#define this_cpu_add_4(pcp, val) percpu_add_op(volatile, (pcp), val) +#define this_cpu_and_1(pcp, val) percpu_to_op(volatile, "and", (pcp), val) +#define this_cpu_and_2(pcp, val) percpu_to_op(volatile, "and", (pcp), val) +#define this_cpu_and_4(pcp, val) percpu_to_op(volatile, "and", (pcp), val) +#define this_cpu_or_1(pcp, val) percpu_to_op(volatile, "or", (pcp), val) +#define this_cpu_or_2(pcp, val) percpu_to_op(volatile, "or", (pcp), val) +#define this_cpu_or_4(pcp, val) percpu_to_op(volatile, "or", (pcp), val) +#define this_cpu_xchg_1(pcp, nval) percpu_xchg_op(volatile, pcp, nval) +#define this_cpu_xchg_2(pcp, nval) percpu_xchg_op(volatile, pcp, nval) +#define this_cpu_xchg_4(pcp, nval) percpu_xchg_op(volatile, pcp, nval) + +#define raw_cpu_add_return_1(pcp, val) percpu_add_return_op(, pcp, val) +#define raw_cpu_add_return_2(pcp, val) percpu_add_return_op(, pcp, val) +#define raw_cpu_add_return_4(pcp, val) percpu_add_return_op(, pcp, val) +#define raw_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval) +#define raw_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval) +#define raw_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval) + +#define this_cpu_add_return_1(pcp, val) percpu_add_return_op(volatile, pcp, val) +#define this_cpu_add_return_2(pcp, val) percpu_add_return_op(volatile, pcp, val) +#define this_cpu_add_return_4(pcp, val) percpu_add_return_op(volatile, pcp, val) +#define this_cpu_cmpxchg_1(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval) +#define this_cpu_cmpxchg_2(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval) +#define this_cpu_cmpxchg_4(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval) #ifdef CONFIG_X86_CMPXCHG64 #define percpu_cmpxchg8b_double(pcp1, pcp2, o1, o2, n1, n2) \ @@ -466,23 +478,23 @@ do { \ * 32 bit must fall back to generic operations. */ #ifdef CONFIG_X86_64 -#define raw_cpu_read_8(pcp) percpu_from_op("mov", pcp) -#define raw_cpu_write_8(pcp, val) percpu_to_op("mov", (pcp), val) -#define raw_cpu_add_8(pcp, val) percpu_add_op((pcp), val) -#define raw_cpu_and_8(pcp, val) percpu_to_op("and", (pcp), val) -#define raw_cpu_or_8(pcp, val) percpu_to_op("or", (pcp), val) -#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(pcp, val) -#define raw_cpu_xchg_8(pcp, nval) percpu_xchg_op(pcp, nval) -#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(pcp, oval, nval) - -#define this_cpu_read_8(pcp) percpu_from_op("mov", pcp) -#define this_cpu_write_8(pcp, val) percpu_to_op("mov", (pcp), val) -#define this_cpu_add_8(pcp, val) percpu_add_op((pcp), val) -#define this_cpu_and_8(pcp, val) percpu_to_op("and", (pcp), val) -#define this_cpu_or_8(pcp, val) percpu_to_op("or", (pcp), val) -#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(pcp, val) -#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(pcp, nval) -#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(pcp, oval, nval) +#define raw_cpu_read_8(pcp) percpu_from_op(, "mov", pcp) +#define raw_cpu_write_8(pcp, val) percpu_to_op(, "mov", (pcp), val) +#define raw_cpu_add_8(pcp, val) percpu_add_op(, (pcp), val) +#define raw_cpu_and_8(pcp, val) percpu_to_op(, "and", (pcp), val) +#define raw_cpu_or_8(pcp, val) percpu_to_op(, "or", (pcp), val) +#define raw_cpu_add_return_8(pcp, val) percpu_add_return_op(, pcp, val) +#define raw_cpu_xchg_8(pcp, nval) raw_percpu_xchg_op(pcp, nval) +#define raw_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(, pcp, oval, nval) + +#define this_cpu_read_8(pcp) percpu_from_op(volatile, "mov", pcp) +#define this_cpu_write_8(pcp, val) percpu_to_op(volatile, "mov", (pcp), val) +#define this_cpu_add_8(pcp, val) percpu_add_op(volatile, (pcp), val) +#define this_cpu_and_8(pcp, val) percpu_to_op(volatile, "and", (pcp), val) +#define this_cpu_or_8(pcp, val) percpu_to_op(volatile, "or", (pcp), val) +#define this_cpu_add_return_8(pcp, val) percpu_add_return_op(volatile, pcp, val) +#define this_cpu_xchg_8(pcp, nval) percpu_xchg_op(volatile, pcp, nval) +#define this_cpu_cmpxchg_8(pcp, oval, nval) percpu_cmpxchg_op(volatile, pcp, oval, nval) /* * Pretty complex macro to generate cmpxchg16 instruction. The instruction diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h index da545df207b2..0d3fe060a44f 100644 --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -162,7 +162,8 @@ __visible void smp_call_function_single_interrupt(struct pt_regs *r); * from the initial startup. We map APIC_BASE very early in page_setup(), * so this is correct in the x86 case. */ -#define raw_smp_processor_id() (this_cpu_read(cpu_number)) +#define raw_smp_processor_id() this_cpu_read(cpu_number) +#define __smp_processor_id() __this_cpu_read(cpu_number) #ifdef CONFIG_X86_32 extern int safe_smp_processor_id(void); diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h index 880b5515b1d6..d83e9f771d86 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -18,6 +18,20 @@ static inline void apply_paravirt(struct paravirt_patch_site *start, #define __parainstructions_end NULL #endif +/* + * Currently, the max observed size in the kernel code is + * JUMP_LABEL_NOP_SIZE/RELATIVEJUMP_SIZE, which are 5. + * Raise it if needed. + */ +#define POKE_MAX_OPCODE_SIZE 5 + +struct text_poke_loc { + void *detour; + void *addr; + size_t len; + const char opcode[POKE_MAX_OPCODE_SIZE]; +}; + extern void text_poke_early(void *addr, const void *opcode, size_t len); /* @@ -38,6 +52,7 @@ extern void *text_poke(void *addr, const void *opcode, size_t len); extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len); extern int poke_int3_handler(struct pt_regs *regs); extern void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler); +extern void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries); extern int after_bootmem; extern __ro_after_init struct mm_struct *poking_mm; extern __ro_after_init unsigned long poking_addr; diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 390596b761e3..bd542f9b0953 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -14,6 +14,7 @@ #include <linux/kdebug.h> #include <linux/kprobes.h> #include <linux/mmu_context.h> +#include <linux/bsearch.h> #include <asm/text-patching.h> #include <asm/alternative.h> #include <asm/sections.h> @@ -848,81 +849,133 @@ static void do_sync_core(void *info) sync_core(); } -static bool bp_patching_in_progress; -static void *bp_int3_handler, *bp_int3_addr; +static struct bp_patching_desc { + struct text_poke_loc *vec; + int nr_entries; +} bp_patching; + +static int patch_cmp(const void *key, const void *elt) +{ + struct text_poke_loc *tp = (struct text_poke_loc *) elt; + + if (key < tp->addr) + return -1; + if (key > tp->addr) + return 1; + return 0; +} +NOKPROBE_SYMBOL(patch_cmp); int poke_int3_handler(struct pt_regs *regs) { + struct text_poke_loc *tp; + unsigned char int3 = 0xcc; + void *ip; + /* * Having observed our INT3 instruction, we now must observe - * bp_patching_in_progress. + * bp_patching.nr_entries. * - * in_progress = TRUE INT3 + * nr_entries != 0 INT3 * WMB RMB - * write INT3 if (in_progress) + * write INT3 if (nr_entries) * - * Idem for bp_int3_handler. + * Idem for other elements in bp_patching. */ smp_rmb(); - if (likely(!bp_patching_in_progress)) + if (likely(!bp_patching.nr_entries)) return 0; - if (user_mode(regs) || regs->ip != (unsigned long)bp_int3_addr) + if (user_mode(regs)) return 0; - /* set up the specified breakpoint handler */ - regs->ip = (unsigned long) bp_int3_handler; + /* + * Discount the sizeof(int3). See text_poke_bp_batch(). + */ + ip = (void *) regs->ip - sizeof(int3); + + /* + * Skip the binary search if there is a single member in the vector. + */ + if (unlikely(bp_patching.nr_entries > 1)) { + tp = bsearch(ip, bp_patching.vec, bp_patching.nr_entries, + sizeof(struct text_poke_loc), + patch_cmp); + if (!tp) + return 0; + } else { + tp = bp_patching.vec; + if (tp->addr != ip) + return 0; + } + + /* set up the specified breakpoint detour */ + regs->ip = (unsigned long) tp->detour; return 1; } NOKPROBE_SYMBOL(poke_int3_handler); /** - * text_poke_bp() -- update instructions on live kernel on SMP - * @addr: address to patch - * @opcode: opcode of new instruction - * @len: length to copy - * @handler: address to jump to when the temporary breakpoint is hit + * text_poke_bp_batch() -- update instructions on live kernel on SMP + * @tp: vector of instructions to patch + * @nr_entries: number of entries in the vector * * Modify multi-byte instruction by using int3 breakpoint on SMP. * We completely avoid stop_machine() here, and achieve the * synchronization using int3 breakpoint. * * The way it is done: - * - add a int3 trap to the address that will be patched + * - For each entry in the vector: + * - add a int3 trap to the address that will be patched * - sync cores - * - update all but the first byte of the patched range + * - For each entry in the vector: + * - update all but the first byte of the patched range * - sync cores - * - replace the first byte (int3) by the first byte of - * replacing opcode + * - For each entry in the vector: + * - replace the first byte (int3) by the first byte of + * replacing opcode * - sync cores */ -void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler) +void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries) { + int patched_all_but_first = 0; unsigned char int3 = 0xcc; - - bp_int3_handler = handler; - bp_int3_addr = (u8 *)addr + sizeof(int3); - bp_patching_in_progress = true; + unsigned int i; lockdep_assert_held(&text_mutex); + bp_patching.vec = tp; + bp_patching.nr_entries = nr_entries; + /* * Corresponding read barrier in int3 notifier for making sure the - * in_progress and handler are correctly ordered wrt. patching. + * nr_entries and handler are correctly ordered wrt. patching. */ smp_wmb(); - text_poke(addr, &int3, sizeof(int3)); + /* + * First step: add a int3 trap to the address that will be patched. + */ + for (i = 0; i < nr_entries; i++) + text_poke(tp[i].addr, &int3, sizeof(int3)); on_each_cpu(do_sync_core, NULL, 1); - if (len - sizeof(int3) > 0) { - /* patch all but the first byte */ - text_poke((char *)addr + sizeof(int3), - (const char *) opcode + sizeof(int3), - len - sizeof(int3)); + /* + * Second step: update all but the first byte of the patched range. + */ + for (i = 0; i < nr_entries; i++) { + if (tp[i].len - sizeof(int3) > 0) { + text_poke((char *)tp[i].addr + sizeof(int3), + (const char *)tp[i].opcode + sizeof(int3), + tp[i].len - sizeof(int3)); + patched_all_but_first++; + } + } + + if (patched_all_but_first) { /* * According to Intel, this core syncing is very likely * not necessary and we'd be safe even without it. But @@ -931,14 +984,47 @@ void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler) on_each_cpu(do_sync_core, NULL, 1); } - /* patch the first byte */ - text_poke(addr, opcode, sizeof(int3)); + /* + * Third step: replace the first byte (int3) by the first byte of + * replacing opcode. + */ + for (i = 0; i < nr_entries; i++) + text_poke(tp[i].addr, tp[i].opcode, sizeof(int3)); on_each_cpu(do_sync_core, NULL, 1); /* * sync_core() implies an smp_mb() and orders this store against * the writing of the new instruction. */ - bp_patching_in_progress = false; + bp_patching.vec = NULL; + bp_patching.nr_entries = 0; } +/** + * text_poke_bp() -- update instructions on live kernel on SMP + * @addr: address to patch + * @opcode: opcode of new instruction + * @len: length to copy + * @handler: address to jump to when the temporary breakpoint is hit + * + * Update a single instruction with the vector in the stack, avoiding + * dynamically allocated memory. This function should be used when it is + * not possible to allocate memory. + */ +void text_poke_bp(void *addr, const void *opcode, size_t len, void *handler) +{ + struct text_poke_loc tp = { + .detour = handler, + .addr = addr, + .len = len, + }; + + if (len > POKE_MAX_OPCODE_SIZE) { + WARN_ONCE(1, "len is larger than %d\n", POKE_MAX_OPCODE_SIZE); + return; + } + + memcpy((void *)tp.opcode, opcode, len); + + text_poke_bp_batch(&tp, 1); +} diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c index e631c358f7f4..044053235302 100644 --- a/arch/x86/kernel/jump_label.c +++ b/arch/x86/kernel/jump_label.c @@ -35,41 +35,43 @@ static void bug_at(unsigned char *ip, int line) BUG(); } -static void __ref __jump_label_transform(struct jump_entry *entry, - enum jump_label_type type, - int init) +static void __jump_label_set_jump_code(struct jump_entry *entry, + enum jump_label_type type, + union jump_code_union *code, + int init) { - union jump_code_union jmp; const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP }; const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5]; - const void *expect, *code; + const void *expect; int line; - jmp.jump = 0xe9; - jmp.offset = jump_entry_target(entry) - - (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE); + code->jump = 0xe9; + code->offset = jump_entry_target(entry) - + (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE); - if (type == JUMP_LABEL_JMP) { - if (init) { - expect = default_nop; line = __LINE__; - } else { - expect = ideal_nop; line = __LINE__; - } - - code = &jmp.code; + if (init) { + expect = default_nop; line = __LINE__; + } else if (type == JUMP_LABEL_JMP) { + expect = ideal_nop; line = __LINE__; } else { - if (init) { - expect = default_nop; line = __LINE__; - } else { - expect = &jmp.code; line = __LINE__; - } - - code = ideal_nop; + expect = code->code; line = __LINE__; } if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE)) bug_at((void *)jump_entry_code(entry), line); + if (type == JUMP_LABEL_NOP) + memcpy(code, ideal_nop, JUMP_LABEL_NOP_SIZE); +} + +static void __ref __jump_label_transform(struct jump_entry *entry, + enum jump_label_type type, + int init) +{ + union jump_code_union code; + + __jump_label_set_jump_code(entry, type, &code, init); + /* * As long as only a single processor is running and the code is still * not marked as RO, text_poke_early() can be used; Checking that @@ -82,12 +84,12 @@ static void __ref __jump_label_transform(struct jump_entry *entry, * always nop being the 'currently valid' instruction */ if (init || system_state == SYSTEM_BOOTING) { - text_poke_early((void *)jump_entry_code(entry), code, + text_poke_early((void *)jump_entry_code(entry), &code, JUMP_LABEL_NOP_SIZE); return; } - text_poke_bp((void *)jump_entry_code(entry), code, JUMP_LABEL_NOP_SIZE, + text_poke_bp((void *)jump_entry_code(entry), &code, JUMP_LABEL_NOP_SIZE, (void *)jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE); } @@ -99,6 +101,75 @@ void arch_jump_label_transform(struct jump_entry *entry, mutex_unlock(&text_mutex); } +#define TP_VEC_MAX (PAGE_SIZE / sizeof(struct text_poke_loc)) +static struct text_poke_loc tp_vec[TP_VEC_MAX]; +static int tp_vec_nr; + +bool arch_jump_label_transform_queue(struct jump_entry *entry, + enum jump_label_type type) +{ + struct text_poke_loc *tp; + void *entry_code; + + if (system_state == SYSTEM_BOOTING) { + /* + * Fallback to the non-batching mode. + */ + arch_jump_label_transform(entry, type); + return true; + } + + /* + * No more space in the vector, tell upper layer to apply + * the queue before continuing. + */ + if (tp_vec_nr == TP_VEC_MAX) + return false; + + tp = &tp_vec[tp_vec_nr]; + + entry_code = (void *)jump_entry_code(entry); + + /* + * The INT3 handler will do a bsearch in the queue, so we need entries + * to be sorted. We can survive an unsorted list by rejecting the entry, + * forcing the generic jump_label code to apply the queue. Warning once, + * to raise the attention to the case of an unsorted entry that is + * better not happen, because, in the worst case we will perform in the + * same way as we do without batching - with some more overhead. + */ + if (tp_vec_nr > 0) { + int prev = tp_vec_nr - 1; + struct text_poke_loc *prev_tp = &tp_vec[prev]; + + if (WARN_ON_ONCE(prev_tp->addr > entry_code)) + return false; + } + + __jump_label_set_jump_code(entry, type, + (union jump_code_union *) &tp->opcode, 0); + + tp->addr = entry_code; + tp->detour = entry_code + JUMP_LABEL_NOP_SIZE; + tp->len = JUMP_LABEL_NOP_SIZE; + + tp_vec_nr++; + + return true; +} + +void arch_jump_label_transform_apply(void) +{ + if (!tp_vec_nr) + return; + + mutex_lock(&text_mutex); + text_poke_bp_batch(tp_vec, tp_vec_nr); + mutex_unlock(&text_mutex); + + tp_vec_nr = 0; +} + static enum { JL_STATE_START, JL_STATE_NO_UPDATE, |