<feed xmlns='http://www.w3.org/2005/Atom'>
<title>talos-obmc-linux/kernel/bpf, branch dev-4.10</title>
<subtitle>Talos™ II Linux sources for OpenBMC</subtitle>
<id>https://git.raptorcs.com/git/talos-obmc-linux/atom?h=dev-4.10</id>
<link rel='self' href='https://git.raptorcs.com/git/talos-obmc-linux/atom?h=dev-4.10'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/'/>
<updated>2017-05-14T12:08:29+00:00</updated>
<entry>
<title>bpf: don't let ldimm64 leak map addresses on unprivileged</title>
<updated>2017-05-14T12:08:29+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2017-05-07T22:04:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=c35107a3bec206f68402236a16be537b6ffce070'/>
<id>urn:sha1:c35107a3bec206f68402236a16be537b6ffce070</id>
<content type='text'>
[ Upstream commit 0d0e57697f162da4aa218b5feafe614fb666db07 ]

The patch fixes two things at once:

1) It checks the env-&gt;allow_ptr_leaks and only prints the map address to
   the log if we have the privileges to do so, otherwise it just dumps 0
   as we would when kptr_restrict is enabled on %pK. Given the latter is
   off by default and not every distro sets it, I don't want to rely on
   this, hence the 0 by default for unprivileged.

2) Printing of ldimm64 in the verifier log is currently broken in that
   we don't print the full immediate, but only the 32 bit part of the
   first insn part for ldimm64. Thus, fix this up as well; it's okay to
   access, since we verified all ldimm64 earlier already (including just
   constants) through replace_map_fd_with_map_ptr().

Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs")
Fixes: cbd357008604 ("bpf: verifier (add ability to receive verification log)")
Reported-by: Jann Horn &lt;jannh@google.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: enhance verifier to understand stack pointer arithmetic</title>
<updated>2017-05-14T12:08:28+00:00</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2017-04-30T05:52:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=b3468d7ea8c91bfeefa424e90d933869c5924172'/>
<id>urn:sha1:b3468d7ea8c91bfeefa424e90d933869c5924172</id>
<content type='text'>
[ Upstream commit 332270fdc8b6fba07d059a9ad44df9e1a2ad4529 ]

llvm 4.0 and above generates the code like below:
....
440: (b7) r1 = 15
441: (05) goto pc+73
515: (79) r6 = *(u64 *)(r10 -152)
516: (bf) r7 = r10
517: (07) r7 += -112
518: (bf) r2 = r7
519: (0f) r2 += r1
520: (71) r1 = *(u8 *)(r8 +0)
521: (73) *(u8 *)(r2 +45) = r1
....
and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
This is because verifier marks register r2 as unknown value after #519
where r2 is a stack pointer and r1 holds a constant value.

Teach verifier to recognize "stack_ptr + imm" and
"stack_ptr + reg with const val" as valid stack_ptr with new offset.

Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: improve verifier packet range checks</title>
<updated>2017-05-03T15:37:38+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@fb.com</email>
</author>
<published>2017-03-24T22:57:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=4f45e887a6326fce709828eb229e17486b68b0fc'/>
<id>urn:sha1:4f45e887a6326fce709828eb229e17486b68b0fc</id>
<content type='text'>
[ Upstream commit b1977682a3858b5584ffea7cfb7bd863f68db18d ]

llvm can optimize the 'if (ptr &gt; data_end)' checks to be in the order
slightly different than the original C code which will confuse verifier.
Like:
if (ptr + 16 &gt; data_end)
  return TC_ACT_SHOT;
// may be followed by
if (ptr + 14 &gt; data_end)
  return TC_ACT_SHOT;
while llvm can see that 'ptr' is valid for all 16 bytes,
the verifier could not.
Fix verifier logic to account for such case and add a test.

Reported-by: Huapeng Zhou &lt;hzhou@fb.com&gt;
Fixes: 969bf05eb3ce ("bpf: direct packet access")
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: introduce BPF_F_ALLOW_OVERRIDE flag</title>
<updated>2017-02-13T02:52:19+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@fb.com</email>
</author>
<published>2017-02-11T04:28:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=7f677633379b4abb3281cdbe7e7006f049305c03'/>
<id>urn:sha1:7f677633379b4abb3281cdbe7e7006f049305c03</id>
<content type='text'>
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.

Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X

2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.

3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.

In the future this behavior may be extended with a chain of
non-overridable programs.

Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.

Add several testcases and adjust libbpf.

Fixes: 3007098494be ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Daniel Mack &lt;daniel@zonque.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: don't trigger OOM killer under pressure with map alloc</title>
<updated>2017-01-18T22:12:26+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2017-01-18T14:14:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=d407bd25a204bd66b7346dde24bd3d37ef0e0b05'/>
<id>urn:sha1:d407bd25a204bd66b7346dde24bd3d37ef0e0b05</id>
<content type='text'>
This patch adds two helpers, bpf_map_area_alloc() and bpf_map_area_free(),
that are to be used for map allocations. Using kmalloc() for very large
allocations can cause excessive work within the page allocator, so i) fall
back earlier to vmalloc() when the attempt is considered costly anyway,
and even more importantly ii) don't trigger OOM killer with any of the
allocators.

Since this is based on a user space request, for example, when creating
maps with element pre-allocation, we really want such requests to fail
instead of killing other user space processes.

Also, don't spam the kernel log with warnings should any of the allocations
fail under pressure. Given that, we can make backend selection in
bpf_map_area_alloc() generic, and convert all maps over to use this API
for spots with potentially large allocation requests.

Note, replacing the one kmalloc_array() is fine as overflow checks happen
earlier in htab_map_alloc(), since it must also protect the multiplication
for vmalloc() should kmalloc_array() fail.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: rework prog_digest into prog_tag</title>
<updated>2017-01-16T19:03:31+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2017-01-13T22:38:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=f1f7714ea51c56b7163fb1a5acf39c6a204dd758'/>
<id>urn:sha1:f1f7714ea51c56b7163fb1a5acf39c6a204dd758</id>
<content type='text'>
Commit 7bd509e311f4 ("bpf: add prog_digest and expose it via
fdinfo/netlink") was recently discussed, partially due to
admittedly suboptimal name of "prog_digest" in combination
with sha1 hash usage, thus inevitably and rightfully concerns
about its security in terms of collision resistance were
raised with regards to use-cases.

The intended use cases are for debugging resp. introspection
only for providing a stable "tag" over the instruction sequence
that both kernel and user space can calculate independently.
It's not usable at all for making a security relevant decision.
So collisions where two different instruction sequences generate
the same tag can happen, but ideally at a rather low rate. The
"tag" will be dumped in hex and is short enough to introspect
in tracepoints or kallsyms output along with other data such
as stack trace, etc. Thus, this patch performs a rename into
prog_tag and truncates the tag to a short output (64 bits) to
make it obvious it's not collision-free.

Should in future a hash or facility be needed with a security
relevant focus, then we can think about requirements, constraints,
etc that would fit to that situation. For now, rework the exposed
parts for the current use cases as long as nothing has been
released yet. Tested on x86_64 and s390x.

Fixes: 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: do not use KMALLOC_SHIFT_MAX</title>
<updated>2017-01-11T02:31:54+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2017-01-11T00:57:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=7984c27c2c5cd3298de8afdba3e1bd46f884e934'/>
<id>urn:sha1:7984c27c2c5cd3298de8afdba3e1bd46f884e934</id>
<content type='text'>
Commit 01b3f52157ff ("bpf: fix allocation warnings in bpf maps and
integer overflow") has added checks for the maximum allocateable size.
It (ab)used KMALLOC_SHIFT_MAX for that purpose.

While this is not incorrect it is not very clean because we already have
KMALLOC_MAX_SIZE for this very reason so let's change both checks to use
KMALLOC_MAX_SIZE instead.

The original motivation for using KMALLOC_SHIFT_MAX was to work around
an incorrect KMALLOC_MAX_SIZE which could lead to allocation warnings
but it is no longer needed since "slab: make sure that KMALLOC_MAX_SIZE
will fit into MAX_ORDER".

Link: http://lkml.kernel.org/r/20161220130659.16461-3-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>bpf: fix mark_reg_unknown_value for spilled regs on map value marking</title>
<updated>2016-12-18T02:27:44+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-12-18T00:52:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=6760bf2ddde8ad64f8205a651223a93de3a35494'/>
<id>urn:sha1:6760bf2ddde8ad64f8205a651223a93de3a35494</id>
<content type='text'>
Martin reported a verifier issue that hit the BUG_ON() for his
test case in the mark_reg_unknown_value() function:

  [  202.861380] kernel BUG at kernel/bpf/verifier.c:467!
  [...]
  [  203.291109] Call Trace:
  [  203.296501]  [&lt;ffffffff811364d5&gt;] mark_map_reg+0x45/0x50
  [  203.308225]  [&lt;ffffffff81136558&gt;] mark_map_regs+0x78/0x90
  [  203.320140]  [&lt;ffffffff8113938d&gt;] do_check+0x226d/0x2c90
  [  203.331865]  [&lt;ffffffff8113a6ab&gt;] bpf_check+0x48b/0x780
  [  203.343403]  [&lt;ffffffff81134c8e&gt;] bpf_prog_load+0x27e/0x440
  [  203.355705]  [&lt;ffffffff8118a38f&gt;] ? handle_mm_fault+0x11af/0x1230
  [  203.369158]  [&lt;ffffffff812d8188&gt;] ? security_capable+0x48/0x60
  [  203.382035]  [&lt;ffffffff811351a4&gt;] SyS_bpf+0x124/0x960
  [  203.393185]  [&lt;ffffffff810515f6&gt;] ? __do_page_fault+0x276/0x490
  [  203.406258]  [&lt;ffffffff816db320&gt;] entry_SYSCALL_64_fastpath+0x13/0x94

This issue got uncovered after the fix in a08dd0da5307 ("bpf: fix
regression on verifier pruning wrt map lookups"). The reason why it
wasn't noticed before was, because as mentioned in a08dd0da5307,
mark_map_regs() was doing the id matching incorrectly based on the
uncached regs[regno].id. So, in the first loop, we walked all regs
and as soon as we found regno == i, then this reg's id was cleared
when calling mark_reg_unknown_value() thus that every subsequent
register was probed against id of 0 (which, in combination with the
PTR_TO_MAP_VALUE_OR_NULL type is an invalid condition that no other
register state can hold), and therefore wasn't type transitioned such
as in the spilled register case for the second loop.

Now since that got fixed, it turned out that 57a09bf0a416 ("bpf:
Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") used
mark_reg_unknown_value() incorrectly for the spilled regs, and thus
hitting the BUG_ON() in some cases due to regno &gt;= MAX_BPF_REG.

Although spilled regs have the same type as the non-spilled regs
for the verifier state, that is, struct bpf_reg_state, they are
semantically different from the non-spilled regs. In other words,
there can be up to 64 (MAX_BPF_STACK / BPF_REG_SIZE) spilled regs
in the stack, for example, register R&lt;x&gt; could have been spilled by
the program to stack location X, Y, Z, and in mark_map_regs() we
need to scan these stack slots of type STACK_SPILL for potential
registers that we have to transition from PTR_TO_MAP_VALUE_OR_NULL.
Therefore, depending on the location, the spilled_regs regno can
be a lot higher than just MAX_BPF_REG's value since we operate on
stack instead. The reset in mark_reg_unknown_value() itself is
just fine, only that the BUG_ON() was inappropriate for this. Fix
it by making a __mark_reg_unknown_value() version that can be
called from mark_map_reg() generically; we know for the non-spilled
case that the regno is always &lt; MAX_BPF_REG anyway.

Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Reported-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix overflow in prog accounting</title>
<updated>2016-12-18T02:27:44+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-12-18T00:52:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=5ccb071e97fbd9ffe623a0d3977cc6d013bee93c'/>
<id>urn:sha1:5ccb071e97fbd9ffe623a0d3977cc6d013bee93c</id>
<content type='text'>
Commit aaac3ba95e4c ("bpf: charge user for creation of BPF maps and
programs") made a wrong assumption of charging against prog-&gt;pages.
Unlike map-&gt;pages, prog-&gt;pages are still subject to change when we
need to expand the program through bpf_prog_realloc().

This can for example happen during verification stage when we need to
expand and rewrite parts of the program. Should the required space
cross a page boundary, then prog-&gt;pages is not the same anymore as
its original value that we used to bpf_prog_charge_memlock() on. Thus,
we'll hit a wrap-around during bpf_prog_uncharge_memlock() when prog
is freed eventually. I noticed this that despite having unlimited
memlock, programs suddenly refused to load with EPERM error due to
insufficient memlock.

There are two ways to fix this issue. One would be to add a cached
variable to struct bpf_prog that takes a snapshot of prog-&gt;pages at the
time of charging. The other approach is to also account for resizes. I
chose to go with the latter for a couple of reasons: i) We want accounting
rather to be more accurate instead of further fooling limits, ii) adding
yet another page counter on struct bpf_prog would also be a waste just
for this purpose. We also do want to charge as early as possible to
avoid going into the verifier just to find out later on that we crossed
limits. The only place that needs to be fixed is bpf_prog_realloc(),
since only here we expand the program, so we try to account for the
needed delta and should we fail, call-sites check for outcome anyway.
On cBPF to eBPF migrations, we don't grab a reference to the user as
they are charged differently. With that in place, my test case worked
fine.

Fixes: aaac3ba95e4c ("bpf: charge user for creation of BPF maps and programs")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: dynamically allocate digest scratch buffer</title>
<updated>2016-12-18T02:27:44+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2016-12-18T00:52:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/talos-obmc-linux/commit/?id=aafe6ae9cee32df85eb5e8bb6dd1d918e6807b09'/>
<id>urn:sha1:aafe6ae9cee32df85eb5e8bb6dd1d918e6807b09</id>
<content type='text'>
Geert rightfully complained that 7bd509e311f4 ("bpf: add prog_digest
and expose it via fdinfo/netlink") added a too large allocation of
variable 'raw' from bss section, and should instead be done dynamically:

  # ./scripts/bloat-o-meter kernel/bpf/core.o.1 kernel/bpf/core.o.2
  add/remove: 3/0 grow/shrink: 0/0 up/down: 33291/0 (33291)
  function                                     old     new   delta
  raw                                            -   32832  +32832
  [...]

Since this is only relevant during program creation path, which can be
considered slow-path anyway, lets allocate that dynamically and be not
implicitly dependent on verifier mutex. Move bpf_prog_calc_digest() at
the beginning of replace_map_fd_with_map_ptr() and also error handling
stays straight forward.

Reported-by: Geert Uytterhoeven &lt;geert@linux-m68k.org&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
