bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86/Atomics] (Semantically) revert G246098, switch back to the old atomic ↵	Philip Reames	2019-11-05	1	-44/+158
	example
	When writing an email for a follow-up proposal, I realized one of the diffs in the committed change was incorrect. Digging into it revealed that the fix is complicated enough to require some thought, so reverting in the meantime.
	The problem is visible in this diff (from the revert):
	 ; X64-SSE-LABEL: store_fp128:
	 ; X64-SSE: # %bb.0:
	-; X64-SSE-NEXT: movaps %xmm0, (%rdi)
	+; X64-SSE-NEXT: subq $24, %rsp
	+; X64-SSE-NEXT: .cfi_def_cfa_offset 32
	+; X64-SSE-NEXT: movaps %xmm0, (%rsp)
	+; X64-SSE-NEXT: movq (%rsp), %rsi
	+; X64-SSE-NEXT: movq {{[0-9]+}}(%rsp), %rdx
	+; X64-SSE-NEXT: callq __sync_lock_test_and_set_16
	+; X64-SSE-NEXT: addq $24, %rsp
	+; X64-SSE-NEXT: .cfi_def_cfa_offset 8
	 ; X64-SSE-NEXT: retq
	 store atomic fp128 %v, fp128* %fptr unordered, align 16
	 ret void
	The problem here is threefold:
	1) x86-64 doesn't guarantee atomicity of anything larger than 8 bytes. Some platforms observably break this guarantee, others don't, but the codegen isn't considering this, so it's wrong on at least some platforms.
	2) When I started to track down the problem, I discovered that DAGCombiner had stripped the atomicity off the store entirely. This comes down to idiomatic usage of DAG.getStore passing all MMO components separately as opposed to just passing the MMO.
	3) On x86 (not -64), there are cases where 8-byte atomicity is supported, but only for floating point operations. This would seem to imply that operation typing matters for correctness, and DAGCombine happily folds away bitcasts. I'm not 100% sure there's a problem here, but I'm not entirely sure there isn't either.
	I plan on returning to each issue in turn; sorry for the churn here.
*	[SelectionDAG] Enable lowering unordered atomics loads w/LoadSDNode (and ↵	Philip Reames	2019-10-29	1	-158/+44
	stores w/StoreSDNode) by default Enable the new SelectionDAG representation for unordered loads and stores introduced in r371441 by default. As a reminder, the new lowering changes the representation of an unordered atomic load from an AtomicSDNode - which is essentially a black box which gets passed through without combines messing with it - to a LoadSDNode w/an atomic marker on the MMO. The latter parallels the way we handle volatiles, and I've audited the code to ensure that every location which checks one checks the other. This has been fairly heavily fuzzed, and I examined diffs in a reasonably large corpus of assembly by hand, so I'm reasonably sure this is correct for the common case. Late in the review for this, it was discovered that I hadn't correctly handled cases which could be legalized into CAS operations. This points out that there's a strong bias in the IR of the frontend I'm working with towards only legal atomics. If there are problems with this patch, the most likely area will be legalization. Differential Revision: https://reviews.llvm.org/D69219
*	[X86] Enable fp128 as a legal type with SSE1 rather than with MMX.	Craig Topper	2019-09-02	1	-20/+54
	FP128 values are passed in xmm registers so should be associated with an SSE feature rather than MMX, which uses a different set of registers. llc enables sse1 and sse2 by default with x86_64, but does not enable mmx. Clang enables all 3 features by default. I've tried to add command lines to test with -sse where possible, but any test that returns a value in an xmm register fails with a fatal error with -sse since we have no defined ABI for that scenario. llvm-svn: 370682
*	[X86] Prefer locked stack op over mfence for seq_cst 64-bit stores on 32-bit ↵	Philip Reames	2019-05-14	1	-2/+2
	targets This is a follow-on to D58632, with the same logic. Given a memory operation which needs ordering, but doesn't need to modify any particular address, prefer to use a locked stack op over an mfence. Differential Revision: https://reviews.llvm.org/D61863 llvm-svn: 360649
*	[X86] Use MOVQ for i64 atomic_stores when SSE2 is enabled	Craig Topper	2019-04-27	1	-48/+128
	Summary: If we have SSE2 we can use a MOVQ to store 64 bits and avoid falling back to a cmpxchg8b loop. If it's a seq_cst store we need to insert an mfence after the store. Reviewers: spatel, RKSimon, reames, jfb, efriedma Reviewed By: RKSimon Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60546 llvm-svn: 359368
*	[X86] Add patterns for using movss/movsd for atomic load/store of f32/64. ↵	Craig Topper	2019-04-11	1	-30/+20
	Remove atomic fadd pseudos use isel patterns instead. This patch adds patterns for turning bitcasted atomic load/store into movss/sd. It also removes the pseudo instructions for atomic RMW fadd. Instead, it adds isel patterns for folding an atomic load into addss/sd and relies on the new movss/sd store pattern to handle the write part. This also makes the fadd patterns use VEX and EVEX instructions when AVX or AVX512F are enabled. Differential Revision: https://reviews.llvm.org/D60394 llvm-svn: 358215
*	Recommit r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit ↵	Craig Topper	2019-04-11	1	-76/+32
	targets with X87, but no SSE2" With correct test checks this time. If we have X87, but not SSE2 we can atomically load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integer and store it to a stack temporary. From there we can do two 32-bit loads to get the value into integer registers without worrying about atomicness. This matches what gcc and icc do for this case and removes an existing FIXME. llvm-svn: 358214
*	Revert r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit ↵	Craig Topper	2019-04-11	1	-16/+38
	targets with X87, but no SSE2" I seem to have messed up the test checks. llvm-svn: 358212
*	[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, ↵	Craig Topper	2019-04-11	1	-38/+16
	but no SSE2 If we have X87, but not SSE2 we can atomically load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integer and store it to a stack temporary. From there we can do two 32-bit loads to get the value into integer registers without worrying about atomicness. This matches what gcc and icc do for this case and removes an existing FIXME. Differential Revision: https://reviews.llvm.org/D60156 llvm-svn: 358211
*	[X86] Add SSE1 command line to atomic-fp.ll and atomic-non-integer.ll. NFC	Craig Topper	2019-04-10	1	-46/+125
	llvm-svn: 358141
*	[X86] Add avx and avx512f command lines to atomic-non-integer.ll. NFC	Craig Topper	2019-04-08	1	-146/+465
	llvm-svn: 357881
*	[X86] Use movq for i64 atomic load on 32-bit targets when sse2 is enabled	Craig Topper	2019-03-22	1	-42/+8
	We used a lock cmpxchg8b to do i64 atomic loads. But if we have SSE2 we can do better and use a plain movq to do the load instead. I tried to just use an f64 atomic load and add isel patterns to MOVSD (which the domain fixing pass can turn to MOVQ), but the atomic_load SDNode in TargetSelectionDAG.td requires the type to be integer. So I've emitted VZEXT_LOAD instead, which should be selected by isel to a MOVQ. Hopefully we don't need a specific atomic flavor of this. I kept the memory operand from the original AtomicSDNode. I wasn't sure whether I need to set the MOVolatile flag. I've left some FIXMEs for improvements we can do without SSE2. Differential Revision: https://reviews.llvm.org/D59679 llvm-svn: 356807
*	[X86] Add 32-bit command lines with and without SSE2 to ↵	Craig Topper	2019-03-22	1	-79/+432
	atomic-non-integer.ll. NFC llvm-svn: 356733
*	Allow code motion (and thus folding) for atomic (but unordered) memory operands	Philip Reames	2019-03-14	1	-6/+3
	Building on the work done in D57601, now that we can distinguish between atomic and volatile memory accesses, go ahead and allow code motion of unordered atomics. As seen in the diffs, this allows much better folding of memory operations into using instructions. (Mostly done by the PeepholeOpt pass.) Note: I have not reviewed all callers of hasOrderedMemoryRef since one of them - isSafeToMove - is very widely used. I'm relying on the documented semantics of each method to judge correctness. Differential Revision: https://reviews.llvm.org/D59345 llvm-svn: 356170
*	[X86] Remove RELEASE_ and ACQUIRE_ pseudo instructions. Use isel patterns ↵	Craig Topper	2018-08-03	1	-1/+1
	and the normal instructions instead At one point in time acquire implied mayLoad and mayStore, as did release. Thus we needed separate pseudos that also carried that property. This appears to no longer be the case. I believe it was changed in 2012 with a comment saying that atomic memory accesses are marked volatile, which preserves the ordering. So from what I can tell we shouldn't need additional pseudos since they don't carry any flags that are different from the normal instructions. The only thing I can think of is that we may consider them for load folding candidates in the peephole pass now where we didn't before. If that's important, hopefully there's something in the memory operand we can check to prevent the folding without relying on pseudo instructions. Differential Revision: https://reviews.llvm.org/D50212 llvm-svn: 338925
*	[X86] Autogenerate complete checks. NFC	Craig Topper	2018-08-03	1	-37/+83
	llvm-svn: 338802
*	[X86][SSE2] Fix asm string for movq (Move Quadword) instruction.	Ayman Musa	2017-04-26	1	-4/+4
	Replace "mov{d|q}" with "movq". Differential Revision: https://reviews.llvm.org/D32220 llvm-svn: 301386
*	CodeGen: check return types match when emitting tail call to builtin.	Tim Northover	2016-03-22	1	-1/+1
	We were just completely ignoring the types when determining whether we could safely emit a libcall as a tail call. This is clearly wrong. Theoretically, we could dig deeper looking for incidental matches (much like the generic code in Analysis.cpp does), but it's probably not worth it for the few libcalls that exist. llvm-svn: 264084
*	[IR] Add support for floating point atomic loads and stores	Philip Reames	2015-12-16	1	-0/+108
	This patch allows atomic loads and stores of floating point to be specified in the IR and adds an adapter to allow them to be lowered via existing backend support for the bitcast-to-equivalent-integer idiom. Previously, the only way to specify an atomic float operation was to bitcast the pointer to an i32, load the value as an i32, then bitcast to a float. At its most basic, this patch simply moves this expansion step to the point we start lowering to the backend. This patch does not add canonicalization rules to convert the bitcast idioms to the appropriate atomic loads. I plan to do that in the future, but for now, let's simply add the support. I'd like to get instruction selection working through at least one backend (x86-64) without the bitcast conversion before canonicalizing into this form. Similarly, I haven't yet added the target hooks to opt out of the lowering step I added to AtomicExpand. I figured it would make more sense to add those once at least one backend (x86) was ready to actually opt out. As you can see from the included tests, the generated code quality is not great. I plan on submitting some patches to fix this, but help from others along that line would be very welcome. I'm not super familiar with the backend and my ramp up time may be material. Differential Revision: http://reviews.llvm.org/D15471 llvm-svn: 255737
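The bitcast-to-equivalent-integer idiom described in the last entry can be sketched at the source level. This is only an illustration, not code from the LLVM tree: the helper names are hypothetical, it assumes a 32-bit IEEE-754 `float`, and it uses `std::memory_order_relaxed` as the nearest C++ stand-in for LLVM's `unordered` ordering, which has no exact C++ equivalent.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Assumption: float and the integer carrier have the same size.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: atomically load the 32-bit pattern, then
// reinterpret those bits as a float (the "bitcast" step).
inline float atomic_load_float(const std::atomic<std::uint32_t>& slot) {
    std::uint32_t bits = slot.load(std::memory_order_relaxed);
    float value;
    std::memcpy(&value, &bits, sizeof value);  // bit-for-bit copy, no conversion
    return value;
}

// Hypothetical helper: reinterpret the float as 32 bits, then
// atomically store the integer pattern.
inline void atomic_store_float(std::atomic<std::uint32_t>& slot, float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    slot.store(bits, std::memory_order_relaxed);
}
```

Only the integer access is atomic here; the float is moved in and out of the integer representation by a plain bit copy, which mirrors the load-as-i32-then-bitcast sequence the commit message describes.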