bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Add a quick hack so clang does not use mul type instructions on mips.meklort-10.0.1	Evan Lojewski	2020-08-01	1	-15/+36
\|
*	[PPCAsmPrinter] support 'L' output template for memory operands	Nick Desaulniers	2020-06-25	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: L is meant to support the second word used by 32b calling conventions for 64b arguments. This is required for build 32b PowerPC Linux kernels after upstream commit 334710b1496a ("powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'") Thanks for the report from @nathanchance, and reference to GCC's implementation from @segher. Fixes: pr/46186 Fixes: https://github.com/ClangBuiltLinux/linux/issues/1044 Reviewers: echristo, hfinkel, MaskRay Reviewed By: MaskRay Subscribers: MaskRay, wuzish, nemanjai, hiraditya, kbarton, steven.zhang, llvm-commits, segher, nathanchance, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D81767 (cherry picked from commit 2d8e105db6bea10a6b96e4a094e73a87987ef909)
*	[AArch64] Change AArch64 Windows EH UnwindHelp object to be a fixed object	Daniel Frampton	2020-06-25	1	-15/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The UnwindHelp object is used during exception handling by runtime code. It must be findable from a fixed offset from FP. This change allocates the UnwindHelp object as a fixed object (as is done for x86_64) to ensure that both the generated code and runtime agree on the location of the object. Fixes https://bugs.llvm.org/show_bug.cgi?id=45346 Differential Revision: https://reviews.llvm.org/D77016 (cherry picked from commit 494abe139a9aab991582f1b3f3370b99b252944c)
*	[AArch64] Fix mismatch in prologue and epilogue for funclets on Windows	Daniel Frampton	2020-06-25	1	-28/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The generated code for a funclet can have an add to sp in the epilogue for which there is no corresponding sub in the prologue. This patch removes the early return from emitPrologue that was preventing the sub to sp, and instead conditionalizes the appropriate parts of the rest of the function. Fixes https://bugs.llvm.org/show_bug.cgi?id=45345 Differential Revision: https://reviews.llvm.org/D77015 (cherry picked from commit 522b4c4b88a5606b0074926e8658e7fede97c230)
*	[RISCV] Fix incorrect FP base CFI offset for variable argument functions	Shiva Chen	2020-06-25	1	-2/+2
\| \| \| \| \| \| \| \|	When the FP exists, the FP base CFI directive offset should take the size of variable arguments into account. Differential Revision: https://reviews.llvm.org/D73862 (cherry picked from commit 64f417200e1020305f28f3c1e40691585f50f6ad)
*	[RISCV64] Emit correct lib call for fp(float/double) to ui/si	Kamlesh Kumar	2020-06-25	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \|	Since i32 is not legal in riscv64, it always promoted to i64 before emitting lib call and for conversions like float/double to int and float/double to unsigned int wrong lib call was emitted. This commit fix it using custom lowering. Differential Revision: https://reviews.llvm.org/D80526 (cherry picked from commit 7622ea5835f0381a426e504f4c03f11733732b83)
*	[X86] Add an Unoptimized Load Value Injection (LVI) Load Hardening Pass	Scott Constable	2020-06-24	3	-1/+82
\| \| \| \| \| \| \| \| \| \|	@nikic raised an issue on D75936 that the added complexity to the O0 pipeline was causing noticeable slowdowns for `-O0` builds. This patch addresses the issue by adding a pass with equal security properties, but without any optimizations (and more importantly, without the need for expensive analysis dependencies). Reviewers: nikic, craig.topper, mattdr Reviewed By: craig.topper, mattdr Differential Revision: https://reviews.llvm.org/D80964
*	[X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)	Scott Constable	2020-06-24	1	-3/+306
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After finding all such gadgets in a given function, the pass minimally inserts LFENCE instructions in such a manner that the following property is satisfied: for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at least one LFENCE instruction. The algorithm that implements this minimal insertion is influenced by an academic paper that minimally inserts memory fences for high-performance concurrent programs: http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf The algorithm implemented in this pass is as follows: 1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components: -SOURCE instructions (also includes function arguments) -SINK instructions -Basic block entry points -Basic block terminators -LFENCE instructions 2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6. 3. Use a heuristic or plugin to approximate minimal LFENCE insertion. 4. Insert one LFENCE along each CFG edge that was cut in step 3. 5. Go to step 2. 6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified. By default, the heuristic used in Step 3 is a greedy heuristic that avoids inserting LFENCEs into loops unless absolutely necessary. There is also a CLI option to load a plugin that can provide even better optimization, inserting fewer fences, while still mitigating all of the LVI gadgets. The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin, and a description of the pass's behavior with the plugin can be found here: https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection. Differential Revision: https://reviews.llvm.org/D75937
*	[X86] Add a Pass that builds a Condensed CFG for Load Value Injection (LVI) ↵	Scott Constable	2020-06-24	7	-0/+984
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gadgets Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph. More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK). Also adds a new target feature to X86: +lvi-load-hardening The feature can be added via the clang CLI using -mlvi-hardening. Differential Revision: https://reviews.llvm.org/D75936
*	[X86] Fix to X86LoadValueInjectionRetHardeningPass for possible segfault	Scott Constable	2020-06-24	1	-0/+3
\| \| \| \| \| \|	`MBB.back()` could segfault if `MBB.empty()`. Fixed by checking for `MBB.empty()` in the loop. Differential Revision: https://reviews.llvm.org/D77584
*	Revert "[X86] Add a Pass that builds a Condensed CFG for Load Value ↵	Craig Topper	2020-06-24	7	-1035/+0
\| \| \| \| \| \| \| \| \|	Injection (LVI) Gadgets" This reverts commit c74dd640fd740c6928f66a39c7c15a014af3f66f. Reverting to address coding standard issues raised in post-commit review.
*	Revert "[X86] Add Support for Load Hardening to Mitigate Load Value ↵	Craig Topper	2020-06-24	1	-277/+5
\| \| \| \| \| \| \| \| \|	Injection (LVI)" This reverts commit 62c42e29ba43c9d79cd5bd2084b641fbff6a96d5 Reverting to address coding standard issues raised in post-commit review.
*	[X86] Add Support for Load Hardening to Mitigate Load Value Injection (LVI)	Scott Constable	2020-06-24	1	-5/+277
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After finding all such gadgets in a given function, the pass minimally inserts LFENCE instructions in such a manner that the following property is satisfied: for all SOURCE+SINK pairs, all paths in the CFG from SOURCE to SINK contain at least one LFENCE instruction. The algorithm that implements this minimal insertion is influenced by an academic paper that minimally inserts memory fences for high-performance concurrent programs: http://www.cs.ucr.edu/~lesani/companion/oopsla15/OOPSLA15.pdf The algorithm implemented in this pass is as follows: 1. Build a condensed CFG (i.e., a GadgetGraph) consisting only of the following components: -SOURCE instructions (also includes function arguments) -SINK instructions -Basic block entry points -Basic block terminators -LFENCE instructions 2. Analyze the GadgetGraph to determine which SOURCE+SINK pairs (i.e., gadgets) are already mitigated by existing LFENCEs. If all gadgets have been mitigated, go to step 6. 3. Use a heuristic or plugin to approximate minimal LFENCE insertion. 4. Insert one LFENCE along each CFG edge that was cut in step 3. 5. Go to step 2. 6. If any LFENCEs were inserted, return true from runOnFunction() to tell LLVM that the function was modified. By default, the heuristic used in Step 3 is a greedy heuristic that avoids inserting LFENCEs into loops unless absolutely necessary. There is also a CLI option to load a plugin that can provide even better optimization, inserting fewer fences, while still mitigating all of the LVI gadgets. The plugin can be found here: https://github.com/intel/lvi-llvm-optimization-plugin, and a description of the pass's behavior with the plugin can be found here: https://software.intel.com/security-software-guidance/insights/optimized-mitigation-approach-load-value-injection. Differential Revision: https://reviews.llvm.org/D75937
*	[X86] Add a Pass that builds a Condensed CFG for Load Value Injection (LVI) ↵	Scott Constable	2020-06-24	7	-0/+1035
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gadgets Adds a new data structure, ImmutableGraph, and uses RDF to find LVI gadgets and add them to a MachineGadgetGraph. More specifically, a new X86 machine pass finds Load Value Injection (LVI) gadgets consisting of a load from memory (i.e., SOURCE), and any operation that may transmit the value loaded from memory over a covert channel, or use the value loaded from memory to determine a branch/call target (i.e., SINK). Also adds a new target feature to X86: +lvi-load-hardening The feature can be added via the clang CLI using -mlvi-hardening. Differential Revision: https://reviews.llvm.org/D75936
*	[X86] Add RET-hardening Support to mitigate Load Value Injection (LVI)	Scott Constable	2020-06-24	4	-0/+145
\| \| \| \| \| \| \| \| \| \| \| \| \|	Adding a pass that replaces every ret instruction with the sequence: pop <scratch-reg> lfence jmp *<scratch-reg> where <scratch-reg> is some available scratch register, according to the calling convention of the function being mitigated. Differential Revision: https://reviews.llvm.org/D75935
*	[X86] Add Indirect Thunk Support to X86 to mitigate Load Value Injection (LVI)	Scott Constable	2020-06-24	4	-8/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This pass replaces each indirect call/jump with a direct call to a thunk that looks like: lfence jmpq *%r11 This ensures that if the value in register %r11 was loaded from memory, then the value in %r11 is (architecturally) correct prior to the jump. Also adds a new target feature to X86: +lvi-cfi ("cfi" meaning control-flow integrity) The feature can be added via clang CLI using -mlvi-cfi. This is an alternate implementation to https://reviews.llvm.org/D75934 That merges the thunk insertion functionality with the existing X86 retpoline code. Differential Revision: https://reviews.llvm.org/D76812
*	[X86] Refactor X86IndirectThunks.cpp to Accommodate Mitigations other than ↵	Scott Constable	2020-06-24	1	-125/+157
\| \| \| \| \| \| \| \|	Retpoline Introduce a ThunkInserter CRTP base class from which new thunk types can inherit, e.g., thunks to mitigate https://software.intel.com/security-software-guidance/software-guidance/load-value-injection. Differential Revision: https://reviews.llvm.org/D76811
*	[X86][NFC] Generalize the naming of "Retpoline Thunks" and related code to ↵	Scott Constable	2020-06-24	14	-113/+135
\| \| \| \| \| \| \| \| \| \| \| \|	"Indirect Thunks" There are applications for indirect call/branch thunks other than retpoline for Spectre v2, e.g., https://software.intel.com/security-software-guidance/software-guidance/load-value-injection Therefore it makes sense to refactor X86RetpolineThunks as a more general capability. Differential Revision: https://reviews.llvm.org/D76810
*	Move RDF from Hexagon to Codegen	Scott Constable	2020-06-24	13	-4711/+16
\| \| \| \| \| \| \| \|	RDF is designed to be target agnostic. Therefore it would be useful to have it available for other targets, such as X86. Based on a previous patch by Krzysztof Parzyszek Differential Revision: https://reviews.llvm.org/D75932
*	[PowerPC] Do not assume operand of ADDI is an immediate	Nemanja Ivanovic	2020-06-23	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	After pseudo-expansion, we may end up with ADDI (add immediate) instructions where the operand is not an immediate but a relocation. For such instructions, attempts to get the immediate result in assertion failures for obvious reasons. Fixes: https://bugs.llvm.org/show_bug.cgi?id=45432 (cherry picked from commit a56d057dfe3127c111c3470606c04e96d35b1fa3)
*	[BPF] fix incorrect type in BPFISelDAGToDAG readonly load optimization	Yonghong Song	2020-06-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In BPF Instruction Selection DAGToDAG transformation phase, BPF backend had an optimization to turn load from readonly data section to direct load of the values. This phase is implemented before libbpf has readonly section support and before alu32 is supported. This phase however may generate incorrect type when alu32 is enabled. The following is an example, -bash-4.4$ cat ~/tmp2/t.c struct t { unsigned char a; unsigned char b; unsigned char c; }; extern void foo(void ); int test() { struct t v = { .b = 2, }; foo(&v); return 0; } The compiler will turn local variable "v" into a readonly section. During instruction selection phase, the compiler generates two loads from readonly section, one 2 byte load or 1 byte load, e.g., for 2 loads, t8: i32,ch = load<(dereferenceable load 2 from `i8 getelementptr inbounds (%struct.t, %struct.t* @__const.test.v, i64 0, i32 0)`, align 1), anyext from i16> t3, GlobalAddress:i64<%struct.t* @__const.test.v> 0, undef:i64 t9: ch = store<(store 2 into %ir.v1.sub1), trunc to i16> t3, t8, FrameIndex:i64<0>, undef:i64 BPF backend changed t8 to i64 = Constant<2> and eventually the generated machine IR: t10: i64 = MOV_ri TargetConstant:i64<2> t40: i32 = SLL_ri_32 t10, TargetConstant:i32<8> t41: i32 = OR_ri_32 t40, TargetConstant:i64<0> t9: ch = STH32<Mem:(store 2 into %ir.v1.sub1)> t41, TargetFrameIndex:i64<0>, TargetConstant:i64<0>, t3 Note that t10 in the above is not correct. The type should be i32 and instruction should be MOV_ri_32. The reason for incorrect insn selection is BPF insn selection generated an i64 constant instead of an i32 constant as specified in the original load instruction. Such incorrect insn sequence eventually caused the following fatal error when a COPY insn tries to copy a 64bit register to a 32bit subregister. Impossible reg-to-reg copy UNREACHABLE executed at ../lib/Target/BPF/BPFInstrInfo.cpp:42! This patch fixed the issue by using the load result type instead of always i64 when doing readonly load optimization. Differential Revision: https://reviews.llvm.org/D81630 (cherry picked from commit 4db1878158a3f481ff673fef2396c12b7a53d280)
*	[BPF] fix a bug for BTF pointee type pruning	Yonghong Song	2020-06-23	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In BTF, pointee type pruning is used to reduce cluttering too many unused types into prog BTF. For example, struct task_struct { ... struct mm_struct mm; ... } If bpf program does not access members of "struct mm_struct", there is no need to bring types for "struct mm_struct" to BTF. This patch fixed a bug where an incorrect pruning happened. The test case like below: struct t; typedef struct t _t; struct s1 { _t c; }; int test1(struct s1 arg) { ... } struct t { int a; int b; }; struct s2 { _t c; } int test2(struct s2 arg) { ... } After processing test1(), among others, BPF backend generates BTF types for "struct s1", "_t" and a placeholder for "struct t". Note that "struct t" is not really generated. If later a direct access to "struct t" member happened, "struct t" BTF type will be generated properly. During processing test2(), when processing member type "_t c", BPF backend sees type "_t" already generated, so returned. This caused the problem that "struct t" BTF type is never generated and eventually causing incorrect type definition for "struct s2". To fix the issue, during DebugInfo type traversal, even if a typedef/const/volatile/restrict derived type has been recorded in BTF, if it is not a type pruning candidate, type traversal of its base type continues. Differential Revision: https://reviews.llvm.org/D82041 (cherry picked from commit 89648eb16d01725457f958e634d16c534b64c42c)
*	[PowerPC] Unaligned FP default should apply to scalars only	Nemanja Ivanovic	2020-06-23	1	-1/+2
\| \| \| \| \| \| \| \| \| \|	As reported in PR45186, we could be in a situation where we don't want to handle unaligned memory accesses for FP scalars but still have VSX (which allows unaligned access for vectors). Change the default to only apply to scalars. Fixes: https://bugs.llvm.org/show_bug.cgi?id=45186 (cherry picked from commit 099a875f28d0131a6ae85af91b9eb8627917fbbe)
*	[PowerPC] Prevent legalization loop from promoting SELECT_CC from v4i32 to v4i32	Nemanja Ivanovic	2020-06-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	As reported in https://bugs.llvm.org/show_bug.cgi?id=45709 we can hit an infinite loop in legalization since we set the legalization action for ISD::SELECT_CC for all fixed length vector types to Promote. Without some different legalization action for the type being promoted to, the legalizer simply loops. Since we don't have patterns to match the node, the right legalization action should be Expand. Differential revision: https://reviews.llvm.org/D79854 (cherry picked from commit 793cc518b9428a0b7a40c59d4ecd5939a7bc84f7)
*	[PowerPC] Add missing handling for half precision	Tom Stellard	2020-06-22	3	-1/+41
\| \| \| \| \| \| \| \| \| \| \| \|	The fix for PR39865 took care of some of the handling for half precision but it missed a number of issues that still exist. This patch fixes the remaining issues that cause crashes in the PPC back end. Fixes: https://bugs.llvm.org/show_bug.cgi?id=45776 Differential revision: https://reviews.llvm.org/D79283 (cherry picked from commit 1a493b0fa556a07c728862c3c3f70bfd8683bef0)
*	[PowerPC] Add support for vmsumudm	Ahsan Saghir	2020-06-22	2	-0/+5
\| \| \| \| \| \| \| \| \|	This patch adds support for Vector Multiply-Sum Unsigned Doubleword Modulo instruction; vmsumudm. Differential Revision: https://reviews.llvm.org/D80294 (cherry picked from commit a28e9f1208608f8d18750bb88ca74722fb0bcce4)
*	[AARch64] Add Marvell ThunderX3T110 support	Wei Zhao	2020-06-17	14	-15/+2056
\| \| \| \| \| \| \| \| \| \| \|	This is the first checkin to support Marvell ThunderX3T110. Initial definition of the micro-ops of the instructions in ThunderX3T110 is included. Differential Revision: https://reviews.llvm.org/D78129 (cherry picked from commit 382d3a85e2a9269569e7fb8caa487d7ef57900c6)
*	[X86] make sure POP has implicit def/use of stack pointer when materializing ↵	Yuanfang Chen	2020-06-17	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	8-bit immediates for minsize Summary: Otherwise PostRA list scheduler may reorder instruction, such as schedule this ''' pushq $0x8 pop %rbx lea 0x2a0(%rsp),%r15 ''' to ''' pushq $0x8 lea 0x2a0(%rsp),%r15 pop %rbx ''' by mistake. The patch is to prevent this to happen by making sure POP has implicit use of SP. Reviewers: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77031 (cherry picked from commit ece79f47083babcabde3700c67b90ef19967a5b3)
*	[AArch64] Fix BTI instruction emission.	Daniel Kiss	2020-06-16	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: SCTLR_EL1.BT[01] controls the PACI[AB]SP compatibility with PBYTE 11 (see [1]) This bit will be set to zero so PACI[AB]SP are equal to BTI C instruction only. [1] https://developer.arm.com/docs/ddi0595/b/aarch64-system-registers/sctlr_el1 Reviewers: chill, tamas.petz, pbarrio, ostannard Reviewed By: tamas.petz, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81746 (cherry picked from commit b8ae3fdfa579dbf366b1bb1cbfdbf8c51db7fa55)
*	[AArch64] Fix BTI landing pad generation.	Daniel Kiss	2020-06-16	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	In some cases BTI landing pad is inserted even compatible instruction was there already. Meta instruction does not count in this case therefore skip them in the check for first instructions in the function. Differential revision: https://reviews.llvm.org/D74492 (cherry picked from commit d5a186a60014dc1a8c979c978cb32aba7ecb9102)
*	[X86] Fold undef elts to 0 in getTargetVShiftByConstNode.	Craig Topper	2020-06-16	1	-3/+6
\| \| \| \| \| \| \| \|	Similar to D81212. Differential Revision: https://reviews.llvm.org/D81292 (cherry picked from commit 3408dcbdf054ac3cc32a97a6a82a3cf5844be609)
*	[X86] Teach combineVectorShiftImm to constant fold undef elements to 0 not ↵	Craig Topper	2020-06-16	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	undef. Shifts are supposed to always shift in zeros or sign bits regardless of their inputs. It's possible the input value may have been replaced with undef by SimplifyDemandedBits, but the shift in zeros are still demanded. This issue was reported to me by ispc from 10.0. Unfortunately their failing test does not fail on trunk. Seems to be because the shl is optimized out earlier now and doesn't become VSHLI. ispc bug https://github.com/ispc/ispc/issues/1771 Differential Revision: https://reviews.llvm.org/D81212 (cherry picked from commit 7c9a89fed8f5d53d61fe3a61a2581a7c28b1b6d2)
*	[X86] Add x, t and g modifiers for inline asm	Craig Topper	2020-06-11	1	-1/+45
\| \| \| \| \| \| \| \| \| \| \| \|	This patch adds the x, t and g modifiers for inline asm from GCC. These will print a vector register as xmm, ymm or zmm* respectively. I also fixed register names with modifiers with inteldialect so they are no longer printed with a leading %. Patch by Amanieu d'Antras Differential Revision: https://reviews.llvm.org/D78977 (cherry picked from commit c5f7c039efe7ff09a44cfd252f6cb001ceed6269)
*	[arm] Add big-endian version of pcrel fixups for adr instructions	Dimitry Andric	2020-05-19	1	-12/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In 2e24219d3cbf, a number of ARM pcrel fixups were resolved at assembly time, to solve PR44929. This only covered little-endian ARM however, so add similar fixups for big-endian ARM. Also extend the test case to cover big-endian ARM. Reviewers: hans, psmith, MaskRay Reviewed By: psmith, MaskRay Subscribers: kristof.beyls, hiraditya, danielkiss, emaste, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79774 (cherry picked from commit fc373522b044e0b150561204958f0d603fb4caba)
*	[ARM] Only produce qadd8b under hasV6Ops	David Green	2020-05-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When compiling for a arm5te cpu from clang, the +dsp attribute is set. This meant we could try and generate qadd8 instructions where we would end up having no pattern. I've changed the condition here to be hasV6Ops && hasDSP, which is what other parts of ARMISelLowering seem to use for similar instructions. Fixed PR45677. Differential Revision: https://reviews.llvm.org/D78877 (cherry picked from commit 8807139026b64ac40163bb255dad38a1d8054f08)
*	[PowerPC] Do not attempt to reuse load for 64-bit FP_TO_UINT without FPCVT	Nemanja Ivanovic	2020-05-19	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \|	We call the function that attempts to reuse the conversion without checking whether the target matches the constraints that the callee expects. This patch adds the check prior to the call. Fixes: https://bugs.llvm.org/show_bug.cgi?id=43976 Differential revision: https://reviews.llvm.org/D77564 (cherry picked from commit 64b31d96dfd6c05e6d52d8798726dec60502cfde)
*	Enable IBT(Indirect Branch Tracking) in JIT with CET(Control-flow ↵	Xiang1 Zhang	2020-05-18	3	-8/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enforcement Technology) Do not commit the llvm/test/ExecutionEngine/MCJIT/cet-code-model-lager.ll because it will cause build bot fail(not suitable for window 32 target). Summary: This patch comes from H.J.'s https://github.com/hjl-tools/llvm-project/commit/2bd54ce7fa9e94fcd1118b948e14d1b6fc54dfd2 This patch fix the failed llvm unit tests which running on CET machine. (e.g. ExecutionEngine/MCJIT/MCJITTests) The reason we enable IBT at "JIT compiled with CET" is mainly that: the JIT don't know the its caller program is CET enable or not. If JIT's caller program is non-CET, it is no problem JIT generate CET code or not. But if JIT's caller program is CET enabled, JIT must generate CET code or it will cause Control protection exceptions. I have test the patch at llvm-unit-test and llvm-test-suite at CET machine. It passed. and H.J. also test it at building and running VNCserver(Virtual Network Console), it works too. (if not apply this patch, VNCserver will crash at CET machine.) Reviewers: hjl.tools, craig.topper, LuoYuanke, annita.zhang, pengfei Reviewed By: LuoYuanke Subscribers: tstellar, efriedma, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76900 (cherry picked from commit 01a32f2bd3fad4331cebe9f4faa270d7c082d281)
*	CET for Exception Handle	Pengfei Wang	2020-05-18	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Bug fix for https://bugs.llvm.org/show_bug.cgi?id=45182 Exception handle may indirectly jump to catch pad, So we should add ENDBR instruction before catch pad instructions. Reviewers: craig.topper, hjl.tools, LuoYuanke, annita.zhang, pengfei Reviewed By: LuoYuanke Subscribers: hiraditya, llvm-commits Patch By: Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D76190 (cherry picked from commit 974d649f8eaf3026ccb9d1b77bdec55da25366e5)
*	BPF: fix a CORE optimization bug	Yonghong Song	2020-05-06	1	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For the test case in this patch like below struct t { int a; } __attribute__((preserve_access_index)); int foo(void ); int test(struct t arg) { long param[1]; param[0] = (long)&arg->a; return foo(param); } The IR right before BPF SimplifyPatchable phase: %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %2:gpr = LDD killed %1:gpr, 0 %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr STD killed %3:gpr, %stack.0.param, 0 After SimplifyPatchable phase, the incorrect IR is generated: %1:gpr = LD_imm64 @"llvm.t:0:0$0:0" %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr CORE_MEM killed %3:gpr, 306, %0:gpr, @"llvm.t:0:0$0:0" Note that CORE_MEM pseudo op is introduced to encode memory operations related to CORE. In the above, we intend to check whether we have a store like (%3:gpr + 0) = ... and if this is the case, we could replace it with (%0:gpr + @"llvm.t:0:0$0:0"_ = ... Unfortunately, in the above, IR for the store is *(%stack.0.param + 0) = %3:gpr and transformation should not happen. Note that we won't have problem if the actual CORE dereference (arg->a) happens. This patch fixed the problem by skip CORE optimization if the use of ADD_rr result is not the base address of the store operation. Differential Revision: https://reviews.llvm.org/D78466 (cherry picked from commit 3cb7e7bf959dcd3b8080986c62e10a75c7af43f0)
*	[PowerPC] Don't generate ST_VSR_SCAL_INT if power8-vector is disabled	Kai Luo	2020-04-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In https://bugs.llvm.org/show_bug.cgi?id=45297, it fails selecting instructions for `PPCISD::ST_VSR_SCAL_INT`. The reason it generate the `PPCISD::ST_VSR_SCAL_INT` with `-power8-vector` in IR is PPC's combiner checks `hasP8Altivec` rather than `hasP8Vector`. This patch should resolve PR45297. Differential Revision: https://reviews.llvm.org/D76773 (cherry picked from commit 8eb40e41f6ec99985a292e342ec303a0bd6f5f41)
*	[PowerPC] Update alignment for ReuseLoadInfo in LowerFP_TO_INTForReuse	Kai Luo	2020-04-16	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	In LowerFP_TO_INTForReuse, when emitting `stfiwx`, alignment of 4 is set for the `MachineMemOperand`, but RLI(ReuseLoadInfo)'s alignment is not updated for following loads. It's related to failed alignment check reported in https://bugs.llvm.org/show_bug.cgi?id=45297 Differential Revision: https://reviews.llvm.org/D77624 Backport b7d5229d789b7cb2747226d528ed016624b11cea.
*	[X86][SSE] combineX86ShufflesConstants - early out for zeroable vectors ↵	Simon Pilgrim	2020-04-16	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \|	(PR45443) Shuffle combining can insert zero byte sized elements into the shuffle mask, which combineX86ShufflesConstants will attempt to fold without taking into account whether the byte-sized type is legal (e.g. AVX512F only targets). If we have a full-zeroable vector then we should just return a zero version of the root type, otherwise if the type isn't valid we should bail. Fixes PR45443 (cherry picked from commit e3b60597769f79a8abc19fb8ef1f321d9adc1358)
*	[MC][ARM] Resolve some pcrel fixups at assembly time (PR44929)	Hans Wennborg	2020-02-27	1	-12/+10
\| \| \| \| \| \| \| \| \| \| \| \|	MC currently does not emit these relocation types, and lld does not handle them. Add FKF_Constant as a work-around of some ARM code after D72197. Eventually we probably should implement these relocation types. By Fangrui Song! Differential revision: https://reviews.llvm.org/D72892 (cherry picked from commit 2e24219d3cbfcb8c824c58872f97de0a2e94a7c8)
*	Don't generate libcalls for wide shift on Windows ARM (PR42711)	Hans Wennborg	2020-02-25	1	-1/+1
\| \| \| \| \| \| \|	The previous patch (cff90f07cb5cc3c3bc58277926103af31caef308) didn't cover ARM. (cherry picked from commit decd021facba804b57e8d80b6159c987d3261ab8)
*	[RISCV] Correct the CallPreservedMask for the function call in an interrupt ↵	Shiva Chen	2020-02-20	1	-7/+0
\| \| \| \| \| \| \| \| \| \| \|	handler CallPreservedMask is used to describe the register liveness after a function call. The function call in an interrupt handler should use the same CallPreservedMask as normal functions. So that only callee save registers can live through the function call. (cherry picked from commit 1cae2f9d192c69833e22684ca338660942ab464e)
*	Fix unused function warning (PR44808)	Hans Wennborg	2020-02-19	1	-5/+7
\| \| \| \|	(cherry picked from commit a19de32095e4cdb18957e66609574ce2021a8d1c)
*	[X86CmovConversion] Make heuristic for optimized cmov depth more ↵	Nikita Popov	2020-02-19	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	conservative (PR44539) Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539. As discussed there, this pass makes some overly optimistic assumptions, as it does not have access to actual branch weights. This patch makes the computation of the depth of the optimized cmov more conservative, by assuming a distribution of 75/25 rather than 50/50 and placing the weights to get the more conservative result (larger depth). The fully conservative choice would be std::max(TrueOpDepth, FalseOpDepth), but that would break at least one existing test (which may or may not be an issue in practice). Differential Revision: https://reviews.llvm.org/D74155 (cherry picked from commit 5eb19bf4a2b0c29a8d4d48dfb0276f096eff9bec)
*	[FPEnv][ARM] Don't call mutateStrictFPToFP when lowering	John Brawn	2020-02-18	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \|	mutateStrictFPToFP can delete the node and replace it with another with the same value which can later cause problems, and returning the result of mutateStrictFPToFP doesn't work because SelectionDAGLegalize expects that the returned value has the same number of results as the original. Instead handle things by doing the mutation manually. Differential Revision: https://reviews.llvm.org/D74726 (cherry picked from commit 594a89f7270da74c89f2321432bc6a7135773fa5)
*	[ARM] Fix infinite loop when lowering STRICT_FP_EXTEND	John Brawn	2020-02-18	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the target has FP64 but not FP16 then we have custom lowering for FP_EXTEND and STRICT_FP_EXTEND with type f64. However if the extend is from f32 to f64 the current implementation will cause in infinite loop for STRICT_FP_EXTEND due to emitting a merge_values of the original node which after replacement becomes a merge_values of itself. Fix this by not doing anything for f32 to f64 extend when we have FP64, though for STRICT_FP_EXTEND we have to do the strict-to-nonstrict mutation as that doesn't happen automatically for opcodes with custom lowering. Differential Revision: https://reviews.llvm.org/D74559 (cherry picked from commit 0ec57972967dfb43fc022c2e3788be041d1db730)
*	[FPEnv][AArch64] Add lowering of f128 STRICT_FSETCC	John Brawn	2020-02-18	1	-2/+4
\| \| \| \| \| \| \| \|	These get lowered to function calls, like the non-strict versions. Differential Revision: https://reviews.llvm.org/D73784 (cherry picked from commit 68cf574857c81f711f498a479855a17e7bea40f7)