bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[GlobalISel][AArch64] Add isel support for FP16 vector @llvm.ceil	Jessica Paquette	2019-01-24	4	-1/+306
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for vector @llvm.ceil intrinsics when full 16 bit floating point support isn't available. To do this, this patch...
	- Implements basic isel for G_UNMERGE_VALUES
	- Teaches the legalizer about 16 bit floats
	- Teaches AArch64RegisterBankInfo to respect floating point registers on G_BUILD_VECTOR and G_UNMERGE_VALUES
	- Teaches selectCopy about 16-bit floating point vectors
	It also adds
	- A legalizer test for the 16-bit vector ceil which verifies that we create a G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported
	- An instruction selection test which makes sure we lower to G_FCEIL when full fp16 is supported
	- A test for selecting G_UNMERGE_VALUES
	And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types work as expected.
	https://reviews.llvm.org/D56682
	llvm-svn: 352113
*	[SLH][AArch64] Remove accidentally retained -debug-only line from test.	Kristof Beyls	2019-01-23	1	-1/+0
\| \| \| \|	llvm-svn: 351932
*	[SLH] AArch64: correctly pick temporary register to mask SP	Kristof Beyls	2019-01-23	3	-47/+141
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As part of speculation hardening, the stack pointer gets masked with the taint register (X16) before a function call or before a function return. Since there are no instructions that can directly mask writing to the stack pointer, the stack pointer must first be transferred to another register, where it can be masked, before that value is transferred back to the stack pointer.
	Before, that temporary register was always picked to be x17, since the ABI allows clobbering x17 on any function call, resulting in the following instruction pattern being inserted before function calls and returns/tail calls:
	  mov x17, sp
	  and x17, x17, x16
	  mov sp, x17
	However, x17 can be live in those locations, for example when the call is an indirect call, using x17 as the target address (blr x17).
	To fix this, this patch looks for an available register just before the call or terminator instruction and uses that. In the rare case when no register turns out to be available (this situation is only encountered twice across the whole test-suite), just insert a full speculation barrier at the start of the basic block where this occurs.
	Differential Revision: https://reviews.llvm.org/D56717
	llvm-svn: 351930
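A minimal sketch of the register choice described in this commit message, not the code in the AArch64 backend. The function name `pick_sp_mask_sequence`, the candidate register pool, and the placeholder string for the barrier are all hypothetical; only x16 (taint), x17 (old fixed temporary), and the three-instruction pattern come from the message.

```python
TAINT_REG = "x16"  # taint register named in the commit message


def pick_sp_mask_sequence(live_regs, candidates=("x17", "x9", "x10", "x11")):
    """Return the instructions that mask SP, or a barrier fallback.

    live_regs:  registers that are live just before the call/terminator.
    candidates: hypothetical pool of temporaries, tried in order.
    """
    for tmp in candidates:
        if tmp != TAINT_REG and tmp not in live_regs:
            # SP cannot be masked directly, so go through a temporary.
            return [
                f"mov {tmp}, sp",
                f"and {tmp}, {tmp}, {TAINT_REG}",
                f"mov sp, {tmp}",
            ]
    # No free register: the patch falls back to a full speculation
    # barrier at the start of the basic block (placeholder text here).
    return ["<full speculation barrier at start of block>"]
```

With an indirect call through x17 (`live_regs={"x17"}`), the sketch skips x17 and uses the next free candidate instead of clobbering the call target.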
*	hwasan: Move memory access checks into small outlined functions on aarch64.	Peter Collingbourne	2019-01-23	1	-0/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Each hwasan check requires emitting a small piece of code like this: https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses
	The problem with this is that these code blocks typically bloat code size significantly.
	An obvious solution is to outline these blocks of code. In fact, this has already been implemented under the -hwasan-instrument-with-calls flag. However, as currently implemented this has a number of problems:
	- The functions use the same calling convention as regular C functions. This means that the backend must spill all temporary registers as required by the platform's C calling convention, even though the check only needs two registers on the hot path.
	- The functions take the address to be checked in a fixed register, which increases register pressure.
	Both of these factors can diminish the code size effect and increase the performance hit of -hwasan-instrument-with-calls.
	The solution that this patch implements is to involve the aarch64 backend in outlining the checks. An intrinsic and pseudo-instruction are created to represent a hwasan check. The pseudo-instruction is register allocated like any other instruction, and we allow the register allocator to select almost any register for the address to check. A particular combination of (register selection, type of check) triggers the creation in the backend of a function to handle the check for specifically that pair. The resulting functions are deduplicated by the linker. The pseudo-instruction (really the function) is specified to preserve all registers except for the registers that the AAPCS specifies may be clobbered by a call.
	To measure the code size and performance effect of this change, I took a number of measurements using Chromium for Android on aarch64, comparing a browser with inlined checks (the baseline) against a browser with outlined checks.
	Code size: Size of .text decreases from 243897420 to 171619972 bytes, or a 30% decrease.
	Performance: Using Chromium's blink_perf.layout microbenchmarks I measured a median performance regression of 6.24%.
	The fact that a perf/size tradeoff is evident here suggests that we might want to make the new behaviour conditional on -Os/-Oz. But for now I've enabled it unconditionally, my reasoning being that hwasan users typically expect a relatively large perf hit, and ~6% isn't really adding much. We may want to revisit this decision in the future, though.
	I also tried experimenting with varying the number of registers selectable by the hwasan check pseudo-instruction (which would result in fewer variants being created), on the hypothesis that creating fewer variants of the function would expose another perf/size tradeoff by reducing icache pressure from the check functions at the cost of register pressure. Although I did observe a code size increase with fewer registers, I did not observe a strong correlation between the number of registers and the performance of the resulting browser on the microbenchmarks, so I conclude that we might as well use ~all registers to get the maximum code size improvement. My results are below:
	  Regs | .text size | Perf hit
	  -----+------------+---------
	  ~all | 171619972  | 6.24%
	    16 | 171765192  | 7.03%
	     8 | 172917788  | 5.82%
	     4 | 177054016  | 6.89%
	Differential Revision: https://reviews.llvm.org/D56954
	llvm-svn: 351920
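The size figures in this message can be checked with a few lines of arithmetic. This is only a worked calculation on the numbers quoted above; the helper name `pct_decrease` and the dictionary layout are illustrative choices, not part of the patch.

```python
BASELINE_TEXT = 243897420  # bytes of .text with inlined checks (from the message)

# .text size in bytes per number of selectable registers (from the message)
OUTLINED_TEXT = {
    "~all": 171619972,
    "16": 171765192,
    "8": 172917788,
    "4": 177054016,
}


def pct_decrease(new_size, baseline=BASELINE_TEXT):
    """Percentage by which new_size is smaller than baseline."""
    return 100.0 * (baseline - new_size) / baseline


# Saving relative to the inlined baseline for each register budget.
savings = {regs: pct_decrease(size) for regs, size in OUTLINED_TEXT.items()}
```

For the "~all" configuration the saving comes out at roughly 29.6%, which the message rounds to 30%. The saving shrinks as the register budget shrinks, matching the observation that fewer registers cost code size.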
*	GlobalISel: Allow shift amount to be a different type	Matt Arsenault	2019-01-22	3	-13/+61
\| \| \| \| \| \| \| \| \|	For AMDGPU the shift amount is never 64-bit, and this needs to use a 32-bit shift. X86 uses i8, but seemed to be hacking around this before. llvm-svn: 351882
*	[AArch64] Add patterns for zext/sext of shift amount.	Eli Friedman	2019-01-22	1	-9/+48
\| \| \| \| \| \| \| \| \|	Not sure this is the best fix, but it saves an instruction for certain constructs involving variable shifts. Differential Revision: https://reviews.llvm.org/D55572 llvm-svn: 351768
*	[AArch64] add more tests for buildvec to shuffle transform; NFC	Sanjay Patel	2019-01-21	1	-0/+419
\| \| \| \| \| \| \| \|	These are copied from the sibling x86 file. I'm not sure which of the current outputs (if any) is considered optimal, but someone more familiar with AArch64 may want to take a look. llvm-svn: 351754
*	[DAGCombiner] fix crash when converting build vector to shuffle	Sanjay Patel	2019-01-21	1	-0/+22
\| \| \| \| \| \| \| \| \| \|	The regression test is reduced from the example shown in D56281. This does raise a question as noted in the test file: do we want to handle this pattern? I don't have a motivating example for that on x86 yet, but it seems like we could have that pattern there too, so we could avoid the back-and-forth using a shuffle. llvm-svn: 351753
*	GlobalISel: Verify G_BITCAST	Matt Arsenault	2019-01-18	1	-4/+4
\| \| \| \|	llvm-svn: 351594
*	Fix the buildbot failure introduced by r351404	Sanjin Sijaric	2019-01-17	1	-1/+1
\| \| \| \| \| \| \| \| \|	EXPENSIVE_CHECKS buildbots are failing due to r351404. Add x1 as live in to the funclet basic block for SEH funclets, as well as -verify-machineinstrs to the test case that triggered the failure. llvm-svn: 351472
*	[ARM64][Windows] Share unwind codes between epilogues	Sanjin Sijaric	2019-01-17	2	-3/+228
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are cases where we have multiple epilogues that have the exact same unwind code sequence. In that case, the epilogues can share the same unwind codes in the .xdata section. This should get us past the assert "SEH unwind data splitting not yet implemented" in many cases. We still need to add support for generating multiple .pdata/.xdata sections for those functions that need to be split into fragments. Differential Revision: https://reviews.llvm.org/D56813 llvm-svn: 351421
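The sharing idea can be illustrated with a small deduplication sketch. It does not model the real .xdata encoding: the function name `layout_epilogue_codes`, the flat "pool" list, and the unwind-code names used in the usage note are hypothetical stand-ins.

```python
def layout_epilogue_codes(epilogues):
    """Place each distinct unwind-code sequence once and share it.

    epilogues: list of sequences of unwind-code names, one per epilogue.
    Returns (pool, offsets): the shared code pool and, for every
    epilogue, the offset of its sequence inside that pool.
    """
    pool = []
    first_offset = {}  # sequence -> offset where it was first emitted
    offsets = []
    for codes in epilogues:
        key = tuple(codes)
        if key not in first_offset:
            # First time this exact sequence is seen: emit it.
            first_offset[key] = len(pool)
            pool.extend(key)
        # Identical sequences reuse the earlier offset.
        offsets.append(first_offset[key])
    return pool, offsets
```

For example, `layout_epilogue_codes([("restore_fplr", "end"), ("end",), ("restore_fplr", "end")])` emits two sequences and gives the first and third epilogue the same offset.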
*	[SEH] [ARM64] Retrieve the frame pointer from SEH funclets	Sanjin Sijaric	2019-01-17	1	-0/+121
\| \| \| \| \| \| \|	The Windows ARM64 runtime passes the establisher frame to funclets as the first argument. llvm-svn: 351404
*	[COFF, ARM64] Implement support for SEH extensions __try/__except/__finally	Mandeep Singh Grang	2019-01-16	2	-0/+97
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler. We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape. Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric Reviewed By: rnk, efriedma Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53540 llvm-svn: 351370
*	[GISel]: Add support for CSEing continuously during GISel passes.	Aditya Nandakumar	2019-01-16	4	-0/+58
\| \| \| \| \| \| \| \| \| \|	https://reviews.llvm.org/D52803 This patch adds support to continuously CSE instructions during each of the GISel passes. It consists of a GISelCSEInfo analysis pass that can be used by the CSEMIRBuilder. llvm-svn: 351283
*	Remove irrelevant references to legacy git repositories from compiler identification lines in test-cases.	James Y Knight	2019-01-15	1	-1/+1
\| \| \| \| \| \| \| \| \|	(Doing so only because it's then easier to search for references which are actually important and need fixing.) llvm-svn: 351200
*	[AArch64] Adjust the feature set for Exynos	Evandro Menezes	2019-01-15	1	-0/+1
\| \| \| \| \| \|	Enable the fusion of arithmetic and logic instructions for Exynos M4. llvm-svn: 351149
*	[AArch64] Fix typo (NFC)	Evandro Menezes	2019-01-15	1	-1/+1
\| \| \| \| \| \| \|	Fix another typo, this time in the `RUN` line, which used a syntax not universally supported, in test case added by D56572. llvm-svn: 351144
*	[AArch64] Fix typo (NFC)	Evandro Menezes	2019-01-15	1	-22/+22
\| \| \| \| \| \|	Fix typo in test case added by D56572 (rL351139). llvm-svn: 351143
*	[EarlyIfConversion] Don't if-convert unconditional branches.	Eli Friedman	2019-01-15	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \|	A block ending in an unconditional branch can have two successors if one is a landing pad. In practice, I think this only has an effect on Windows because landing pads are never empty for Itanium unwinding. (Alternatively, I could add a check to AArch64InstrInfo::canInsertSelect, but this seems more obvious.) Differential Revision: https://reviews.llvm.org/D56468 llvm-svn: 351142
*	[AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.abs.i64 .	Eli Friedman	2019-01-15	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise, with D56544, the intrinsic will be expanded to an integer csel, which is probably not what the user expected. This matches the general convention of using "v1" types to represent scalar integer operations in vector registers. While I'm here, also add some error checking so we don't generate illegal ABS nodes. Differential Revision: https://reviews.llvm.org/D56616 llvm-svn: 351141
*	[AArch64] Add new target feature to fuse arithmetic and logic operations	Evandro Menezes	2019-01-14	1	-0/+111
\| \| \| \| \| \| \| \| \|	This feature enables the fusion of some arithmetic and logic instructions together. Differential revision: https://reviews.llvm.org/D56572 llvm-svn: 351139
*	Replace "no-frame-pointer-*" function attributes with "frame-pointer"	Francis Visoiu Mistrih	2019-01-14	17	-29/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Part of the effort to refactor frame pointer code generation. We used to use two function attributes "no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" to represent three kinds of frame pointer usage: (all) frames use frame pointer, (non-leaf) frames use frame pointer, (none) frames use frame pointer. This CL makes the idea explicit by using only one enum function attribute "frame-pointer".
	Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as llc.
	"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still supported for easy migration to "frame-pointer".
	Tests are mostly updated with:
	  // replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
	  grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"
	  // replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
	  grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"
	Patch by Yuanfang Chen (tabloid.adroit)!
	Differential Revision: https://reviews.llvm.org/D56351
	llvm-svn: 351049
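A rough sketch of the two mappings this message describes, written as plain dictionary and string handling rather than LLVM code. The function names are hypothetical. The sketch assumes the legacy "no-frame-pointer-elim" attribute carries the string value "true" when set, and that "no-frame-pointer-elim-non-leaf" is signalled by its mere presence; the message itself does not spell out either detail.

```python
def frame_pointer_kind(attrs):
    """Collapse the two legacy attributes into one "frame-pointer" value.

    attrs: dict of function attribute name -> string value.
    """
    if attrs.get("no-frame-pointer-elim") == "true":
        return "all"
    if "no-frame-pointer-elim-non-leaf" in attrs:
        return "non-leaf"
    return "none"


def rewrite_flags(cmd):
    """Mirror the two sed substitutions quoted in the message."""
    # Order matters: rewrite the longer "=false" form first, otherwise
    # the second substitution would turn it into "-frame-pointer=all=false".
    cmd = cmd.replace("-disable-fp-elim=false", "-frame-pointer=none")
    return cmd.replace("-disable-fp-elim", "-frame-pointer=all")
```

The ordering in `rewrite_flags` is the same as the order of the two grep/sed passes in the message, and for the same reason.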
*	[X86][AARCH64] Improve ISD::ABS support	Simon Pilgrim	2019-01-12	1	-1/+2
\| \| \| \| \| \| \| \|	This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types. Differential Revision: https://reviews.llvm.org/D56544 llvm-svn: 350998
*	[AArch64] Fix operation actions for FP16 vector intrinsics	Bryan Chan	2019-01-10	1	-215/+318
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch changes the legalization action for some half-precision floating-point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops are not supported in hardware for half-precision vectors, but promotion is not always possible (for v8f16 operands). Changing the action to Expand fixes an assertion failure in the legalizer when the frontend produces such ops. In addition, a quick microbenchmark shows that, in the v4f16 case, expanding introduces fewer spills and is therefore slightly faster than promoting. Reviewers: t.p.northover, SjoerdMeijer Reviewed By: SjoerdMeijer Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56296 llvm-svn: 350825
*	[AArch64] Emit the correct MCExpr relocations specifiers like VK_ABS_G0, etc	Mandeep Singh Grang	2019-01-10	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc. This patch adds the necessary enums and MCExpr needed for lowering these. Reviewers: rnk, mstorsjo, efriedma Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D56037 llvm-svn: 350798
*	[AArch64] Add test for constant shrinking with multiple users (NFC).	Florian Hahn	2019-01-09	1	-0/+18
\| \| \| \| \| \|	Test to avoid regression fixed by rL350684. llvm-svn: 350762
*	[CodeGen] Ignore return sext/zext attributes of unused results for tail calls	Francis Visoiu Mistrih	2019-01-09	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \|	If the caller's return type does not have a zeroext attribute but the callee does a tail call zeroext, we won't consider the tail call during CodeGenPrepare because the attributes don't match. However, if the result of the tail call has no uses, it makes sense to drop the sext/zext attributes. Differential Revision: https://reviews.llvm.org/D56486 llvm-svn: 350753
*	Initial AArch64 SLH implementation.	Kristof Beyls	2019-01-09	1	-0/+157
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is an initial implementation for Speculative Load Hardening for AArch64. It builds on top of the recently introduced AArch64SpeculationHardening pass. This doesn't implement (yet) some of the optimizations implemented for the X86SpeculativeLoadHardening pass. I thought introducing the optimizations incrementally in follow-up patches should make this easier to review. Differential Revision: https://reviews.llvm.org/D55929 llvm-svn: 350729
*	GlobalISel: Implement fewerElements for implicit_def	Matt Arsenault	2019-01-09	2	-1/+37
\| \| \| \|	llvm-svn: 350697
*	GlobalISel: Implement widenScalar for implicit_def	Matt Arsenault	2019-01-09	1	-1/+20
\| \| \| \|	llvm-svn: 350695
*	[GlobalISel] Fix choice of instruction selector for AArch64 at -O0 with -global-isel=0	Petr Pavlu	2019-01-08	1	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit rL347861 introduced an unintentional change in the behaviour when compiling for AArch64 at -O0 with -global-isel=0. Previously, explicitly disabling GlobalISel resulted in using FastISel but an updated condition in the commit changed it to using SelectionDAG. The patch fixes this condition and slightly better organizes the code that chooses the instruction selector. Fixes PR40131. Differential Revision: https://reviews.llvm.org/D56266 llvm-svn: 350626
*	AArch64: avoid splitting vector truncating stores.	Tim Northover	2019-01-08	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \|	We have code to split vector splats (of zero and non-zero) for performance reasons, but it ignores the fact that a store might be truncating. Actually, truncating stores are formed for vNi8 and vNi16 types. Since the truncation is from a legal type, the size of the store is always <= 64-bits and so they don't actually benefit from being split up anyway, so this patch just disables that transformation. llvm-svn: 350620
*	Reversing the commit in revision 350186. Revision causes regression in 4 tests.	Ayonam Ray	2019-01-01	1	-62/+0
\| \| \| \| \| \|	llvm-svn: 350187
*	Omit range checks from jump tables when lowering switches with unreachable default	Ayonam Ray	2019-01-01	1	-0/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During the lowering of a switch that would result in the generation of a jump table, a range check is performed before indexing into the jump table: if the switch value is outside the jump table range, a conditional branch jumps to the default block. If the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults. Review Reference: D52002 llvm-svn: 350186
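A toy model of the lowering decision in r350186 (which, per the log, was reverted by r350187). It is a sketch in Python rather than anything resembling the SelectionDAG code: `lower_switch` is a hypothetical name, and passing `default=None` is this sketch's way of saying "the default block is unreachable".

```python
def lower_switch(value, table, base, default=None):
    """Dispatch through a jump table, with an optional range check.

    table:   targets for the contiguous case values base, base+1, ...
    default: target of the default block, or None if it is unreachable.
    """
    index = value - base
    if default is not None:
        # Reachable default: the range check and its conditional
        # branch guard the table lookup.
        if not 0 <= index < len(table):
            return default
    # Unreachable default: the check is omitted and the table is
    # indexed directly; callers must never pass an out-of-range value.
    return table[index]
```

When the default is unreachable, the only saving is the compare and conditional branch in front of the lookup; the table itself is unchanged.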
*	[NFC][CodeGen][X86][AArch64] Tests for bit extract (pat. a/c/d) with trunc (PR36419)	Roman Lebedev	2018-12-22	1	-0/+153
\| \| \| \| \| \|	llvm-svn: 350000
*	[NFC][CodeGen][X86][AArch64] Bit extract: add nounwind attr to drop .cfi noise	Roman Lebedev	2018-12-22	1	-18/+18
\| \| \| \| \| \|	Forgot about that. llvm-svn: 349999
*	[NFC][CodeGen][X86][AArch64] Tests for bit extract (pat. b) with trunc (PR36419)	Roman Lebedev	2018-12-22	1	-0/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	@bextr64_32_b1 is extracted from hotpath of real-world code (RawSpeed BitStream<>::peekBitsNoFill()) after `clang -O3`. @bextr64_32_b2/@bextr64_32_b0 is the same pattern, but with trunc done last, showing how I think it can be handled: https://rise4fun.com/Alive/K4B https://rise4fun.com/Alive/qC9 It is possible that middle-end should do some of this, too. https://bugs.llvm.org/show_bug.cgi?id=36419 llvm-svn: 349998
*	[GlobalISel][AArch64] Add support for widening G_FCEIL	Jessica Paquette	2018-12-21	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \|	This adds support for widening G_FCEIL in LegalizerHelper and AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't support full FP 16. This also updates AArch64/f16-instructions.ll to show that we perform the correct transformation. llvm-svn: 349927
*	[GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode	Jessica Paquette	2018-12-20	1	-0/+16
\| \| \| \| \| \| \| \| \|	If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback. Add it to the switch, and add a test verifying that this happens. llvm-svn: 349822
*	Test commit	Amilendra Kodithuwakku	2018-12-20	1	-1/+1
\| \| \| \| \| \|	Fix a simple typo. llvm-svn: 349771
*	Fix build errors introduced by r349712 on aarch64 bots.	Amara Emerson	2018-12-20	1	-5/+5
\| \| \| \|	llvm-svn: 349723
*	[AArch64][GlobalISel] Implement selection of G_MERGE of two s32s into s64.	Amara Emerson	2018-12-20	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \|	This code pattern is an unfortunate side effect of the way some types get split at call lowering. Ideally we'd either not generate it at all or combine it away in the legalizer artifact combiner. Until then, add selection support anyway which is a significant proportion of our current fallbacks on CTMark. rdar://46491420 llvm-svn: 349712
*	[GlobalISel][AArch64] Add support for @llvm.ceil	Jessica Paquette	2018-12-19	4	-0/+143
\| \| \| \| \| \| \| \| \| \| \| \|	This adds a G_FCEIL generic instruction and uses it in AArch64. This adds selection for floating point ceil where it has a supported, dedicated instruction. Other cases aren't handled here. It updates the relevant gisel tests and adds a select-ceil test. It also adds a check to arm64-vcvt.ll which ensures that we don't fall back when we run into one of the relevant cases. llvm-svn: 349664
*	[TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)	Simon Pilgrim	2018-12-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to at least have the new upper bits set to zero. SimplifyDemandedVectorElts is the worst offender (and it's the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue. Thanks to @dmgreen for catching this. Differential Revision: https://reviews.llvm.org/D55883 llvm-svn: 349625
*	[AARCH64] Added test case for PR40091	Simon Pilgrim	2018-12-18	1	-0/+22
\| \| \| \|	llvm-svn: 349543
*	Add FMF management to common fp intrinsics in GlobalIsel	Michael Berg	2018-12-18	1	-16/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is the initial code change to facilitate managing FMF flags from Instructions to MI wrt Intrinsics in Global Isel. Eventually the GlobalObserver interface will be added as well, where FMF additions can be tracked for the builder and CSE. Reviewers: aditya_nandakumar, bogner Reviewed By: bogner Subscribers: rovka, kristof.beyls, javed.absar Differential Revision: https://reviews.llvm.org/D55668 llvm-svn: 349514
*	[AArch64] - Return address signing dwarf support	Luke Cheeseman	2018-12-18	1	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Reapply changes initially introduced in r343089
	- The architecture info is no longer loaded whenever a DWARFContext is created
	- The runtime libraries (sanitizers) make use of the dwarf context classes but do not initialise the target info
	- The architecture of the object can be obtained without loading the target info
	- Adding a method to the dwarf context to get this information and multiplex the string printing later on
	Differential Revision: https://reviews.llvm.org/D55774
	llvm-svn: 349472
*	Introduce control flow speculation tracking pass for AArch64	Kristof Beyls	2018-12-18	5	-0/+346
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The pass implements tracking of control flow miss-speculation into a "taint" register. That taint register can then be used to mask off registers with sensitive data when executing under miss-speculation, a.k.a. "transient execution". This pass is aimed at mitigating against SpectreV1-style vulnerabilities.
	At the moment, it implements the tracking of miss-speculation of control flow into a taint register, but doesn't implement a mechanism yet to then use that taint register to mask off vulnerable data in registers (something for a follow-on improvement). Possible strategies to mask out vulnerable data that can be implemented on top of this are:
	- speculative load hardening to automatically mask off data loaded in registers.
	- using intrinsics to mask off data in registers as indicated by the programmer (see https://lwn.net/Articles/759423/).
	For AArch64, the following implementation choices are made. Some of these are different than the implementation choices made in the similar pass implemented in X86SpeculativeLoadHardening.cpp, as the instruction set characteristics result in different trade-offs.
	- The speculation hardening is done after register allocation. With a relative abundance of registers, one register is reserved (X16) to be the taint register. X16 is expected to not clash with other register reservation mechanisms with very high probability because:
	  . The AArch64 ABI doesn't guarantee X16 to be retained across any call.
	  . The only way to request X16 to be used as a programmer is through inline assembly.
	  In the rare case a function explicitly demands to use X16/W16, this pass falls back to hardening against speculation by inserting a DSB SYS/ISB barrier pair which will prevent control flow speculation.
	- It is easy to insert mask operations at this late stage as we have mask operations available that don't set flags.
	- The taint variable contains all-ones when no miss-speculation is detected, and contains all-zeros when miss-speculation is detected. Therefore, when masking, an AND instruction (which only changes the register to be masked, no other side effects) can easily be inserted anywhere that's needed.
	- The tracking of miss-speculation is done by using a data-flow conditional select instruction (CSEL) to evaluate the flags that were also used to make conditional branch direction decisions. Speculation of the CSEL instruction can be limited with a CSDB instruction - so the combination of CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL aren't speculated. When conditional branch direction gets miss-speculated, the semantics of the inserted CSEL instruction is such that the taint register will contain all zero bits. One key requirement for this to work is that the conditional branch is followed by an execution of the CSEL instruction, where the CSEL instruction needs to use the same flags status as the conditional branch. This means that the conditional branches must not be implemented as one of the AArch64 conditional branches that do not use the flags as input (CB(N)Z and TB(N)Z). This is implemented by ensuring that the instruction selectors do not produce these instructions when speculation hardening is enabled. This pass will assert if it does encounter such an instruction.
	- On function call boundaries, the miss-speculation state is transferred from the taint register X16 to be encoded in the SP register as value 0.
	Future extensions/improvements could be:
	- Implement this functionality using full speculation barriers, akin to the x86-slh-lfence option. This may be more useful for the intrinsics-based approach than for the SLH approach to masking. Note that this pass already inserts the full speculation barriers if the function for some niche reason makes use of X16/W16.
	- no indirect branch misprediction gets protected/instrumented; but this could be done for some indirect branches, such as switch jump tables.
	Differential Revision: https://reviews.llvm.org/D54896
	llvm-svn: 349456
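The taint-register scheme above can be modelled in a few lines. This is a behavioural sketch only, assuming a 64-bit register; the function names are hypothetical, and real hardware speculation is represented simply by a "predicted" branch direction that may disagree with the flags.

```python
ALL_ONES = (1 << 64) - 1  # taint value while no miss-speculation is detected


def csel(cond_holds, if_true, if_false):
    """Data-flow conditional select, like AArch64 CSEL."""
    return if_true if cond_holds else if_false


def taint_after_branch(taint, flags_cond, predicted_cond):
    """Update the taint register on the path the predictor chose.

    The CSEL placed on each branch edge tests the condition that must
    hold on that edge. If the flags disagree with the prediction, the
    path is being executed under miss-speculation and taint becomes 0.
    """
    condition_holds_on_this_path = flags_cond == predicted_cond
    return csel(condition_holds_on_this_path, taint, 0)


def mask(value, taint):
    """AND with the taint register: identity normally, zero when tainted."""
    return value & taint
```

Because the update selects between the old taint and zero, a taint value that has dropped to zero stays zero across later branches, so anything masked afterwards is zeroed as well.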
*	[AArch64] [MinGW] Allow enabling SEH exceptions	Martin Storsjo	2018-12-18	1	-0/+48
\| \| \| \| \| \| \| \| \|	The default still is dwarf, but SEH exceptions can now be enabled optionally for the MinGW target. Differential Revision: https://reviews.llvm.org/D55748 llvm-svn: 349451
*	FastIsel: take care to update iterators when removing instructions.	Tim Northover	2018-12-17	1	-0/+25
\| \| \| \| \| \| \| \| \| \|	We keep a few iterators into the basic block we're selecting while performing FastISel. Usually this is fine, but occasionally code wants to remove already-emitted instructions. When this happens we have to be careful to update those iterators so they're not pointing at dangling memory. llvm-svn: 349365
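The hazard described here can be shown with a loose analogy in which list indices stand in for the saved iterators. This is not how FastISel stores its insertion points; `remove_instruction` and the index-based positions are assumptions made purely for illustration.

```python
def remove_instruction(block, index, saved_positions):
    """Remove block[index] and return fixed-up saved positions.

    block:           list of instructions (mutated in place).
    saved_positions: indices into block that other code still holds.
    A position past the removed slot shifts down by one; a position on
    the removed slot now refers to the instruction that followed it
    (or to the end of the block if it was the last one).
    """
    del block[index]
    return [pos - 1 if pos > index else pos for pos in saved_positions]
```

Without the fix-up, a saved position after the removed instruction would silently refer to the wrong instruction or run off the end of the block, which is the index-based counterpart of a dangling iterator.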