path: root/llvm/lib/Target/X86/X86MCInstLower.cpp
Commit message | Author | Date | Files | Lines

* [X86] -fpatchable-function-entry=N,0: place patch label after ENDBR{32,64} | Fangrui Song | 2020-02-05 | 1 | -0/+19
  Similar to D73680 (AArch64 BTI).

  A local linkage function whose address is not taken does not need ENDBR32/ENDBR64. Placing the patch label after ENDBR32/ENDBR64 has the advantage that code does not need to differentiate whether the function has an initial ENDBR.

  Also, add 32-bit tests and test that .cfi_startproc is at the function entry. The line information has a general implementation and is tested by AArch64/patchable-function-entry-empty.mir

  Reviewed By: nickdesaulniers

  Differential Revision: https://reviews.llvm.org/D73760

  (cherry picked from commit 8ff86fcf4c038c7156ed4f01e7ed35cae49489e2)

* [X86] Adjust nop emission by compiler to consider target decode limitations | Philip Reames | 2020-01-11 | 1 | -0/+17
  The primary motivation of this change is to bring the code more closely in sync, behavior-wise, with the assembler's version of nop emission. I'd like to eventually factor them into one, but that's hard to do when one has features the other doesn't.

  The longest encodable nop on x86 is 15 bytes, but many processors - for instance all Intel chips - can't decode the 15-byte form efficiently. On those processors, it's better to use either a 10-byte or 11-byte sequence, depending on the processor.

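  A minimal sketch of the chunking idea described above, with hypothetical names (emitNopPadding, EmitNop, and MaxNopLength are illustrative, not LLVM's actual API):

    #include <algorithm>
    #include <cstdint>

    // Split a padding region into nops no longer than the target's efficient
    // decode limit (e.g. 10 or 11 bytes rather than the architecturally
    // maximal 15 bytes).
    void emitNopPadding(uint64_t NumBytes, uint64_t MaxNopLength,
                        void (*EmitNop)(uint64_t Len)) {
      while (NumBytes > 0) {
        uint64_t Len = std::min(NumBytes, MaxNopLength);
        EmitNop(Len); // emit one multi-byte nop of exactly Len bytes
        NumBytes -= Len;
      }
    }
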
* [X86] Support function attribute "patchable-function-entry" | Fangrui Song | 2020-01-10 | 1 | -3/+15
  For x86-64, we diverge from GCC -fpatchable-function-entry in that we emit multi-byte NOPs.

  Differential Revision: https://reviews.llvm.org/D72220

* [BranchAlign] Compiler support for suppressing branch align | Philip Reames | 2020-01-08 | 1 | -0/+42
  As discussed heavily in the original review (D70157), there's a need for the compiler to be able to selectively suppress padding (either nop or prefix) to respect assumptions about the meaning of labels and instructions in generated code. Rather than wait for syntax to be finalized - which appears to be a very slow process - this patch focuses on the compiler use case and *only* worries about the integrated assembler. To my knowledge, this covers all cases mentioned to date for clang/JIT support.

  For testing purposes, I wired it up so that if the integrated assembler was using autopadding for branch alignment (e.g. enabled at the command line) then the textual assembly output would contain a comment for each location where padding was enabled or disabled. This seemed like the least painful choice overall.

  Note that the result of this patch effectively disables the jcc errata mitigation for many constructs (statepoints, implicit null checks, xray, etc...) which is non-ideal. It is at least *correct* and should allow us to enable the mitigation for the compiler. Once that's done, and a few other items are worked through, we probably want to come back to this and explore a bundling-based approach instead so that we can pad instructions while keeping labels in the right place.

  Differential Revision: https://reviews.llvm.org/D72303

* Fix "use of uninitialized variable" static analyzer warnings. NFCI.Simon Pilgrim2020-01-061-0/+2
| | | | Add "unreachable" default cases like we do for the other switch()s in X86MCInstLower::Lower
* [StackMaps] Be explicit about label formation [NFC] (try 2) | Philip Reames | 2019-12-19 | 1 | -3/+14
  Recommit after making the same API change in non-x86 targets. This has been built for all targets, and tested for affected ones. Why the difference? Because my disk filled up when I tried make check for all.

  For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.

* Temporarily Revert "[StackMaps] Be explicit about label formation [NFC]" | Eric Christopher | 2019-12-19 | 1 | -14/+3
  as it broke the aarch64 build.

  This reverts commit bc7595d934b958ab481288d7b8e768fe5310be8f.

* [StackMaps] Be explicit about label formation [NFC] | Philip Reames | 2019-12-19 | 1 | -3/+14
  For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.

* [FaultMaps] Make label formation a bit more explicit [NFC] | Philip Reames | 2019-12-19 | 1 | -1/+5
  This is in advance of assembler padding directives support, where we'll need to bundle the label w/ the corresponding faulting instruction to avoid padding being inserted between them.

* [X86] Teach X86MCInstLower to swap operands of commutable instructions to enable 2-byte VEX encoding | Craig Topper | 2019-11-04 | 1 | -0/+46
  Summary: The 2-source-operand commutable instructions are encoded in the VEX.VVVV field and the r/m field of the MODRM byte plus the VEX.B field. The VEX.B field is missing from the 2-byte VEX encoding. If the VEX.VVVV source is 0-7 and the other register is 8-15, we can swap them to avoid needing the VEX.B field. This works as long as the VEX.W, VEX.mmmmm, and VEX.X fields are also not needed.

  Fixes PR36706.

  Reviewers: RKSimon, spatel

  Reviewed By: RKSimon

  Subscribers: hiraditya, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D68550

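  A minimal sketch of the swap condition described above (the helper name and signature are hypothetical, not the actual X86MCInstLower code):

    // 2-byte VEX cannot encode VEX.B, so the r/m register must be one of
    // reg0-reg7. VEX.VVVV has a full 4 bits and can hold reg8-reg15 fine, so
    // swapping the commutable sources when only the r/m register is "high"
    // allows the shorter prefix.
    bool shouldSwapForTwoByteVEX(unsigned VvvvSrcRegNum, unsigned RmSrcRegNum) {
      return VvvvSrcRegNum < 8 && RmSrcRegNum >= 8;
    }
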
* [X86] Use 64-bit version of source register in LowerPATCHABLE_EVENT_CALL and LowerPATCHABLE_TYPED_EVENT_CALL | Craig Topper | 2019-10-27 | 1 | -6/+12
  Summary: The PATCHABLE_EVENT_CALL uses i32 in the intrinsic. This results in the register allocator picking a 32-bit register. We need to use the 64-bit register when forming the MOV64rr instructions. Otherwise we print illegal assembly in the text output.

  I think prior to this it was impossible for SrcReg to be equal to DstReg, so the NOP code was not reachable.

  While there, use Register instead of unsigned.

  Also add a FIXME for what looks like a bug.

  Reviewers: dberris

  Reviewed By: dberris

  Subscribers: hiraditya, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D69365

* [X86] Remove isel patterns for mask vpcmpgt/vpcmpeq. Switch vpcmp to these based on the immediate in MCInstLower | Craig Topper | 2019-10-04 | 1 | -0/+175
  The immediate form of VPCMP can represent these completely. The vpcmpgt/eq forms are just shorter encodings. This patch removes the isel patterns and just swaps the opcodes and removes the immediate in MCInstLower. This matches where we do some other encoding tricks.

  Removes over 10K bytes from the isel table.

  Differential Revision: https://reviews.llvm.org/D68446

  llvm-svn: 373766

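  A hedged sketch of the direction of the rewrite (illustrative helper, not the actual MCInstLower code): isel now always produces the immediate form, and lowering swaps in the shorter-encoded opcode when the immediate allows it.

    // AVX-512 VPCMP comparison-predicate immediates include 0 (EQ) and
    // 6 (NLE, i.e. signed greater-than), so those two cases can use the
    // shorter vpcmpeq/vpcmpgt encodings.
    const char *pickShorterCompare(unsigned CmpImm) {
      switch (CmpImm) {
      case 0:  return "vpcmpeqd"; // VPCMP imm 0 == equal
      case 6:  return "vpcmpgtd"; // VPCMP imm 6 == not-less-or-equal == gt
      default: return "vpcmpd";   // keep the immediate form
      }
    }
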
* Revert [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn | Reid Kleckner | 2019-09-03 | 1 | -14/+0
  This reverts r370525 (git commit 0bb1630685fba255fa93def92603f064c2ffd203)
  Also reverts r370543 (git commit 185ddc08eed6542781040b8499ef7ad15c8ae9f4)

  The approach I took only works for functions marked `noreturn`. In general, a call that is not known to be noreturn may be followed by unreachable for other reasons. For example, there could be multiple call sites to a function that throws sometimes, and at some call sites, it is known to always throw, so it is followed by unreachable. We need to insert an `int3` in these cases to pacify the Windows unwinder.

  I think this probably deserves its own standalone, Win64-only fixup pass that runs after block placement. Implementing that will take some time, so let's revert to TrapUnreachable in the mean time.

  llvm-svn: 370829

* [X86] Print register names in .seh_* directives | Reid Kleckner | 2019-08-30 | 1 | -13/+5
  Also improve assembler parser register validation for .seh_ directives.

  This requires moving X86-specific seh directive handling into the x86 backend, which addresses some assembler FIXMEs.

  Differential Revision: https://reviews.llvm.org/D66625

  llvm-svn: 370533

* [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn | Reid Kleckner | 2019-08-30 | 1 | -0/+14
  Users have complained that llvm.trap produces two ud2 instructions on Win64, one for the trap, and one for unreachable. This change fixes that.

  TrapUnreachable was added and enabled for Win64 in r206684 (April 2014) to avoid poorly understood issues with the Windows unwinder. There seem to be two major things in play:
  - the unwinder
  - C++ EH, _CxxFrameHandler3 & co

  The unwinder disassembles forward from the return address to scan for epilogues. Inserting a ud2 had the effect of stopping the unwinder, and ensuring that it ran the EH personality function for the current frame. However, it's not clear what the unwinder does when the return address happens to be the last address of one function and the first address of the next function.

  The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out what the current EH state number is. It does this by consulting the ip2state table, which maps from PC to state number. This seems to go wrong when the return address is the last PC of the function or catch funclet.

  I'm not sure precisely which system is involved here, but in order to address these real or hypothetical problems, I believe it is enough to insert int3 after a call site if it would otherwise be the last instruction in a function or funclet. I was able to reproduce some similar problems locally by arranging for a noreturn call to appear at the end of a catch block immediately before an unrelated function, and I confirmed that the problems go away when an extra trailing int3 instruction is added.

  MSVC inserts int3 after every noreturn function call, but I believe it's only necessary to do it if the call would be the last instruction. This change inserts a pseudo instruction that expands to int3 if it is in the last basic block of a function or funclet. I did what I could to run the Microsoft compiler EH tests, and the ones I was able to run showed no behavior difference before or after this change.

  Differential Revision: https://reviews.llvm.org/D66980

  llvm-svn: 370525

* [X86] Remove encoding information from the TAILJMP instructions that are lowered by MCInstLowering. Fix LowerPATCHABLE_TAIL_CALL to also convert them to regular JMP/JCC instructions | Craig Topper | 2019-08-27 | 1 | -23/+57
  There are 5 instructions here that are converted from TAILJMP opcodes to regular JMP/JCC opcodes during MCInstLowering, so normally their encoding information isn't used. The exception is when XRay wraps them in PATCHABLE_TAIL_CALL.

  For the ones that weren't already handled in MCInstLowering, add handling for those and remove their encoding information.

  This patch fixes PATCHABLE_TAIL_CALL to do the same opcode conversion as the regular lowering path. Then it removes the encoding information.

  Differential Revision: https://reviews.llvm.org/D66561

  llvm-svn: 370079

* [X86] Remove MCInstLower code that drops operands from some CALL and TAILJMP instructions. Add asserts to verify operand count | Craig Topper | 2019-08-22 | 1 | -18/+8
  It appears the FIXME here was handled at some point. r159728 from 2012 seems to be at least a portion of fixing it.

  Differential Revision: https://reviews.llvm.org/D66570

  llvm-svn: 369665

* Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM | Daniel Sanders | 2019-08-15 | 1 | -9/+9
  Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible).

  Partial reverts in:
  X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
  X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
  X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
  HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
  MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
  PPCFastISel.cpp - No Register::operator-=()
  PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
  MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

  Manual fixups in:
  ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
  HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
  HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
  PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

  Depends on D65919

  Reviewers: arsenm, bogner, craig.topper, RKSimon

  Reviewed By: arsenm

  Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D65962

  llvm-svn: 369041

* [X86] -fno-plt: use GOT __tls_get_addr only if GOTPCRELX is enabled | Fangrui Song | 2019-07-11 | 1 | -1/+8
  Summary: As of binutils 2.32, ld has a bogus TLS relaxation error when the GD/LD code sequence using R_X86_64_GOTPCREL (instead of R_X86_64_GOTPCRELX) is attempted to be relaxed to IE/LE (binutils PR24784). gold and lld are good.

  In gcc/config/i386/i386.md, there is a configure-time check of as/ld support, and the GOT relaxation will not be used if as/ld doesn't support it:

    if (flag_plt || !HAVE_AS_IX86_TLS_GET_ADDR_GOT)
      return "call\t%P2";
    return "call\t{*%p2@GOT(%1)|[DWORD PTR %p2@GOT[%1]]}";

  In clang, -DENABLE_X86_RELAX_RELOCATIONS=OFF is the default. The ld.bfd bogus error can be reproduced with:

    thread_local int a;
    int main() { return a; }

    clang -fno-plt -fpic a.cc -fuse-ld=bfd

  GOTPCRELX gained relatively good support in 2016, which is considered relatively new. It is even difficult to conditionally default to -DENABLE_X86_RELAX_RELOCATIONS=ON due to cross compilation reasons. So work around the ld.bfd bug by only using GOT when GOTPCRELX is enabled.

  Reviewers: dalias, hjl.tools, nikic, rnk

  Reviewed By: nikic

  Subscribers: llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D64304

  llvm-svn: 365752

* [X86] Add VP2INTERSECT instructions | Pengfei Wang | 2019-05-31 | 1 | -0/+71
  Support Intel AVX512 VP2INTERSECT instructions in llvm

  Patch by Xiang Zhang (xiangzhangllvm)

  Differential Revision: https://reviews.llvm.org/D62366

  llvm-svn: 362188

* [X86] Support -fno-plt __tls_get_addr calls | Fangrui Song | 2019-05-23 | 1 | -51/+72
  In general dynamic/local dynamic TLS models, with -fno-plt:

  * x86: emit `calll *___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT`. Note, on x86, if we can get rid of %ebx as the PIC register, it may be better to use a register not preserved across function calls.
  * x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT`

  Reorganize the code by separating 32-bit and 64-bit.

  Reviewed By: rnk

  Differential Revision: https://reviews.llvm.org/D62106

  llvm-svn: 361453

* [X86] Move InstPrinter files to MCTargetDesc. NFC | Richard Trieu | 2019-05-10 | 1 | -2/+2
  For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure.

  llvm-svn: 360484

* Fixed "Value stored to 'Opc' is never read" warning. NFCI.Simon Pilgrim2019-05-071-1/+1
| | | | llvm-svn: 360133
* Assigning to a local object in a return statement prevents copy elision. NFC. | David Blaikie | 2019-04-25 | 1 | -1/+2
  I added a diagnostic along the lines of `-Wpessimizing-move` to detect `return x = y` suppressing copy elision, but I don't know if the diagnostic is really worth it. Anyway, here are the places where my diagnostic reported that copy elision would have been possible if not for the assignment.

  P1155R1 in the post-San-Diego WG21 (C++ committee) mailing discusses whether WG21 should fix this pitfall by just changing the core language to permit copy elision in cases like these. (Kona update: The bulk of P1155 is proceeding to CWG review, but specifically *not* the parts that explored the notion of permitting copy-elision in these specific cases.)

  Reviewed By: dblaikie

  Author: Arthur O'Dwyer

  Differential Revision: https://reviews.llvm.org/D54885

  llvm-svn: 359236

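  A small example of the pitfall this commit describes, under the usual C++ copy-elision rules (Widget is a placeholder type):

    struct Widget { /* assume copying is expensive */ };

    Widget makeCopied(Widget y) {
      Widget x;
      return x = y; // assignment yields the lvalue x; a copy is required
    }

    Widget makeElided(Widget y) {
      Widget x;
      x = y;
      return x;     // plain "return x" is eligible for NRVO
    }
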
* [X86] Merge the different Jcc instructions for each condition code into single instructions that store the condition code as an operand | Craig Topper | 2019-04-05 | 1 | -5/+11
  Summary: This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes.

  Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser.

  Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon

  Reviewed By: RKSimon

  Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D60228

  llvm-svn: 357802

* [X86] Make ADD*_DB post-RA pseudos and expand them in expandPostRAPseudo. | Craig Topper | 2019-03-18 | 1 | -16/+0
  These are used to help convert OR->LEA when needed to avoid a copy. They aren't needed after register allocation.

  Happens to remove an ugly goto from X86MCCodeEmitter.cpp

  llvm-svn: 356356

* For faulting ops, include a comment w/ the fault destination | Philip Reames | 2019-03-12 | 1 | -0/+1
  A faulting_op is one that has specified behavior when a fault occurs, generally redirecting control flow to another location. This change just adds a comment to the assembly output which makes it both human readable, and machine checkable w/o having to parse the FaultMap section.

  This is used to split a test file into two parts, so that I can (in a near future commit) easily extend the test file to demonstrate another case.

  llvm-svn: 355982

* [X86] Enable 8-bit OR with disjoint bits to convert to LEA | Craig Topper | 2019-03-05 | 1 | -0/+2
  We already support 8-bit adds in convertToThreeAddress. But we can also support 8-bit OR if the bits are disjoint. We already do this for 16/32/64.

  Differential Revision: https://reviews.llvm.org/D58863

  llvm-svn: 355423

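  The underlying property, as a minimal sketch with hypothetical helper names: when two values share no set bits, no bit position can generate a carry, so OR and ADD compute the same result, and the OR can be re-expressed as an ADD (and thus an LEA).

    #include <cstdint>

    bool haveDisjointBits(uint8_t A, uint8_t B) { return (A & B) == 0; }

    uint8_t orAsAdd(uint8_t A, uint8_t B) {
      // Precondition: haveDisjointBits(A, B). Then A + B == (A | B).
      return A + B;
    }
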
* [AsmPrinter] Remove hidden flag -print-schedule. | Andrea Di Biagio | 2019-02-04 | 1 | -17/+9
  This patch removes hidden codegen flag -print-schedule, effectively reverting the logic originally committed as r300311 (https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).

  Flag -print-schedule was originally introduced by r300311 to address PR32216 (https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better testing of schedule model instruction latencies/throughputs".

  These days, we can use llvm-mca to test scheduling models. So there is no longer a need for flag -print-schedule in LLVM. The main use case for PR32216 is now addressed by llvm-mca. Flag -print-schedule is mainly used for debugging purposes, and it is only actually used by x86-specific tests. We already have extensive (latency and throughput) tests under "test/tools/llvm-mca" for X86 processor models. That means most (if not all) existing -print-schedule tests for X86 are redundant.

  When flag -print-schedule was first added to LLVM, several files had to be modified; a few APIs gained new arguments (see for example method MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained a couple of getSchedInfoStr() methods.

  Method getSchedInfoStr() had to originally work for both MCInst and MachineInstr. The original implementation of getSchedInfoStr() introduced a subtle layering violation (reported as PR37160 and then fixed/worked-around by r330615). In retrospect, that new API could have been designed more optimally. We can always query MCSchedModel to get the latency and throughput. More importantly, the "sched-info" string should not have been generated by the subtarget.

  Note, r317782 fixed an issue where "print-schedule" didn't work very well in the presence of inline assembly. That commit is also reverted by this change.

  Differential Revision: https://reviews.llvm.org/D57244

  llvm-svn: 353043

* Update the file headers across all of the LLVM projects in the monorepo | Chandler Carruth | 2019-01-19 | 1 | -4/+3
  to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.

  Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.

  llvm-svn: 351636

* [X86] Force floating point values in constant pool decoding to print in scientific notation so they can't be confused with integers | Craig Topper | 2018-10-29 | 1 | -1/+2
  When the floating point constants are whole numbers they have no decimal point, so they look like integers, but mean something very different in something like an 'and' instruction.

  Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that.

  llvm-svn: 345488

* [ELF] Fix large code model MIR verifier errors | Reid Kleckner | 2018-10-24 | 1 | -35/+0
  Instead of using the MOVGOT64r pseudo, use the existing MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to create a "scratch register operand" for the pseudo to use, and the register allocator can make better decisions.

  Fixes some X86 verifier errors tracked in PR27481.

  llvm-svn: 345219

* X86: fix a comment copy-paste issue (NFC) | Saleem Abdulrasool | 2018-10-22 | 1 | -1/+1
  The comment was copy-pasted but not updated. NFC.

  llvm-svn: 344973

* Recommit r344877 "[X86] Stop promoting integer loads to vXi64" | Craig Topper | 2018-10-22 | 1 | -5/+21
  I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile.

  Original commit message:

  Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.

  I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

  I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.

  I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

  Reviewers: RKSimon, spatel

  Reviewed By: RKSimon

  Subscribers: llvm-commits, RKSimon

  Differential Revision: https://reviews.llvm.org/D53306

  llvm-svn: 344965

* Revert r344877 "[X86] Stop promoting integer loads to vXi64" | Craig Topper | 2018-10-22 | 1 | -21/+5
  Sam McCall reported miscompiles in some tensorflow code. Reverting while I try to figure it out.

  llvm-svn: 344921

* [X86] Stop promoting integer loads to vXi64 | Craig Topper | 2018-10-21 | 1 | -5/+21
  Summary: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.

  I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

  I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.

  I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

  Reviewers: RKSimon, spatel

  Reviewed By: RKSimon

  Subscribers: llvm-commits, RKSimon

  Differential Revision: https://reviews.llvm.org/D53306

  llvm-svn: 344877

* Revert r344873 "foo" | Craig Topper | 2018-10-21 | 1 | -20/+4
  Rebase gone wrong left this in my tree.

  llvm-svn: 344875

* foo | Craig Topper | 2018-10-21 | 1 | -4/+20
  llvm-svn: 344873

* [X86] Only extract constant pool shuffle mask data with zero offsets | Simon Pilgrim | 2018-10-21 | 1 | -1/+1
  D53306 exposes an issue where we sometimes use constant pool data from bigger vectors than the target shuffle mask. This should be safe to do, but we have to be certain that we're using the bottom-most part of the vector, as the shuffle mask decoders have no way to peek into subvectors with non-zero offsets.

  llvm-svn: 344867

* [X86] Add 128-bit MOVDDUP to the constant pool printing in X86AsmPrinter::EmitInstruction | Craig Topper | 2018-10-15 | 1 | -0/+6
  We use this instruction to broadcast a single 64-bit value to a v2i64/v2f64 vector.

  llvm-svn: 344486

* [codeview] Fix 32-bit x86 variable locations in realigned stack frames | Reid Kleckner | 2018-10-02 | 1 | -0/+4
  Add the .cv_fpo_stackalign directive so that we can define $T0, or the VFRAME virtual register, with it. This was overlooked in the initial implementation because, unlike MSVC, we push CSRs before allocating stack space, so this value is only needed to describe local variable locations. Variables that the compiler now addresses via ESP are instead described as being stored at offsets from VFRAME, which for us is ESP after alignment in the prologue.

  This adds tests that show that we use the VFRAME register properly in our S_DEFRANGE records, and that we emit the correct FPO data to define it.

  Fixes PR38857

  llvm-svn: 343603

* [X86] Add APInt constant assembly printer helper | Simon Pilgrim | 2018-10-02 | 1 | -15/+18
  llvm-svn: 343577

* [X86] Standardize floating point assembly comments | Simon Pilgrim | 2018-10-02 | 1 | -7/+11
  Consistently try to use APFloat::toString for floating point constant comments to get rid of differences between Constant / ConstantDataSequential values - it should help stop some of the linux-windows buildbot failures matching NaN/INF etc. as well.

  Differential Revision: https://reviews.llvm.org/D52702

  llvm-svn: 343562

* [MinGW] [X86] Pass true for the second parameter to StubValueTy for MO_COFFSTUB. NFC. | Martin Storsjo | 2018-08-31 | 1 | -2/+1
  These stubs should never be emitted for internal symbols, and nothing in AsmPrinter ever actually uses this value when producing the stubs for COFF anyway.

  llvm-svn: 341177

* [MinGW] [X86] Add stubs for references to data variables that might end up imported from a dll | Martin Storsjo | 2018-08-29 | 1 | -0/+16
  Variables declared with the dllimport attribute are accessed via a stub variable named __imp_<var>. In MinGW configurations, variables that aren't declared with a dllimport attribute might still end up imported from another DLL with runtime pseudo relocs.

  For x86_64, this avoids the risk that the target is out of range for a 32-bit PC-relative reference, in case the target DLL is loaded further than 4 GB from the reference. It also avoids having to make the text section writable at runtime when doing the runtime fixups, which makes it worthwhile to do for i386 as well.

  Add stub variables for all dso local data references where a definition of the variable isn't visible within the module, since the DLL data autoimporting might make them imported even though they are marked as dso local within LLVM. Don't do this for variables that actually are defined within the same module, since we then know for sure that it actually is dso local.

  Don't do this for references to functions, since there's no need for runtime pseudo relocations for autoimporting them; if a function from a different DLL is called without the appropriate dllimport attribute, the call just gets routed via a thunk instead.

  GCC does something similar since 4.9 (when compiling with -mcmodel=medium or large; from that version, medium is the default code model for x86_64 mingw), but only for x86_64.

  Differential Revision: https://reviews.llvm.org/D51288

  llvm-svn: 340942

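  Roughly, the indirection described above looks like this (an illustrative C++ sketch; in reality the toolchain synthesizes __imp_var, it is not hand-written):

    extern "C" int var;        // may live in another DLL
    extern "C" int *__imp_var; // stub pointer normally emitted by the toolchain

    int readVar() {
      // A direct "return var;" needs the definition within 32-bit PC-relative
      // range; loading through the stub pointer does not.
      return *__imp_var;
    }
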
* [x86/retpoline] Split the LLVM concept of retpolines into separate subtarget features for indirect calls and indirect branches | Chandler Carruth | 2018-08-23 | 1 | -2/+2
  This is in preparation for enabling *only* the call retpolines when using speculative load hardening.

  I've continued to use subtarget features for now as they continue to seem the best fit given the lack of other retpoline-like constructs so far.

  The LLVM side is pretty simple. I'd like to eventually get rid of the old feature, but not sure what backwards compatibility issues that will cause.

  This does remove the "implies" from requesting an external thunk. This always seemed somewhat questionable and is now clearly not desirable -- you specify a thunk the same way no matter which set of things are getting retpolines.

  I really want to keep this nicely isolated from end users and just an LLVM implementation detail, so I've moved the `-mretpoline` flag in Clang to no longer rely on a specific subtarget feature by that name and instead to be directly handled. In some ways this is simpler, but in order to preserve existing behavior I've had to add some fallback code so that users who relied on merely passing -mretpoline-external-thunk continue to get the same behavior. We should eventually remove this I suspect (we have never tested that it works!) but I've not done that in this patch.

  Differential Revision: https://reviews.llvm.org/D51150

  llvm-svn: 340515

* [X86] Remove RELEASE_ and ACQUIRE_ pseudo instructions. Use isel patterns and the normal instructions instead | Craig Topper | 2018-08-03 | 1 | -64/+0
  At one point in time acquire implied mayLoad and mayStore, as did release. Thus we needed separate pseudos that also carried that property. This appears to no longer be the case. I believe it was changed in 2012 with a comment saying that atomic memory accesses are marked volatile, which preserves the ordering.

  So from what I can tell we shouldn't need additional pseudos since they don't carry any flags that are different from the normal instructions. The only thing I can think of is that we may consider them for load folding candidates in the peephole pass now where we didn't before. If that's important, hopefully there's something in the memory operand we can check to prevent the folding without relying on pseudo instructions.

  Differential Revision: https://reviews.llvm.org/D50212

  llvm-svn: 338925

* [X86] Prevent promotion of i16 add/sub/and/or/xor to i32 if we can fold an atomic load and atomic store | Craig Topper | 2018-08-03 | 1 | -0/+8
  This makes them consistent with i8/i32/i64, which still seems to be more aggressive on folding than icc, gcc, or MSVC.

  llvm-svn: 338795

* [X86] Allow 'atomic_store (neg/not atomic_load)' to isel to a RMW instruction. | Craig Topper | 2018-08-02 | 1 | -0/+8
  There was a FIXME in the td file about a type inference issue that was easy to fix.

  llvm-svn: 338782

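  To illustrate the shape of the pattern being matched (a hedged C++ sketch, not from the commit; whether a given source input produces exactly this DAG depends on the memory orderings involved):

    #include <atomic>

    // A load-modify-store of the same atomic location. The isel pattern lets
    // sequences like this lower to a single memory-destination instruction
    // such as "negl (%rdi)" instead of a separate load, neg, and store.
    void negateInPlace(std::atomic<int> &X) {
      X.store(-X.load(std::memory_order_relaxed), std::memory_order_relaxed);
    }
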
* Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models" | Reid Kleckner | 2018-07-23 | 1 | -0/+35
  Don't try to generate large PIC code for non-ELF targets. Neither COFF nor MachO have relocations for large position independent code, and users have been using "large PIC" code models to JIT 64-bit code for a while now. With this change, if they are generating ELF code, their JITed code will truly be PIC, but if they target MachO or COFF, it will contain 64-bit immediates that directly reference external symbols. For a JIT, that's perfectly fine.

  llvm-svn: 337740