summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* [WebAssembly] Switch to a more traditional assembly syntaxDan Gohman2015-10-064-144/+113
| | | | | | | | | | | This new syntax is built around putting each instruction on its own line in a "mnemonic op, op, op" like syntax. It also uses conventional data section directives like ".byte" and so on rather than requiring everything to be in hierarchical S-expression format. This is a more natural syntax for a ".s" file format from the perspective of LLVM MC and related tools, while remaining easy to translate into other forms as needed. llvm-svn: 249364
* [WinEH] Update CATCHRET's operand to match its successorDavid Majnemer2015-10-051-0/+1
| | | | | | | | | | | | The CATCHRET operand did not match the MachineFunction's CFG. This mismatch happened because FrameLowering created a new MachineBasicBlock and updated the CFG but forgot to update the CATCHRET operand. Let's make sure this doesn't happen again by strengthing the funclet membership analysis: it can now reason about the membership of all basic blocks, not just those inside of funclets. llvm-svn: 249344
* AMDGPU/SI: Add a helper for creating aliases for the _e32 instructionsTom Stellard2015-10-051-11/+49
| | | | | | | | | | | | | | | | | | Summary: We are currently only using these aliases for VOPC instructions, but this helper will make it easier to use them everywhere. These aliases allow for the automatic matching of instructions with forced 32-bit encoding. Eventually, we should be able to remove the custom C++ logic we have for this in the assembler. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13396 llvm-svn: 249330
* [ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM.Scott Douglass2015-10-058-30/+166
| | | | | | | | | | | | | | | | | | | | | | | | | | | We were previously codegen'ing memcpy as regular load/store operations and hoping that the register allocator would allocate registers in ascending order so that we could apply an LDM/STM combine after register allocation. According to the commit that first introduced this code (r37179), we planned to teach the register allocator to allocate the registers in ascending order. This never got implemented, and up to now we've been stuck with very poor codegen. A much simpler approach for achieving better codegen is to create MEMCPY pseudo instructions, attach scratch virtual registers to them and then, post register allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers. The register allocator will have picked arbitrary registers which we sort when expanding the MEMCPY. This approach also avoids the need to repeatedly calculate offsets which ultimately ought to be eliminated pre-RA in order to decrease register pressure. Fixes PR9199 and PR23768. [This is based on Peter Collingbourne's r238473 which was reverted.] Differential Revision: http://reviews.llvm.org/D13239 Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458 Author: Peter Collingbourne <peter@pcc.me.uk> llvm-svn: 249322
* [mips][microMIPS] Implement JALRC16, JRCADDIUSP and JRC16 instructionsZoran Jovanovic2015-10-055-5/+82
| | | | | | Differential Revision: http://reviews.llvm.org/D11219 llvm-svn: 249317
* [MC layer][AArch64] llvm-mc accepts 4-bit immediate values forAlexandros Lamprineas2015-10-055-13/+87
| | | | | | | | | "msr pan, #imm", while only 1-bit immediate values should be valid. Changed encoding and decoding for msr pstate instructions. Differential Revision: http://reviews.llvm.org/D13011 llvm-svn: 249313
* [mips] Changed the way symbols are handled in dla and la instructions to ↵Daniel Sanders2015-10-051-12/+9
| | | | | | | | | | | | | | | | | | | | | allow simple expressions. Summary: An instruction like "(d)la $5, symbol+8" previously would have crashed the assembler as it contains an expression. This is now fixed. A few tests cases have also been changed to reflect these changes, however these should only be syntax changes. Some new test cases have also been added. Patch by Scott Egerton. Reviewers: vkalintiris, dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12760 llvm-svn: 249311
* Fix pr24486.Rafael Espindola2015-10-058-13/+13
| | | | | | | | | | | | | | | | | | This extends the work done in r233995 so that now getFragment (in addition to getSection) also works for variable symbols. With that the existing logic to decide if a-b can be computed works even if a or b are variables. Given that, the expression evaluation can avoid expanding variables as aggressively and that in turn lets the relocation code see the original variable. In order for this to work with the asm streamer, there is now a dummy fragment per section. It is used to assign a section to a symbol when no other fragment exists. This patch is a joint work by Maxim Ostapenko andy myself. llvm-svn: 249303
* [SPARCv9] Add support for the rdpr/wrpr instructions.Joerg Sonnenberger2015-10-044-0/+131
| | | | llvm-svn: 249262
* AVX512: Implemented encoding and intrinsics for VPERMILPS/PD instructions.Igor Breger2015-10-044-61/+102
| | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12690 llvm-svn: 249261
* Fix typo in READMEJeroen Ketema2015-10-041-1/+1
| | | | llvm-svn: 249253
* [X86] Lower SEXTLOAD using SIGN_EXTEND_VECTOR_INREG. NCI.Simon Pilgrim2015-10-031-22/+5
| | | | | | The custom lowering in LowerExtendedLoad is doing the equivalent shuffle, so make use of existing lowering code to reduce duplication. llvm-svn: 249243
* [WebAssembly] Implement the remaining conversion operations.Dan Gohman2015-10-031-31/+54
| | | | | | | This is a temporary assembly syntax that will likely evolve along with broader upcoming syntax changes. llvm-svn: 249225
* AMDGPU/SI: Remove unused tablegen multiclassTom Stellard2015-10-031-16/+0
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13395 llvm-svn: 249221
* [WebAssembly] Rename setlocal to set_local to match the spec.Dan Gohman2015-10-031-1/+1
| | | | llvm-svn: 249218
* [WebAssembly] Fix CFG stackification of nested loops.Dan Gohman2015-10-021-4/+15
| | | | llvm-svn: 249187
* [WebAssembly] Support calls marked as "tail", fastcc, and coldcc.Dan Gohman2015-10-021-4/+14
| | | | llvm-svn: 249184
* [WebAssembly] Add a resize_memory intrinsic.Dan Gohman2015-10-021-0/+8
| | | | llvm-svn: 249178
* [WebAssembly] Add a memory_size intrinsic.Dan Gohman2015-10-021-0/+8
| | | | llvm-svn: 249171
* AMDGPU/SI: Add verifier check for exec readsMatt Arsenault2015-10-022-2/+14
| | | | | | | Make sure we aren't accidentally not setting these in the instruction definitions. llvm-svn: 249170
* Actually switch the arch when we see .arch. PR21695Roman Divacky2015-10-021-0/+5
| | | | llvm-svn: 249165
* ARM: diagnose invalid local fixups on Thumb1Tim Northover2015-10-022-20/+62
| | | | | | | | | We previously stopped producing Thumb2 relaxations when they weren't supported, but only diagnosed the case where an actual relocation was produced. We should also tell people if local symbols aren't going to work rather than silently overflowing. llvm-svn: 249164
* ARM: correctly align constant pool value on Thumb1 targets.Tim Northover2015-10-021-1/+1
| | | | | | | Since we're using tLDRpci to access it, the constant pool's address must be 0 (mod 4). llvm-svn: 249163
* [ARM] Typo. NFC.Chad Rosier2015-10-021-1/+1
| | | | llvm-svn: 249153
* Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts ↵Andrea Di Biagio2015-10-021-0/+24
| | | | | | | | | | | | | | | | | | | | | | | between 128/256-bit vector types." This patch teaches FastIsel the following two things: 1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types; 2) On AVX, no instructions are needed for bitcasts between 256-bit vector types. Example: %1 = bitcast <4 x i31> %V to <2 x i64> Before (-fast-isel -fast-isel-abort=1): FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64> Now we don't fall back to SelectionDAG and we correctly fold that computation propagating the register associated to %V. Originally reviewed here: http://reviews.llvm.org/D13347 llvm-svn: 249147
* Revert: [FastISel][x86] Teach how to select SSE2/AVX bitcasts between ↵Andrea Di Biagio2015-10-021-24/+0
| | | | | | | | | 128/256-bit vector types. r249121 caused a Clang test failure (avx2-buitins.c). Revert r249121 while I keep investigating on the reason why that test failed. llvm-svn: 249124
* [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backendZoran Jovanovic2015-10-021-5/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D13235 llvm-svn: 249123
* [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit ↵Andrea Di Biagio2015-10-021-0/+24
| | | | | | | | | | | | | | | | | | | | | | | vector types. This patch teaches FastIsel the following two things: 1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types; 2) On AVX, no instructions are needed for bitcasts between 256-bit vector types. Example: %1 = bitcast <4 x i31> %V to <2 x i64> Before (-fast-isel -fast-isel-abort=1): FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64> Now we don't fall back to SelectionDAG and we correctly fold that computation propagating the register associated to %V. Differential Revision: http://reviews.llvm.org/D13347 llvm-svn: 249121
* AMDGPU: Fix unused variable warning in release buildMatt Arsenault2015-10-011-2/+2
| | | | llvm-svn: 249091
* AMDGPU: Move SIFixSGPRLiveRanges to be a regalloc passMatt Arsenault2015-10-012-33/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace LiveInterval usage with LiveVariables. LiveIntervals computes far more information than is needed for this pass which just needs to find if an SGPR is live out of the defining block. LiveIntervals are not usually available that early, requiring computing them twice which is very expensive. The extra run of LiveIntervals/LiveVariables/SlotIndexes was costing in total about 5% of compile time. Continuing to use LiveIntervals is problematic. It seems there is an option (early-live-intervals) to run the analysis about where it should go to avoid recomputing LiveVariables, but it seems to be completely broken with subreg liveness enabled. There are also problems from trying to recompute LiveIntervals since this seems to undo LiveVariables and clearing kill flags, causing TwoAddressInstructions to make bad decisions. Insert the pass right after live variables and preserve it. The tricky case to worry about might be phis since LiveVariables doesn't count a register as live out if in the successor block it is only used in a phi, but I don't think this is a concern right now because SIFixSGPRCopies replaces SGPR phis. llvm-svn: 249087
* Fix relocation used for GOT references in non-PIC mode. Fix relocationsJoerg Sonnenberger2015-10-011-28/+33
| | | | | | | | for "set" pseudo op in PIC mode. Differential Revision: http://reviews.llvm.org/D13173 llvm-svn: 249086
* AMDGPU: Merge if and switchMatt Arsenault2015-10-011-14/+17
| | | | llvm-svn: 249082
* AMDGPU: Remove dead codeMatt Arsenault2015-10-011-7/+4
| | | | | | | There's no point in checking VReg_1 because all uses of it should already have been removed by SILowerI1Copies. llvm-svn: 249081
* AMDGPU: Make SIInsertWaits about a factor of 4 fasterMatt Arsenault2015-10-012-23/+24
| | | | | | | | | | | | | | | | | | This was the slowest target custom pass and was spending 80% of the time in getMinimalPhysRegClass which was called for every register operand. Try to use the statically known register class when possible from the instruction's MCOperandInfo. There are a few pseudo instructions which are not well behaved with unknown register classes which still require the expensive physical register class search. There are a few other possibilities for making this even faster, such as not inspecting implicit operands. For now those are checked because it is technically possible to have a scalar load into exec or vcc which can be implicitly used. llvm-svn: 249079
* [WinEH] Emit __C_specific_handler tables for the new IRReid Kleckner2015-10-011-4/+15
| | | | | | | | | | | | We emit denormalized tables, where every range of invokes in the same state gets a complete list of EH action entries. This is significantly simpler than trying to infer the correct nested scoping structure from the MI. Fortunately, for SEH, the nesting structure is really just a size optimization. With this, some basic __try / __except examples work. llvm-svn: 249078
* AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering passTom Stellard2015-10-011-2/+6
| | | | | | | | | | | | | | | Summary: Instead of asserting when the kernel metadata is different than we expect, we should just skip lowering that function. This fixes assertion failures with OpenCL argument metadata from older LLVM releases. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13356 llvm-svn: 249073
* [WinEH] Make FuncletLayout more robust against catchretDavid Majnemer2015-10-014-19/+38
| | | | | | | | | Catchret transfers control from a catch funclet to an earlier funclet. However, it is not completely clear which funclet the catchret target is part of. Make this clear by stapling the catchret target's funclet membership onto the CATCHRET SDAG node. llvm-svn: 249052
* [AArch64] Deprecate a command-line option used for testing.Chad Rosier2015-10-011-12/+4
| | | | | | | Support for pairing unscaled loads and stores has been enabled since the original ARM64 port. This feature is no longer experimental, AFAICT. llvm-svn: 249049
* [SystemZ] Add some generic (floating point support) load instructions.Jonas Paulsson2015-10-014-17/+37
| | | | | | | | | | | | | | | | | | | | | | Add generic instructions for load complement, load negative and load positive for fp32 and fp64, and let isel prefer them. They do not clobber CC, and so give scheduler more freedom. SystemZElimCompare pass will convert them when it can to the CC-setting variants. Regression tests updated to expect the new opcodes in places where the old ones where used. New test case SystemZ/fp-cmp-05.ll checks that SystemZCompareElim.cpp can handle the new opcodes. README.txt updated (bullet removed). Note that fp128 is not yet handled, because it is relatively rare, and is a bit trickier, because of the fact that l.dfr would operate on the sign bit of one of the subregisters of a fp128, but we would not want to copy the other sub-reg in case src and dst regs are not the same. Reviewed by Ulrich Weigand. llvm-svn: 249046
* AMDGPU: Add MEM_RAT STORE_TYPED.Tom Stellard2015-10-013-0/+23
| | | | | | | | | | | | v2: Add test (Matt). Fix capitalization of isEOP (Matt). Move pattern to class parameter (Matt). Make the instruction available to Cayman (Matt). Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED. Patch by: Zoltan Gilian llvm-svn: 249042
* AMDGPU: Factor out EOP query.Tom Stellard2015-10-011-4/+6
| | | | | | | | v2: Fix brace placement and capitalization (Matt). Patch by: Zoltan Gilian llvm-svn: 249041
* Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"NAKAMURA Takumi2015-10-012-23/+7
| | | | | | It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll llvm-svn: 249032
* [SystemZ] Add assembly instructions for obtaining clock values as well as ↵Ulrich Weigand2015-10-011-0/+20
| | | | | | | | | | | CPU features Provide assembler support for STCK, STCKF, STCKE, and STFLE. Author: joncmu Differential Revision: http://reviews.llvm.org/D13299 llvm-svn: 249015
* [AArch64] Hoist commonly failing check. NFC.Chad Rosier2015-10-011-6/+6
| | | | llvm-svn: 249011
* [AArch64] Rename variable to improve readability. NFC.Chad Rosier2015-10-011-10/+10
| | | | llvm-svn: 249008
* [AArch64] Update comment to reflect reality.Chad Rosier2015-10-011-2/+2
| | | | llvm-svn: 249007
* [mips][microMIPS] Implement CACHEE, WRPGPR and WSBH instructionsZoran Jovanovic2015-10-013-5/+39
| | | | | | Differential Revision: http://reviews.llvm.org/D10337 llvm-svn: 249004
* [ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizerScott Douglass2015-10-011-3/+7
| | | | | | Differential Revision: http://reviews.llvm.org/D13240 llvm-svn: 249002
* AMDGPU/SI: Re-order PreloadedValue enum and number entries based on init orderTom Stellard2015-10-011-9/+12
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12451 llvm-svn: 248978
* [X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math.Ahmed Bougacha2015-10-011-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The custom code produces incorrect results if later reassociated. Since r221657, on x86, vNi32 uitofp is lowered using an optimized sequence: movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...] pand %xmm0, %xmm1 por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...] psrld $16, %xmm0 por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...] addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...] addps %xmm1, %xmm0 Since r240361, the machine combiner opportunistically reassociates 2-instruction sequences (with -ffast-math). In the new code sequence, the ADDPS' are eligible. In isolation, for simple examples (without reassociable users), this makes no performance difference (the goal being to enable reassociation of longer chains). In the trivial example (just one uitofp), the reassociation doesn't happen, because (I think) it would require the emission of a separate movaps for a constantpool load (instead of folding it into addps). However, when we have multiple uitofp sequences, and the constantpool loads are CSE'd earlier, the machine combiner can do the reassociation. When the ADDPS' are reassociated, the resulting sequence isn't correct anymore, as we'd be adding large (2**39) constants with comparatively smaller values (~2**23). Given that two of the three inputs are powers of 2 larger than 2**16, and that ulp(2**39) == 2**(39-24) == 2**15, the reassociated chain will produce 0 for any input in [0, 2**14[. In my testing, it also produces wrong results for 99.5% of [0, 2**32[. Avoid this by disabling the new lowering when -ffast-math. It does mean that we'll get slower code than without it, but at least we won't get egregiously incorrect code. One might argue that, considering -ffast-math is all but meaningless, uitofp producing wrong results isn't a compiler bug. But it really is. Fixes PR24512. ...though this is really more of a workaround. Ideally, we'd have some sort of Machine FMF, but that's a problem that's not worth tackling until we do more with machine IR. llvm-svn: 248965
OpenPOWER on IntegriCloud