summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] New and improved VZeroUpperInserter optimization.Lang Hames2014-03-171-165/+162
| | | | | | | | | | | | | | | | | - Adds support for inserting vzerouppers before tail-calls. This is enabled implicitly by having MachineInstr::copyImplicitOps preserve regmask operands, which allows VZeroUpperInserter to see where tail-calls use vector registers. - Fixes a bug that caused the previous version of this optimization to miss some vzeroupper insertion points in loops. (Loops-with-vector-code that followed loops-without-vector-code were mistakenly overlooked by the previous version). - New algorithm never revisits instructions. Fixes <rdar://problem/16228798> llvm-svn: 204021
* Remove some dead assignements found by scan-buildArnaud A. de Grandmaison2014-03-151-1/+0
| | | | llvm-svn: 204013
* Replace ValueTypes.h with MachineValueType.h if possible.Patrik Hagglund2014-03-154-3/+5
| | | | | | | | | Utilize the previous move of MVT to a separate header for all trivial cases (that don't need any further restructuring). Reviewed By: Tim Northover llvm-svn: 204003
* R600: Remove unnecessary attempt to zext a pointer.Matt Arsenault2014-03-151-3/+6
| | | | | | Private pointers are now always 32-bits. llvm-svn: 203989
* R600: Code cleanup.Matt Arsenault2014-03-151-11/+12
| | | | | | | Use sign_extend_inreg and getZeroExtendInReg instead of using the bit operations they expand into. llvm-svn: 203988
* x86: Add missing break to getCallPreservedMask()Duncan P. N. Exon Smith2014-03-141-0/+1
| | | | | | | | | | | | This change brings getCallPreservedMask()'s logic in line with getCalleeSavedRegs(). While this changes the control flow slightly, the change is not currently observable. is64Bit must be false to get to the accidental fallthrough, but the case that we fall into (coldcc) does nothing unless is64Bit is true. llvm-svn: 203943
* x86: NFC: Make getCallPreservedMask() more similar to getCalleeSavedRegs()Duncan P. N. Exon Smith2014-03-141-4/+6
| | | | | | | Changing order of checks in getCallPreservedMask() to match getCalleeSavedRegs() so that the logic is easier to compare. llvm-svn: 203939
* x86: getCalleeSavedRegs() would crash on 0 (so don't default to it)Duncan P. N. Exon Smith2014-03-142-1/+2
| | | | | | | The current logic assumes that MF is not 0. Assert that it isn't, and remove the default of 0 from the header. llvm-svn: 203934
* [ppc64] Avoid copy relocs in named rodata sectionsUlrich Weigand2014-03-141-13/+9
| | | | | | | | | | | | | | Commit r181723 introduced code to avoid placing initialized variables needing relocations into the .rodata section, which avoid copy relocs that do not work as expected on ppc64 function references. The same treatment is also needed for *named* .rodata.XXX sections. This patch changes PPC64LinuxTargetObjectFile::SelectSectionForGlobal to modify "Kind" *before* calling the default SelectSectionForGlobal routine, instead of first calling the default routine and then just checking for the (main) .rodata section afterwards. llvm-svn: 203921
* AddressSanitizer instrumentation for MOV and MOVAPS.Evgeniy Stepanov2014-03-144-3/+304
| | | | | | | | This is an initial version of *Sanitizer instrumentation of assembly code. Patch by Yuri Gorshenin. llvm-svn: 203908
* Remove the linker_private and linker_private_weak linkages.Rafael Espindola2014-03-131-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These linkages were introduced some time ago, but it was never very clear what exactly their semantics were or what they should be used for. Some investigation found these uses: * utf-16 strings in clang. * non-unnamed_addr strings produced by the sanitizers. It turns out they were just working around a more fundamental problem. For some sections a MachO linker needs a symbol in order to split the section into atoms, and llvm had no idea that was the case. I fixed that in r201700 and it is now safe to use the private linkage. When the object ends up in a section that requires symbols, llvm will use a 'l' prefix instead of a 'L' prefix and things just work. With that, these linkages were already dead, but there was a potential future user in the objc metadata information. I am still looking at CGObjcMac.cpp, but at this point I am convinced that linker_private and linker_private_weak are not what they need. The objc uses are currently split in * Regular symbols (no '\01' prefix). LLVM already directly provides whatever semantics they need. * Uses of a private name (start with "\01L" or "\01l") and private linkage. We can drop the "\01L" and "\01l" prefixes as soon as llvm agrees with clang on L being ok or not for a given section. I have two patches in code review for this. * Uses of private name and weak linkage. The last case is the one that one could think would fit one of these linkages. That is not the case. The semantics are * the linker will merge these symbol by *name*. * the linker will hide them in the final DSO. Given that the merging is done by name, any of the private (or internal) linkages would be a bad match. They allow llvm to rename the symbols, and that is really not what we want. From the llvm point of view, these objects should really be (linkonce|weak)(_odr)?. For now, just keeping the "\01l" prefix is probably the best for these symbols. If we one day want to have a more direct support in llvm, IMHO what we should add is not a linkage, it is just a hidden_symbol attribute. It would be applicable to multiple linkages. For example, on weak it would produce the current behavior we have for objc metadata. On internal, it would be equivalent to private (and we should then remove private). llvm-svn: 203866
* Phase 2 of the great MachineRegisterInfo cleanup. This time, we're changingOwen Anderson2014-03-1312-40/+44
| | | | | | | | | | operator* on the by-operand iterators to return a MachineOperand& rather than a MachineInstr&. At this point they almost behave like normal iterators! Again, this requires making some existing loops more verbose, but should pave the way for the big range-based for-loop cleanups in the future. llvm-svn: 203865
* Use printable names to implement directional labels.Rafael Espindola2014-03-131-2/+1
| | | | | | | | | | | | | | This changes the implementation of local directional labels to use a dedicated map. With that it can then just use CreateTempSymbol, which is what the rest of MC uses. CreateTempSymbol doesn't do a great job at making sure the names are unique (or being efficient when the names are not needed), but that should probably be fixed in a followup patch. This fixes pr18928. llvm-svn: 203826
* R600: LDS instructions shouldn't implicitly define OQAPTom Stellard2014-03-131-2/+0
| | | | | | | | | LDS instructions are pseudo instructions which model the OQAP defs and uses within a single instruction. This fixes a hang in the opencv MedianFilter tests. llvm-svn: 203818
* [ARM] Use symbolic register names in .cfi directives only with IAS (PR19110)Hans Wennborg2014-03-132-3/+11
| | | | | | | | | | This is a follow-up to r203635. Saleem pointed out that since symbolic register names are much easier to read, it would be good if we could turn them off only when we really need to because we're using an external assembler. Differential Revision: http://llvm-reviews.chandlerc.com/D3056 llvm-svn: 203806
* CodeGenPrep: sink extends of illegal types into use block.Manuel Jacob2014-03-131-48/+0
| | | | | | | | | | | | | | | | | | | Summary: This helps the instruction selector to lower an i64 * i64 -> i128 multiplication into a single instruction on targets which support it. This is an update of D2973 which was reverted because of a bug reported as PR19084. Reviewers: t.p.northover, chapuni Reviewed By: t.p.northover CC: llvm-commits, alex, chapuni Differential Revision: http://llvm-reviews.chandlerc.com/D3021 llvm-svn: 203797
* AVX-512: masked load/store + intrinsics for them.Elena Demikhovsky2014-03-131-121/+108
| | | | llvm-svn: 203790
* AArch64: error when both positional & named operands are used.Tim Northover2014-03-133-4/+7
| | | | | | | | | | Only one instruction pair needed changing: SMULH & UMULH. The previous code worked, but MC was doing extra work treating Ra as a valid operand (which then got completely overwritten in MCCodeEmitter). No behaviour change, so no tests. llvm-svn: 203772
* [PowerPC] Initial support for the VSX instruction setHal Finkel2014-03-1319-23/+1275
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VSX is an ISA extension supported on the POWER7 and later cores that enhances floating-point vector and scalar capabilities. Among other things, this adds <2 x double> support and generally helps to reduce register pressure. The interesting part of this ISA feature is the register configuration: there are 64 new 128-bit vector registers, the 32 of which are super-registers of the existing 32 scalar floating-point registers, and the second 32 of which overlap with the 32 Altivec vector registers. This makes things like vector insertion and extraction tricky: this can be free but only if we force a restriction to the right register subclass when needed. A new "minipass" PPCVSXCopy takes care of this (although it could do a more-optimal job of it; see the comment about unnecessary copies below). Please note that, currently, VSX is not enabled by default when targeting anything because it is not yet ready for that. The assembler and disassembler are fully implemented and tested. However: - CodeGen support causes miscompiles; test-suite runtime failures: MultiSource/Benchmarks/FreeBench/distray/distray MultiSource/Benchmarks/McCat/08-main/main MultiSource/Benchmarks/Olden/voronoi/voronoi MultiSource/Benchmarks/mafft/pairlocalalign MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4 SingleSource/Benchmarks/CoyoteBench/almabench SingleSource/Benchmarks/Misc/matmul_f64_4x4 - The lowering currently falls back to using Altivec instructions far more than it should. Worse, there are some things that are scalarized through the stack that shouldn't be. - A lot of unnecessary copies make it past the optimizers, and this needs to be fixed. - Many more regression tests are needed. Normally, I'd fix these things prior to committing, but there are some students and other contributors who would like to work this, and so it makes sense to move this development process upstream where it can be subject to the regular code-review procedures. llvm-svn: 203768
* [TableGen] Optionally forbid overlap between named and positional operandsHal Finkel2014-03-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are currently two schemes for mapping instruction operands to instruction-format variables for generating the instruction encoders and decoders for the assembler and disassembler respectively: a) to map by name and b) to map by position. In the long run, we'd like to remove the position-based scheme and use only name-based mapping. Unfortunately, the name-based scheme currently cannot deal with complex operands (those with suboperands), and so we currently must use the position-based scheme for those. On the other hand, the position-based scheme cannot deal with (register) variables that are split into multiple ranges. An upcoming commit to the PowerPC backend (adding VSX support) will require this capability. While we could teach the position-based scheme to handle that, since we'd like to move away from the position-based mapping generally, it seems silly to teach it new tricks now. What makes more sense is to allow for partial transitioning: use the name-based mapping when possible, and only use the position-based scheme when necessary. Now the problem is that mixing the two sensibly was not possible: the position-based mapping would map based on position, but would not skip those variables that were mapped by name. Instead, the two sets of assignments would overlap. However, I cannot currently change the current behavior, because there are some backends that rely on it [I think mistakenly, but I'll send a message to llvmdev about that]. So I've added a new TableGen bit variable: noNamedPositionallyEncodedOperands, that can be used to cause the position-based mapping to skip variables mapped by name. llvm-svn: 203767
* ARM: ignore unused variable to fix -Wunused-variable buildsSaleem Abdulrasool2014-03-131-0/+1
| | | | llvm-svn: 203765
* ARM: support emission of complex SO expressionsSaleem Abdulrasool2014-03-131-2/+13
| | | | | | | | | | | | | | Support to the IAS was added to actually parse and handle the complex SO expressions. However, the object file lowering was not updated to compensate for the fact that the shift operand may be an absolute expression. When trying to assemble to an object file, the lowering would fail while succeeding when emitting purely assembly. Add an appropriate test. The test case is inspired by the test case provided by Jiangning Liu who also brought the issue to light. llvm-svn: 203762
* [X86] Add peephole for masked rotate amountAdam Nemet2014-03-121-0/+2
| | | | | | | | | | | | | | | | Extend what's currently done for shift because the HW performs this masking implicitly: (rotl:i32 x, (and y, 31)) -> (rotl:i32 x, y) I use the newly factored out multiclass that was only supporting shifts so far. For testing I extended my testcase for the new rotation idiom. <rdar://problem/15295856> llvm-svn: 203718
* Allow exclamation and tilde to be parsed as a part of the ppc asm operand.Roman Divacky2014-03-121-0/+2
| | | | llvm-svn: 203699
* R600: Fix trunc store from i64 to i1Matt Arsenault2014-03-121-0/+6
| | | | llvm-svn: 203695
* [X86] Refactor peepholes for masked shift amount into a multiclassAdam Nemet2014-03-121-55/+25
| | | | | | | | | | | | | | | | | | The peephole (shift x, (and y, 31)) -> (shift x, y) is repeated for each integer type and each shift variant. To improve this a new multiclass is added that covers all integer types. The shift patterns are now instantiated from this. I am planning to add new instances for rotates as well. No functional change intended: * test/CodeGen/X86/shift-and.ll provides coverage * Compared the expanded tablegen output and matched up the defs for these Pat<>s before and after llvm-svn: 203685
* [X86] Set the scheduling resources of some of the FPStack instructions.Quentin Colombet2014-03-121-0/+17
| | | | | | This is related to <rdar://problem/15607571>. llvm-svn: 203682
* Try harder to evaluate expressions when printing assembly.Rafael Espindola2014-03-124-8/+7
| | | | | | | | | When printing assembly we don't have a Layout object, but we can still try to fold some constants. Testcase by Ulrich Weigand. llvm-svn: 203677
* Add comment pointing to the binutils bugzilla entryHans Wennborg2014-03-121-0/+1
| | | | | | This is a follow-up to r203635 as suggested by Rafael. llvm-svn: 203670
* Update the datalayout string for ppc64LE.Will Schmidt2014-03-121-2/+7
| | | | | | Update the datalayout string for ppc64LE. llvm-svn: 203664
* [mips][fp64] Add an implicit def to MTHC1 claiming that it reads the lower ↵Daniel Sanders2014-03-121-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | 32-bits of 64-bit FPR Summary: This is a white lie to workaround a widespread bug in the -mfp64 implementation. The problem is that none of the 32-bit fpu ops mention the fact that they clobber the upper 32-bits of the 64-bit FPR. This allows MTHC1 to be scheduled on the wrong side of most 32-bit FPU ops, particularly MTC1. Fixing that requires a major overhaul of the FPU implementation which can't be done right now due to time constraints. The testcase is SingleSource/Benchmarks/Misc/oourafft.c when given TARGET_CFLAGS='-mips32r2 mfp64 -mmsa'. Also correct the comment added in r203464 to indicate that two instructions were affected. Reviewers: matheusalmeida, jacksprat Reviewed By: matheusalmeida Differential Revision: http://llvm-reviews.chandlerc.com/D3029 llvm-svn: 203659
* [mips] BSEL's and BINS[RL] operands are reversed compared to the vselect ↵Daniel Sanders2014-03-123-17/+42
| | | | | | | | | | | | | | | | | | | | | | | node used in the pattern. Summary: Correct the match patterns and the lowerings that made the CodeGen tests pass despite the mistakes. The original testcase that discovered the problem was SingleSource/UnitTests/SignlessType/factor.c in test-suite. During review, we also found that some of the existing CodeGen tests were incorrect and fixed them: * bitwise.ll: In bsel_v16i8 the IfSet/IfClear were reversed because bsel and bmnz have different operand orders and the test didn't correctly account for this. bmnz goes 'IfClear, IfSet, CondMask', while bsel goes 'CondMask, IfClear, IfSet'. * vec.ll: In the cases where a bsel is emitted as a bmnz (they are the same operation with a different input tied to the result) the operands were in the wrong order. * compare.ll and compare_float.ll: The bsel operand order was correct for a greater-than comparison, but a greater-than comparison instruction doesn't exist. Lowering this operation inverts the condition so the IfSet/IfClear need to be swapped to match. The differences between BSEL, BMNZ, and BMZ and how they map to/from vselect are rather confusing. I've therefore added a note to MSA.txt to explain this in a single place in addition to the comments that explain each case. Reviewers: matheusalmeida, jacksprat Reviewed By: matheusalmeida Differential Revision: http://llvm-reviews.chandlerc.com/D3028 llvm-svn: 203657
* ARM: correct Dwarf output for non-contiguous VFP saves.Tim Northover2014-03-121-0/+9
| | | | | | | | | | | | | | | | | | When the list of VFP registers to be saved was non-contiguous (so multiple vpush/vpop instructions were needed) these were being ordered oddly, as in: vpush {d8, d9} vpush {d11} This led to the layout in memory being [d11, d8, d9] which is ugly and doesn't match the CFI_INSTRUCTIONs we're generating either (so Dwarf info would be broken). This switches the order of vpush/vpop (in both prologue and epilogue, obviously) so that the Dwarf locations are correct again. rdar://problem/16264856 llvm-svn: 203655
* Replace '#include ValueTypes.h' with forward declarations.Patrik Hagglund2014-03-129-8/+6
| | | | | | | In some cases the include is pushed "downstream" (or removed if unused). llvm-svn: 203644
* [ARM] Use DWARF register numbers for CFI directives in ELF assemblyHans Wennborg2014-03-121-0/+3
| | | | | | | | | | | | | | It seems gas can't handle CFI directives with VFP register names ("d12", etc.). This broke us trying to build Chromium for Android after 201423. A gas bug has been filed: https://sourceware.org/bugzilla/show_bug.cgi?id=16694 compnerd suggested making this conditional on whether we're using the integrated assembler or not. I'll look into that in a follow-up patch. Differential Revision: http://llvm-reviews.chandlerc.com/D3049 llvm-svn: 203635
* [mips] Implement NaCl sandboxing of function calls:Sasa Stankovic2014-03-111-2/+56
| | | | | | | | | * Add masking instructions before indirect calls (in MC layer). * Align call + branch delay to the bundle end (in MC layer). Differential Revision: http://llvm-reviews.chandlerc.com/D3032 llvm-svn: 203606
* Simplify a really complicated check for Arch == X86_64.Rafael Espindola2014-03-111-1/+0
| | | | | | | | | | | | The function hasReliableSymbolDifference had exactly one use in the MachO writer. It is also only true for X86_64. In fact, the comments refers to "Darwin x86_64" and everything else, so this makes the code match the comment. If this is to be abstracted again, it should be a property of TargetObjectWriter, like useAggressiveSymbolFolding. llvm-svn: 203605
* Range-ify a loop.Owen Anderson2014-03-111-4/+1
| | | | llvm-svn: 203590
* X86: Don't generate 64-bit movd after cmpneqsd in 32-bit mode (PR19059)Hans Wennborg2014-03-111-4/+20
| | | | | | | | | This fixes the bug where we would bitcast the 64-bit floating point result of cmpneqsd to a 64-bit integer even on 32-bit targets. Differential Revision: http://llvm-reviews.chandlerc.com/D3009 llvm-svn: 203581
* ARM: honour -f{no-,}optimize-sibling-callsSaleem Abdulrasool2014-03-111-2/+4
| | | | | | | | | | | Use the options in the ARMISelLowering to control whether tail calls are optimised or not. Previously, this option was entirely ignored on the ARM target and only honoured on x86. This option is mostly useful in profiling scenarios. The default remains that tail call optimisations will be applied. llvm-svn: 203577
* ARM: remove ancient -arm-tail-calls optionSaleem Abdulrasool2014-03-111-8/+2
| | | | | | | | This option is from 2010, designed to work around a linker issue on Darwin for ARM. According to grosbach this is no longer an issue and this option can safely be removed. llvm-svn: 203576
* ARM: enable tail call optimisation on Thumb 2Saleem Abdulrasool2014-03-111-1/+3
| | | | | | | | | | | | Tail call optimisation was previously disabled on all targets other than iOS5.0+. This enables the tail call optimisation on all Thumb 2 capable platforms. The test adjustments are to remove the IR hint "tail" to function invocation. The tests were designed assuming that tail call optimisations would not kick in which no longer holds true. llvm-svn: 203575
* ARM: simplify EmitAtomicBinary64Tim Northover2014-03-113-34/+19
| | | | | | | | | ATOMIC_STORE operations always get here as a lowered ATOMIC_SWAP, so there's no need for any code to handle them specially. There should be no functionality change so no tests. llvm-svn: 203567
* IR: add a second ordering operand to cmpxhg for failureTim Northover2014-03-113-6/+11
| | | | | | | | | | | | | | | The syntax for "cmpxchg" should now look something like: cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic where the second ordering argument gives the required semantics in the case that no exchange takes place. It should be no stronger than the first ordering constraint and cannot be either "release" or "acq_rel" (since no store will have taken place). rdar://problem/15996804 llvm-svn: 203559
* R600: Calculate store mask instead of using switch.Matt Arsenault2014-03-111-17/+3
| | | | llvm-svn: 203527
* X86: Enable ISel of 16-bit MOVBE instructions.Jim Grosbach2014-03-112-1/+9
| | | | | | | | | | | | | | | | | When the MOVBE instructions are available, use them for 16-bit endian swapping as well as for 32 and 64 bit. The patterns were already present on the instructions, but weren't being matched because the operation was unconditionally marked to 'Expand.' Change that to be conditional on whether the MOVBE instructions are available. Use 'rolw' to implement the in-register version (32 and 64 bit have the dedicated 'bswap' instruction for that). Patch by Louis Gerbarg <lgg@apple.com>. rdar://15479984 llvm-svn: 203524
* Remove incomplete commentMatt Arsenault2014-03-111-2/+0
| | | | llvm-svn: 203518
* Move trivial getter into header.Matt Arsenault2014-03-112-7/+4
| | | | llvm-svn: 203517
* Use .data() instead of &x[0]Matt Arsenault2014-03-111-2/+2
| | | | llvm-svn: 203516
* Fix indentationMatt Arsenault2014-03-111-9/+8
| | | | llvm-svn: 203515
OpenPOWER on IntegriCloud