summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* Strip old MachineInstrs *after* we know we can put them back.Tim Northover2012-09-051-6/+6
| | | | | | | | Previous patch accidentally decided it couldn't convert a VFP to a NEON instruction after it had already destroyed the old one. Not a good move. llvm-svn: 163230
* LLVM Bug Fix 13709: Remove needless lsr(Rp, #32) instruction access thePranav Bhandarkar2012-09-051-0/+35
| | | | | | | | | | | | | | | subreg_hireg of register pair Rp. * lib/Target/Hexagon/HexagonPeephole.cpp(PeepholeDoubleRegsMap): New DenseMap similar to PeepholeMap that additionally records subreg info too. (runOnMachineFunction): Record information in PeepholeDoubleRegsMap and copy propagate the high sub-reg of Rp0 in Rp1 = lsr(Rp0, #32) to the instruction Rx = COPY Rp1:logreg_subreg. * test/CodeGen/Hexagon/remove_lsr.ll: New test. llvm-svn: 163214
* Remove some of the patterns added in r163196. Increasing the complexity on ↵Craig Topper2012-09-051-42/+2
| | | | | | insert_subvector into undef accomplishes the same thing. llvm-svn: 163198
* Add patterns for integer forms of VINSERTF128/VINSERTI128 folded with loads. ↵Craig Topper2012-09-051-4/+76
| | | | | | Also add patterns to turn subvector inserts with loads to index 0 of an undef into VMOVAPS. llvm-svn: 163196
* Fix UseInitArray option for MIPS target.Logan Chien2012-09-051-0/+1
| | | | llvm-svn: 163193
* Convert vextracti128/vextractf128 intrinsics to extract_subvector at DAG ↵Craig Topper2012-09-051-28/+51
| | | | | | build time. Similar was previously done for vinserti128/vinsertf128. Add patterns for folding these extract_subvectors with stores. llvm-svn: 163192
* Remove redundant semicolons to fix -pedantic-errors build.Richard Smith2012-09-051-2/+2
| | | | llvm-svn: 163190
* Fix function name per coding standard.Chad Rosier2012-09-054-10/+10
| | | | llvm-svn: 163187
* Generic Bypass Slow DivPreston Gurd2012-09-044-1/+15
| | | | | | | | | | | | | | | | | | | | | | | - CodeGenPrepare pass for identifying div/rem ops - Backend specifies the type mapping using addBypassSlowDivType - Enabled only for Intel Atom with O2 32-bit -> 8-bit - Replace IDIV with instructions which test its value and use DIVB if the value is positive and less than 256. - In the case when the quotient and remainder of a divide are used a DIV and a REM instruction will be present in the IR. In the non-Atom case they are both lowered to IDIVs and CSE removes the redundant IDIV instruction, using the quotient and remainder from the first IDIV. However, due to this optimization CSE is not able to eliminate redundant IDIV instructions because they are located in different basic blocks. This is overcome by calculating both the quotient (DIV) and remainder (REM) in each basic block that is inserted by the optimization and reusing the result values when a subsequent DIV or REM instruction uses the same operands. - Test cases check for the presents of the optimization when calculating either the quotient, remainder, or both. Patch by Tyler Nowicki! llvm-svn: 163150
* Porting Hexagon MI Scheduler to the new API.Sergei Larin2012-09-048-1/+1377
| | | | | | | Change current Hexagon MI scheduler to use new converging scheduler. Integrates DFA resource model into it. llvm-svn: 163137
* Patch to implement UMLAL/SMLAL instructions for the ARM architectureArnold Schwaighofer2012-09-045-17/+251
| | | | | | | | | | | This patch corrects the definition of umlal/smlal instructions and adds support for matching them to the ARM dag combiner. Bug 12213 Patch by Yin Ma! llvm-svn: 163136
* This patch optimizes shuffle instruction - generates 2 instructions instead ↵Elena Demikhovsky2012-09-041-16/+17
| | | | | | | | | | | | | | | | | | | | of 4. Since this specific shuffle is widely used in many workloads we have ~10% performance on them. shufflevector <8 x float> %A, <8 x float> %B, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 4, i32 12, i32 6, i32 14> vmovaps (%rdx), %ymm0 vshufps $8, %ymm0, %ymm0, %ymm0 vmovaps (%rcx), %ymm1 vshufps $8, %ymm0, %ymm1, %ymm1 vunpcklps %ymm0, %ymm1, %ymm0 vmovaps (%rcx), %ymm0 vmovsldup (%rdx), %ymm1 vblendps $85, %ymm0, %ymm1, %ymm0 llvm-svn: 163134
* [ms-inline asm] Asm operands can map to one or more MCOperands. Therefore, addChad Rosier2012-09-034-8/+11
| | | | | | | the NumMCOperands argument to the GetMCInstOperandNum() function that is set to the number of MCOperands this asm operand mapped to. llvm-svn: 163124
* [ms-inline asm] Add a comment.Chad Rosier2012-09-031-0/+3
| | | | llvm-svn: 163123
* [ms-inline asm] Add an interface to the GetMCInstOperandNum() function in theChad Rosier2012-09-034-0/+30
| | | | | | MCTargetAsmParser class. llvm-svn: 163122
* Remove always true checks. Noticed by Adhemerval Zanella.Roman Divacky2012-09-031-2/+2
| | | | llvm-svn: 163117
* Add braces to the case statement.Chad Rosier2012-09-031-1/+2
| | | | llvm-svn: 163116
* Removed unused argument.Chad Rosier2012-09-033-18/+15
| | | | llvm-svn: 163104
* some peepholes that should match horizontal add/sub operations.Chris Lattner2012-09-031-0/+12
| | | | llvm-svn: 163103
* [ms-inline asm] Expose the Kind and Opcode variables from theChad Rosier2012-09-033-11/+25
| | | | | | | | | | MatchInstructionImpl() function. These values are used by the ConvertToMCInst() function to index into the ConversionTable. The values are also needed to call the GetMCInstOperandNum() function. llvm-svn: 163101
* Move ErrorLoc decl into the scope where it's actually used.Chad Rosier2012-09-031-2/+1
| | | | llvm-svn: 163100
* Not all targets have efficient ISel code generation for select instructions.Nadav Rotem2012-09-021-0/+5
| | | | | | | | | For example, the ARM target does not have efficient ISel handling for vector selects with scalar conditions. This patch adds a TLI hook which allows the different targets to report which selects are supported well and which selects should be converted to CF duting codegen prepare. llvm-svn: 163093
* Limit domain conversion to cases where it won't break dep chains.Tim Northover2012-09-011-12/+48
| | | | | | | | NEON domain conversion was too heavy-handed with its widened registers, which could have stripped existing instructions of their dependency, leaving them vulnerable to scheduling errors. llvm-svn: 163070
* Fix Thumb2 fixup kind in the integrated-as.Logan Chien2012-09-011-0/+4
| | | | llvm-svn: 163063
* TyposCraig Topper2012-09-012-2/+2
| | | | llvm-svn: 163053
* SelectionDAG: when constructing VZEXT_LOAD from other loads, make sure itsManman Ren2012-08-311-0/+12
| | | | | | | | | | | output chain is correctly setup. As an example, if the original load must happen before later stores, we need to make sure the constructed VZEXT_LOAD is constrained to be before the stores. rdar://11457792 llvm-svn: 163036
* Mark FMA4 instructions as commutable and add them to the folding tables.Craig Topper2012-08-312-0/+64
| | | | llvm-svn: 163035
* Remove an unused argument. The MCInst opcode is set in the ConvertToMCInst()Chad Rosier2012-08-311-47/+42
| | | | | | function nowadays. llvm-svn: 163030
* Add selection of RegOp2MemOpTable3 to canFoldMemoryOperandCraig Topper2012-08-311-0/+2
| | | | llvm-svn: 163029
* Fix PR12359Michael Liao2012-08-311-3/+5
| | | | | | | | | - In addition to undefined, if V2 is zero vector, skip 2nd PSHUFB and POR as well as PSHUFB will zero elements with negative indices. Patch by Sriram Murali <sriram.murali@intel.com> llvm-svn: 163018
* The instruction DINS may be transformed into DINSU or DEXTM dependingJack Carter2012-08-314-23/+31
| | | | | | | | | | | | | | | | | | | | | on the size of the extraction and its position in the 64 bit word. This patch allows support of the dext transformations with mips64 direct object output. 0 <= msb < 32 0 <= lsb < 32 0 <= pos < 32 1 <= size <= 32 DINS The field is entirely contained in the right-most word of the doubleword 32 <= msb < 64 0 <= lsb < 32 0 <= pos < 32 2 <= size <= 64 DINSM The field straddles the words of the doubleword 32 <= msb < 64 32 <= lsb < 64 32 <= pos < 64 1 <= size <= 32 DINSU The field is entirely contained in the left-most word of the doubleword llvm-svn: 163010
* Add a comment to explain what's really going on.Chad Rosier2012-08-311-0/+6
| | | | llvm-svn: 163005
* The ConvertToMCInst() function can't fail, so remove the now dead ↵Chad Rosier2012-08-313-8/+0
| | | | | | Match_ConversionFail enum. llvm-svn: 163002
* Mark FMA3 instructions as commutable so that the operands to the multiply ↵Craig Topper2012-08-311-0/+4
| | | | | | part can be commuted. llvm-svn: 163001
* Add support for converting llvm.fma to fma4 instructions.Craig Topper2012-08-313-36/+76
| | | | llvm-svn: 162999
* Clean up AddedComplexity further after adding UseSSExMichael Liao2012-08-311-17/+13
| | | | llvm-svn: 162973
* Fix a couple of typos in EmitAtomic.Jakob Stoklund Olesen2012-08-311-2/+2
| | | | | | | | | Thumb2 instructions are mostly constrained to rGPR, not tGPR which is for Thumb1. rdar://problem/12203728 llvm-svn: 162968
* X86: Fix encoding of 'movd %xmm0, %rax'Jim Grosbach2012-08-311-1/+1
| | | | | | | The assembly string for the VMOVPQIto64rr instruction incorrectly lacked the 'v' prefix, resulting in mis-assembly of the vanilla movd instruction. llvm-svn: 162963
* With the fix in r162954/162955 every cvt function returns true. Thus, haveChad Rosier2012-08-311-64/+42
| | | | | | | the ConvertToMCInst() return void, rather then a bool. Update all the cvt functions as well. llvm-svn: 162961
* Fix for r162954. Return the Error.Chad Rosier2012-08-301-2/+2
| | | | llvm-svn: 162955
* Move a check to the validateInstruction() function where it more properly ↵Chad Rosier2012-08-301-11/+13
| | | | | | belongs. llvm-svn: 162954
* Typo.Chad Rosier2012-08-301-1/+1
| | | | llvm-svn: 162952
* Introduce 'UseSSEx' to force SSE legacy encodingMichael Liao2012-08-305-122/+158
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Add 'UseSSEx' to force SSE legacy insn not being selected when AVX is enabled. As the penalty of inter-mixing SSE and AVX instructions, we need prevent SSE legacy insn from being generated except explicitly specified through some intrinsics. For patterns supported by both SSE and AVX, so far, we force AVX insn will be tried first relying on AddedComplexity or position in td file. It's error-prone and introduces bugs accidentally. 'UseSSEx' is disabled when AVX is turned on. For SSE insns inherited by AVX, we need this predicate to force VEX encoding or SSE legacy encoding only. For insns not inherited by AVX, we still use the previous predicates, i.e. 'HasSSEx'. So far, these insns fall into the following categories: * SSE insns with MMX operands * SSE insns with GPR/MEM operands only (xFENCE, PREFETCH, CLFLUSH, CRC, and etc.) * SSE4A insns. * MMX insns. * x87 insns added by SSE. 2 test cases are modified: - test/CodeGen/X86/fast-isel-x86-64.ll AVX code generation is different from SSE one. 'vcvtsi2sdq' cannot be selected by fast-isel due to complicated pattern and fast-isel fallback to materialize it from constant pool. - test/CodeGen/X86/widen_load-1.ll AVX code generation is different from SSE one after fixing SSE/AVX inter-mixing. Exec-domain fixing prefers 'vmovapd' instead of 'vmovaps'. llvm-svn: 162919
* PPCISelLowering.cpp: Fix r162725.NAKAMURA Takumi2012-08-301-1/+5
| | | | | | | | [Tobias von Koch] What's happening here is that the CR6SET/CR6UNSET is breaking the chain of register copies glued to the function call (BL_SVR4 node). The scheduler then moves other instructions in between those and the function call, which isn't good! Right. That's the case where there is no chain of register copies before the call, so InFlag == 0... Attached is a new revision of the patch which should fix this for good. llvm-svn: 162916
* PPCISelLowering.cpp: Whitespace.NAKAMURA Takumi2012-08-301-1/+1
| | | | llvm-svn: 162915
* Add support for moving pure S-register to NEON pipeline if desiredTim Northover2012-08-301-2/+71
| | | | llvm-svn: 162898
* Only perform DAG combine on FMAs of legal types.Craig Topper2012-08-301-0/+4
| | | | llvm-svn: 162892
* Fix PR13727Michael Liao2012-08-301-3/+7
| | | | | | | | | | | - The root cause is that target constant materialization in X86 fast-isel creates a PC-rel addressing which may overflow 32-bit range in non-Small code model if .rodata section is allocated too far away from code segment in MCJIT, which uses Large code model so far. - Follow the similar logic to fix non-Small code model in fast-isel by skipping non-Small code model. llvm-svn: 162881
* Rename hasVolatileMemoryRef() to hasOrderedMemoryRef().Jakob Stoklund Olesen2012-08-291-2/+2
| | | | | | | | Ordered memory operations are more constrained than volatile loads and stores because they must be ordered with respect to all other memory operations. llvm-svn: 162861
* Reserve space for the mandatory traceback fields on PPC64.Hal Finkel2012-08-291-4/+8
| | | | | | | | | | | | | | | | | | | | | We need to reserve space for the mandatory traceback fields, though leaving them as zero is appropriate for now. Although the ABI calls for these fields to be filled in fully, no compiler on Linux currently does this, and GDB does not read these fields. GDB uses the first word of zeroes during exception handling to find the end of the function and the size field, allowing it to compute the beginning of the function. DWARF information is used for everything else. We need the extra 8 bytes of pad so the size field is found in the right place. As a comparison, GCC fills in a few of the fields -- language, number of saved registers -- but ignores the rest. IBM's proprietary OSes do make use of the full traceback table facility. Patch by Bill Schmidt. llvm-svn: 162854
OpenPOWER on IntegriCloud