org.apache.commons.lang3.text.translate.NumericEntityEscaper Java Examples

The following examples show how to use org.apache.commons.lang3.text.translate.NumericEntityEscaper.
Example #1
Source File: HtmlConverter.java    From aceql-http with GNU Lesser General Public License v2.1
/**
 * Converts special characters to their HTML values. <br>
 * Example: "é" is converted to "&amp;eacute;"
 * <p>
 *
 * @param string
 *            A String to convert from original to HTML
 *            <p>
 * @return A String of char converted to HTML equivalent.
 */
public static String toHtml(String string) {

    if (DO_NOTHING) {
        return string;
    }

    string = StringEscapeUtils.ESCAPE_HTML4
            .with(NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE))
            .translate(string);

    if (string != null) {
        // To keep the same result if the method is called several times
        string = string.replaceAll("&amp;", "&");
    }

    return string;
}
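
To make the effect of NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE) more concrete, here is a minimal sketch that uses only the JDK. It is not the library's implementation: the class NumericEscapeSketch and the method escapeBetween are hypothetical names chosen for illustration. The sketch walks the input by code point and writes a decimal character reference for every code point inside the inclusive range.

```java
// Hypothetical illustration only; not part of Apache Commons Lang.
final class NumericEscapeSketch {

    /** Replaces every code point in [low, high] with a decimal character reference. */
    static String escapeBetween(String input, int low, int high) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // codePointAt joins a surrogate pair into one supplementary code point
            int codePoint = input.codePointAt(i);
            if (codePoint >= low && codePoint <= high) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 is above 0x7f, so it becomes a numeric reference
        System.out.println(escapeBetween("caf\u00E9", 0x7f, Integer.MAX_VALUE)); // prints caf&#233;
    }
}
```

In Example #1 the numeric escaper is chained after ESCAPE_HTML4, so it presumably only sees characters that the HTML 4 translator did not already consume.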
 
Example #2
Source File: StringEscapeUtilsTest.java    From astor with GNU General Public License v2.0
@Test
public void testEscapeXmlAllCharacters() {
    // http://www.w3.org/TR/xml/#charsets says:
    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character,
    // excluding the surrogate blocks, FFFE, and FFFF. */
    final CharSequenceTranslator escapeXml = StringEscapeUtils.ESCAPE_XML
            .with(NumericEntityEscaper.below(9), NumericEntityEscaper.between(0xB, 0xC), NumericEntityEscaper.between(0xE, 0x19),
                    NumericEntityEscaper.between(0xD800, 0xDFFF), NumericEntityEscaper.between(0xFFFE, 0xFFFF), NumericEntityEscaper.above(0x110000));

    assertEquals("&#0;&#1;&#2;&#3;&#4;&#5;&#6;&#7;&#8;", escapeXml.translate("\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\u0008"));
    assertEquals("\t", escapeXml.translate("\t")); // 0x9
    assertEquals("\n", escapeXml.translate("\n")); // 0xA
    assertEquals("&#11;&#12;", escapeXml.translate("\u000B\u000C"));
    assertEquals("\r", escapeXml.translate("\r")); // 0xD
    assertEquals("Hello World! Ain&apos;t this great?", escapeXml.translate("Hello World! Ain't this great?"));
    assertEquals("&#14;&#15;&#24;&#25;", escapeXml.translate("\u000E\u000F\u0018\u0019"));
}
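
The Char production quoted in the comment above can be written as a plain predicate, which may help when deciding which ranges to hand to NumericEntityEscaper. This is a minimal JDK-only sketch; XmlCharSketch and isXmlChar are hypothetical names, not part of Commons Lang.

```java
// Hypothetical illustration only; not part of Apache Commons Lang.
final class XmlCharSketch {

    /**
     * Returns true if the code point matches the XML 1.0 production
     * Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     */
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9
                || codePoint == 0xA
                || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isXmlChar('\t'));   // prints true
        System.out.println(isXmlChar(0xB));    // prints false
        System.out.println(isXmlChar(0xFFFE)); // prints false
    }
}
```

Comparing the predicate with the test shows that the test's between(0xE, 0x19) covers only part of the excluded control range, since the production excludes everything from 0xE through 0x1F.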
 
Example #3
Source File: XmlWriter.java    From tcases with MIT License
/**
 * Writes an attribute definition.
 */
protected void writeAttribute( String name, String value)
  {
  print( " ");
  print( name);
  print( "=\"");
  // StringEscapeUtils escapes symbols ', < >, &, ", and some control characters
  // NumericEntityEscaper translates additional control characters \n, \t, ...
  print( NumericEntityEscaper.below(0x20).translate(StringEscapeUtils.escapeXml11(value)));
  print( "\"");
  }
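
The reason for escaping \n and \t in an attribute value is that an XML parser normalizes literal tabs and line breaks inside attributes to spaces, whereas character references such as &#10; survive. The sketch below shows the combined effect of the two steps in Example #3 using only the JDK. It is an approximation, not what escapeXml11 does internally, and AttributeEscapeSketch and escapeAttribute are hypothetical names.

```java
// Hypothetical illustration only; approximates Example #3 without Commons Lang.
final class AttributeEscapeSketch {

    /** Escapes markup characters by name and control characters below 0x20 by number. */
    static String escapeAttribute(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (c < 0x20) {
                        // Numeric reference, so the parser does not normalize it to a space
                        out.append("&#").append((int) c).append(';');
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAttribute("a\tb\n\"c\"")); // prints a&#9;b&#10;&quot;c&quot;
    }
}
```

Iterating by char is sufficient here because every character the sketch rewrites lies in the Basic Multilingual Plane; surrogate pairs pass through unchanged.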
 
Example #4
Source File: StringEscapeUtilsTest.java    From astor with GNU General Public License v2.0
/**
 * Tests Supplementary characters. 
 * <p>
 * From http://www.w3.org/International/questions/qa-escapes
 * </p>
 * <blockquote>
 * Supplementary characters are those Unicode characters that have code points higher than the characters in
 * the Basic Multilingual Plane (BMP). In UTF-16 a supplementary character is encoded using two 16-bit surrogate code points from the
 * BMP. Because of this, some people think that supplementary characters need to be represented using two escapes, but this is incorrect
 * - you must use the single, code point value for that character. For example, use &#x233B4; rather than &#xD84C;&#xDFB4;.
 * </blockquote>
 * @see <a href="http://www.w3.org/International/questions/qa-escapes">Using character escapes in markup and CSS</a>
 * @see <a href="https://issues.apache.org/jira/browse/LANG-728">LANG-728</a>
 */
@Test
public void testEscapeXmlSupplementaryCharacters() {
    CharSequenceTranslator escapeXml = 
        StringEscapeUtils.ESCAPE_XML.with( NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE) );

    assertEquals("Supplementary character must be represented using a single escape", "&#144308;",
            escapeXml.translate("\uD84C\uDFB4"));
}
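
The arithmetic behind the expected value &#144308; can be checked with the JDK alone. The following sketch (SupplementaryEscapeSketch and its methods are hypothetical names for illustration) combines the surrogate pair \uD84C\uDFB4 into its single code point U+233B4 and formats it both as a decimal and as a hexadecimal character reference.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of Apache Commons Lang.
final class SupplementaryEscapeSketch {

    /** Formats a code point as a decimal character reference, e.g. &#144308; */
    static String decimalReference(int codePoint) {
        return "&#" + codePoint + ";";
    }

    /** Formats a code point as a hexadecimal character reference, e.g. &#x233B4; */
    static String hexReference(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint).toUpperCase(Locale.ROOT) + ";";
    }

    public static void main(String[] args) {
        String text = "\uD84C\uDFB4";

        // Two UTF-16 code units, but only one Unicode code point
        System.out.println(text.length());                          // prints 2
        System.out.println(text.codePointCount(0, text.length()));  // prints 1

        int codePoint = text.codePointAt(0);
        System.out.println(decimalReference(codePoint)); // prints &#144308;
        System.out.println(hexReference(codePoint));     // prints &#x233B4;
    }
}
```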
 