Changeset 7854


Timestamp:
06/19/10 20:46:04 (2 years ago)
Author:
MichaelTempest
Message:

Item5990, Item9170, Item761, Item2231: Fix character encoding issues with WysiwygPlugin

  • As far as I can tell, unicode characters and entities are now converted correctly.
  • Numeric entities in ordinary text are converted to characters in the site charset where the site charset can represent the character, and otherwise to named entities where one exists, which should improve the readability of TML. The same conversion is applied to UTF-8 characters that the site charset cannot represent. Numeric entities are used only for characters that have no named entity.
  • Entities are now preserved (i.e. not modified at all) inside sticky and verbatim blocks.
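
The conversion described in the list above can be sketched roughly as follows. This is an illustration only, not the plugin code: it assumes a plain ISO-8859-1 site charset whose representable characters are U+0000 to U+00FF, whereas the changeset computes the representable set from the configured site charset.

```perl
use strict;
use warnings;
use HTML::Entities ();

# Unicode characters: e-umlaut (representable in ISO-8859-1),
# alpha (has a named entity) and the female sign (no named
# entity in the HTML::Entities table).
my $text = "A \x{EB} B \x{3B1} C \x{2640}";

# Assumption for this sketch: everything outside U+0000..U+00FF
# counts as "not representable in the site charset".
my $encoded = HTML::Entities::encode_entities( $text, '^\x00-\xff' );

# e-umlaut is left alone, alpha becomes a named entity and the
# female sign falls back to a numeric entity.
print $encoded eq "A \x{EB} B &alpha; C &#x2640;" ? "ok\n" : "unexpected\n";
```

Passing a negated character class as the second argument tells encode_entities to leave the listed characters alone and encode everything else, which is the same idea convertNotRepresentabletoEntity uses in the diff below.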

There are several changes here, but I cannot do this in small steps without breaking things in between. Each time I fixed one problem, another (lurking) problem popped up somewhere else.

HTML::Entities::_decode_entities converts numeric entities to characters. The numbers always correspond to Unicode codepoints (see http://en.wikipedia.org/wiki/Html_entities#HTML_character_references). Foswiki also uses HTML::Entities::_decode_entities to convert named entities to characters. I changed the named-entity conversion to produce Unicode codepoints, too. It was converting to the site charset, which can corrupt data for numeric entities in the range 127 to 255 when the site charset is anything other than UTF-8 or ISO-8859-1.

This meant that I had to change the text to Unicode characters (not encoded as UTF-8) before decoding entities, which meant extra conversions, including a step to convert characters that cannot be represented in the site charset to entities.

There was code to do that in RESTParameter2SiteCharSet, but it used the PERLQQ fallback, which corrupted text by converting unrepresentable characters to Perl escape sequences, e.g. \x{2460}, which surprises everyone who encounters this behaviour. That was fixed, too.
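
The difference between the two Encode fallbacks can be seen in a small standalone sketch. It assumes ISO-8859-1 as the target charset and uses U+2460 as a sample character that the charset cannot represent; it is not taken from the plugin.

```perl
use strict;
use warnings;
use Encode ();

# U+2460 CIRCLED DIGIT ONE cannot be represented in ISO-8859-1.
my $text = "A \x{2460} B";

# FB_PERLQQ replaces unrepresentable characters with Perl escapes.
my $perlqq = Encode::encode( 'iso-8859-1', $text, Encode::FB_PERLQQ );

# FB_HTMLCREF replaces them with decimal numeric HTML entities.
my $htmlcref = Encode::encode( 'iso-8859-1', $text, Encode::FB_HTMLCREF );

print "$perlqq\n";      # A \x{2460} B
print "$htmlcref\n";    # A &#9312; B
```

The Perl escape is meaningless in topic text, while the numeric entity still renders as the intended character in a browser.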

Many browsers (including Firefox) interpret pages identified as ISO-8859-1 as if they were encoded with Windows-1252. When posting (e.g. saving) in response to such pages, they also encode data in the same way. This is why mapUnicode2HighBit (and its opposite, mapHighBit2Unicode) were needed. However, those functions complicate the conversion to entities of characters that cannot be represented in the site charset. Perl's standards-compliant Encode to the rescue! If you tell Encode to use the Windows-1252 encoding instead of ISO-8859-1, then it does exactly what we want, and those mapping functions are not necessary.
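
As a rough illustration of why switching encodings removes the need for the mapping functions, the sketch below decodes the same octet both ways. It uses cp1252, which is Encode's canonical name for Windows-1252, and the euro sign as a sample character.

```perl
use strict;
use warnings;
use Encode ();

# Octet 0x80 is what a browser sends for the euro sign when it treats
# a page declared as ISO-8859-1 as if it were Windows-1252.
my $octet = "\x80";

my $as_latin1 = Encode::decode( 'iso-8859-1', $octet );
my $as_cp1252 = Encode::decode( 'cp1252',     $octet );

printf "iso-8859-1: U+%04X\n", ord $as_latin1;    # U+0080, a control character
printf "cp1252:     U+%04X\n", ord $as_cp1252;    # U+20AC, the euro sign

# In the other direction, the euro sign encodes to that same single octet.
my $encoded = Encode::encode( 'cp1252', "\x{20AC}" );
printf "euro in cp1252: 0x%02X\n", ord $encoded;  # 0x80
```

Because Encode performs both directions of this mapping itself, a hand-maintained table such as %unicode2HighBit is no longer required.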

The WysiwygPluginTests test the conversions for various site charsets using ranges of character codes. I could not determine what charset(s) those character codes referred to, so I changed the tests to be explicit - either unicode codepoints or codes in the site charset (given as parameter to the test function). I removed the tests for Unicode codepoints 127 to 159 because they are control characters, which (as far as I am aware) Foswiki does not use. Instead, I added tests for the Unicode codepoints for the Windows-1252 characters with codes 127 to 159.

Foswiki::Plugins::WysiwygPlugin::Constants stores computed data that is derived from %Foswiki::cfg. Some of the WysiwygPlugin unit tests that depend on that data change %Foswiki::cfg temporarily, so the stored data in Foswiki::Plugins::WysiwygPlugin::Constants must be reset before running unit tests that depend on that data.
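
The caching problem can be sketched in isolation. In the sketch below, %cfg is a hypothetical stand-in for %Foswiki::cfg and the two subs only mimic the shape of encoding() and reinitialiseForTesting() from the diff; the real functions do more.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %Foswiki::cfg.
our %cfg = ( Site => { CharSet => 'utf-8' } );

my $encoding;    # cached value derived from %cfg

sub encoding {
    $encoding = lc( $cfg{Site}{CharSet} || 'iso-8859-1' )
      unless defined $encoding;
    return $encoding;
}

# Lets a test discard the cached value after changing %cfg.
sub reinitialiseForTesting {
    undef $encoding;
}

my $first = encoding();                 # computes and caches 'utf-8'
$cfg{Site}{CharSet} = 'iso-8859-15';    # a test changes the configuration
my $stale = encoding();                 # still the cached 'utf-8'
reinitialiseForTesting();
my $fresh = encoding();                 # recomputed as 'iso-8859-15'

print "$first $stale $fresh\n";         # utf-8 utf-8 iso-8859-15
```

Without the reset, a test that changes the charset would silently run against the value cached by an earlier test.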

I tested this with the following site charsets: (default value), 'ISO-8859-1', 'ISO-8859-15', 'utf-8'

Location:
trunk/WysiwygPlugin
Files:
8 edited

  • trunk/WysiwygPlugin/lib/Foswiki/Plugins/WysiwygPlugin/Constants.pm

    r7644 r7854  
    44use strict; 
    55use warnings; 
     6 
     7use Encode; 
    68 
    79# HTML elements that are strictly block type, as defined by 
     
    203205############ Encodings ############### 
    204206 
    205 # Mapping high-bit characters from unicode back to iso-8859-1 
    206 # (a.k.a Windows 1252 a.k.a "ANSI") - http://www.alanwood.net/demos/ansi.html 
    207 our %unicode2HighBit = ( 
    208     chr(8364) => chr(128), 
    209     chr(8218) => chr(130), 
    210     chr(402)  => chr(131), 
    211     chr(8222) => chr(132), 
    212     chr(8230) => chr(133), 
    213     chr(8224) => chr(134), 
    214     chr(8225) => chr(135), 
    215     chr(710)  => chr(136), 
    216     chr(8240) => chr(137), 
    217     chr(352)  => chr(138), 
    218     chr(8249) => chr(139), 
    219     chr(338)  => chr(140), 
    220     chr(381)  => chr(142), 
    221     chr(8216) => chr(145), 
    222     chr(8217) => chr(146), 
    223     chr(8220) => chr(147), 
    224     chr(8221) => chr(148), 
    225     chr(8226) => chr(149), 
    226     chr(8211) => chr(150), 
    227     chr(8212) => chr(151), 
    228     chr(732)  => chr(152), 
    229     chr(8482) => chr(153), 
    230     chr(353)  => chr(154), 
    231     chr(8250) => chr(155), 
    232     chr(339)  => chr(156), 
    233     chr(382)  => chr(158), 
    234     chr(376)  => chr(159), 
    235 ); 
    236  
    237 # Reverse mapping 
    238 our %highBit2Unicode = map { $unicode2HighBit{$_} => $_ } keys %unicode2HighBit; 
    239  
    240 our $unicode2HighBitChars = join( '', keys %unicode2HighBit ); 
    241 our $highBit2UnicodeChars = join( '', keys %highBit2Unicode ); 
    242207our $encoding; 
    243208 
     
    246211        $encoding = 
    247212          Encode::resolve_alias( $Foswiki::cfg{Site}{CharSet} || 'iso-8859-1' ); 
     213 
     214        $encoding = 'windows-1252' if $encoding =~ /^iso-8859-1$/i; 
    248215    } 
    249216    return $encoding; 
    250217} 
    251218 
    252 # Map selected unicode characters back to high-bit chars if 
    253 # iso-8859-1 is selected. This is required because the same characters 
    254 # have different code points in unicode and iso-8859-1. For example, 
    255 # € is 128 in iso-8859-1 and 8364 in unicode. 
    256 sub mapUnicode2HighBit { 
    257     if ( encoding() eq 'iso-8859-1' ) { 
    258  
    259         # Map unicode back to iso-8859 high-bit chars 
    260         $_[0] =~ s/([$unicode2HighBitChars])/$unicode2HighBit{$1}/ge; 
    261     } 
    262 } 
    263  
    264 # Map selected high-bit chars to unicode if 
    265 # iso-8859-1 is selected. 
    266 sub mapHighBit2Unicode { 
    267     if ( encoding() eq 'iso-8859-1' ) { 
    268  
    269         # Map unicode back to iso-8859 high-bit chars 
    270         $_[0] =~ s/([$highBit2UnicodeChars])/$highBit2Unicode{$1}/ge; 
     219my $siteCharsetRepresentable; 
     220 
     221# Convert characters (unicode codepoints) that cannot be represented in 
     222# the site charset to entities. Prefer named entities to numeric entities. 
     223sub convertNotRepresentabletoEntity { 
     224    if ( encoding() =~ /^utf-?8/ ) { 
     225        # UTF-8 can represent all characters, so no entities needed 
     226    } 
     227    else { 
     228        unless ($siteCharsetRepresentable) { 
     229            # Produce a string of unicode characters that contains all of the 
     230            # characters representable in the site charset 
     231            $siteCharsetRepresentable = ''; 
     232            for my $code (0 .. 255) { 
     233                my $unicodeChar = Encode::decode(encoding(), chr($code), Encode::FB_PERLQQ); 
     234                if ($unicodeChar =~ /^\\x/) { 
     235                    # code is not valid, so skip it 
     236                } 
     237                else { 
     238                    # Escape codes in the standard ASCII range, as necessary, 
     239                    # to avoid special interpretation by perl 
     240                    $unicodeChar = quotemeta($unicodeChar) if ord($unicodeChar) <= 127; 
     241 
     242                    $siteCharsetRepresentable .= $unicodeChar; 
     243                } 
     244            } 
     245        } 
     246 
     247        require HTML::Entities; 
     248        $_[0] = HTML::Entities::encode_entities($_[0], "^$siteCharsetRepresentable"); 
     249        # All characters that cannot be represented in the site charset are now encoded as entities 
     250        # Named entities are used if available, otherwise numeric entities, 
     251        # because named entities produce more readable TML 
    271252    } 
    272253} 
     
    284265  Oslash Ugrave Uacute Ucirc  Uuml   Yacute THORN  szlig 
    285266  agrave aacute acirc  atilde auml   aring  aelig  ccedil 
    286   egrave eacute ecirc  uml    igrave iacute icirc  iuml 
     267  egrave eacute ecirc  euml   igrave iacute icirc  iuml 
    287268  eth    ntilde ograve oacute ocirc  otilde ouml   divide 
    288269  oslash ugrave uacute ucirc  uuml   yacute thorn  yuml 
     
    292273our $safe_entities; 
    293274 
    294 # Get a hash that maps the safe entities values to characters 
    295 # in the site charset. 
     275# Get a hash that maps the safe entities values to unicode characters 
    296276sub safeEntities { 
    297277    unless ($safe_entities) { 
     
    301281            my $unicode = HTML::Entities::decode_entities("&$entity;"); 
    302282 
    303             # Map unicode back to iso-8859 high-bit chars if required 
    304             mapUnicode2HighBit($unicode); 
    305             $safe_entities->{$entity} = Encode::encode( encoding(), $unicode ); 
     283            $safe_entities->{"$entity"} = $unicode; 
    306284        } 
    307285    } 
     
    325303} 
    326304 
     305# Allow the unit tests to force re-initialisation of  
     306# %Foswiki::cfg-dependent cached data 
     307sub reinitialiseForTesting { 
     308    undef $encoding; 
     309    undef $siteCharsetRepresentable; 
     310} 
     311 
    327312# Create shorter alias for other modules 
    328313no strict 'refs'; 
  • trunk/WysiwygPlugin/lib/Foswiki/Plugins/WysiwygPlugin/HTML2TML.pm

    r7737 r7854  
    9292=cut 
    9393 
     94sub debugEncode { 
     95    my $text = shift; 
     96    $text = WC::debugEncode($text); 
     97    $text =~ s/([^\x20-\x7E])/sprintf '\\x{%X}', ord($1)/ge; 
     98    return $text; 
     99} 
     100 
    94101sub convert { 
    95102    my ( $this, $text, $options ) = @_; 
     
    101108      if ( $options->{very_clean} ); 
    102109 
    103     # If the text is UTF8-encoded we have to decode it first, otherwise 
    104     # the HTML parser will barf. 
     110    # $text is octets, encoded as per the $Foswiki::cfg{Site}{CharSet} 
     111    #print STDERR "input     [". debugEncode($text). "]\n\n"; 
     112 
     113    # Convert (safe) named entities back to the 
     114    # site charset. Numeric entities are mapped straight to the 
     115    # corresponding code point unless their value overflow. 
     116    # HTML::Entities::_decode_entities converts numeric entities  
     117    # to Unicode codepoints, so first convert the text to Unicode 
     118    # characters 
    105119    if ( WC::encoding() =~ /^utf-?8/ ) { 
     120        # text is already UTF-8, so just decode 
    106121        $text = Encode::decode_utf8($text); 
    107122    } 
     123    else { 
     124        # convert to unicode codepoints 
     125        $text = Encode::decode(WC::encoding(), $text); 
     126    } 
     127    # $text is now Unicode characters 
     128    #print STDERR "unicoded  [". debugEncode($text). "]\n\n"; 
     129 
     130    # Make sure that & < > ' and " remain encoded, because the parser depends 
     131    # on it. The safe-entities does not include the corresponding named 
     132    # entities, so convert numeric entities for these characters to the named  
     133    # entity. 
     134    $text =~ s/\&\#38;/\&amp;/go; 
     135    $text =~ s/\&\#x26;/\&amp;/goi; 
     136    $text =~ s/\&\#60;/\&lt;/go; 
     137    $text =~ s/\&\#x3c;/\&lt;/goi; 
     138    $text =~ s/\&\#62;/\&gt;/go; 
     139    $text =~ s/\&\#x3e;/\&gt;/goi; 
     140    $text =~ s/\&\#39;/\&apos;/go; 
     141    $text =~ s/\&\#x27;/\&apos;/goi; 
     142    $text =~ s/\&\#34;/\&quot;/go; 
     143    $text =~ s/\&\#x22;/\&quot;/goi; 
     144 
     145    require HTML::Entities; 
     146    HTML::Entities::_decode_entities( $text, WC::safeEntities() ); 
     147    #print STDERR "decodedent[". debugEncode($text). "]\n\n"; 
     148 
     149    # HTML::Entities::_decode_entities is NOT aware of the site charset 
     150    # so it converts numeric entities to characters willy-nilly. 
     151    # Some of those were entities in the first place because the 
     152    # site character set cannot represent them. 
     153    # Convert them back to entities: 
     154    WC::convertNotRepresentabletoEntity($text); 
     155    #print STDERR "notrep2ent[". debugEncode($text). "]\n\n"; 
     156 
     157    # $text is now Unicode characters that are representable 
     158    # in the site charset. Convert to the site charset: 
     159    if ( WC::encoding() =~ /^utf-?8/ ) { 
     160        # nothing to do, already in unicode 
     161    } 
     162    else { 
     163        $text = Encode::encode(WC::encoding(), $text); 
     164    } 
     165    #print STDERR "sitechrset[". debugEncode($text). "]\n\n"; 
    108166 
    109167    # get rid of nasties 
     
    120178    $text = $this->{stackTop}->rootGenerate($opts); 
    121179 
     180    #print STDERR "parsed    [". debugEncode($text). "]\n\n"; 
     181 
    122182    # If the site charset is UTF8, we need to recode 
    123183    if ( WC::encoding() =~ /^utf-?8/ ) { 
    124184        $text = Encode::encode_utf8($text); 
    125     } 
    126  
    127     # Convert (safe) named entities back to the 
    128     # site charset. Numeric entities are mapped straight to the 
    129     # corresponding code point unless their value overflow. 
    130     require HTML::Entities; 
    131     HTML::Entities::_decode_entities( $text, WC::safeEntities() ); 
    132  
    133     # After decoding entities, we have to map unicode characters 
    134     # back to high bit 
    135     WC::mapUnicode2HighBit($text); 
    136  
     185        #print STDERR "re-encoded[". debugEncode($text). "]\n\n"; 
     186    } 
     187 
     188    # $text is octets, encoded as per the $Foswiki::cfg{Site}{CharSet} 
    137189    return $text; 
    138190} 
  • trunk/WysiwygPlugin/lib/Foswiki/Plugins/WysiwygPlugin/HTML2TML/Node.pm

    r7737 r7854  
    227227    my ( $this, $opts ) = @_; 
    228228 
    229     #print STDERR "Raw       [", WC::debugEncode($this->stringify()), "\n\n"; 
     229    #print STDERR "Raw       [", WC::debugEncode($this->stringify()), "]\n\n"; 
    230230    $this->cleanParseTree(); 
    231231 
  • trunk/WysiwygPlugin/lib/Foswiki/Plugins/WysiwygPlugin/Handlers.pm

    r7813 r7854  
    129129    return unless $query; 
    130130 
    131     if (   $Foswiki::cfg{Site}{CharSet} 
    132         && $Foswiki::cfg{Site}{CharSet} =~ /^utf-?8$/i ) 
    133     { 
    134  
    135         # If the site charset is utf-8, then form POSTs (such as the one 
    136         # that got us here) are utf-8 encoded. we have to decode to prevent 
    137         # the HTML parser from going tits up when it sees utf-8 in the data. 
    138         $text = Encode::decode_utf8($text); 
    139     } 
    140  
    141131    return 
    142132      unless defined( $query->param('wysiwyg_edit') ) 
     
    155145sub TranslateHTML2TML { 
    156146    my ( $text, $topic, $web ) = @_; 
     147    # $text must be encoded in the site charset 
     148    ASSERT( $text !~ /[^\x00-\xff]/, 
     149        "only octets expected in input to TranslateHTML2TML" ) 
     150      if DEBUG; 
    157151 
    158152    unless ($html2tml) { 
     
    183177    $text =~ s/\s+$/\n/s; 
    184178 
     179    ASSERT( $text !~ /[^\x00-\xff]/, 
     180        "only octets expected in topic text output from TranslateHTML2TML" ) 
     181      if DEBUG; 
    185182    return $top . $text . $bottom; 
    186183} 
     
    493490    my ( $text, $web, $topic, @extraConvertOptions ) = @_; 
    494491 
     492    ASSERT( $text !~ /[^\x00-\xff]/, 
     493        "only octets expected in input to TranslateTML2HTML" ) 
     494      if DEBUG; 
     495 
    495496    # Translate the topic text to pure HTML. 
    496497    unless ($Foswiki::Plugins::WysiwygPlugin::tml2html) { 
     
    499500          new Foswiki::Plugins::WysiwygPlugin::TML2HTML(); 
    500501    } 
    501     return $Foswiki::Plugins::WysiwygPlugin::tml2html->convert( 
     502    my $html = $Foswiki::Plugins::WysiwygPlugin::tml2html->convert( 
    502503        $_[0], 
    503504        { 
     
    510511        } 
    511512    ); 
     513    ASSERT( $html !~ /[^\x00-\xff]/, 
     514        "only octets expected in output from TranslateTML2HTML" ) 
     515      if DEBUG; 
     516    return $html; 
    512517} 
    513518 
     
    589594    my ($text) = @_; 
    590595 
    591     # $text is supposed to contain octets that are a valid UTF-8 encoding. 
     596    #print STDERR "octets in [". WC::debugEncode($text). "]\n\n"; 
     597    # $text is supposed to contain octets that are valid UTF-8. 
    592598    # $text should certainly not have any codes above 255. 
    593599    ASSERT( $text !~ /[^\x00-\xff]/, 
     
    595601      if DEBUG; 
    596602 
    597     # $text might contain octets that are not a valid UTF-8 encoding 
     603    # $text might contain octets that are not valid UTF-8 
    598604    # because it came from the browser, and so it might be hostile content. 
    599605    # Encode::FB_PERLQQ makes decode_utf8 convert invalid octet sequences 
     
    603609 
    604610    # $text now contains unicode characters 
    605  
    606     WC::mapUnicode2HighBit($text); 
    607  
    608     if ( $Foswiki::cfg{Site}{CharSet} ) { 
    609         $text = Encode::encode( $Foswiki::cfg{Site}{CharSet}, 
    610             $text, Encode::FB_PERLQQ ); 
    611  
    612         # $text is now encoded as per the site charset.  
    613         # For UTF-8 - that means octets. 
    614  
    615         # SMELL: The use of Encode::FB_PERLQQ is probably incorrect here. 
    616         # If {Site}{CharSet} is set to 'iso-8859-1' then wide characters 
    617         # (with codes greater than 256) which cannot be represented in 
    618         # iso-5589-1 are encoded as perl escapes e.g. \x{03b1}. 
    619         # Encode::FB_HTMLCREF would be far better, as characters that 
    620         # cannot be represented in the specified site character set 
    621         # would be converted to HTML entities e.g. &#945; 
    622     } 
    623  
    624     # SMELL: if {Site}{CharSet} is blank (which is the default) 
    625     # then $text may contain wide characters. 
    626     # Thus, $text is NOT encoded in the SiteCharSet! 
     611    #print STDERR "as utf-8  [". WC::debugEncode($text). "]\n\n"; 
     612 
     613    if ( WC::encoding() =~ /^utf-?8/ ) { 
     614        $text = Encode::encode_utf8($text); 
     615    } 
     616    else { 
     617        # The site charset is a non-UTF-8 8-bit charset 
     618 
     619        WC::convertNotRepresentabletoEntity($text); 
     620        # All characters that cannot be represented in the site charset are now encoded as entities 
     621        # Named entities are used if available, otherwise numeric entities, 
     622        # because named entities produce more readable TML 
     623 
     624        # Encode $text in the site charset 
     625        # The Encode::FB_HTMLCREF should not be needed, as all characters in $text 
     626        # are supposed to be representable in the site charset. 
     627        $text = Encode::encode( WC::encoding(), 
     628            $text, Encode::FB_HTMLCREF ); 
     629    } 
     630 
     631    # $text is now encoded as per the site charset.  
     632    # For UTF-8 - that means octets. 
     633    # For non-UTF8, Unicode characters that cannot be represented in the site charset 
     634    # are converted to HTML entities (preferring named entities to numeric entities) 
    627635 
    628636    # The return value is supposed to be according to the currently selected 
     
    632640        "only octets expected in return value for RESTParameter2SiteCharSet" ) 
    633641      if DEBUG; 
     642    #print STDERR "octets out [". WC::debugEncode($text). "]\n\n"; 
    634643    return $text; 
    635644} 
     
    642651sub returnRESTResult { 
    643652    my ( $response, $status, $text ) = @_; 
    644  
    645     if ( $Foswiki::cfg{Site}{CharSet} ) { 
    646         $text = Encode::decode( $Foswiki::cfg{Site}{CharSet}, 
    647             $text, Encode::FB_PERLQQ ); 
    648     } 
    649  
    650     WC::mapHighBit2Unicode($text); 
     653    ASSERT( $text !~ /[^\x00-\xff]/, 
     654        "only octets expected in input to returnRESTResult" ) 
     655      if DEBUG; 
     656 
     657    $text = Encode::decode( WC::encoding(), 
     658        $text, Encode::FB_HTMLCREF ); 
     659 
     660    #print STDERR "unicodechr[". WC::debugEncode($text). "]\n\n"; 
    651661 
    652662    $text = Encode::encode_utf8($text); 
     
    730740    } 
    731741    my $html = Foswiki::Func::getCgiQuery()->param('text'); 
     742    #print STDERR "param     [". Foswiki::Plugins::WysiwygPlugin::HTML2TML::debugEncode($html). "]\n\n"; 
    732743 
    733744    $html = RESTParameter2SiteCharSet($html); 
     745    #print STDERR "paraminSC [". Foswiki::Plugins::WysiwygPlugin::HTML2TML::debugEncode($html). "]\n\n"; 
    734746 
    735747    $html =~ s/<!--$SECRET_ID-->//go; 
     
    744756        } 
    745757    ); 
     758    #print STDERR "tml inSc  [". Foswiki::Plugins::WysiwygPlugin::HTML2TML::debugEncode($tml). "]\n\n"; 
    746759 
    747760    returnRESTResult( $response, 200, $tml ); 
  • trunk/WysiwygPlugin/test/unit/WysiwygPlugin/BrowserEditorInterface.pm

    r7644 r7854  
    88 
    99use Scalar::Util; 
     10 
     11sub _DEBUG {0}; 
    1012 
    1113my $editFrameLocator        = "css=iframe#topic_ifr"; 
     
    1416my $editTextareaLocator     = "css=textarea#topic"; 
    1517my $editCancelButtonLocator = "css=input#cancel"; 
     18my $editSaveButtonLocator   = "css=input#save"; 
    1619 
    1720# This must match the text in foswiki_tiny.js 
     
    3437            _editorMode          => {}, 
    3538            _interactions        => 0, 
     39            _web                 => undef, 
     40            _topic               => undef, 
    3641        }, 
    3742        $class 
     
    4550sub init { 
    4651    my $this = shift; 
     52    print STDERR "BrowserEditorInterface::init()\n" if _DEBUG; 
    4753 
    4854    if ( not $this->{_initWebPreferences} ) { 
     
    7985sub finish { 
    8086    my $this = shift; 
     87    print STDERR "BrowserEditorInterface::finish()\n" if _DEBUG; 
     88 
    8189    for my $browser ( keys %{ $this->{_editorModeForBrowser} } ) { 
    8290        $this->{_test}->selectBrowser($browser); 
     
    97105sub editorMode { 
    98106    my $this = shift; 
     107    print STDERR "BrowserEditorInterface::editorMode()\n" if _DEBUG; 
    99108    if ( 
    100109        exists $this->{_editorModeForBrowser} 
     
    113122    my $web   = shift; 
    114123    my $topic = shift; 
     124    print STDERR "BrowserEditorInterface::openWysiwygEditor()\n" if _DEBUG; 
     125    $this->{_web} = $web; 
     126    $this->{_topic} = $topic; 
    115127 
    116128    $this->cancelEdit() 
     
    135147sub cancelEdit { 
    136148    my $this = shift; 
     149    print STDERR "BrowserEditorInterface::cancelEdit()\n" if _DEBUG; 
    137150 
    138151    return 
     
    143156    $this->{_test}->selenium->click($editCancelButtonLocator); 
    144157 
     158    $this->{_web} = undef; 
     159    $this->{_topic} = undef; 
    145160    delete $this->{_editorModeForBrowser}->{ $this->{_test}->browserName() }; 
    146161} 
    147162 
     163sub save { 
     164    my $this = shift; 
     165    print STDERR "BrowserEditorInterface::save()\n" if _DEBUG; 
     166 
     167    $this->{_test}->assert(0, "editor not open") 
     168      unless exists $this->{_editorModeForBrowser} 
     169          ->{ $this->{_test}->browserName() }; 
     170 
     171    $this->selectTopFrame(); 
     172    $this->{_test}->selenium->click_ok($editSaveButtonLocator); 
     173    $this->{_test}->{selenium}->wait_for_page_to_load( $this->{_test}->{selenium_timeout} ); 
     174 
     175    my $postSaveLocation = $this->{_test}->{selenium}->get_location(); 
     176    my $viewUrl = Foswiki::Func::getScriptUrl( $this->{_web}, $this->{_topic}, 'view'); 
     177    $this->{_test}->assert_matches(qr/\Q$viewUrl\E$/, $postSaveLocation); 
     178 
     179    $this->{_web} = undef; 
     180    $this->{_topic} = undef; 
     181    delete $this->{_editorModeForBrowser}->{ $this->{_test}->browserName() }; 
     182} 
     183 
    148184sub selectWysiwygEditorFrame { 
    149185    my $this = shift; 
     186    print STDERR "BrowserEditorInterface::selectWysiwygEditorFrame()\n" if _DEBUG; 
    150187    $this->{_test}->selenium->select_frame_ok($editFrameLocator); 
    151188} 
     
    153190sub selectTopFrame { 
    154191    my $this = shift; 
     192    print STDERR "BrowserEditorInterface::selectTopFrame()\n" if _DEBUG; 
    155193    $this->{_test}->selenium->select_frame_ok("relative=top"); 
    156194} 
     
    159197    my $this = shift; 
    160198    my $text = shift; 
     199    print STDERR "BrowserEditorInterface::setWikitextEditorContent()\n" if _DEBUG; 
    161200    $this->{_test}->type( $editTextareaLocator, $text ); 
    162201 
     
    166205sub getWikitextEditorContent { 
    167206    my $this = shift; 
     207    print STDERR "BrowserEditorInterface::getWikitextEditorContent()\n" if _DEBUG; 
    168208    return $this->{_test}->selenium->get_value($editTextareaLocator); 
    169209} 
     
    172212    my $this = shift; 
    173213    my $text = shift; 
     214    print STDERR "BrowserEditorInterface::setWysiwygEditorContent()\n" if _DEBUG; 
    174215 
    175216    $this->selectWysiwygEditorFrame(); 
     
    202243sub getWysiwygEditorContent { 
    203244    my $this = shift; 
     245    print STDERR "BrowserEditorInterface::getWysiwygEditorContent()\n" if _DEBUG; 
    204246 
    205247    $this->selectWysiwygEditorFrame(); 
     
    213255sub selectWikitextMode { 
    214256    my $this = shift; 
     257    print STDERR "BrowserEditorInterface::selectWikitextMode()\n" if _DEBUG; 
    215258    return 
    216259      if $this->{_editorModeForBrowser}->{ $this->{_test}->browserName() } eq 
     
    248291sub selectWysiwygMode { 
    249292    my $this = shift; 
     293    print STDERR "BrowserEditorInterface::selectWysiwygMode()\n" if _DEBUG; 
    250294    return 
    251295      if $this->{_editorModeForBrowser}->{ $this->{_test}->browserName() } eq 
  • trunk/WysiwygPlugin/test/unit/WysiwygPlugin/BrowserTranslatorTests.pm

    r7794 r7854  
    55package BrowserTranslatorTests; 
    66 
     7use Encode; 
     8 
    79use FoswikiSeleniumTestCase; 
    810use TranslatorBase; 
     
    1214use Foswiki::Func; 
    1315use Foswiki::Plugins::WysiwygPlugin::Handlers; 
     16use Foswiki::Plugins::WysiwygPlugin::Constants; 
    1417 
    1518# The following big table contains all the testcases. These are 
     
    7174        name => 'Item1798', 
    7275        exec => $TranslatorBase::ROUNDTRIP, 
    73         tml  => <<HERE, 
     76        tml  => <<'HERE', 
    7477| [[LegacyTopic1]] | Main.SomeGuy | 
    7578%SEARCH{"legacy" nonoise="on" format="| [[\$topic]] | [[\$wikiname]] |"}% 
    7679HERE 
    77         html => <<THERE, 
     80        html => <<'THERE', 
    7881<table cellspacing="1" cellpadding="0" border="1"> 
    7982<tr><td><span class="WYSIWYG_LINK">[[LegacyTopic1]]</span></td><td><span class="WYSIWYG_LINK">Main.SomeGuy</span></td></tr> 
     
    8689        exec => $TranslatorBase::ROUNDTRIP, 
    8790        tml  => '&#9792;', 
    88         finaltml  => chr(9792), 
     91        finaltml  =>  _siteCharsetIsUTF8() ? chr(9792) : '&#x2640;', 
    8992    }, 
    9093    { 
     
    9497        finaltml  => '&alpha;', 
    9598    }, 
     99 
     100    # This test's finaltml is correct for ISO-8859-1 and ISO-8859-15, 
     101    # but not necessarily any other charsets 
     102    ( ( not $Foswiki::cfg{Site}{CharSet} or 
     103        $Foswiki::cfg{Site}{CharSet} =~ /^iso-8859-15?$/i) 
     104    ? { 
     105        name => 'safeNamedEntity', 
     106        exec => $TranslatorBase::ROUNDTRIP, 
     107        tml  => '&Aring;', 
     108        finaltml  => chr(0xC5),  
     109      } 
     110    : () ), 
     111 
    96112    { 
    97113        name => 'namedEntity', 
     
    197213 
    198214    {    # Copied on 29 April 2010 from 
    199            # http://merlin.lavrsen.dk/foswiki10/bin/view/Myweb/NewLineEatingTest 
     215         # http://merlin.lavrsen.dk/foswiki10/bin/view/Myweb/NewLineEatingTest 
    200216         # and then split into multiple tests to make analysing the result manageable 
    201217        name => 'KennethsNewLineEatingTest1', 
     
    406422]; 
    407423 
     424sub _siteCharsetIsUTF8 { 
     425    Foswiki::Plugins::WysiwygPlugin::Constants::reinitialiseForTesting(); 
     426    return Foswiki::Plugins::WysiwygPlugin::Constants::encoding() =~ /^utf-?8/; 
     427} 
     428 
    408429sub new { 
    409430    my $self = shift()->SUPER::new( 'BrowserTranslator', @_ ); 
     
    412433 
    413434    return $self; 
     435} 
     436 
     437sub set_up { 
     438    my $this = shift; 
     439    $this->SUPER::set_up(); 
     440 
     441    Foswiki::Plugins::WysiwygPlugin::Constants::reinitialiseForTesting(); 
    414442} 
    415443 
     
    432460    $this->SUPER::DESTROY if $this->can('SUPER::DESTROY'); 
    433461} 
     462 
     463# Item9170 
     464sub verify_editSaveTopicWithUnnamedUnicodeEntity { 
     465    my $this = shift; 
     466     
     467    $this->{editor}->init(); 
     468 
     469    # Close the editor because this test uses a different topic 
     470    if ( $this->{editor}->editorMode() ) { 
     471        $this->{editor}->cancelEdit(); 
     472    } 
     473 
     474    # \x{eb} is representable in 8-bit charsets.  
     475    # In iso-8859-1 it is e-with-umlaut, or &euml; 
     476    # &#x2640; is a valid unicode character without a 
     477    # common entity name 
     478    my $testText = "A \x{eb} B &#x2640; C"; 
     479    my $expectedText = $testText; 
     480    if ( _siteCharsetIsUTF8() ) { 
     481        $expectedText =~ s/\&\#x(\w+);/chr(hex($1))/ge; 
     482        $testText = Encode::encode_utf8($testText); 
     483        $expectedText = Encode::encode_utf8($expectedText); 
     484    } 
     485 
     486    # Create the test topic 
     487    my $topicName = $this->{test_topic}."For9170"; 
     488    my $topicObject = Foswiki::Meta->new( 
     489        $this->{session}, 
     490        $this->{test_web}, 
     491        $topicName, 
     492        "Before${testText}After\n"); 
     493    $topicObject->save(); 
     494 
     495    # Open the test topic in the wysiwyg editor 
     496    $this->{editor} 
     497      ->openWysiwygEditor( $this->{test_web}, $topicName ); 
     498 
     499    # Write rubbish over the topic, which will be overwritten on save 
     500    $topicObject->text("Rubbish"); 
     501    $topicObject->save(); 
     502    undef $topicObject; 
     503 
     504    # Save from the editor 
     505    $this->{editor}->save(); 
     506 
     507    # Reload the topic and check that the content is as expected 
     508    $topicObject = Foswiki::Meta->new( 
     509        $this->{session}, 
     510        $this->{test_web}, 
     511        $topicName); 
     512 
     513    my $text = $topicObject->text(); 
     514 
     515    # Isolate the portion of interest 
     516    $text =~ s/.*Before//ms or $this->assert(0, $text); 
     517    $text =~ s/After.*//ms or $this->assert(0, $text); 
     518 
     519    # Showtime: 
     520    for ($expectedText, $text) { 
     521        s/([^\x20-\x7e])/sprintf "\\x{%X}", ord($1)/ge; 
     522    } 
     523    $this->assert_str_equals($expectedText, $text); 
     524} 
     525 
    434526 
    435527sub compareTML_HTML { 
  • trunk/WysiwygPlugin/test/unit/WysiwygPlugin/TranslatorTests.pm

    r7754 r7854  
    19881988    }, 
    19891989    { 
    1990         exec => $TML2HTML, # SMELL fails these: $HTML2TML | $ROUNDTRIP, 
    1991         name => 'entityInsideSticky', 
     1990        exec => $TML2HTML | $HTML2TML | $ROUNDTRIP, 
     1991        name => 'entityWithNoNameInsideSticky', 
    19921992        tml  => <<'GLUED', 
    19931993<sticky>&#9792;</sticky> 
     
    19981998</p> 
    19991999STUCK 
    2000 }, 
     2000    }, 
    20012001    { 
    20022002        exec => $TML2HTML | $HTML2TML | $ROUNDTRIP, 
  • trunk/WysiwygPlugin/test/unit/WysiwygPlugin/WysiwygPluginTests.pm

    r7644 r7854  
    1717use Carp; 
    1818 
     19my @unicodeCodepointsForWindows1252 = ( 
     20 
     21    # From http://www.alanwood.net/demos/ansi.html 
     22    # unicode   windows-1252 
     23    8364,    # 128 
     24    8218,    # 130 
     25    402,     # 131 
     26    8222,    # 132 
     27    8230,    # 133 
     28    8224,    # 134 
     29    8225,    # 135 
     30    710,     # 136 
     31    8240,    # 137 
     32    352,     # 138 
     33    8249,    # 139 
     34    338,     # 140 
     35    381,     # 142 
     36    8216,    # 145 
     37    8217,    # 146 
     38    8220,    # 147 
     39    8221,    # 148 
     40    8226,    # 149 
     41    8211,    # 150 
     42    8212,    # 151 
     43    732,     # 152 
     44    8482,    # 153 
     45    353,     # 154 
     46    8250,    # 155 
     47    339,     # 156 
     48    382,     # 158 
     49    376,     # 159 
     50); 
     51 
    1952my $UI_FN; 
    2053 
     
    2962    $this->SUPER::set_up(); 
    3063    $UI_FN ||= $this->getUIFn('save'); 
     64 
     65    Foswiki::Plugins::WysiwygPlugin::Constants::reinitialiseForTesting(); 
    3166 
    3267    $Foswiki::cfg{Plugins}{WysiwygPlugin}{Enabled} = 1; 
     
    5994} 
    6095 
    61 sub save_test { 
     96sub save_testCharsetCodesRange { 
    6297    my ( $this, $charset, $firstchar, $lastchar ) = @_; 
    63  
    64     $Foswiki::cfg{Site}{CharSet} = $charset; 
     98    my @test; 
     99    for ( my $i = $firstchar ; $i <= $lastchar ; $i++ ) { 
     100        push( @test, Encode::decode( _perlEncodeCharset($charset), chr($i) ) ); 
     101    } 
     102    my $text = join( '', @test ) . "."; 
     103 
     104    $this->save_test( $charset, $text, $text ); 
     105} 
     106 
     107sub save_testUnicodeCodepointsRange { 
     108    my ( $this, $charset, $firstchar, $lastchar ) = @_; 
    65109 
    66110    my @test; 
     
    69113    } 
    70114    my $text = join( '', @test ) . "."; 
    71     my $t = $charset ? Encode::encode( $charset, $text ) : $text; 
     115 
     116    $this->save_test( $charset, $text, $text ); 
     117} 
     118 
     119sub _perlEncodeCharset { 
     120    my $charset = shift; 
     121 
     122    # The default encoding is 'iso-8859-1' 
     123    # Foswiki treats that encoding like windows-1252 
     124    # Perl's Encode library treats them differently 
     125    $charset = 'windows-1252' if not $charset or $charset eq 'iso-8859-1'; 
     126    return $charset; 
     127} 
     128 
     129# $input and $expectedOutput contain unicode codepoints; 
     130# they are wide characters, NOT utf-8 encoded 
     131sub save_test { 
     132    my ( $this, $charset, $input, $expectedOutput ) = @_; 
     133 
     134    # Is this enough? Regexes are inited before we get here, aren't they? 
     135    $Foswiki::cfg{Site}{CharSet} = $charset; 
     136 
     137    my $t = 
     138      $charset 
     139      ? Encode::encode( _perlEncodeCharset($charset), $input ) 
     140      : $input; 
     141    my $e = 
     142      $charset 
     143      ? Encode::encode( _perlEncodeCharset($charset), $expectedOutput ) 
     144      : $expectedOutput; 
    72145 
    73146    my $query = new Unit::Request( 
     
    108181    $out =~ s/\s*$//s; 
    109182 
    110     $this->assert( $t eq $out, "'" . anal($out) . "' !=\n'" . anal($t) . "'" ); 
    111 } 
    112  
    113 sub TML2HTML_test { 
     183    $this->assert( $e eq $out, "'" . anal($out) . "' !=\n'" . anal($e) . "'" ); 
     184} 
     185 
     186sub TML2HTML_testCharsetCodesRange { 
    114187    my ( $this, $charset, $firstchar, $lastchar ) = @_; 
    115  
    116     # Is this enough? Regexes are inited before we get here, aren't they? 
    117     $Foswiki::cfg{Site}{CharSet} = $charset; 
     188    my @test; 
     189    for ( my $i = $firstchar ; $i <= $lastchar ; $i++ ) { 
     190        push( @test, Encode::decode( _perlEncodeCharset($charset), chr($i) ) ); 
     191    } 
     192    my $text = join( '', @test ) . "."; 
     193 
     194    $this->TML2HTML_test( $charset, $text, $text ); 
     195} 
     196 
     197sub TML2HTML_testUnicodeCodepointsRange { 
     198    my ( $this, $charset, $firstchar, $lastchar ) = @_; 
    118199 
    119200    my @test; 
     
    122203    } 
    123204    my $text = join( '', @test ) . "."; 
     205 
     206    $this->TML2HTML_test( $charset, $text, $text ); 
     207} 
     208 
     209# $input and $expectedOutput contain unicode codepoints; 
     210# they are wide characters, NOT utf-8 encoded 
     211sub TML2HTML_test { 
     212    my ( $this, $charset, $input, $expectedOutput ) = @_; 
     213 
     214    # Is this enough? Regexes are inited before we get here, aren't they? 
     215    $Foswiki::cfg{Site}{CharSet} = $charset; 
     216 
    124217    my $query = new Unit::Request( 
    125218        { 
     
    127220 
    128221            # REST parameters are always UTF8 encoded 
    129             'text' => [ Encode::encode_utf8($text) ], 
     222            'text' => [ Encode::encode_utf8($input) ], 
    130223        } 
    131224    ); 
     
    133226 
    134227    my $foswiki = new Foswiki( 'guest', $query ); 
    135     $foswiki->{response}->charset($charset) if $charset; 
     228    $foswiki->{response}->charset($charset) 
     229      if $charset;    # why? REST responses are supposed to be UTF-8 encoded 
    136230 
    137231    my ( $out, $result ) = $this->captureWithKey( 
     
    155249 
    156250    my $id = "<!--$Foswiki::Plugins::WysiwygPlugin::Handlers::SECRET_ID-->"; 
    157     $this->assert( $out =~ s/^\s*$id<p>\s*//s, anal($out) ); 
    158     $out =~ s/\s*<\/p>\s*$//s; 
    159  
    160     require Foswiki::Plugins::WysiwygPlugin::Constants; 
    161     Foswiki::Plugins::WysiwygPlugin::Constants::mapUnicode2HighBit($out); 
    162  
    163     $this->assert( $text eq $out, 
    164         "'" . anal($out) . "' !=\n'" . anal($text) . "'" ); 
     251    $this->assert( $out =~ s/^\s*$id<p>[ \t\n]*//s, anal($out) ); 
     252    $out =~ s/[ \t\n]*<\/p>\s*$//s; 
     253 
     254    $this->assert( $expectedOutput eq $out, 
     255        "'" . anal($out) . "' !=\n'" . anal($expectedOutput) . "'" ); 
    165256    $foswiki->finish(); 
    166257} 
    167258 
    168 sub HTML2TML_test { 
     259sub HTML2TML_testCharsetCodesRange { 
    169260    my ( $this, $charset, $firstchar, $lastchar ) = @_; 
    170  
    171     # Is this enough? Regexes are inited before we get here, aren't they? 
    172     $Foswiki::cfg{Site}{CharSet} = $charset; 
     261    my @test; 
     262    for ( my $i = $firstchar ; $i <= $lastchar ; $i++ ) { 
     263        push( @test, Encode::decode( _perlEncodeCharset($charset), chr($i) ) ); 
     264    } 
     265    my $text = join( '', @test ) . "."; 
     266 
     267    $this->HTML2TML_test( $charset, $text, $text ); 
     268} 
     269 
     270sub HTML2TML_testUnicodeCodepointsRange { 
     271    my ( $this, $charset, $firstchar, $lastchar ) = @_; 
    173272 
    174273    my @test; 
     
    177276    } 
    178277    my $text = join( '', @test ) . "."; 
     278 
     279    $this->HTML2TML_test( $charset, $text, $text ); 
     280} 
     281 
     282# $input and $expectedOutput contain unicode codepoints; 
     283# they are wide characters, NOT utf-8 encoded 
     284sub HTML2TML_test { 
     285    my ( $this, $charset, $input, $expectedOutput ) = @_; 
     286 
     287    # Is this enough? Regexes are inited before we get here, aren't they? 
     288    $Foswiki::cfg{Site}{CharSet} = $charset; 
     289 
    179290    my $query = new Unit::Request( 
    180291        { 
     
    182293 
    183294            # REST parameters are always UTF8 encoded 
    184             'text' => [ Encode::encode_utf8($text) ], 
     295            'text' => [ Encode::encode_utf8($input) ], 
    185296        } 
    186297    ); 
    187298    $query->method('GET'); 
    188299    my $foswiki = new Foswiki( 'guest', $query ); 
    189     $foswiki->{response}->charset($charset) if $charset; 
     300    $foswiki->{response}->charset($charset) 
     301      if $charset;    # why? REST responses are supposed to be UTF-8 encoded 
    190302 
    191303    my ( $out, $result ) = $this->captureWithKey( 
     
    208320    $out = Encode::decode_utf8($out); 
    209321 
    210     require Foswiki::Plugins::WysiwygPlugin::Constants; 
    211     Foswiki::Plugins::WysiwygPlugin::Constants::mapUnicode2HighBit($out); 
    212  
    213322    $out =~ s/\s*$//s; 
    214323 
    215     $this->assert_str_equals( $text, $out, 
    216         "'" . anal($out) . "' !=\n'" . anal($text) . "'" ); 
     324    $this->assert_str_equals( $expectedOutput, $out, 
     325        "'" . anal($out) . "' !=\n'" . anal($expectedOutput) . "'" ); 
    217326    $foswiki->finish(); 
    218327} 
     
    221330sub test_restTML2HTML_undef { 
    222331    my $this = shift; 
    223     $this->TML2HTML_test( undef, 127, 255 ); 
     332    $this->TML2HTML_testUnicodeCodepointsRange( undef, 160, 255 ); 
     333 
     334    # Browsers commonly treat iso-8859-1 as if it is windows-1252 
     335    # and so does Foswiki 
     336    my $unicodeOfWindows1252 = 
     337      join( '', map { chr($_) } @unicodeCodepointsForWindows1252 ); 
     338 
     339    $this->TML2HTML_test( undef, $unicodeOfWindows1252, $unicodeOfWindows1252 ); 
     340 
     341    $this->TML2HTML_test( undef, chr(0x3B1) . chr(0x2640), '&alpha;&#x2640;' ); 
    224342} 
    225343 
    226344sub test_restTML2HTML_iso_8859_1 { 
    227345    my $this = shift; 
    228     $this->TML2HTML_test( 'iso-8859-1', 127, 255 ); 
     346    $this->TML2HTML_testUnicodeCodepointsRange( 'iso-8859-1', 160, 255 ); 
     347 
     348    # Browsers commonly treat iso-8859-1 as if it is windows-1252 
     349    # and so does Foswiki 
     350    my $unicodeOfWindows1252 = 
     351      join( '', map { chr($_) } @unicodeCodepointsForWindows1252 ); 
     352 
     353    $this->TML2HTML_test( 'iso-8859-1', $unicodeOfWindows1252, 
     354        $unicodeOfWindows1252 ); 
     355 
     356    $this->TML2HTML_test( 'iso-8859-1', chr(0x3B1) . chr(0x2640), 
     357        '&alpha;&#x2640;' ); 
     358} 
     359 
     360sub test_restTML2HTML_iso_8859_7 { 
     361    my $this = shift; 
     362 
     363    $this->TML2HTML_testCharsetCodesRange( 'iso-8859-7', 160, 173 ); 
     364    $this->TML2HTML_testCharsetCodesRange( 'iso-8859-7', 175, 209 ); 
     365    $this->TML2HTML_testCharsetCodesRange( 'iso-8859-7', 211, 254 ); 
    229366} 
    230367 
    231368sub test_restTML2HTML_iso_8859_15 { 
    232369    my $this = shift; 
    233     $this->TML2HTML_test( 'iso-8859-15', 127, 163 ); 
    234     $this->TML2HTML_test( 'iso-8859-15', 169, 179 ); 
    235     $this->TML2HTML_test( 'iso-8859-15', 181, 183 ); 
    236     $this->TML2HTML_test( 'iso-8859-15', 191, 255 ); 
     370    $this->TML2HTML_testUnicodeCodepointsRange( 'iso-8859-15', 127, 163 ); 
     371    $this->TML2HTML_testUnicodeCodepointsRange( 'iso-8859-15', 169, 179 ); 
     372    $this->TML2HTML_testUnicodeCodepointsRange( 'iso-8859-15', 181, 183 ); 
     373    $this->TML2HTML_testUnicodeCodepointsRange( 'iso-8859-15', 191, 255 ); 
     374 
     375    # These are the codes that are different to iso-8859-1, and thus 
     376    # different to unicode 
     377    for my $code ( 0xA4, 0xA6, 0xA8, 0xB4, 0xBC, 0xBD, 0xBE ) { 
     378        $this->TML2HTML_testCharsetCodesRange( 'iso-8859-15', $code, $code ); 
     379    } 
    237380} 
    238381 
    239382sub test_restTML2HTML_utf_8 { 
    240383    my $this = shift; 
    241     $this->TML2HTML_test( 'utf-8', 127, 300 ); 
    242     $this->TML2HTML_test( 'utf-8', 301, 400 ); 
    243     $this->TML2HTML_test( 'utf-8', 401, 500 ); 
     384    $this->TML2HTML_testUnicodeCodepointsRange( 'utf-8', 127, 300 ); 
     385    $this->TML2HTML_testUnicodeCodepointsRange( 'utf-8', 301, 400 ); 
     386    $this->TML2HTML_testUnicodeCodepointsRange( 'utf-8', 401, 500 ); 
    244387 
    245388    # Chinese 
    246     $this->TML2HTML_test( 'utf-8', 8000, 9000 ); 
     389    $this->TML2HTML_testUnicodeCodepointsRange( 'utf-8', 8000, 9000 ); 
    247390} 
    248391 
    249392sub test_restHTML2TML_undef { 
    250393    my $this = shift; 
    251     $this->HTML2TML_test( undef, 127, 255 ); 
     394    $this->HTML2TML_testUnicodeCodepointsRange( undef, 160, 255 ); 
     395 
     396    # Browsers commonly treat iso-8859-1 as if it is windows-1252 
     397    # and so does Foswiki 
     398    my $unicodeOfWindows1252 = 
     399      join( '', map { chr($_) } @unicodeCodepointsForWindows1252 ); 
     400 
     401    $this->HTML2TML_test( undef, $unicodeOfWindows1252, $unicodeOfWindows1252 ); 
    252402} 
    253403 
    254404sub test_restHTML2TML_iso_8859_1 { 
    255405    my $this = shift; 
    256     $this->HTML2TML_test( 'iso-8859-1', 127, 255 ); 
     406    $this->HTML2TML_testUnicodeCodepointsRange( 'iso-8859-1', 160, 255 ); 
     407 
     408    # Browsers commonly treat iso-8859-1 as if it is windows-1252 
     409    # and so does Foswiki 
     410    my $unicodeOfWindows1252 = 
     411      join( '', map { chr($_) } @unicodeCodepointsForWindows1252 ); 
     412 
     413    $this->HTML2TML_test( 'iso-8859-1', $unicodeOfWindows1252, 
     414        $unicodeOfWindows1252 ); 
     415} 
     416 
     417sub test_restHTML2TML_iso_8859_7 { 
     418    my $this = shift; 
     419 
     420    $this->HTML2TML_testCharsetCodesRange( 'iso-8859-7', 160, 173 ); 
     421    $this->HTML2TML_testCharsetCodesRange( 'iso-8859-7', 175, 209 ); 
     422    $this->HTML2TML_testCharsetCodesRange( 'iso-8859-7', 211, 254 ); 
    257423} 
    258424 
    259425sub test_restHTML2TML_iso_8859_15 { 
    260426    my $this = shift; 
    261     $this->HTML2TML_test( 'iso-8859-15', 127, 163 ); 
    262     $this->HTML2TML_test( 'iso-8859-15', 169, 179 ); 
    263     $this->HTML2TML_test( 'iso-8859-15', 181, 183 ); 
    264     $this->HTML2TML_test( 'iso-8859-15', 191, 255 ); 
     427    $this->HTML2TML_testUnicodeCodepointsRange( 'iso-8859-15', 127, 163 ); 
     428    $this->HTML2TML_testUnicodeCodepointsRange( 'iso-8859-15', 169, 179 ); 
     429    $this->HTML2TML_testUnicodeCodepointsRange( 'iso-8859-15', 181, 183 ); 
     430    $this->HTML2TML_testUnicodeCodepointsRange( 'iso-8859-15', 191, 255 ); 
     431 
     432    # These are the codes that are different to iso-8859-1, and thus 
     433    # different to unicode 
     434    for my $code ( 0xA4, 0xA6, 0xA8, 0xB4, 0xBC, 0xBD, 0xBE ) { 
     435        $this->HTML2TML_testCharsetCodesRange( 'iso-8859-15', $code, $code ); 
     436    } 
    265437} 
    266438 
    267439sub test_restHTML2TML_utf_8 { 
    268440    my $this = shift; 
    269     $this->HTML2TML_test( 'utf-8', 127, 300 ); 
    270     $this->HTML2TML_test( 'utf-8', 301, 400 ); 
    271     $this->HTML2TML_test( 'utf-8', 401, 500 ); 
     441    $this->HTML2TML_testUnicodeCodepointsRange( 'utf-8', 127, 300 ); 
     442    $this->HTML2TML_testUnicodeCodepointsRange( 'utf-8', 301, 400 ); 
     443    $this->HTML2TML_testUnicodeCodepointsRange( 'utf-8', 401, 500 ); 
    272444 
    273445    # Chinese 
    274     $this->HTML2TML_test( 'utf-8', 8000, 9000 ); 
     446    $this->HTML2TML_testUnicodeCodepointsRange( 'utf-8', 8000, 9000 ); 
    275447} 
    276448 
    277449sub test_save_undef { 
    278450    my $this = shift; 
    279     $this->save_test( undef, 127, 255 ); 
     451    $this->save_testUnicodeCodepointsRange( undef, 127, 128 ); 
     452    $this->save_testUnicodeCodepointsRange( undef, 130, 140 ); 
     453    $this->save_testUnicodeCodepointsRange( undef, 142, 142 ); 
     454    $this->save_testUnicodeCodepointsRange( undef, 145, 156 ); 
     455    $this->save_testUnicodeCodepointsRange( undef, 158, 255 ); 
    280456} 
    281457 
    282458sub test_save_iso_8859_1 { 
    283459    my $this = shift; 
    284     $this->save_test( 'iso-8859-1', 127, 255 ); 
     460    $this->save_testUnicodeCodepointsRange( 'iso-8859-1', 160, 255 ); 
     461 
     462    # Browsers commonly treat iso-8859-1 as if it is windows-1252 
     463    # and so does Foswiki 
     464    my $unicodeOfWindows1252 = 
     465      join( '', map { chr($_) } @unicodeCodepointsForWindows1252 ); 
     466 
     467    $this->save_test( 'iso-8859-1', $unicodeOfWindows1252, 
     468        $unicodeOfWindows1252 ); 
     469} 
     470 
     471sub test_save_iso_8859_7 { 
     472    my $this = shift; 
     473 
     474    $this->save_testCharsetCodesRange( 'iso-8859-7', 160, 173 ); 
     475    $this->save_testCharsetCodesRange( 'iso-8859-7', 175, 209 ); 
     476    $this->save_testCharsetCodesRange( 'iso-8859-7', 211, 254 ); 
    285477} 
    286478 
    287479sub test_save_iso_8859_15 { 
    288480    my $this = shift; 
    289     $this->save_test( 'iso-8859-15', 127, 163 ); 
    290     $this->save_test( 'iso-8859-15', 169, 179 ); 
    291     $this->save_test( 'iso-8859-15', 181, 183 ); 
    292     $this->save_test( 'iso-8859-15', 191, 255 ); 
     481    $this->save_testUnicodeCodepointsRange( 'iso-8859-15', 127, 163 ); 
     482    $this->save_testUnicodeCodepointsRange( 'iso-8859-15', 169, 179 ); 
     483    $this->save_testUnicodeCodepointsRange( 'iso-8859-15', 181, 183 ); 
     484    $this->save_testUnicodeCodepointsRange( 'iso-8859-15', 191, 255 ); 
     485 
     486    # These are the codes that are different to iso-8859-1, and thus 
     487    # different to unicode 
     488    for my $code ( 0xA4, 0xA6, 0xA8, 0xB4, 0xBC, 0xBD, 0xBE ) { 
     489        $this->save_testCharsetCodesRange( 'iso-8859-15', $code, $code ); 
     490    } 
    293491} 
    294492 
    295493sub test_save_utf_8a { 
    296494    my $this = shift; 
    297     $this->save_test( 'utf-8', 127, 300 ); 
     495    $this->save_testUnicodeCodepointsRange( 'utf-8', 127, 300 ); 
    298496} 
    299497 
    300498sub test_save_utf_8b { 
    301499    my $this = shift; 
    302     $this->save_test( 'utf-8', 301, 400 ); 
     500    $this->save_testUnicodeCodepointsRange( 'utf-8', 301, 400 ); 
    303501} 
    304502 
    305503sub test_save_utf_8d { 
    306504    my $this = shift; 
    307     $this->save_test( 'utf-8', 401, 500 ); 
     505    $this->save_testUnicodeCodepointsRange( 'utf-8', 401, 500 ); 
    308506} 
    309507 
     
    312510 
    313511    # Chinese 
    314     $this->save_test( 'utf-8', 8000, 9000 ); 
     512    $this->save_testUnicodeCodepointsRange( 'utf-8', 8000, 9000 ); 
    315513} 
    316514 
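
The `_perlEncodeCharset` helper in the tests above substitutes `windows-1252` for `iso-8859-1` before calling Perl's Encode library, because Encode treats the two charsets differently. The following is a minimal standalone sketch of that difference, not part of the changeset: it decodes the single byte 0x80 under both charset names. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode ();

# Byte 0x80 is a C1 control character in ISO-8859-1,
# but it is the euro sign in windows-1252.
my $byte = "\x80";

# Decode the same byte under each charset name.
my $asLatin1 = Encode::decode( 'iso-8859-1',   $byte );
my $asCp1252 = Encode::decode( 'windows-1252', $byte );

printf "iso-8859-1:   U+%04X\n", ord($asLatin1);    # U+0080
printf "windows-1252: U+%04X\n", ord($asCp1252);    # U+20AC
```

The second result, U+20AC (decimal 8364), matches the first entry of `@unicodeCodepointsForWindows1252` in the diff, which is why the tests exercise that table separately from the 160 to 255 range, where the two charsets agree.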