SAUK Discussion Board

Sisyphus · #42 1st August 2019, 11:08

My brain will not leave this alone.

My understanding to date:-

Issue 1: Unicode has been deliberately broken via the swear filter, for some reason that I will probably sleep better not knowing. Let us assume it is going to stay that way.

Issue 2: For me, extended punctuation marks (curly quotes, en and em dashes, the single-character ellipsis) are accepted but are then trashed if the post is subsequently edited.

The second issue happens with characters that are included in charset=Windows-1252 but not in charset=ISO-8859-1, i.e. the ones Windows-1252 places at byte values 0x80 to 0x9F, where ISO-8859-1 has no printable characters.
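To see exactly which characters fall into that gap, a short Python sketch can ask Python's own codecs (`cp1252` and `latin-1` are Python's names for Windows-1252 and ISO-8859-1). It only illustrates the two charsets; it is not taken from the board's code, and the function name is made up for the example.

```python
def cp1252_only_characters():
    """Map each byte in 0x80-0x9F to the character Windows-1252 assigns it.

    ISO-8859-1 has no printable characters at these byte values, so none of
    the returned characters can be encoded as ISO-8859-1.
    """
    chars = {}
    for byte in range(0x80, 0xA0):
        try:
            chars[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D unassigned.
            continue
    return chars


if __name__ == "__main__":
    # Print code points only, so the output is plain ASCII on any console.
    for byte, ch in cp1252_only_characters().items():
        print(f"0x{byte:02X} -> U+{ord(ch):04X}")
```

The list includes 0x91 to 0x94 (curly quotes), 0x96 and 0x97 (en and em dashes) and 0x85 (ellipsis), which are exactly the "extended punctuation marks" in question.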

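Looking online, it does appear that these two charsets are not handled consistently and are sometimes treated as synonymous: modern browsers, following the WHATWG Encoding Standard, decode a page labelled ISO-8859-1 as Windows-1252.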
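A rough model of how a browser resolves the declared label is sketched below. The set is a subset of the labels the WHATWG Encoding Standard maps to windows-1252 (not the full table), and `browser_encoding` is a hypothetical helper, not a browser API.

```python
# Subset of the labels the WHATWG Encoding Standard maps to windows-1252.
# Browsers follow this table, so charset=ISO-8859-1 is decoded as Windows-1252.
WINDOWS_1252_LABELS = {
    "ascii", "us-ascii", "cp1252", "x-cp1252", "windows-1252",
    "iso-8859-1", "iso8859-1", "iso88591", "iso_8859-1", "latin1", "l1",
}


def browser_encoding(label):
    """Return 'windows-1252' if a browser would treat the label that way, else None.

    Only the windows-1252 row of the standard is modelled here; labels are
    compared roughly as the standard does (surrounding whitespace ignored,
    case-insensitive).
    """
    normalized = label.strip().lower()
    return "windows-1252" if normalized in WINDOWS_1252_LABELS else None
```

Under this model both the 2018 header (windows-1252) and the current one (ISO-8859-1) resolve to the same encoding in the browser, so any difference in behaviour would have to come from software that reads the label literally.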

Could it be that these two different charsets (or two different interpretations of the same charset) are active in different parts of the board’s engine room, and that translation between them promotes the extended punctuation marks to Unicode and then BAM! the swear filter kicks in?
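The hypothesis can be modelled in a few lines of Python. Everything here is hypothetical: `to_strict_latin1` stands in for some part of the board that insists on strict ISO-8859-1, and `filter_would_trigger` stands in for a swear filter that reacts to Unicode written as numeric character references. Neither is taken from vBulletin. The sketch does use Python's real `xmlcharrefreplace` error handler, which replaces any character the target charset cannot hold with a reference such as `&#8217;`.

```python
import re


def to_strict_latin1(text):
    """Hypothetical component: escape anything outside ISO-8859-1 as &#NNNN;."""
    return text.encode("latin-1", errors="xmlcharrefreplace").decode("latin-1")


# Hypothetical stand-in for a filter that blocks "Unicode" by spotting
# decimal numeric character references.
NCR_PATTERN = re.compile(r"&#[0-9]+;")


def filter_would_trigger(text):
    return bool(NCR_PATTERN.search(text))


if __name__ == "__main__":
    original = "It\u2019s fine"          # curly apostrophe, U+2019
    promoted = to_strict_latin1(original)
    print(promoted)                       # It&#8217;s fine
    print(filter_would_trigger(original), filter_would_trigger(promoted))
```

The curly apostrophe passes untouched until the strict step turns it into `&#8217;`, at which point a filter of this kind would fire.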

The solution would seem to be to ensure that only one charset is used throughout and that it contains the extended punctuation marks, i.e. charset=windows-1252.
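The proposed fix can be checked the same way with a hypothetical `round_trip` helper that stores text in one charset and reads it back in another. With Windows-1252 on both sides the punctuation survives; with ISO-8859-1 on both sides it cannot be stored at all; and with Windows-1252 written but ISO-8859-1 read back, the quotes turn into invisible C1 control characters.

```python
def round_trip(text, write_charset, read_charset):
    """Store text using one charset and read it back using another.

    Returns None if the write charset cannot hold the text at all.
    Undecodable bytes on the read side become U+FFFD.
    """
    try:
        stored = text.encode(write_charset)
    except UnicodeEncodeError:
        return None
    return stored.decode(read_charset, errors="replace")


if __name__ == "__main__":
    sample = "\u201cQuoted\u201d \u2013 and so on\u2026"
    print(round_trip(sample, "cp1252", "cp1252") == sample)    # consistent
    print(round_trip(sample, "latin-1", "latin-1"))            # cannot store
    print(repr(round_trip(sample, "cp1252", "latin-1")[:8]))   # mixed
```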

In August 2018 a page header was:-
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta name="generator" content="vBulletin 3.8.7">

Today it is:-
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<meta name="generator" content="vBulletin 3.8.7" />
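Comparing the two headers by eye works, but the declared charset can also be pulled out of saved copies of a page with a rough regular expression, as in this hypothetical `declared_charset` helper. It is a quick check rather than a full HTML parser, and it only reports the first `charset=` it finds.

```python
import re

# Matches the charset parameter in a Content-Type meta tag,
# e.g. content="text/html; charset=windows-1252".
CHARSET_PATTERN = re.compile(r'charset=["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)


def declared_charset(html_head):
    """Return the first charset declared in an HTML fragment, or None."""
    match = CHARSET_PATTERN.search(html_head)
    return match.group(1) if match else None


if __name__ == "__main__":
    aug_2018 = '<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">'
    today = '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />'
    print(declared_charset(aug_2018), declared_charset(today))
```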

This makes me think that there may be two different interpretations of ISO-8859-1 being used at the same time.

Of course I could be completely wrong. It has happened before.