Plaintext backup does not produce a valid XML file with some unicode characters
Created by: whiver
This bug was first mentioned in #438 (closed).
Bug description
When creating a plaintext backup, wide characters are stored as what appear to be "surrogate" characters, which are invalid in an XML file. Since the XML declares UTF-8 encoding, the actual Unicode code points should be used for the smileys instead of this trick. Strangely, not every smiley is affected. See below:
Example of exported SMS:
Helloooo, un weekend pluvieux qui s'annonce, mais qui s'en soucie ?
☂ �� ⛪️ Ce soir, un concert pas comme les autres : électro, son
et lumière à l'Eglise Saint-Merry : http://hij.am/2wy5u ��
�� Demain, exposition sur Herb Ritts, un virtuose de la photo de mode :
http://hij.am/d9bz7 ���� Et dimanche, c'est
l'anniversaire de James Gandolfini (RIP), l'acteur principal de la série Soprano. Un bon
prétexte pour se (re)faire quelques saisons dans son lit :) Profite bien !
Here I get an "Invalid character reference" error on character �:
I also noticed that SMS Backup & Restore does not use these surrogate codes on export (although it can read them on import), so I can open its XML file directly in the browser (the file is automatically linked to an XSLT stylesheet).
For more information please see the original thread: https://github.com/SilenceIM/Silence/issues/438#issuecomment-359573265
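To illustrate the problem described above, here is a minimal Python sketch. It assumes the export writes each half of a surrogate pair as its own decimal character reference (for example `&#55357;&#56846;` for 😎), which is a guess based on the "invalid character reference" error and not something confirmed from the Silence source. The `<sms body="..."/>` element and the helper names are made up for the example.

```python
import xml.etree.ElementTree as ET


def combine_surrogates(high: int, low: int) -> int:
    """Return the Unicode code point encoded by a UTF-16 surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical export lines: 😎 is U+1F60E, i.e. the pair 0xD83D / 0xDE0E.
surrogate_form = '<sms body="&#55357;&#56846;" />'  # assumed current output
code_point_form = '<sms body="&#128526;" />'        # single reference

if __name__ == "__main__":
    print(hex(combine_surrogates(0xD83D, 0xDE0E)))
    print(is_well_formed(surrogate_form))
    print(is_well_formed(code_point_form))
```

Surrogate code points (U+D800 to U+DFFF) are outside the set of characters XML allows, so a reference to one of them is a well-formedness error, while a single reference to the combined code point is accepted.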
How to reproduce
- have at least one SMS with smileys, such as 😎
- run a plaintext export
- try to open the generated file, using Firefox for example.
Actual result: An error is thrown, like Parse error: invalid character reference
Expected result: We should see the XML file
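As a possible workaround until the export itself is fixed, an affected file could be post-processed. The sketch below is not part of Silence. It assumes the surrogates appear as adjacent decimal character references, and the function name `merge_surrogate_references` is hypothetical.

```python
import re

# Two adjacent decimal character references, e.g. "&#55357;&#56846;".
_PAIR = re.compile(r"&#(\d+);&#(\d+);")


def merge_surrogate_references(xml_text: str) -> str:
    """Rewrite surrogate-pair references as one code point reference."""
    parts = []
    pos = 0
    while True:
        match = _PAIR.search(xml_text, pos)
        if match is None:
            parts.append(xml_text[pos:])
            break
        high, low = int(match.group(1)), int(match.group(2))
        if 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
            code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
            parts.append(xml_text[pos:match.start()])
            parts.append(f"&#{code_point};")
            pos = match.end()
        else:
            # Keep the first reference untouched and retry from the second,
            # since the second one may itself start a surrogate pair.
            first_end = match.end(1) + 1
            parts.append(xml_text[pos:first_end])
            pos = first_end
    return "".join(parts)


if __name__ == "__main__":
    print(merge_surrogate_references('<sms body="Hi &#55357;&#56846;" />'))
```

References that are not part of a valid pair, such as `&#10;`, are left as they are.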
Device info
- Device: Asus Zenfone Selfie
- Android version: 6.0.1
- Silence version: 0.15.10