Old 12th June 2005, 14:50   #1
Instructor
Major Dude
 
Join Date: Jul 2004
Posts: 671
Plugin for Unicode files conversion

Features:
-Convert file from Unicode to ANSI
-Convert file from ANSI to Unicode
Conversions supported:
"UTF-8" <-> ANSI
"UTF-16LE" <-> ANSI
"UTF-16BE" <-> ANSI

-Get file unicode type:
"NONE" - None Unicode
"UTF-8" - 8-bit Variable Width (Web)
"UTF-16LE|UCS-2LE" - 16-bit Little Endian (Default for Windows)
"UTF-16BE|UCS-2BE" - 16-bit Big Endian (Default for Linux)
"UTF-32LE|UCS-4LE" - 32-bit Little Endian
"UTF-32BE|UCS-4BE" - 32-bit Big Endian

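For readers wondering how a file's "unicode type" can be told apart: the thread does not show the plugin's detection code, so the following is only a minimal sketch that assumes detection is based on the byte order mark (BOM). It is written in Python purely for illustration, and the function name `detect_unicode_type` is hypothetical.

```python
import codecs

# BOM signatures, longest first: the UTF-32LE BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE), so order matters.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_unicode_type(data: bytes) -> str:
    """Return the encoding named by a leading BOM, or "NONE" if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return "NONE"


print(detect_unicode_type(b"\xff\xfeF\x00o\x00"))  # UTF-16LE
print(detect_unicode_type(b"plain ANSI text"))     # NONE
```

A BOM-only check cannot recognise Unicode files saved without a BOM, and a UTF-16LE file whose first character is U+0000 would be mistaken for UTF-32LE.
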
"unicode" v1.0
Attached Files
File Type: zip unicode.zip (7.6 KB, 1326 views)
Old 18th November 2008, 12:30   #2
tpr
Junior Member
 
Join Date: Jan 2007
Posts: 8
Many thanks for this. I needed exactly this one.
Old 26th February 2009, 15:04   #3
AxelMock
Junior Member
 
Join Date: Apr 2007
Location: Seltz, France
Posts: 46
Re:Plugin for Unicode files conversion

Hi,

I used this plugin to convert an .inf file from Unicode to ANSI to search for some value in the .inf file using the standard (non-Unicode) NSIS version.

It worked fine on the systems we tested here (German, English), but a customer reported that the application stops with an exception on a Chinese Windows system (IDENTICAL .inf file).

Testing showed that Japanese systems are affected too.

I started debugging, built a debug version of the DLL and a small test program. Debugging on the Japanese and Chinese systems showed that the call to kernel32::WideCharToMultiByte raises the exception.
According to the Microsoft documentation for that function, the call made in unicode.dll is correct (it uses CP_ACP and no user-defined replacement character).
When I limited the conversion to the start of the file, which contains NO language-specific Unicode characters, the function succeeded. IMHO there must be some problem in the function with the ANSI codepage on these systems.

By playing around with some option flags I could avoid the exception, but then the conversion did NOT take place (0 chars converted).

In my case a working FileUnicode2UTF8 function could have helped. I just couldn't figure out how to size the buffer, since a single character can take up multiple bytes in UTF-8.
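On the buffer-sizing question, one common approach is a worst-case bound: a character that fits in one UTF-16 code unit needs at most 3 bytes in UTF-8, and a surrogate pair (two code units) needs 4, so 3 bytes per UTF-16 code unit is always enough. The sketch below checks that bound in Python; it is an illustration only, not the plugin's code, and the helper names are made up. (WideCharToMultiByte can also report the required size itself when it is called with an output buffer size of 0.)

```python
def max_utf8_bytes(utf16_code_units: int) -> int:
    """Worst-case number of UTF-8 bytes for the given number of UTF-16 code units."""
    return 3 * utf16_code_units


def utf16_units(text: str) -> int:
    """Number of 16-bit code units the text occupies in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


# ASCII, Japanese (BMP), the euro sign (BMP) and an emoji (surrogate pair).
for sample in ["Folder=", "セちさ", "\u20ac", "\U0001F600"]:
    units = utf16_units(sample)
    needed = len(sample.encode("utf-8"))
    assert needed <= max_utf8_bytes(units)
    print(f"{units} UTF-16 units -> {needed} UTF-8 bytes (bound {max_utf8_bytes(units)})")
```
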

IMHO: This Unicode conversion plugin will become more important as the Unicode version of NSIS becomes more widely used.
(Or does the NSIS Unicode branch supply its own conversion tools/keywords?)
Old 27th February 2009, 16:21   #4
akopts
Junior Member
 
Join Date: Feb 2009
Posts: 5
Is there a way to convert from UTF-16LE to UTF-8?
Old 2nd March 2009, 16:20   #5
AxelMock
Junior Member
 
Join Date: Apr 2007
Location: Seltz, France
Posts: 46
Quote:
Originally posted by akopts
Is there a way to convert from UTF-16LE to UTF-8?
See
Comment in NSIS Unicode Thread
Old 2nd March 2009, 17:24   #6
AxelMock
Junior Member
 
Join Date: Apr 2007
Location: Seltz, France
Posts: 46
Updated version V1.1 available

Quote:
Originally posted by AxelMock
See
Comment in NSIS Unicode Thread
The updated version V1.1 is available.
Wiki updated too.
New function: FileUnicode2UTF8
Old 29th July 2010, 22:38   #7
gringoloco023
Member
 
Join Date: Nov 2009
Posts: 52
Bug report. The following line:
unicode::FileUnicode2Ansi "$EXEDIR\UTF-16LE.txt" "$EXEDIR\Temp.txt" "UTF-16LE"

unicode.dll v1.0 : adds a question-mark to the beginning of 'Temp.txt'
unicode.dll v1.1 : just creates an empty 'Temp.txt' file, but returns 0

Here 'UTF-16LE.txt' is the file from the example script, but it seems to happen with all UTF-16LE files!
Old 1st September 2010, 16:42   #8
koelzk
Junior Member
 
Join Date: Aug 2010
Posts: 4
FileUnicode2UTF8 fails in NSIS Unicode

Hi,

I am working on an installer for the NSIS Unicode build. I wanted to use FileUnicode2UTF8 to convert a text file from UCS-2 LE to UTF-8. The background is I want to write a user selected folder path into a configuration file that uses UTF-8 encoding.

However, I just can't get the plug-in to convert the file; I always get error code 2. Is this a problem with NSIS Unicode, or is my script wrong (see attached file)?
Attached Files
File Type: nsi test.nsi (601 Bytes, 500 views)
Old 1st September 2010, 17:57   #9
gringoloco023
Member
 
Join Date: Nov 2009
Posts: 52
The unicode plug-in does not work in Unicode NSIS!

As there were a few UTF-16 functions missing in TextFunc.nsh, I updated it one day and included some extras for ${FileRecode}.

I also made a couple of minor adjustments to your script:
PHP Code:
!include TextFuncUTF16.nsh
Section ""
    ; Write Settings.ini:
    FileOpen $0 "$EXEDIR\Settings.ini" w
    StrCpy $R0 "Folder=セちさ" ; Sample unicode string with 3 Japanese characters
    FileWriteWord $0 0xFEFF ; write the BOM
    FileWriteUTF16LE $0 $R0
    FileClose $0

    ; Convert file from Unicode to UTF-8
    StrCpy $0 "$EXEDIR\Settings.ini"
    StrCpy $1 "$EXEDIR\Settings2.ini"
    StrCpy $2 "ToUTF8"
    CopyFiles /SILENT $0 $1
    ${FileRecode} $1 $2

    ; Print some information
    DetailPrint 'FileRecode to UTF8 "$0" "$1" $2'
    DetailPrint "$3"
    DetailPrint ""
SectionEnd
BTW: I remember I used to get the occasional crash whenever I did repeated re-coding. Although I found out how to fix it, I have not got around to it yet. If you're experiencing any problems with it, I will put it higher on my priority list.
Attached Files
File Type: nsh TextFuncUTF16.nsh (32.9 KB, 523 views)
Old 1st September 2010, 20:06   #10
koelzk
Junior Member
 
Join Date: Aug 2010
Posts: 4
Thanks for the quick reply. Your script is a big help. However, as you said, there still seems to be a bug in the conversion.

When I run the script, a few random characters are added to the end of the converted text file. The number of random characters differs from run to run (usually 3), sometimes no characters are added and the converted file is fine.
Old 1st September 2010, 21:08   #11
gringoloco023
Member
 
Join Date: Nov 2009
Posts: 52
Hmm....

I never experienced random characters on the end of the file.

Just to make sure, you are not talking about the BOM for utf-8 (EF BB BF, which shows up as ï»¿ in an ANSI editor)?
Which re-coding are you doing ? From utf-16LE to utf-8 ?

Anyway, I will look into it these days... (shouldn't be that much work)

Edit: Just to remind you, your Japanese characters take 6 bytes in utf-16, but 9 bytes in utf-8.
utf-8:
EF BB BF 46 6F 6C 64 65 72 3D E3 82 BB E3 81 A1 E3 81 95 = Folder=セちさ

utf-16LE:
FF FE 46 00 6F 00 6C 00 64 00 65 00 72 00 3D 00 BB 30 61 30 55 30 = ÿþF.o.l.d.e.r.=.»0a0U0
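The two byte dumps above can be reproduced with a few lines of Python (used here only as a convenient way to check the numbers; it is not part of TextFuncUTF16.nsh). The helper name `hex_dump` is made up for this sketch.

```python
import codecs

text = "Folder=セちさ"


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# Each encoding is prefixed with its own byte order mark.
utf8 = codecs.BOM_UTF8 + text.encode("utf-8")
utf16le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print("utf-8:   ", hex_dump(utf8))
print("utf-16LE:", hex_dump(utf16le))
```

The three Japanese characters account for the last 6 bytes of the UTF-16LE dump and the last 9 bytes of the UTF-8 dump.
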
Old 2nd September 2010, 17:12   #12
koelzk
Junior Member
 
Join Date: Aug 2010
Posts: 4
No, I don't think it's a byte order mark, just a few random bytes (and the BOM at the beginning of Settings2.ini is correct). Notepad++ also shows them as additional characters.

I modified the script so that the same string is written three times, each with a line terminator, into Settings.ini, which seems to provoke this bug more often. See the attached script and output files (I compiled the script with NSIS Unicode 2.46 and launched the exe on Windows 7 and XP). I hope this helps to track down the error. Thanks for taking a look at this issue.
Attached Files
File Type: zip script.zip (841 Bytes, 469 views)
Old 3rd September 2010, 12:45   #13
gringoloco023
Member
 
Join Date: Nov 2009
Posts: 52
Hmm... I see what you mean now.

So much for my testing.

I first have to finish the project I'm working on these days, then by the weekend I will have time to look at this issue.

thanx for reporting it
Old 4th September 2010, 21:55   #14
gringoloco023
Member
 
Join Date: Nov 2009
Posts: 52
Fixed ?

I was not allocating space for the terminating null byte before calling ReadFile(), so the string was not terminated and could end in random characters.
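To illustrate this class of bug (not the actual TextFuncUTF16.nsh code, which is not shown in this thread), here is a small Python simulation. It assumes a buffer that still holds leftover bytes and a reader that stops at the first NUL, the way C string functions do; all names are hypothetical.

```python
def read_c_string(buffer: bytearray) -> bytes:
    """Return the bytes up to (not including) the first NUL, like a C string read."""
    end = buffer.find(0)
    return bytes(buffer if end < 0 else buffer[:end])


def load(data: bytes, terminate: bool) -> bytes:
    # Stand-in for uninitialised memory: leftover bytes, with a NUL only further on.
    memory = bytearray(b"\xAA\xBB\xCC" * 4 + b"\x00")
    memory[:len(data)] = data      # what a ReadFile()-style call copies in
    if terminate:
        memory[len(data)] = 0      # the fix: reserve one extra byte and set it to NUL
    return read_c_string(memory)


payload = b"Folder=abc"
print(load(payload, terminate=False))  # payload followed by leftover bytes
print(load(payload, terminate=True))   # payload only
```

How many stray bytes appear depends on where the next NUL happens to sit in memory, which would fit the report that the number of extra characters varied from run to run.
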

Not sure how I could have missed that before

Anyway, give it a go (the script you sent me is working fine for me).
Attached Files
File Type: nsh TextFuncUTF16.nsh (33.0 KB, 613 views)
Old 5th September 2010, 20:44   #15
koelzk
Junior Member
 
Join Date: Aug 2010
Posts: 4
Cool, thanks a lot! Now the installer of my plotter app can write a user-selected path into the settings file even when it's Chinese. You really helped me out.
Old 30th September 2010, 14:13   #16
Legace
Junior Member
 
Join Date: Jun 2008
Posts: 9
Thanks a lot gringoloco023!
Old 16th June 2022, 17:59   #17
pkonduru
Member
 
Join Date: Jul 2015
Posts: 86
Hi All,
I have tried to use this plugin to convert my install.log from UTF-16LE to UTF-8, but I haven't been successful. The file is UTF-16LE; I checked this by opening it in the traditional Windows Notepad. This is what I used:
StrCpy $0 "$INSTDIR\install.log"
StrCpy $1 "$INSTDIR\install.log"
StrCpy $2 "AUTO"

unicode::FileUnicode2UTF8 "$0" "$1" "$2"
Pop $3
MessageBox MB_ICONINFORMATION|MB_OK "Result is : $3"

I keep getting "2" which means
"2" - wrong UnicodeType specified

I have tried various options for StrCpy $2 "AUTO":
"AUTO" - autodetect Unicode type
"UTF-8" - force Unicode type as UTF-8
"UTF-16LE" - force Unicode type as UTF-16LE
"UTF-16BE" - force Unicode type as UTF-16BE
But none of them worked.
Old 17th June 2022, 01:54   #18
Anders
Moderator
 
Join Date: Jun 2002
Location: ${NSISDIR}
Posts: 5,539
Here is a simple one for Unicode installers:

PHP Code:
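Unicode True
RequestExecutionLevel User

!include LogicLib.nsh

; Convert a UTF-16LE text file to UTF-8 (with BOM), line by line.
; Stack on entry: output path on top, input path below it.
Function U16ToU8
System::Store S
Pop $1 ; output path
Pop $2 ; input path
System::Call 'USER32::GetClientRect(p0,@r4)' ; leaves a scratch buffer pointer in $4 (used as pr4 below)
ClearErrors
FileOpen $2 $2 r
${If} $2 != ""
    FileOpen $1 $1 w
    ${If} $1 != ""
    ; $5 = size of the scratch buffer in bytes
    IntOp $5 ${NSIS_MAX_STRLEN} - 24
    IntOp $5 $5 * ${NSIS_CHAR_SIZE}
    ; Write the UTF-8 BOM
    FileWriteByte $1 0xEF
    FileWriteByte $1 0xBB
    FileWriteByte $1 0xBF
    loop:
        FileReadUTF16LE $2 $3
        IfErrors done
        ; 65001 = CP_UTF8; an input length of -1 means the result includes the terminating NUL
        System::Call 'KERNEL32::WideCharToMultiByte(i65001, i0, wr3, i-1, pr4, ir5, p0, p0)i.r6'
        ${IfThen} $6 U> 0 ${|} IntOp $6 $6 - 1 ${|} ; do not write the NUL to the file
        System::Call 'KERNEL32::WriteFile(pr1, pr4, ir6, *i, p0)i.r6'
        ${IfThen} $6 <> 0 ${|} Goto loop ${|}
    done:
    FileClose $1 ; close the output file as well
    ${EndIf}
    FileClose $2
${EndIf}
System::Store L
FunctionEnd
!macro U16ToU8 input output
Push "${input}"
Push "${output}"
Call U16ToU8
!macroend

Section
; Create a small UTF-16LE test file, then convert it
FileOpen $1 "$exedir\testu16.txt" w
FileWriteUTF16LE /BOM $1 "Hello from $\r$\n"
FileWriteUTF16LE $1 "${U+2115}SIS$\r$\n"
FileClose $1

!insertmacro U16ToU8 "$exedir\testu16.txt" "$exedir\testu8.txt"
SectionEnd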
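
For comparison, the same line-by-line conversion can be sketched outside NSIS. The Python below is only an illustration of what the function above is doing (read UTF-16LE, write a UTF-8 BOM followed by the re-encoded lines); the function name is made up, and dropping a leading UTF-16 BOM is an assumption of this sketch.

```python
import codecs


def utf16le_to_utf8(src_path: str, dst_path: str) -> None:
    """Rewrite a UTF-16LE text file as UTF-8 with a leading BOM, one line at a time."""
    # newline="" keeps the original line endings (e.g. CRLF) untouched.
    with open(src_path, "r", encoding="utf-16-le", newline="") as src, \
         open(dst_path, "wb") as dst:
        dst.write(codecs.BOM_UTF8)
        first = True
        for line in src:
            if first and line.startswith("\ufeff"):
                line = line[1:]  # drop the UTF-16 BOM; the UTF-8 BOM was written above
            first = False
            dst.write(line.encode("utf-8"))
```

Calling `utf16le_to_utf8("testu16.txt", "testu8.txt")` on the test file written by the Section above should give a UTF-8 file with the same two lines.
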

IntOp $PostCount $PostCount + 1
Old 18th June 2022, 11:52   #19
JasonFriday13
Major Dude
 
Join Date: May 2005
Location: New Zealand
Posts: 923
I may look at updating the plugin itself to be Unicode compatible; it's mainly the plugin exports and file path variables that have to be changed. There is probably no need for it to be "always loaded" though.

"Only a MouseHelmet will save you from a MouseTrap" -Jason Ross (Me)
NSIS 3 POSIX Ninja
Wiki Profile
Old 21st June 2022, 21:31   #20
pkonduru
Member
 
Join Date: Jul 2015
Posts: 86
Thank you Anders, that worked for me.
Old 25th June 2022, 11:30   #21
JasonFriday13
Major Dude
 
Join Date: May 2005
Location: New Zealand
Posts: 923
I updated the plugin for NSIS v3.x, and I made a separate download for my modified version on the wiki.

https://nsis.sourceforge.io/Unicode_plug-in.
Winamp & Shoutcast Forums > Developer Center > NSIS Discussion