Unicode/UTF-8 file names Oxnas

Posted by renojim 
Unicode/UTF-8 file names Oxnas
April 17, 2015 05:18PM
So I got one each of the Mobile and Pro $5 Pogoplugs. The setup of each went pretty smoothly, but I cannot get the Pro to handle file names with Unicode characters. For example, I have a file on a share named "Ãnima.jpg". On the Mobile, a directory listing shows the proper file name, but on the Pro I get "?nima.jpg". I thought maybe it was a Samba issue, but even doing the following doesn't work:
echo Ãnima.jpg > /tmp/Ãnima.txt
The listing of /tmp still shows "?nima.txt". I'm at my wits' end. I've been through the locale settings and I've tried to figure out what's different between the setups of the Pro and the Mobile, but they're pretty much identical except that the Pro runs an Oxnas kernel (v3.18.5).
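One way to read the "?nima.txt" result: when writing to a terminal, the `ls` on these Wheezy systems shows each byte that isn't valid in the current locale's encoding as "?". If (an assumption at this point) the SSH client sends "Ã" as the single Latin-1 byte 0xC3 instead of the UTF-8 pair 0xC3 0x83, the created name is invalid UTF-8 and lists as "?nima.txt". A rough Python sketch of that substitution; `display_name` is a hypothetical helper, not part of ls.

```python
def display_name(raw: bytes) -> str:
    """Roughly mimic a UTF-8 `ls` writing to a terminal: every byte that is
    not part of a valid UTF-8 sequence is shown as '?'."""
    # surrogateescape turns each undecodable byte into its own lone surrogate
    text = raw.decode("utf-8", errors="surrogateescape")
    return "".join("?" if "\udc80" <= ch <= "\udcff" else ch for ch in text)


# A UTF-8 terminal sends "Ã" as two bytes (C3 83): the name survives
print(display_name("Ãnima.txt".encode("utf-8")))    # Ãnima.txt
# A Latin-1 terminal sends "Ã" as one byte (C3): invalid UTF-8
print(display_name("Ãnima.txt".encode("latin-1")))  # ?nima.txt
```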

Any help would be greatly appreciated!
-JT
Re: Unicode/UTF-8 file names Oxnas
April 17, 2015 05:38PM
JT,

Try the latest kernel 4.0 in the release thread:
http://forum.doozan.com/read.php?2,16044

-bodhi
===========================
Forum Wiki
bodhi's corner (buy bodhi a beer)
Re: Unicode/UTF-8 file names Oxnas
April 17, 2015 05:59PM
Thanks bodhi. I was making my way through the OXNAS kernel thread and noticed this:

Quote
robert1968@gmail.com
Hi Bodhi,

You planned to make new kernel for V3 Oxnas in the future with some kernel modules enabled.
-keyboard
May I ask to enable also
-nls_utf8

It is also not urgent at all... I'm fine with the present 3.17 and my USB mouse button workaround.

Regards,
Robert

I figured it was time to look into v4!
-JT
Re: Unicode/UTF-8 file names Oxnas
April 17, 2015 06:34PM
Sadly, no joy w/4.0. :( I still get "?nima.jpg". I'm starting to think this is more than just some little issue.

-JT
Re: Unicode/UTF-8 file names Oxnas
April 17, 2015 06:59PM
renojim Wrote:
-------------------------------------------------------
> Sadly, no joy w/4.0. :( I still get "?nima.jpg".
> I'm starting to think this is more than just some
> little issue.
>
> -JT

Yes. There are a few things you'd need to do to display the characters correctly. Do you have a list of the things you did, or did you just use the default basic 3.16/3.17 rootfs?

I should mention that I'm able to see many non-English fonts on my Pogo Pro, so it is probably just a setup problem on yours.

-bodhi
Re: Unicode/UTF-8 file names Oxnas
April 17, 2015 08:41PM
As always, thanks so much for your help.

I followed Qui's guide first, but then I thought I borked something with my various locale experiments to get the characters to display, so I wiped the flash drive and reloaded 3.17 OXNAS and Wheezy on it while it was connected to the Pogoplug Mobile that I just got. I then booted it on the Pro and did the apt-get update/upgrade and then upgraded to the 3.18.5 kernel. After that I installed "locales" and did a locale-gen. That's what it took on the Mobile to get things to work. After that I followed your instructions for 4.0 V3 Oxnas (by the way, there's a slight typo in step 4 on the "dpkg" instruction - should be "linux-image-4.0.0-oxnas-tld-1_1.0_armel.deb", there's a missing ".0").

I don't think it's a font issue, since I'm able to see the first character when I issue the echo command I gave above, and before installing locales I couldn't even see that. I'm also getting a 'UnicodeDecodeError' in a Python app that I don't get on the Mobile. I can't be sure it's related, but the error occurs when it tries to read the name of the file given above.
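For context on that error: Python raises UnicodeDecodeError when it is asked to decode bytes as UTF-8 that aren't valid UTF-8. The sketch below reproduces it, assuming (hypothetically; the thread never shows the raw bytes) that the Pro hands the app the name with "Æ" as the single Latin-1 byte 0xC6. The app's own code isn't shown in the thread, so `decode_strict` and `decode_lenient` are illustrative names only. The lenient variant uses the `surrogateescape` handler, which is what `os.fsdecode` applies on POSIX when the filesystem encoding is UTF-8, so the name can still be listed and reopened.

```python
def decode_strict(raw: bytes):
    """Return the UTF-8 decoded name, or None where a strict decode raises."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        print(f"UnicodeDecodeError: {exc.reason} at byte {exc.start}")
        return None


def decode_lenient(raw: bytes) -> str:
    """Decode like os.fsdecode does on POSIX with a UTF-8 filesystem encoding."""
    return raw.decode("utf-8", errors="surrogateescape")


latin1_name = "Ænima.jpg".encode("latin-1")  # b'\xc6nima.jpg' (hypothetical on-disk bytes)
utf8_name = "Ænima.jpg".encode("utf-8")      # b'\xc3\x86nima.jpg'

print(decode_strict(utf8_name))    # Ænima.jpg
print(decode_strict(latin1_name))  # prints the error, then None
lenient = decode_lenient(latin1_name)
# The lone surrogate preserves the original byte, so the name round-trips
assert lenient.encode("utf-8", errors="surrogateescape") == latin1_name
```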

-JT
Re: Unicode/UTF-8 file names Oxnas
April 17, 2015 09:00PM
JT,

When I have some time I'll create an Oxnas basic rootfs USB and see if I can tell what's needed to be added.

-bodhi
Re: Unicode/UTF-8 file names Oxnas
April 17, 2015 09:29PM
Thank you! No rush. As I've seen someone else mention, I'm sure your time is valuable.

-JT
TEN
Re: convmv
April 18, 2015 05:43AM
renojim Wrote:
-------------------------------------------------------
> So I got one each of the Mobile and Pro $5
> Pogoplugs. The setup of each went pretty
> smoothly, but I cannot get the Pro to handle file
> names with Unicode characters. For example, I
> have a file on a share named "Ãnima.jpg". On the
> Mobile, a directory listing shows the proper file
> name, but on the Pro I get "?nima.jpg". I thought
> maybe it was a Samba issue, but even doing the
> following doesn't work:
> echo Ãnima.jpg > /tmp/Ãnima.txt

A one-time application of convmv (which does a test run by default) might also help if names seem to "remain broken" after copying from a file system with a different character set.
Credit to Samba dev "kukks" for pointing out this utility on IRC some years ago; it has proven invaluable whenever migrating old OS/2 and Windows systems to Debian-based Linux.
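The idea behind convmv is to reinterpret each file name's bytes in a source charset (-f) and re-encode them in a target charset (-t), only printing the planned renames unless --notest is given. A minimal Python sketch of that planning step, not convmv's actual code; `plan_renames` is a hypothetical name, the target is fixed to UTF-8, and skipping names that already decode as UTF-8 mirrors the "Skipping, already UTF-8" message convmv prints later in this thread.

```python
def plan_renames(names, from_enc):
    """Dry run only: return (old, new) pairs of byte names to convert to UTF-8.

    Names that already decode as UTF-8 are skipped, like convmv's
    "Skipping, already UTF-8" message; nothing on disk is renamed.
    """
    plan = []
    for old in names:
        try:
            old.decode("utf-8")
        except UnicodeDecodeError:
            # Reinterpret the bytes in the source charset, then re-encode.
            # A strict from_enc (e.g. utf-16) could itself raise here.
            new = old.decode(from_enc).encode("utf-8")
            print(f"mv {old!r} {new!r}")
            plan.append((old, new))
        else:
            print(f"Skipping, already UTF-8: {old!r}")
    return plan


plan_renames([b"\xc6nima.jpg", "Ænima.jpg".encode("utf-8")], from_enc="latin-1")
```

A wrong `from_enc` still yields valid UTF-8, just with the wrong characters, which is why the source charset has to be chosen deliberately.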
Re: convmv
April 18, 2015 05:38PM
Thanks, TEN, but I don't think convmv is going to help, and the more I play with this, the more I realize I don't understand a thing about character encoding.

I'll try to give as much detail as possible because maybe something will make sense to someone.

The "problem" file is on an NTFS formatted hard drive connected to my Dockstar. It's shared using Samba. If I mount the drive on my Windows machine the file listing appears as:
3/12/2010  23:00          19,607  Ænima.jpg

On the Dockstar, the file listing appears as:
-rwxrwxrwx 1 root root 19607 Mar 12  2010 Ãnima.jpg

Notice that they're not identical, but I don't really care about that. Now the interesting thing is how convmv views the file.
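One plausible (unconfirmed) reading of "Ænima" versus "Ãnima": in UTF-8, "Æ" is the two bytes C3 86. Anything that interprets those bytes as Latin-1 shows C3 as "Ã" and 86 as an invisible C1 control character, so "Ænima.jpg" would look like "Ãnima.jpg". The Python sketch below only demonstrates the byte arithmetic; it doesn't prove that this is what happens on the Dockstar.

```python
name = "Ænima.jpg"
utf8 = name.encode("utf-8")
print(utf8[:2].hex())  # c386: "Æ" is two bytes in UTF-8

as_latin1 = utf8.decode("latin-1")  # each byte read as one Latin-1 character
print(as_latin1[0], hex(ord(as_latin1[1])))  # Ã 0x86 (0x86 is an invisible control char)

as_cp1252 = utf8.decode("cp1252")  # Windows-1252 maps 0x86 to a dagger instead
print(as_cp1252[:2])  # Ã†
```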

On the Dockstar:
convmv -t utf8 *nima.jpg
Your Perl version has fleas #37757 #49830
Starting a dry run without changes...
Skipping, already UTF-8: ./Ãnima.jpg
No changes to your files done. Use --notest to finally rename the files.

On the PogoPro:
convmv -t utf8 *nima.jpg
Your Perl version has fleas #37757 #49830
Starting a dry run without changes...
mv "./Ænima.jpg"        "./�nima.jpg"
No changes to your files done. Use --notest to finally rename the files.

Notice how on the Dockstar convmv says it's already UTF-8, but it shows a different name than Windows, and on the PogoPro convmv shows the correct name, but doesn't think it's UTF-8. Huh? To take Samba out of the picture, I copied the file to a USB drive using my Windows machine and plugged it into the Pro, and I get the same results (mv "./Ænima.jpg" "./�nima.jpg").

One other note: my "echo" test above doesn't seem to prove anything. Whether I do it on the Dockstar or the Pro, I get the same results. The directory listing shows the file as "?nima.txt", and convmv doesn't show the file as already UTF-8 when I try to create the file in /tmp.

Now to confuse things further, if I try to create the file on the hard disk using the Dockstar I get:
echo Ãnima.jpg > Ãnima.txt
-bash: Ãnima.txt: Invalid or incomplete multibyte or wide character

When I do it on the Pro creating the file on the Samba mounted drive the echo command completes, however convmv on the Pro thinks the file isn't UTF-8 while convmv on the Dockstar thinks it is.
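The bash message "Invalid or incomplete multibyte or wide character" is glibc's text for errno EILSEQ. NTFS stores file names as UTF-16, so an NTFS driver such as ntfs-3g typically has to convert the name from the locale's multibyte encoding first; if the bytes aren't valid in that encoding, the conversion (and so the create) can fail with EILSEQ. A minimal sketch of that conversion, assuming a UTF-8 locale and a Latin-1 byte for "Ã" (both assumptions); `to_ntfs_name` is hypothetical and not how any real driver is written.

```python
import errno
import os


def to_ntfs_name(raw: bytes, locale_enc: str = "utf-8") -> bytes:
    """Convert a file name from the locale encoding to NTFS's UTF-16LE."""
    try:
        return raw.decode(locale_enc).encode("utf-16-le")
    except UnicodeDecodeError:
        # Report the failure the way a C driver would: errno EILSEQ
        raise OSError(errno.EILSEQ, os.strerror(errno.EILSEQ), raw) from None


print(to_ntfs_name("Ãnima.txt".encode("utf-8")).hex())
try:
    to_ntfs_name("Ãnima.txt".encode("latin-1"))  # single byte C3: invalid UTF-8
except OSError as exc:
    print(exc.errno == errno.EILSEQ, exc.strerror)
```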

At this point I'm pretty confused.
-JT
Re: convmv
April 18, 2015 10:13PM
JT,

Follow these steps to set up Unicode (I'm using US as the example locale here), either on a fresh rootfs or on the one you already have:
#apt-get update
#apt-get install locales
add your locale to the end of /etc/locale.gen (or uncomment it in the list):
en_US.UTF-8 UTF-8
then generate it:
#locale-gen
create the file ~/.utf8 with this as its only line:
#cat ~/.utf8 
export LANG=en_US.UTF-8
install unicode fonts:
#apt-get install xfonts-efont-unicode
#apt-get install xfonts-efont-unicode-ib
add a reference to the .utf8 file somewhere in your .profile:
#grep utf .profile 
. ~/.utf8

Exit the SSH session and log back in. You should now see Unicode characters displayed correctly.
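After logging back in, one quick way to confirm the ~/.utf8 setting took effect is to check that LANG names a UTF-8 charset. A small Python sketch; `lang_is_utf8` is a hypothetical helper, and it only inspects LANG (LC_ALL or LC_CTYPE, if set, take precedence and aren't checked here).

```python
import codecs
import os


def lang_is_utf8(lang: str) -> bool:
    """True if a locale name such as 'en_US.UTF-8' names a UTF-8 charset."""
    if "." not in lang:
        return False  # e.g. 'C' or 'POSIX': no charset part, not UTF-8
    charset = lang.split(".", 1)[1].split("@", 1)[0]  # drop any @modifier
    try:
        return codecs.lookup(charset).name == "utf-8"
    except LookupError:
        return False  # a charset Python doesn't know


lang = os.environ.get("LANG", "")
print("LANG =", lang or "(unset)", "->", lang_is_utf8(lang))
```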

-bodhi
Re: Unicode/UTF-8 file names Oxnas
April 18, 2015 10:56PM
Thanks bodhi, but it didn't work. I had already done everything up to the installation of the fonts, and installing the fonts didn't make a difference. I really don't think it's a font issue, as I can display the characters; they just don't seem to be recognized/interpreted in a file name.

-JT
Re: Unicode/UTF-8 file names Oxnas
April 19, 2015 02:12AM
JT,

I think your Windows file name might already be encoded differently.

Try this

On Dockstar
# touch /tmp/Ãnima.txt

On PogoPro
# rsync -a <Dockstar IP addr>:/tmp/Ãnima.txt .

# ls *.txt
Ãnima.txt

-bodhi
TEN
Re: convmv
April 19, 2015 03:01AM
renojim Wrote:
-------------------------------------------------------
> Thanks, TEN, but I don't think convmv is going to help
> on an NTFS formatted hard drive connected to my Dockstar.
> It's shared using Samba. If I mount the drive on my Windows machine
> the file listing appears as:
> 3/12/2010  23:00  19,607  Ænima.jpg
> On the Dockstar, the file listing appears as:
> -rwxrwxrwx 1 root root 19607 Mar 12  2010 Ãnima.jpg
> Notice that they're not identical, but I don't really care about that.
> Notice that they're not identical, but I don't really care about that.

You might have to, as the difference indicates that the codepage/character set on NTFS or through Samba may be different (just not resulting in INVALID UTF-8 characters this time), i.e. it needs to be specified to convmv with -f (from), whereas you've only tried -t (to).
Assuming Ænima.jpg is the actual (desired) name.

On the PogoPro:
convmv -t utf8 *nima.jpg
mv "./Ænima.jpg"        "./�nima.jpg"
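The point about -f can be illustrated by decoding the same bytes under several candidate charsets and seeing which one yields the expected name. A minimal Python sketch, assuming the undecodable name is the single byte 0xC6 followed by "nima.jpg" (hypothetical; the raw bytes are never shown in the thread); `candidate_names` is an illustrative helper and the charset list is arbitrary.

```python
def candidate_names(raw: bytes, encodings):
    """Map each candidate charset to how `raw` decodes in it (None if invalid)."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            result[enc] = None
    return result


raw = b"\xc6nima.jpg"
for enc, name in candidate_names(raw, ["utf-8", "latin-1", "cp1252", "cp437", "cp850"]).items():
    print(f"{enc:8} -> {name}")
```

Whichever charset produces the expected name ("Ænima.jpg" here) is the one to pass to convmv with -f.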
Re: Unicode/UTF-8 file names Oxnas
April 20, 2015 01:05AM
Well guys, I really appreciate your help, but I think I give up. There's something strange going on here that I don't think I'll ever understand.

TEN,
convmv -t utf8 *nima.jpg
mv "./Ænima.jpg"        "./�nima.jpg"
really makes a mess of things. I end up with a file named "./�nima.jpg". That's really not what I want.

bodhi, I tried your commands and I still get ?nima.txt in the directory listing when I copy the file to /tmp.

I've managed to get things to almost work. I had to install ntfs-3g on the Pro (note: I didn't need to do this on the Mobile), and now I see "Ãnima.jpg" in the directory listing when I put the file on an NTFS-formatted flash drive, and my Python app doesn't bitch about Unicode errors. But I still get "?nima.jpg" when I mount the Samba share, and the Python app pukes when it tries to read the problem file name. Maybe it's a Samba issue... maybe not... but at this point I'm a beaten man.

For now, I'll be able to work with the Pro only using connected USB devices. I don't think I really need shares to work for my purposes.

My main goal was to get the Python file server to work without having to modify it (since it works on my Dockstar and Pogo Mobile) and I've accomplished that. If the Python server had worked on the Pro in the first place, I never would have noticed that some Unicode/UTF-8 file name things work in some places but not others (I really don't understand the difference between Unicode and UTF-8 or why I sometimes see "Ãnima.jpg" and other times I see "Ænima.jpg").

I think the only thing that will help now is massive amounts of alcohol.
-JT
Re: Unicode/UTF-8 file names Oxnas
April 20, 2015 02:15AM
JT,

Hope you're not drunk yet by now :)

> On Dockstar
>
> # touch /tmp/Ãnima.txt
>
>
> On PogoPro
>
> # rsync -a <Dockstar IP addr>:/tmp/Ãnima.txt .
>

I wonder what you saw on the command line when you pasted this (with a real IP address):
rsync -a 192.168.0.100:/tmp/Ãnima.txt

Did you actually see Ãnima.txt, or did you see a weird character? If you see a correctly displayed character on the command line, then your UTF-8 setup works, but your file system is screwed up and the file got created with the wrong name. But if you can't even type (paste) it on the command line, then your UTF-8 setup does not work.

That I think could eliminate one cause or the other.

-bodhi
TEN
Re: Unicode/UTF-8 file names Oxnas
April 20, 2015 01:05PM
renojim Wrote:
-------------------------------------------------------
> convmv -t utf8 *nima.jpg
> mv "./Ænima.jpg"  "./�nima.jpg"
> really makes a mess of things. I end up with a
> file named "./�nima.jpg". That's really not what I want.

As I wrote, the first question should be which character set this is coming from, and you'll need to tell convmv that with -f (from).

> I think the only thing that will help now is massive amounts of alcohol.

That'll only make you see ÆÆnniimmaa ;)
Re: Unicode/UTF-8 file names Oxnas
April 20, 2015 01:40PM
On the command line, I've been able to view the characters 'Ã' and 'Æ' for quite a while (installing locales fixed that problem). It's when I do a directory listing that I don't see the proper characters.

bodhi Wrote:
-------------------------------------------------------
> JT,
>
> I think it might be your Windows file name already
> encoded differently.
>
> Try this
>
> On Dockstar
>
> # touch /tmp/Ãnima.txt
>
>
> On PogoPro
>
> # rsync -a <Dockstar IP addr>:/tmp/Ãnima.txt .
>
>
>
> # ls *.txt
> Ãnima.txt
>

Here's the first problem. touch on the Dockstar doesn't create a file with a UTF-8 file name:
root@debian:/tmp# touch /tmp/Ãnima.txt
root@debian:/tmp# ls
?nima.txt
rsync will copy the file, but, no surprise, it still doesn't have a UTF-8 file name:
root@PogoPro:/tmp# rsync -a 192.168.0.115:/tmp/Ãnima.txt .
root@192.168.0.115's password:
root@PogoPro:/tmp# ls
?nima.txt

I have been able to copy a file with a UTF-8 name from the Dockstar to the Pro, so the underlying problem may be Samba-related.

On the Dockstar:
root@debian:/tmp# cp /media/500GB/Music/Album\ Art/*nima.jpg .
root@debian:/tmp# ls
Ãnima.jpg

On the Pro:
root@PogoPro:/tmp# rsync -a 192.168.0.115:/tmp/Ãnima.jpg .
root@192.168.0.115's password:
rsync: link_stat "/tmp/\#303nima.jpg" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1536) [Receiver=3.0.9]
root@PogoPro:/tmp# rsync -a 192.168.0.115:/tmp/*nima.jpg .
root@192.168.0.115's password:
root@PogoPro:/tmp# ls
Ãnima.jpg
root@PogoPro:/tmp# convmv -t utf-8 *nima.*
Your Perl version has fleas #37757 #49830
Starting a dry run without changes...
Skipping, already UTF-8: ./Ãnima.jpg
No changes to your files done. Use --notest to finally rename the files.
So there's still something weird going on. I could rsync Ãnima.txt off of the Dockstar even though it displayed as ?nima.txt there, but I can't rsync "Ãnima.jpg" off of the Dockstar by name, even though it displays as Ãnima.jpg there; only the wildcard works.
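rsync prints bytes it considers unprintable as \#ooo octal escapes, so the link_stat error above records which bytes were requested. Octal 303 is the single byte 0xC3, which is "Ã" in Latin-1, whereas the UTF-8 encoding of "Ã" is two bytes (C3 83). That suggests, though it isn't proven here, that the terminal sent Latin-1. If the Dockstar file is actually stored as UTF-8 "Ænima.jpg" (C3 86 ...), which would also make it display as "Ãnima.jpg" through a Latin-1 terminal, then a request for C3 followed by "nima.jpg" can't match it, while the wildcard does. `unescape_rsync` is a hypothetical helper written for this illustration.

```python
import re


def unescape_rsync(text: str) -> bytes:
    r"""Turn rsync-style \#ooo octal escapes back into raw bytes."""
    out = bytearray()
    pos = 0
    for m in re.finditer(r"\\#([0-7]{3})", text):
        out += text[pos:m.start()].encode("utf-8")
        out.append(int(m.group(1), 8))  # three octal digits -> one byte
        pos = m.end()
    out += text[pos:].encode("utf-8")
    return bytes(out)


print(unescape_rsync(r"/tmp/\#303nima.jpg"))  # b'/tmp/\xc3nima.jpg'
print("Ã".encode("latin-1").hex())            # c3 (one byte)
print("Ã".encode("utf-8").hex())              # c383 (two bytes)
print("Ænima.jpg".encode("utf-8")[:2].hex())  # c386
```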

I'm still shaking my head. It appears the kernel on the Pro supports UTF-8, but there's still something strange going on.

-JT
Re: Unicode/UTF-8 file names Oxnas
April 20, 2015 03:43PM
JT,

Quote

Here's the first problem. touch on the Dockstar doesn't create a file with a UTF-8 file name:
root@debian:/tmp# touch /tmp/Ãnima.txt
root@debian:/tmp# ls
?nima.txt

It seems the Dockstar setup is your problem. It should show the correct file name both on the command line and when executing ls. A file copied from somewhere else is not really helpful when we test, so I would not count on it. It must be created on the Dockstar and displayed correctly on the Dockstar.

I wonder what your rootfs mount options are. I don't know if it is related, but you could try adding the extended attribute options so that it is mounted with:
/dev/sdb1 on / type ext3 (rw,noatime,errors=remount-ro,user_xattr,acl,barrier=1,data=ordered)

-bodhi
Re: Unicode/UTF-8 file names Oxnas
April 21, 2015 03:26AM
/ on the Dockstar:
/dev/sda1 on / type ext2 (rw,noatime,errors=remount-ro)

The "barrier=1" and "data=ordered" options weren't accepted, and the others didn't make any difference. I just can't seem to create a file with Unicode/UTF-8 characters in the name and have that name display properly (or as something other than a question mark) on any file system on the Dockstar (or the Pro or the Mobile), unless I do it from Windows to an NTFS drive mounted via Samba.
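To compare mount options across the Dockstar, Pro and Mobile without eyeballing `mount` output, the option list can be pulled out programmatically. A minimal Python sketch that parses lines in the `mount` output format quoted above (`<dev> on <dir> type <fstype> (<opts>)`); `parse_mount_line` is a hypothetical helper, and device or mount paths containing spaces aren't handled.

```python
import re

MOUNT_RE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def parse_mount_line(line: str):
    """Split one line of `mount` output into (device, mountpoint, fstype, options)."""
    m = MOUNT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a mount line: {line!r}")
    dev, mnt, fstype, opts = m.groups()
    return dev, mnt, fstype, opts.split(",")


dev, mnt, fstype, opts = parse_mount_line(
    "/dev/sda1 on / type ext2 (rw,noatime,errors=remount-ro)")
wanted = ["user_xattr", "acl"]
print(fstype, opts)
print("missing:", [o for o in wanted if o not in opts])
```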

-JT
Re: Unicode/UTF-8 file names Oxnas
April 21, 2015 05:13AM
JT,

Instead of chasing something you can't see, how about starting fresh? I would create a new rootfs on a USB drive with a single Ext3 partition. Don't upgrade. Do all the steps in
http://forum.doozan.com/read.php?2,21263,21288#msg-21288

I did that yesterday with a newly reformatted Ext3 USB stick on the Pogo Pro to see if I could duplicate your problem. And I could not :) Once you do this, I think you will see where the problem was.

-bodhi
Re: Unicode/UTF-8 file names Oxnas
April 21, 2015 05:52PM
Thanks bodhi. Will do when I get some free time.

-JT