Notepad doesn't understand Unix line endings

Started by Forestidia86, December 29, 2017, 02:46:07 PM

Forestidia86

Edit Simon: This was split off from mobius's how replays work ??

Quote from: Simon on December 29, 2017, 01:51:51 PM
Don't use Windows Notepad, it cannot display text files with Unix line endings, which is a 30-year-old bug.

Isn't that overstated? I could open every one of the many files I inspected; sometimes the text was just clustered together with no real line breaks. So isn't it more like it can't understand the line breaks than that it can't be read at all?
$FILENAME should be the first entry, so even without line breaks it should be viable.

Quote from: Simon on December 29, 2017, 01:51:51 PM
Use a reasonable text editor, of which there are many.

That's not really helpful. You point users to go on search for external software to use your product properly. Especially if there are many, it can be hard to decide which one to pick and which is safe to use.
I can understand that it would be problematic to promote a particular program, but it is nevertheless not user-friendly.
Would it at least be OK to link to a trustworthy article that presents the different alternatives?

Simon

Quote from: Forestidia86 on December 29, 2017, 02:46:07 PM
it can't understand the line breaks
$FILENAME should be the first entry, so even without line breaks it should be viable.

Yeah, Notepad misinterprets the line endings. Printable ASCII chars are displayed correctly. I don't remember how Notepad handles Unicode.

As long as Notepad saves the linebreaks back to file, it produces acceptable output. Linebreaks carry meaning in the Lix formats. You can even put Windows linebreaks and Unix linebreaks in the same text file and Lix should still be happy.
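
As a rough illustration of accepting both line-ending styles in one file, here is a minimal Python sketch. Lix itself is written in D and leaves this to the D standard library, so this is not Lix's code; the function name and the sample content are made up for the example (only the $FILENAME key comes from this thread).

```python
def split_lines_any_ending(raw: bytes) -> list[str]:
    """Split file contents into lines, treating CRLF and LF alike."""
    text = raw.decode("utf-8")
    # Fold Windows line endings (CRLF) into Unix ones (LF), then split on LF.
    lines = text.replace("\r\n", "\n").split("\n")
    # A trailing line terminator leaves one empty final element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


# Hypothetical file content that mixes both styles; not real Lix data.
sample = b"$FILENAME some/level.txt\r\nsecond line\nthird line\r\n"
print(split_lines_any_ending(sample))
```

A reader written this way doesn't care which editor last saved the file, as long as the editor kept some kind of linebreak in place.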

Still, I don't know why anybody would want to put up with such a broken program. It is supposed to accomplish a single thing, editing text files, yet cannot understand the simplest format of text files.

Quote
Quote from: Simon on December 29, 2017, 01:51:51 PM
Use a reasonable text editor, of which there are many.
That's not really helpful. You point users to go on search for external software to use your product properly.

I merely remember complaints about how Lix text files contained garbage. They do not. The warning against Notepad is merely a precaution against this common reply.

Text editors are fundamental parts of operating systems, as are utilities to copy, move, delete files. Now, Windows's default text editor is sorely broken. That merely means every Windows user has to fix their problem with their operating system. The bugs in Notepad will hit them with any other culture that uses text files. It's hardly specific to Lix.

Maybe it's already enough to open text files with Wordpad instead of Notepad? I haven't researched this.

-- Simon

Forestidia86

Quote from: Simon on December 29, 2017, 03:55:46 PM
Maybe it's already enough to open text files with Wordpad instead of Notepad? I haven't researched this.

Yeah, Wordpad seems to show it properly, but is almost too fancy for this task.

Quote from: Simon on December 29, 2017, 03:55:46 PM
Text editors are fundamental parts of operating systems, as are utilities to copy, move, delete files. Now, Windows's default text editor is sorely broken. That merely means every Windows user has to fix their problem with their operating system. The bugs in Notepad will hit them with any other culture that uses text files. It's hardly specific to Lix.

I can't agree with your view on user responsibility, but that's how it is. As a Windows user you are used to having the full package; I think we are both caught in our own (OS) culture in this disagreement. But I will have to accept that.
I think you rarely need to edit text to use a program properly (on Windows), so I think it's a bit particular to Lix. (On Windows, copying, deleting and moving are usually done with the mouse and drag & drop, shortcuts, right-click etc., not via the console, at least by standard users, I think.)
But I generally agree that it can hit in other places too (yeah, when there are plain txt files to read, the text is sometimes clustered for other programs as well), so it's a general problem, OK.
I don't want to derail this thread further, so I won't say any more.

nin10doadict

Quote
The bugs in Notepad will hit them with any other culture that uses text files.
See, I never knew that Notepad was so buggy. When I've delved into hacking Fire Emblem, I've noticed that the text files related to that all seem to have awful formatting. For all I know, that isn't the case and Notepad is just failing to display the new line characters so everything seems jammed together.
...Upon opening such files in WordPad, the new line characters are displayed and everything is nicely spaced. Huh. I might have to start using WordPad as the default. Learning things! 8-)

Simon

Common tasks shouldn't require hand-editing in the first place. This guideline has fuelled the different file browsers in Lix and partly the framestepping, to assign skills with perfect precision.

But fixing $FILENAME lines has been my most common hand-editing task. I've even written scripts that fix the innermost dir in $FILENAME lines according to where the level sits, to detect moves across ranks in packs. I'm already considering printing the pointed-to level path inside Lix: Issue #276. Maybe it should be changeable from inside Lix, maybe even with something smarter than a mere text-entry field.

Yeah, the culture on Linux is massively different. I don't even expect Windows users to write scripts, but I'd expect them, if problems arise, to at least look at simple self-describing (I hope) data.

Quote
Wordpad seems to show it properly, but is almost too fancy
Quote
WordPad, the new line characters are displayed and everything is nicely spaced

Yeah, Wordpad isn't ideal for text files, it's designed for formatting rich text. But it gets our job done without installing anything, which is nice.

-- Simon




I've split this thread off from mobius's how replays work ??; here are half-posts from there that belong in this topic:

Quote from: mobius
Notepad pissed me off, basically for the reasons already given. I often used it to write down a post for a forum beforehand*, but because of the line break issue, Notepad's word wrap feature is stupid and I would have to constantly switch it on and off.

Anyway: I now use and highly recommend Notepad++; it's free and very simple, but has way more features than Notepad, and word wrap that works :thumbsup:

*because of things like the other annoying issue where, if something goes wrong, like trying to upload an attachment which is too large, the site pseudo-crashes, fails to post, and you lose your text.

Quote from: Forestidia
Yeah, I have Notepad++, too. It's probably good for advanced tasks but too fancy for me for simple text editing. (I really don't want to play defender for Win Notepad, I just like the plainness; but it seems from the reactions that my issue is a non-issue.)

Forestidia86

Quote from: Simon on December 29, 2017, 05:00:09 PM
Quote
Wordpad seems to show it properly, but is almost too fancy
Quote
WordPad, the new line characters are displayed and everything is nicely spaced

Yeah, Wordpad isn't ideal for text files, it's designed for formatting rich text. But it gets our job done without installing anything, which is nice.

Just an interesting thing I've noticed: If you save the file with WordPad, it actually shows properly afterwards in Notepad.
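
A likely explanation, offered as an assumption rather than something verified in this thread: WordPad writes CRLF line endings when it saves plain text, and CRLF is the style Notepad displays correctly. A small Python sketch for checking which endings a file's bytes contain; the function name and the sample bytes are made up for the example.

```python
def count_line_endings(raw: bytes) -> dict[str, int]:
    """Count Windows (CRLF) and bare Unix (LF) line endings in raw bytes."""
    crlf = raw.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those to get the bare LFs.
    lf = raw.count(b"\n") - crlf
    return {"CRLF": crlf, "LF": lf}


# Hypothetical contents before and after a save that rewrites the endings.
before = b"line one\nline two\n"
after = b"line one\r\nline two\r\n"
print(count_line_endings(before))
print(count_line_endings(after))
```

To inspect a real file, read it in binary mode (open(path, "rb")), because text mode may translate the endings before you get to count them.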

Ryemanni


ccexplore

Quote from: Forestidia86 on December 30, 2017, 03:52:42 AM
(Only out of curiosity: Why are you using slash instead of backslash? Is it slash in Linux? (Though slash seems to work for cmd prompt in Win as well but path is generally shown with backslash.))

*nix OSes have always used the forward slash as the path separator.  This tradition likely even bled into other forms of paths like URLs, which also use forward slashes in a similar way.  Windows has traditionally used backslashes instead, but I think in at least some (if not all) contexts it will accept either.  I don't know the history of how all this came to be, but I'm sure you can find out via Google and Wikipedia.

Simon

Yeah, as ccexplore explains, on Linux and Mac, slash is the only allowed directory separator.

On Windows, backslash may be default, but Windows understands slash perfectly fine. Thus, slash it is, everywhere in Lix. Instant cross-platform compatibility. :lix-cool: This is really nice of Windows, for once.

In light of this argument for slash, it looks sensible to argue that I should output CRLF instead of LF in all text files, because CRLF works with any tool including Windows Notepad. But, unlike slash vs. backslash, CRLF is more complex a token, and would be more complex to implement because I'd have to override D standard library behavior. I'll let the library worry about the line terminator, write simple code, and indeed accept both CRLF and LF in Lix files.

-- Simon

Simon

I've dabbled in the documentation of the Zig programming language recently, and found this about Zig source file encodings:

Ascii control characters [are not allowed], except for U+000a (LF): U+0000 - U+0009, U+000b - U+001f, U+007f. (Note that Windows line endings (CRLF) are not allowed, and hard tabs are not allowed.)

Details on allowed characters, e.g., non-ASCII may only appear in string literals, not in identifiers
Rationale on forbidding Windows line endings, mentioning Notepad bugs
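
Read together with the parenthetical, the quoted ranges are the code points that get rejected: tab (U+0009) and CR (U+000d) both fall inside them, and LF is the only exception. A minimal Python sketch of such a check, assuming exactly the ranges quoted above and nothing else about Zig's rules. The function name is hypothetical and this is not how the Zig compiler implements it.

```python
def forbidden_control_chars(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each rejected control character."""
    hits = []
    line, col = 1, 1
    for ch in source:
        cp = ord(ch)
        if ch == "\n":
            # LF is the one permitted control character; it starts a new line.
            line, col = line + 1, 1
            continue
        # U+0000-U+0009 and U+000B-U+001F (LF was handled above), plus U+007F.
        if cp <= 0x1F or cp == 0x7F:
            hits.append((line, col, f"U+{cp:04X}"))
        col += 1
    return hits


# A hard tab and a Windows line ending both get reported.
print(forbidden_control_chars("a\tb\r\nc"))
```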

From my viewpoint of maintaining files edited by many different people, this is worthy of praise. Tab characters for indentation and different line endings cause endless merge conflicts. But the language now requires every Windows developer to configure their editor first, even for Hello World.

It doesn't flag trailing whitespace as errors though, even though I consider trailing whitespace a nice indicator for mediocre codebase quality. :lix-evil:

-- Simon

Forestidia86

I actually don't understand enough of it to really be able to comment on it.
But from a mere argumentative viewpoint:
How coherent is it to disallow the line endings of a very widespread OS but at the same time complain that Notepad doesn't support the line endings of other OSes? Maybe the same rationale that keeps Zig from understanding Windows endings also lies behind Notepad not being fixed to understand Unix endings.
Why should Unix line endings be the base of everything and not the Windows ones?

Simon

The idea behind Zig syntax is that code should be clear and verbose where appropriate, and that very common problems have one idiomatic solution.

Apparently, this design principle goes even into source formatting. If the code authors are forced to adhere to such a standard, other users of the source code enjoy stronger guarantees, e.g., they can combine snippets from different sources and always end up with consistent linebreaks throughout the new file. I happen to like such strict standards, but that's really where different tastes clash.

Now, from the possible standards of CRLF or LF, they chose LF because it's simpler. It happens that git and sed play slightly better with LF, too, but simplicity was their main argument.

They plan to offer an extra tool, zig fmt, that takes your source and automatically converts CRLF to LF, converts tabs to spaces, strips unnecessary spaces, etc. Similar tools are accepted in other ecosystems, e.g., the Go language. Until they have zig fmt ready, the burden is on the programmer. But that's still in line with the philosophy that the programmer should take good care to bring their code into the most presentable shape.
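
To make the kind of cleanup concrete, here is a hedged Python sketch of the three steps named above. It is not zig fmt, whose actual behaviour isn't shown in this thread. The function name is made up, the tab width of 4 is an arbitrary assumption, and tabs are replaced by a fixed number of spaces instead of being aligned to tab stops.

```python
def normalize_source(text: str, tab_width: int = 4) -> str:
    """Convert CRLF to LF, tabs to spaces, and strip trailing spaces."""
    # Step 1: Windows line endings become Unix line endings.
    text = text.replace("\r\n", "\n")
    cleaned = []
    for line in text.split("\n"):
        # Step 2: each hard tab becomes tab_width spaces (assumed width).
        line = line.replace("\t", " " * tab_width)
        # Step 3: drop spaces at the end of the line.
        cleaned.append(line.rstrip(" "))
    return "\n".join(cleaned)


print(repr(normalize_source("x\ty  \r\nz\r\n")))
```

Running the function on its own output changes nothing, which is what you want from a formatter that many people run over the same files.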

-- Simon

Forestidia86

This is the last thing I'll say on this because it upsets me; you can't imagine how much. But as I indicated, maybe that results from my not understanding the matter.

What you say makes no sense to me.
If Zig sacrifices cross-compatibility it's a good feature but if Notepad does it's a bad bug.
You say it's better for other users. But who are these other users you have in mind, only Linux users? Code can be produced without going through a compiler. And code from OSes that don't have the right endings, format, etc. has to be converted as well, especially because of the strict standard. This just seems like a mix-up of what ought to be and what is. Maybe if all programmers adhere to this philosophy everything works out fine, but is that realistic?
To me this just sounds like a nightmare for all users whose OS doesn't use Unix endings/standards by default.

Simon

Don't focus on the tools.

Focus on the data. Data either conforms to a rule, or it doesn't. There is no "code from an OS", there is "valid Lix level file" or "valid Zig code".

For Lix levels, the levels may have LF or CRLF endings, because I merely call the D standard library and let it worry about line endings.

For Lix's documentation, I explicitly demand in doc/srcfmt.txt that it have CRLF endings, such that Notepad users can still read and change it. For reading documentation, it doesn't matter in the slightest that Notepad might choke on some levels.

In Zig source, CR and tab characters are considered bugs, by definition of the format. This is not sacrificing cross-compatibility, this is the definition of the format.

Now, when people say "text file", you have to ask exactly what formatting they mean. There are many possible formats out there, sometimes they aren't even Unicode, such as the Windows-1252 8-bit format. I'll wager a guess that the most common text file formats are UTF-8 with LF and UTF-8 with CRLF. If you consider all UTF-8 files with LF malformed, then Notepad has no bug.

Confusion only arises as long as we don't agree what valid data is.

Quote
But who are these other users you have in mind, only Linux users?

Every git user, every diff user.

These tools care about differences in files. You want to minimize irrelevant differences in files. LF/CRLF differences, tabs/space differences, and trailing whitespace are the most common source of such irrelevant differences. It's reasonable that many ecosystems eventually settle on a standard.
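
A small Python sketch of what such an irrelevant difference looks like to a diff tool. It uses the standard difflib module as a stand-in for git or diff; the helper name and the sample text are made up for the example.

```python
import difflib


def changed_line_count(a: str, b: str) -> int:
    """Count the lines of `a` that a line-based diff reports as changed."""
    diff = difflib.ndiff(a.splitlines(keepends=True), b.splitlines(keepends=True))
    # ndiff prefixes lines that exist only in `a` with "- ".
    return sum(1 for entry in diff if entry.startswith("- "))


unix = "first\nsecond\nthird\n"
# Same visible text, but saved with Windows line endings.
windows = unix.replace("\n", "\r\n")

# Every line counts as changed although no visible character differs.
print(changed_line_count(unix, windows))
# Once both sides use the same ending, the diff is empty.
print(changed_line_count(unix, windows.replace("\r\n", "\n")))
```

A real change of one line would be buried under these whole-file differences, which is why settling on one ending per project helps.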

-- Simon

Forestidia86

Quote from: Simon on January 23, 2018, 06:12:12 PM
In Zig source, CR and tab characters are considered bugs, by definition of the format. This is not sacrificing cross-compatibility, this is the definition of the format.

But exactly that implementation of the definition is the act/decision that sacrifices cross-compatibility (it would be possible otherwise). There may be good reasons for it, but that doesn't change that it is a deliberate decision to declare these things invalid.

Quote from: Simon on January 23, 2018, 06:12:12 PM
These tools care about differences in files. You want to minimize irrelevant differences in files. LF/CRLF differences, tabs/space differences, and trailing whitespace are the most common source of such irrelevant differences. It's reasonable that many ecosystems eventually settle on a standard.

By turning the differences into something relevant? Then line endings matter, and not only the visible characters. That bloats the set of things that have to be right for the code to work. It now matters which default the editor you use has. Part of the data that is relevant to the compiler is obscure to a normal user because you can't easily see it.
It's not only experienced programmers who can be hit by that. What about people like me who are not programmers but nevertheless have to deal with the source code because they build from source or similar?

Quote from: Simon on January 23, 2018, 06:12:12 PM
Don't focus on the tools.

Well, you surely have to, if you need to care about line endings and formats.