Notepad doesn't understand Unix line endings

Started by Forestidia86, December 29, 2017, 02:46:07 PM



Simon

#15
Quote from: Forestidia86 on January 23, 2018, 07:13:51 PM
But exactly that implementation of the definition is the act/decision that sacrifices cross-compatibility (it would be possible otherwise).

That decision makes it impossible to edit Zig source with Notepad, correct. At least until they finish their formatting tool.

Having stronger standards makes collaboration easier, because files are less likely to differ in seemingly irrelevant noise. I assume you agree that there is at least a potential benefit here; I judge this benefit very valuable, but I can accept it if you deem it small.

Then the issue is whether killing Notepad interop is worth the stricter standards. They decided that it is. It's still some work in other editors, even if you can configure their line endings.

In Lix, I decided that it's not worth it, and accept either file.

Quote
By making the differences into something relevant? So line endings play a role, and not only the visible strings of characters.

I assume this targets version control and diff tools, because it replies to the answer to the question "Users of which software are hit by LF/CRLF differences and can thus profit from a standardized codebase?".

Then yes, whitespace plays a role for version control and diff tools. The files are different on disk, and these tools should then treat files as different. Version control is agnostic of your use case: It doesn't know whether the text makes sense in any language. Different bytes between file A and file B mean different files, and the tools should highlight those changes to me.
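
To make "different bytes mean different files" concrete, here is a small Python sketch (my own illustration, not taken from any particular version control tool): two versions of a two-line file show the same visible text, yet differ byte for byte, so a diff that keeps line endings reports every line as changed.

```python
import difflib

# The same two-line file, saved once with LF and once with CRLF endings.
lf_bytes = b"first line\nsecond line\n"
crlf_bytes = b"first line\r\nsecond line\r\n"

# Split into lines the way an editor displays them: the visible text is identical.
same_visible_text = (
    lf_bytes.decode("utf-8").splitlines() == crlf_bytes.decode("utf-8").splitlines()
)

# On disk the bytes differ, so byte-oriented tools see two different files.
same_bytes = lf_bytes == crlf_bytes


def changed_lines(a: bytes, b: bytes) -> list[str]:
    """Return the '-'/'+' lines of a diff between two byte strings.
    Line endings are kept, so a change that only touches CR is still reported."""
    a_lines = a.decode("utf-8").splitlines(keepends=True)
    b_lines = b.decode("utf-8").splitlines(keepends=True)
    return [
        line for line in difflib.ndiff(a_lines, b_lines)
        if line.startswith(("-", "+"))
    ]


print(same_visible_text)  # True
print(same_bytes)         # False
for line in changed_lines(lf_bytes, crlf_bytes):
    print(repr(line))
```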

Quote
Part of the data that is relevant for the compiler seems to be obscure to a normal user, because you can't see it easily.

Correct, relevant for the compiler. At least the compiler will immediately, and noisily, tell you about it.

The hope is that easier merges in version control are worth this, because programmers are less likely to send each other code with different endings (the compiler would have complained). Another hope is that extra tools become easier to write because the standard is so strict.
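
As a rough illustration of why a strict rule makes tools simpler, here is a minimal Python sketch of a check that rejects any carriage return. It is not Zig's actual implementation; `find_carriage_returns` and `check_strict` are hypothetical names. Because CR is forbidden outright, the checker can split lines on LF alone.

```python
def find_carriage_returns(source: bytes) -> list[int]:
    """Return the 1-based line numbers that contain a carriage return (b'\\r').

    A strict toolchain could refuse any source for which this list is
    non-empty; a lenient one would normalize the endings instead.
    """
    offending = []
    # LF is the only line terminator under the strict rule.
    for number, line in enumerate(source.split(b"\n"), start=1):
        if b"\r" in line:
            offending.append(number)
    return offending


def check_strict(source: bytes) -> None:
    """Raise ValueError naming every line that is not LF-only."""
    lines = find_carriage_returns(source)
    if lines:
        raise ValueError(f"carriage return found on line(s) {lines}; use LF line endings")


print(find_carriage_returns(b"a\r\nb\nc\r\n"))  # [1, 3]
```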

Quote
It's not only experienced programmers who can be hit by that. What about people like me who are not programmers but nevertheless have to deal with the source code, because they build from source or similar?

Yes, because "text files" come in many encodings and varieties. Learn the nuances, or be prepared to run into subtle problems. Even with an editor that understands LF-terminated files, there can still be issues with encodings, or even within Unicode. Text files aren't simple.
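
One example of trouble "even within Unicode", sketched in Python with the standard `unicodedata` module: the same visible word can be stored as different code points, so two files differ unless both sides normalize them (here to NFC).

```python
import unicodedata

# "café" stored two ways: with a precomposed é (U+00E9), and with
# a plain e followed by a combining acute accent (U+0065 U+0301).
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# Both usually render identically, but the code points and UTF-8 bytes differ.
print(precomposed == decomposed)  # False
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 5 6

# Normalizing both to NFC (composed form) makes them compare equal.
print(unicodedata.normalize("NFC", precomposed) == unicodedata.normalize("NFC", decomposed))  # True
```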

If the Zig compiler accepted CRLF-files, would that prevent you from ever running into trouble with Notepad?

Quote
Well, you surely have to, if you need to care about line endings and formats.

What would be your suggested alternative to learning about different formats? Ask every community to only produce UTF-8 with CRLF, because Notepad can understand that?

Or encourage tools that abstract away from the different formats? ... But Notepad fails this criterion?
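
For what "tools that abstract away from the different formats" can look like, Python's text-mode `open()` is one example: with the default `newline=None` it reads LF, CRLF and lone-CR endings all as `\n`, while `newline=""` hands them through unchanged. A short sketch (the file names are arbitrary):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    lf_path = os.path.join(folder, "lf.txt")
    crlf_path = os.path.join(folder, "crlf.txt")

    # Write the same two lines once with LF and once with CRLF, as raw bytes.
    with open(lf_path, "wb") as f:
        f.write(b"alpha\nbeta\n")
    with open(crlf_path, "wb") as f:
        f.write(b"alpha\r\nbeta\r\n")

    # Default text mode (newline=None): universal newlines, endings become '\n'.
    with open(lf_path, encoding="utf-8") as f:
        lf_text = f.read()
    with open(crlf_path, encoding="utf-8") as f:
        crlf_text = f.read()

    # newline="": endings are returned untranslated.
    with open(crlf_path, encoding="utf-8", newline="") as f:
        crlf_raw = f.read()

print(lf_text == crlf_text)  # True
print(repr(crlf_raw))        # 'alpha\r\nbeta\r\n'
```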

-- Simon

ccexplore

Quote from: Simon on January 23, 2018, 08:07:59 PM
Then the issue is whether killing Notepad interop is worth the stricter standards. They decided that it is. It's still some work in other editors, even if you can configure their endings.

One key difference is that no professional software engineer would be caught dead using Notepad to edit their source code. :P Lix is a bit different, since we don't want to exclude non-programmers for a technical reason that they'd neither understand nor care about.

It should be pretty easy to find free text editors out there for Windows that are more powerful than Notepad (of course, that's about as low as the bar can go) and can handle Unix line endings just fine.

Forestidia86

Just to be clear: I got Notepad++ specifically for Lix some time ago. And yes, for in-depth programming you are probably better off using something more powerful than plain Notepad.

You talk about sending code, but what constitutes code under these conditions? You seem to need more than the plain text; do you need the file itself to be sure? (If I open one of the Lix source files on GitHub, select the text and copy it into a text file, it doesn't seem to carry over the line endings, only the plain text. Notepad then displays it properly, as opposed to when I copy from the file itself.)

About strictness: here is one example of non-strictness in your code that actually hit me:

On one occasion you told me to change _trapMouse = true to false in src/hardware/mouse.d.

But there is an instance in the code that looks like this:
_trapMouse       = true; (l. 151)

The spaces make sense because they keep everything neatly aligned, but if I use find and replace, the line seems to be treated as a different string of characters. Was I supposed to change that as well? I did change it, after overlooking it at first. Aren't these spaces a problem for strict standards?

Simon

Quote
What constitutes code under these conditions? You seem to need more than the plain text; do you need the file itself to be sure?

The 100 % exact answer depends on the language. E.g., for D, it's Unicode text (usually UTF-8; the spec also allows UTF-16 and UTF-32) with CRLF or LF line endings (both of which, I think, behave similarly to a space). The text can be in a file, but need not be. Non-Unicode encodings aren't valid D. ASCII is a subset of UTF-8 and thus valid Unicode.
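
Checking the encoding half of this is easy to sketch. The following Python helper (a hypothetical name, and Python rather than D) only tests whether bytes decode as UTF-8, the usual encoding for D source; it says nothing about whether the text is a valid D program.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


ascii_source = b"void main() {}\n"                    # plain ASCII is also valid UTF-8
utf8_source = "// caf\u00e9\n".encode("utf-8")       # non-ASCII text, UTF-8 encoded
latin1_source = "// caf\u00e9\n".encode("latin-1")   # same text, non-Unicode encoding

print(is_valid_utf8(ascii_source))   # True
print(is_valid_utf8(utf8_source))    # True
print(is_valid_utf8(latin1_source))  # False
```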

Quote
If I open one of the Lix source files on GitHub, mark the text and copy the content into a text file, then it doesn't seem to convey the line endings but only the plain text? (Notepad then shows it properly, as opposed to copying it from the file itself.)

Interesting phenomenon. From this gif, I assume that Windows's copy buffer is agnostic of line endings: if you somehow get LF text into the copy buffer, it will come out as LF text.

Now, according to the first post of Atom editor issue 8365, most text editors convert LF to CRLF when you copy, such that only CRLF text makes it into the copy buffer.

I assume your web browser behaves the same way: when it renders HTML or a text file for you, it silently converts the displayed text to CRLF once you highlight and copy it.
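
The conversion being described amounts to something like the following Python sketch. What Windows, Atom or the browser do internally isn't visible here, so `to_crlf` is a hypothetical stand-in, not their code. Collapsing existing CRLF pairs first keeps the function from turning CRLF into CR CR LF when it runs twice.

```python
def to_crlf(text: str) -> str:
    """Convert every LF line ending to CRLF.

    Existing CRLF pairs are first collapsed to LF, so text that is already
    CRLF comes out unchanged instead of gaining an extra CR.
    """
    return text.replace("\r\n", "\n").replace("\n", "\r\n")


print(repr(to_crlf("a\nb\n")))    # 'a\r\nb\r\n'
print(repr(to_crlf("a\r\nb\n")))  # 'a\r\nb\r\n'
```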

Quote
_trapMouse       = true; (l. 151)
The spaces make sense because they keep everything neatly aligned, but if I use find and replace, the line seems to be treated as a different string of characters.

Excellent catch.

These spaces have no meaning, but they make editing unnecessarily harder. They affect search & replace. And when one of the neighboring lines changes, the alignment breaks anyway: we would have to make a larger change than necessary (re-aligning every line in the block), and that makes the change harder to understand.
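
Many editors, Notepad++ included, offer regular-expression search, which can make a find & replace tolerant of such alignment. A Python sketch of the idea, assuming the assignment always has the shape `_trapMouse = true` with only spaces or tabs around the `=`; `set_trap_mouse_false` is a hypothetical helper:

```python
import re

# The identifier, any run of spaces/tabs, "=", any run of spaces/tabs, then "true".
# The \b anchors stop matches inside longer identifiers such as "x_trapMouse" or "trueish".
PATTERN = re.compile(r"\b_trapMouse[ \t]*=[ \t]*true\b")


def set_trap_mouse_false(source: str) -> tuple[str, int]:
    """Replace every `_trapMouse = true` assignment, however it is aligned,
    keeping the original spacing. Returns the new text and the number of replacements."""
    def keep_spacing(match: re.Match) -> str:
        # Drop the trailing "true" and append "false"; the spaces stay as they were.
        return match.group(0)[: -len("true")] + "false"
    return PATTERN.subn(keep_spacing, source)


code = "_trapMouse = true;\nfoo();\n_trapMouse       = true;\n"
new_code, count = set_trap_mouse_false(code)
print(count)  # 2
```

Both the unaligned line and an aligned one like l. 151 would be caught, and each keeps its original spacing.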

Especially the older parts of the source still have these decorative spaces for alignment with neighboring lines. I try not to add them to any code anymore. I admit that the habit is hard to break.

Yeah, I assumed you had changed this line as well. My bad for not catching this in your attached mouse.d. <_<;; I took a brief look at the file back then and saw that you hadn't changed anything except such lines, but I didn't check whether you had found every single occurrence.




In Lix, I won't stop accepting/outputting CRLF on Windows. I've merely found the Zig rules an unexpected example of how far you can take stylistic rules. And I hope that the discussion was not considered trolling, even though in hindsight, it may easily look like it.

Whenever I create the Windows Lix download, I should probably convert all levels to CRLF.
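
A minimal Python sketch of such a conversion step. The `*.txt` pattern and the folder layout are assumptions, not checked against the actual Lix level directory. It works on raw bytes, which is safe for UTF-8 files because the LF byte never occurs inside a multi-byte UTF-8 sequence, and it leaves files that are already CRLF untouched.

```python
from pathlib import Path


def convert_to_crlf(path: Path) -> bool:
    """Rewrite one file with CRLF endings; return True if its contents changed.
    Operates on bytes, so the file's (ASCII-compatible) encoding is preserved."""
    original = path.read_bytes()
    # Collapse CRLF to LF first so already-converted lines don't become CR CR LF.
    converted = original.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    if converted != original:
        path.write_bytes(converted)
        return True
    return False


def convert_tree(folder: Path, pattern: str = "*.txt") -> int:
    """Convert every file matching `pattern` under `folder` (recursively).
    Returns how many files were changed. The '*.txt' default is an assumption."""
    return sum(convert_to_crlf(p) for p in folder.rglob(pattern) if p.is_file())
```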

In git, it's possible to configure per repository (I haven't done so yet) how levels should be checked out, e.g., store them as LF in the repo but check them out as CRLF in a Windows worktree. Sadly, the repo has half LF, half CRLF levels, even though I've paid attention for a while now to check in only LF levels.
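
In git itself this is controlled by the `text` and `eol` attributes in a `.gitattributes` file. Before setting that up, a quick survey of which levels are LF, CRLF or mixed could help. A Python sketch, again assuming level files match `*.txt`; it only reads files and changes nothing:

```python
from collections import Counter
from pathlib import Path


def line_ending_style(data: bytes) -> str:
    """Classify raw file contents as 'LF', 'CRLF', 'mixed' or 'none'.
    Lone CR bytes (old Mac style) are not counted."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # LF bytes not preceded by a CR
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "CRLF"
    if lf_only:
        return "LF"
    return "none"


def survey(folder: Path, pattern: str = "*.txt") -> Counter:
    """Count how many files under `folder` (recursively) fall into each style.
    The '*.txt' default is an assumption about how level files are named."""
    return Counter(
        line_ending_style(path.read_bytes())
        for path in folder.rglob(pattern)
        if path.is_file()
    )
```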

-- Simon

Crane

To give you my input: Notepad is a bare-bones text editor and, honestly, not that good when you have to do technical things. Its inability to handle Unix-style line endings is one of its main problems, and this shortcoming should not be a reason to modify your programs to produce Windows-style line endings because, as mentioned, that breaks cross-compatibility (although usually the worst you'll see is a symbol representing CR in Unix text editors).
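
What that stray CR looks like to a tool that treats only LF as the line terminator, in a short Python sketch: each line keeps a trailing `\r`, which an editor may show as a marker (Vim, for instance, shows it as ^M when it reads the file in Unix format).

```python
crlf_text = "first\r\nsecond\r\n"

# Splitting on LF only leaves the CR attached to each line.
naive = crlf_text.split("\n")
print(naive)  # ['first\r', 'second\r', '']

# Removing a trailing CR per line recovers clean lines...
clean = [line.removesuffix("\r") for line in naive]
print(clean)  # ['first', 'second', '']

# ...and str.splitlines() accepts LF, CRLF and CR as line boundaries anyway.
print(crlf_text.splitlines())  # ['first', 'second']
```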

I personally use Notepad++ for my technical work, and it properly supports and preserves whatever line endings a file has.