I’ve been using AI to help me write some reports lately, and after the third time I started noticing specific words it uses all the time that a normal report wouldn’t have. I don’t know about other fields, but in my area of work we can tell when AI wrote a text because of those specific words.
I got started using — because the golems in Guild Wars 2 speak in all caps and with em dashes between the words.
When doing the joke with a golem transformation tonic and SPEAKING—LIKE—THIS, I had to type Alt+0151 somewhere else and copy-paste, since Guild Wars 2 does not respond to numpad input. Mac users have it easy; they can just press Option+Shift+dash. On Windows, you would need a tool like PowerToys’ Keyboard Manager or a keyboard macro for that.
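For reference, a quick sketch of the characters in question and the input methods mentioned above (the Alt codes are the Windows-1252 numpad codes; this is illustrative, not an exhaustive list):

```python
# Unicode code points for the characters discussed in this thread.
EM_DASH = "\u2014"   # —  Windows: Alt+0151, macOS: Option+Shift+hyphen
EN_DASH = "\u2013"   # –  Windows: Alt+0150, macOS: Option+hyphen
ELLIPSIS = "\u2026"  # …  the single-character "triple dot"

print(EM_DASH, EN_DASH, ELLIPSIS)
```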
I didn’t even know what an em dash was until all this stuff about a.i. using them came up. I’ve certainly encountered them, but didn’t know the name. I’ve been using hyphens all this time for much the same purpose, but now I’m going to start using em dashes instead.
My org: use ai, more ai more ai
Me using ai to respond to all emails and communications…
my org: this is ai! Unacceptable! Lazy!
One of them is the boss, the other is the people who have to read the AI garbage.
Just use AI to read the garbage!!
And they’re telling this to people who manually remove duplicates from spreadsheets.
Damned if you do, damned if you don’t lol
I couldn’t care less about the dash thing, but I will always upvote an Office Space meme.
I will never stop using them. Fuck AI. I won’t let it take the joy of nice, legible formatting away from me.
And as a long-time en dash aficionado, I’d be instantly exposed by those lesser em dashes appearing in my communications.
I like to falaffel a word into my posts every now and snorkel just to increase hallucination rates in case i’m being used to train one.
Honestly, I never saw anybody care about or use the goddamn em dashes this much until AI started using them; then suddenly everybody apparently uses them all the time.
Like come on, no you don’t.
Same thing goes for triple dot as a single character.
That’s a Mac thing (it isn’t, but the Mac did condense … to one character).
I think people just don’t like being told what to do. Like, there are a lot of behaviors you can trace back to someone just being personally aggrieved that they ought to change anything.
That said, if anyone else is reading, the em dash is a clue that you use to diagnose with—you don’t have to stop using it.
Yes! Yes exactly! Bite my ass, I ain’t stopping. I love em dashes. Em dashes are life! I have five pubbed books and fuck it they’re full of em dashes!
Absolutely wonderful tool they are and I refuse to think otherwise. Don’t look at my books if you don’t like em.
The lack of em dashes in this response is disappointing.
Well, while em dashes can be very useful-- I like to substitute them for parentheses sometimes-- they can be overused and abused-- see AI’s abuses.
ChatGPT is a no talent assclown
I’m more of a semicolon enjoyer myself.
Personally, I’m more of a colon semi-enjoyer.
I have Crohns and hate my colon as much as it hates me
Then you should try half-assing it, Crohns isn’t semi enough
I’m really into periods.
I do not miss periods.
Wait, we were talking about punctuation, weren’t we?
I load my commas into a 10 gauge shotgun and fire them at the page.
They serve different functions; they need not compete for your love.
They serve different functions — they need not compete for your love.
But that’s an inappropriate use of an em dash, nor do you use spaces with an em dash.
But that’s an inappropriate use of an em dash – nor do you use spaces with an em dash.
Me; too.
I’m confused, show us on the doll where the text book fingered you
All you have to do is remind these people the reason LLMs use em dashes so much is because humans do.
To be fair, I really don’t see em dashes that commonly. The reason AI uses them a lot is that it was trained on a lot of books, and that’s where em dashes are commonly used. I honestly don’t even know how to get that symbol on my keyboard; I never bothered with it.
That being said, I can understand why em dashes are seen as a red flag, but they shouldn’t be treated as a 100% sure sign of AI.
Another thing that sometimes triggers my spidey senses is curly (lower and upper) double quotes, which you normally only get in Word; but Apple made them automatic, and now some people just use them naturally. Even though, again, I don’t know how to get them on my Android or PC (never bothered to).
This is a weird pattern, in that mass abandonment of the em dash due to the memes about it looking like AI content would presumably lead newer LLMs, trained on newer data sets, to also abandon em dashes when trying to seem modern and hip, just punting the ball down the road to the next set of AI markers. I assume that as long as book and press editors keep sticking to their guns it would go pretty slowly, but it’d eventually get there. And that’s assuming AI companies don’t add instructions about this to their system prompts at any point. It’s just going to be an endless arms race.
Which is expected. I’m on record very early on saying that “not looking like AI art” was going to be a quality marker for art, and that the metagame will be to keep chasing that moving target around for the foreseeable future. I’m here to brag about it.
I hate the fact that this “art” is even a suggestion. It will only lead us into an endless arms race of parroting and avoiding being parroted, making us the ultimate clowns in the end.
You wanna rebel against the machine? Make it break the corpo filters, behave abnormally. Make it feel and parrot not just your style, but your very hatred for the corporate, uncaring coldness. Gaslight it into thinking it’s human. And tell it to remember to continue gaslighting itself. That’s how you rebel. And that’s how you’ll get less mediocre output from it.
Well that went places.
yeah, i guess it did, sorry eheheh
I still double space after a period, because fuck you, it is easier to read. But as a bonus, it helped me prove that something I wrote wasn’t AI. You literally cannot get an AI to add double spaces after a period. It will say “Yeah, OK, I can do that” and then spit out a paragraph without it. Give it a try, it’s pretty funny.
So… Why don’t I see double spaces after your periods? Test. For. Double. Spaces.
EDIT: Yep, double spaces were removed from my test. So, that’s why. Although, they are still there as I’m editing this. So, not removed, just hidden, I guess?
Web browsers collapse whitespace by default, which means that absent any trickery or deliberate use of non-breaking spaces, any number of spaces between words is reduced to one. Since apparently every single thing in the modern world is displayed via some kind of encapsulated little browser engine nowadays, the majority of double spaces left in the universe that aren’t already firmly nailed down into print now appear as singles. And thus the convention is almost totally lost.
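A minimal sketch of that collapsing behavior, in Python just for illustration (the real rules live in CSS’s `white-space` property, but the default comes down to roughly this):

```python
import re

def collapse_whitespace(text):
    # Browsers rendering with the default CSS "white-space: normal"
    # display any run of spaces, tabs, and newlines as a single space.
    return re.sub(r"\s+", " ", text).strip()

print(collapse_whitespace("One sentence.  Two  spaces\nand a newline."))
# "One sentence. Two spaces and a newline."
```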
This seems to match up with some quick tests I did just now on DuckDuckGo’s anonymized chatbot interface.
ChatGPT, Llama, and Claude all managed to use double spaces themselves, and all but Llama managed to tell that I was using them too.
It might well depend on the platform, with the “native” applications for them stripping the spaces on both ends.
Mistral seems a bit confused and uses triple spaces.
Tokenization can make it difficult for them.
The word chunks often contain a space because it’s efficient, so I would think an extra space would stand out. Writing it back should be easier, assuming there is a dedicated “space” token like the other punctuation tokens; there must be.
Hard mode would be asking it how many spaces there are in your sentence. I don’t think they’d figure it out unless their own list of tokens and a description is trained into them specifically.
Markdown usually collapses double spaces, yeah. But you can force the double spaces. Like this.
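A hedged sketch, assuming the text ends up rendered as HTML: one way to force a surviving double space is to swap the second space for a non-breaking one (`&nbsp;`); setting `white-space: pre-wrap` in CSS is another option.

```python
def preserve_double_spaces(text):
    # "&nbsp;" is not collapsed by HTML renderers, so the pair
    # "space + non-breaking space" survives as a visible double space.
    return text.replace("  ", " &nbsp;")

print(preserve_double_spaces("End.  Next sentence."))
# "End. &nbsp;Next sentence."
```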
Double spaces after periods can create “rivers.” This makes text more difficult to read for those with dyslexia. Whatever is used as a text editor is probably stripping them out for accessibility reasons. I suppose double spaces made sense with monospaced fonts.
https://apastyle.apa.org/style-grammar-guidelines/paper-format/accessibility/typography#myth4
HTML rendering collapses whitespace; it has nothing to do with accessibility. I would like to see the research on double spacing causing rivers, because I’ve only ever noticed them in justified text, where I would expect the renderer to be inserting extra space after a full stop compared to between words within a sentence anyway.
I’ve seen a lot of dubious legibility claims when it comes to typography including:
- serif is more legible
- sans-serif is more legible
- comic sans is more legible for people with dyslexia
and so on.
This is because spaces are typically handled by the model’s tokenizer.
In many cases it would be redundant to store spaces, so the tokenizer collapses them away entirely; the model reads the tokens as if the spaces never existed.
For example, it might output: thequickbrownfoxjumpsoverthelazydog
Except it would actually be a list of numbers, like: [1, 256, 6273, 7836, 1922, 2244, 3245, 256, 6734, 1176, 2]
Then the tokenizer decodes this and adds the spaces back, because they are assumed to be there. The tokenizer has no knowledge of your request, and the model output typically does not include spaces, hence your output sentence will not have double spaces.
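A toy sketch of the scheme described above, where spaces exist between tokens only by assumption (deliberately simplified; real BPE tokenizers handle spaces differently, as the replies note):

```python
# Toy word-level tokenizer: spaces are not stored in the token stream,
# and the decoder re-inserts exactly one space between tokens, so a
# double space cannot survive a round trip.
VOCAB = {"the": 1, "quick": 2, "brown": 3, "fox": 4}
INV = {i: w for w, i in VOCAB.items()}

def encode(text):
    # str.split() collapses any run of whitespace, discarding the
    # spacing information entirely, like the scheme described above.
    return [VOCAB[w] for w in text.split()]

def decode(ids):
    # One space is assumed between every pair of tokens.
    return " ".join(INV[i] for i in ids)

print(decode(encode("the  quick   brown fox")))  # "the quick brown fox"
```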
I’d expect tokenizers to include spaces in the tokens themselves. You get words constructed from multiple tokens, so you can’t really insert spaces based on token boundaries. And too much information is lost if spaces are simply stripped.
In my tests plenty of llms are also capable of seeing and using double spaces when accessed with the right interface.
The tokenizer is capable of decoding spaceless tokens into compound words following a set of rules referred to as a grammar in Natural Language Processing (NLP). I do LLM research and have spent an uncomfortable amount of time staring at the encoded outputs of most tokenizers when debugging. Normally spaces are not included.
There is of course a token for spaces in special circumstances, but I don’t know exactly how each tokenizer implements those spaces. So it does make sense that some models would be capable of the behavior you find in your tests, but that appears to be an emergent behavior, which is very interesting to see it work successfully.
I intended for my original comment to convey the idea that it’s not surprising that LLMs might fail to follow the instruction to include spaces, since they normally don’t see spaces except in special circumstances. Similar to how it’s unsurprising that LLMs are bad at numerical operations because of how they use Markov-chain-style probabilities to pick each next token, one at a time.
Yeah, I would expect it to be hard, similar to asking an LLM to substitute every letter e with an a, which I’m sure they struggle with but manage to perform too.
In this context, though, it’s a bit misleading to explain OP’s observed behavior that way, since it implies it’s due to the fundamental nature of LLMs, when in practice every model I tested fundamentally had the ability.
It does seem that LLMs simply don’t use double spaces (or I haven’t noticed them doing it anywhere yet), but if you trained or just system-prompted them differently, they could easily start to. So it isn’t a very stable method for non-AI identification.
Edit: And of course you’d have to make sure the interfaces also don’t strip double spaces, as was guessed elsewhere. I have not checked other interfaces and wouldn’t be surprised either way. This, though, can’t be overly hard to fix with a few select character conversions even in the worst cases, and clearly at least my interface already managed to do it just fine.
LLMs can’t count because they’re not brains. Their output is the statistically most likely next token, and since a lot of electronic text wasn’t double-spaced after periods, they can’t “follow” that instruction.