• boonhet@sopuli.xyz
    3 hours ago

    > Haven’t they clearly documented how they did it and what they used so that anyone can replicate it?

    They don’t put up the actual code for their training pipeline though. It’s more of a “if you have enough engineers, you can do this too” whitepaper, because they wouldn’t want any rando training their own model.

    Right now, even if you had the exact training set (which is a CRUCIAL part of an LLM, and you can NOT replicate the model without it), you couldn’t rebuild the thing exactly; you’d need to do a whole lot of extra work.

    > So how is it not open source in this specific domain of problems?

    You could call all proprietary software open source then. The UI and user manual describe what it does, you can do your own engineering to duplicate the functionality.