• YoureHotCupCake@lemmy.world
    24 hours ago

    You mean all of this code that is clearly on their GitHub: https://github.com/deepseek-ai? They release both their model weights and the source code for their AI. You can literally take what they have provided to create your own LLM if you like, and get a good understanding of their AI. Sure, you can’t see the training data, but publishing it would be like putting the entirety of the internet in a GitHub repo, which just isn’t feasible. You can, however, contribute your own training data to a local setup of DeepSeek and shape it the way you want.

    • boonhet@sopuli.xyz
      22 hours ago

      The training data is as important as the source code here to replicate the end result. The weights are more like a binary distribution. You can run the model and you can technically edit it just like you can technically edit a binary file.

      They also only release some libraries and tools for running the model if you have a set of weights (which they do graciously provide), but they do NOT release the source code for their training pipeline itself. That’s up to you to reverse engineer from the whitepapers. Right now, even if you had the exact training data and the compute available, you could not train your own DeepSeek V3.2, let alone V4.

      • xep@discuss.online
        21 hours ago

        If people on Lemmy can’t understand this I have no hope for the average person.

      • humanspiral@lemmy.ca
        12 hours ago

        “training data is as important as the source code here to replicate the end result”

        This is the nature of this flame war. Perfect replication of the end result, which is extremely opaque in how it works, is not nearly as important as the weights, which you can post-train on any other dataset for domain-specific or general improvement. That is also how the authors would further improve or change the weights.
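
A rough illustration of the post-training point above, as a toy sketch only: this is not DeepSeek's code or training pipeline. A two-parameter linear model stands in for an LLM, and `pretrained`, `new_data`, and `post_train` are hypothetical names made up for the example. It shows training continuing from existing weights on a new dataset, without access to whatever data produced those weights.

```python
# Toy illustration only: a linear model y = w*x + b stands in for an LLM.
# "pretrained" plays the role of released weights; the data that originally
# produced them is never used below.

def predict(weights, x):
    w, b = weights
    return w * x + b

def mse(weights, data):
    """Mean squared error of the model on a list of (x, y) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)

def post_train(weights, data, lr=0.01, steps=500):
    """Continue plain gradient descent from existing weights on new data."""
    w, b = weights
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

pretrained = (2.0, 0.0)  # hypothetical "released" weights
new_data = [(x, 2.5 * x + 1.0) for x in range(-5, 6)]  # hypothetical domain data

tuned = post_train(pretrained, new_data)
loss_before = mse(pretrained, new_data)
loss_after = mse(tuned, new_data)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.2e}")
```

The sketch supports only the narrow claim that existing weights can be adapted with other data. It says nothing about reproducing the original weights from scratch, which is the other side of the argument in this thread.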

    • chloroken@lemmy.ml
      17 hours ago

      You’re talking like you know what you’re talking about, but you clearly are guessing. Knock it off. Don’t mask conjecture as fact.