Understanding How AI Works is Critical to Our Privacy Defense | Rob Braxman Tech

Summary

➡ Rob Braxman Tech talks about how AI is a powerful tool that can enhance personal knowledge and understanding. It’s safe to use, especially when it’s local and not connected to any external party. To fully benefit from AI, it’s important to understand how it works. This includes understanding the transformer architecture, which powers AI, and how it uses a mechanism called attention to analyze input data. This architecture allows AI to simulate intelligence and understand complex linguistic phenomena. It’s also important to understand the concept of the model universe, where words and concepts are represented and grouped based on their meanings and relationships. The more you understand about how AI works, the more effectively you can use it.
➡ AI models built on the transformer architecture work by looping through layers to generate responses based on context. However, these models have limitations, such as being out of date with current events and potentially providing inaccurate or incomplete data. They can be improved by adding context or by using larger models with more parameters. Even with these improvements, models lack situational awareness and can only respond based on the data they’ve been given.
➡ This text discusses the potential dangers of AI, particularly when it comes to privacy and manipulation. The author warns that external parties can manipulate AI to alter information based on the user’s profile. They also discuss the limitations of AI censorship and how it can be bypassed. To combat these issues, the author suggests using local AI and privacy-focused products, which they offer on their social media platform, Brax Me.

 

Transcript

I will state this clearly: AI is incredibly useful for personal use. It’s like adding more brains to your head, but without any chip, without Neuralink, and in our case, without a connection to any external party. That’s the only way we will use it with a privacy focus: a local AI, and I guarantee this approach will be safe and free. If you avoid AI out of privacy fear, you will end up falling behind in knowledge compared to others, and we don’t want that. As I discussed in prior videos, currently the way to do this is to go to the Ollama site (ollama.ai) and download Ollama and your desired open-source models to your computer.
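To make that concrete, here is a minimal sketch (my own illustration, not the chatbot from the video) of querying a local model from Python, assuming Ollama is running on its default port 11434 and that a model has been pulled with `ollama pull llama3`:

```python
# Minimal sketch: talk to a local Ollama server over its documented REST API.
# Assumes Ollama is running locally and the "llama3" model has been pulled.
import requests

def ask_local_ai(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to the local Ollama server and return the full reply."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask_local_ai("Explain the transformer architecture in two sentences."))
```

Note that the request never leaves your machine; it only goes to localhost.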

But in order to maximize the benefits of AI, you need to understand how it works. When you do, you will realize what kinds of questions are suited to the AI and which are not, and you will also understand why it’s safe when used the way I recommend. The local AIs we’re going to be using are large language models, or LLMs. Skynet-type questions? Nope. Is it spying on you? Nope, a local AI will not. The best approach to understanding an LLM is to see how it works; then you can control it and manage its limitations.

To learn how to do this, stay right there. Sometime in 2017, a group of AI researchers tried out a new approach to AI, described in a paper called “Attention Is All You Need.” That new approach was called the transformer architecture, and it is what powers the current LLM AI revolution. The new direction allowed for faster training and scalability because of a mechanism called attention, which lets the model learn how to view the input data simultaneously instead of sequentially looking at each word, as was done before. In simple words, instead of just feeding input or a prompt to the model one word at a time, the model learned how to analyze the input ahead of time, in totality and by itself.

Roughly three years after that paper, the first example of the new transformer-based AI chat, GPT-3, showed up. It appeared to simulate the intelligence of an elementary school kid. Today, we have the promise of a coming ChatGPT-5 by next year that may have PhD-level skills. I’m giving you technical details of the transformer architecture so you can appreciate its strengths and limits; this will allow us to control it. Let me set you up with something you can visualize. Let’s imagine that the model is a supreme being, at least it thinks it is. It can make universes galore, as many as it wants, and can make all the elements in a universe, like galaxies, clusters of stars, individual stars, and planets. We humans communicate with a model using words, but someone figured out that a way to represent words is to give them a location in this theoretical universe.

And that’s how it starts. First, the AI developers put words and word fragments in this universe. In order to find these words, they are initially recorded with a location and direction, which in math is called a vector. I describe this universe using three dimensions for simplicity; a computer can actually store each word in a location with many more dimensions, so consider my 3D description of this universe an oversimplification. In general, vectors point to some spot in the imaginary universe of the model. But here’s the important thing to visualize: the end goal of the model is to represent every possible occurrence of language in this imaginary universe.

Words are not stored in this imaginary universe randomly in a fixed location. Instead, after training, a model will have moved similar concepts together. This movement of data enables the model to represent complex linguistic phenomena, such as semantic meaning, syntactic structure, and pragmatic context, with a correlation to location in the universe. Pretty heavy words. So, close-by concepts in this universe will be related, and all we have to do is find these concepts inside this universe to be able to simulate intelligence. Using this analogy, we will travel through this model universe in a spaceship guided by maps to search for a context, and we should find related contexts nearby.

Maybe this is the way human memory is organized. We don’t know, but this is how a model interprets it. As you will find out shortly, there are actually many universes in a model, a metaverse of sorts, each universe focusing on a particular nuance of context for its training memories. But the first universe we encounter in a transformer model is always called the embedding layer. Now, to be specific, in the case of an open-source model like Llama 3, know that about 50,000 words or word fragments are stored in this embedding, and initially each word is assigned some pre-established location in this universe.

This is part of the initial input data to the model and is not learned. An embedding looks like a grid where every word is a column and the rows of values represent the vector of that word. Each column is a word that appears only once. All words are represented either as a complete word or as word fragments; there are 50,000 of them, as I said. The rows underneath each column represent its individual vector, and each vector starts out unique and random. During training, the model begins to group words into actual meanings.
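Here is a toy sketch of that grid idea (illustrative only, with a four-word vocabulary and tiny vectors rather than Llama 3’s real 50,000 tokens and thousands of dimensions):

```python
# Toy illustration of an embedding: each token gets a dense vector that starts
# out random; training would later pull related tokens close together.
import numpy as np

vocab = ["cat", "dog", "kitten", "car"]               # stand-in for ~50,000 tokens
dim = 8                                               # real models use thousands of dimensions
rng = np.random.default_rng(0)
embedding = {word: rng.normal(size=dim) for word in vocab}   # random starting locations

def cosine(a, b):
    """How closely two word vectors point in the same direction (1.0 = identical)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Before training, "cat" vs "kitten" is essentially noise; after training,
# related words would score near 1.0 while "cat" vs "car" stays low.
print(cosine(embedding["cat"], embedding["kitten"]))
```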

As the model learns patterns from the data it trains with, the same words found in the embedding layer can be featured multiple times in another universe, and again they are put into groupings tied to proximity, based on detected patterns of contextual connections between words. This next universe is called the encoder layer, except there are multiple encoder layers. During the learning process, each encoder layer focuses on some particular characteristic of a contextual relationship. The idea is that each additional encoder layer refines the results of the prior layer by adding more and more nuance. Depending on the model, there could be roughly a dozen encoder layers in a smaller model, a hundred layers in a larger model, or maybe more.

Now, the reason the transformer-based model splits the neural network into multiple encoders is that it has figured out this is the way to make managing large inputs easier. It allows the model to analyze the input in more manageable chunks and use fewer resources. By the way, the actual data in each encoder layer is built from multiple weights and biases; these are layers of numeric matrices and are fixed in a pre-trained model, like a pre-drawn map. This map guides our spaceship in each model universe, depending on what we’re looking for, which is triggered by the input we give it.
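To illustrate what “fixed weights and biases” means in practice, here is a minimal sketch of a single layer as frozen matrices; only the input passing through them ever changes (this is my own toy example, not code from any real model):

```python
# Toy sketch: a pre-trained layer is just fixed numeric matrices (the "pre-drawn map").
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))   # weights, frozen after training
b = rng.normal(size=16)        # biases, also frozen

def layer(x):
    # The same fixed map is applied to whatever input arrives at inference time.
    return np.maximum(0, W @ x + b)   # simple ReLU activation

prompt_vector = rng.normal(size=8)    # stand-in for one embedded input token
print(layer(prompt_vector).shape)     # (16,) -> passed on to the next layer
```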

This is the big picture of how a user interacts with the transformer architecture. When we actually query an LLM, we are doing inference, so let me describe the flow of the model during this inference stage. First there is an input token layer. Then there is the embedding layer. Then you have multiple repeats of an encoder structure, then a decoder structure, and then the output. Then there’s the arrow that shows the loop back to the encoder structure, as I will explain later. Now let me get into the detail. First, you provide an input to the model, which is your prompt.

This is stored in a token layer. From this token layer, your input triggers the equivalent words in the embedding layer. It’s actually more powerful than this, because it can also activate words it knows about from a prior conversation, which I will call context memory, and this includes all prior responses in its memory cache. Thus a lot more data can make up the input. Think of the input layer as really having three parts: one, the current input prompt; two, prior context from the memory cache; and three, the words generated so far, which we will get into next. This is the part you need to understand for the safety of a local AI.

The only portion of the AI with changing data is this input layer, and it always starts out empty when you start a local open-source AI session. The input data is not persistent, nor is it learned. Now, you’ll be surprised by this if you didn’t know, but the encoder and decoder layers are just there to predict what the next word in the response should be, based on the combined collection of tendrils from the total collection of inputs. The model passes the input to a mapping function, which mathematically is made up of weights and biases that are activated based on your input.

This guides the model’s spaceship to the most relevant area in the universe of words grouped by related meanings, and there the spaceship will discover the best related words that fit into the current input sequence; those words are selected for output. After the last encoder layer, the model goes to the decoder and chooses the most appropriate word based on a probability function, which can be manipulated by the user using a temperature parameter to add creativity. Now, to make it all clear: at the end of all this, so far we only get one word.
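Here is a minimal sketch of that single-word selection step, showing how the temperature parameter reshapes the probabilities before one word is picked (a toy illustration with made-up scores, not the actual decoder code):

```python
# Toy sketch: pick the next word from decoder scores, with a temperature knob.
import numpy as np

rng = np.random.default_rng(0)

def sample_next_word(logits, words, temperature=0.8):
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                     # softmax over the candidate words
    return rng.choice(words, p=probs)

words = ["cat", "dog", "pizza"]
logits = [2.5, 2.3, 0.1]                     # stand-in scores from the decoder
print(sample_next_word(logits, words, temperature=0.2))   # low temperature: nearly deterministic
print(sample_next_word(logits, words, temperature=1.5))   # high temperature: more varied output
```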

Then the model loops again through the encoder and decoder layers to find the next word, until the model determines that it has completed its response and stops the loop. Now let’s talk about an important sub-element which explains what the encoder actually does. In the encoder layer, the model makes contextual sense of the next word to generate based on the accumulated information acquired from the entire input layer. But don’t imagine the input tokens and subsequently generated words as individual words. Instead, think of the input layer as being constantly examined in word groups, with parallel processes, to find new words holding a similar contextual relationship.

By the way, how it examines the input is also learned, so the model will have optimized how to do this by itself. Think of a spacecraft that is searching for the next word to respond with. But the spacecraft has special features: it can connect active tendrils to the entire input layer, whether original input words, context tokens, or newly generated words, and by doing so it has real-time guidance, so its map is more accurate, sort of like a car navigation system with knowledge of current traffic conditions on your current route. In practice, there are multiple attention mechanisms, and they can be different at each encoder or at the decoder, because the attention computations themselves are learned as part of the encoder’s machine learning. Just to give a few examples: the attention mechanism could examine the sentence in parts like subject, verb, and object.

Also, the positions of words can be determined, the input tokens can be summarized, entities can be identified, and the sentiment of the input can be analyzed, but the model learned how to do this by itself; it is not set up by pre-established rules. So in our illustration of the transformer architecture, the attention blocks are part of the encoder but are connected to the input layer.
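For readers who want to see the mechanism itself, here is a minimal sketch of the scaled dot-product attention formula from the “Attention Is All You Need” paper; it is the standard textbook form, not the exact code inside any particular model:

```python
# Minimal sketch of scaled dot-product self-attention over a toy input sequence.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # how much each word relates to each other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax: each row sums to 1
    return weights @ V                                  # blend of values, guided by relevance

rng = np.random.default_rng(0)
seq_len, dim = 4, 8                                     # 4 input tokens, 8-dimensional vectors
x = rng.normal(size=(seq_len, dim))                     # stand-in for embedded input tokens
out = scaled_dot_product_attention(x, x, x)             # self-attention: Q, K, V all come from the input
print(out.shape)                                        # (4, 8)
```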

Now, given this understanding of general AI design with the transformer architecture, what are the limitations? When you use a model from the Ollama list, say Llama 3, understand that it is called a pre-trained model. Based on what I just described, this means the organization and locations of word concepts in the model universe will no longer change or get additional data. Basically, it was trained with information from some past period in time, let’s say one to four years ago. This means a model is never up to date with current events, so don’t waste time asking current-event questions blindly. Understand the concept of knowledge versus intelligence. For example, you could meet Sam today, an MIT-trained scientist who has been sailing around the world for the last three years and is not up to date on science news. Sam is very intelligent, but Sam does not have current knowledge.

If you ask him a current-event science question, he won’t know the answer. But if you gave him a quick summary of that knowledge before you asked, I’m sure he’d have a good answer. Trying to get the AI to respond about something it doesn’t know about will result in something called hallucinations, meaning it will make things up. Asking a model questions about Skynet plans to kill humans and achieve world domination will reveal nothing in a pre-trained model if that data was not part of the training, and it will likely hallucinate. The answers will likely come from novels it read and not be based on any actual reality.

So be careful in interpreting this. An example that will solidify your understanding of the pre-training cutoff is to ask the AI a question about stock market prices. If I ask the AI for the history of Apple’s stock price over the last ten years, it will end the history where it last got data and assume that nothing has changed since. In my testing of Llama 3, the Apple price was from a year ago. Because of the size of the data used in learning, basically digitized books, academic papers, the entire Internet, and all other interactions that can be digitally recorded, there is no vetting process that can be performed in advance.

This means it is possible for the model to receive inaccurate data, biased data, or incomplete data. Private data, for example the internal documents of corporations, will not be known if it is not published. Your personal profile and data will not be known if they are not fed into the AI training, for example your Facebook interactions or your Google searches. Now, you can overcome this limitation by supplementing the model’s knowledge base later on via context, which I will focus on later. And this is the part that could make the AI really good or really evil pretty much instantly.

But you need to understand the limits of its learning. Larger models often give better results in the context of my analogy of the AI encoder universe: the more contextual subdivisions of ideas with deeper nuances, the richer the answers from the model. This explains the failure of the transformer model at the GPT-3 stage; it simply didn’t have enough depth. The size of current models is constantly growing and often changing within a couple of months, so this is a short-term issue. In today’s lingo, the size of the model’s addressable space is correlated to the parameter count. Smaller open-source models for local use are typically 7 billion parameters, larger open-source models are around 70 billion parameters, and the large cloud models have trillions of parameters.

Later, I will explain how you can overcome the limitations of smaller models using the current context. Models just exist in a theoretical universe; they are frozen with fixed data. Remember that, fixed data; that is the current state of AI. They don’t even know what model they are. They certainly don’t know they are running on Ollama or that they’re open source. The model doesn’t know about Ollama because Ollama was created after the model did its learning. Models are not connected to the Internet by themselves. They cannot read your computer’s clock. They do not even see what their own parameters are when you run them.

So if you ask questions that expect some awareness, the model will hallucinate; it will invent answers. However, you can control this. For example, in my chatbot, I always include the current local date and time in the context, so the AI can respond with comments like “two years ago,” or even tell you what time it is now, based on what I told it last. So if you’re going to ask a question that requires the AI to have situational awareness, provide that data to the AI first if you need it to refer to it in its answers.
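Here is a minimal sketch of that date-and-time trick, assuming the local Ollama chat endpoint on its default port (this is an illustration of the idea, not the chatbot code from the video):

```python
# Minimal sketch: inject the current local date/time into the context so the
# model can answer time-related questions. Assumes a local Ollama server.
from datetime import datetime
import requests

def ask_with_clock(question: str, model: str = "llama3") -> str:
    now = datetime.now().strftime("%A, %B %d %Y, %I:%M %p")
    messages = [
        {"role": "system", "content": f"The current local date and time is {now}."},
        {"role": "user", "content": question},
    ]
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": model, "messages": messages, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

print(ask_with_clock("How long ago was July 2022?"))
```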

For example, some cloud-based models will acquire context from you automatically by asking you to take screenshots. A very important element in AI use now is the token limit. Actually, there are two limits to consider. One is the input token limit: how many words you can include in your prompt. The other is the context token limit, which includes prior conversations and is the more important value. This context limit is extremely important because even a smaller 7-billion-parameter model can be made very powerful if you give it additional context as part of your prompt.

This is like giving Sam the scientist the latest news to bring him up to date. It allows the model to add current data to its knowledge base and respond more intelligently. Just as an example, let’s say you are reading a brand-new academic paper on AI. You can pre-ingest this paper into your local AI and then start asking questions about it. There are technical terms for this, including retrieval-augmented generation (RAG) or fine-tuning, depending on how it is done. But the end explanation is very simple: it’s a way to pass knowledge to the AI without having to go through expensive machine learning.
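Here is a minimal sketch of that pre-ingestion idea: read a document and pass it as context before asking a question. The file name is just an example, and it assumes the paper is short enough to fit within the model’s context limit:

```python
# Minimal sketch: pass a local document as context, then ask a question about it.
import requests

def ask_about_document(path: str, question: str, model: str = "llama3") -> str:
    with open(path, "r", encoding="utf-8") as f:
        document = f.read()
    messages = [
        {"role": "system",
         "content": "Use the following document as context only:\n\n" + document},
        {"role": "user", "content": question},
    ]
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": model, "messages": messages, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

print(ask_about_document("new_ai_paper.txt", "Summarize this paper's main claim."))
```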

The idea, by the way, is to pass only what’s relevant to your question; otherwise the context limit may apply. If the AI has enough space, it can preprocess data ahead of your prompt, and then you can use that temporary learning in your future prompts in the current session. So in essence it’s like learning on the go. The pre-trained model may be out of date, but this is solved by adding the information as context during a session. There is also a way to do limited fine-tuning that does not require retraining and is built into the Ollama project. It is called a Modelfile, and it allows you to copy an existing model and make minor additions, like changing roles. But complex fine-tuning, like teaching a model a lot of detailed private company data, is not something you can do without using some cloud tools, so I will not recommend it.
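As a rough illustration of the Modelfile approach, here is a sketch that copies Llama 3 into a new local model with a different role; the model name and system prompt are made up for the example:

```python
# Minimal sketch: create a customized copy of a local model via an Ollama Modelfile.
import subprocess

modelfile = """\
FROM llama3
PARAMETER temperature 0.7
SYSTEM You are a patient tutor who explains privacy and AI concepts in plain language.
"""

with open("Modelfile", "w", encoding="utf-8") as f:
    f.write(modelfile)

# Registers the customized copy with the local Ollama install;
# afterwards it can be run with: ollama run privacy-tutor
subprocess.run(["ollama", "create", "privacy-tutor", "-f", "Modelfile"], check=True)
```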

By the way, I believe the stated context limit of the Llama 3 model is around 4,096 or 4K tokens, which is around the number of words in this entire video. In testing, I’m going to guess the real limit is smaller, like half that. The Qwen 2 model may have a larger context limit, but I haven’t tested that yet.

When the context limit is reached, the model becomes forgetful. It may not understand the complete past context, so passing long documents will not work. The newer cloud models solved this by having very large context limits, like 128K tokens, so this is a temporary problem. The higher the context limit, the more usable the model will be, because of the ability to augment its knowledge via context. It can really be as simple as saying to the model, “I will pass you this document as context only; you do not need to respond,” and then following that up with a prompt.

Or, as I do in the chatbot I wrote, I just give it some categorization in the input context prompt and it figures it out. This is technically called RAG, or retrieval-augmented generation, but think of it as just passing context. Examples of this are passing source code as context, or passing an image, and then performing the prompt. I showed several examples of this in actual use in my last video. Now, something I will discuss in a future video is how a personal profile or personal data can be passed as context, resulting in the AI having deep knowledge of you.

This does not have to be part of the training to be a danger. This is partly one of the approaches considered by Windows Copilot and by Apple Intelligence, and it is one of the reasons I have complete distrust of externally controlled AI: someone can manipulate the context and thus possibly manipulate the user. For example, say you are anti-vax and you query the AI about that topic. An external party can alter the returned information based on who you are and reword things to fit your thought process by adding supplementary context or fine-tuning. This can be applied to politics or any current event.

This is why I only want to use a local AI; it won’t happen in the way we use the AI models here. A model is typically censored by fine-tuning, and this is more involved than what we can do with a local model using Ollama; you actually have to do some machine learning. So we typically receive a model with censorship built in. What this actually is, is like another encoder layer added to the inference sequence, which can alter the weights and biases of the original fixed model. While the model developer may try to include everything in the censorship, you will find that you can reason with the AI to bypass it.

In fact, the censorship rules are just instructions: do not allow certain questions. So if you have the wherewithal to manipulate the AI, you can bypass the censorship. For example, one approach is to imply that you are role-playing. This of course has to be allowed, because if you’re writing a story, for example, it is not realistic for the story not to have some evil character doing evil things. In conclusion, whether the model developers like it or not, censorship will never be perfect. Now, the problem with this topic is that it is so huge that it is impossible for me to fit everything into one video.

So after you’ve absorbed this, we will get into some of the mechanics in later videos. If you can’t wait, use the Ollama site to download the local AI in a few easy steps. Then I’ll give you the link to the current iteration of my chatbot built using Python. It will be on my GitHub page, which will be linked in the description. It’s very simple to run since it’s a single file called myai.py, and I run it using VS Code, which really makes the UI more workable. I’ll have to go deeper into how to use the chatbot more effectively in later videos, as I will be changing it over and over in the near future.

But I want to be clear that my chatbot is local only. No Internet connectivity is required outside of accessing the local port, localhost:11434, and then Ollama itself accesses another random port opened by the actual transformer module, called llama.cpp. I’ve just explained to you how the transformer works, so that’s what llama.cpp does. Both of these ports are accessible only inside your home network, so there’s no fear of someone connecting to your AI from the Internet; don’t worry about that. I’ve checked the source code of Ollama and also of the transformer module llama.cpp, and they all behaved as expected.

There is no situational awareness of local content, as I said, no hidden messaging or communications with HQ anywhere, and the model does not retain your context outside of the tools I built for you to maintain your own context with the chatbot. I hope this gives you a conceptual starting point. Again, leave comments if you want me to dig deeper into some of these elements. Folks, as we switch to an AI-driven world, what I’ve been teaching about privacy seems to have even more importance. We need to stop the AI from knowing you personally, because if it does, you can be manipulated by it.

Fortunately, we can stop that with products I’ve already made, products that support this channel. We have the Google phones running AOSP that do not pass information to big tech and are not directly connected to big tech. We have the Brax Virtual Phone product, which gives you inexpensive phone numbers you can use to keep your identity away from big tech and future AI intelligence. We have BraxMail, which keeps your identity private so it cannot be harvested later on by big tech for AI data. We have BytzVPN and BraxRouter, which hide your IP address, a major identifier that can be harvested to identify our past actions.

All these products can be found on my social media platform, Brax Me. This platform is a place for people to discuss privacy, and over 100,000 people are there talking about privacy issues. There’s a store on there with all these products. Come visit us, chat with the community, and support what we do. Thank you for watching and I’ll see you next time.

See more of Rob Braxman Tech on their Public Channel and the MPN Rob Braxman Tech channel.
