[METHOD] How to Accurately Filter Followers by Language - JARVEE (+BONUS TIP)

Hello,
I’ve received an overwhelming amount of positive feedback on my last 2 posts, so I’m back with another.

What if your client wants to target Arabic users [مرحبا], for example, but doesn’t want to target Indian users [हैलो]?

There is an option in JARVEE under user filters called “Skip Non English Users”. It won’t accept any account that has characters other than [a, b, c…y, z, 1, 2, 3…]. You get the point.

Using the “Skip Non English Users” filter clearly won’t work here, as [مرحبا], for example, doesn’t contain English characters.

You could add the male and female names in advanced settings, but that’s also not the best way to do it, as you can’t really target a language by a person’s name.

The Method:

  1. We are going to do this using the “User bio/username/name must not contain any invalid words” filter.
    Check that box in your filters, then go to configure lists. Alternatively, go to Global Tools, then IG Words List.

  2. We’re going to create a separate list for each language you want to avoid.

  3. The data I use comes from https://1000mostcommonwords.com/. You can use whatever source you want once you understand the concept behind this method.

  4. Go to that website, click the language you want to avoid, and you will be given a list of the 1000 most popular words for that language.
    Copy the data, put it through Excel or a text editor, and reformat it the way JARVEE wants it.

  5. Do this for as many languages as you find necessary.

  6. Go to the client’s follow/like tool and, under the “User bio/username/name must not contain any invalid words” filter, add the lists you created.
    You can set a different combination of lists for each individual user.

  7. Now when JARVEE is scraping, it will not accept any users that have any words from those lists, which we classified by language, on their profile.
    Now my client can get Arabic users and not the other non-English languages.
    Make sure to uncheck “Skip Non English Users” or none of this will work.
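
To make the logic of step 7 concrete, here is a rough sketch in Python. It is not JARVEE's code: how the tool actually matches words is not described in this thread, so the case-insensitive substring match, the `profile_is_rejected` function, and the tiny sample lists are all assumptions for illustration.

```python
def profile_is_rejected(profile_text, blocked_lists):
    """Return True if the profile contains any word from any blocked list.

    Assumption: a plain case-insensitive substring match. The real tool
    may match differently.
    """
    text = profile_text.casefold()
    for words in blocked_lists.values():
        for word in words:
            if word.casefold() in text:
                return True
    return False


# Hypothetical sample lists, one per language to avoid.
blocked = {
    "hindi": ["है", "नहीं", "और"],
    "russian": ["привет", "это"],
}

print(profile_is_rejected("مرحبا بكم", blocked))     # False
print(profile_is_rejected("यह मेरा पेज है", blocked))  # True
```

The Arabic profile passes because none of its text appears in the Hindi or Russian lists, which is the whole idea: block the languages you don't want rather than guessing at the one you do.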

You can also do this using the “User bio/username/name must contain one of the following words” filter.
You could use this if you only want to target Hindi, Arabic, Chinese… users.

Remember, this should work most of the time, but it will miss a profile whose bio/username/name does not contain any word from the lists you entered. I haven’t had any issues since I set this up, as the lists from the site are great.

Bonus Tip: Some people write Hindi with English characters. To catch them, copy your list into Google Translate, copy the pronunciation (highlighted in blue), and add it to the list.
Note that Google Translate only supports 5000 characters at a time, so you might need to do this in batches of about 500 words!
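
If you would rather not count words by hand, the batching can be sketched in Python. This assumes one word per line and uses the 5000-character limit mentioned above; `batch_words` is a made-up helper name, not part of any tool.

```python
def batch_words(words, max_chars=5000):
    """Group words into newline-joined batches that each fit max_chars.

    A single word longer than max_chars still gets its own batch.
    """
    batches, current, length = [], [], 0
    for word in words:
        # +1 accounts for the newline joining this word to the previous one.
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            batches.append("\n".join(current))
            current, length = [], 0
            extra = len(word)
        current.append(word)
        length += extra
    if current:
        batches.append("\n".join(current))
    return batches


# Small demo with a tiny limit so the splitting is visible.
demo = batch_words(["abc"] * 5, max_chars=7)
print(len(demo))  # 3
```

Paste each batch into the translator separately, then join the pronunciations back into one list.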

Extra step:
Copy the Latin version of the language into Microsoft Word. You will get a red line under words not in the English dictionary (your writing language needs to be set to English).
Go ahead and remove the words that match anything from the English dictionary (the ones without a red line), so that you don’t accidentally remove English accounts.
You could also visually scan through the list and make sure that there are no English words in this Latin version.
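
The same cleanup can be sketched in Python instead of Word. This assumes you have an English word list available from some source (for example a dictionary file you trust); the words below are only hypothetical samples, and `drop_english_words` is an illustrative name.

```python
def drop_english_words(romanized_words, english_words):
    """Keep only romanized words that are not also English words."""
    english = {w.casefold() for w in english_words}
    return [w for w in romanized_words if w.casefold() not in english]


# Hypothetical samples: "main" and "to" collide with English.
romanized = ["main", "nahin", "hai", "to", "kya"]
english = ["main", "to", "the", "and"]

print(drop_english_words(romanized, english))  # ['nahin', 'hai', 'kya']
```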

Let me know if you have any questions,
-Hadi

30 Likes

This guy is on fire, thank you for the gold nuggets!

This never works for me!

2 Likes

Thank You!

Try my method, it should work for you. “Skip Non English Users” never worked for me either, which is why I developed this technique.

2 Likes

ikr he’s really on fire lol

2 Likes

well first don’t filter Hebrew :slight_smile:
second, your method is a processor burner, while it can be much easier to just filter chars instead of words, like a*, b*… it uses regex

3 Likes

This is a great share. Thank you!

You can use wildcards to blacklist Arabic letters.

Add wildcards before and after each Arabic letter.

3 Likes

This is a must for many Arab clients

I didn’t know about that, thanks for the info. I am aware that this is processor intensive; I personally run JARVEE on a beefed-up computer, so it doesn’t really affect my performance, but your method would definitely be more efficient.
The reason I prefer this method over the “Skip Non English Users” method is that a lot of English users like to get fancy and write things in their bio like (🄷🄴🄻🄻🄾 𝓜𝓟 Ⓢⓞⓒⓘⓐⓛ), which are not considered English characters.
Would it work if we add the letters into lists?
Also, @ossi, does this work with every language or just Arabic? I assume it should, but I want to confirm.
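
On the "fancy" letters: many of them are Unicode compatibility variants of plain Latin letters, and Unicode NFKC normalization folds those back to the ordinary characters. A minimal Python sketch follows. Nothing in this thread suggests JARVEE does this, and not every decorative character has such a mapping, so treat it as an idea for your own scripts only; `fold_fancy_letters` is an illustrative name.

```python
import unicodedata


def fold_fancy_letters(text):
    """Fold compatibility variants (script, circled, etc.) to plain letters."""
    return unicodedata.normalize("NFKC", text)


print(fold_fancy_letters("𝓜𝓟 Ⓢⓞⓒⓘⓐⓛ"))  # MP Social
```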

My method will still work great for those who write in their language using English letters, as explained in the bonus tip.

I like the bonus tip.
BTW, can someone share language letters?
Your method is great for Turkish, where the chars are mostly English ones.

1 Like

BTW, in my personal bot I use Google Translate, so it can detect the language with a much smarter CNN network… JARVEE needs to have a built-in feature for that.

1 Like

This should help; just search through the page for whatever you’re looking for.

1 Like

Thanks for the share!

The problem with this approach is that you are removing valid words. For example, in your screenshot of the Hindi translation, I can see the word “main”, which is a valid word in English…

For non-Latin languages, you can just use the alphabet instead of adding words; that way you are sure you will filter non-Latin characters…
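
The alphabet idea can be illustrated outside JARVEE with a short Python sketch. It guesses each letter's script from the first word of its Unicode character name, which is a rough heuristic and not an official script lookup. The `scripts_in` helper is hypothetical.

```python
import unicodedata


def scripts_in(text):
    """Return the set of script guesses for the letters in text.

    Heuristic: the first word of a letter's Unicode name, e.g.
    'ARABIC LETTER MEEM' -> 'ARABIC'. Combining marks are skipped.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


print(scripts_in("مرحبا"))  # {'ARABIC'}
print(scripts_in("हैलो"))   # {'DEVANAGARI'}
```

A check like `"ARABIC" in scripts_in(bio)` would separate the two greetings from the first post without any word lists, though it says nothing about people who write their language in Latin letters.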

1 Like

You are right, I used this screenshot just as a concept, but the issue you mention is really easy to fix:
Copy the Latin version of the language into Microsoft Word. You will get a red line under words not in the English dictionary (your writing language needs to be set to English).
Go ahead and remove the words that match anything from the English dictionary.
Thanks for bringing it to my attention!

I will try this out, although the letter path might not always work for me, since some English accounts use a few foreign letters in their bio because it looks “fancy”, like a font. Filtering out each letter individually will remove that user from the list, whereas the 1000-most-popular-words method only removes users who write full words in a foreign language, which confirms what language they speak.

Another golddy tutorial from @Hadi. You’re hyped up man, keep it up. Hat off for the golden tutorials :slight_smile:

1 Like

Thanks so much for sharing this! Such an amazing tutorial.

1 Like

Thank you for sharing, very helpful :pray:

1 Like

Reviving the thread!

Protip: once you copy your languages to a list in Excel, copy the column and paste it as “text only” in Word.

Then use Ctrl+H to find and replace all the gaps: replace all ^p with a comma.
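
For anyone who prefers a script to Word, the same line-break-to-comma conversion is a few lines of Python. This assumes, as the tip above implies, that a comma-separated list is what you want to paste in; `column_to_comma_list` is just an illustrative name.

```python
def column_to_comma_list(column_text):
    """Turn a pasted column (one word per line) into a comma-separated string."""
    words = [line.strip() for line in column_text.splitlines()]
    # Skip empty lines so the result has no empty entries.
    return ",".join(w for w in words if w)


print(column_to_comma_list("hai\nnahin\n\nkya\n"))  # hai,nahin,kya
```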

Thanks again Hadi, I finally implemented this, as one of my clients was concerned about non-English followings.

1 Like

Great tip,
I’ll throw in the same tip for Mac users too.

Copy from Excel/Numbers, and paste in TextEdit as plain text “control+option+shift+v”,
then press “command+f” to open find and replace, then press “control+option+command+p”, select “line break”, and in the replace field add a comma.

1 Like

Yes, if someone can share their language letters, it would be very helpful :v:

Great! So this way we can filter out Indian users, but how can we effectively target only Arabic users? With the name list?

1 Like

Thanks for sharing @Hadi :smiley:

1 Like