Have you ever asked a question like the one in the picture below?

what is text analysis?

If so, I have written this simple article for anyone who has asked that question and wants to know what text analytics means.

All right, without further ado, here is the discussion.

Definition of Text Analytics

Text analysis is the activity of analyzing text data such as emails, blogs, tweets, forum posts, and other forms of text. Text falls into the category of unstructured data (we can call it unstructured-data). A discussion of what text analysis means would be incomplete without also explaining what unstructured data is, so I will discuss it further below.

What is unstructured-data?

Why do we need to know about unstructured data in order to understand text analysis? Because text data, being unstructured data, has several characteristics that influence the way we do text analytics.

If you have used database technology before, you must already be familiar with structured data (we call it structured-data).

The amount of structured data is far smaller than the amount of unstructured data. Just consider the large number of blog articles, statuses on social media such as Facebook, Twitter, and LinkedIn, documents created, and emails sent every day.

Imagine that in a single day nearly 2 million blog articles are created, 500 million tweets are posted (not to mention other social media), and 100 billion emails are sent. From this we can see how enormous the amount of unstructured data is compared to structured data.

Why is unstructured data important for us?

To see why unstructured data is important, we need to look at some examples of the important information we can get from it.

Every document has information hidden in it. Take a company's quarterly financial statement as an example: from such a document we can get important information such as income figures or the names of important people at the company.
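To give a feel for what pulling information out of a document can look like, here is a minimal Python sketch that finds dollar amounts in a sentence from a financial report. The sample sentence, the pattern, and the function name extract_amounts are all invented for this illustration; real reports need far more robust handling.

```python
import re

# Invented sample text standing in for a quarterly financial statement.
report = (
    "In the third quarter, revenue reached $4.2 million, "
    "up from $3.9 million in the second quarter."
)

# Match a dollar sign, a number, and an optional scale word.
AMOUNT = re.compile(r"\$(\d+(?:\.\d+)?)\s*(million|billion)?")

def extract_amounts(text):
    """Return (value, scale) pairs for every dollar amount found in text."""
    return [(float(value), scale or "") for value, scale in AMOUNT.findall(text)]

amounts = extract_amounts(report)
print(amounts)  # [(4.2, 'million'), (3.9, 'million')]
```

A pattern like this only works when the wording is predictable, which is exactly why free text is harder to process than a database column.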

Another example is that we can learn very important communication patterns from unstructured data. These patterns can take the form of people's communication habits, that is, how they express their opinions in emails or in social media statuses. This information is very useful in marketing, in measuring customer satisfaction, in learning a person's interests, and for various other purposes.

Problems encountered from unstructured-data

But getting important information out of this type of data is not as easy as with structured data. This is because the main characteristic of unstructured data is that it is easy for humans to understand but difficult for computers or machines.

Why is it more difficult for computers to understand unstructured data?

Structured data is arranged in tabular form, and its attributes, such as attribute names and data types (integer, decimal, text, numeric, etc.), have already been defined. This is easy for computers to understand.

Unstructured data, by contrast, has no such specific data type attributes; it is only a collection of text.
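As a rough illustration of the difference, the Python sketch below stores the same fact once as a structured record and once as free text. The class name Order, its fields, and the sample values are made up for this example.

```python
from dataclasses import dataclass

@dataclass
class Order:
    # Structured: every attribute has a name and a declared type.
    customer: str
    quantity: int
    price: float

order = Order(customer="Budi", quantity=3, price=12.5)

# Because the fields are named and numeric, the computer can calculate directly.
total = order.quantity * order.price
print(total)  # 37.5

# Unstructured: the same fact as plain text, with no named fields and no types.
note = "Budi ordered three items at 12.5 each"
print(type(note).__name__)  # str
```

To get the total from the note, a program would first have to work out which words are the quantity and the price, which is the hard part of text analysis.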

Then, how can humans know the meaning of unstructured data?

Humans can know it through the context of the sentence or text.

Take for example the sentence “Dana just got fresh funds”. In Indonesian, the name “Dana” and the word “dana” (funds) are spelled the same, yet we can easily understand from the context that they mean different things. For a computer, however, the two are identical words.
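Here is a small Python sketch of why this is hard for a machine. It assumes an Indonesian wording of the example sentence in which the name Dana and the word dana ("funds") share a spelling; that wording is an assumption made for this illustration. Once the text is lowercased and split into words, which is a common first step, nothing distinguishes the two.

```python
from collections import Counter

# Assumed Indonesian wording of the example sentence (illustrative only).
sentence = "Dana baru saja mendapat dana segar"

# Naive preprocessing: lowercase, then split on whitespace.
tokens = sentence.lower().split()
counts = Counter(tokens)

# The person's name and the word for "funds" are now the same token.
print(tokens[0] == tokens[4])  # True
print(counts["dana"])          # 2
```

Telling the two apart needs extra information about context, such as position in the sentence or neighboring words.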

Then how do we analyze unstructured data if it is not easy for the computer?

Relax … it may have been difficult in the past, but there are now many developing technologies that can help with text analysis, especially with Big Data technology growing and booming. Further below I will mention some examples of technologies that we can use for text analysis.

The next question is: why do we need to do text analysis?

Benefits of Text Analytics

We need to do text analysis because it provides significant benefits for most industries and for other interests, such as politics. Here are some example uses of text analysis.

  • If a company suspects that company secrets may be leaked to competitors by its own employees, it can analyze millions of employee emails to detect the possibility early.
  • If you want to know the difficulties or complaints customers face when using a product, you can analyze their comments and questions in forums or on social media.
  • If you want to measure and compare positive or negative perceptions of a company, brand, or product, you can do sentiment analysis using text analysis.
  • If you want to measure and compare people’s perceptions of potential leaders in an area, text analysis can help as well. Take a hot example, the election of the governor of Jakarta: Pak Basuki Tjahaja Purnama (Ahok) or his opponents, such as Sandiaga Uno, can use text analysis as an important part of their strategy to support their respective victories.

There are many more examples of the benefits of doing text analysis.
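To make the sentiment analysis use case a little more concrete, here is a minimal lexicon-based sketch in Python. The word lists, the sample comments, and the function name sentiment_score are made up for illustration; real sentiment analysis uses much larger lexicons or trained models.

```python
import re

# Tiny hand-made lexicons (assumptions for this example only).
POSITIVE = {"good", "great", "love", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "broken", "hate", "confusing"}

def sentiment_score(text):
    """Return the number of positive words minus the number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Invented customer comments.
comments = [
    "I love this product, the support team is great",
    "The app is slow and the menu is confusing",
    "It arrived on Tuesday",
]

scores = [sentiment_score(c) for c in comments]
print(scores)  # [2, -2, 0]
```

A score above zero suggests a positive comment, below zero a negative one, and zero means no word from either list was found. Simple counting like this misses negation ("not good") and sarcasm, so treat it only as a starting point.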

Text analytics technology available

With so many technologies available today, the choice of text analysis technology can depend on the following three factors.

  • The type of data to be processed
  • What important information you want to extract
  • The technological environment to be used

The following are some examples of technologies that can be used for text analysis:

  • Pig
  • JAQL
  • AQL
  • Python Natural Language Toolkit (NLTK)
  • General Architecture for Text Engineering (GATE)
  • QDA Miner Lite
  • TAMS Analyzer
  • Carrot2
  • PAINT
  • KH Coder

The first three are tools available in Big Data and Hadoop environments. There are actually many more tools and technologies besides the 10 above; the list here is only meant as an example.

That concludes this article. If you found it useful, I would be happy for you to share or like it.

Author

I am a Data Science enthusiast, interested in Big Data, Python, and Machine Learning. My daily activity is analyzing data to solve problems.

1 Comment

  1. Sanjay Mehrotra

    You forgot the two main programming languages that are adept at text analysis: Python and R.