Wine

Objective

I want to find states whose awarded wines have outstanding features compared with other states.

Data wrangling

First, I created a new column, “duration”, that holds the aging period.
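As a minimal sketch of this step: the text does not say how the aging period is derived, so the example below assumes it is the difference between a competition year and a vintage year. The field names `vintage` and `competition_year` and the sample rows are hypothetical.

```python
# Hypothetical records: `vintage` and `competition_year` are assumed field
# names for illustration, not the dataset's real columns.
wines = [
    {"name": "A", "vintage": 2015, "competition_year": 2018},
    {"name": "B", "vintage": 2017, "competition_year": 2018},
]


def add_duration(rows):
    """Return copies of the rows with a `duration` field (aging period in years)."""
    return [
        {**row, "duration": row["competition_year"] - row["vintage"]}
        for row in rows
    ]


wines_with_duration = add_duration(wines)
```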

After that, I converted the award into a numeric type, where a greater value means a better award. I did this because more than 95% of the data falls into “Double gold”, “Gold”, “Silver”, or “Bronze”, and these categories have a natural order.
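The ordinal encoding could look roughly like the sketch below. Only the ordering of the four categories comes from the text; the scores 1 to 4, the name `AWARD_SCORE`, and the choice to return `None` for the remaining labels are assumptions.

```python
# Assumed scores: only the ordering is taken from the text.
AWARD_SCORE = {"Bronze": 1, "Silver": 2, "Gold": 3, "Double gold": 4}


def award_to_score(award):
    """Map an award label to its ordinal score.

    Labels outside the four main categories return None so they can be
    dropped or handled separately.
    """
    # capitalize() normalizes case, e.g. "Double Gold" -> "Double gold".
    return AWARD_SCORE.get(award.strip().capitalize())
```

For example, `award_to_score("Double Gold")` gives 4, while a label outside the four categories gives `None`.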

In addition, looking into the state information, I found that the top 5 states make up most of the data for every award. I therefore decided to put all states other than the top 5 into a single group, and to make my target more concrete: “Are there any outstanding features among the 5 most frequently awarded states?”
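One way to sketch this grouping is to count rows per state and relabel everything outside the most frequent five. The function name `group_states`, the label "Other", and the sample list are made up for illustration.

```python
from collections import Counter


def group_states(states, top_n=5, other_label="Other"):
    """Keep the top_n most frequent states; relabel the rest as other_label."""
    counts = Counter(states)
    top = {state for state, _ in counts.most_common(top_n)}
    return [s if s in top else other_label for s in states]


# Made-up sample: one state label per awarded wine.
states = ["CA"] * 6 + ["NY"] * 4 + ["WA"] * 3 + ["OR"] * 3 + ["TX"] * 2 + ["MI", "VA"]
grouped = group_states(states)
```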

Analytics (PCA)

With the wrangling finished, I first ran PCA on the selected columns.

From the output, the following points can be observed.

  • Price and award point in the same upward direction
  • Contrary to my expectation, award and duration are only weakly related
  • Price and aging period point in the same leftward direction
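To make the idea of variables "pointing in the same direction" concrete, here is a small pure-Python sketch of PCA on standardized columns. It finds only the first component, using power iteration on the correlation matrix. It is not the routine used for the plot above, and in practice a library implementation would normally be preferred. The sample values for price, award, and duration are made up.

```python
import math


def standardize(columns):
    """Scale each column to zero mean and unit (sample) variance."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))
        out.append([(x - mean) / std for x in col])
    return out


def correlation_matrix(columns):
    """Pairwise Pearson correlations between the columns."""
    z = standardize(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z] for ci in z]


def first_component(matrix, iterations=200):
    """Approximate the leading eigenvector (PC1 loadings) by power iteration."""
    # Uneven start vector, to reduce the chance of starting orthogonal to PC1.
    vec = [1.0 / (i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        vec = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


# Made-up sample columns, in the order: price, award, duration.
price = [10, 20, 30, 40, 50]
award = [1, 2, 2, 3, 4]
duration = [1, 3, 2, 5, 4]

corr = correlation_matrix([price, award, duration])
loadings = first_component(corr)
```

Variables whose loadings share a sign on a component move together along it. A loading near zero means the variable contributes little to that component.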

The same thing can be confirmed with the heatmap.

Anyway, returning to the original objective, I would like to find outstanding states, so I will use color to check each state's features.

Overall
OK, CA takes up too much of the plot.

Let’s check each state.

NY

Very interesting. NY’s awarded wines seem to tend to be cheap. At the same time, they have shorter aging periods.

OR
Nothing special to mention.

TX
TX seems to be similar to NY.

WA
Nothing special.

Visualize

OK, let’s visualize what we expect for NY and TX.

Summary

Looking at the violin plot, it can be said that awarded US wines show a stronger tendency toward low prices and short aging periods.
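A reading taken from a violin plot can also be cross-checked numerically, for instance by comparing per-state medians of price and duration. The sketch below shows the idea. The rows are made up and are not results from the dataset, and the function name `medians_by_state` is hypothetical.

```python
from collections import defaultdict
from statistics import median

# Made-up rows: (state group, price, duration in years).
rows = [
    ("NY", 15, 1), ("NY", 18, 2), ("NY", 20, 1),
    ("CA", 40, 3), ("CA", 55, 4), ("CA", 35, 2),
]


def medians_by_state(rows):
    """Return the median price and median duration for each state group."""
    groups = defaultdict(list)
    for state, price, duration in rows:
        groups[state].append((price, duration))
    return {
        state: {
            "price": median(p for p, _ in values),
            "duration": median(d for _, d in values),
        }
        for state, values in groups.items()
    }


summary = medians_by_state(rows)
```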