Beyond Word2Vec Usage For Only Words

By Stanko Kuveljic

Problem with Kafka streams!?

What is Kafka Streams? Before diving straight into the main topic, let me intro ...

By Aleksandar Pejakovic

What it takes to build a production ready AI solution

When we decided to open up the SmartCat, we had already gained various experienc ...

By Nenad Bozic

When data goes missing

Are you working on a project where no data is available to you? Maybe you are ...

By Nina Marjanovic

Cassandra to Kafka data pipeline Part 2

If you haven’t read the previous part of this blog, you can find it here. There, ...

By Vladimir Vajda

Word2Vec - the world of word vectors

Have you ever wondered how a chatbot can learn about the meaning of words in ...

By Stefan Nikolic

Freedom vs. Structure (and impact on performance)

One of my favorite comedies, Office Space, touches upon the employee motivat ...

By Bojan Kovac

Kafka racing: Know the Circuit

Introduction: This is the first post in a blog series dedicated to Apache Kafka ...

By Nikola Ivancevic

Chatbot Nabu will assist you now

Introduction: Chatbots are computer programs that are able to conduct a conversa ...

By Nina Marjanovic

Facelyzr Deep Learning Project

Introduction: Deep learning is a phrase that follows us everywhere. Even the ...

By Stanko Kuveljic

Family First

Internally, at SmartCat, we joke about tough subjects in today’s society. ...

By Bojan Kovac

WoE and IV Variable Screening with {Information} in R

Variable screening comes as an important step in the contemporary EDA for predic ...

By Goran S. Milovanovic, PhD

Cassandra to Kafka Data Pipeline Part 1

Introduction: I’ve wanted to create a system which at its core uses event sourci ...

By Vladimir Vajda

SmartCat Values: No Bullshit Company

Disclaimer: This blog post contains a tasty dose of profanities. Some may call u ...

By Bojan Kovac

Visualising Similarity: Maps vs. Graphs

The visualization of complex data sets is of essential importance in communicati ...

By Goran S. Milovanovic, PhD

SmartCat Values: Aim to Impress

Standards we have set for ourselves and the work we do reflect on the st ...

By Bojan Kovac

Cognitive Computing

What is Cognitive Computing? Most probably anyone who is even remotely aware o ...

By Goran S. Milovanovic, PhD

Integration testing with Ranger

range /reɪn(d)ʒ/ (noun): the area of variation between upper and lower limits on ...

By Milan Milosevic

The Curse of Simplification

A short, non-technical narrative on complexity in Data Science projects and how ...

By Goran S. Milovanovic, PhD

SmartCat Values: Start with "Why?"

Why did we organize this meeting? Why do I have to write the minutes of meeti ...

By Bojan Kovac

Load testing Kafka with Ranger

The best way to test an infrastructure before going into production is to mimic ...

By Matija Gobec

Fast matrix factorization in R

This article will be a wrap-up of our series related to collaborative filtering ...

By Stefan Nikolic

SmartCat Values: Knowledge is power

When we decided to start our own company, back in 2015, the first thing we did a ...

By Bojan Kovac

Hybrid content-based and collaborative filtering recommendations with {ordinal} logistic regression (2): Recommendation as discrete choice

In this continuation of "Hybrid content-based and collaborative filtering recomm ...

By Goran S. Milovanovic, PhD

Hybrid content-based and collaborative filtering recommendations with {ordinal} logistic regression (1): Feature engineering

I will use {ordinal} clm() (and other cool R packages such as {text2vec} as w ...

By Goran S. Milovanovic, PhD


You shall not pass! This pretty much sums up the main reason why Twitalyzr was m ...

By Stanko Kuveljic

Data Science Unicorns and Where to Find Them

Each project in this Big Data world is going through a carefully paved path. Fir ...

By Nenad Bozic

#AskNASA: What's the Optimal Time for Aliens to Invade Earth?

My inaugural blog as a Data Science Consultant for SmartCat. The code that accom ...

By Goran S. Milovanovic, PhD

Improved R implementation of collaborative filtering

Collaborative filtering (CF) is one of the most popular techniques for building ...

By Stefan Nikolic

Challenges of Monitoring Distributed Systems

Last October one of our co-founders and senior consultants Nenad Bozic held a pr ...

By Nenad Bozic

Where is my data - debugging SSTables in Cassandra

Apache Cassandra is great for handling huge volumes of data. Everything works re ...

By Nenad Bozic

MongoDB vs Couchbase - part two

This is round two in comparing MongoDB vs Couchbase. In round one, we saw th ...

By Milan Milosevic

Cassandra Tuning - Above and Beyond

This September one of our co-founders and senior consultants Matija Gobec held a ...

By Matija Gobec

Tuning Java Driver for Heavy write and Low Latency Read Scenario

In the first two blog posts (part 1 and part 2) we gave a couple of pointers abo ...

By Nenad Bozic

Recommender Systems: Matrix operations for fast calculation of similarities

Recommender systems have become ubiquitous and very important in recent years. T ...

By Stefan Nikolic

Tuning DataStax Java Driver for Cassandra - Part 2

In the first part of this blog post series we covered basic settings which can give ...

By Nenad Bozic

Tuning DataStax Java Driver for Cassandra - Part 1

When people think of tuning Apache Cassandra to perform better, their first inst ...

By Nenad Bozic

The Next Generation of OSS Software Won’t Be Apache

The Apache Software Foundation (ASF) has been a steward of free open source soft ...

By Scott Hirleman

Intro to Document-Oriented NoSQL Databases

This is the first post in the series about comparing MongoDB with Couchbase, whi ...

By Milan Milosevic

Systemd Or How I learned to stop worrying and love newness

Working in the IT world where things are not yet fully connected, integrated a ...

By Nikola Ivancevic

Metric Collection Stack for Distributed Systems

In our previous post we referred to the subject of having logs in a central plac ...

By Nenad Bozic

Slow Queries Monitoring

Working on high nines where the latency of every query matters is a whole differ ...

By Nenad Bozic

Distributed logging

Browsing through logs is always hard, even when you are on a single node system. ...

By Nenad Bozic

After a mile in your own shoes

Spoiler Alert: This article is not technology-focused. It’s people-focused. Ton ...

By Bojan Kovac

What's new in Apache Cassandra 3.0 - part 2

In part one of “What’s new in Cassandra 3.0” I got into details about materi ...

By Matija Gobec

Monitoring stack for distributed systems

Microservice architecture on the one hand, and distributed systems on the other, ...

By Nikola Ivancevic

Craft conference 2016

This was our third Craft Conference, the place to be if you are connected to IT ...

By Nenad Bozic

Polyglot Persistence in NoSQL Space

Relational databases have been around for a long time, developers tend to use th ...

By Nenad Bozic

To walk a mile in client's shoes...

It was late one Wednesday evening, 8:30 pm. We were pulling long hours befo ...

By Bojan Kovac

Introduction to Apache Kafka

In my previous blog I wrote about distributed systems and why we choose this pat ...

By Matija Gobec

How (not) to start with Apache Cassandra

Within several previous projects, we have held consultations for development tea ...

By Nenad Bozic

Functional testing of email communication

Functional testing series Blackbox testing microservices Graybox testing - C ...

By Nenad Bozic

What's new in Apache Cassandra 3.0 - part 1

In the world of a fast growing number of NoSQL databases and fast, scalable and ...

By Matija Gobec

Go CD - Continuous delivery through pipelines

In order to compete in today’s IT market, you must be truly agile, you must list ...

By Nenad Bozic

Front-end first development?

Whenever a new product or new feature implementation is ahead of us, there are m ...

By Bojan Kovac

What it means to be a geek

I walked through the awakening downtown of Novi Sad. It was early. People with t ...

By Bojan Kovac

How to hire a good data scientist and avoid fake ones?

With the current “big data” hype there is a big demand for skilled and knowledge ...

By Milos Grubjesic

Graybox testing - Control your dependencies

Functional testing series Blackbox testing microservices Graybox testing - C ...

By Nenad Bozic

Bring functional tests closer to business with Cucumber

Functional testing series Blackbox testing microservices Graybox testing - C ...

By Nenad Bozic

Cassandra Summit 2015

As this was the biggest NoSQL event in the world and the biggest gathering of Ca ...

By Nenad Bozic

Spring batch as framework for system integration

We had finished up the first set of requirements for some project and obtained a ...

By Nenad Bozic

Spark + Cassandra: The perfect match

Hadoop has been the leading platform for distributed data storage and analytics ...

By Matija Gobec

Leveraging parallel execution

With NoSQL databases comes a change in physical data modelling. When it comes to t ...

By Matija Gobec

Setting up Embedded Cassandra on Spring project

When we first started using Cassandra, we immediately realized there would be a ...

By Nenad Bozic

Cassandra migration tool

Developing a product usually means that during the period of development you are ...

By Matija Gobec

Migrating time series data from MySql to Cassandra

MySQL is still widely used in application development as a stable, fairly perfor ...

By Matija Gobec

Blackbox testing microservices

Functional testing series Blackbox testing microservices Graybox testing - C ...

By Nenad Bozic

Why go distributed

Why go distributed? When talking to other fellow engineers and people in our ind ...

By Matija Gobec

Cassandra complex queries - lessons learned

Just a couple of years ago, the decisions faced by software architects were quit ...

By Nenad Bozic

Why Big Data

We wanted to share why we want to do what we do and why we think this is the fut ...

By Nenad Bozic

Craft conference 2015

This was our second Craft Conference, the place to be if you are connected to ...

By Nenad Bozic