Control Your Computer with Hand Gestures: Gesture-Based Human-Computer Interaction with Opencv.js, Tensorflow, and electron

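Preface

Last night I took a look at my long-neglected blog and found that the most recent post was an AGC solution write-up from March of this year. Back in my freshman year I was so keen on blogging that I wanted to publish three posts a day. In my sophomore year the pressure grew on every front, and I also got lazier about writing, ~~which is how I ended up going four months without a single post~~. This semester I took a multimedia course, and for the final project my teammate 刘总 and I built a small human-computer interaction program based on gesture recognition, so I am using it to ~~pad out the blog~~.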

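Introduction

As the title suggests, this is a tool that performs certain computer operations by recognizing hand gestures. Because it is a desktop application, and because we wanted a good-looking interface (good looks are the primary productive force.jpg), we chose electron. First, we use Python and Tensorflow to train a CNN for gesture recognition and export the model in JSON format. Then, inside electron, we process images with opencv.js and load the exported JSON model into Tensorflow.js to recognize gestures. Finally, we use robot.js to control the computer.

Here is the architecture diagram:

The front end and the server are separated because getting robot.js to work inside electron is troublesome. After several failed attempts, we decided to keep it out of electron altogether. This avoids the configuration hassle, gives a cleaner structure, and makes the code easier to write.

Currently only Windows is supported. The available operations are: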

Switch windows
Hide windows
Mute/unmute
Close the current window
The four arrow keys (up, down, left, right)

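Building the Model

This step consists of collecting training samples, processing the images with Opencv, and feeding them into Tensorflow for training.

Let's start with sample collection and image processing. We open the camera through Opencv, capture a frame every 0.1 seconds, and process each frame with Opencv. The processing steps are:

Extracting the ROI (region of interest)

This simply crops a specified region out of the original image.

Before processing

After processing

The code is as follows: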

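import cv2 as cv

def getRoi(frame, x0, y0, width, height):
    # Crop the rectangle whose top-left corner is (x0, y0)
    roi = frame[y0:y0 + height, x0:x0 + width]
    cv.imshow('roi', roi)  # preview the cropped region
    return roi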

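Skin detection with Otsu's method

The rough idea is to binarize the Cr channel with Otsu's threshold and use the result as a mask: a bitwise AND with that mask filters out the pixels we do not care about. The code is as follows: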

def getSkin(frame):
    ycrcb = cv.cvtColor(frame, cv.COLOR_BGR2YCR_CB)
    y, cr, cb = cv.split(ycrcb)
    cr_ = cv.GaussianBlur(cr, (5, 5), 0)  # Gaussian blur on the Cr channel
    _, skin = cv.threshold(cr_, 0, 255, cv.THRESH_BINARY | cv.THRESH_OTSU)  # Otsu binarization
    ret = cv.bitwise_and(frame, frame, mask=skin)  # keep only pixels inside the skin mask
    return ret

Before processing

After processing
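To make the thresholding step less of a black box, here is a minimal NumPy sketch of how Otsu's method picks a threshold: it tries every gray level and keeps the one that maximizes the between-class variance of the two resulting pixel groups. The function name `otsu_threshold` and the toy Cr channel are made up for illustration; the project itself just calls `cv.threshold` with `cv.THRESH_OTSU`.

```python
import numpy as np

def otsu_threshold(channel):
    """Return the gray level that maximizes between-class variance (uint8 input)."""
    hist = np.bincount(channel.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # probability of the "dark" class for each candidate
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean up to each candidate
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b = np.zeros(256)
    valid = denom > 0                       # skip candidates that leave one class empty
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

# Toy Cr channel: background at 100 with a brighter 4x4 "skin" patch at 160
cr = np.full((8, 8), 100, dtype=np.uint8)
cr[2:6, 2:6] = 160

t = otsu_threshold(cr)
mask = (cr > t).astype(np.uint8) * 255      # same convention as THRESH_BINARY
```

The resulting `mask` plays the role of `skin` in `getSkin`: only the pixels where it is nonzero survive the bitwise AND.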

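Extracting the contour and computing Fourier descriptors

The contour can be extracted with findContours in opencv. Computing the Fourier descriptors can be thought of as feature extraction.

Before processing

After processing

The code is as follows:

Extracting the contour: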

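def findContour(Laplacian):
    # findContours returns (contours, hierarchy) in OpenCV 4.x and
    # (image, contours, hierarchy) in 3.x; index -2 is the contour list in both
    h = cv.findContours(Laplacian, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_NONE)
    contour = h[-2]
    # Sort so that the largest contour comes first
    contour = sorted(contour, key=cv.contourArea, reverse=True)
    return contour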

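Computing the Fourier descriptors:

import numpy as np

# Number of descriptors to keep. The value is not given in this post; 32 is an assumption.
MIN_DESCRIPTOR = 32

def truncate(des):
    # Keep the MIN_DESCRIPTOR lowest-frequency terms around the zero frequency
    ret = np.fft.fftshift(des)
    centerIdx = int(len(ret) / 2)
    low, high = centerIdx - int(MIN_DESCRIPTOR / 2), centerIdx + int(MIN_DESCRIPTOR / 2)
    ret = ret[low:high]
    ret = np.fft.ifftshift(ret)
    return ret

def fourier(frame):
    gray = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)
    dst = cv.Laplacian(gray, cv.CV_16S, ksize=3)  # edge detection
    Laplacian = cv.convertScaleAbs(dst)
    contour = findContour(Laplacian)
    contourArray = contour[0][:, 0, :]  # largest contour as an (N, 2) array of x, y
    retbg = np.ones(dst.shape, np.uint8)  # near-black canvas (every pixel is 1)
    ret = cv.drawContours(retbg, contour[0], -1, (255, 255, 255), 1)
    # Encode each contour point (x, y) as the complex number x + iy
    contourComplex = np.empty(contourArray.shape[:-1], dtype=complex)
    contourComplex.real = contourArray[:, 0]
    contourComplex.imag = contourArray[:, 1]
    fourierResult = np.fft.fft(contourComplex)
    desInUse = truncate(fourierResult)
    return ret, desInUse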

Reconstructing from the Fourier descriptors:

def reconstruct(img, desInUse):
    contour_reconstruct = np.fft.ifft(desInUse)
    contour_reconstruct = np.array([contour_reconstruct.real, contour_reconstruct.imag])
    contour_reconstruct = np.transpose(contour_reconstruct)
    contour_reconstruct = np.expand_dims(contour_reconstruct, axis=1)
    # Shift and scale the points so that they fit inside the image
    if contour_reconstruct.min() < 0:
        contour_reconstruct -= contour_reconstruct.min()
    contour_reconstruct *= img.shape[0] / contour_reconstruct.max()
    contour_reconstruct = contour_reconstruct.astype(np.int32, copy=False)

    black_np = np.ones(img.shape, np.uint8)  # near-black canvas (every pixel is 1)
    black = cv.drawContours(black_np, contour_reconstruct, -1, (255, 255, 255), 1)  # draw the contour in white
    cv.imshow('contour_reconstruct', black)
    return black
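To see what the truncation actually keeps, here is a small NumPy-only sketch on a synthetic contour (an ellipse), so it runs without a camera or OpenCV. The helper `truncate_descriptors` mirrors the truncation function above but takes the descriptor count as a parameter; that name, the ellipse, and the value 32 are all illustrative assumptions. Unlike `reconstruct`, which rescales the points to the image size, this sketch only corrects for the different FFT lengths.

```python
import numpy as np

def truncate_descriptors(des, keep):
    """Keep the `keep` lowest-frequency Fourier coefficients."""
    shifted = np.fft.fftshift(des)          # move the zero frequency to the middle
    center = len(shifted) // 2
    low, high = center - keep // 2, center + keep // 2
    return np.fft.ifftshift(shifted[low:high])

# Synthetic closed contour: 256 points on an ellipse, encoded as x + iy
n = 256
theta = 2 * np.pi * np.arange(n) / n
contour = (100 + 40 * np.cos(theta)) + 1j * (100 + 20 * np.sin(theta))

des = np.fft.fft(contour)
kept = truncate_descriptors(des, keep=32)

# ifft normalizes by len(kept) while the forward FFT summed over n points,
# so rescale by len(kept) / n to get back to the original coordinates
approx = np.fft.ifft(kept) * (len(kept) / n)
```

An ellipse only has energy in the lowest frequencies, so the 32 reconstructed points fall on the original curve; a real hand contour loses its fine detail but keeps its overall shape, which is what makes the descriptors usable as features.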

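After these steps we have the raw training samples.

Next we need to build the test set and the training set from them. This involves:

Resizing the original images

Resize from \(300\times 300\) to \(128\times 128\).
Normalization

Divide the value of every pixel by 255.
Splitting into a test set and a training set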
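The three steps above can be sketched roughly as follows. To keep the example free of extra dependencies it uses a simple nearest-neighbour resize written in NumPy, which stands in for whatever resize call the project actually used. The function names, the 20% test ratio, and the random stand-in images are assumptions for illustration only.

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize of a 2-D image to (size, size)."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows[:, None], cols]

def prepare(images, labels, test_ratio=0.2, seed=0):
    """Resize to 128x128, scale to [0, 1], and split into train and test sets."""
    x = np.stack([resize_nearest(im, 128) for im in images]).astype(np.float32) / 255.0
    y = np.asarray(labels)
    order = np.random.default_rng(seed).permutation(len(x))  # shuffle before splitting
    n_test = int(len(x) * test_ratio)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

# Ten random 300x300 grayscale images standing in for the captured samples
images = [np.random.default_rng(i).integers(0, 256, (300, 300), dtype=np.uint8)
          for i in range(10)]
labels = [i % 9 for i in range(10)]
x_train, y_train, x_test, y_test = prepare(images, labels)
```

Shuffling before the split matters when the samples were recorded one gesture at a time, since otherwise the test set could end up containing only the last gesture.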

Finally, we build and train the CNN. Its structure is as follows:

Layer	Width	Height	Filter	Kernel Size
Input	128	128	-	-
Convolution	128	128	32	3×3
Max Pooling	64	64	32	2×2
Convolution	64	64	64	3×3
Max Pooling	32	32	64	2×2
Convolution	32	32	128	3×3
Max Pooling	16	16	128	2×2
Flatten	32768	-	-	-
Dense	64	-	-	-
Dropout	64	-	-	-
Dense	9	-	-	-
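As a sanity check on the table, the shapes can be traced by hand. The sketch below assumes what the table suggests but the post does not state outright: convolutions that preserve width and height ("same" padding), 2×2 pooling with stride 2, and a single-channel input. The helper names are made up for this example.

```python
def conv_same(shape, filters):
    """A 'same'-padded convolution keeps height and width, changes the channel count."""
    h, w, _ = shape
    return (h, w, filters)

def max_pool(shape, pool=2):
    """Max pooling with stride equal to the pool size divides height and width."""
    h, w, c = shape
    return (h // pool, w // pool, c)

shape = (128, 128, 1)            # single input channel is an assumption
trace = [("Input", shape)]
for filters in (32, 64, 128):
    shape = conv_same(shape, filters)
    trace.append(("Convolution", shape))
    shape = max_pool(shape)
    trace.append(("Max Pooling", shape))

flat = shape[0] * shape[1] * shape[2]   # size of the Flatten output
dense_params = flat * 64 + 64           # weights plus biases of the first Dense layer

for name, s in trace:
    print(name, s)
print("Flatten", flat)
```

The trace reproduces the Flatten size in the table, and it also shows that the first Dense layer holds the bulk of the parameters, which is likely why a Dropout layer follows it.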

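Final training result:

The results are decent (at least for now).

Front End

Besides rendering the interface, the front end also does the following:

Capturing gestures

Open the camera and read image frames. The code is as follows: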

function getMedia() {
  // Request a 400x400 video stream from the user-facing camera, without audio
  const constraints = {
    video: {
      width: 400,
      height: 400,
      facingMode: 'user',
      mirrored: true
    },
    audio: false,
  };
  navigator.mediaDevices.getUserMedia(constraints)
    .then(function (stream) {
      // `video` is assumed to be a <video> element obtained elsewhere
      video.srcObject = stream;
      video.play();
    })
    .catch(function (err) {
      // For example, the user denied camera permission
      console.log(err);
    });
}

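Data processing
This is essentially a reimplementation in Opencv.js of what was done in Opencv.
Model prediction

Convert the processed frame data into the input format expected by Tensorflow.js, feed it into the trained model described above, and obtain the prediction.
Gesture decision and output

Determine the gesture type from the Tensor output by the model, and use a "threshold overflow mechanism" to decide whether to send the gesture to the server. What is the "threshold overflow mechanism"? Simply put, if within a unit of time the frequency of some gesture exceeds a threshold \(\lambda\) (with \(\lambda > 0.5\)), that gesture is taken to be the current gesture. Because \(\lambda > 0.5\), at most one gesture can pass the threshold in any given window.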
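Here is a minimal sketch of the "threshold overflow mechanism", written in Python for consistency with the model-building code even though the real front end runs in JavaScript. It treats the "unit of time" as a fixed number of consecutive frames, which is an assumption; the class name `GestureVoter`, the window size, and the threshold value are likewise illustrative.

```python
from collections import Counter, deque

class GestureVoter:
    """Report a gesture only when it dominates the most recent window of predictions."""

    def __init__(self, window=10, threshold=0.6):
        if not 0.5 < threshold <= 1.0:
            raise ValueError("threshold must be in (0.5, 1.0]")
        self.window = window
        self.threshold = threshold
        self.recent = deque(maxlen=window)

    def update(self, predicted):
        """Add one per-frame prediction; return the winning gesture or None."""
        self.recent.append(predicted)
        if len(self.recent) < self.window:
            return None                      # not enough frames yet
        gesture, count = Counter(self.recent).most_common(1)[0]
        if count / self.window > self.threshold:
            self.recent.clear()              # start a fresh window after firing
            return gesture
        return None

voter = GestureVoter(window=5, threshold=0.6)
stable = [voter.update(g) for g in [8, 8, 3, 8, 8]]   # one misread frame among five
noisy = [voter.update(g) for g in [1, 2, 3, 4, 5]]    # no gesture dominates
```

Clearing the window after a gesture fires is one possible design choice: it prevents the same held gesture from being sent again on the very next frame.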

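Server

This part receives the gesture recognition results from the front end over TCP and, based on them, calls robot.js to perform the corresponding operations.

Server code: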

const net = require('net')
const robot = require('robotjs')
let controlable = false // true once the program has been unlocked
let directable = false // true while arrow-key mode is on
let key_set = []
function clear_key() {
  for (let index = 0; index < key_set.length; index++) {
    robot.keyToggle(key_set[index], 'up');
  }
  key_set = []
}

function minimize() {
  console.log('minimize')
  pre_gesture = 7;
  robot.keyToggle('command', 'down')
  robot.keyTap('D')
  robot.keyToggle('command', 'up')

}
function closeWindow() {
  console.log('closeWin')
  pre_gesture = 4;
  robot.keyToggle('alt', 'down')
  robot.keyTap('f4')
  robot.keyToggle('alt', 'up')
  
}
function altTab() {
  console.log('altTab')
  pre_gesture = 8;
  robot.keyToggle('alt', 'down')
  key_set.push('alt')
  robot.keyTap('tab')
}
function re_altTab() {
  console.log('re_altTab')
  pre_gesture = 9;
  robot.keyToggle('alt', 'down')
  key_set.push('alt')
  robot.keyToggle('shift', 'down')
  key_set.push('shift')
  robot.keyTap('tab')
}
function audioMute() {
  console.log('audio_mute')
  robot.keyTap('audio_mute');
}

let pre_gesture;
const server = net.createServer(function (sock) {
  sock.on('close', function () {
    console.log('close socket')
    server.close()
  })
  sock.on('data', function (data) {
    console.log('ok!')
    console.log(data.toString())
    let stringifyData = data.toString()
    if (stringifyData === '5') {
      directable = false;
      if (controlable) {
        controlable = false;
        clear_key();
      } else {
        controlable = true;
      }
      pre_gesture = stringifyData;
    }
    if (controlable) {
      if (directable) {
        if (stringifyData === '1') {
          console.log('up')
          robot.keyTap('up')
        } else if (stringifyData === '7') {
          console.log('down')
          robot.keyTap('down')
        } else if (stringifyData === '8') {
          console.log('right')
          robot.keyTap('right')
        } else if (stringifyData === '9') {
          console.log('left')
          robot.keyTap('left')
        }
      } else {
        if (stringifyData != pre_gesture) {
          clear_key()
        }
        if (stringifyData === '6') {
          audioMute()
        } else if (stringifyData === '7') {
          minimize()
        } else if (stringifyData === '4') {
          directable = true;
        } else if (stringifyData === '2') {
          closeWindow()
        } else if (stringifyData === '9') {
          re_altTab()
        } else if (stringifyData === '8') {
          altTab()
        }
      }
    }
  })
})

server.on('listening', function () {
  console.log('start listening')
})

server.on('error', function () {
  console.log('listen error')
})

server.on('close', function () {
  console.log('stop listening')
})

server.listen({
  port: 6080,
  host: '127.0.0.1',
  exclusive: true
})

Connecting to the server from electron's main process and sending the gesture recognition results:

const net = require('net')
const sockConfig = {
  port: 6080,
  host: '127.0.0.1'
}
const sock = net.connect(sockConfig, function () {
  console.log('connected to server!')
})

sock.on('connect', function () {
  console.log('connect success')
})

function sendGesture(ges) {
  let ges2string = ges.toString()
  console.log(ges2string)
  sock.write(ges2string)
}
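One caveat about the exchange above: TCP delivers a byte stream rather than discrete messages, so two gestures written in quick succession could arrive at the server in a single `data` event (for example as `'88'`), which none of the string comparisons would match. A common remedy is to end each message with a newline and split on it at the receiving side. The project as posted does not do this; the sketch below, in Python like the other added examples, only illustrates the idea, and `LineFramer` is a hypothetical name.

```python
class LineFramer:
    """Reassemble newline-terminated messages from arbitrary TCP chunks."""

    def __init__(self):
        self.buffer = b""

    def feed(self, chunk):
        """Add a received chunk; return the list of complete messages so far."""
        self.buffer += chunk
        # Everything before the last newline is complete; the rest stays buffered
        *lines, self.buffer = self.buffer.split(b"\n")
        return [line.decode("ascii") for line in lines if line]

framer = LineFramer()
first = framer.feed(b"8\n8\n")    # two gestures coalesced into one chunk
second = framer.feed(b"5")        # a message whose newline has not arrived yet
third = framer.feed(b"\n7\n")     # completes "5" and delivers "7"
```

With this scheme the sender would write the gesture followed by `"\n"`, and the receiver would dispatch each returned message separately.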

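Final Result

To avoid accidental operations, the program starts in a locked state and accepts no commands

Gesture 5 unlocks it

The gesture cannot be recognized and needs to be adjusted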

NeoRuTayE's blog